The dirty secret of library metadata is that it is... DIRTY, and as the old saying goes, “garbage in, garbage out”. This post focuses on what to do with duplicate records in the metadata cloud, but before we dive in, please note the great little PowerPoint presentation on SlideShare by Heather Gilbert, Digital Scholarship Librarian, College of Charleston, titled “I Bet You Clean Up Real Nice: Makeover Your Metadata for Maximum Interoperability”. If you don’t have time to check it out, at least consider the following two slides from Heather’s presentation. The focus on clean metadata is not cataloguing for cataloguing’s sake; rather, it is all about making it quicker and easier for the people who use libraries to find what they are looking for.
Now, onto the matter of duplicate records. My library recently purchased a copy of The Yearbook Committee by Sarah Ayoub. In adding our holdings to WorldCat, and therefore our collection, we noticed there were two records. This book had just been published and already there were duplicate records in the library metadata cloud for the same manifestation. In fairness, the first record (OCLC#: 952957149, created on 4 Jul 2016) was created because at the time there was no catalogue record in either WorldCat or Libraries Australia. OCLC#: 929440271 did not come into WorldCat from Libraries Australia until 18 Aug 2016. See the following screenshot for details.
While the Libraries Australia record, OCLC#: 929440271, is a better record:
- it provides much more in the way of discovery metadata and publication information,
- it includes cover art in WorldCat, and
- there are many more library holdings attached to this record,
the first record is a good attempt by an Australian school library to get a catalogue record up into the cloud so they could attach their holdings and lend the item. This raises some interesting questions:
- When adding a bibliographic record from Libraries Australia into WorldCat, does Libraries Australia first check whether a record already exists?
- If there is an existing record, especially a record created for a new publication, does Libraries Australia have a workflow for deduping WorldCat?
- Now that more and more Australian libraries are cataloguing directly into WorldCat because WorldShare is their library management system, who is responsible for ensuring there is no unnecessary duplication: Libraries Australia, OCLC, the library that created the original catalogue record, or the broader library community?
Being a good library citizen, I used the WorldCat functionality to report that OCLC#: 952957149 and OCLC#: 929440271 were duplicate records and that the record to keep was the Libraries Australia record, OCLC#: 929440271. See the following screenshot for details. It will be interesting to see what happens next.
On the flip side, the following record was first catalogued in WorldCat by an Australian school because at the time there was no record in WorldCat. HOWEVER, a close look shows that there was a record in Libraries Australia. But why would a library check Libraries Australia when it is cataloguing in WorldCat and assumes that Libraries Australia and WorldCat are being kept in sync?
As of 23 August there are three records in Libraries Australia for the same manifestation and two records for the same manifestation in WorldCat. The three Libraries Australia records hang off OCLC#: 949097546. I have used the record report function in WorldCat to de-duplicate the WorldCat records, but who is going to de-duplicate the Libraries Australia records? And how come three records for the same manifestation were added into Libraries Australia over one week, 11 through to 16 August? Am I missing something here? Where is the data quality control?
- OCLC# 956519728 created on 20160822
- OCLC# 949097546 created on 20160817
The following Libraries Australia records are cross-referenced to OCLC# 949097546, created on 20160817:
- 000058458547 created on 20160811
- 000057856988 created on 20160816
- 000058458495 created on 20160816
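For anyone who wants to check the timeline above, here is a minimal Python sketch that tallies the duplicates. The identifiers and creation dates are copied from the two lists; the `summarise` helper and the dictionary layout are my own illustration and are not part of any WorldCat or Libraries Australia tool.

```python
from datetime import date, datetime

# Identifiers and creation dates (YYYYMMDD) copied from the lists above.
worldcat_records = {
    "956519728": "20160822",
    "949097546": "20160817",
}
libraries_australia_records = {
    "000058458547": "20160811",
    "000057856988": "20160816",
    "000058458495": "20160816",
}


def parse_created(stamp: str) -> date:
    """Turn a YYYYMMDD creation stamp into a date."""
    return datetime.strptime(stamp, "%Y%m%d").date()


def summarise(records: dict[str, str]) -> dict:
    """Hypothetical helper: count the duplicates, find the earliest
    record, and measure the days between first and last creation."""
    created = {rid: parse_created(stamp) for rid, stamp in records.items()}
    earliest_id = min(created, key=created.get)
    span = max(created.values()) - min(created.values())
    return {
        "count": len(created),
        "earliest": earliest_id,
        "span_days": span.days,
    }


la_summary = summarise(libraries_australia_records)
wc_summary = summarise(worldcat_records)
print("Libraries Australia:", la_summary)
print("WorldCat:", wc_summary)
```

Note that the earliest record is not automatically the one to keep: in The Yearbook Committee example, the later Libraries Australia record was the better one. Deciding which record survives is still a cataloguer's judgement call.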
See the following two images for details.