Data Mining the Largest Library Database in the World
Roy Tennant, OCLC Research
Presented at the OCLC EMEA Regional Council Meeting, 26 February 2013, Strasbourg, France
Leveraging WorldCat
E U R O P E, M I D D L E E A S T & A F R I C A R E G I O N A L C O U N C I L
Worldcat.org/identities/
Algorithmically constructed from WorldCat records
Viaf.org
A Union database of authority records
The Responsible Party
Thom Hickey, Chief Scientist
OCLC Research
290+ million records
Language Coverage
Percentage of records for non-English materials, 30 June 2012: 60.2%
Total: 274 million
German: 36.5 million
French: 25.5 million
Spanish: 11.3 million
Italian: 4.7 million
Dutch: 4.3 million
Russian: 3.6 million
Latin: 3.5 million
Worldcat.org/identities/
(J.K. Rowling)
(Diana Gabaldon)
(Galileo)
Viaf.org
VIAF Participants
“Super” Authority File
Our Cataloging Future
“Moving from cataloging to catalinking”
Eric Miller, Zepheira
Some Lessons
• Widespread collaboration is essential
• Normalizing the data is essential
• Normalizing the data is complicated
• Everything is interrelated:
– You can’t bring names together if titles don’t match
– You can’t bring titles together if names don’t match
• Batch mode processing still rules (but we’re getting better and faster at it)
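As a rough illustration of the normalization and interrelation lessons above, here is a minimal Python sketch that groups records only when both the normalized name and the normalized title agree. It is not OCLC's actual processing: the `(name, title)` record shape, the helper names `normalize`, `name_key`, and `group_records`, and the sample records are all assumptions made for this example.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase, strip diacritics and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^\w\s]", " ", no_marks.lower())
    return " ".join(cleaned.split())


def name_key(name: str) -> str:
    """Order-insensitive key, so inverted and direct name forms agree."""
    return " ".join(sorted(normalize(name).split()))


def group_records(records):
    """Group (name, title) pairs whose normalized name AND title both match."""
    groups = defaultdict(list)
    for name, title in records:
        groups[(name_key(name), normalize(title))].append((name, title))
    return dict(groups)


# Hypothetical sample records, for illustration only.
records = [
    ("Galilei, Galileo", "Sidereus nuncius"),
    ("Galileo Galilei", "Sidereus Nuncius."),
    ("Gabaldon, Diana", "Outlander"),
]
groups = group_records(records)
for key, members in groups.items():
    print(key, len(members))
```

Even this toy version shows why normalization is complicated: the two Galileo records merge only because both the name and the title reduce to the same key. Real data would also need handling for transliteration, dates, translated titles, and much more.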
Conclusions
• Data mining isn’t just useful, it’s essential
• Extracting data from MARC that is useful in other contexts is possible, but will require sophisticated processing
• Only very large organizations (e.g., OCLC, national libraries) have the data and resources to do this work
• Thankfully, we are doing it, but there is much more to be done
Roy Tennant
@rtennant
roytennant.com