OCLC Programs & Research
Prospecting in the library data mines
Brian Lavoie, Consulting Research Scientist, OCLC Programs & Research
Annual Partners Meeting, Washington, DC, June 4, 2007
Making data work harder
Data is an asset: informs planning and decision-making; drives new forms of services
Libraries have many data assets: bibliographic, holdings, usage, reference inquiries, …
Opportunities to collect data increase in network spaces … (Web site traffic, click-through patterns, e-usage, …)
Make data work harder: use library data in innovative ways to create value
Data mining & OCLC Research
Networks of collaboration and coordination: decisions taken in “system-wide context”; focus on resources of “system” (mass digitization, cooperative print storage, shared discovery environments, …)
As library networks develop and expand, opportunities arise to create value through collective action and aligning local collections with system-wide environment. Data is context.
Research area focused on data mining activities: aggregate collections; “system-wide collection” (as represented in WorldCat)
Managing the collective collection
Mass digitization
“Last copies”
Long tail
Mass digitization
Google Book Search (aka Google Print for Libraries)
Aggregate collection of digitized print books (combined holdings of Harvard, Michigan, Oxford, NYPL, and Stanford)
Data-mining to provide empirical context to inform community-wide dialog
http://www.dlib.org/dlib/september05/lavoie/09lavoie.html
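To make the idea of an "aggregate collection" concrete, the sketch below treats each library's holdings as a set of book identifiers and combines them. The library names, identifiers, and helper functions are hypothetical illustrations, not data or methods from the study.

```python
# Toy holdings: library name -> set of book identifiers (all values are made up).
holdings = {
    "library_a": {"b1", "b2", "b3"},
    "library_b": {"b2", "b3", "b4"},
    "library_c": {"b3", "b5"},
}


def aggregate_collection(holdings):
    """Return the set of distinct books held across all libraries."""
    combined = set()
    for books in holdings.values():
        combined |= books
    return combined


def unique_to(library, holdings):
    """Return books held by `library` and by no other library in the group."""
    others = set()
    for name, books in holdings.items():
        if name != library:
            others |= books
    return holdings[library] - others


combined = aggregate_collection(holdings)
print(len(combined))                             # distinct books in the aggregate
print(sorted(unique_to("library_a", holdings)))  # books only library_a holds
```

Set union counts each book once however many libraries hold it, so the size of the aggregate is generally smaller than the sum of the individual collection sizes.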
“Rareness is common”
System-wide print book collection: ~32 million print books
Share of print books, by number of holdings:
Held by 1: 37%
Held by 2 - 5: 30%
Held by 6 - 25: 20%
Held by 26 - 50: 5%
Held by 51 - 100: 3%
Held by > 100: 5%
Data-mining to better understand nature of the “collective collection”
Identify rare & unique materials in system-wide collection (“last copies”)
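A breakdown like the one on this slide could be produced roughly as follows. This is a minimal sketch: the bucket boundaries come from the slide, while the function names and the sample holdings counts are invented for illustration.

```python
from collections import Counter

# Bucket boundaries taken from the slide: (label, lowest count, highest count or None).
BUCKETS = [
    ("1", 1, 1),
    ("2 - 5", 2, 5),
    ("6 - 25", 6, 25),
    ("26 - 50", 26, 50),
    ("51 - 100", 51, 100),
    ("> 100", 101, None),
]


def bucket_for(count):
    """Return the label of the bucket that a holdings count falls into."""
    for label, low, high in BUCKETS:
        if count >= low and (high is None or count <= high):
            return label
    raise ValueError(f"holdings count must be >= 1, got {count}")


def holdings_distribution(holdings_by_book):
    """Return {bucket label: percentage of books} for a {book: holdings count} map."""
    if not holdings_by_book:
        return {label: 0.0 for label, _, _ in BUCKETS}
    tally = Counter(bucket_for(n) for n in holdings_by_book.values())
    total = len(holdings_by_book)
    return {label: 100.0 * tally[label] / total for label, _, _ in BUCKETS}


def last_copy_candidates(holdings_by_book):
    """Return the books with exactly one holding, sorted by identifier."""
    return sorted(book for book, n in holdings_by_book.items() if n == 1)


# Hypothetical sample: book identifier -> number of holdings.
sample = {"b1": 1, "b2": 1, "b3": 3, "b4": 12, "b5": 40, "b6": 75, "b7": 250, "b8": 2}
distribution = holdings_distribution(sample)
print(distribution)
print(last_copy_candidates(sample))
```

Keeping the boundaries in one table means the same code can be rerun with different buckets when a project needs a finer or coarser view of scarcity.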
The Library Long Tail (using holdings as measure of popularity)
[Chart: number of holdings (vertical axis) for items ranked by system-wide popularity (horizontal axis)]
HEAD: Top 10% of WorldCat records (ranked by holdings) account for 80% of total WorldCat holdings
LONG TAIL: Bottom 90% of WorldCat records (ranked by holdings) account for 20% of total WorldCat holdings
HEAD: Small proportion of items account for lion’s share of collecting activity
LONG TAIL: Everything else spread out across Long Tail of diffuse collecting activity
Data-mining to inform strategies/policies aimed at optimizing system-wide supply & demand for library materials
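The head/tail split can be checked with a short calculation: rank records by holdings, take the top fraction, and divide its holdings by the total. The sketch below assumes a plain list of per-record holdings counts; the function name and the toy numbers are hypothetical and chosen only so that the head share comes out at 80%.

```python
def head_share(holdings_counts, head_fraction=0.10):
    """Return the share of total holdings accounted for by the top-ranked records.

    `head_fraction` is the proportion of records treated as the head (0.10 = top 10%).
    """
    ranked = sorted(holdings_counts, reverse=True)
    total = sum(ranked)
    if total == 0:
        return 0.0
    # Always include at least one record in the head.
    head_size = max(1, round(len(ranked) * head_fraction))
    return sum(ranked[:head_size]) / total


# Toy data: ten records, one of which holds most of the holdings.
toy_counts = [80, 4, 3, 3, 2, 2, 2, 2, 1, 1]
share = head_share(toy_counts)
print(f"head share: {share:.0%}, long tail share: {1 - share:.0%}")
```

Varying `head_fraction` traces out how concentrated collecting activity is, which is the same information the long tail chart conveys graphically.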
Others …
Registry of Copyright Evidence
New York Art Museum study
Shared print storage
Use library data to inform decision-making: data about library assets (bibliographic); data about choices involving these assets (holdings, circ., ILL); system-wide aggregation (larger aggregation = richer context)
Shared print storage decision-making: data about assets (local inventories of print materials); data about system-wide availability (holdings); data about usage (local & system-wide)
Role of Research: data collection; data-mining analysis in support of project needs
Inform community dialog on shared print storage issues; analyze “collective collection” in shared print context; support development of effective print storage strategies
Standardize analysis to maximize applicability/re-use
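As one illustration of how the three kinds of data named for shared print storage decision-making (local inventory, system-wide availability, usage) might be combined, the sketch below applies a simple rule to each item. The record fields, thresholds, and decision labels are all assumptions made for this example; the slides do not specify a decision rule.

```python
from dataclasses import dataclass


@dataclass
class PrintItem:
    item_id: str          # identifier from the local inventory
    system_holdings: int  # system-wide availability (number of holdings)
    local_uses: int       # local usage, e.g. circulation count
    system_uses: int      # system-wide usage, e.g. ILL requests


def classify(item, scarce_at_most=5, low_use_at_most=0):
    """Apply a hypothetical rule: flag scarce items, then low-use widely held ones."""
    if item.system_holdings <= scarce_at_most:
        return "review as scarce"
    if item.local_uses + item.system_uses <= low_use_at_most:
        return "storage candidate"
    return "keep on shelf"


# Made-up inventory records for illustration only.
inventory = [
    PrintItem("p1", system_holdings=1, local_uses=0, system_uses=0),
    PrintItem("p2", system_holdings=120, local_uses=0, system_uses=0),
    PrintItem("p3", system_holdings=300, local_uses=14, system_uses=52),
]
decisions = {item.item_id: classify(item) for item in inventory}
print(decisions)
```

Exposing the thresholds as parameters reflects the goal of standardizing the analysis so it can be re-used: different storage projects could rerun the same rule with their own cut-offs.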