OCLC Programs & Research Prospecting in the library data mines Brian Lavoie Consulting Research Scientist OCLC Programs & Research Annual Partners Meeting Washington, DC June 4, 2007

OCLC Programs & Research

Prospecting in the library data mines

Brian Lavoie
Consulting Research Scientist
OCLC Programs & Research

Annual Partners Meeting
Washington, DC
June 4, 2007

Making data work harder

Data is an asset: informs planning and decision-making; drives new forms of services

Libraries have many data assets: bibliographic, holdings, usage, reference inquiries, …

Opportunities to collect data increase in network spaces: Web site traffic, click-through patterns, e-usage, …

Make data work harder: use library data in innovative ways to create value

Data mining & OCLC Research

Networks of collaboration and coordination: decisions taken in “system-wide context”; focus on resources of “system”; mass digitization, cooperative print storage, shared discovery environments, …

As library networks develop and expand, opportunities arise to create value through collective action and through aligning local collections with the system-wide environment. Data is context.

Research area focused on data mining activities: aggregate collections; “system-wide collection” (as represented in WorldCat)

Managing the collective collection

Mass digitization

“Last copies”

Long tail

Mass digitization

Google Book Search (aka Google Print for Libraries)

Aggregate collection of digitized print books (combined holdings of Harvard, Michigan, Oxford, NYPL, and Stanford)

Data-mining to provide empirical context to inform community-wide dialog

http://www.dlib.org/dlib/september05/lavoie/09lavoie.html

“Rareness is common”

System-wide print book collection: ~32 million print books

Held by 1: 37%
Held by 2 - 5: 30%
Held by 6 - 25: 20%
Held by 26 - 50: 5%
Held by 51 - 100: 3%
Held by > 100: 5%

Data-mining to better understand nature of the “collective collection”

Identify rare & unique materials in system-wide collection (“last copies”)
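As a rough illustration of how a breakdown like the one above could be produced, the sketch below buckets records by holdings count and lists the records held by a single institution. The bucket boundaries follow the slide; the function names, the `sample` data, and the record IDs are hypothetical and are not taken from WorldCat or from the talk.

```python
from collections import Counter

# Bucket boundaries follow the slide: (label, low, high); high=None is open-ended.
BUCKETS = [
    ("1", 1, 1),
    ("2-5", 2, 5),
    ("6-25", 6, 25),
    ("26-50", 26, 50),
    ("51-100", 51, 100),
    (">100", 101, None),
]


def bucket_label(holdings):
    """Return the label of the bucket that a holdings count falls into."""
    for label, low, high in BUCKETS:
        if holdings >= low and (high is None or holdings <= high):
            return label
    raise ValueError(f"holdings count must be >= 1, got {holdings}")


def holdings_distribution(holdings_by_record):
    """Map each bucket label to the percentage of records in that bucket."""
    counts = Counter(bucket_label(h) for h in holdings_by_record.values())
    total = len(holdings_by_record)
    return {label: 100.0 * counts[label] / total for label, _, _ in BUCKETS}


def last_copies(holdings_by_record):
    """Return the IDs of records held by exactly one institution."""
    return sorted(rid for rid, h in holdings_by_record.items() if h == 1)


# Hypothetical sample: record ID -> number of holding institutions.
sample = {"r1": 1, "r2": 1, "r3": 3, "r4": 12,
          "r5": 40, "r6": 75, "r7": 250, "r8": 2}

distribution = holdings_distribution(sample)
unique_ids = last_copies(sample)
```

With real data the input would be one holdings count per bibliographic record; the same two functions would then yield both the percentage breakdown and a candidate list of "last copies".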

The Library Long Tail (using holdings as measure of popularity)

[Figure: number of holdings (vertical axis) plotted against items ranked by system-wide popularity (horizontal axis)]

HEAD: Top 10% of WorldCat records (ranked by holdings) account for 80% of total WorldCat holdings

LONG TAIL: Bottom 90% of WorldCat records (ranked by holdings) account for 20% of total WorldCat holdings

HEAD: Small proportion of items account for lion’s share of collecting activity

LONG TAIL: Everything else spread out across Long Tail of diffuse collecting activity

Data-mining to inform strategies/policies aimed at optimizing system-wide supply & demand for library materials
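The head/tail split above is a simple share calculation: rank records by holdings, take the top fraction, and divide its holdings by the total. A minimal sketch follows; `head_share` and the ten-record sample are hypothetical, and the sample was constructed so that it happens to reproduce an 80/20 split rather than being drawn from WorldCat.

```python
def head_share(holdings, head_fraction=0.10):
    """Fraction of total holdings accounted for by the top `head_fraction`
    of records, when records are ranked by holdings (descending)."""
    if not holdings:
        raise ValueError("holdings list is empty")
    ranked = sorted(holdings, reverse=True)
    # Always include at least one record in the head.
    head_size = max(1, int(len(ranked) * head_fraction))
    return sum(ranked[:head_size]) / sum(ranked)


# Hypothetical holdings counts for ten records (total = 100 holdings).
sample_holdings = [80, 5, 4, 3, 2, 2, 1, 1, 1, 1]

share = head_share(sample_holdings)   # share held by the top 10% of records
tail = 1.0 - share                    # share held by the remaining 90%
```

Varying `head_fraction` shows how quickly cumulative holdings flatten out along the tail, which is the shape the figure describes.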

Others …

Registry of Copyright Evidence

New York Art Museum study

Shared print storage

Use library data to inform decision-making: data about library assets (bibliographic); data about choices involving these assets (holdings, circ., ILL); system-wide aggregation (larger aggregation = richer context)

Shared print storage decision-making: data about assets (local inventories of print materials); data about system-wide availability (holdings); data about usage (local & system-wide)

Role of Research: data collection; data-mining analysis in support of project needs

Inform community dialog on shared print storage issues; analyze “collective collection” in shared print context; support development of effective print storage strategies

Standardize analysis to maximize applicability/re-use
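To make the three inputs to shared print storage decisions concrete (local inventory, system-wide availability, usage), the sketch below combines them in one record and applies a simple screening rule. Everything here is an assumption for illustration: the `Title` fields, the thresholds, and the rule itself are hypothetical and do not represent a policy described in the talk.

```python
from dataclasses import dataclass


@dataclass
class Title:
    record_id: str           # identifier in the local inventory
    local_circulations: int  # local usage, e.g. checkouts over some period
    system_holdings: int     # number of institutions holding the title


def storage_candidates(inventory, max_circulations=0, min_holdings=2):
    """Titles with low local use that are also held elsewhere in the system.
    Both thresholds are illustrative assumptions."""
    return [
        t.record_id
        for t in inventory
        if t.local_circulations <= max_circulations
        and t.system_holdings >= min_holdings
    ]


def last_copy_flags(inventory):
    """Titles held by only one institution system-wide ("last copies")."""
    return [t.record_id for t in inventory if t.system_holdings == 1]


# Hypothetical local inventory.
inventory = [
    Title("a", local_circulations=0, system_holdings=40),
    Title("b", local_circulations=12, system_holdings=300),
    Title("c", local_circulations=0, system_holdings=1),
    Title("d", local_circulations=0, system_holdings=2),
]

candidates = storage_candidates(inventory)
flagged = last_copy_flags(inventory)
```

Keeping the thresholds as parameters rather than constants reflects the last point above: the same analysis can be re-run under different assumptions by different projects.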