RLG Programs
Measuring Uniqueness in System-wide Book Holdings: Implications for Collection Management
Constance Malpas, Program Officer, RLG Programs
RLG Programs Managing Last Copies
CCDO Meeting, ALA Midwinter – 12 January 2008
This presentation
Summarizes recent data-mining efforts by OCLC Programs and Research
  System-wide sample (Summer 2007 – Spring 2008)
  ARL unique print books (Autumn 2007)
Suggests implications for collection managers
Outlines next steps for RLG Programs
An opportunity to discuss what additional evidence and analysis are needed
What we mean by ‘last copy’
Monographic title uniquely held by a single WorldCat contributor
  Cf. ‘single copy’ repositories, where ‘last copy’ is relative to local/group holdings
May represent a last manifestation, expression or work
  Bibliographic records describe manifestations, not copies; unique manifestations are the point of departure for analysis
Some are intrinsically unique; others are rendered unique by erosion of system-wide holdings
  Historical data may help document increased copy or work-level availability, but weren’t included in the studies presented here
Distribution of uniquely-held print books in ARL member institutions
[Bar chart. Y-axis: unique titles (0 to 700,000). X-axis: ARL member institution. Labeled institutions: LC, Yale, Alberta, Columbia, U Chicago, UCLA, McGill, Penn, UVa, Hawaii, U Md, San Diego, SUNY Buffalo, Rutgers, Dartmouth, Notre Dame, Oregon, GA Tech, Delaware, Florida State, So Illinois, Alabama, Irvine, GWU, Wayne State, York, Virginia Tech, WA State, Case Western, Manitoba, Howard]
Distribution of wealth: ARL unique books
A classic Pareto distribution
20% of the population holds >75% of unique titles
Median institutional holdings = 19K titles
institutional excellence?
(or) a “network effect?”
N = 6.95 M titles
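As a rough illustration of the two summary statistics on this slide (the share of unique titles held by the top 20% of institutions, and the median institutional holding), here is a minimal Python sketch. The per-institution counts are invented for illustration and are not the study's data; `top_share` is a hypothetical helper name, not something used in the analysis.

```python
import math
from statistics import median


def top_share(unique_counts, top_fraction=0.20):
    """Fraction of all unique titles held by the top `top_fraction` of institutions."""
    if not unique_counts:
        raise ValueError("need at least one institution")
    ranked = sorted(unique_counts, reverse=True)
    total = sum(ranked)
    if total == 0:
        raise ValueError("no unique titles to distribute")
    # Round up so the top group always contains at least one institution.
    top_n = max(1, math.ceil(len(ranked) * top_fraction))
    return sum(ranked[:top_n]) / total


# Invented per-institution counts of uniquely-held titles (NOT the study's data).
counts = [600_000, 150_000, 40_000, 25_000, 20_000,
          18_000, 12_000, 8_000, 3_000, 400]

print(f"Top 20% share: {top_share(counts):.1%}")
print(f"Median holdings: {median(counts):,.0f} titles")
```

With a skewed distribution like this one, the median sits far below the mean, which is why the slide reports the median rather than an average.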
Why focus on uniquely-held titles?
“Scarcity is common”
  Limited redundancy in holdings = limited preservation guarantee, limited opportunity to create economies of scale by aggregating supply
Research institutions bear the brunt of responsibility for long-term preservation of, and access to, unique titles
  Academic and independent research libraries hold up to 70% of aggregate unique print book collection
  Continuing costs of managing (storing, providing access to) print collections are high; use is generally declining
  Space pressure on physical plant (on-campus, remote) is high; understanding distribution and characteristics of unique holdings can inform decisions about disposition of physical collection
Increased attention to stewardship of special collections
  ARL SCWG, CLIR, LC Task Force on Bibliographic Control – new attention to what constitutes ‘special’ collections, appropriate standards of care, modes and metrics of use
Challenges
Identification requires group / network view of holdings
  WorldCat provides a reasonable proxy for system-wide collection
Some materials (MSS, theses and dissertations, etc.) are intrinsically unique; not all can be algorithmically identified in MARC records
  hybrid approach combines computational and manual analysis of bibliographic data
Sparse bibliographic records impede efficient work/title matching, may introduce spurious measure of uniqueness
  external sources (including Google) sometimes helpful in filling gaps
Non-English titles (especially transliterated non-roman scripts) are particularly difficult to match
  we resisted the temptation to exclude these
Study I: System-wide Sampling
250 randomly selected, uniquely-held titles
  Limited to printed books (including theses) published before 2005
  English-language cataloging only
  Iterative re-sampling required to fill gaps
Independently reviewed by three project staff
  Level of uniqueness
  Material type
Results periodically collated for group analysis
  Compare results of individual analysis for consistency
  Seek consensus on difficult cases – relatively few of these
  Re-sample as necessary to fill gaps
White paper anticipated March 2008
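The sampling step described on this slide (random selection, eligibility limits, re-sampling to fill gaps) could be sketched roughly as follows. The slides do not describe the actual procedure in detail, so this is only one plausible reading: the record fields (`format`, `year`, `cat_lang`) and the function names are hypothetical, not WorldCat or MARC field names.

```python
import random


def is_eligible(record):
    """Apply the slide's stated limits, using hypothetical field names."""
    return (record["format"] == "print_book"   # printed books (incl. theses)
            and record["year"] < 2005          # published before 2005
            and record["cat_lang"] == "eng")   # English-language cataloging


def draw_sample(records, eligible, target=250, seed=None):
    """Draw records at random, without replacement, until `target` eligible
    ones are found. Rejected draws are simply replaced by further draws,
    which is one way to 're-sample to fill gaps'."""
    rng = random.Random(seed)
    pool = list(records)
    rng.shuffle(pool)
    sample = []
    for record in pool:
        if eligible(record):
            sample.append(record)
            if len(sample) == target:
                break
    # May be shorter than `target` if the pool runs out of eligible records.
    return sample
```

Shuffling once and walking through the pool is equivalent to repeated draws without replacement, so no title can enter the sample twice.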
Study II: ARL uniquely-held books
Ad hoc analysis by RLG Programs, prompted by IMLS Connecting to Collections grant announcement
  How might the existing evidence base be used to focus regional preservation investments?
Based on January 2007 snapshot of WorldCat database: 13M records for titles (6.95M print books) uniquely held by ARL institutions; 300+ OCLC symbols; 123 institutions
Iterative analysis examined relative impact of theses/dissertations and recent imprints on system-wide uniqueness; regional and institutional distribution of holdings
Findings shared with ARL Special Collections Working Group (October 2007) and selected RLG partner institutions (UC; CIC; ReCAP; Harvard; ASU; NYU)
Heritage Preservation willing to share Heritage Health survey data for cross-tabulation on as-needed basis
Limitations
Current studies limited to printed books – excludes serials, special collections; only a partial measure of uniqueness in system-wide collection
Incomplete representation of world book collection; for non-English titles especially, uniqueness of North American holdings is only relative
Cataloging backlogs of up to 5 years mean that holdings for recent acquisitions are imperfectly reflected
Incomplete coverage of rare books and special collections prior to (ongoing) integration of RLG Union Catalog
Our findings – distribution of unique titles
Research and academic libraries hold >70% of aggregate unique print book collection
  while value and utility of these holdings may be widely distributed across the library community, holdings are concentrated at institutions with a research / teaching / learning mandate
  limited data on aggregate use, sources of demand
Institutional distribution of unique holdings is highly skewed, with a handful of libraries holding a majority share of collective assets
  ARL unique print book holdings range from 400 titles to 600K titles per institution; median holdings = 19K titles
  generally, institutions with large collections hold more unique materials – but absolute size of collection is not an indicator of relative uniqueness
Unique titles by library type
Based on a randomly selected sample of 250 uniquely-held print book titles in WorldCat (Jan. 2007)
[Pie chart. Slice values: 50%, 27%, 6%, 6%, 4%, 4%, 2%, 1%. Legend: ARL; Academic (non-ARL); Gov't; State and National; Special; Public; Unknown; Networks]
Unique Print Books in ARL Institutions
National libraries and institutions with deep collections and an aggressive approach to collecting and cataloging new monographs – LC, Harvard, Library and Archives Canada – have an exceptional range of unique holdings
CRL’s focus on theses and dissertations is evident – most uniqueness is attributable to these holdings
Institutions with younger collections, actively seeking to increase scope of coverage – NCSU, Temple – are building uniqueness in new titles
Content-type Distributions: CRL and ARL
[Stacked bar chart, 0% to 100%. Bars: Center for Research Libraries; ARL aggregate collection. Series: unique theses; unique print books pub'd 2000 and after; unique print books pub'd before 2000]
Intrinsically unique content, “only copies”
May include “first copies” in cataloging queue; uniqueness subject to rapid erosion
Our findings – levels of uniqueness
~60% of titles represent unique works
  Ex: Report and recommendation … on a proposed loan … equivalent to US$70 million to the … Islamic Republic of Pakistan for a power plant efficiency improvement project (1987) – World Bank report held by George Washington University
~15% of titles represent unique manifestations
  Ex: Gallipolis … an account of the French five hundred and of the town they established … compiled by Workers of the Writers' program of the Work projects administration (1940) – microform pamphlet held by Yale University; related manifestations at 40 libraries
~5% of titles represent unique expressions
  Ex: E.J. Luck. A pedigree of the families Luck, Lock and Lee (1908) – book held by Massanutten Regional Library, VA; similar title (Luck, Lock) by same author, pub’d in 1900, held at LC
~20% of titles not unambiguously unique: duplicate or near-duplicate records can be found in WorldCat
  Ex: K. Kimura. Edo no akebono (1956) – book held by Harvard Yenching; apparent duplicate (cataloged with original scripts) held by Waseda, Yale
Our findings – content characterization
Material types
  ~35% are books (>50pp)
    most appear to be non-fiction titles, less likely to have additional manifestations
  ~20% theses and dissertations
    many at Master’s level – unlikely to be held beyond issuing institution
  ~15% government documents
    mostly federal and state, may be duplicated in depositories
  ~10% pamphlets
    unique content, but rarely useful in isolation
  ~10% analytics; single articles or issues bound as a separate volume
    non-unique content
  <5% early imprints
    lost treasures?
  Small numbers of by-laws, scripts, legal briefs, minutes, etc.
Implications
Institutions with significant unique holdings may benefit from ‘splitting the difference’ between unique works and manifestations
unique manifestations and analytics should be judged with an eye to provenance history; unless they contribute to local distinctiveness, immediate action may not be warranted
A preliminary sort by material type may help guide local decision-making regarding the physical disposition of unique holdings
pamphlets and technical reports may be candidates for cataloging enhancement and storage transfer; books may be short-listed for digitization and/or transfer to special collections
Institutions with smaller unique print book collections may benefit from collective action to aggregate supply (through effective disclosure) and demand (through special resource-sharing and digitization initiatives) around specific topical and disciplinary interests
local collections gain in significance when presented in context with related holdings
Recommendations
Adopt a nuanced understanding of ‘relative uniqueness’ when assessing local holdings
  Unique manifestations may not represent unique intellectual content, but may have other value
    As artifacts: special collections
    As a networked resource: increased availability
  Unique works may gain relevance and value when presented as part of a larger disciplinary or topical collection
    Theses and dissertations may benefit from special discovery tools, integration in local scholarly communications initiatives
    Pamphlets and technical reports may be virtually aggregated for specific communities of use
Maximize disclosure of unique holdings to increase their impact and value
Focus on use and utility of unique holdings to ensure long-term preservation, enduring value to parent institution
What’s Next . . .
Holdings validation study will examine a sample of scarcely-held (<5 copies) US imprints in North American research libraries
  Compare current WorldCat holdings to historical holdings – looking for signs of collection erosion; elimination of local backlogs (diminishing uniqueness)
  Compare local holdings to current WorldCat holdings – location changes/storage transfers, withdrawals
  Assess impact of local preservation actions on system-wide holdings (availability, condition) and potential value of ‘full disclosure’
Collaborative effort with RLG partner institutions anticipated Spring/Summer 2008
Some closing observations
Opportunities
  Large research libraries hold a wealth of unique materials – long tail resources with broad potential audience
  Aggregated bibliographic data supports programmatic analysis and enrichment – work-level clustering, identification of duplicates
  Largest institutions, with enduring commitments to retention and access, hold majority of potential ‘at risk’ titles
Challenges
  Libraries ill-equipped to measure potential demand for unique holdings
  Technical and social infrastructure for aggregating supply is lacking
  University presses are potential distribution partners, but alliances are weak
Questions, Comments?
‘Managing the Collective Collection’ work agenda
  Data-mining for management intelligence
  Shared print collections
http://www.oclc.org/programs/ourwork/collectivecoll
Midwinter RLG Update Session, 1:30-3:30, Marriott 302-304
Contact: Constance Malpas, Program Officer, [email protected]
N = 5.9M titles
Median institutional holdings = 96K unique titles