Smarter Data for Smarter Libraries
RACHEL FRICK, OCLC MEMBERSHIP & RESEARCH
JEFF MIXTER, OCLC MEMBERSHIP & RESEARCH
Collections as Data
• Recognizing collections data as
research asset
– About the collections
– Digital humanities
– Changing social norms
• Power in the aggregate
• Ben Schmidt, A Brief Visual History of MARC
– http://sappingattention.blogspot.com/2017/05/a-brief-visual-history-of-marc.html
Collections as Data:
Library of Congress
• National Digital Initiatives
• Experiments
• Tutorials and Data Sets
• https://labs.loc.gov/
Collections as Data:
Always Already Computational
• IMLS-supported effort
• "foster a strategic approach to developing, describing, providing
access to, and encouraging reuse of collections that support
computationally-driven research and teaching"
• Team: T. Padilla (UNLV); L. Allen (UPenn); S. Varner (UNC-CH);
S. Potvin (Texas A&M); E. Russey Roke (Emory); H. Frost
(Stanford)
• Data Facets: https://collectionsasdata.github.io/facets/
Collections as Data: OCLC Research
Not Scotch But Rum: The
Scope and Diffusion of
the Scottish Presence in
the Published Record
by Brian Lavoie
What is the most popular
Irish Book?
by Lorcan Dempsey
Data & The Challenge of Discovery
• Leveraging the data we have
• Providing interfaces that support serendipity
Good Form and Spectacle
• DateRanger: tool to normalize date range data
– https://goodformandspectacle.wordpress.com/2017/08/14/dateranger-a-new-tool-to-share/
• MoMA Exhibit Spelunker
– http://spelunker.moma.org
Measuring IMPACT
• Europeana Impact Playbook
https://pro.europeana.eu/what-we-do/impact
• Framework for Measuring Reuse of Digital Objects
– IMLS funded
– Digital Library Federation open working group
– https://reuse.diglib.org/
Analyzing Institutional Repository Data
Providing intelligence on how library materials are being used
Importance of Analytics
• Analytics can measure and highlight
impact and importance
– Visitors
– Citations
– Downloads
– Users
Evaluating Institutional Repository Analytics
• OCLC Research partnered with
Montana State University, University
of New Mexico and ACRL in an IMLS
funded grant project to evaluate IR
analytics
– “Measuring Up: Assessing Accuracy
of Reported Use and Impact of Digital
Repositories”
– http://scholarworks.montana.edu/xmlui/handle/1/8924
Initial Findings
• Institutional Repository usage analytics are often way off
– Either over-counted or under-counted
• It is very difficult to determine accurate Institutional
Repository usage
Page Type | Definition | Examples
Citable Content Downloads | Non-HTML scholarly content that may be formally cited in the research process | Publication (.pdf); Presentation (.ppt); Data Sets (.csv)
Item Summary | HTML pages to help user decide to download the full publication | Title & Abstract; Item Metadata
Ancillary | HTML pages that provide general information or navigation | Search Results; Browse by Author; Statistics
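The three page types above can be sketched as a small URL classifier. This is only an illustration: the extension list and the "/handle/" rule are assumptions based on the DSpace-style URLs shown elsewhere in this deck, not the project's actual rules, and `classify_page` is a hypothetical name.

```python
from urllib.parse import urlparse

# Assumption: file extensions treated as citable, non-HTML content.
CITABLE_EXTENSIONS = (".pdf", ".ppt", ".pptx", ".csv")


def classify_page(url: str) -> str:
    """Return a page type for a repository URL (illustrative rules only)."""
    path = urlparse(url).path.lower()
    # Drop DSpace-style ";sequence=1" path parameters if any remain.
    path = path.split(";", 1)[0]
    if path.endswith(CITABLE_EXTENSIONS):
        return "Citable Content Download"
    # Assumption: item landing pages live under "/handle/".
    if "/handle/" in path:
        return "Item Summary"
    # Everything else (search, browse, statistics pages) is navigation.
    return "Ancillary"


print(classify_page("http://scholarworks.montana.edu/xmlui/handle/1/9348"))
```

A real repository would need rules tuned to its own URL scheme; other IR platforms lay out their paths differently.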
Current Analytics methods
• Two classes of analytics
– Page Tagging
– Log File analysis
• Page Tagging misses clicks that do not originate from the
hosted website (i.e. direct links to material)
– Google Scholar, Twitter, Email, Facebook, etc.
• Log File data is polluted by robot traffic
– IRUS-UK found that 85% of repository traffic is from robots
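To make the log file problem concrete, here is a minimal sketch that estimates the robot share of a set of requests from their user-agent strings. The marker list is an assumption and far from complete; real robot detection relies on maintained lists and behavioural signals, and many robots do not identify themselves at all.

```python
# Assumption: a tiny, incomplete list of substrings common in crawler user agents.
ROBOT_MARKERS = ("bot", "crawler", "spider", "slurp")


def is_robot(user_agent: str) -> bool:
    """True if the user agent contains one of the known robot markers."""
    ua = user_agent.lower()
    return any(marker in ua for marker in ROBOT_MARKERS)


def robot_share(user_agents: list) -> float:
    """Fraction of requests whose user agent looks like a robot."""
    if not user_agents:
        return 0.0
    return sum(is_robot(ua) for ua in user_agents) / len(user_agents)


sample = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
    "Mozilla/5.0 (X11; Linux x86_64)",
]
print(robot_share(sample))
```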
Testing a Different Method
• We determined that Google Search Console API can be
used to accurately identify human traffic to IR material
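As a rough sketch of what working with this API looks like, the helpers below build a Search Analytics request body for one day and flatten a response into rows. The field names (startDate, endDate, dimensions, rowLimit, and rows with keys, clicks, impressions, position) follow the public Search Console API; the helper names, the choice of dimensions, and the sample response are assumptions for illustration, and no network call is made here.

```python
def build_query(day: str, row_limit: int = 1000) -> dict:
    """Request body for one day of data, one row per page/country/device."""
    return {
        "startDate": day,  # dates are YYYY-MM-DD strings
        "endDate": day,
        "dimensions": ["page", "country", "device"],
        "rowLimit": row_limit,
    }


def flatten_rows(response: dict) -> list:
    """Turn API rows (dimension keys plus metrics) into flat dicts."""
    flat = []
    for row in response.get("rows", []):
        # Keys come back in the same order as the requested dimensions.
        page, country, device = row["keys"]
        flat.append({
            "url": page,
            "country": country,
            "device": device,
            "position": row["position"],
            "impressions": row["impressions"],
            "clicks": row["clicks"],
        })
    return flat


# Hypothetical response in the API's shape, for illustration only.
sample_response = {"rows": [{
    "keys": ["http://scholarworks.montana.edu/xmlui/handle/1/9348", "hrv", "DESKTOP"],
    "clicks": 0, "impressions": 1, "ctr": 0.0, "position": 31.0,
}]}
print(flatten_rows(sample_response))
```

With the Google API Python client, a body like this would typically be passed to `service.searchanalytics().query(siteUrl=..., body=...)` after authenticating as a verified owner of the repository site.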
RAMP – Repository Analytics and Metrics Portal
• The initial findings from the project led to the development
of RAMP
– Cloud-based Web service
– No installation
– Minimal training and configuration
– Consistent method and terminology
– Benchmarking across time and organization
Citable Content | Click Through | URL | Country | Device | Position | Date | Impressions | Clicks
No | 0 | http://scholarworks.montana.edu/xmlui/handle/1/9348 | hrv | DESKTOP | 31 | 3/8/17 | 1 | 0
Yes | 0 | http://scholarworks.montana.edu/xmlui/bitstream/handle/1/8705/WhitenS0814.pdf;sequence=1 | pan | MOBILE | 6 | 3/8/17 | 1 | 1
Yes | 0 | http://scholarworks.montana.edu/xmlui/bitstream/handle/1/3670/31762001131281.pdf;sequence=1 | fra | DESKTOP | 24 | 3/8/17 | 1 | 0
Yes | 0 | http://scholarworks.montana.edu/xmlui/bitstream/handle/1/7215/31762101989810.pdf?sequence=1 | chn | DESKTOP | 13 | 3/8/17 | 2 | 0
Yes | 0 | http://scholarworks.montana.edu/xmlui/bitstream/handle/1/11518/15-002_Surface-attached_cells_biofilms_A1b.pdf?sequence=1 | gbr | DESKTOP | 10 | 3/8/17 | 1 | 1
Yes | 0 | http://scholarworks.montana.edu/xmlui/bitstream/1/1091/1/ColemanT1212.pdf | kwt | MOBILE | 3 | 3/8/17 | 1 | 1
No | 0 | http://scholarworks.montana.edu/xmlui/handle/1/9049 | gbr | DESKTOP | 9 | 3/8/17 | 1 | 0
No | 0 | http://scholarworks.montana.edu/xmlui/handle/1/2567 | egy | DESKTOP | 44 | 3/8/17 | 1 | 0
Yes | 0 | http://scholarworks.montana.edu/xmlui/bitstream/handle/1/7546/31762102468723.pdf;sequence=1 | twn | DESKTOP | 14 | 3/8/17 | 1 | 1
No | 0 | http://scholarworks.montana.edu/xmlui/handle/1/1854 | tur | DESKTOP | 128 | 3/8/17 | 1 | 0
No | 0 | http://scholarworks.montana.edu/xmlui/handle/1/11498 | usa | DESKTOP | 7 | 3/8/17 | 2 | 0
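The sample rows above can be rolled up into daily totals. The sketch below assumes that a citable content download is counted as a click on a row flagged Citable Content = Yes; that reading is an assumption for illustration, not a documented definition. Each tuple holds (citable flag, country, impressions, clicks) copied from the rows in order.

```python
# (citable, country, impressions, clicks), copied from the sample rows above.
ROWS = [
    ("No", "hrv", 1, 0),
    ("Yes", "pan", 1, 1),
    ("Yes", "fra", 1, 0),
    ("Yes", "chn", 2, 0),
    ("Yes", "gbr", 1, 1),
    ("Yes", "kwt", 1, 1),
    ("No", "gbr", 1, 0),
    ("No", "egy", 1, 0),
    ("Yes", "twn", 1, 1),
    ("No", "tur", 1, 0),
    ("No", "usa", 2, 0),
]


def summarize(rows):
    """Daily totals; downloads are clicks on citable-content rows (assumption)."""
    citable = [r for r in rows if r[0] == "Yes"]
    return {
        "impressions": sum(r[2] for r in rows),
        "clicks": sum(r[3] for r in rows),
        "citable_content_downloads": sum(r[3] for r in citable),
    }


print(summarize(ROWS))
# {'impressions': 13, 'clicks': 4, 'citable_content_downloads': 4}
```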
Daily Statistics
RAMP user activity
• 20 Institutional Repositories using
RAMP
• Current support for 5 IR Application
Stacks
• Tracking over 250,000 digital Items
• Capturing 19,000 Citable Content Downloads (CCD) per day that
were previously invisible
IIIF International Image Interoperability Framework™
• The IIIF is an emerging standard for sharing
image data on the Web
• The IIIF standard normalizes technical and
structural data to help improve
interoperability across systems
http://iiif.io/
IIIF Application Programming Interfaces (APIs)
• The IIIF standard has 4 primary APIs:
– Image
– Presentation
– Search
– Authentication
IIIF Application Programming Interfaces (APIs)
• OCLC is actively supporting two of the four IIIF APIs:
– Image (supported)
– Presentation (supported)
– Search
– Authentication
IIIF Image API
• A standard way to provide "technical" metadata about
images
• A IIIF Image API compliant image server is used to
transfer image files
• Compliant viewer application can understand and process
the Image API data
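For illustration, an Image API request is a URL of the form {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}, and the technical metadata is served at {base}/{identifier}/info.json. The sketch below builds both; the base URL and identifier are hypothetical, and the default size "full" follows Image API 2.x (version 3.0 uses "max" instead).

```python
from urllib.parse import quote


def image_url(base, identifier, region="full", size="full",
              rotation="0", quality="default", fmt="jpg"):
    """Build an Image API request URL; the identifier is percent-encoded."""
    return f"{base}/{quote(identifier, safe='')}/{region}/{size}/{rotation}/{quality}.{fmt}"


def info_url(base, identifier):
    """URL of the info.json document describing the image."""
    return f"{base}/{quote(identifier, safe='')}/info.json"


# Hypothetical server and identifier, for illustration only.
BASE = "https://example.org/iiif"
print(image_url(BASE, "collection/42"))
print(image_url(BASE, "collection/42", region="0,0,512,512", size="256,"))
print(info_url(BASE, "collection/42"))
```

Because every compliant server answers the same URL pattern, a viewer can request crops and thumbnails without knowing anything about the system behind the server.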
IIIF Presentation API
• Provides structural data about images
• Managing annotations
• There is no need for systems to understand various
metadata schemas
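A minimal sketch of the structural data a Presentation API manifest carries, written against the 2.x version of the specification: a manifest holds a sequence of canvases, and each canvas is painted with an image served by an Image API service. The function name, the URLs, and the page tuples are hypothetical; a production manifest would carry more fields (metadata, thumbnails, attribution).

```python
import json


def build_manifest(manifest_id, label, pages):
    """pages: list of (page_label, image_service_id, width, height) in reading order."""
    canvases = []
    for number, (page_label, service_id, width, height) in enumerate(pages, start=1):
        canvas_id = f"{manifest_id}/canvas/{number}"
        canvases.append({
            "@id": canvas_id,
            "@type": "sc:Canvas",
            "label": page_label,
            "width": width,
            "height": height,
            "images": [{
                "@type": "oa:Annotation",
                "motivation": "sc:painting",
                "on": canvas_id,  # the annotation paints the image onto this canvas
                "resource": {
                    "@id": f"{service_id}/full/full/0/default.jpg",
                    "@type": "dctypes:Image",
                    "format": "image/jpeg",
                    "width": width,
                    "height": height,
                    "service": {
                        "@context": "http://iiif.io/api/image/2/context.json",
                        "@id": service_id,
                        "profile": "http://iiif.io/api/image/2/level1.json",
                    },
                },
            }],
        })
    return {
        "@context": "http://iiif.io/api/presentation/2/context.json",
        "@id": manifest_id,
        "@type": "sc:Manifest",
        "label": label,
        "sequences": [{"@type": "sc:Sequence", "canvases": canvases}],
    }


# Hypothetical two-page item, for illustration only.
manifest = build_manifest(
    "https://example.org/iiif/item1/manifest",
    "Sample two-page item",
    [("Page 1", "https://example.org/iiif/item1-p1", 2000, 3000),
     ("Page 2", "https://example.org/iiif/item1-p2", 2000, 3000)],
)
print(json.dumps(manifest, indent=2))
```

The descriptive label is just a display string, which is why a viewer does not need to understand the source system's metadata schema.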
OCLC Research and IIIF
• OCLC Research started to experiment with IIIF in 2016
– Evaluating the standard
– Following development of the APIs
– Experimenting with producing IIIF data using CONTENTdm items
– Involvement in continued development of the APIs and IIIF
standards
Initial experiments
• Set up an Image server that supports IIIF
• Created sample Image API data for CONTENTdm items
• Tested the Image API data and Image Server using an
open source IIIF Image viewer
• This proof of concept led to the implementation of IIIF
Image API support in CONTENTdm
Continued experimentation
• The Presentation API requires more complex
understanding and processing of existing image
data/metadata
• Produced a proof of concept that CONTENTdm data could
be transformed into IIIF Presentation data and used in IIIF
compliant systems
• Developed code to bulk convert CONTENTdm data into
IIIF Presentation data
IIIF Support in CONTENTdm Today
• CONTENTdm has implemented the Image and
Presentation APIs
– CONTENTdm serves as one of the largest IIIF Image servers in
the IIIF Community: ~20 Million Images
– The Presentation API is currently being implemented for all
CONTENTdm users: ~4 Million Presentation Manifests in
production
• This work has been a close collaboration between OCLC
GPM, GTECH and Research
OCLC Support of IIIF
• OCLC is a member of the IIIF Consortium and contributes
to the continued development and promotion of the IIIF
Community and the IIIF APIs