Collection Assessment in a Collaborative Environment: Biodiversity Heritage Library
Collection Assessment in a Collaborative Environment: BHL
Connie Rinaldo, Bianca Crowley, Trish Rose-Sandler & William Ulate
The BHL is…
• A consortium of 15 natural history, botanical libraries and research institutions
• An open access, full-text digital library for legacy biodiversity literature.
• An open data repository of taxonomic names and bibliographic information
• An expanding global effort
• Mission: The Biodiversity Heritage Library improves & makes more efficient the methodology of research in biodiversity studies by collaboratively making biodiversity literature openly available to the world as part of a global biodiversity community.
BHL Goals
• Goal 1: Relevant Content: Build & maintain the BHL as the largest reliable, reputable, & responsive repository of biodiversity literature & archival materials.
• Goal 2: Tools & Services: Develop services & tools which facilitate discovery & improve research efficiency of BHL content.
• Goal 3: User Engagement: Increase global awareness about the BHL through outreach, learning & education, & branding through engagement & collaboration with existing & new user communities.
• Goal 4: Membership & Partnerships: Grow BHL consortia membership & partnerships while fostering cross-institutional collaboration that continues to serve as a model for digital library development.
• Goal 5: Financial Sustainability: Ensure sustainability & relevance by being flexible, adaptable, & financially sound while the content & services remain openly & freely available.
Core BHL Member Institutions
Global Partners
http://biodiversitylibrary.org
BHL Overview
Now online
64,188 titles
120,461 volumes
42 million+ pages
• New user interface launched in March
• Search by title, author, article, subjects and scientific names
• Various download options, including high resolution
• Taxonomic name finding algorithm
• Machine-to-machine services
• Full-text search being tested
Core Principles
• Open access
• Open data
• Deconstruct the silo and deliver content where users are already working
– Via other biodiversity websites and taxonomic resources
– Via social media platforms: blog, Flickr, Facebook, Twitter, Pinterest, etc.
• Involve users in collection and technical development activities
Scanning Locally, Coordinating Globally
Vols. 1-5
Vols. 6, 8, 10
Vols. 7, 9, 11-21
Issue Tracking Software
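The volume labels above suggest how one title's run can be split into separately scanned sets. As a minimal sketch of the coordination check this implies, the sets can be merged to look for gaps and duplicate scans. The `parse_volumes` helper and the assumed complete run of volumes 1-21 are illustrative assumptions, not part of BHL's issue tracking software.

```python
from collections import Counter

def parse_volumes(spec):
    """Expand a string such as '7, 9, 11-21' into a list of volume numbers."""
    volumes = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = part.split("-")
            volumes.extend(range(int(start), int(end) + 1))
        else:
            volumes.append(int(part))
    return volumes

# Volume sets as they appear on the slide; who scanned which set is not stated.
scanned_sets = ["1-5", "6, 8, 10", "7, 9, 11-21"]

# Count how many sets contain each volume.
counts = Counter(v for spec in scanned_sets for v in parse_volumes(spec))

expected = range(1, 22)  # assumption: the complete run is volumes 1-21

missing = [v for v in expected if v not in counts]
duplicates = sorted(v for v, n in counts.items() if n > 1)
```

Under that assumption both `missing` and `duplicates` come out empty: the three sets cover the run exactly once.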
Beyond the Silo: Open Data
Open Data Policy
APIs (Application Programming Interfaces)
Stable URLs
OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting)
Data Exports
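To make the OAI-PMH item concrete, here is a minimal sketch of building a `ListRecords` request and reading Dublin Core titles from a response. The endpoint URL is an assumption to be confirmed against BHL's developer documentation, the helper names are hypothetical, and the sample XML is hand-written so the sketch runs offline. It is not real BHL output.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed endpoint; confirm against BHL's developer documentation before use.
BASE_URL = "https://www.biodiversitylibrary.org/oai"

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it replaces metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return BASE_URL + "?" + urlencode(params)

def parse_records(xml_text):
    """Return (titles, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    titles = [el.text for el in root.iterfind(".//dc:title", NS)]
    token_el = root.find(".//oai:resumptionToken", NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return titles, token

# Hand-written sample response for illustration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><metadata>
      <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                 xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:title>Example title</dc:title>
      </oai_dc:dc>
    </metadata></record>
    <resumptionToken>token-123</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

titles, token = parse_records(SAMPLE)
```

A harvester would fetch `list_records_url()`, then keep requesting `list_records_url(resumption_token=token)` until no token is returned.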
User Feedback is Critical
General feedback form
Scan request form
http://biodiversitylibrary.org/contact
Impact
• “BHL came to the rescue when a planned trip to work in the Mertz Library at The New York Botanical Garden had to be cancelled due to Hurricane Sandy. Thanks to the online resources available through BHL I was able to source most of the key works I needed, with their supporting bibliographic information. Further use of BHL occurred when building work at the Linnean Society of London limited access to some of the book I had been able to use from that collection.”
• “I would like thank you all very much for invaluable work and support you do. I just got a pdf-file from more than century old (1893) journal paper (regional naturalist society paper, published in Finland), to get copy I should take 500 mile drive to our university library. Now I am got it fastly in high-quality pdf-copy. Cordial thanks and all success in continuing your highly valuable mission.” [conservation biologist from Estonia]
• “You are a wonderful resource. I maintain a Website that describes the plant genus Opuntia (prickly pear cacti). There is no way I could maintain such a site without access to literature from 100-200 years ago. Most of the cactus species were discovered long ago; I find it invaluable to put up PDF files to document each species in the literature as I document them photographically. I am a botanist, but I work in the pharmaceutical field (not so many botanical jobs out there). Your library makes it possible for me to continue working with plants in a meaningful and scientific manner.”
Stewardship: Our Role in the Biodiversity Research “Ecology”
Diagram labels: Name Indexes, Nomenclators, Species Checklists, Phylogenies, Collecting Events, Localities, Researchers, Field Notes, Content Aggregators, EOL, Datasets, Biodiversity Literature, Publications, Scientific Names, BHL, Specimens
Questions about BHL Content
• How many books in BHL are there about....?
• How can we identify areas of weakness in BHL in order to prioritize what materials to scan next?
• Rod Page has one suggestion: http://iphylo.blogspot.com/2013/10/which-taxonomic-journals-should-be.html
Questions about BHL Content
• What are scalable solutions to content analysis?
• Can we provide creative & meaningful visualizations?
Why do we care about taxonomic names?
• Scientists use taxonomic names to organize their research
• Biodiversity literature breaks down by discipline & by specific taxon
Extracted Scientific Names
What is “Taxonomic Intelligence”?
• Global Names Recognition & Discovery tool
– Locate, verify, record scientific names from each page
– Text is uncorrected OCR
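The locate, verify, record steps can be illustrated with a deliberately naive sketch. This is not the Global Names tool or its algorithm: the regular expression, the tiny `KNOWN_NAMES` set and the `find_names` function are all assumptions made for illustration.

```python
import re

# Naive pattern for a Latin binomial: capitalized genus plus lowercase epithet.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+) ([a-z]{3,})\b")

# Tiny stand-in for a names index; a real service checks far larger sources.
KNOWN_NAMES = {"Pinus banksiana", "Quercus alba"}

def find_names(pages, known_names=KNOWN_NAMES):
    """Locate candidates, verify them against known_names, record them per page."""
    found = {}
    for page_number, text in pages.items():
        # Locate: every string shaped like a binomial is a candidate.
        candidates = {" ".join(m.groups()) for m in BINOMIAL.finditer(text)}
        # Verify: keep only candidates that exactly match a known name.
        verified = sorted(candidates & known_names)
        # Record: store verified names under their page number.
        if verified:
            found[page_number] = verified
    return found

pages = {
    1: "The Jack pine, Pinus banksiana, grows on sandy soils.",
    2: "Specimens of Pinus banksiana and Quercus alba were collected.",
    3: "Pinus banksiaua was noted here.",  # simulated OCR error: no exact match
}
result = find_names(pages)
```

Page 3 shows why uncorrected OCR matters: a single misread character defeats exact matching, so a production tool needs fuzzy matching or OCR correction.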
Overview of available BHL (meta)data
http://biodivlib.wikispaces.com/Data+Exports
• Title metadata: contributed from MARC records of hundreds of library catalogs (BHL consortium libraries & non-BHL IA contributors)
• Volume/item metadata: provides information about the actual objects & pieces digitized
• Subject
• Creator/author data
• Segment/part/“article” metadata (separate table for segment/part creators?)
• Page metadata, which includes our algorithmically identified scientific name data
• OCR text available at the item/volume level but not for the BHL corpus as a whole
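Because the metadata is layered (title, item, page, name), answering "which titles mention a given name?" means following links between tables. Here is a minimal sketch, assuming tab-delimited exports. The column names and sample rows are hypothetical and much simpler than the real export files.

```python
import csv
import io

# Hypothetical, heavily simplified extracts; real exports have more columns
# and may use different column names.
TITLES = "TitleID\tFullTitle\n1\tExample journal A\n2\tExample journal B\n"
ITEMS = "ItemID\tTitleID\tVolume\n10\t1\tv.5\n20\t2\tv.12\n21\t2\tv.13\n"
PAGE_NAMES = (
    "ItemID\tPageID\tName\n"
    "10\t100\tPinus banksiana\n"
    "20\t200\tQuercus alba\n"
    "21\t210\tPinus banksiana\n"
    "21\t211\tPinus banksiana\n"
)

def read_table(text):
    """Parse a tab-delimited export into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))

def titles_mentioning(name):
    """Follow page -> item -> title links to list titles where a name was found."""
    item_to_title = {row["ItemID"]: row["TitleID"] for row in read_table(ITEMS)}
    title_names = {row["TitleID"]: row["FullTitle"] for row in read_table(TITLES)}
    title_ids = {
        item_to_title[row["ItemID"]]
        for row in read_table(PAGE_NAMES)
        if row["Name"] == name
    }
    return sorted(title_names[t] for t in title_ids)

hits = titles_mentioning("Pinus banksiana")
```

With real exports the same joins would run in a database or a dataframe library rather than in memory.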
Data Exports
Visualization of BHL Data for Pinus banksiana
Source Data Sample
Sample BHL & Nomenclatural Data
• Google Refine reconciled list of BHL subject keywords
• List of vetted BHL subject targets from collection development policy
• Taxonomic name data set for trees of North America (link out)
• http://www.fs.fed.us/database/feis/plants/tree/index.html
• http://www.treesofnorthamerica.net/
• Subject terms associated with BHL titles where Pinus banksiana occurs
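The last bullet amounts to a frequency count: gather the subject keywords attached to every title in which the name occurs, then tally them as input for a chart. A minimal sketch with made-up sample data follows. The titles, subjects and the `subject_profile` function are illustrative assumptions.

```python
from collections import Counter

# Made-up sample data: subject keywords per title, and titles where the name was found.
title_subjects = {
    "Example journal A": ["Botany", "Forests and forestry"],
    "Example journal B": ["Botany", "Natural history"],
    "Example journal C": ["Zoology"],
}
titles_with_name = {"Example journal A", "Example journal B"}

def subject_profile(titles, subjects_by_title):
    """Count how often each subject keyword is attached to the given titles."""
    counts = Counter()
    for title in titles:
        counts.update(subjects_by_title.get(title, []))
    return counts

profile = subject_profile(titles_with_name, title_subjects)
```

The resulting counts (here "Botany" appears twice, the other subjects once) are the kind of table a word cloud or bar chart would be drawn from.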
Other Tools & Process
• Bibliographies (discipline & more)
• Index Animalium: identifies first appearance of 400,000 animals from 1758-1850
• Researcher-supplied specific taxon bibliographies
• Zoological Record: Taxonomic references back to 1864.
• Taxonomic Literature II: a selective guide to botanical publications with dates, commentaries and scientific types
• Compare universe of biodiversity literature to BHL
• Unknown dataset for full universe
• Compared BHL member collections to BHL content for gap-filling before content expanded (lists automated but gap identification manual)
• REST especies: a way to collate species metadata? http://dopa-services.jrc.ec.europa.eu/services/especies/
• DOPA Explorer http://ehabitat-wps.jrc.ec.europa.eu/dopasimple/
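The comparison of member collections to BHL content is described above as automated for the lists but manual for gap identification. One way the gap step might be automated is a set difference over normalized titles, sketched below. The normalization rule and sample lists are assumptions, and a real comparison would more likely match on identifiers from the catalog records than on title strings.

```python
import re

def normalize(title):
    """Lowercase, strip punctuation and collapse whitespace so near-identical titles match."""
    title = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(title.split())

def find_gaps(member_holdings, bhl_titles):
    """Return member-held titles with no normalized match among BHL titles."""
    in_bhl = {normalize(t) for t in bhl_titles}
    return sorted(t for t in member_holdings if normalize(t) not in in_bhl)

# Made-up lists for illustration only.
member_holdings = ["Example Journal A.", "Example journal B", "Example Journal C"]
bhl_titles = ["Example journal A", "Example journal  C"]

gaps = find_gaps(member_holdings, bhl_titles)
```

Here only "Example journal B" is reported as a gap, because differences in case, punctuation and spacing are normalized away before comparing.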
SAMPLE VISUALIZATIONS
Core & Supporting Keywords for BHL Collections
Wordle for BHL Content
http://public.tableausoftware.com/views/BHLViz/DigitizedSubjects
Visualization Opportunities
• JournalMap (geo tagging scientific literature) http://www.journalmap.org/
• Visualizing article performance http://bit.ly/1c4TJfn
• Better Life Index http://www.oecd.org/statistics/datalab/bli.htm
• Altmetric: http://www.altmetric.com/
• Tableau http://www.tableausoftware.com/public/
• Worth it: http://www.wired.com/wiredscience/2013/11/wired-data-life-martin-krzywinski/?viewall=true
Taxon Data Manipulation Opportunities
• Euler Project: Reasoning with Taxonomies: http://euler.cs.ucdavis.edu/
• REST & Taxonomy: https://drupal.org/project/taxonomy_api
SUMMARY
• Metadata reconciliation
• Gap analysis
• Visualizations
• All automated!