Technology Review: BHL Institutional Council Mtg 22 Mar 2010

Upload: chris-freeland

Post on 26-May-2015

DESCRIPTION

Technical Report to the Biodiversity Heritage Library Institutional Council on 22 Mar 2010 at American Museum of Natural History

TRANSCRIPT

Page 1: BHL Tech Report

Technology Review: BHL Institutional Council Mtg

22 Mar 2010

Page 2: BHL Tech Report

Stats

Page 3: BHL Tech Report

Biodiversity Heritage Library: http://biodiversitylibrary.org

Now online

40,000 titles
76,000 volumes
28.7 million pages

70 million name strings
58 million confirmed names
1.4 million unique names

Page 4: BHL Tech Report
Page 5: BHL Tech Report

Size of BHL content *today*

Page 6: BHL Tech Report

http://biodiversitylibrary.org/page/5225013

Bigger than a breadbox, smaller than a sperm whale

Page 7: BHL Tech Report

Usage

Page 8: BHL Tech Report

1.1 million visits from 231 countries since launch

Page 9: BHL Tech Report

Referrers: 2008 - 2009

Page 10: BHL Tech Report

Referrers: 2010

Jan 1 – Mar 15, 2010

Page 11: BHL Tech Report

Stats unique to our tools

Page 12: BHL Tech Report

PDF Articlizing stats

Page 13: BHL Tech Report

# Items by Library

 Items   Institution Name
11,476   University of California Libraries (archive.org)
11,244   MBLWHOI Library
 9,537   Smithsonian Institution Libraries
 6,461   New York Botanical Garden
 5,129   Harvard University, MCZ, Ernst Mayr Library
 4,932   Gerstein - University of Toronto (archive.org)
 3,882   Natural History Museum, London
 3,350   Missouri Botanical Garden
 2,821   Library of Congress (archive.org)
 2,509   University of Illinois Urbana Champaign
 2,029   American Museum of Natural History Library
 1,996   NCSU Libraries (archive.org)
 1,692   UMass Amherst Libraries (archive.org)
 1,296   Webster Family Library of Veterinary Medicine (archive.org)
 1,216   Robarts - University of Toronto (archive.org)
 1,100   Canadiana.org (archive.org)
   621   Boston Public Library (archive.org)
   579   University of New Hampshire Library (archive.org)
   516   Montana State Library (archive.org)
   282   Prelinger Library (archive.org)

Page 14: BHL Tech Report

# Names by Library

     Names   Institution Name
14,109,080   MBLWHOI Library
12,241,186   Smithsonian Institution Libraries
 9,105,969   New York Botanical Garden
 7,860,553   Missouri Botanical Garden
 5,323,730   University of California Libraries (archive.org)
 4,818,365   Harvard University, MCZ, Ernst Mayr Library
 4,776,527   Gerstein - University of Toronto (archive.org)
 3,050,242   Natural History Museum, London
 2,387,731   American Museum of Natural History Library
 2,292,570   NCSU Libraries (archive.org)
 2,106,182   UMass Amherst Libraries (archive.org)
 1,836,281   University of Illinois Urbana Champaign
   532,635   Earth Sciences - University of Toronto (archive.org)
   518,695   Robarts - University of Toronto (archive.org)
   225,357   Canadiana.org (archive.org)
   177,283   Boston Public Library (archive.org)
    97,663   Library of Congress (archive.org)
    83,089   Prelinger Library (archive.org)
    75,113   University of Connecticut Libraries (archive.org)
    71,512   The Field Museum

Page 15: BHL Tech Report

“Taxonomic Density” by Library

Tax. Density        Names    Items   Institution Name
     2,346.4    7,860,553    3,350   Missouri Botanical Garden
     1,409.4    9,105,969    6,461   New York Botanical Garden
     1,283.5   12,241,186    9,537   Smithsonian Institution Libraries
     1,254.8   14,109,080   11,244   MBLWHOI Library
     1,244.8    2,106,182    1,692   UMass Amherst Libraries (archive.org)
     1,176.8    2,387,731    2,029   American Museum of Natural History Library
     1,148.6    2,292,570    1,996   NCSU Libraries (archive.org)
       968.5    4,776,527    4,932   Gerstein - University of Toronto (archive.org)
       939.4    4,818,365    5,129   Harvard University, MCZ, Ernst Mayr Library
       785.7    3,050,242    3,882   Natural History Museum, London
       731.9    1,836,281    2,509   University of Illinois Urbana Champaign
       463.9    5,323,730   11,476   University of California Libraries (archive.org)

Simple: avg. # names / item
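
The density figure is plain division, so it can be rechecked from the two counts on the slides. A minimal sketch in Python, using three rows copied from the tables above; the function and variable names are illustrative, not part of any BHL tool:

```python
# Names and items per library, copied from the slides above.
# Each value is (name strings, items).
LIBRARIES = {
    "Missouri Botanical Garden": (7_860_553, 3_350),
    "New York Botanical Garden": (9_105_969, 6_461),
    "University of California Libraries (archive.org)": (5_323_730, 11_476),
}


def taxonomic_density(names: int, items: int) -> float:
    """Average number of name strings per item, rounded to one decimal."""
    return round(names / items, 1)


densities = {
    library: taxonomic_density(names, items)
    for library, (names, items) in LIBRARIES.items()
}

# Print the libraries from highest to lowest density.
for library, density in sorted(densities.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{density:>8,.1f}  {library}")
```

The three results match the density column on the slide (2,346.4, 1,409.4 and 463.9).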

Page 16: BHL Tech Report

Q. How many species have been reported only once? [Taxacom]

As of March 1, 2010, BHL had identified more than 70 million potential name strings across its 28 million digitized pages using uBio's TaxonFinder. 58 million of those name strings were confirmed as a name with a NameBankID. Of that set, 1,491,000 name strings were unique. 329,000 of those unique names were found on a single page in BHL.
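
The counts above can be turned into two ratios that put the answer in proportion. This is a rough sketch only: the 70 million figure is stated as "more than", so the confirmed share is an upper bound, and treating "found on a single page" as a proxy for "reported only once" is an assumption of this example, not a claim from the slide.

```python
# Counts as stated on the slide (March 1, 2010).
potential_name_strings = 70_000_000   # "more than 70 million", so a lower bound
confirmed_name_strings = 58_000_000   # confirmed with a NameBankID
unique_names = 1_491_000
single_page_names = 329_000           # unique names found on exactly one page

# Share of potential name strings that were confirmed (upper bound, see above).
confirmed_share = confirmed_name_strings / potential_name_strings

# Share of unique names that occur on only one page in BHL.
single_page_share = single_page_names / unique_names

print(f"confirmed / potential: {confirmed_share:.1%}")
print(f"single-page / unique:  {single_page_share:.1%}")
```

By these figures, roughly 83% of potential name strings were confirmed, and about 22% of unique names appear on a single page.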

Page 17: BHL Tech Report

Application / Portal

Page 18: BHL Tech Report

New since November

New color scheme
IA / CDL content + names indexing
APIs
OAI interface
Work on Darwin’s Library annotations
Primary / Secondary titles enhancements
Started testing solutions for “orange bag problem”
Working with EOL on nomenclatural acts service

Page 19: BHL Tech Report

Consumers

EarthCape
BioGuid
BioSTOR
JSTOR – in discussion

Research projects
  BREC - NSF
  Conjecturator - NSF
  Darwin’s Library – NEH/JISC
  Hong Cui @ University of AZ - NSF

Page 20: BHL Tech Report

OCR correction using WikiSource

http://biostor.org/wiki/Page:Spixiana1999zool.djvu/293

Page 21: BHL Tech Report

Partnership Statement

What, if anything, do we need as an agreement between parties for use of BHL materials?
  Always open access – more a service agreement
Consider: What is true value of $50 we paid to scan BookX when inserted into other research

Page 22: BHL Tech Report

Terms of Use / Privacy Policy

Need resolution to move forward on publishing APIs

Page 23: BHL Tech Report

Hardware / Infrastructure

Page 24: BHL Tech Report

WH cluster

Transferred 28,000 volumes from IA
  22TB
44,000 more in the queue
  Started Friday
Complete BHL + IA/CDL by May
Need to discuss implications with BHL-Europe
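
For capacity planning, the transfer figures above allow a back-of-the-envelope estimate of how much storage the queued volumes may need. The sketch below assumes that the 22TB figure is the size of the 28,000 transferred volumes and that queued volumes have the same average size; neither assumption is stated on the slide, and decimal units (1 TB = 1,000 GB) are used.

```python
# Figures from the slide.
transferred_volumes = 28_000
transferred_tb = 22.0        # assumed to be the size of the transferred volumes
queued_volumes = 44_000

# Average size of one transferred volume, in GB (decimal units).
avg_gb_per_volume = transferred_tb * 1000 / transferred_volumes

# Assumption: queued volumes are the same average size as transferred ones.
estimated_queue_tb = queued_volumes * transferred_tb / transferred_volumes
estimated_total_tb = transferred_tb + estimated_queue_tb

print(f"average volume size: {avg_gb_per_volume:.2f} GB")
print(f"estimated queue:     {estimated_queue_tb:.1f} TB")
print(f"estimated total:     {estimated_total_tb:.1f} TB")
```

Under those assumptions the queue works out to roughly 35 TB, for a total near 57 TB. Actual sizes depend on page counts and image formats, so this is an order-of-magnitude check only.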

Page 25: BHL Tech Report

Cluster

~$17,000 USD

Page 26: BHL Tech Report

DuraCloud

Pilot has added partners
  BHL
  NYPL
  WGBH
  More to come
10TB of content uploaded
  Good test set, not complete, not intended to be
Test download speeds with BHL-E & BHL-Au
June 30 deadline for uploading without $$

Page 27: BHL Tech Report

Global BHL

Page 28: BHL Tech Report
Page 29: BHL Tech Report
Page 30: BHL Tech Report

BHL-Europe

Page 31: BHL Tech Report

BHL-Europe

http://biodiversitylibrary.eu
Hiring WP2 leader
Moving bidlist to Vienna
Building infrastructure
Getting content
Submitting metadata to Europeana

Page 32: BHL Tech Report

BHL-China

Page 33: BHL Tech Report

BHL-China

http://bhl-china.org
Still working out issues for scanning
Plan to scan 48,000 books / year
  2 shifts
  10 Scribes
Excited about Global Tech meeting
Will come prepared with ideas for change
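
The scanning plan above implies a per-machine workload that is easy to work out. This sketch assumes the 48,000 books are spread evenly across all 10 Scribes and both shifts, which the slide does not state.

```python
# Figures from the slide.
books_per_year = 48_000
scribes = 10
shifts = 2

# Assumption: work is split evenly across machines and shifts.
books_per_scribe_per_year = books_per_year // scribes
books_per_scribe_shift_per_year = books_per_scribe_per_year // shifts

print(f"per Scribe per year:           {books_per_scribe_per_year:,}")
print(f"per Scribe per shift per year: {books_per_scribe_shift_per_year:,}")
```

Converting this to a daily rate would need the number of scanning days per year, which the slide does not give.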

Page 34: BHL Tech Report
Page 35: BHL Tech Report

BHL-Australia

Page 36: BHL Tech Report

BHL-Australia

http://ec2-75-101-224-221.compute-1.amazonaws.com/
Took code & easily ran in EC2
Offered usability assistance
Planning workshop in Au in May
September 2010 relaunch of ALAu
Ready to go

Page 37: BHL Tech Report

BHL-Brasil

Page 38: BHL Tech Report

BHL-Brasil

SciELO content ready for import
Can automate ingest into CiteBank

Page 39: BHL Tech Report

CiteBank

Page 40: BHL Tech Report

Ingesting content from Publishers
Big publishers - auto ingest
  Machine to machine
  Set up, configure & go
Small publishers - need help
  Niche content
  Likely to provide some assistance, but will require it
Individual users – need help
  Need a lot of individual attention
  Big community & opportunity, but takes tending
Publishing platform also important

Page 41: BHL Tech Report

Similar missions / Staffing issues
  PubMed Central
  PLoS
  JSTOR
All with multiple staff to handle ingest, inquiries

Page 42: BHL Tech Report

CiteBank Possibilities

Need 2 years of developer work to make it bigger

Or…

Need 2 years of content assistance to make it better
  fill data into existing structure

Biblio is a good start, but needs some tuning for biodiversity literature

Page 43: BHL Tech Report

TL3: GRIB

Taxonomic Literature 3: The Global Reference Index to Biodiversity

Critical, yet absent: comprehensive list of biodiversity literature, complete with all variants in spelling, known identifiers over time, bibliographic descriptions, and recommendations on how to cite each work.

*Big* job, but doable
Modeled on & worked in association with Taxonomic Literature 2

Page 44: BHL Tech Report

Fearing Magic Unicorn Syndrome

Page 45: BHL Tech Report

Reallocation

Page 46: BHL Tech Report

Radical question: If #BHL could offer you more content or more services, which would you choose? "Both" not an option in this experiment.

Posted to Twitter

http://twitter.com/chrisfreeland/status/10575364681

Page 47: BHL Tech Report

“CONTENT!”

@chrisfreeland Given that I make my own services, content is what I want #bhl #allyourdataarebelongtome

@chrisfreeland at this point of time more people will benefit from more content than more services. unless we treat indexing as service

Page 48: BHL Tech Report