An International Cooperative Digital Library for Taxonomic Literature: The Biodiversity Heritage Library

DESCRIPTION

An International Cooperative Digital Library for Taxonomic Literature: The Biodiversity Heritage Library. Martin Kalfatovic. The Catholic University of America, School of Library and Information Science. LSC 715. 6 June 2008. Washington, DC.

TRANSCRIPT

Page 1: An International Cooperative Digital Library for Taxonomic Literature: The Biodiversity Heritage Library

MARTIN R. KALFATOVIC :: SMITHSONIAN INSTITUTION LIBRARIES :: CUA/SLIS LSC 715 :: 6 JUNE 2008

Martin R. Kalfatovic

Smithsonian Institution Libraries

6 June 2008

An International Cooperative Digital Library for Taxonomic Literature

Page 2: An International Cooperative Digital Library for Taxonomic Literature: The Biodiversity Heritage Library

The cultivation of natural science cannot be efficiently carried on without reference to an extensive library

Charles Darwin et al. (1847)

Darwin, C. R. et al. 1847. Copy of Memorial to the First Lord of the Treasury [Lord John Russell], respecting the Management of the British Museum. Parliamentary Papers, Accounts and Papers 1847, paper number (268), volume XXXIV.253 (13 April): 1-3. [Complete Works of Charles Darwin Online]

Page 3: An International Cooperative Digital Library for Taxonomic Literature: The Biodiversity Heritage Library

Taxonomic Literature

Taxonomic descriptions must be published for the name to be valid

Publications must be available to the public through trusted sources

Libraries have been the traditional place

Page 4: An International Cooperative Digital Library for Taxonomic Literature: The Biodiversity Heritage Library

Taxonomic Literature

The cited half-life of publications in taxonomy is longer than in any other scientific discipline

* * * The decay rate is longer than in any scientific discipline

~ Macro-economic case for open access

Tom Moritz

Page 5: An International Cooperative Digital Library for Taxonomic Literature: The Biodiversity Heritage Library

Taxonomic Literature

Over 250 years of systematic description of life

Systema naturae (10th ed. 1758) by Carl von Linné

Page 6: An International Cooperative Digital Library for Taxonomic Literature: The Biodiversity Heritage Library

BHL Timeline

2003. Telluride. Encyclopedia of Life meeting

February 2005. London. Library and Laboratory: the Marriage of Research, Data and Taxonomic Literature

May 2005. Washington. Ground work for the Biodiversity Heritage Library

June 2006. Washington. Organizational and Technical meeting

August 2006. New York Botanical Garden. BHL Director’s Meeting.

October 2006. St. Louis/San Francisco. Technical meetings

February 2007. Museum of Comparative Zoology. Organizational meeting

May 2007. Encyclopedia of Life and BHL Portal Launch. Washington DC.

Page 7: An International Cooperative Digital Library for Taxonomic Literature: The Biodiversity Heritage Library

BHL Members

American Museum of Natural History (New York)

Field Museum (Chicago)

Natural History Museum (London)

Smithsonian Institution Libraries (Washington)

Missouri Botanical Garden (St. Louis)

New York Botanical Garden (New York)

Royal Botanic Garden, Kew

Botany Libraries, Harvard University

Ernst Mayr Library of the Museum of Comparative Zoology, Harvard University

Marine Biological Laboratory / Woods Hole Oceanographic Institution

Page 8: An International Cooperative Digital Library for Taxonomic Literature: The Biodiversity Heritage Library

BHL Members

University of Illinois, Urbana-Champaign (contributing member)

Scheme for addition of European and Asian partners underway

Additional categories of membership under consideration

Page 9: An International Cooperative Digital Library for Taxonomic Literature: The Biodiversity Heritage Library

BHL Focus: Literature

Page 10: An International Cooperative Digital Library for Taxonomic Literature: The Biodiversity Heritage Library

BHL Focus: Literature

Page 11: An International Cooperative Digital Library for Taxonomic Literature: The Biodiversity Heritage Library

The Internet Archive

• 501(c)(3) organization

• Dedicated to “Universal Access to Human Knowledge”

• Founder of the Open Content Alliance

• Provides:
– Mass scanning
– Archival storage of files
– Image processing
– Technology development

Page 12: An International Cooperative Digital Library for Taxonomic Literature: The Biodiversity Heritage Library

Scribe Scanner

• Single Scribe Machine
– Custom built by the Internet Archive
– Human operated
– 3,500 pages per shift per day

Page 13: An International Cooperative Digital Library for Taxonomic Literature: The Biodiversity Heritage Library

BHL Scanning Centers

Northeast Regional Scanning Center: 10 Scribe machines
– MBL/WHOI
– Harvard

New York Public Library: 10 Scribe machines
– AMNH
– NYBG

Page 14: An International Cooperative Digital Library for Taxonomic Literature: The Biodiversity Heritage Library

BHL Scanning Centers

University of Illinois: 2 Scribe machines

Natural History Museum, London: 1 Scribe machine

Missouri Botanical Garden: Non-Scribe operation

Page 15: An International Cooperative Digital Library for Taxonomic Literature: The Biodiversity Heritage Library

BHL Scanning Centers

Washington, DC
– 1 Scribe machine at Smithsonian Libraries
– 10-Scribe facility at Library of Congress with Fedlink (operational Spring 2008)
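As a rough illustration of scale, the Scribe counts listed for the scanning centers can be combined with the 3,500 pages per shift per day quoted for a single Scribe. This is only a sketch: it assumes every machine runs at the quoted rate, that one shift per day is the baseline, and that all listed machines work on BHL material, none of which the slides confirm. The dictionary keys and function name are made up for this example.

```python
# Pages per Scribe per shift, as quoted on the Scribe Scanner slide.
PAGES_PER_SHIFT = 3500

# Scribe counts as listed on the scanning-center slides.
# Missouri Botanical Garden is omitted because it is a non-Scribe operation.
SCRIBES_BY_CENTER = {
    "Northeast Regional Scanning Center": 10,
    "New York Public Library": 10,
    "University of Illinois": 2,
    "Natural History Museum, London": 1,
    "Smithsonian Libraries": 1,
    "Library of Congress (Fedlink)": 10,
}


def nominal_pages_per_day(scribes, shifts_per_day=1):
    """Upper-bound estimate: assumes every Scribe hits the quoted rate."""
    return scribes * PAGES_PER_SHIFT * shifts_per_day


total_scribes = sum(SCRIBES_BY_CENTER.values())
print(total_scribes, nominal_pages_per_day(total_scribes))
```

Actual throughput would be lower wherever machines are shared with other projects or idle, so the figure is a ceiling under the stated assumptions, not a forecast.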

Page 16: An International Cooperative Digital Library for Taxonomic Literature: The Biodiversity Heritage Library

Scanning Stats: Now

5.5 million plus total pages scanned (and growing daily)

<90,000 Fieldiana (via UIUC)

>100,000 pages each Harvard, New York Botanical Garden,

225,000+ pages from the American Museum of Natural History

400,000+ from Smithsonian Libraries

500,000+ from the Natural History Museum, London

800,000 Missouri Botanical Garden Library

1,000,000+ from the MBL/WHOI library

Page 17: An International Cooperative Digital Library for Taxonomic Literature: The Biodiversity Heritage Library

But what about ...

Page 19: An International Cooperative Digital Library for Taxonomic Literature: The Biodiversity Heritage Library

BHL \ Google (the difference between)

Bibliographic accuracy for all materials

Ability to re-purpose and reuse all data as needed

Congruence of original printed materials to digital versions

Page 20: An International Cooperative Digital Library for Taxonomic Literature: The Biodiversity Heritage Library

Persistent Identifiers

• Stable URL

• Handle

• DOI

• BICI/SICI

• ISSN

• ISBN

• LSIDs

http://www.biodiversitylibrary.org

Page 21: An International Cooperative Digital Library for Taxonomic Literature: The Biodiversity Heritage Library

Structural Markup

<article>
  <title>A BRIEF CONSIDERATION OF CERTAIN POINTS IN THE MORPHOLOGY OF THE FAMILY CHALCIDID^E.*.</title>
  <author>L. O. HOWARD.</author>
  <volume>1</volume>
  <issue>2</issue>
  <start_page>65</start_page>
  <end_page>86</end_page>
  <start_count_page>85</start_count_page>
  <end_count_page>106</end_count_page>
  <start_page_image_file>3908800908001101smthrich_0085.djvu</start_page_image_file>
  <end_page_image_file>3908800908001101smthrich_0106.djvu</end_page_image_file>
</article>
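To show how a record like the one above might be consumed by software, here is a minimal Python sketch that parses it and lists the page-image files covering the article. It assumes the image files are numbered contiguously with a fixed-width counter before the .djvu extension, which the sample record suggests but the slides do not state. The function names are invented for this example and are not part of any BHL tool.

```python
import re
import xml.etree.ElementTree as ET

# The article record from the slide, with line wraps rejoined.
RECORD = """<article>
  <title>A BRIEF CONSIDERATION OF CERTAIN POINTS IN THE MORPHOLOGY OF THE FAMILY CHALCIDID^E.*.</title>
  <author>L. O. HOWARD.</author>
  <volume>1</volume>
  <issue>2</issue>
  <start_page>65</start_page>
  <end_page>86</end_page>
  <start_count_page>85</start_count_page>
  <end_count_page>106</end_count_page>
  <start_page_image_file>3908800908001101smthrich_0085.djvu</start_page_image_file>
  <end_page_image_file>3908800908001101smthrich_0106.djvu</end_page_image_file>
</article>"""

# Assumed naming pattern: <prefix>_<zero-padded counter>.djvu
IMAGE_NAME = re.compile(r"^(.*_)(\d+)(\.djvu)$")


def printed_page_count(xml_text):
    """Number of printed pages, from start_page and end_page."""
    article = ET.fromstring(xml_text)
    return int(article.findtext("end_page")) - int(article.findtext("start_page")) + 1


def article_page_images(xml_text):
    """List every page-image file from the first to the last, inclusive."""
    article = ET.fromstring(xml_text)
    first = IMAGE_NAME.match(article.findtext("start_page_image_file"))
    last = IMAGE_NAME.match(article.findtext("end_page_image_file"))
    if not first or not last or first.group(1) != last.group(1):
        raise ValueError("image file names do not follow the assumed pattern")
    prefix, digits, suffix = first.groups()
    width = len(digits)  # keep the zero padding used in the record
    start, end = int(digits), int(last.group(2))
    return [f"{prefix}{n:0{width}d}{suffix}" for n in range(start, end + 1)]


images = article_page_images(RECORD)
print(printed_page_count(RECORD), len(images), images[0], images[-1])
```

In this record the printed page range (65 to 86) and the image range (0085 to 0106) both span 22 pages, which is the kind of consistency check structural markup makes possible.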

Page 22: An International Cooperative Digital Library for Taxonomic Literature: The Biodiversity Heritage Library

Semantic Markup

GoldenGATE

The intention of the GoldenGATE editor is to build a bridge between NLP components and XML markup of natural language text according to arbitrary XML schemas. It allows the deployment of NLP components to marking up the bodies of literature they were designed for. In this way, it enables transforming the texts into XML content according to an XML schema that was designed to gain maximum benefit from the knowledge provided in them.

Integrated Open Taxonomic Access (INOTAXA)

Page 23: An International Cooperative Digital Library for Taxonomic Literature: The Biodiversity Heritage Library

Taxonomic Intelligence

10.7 million name strings in NameBank

Uses sophisticated algorithm (TaxonGrab) to locate likely name strings in OCR text

Iterative processing of BHL texts will both increase the number of name strings in NameBank and increase the accuracy of name string recognition
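The general idea of locating likely name strings in OCR text and checking them against a reference list can be sketched in a few lines of Python. This is not the TaxonGrab algorithm, whose details the slides do not give. It is a deliberately naive stand-in that treats any capitalized word followed by a lowercase word as a candidate binomial, and uses a tiny hypothetical set in place of NameBank.

```python
import re

# Naive candidate pattern: capitalized word followed by a lowercase word.
# An assumption for illustration only, not how TaxonGrab works.
CANDIDATE = re.compile(r"\b([A-Z][a-z]{2,})\s+([a-z]{3,})\b")

# Hypothetical stand-in for a reference list of known name strings.
KNOWN_NAMES = {"Apis mellifera", "Homo sapiens"}


def candidate_names(ocr_text):
    """Return every 'Genus epithet'-shaped string found in the text."""
    return [f"{genus} {epithet}" for genus, epithet in CANDIDATE.findall(ocr_text)]


def split_known_unknown(candidates, known_names):
    """Separate candidates confirmed by the reference list from the rest."""
    known = [c for c in candidates if c in known_names]
    unknown = [c for c in candidates if c not in known_names]
    return known, unknown


text = "The honey bee Apis mellifera and Homo sapiens are both named in Systema naturae."
known, unknown = split_known_unknown(candidate_names(text), KNOWN_NAMES)
print(known)
print(unknown)
```

The pattern alone also picks up ordinary phrases such as "The honey" and the book title "Systema naturae", which is why a lookup against known names matters. Unconfirmed candidates are the ones a review step would either discard or add to the reference list, in the spirit of the iterative processing described on the slide.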

Page 24: An International Cooperative Digital Library for Taxonomic Literature: The Biodiversity Heritage Library

BHL & Publishers

Page 25: An International Cooperative Digital Library for Taxonomic Literature: The Biodiversity Heritage Library

Permissions

• Seek permissions from copyright holders

• Opt-in copyright model: The BHL will actively work with professional societies and associations to integrate their publications into the BHL in a way that serves the societies’ missions and goals

• BHL will digitize learned society backfiles and mount them through the BHL Portal at no cost.

• Will provide a set of files to the publishers for reuse as they see fit

Page 26: An International Cooperative Digital Library for Taxonomic Literature: The Biodiversity Heritage Library

BHL Advantages

• Use of the articles will increase as evidenced by citation upsurge

• Long-term management of the digital assets is provided by the BHL at no cost

• Publishers’ content is embedded in the emerging knowledge ecology that is sweeping biology in this century

• Structural markup of backfiles into conformance with NLM DTD (just starting)

Page 27: An International Cooperative Digital Library for Taxonomic Literature: The Biodiversity Heritage Library

Successes

• Entomological News

• Journal of Hymenoptera Research

• Herpetological Review

• Publications of the San Diego Natural History Museum

• California Academy of Sciences publications

• And more ...

Page 28: An International Cooperative Digital Library for Taxonomic Literature: The Biodiversity Heritage Library

BHL Portal

• Library catalog-like interface to BHL literature

• Enhanced structural analysis to provide volume/issue/article page access to the literature

• Iterative development based on feedback from user community

• Provide access to two key audiences:
– Humans
– Machines

Page 29: An International Cooperative Digital Library for Taxonomic Literature: The Biodiversity Heritage Library

Funding

Initial grant from the MacArthur and Sloan Foundations (as part of the Encyclopedia of Life grant)

Additional support from parent institutions

Additional grants being actively pursued by BHL and individual members

Page 31: An International Cooperative Digital Library for Taxonomic Literature: The Biodiversity Heritage Library

Looking Forward

• Co-evolving bioinformatics resources produce a rich information ecology:

– Consortium for the Barcoding of Life (CBOL) with gene sequences deposited in GenBank.

– GBIF’s Electronic Catalog of Taxonomic Names

– Herbaria and museum specimen databases

Page 32: An International Cooperative Digital Library for Taxonomic Literature: The Biodiversity Heritage Library

Looking Forward

• Quick ramp-up, high early costs – development, mass scanning, etc.

• Derive some long-term costs from the operating budgets of the member institutions (Examples under consideration: acquisitions budget, staff positions, etc.)

• Integrate functions/tasks with wider efforts where appropriate, e.g. mass storage

Page 33: An International Cooperative Digital Library for Taxonomic Literature: The Biodiversity Heritage Library

The Long Now Strategy

Institutions that are creating the BHL exist to persist through time. That’s an important part of their business

The future is uncertain, the technology landscape changes, people pass on. So create consortial structures that are low-overhead, flexible, and can respond quickly

Page 34: An International Cooperative Digital Library for Taxonomic Literature: The Biodiversity Heritage Library

A Global Library for Life

In any well-appointed Natural History Library there should be found every book and every edition of every book dealing in the remotest way with the subjects concerned.

Charles Davies Sherborn, Epilogue to Index Animalium, March 1922

Page 36: An International Cooperative Digital Library for Taxonomic Literature: The Biodiversity Heritage Library

Midrange estimate: 25% of 5 million species = 1.3 million species, or roughly 1 every 20 minutes

Low estimate: 15% of 4 million species = 0.6 million species, or roughly 1 every 44 minutes.

High estimate: 50% of 6 million species = 3 million species, or roughly 1 every 9 minutes

Conservation International: http://tinyurl.com/3hzkax
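The slide does not state the time span behind the "one every N minutes" figures. The short Python check below assumes a span of 50 years, a guess chosen only because it yields intervals close to the ones on the slide. The function name and the 50-year constant are assumptions made for this example.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60

# Assumption: the slide's intervals refer to a 50-year period.
ASSUMED_YEARS = 50


def minutes_per_species(fraction, total_species, years):
    """Average interval in minutes if fraction * total_species are spread evenly over the period."""
    species = fraction * total_species
    return years * MINUTES_PER_YEAR / species


estimates = [
    ("Low", 0.15, 4_000_000),
    ("Midrange", 0.25, 5_000_000),
    ("High", 0.50, 6_000_000),
]

for label, fraction, total in estimates:
    interval = minutes_per_species(fraction, total, ASSUMED_YEARS)
    print(f"{label}: about one species every {interval:.0f} minutes")
```

Under that assumption the low, midrange, and high cases come out at about 44, 21, and 9 minutes, against the slide's 44, 20, and 9. The small midrange difference is consistent with the slide rounding 1.25 million up to 1.3 million.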

Page 41: An International Cooperative Digital Library for Taxonomic Literature: The Biodiversity Heritage Library

Thank You ... for sticking around!

Page 42: An International Cooperative Digital Library for Taxonomic Literature: The Biodiversity Heritage Library

LINKS

Biodiversity Heritage Library: http://www.biodiversitylibrary.org/

Biodiversity Heritage Library Blog: http://biodiversitylibrary.blogspot.com

Encyclopedia of Life: http://www.eol.org/

Smithsonian Institution Libraries: http://www.sil.si.edu/

Universal Biological Indexer and Organizer: http://www.ubio.org/

Biologia Centrali-Americana: http://www.sil.si.edu/digitalcollections/bca/

Page 43: An International Cooperative Digital Library for Taxonomic Literature: The Biodiversity Heritage Library

CREDITS

Thanks to:

Chris Freeland, Missouri Botanical Garden

Tom Garnett, The Biodiversity Heritage Library

The staff at the Internet Archive

Images from

The Galaxy of Images, Smithsonian Libraries (www.sil.si.edu/imagegalaxy)

Martin R. Kalfatovic

Suzanne C. Pilsk

Bernard Scaife