TDWG 2006 Conference, St Louis Digitizing the legacy literature of biodiversity An introduction to the Biodiversity Heritage Library (BHL) Neil Thomson Natural History Museum, London



TDWG 2006 Conference, St Louis

Digitizing the legacy literature of biodiversity

An introduction to the Biodiversity Heritage Library (BHL)

Neil Thomson

Natural History Museum, London


BHL origins and objectives

Encyclopedia of Life meeting at Telluride, 2003
Cost and storage possibilities
Natural history literature is an ideal digitization candidate
Aim: Available at point of use


Scope and IPR

Public domain (pre-1923 in USA)
Legacy literature as complement to current material
Negotiation with societies and Not-For-Profits
Creative Commons licensing – some rights reserved


Partners

10 Library partners
  American Museum of Natural History
  Field Museum
  Harvard University Botany Library
  Missouri Botanical Garden
  Museum of Comparative Zoology, Ernst Mayr Library
  National Museum of Natural History, Smithsonian Institution
  Natural History Museum, London
  New York Botanical Garden
  Royal Botanic Gardens, Kew
  Woods Hole Oceanographic Institution


Associates

OCLC http://www.oclc.org/

Internet Archive http://www.archive.org/index.php

Others in negotiation


Structure & funding

BHL is a founder member of the Open Content Alliance www.opencontentalliance.org/

Charitable status
English-language project
Register of intent
Funding


Digitization phases

Bibliographic record pooling
Internet Archive Pod of 10 cameras
Boutique scanning of rare, fragile or oversize material
Metadata enhancement
Service building


Digitization process

Pooled bibliographic records used for selection, matching and status
Page images and OCR
Addition of identifiers
Quality check
Return or offsite storage
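
The slide does not say how pooled records were matched across libraries. As a rough illustration only, the sketch below groups records under a hypothetical match key (normalized title plus publication year). The field names `title`, `year` and `holder`, the helper names, and the sample records are all assumptions made for this example, not the BHL's actual procedure.

```python
import re
from collections import defaultdict


def match_key(record):
    """Hypothetical match key: lower-cased title with punctuation collapsed, plus year."""
    title = re.sub(r"[^a-z0-9]+", " ", record["title"].lower()).strip()
    return (title, record["year"])


def pool_records(records):
    """Group catalogue records from several libraries by their match key."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)
    return groups


# Illustrative records only; the holders are placeholders, not real holdings data.
sample = [
    {"title": "Species Plantarum.", "year": 1753, "holder": "Library A"},
    {"title": "species  plantarum", "year": 1753, "holder": "Library B"},
    {"title": "Genera Plantarum", "year": 1789, "holder": "Library A"},
]

groups = pool_records(sample)

# Keys held by more than one record are candidate duplicates;
# keys with a single record are unique to one institution.
shared = {key: recs for key, recs in groups.items() if len(recs) > 1}
unique = {key: recs for key, recs in groups.items() if len(recs) == 1}
```

A grouping of this kind would support the three uses the slide names: selecting a title once, matching copies across partners, and tracking scanning status per title rather than per copy.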


Metadata repository

Bibliographic record pool
  Monographs
  Serial-titles
  Article-level metadata

OCLC analysis


Statistics - 1

Initial analysis showed:
  We have 1.3 million catalogue records
  73% are monographs (remainder are serials at title-level)
  63% is English language material. The next most popular language (9%) is German.
  About 30% of material was published before 1923.


Statistics - 2

Overlap analysis
Of the 981,000 monograph records from all institutions:
  378,000 matching pairs were found
  616,000 had no matches at all and were unique to one institution.
After de-duplication of the matching pairs, the final file contains 757,000 records.
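
The reported figures can be cross-checked with simple arithmetic. The sketch below assumes (this is an interpretation, not something the slide states) that the final file consists of the unique records plus one merged record for each group of matching records. All inputs are the rounded figures from the slide.

```python
# Figures as reported on the slide, rounded to the nearest thousand.
monograph_records = 981_000
unique_records = 616_000   # no match at any other institution
final_records = 757_000    # after de-duplication

# Records that matched at least one record from another institution.
matched_records = monograph_records - unique_records

# Distinct titles left once matched records are merged
# (assumes the final file = unique records + one record per match group).
merged_titles = final_records - unique_records

# Records removed as duplicates.
removed_duplicates = monograph_records - final_records

print(matched_records, merged_titles, removed_duplicates)
# prints: 365000 141000 224000
```

Under that assumption, about 365,000 records collapse into about 141,000 titles. The 378,000 figure counts pairs rather than records, so it can exceed the number of matched records when a title is held by three or more libraries.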


Metadata development

Data standards

METS

DOIs

LSIDs

Indexes and taxonomic intelligence
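
For readers unfamiliar with LSIDs (Life Science Identifiers), they are URNs of the form `urn:lsid:authority:namespace:object`, with an optional trailing revision. The sketch below splits one into its parts. The function name and the example identifier are hypothetical and chosen only for illustration; the slide does not say how the BHL would mint or resolve identifiers.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text: str) -> Lsid:
    """Split an LSID of the form urn:lsid:authority:namespace:object[:revision]."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if not all(parts[2:5]):
        raise ValueError(f"LSID has an empty component: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(parts[2], parts[3], parts[4], revision)


# Hypothetical identifier, for illustration only.
example = parse_lsid("urn:lsid:example.org:pages:12345")
```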


The future

What do scientists want from a digital library?

What will the BHL look like?


http://bhl.si.edu/index.cfm