TDWG 2006 Conference, St Louis

Digitizing the legacy literature of biodiversity

An introduction to the Biodiversity Heritage Library (BHL)

Neil Thomson, Natural History Museum, London

BHL origins and objectives

Encyclopedia of Life meeting at Telluride, 2003
Cost and storage possibilities
Natural history literature is an ideal digitization candidate
Aim: Available at point of use

Scope and IPR

Public domain (pre-1923 in USA)
Legacy literature as complement to current material
Negotiation with societies and Not-For-Profits
Creative Commons licensing – some rights reserved

Partners

10 Library partners
American Museum of Natural History
Field Museum
Harvard University Botany Library
Missouri Botanical Garden
Museum of Comparative Zoology, Ernst Mayr Library
National Museum of Natural History, Smithsonian Institution
Natural History Museum, London
New York Botanical Garden
Royal Botanic Gardens, Kew
Woods Hole Oceanographic Institution

Associates

OCLC http://www.oclc.org/

Internet Archive http://www.archive.org/index.php

Others in negotiation

Structure & funding

BHL is a founder member of the Open Content Alliance www.opencontentalliance.org/

Charitable status
English-language project
Register of intent
Funding

Digitization phases

Bibliographic record pooling
Internet Archive Pod of 10 cameras
Boutique scanning of rare, fragile or oversize material
Metadata enhancement
Service building

Digitization process

Pooled bibliographic records used for selection, matching and status
Page images and OCR
Addition of identifiers
Quality check
Return or offsite storage
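
The matching step can be illustrated with a minimal sketch. It assumes, purely for illustration, that each pooled record carries a title, a year and a holding institution, and that two records match when their normalized title and year agree. The field names, the matching rule and the sample records are hypothetical; the slides do not describe the method actually used.

```python
import re
from collections import defaultdict


def match_key(record):
    """Build a hypothetical match key: lowercased title without punctuation, plus year."""
    title = re.sub(r"[^a-z0-9 ]", "", record["title"].lower())
    title = " ".join(title.split())  # collapse runs of whitespace
    return (title, record["year"])


def pool_records(records):
    """Group records by match key and collect the institutions holding each title."""
    pool = defaultdict(set)
    for record in records:
        pool[match_key(record)].add(record["institution"])
    return dict(pool)


# Illustrative records only; not taken from the BHL record pool.
records = [
    {"title": "Species Plantarum", "year": 1753, "institution": "Kew"},
    {"title": "Species plantarum.", "year": 1753, "institution": "MBG"},
    {"title": "Systema Naturae", "year": 1758, "institution": "NHM"},
]

pooled = pool_records(records)
# Titles held by exactly one institution in the pool.
unique_titles = [key for key, holders in pooled.items() if len(holders) == 1]
```

In a pool built this way, the holders set for each key could support selection (scan one copy per title) and status tracking, though a real workflow would need a far more robust matching rule than exact title and year agreement.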

Metadata repository

Bibliographic record pool

Monographs
Serial-titles
Article-level metadata

OCLC analysis

Statistics - 1

Initial analysis showed:
We have 1.3 million catalogue records
73% are monographs (remainder are serials at title-level)
63% is English language material. The next most popular language (9%) is German.
About 30% of material was published before 1923.

Statistics - 2

Overlap analysis
Of the 981,000 monograph records from all institutions:
378,000 matching pairs were found
616,000 had no matches at all and were unique to one institution.
After de-duplication of the matching pairs, the final file contains 757,000 records.
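
The overlap figures can be cross-checked with some simple arithmetic. The sketch below assumes that every monograph record is either unmatched or belongs to a group of matching records, and that the final file keeps the unmatched records plus one record per group. Both assumptions are an interpretation of the slide, not something it states. The 378,000 figure counts pairs rather than records, so it is left out of the calculation.

```python
total_monographs = 981_000    # monograph records from all institutions
unmatched = 616_000           # records unique to one institution
deduplicated_total = 757_000  # records in the final file

# Records that matched at least one record from another institution.
matched_records = total_monographs - unmatched           # 365,000

# Distinct titles left once each group of matches is collapsed to one record.
distinct_matched_titles = deduplicated_total - unmatched  # 141,000

# Records dropped by de-duplication.
duplicates_removed = total_monographs - deduplicated_total  # 224,000

# Average number of holdings per title that is held more than once.
average_copies = matched_records / distinct_matched_titles
```

Under these assumptions, a title held by more than one institution appears on average about 2.6 times in the pooled records.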

Metadata development

Data standards

METS

DOIs

LSIDs

Indexes and taxonomic intelligence

The future

What do scientists want from a digital library?

What will the BHL look like?

http://bhl.si.edu/index.cfm