bhl developments - prague

BHL DEVELOPMENTS BHL-EUROPE MEETING NÁRODNÍ MUZEUM, PRAGUE 16 NOV 2009 Chris Freeland Technical Director, BHL

Upload: chris-freeland

Post on 10-May-2015

TRANSCRIPT

Page 1: BHL Developments - Prague

BHL DEVELOPMENTS

BHL-EUROPE MEETING

NÁRODNÍ MUZEUM, PRAGUE

16 NOV 2009

Chris Freeland Technical Director, BHL

Page 2: BHL Developments - Prague

Kai in STL, describing a metadata format

Page 3: BHL Developments - Prague

We like to have fun while BHLing…

Page 4: BHL Developments - Prague

Blame the scotch

Page 5: BHL Developments - Prague

Biodiversity Heritage Library: http://biodiversitylibrary.org

Stats: Now Online

Last week: 15,000 titles; 40,000 volumes; 16.4 million pages

Today: 34,636 titles; 66,544 volumes; 25.2 million pages

BHL Partner Libraries

BHL + >100 other libraries with open access content at archive.org

Page 6: BHL Developments - Prague

Stats: Usage

Jan – Sep 2009: 266,000 visitors; 436,000 visits; 2.1 million pageviews

Daily average: 970 visitors / day; 1,600 visits / day; 7,700 pageviews / day

[Charts: Jan – Sep 2009; Launch to 30 Sep 2009]
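As a quick sanity check on the usage figures above, the daily averages can be recomputed from the Jan – Sep 2009 totals. This is only a sketch: it assumes the period runs from 1 Jan through 30 Sep 2009 inclusive, and the names `TOTALS`, `PERIOD_DAYS`, and `daily_average` are made up for illustration.

```python
from datetime import date

# Totals reported on the slide for Jan - Sep 2009.
TOTALS = {"visitors": 266_000, "visits": 436_000, "pageviews": 2_100_000}

# Assumption: the reporting period is 1 Jan through 30 Sep 2009, inclusive.
PERIOD_DAYS = (date(2009, 9, 30) - date(2009, 1, 1)).days + 1


def daily_average(total, days=PERIOD_DAYS):
    """Return the mean count per day over the reporting period."""
    return total / days


averages = {name: daily_average(total) for name, total in TOTALS.items()}

for name, value in averages.items():
    print(f"{name}: about {value:.0f} per day")
```

Under that assumption the results land within rounding distance of the slide's 970 / 1,600 / 7,700 per-day figures.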

Page 7: BHL Developments - Prague

New Color Scheme: To be released this week

http://github.com/openlibrary/bookreader

Page 8: BHL Developments - Prague

Cloud storage & computing

Page 9: BHL Developments - Prague

Global, coordinated development

Building a community of developers
Funded & volunteer
RubyBHL: http://github.com/mjy/rubyBHL
PyBHL: http://linux.softpedia.com/get/Programming/Libraries/pybhl-51612.shtml

Programmers from China & Australia committed to project

New partners, new content, new possibilities

Page 10: BHL Developments - Prague

Open Software & Development

BHL Bits: portal code, utilities, services
http://code.google.com/p/bhl-bits/

Taxonomic Literature Group: Google Group for discussion of “taxonomic literature & the services required to make literature interoperable within biodiversity research and biodiversity informatics.”
http://groups.google.com/group/taxonlit

Page 11: BHL Developments - Prague

Open Data

Downloads: simple tab-delimited exports of core data
http://www.biodiversitylibrary.org/data/BHLExportSchema.pdf

Data model: DB schema as ERD
http://bhl-bits.googlecode.com/files/20090930_BHLDataModel.pdf
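Tab-delimited exports like the ones mentioned above can be read with a standard CSV parser. The sketch below is illustrative only: the column names and sample rows are invented placeholders, not the actual BHL export schema (see the schema PDF for the real fields), and `read_export` is a hypothetical helper name.

```python
import csv
import io

# Hypothetical sample data: column names and values are placeholders,
# NOT the real BHL export layout.
SAMPLE_EXPORT = (
    "TitleID\tShortTitle\n"
    "1\tExample title A\n"
    "2\tExample title B\n"
)


def read_export(handle):
    """Parse a tab-delimited export into a list of dicts keyed by column name."""
    # QUOTE_NONE treats quote characters as ordinary text, which is a
    # common (assumed) convention for plain tab-delimited dumps.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


rows = read_export(io.StringIO(SAMPLE_EXPORT))
for row in rows:
    print(row["TitleID"], row["ShortTitle"])
```

For a downloaded file, the same function could be given a handle from `open(path, newline="", encoding="utf-8")`; the encoding is also an assumption.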

Page 12: BHL Developments - Prague

Services

Names Service
Return all occurrences of a name throughout BHL digitized corpus
Documentation: http://bit.ly/2e6sg9
Access to 51 million name strings using TaxonFinder
1.4 million unique names

OpenURL
Facilitate links to citations: protologues, articles, references
Documentation: http://www.biodiversitylibrary.org/openurlhelp.aspx
Useful to Nomenclators, Reference Systems
IPNI, Tropicos

Page 13: BHL Developments - Prague

Services: OpenURL

http://www.biodiversitylibrary.org/openurl?pid=title:3934&volume=14&issue=&spage=301&date=1879

http://www.tropicos.org/Name/1200408
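The example link above can be assembled programmatically from its parts. This is a minimal sketch, not an official client: only the endpoint and the query keys (`pid`, `volume`, `issue`, `spage`, `date`) come from the slide, while the function name `build_openurl` and its keyword arguments are assumptions for illustration.

```python
from urllib.parse import urlencode

# Base endpoint as it appears on the slide.
OPENURL_BASE = "http://www.biodiversitylibrary.org/openurl"


def build_openurl(title_id, volume="", issue="", spage="", date=""):
    """Assemble a query URL in the same shape as the slide's example.

    The function and argument names are illustrative; only the query
    keys themselves are taken from the slide.
    """
    params = {
        "pid": f"title:{title_id}",
        "volume": volume,
        "issue": issue,
        "spage": spage,
        "date": date,
    }
    # safe=":" keeps the colon in "title:3934" unescaped, as on the slide.
    return f"{OPENURL_BASE}?{urlencode(params, safe=':')}"


url = build_openurl(3934, volume="14", spage="301", date="1879")
print(url)
```

Empty values are kept in the query string (for example `issue=`) so the output matches the form shown on the slide.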

Page 14: BHL Developments - Prague

Services: OpenURL Disambiguation

Looking for:

BHL returns:

Page 15: BHL Developments - Prague

Services: OpenURL Results

Page 16: BHL Developments - Prague

Encyclopedia of Life

522,000 species pages linked to BHL

#1 referring site

Page 17: BHL Developments - Prague

Other Consumers

EarthCape Labs
Sort/Search capabilities with harvested names
YouTube demo: http://www.youtube.com/watch?v=qw7qw87JTOs

BioGUID / iPhylo
BHL Name Timeline & Comparison
http://bioguid.info/bhl/
http://bioguid.info/bhl/compare.php
New Viewer
Tagging
So much cool stuff we can’t keep up!
http://iphylo.blogspot.com/search/label/BHL
@rdmpage

Page 19: BHL Developments - Prague

Crowdsourced Articles

http://www.biodiversitylibrary.org/pdfgen/17298

Demo: http://youtube.com/watch?v=oidf3b26jVs

Page 20: BHL Developments - Prague

Crowdsourced Articles

12,000 PDFs generated through September 2009

4,900 submitted with article metadata

Analysis: http://bit.ly/4Jqu9

Page 21: BHL Developments - Prague

Great, but how to…

display / manage?

meet community demands for bibliography / citation management?

build from more open source tools?

Page 22: BHL Developments - Prague

Development goals re: citations

Create a repository for community-vetted taxonomic bibliographies.

Ability to ingest, display, download, and index articles so that the BHL can operate as an article repository.

Identify article boundaries in BHL digitized content using contributed bibliographies & algorithms.

Build from existing community of work around Drupal / Biblio.
In use by collaborators

Page 23: BHL Developments - Prague

“something like GenBank or NameBank for citations…”

So, CitationBank…or CiteBank (savs chars)

Need…

Page 24: BHL Developments - Prague

http://citebank.biodiversitylibrary.org/

Page 25: BHL Developments - Prague

Crowdsourced Articles

PDFs from BHL pushed into Drupal/Biblio:

Page 26: BHL Developments - Prague

http://citebank.biodiversitylibrary.org/search

Page 27: BHL Developments - Prague

http://citebank.biodiversitylibrary.org/node/47423

Page 28: BHL Developments - Prague
Page 29: BHL Developments - Prague

CiteBank boundaries

Book

Citation

Pageturning UI, PDF, OCR

eBook/Kindle

Stored *somewhere* & retrievable via HTTP URI

Citation, Citation, Citation

Bibliography

CiteBank

Page 30: BHL Developments - Prague

BHL Data Flow – Sep 2009

CiteBank

Page 31: BHL Developments - Prague

Points of discussion @ TDWG09…

Linked Literature and the Biodiversity Heritage Library
http://www.tdwg.org/proceedings/article/view/548

Page 32: BHL Developments - Prague

Who can upload & edit?

Trusted repositories? Approved specialists? BHL Librarians? People in this session? Citizen scientists? 6th graders? Rod Page?

Discussion: Session participants thought it important that BHL get as many citations as possible, then find ways of implementing trust mechanisms for users such as iSpot (Drupal module), ratings systems, ways of tagging inappropriate materials.

Page 33: BHL Developments - Prague

What about duplicates?

3 bibliographies had Syst. Nat.

All 3 in different reference manager formats

All 3 had variant forms of title:

Syst. Nat.

Systema Naturae

Systema naturae per regna tria naturae

Library catalogues: Caroli Linnaei...Systema naturae per regna tria naturae : secundum classes, ordines, genera, species, cum characteribus, differentiis, synonymis, locis.

Discussion: Important to have all the ways in which materials have been referred to over time, then have algorithms & people aggregate titles/articles (translations) into reconciliation groups, resulting in a master index.
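One way to picture the "reconciliation groups" idea is a naive prefix-matching heuristic over the title variants listed above. This is a toy sketch, not the approach BHL or CiteBank actually uses: the function names are invented, "Species Plantarum" is added only as an unrelated control title, and a real system would need human review on top of anything this simple.

```python
import re


def tokens(title):
    """Lowercase a title and split it into ASCII-letter words (punctuation dropped)."""
    return re.findall(r"[a-z]+", title.lower())


def abbreviates(short, long):
    """True if each token of `short` is a prefix of the matching token in
    some contiguous run of tokens in `long`."""
    s, l = tokens(short), tokens(long)
    if not s or len(s) > len(l):
        return False
    for start in range(len(l) - len(s) + 1):
        if all(l[start + i].startswith(tok) for i, tok in enumerate(s)):
            return True
    return False


def reconciliation_groups(titles):
    """Greedily cluster titles that look like variants of one another."""
    groups = []
    # Shortest titles first, so abbreviations seed the groups.
    for title in sorted(titles, key=lambda t: len(tokens(t))):
        for group in groups:
            if any(abbreviates(m, title) or abbreviates(title, m) for m in group):
                group.append(title)
                break
        else:
            groups.append([title])
    return groups


variants = [
    "Systema naturae per regna tria naturae",
    "Syst. Nat.",
    "Caroli Linnaei...Systema naturae per regna tria naturae : secundum classes",
    "Systema Naturae",
    "Species Plantarum",  # unrelated control title, not from the slide
]

for group in reconciliation_groups(variants):
    print(group)
```

A heuristic like this will both over-merge and under-merge on real data, which is consistent with the session's point that algorithms and people both have a role in building the master index.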

Page 34: BHL Developments - Prague

Accuracy

How clean is clean? How dirty is dirty? What’s good enough?

How to Rank Gold/Platinum?

Dirty Bucket/Clean Bucket?

Discussion: Let users decide which is the “right” form for use; may differ from project to project. BHL should take it all in, then refine using our libraries’ collected knowledge + involvement from domain specialists.

Page 35: BHL Developments - Prague

Right technologies?

“But Drupal’s awful…just ask ___ for their bad experience.”

“Drupal’s great!”

“MySQL won’t scale” “MySQL’s great!”

Discussion: Drupal has limitations, but a large community of developers & implementers. There may be a “Montpellier Declaration” to centralize efforts within biodiversity informatics around the framework. Drupal/Biblio is a good starting point for CiteBank, needs further evaluation after more data are loaded & site is used.

Page 36: BHL Developments - Prague

…BHL keeps growing & growing & growing…

New projects

Page 37: BHL Developments - Prague

Darwin’s Library

AMNH, NHM, CUL, BHL (MOBOT)

Funded by NEH/JISC

Digitization of Darwin’s personal library, with annotations

New interfaces for recording, indexing, displaying annotations

Review “Dannotate” technology from ALA: http://metadata.net/sfprojects/dannotate.html

Page 38: BHL Developments - Prague

BHL Take Away

Content now available in EPUB format

Used by Stanza, transferable to Kindle

Blog post by John Mignault (NYBG): http://john.mignault.net/blog/2009/10/28/first-bhl-e-book-experiments/

Page 39: BHL Developments - Prague

Next steps

Bring hardware online at MBL
Have one point of redundancy
By Q1 2010

Bring BHL-Europe & other nodes online
In conjunction with DuraCloud & other solutions

Release CiteBank for beta & sandbox testing
Beta at http://citebank.biodiversitylibrary.org
Sandbox at http://sandcite.biodiversitylibrary.org
Production release by Q2 2010

Integration of BHL-Europe tools & content

Page 40: BHL Developments - Prague

Global BHL Coordination

Page 41: BHL Developments - Prague

Thanks!

Chris Freeland
Technical Director, BHL

Director, Center for Biodiversity Informatics, Missouri Botanical Garden

[email protected]
http://twitter.com/chrisfreeland