Bibliographic references in BHL

Coordination and routes for cooperation across organizations, projects and e-infrastructures

23rd of May 2013

William Ulate R., Missouri Botanical Garden

Questions to Answer

1. Type of content we discuss (e.g., occurrences, genes, behaviour, morphology, etc.)
2. Sources of content (from where)
3. Formats of content (formats, standards)
4. Methods of gathering information (e.g., harvesting, FTP uploads, protocols)
5. Methods of delivery of information (e.g., free searches, API, web services, automated exports, linking mechanisms, etc.; provide links to API and web services documentation)
6. Identifiers used (type, persistence, dereferencing, resolvability)
7. Present or forthcoming interoperability features with other platforms
8. Constraints, needs and expectations for:
   a) Suppliers of content, and
   b) Users of content
9. What is needed for Bibliographic References?

A brief history…

The Biodiversity Heritage Library

www.biodiversitylibrary.org

Book Viewer

Sharing

BHL shares data through:

• APIs
• Data Export
• OpenURL
• OAI-PMH

Open Data

• Downloads
– Simple tab-delimited exports of core data
– http://www.biodiversitylibrary.org/data/BHLExportSchema.pdf

• Data model
– DB schema as ERD
– http://bhl-bits.googlecode.com/files/20090930_BHLDataModel.pdf
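
As a rough illustration of consuming a tab-delimited export like the ones listed under Downloads, the sketch below parses an in-memory sample with Python's standard `csv` module. The column names (`id`, `title`, `year`) and the sample rows are hypothetical placeholders; the real columns are described in the export schema PDF linked above.

```python
import csv
import io

# Hypothetical sample data: these column names are placeholders,
# not the actual BHL export columns.
SAMPLE_EXPORT = (
    "id\ttitle\tyear\n"
    "1\tSpecies Plantarum\t1753\n"
    "2\tSystema Naturae\t1758\n"
)


def read_export(text):
    """Parse tab-delimited text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(
        io.StringIO(text, newline=""),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # treat quote characters as plain data
    )
    return list(reader)


rows = read_export(SAMPLE_EXPORT)
for row in rows:
    print(row["id"], row["title"], row["year"])
```

All values come back as strings, so numeric fields such as a year would need explicit conversion before sorting or filtering.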

Services

• Names Service
– Return all occurrences of a name throughout BHL digitized corpus
  • Documentation: http://bit.ly/2e6sg9
– Access to 100+ million name strings using TaxonFinder & NetiNeti
  • 1.5 million unique names
– Algorithm to detect nomenclatural & taxonomic acts

• OpenURL
– Facilitate links to citations: protologues, articles, references
  • Documentation: http://www.biodiversitylibrary.org/openurlhelp.aspx
– Useful to Nomenclators, Reference Systems
  • IPNI
  • Tropicos

Services: OpenURL

http://www.biodiversitylibrary.org/openurl?pid=title:3934&volume=14&issue=&spage=301&date=1879

http://www.tropicos.org/Name/1200408
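
The OpenURL link shown above can be assembled from a title identifier plus volume, start page and year. The sketch below is a minimal illustration that reproduces that exact URL with Python's standard library; the function name `bhl_openurl` and its parameter names are assumptions made for this example, and only the query keys visible in the slide's URL (`pid`, `volume`, `issue`, `spage`, `date`) are used.

```python
from urllib.parse import urlencode

# Resolver address as it appears in the example URL above.
BASE = "http://www.biodiversitylibrary.org/openurl"


def bhl_openurl(title_id, volume="", issue="", start_page="", year=""):
    """Build an OpenURL query in the same shape as the slide's example.

    Hypothetical helper: the function and parameter names are illustrative.
    """
    params = {
        "pid": f"title:{title_id}",
        "volume": volume,
        "issue": issue,
        "spage": start_page,
        "date": year,
    }
    # safe=":" keeps the colon in "title:3934" unescaped, as in the slide.
    return BASE + "?" + urlencode(params, safe=":")


url = bhl_openurl(3934, volume=14, start_page=301, year=1879)
print(url)
```

A nomenclator or reference system that already stores journal, volume, page and year for a citation could generate links of this kind without knowing any BHL page identifiers.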

DOIs

DOIs for Legacy Literature

• BHL member of CrossRef through Smithsonian
• Started assigning DOIs to BHL monographs
– Low hanging fruit: Easy, non-controversial
– 54,856 DOIs Approved to date

• Next, other publication types / articles?
– Process of automatically assigning CrossRef DOIs to articles has a higher potential for collisions.

Article-level metadata

• Disambiguating and locating structural components in the corpus

• Done by automated and crowdsourced means
– Thanks Rod Page! Welcome others!

• Greatly increases semantic value of the dataset

• Makes data addressable and thus linkable

Chapter-level metadata
Treatment-level metadata
Part-level metadata

Genesis: “BHL Article Repository”

• Idea first introduced at TDWG 2008, Fremantle (by BHL, many have discussed for years)
• YouTube for biodiversity articles
• Needed (need) a way to access articles in BHL
– “BHL has no articles.”
– BHL has hundreds of thousands of articles, but you can’t search for them by author or article title
– Can find via “article coordinates” using BHL’s UI & OpenURL resolver: Journal / Volume / Start Page / Year

CiteBank

• Objectives
– Create a repository for community-vetted taxonomic bibliographies.
– Ability to ingest, display, download, and index articles so that the BHL can operate as an article repository.
– Provide links to content published online through other repositories.

• Launched on December 6th 2010
• 185,609 bibliographic records to date

Citations today: http://citebank.org

Citations Providers

[Diagram labels: Specimen Databases, Commercial Aggregators, Software Tools, Open Access, Digital Libraries, Indices, Nomenclators, Open Access Publishers, International Collaborative Projects]

Lessons Learned

• Biblio/Drupal data model insufficient for mass of data envisioned for all biodiversity, too flat and difficult to expand in collaboration with Biblio development community

• Data providers want their content findable and managed in the Biodiversity Heritage Library, not a system alongside BHL

• Maintaining two platforms for biodiversity literature threatens sustainability of the literature resources over the longer term

Global Names Architecture

What have we done?

• Articles
– Extended BHL data model to store article metadata
– Built process to harvest data from BioStor

• Created user interfaces for adding article metadata and associated files
– Defined functional requirements as improvements to Drupal-based CiteBank
– Defined process flow for adding article metadata and associated files
– Implemented UI changes

• Changed BHL UI to accommodate article search
• Changed BHL UI to accommodate article display (TOC)

Articles in the BHL UI

Articles

Requirements for a citation repository?

Admin. Interface
– IMPORT AND MAPPING TOOL
  • Preview/Accept/Reject/Undo/Report on Import
  • No standard schema, MODS or BibTeX
  • Drag & drop GUI or mapped source and target field config.
– USER MANAGEMENT
  • Self-Registration
  • Admin. Approval & Deletion
  • User Roles Assignment
– GLOBAL UPDATES

Requirements for a citation repository?

General User Interface
– IMPORT
  • Upload/Preview/Accept/Reject/Undo/Report on Import
– CREATE CITATION
  • By filling a Form, via BibTeX
– BROWSE
  • Faceted: title, author, subject, year, contributor, my citations

Requirements for a citation repository?

• CITATION TYPES
– Journal Article, Book Chapter, Conference Proceedings, Conference Paper, Thesis, Government Report, Note, etc.
• OAI HARVESTING
– Harvest and serve data through OAI-PMH
• SPECIFICATIONS FOR DATA PROVIDERS PAGE
• CONTRIBUTORS PAGE
– Recognize ALL contributions
• REPORTING
– Statistics Page by Citation and Publication type
– Recent/Latest Uploads
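
For the OAI harvesting requirement, a harvester typically issues `ListRecords` requests and follows the `resumptionToken` from one page to the next. The sketch below illustrates only that request and paging logic, using the argument names defined by the OAI-PMH 2.0 protocol. The endpoint `https://example.org/oai` and the sample response are placeholders, not a real BHL or CiteBank service, and nothing is fetched over the network.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL.

    When a resumption token is present it is sent as the only argument
    besides the verb, as the protocol requires.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_page(xml_text):
    """Return (record identifiers, next resumption token or None)."""
    root = ET.fromstring(xml_text)
    ids = [e.text for e in root.findall(".//oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


# Placeholder response, made up for this example.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:item/1</identifier></header></record>
    <record><header><identifier>oai:example.org:item/2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = list_records_url("https://example.org/oai")
ids, token = parse_page(SAMPLE)
next_url = list_records_url("https://example.org/oai", resumption_token=token)
print(first_url, ids, next_url, sep="\n")
```

A full harvester would repeat the request with each returned token until a page arrives with no token or an empty one.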

What are we doing?

• Integrate BHL’s Services with ZooBank, IPNI & IF

• Authoritative list of titles in common use for nomenclatural acts (“TL3”)

• Harvest relevant content from Mendeley

• Integrate services and interfaces with the GNUB data model

• Interoperate with citation parsing tools & services

Support citation reconciliation

L. Sp. Pl. 2: 971. 1753

Linneaus, C. Species Plantarum, vol. 2 p. 971. 1753

Linné, Carl von. Sp. Pl. Vol. 2 Page 971. 1753

Caroli Linnaei, Species Plantarum exhibentes plantas rite cognitas, ad genera relatas, cum Differentis Specificis, Nominibus Trivialibus, Synonymis Selectis, Locis Natalibus, secundum SYSTEMA SEXUALE digestas.. 2:971. 1753

Zea mays
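
The four citation strings above all point to the same page of the same work. One simple way to illustrate reconciliation is to reduce each string to the volume / start page / year "coordinates" and compare those. The sketch below does this with two regular expressions. It is an illustrative assumption, not the method BHL uses, and it deliberately ignores author and title matching, which a real reconciliation service would also need.

```python
import re

# Volume and page: "2: 971", "2:971", "vol. 2 p. 971", "Vol. 2 Page 971".
VOL_PAGE = re.compile(
    r"(?:vol\.?\s*)?(\d+)\s*(?::|,?\s*(?:p\.|page))\s*(\d+)",
    re.IGNORECASE,
)
# Year: a four-digit number at the very end of the citation.
YEAR = re.compile(r"(\d{4})\s*$")


def citation_key(citation):
    """Return (volume, page, year) as integers, or None if not recognised."""
    vol_page = VOL_PAGE.search(citation)
    year = YEAR.search(citation)
    if vol_page is None or year is None:
        return None
    return int(vol_page.group(1)), int(vol_page.group(2)), int(year.group(1))


# The four variants, copied verbatim from the slide.
variants = [
    "L. Sp. Pl. 2: 971. 1753",
    "Linneaus, C. Species Plantarum, vol. 2 p. 971. 1753",
    "Linné, Carl von. Sp. Pl. Vol. 2 Page 971. 1753",
    "Caroli Linnaei, Species Plantarum exhibentes plantas rite cognitas, "
    "ad genera relatas, cum Differentis Specificis, Nominibus Trivialibus, "
    "Synonymis Selectis, Locis Natalibus, secundum SYSTEMA SEXUALE "
    "digestas.. 2:971. 1753",
]

keys = [citation_key(v) for v in variants]
print(keys)
```

Because all four variants reduce to the same coordinates, they can be grouped as candidates for the same reference before any title or author comparison is attempted.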

Questions to Answer

1. Type of content - Literature, Images, OCR Text and Bibliographic Citations
2. Sources of content - BHL, CiteBank & other Repositories
3. Formats of content - BibTeX, MODS, DC
4. Methods of gathering info - Harvesting, FTP Uploads
5. Methods of delivery of info - Free Searches, API, web services, exports, linking mechanisms
6. Identifiers used - CrossRef DOIs for Monographs
7. Interoperability with other platforms - ZooBank, IPNI, IF
8. Constraints, needs and expectations for suppliers of content and users of content

Thank you

pro-iBiosphere Meeting 3
Coordination and routes for cooperation across organizations, projects and e-infrastructures
Berlin, Germany
May 23rd, 2013

William.Ulate@mobot.org
Global BHL Project Manager
BHL Technical Director
Senior Project Manager
Missouri Botanical Garden
