1 October 2006 Richard White, Andrew Jones & Frank Bisby - TDWG - St Louis Federating taxonomic databases: progress with the Catalogue of Life Dynamic Checklist Richard White, Andrew Jones , Computer Science, Cardiff University, UK [email protected] [email protected] Frank Bisby Plant Sciences, University of Reading, UK

Federating taxonomic databases: progress with the Catalogue of Life Dynamic Checklist

Richard White, Andrew Jones, Computer Science, Cardiff University, UK

[email protected]@cs.cardiff.ac.uk

Frank Bisby, Plant Sciences, University of Reading, UK

The Species 2000 programme

Species 2000, together with its partner ITIS, operates a federated environment which:

• gathers data from specialist species data providers

• delivers the Catalogue of Life:

  • Species 2000 global Dynamic Checklist (species; hierarchy);

  • regional species checklist for Europe (prototype for further regional hubs, etc.)

Plan to complete Catalogue in 2011

Main topics

• The Species 2000 federated environment

• Interoperability conventions and standards adopted

The federation: organisation

Species 2000 assembles sectors “side by side”:

Taxonomic hierarchy (or hierarchies)

Species

Global species databases (GSDs) and interim checklists: the Catalogue of Life

Species information sources (SISs): regional faunas and floras, specialist or sectoral databases, web pages etc.

(Diagram labels: GSD, interim checklists, SIS)

Uses for the system

• on-line reference tool (available)

• index to further Web-based species resources (planned; rudiments implemented for some taxonomic groups)

• “synonymy server”, exposed as a Web service (available, but to be improved)

Species 2000 home page

User about to click on “Dynamic Checklist” …

Dynamic Checklist search page

User interested in Dwarf Gourami; knows its genus …

Found some species

User interested in Colisa laelia (Dwarf Gourami) and about to click on this name …

Colisa laelia standard data (1)

Scroll to bottom …

Colisa laelia standard data (2)

Follow further information link …

Colisa laelia in FishBase

Information from FishBase in this case

Spice for Species 2000

Currently provides Common Access System (CAS) for Species 2000

• implements a hub

• gathers data from providers via wrappers

• integrates and caches

• makes data available to users and other software
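
The hub role listed above can be sketched in a few lines. This is a minimal illustration, not Spice's actual code: the `Hub` class, the wrapper call signature and the sample record are all assumptions made for the example.

```python
from typing import Callable, Dict, List

# Assumption: a wrapper is modelled as a callable that takes a search string
# and returns a list of records (plain dicts). Real wrappers speak HTTP/XML.
Wrapper = Callable[[str], List[dict]]


class Hub:
    """Toy hub: asks every registered wrapper, merges results, caches them."""

    def __init__(self) -> None:
        self._wrappers: Dict[str, Wrapper] = {}
        self._cache: Dict[str, List[dict]] = {}

    def register(self, sector: str, wrapper: Wrapper) -> None:
        """Attach the wrapper responsible for one taxonomic sector."""
        self._wrappers[sector] = wrapper

    def search(self, text: str) -> List[dict]:
        """Return merged records, contacting wrappers only on a cache miss."""
        key = text.strip().lower()
        if key not in self._cache:
            merged: List[dict] = []
            for sector, wrapper in sorted(self._wrappers.items()):
                for record in wrapper(text):
                    # Tag each record with the sector it came from.
                    merged.append({**record, "sector": sector})
            self._cache[key] = merged
        return self._cache[key]


def fish_wrapper(text: str) -> List[dict]:
    """Hypothetical wrapper holding a single illustrative record."""
    names = ["Colisa laelia"]
    return [{"name": n} for n in names if text.strip().lower() in n.lower()]


hub = Hub()
hub.register("fishes", fish_wrapper)
print(hub.search("Colisa"))
```

Keying the cache on the normalised search string means a repeated lookup never reaches the providers again; a real hub would also need some way to expire cached entries when a provider's data changes.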

Recent progress with Species 2000 (1)

EuroCat project:

• added many new data providers for further taxonomic sectors

• improved Spice

• set up “Species 2000 europa” regional hub (using Spice)

• experimented with “cross-mapping”, using Litchi

• gained better understanding of the dynamics of developing and incorporating new GSDs

• New wrapper-writing resources made available

Recent progress with Species 2000 (2)

Current activities include:

• Secretariat at Reading

• At least 4 new databases have become available in the last few months: people are busy working on various sectors

• Annual checklist:

  • Long term plan: snapshot of dynamic checklist

  • Currently parallel development in Philippines

  • ≥ 8 new databases being added

  • 2007 expected ≥ 1,000,000 species

Components of the architecture

Main data and software components of the Catalogue:

• Autonomous species databases (GSDs)

• GSD wrappers

• “Hubs” (portals) to assemble data from wrappers; provide data to clients

• Interfaces

  • for users

  • for software

• Maintenance and administration software tools (e.g. metadatabase)

Species 2000 protocols – overview

How GSDs interoperate in this federation, at four levels:

1. Organisational model for a federation in which data providers provide data about “taxonomic sectors”; hub assembles complete catalogue (see above)

2. Framework for information exchange based on a number of defined requests

3. Human-readable Common Data Model (CDM): abstract definition for requests; responses; data exchanged

4. Specific computer-readable interface definitions, implementing CDM

Species 2000 protocols and data standards

Activities at the “federation” level 1 described above.

Levels 2, 3 and 4:

• Species 2000 defines internal data standards

• Intended to be open standards

Interoperability level 2: Informal data request and response model

Describes informally how information is exchanged:

• between federation components, including:

  • data providers, the hub and software clients of the hub

• by means of (currently six) requests defined for specific purposes

• with correspondingly defined response data

This model:

• avoids need for providers to handle general database queries

• treats GSDs as “black boxes”

Request types

The request types sent by the CAS to SPICE wrappers:

0: get version of CDM the wrapper implements

1: look up species name or ambiguous search string

2: get “standard data” for a given species name

   includes accepted name, synonym(s), common name(s), distribution data, reference(s), latest taxonomic scrutiny, and links to other online resources about the species

3: obtain metadata concerning source database & data provider

4: move one step up taxonomic hierarchy

5: move one step down taxonomic hierarchy
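
One way to picture this fixed set of requests is as an enumeration plus a dispatcher. The sketch below is illustrative only: `RequestType`, `ToyWrapper` and the in-memory parent map are hypothetical names invented here, and only four of the six requests are modelled.

```python
from enum import IntEnum
from typing import Dict


class RequestType(IntEnum):
    """The six request types, numbered as in the list above."""
    CDM_VERSION = 0     # version of the CDM the wrapper implements
    SEARCH = 1          # look up a species name or ambiguous search string
    STANDARD_DATA = 2   # "standard data" for a given species name
    METADATA = 3        # metadata about the source database and provider
    MOVE_UP = 4         # one step up the taxonomic hierarchy
    MOVE_DOWN = 5       # one step down the taxonomic hierarchy


class ToyWrapper:
    """Hypothetical in-memory wrapper; callers never see how data is stored."""

    def __init__(self, parent_of: Dict[str, str], cdm_version: str) -> None:
        self._parent_of = parent_of      # child taxon -> parent taxon
        self._cdm_version = cdm_version

    def handle(self, request: int, argument: str = ""):
        request = RequestType(request)   # rejects anything outside 0..5
        if request is RequestType.CDM_VERSION:
            return self._cdm_version
        if request is RequestType.SEARCH:
            needle = argument.lower()
            return sorted(t for t in self._parent_of if needle in t.lower())
        if request is RequestType.MOVE_UP:
            return self._parent_of.get(argument)
        if request is RequestType.MOVE_DOWN:
            return sorted(c for c, p in self._parent_of.items() if p == argument)
        raise NotImplementedError(f"{request.name} is not modelled in this sketch")


wrapper = ToyWrapper({"Colisa laelia": "Colisa"}, cdm_version="1.20")
print(wrapper.handle(RequestType.MOVE_UP, "Colisa laelia"))
```

Because the only entry point is `handle` with a request number, the caller cannot issue arbitrary queries against the underlying store, which is the "black box" property the request model aims for.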

Interoperability level 3: Formal data and request/response model

Human-readable Common Data Model (CDM) for reference purposes

• provides abstract definition for the requests and responses, including parameters, etc

• candidate set of operations for retrieval of species-related data more generally

• defines the components of data transmitted and received

• Data model defined specifically for Species 2000 “standard data set”

• doesn’t define programming-language or technology-specific implementations

(Also available: “Species 2000 standard data set”, which summarises CDM briefly)
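
As a rough picture of what a "standard data" record might carry, the sketch below mirrors the items listed for request type 2. It is not the CDM itself: the class name, field names and field types are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StandardData:
    """Hypothetical container for the items returned by request type 2."""
    accepted_name: str
    synonyms: List[str] = field(default_factory=list)
    common_names: List[str] = field(default_factory=list)
    distribution: List[str] = field(default_factory=list)
    references: List[str] = field(default_factory=list)
    latest_scrutiny: str = ""   # free text in this sketch
    onward_links: List[str] = field(default_factory=list)

    def all_names(self) -> List[str]:
        """Accepted name first, then synonyms, with duplicates dropped."""
        seen = set()
        ordered = []
        for name in [self.accepted_name, *self.synonyms]:
            if name not in seen:
                seen.add(name)
                ordered.append(name)
        return ordered


record = StandardData("Colisa laelia", common_names=["Dwarf Gourami"])
print(record.all_names(), record.common_names)
```

Keeping the model as a plain data container, with no transport or storage code, matches the idea that this level stays independent of any programming language or technology binding.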

Interoperability level 4: Interface definitions

Computer-readable interface definitions, following the CDM, for use with particular implementations, including CORBA IDL, XML DTD and XML Schema for:

• requests from hub to wrappers

• requests from external client software to hub

Requests from hub to wrappers

Spice hub communicates with GSD wrappers using HTTP:

• “CGI” GET requests are sent to a wrapper, which returns an XML document in response

• An XML Schema (XSD) defines the specific XML requests and responses

• CORBA used within SPICE; corresponding IDL document

• NB CDM 1.20 is being updated to reflect minor modifications recently made to XSD, etc.
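
The GET-then-XML exchange can be sketched with the Python standard library. Everything specific here is an assumption: the wrapper URL, the query parameter names (`requesttype`, `searchstring`) and the element names in the sample response are placeholders, since the real ones are defined by the Species 2000 XSD.

```python
from typing import List
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


def build_request_url(wrapper_url: str, request_type: int, **params: str) -> str:
    """Compose a CGI-style GET URL; the parameter names are placeholders."""
    query = {"requesttype": str(request_type), **params}
    return f"{wrapper_url}?{urlencode(query)}"


def parse_names(xml_text: str) -> List[str]:
    """Collect the text of every <name> element in a response document."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iter("name") if el.text]


# Made-up response document, used instead of a live network call.
SAMPLE_RESPONSE = """<response>
  <name>Colisa laelia</name>
</response>"""

url = build_request_url(
    "http://example.org/cgi-bin/wrapper", 1, searchstring="Colisa laelia"
)
print(url)
print(parse_names(SAMPLE_RESPONSE))
```

In a real client the URL would be fetched (for example with `urllib.request.urlopen`) and the returned document validated against the schema before parsing.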

Requests from external client software to hub

A SOAP Web Service to allow programmatic access to dynamic checklist (including by the user interface), to interrogate Spice global & European hubs: http://spice.sp2000europa.org/SPICE/services/CASWebService

(location and definition may change)

CAS Web Service version 1.0 informal definition & WSDL: http://biodiversity.cs.cf.ac.uk/sp2000/protocol/
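
For readers unfamiliar with SOAP, the sketch below shows roughly what a request message looks like on the wire. The operation name `searchByName`, its `name` argument and the `urn:example:cas` namespace are hypothetical; the actual operations and namespaces are whatever the published WSDL defines, and a real client would normally be generated from that WSDL.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"


def build_envelope(operation: str, namespace: str, **arguments: str) -> bytes:
    """Wrap one operation call and its arguments in a SOAP 1.1 envelope."""
    ET.register_namespace("soapenv", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{namespace}}}{operation}")
    for key, value in arguments.items():
        ET.SubElement(call, key).text = value
    return ET.tostring(envelope, encoding="utf-8")


# Hypothetical operation and namespace, for illustration only.
message = build_envelope("searchByName", "urn:example:cas", name="Colisa laelia")
print(message.decode("utf-8"))
```

The message would be sent as the body of an HTTP POST to the service endpoint; that step is omitted here because the endpoint's location and definition may change.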

Further information

• Species 2000 programme and Species 2000 & ITIS Catalogue of Life: http://www.sp2000.org

• Species 2000 protocols and practices: http://biodiversity.cs.cf.ac.uk/sp2000/protocol/

• Spice: http://biodiversity.cs.cf.ac.uk/spice/

• Biodiversity Software Repository at Cardiff for access to Spice, other software and some wrappers: http://biodiversity.cs.cf.ac.uk/software/

Collaboration in open standards and software

We would like to see future progress as a community effort for developing

• data standards

• interoperable software

• Especially interoperation with emerging standards, e.g. TCS

Opportunities for standardisation

We would welcome consideration of the request/response model as a useful basis, independent of data representation, for interrogating sources of species-related information

Join us in enhancing SPICE & associated software

Areas for work include

• sophisticated management tools

• revision of SPICE code-base

• reusable software for wrapper writers

• addition of new protocols and schemas

Towards the Species Banks of the future

• Some Species 2000 GSDs currently provide “onward links” to rich species information

• Plan to investigate link-bases in which the Catalogue of Life can play an important part in the species banks of the future

Date for your diaries

1-day symposium to discuss Species 2000 Phase 2: progressing beyond 1 million species to the target 1.75 million

• The University of Reading, UK

• March 2007 (probably 29th)

Summary

• These protocols and standards are intended to be open and available for others to use when building similar federated information systems

• Our 6 operations are a candidate set for interchange of taxonomic data (possibly needing augmentation)

• They are described further in Species 2000 data standards documents at: http://biodiversity.cs.cf.ac.uk/sp2000/protocol/

Acknowledgements

• Funding: BBSRC, European Commission, GBIF

• Species 2000 Project Team and Directors

• Data providers