OpenUp! BioCASe Workshop
Jörg Holetschek, Gabriele Dröge
Botanic Garden & Botanical Museum Berlin-Dahlem, Dept. of Biodiversity Informatics and Laboratories
Königin-Luise-Straße 6-8, 14195 Berlin
BioCASe Workshop, Berlin, May 30th/31st 2011



Agenda

Monday

11.00 Welcome by Walter Berendsohn, Housekeeping

11.20 – 12.00 The BioCASe Architecture: An Overview

12.00 – 13.00 The BioCASe Provider Software I: An Overview

13.00 – 14.00 Lunch break

14.00 – 15.45 The BioCASe Provider Software II: Installation (Hands-on)

16.00 – 17.00 The ABCD data standard: Intention, Structure, Elements, Use

17.00 – 18.00 Preparing the database for BioCASe/ABCD

19.00 Dinner

Tuesday

09.30 – 12.00 Setting Up Datasources with the BPS (Hands-on): DB connection, Table Setup, Mapping; Testing, Data Backups

12.00 – 13.00 Lunch break

13.00 – 14.30 Setting up Networks with BioCASe (Hands-on)

15.00 – 15.30 A Thematic BioCASe Network: The DNA Bank Network

15.30 – 17.00 Questions (and answers?)

Workshop Presentation

http://www.biocase.org/files/BioCASe_Workshop_Berlin_2011.ppt

WiFi

Network: Conference

Key: g59mn3w2

1. BioCASe Technology: Motivation, Idea and Architecture

Primary Biodiversity Information

© Agnes Kirchhoff, J. Holstein et al.

Primary Biodiversity Data Items

- Living specimen
- Preserved specimen
- Multimedia document (drawing, photo, video, sound)
- Observation

= Primary Biodiversity Data Record

Documentation of the occurrence of one species at a given location at a certain point in time

Biological Collection Access Service

Data sources worldwide

- Index Herbariorum: 3,293 herbaria, 400 million herbarium sheets
- 50,000-100,000 natural history collections, 1.5-2 billion specimens
- With observations added, occurrence records 3+ billion (10b?)

Over 75% of biodiversity information is stored in developed countries.

An estimated 75% of all species are found in the developing world.

Source: BARTHLOTT et al. 1999

Accessibility

Stage 0: Only in real world (paper catalogues, just stacks)

Stage 1: Only meta information available on the web

Stage 2: Online catalogue, digitization of specimens

Biodiversity Data

Level 3: Networking the databases

Global Biodiversity Information Facility (GBIF)

Biological Collection Access Service (BioCASe)

Architecture of Biodiversity Networks

1. Protocols/Data Standards: BioCASe Protocol/ABCD

2. Wrapper Software: BioCASe Provider Software

3. Applications: Data Portal, Data Quality Checker, Data Mining

BioCASe Design Principles

- No central database
  - Data remain in the existing DB systems
  - Data provider gets full credit
  - Full control over published data by collection holder
- Partial publication possible
  - Collection holder can withhold information from publication (e.g., locality data for endangered species) or exclude records (e.g., until research results are published)
- Wrapper principle
  - Data remain in original collection management system
  - No changes in workflow for curator/local users

2. The BioCASe Provider Software

BioCASe Provider Software (Wrapper)

Software package that "wraps" around the collection database and equips it with a BioCASe protocol compliant interface:

1. Accepts requests from the network (e.g. "Marmota marmota?")

2. Translates queries to the collection database:

   SELECT * FROM specimen WHERE ScientificName LIKE 'Marmota marmota%'

3. Transforms results into ABCD documents and sends them back
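The translation step can be sketched in a few lines of Python. This is only an illustration, not the BPS implementation: the function name `like_filter_to_sql`, the in-memory SQLite database and the sample rows are all assumptions; only the table name `specimen`, the column `ScientificName` and the "Marmota marmota" pattern come from the slide.

```python
import sqlite3


def like_filter_to_sql(table, column, pattern):
    """Turn a wildcard pattern using '*' into a parameterised SQL LIKE query."""
    # Identifiers cannot be bound as parameters, so they are validated instead.
    if not (table.isidentifier() and column.isidentifier()):
        raise ValueError("unexpected table or column name")
    sql = f"SELECT * FROM {table} WHERE {column} LIKE ?"
    return sql, (pattern.replace("*", "%"),)


def demo():
    """Run the translated query against a hypothetical three-row specimen table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE specimen (id INTEGER PRIMARY KEY, ScientificName TEXT)")
    conn.executemany(
        "INSERT INTO specimen (ScientificName) VALUES (?)",
        [("Marmota marmota (Linnaeus, 1758)",), ("Marmota monax",), ("Lepus europaeus",)],
    )
    sql, params = like_filter_to_sql("specimen", "ScientificName", "Marmota marmota*")
    return [row[1] for row in conn.execute(sql, params)]
```

Binding the pattern as a parameter rather than pasting it into the SQL string keeps requests from the network from injecting SQL.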

BioCASe Provider Software (Wrapper)

Compatible with several protocols (BioCASe, DiGIR) and data schemas (ABCD, DarwinCore, ABCD-EFG, ABCD-DNA)

Works with most SQL-compliant databases (Access, MySQL, Postgres, SQL Server, ...)

Currently ~95 production installations serving ~1,500 collections with ~33.5m records to GBIF and BioCASe

Platform independent

Support available!

BioCASe Providers Worldwide

~95 production installations serving ~1,500 collections

Requirements

1. SQL compliant database with existing Python connectivity module: MySQL, SQL Server, Postgres, Access, Foxpro, Excel

2. Webserver (preferably Apache), allowing the execution of Python scripts

3. Privileges to install additional Python packages
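Whether a Python connectivity module is present can be checked before starting the installation. The sketch below is an assumption-laden illustration: the module names in `CANDIDATES` (`MySQLdb`, `psycopg2`, `pyodbc`) are common Python database drivers chosen for the example, not a list confirmed by the slides as the ones the BPS expects.

```python
import importlib.util

# Hypothetical mapping of DBMS to driver module; adjust to the modules your setup needs.
CANDIDATES = {
    "MySQL": "MySQLdb",
    "PostgreSQL": "psycopg2",
    "ODBC (e.g. Access, SQL Server)": "pyodbc",
}


def available_modules(candidates=CANDIDATES):
    """Return {dbms: True/False} depending on whether the driver module can be found."""
    return {
        dbms: importlib.util.find_spec(module) is not None
        for dbms, module in candidates.items()
    }


if __name__ == "__main__":
    for dbms, found in available_modules().items():
        print(f"{dbms}: {'installed' if found else 'missing'}")
```

The check only looks the module up; it does not import it or open a database connection.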

Steps

1. Installing Apache

2. Installing Python

3. Downloading BPS

4. Installing BPS (from repository/archive)

5. Creating the link Apache/BPS

6. Test of Installation

7. Changing directory permissions

8. Setup of additional packages (DB Connectivity Package)

1. Installing Apache

http://httpd.apache.org/download

2. Installing Python

http://www.python.org/download/

3. Downloading BPS

Archive: http://www.biocase.org/products/provider_software/

Subversion repository

Latest stable version: http://ww2.biocase.org/svn/bps2/branches/stable

Defined version: http://ww2.biocase.org/svn/bps2/tags/release_2.5.3

Linux: svn co <url> <path>

Windows: Tortoise client

4. Installing the BPS

Setup.py

No files are copied, only adapted!

5. Linking BPS with Apache

httpd.conf

6. Testing BPS, Installing Additional Packages

http://localhost/biocase → Utilities → Library Test

7. Write permissions

…/bps2/configuration

…/bps2/log
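A quick way to verify the two directories is sketched below. The function name `writable_dirs` and the `base` argument (the BPS installation folder, whose full path is elided on the slide) are assumptions for illustration. Note that the check applies to the user running the script, which is not necessarily the account the web server runs under.

```python
import os


def writable_dirs(base, names=("configuration", "log")):
    """Return {path: True/False}: does the directory exist and is it writable?"""
    result = {}
    for name in names:
        path = os.path.join(base, name)
        # os.access reports permissions for the current process user only.
        result[path] = os.path.isdir(path) and os.access(path, os.W_OK)
    return result


if __name__ == "__main__":
    # "." is a placeholder; pass the bps2 installation folder of your own setup.
    for path, ok in writable_dirs(".").items():
        print(f"{path}: {'writable' if ok else 'NOT writable or missing'}")
```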

8a. MySQLdb

http://sourceforge.net/projects/mysql-python/

Changing the Password

... /bps/configuration.ini

3. ABCD Standard

ABCD Data Schema

Access to Biological Collection Data:

- Data schema for all types of primary biodiversity data (living/preserved/observational, botanical/zoological/bacterial/viral, marine/terrestrial)
- XML (eXtensible Markup Language) based: can be consumed by humans and machines
- Highly complex, hierarchical, currently 1,055 data elements: almost every data item will fit in
- Extendable (plug-in slot for additional information)
- Standard (currently version 2.06)

ABCD: Structure

Namespace: http://www.tdwg.org/schemas/abcd/2.06
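Because every ABCD element lives in this namespace, any code reading ABCD documents has to qualify element names with it. The following sketch reads unit identifiers from a deliberately tiny fragment. The fragment is an assumption made for the example and is far from a complete or valid ABCD document; the element names follow the path DataSets/DataSet/Units/Unit used elsewhere in this workshop, and the prefix `abcd` is an arbitrary choice.

```python
import xml.etree.ElementTree as ET

ABCD_NS = "http://www.tdwg.org/schemas/abcd/2.06"
NS = {"abcd": ABCD_NS}  # prefix is arbitrary, only the URI matters

# Illustrative fragment only: real ABCD documents carry mandatory metadata too.
FRAGMENT = f"""<abcd:DataSets xmlns:abcd="{ABCD_NS}">
  <abcd:DataSet>
    <abcd:Units>
      <abcd:Unit><abcd:UnitID>3476</abcd:UnitID></abcd:Unit>
      <abcd:Unit><abcd:UnitID>3477</abcd:UnitID></abcd:Unit>
    </abcd:Units>
  </abcd:DataSet>
</abcd:DataSets>"""


def unit_ids(xml_text):
    """Return the text of every UnitID element found under DataSet/Units/Unit."""
    root = ET.fromstring(xml_text)
    path = "abcd:DataSet/abcd:Units/abcd:Unit/abcd:UnitID"
    return [element.text for element in root.findall(path, NS)]
```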

ABCD Metadata: Technical/Content Contact

ABCD Metadata: Description

ABCD Metadata: Coverage

ABCD Metadata: Revision/Version

ABCD Metadata: Ownership

ABCD Metadata: Intellectual Property Rights

ABCD Metadata

ABCD: Triple ID, Record Basis

ABCD: Identification (multiple)

ABCD: Gathering Event

ABCD: Multimedia

OpenUp: Thumbnails will be created

Always provide link to image file!

ABCD: Unit Associations

ABCD: Specialised Portions

Specimen Unit: Acquisition, Accession, Preparation, Duplicate Distribution, Type Status

Herbarium Unit: Loan Information

Botanical Garden Unit: Location in Garden, Hardiness, Lineage, Cultivation, Planting Date

Other Specialised Subtrees for
- Observations
- Culture Collections
- Mycological Units
- Zoological Units
- Paleontological Units
- Plant Genetic Resources

ABCD: UnitExtension

Own Namespace for Extension: http://www.chah.org.au/schemas/hispid/5

Other Extensions: Extension for Geosciences (ABCD-EFG), DNA Bank Network (ABCD-DNA)

BioCASe Protocol

Biological Collection Access Service Protocol:

Manages data exchange between data providers (collections) and applications (data portals)

Vehicle for transporting requests (data portal → collection) and responses (ABCD documents: collection database → data portal)

XML based

BioCASe Protocol: Capabilities request

BioCASe Protocol: Inventory Request

BioCASe Protocol: Search Request

4. Preparing the database for BioCASe

4. Reasons for not publishing the live DB

1. Publishing the live DB is not desired → creating snapshots for publication

2. DBMS not accessible for the BPS → export into another DBMS

3. Performance considerations (too highly normalized) → partial, controlled denormalization

4. Repeatable elements kept in columns, not in separate rows → moving repeatable elements to separate records

Repeatable elements kept in columns

Before (one column per rank):

specimen_id  ...  class             order        family
3476         ...  Conjugatophyceae  Desmidiales  Desmidiaceae
3477         ...  Conjugatophyceae  Desmidiales  Desmidiaceae
3478         ...  Conjugatophyceae  Desmidiales  Closteriaceae

After (one record per higher taxon):

specimen_id  ...
3476         ...
3477         ...
3478         ...

sp_id  ht_entry  ht_rank  ht_name
3476   456765    class    Conjugatophyceae
3476   456766    order    Desmidiales
3476   456767    family   Desmidiaceae
3477   456768    class    Conjugatophyceae
3477   456769    order    Desmidiales
3477   456770    family   Desmidiaceae
3478   456771    class    Conjugatophyceae
3478   456772    order    Desmidiales
3478   456773    family   Closteriaceae

Each repeatable element needs its own primary key!

Example View

CREATE VIEW [dbo].[vwHigherTaxa]
AS
SELECT 'k_' + [EDIT_ATBI_RecordID] AS id, [EDIT_ATBI_RecordID] AS unit_id, [kingdom] AS name, 'kingdom' AS rank
FROM unit_data
WHERE [kingdom] IS NOT NULL
UNION
SELECT 'p_' + [EDIT_ATBI_RecordID], [EDIT_ATBI_RecordID], [phylum], 'phylum'
FROM unit_data
WHERE [phylum] IS NOT NULL
UNION
...
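The same idea can be tried out without a SQL Server instance. The sketch below rebuilds a cut-down version of the view in an in-memory SQLite database; it is an illustration under assumptions, not the workshop's code. SQLite concatenates strings with `||` instead of `+`, the column names `record_id` and `taxon_rank` and the two sample rows are invented, and only the kingdom and phylum branches are shown.

```python
import sqlite3


def higher_taxa_rows():
    """Unpivot kingdom/phylum columns into one row per higher taxon via a UNION view."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE unit_data (record_id TEXT PRIMARY KEY, kingdom TEXT, phylum TEXT)"
    )
    conn.executemany(
        "INSERT INTO unit_data VALUES (?, ?, ?)",
        [("A1", "Plantae", "Charophyta"), ("A2", "Plantae", None)],  # hypothetical data
    )
    # The prefix ('k_', 'p_') plus the record ID gives each row its own primary key.
    conn.execute(
        """
        CREATE VIEW vwHigherTaxa AS
        SELECT 'k_' || record_id AS id, record_id AS unit_id,
               kingdom AS name, 'kingdom' AS taxon_rank
        FROM unit_data WHERE kingdom IS NOT NULL
        UNION
        SELECT 'p_' || record_id, record_id, phylum, 'phylum'
        FROM unit_data WHERE phylum IS NOT NULL
        """
    )
    query = "SELECT id, unit_id, name, taxon_rank FROM vwHigherTaxa ORDER BY id"
    return conn.execute(query).fetchall()
```

Record A2 has no phylum, so the view yields three rows rather than four; empty columns simply produce no higher-taxon record.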

Commonly used repeatable elements

- Identification
- HigherTaxon
- GatheringSite/NamedArea
- Metadata/Scope/GeoecologicalTerms
- Metadata/Scope/TaxonomicTerms
- MultimediaObjects
- MeasurementsOrFacts
- ...

Controlled Denormalization

INSERT INTO [dbo].[abcd_Object]
SELECT
    dbo.CollectionObject.CollectionObjectID,
    ISNULL(dbo.CatalogSeries.SeriesName, '') + '-'
        + ISNULL(CAST(dbo.CollectionObjectCatalog.SubNumber AS nvarchar(20)), '') + '-'
        + ISNULL(CAST(dbo.CollectionObjectCatalog.CatalogNumber AS nvarchar(20)), ''),
    dbo.f_getParentID(dbo.CollectionObject.CollectionObjectID),
    dbo.f_getCollectingEventID(dbo.CollectionObject.CollectionObjectID),
    dbo.f_getFieldNumber(dbo.CollectionObject.CollectionObjectID),
    CAST(dbo.CollectionObjectCatalog.CatalogNumber AS int),
    dbo.CollectionObject.PreparationMethod,
    CASE WHEN Sex = '<No Data>' THEN NULL ELSE Sex END,
    CASE WHEN Stage = '<No Data>' THEN NULL ELSE Stage END,
    CASE WHEN dbo.CollectionObject.Text1 IS NULL THEN '' ELSE 'Barcode: ' + dbo.CollectionObject.Text1 + '; ' END
        + CASE WHEN dbo.Accession.Number IS NULL THEN '' ELSE 'Specimen Location: ' + dbo.Accession.Number END
        + CASE WHEN DerivedFrom.Remarks IS NULL THEN '' ELSE ' <br> ' + CAST(DerivedFrom.Remarks AS nvarchar(2000)) END
FROM dbo.BiologicalObjectAttributes
    RIGHT OUTER JOIN dbo.CollectionObject
        ON dbo.BiologicalObjectAttributes.BiologicalObjectAttributesID = dbo.f_getParentID(dbo.CollectionObject.CollectionObjectID)
    LEFT OUTER JOIN dbo.CollectionObjectCatalog
        LEFT OUTER JOIN dbo.CatalogSeries
            ON dbo.CollectionObjectCatalog.CatalogSeriesID = dbo.CatalogSeries.CatalogSeriesID
        ON dbo.CollectionObject.CollectionObjectID = dbo.CollectionObjectCatalog.CollectionObjectCatalogID
    LEFT JOIN dbo.Accession
        ON Accession.AccessionID = CollectionObjectCatalog.AccessionID
    LEFT JOIN dbo.CollectionObject AS DerivedFrom
        ON CollectionObject.DerivedFromID = DerivedFrom.collectionObjectID
WHERE (dbo.f_hasChildObjects(dbo.CollectionObject.CollectionObjectID) = 0) AND ...

How Do I See Something Is Wrong?

Errors in ABCD documents:

- Several datasets (one for each unit)
  Reason: Metadata field stored in Units table (no separate PK → several datasets need to be created)

- Several units for one specimen record
  Reason: Several records in DB for non-repeatable elements (several ABCD objects are necessary to create a valid document)

5. Setting Up a BioCASe Data Source: Database Connection, Table Setup, Schema Mapping

BPS Datasource

URL for a BioCASe protocol compliant webservice: http://ww3.bgbm.org/biocase/pywrapper.cgi?dsa=AlgenEngels

<?xml version='1.0' encoding='UTF-8'?>
<request xmlns='http://www.biocase.org/schemas/protocol/1.3'>
  <header>
    <type>search</type>
  </header>
  <search>
    <requestFormat>http://www.tdwg.org/schemas/abcd/2.06</requestFormat>
    <responseFormat start='0' limit='10'>http://www.tdwg.org/schemas/abcd/2.06</responseFormat>
    <filter>
      <like path='/DataSets/DataSet/Units/Unit/Identifications/Identification/Result/TaxonIdentified/ScientificName/FullScientificNameString'>A*</like>
    </filter>
    <count>false</count>
  </search>
</request>
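A search request of this shape can also be generated programmatically. The sketch below assembles the same document with Python's standard library; the function name `build_search_request` and its parameters are assumptions for illustration. How the finished document is submitted to the pywrapper URL is not covered here and should be taken from the BPS documentation.

```python
import xml.etree.ElementTree as ET

PROTOCOL_NS = "http://www.biocase.org/schemas/protocol/1.3"
ABCD_NS = "http://www.tdwg.org/schemas/abcd/2.06"
NAME_PATH = (
    "/DataSets/DataSet/Units/Unit/Identifications/Identification/Result/"
    "TaxonIdentified/ScientificName/FullScientificNameString"
)

# Serialise the protocol namespace as the default namespace (module-wide setting).
ET.register_namespace("", PROTOCOL_NS)


def build_search_request(pattern, start=0, limit=10):
    """Return a search request (as text) filtering on the full scientific name."""

    def q(tag):
        # Qualify a tag name with the protocol namespace.
        return f"{{{PROTOCOL_NS}}}{tag}"

    request = ET.Element(q("request"))
    header = ET.SubElement(request, q("header"))
    ET.SubElement(header, q("type")).text = "search"
    search = ET.SubElement(request, q("search"))
    ET.SubElement(search, q("requestFormat")).text = ABCD_NS
    response_format = ET.SubElement(
        search, q("responseFormat"), start=str(start), limit=str(limit)
    )
    response_format.text = ABCD_NS
    flt = ET.SubElement(search, q("filter"))
    ET.SubElement(flt, q("like"), path=NAME_PATH).text = pattern
    ET.SubElement(search, q("count")).text = "false"
    return ET.tostring(request, encoding="unicode")
```

Calling `build_search_request("A*")` reproduces the request shown on the slide, apart from whitespace and the XML declaration.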

BPS QueryForms

Tool for sending Scan, Search and Capabilities Requests to a datasource

Choose Datasource „Test and Debug“

Steps for Setting Up a Datasource

1. Create a new Datasource

2. Configure Datasource:
   1. Database Connection
   2. Table Setup
   3. Create new empty Mapping
   4. Edit Mapping:
      1. Choose root table
      2. Edit mandatory ABCD elements (red)
      3. Save Configuration, test datasource (QueryForms)
      4. Add additional ABCD elements, occasional testing

3. Test/Debug Datasource

FloraExsiccataBavarica: Additional Fields

Concept                                                 Table/Column

Metadata/…
  Description/Representation/Details                    metadata.description (text)
  IconURI                                               metadata.logo_url (text)
  Version/Major                                         metadata.source_version (text)

Metadata/IPRStatements/…
  Citations/Citation/Text                               metadata.citationsText (text)
  Copyrights/Copyright/Text                             metadata.copyright (text)
  Disclaimers/Disclaimer/Text                           metadata.disclaimer (text)
  Acknowledgements/Acknowledgement/Text                 metadata.acknowledgement (text)
  TermsOfUseStatements/TermsOfUse/Text                  metadata.terms_of_use (text)

Units/Unit/Gathering/…
  Agents/GatheringAgent/Person/FullName                 unit.sammler (text)
  Altitude/MeasurementOrFactText                        unit.hoehe (text) + "m"
  Altitude/MeasurementOrFactAtomised/LowerValue         unit.hoehe (text)
  Altitude/MeasurementOrFactAtomised/UnitOfMeasurement  "m"
  Country/ISO3166Code                                   "DE"
  Country/Name                                          "Germany"
  DateTime/DateText                                     unit.datum1 (text)
  LocalityText                                          unit.fundort (text)
  NamedAreas/NamedArea/AreaClass                        "State"
  NamedAreas/NamedArea/AreaName                         "Bavaria"

How the BPS Performs Requests

1. Get an ID list of records matching the filter

2. Load all details for the matching IDs: joining of ALL tables, beginning with the root table (table with UnitID, one record per Unit)
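The two steps can be mimicked with a small in-memory database. This is a sketch of the general pattern only, not the SQL the BPS actually generates; the tables `unit` and `higher_taxon`, their columns and the sample rows are hypothetical.

```python
import sqlite3


def search(conn, name_pattern):
    """Two-step lookup: first collect matching IDs, then load the joined details."""
    # Step 1: ID list of matching records from the root table (one record per unit).
    ids = [
        row[0]
        for row in conn.execute(
            "SELECT unit_id FROM unit WHERE scientific_name LIKE ? ORDER BY unit_id",
            (name_pattern,),
        )
    ]
    if not ids:
        return []
    # Step 2: load details for exactly those IDs, joining the other tables to the root.
    marks = ",".join("?" * len(ids))
    return conn.execute(
        "SELECT u.unit_id, u.scientific_name, t.taxon_rank, t.name "
        "FROM unit u LEFT JOIN higher_taxon t ON t.unit_id = u.unit_id "
        f"WHERE u.unit_id IN ({marks}) ORDER BY u.unit_id, t.id",
        ids,
    ).fetchall()


def demo():
    """Build hypothetical sample tables and run one search."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE unit (unit_id INTEGER PRIMARY KEY, scientific_name TEXT)")
    conn.execute(
        "CREATE TABLE higher_taxon (id INTEGER PRIMARY KEY, unit_id INTEGER, "
        "taxon_rank TEXT, name TEXT)"
    )
    conn.executemany(
        "INSERT INTO unit VALUES (?, ?)",
        [(3476, "Cosmarium botrytis"), (3478, "Closterium acerosum")],
    )
    conn.executemany(
        "INSERT INTO higher_taxon VALUES (?, ?, ?, ?)",
        [
            (456767, 3476, "family", "Desmidiaceae"),
            (456772, 3478, "order", "Desmidiales"),
            (456773, 3478, "family", "Closteriaceae"),
        ],
    )
    return search(conn, "Closterium%")
```

Because every joined table multiplies the rows returned in step 2, a root table with more than one record per unit leads to the duplicated units described in the troubleshooting slide.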

Typical Mapping Errors

- Incomplete mappings

- Missing explicit mappings for implicit knowledge (e.g. Country = "Germany" for a German collection)

- Abusing the MultimediaObject for non-multimedia documents (e.g. links to taxon pages)

- Providing "0" values for non-existent data

Datasource Loglevel

The lower the loglevel, the more information is logged: Debug < Info < Warning < Error
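The ordering matches the numeric levels of Python's standard `logging` module. Whether the BPS relies on that module internally is an assumption here; the sketch only shows why a lower threshold lets more messages through.

```python
import logging

LEVELS = ["DEBUG", "INFO", "WARNING", "ERROR"]


def messages_logged(threshold):
    """Return the message levels that pass a logger configured with `threshold`."""
    limit = getattr(logging, threshold)  # e.g. logging.INFO == 20
    return [name for name in LEVELS if getattr(logging, name) >= limit]


if __name__ == "__main__":
    for level in LEVELS:
        print(f"{level:8} logs: {', '.join(messages_logged(level))}")
```

Debug is useful while setting up a mapping; a higher level keeps log files small in production.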

Datasource Configuration Settings

Datasources folder

... /configuration/datasources/<dsname>

querytool_prefs.xml: Just what its name says.

cmf_xxx.xml: Concept mapping; one for each supported schema.

provider_setup_file.xml: Database connection, table setup, supported schemas.

Regular backup of configuration folder is highly recommended!
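A backup can be as simple as zipping the folder with a timestamp. The sketch below is one possible approach, not a tool shipped with the BPS; the function name `backup_configuration` and both path arguments are assumptions, and the backup folder should lie outside the configuration folder.

```python
import os
import shutil
import time


def backup_configuration(config_dir, backup_dir):
    """Zip the whole configuration folder into backup_dir and return the archive path."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    base_name = os.path.join(backup_dir, f"bps-configuration-{stamp}")
    # make_archive appends ".zip" and stores paths relative to config_dir.
    return shutil.make_archive(base_name, "zip", root_dir=config_dir)
```

Scheduling this call (for example with cron or the Windows Task Scheduler) gives the regular backup the slide asks for.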

Metadata tables

If metadata differ for each or some of the records: several records in metadata table, linked to unit by foreign key

If metadata are identical for all records: possible to hold data in one record → no reference key is needed → static table

Applications

Local QueryTool

Distributed Search: BioCASe Simple UI

BioCASe Distributed Search: http://search.biocase.org/simple-ui

Harvesting: GBIF Data Portal

GBIF Registration

GBIF Indexing History

EDIT Specimen Explorer: Interactive filters

Distributed Search vs. Harvesting

Distributed Search

+ No harvesting application/database required

+ No Delay with data updates (instantly visible)

- Dependent on Provider Availability

- Slow

- No data verification

- No maps, taxon lists, …

Harvesting

- Need for a harvester/cache database

- Delays when records get updated/added/removed

+ No heavy dependency on provider availability

+ Fast (as long as your portal is)

+ Data verification/improvements/transformation in harvesting process

+ Maps, suggestion lists, Interactive filters, …
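The core of a harvester is a paging loop that pulls all records from a provider into a local cache. The sketch below shows that loop in isolation. It is an assumption-based illustration: `fetch_page` is a hypothetical callable standing in for whatever sends a search request with given start and limit values and returns the records of that page.

```python
def harvest(fetch_page, page_size=10):
    """Collect all records by requesting consecutive pages until a short page arrives."""
    records = []
    start = 0
    while True:
        page = fetch_page(start, page_size)
        records.extend(page)
        if len(page) < page_size:  # last (possibly empty) page reached
            return records
        start += page_size


if __name__ == "__main__":
    # Stand-in provider with 25 records; a real harvester would query a web service.
    provider = [f"unit-{i}" for i in range(25)]
    harvested = harvest(lambda start, limit: provider[start:start + limit])
    print(len(harvested), "records harvested")
```

Once the records are cached locally, verification, maps and interactive filters no longer depend on the provider being online, at the price of delayed updates.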

OpenUp! Harvesting

[Diagram: three BioCASe providers, OpenUp! Harvester, OAI-PMH, Harvester; labels: ABCD, ESE, EDM]

Jörg Holetschek, Gabriele Dröge

Botanic Garden & Botanical Museum, Dept. of Biodiversity Informatics and Laboratories

Königin-Luise-Straße 6-8, 14195 Berlin-Dahlem

[email protected]

+49 30 838 50150

0448 831 980

www.bgbm.org/biodivinf

www.biocase.org

search.biocase.org

search.biocase.de

http://www.biocase.org/files/BioCASe_Workshop_Berlin_2011.ppt