GBIF Checklist Bank and the Backbone

Posted on 23-Jan-2018


GBIF Checklist Bank: Indexing & Backbone

Checklist Scope

• 1,846 datasets registered, with 18 million name records
• Datasets by publisher: Plazi (1,131), Pensoft (178), CoL GSDs (156)

Denormalized Checklist

Normalized Checklist

Checklist Challenges

• Highly relational taxonomic data: almost all records are linked through the classification tree and basionym relations
• Wrong or missing records destroy the integrity of the whole dataset, not just of a single record
• This is different from flat, unrelated occurrence records
• Data quality
  • broken referential integrity
  • bad names or placeholders (e.g. «Unallocated Family»)
  • missing or unused controlled vocabularies, e.g. «art» (German/Swedish for "species") used for rank species
• Name strings can be published in several ways
  • ScientificName
  • ScientificName + Authorship
  • Genus + SpecificEpithet + Rank + InfraspecificEpithet + Authorship
• Classifications can be published in several ways
  • Normalised via parentNameUsageID
  • Normalised via parentNameUsage
  • Denormalised via Kingdom, Phylum, Class, Order, Family, Genus
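To compare names published in these different ways, they first have to be reduced to one string. The sketch below is a minimal illustration, assuming each record is a plain dict keyed by the Darwin Core terms scientificName, scientificNameAuthorship, genus, specificEpithet, taxonRank and infraspecificEpithet; the rank-marker table and the function `full_name` are hypothetical simplifications, not GBIF's name parser.

```python
from typing import Optional

# Hypothetical mapping from rank values to the markers used in botanical names.
RANK_MARKERS = {"subspecies": "subsp.", "variety": "var.", "form": "f."}


def full_name(rec: dict) -> Optional[str]:
    """Build 'name + authorship' from any of the three publishing styles."""
    author = (rec.get("scientificNameAuthorship") or "").strip()
    name = (rec.get("scientificName") or "").strip()
    if name:
        # Styles 1 and 2: a complete string, possibly already including the author.
        if author and not name.endswith(author):
            name = f"{name} {author}"
        return name
    # Style 3: assemble the name from its atomized parts.
    genus = rec.get("genus")
    epithet = rec.get("specificEpithet")
    if not (genus and epithet):
        return None  # not enough parts to assemble a species name
    parts = [genus, epithet]
    infra = rec.get("infraspecificEpithet")
    if infra:
        marker = RANK_MARKERS.get((rec.get("taxonRank") or "").lower())
        if marker:
            parts.append(marker)
        parts.append(infra)
    if author:
        parts.append(author)
    return " ".join(parts)
```

The `endswith` check only avoids doubling an author that is already part of scientificName; a real implementation would parse the name and compare the parsed authorship instead.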

Checklist Indexing

• Basic archive validation
  • unique IDs
• Checklist Normalizer
  • resolve relations
  • create implicit taxa from the denormalised classification
  • interpret controlled vocabularies, e.g. rank
  • match to the backbone
  • match to the previous version to keep GBIF IDs stable
• Checklist Importer
  • inserts data into the PostgreSQL database and the Solr index used for searches
• Checklist Analyser
  • generates dataset metrics
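The normalizer step that creates implicit taxa from a denormalised classification can be sketched as follows. This assumes each flat record carries the Darwin Core columns kingdom to genus plus scientificName; implicit higher taxa are keyed by their full classification path so that identically named taxa in different branches stay apart. The `Usage` class, the integer keys and `normalize` are hypothetical, not Checklist Bank's actual code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Denormalised classification columns, from highest to lowest rank.
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus"]


@dataclass
class Usage:
    key: int
    name: str
    rank: str
    parent_key: Optional[int]


def normalize(records: List[dict]) -> List[Usage]:
    """Turn flat records into parent/child usages, creating implicit higher taxa once."""
    usages: List[Usage] = []
    implicit: Dict[Tuple[str, ...], int] = {}  # classification path -> usage key

    def add(name: str, rank: str, parent: Optional[int]) -> int:
        usages.append(Usage(len(usages) + 1, name, rank, parent))
        return usages[-1].key

    for rec in records:
        parent: Optional[int] = None
        path: Tuple[str, ...] = ()
        for rank in RANKS:
            value = rec.get(rank)
            if not value:
                continue  # a missing rank is skipped; the next one attaches higher up
            path += (value,)
            if path not in implicit:
                implicit[path] = add(value, rank, parent)
            parent = implicit[path]
        # The record itself hangs below the lowest rank given in its classification.
        add(rec["scientificName"], rec.get("taxonRank", "species"), parent)
    return usages
```

Two records sharing Plantae and Asteraceae produce those two implicit taxa only once, with each genus and species linked through `parent_key`.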

Organizing Occurrences

• GBIF needs a single, consistent taxonomy
  • for metrics, search and maps
  • considerable variation in higher taxa
  • synonymies can be very large
• Catalogue of Life is the largest single source
  • ~90% of GBIF occurrence records (thanks to birds)
  • ~50% of GBIF occurrence names (35% in 2010)
• GBIF needs to assemble a taxonomy
  • originally merged the (noisy) names found in occurrences, which resulted in lots of duplicates
  • improved by stitching together checklist datasets

Cronquist classification:
• Mimosaceae: 3,200 species
• Caesalpiniaceae: 2,000 species
• Fabaceae: 14,000 species

“Modern” classification:
• Fabaceae: 19,200 species
  • Mimosoideae: 3,200 species
  • Cæsalpinioideae: 2,000 species
  • Faboideae: 14,000 species

Current Backbone Issues

• Far too many accepted species compared with synonyms
  • Cactaceae: GBIF 12,062 accepted (342 synonyms); The Plant List (TPL) 2,233 accepted (5,422 synonyms) + 5,500 unknown
  • Genus Weingartia: GBIF 129 accepted (0 synonyms); TPL 8 accepted (26 synonyms) + 68 unknown
• Many accepted names based on the same basionym, e.g. both of these are accepted:
  • Sulcorebutia breviflora Backeb.
  • Weingartia breviflora (Backeb.) Hentzschel & K.Augustin
• Synonyms that differ only in their authors are not possible, because all names with exactly the same canonical name were merged:
  • Poa pubescens R.Br., a synonym of Eragrostis pubescens (R.Br.) Steud.
  • Poa pubescens Lej., a synonym of Poa pratensis L.
• A list of known homonym genera (IRMNG) is used to disambiguate between larger groups
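The merging problem can be reproduced with a toy index built from the two Poa pubescens names above. The tuple layout and both function names are illustrative assumptions; the point is only the choice of key.

```python
from typing import Dict, List, Tuple

# The two homonyms from the slide: same canonical name, different authors and accepted names.
names: List[Tuple[str, str, str]] = [
    ("Poa pubescens", "R.Br.", "Eragrostis pubescens (R.Br.) Steud."),
    ("Poa pubescens", "Lej.", "Poa pratensis L."),
]


def index_by_canonical(rows: List[Tuple[str, str, str]]) -> Dict[str, str]:
    """Old behaviour: one entry per canonical name, so later rows overwrite earlier ones."""
    index: Dict[str, str] = {}
    for canonical, _author, accepted in rows:
        index[canonical] = accepted
    return index


def index_by_canonical_and_author(rows: List[Tuple[str, str, str]]) -> Dict[Tuple[str, str], str]:
    """Keeping authorship in the key lets both homonyms survive as separate synonyms."""
    index: Dict[Tuple[str, str], str] = {}
    for canonical, author, accepted in rows:
        index[(canonical, author)] = accepted
    return index
```

With the canonical name alone as key, only one of the two synonym relations survives; adding the authorship keeps both.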

Backbone Building

• Overlay ordered sources
  • start with Catalogue of Life
  • the primary source defines the status
  • create a new name if kingdom, canonical name & authorship do not yet exist in the current nub
• Ignore a source name if it is
  • not of a major Linnean rank (infraspecific ranks are included)
  • of a rank above family (configurable per source)
  • in conflict with an already existing status
  • a hybrid formula, cultivar, candidatus or placeholder name
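The overlay can be sketched as a loop over sources in priority order. This assumes each source record is a dict with hypothetical keys `kingdom`, `canonical`, `authorship`, `rank`, `status` and `name_type`, and a hypothetical per-source `allow_higher` flag for ranks above family. Status conflicts are reduced to "the first source wins", and the rank sets are illustrative, so this is a sketch of the rules above rather than the GBIF nub builder.

```python
from typing import Dict, List, Tuple

# Ranks above family, only taken from sources configured to provide them.
HIGHER = {"kingdom", "phylum", "class", "order"}
# Major Linnean ranks accepted into the nub; infraspecific ranks are included.
MAJOR = HIGHER | {"family", "genus", "species", "subspecies", "variety", "form"}
# Hypothetical name-type labels for names that are never added.
SKIPPED_TYPES = {"hybrid_formula", "cultivar", "candidatus", "placeholder"}

Key = Tuple[str, str, str]  # (kingdom, canonical name, authorship)


def build_nub(sources: List[Tuple[str, bool, List[dict]]]) -> Dict[Key, dict]:
    """Overlay sources in priority order; each source is (name, allow_higher, records)."""
    nub: Dict[Key, dict] = {}
    for source_name, allow_higher, records in sources:
        for rec in records:
            if rec["rank"] not in MAJOR:
                continue  # e.g. tribe or subgenus
            if rec["rank"] in HIGHER and not allow_higher:
                continue  # higher ranks only from configured sources
            if rec.get("name_type") in SKIPPED_TYPES:
                continue
            key = (rec["kingdom"], rec["canonical"], rec.get("authorship", ""))
            if key in nub:
                continue  # already present: the earlier, primary source keeps its status
            nub[key] = {**rec, "source": source_name}
    return nub
```

Because the key includes the authorship, two names with the same canonical name but different authors end up as separate nub entries.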

[Figure: taxonomic resources (Catalogue of Life, Fauna Europaea, GRIN, Mammal Species of the World and 10s of others) are merged into the NUB; observations, specimens and 8,000 species lists (5M+ names in the Primary Data Index) are matched against it.]

Backbone Assembling

Animalia, Archaea, Bacteria, Chromista, Fungi, Plantae, Protozoa, Viruses, incertae sedis

• Nub build starts with 8 kingdoms

Backbone Assembling

Nub:
Plantae > Magnoliophyta > Magnoliopsida > Asterales > Asteraceae
  Helianthus L.
    Helianthus annuus L.

• Catalogue of Life is added
• It defines the higher classification

Source (Catalogue of Life):
Plantae > Magnoliophyta > Magnoliopsida > Asterales > Asteraceae
  Helianthus L.
    Helianthus annuus L.

Backbone Assembling

Nub:
Plantae > Magnoliophyta > Magnoliopsida > Asterales > Asteraceae
  Helianthus L.
    Helianthus annuus L.
  Cichorium
    Cichorium intybus L.

• Missing genera are created
• The tribe is ignored

Source:
Asteraceae > Cichorieae Lam. & DC. [tribe]
  Cichorium intybus L.

Backbone Assembling

Nub:
Plantae > Magnoliophyta > Magnoliopsida > Asterales > Asteraceae
  Helianthus L.
    Helianthus annuus L.
  Cichorium Linneaus
    Cichorium intybus L.
      = C. balearicum Porta
      = C. byzantinum Clementi

• Synonyms respect authors
• Author matching is very loose
• The existing genus author is updated

Source:
Plantae > Asteraceae
  Cichorium Linneaus
    Cichorium intybus Linneaus
      = Cichorium balearicum Porta
      = Cichorium byzantinum Clem.
      = Cichorium byzantinum Clementi
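The slide only says that author matching is "very loose". One possible reading, sketched here, treats an abbreviation as a prefix of the full surname, so that "Clem." matches "Clementi" and "L." matches "Linneaus". The helpers `surnames` and `authors_match` are hypothetical, ignore basionym authors in parentheses, and are not the rule GBIF actually implements.

```python
import re
from typing import List


def surnames(authorship: str) -> List[str]:
    """Lower-case surnames of the combination authors; parenthesised basionym authors are dropped."""
    outside = re.sub(r"\([^)]*\)", " ", authorship)
    result = []
    for author in re.split(r"&|,|\bet\b", outside):
        words = [w for w in re.split(r"[\s.]+", author) if w]
        if words:
            result.append(words[-1].lower())  # the last word is taken as the surname
    return result


def authors_match(a: str, b: str) -> bool:
    """Very loose comparison: surnames match if one is a prefix of the other."""
    sa, sb = surnames(a), surnames(b)
    if not sa or not sb:
        return True  # a missing authorship never blocks a match
    if len(sa) != len(sb):
        return False
    return all(x.startswith(y) or y.startswith(x) for x, y in zip(sa, sb))
```

This looseness has a cost: under this rule "L." also matches "Lej.", which is exactly the kind of collision the Poa pubescens homonyms illustrate.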

Backbone Assembling

Nub:
Plantae > Magnoliophyta > Magnoliopsida > Asterales > Asteraceae
  Helianthus L.
    Helianthus annuus L.
  Cichorium L.
    Cichorium intybus L.
      = C. balearicum Porta
      = C. byzantinum Clem.

• Prefer authors from nomenclators

Source:
Asteraceae
  Cichorium L.
    Cichorium byzantinum Clem.
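The author preference shown on this slide can be sketched as a small update rule, assuming the incoming name has already been matched to an existing nub name. The `NubName` class, its `author_from_nomenclator` flag and `merge_authorship` are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class NubName:
    canonical: str
    authorship: str
    author_from_nomenclator: bool = False


def merge_authorship(existing: NubName, incoming_author: str, from_nomenclator: bool) -> None:
    """Update the nub authorship in place, letting nomenclator spellings win."""
    if not incoming_author:
        return
    if not existing.authorship:
        # Fill in a missing author, e.g. the genus Cichorium gaining "Linneaus".
        existing.authorship = incoming_author
        existing.author_from_nomenclator = from_nomenclator
    elif from_nomenclator and not existing.author_from_nomenclator:
        # Replace a non-nomenclator spelling, e.g. "Clementi" becomes "Clem.".
        existing.authorship = incoming_author
        existing.author_from_nomenclator = True
```

Once a nomenclator has supplied the author, later non-nomenclator spellings no longer overwrite it.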

Backbone Assembling

Nub:
Asteraceae
  Helianthus L.
    Helianthus annuus L.
  Agoseris
    Agoseris apargioides (Less.) Greene
      = A. maritima Eastw.
      A. a. var. eastwoodiae (Fedde) Munz
      A. a. var. maritima (E. Sheld.) Baird
  Cichorium L.
    Cichorium intybus L.
      = C. balearicum Porta
      = C. byzantinum Clem.

• Infraspecific taxa are included

Source:
Asteraceae
  Agoseris apargioides (Less.) Greene
    = A. maritima Eastw.
    A. a. var. eastwoodiae (Fedde) Munz
    A. a. var. maritima (E. Sheld.) Baird

Backbone Assembling

Nub:
Asteraceae
  Helianthus L.
    Helianthus annuus L.
  Agoseris
    Agoseris apargioides (Less.) Greene
      = A. maritima Eastw.
      A. a. var. eastwoodiae (Fedde) Munz
      A. a. var. maritima (E. Sheld.) Baird
    Agoseris eastwoodiae Fedde
    Agoseris maritima E. Sheld.
  Cichorium L.
    Cichorium intybus L.
      = C. balearicum Porta
      = C. byzantinum Clem.

• Another source treats them as species
• The same canonical name Agoseris maritima is allowed twice because the authors differ

Source:
Asteraceae
  Agoseris eastwoodiae Fedde
  Agoseris maritima E. Sheld.

Final Cleanup - Basionyms

Asteraceae
  Helianthus L.
    Helianthus annuus L.
  Agoseris
    Agoseris apargioides (Less.) Greene
      = A. maritima Eastw.
      A. a. var. eastwoodiae (Fedde) Munz
        = Agoseris eastwoodiae Fedde
      A. a. var. maritima (E. Sheld.) Baird
        = Agoseris maritima E. Sheld.
  Cichorium L.
    Cichorium intybus L.
      = C. balearicum Porta
      = C. byzantinum Clem.

• Finally, basionyms are detected
  • by terminal epithet & author within a family
  • only one accepted name per group
  • the first, most trusted name stays accepted
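The basionym cleanup can be sketched as grouping names by family, terminal epithet and original (basionym) author, then keeping only the first accepted name in each group. This assumes the usages are already sorted with the most trusted source first and compares authors as exact strings, whereas the real build uses looser author matching; `Usage`, `original_author` and `merge_basionym_groups` are hypothetical names.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Usage:
    canonical: str
    authorship: str
    family: str
    accepted: bool = True
    accepted_name: Optional[str] = None  # filled in when the usage becomes a synonym


def original_author(authorship: str) -> str:
    """Basionym author: the part in parentheses, or the whole authorship if there is none."""
    m = re.match(r"\s*\(([^)]*)\)", authorship)
    return (m.group(1) if m else authorship).strip()


def merge_basionym_groups(usages: List[Usage]) -> None:
    """Keep one accepted name per (family, terminal epithet, original author) group.

    `usages` must be ordered with the most trusted source first.
    """
    groups: Dict[Tuple[str, str, str], List[Usage]] = defaultdict(list)
    for u in usages:
        words = u.canonical.split()
        if len(words) < 2:
            continue  # genera and higher taxa have no terminal epithet to compare
        groups[(u.family, words[-1], original_author(u.authorship))].append(u)
    for members in groups.values():
        accepted = [u for u in members if u.accepted]
        if len(accepted) < 2:
            continue
        keeper = accepted[0]
        for other in accepted[1:]:
            other.accepted = False
            other.accepted_name = f"{keeper.canonical} {keeper.authorship}"
```

Run on the Agoseris names above, Agoseris eastwoodiae Fedde becomes a synonym of A. apargioides var. eastwoodiae (Fedde) Munz, while A. maritima Eastw. forms its own group because its author differs.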

Final Cleanup - Autonyms

Asteraceae
  Helianthus L.
    Helianthus annuus L.
  Agoseris
    Agoseris apargioides (Less.) Greene
      = A. maritima Eastw.
      A. a. var. apargioides
      A. a. var. eastwoodiae (Fedde) Munz
        = Agoseris eastwoodiae Fedde
      A. a. var. maritima (E. Sheld.) Baird
        = Agoseris maritima E. Sheld.
  Cichorium L.
    Cichorium intybus L.
      = C. balearicum Porta
      = C. byzantinum Clem.

• Create missing autonyms
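Creating missing autonyms can be sketched as follows, assuming canonical names without authors and botanical rank markers such as "var."; the function `missing_autonyms` is a hypothetical helper. For every rank marker used below a species, the autonym repeats the species epithet and carries no author.

```python
from typing import List, Set


def missing_autonyms(species: str, infraspecifics: List[str]) -> List[str]:
    """Return the autonyms that are absent for each rank marker used below a species.

    `species` is a canonical binomial; `infraspecifics` are canonical trinomials with a
    rank marker, e.g. "Agoseris apargioides var. maritima".
    """
    epithet = species.split()[1]
    existing: Set[str] = set(infraspecifics)
    markers: List[str] = []
    for name in infraspecifics:
        marker = name.split()[2]
        if marker not in markers:
            markers.append(marker)  # keep the first-seen order of rank markers
    autonyms = [f"{species} {m} {epithet}" for m in markers]
    return [a for a in autonyms if a not in existing]
```

For Agoseris apargioides with two varieties this yields the single autonym Agoseris apargioides var. apargioides, as on the slide.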

Backbone Building Rules

• Create a missing genus or species in the classification
  • only for accepted taxa
• Create missing autonyms for infraspecific taxa
• Detect basionyms based on terminal epithet & authorship
  • assumes that epithet & authorship are unique within a family
  • converts all but one accepted name to synonyms
• Flag taxa as doubtful
  • genus or higher taxon without any species (IRMNG)
  • species (or infraspecific taxon) whose parent genus (or species) is considered a synonym
    • it is moved to the newly accepted genus (or species)
    • this is the case for potential children of a synonymised basionym combination
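The first doubtful-flagging rule, a genus or higher taxon without any species, can be sketched as a post-order walk over the nub tree. The `Taxon` class and `flag_empty_higher_taxa` are hypothetical, and the set of species-level ranks is an assumption.

```python
from dataclasses import dataclass, field
from typing import List

# Ranks counted as "species" for this check (infraspecific ranks included).
SPECIES_RANKS = {"species", "subspecies", "variety", "form"}


@dataclass
class Taxon:
    name: str
    rank: str
    children: List["Taxon"] = field(default_factory=list)
    doubtful: bool = False


def flag_empty_higher_taxa(taxon: Taxon) -> bool:
    """Flag genera and higher taxa with no species below them; return True if any exist."""
    if taxon.rank in SPECIES_RANKS:
        return True
    # Evaluate every child (no short-circuit) so that all empty branches get flagged.
    results = [flag_empty_higher_taxa(child) for child in taxon.children]
    has_species = any(results)
    taxon.doubtful = not has_species
    return has_species
```

A family keeps its normal status as long as one of its genera holds a species, while an empty genus next to it is flagged.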

Backbone Sources

• GBIF Backbone Patch
• Catalogue of Life
• World Register of Marine Species
• Dyntaxa - Svensk taxonomisk databas
• GRIN Taxonomy
• Fauna Europaea
• Integrated Taxonomic Information System
• Euro+Med Plantbase
• Interim Register of Marine and Nonmarine Genera
• The Clements Checklist
• IOC World Bird Names
• Mammal Species of the World
• Paleobiology Database
• Nomenclators
  • International Plant Names Index
  • Index Fungorum
  • ZooBank
  • Prokaryotic Nomenclature Up-to-date
  • ICTV Master Species List
• Organisations
  • Species Files
  • Biodiversity Data Journal (Pensoft)
  • ZooKeys (Pensoft)
  • PhytoKeys (Pensoft)
  • Plazi ???

Backbone Matching

• Occurrence
  • fuzzy name match
  • classification match
  • allow matches to a higher rank
• Checklist
  • match the kingdom
  • require an exact canonical name match
  • including authorship comparison
  • no web service yet, only embedded
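The contrast between the two matching modes can be sketched with Python's standard `difflib` standing in for the fuzzy matcher. The toy backbone tuples, the 0.9 cutoff and both function names are assumptions; classification matching is left out, and GBIF's actual name matching is considerably more elaborate.

```python
import difflib
from typing import List, Optional, Tuple

# Toy backbone rows: (kingdom, canonical name, authorship, rank); a stand-in for the nub.
Backbone = List[Tuple[str, str, str, str]]


def match_occurrence(name: str, backbone: Backbone, cutoff: float = 0.9) -> Optional[str]:
    """Fuzzy match a verbatim name; fall back to the genus (a higher-rank match)."""
    canonicals = [row[1] for row in backbone]
    hits = difflib.get_close_matches(name, canonicals, n=1, cutoff=cutoff)
    if hits:
        return hits[0]
    genus = name.split()[0]
    return genus if genus in canonicals else None


def match_checklist(kingdom: str, canonical: str, authorship: str,
                    backbone: Backbone) -> Optional[Tuple[str, str]]:
    """Strict match: same kingdom, identical canonical name, compatible authorship."""
    for b_kingdom, b_canonical, b_author, _rank in backbone:
        if b_kingdom == kingdom and b_canonical == canonical:
            # A missing author on either side does not block the match.
            if not authorship or not b_author or authorship == b_author:
                return (b_canonical, b_author)
    return None
```

A misspelt occurrence name such as "Helianthus anuus" still finds Helianthus annuus, while the checklist matcher rejects it and uses the authorship to tell the two Poa pubescens homonyms apart.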

[Figure: schema diagram with the entities NameUsage, Parsed Name, Backbone Match, Citation, Dataset Metrics, Verbatim Record, Metrics and Extensions]

• Checklists & Nub share the same structure
• Parent-child hierarchy: normalized classification
• Flexible ranks
• Synonyms linked to their accepted name
• Dataset metrics as time series
• Basionym relation
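A hypothetical in-memory model loosely following these bullets shows how a normalized parent-child hierarchy still yields a flat classification on demand. The field names are assumptions, not Checklist Bank's actual table columns: synonyms point to their accepted usage and inherit its classification.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class NameUsage:
    key: int
    name: str
    rank: str                           # free-form string, so ranks stay flexible
    parent_key: Optional[int] = None    # parent-child hierarchy of accepted usages
    accepted_key: Optional[int] = None  # set for synonyms
    basionym_key: Optional[int] = None  # basionym relation


def classification(key: int, usages: Dict[int, NameUsage]) -> List[str]:
    """Walk parent links from a usage (or a synonym's accepted usage) up to the root."""
    usage = usages[key]
    if usage.accepted_key is not None:
        usage = usages[usage.accepted_key]  # synonyms use the accepted name's classification
    path: List[str] = []
    current: Optional[NameUsage] = usage
    while current is not None:
        path.append(current.name)
        current = usages[current.parent_key] if current.parent_key is not None else None
    return list(reversed(path))
```

Storing only the parent link keeps each classification change local to one record, and the denormalized view is computed when needed.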

Schema

CLB Supported Extensions

• Description: human-readable paragraphs about a topic
• Distribution: area ranges with statuses
• Identifier: additional identifiers for the record
• Multimedia: images, video, sound
• Literature references: bibliography
• Occurrence (indexed via the occurrence workflows)
• Species Profile: extinct, marine, freshwater and terrestrial flags
• Types and specimens (overlaps with Occurrence)
• Vernacular names: names with language & region

http://rs.gbif.org/extension/gbif/1.0/

Normalizing Classifications
