Tony Rees: Towards a Hierarchical Classification of All Life

Towards a Hierarchical Classification of All Life – the IRMNG data assembly project

Tony Rees – CSIRO Marine and Atmospheric Research, Australia

October 2011

DESCRIPTION

Presentation at Marine Biological Laboratory (MBL), Woods Hole, October 2011

TRANSCRIPT

Page 1: Tony Rees: Towards a Hierarchical Classification of All Life

Towards a Hierarchical Classification of All Life – the IRMNG data assembly project

Tony Rees – CSIRO Marine and Atmospheric Research, Australia

October 2011

Page 2: Tony Rees: Towards a Hierarchical Classification of All Life

Tony Rees: Hierarchical Classification of All Life

Why a hierarchical classification?

Page 3: Tony Rees: Towards a Hierarchical Classification of All Life

Why a hierarchical classification?

• Hierarchical classifications assist us to organize knowledge

• Hierarchical classifications allow us to infer information about lower levels from higher ones (don’t have to explicitly re-specify / verify / know everything)

• Hierarchical classifications allow us to make / test predictions based on degree of “relatedness”

(“borrowed” from R. Page presentation, 2011)

Page 4: Tony Rees: Towards a Hierarchical Classification of All Life

Why a hierarchical classification?

• Hierarchical classifications assist us to construct +/- automated “expert systems”

[Diagram labels: Functional view; Structural view; The system; genus / species name “X”; useful information on taxon “X”]

Page 5: Tony Rees: Towards a Hierarchical Classification of All Life

What should “the system” ideally hold? – something like…

(etc.)

Page 6: Tony Rees: Towards a Hierarchical Classification of All Life

What should “the system” ideally hold?

• Expanded to information on “all life”:

• Animals, plants, fungi, protists, bacteria + archaea (prokaryotes), viruses

• Both extant and fossil organisms

• Aim for comprehensive coverage – no gaps – to desired level of the hierarchy

• Information held in consistent terminology, machine-readable content

• Either human user, or machine user access point (or both)

• Hyperlinked cross-refs for web users

• Continuously updated & upgraded

• Provenance for all content

(probably plus more…)

x 50+…

Page 7: Tony Rees: Towards a Hierarchical Classification of All Life

System is based on scientific names of taxa

• Taxon* scientific names are the preferred units of currency & identity in the world of biology:

• More stable / authoritative than common (vernacular) names

• Indicate the genus to which a species belongs

• Higher classification allows nesting into progressively larger taxa, each with definable characteristics

• “Linnaean” ranks: kingdom through species (NB some intermediate ranks are also important and should be handled in due course)

(*taxon = named “taxonomic unit”, a defined unit at any rank, i.e. species, genus, family, etc.)

[Pyramid figure, approximate number of taxa per rank: Kingdoms (5/6/7/8); Phyla ~140; Classes ~400; Orders ~2k; Families ~10k; Genera ~250k; Species 2+ million]

Page 8: Tony Rees: Towards a Hierarchical Classification of All Life

Availability of comprehensive treatments

• All life to family level:

• Parker (ed.), 1982, Synopsis and Classification of Living Organisms, print, 2 vols, ~2,300 pp.: ~7k family descriptions in a common hierarchy (extant taxa only)

• Benton (ed.), 1993, The Fossil Record 2, print, ~850 pp.: ~5k family brief treatments, mainly fossil

• Code-specific to genus level:

• Zoology: Nomenclator Zoologicus (to 2004), (print + online) then Zoo. Record / ION, online (NB, Nomen. Zool. has no detailed higher taxonomy)

• Botany: Index Nominum Genericorum (ongoing), online, also IPNI, TROPICOS, etc.

• Bacteriology: List of Prokaryotic names with Standing in Nomenclature (LPSN) (online)

• Viruses: International Committee on Taxonomy of Viruses (ICTV) database, online

Page 9: Tony Rees: Towards a Hierarchical Classification of All Life

Availability of comprehensive treatments – cont’d

• Taxon specific to species level: Global Species Databases (“GSDs”) exist for specific groups e.g.

• Mammals: Mammal Species of the World (2005, print + online)

• Fishes: Eschmeyer’s Catalog of Fishes (ongoing, online)

• Higher Plants: The Plant List (2010, online) + contributing DB’s

• Fungi: Index Fungorum and Species Fungorum (ongoing, online)

• Algae: AlgaeBase (ongoing, online)

• Others: AntBase, Systema Dipterorum, LepIndex + many more

• (also viruses and prokaryote lists as per previous slide)

…>100 GSDs aggregated into a single Catalogue of Life compilation (annual editions 2000-current) produced by Sp2000 + ITIS (USA)

Page 10: Tony Rees: Towards a Hierarchical Classification of All Life

Can we use Catalogue of Life as a comprehensive resource?

• A great project BUT…

• ~30% of extant species (plus relevant higher taxa) still missing

• only a subset of species synonyms included, and no genus synonyms stated

• no fossil taxa (although Paleobiology Database has some / many of these)

• a few higher tax. conflicts

• no intermediate ranks (e.g. subphylum, infraorder)

• no genus authors or publication info

• latency for new names (esp. in some groups)

• no target completion date

…GBIF experience: only ~30% of incoming species names are in the Catalogue of Life (not much good for data aggregators).

Page 11: Tony Rees: Towards a Hierarchical Classification of All Life

What about “names aggregator” activities?

• Collect names as used in primary + secondary sources, mix of “clean” (verified) and “dirty” (unverified) names

• Authority portion of the names not standardized (same name may appear on the list multiple times)

• Frequently lacking coherent / any higher taxonomy

… Potentially a useful “superset” of (most) “good” names, but requires work to filter these out.

(etc.)

Page 12: Tony Rees: Towards a Hierarchical Classification of All Life

Genus level compilations are much more complete, can we use those?

• Answer “yes” BUT…

• Need to knit them all together across Codes, also no single source is complete, even within a Code

• Need to add family allocations where missing, e.g. from Nomenclator Zoologicus, also taxonomic synonyms, consistent hierarchy information, etc. etc.

• Need to deal with inconsistencies / overlaps between data sources (editorial decisions), also “house style” issues

• Need to back-fill residual data gaps

• As desired, also would like to add non-taxonomic “attributes” e.g. extant / fossil status / geologic range, habitat information, geographic distribution, more ???

• Bonus short cut

• Leverage the hierarchy to avoid having to add attributes at every lower level – e.g. inherit genus / species attributes from higher up where these are unambiguous

• Examples: all dinosaurs are extinct, all cephalopods are marine, etc. etc.

• Similarly, all species of a marine-only genus will also be marine, etc.
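The “bonus short cut” of inheriting attributes down the hierarchy can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not IRMNG's actual implementation: the `Taxon` class, its field names and the `resolve` method are hypothetical, and the three-node hierarchy is sample data chosen to mirror the cephalopod example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Taxon:
    """Hypothetical node in a taxonomic hierarchy (not IRMNG's real schema)."""
    name: str
    rank: str
    parent: Optional["Taxon"] = None
    # Attributes set explicitly at this level, e.g. {"habitat": "marine"}
    attributes: dict = field(default_factory=dict)

    def resolve(self, key: str) -> Optional[str]:
        """Return the value from this taxon or from its nearest ancestor that sets it."""
        node = self
        while node is not None:
            if key in node.attributes:
                return node.attributes[key]
            node = node.parent
        return None  # unknown: not set at any level


# Sample data: the habitat flag is stored once, at class level
cephalopoda = Taxon("Cephalopoda", "class", attributes={"habitat": "marine"})
octopodidae = Taxon("Octopodidae", "family", parent=cephalopoda)
octopus = Taxon("Octopus", "genus", parent=octopodidae)

print(octopus.resolve("habitat"))        # marine (inherited from Cephalopoda)
print(octopus.resolve("extant_status"))  # None (not set anywhere in this sample)
```

An explicit value at a lower level overrides the inherited one, which is one way to handle the cases where the higher-level attribute is not unambiguous.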

Page 13: Tony Rees: Towards a Hierarchical Classification of All Life

The IRMNG concept

• IRMNG – the Interim Register of Marine and Nonmarine Genera

• Aims to fill the gaps and produce an “interim” hierarchical classification of all life (HCAL), extant + fossil, to at least genus level (species lists to be added as readily accessible)

• Utilizes Parker, 1982 and Benton, 1993 family compilations as starting point for higher classification

• Specific sectors then upgraded through time, also incorporating relevant marine/nonmarine and extant/fossil flags

• Genera added from the most comprehensive available sources (over time)

• “Interim” status used to indicate lesser degree of scrutiny / authoritativeness than e.g. Cat. of Life, however hopefully still useable

• Home page: www.obis.org.au/irmng; data access page: www.cmar.csiro.au/datacentre/irmng/

• Will hold more names than valid taxa, due to synonymy:

• Nomenclatural synonyms – add maybe 5% to genera, 300% to species

• Taxonomic synonyms – add maybe 100%-200% to genera and species

Page 14: Tony Rees: Towards a Hierarchical Classification of All Life

IRMNG desired content

• 1 record per name / publication instance (valid or invalid) including:

• the name itself

• the author and year for the name (1 version only)

• publication details as available

• source/s used, with or without editorial adjustment

• for botanical names, include full (not abbreviated) author name, also year of publication (normally omitted)

• nomenclatural and taxonomic status, as known (plus any relevant comments)

• placement in the tax. hierarchy (every record knows its parent, child records reference this one), plus cross-links as required

• selected attributes, initially:

• Extant/fossil status: Extant / Fossil / both / unknown

• Habitat flag: Marine / Nonmarine / both / unknown

• provenance, degree of verification for all content
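As a rough sketch, the desired record content could map onto a data structure like the one below. All class and field names here are hypothetical and chosen for illustration; the slides do not show the actual IRMNG table definitions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class ExtantStatus(Enum):
    EXTANT = "extant"
    FOSSIL = "fossil"
    BOTH = "both"
    UNKNOWN = "unknown"


class Habitat(Enum):
    MARINE = "marine"
    NONMARINE = "nonmarine"
    BOTH = "both"
    UNKNOWN = "unknown"


@dataclass
class NameRecord:
    """One record per name / publication instance (hypothetical field names)."""
    record_id: int
    name: str
    rank: str
    authority: Optional[str] = None          # author and year, one version only
    publication: Optional[str] = None        # publication details as available
    sources: List[str] = field(default_factory=list)
    nomenclatural_status: Optional[str] = None
    taxonomic_status: Optional[str] = None
    parent_id: Optional[int] = None          # every record knows its parent
    extant_status: ExtantStatus = ExtantStatus.UNKNOWN
    habitat: Habitat = Habitat.UNKNOWN
    verified: bool = False                   # degree of verification, simplified


def children_of(records: List[NameRecord], parent_id: int) -> List[NameRecord]:
    """Child records are found by following their reference to the parent."""
    return [r for r in records if r.parent_id == parent_id]


# Sample data for illustration only
records = [
    NameRecord(1, "Hominidae", "family"),
    NameRecord(2, "Homo", "genus", authority="Linnaeus, 1758", parent_id=1,
               extant_status=ExtantStatus.BOTH, habitat=Habitat.NONMARINE),
]
print([r.name for r in children_of(records, 1)])
```

Defaulting both flags to "unknown" keeps the four-way status values from the slide explicit, so a missing flag is never mistaken for a negative one.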

Page 15: Tony Rees: Towards a Hierarchical Classification of All Life

Family placement – editorial decisions may be needed

• e.g. for (botanical) genus “Pachydiscus”:

Page 16: Tony Rees: Towards a Hierarchical Classification of All Life

Data aggregation complicated by genus level homonyms e.g.:

• also by variant authority citations e.g.:

• (etc.)
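Both complications can be illustrated with a short Python sketch. The slide's own examples were images and are not recoverable, so the rows below are substitute sample data (Ficus and Morus are genus names used under both the botanical and zoological Codes). The function names and the one-entry abbreviation table are assumptions for illustration, not IRMNG code.

```python
import re
from collections import defaultdict

# Illustrative rows: (genus name, authority as cited, nomenclatural Code)
rows = [
    ("Ficus", "L.", "botanical"),
    ("Ficus", "Linnaeus", "botanical"),
    ("Ficus", "Röding, 1798", "zoological"),
    ("Morus", "L.", "botanical"),
    ("Morus", "Vieillot, 1816", "zoological"),
    ("Quercus", "L.", "botanical"),
]

# Hypothetical abbreviation table; a real one would be far larger
AUTHOR_ABBREVIATIONS = {"l": "linnaeus"}


def normalize_authority(authority: str) -> str:
    """Reduce an authority string to a comparable key (lowercase, no year, no punctuation)."""
    text = re.sub(r"\d{4}", "", authority.lower())   # drop the year
    text = re.sub(r"[^\w\s]", "", text).strip()      # drop punctuation
    return AUTHOR_ABBREVIATIONS.get(text, text)


def find_homonyms(rows):
    """Return names that occur with more than one distinct (authority, Code) pair."""
    by_name = defaultdict(set)
    for name, authority, code in rows:
        by_name[name].add((normalize_authority(authority), code))
    return {name: sorted(pairs) for name, pairs in by_name.items() if len(pairs) > 1}


for name, pairs in find_homonyms(rows).items():
    print(name, pairs)
```

Normalizing first means that "L." and "Linnaeus" collapse to one entry, so only genuinely different authorities are reported as homonyms.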

Page 17: Tony Rees: Towards a Hierarchical Classification of All Life

Perseverance produces the following (subset of genus table, 453k names as at Oct 2011):

Page 18: Tony Rees: Towards a Hierarchical Classification of All Life

A glimpse of the IRMNG “master genus” table (currently 452,827 records)

Page 19: Tony Rees: Towards a Hierarchical Classification of All Life

A glimpse of the IRMNG “master genus” table (currently 452,827 records)

(Mabberley plant names list)

Page 20: Tony Rees: Towards a Hierarchical Classification of All Life

Detail showing example source/s used

Page 21: Tony Rees: Towards a Hierarchical Classification of All Life

Services / views this currently supports

• High-level overview + relevant statistics for “all life” (currently possible for names, in future for valid taxa)

• Navigate the hierarchy in any direction

• Generate hierarchical lists

• Generate alphabetic lists

• Sort / filter by any desired criteria

• Generate lists of homonyms, within or across Codes

• Indicate current tax. hierarchy, nomenclatural / taxonomic status, and attributes (to varying degrees) for any input name

• Indicate near match targets to any input name (“did you mean…”) – using TAXAMATCH fuzzy matching (custom solution for tax. databases)
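The “did you mean…” behaviour can be approximated with Python's standard library. Note that this sketch does not implement TAXAMATCH, whose algorithm is not described in these slides. It substitutes generic string similarity from `difflib`, and the name list, function name and cutoff are assumptions for illustration.

```python
import difflib

# Tiny stand-in for a genus name table
KNOWN_GENERA = ["Homo", "Octopus", "Quercus", "Ficus", "Morus", "Tyrannosaurus"]


def did_you_mean(query, names=KNOWN_GENERA, max_hits=3, cutoff=0.8):
    """Return the exact match if there is one, else near matches ranked by similarity."""
    lookup = {n.lower(): n for n in names}
    key = query.strip().lower()
    if key in lookup:
        return [lookup[key]]
    hits = difflib.get_close_matches(key, list(lookup), n=max_hits, cutoff=cutoff)
    return [lookup[h] for h in hits]


print(did_you_mean("quercus"))       # exact match, case-insensitive
print(did_you_mean("Tyranosaurus"))  # misspelled input, near match offered
print(did_you_mean("Xyz"))           # nothing close enough
```

A purpose-built matcher for taxonomic names would typically also account for phonetic similarity and for authority strings, which plain edit-based similarity ignores.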

Page 22: Tony Rees: Towards a Hierarchical Classification of All Life

IRMNG-generated statistics for “all life” (web query 6 Oct 2011)

• (Important note – can actually generate these lists as required, by navigating the hierarchy)

Page 23: Tony Rees: Towards a Hierarchical Classification of All Life

Other services / products e.g. full hierarchical lists

however with caveat: some / many genera may still be classified only at higher level (e.g. “Mammalia – unallocated”) at this time (more work to do).
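Generating a full hierarchical list from parent-linked records amounts to a depth-first walk. The sketch below is illustrative only: the rows are sample data, “Examplegenus” is an invented placeholder name, and the function is hypothetical. A real system would key on record IDs rather than names, since names can be homonyms.

```python
from collections import defaultdict

# (name, rank, parent name) rows; sample data for illustration
rows = [
    ("Animalia", "kingdom", None),
    ("Chordata", "phylum", "Animalia"),
    ("Mammalia", "class", "Chordata"),
    ("Primates", "order", "Mammalia"),
    ("Hominidae", "family", "Primates"),
    ("Homo", "genus", "Hominidae"),
    # Genera not yet placed in a family hang off an "unallocated" holder
    ("Mammalia – unallocated", "placeholder", "Mammalia"),
    ("Examplegenus", "genus", "Mammalia – unallocated"),
]


def hierarchical_list(rows, indent="  "):
    """Return indented lines, one per taxon, children sorted alphabetically."""
    children = defaultdict(list)
    for name, rank, parent in rows:
        children[parent].append((name, rank))
    lines = []

    def walk(parent, depth):
        for name, rank in sorted(children[parent]):
            lines.append(f"{indent * depth}{name} ({rank})")
            walk(name, depth + 1)

    walk(None, 0)
    return lines


print("\n".join(hierarchical_list(rows)))
```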

Page 24: Tony Rees: Towards a Hierarchical Classification of All Life

Check batches of entered names

(1,406 genus names…)

Page 25: Tony Rees: Towards a Hierarchical Classification of All Life

Check batches of entered names

(start of IRMNG search result)

Page 26: Tony Rees: Towards a Hierarchical Classification of All Life

Check batches of entered names

Page 27: Tony Rees: Towards a Hierarchical Classification of All Life

Check batches of entered names

?

Page 28: Tony Rees: Towards a Hierarchical Classification of All Life

Query by taxon name (correctly spelled or misspelled)

Page 29: Tony Rees: Towards a Hierarchical Classification of All Life

Check batches of entered names

• Basically this is then a Taxonomic Name Resolution Service (TNRS), similar to the one developed in 2011 by the (U.S.) iPlant team over TROPICOS, but across all groups:

Page 30: Tony Rees: Towards a Hierarchical Classification of All Life

Linking names with literature

Page 31: Tony Rees: Towards a Hierarchical Classification of All Life

The “microcitation” (Nomenclator’s favourite…)

• Typically just author name, year, page no. in work, e.g.:

• Would prefer full article-level titles / authors / pagination if possible – i.e. a bibliographic module

• Could optionally offer onward links to page views in BHL, abstracts, full text as pdfs, etc. as available (small sample populated in IRMNG at this time)

Name plus page in work

List of all works as data objects
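To make the microcitation idea concrete, here is a small parsing sketch. The slide's own example was an image and is not reproduced, so the string layout `<author>, <year>: <page>` is an assumption, the author names in the usage lines are invented, and the class and function names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Microcitation:
    author: str
    year: int
    page: Optional[str]


# Assumed layout: "<author>, <year>: <page>"; the page part is optional
PATTERN = re.compile(r"^(?P<author>.+?),\s*(?P<year>\d{4})(?::\s*(?P<page>\S+))?$")


def parse_microcitation(text: str) -> Optional[Microcitation]:
    """Split a microcitation into author, year and page; None if it does not fit."""
    m = PATTERN.match(text.strip())
    if m is None:
        return None
    return Microcitation(m["author"], int(m["year"]), m["page"])


print(parse_microcitation("Smith, 1900: 123"))
print(parse_microcitation("Smith & Jones, 1900"))
print(parse_microcitation("not a citation"))
```

A full bibliographic module would instead store each work once as its own data object and link names to it, with the page number held on the link.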

Page 32: Tony Rees: Towards a Hierarchical Classification of All Life

Expanded citation info in IRMNG - example

Page 33: Tony Rees: Towards a Hierarchical Classification of All Life

Expanded citation info in IRMNG - example

Page 34: Tony Rees: Towards a Hierarchical Classification of All Life

Expanded citation info in IRMNG - example

Page 35: Tony Rees: Towards a Hierarchical Classification of All Life

IP issues regarding bibliographies, etc.

• Many sources assert copyright over bibliographies, potentially an issue

• Does copyright exist in individual references extracted from a third party collection?

• What about subsets of the collection?

• What about new composite supersets?

• Law may be different in different countries

• Licensing / terms of use may be different from law

… still very unclear (to this author) what is / is not permissible with respect to assembling new bibliographies which include content from elsewhere – including copy/paste vs. re-keying…

• Will be a recurring issue for other bibliography-assembly projects e.g. CiteBank, Mendeley… but think of the value (a “bibliography of life”)

Page 36: Tony Rees: Towards a Hierarchical Classification of All Life

IRMNG content – recent missing genera…

Page 37: Tony Rees: Towards a Hierarchical Classification of All Life

IRMNG content – genus names published by year, 1995-current (as at Oct 2011), excluding virus names (which are undated)

(NB could disaggregate further as desired, e.g. by detailed tax. group, or extant vs. fossil…) … also would expect a small number of residual names missed for ostensibly “complete” years

[Chart annotation: presumed missing names]

Page 38: Tony Rees: Towards a Hierarchical Classification of All Life

IRMNG 2011 content cf. Cat. of Life 2011

Rank | Cat. of Life, 2011 edition | % with auth's | IRMNG Oct 2011, extant + fossil | % with auth's | IRMNG Oct 2011, fossil only
Kingdoms | 8 |  | 7 |  | 0
Phyla | 111 |  | 153 |  | 12
Classes | 288 |  | 509 |  | 64
Orders | 1,233 |  | 2,645 |  | 715
Families | 8,071 | 0% | 19,639 | 22.1% | 6,542
Subfamilies |  |  |  |  | 
Genera | 178,515 | 0% | 452,848 | 97.1% | 90,278
Subgenera |  |  |  |  | 
Species (valid) | 1,347,224 | ~100% | 1,020,519 | ~100% | 16,792
Species (synonyms) | 895,441 | ~100% | 440,738 | ~100% | 100

Note, Chapman, 2009 estimates c. 1.9m described extant species (see earlier slide)

On that basis, CoL has 70% of valid extant species names, maybe 70% of valid extant genera (with subset of genus-level synonyms)

IRMNG is missing est. 10k genera from 2004-2011 (from last slide), maybe further 2-3% overall (say 10k-15k), “complete” list would thus be ~475k at this time (increasing at ~2k/year).
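The two estimates on this slide can be re-derived from the quoted figures. The short Python check below uses only numbers stated on the slide; the variable names are illustrative.

```python
# Figures quoted on the slide
col_valid_species = 1_347_224         # Cat. of Life 2011, valid species
described_extant_species = 1_900_000  # Chapman, 2009 estimate (c. 1.9m)
irmng_genera = 452_848                # IRMNG genus names, Oct 2011
missing_recent = 10_000               # est. genera missing from 2004-2011
missing_other = (10_000, 15_000)      # "say 10k-15k" further missing names

# Share of described extant species present in the Catalogue of Life
col_coverage = col_valid_species / described_extant_species

# Range for a "complete" genus names list
complete_low = irmng_genera + missing_recent + missing_other[0]
complete_high = irmng_genera + missing_recent + missing_other[1]

print(f"CoL coverage of described extant species: {col_coverage:.0%}")
print(f"Estimated complete genus list: {complete_low:,} to {complete_high:,}")
```

The coverage works out at roughly 71%, in line with the rounded 70% on the slide, and the genus range of 472,848 to 477,848 brackets the quoted ~475k.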

Page 39: Tony Rees: Towards a Hierarchical Classification of All Life

Many unfinished tasks

• Update / standardize the higher classification across groups (start made, much still to do)

• Fill gaps in nomenclatural / taxonomic status, synonym reconciliation, family allocations for significant subset of names

• Legacy names acquisition, where currently missing (i.e., not in major nomenclators)

• New names acquisition (~25k species, 2k+ genera / year…), plus taxonomic reallocations – ongoing task, requires resources or (preferably) automated feeds

• Extension to “all species”… ???

Page 40: Tony Rees: Towards a Hierarchical Classification of All Life

Potential integration / replacement with “GN” components…

• MBL staff and collaborators are currently engaged in constructing components of a “Global Names Architecture” i.e.:

• GNI – Global Names Index

• GNUB – Global Names Usage Bank

• GNITE – Global Names Index Taxonomic Editor

• GNA CLR / GBIF ChecklistBank

• CiteBank – publication citation repository

• ZooBank – register for new / old animal names

• more…

• Some / much of this has potential overlap with IRMNG (present focus of my MBL visit).

Page 41: Tony Rees: Towards a Hierarchical Classification of All Life

Potential integration / replacement with “GN” components…

D. Patterson et al., from 2010-11 NSF proposal

Proposed “Global Names” infrastructure components:

Page 42: Tony Rees: Towards a Hierarchical Classification of All Life

Thank you

Thanks to:

- OBIS, GBIF and Atlas of Living Australia for financial support, numerous data providers for data

- CSIRO for salary and in-kind support, 2006-present

- D. Patterson / MBL / NSF (this trip funding + hosting)

Contact details

Phone: +61 3 6232 5318

Email: [email protected]

Web: www.cmar.csiro.au/datacentre/

Page 43: Tony Rees: Towards a Hierarchical Classification of All Life

Supplementary slides

Page 44: Tony Rees: Towards a Hierarchical Classification of All Life

Where to from here…

The names publishing / discovery landscape:

Page 45: Tony Rees: Towards a Hierarchical Classification of All Life

New names: potential discovery paths

[Flow diagram; labels recovered from the slide, duplicates removed]

Name streams: new virus names; new prokaryote names; new botanical names – algae & fungi (except fossils); new botanical names – bryophytes through angiosperms (except fossils); new zoological names

Column headings: publication; discovery; official registers; taxon-specific DB’s; integrated DB’s; “all names”

Code labels: Botany; Zoology

Boxes: Newly published names – primary literature (print, electronic); ICTV Viruses DB; LPSN (Prokaryote names); ICBN Decisions; ICZN Decisions; Journal TOC’s, RSS feeds, text mining; Abstracting services; Subject bibliographies; Reviews, secondary literature; Zoological Record; ION (Index of Organism Names); ChecklistBank; GNI; GNUB; ZooBank?; Catalogue of Life annual editions; ITIS; NCBI Taxonomy; WoRMS etc.; CyanoDB; Index Fungorum; MycoBank; AlgaeBase; Plant GSD’s; PaleoDB; Animal GSD’s; other compilations e.g. regional lists, Wikispecies, Wikipedia, more…; IRMNG

Page 46: Tony Rees: Towards a Hierarchical Classification of All Life

New names: potential discovery paths

[Same flow diagram as on page 45, with an added annotation:]

Lots of manual effort

Page 47: Tony Rees: Towards a Hierarchical Classification of All Life

New names: potential discovery paths

[Same flow diagram as on page 45, with an added annotation:]

Lots of automated feeds + expert curation

Page 48: Tony Rees: Towards a Hierarchical Classification of All Life

New names: potential discovery paths

[Same flow diagram as on page 45, with added annotations:]

Lots of automated feeds + expert curation

Lots of useful services

Page 49: Tony Rees: Towards a Hierarchical Classification of All Life

How many taxa?

[Pyramid figure, valid extant + fossil taxa (est.): Kingdoms (5/6/7/8); Phyla ~140; Classes ~400; Orders ~2k; Families ~10k; Genera ~250k; Species 2+ million]

How many species? Estimates according to Chapman, 2009 (valid, extant taxa only); “others” comprise c. 54k protists, 10k prokaryotes, 2k viruses

NB inverts. includes “~1,000,000” for Insects – probably +/- 60k

Fossil species – no published estimates – maybe 500k names, 300k valid

Page 50: Tony Rees: Towards a Hierarchical Classification of All Life

Relevant information domain: all life

PROTISTS

Fig. i-1 in Margulis & Schwartz, 1998

Page 51: Tony Rees: Towards a Hierarchical Classification of All Life

How many kingdoms…

PROTISTS

Fig. i-1 in Margulis & Schwartz, 1998

7 kingdoms (5 in Margulis & Schwartz, 8 in Cat. of Life…):

Animals, Fungi, Plants: 3 kingdoms

Protists: 1 (or 2 if Stramenopiles [Heterokonts] recognized, = Cavalier-Smith’s Kingdom “Chromista”)

Bacteria + Archaea: 2 (=1 in Margulis & Schwartz)

Viruses: 1 (not in Margulis & Schwartz)

Page 52: Tony Rees: Towards a Hierarchical Classification of All Life

Nomenclature governed by four separate Codes, i.e. Zoological, Botanical, Bacteriological, Viruses

[Figure labels: PROTISTS; Zoo. Code; Bact. Code; Bot. Code; Vir. Code: viruses (not shown)]

Fig. i-1 in Margulis & Schwartz, 1998

Page 53: Tony Rees: Towards a Hierarchical Classification of All Life

CiteBank as a remote references repository?

Unexplored questions…

• how well populated is CiteBank, either now or in the near future?

• can third party bibliographies be uploaded into it (with / without infringing IP)?

• Zoo. Record and similar operators do this already on a commercial basis – how to reconcile these activities / avoid redundant effort?

• would CiteBank IDs / outward links be an adequate substitute for storing / inspecting / displaying this info locally?

Page 54: Tony Rees: Towards a Hierarchical Classification of All Life

Parker, 1982 content example

Page 55: Tony Rees: Towards a Hierarchical Classification of All Life

Benton, 1993 content example

Page 56: Tony Rees: Towards a Hierarchical Classification of All Life

Rees TAXAMATCH fuzzy matching poster (start)

Page 57: Tony Rees: Towards a Hierarchical Classification of All Life

Schematic of TAXAMATCH operation
