Surfacing the deep data of taxonomy
@rdmpage
http://iphylo.blogspot.com
To a first approximation the taxonomy of life is already digital…
doi:10.1126/science.276.5313.734
Data – GenBank
Publications – PubMed
Names – Names4Life
So, we’re done! (aren’t we?)
Zoology as microbiology
Microbiology ➔ Zoology
GenBank ➔ DNA barcoding
PubMed ➔ Digital archives (BHL)
Names ➔ ION, ZooBank, uBio, …
Images from http://phylopic.org
Why does having a single database of names matter?
Bacterial names linked to literature
http://dx.doi.org/10.1099/ijs.0.035154-0
Paenibacillus polymyxa
• http://dx.doi.org/10.1601/nm.5110 (name)
• http://dx.doi.org/10.1601/tx.5110 (taxon)
Image from http://dx.doi.org/10.1128/AEM.71.11.7292-7300.2005
…still not convinced?
O Lambert et al. Nature 466, 105-108 (2010) doi:10.1038/nature09067
Skull, mandible and tooth morphology of the holotype of L. melvillei MUSM 1676.
Leviathan melvillei
Bugger…
Livyatan melvillei
Two kinds of #fail
We don’t have a list of all names
Publications containing names often not accessible
Leviathan melvillei
Need more convincing?
Dark taxa
http://iphylo.blogspot.co.uk/2011/04/dark-taxa-genbank-in-post-taxonomic.html
Mammals in GenBank (chart: proper Linnaean names vs. “Aus sp.”)
Mammals and “Invertebrates” (chart: proper Linnaean names vs. “Aus sp.”)
BOLD
Is this a problem?
It’s the norm for Bacteria
Dark taxa will only increase in number
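The split between proper Linnaean names and “Aus sp.” names can be approximated with a simple string test. The sketch below is only a heuristic under stated assumptions: the regular expression, the list of informal tokens, and the function names `is_dark` and `dark_fraction` are illustrative, not how GenBank or BOLD classify their records.

```python
import re

# Assumption: a "proper" name is a capitalised genus followed by a single
# lowercase specific epithet, with nothing after it.
BINOMIAL = re.compile(r"^[A-Z][a-z]+ [a-z]+(-[a-z]+)?$")

# Assumption: tokens that mark an informal, not yet named taxon.
INFORMAL_TOKENS = {"sp", "sp.", "spp.", "cf.", "aff.", "nr."}


def is_dark(name: str) -> bool:
    """Return True if the name does not look like a Linnaean binomial."""
    cleaned = name.strip()
    if any(token in INFORMAL_TOKENS for token in cleaned.split()):
        return True
    return BINOMIAL.match(cleaned) is None


def dark_fraction(names) -> float:
    """Fraction of names in the collection that look like dark taxa."""
    names = list(names)
    if not names:
        return 0.0
    return sum(is_dark(n) for n in names) / len(names)
```

Run over the taxon names of a group by year of submission, `dark_fraction` would give the kind of trend the charts above show; names such as `"Aus sp."` count as dark, while `"Livyatan melvillei"` does not.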
Roth v. Wikipedia
http://www.newyorker.com/online/blogs/books/2012/09/an-open-letter-to-wikipedia.html
Wikipedia says “no”
“I understand your point that the author is the greatest authority on their own work,” writes the Wikipedia Administrator—“but we require secondary sources.”
@quominus
http://quominus.org/archives/981
One of Wikipedia’s core principles, along with things like neutrality, is verifiability: a reader must be able to look at a statement in a Wikipedia article and find out where it comes from.
Taxonomic statements should be verifiable
Literature is the evidence base for taxonomy
Literature online
Museums, universities, and scientific societies
Digital archives
Commercial publishers
http://iphylo.org/~rpage/itaxon
Animal names per decade
Data from http://www.organismnames.com
Names with a DOI
25%
BioStor (BHL)
25%
@biostor_org
http://biostor.org
Online (DOI, BioStor, JSTOR, DSpace, PDF, …)
50%
Identifiers
Vast majority of names are in the legacy literature
Zootaxa and ZooKeys
XML
My wish list…
Names linked to:
• literature
• specimens
• geography
• sequences
• phylogeny…
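One way to picture the wish list is as a record per name, with a slot for each kind of link. This is a minimal sketch, not the BioNames data model: the class `NameRecord`, its field names, and `missing_links` are hypothetical. The example values (the Nature DOI and the holotype MUSM 1676) are the ones cited earlier for Leviathan melvillei.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NameRecord:
    """A taxonomic name and the things we would like it linked to."""
    name: str
    literature: List[str] = field(default_factory=list)   # e.g. DOIs
    specimens: List[str] = field(default_factory=list)    # e.g. museum codes
    geography: List[str] = field(default_factory=list)    # e.g. localities
    sequences: List[str] = field(default_factory=list)    # e.g. accessions
    phylogeny: List[str] = field(default_factory=list)    # e.g. tree ids

    def missing_links(self) -> List[str]:
        """Names of the link types that are still empty for this record."""
        return [
            key for key, value in vars(self).items()
            if isinstance(value, list) and not value
        ]


# Example built only from details cited in the slides above.
record = NameRecord(
    name="Leviathan melvillei",
    literature=["10.1038/nature09067"],
    specimens=["MUSM 1676"],
)
```

Calling `record.missing_links()` lists the link types that still have to be filled in, which is a rough measure of how far a name is from the wish list.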
BioNames
(real soon now…)
Computable Data Challenge