FEMS2006 Madrid, Spain
Symposium 20. Biodiversity
July 8, 2006
Knowledge bleed, PhenBank, and NamesforLife
George M. Garrity, Catherine Lyons & James R. Cole
Michigan State University and NamesforLife, LLC
Funding for this research has been provided by the US Department of Energy, Grants No. DE-FG02-04ER63933 and DE-FG02-99ER62848, the National Science Foundation Award No. DBI-0328255 and the Michigan University Commercialization Initiative (MUCI) program. Portions of this work are covered under US and foreign patents (pending) and are the intellectual property of the Michigan State University Board of Trustees. For further information contact [email protected]
“…because as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns -- the ones we don't know we don't know.”
Rumsfeld’s axiom and knowledge bleed
The knowledge gradient
Unknown unknowns
Known knowns
Basic and applied research advances knowledge
Knowledge bleed results in a loss of knowledge that has already been gained
Semantic resolution provides a mechanism to combat knowledge bleed
Unknown knowns
Known unknowns
We do quagmires
[Figure: timeline of the genus Alteromonas, 1972. Species shown: macleodii (T), communis, vaga]
[Figure, next frame: Alteromonas timeline, 1972 to 1973. Species shown: macleodii (T), communis, vaga, haloplanktis]
[Figure, next frame: Alteromonas timeline, 1972 to 1976. Species shown: macleodii (T), communis, vaga, haloplanktis, rubra]
[Figure, next frame: Alteromonas timeline, 1972 to 1977. Species shown: macleodii (T), communis, vaga, haloplanktis, rubra, citrea]
[Figure, next frame: Alteromonas timeline, 1972 to 1978. Species shown: macleodii (T), communis, vaga, haloplanktis, rubra, citrea, espejiana, undina]
[Figure, next frame: Alteromonas timeline, 1972 to 1979. Species shown: macleodii (T), communis, vaga, haloplanktis, rubra, citrea, espejiana, undina, aurantia]
[Figure, next frame: Alteromonas timeline, 1972 to 1981. Species shown: macleodii (T), communis, vaga, haloplanktis, rubra, citrea, espejiana, undina, aurantia, putrefaciens, hanedai]
[Figure, next frame: Alteromonas timeline, 1972 to 1982. Species shown: macleodii (T), communis, vaga, haloplanktis, rubra, citrea, espejiana, undina, aurantia, putrefaciens, hanedai, luteoviolacea]
[Figure, next frame: timeline extended to 1984. Alteromonas: macleodii (T), communis, vaga, haloplanktis, rubra, citrea, espejiana, undina, aurantia, putrefaciens, hanedai, luteoviolacea. Marinomonas: vaga, communis (T). Oceanospirillum: linum (T), commune, vagum, multiglobuliferum, japonicum, minutulum, beijerinckii, maris maris, maris williamsae, hiroshimense, pelagicum, pusillum, jannaschii, kriegii]
Nomenclatural issues: homotypic synonymy, priority, Rule 37(a) 1
Data issue: one-to-many relationship
Taxonomic issue: which one is right?
[Figure, next frame: timeline extended to 1986. Genera shown: Alteromonas, Marinomonas, Oceanospirillum, Shewanella. Newly shown: Shewanella with putrefaciens (T), benthica, hanedai]
[Figure, next frame: timeline extended to 1987. Genera shown: Alteromonas, Marinomonas, Oceanospirillum, Shewanella. Newly shown: denitrificans]
[Figure, next frame: timeline extended to 1988. Genera shown: Alteromonas, Marinomonas, Oceanospirillum, Shewanella. Newly shown: colwelliana]
[Figure, next frame: timeline extended to 1990. Genera shown: Alteromonas, Marinomonas, Oceanospirillum, Shewanella. Newly shown: tetraodonis; colwelliana now appears twice; a second Oceanospirillum group (beijerinckii, pelagicum, maris, hiroshimense)]
[Figure, next frame: timeline extended to 1992. Genera shown: Alteromonas, Marinomonas, Oceanospirillum, Shewanella. Newly shown: atlantica, carrageenovora, algae]
Nomenclatural issue: non-type strains
[Figure, next frame: timeline extended to 1995. Genera shown: Alteromonas, Marinomonas, Oceanospirillum, Shewanella. Newly shown: distincta, fuliginea]
Nomenclatural issues: heterotypic synonymy
Data issue: many-to-many relationship
Taxonomic issue: which one is right?
[Figure, next frame: timeline at 1995, adding the genus Pseudoalteromonas: haloplanktis haloplanktis (T), haloplanktis tetraodonis, atlantica, aurantia, carrageenovora, citrea, espejiana, luteoviolacea, nigrifaciens, piscicida, rubra, undina. Genera shown: Alteromonas, Marinomonas, Oceanospirillum, Shewanella, Pseudoalteromonas]
[Figure, next frame: timeline extended to 1997. Genera shown: Alteromonas, Marinomonas, Oceanospirillum, Shewanella, Pseudoalteromonas. Newly shown: antarctica, elyakovii]
[Figure, next frame: timeline extended to 2000. Genera shown: Alteromonas, Marinomonas, Oceanospirillum, Shewanella, Pseudoalteromonas. Newly shown: frigidimarina, gelidimarina, woodyi, amazonensis, baltica, oneidensis, pealeana, violacea; bacteriolytica, prydzensis, tunicata, distincta, elyakovii, peptidolytica; mediterranea]
[Figure, next frame: timeline extended to 2001. Genera shown: Alteromonas, Marinomonas, Oceanospirillum, Shewanella, Pseudoalteromonas. Newly shown: tetraodonis, japonica]
[Figure, next frame: timeline extended to 2002. Genera shown: Alteromonas, Marinomonas, Oceanospirillum, Shewanella, Pseudoalteromonas. Newly shown: denitrificans, livingstonensis, olleyana]
[Figure, next frame: timeline extended to 2004. Genera shown: Alteromonas, Marinomonas, Oceanospirillum, Shewanella, Pseudoalteromonas. Newly shown: marinintestina, sairae, schlegeliana, gaetbuli; primoryensis; stellipolaris, litorea; plus unlisted additions marked "12 others" and "5 others"]
[Figure, next frame: timeline extended to 2005. Genera shown: Alteromonas, Marinomonas, Oceanospirillum, Shewanella, Pseudoalteromonas. The unlisted additions now read "14 others", "8 others", and "2 others"]
[Figure, final frame: timeline extended to 2006. Genera shown: Alteromonas, Marinomonas, Oceanospirillum, Shewanella, Pseudoalteromonas. The unlisted additions now read "14 others", "13 others", and "2 others"]
Since first being defined
The genus Alteromonas has undergone 18 “emendations”
21 species were added to the genus
19 species were reassigned to four genera, 3 of which are formed as new combinations of Alteromonas spp.
6 synonyms
2 species reduced to subspecies, then re-elevated to species
50 names, five genera, five families, and two classes, but only five validly published named species of Alteromonas remain.
This is not a very complicated example
But wait, there is still more
Alteromonas
Alteromonadaceae
Alteromonadales
Gammaproteobacteria
Alishewanella
Aestuariibacter
Ferrimonas
Colwellia
Idiomarina
Glaciecola
Marinobacterium
Marinobacter
Pseudoalteromonas
Microbulbifer
Incertae sedis
Psychromonas
Teredinibacter
Shewanella
Thalassomonas
Ferrimonadaceae
Idiomarinaceae
Moritella
Moritellaceae
Pseudoalteromonadaceae
Ferrimonas
Idiomarina
Pseudoalteromonas
Psychromonadaceae
Algicola
Psychromonas
Moritella
Shewanellaceae
Shewanella
Incertae sedis
Teredinibacter
Agarivorans
Alishewanella
Marinobacterium
Marinobacter
Microbulbifer
Salinimonas
Colwelliaceae Colwelliaceae
Thalassomonas
May 2004 vs. November 2004
1 family, 16 genera -> 8 families, 12 genera
1 unclassified -> 7 unclassified
Which is correct?
Which is supported by the data?
Nomenclature (the end-user’s perspective)
Wouldn’t it be nice if…
Biological names were really useful
Would link to…
  Relevant literature
  Sequences
  Other phenotypic data
  Sources of strains in Biological Resource Centers
  Ancillary materials
    Patents
    Laws and regulations
Regardless of where the data resides
Without having to know anything about
  Synonymies
  Orthographic variants
  Misapplications of the name
How could this be accomplished?
Modeling names and taxa…
[Diagram labels: Taxon, Name+, Species+, Authority+, Strain+, Sequence+]
[Diagram labels: Taxon, Name+, Species+, Authority+, Strain+, Sequence+. External sources: Literature; Governing bodies; GenBank, DDBJ, EMBL, others; Collections, BRC]
[Diagram labels: Taxon, Name+, Species+, Authority+, Strain+, Feature+, Source+. External sources: Literature; Governing bodies; GenBank, DDBJ, EMBL, others; Collections, BRC. Further labels: Proposals, STM, Legal, Databases; Priority, Validity, Synonymy, Type; direct, indirect; BRC, Public, Private, General; GSC Core, Phenotype, FAME, Biolog, PA]
However, rules are made to be broken…
A properly formed species: Strain+, Feature+, Name+, Species+
Candidatus or exemplar lost: Feature+, Name+, Species+
Environmental sequence: Feature+
Old type strain, not yet sequenced: Strain+, Name+, Species+
Old type, exemplar based on drawing or description: Name+, Species+
Misidentified taxon: Feature+, “Name”+, Strain*
[Diagram: Taxon, Species+, Name+, Strain+, and Feature+ objects arranged to contrast homotypic synonymy with heterotypic synonymy]
Differing opinions…
The impact of “uncontrolled” labeling of environmental sequence and strain data …
Non-types, clones, environmental sequences
Environmental sequence: Feature+
Misidentified taxon: Feature+, “Name”+, Strain*
ID+
Top 25 labels on 16S rRNA sequences for type strains
[Bar chart, counts from 0 to 1200. Labels recoverable from the axis: 1, 3, 4, 5, A, 6, 7, B, 8, 10, Tanzania, D, C, 9RB, 11, 14, 12, I16, 17, B2]
n = 15232 unique sequences; 2.74X over defined
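A tally of this kind can be sketched in a few lines. This is only an illustration: the label list below is a hypothetical stand-in, not the data behind the chart, and `top_labels` is a name introduced here.

```python
from collections import Counter

# Hypothetical strain labels attached to sequence records (illustration only;
# these are not the counts behind the slide).
labels = ["1", "A", "1", "B", "Tanzania", "A", "1"]

def top_labels(values, k=25):
    """Return the k most frequent labels with their counts, most frequent first."""
    return Counter(values).most_common(k)

top = top_labels(labels, k=3)
for label, count in top:
    print(f"{label!r}: {count}")
```

Short, uncontrolled labels such as "1" or "A" collide across unrelated records, which is why a frequency count alone says little about which organism a sequence belongs to.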
The case of the Verrucomicrobia
“Identifiers” on Verrucomicrobia 16S rRNA sequences, n=911
[Bar chart, counts from 0 to 600, for the following identifiers]
uncultured Verrucomicrobia bacterium
uncultured Verrucomicrobium sp.
uncultured Xiphinematobacteriaceae bacterium
uncultured Verrucomicrobiales bacterium
uncultured Verrucomicrobiaceae bacterium
uncultured Spartobacteria bacterium
Prosthecobacter dejongeii
uncultured Verrucomicrobia subdivision 3 bacterium
Verrucomicrobium spinosum
Prosthecobacter fusiformis
Prosthecobacter vanneervenii
Opitutus terrae
Prosthecobacter debontii
uncultured Opitutus sp.
Chthoniobacter flavus
Publication field from GenBank record, n=627
[Bar chart, counts from 0 to 90. Category labels are truncated in the source and many repeat; they include Unpublished, Appl. Environ., Int. J. Syst. Evol., FEMS Microbiol., Science 308 (5721), Environ. Microbiol. 4, Soil Biol. Biochem., Proc. Natl. Acad., Microbiology 141, J. Ind. Microbiol. 17, and Biotechnol. Alia 8, 1-]
Verrucomicrobia, based on annotation (n=444)
Proteobacteria
Verrucomicrobia
Victivallales & Lentisphaerales
Opitutus
Xiphinematobact
Unclassified
Unclassified
Taxonomic structure of the Verrucomicrobia revealed
Lentisphaera
Akkermansia
Verrucomicrobium
Prosthecobacter
Rubritalea
Verrucomicrobium
Xiphinematobact
Chthoniobacter
Verrucomicrobium
Opitutus
Unclassified
Accessing the NamesforLife information objects
How NamesforLife disambiguates biological nomenclature
The underlying concepts
Persistent identifiers
A name or an identifier for a resource that uniquely identifies that resource and will be forever associated with that resource. It will never be reassigned to any other resource and will not change regardless of where the resource is located or whatever protocol is used to access it.
Use of a well managed persistent identifier rather than a location will ensure that when a document is moved, or its ownership changes, the links to it will remain actionable.
From: Diana Dack. 2001. Persistence is a Virtue. Information Online Conference, Sydney.
The underlying concepts (cont.)
Semantic resolution
The process of identifying the precise meaning of terms or concepts and mapping them into different classifications.
Static concepts: unaffected by new knowledge
Dynamic concepts: affected by new knowledge
What’s so important about precise meaning in scientific, technical, or medical fields?
…in commerce?
The underlying concepts (cont.)
Name resolution
The process of mapping a persistent identifier to a URL that retrieves a resource. The URL locates the named resource identified by the persistent identifier (the name).
[Diagram: persistent identifiers (PID1, PID2, PID3) identify the resource; name resolution maps each PID to a URL (URL1, URL2, URL3); the URL locates the resource]
Adapted from: Name Resolution Service: Introduction and Use, Harvard University Library
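The idea can be sketched as a small lookup table in which the identifier never changes while the URL it resolves to can. This is a toy model under stated assumptions, not how the Handle system or any production resolver is implemented; the `Resolver` class, the identifier "PID1", and the example.org URLs are all hypothetical.

```python
class Resolver:
    """Toy name-resolution table: persistent identifier -> current URL."""

    def __init__(self):
        self._table = {}

    def register(self, pid, url):
        # A persistent identifier is never reassigned to another resource.
        if pid in self._table:
            raise ValueError(f"{pid} is already assigned")
        self._table[pid] = url

    def relocate(self, pid, new_url):
        # The resource moved: only the location changes, not the identifier.
        if pid not in self._table:
            raise KeyError(pid)
        self._table[pid] = new_url

    def resolve(self, pid):
        return self._table[pid]


resolver = Resolver()
resolver.register("PID1", "https://old.example.org/resource")
before = resolver.resolve("PID1")

resolver.relocate("PID1", "https://new.example.org/resource")
after = resolver.resolve("PID1")

print(before)
print(after)
```

A document that cites "PID1" keeps working after the move, whereas one that cited the old URL directly would not.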
The underlying concepts (cont.)
Handle
The Handle system is a comprehensive system for assigning, managing, and resolving persistent identifiers, known as ‘handles’, for digital objects and other sources on the Internet. Handles can be used as Uniform Resource Names (URNs).
Digital object identifiers (DOIs)
It is implicit in the Handle design that a digital object has associated metadata (data about data; here: data about the digital object). The core piece of metadata is the Handle itself.
[Diagram: DOI directories mapping DOIs to URLs, with an Assigner and Content]
Courtesy of Norman Paskin, International DOI Foundation
NamesforLife
What is it?
  A novel technology
  An ontology, metadata model, and a mapping
  A transparent information layer on the Internet
  An application of persistent identifiers
  A semantic resolution service for the life sciences
What is the purpose?
  Solve a well known problem
  Ambiguity in terminology
    Common problem
    Pervasive in life sciences
    The special case of biological nomenclature
      Queries and literature searches
      Assertions, assumptions, hypotheses
What isn’t it?
  A content provider
Why DOIs are the preferred GUID
Digital object identifiers
  Strengths: opaque, actionable, require metadata, identify an object, strong governance, widespread usage, not based on DNS, guarantee of persistence, proposed ISO standard
  Weakness: not free
DOIs
  Proven technology
    DOIs are layered on top of CNRI’s Handle server
    Scalable
  Widespread use in publishing industry (CrossRef)
    > 1500 publishers and > 1000 libraries subscribing
    > 22M DOIs assigned
    > 11M click-throughs (2/15/2006 - 3/15/2006)
  Well understood technology
  Strong social/legal framework to ensure persistence
  Technically robust
N4L architecture
Two components
  A transparent information layer to provide DOI services to the life science community
  An ontology with a schema that produces metadata consistent with requirements of the International DOI Foundation
Seven first class object types
  Name, Taxon, Exemplar, Nomos, Practitioner, Feature, Nomenclatural Code
Name object: Name DOI, Name, Name status, Authority, Synonyms, Taxon DOI
Taxon object: Taxon DOI, Name, Rank, Parent name, Parent taxon DOI, Methodology, Type exemplar DOI
Higher Taxon object: Taxon DOI, Name, Rank, Parent name, Parent taxon DOI, Methodology, Members (a list of Taxon DOI and Name pairs)
Exemplar object: Exemplar DOI, Biodeposit, Feature, Biodeposit, Feature, Taxon DOI, Species name
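The object fields listed on this slide can be sketched as plain records, which makes the separation of names from taxa concrete. This is a minimal sketch, not the N4L schema: the class layout follows the field lists above, but the identifiers ("example/nm.1" and so on), the status values, and the helper `taxon_for` are hypothetical, and authorities are deliberately left unspecified.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Name:
    name_doi: str
    name: str
    name_status: str
    authority: str
    taxon_doi: str                      # the taxon this name currently points to
    synonyms: List[str] = field(default_factory=list)  # DOIs of synonymous names


@dataclass
class Taxon:
    taxon_doi: str
    name: str
    rank: str
    parent_name: str
    parent_taxon_doi: str
    methodology: str
    type_exemplar_doi: Optional[str] = None


@dataclass
class Exemplar:
    exemplar_doi: str
    taxon_doi: str
    species_name: str
    biodeposits: List[str] = field(default_factory=list)
    features: List[str] = field(default_factory=list)


# Hypothetical records: two names (homotypic synonyms) that point to one taxon.
taxon = Taxon("example/tx.1", "Marinomonas communis", "species",
              "Marinomonas", "example/tx.0", "unspecified", "example/ex.1")
names = [
    Name("example/nm.1", "Alteromonas communis", "synonym", "unspecified",
         taxon.taxon_doi, ["example/nm.2"]),
    Name("example/nm.2", "Marinomonas communis", "current", "unspecified",
         taxon.taxon_doi, ["example/nm.1"]),
]
taxa = {taxon.taxon_doi: taxon}
by_name = {n.name: n for n in names}


def taxon_for(name: str) -> Taxon:
    """Follow a name to the taxon object it designates."""
    return taxa[by_name[name].taxon_doi]


print(taxon_for("Alteromonas communis").taxon_doi)
print(taxon_for("Marinomonas communis").taxon_doi)
```

Because each name carries its own identifier and merely references a taxon, a user who knows only the older name still arrives at the same taxon record.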
Two components (continued)
The prototype
DOI:10.1601/tx.0
A proof-of-principle application
24,176 first-class objects
Track changes in concepts over time
Based on a nomenclatural taxonomy, but capable of supporting multiple taxonomic views and “time travel”
Initial DOI services conform to AP 0
Released January 17, 2006
Japanese prototype released June 21, 2006
Chinese version under development
Arabic version under consideration
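One way to picture "time travel" is a lookup of where an epithet was placed as of a given year. The sketch below is an assumption-laden illustration, not the prototype's method: the `placements` table is a hypothetical structure, and its years are read off the Alteromonas timeline slides earlier in this talk, so they should be treated as approximate.

```python
import bisect

# Placement history per epithet: (year the placement appears on the timeline, genus).
# Years are taken from the timeline slides and are approximate.
placements = {
    "communis": [(1972, "Alteromonas"), (1984, "Marinomonas")],
    "hanedai": [(1981, "Alteromonas"), (1986, "Shewanella")],
}


def genus_as_of(epithet, year):
    """Return the genus an epithet was placed in as of `year`, or None if not yet named."""
    history = placements[epithet]
    years = [y for y, _ in history]
    i = bisect.bisect_right(years, year)  # number of placements made on or before `year`
    if i == 0:
        return None
    return history[i - 1][1]


print(genus_as_of("communis", 1980))
print(genus_as_of("communis", 1990))
print(genus_as_of("hanedai", 1975))
```

The same question asked of two different years can legitimately return two different answers, which is the behaviour a static name-to-taxon table cannot express.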
Easy support of foreign languages
The mini-monograph
Name DOI, Name, Name status, Authority, Synonyms, Taxon DOI
Taxon DOI, Name, Rank, Parent name, Parent taxon DOI, Methodology, Type exemplar DOI, Nontype exemplar DOI
Exemplar DOI, Biodeposit, Feature, Biodeposit, Feature, Taxon DOI, Species name
Preamble
  Name/Name DOI
  Name status, Authority
  Synonyms/Name DOI
  Member of: Parent Taxon DOI
  Methodology
Type Exemplar DOI
  Biodeposit+
  Feature+
    Paired Sequences
    Genomic
    Paired phenotypic data
    Minimal description
    GSC Core description
    Images
Nontype exemplar
  Biodeposit+
  Feature+
    Paired Sequences
    Genomic
    Paired phenotypic data
    Minimal description
    GSC Core description
    Images
Reference DOIs
IJSEM/ICSP
Taxonomic authorities
BRCs & Collections
GenBank/EMBL/DDBJ
Taxonomic community
Genomics community
Instrument vendors
Database providers
Publishers
“Test ideas in the marketplace. You learn from hearing a range of perspectives. Consultation helps engender the support decisions need to be successfully implemented.”
N4L business
Two components (continued)
Member of the International DOI Foundation
Self-supporting model
Four target groups
  End-users - access to N4L objects as a DOI service at no charge
  Publishers - hosting N4L enabled content in which each name becomes actionable. Literature could be traversed based on named entities (organisms, genes, etc.).
  Database providers, instrument vendors, BRCs - rely on curated information for their own businesses
  Registrants who wish to provide data or services that are not readily available to the broader community
    Service for registration of “not-yet-cultivated” taxa and environmental clones
We are soliciting input from the community as well as potential collaborators and “clients”
Synergistic activities
Goals of NamesforLife
  Synchronize usage of nomenclature in databases and elsewhere
  Establish links between vertically integrated businesses
  Help build new relationships
  Stimulate new business opportunities
  Build useful new tools and services
  Become a self-supporting service for the community
Collaborators/Partners
  ATCC - nomenclature support, conduit to new customers for existing goods and services, opportunity for new services (pay-for-view data)
  Forsyth Research Institute - further testing and refinement of the N4L model, creation of mini-monographs, extension to not-yet-cultivated (nyc) and uncultivated species
  Midi Inc. - integration of N4L services with instrument output, pay-for-view data
  SGM/ICSP/JC - optimize N4L DOI embedding, custom web tag libraries for on-the-fly updating of content
Embedding N4L links into web content
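As a rough illustration of what making each name "actionable" could look like, the sketch below wraps recognised names in links that point at a DOI resolver. It is not the N4L tag library: the `NAME_TO_DOI` table, the DOI suffix, and the function `link_names` are hypothetical, and the input is assumed to be plain text rather than existing HTML.

```python
import html
import re

# Hypothetical lookup from a name to its name DOI (the suffix is a placeholder).
NAME_TO_DOI = {
    "Alteromonas macleodii": "10.1601/nm.EXAMPLE1",
}


def link_names(text):
    """Wrap each known name in `text` with a link to the DOI resolver."""
    # Longest names first, so a longer name wins over any shorter name it contains.
    known = sorted(NAME_TO_DOI, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(n) for n in known))

    def to_link(match):
        name = match.group(0)
        doi = NAME_TO_DOI[name]
        return f'<a href="https://doi.org/{doi}">{html.escape(name)}</a>'

    return pattern.sub(to_link, text)


linked = link_names("The type species is Alteromonas macleodii.")
print(linked)
```

Because the link carries an identifier rather than a fixed page address, what it resolves to can be updated centrally without editing the published text.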
PhenBank…
The federated database
Associate phenotypic data with emerging 16S sequence data
Potential value to the community
Problems
  Technical
    Interoperability and data comparability
    Variable granularity
    Lack of controlled vocabulary
  Social issues of the centralized model
    Who controls access?
    Who curates?
    Who pays?
    Incentives for participants?
PhenBank…
A decade later
Maturation of the 16S data
Impact of emerging technologies
  Large-scale sequencing efforts
  Wealth of new tools
    Predictive models
    Ontology development
    Phenotypic arrays
  Rapid emergence of web technologies
    Impact on traditional publishing
    XML, DOIs
    Semantic technology
Impact of emerging social trends
  Community annotation, social tagging
  Open access and supplementary data
  Incentives for data sharing
    Distributed model, data discovery
    “Pay-for-view” vs. regeneration
Sequencing the type strains
Year                           2007   2008   2009   2010   2011   2012   2013   2014
BP per dollar                   200    300    450    675   1013   1519   2278   3417
Cost/Genome*                  20000  13333   8889   5926   3951   2634   1756   1171
Genomes/$5M                     250    376    563    844   1266   1898   2848   4271
Cumulative genomes sequenced    250    625   1188   2031   3297   5195   8043  12314
Selection of Targets
Type Culture Material
JGI Sequencing
Rapid Annotation (24 Hours)
Metabolic Reconstruction
Model Generation
Phenotype Prediction
Database Repository
Source – Rick Stevens, Argonne National Laboratory and University of Chicago
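The projection figures on this slide are consistent with a simple model, sketched below. The 1.5-fold yearly improvement is an inference from the numbers, not something the slide states, and the constant names are introduced here for illustration. Values agree with the slide to within rounding; the slide's 2008 entry for genomes per $5M (376) is one higher than this model gives.

```python
BUDGET = 5_000_000            # dollars per year, from the "Genomes/$5M" row
START_YEAR, END_YEAR = 2007, 2014
START_COST = 20_000           # cost per genome in 2007, from the slide
START_BP_PER_DOLLAR = 200     # base pairs per dollar in 2007, from the slide
GROWTH = 1.5                  # ASSUMPTION: yearly improvement factor inferred from the figures

rows = []
cumulative = 0.0
for i, year in enumerate(range(START_YEAR, END_YEAR + 1)):
    factor = GROWTH ** i
    bp_per_dollar = START_BP_PER_DOLLAR * factor
    cost_per_genome = START_COST / factor
    genomes = BUDGET / cost_per_genome      # genomes affordable in this year
    cumulative += genomes                   # running total across years
    rows.append((year, bp_per_dollar, cost_per_genome, genomes, cumulative))

for year, bp, cost, genomes, total in rows:
    print(f"{year}  {bp:8.1f}  {cost:9.1f}  {genomes:8.1f}  {total:9.1f}")
```

Under this assumption the yearly budget buys 250 genomes in 2007 and roughly 4,270 in 2014, for a cumulative total of about 12,300.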
Thanks to
MSU: Julia Bell, Scott Harrison, Judy Leventhal, Donna McGarrell, Siddique Mohideen, Qiong Wang
Univ. Michigan: Paul Saxman
Forsyth Inst.: Floyd Dewhirst
Midi Inc.: Myron Sasser, Gary Jackoway
ATCC: Tim Lilburn
DSMZ: Brian Tindall
University of Toulouse: Jean Euzéby
IDF: Norman Paskin
NexusTechVentures: Todd Zahn
NIES Japan: Junko Shimura, Kaduo Hiraki
Soc. General Microbiology: Robin Dunford, Ron Fraser
Int. Com. System. Prok.: Aharon Oren, Hans Juergen Busse
IJSEM: Peter Kaempfer
Argonne National Lab: Rick Stevens
Funding: US DOE BER, NSF, Mich. Univ. Comm. Init.
Questions?
Acknowledgements
MSU: Jim Cole, Donna McGarrell, AKS Mohideen, Qiong Wang, Scott Harrison, Connie Williams, Judy Leventhal, Julia Bell, Denise Searles
VPGR and OIP: Paul M. Hunt, Lorraine Hudson
ATCC: Tim Lilburn
DSMZ: Brian Tindall
University of Toulouse: Jean Euzéby
IDF: Norman Paskin
Funding: US Department of Energy Office of Science, National Science Foundation, Michigan University Commercialization Initiative
“I would not say that the future is necessarily less predictable than the past. I think the past was not predictable when it started."
Insert statement of problem here
"I believe what I said yesterday. I don't know what I said, but I know what I think, and, well, I assume it's what I said."
“Simply because you do not have evidence that something does exist does not mean that you have evidence that it doesn't exist."
"Learn to say 'I don't know.' If used when appropriate, it will be often."
"If I know the answer I'll tell you the answer, and if I don't, I'll just respond, cleverly."
“I don't do quagmires.”
Stan Falkow’s observation
“Given a choice, most taxonomists would rather wear each other’s underwear than use each other’s names”
Why is this so?
Consider the following
A simple exchange between a customer and vendor of “technical” products
Discussions among physicians, healthcare providers, and insurers
The name/taxon disjunction
Impact
Accumulation of dubious names in literature/databases
Affects assertions of: identity, commonality of pathways, common ancestry, homology, paralogy, xenology
Legal consequences
Problems in print publishing
Key requirement
Proposals and emendations must appear in print
Code specific
Prokaryotic Code
Effective, legitimate, and valid
Registration
Taxonomies are retrospective
Can only cite earlier publications
Cannot cite future emendations
Increasingly based on molecular sequence data
Deposit of sequence data in public databases
Not conveniently referenced in print
Problems with electronic publishing
No formal publishing mechanisms
Does not fulfill fundamental requirement of the Code(s)
Lack bibliographic information
Not citable
Not persistent
Subject to uncontrolled change
May disappear
Link rot
404 Link not found
Problems in nomenclature
Marking territory
Personal achievement
Systematic biologists
Everyone else (aka end-users)
Unfamiliar with literature
Unique aspects
Unaware of Codes of Nomenclature
Legalistic framework
Formation and assignment of names
Circumscription and emendation of taxa
Priority and citation
Synonymy and homonymy
Correction of orthographic errors
Adjudication of nomenclatural disputes
But do not govern classification or identification
Problems in nomenclature (cont.)
Primary entry point into STM literature and databases
Prominent role in laws/regulations
Poor identifiers
Fixed in time and scope
May not be revised
Synonymies generally not addressed
Persist, but obsolesce in relation to taxon
An archival record of a taxonomic definition for a single point in time
Systematic biologists
But…
What are the alternatives?
Summary of identifier properties
Properties: Opaque, Governance, Persistent, Registration, Metadata, Accepted standard, Global, Widespread use, Object, Actionable, Unique, Interoperable
Accession numbers: - - V - V - - + + - -
XRI: - - - ? - - + - - + +
LSID: - - ? - V ? V ? - + +
Gene names: V - - - - + - + + - -
PURL/POI: - - - - + ? - - + + +
Taxid: + - - - + - - ? + V +
DNS: - + - + - + + + - + +
Taxonomic names: - + + v - + + + + - -
OpenURL: - + + + + + - + - + +
Handle: + - + + + - + ? + + +
DOI: + + + + + + + + + + +
The Digital Object Identifier System
The DOI - Handle relationship
Handle System is one component of the DOI System
Global name service
Secure name resolution over the Internet and Grid
DOI System uses the Handle System as part of a value-added application
DOIs provide persistent, semantically interoperable identification of IP resources
The DOI System provides a ready-to-use:
Numbering syntax
Resolution service
Data model
Policies and procedures for implementation
Expanded technical infrastructure and features specific to DOI applications
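The numbering syntax and resolution service can be pictured with a minimal Python sketch. It assumes the public DOI proxy at https://doi.org/ as the resolver and uses the N4L DOI 10.1601/nm.2821 shown later in this talk. The function names `split_doi` and `resolver_url` are hypothetical and are not part of any DOI or Handle software.

```python
from typing import Tuple
from urllib.parse import quote

# Assumption: the public DOI proxy is used as the resolution service.
DOI_PROXY = "https://doi.org/"


def split_doi(doi: str) -> Tuple[str, str]:
    """Split a DOI into (prefix, suffix) at the first slash.

    A DOI prefix starts with the directory indicator "10."; the suffix
    is chosen by the registrant and is treated here as an opaque string.
    """
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def resolver_url(doi: str) -> str:
    """Build an actionable URL for a DOI via the proxy (no network access)."""
    prefix, suffix = split_doi(doi)
    return DOI_PROXY + prefix + "/" + quote(suffix, safe="/")


print(resolver_url("10.1601/nm.2821"))
```

The identifier itself stays fixed; only the record behind the resolver needs to change when the identified resource moves.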
Persistence
The Digital Object Identifier System
The IDF extends the technical infrastructure of the Handle System by providing a social infrastructure guaranteeing persistence
Function of organizations, not technology
Federation of Registration Agencies
IDF policies ensure DOIs “live” even if RAs fail
RAs provide the process of DOI transfer
IDF is persistent as it is self-funding
DOI System is backed by several major public companies, multiple RAs, and a large customer base
Persistence is not required
No appropriate social structure is provided
Consistency
The Digital Object Identifier System
Adds consistent rules for multiple applications
IDF sets rules for DOI assignment
What DOIs can be applied to
Restrictions on arbitrary/temporary assignment
Restrictions on removal
Management by a Directory Manager to enforce QC
DOI API defines consistent way of accessing and managing DOI applications and services
Consistent use of DOI prefix and numbering syntax provides numbering interoperability in the IP sector, brand recognition, and understanding of the DOI concept
Optimal data model provides semantic consistency for true interoperability
Ensures interoperability for resolution purposes across Handle System implementations
No requirements for interoperability at the application level
Ease of use
The Digital Object Identifier System
Turn-key application
IDF and RAs maintain technical support staff
Interacts with users, standards community and others
Resolve problems of RAs and broader user community
Underwrites cost of directory manager
Support to RAs
Guidance, troubleshooting, etc.
DOI Handbook
Policies and procedures for various actors
Guidelines for RAs, developers
Developed by federation of DOI agencies, guaranteed by detailed legal agreements.
No ongoing technical support
Handle server must be installed and managed by local technical staff
Free, but not without real costs
Expressing relationships
The Digital Object Identifier System
Provides framework to achieve practical application of multiple resolution
Application of Handle System that adds the necessary constraints
Constraints provided by metadata, which defines the entities (data dictionary approach) and expresses the relationships.
Provides support for multiple resolution
Parent-child relationships
Other relationships
No preexisting constraints to make useful relationships
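Multiple resolution and parent-child relationships can be pictured as one identifier that resolves to several typed values and optionally points to a parent identifier. The Python sketch below is illustrative only: the `Record` class, the value types, the `10.9999` prefix, and the example.org targets are all hypothetical and do not reflect the actual Handle or DOI data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Record:
    """One identifier with several typed values (hypothetical structure)."""
    identifier: str
    values: Dict[str, str] = field(default_factory=dict)  # value type -> target
    parent: Optional[str] = None                           # parent identifier, if any


# Hypothetical registry: a genus-level record and a species-level child.
REGISTRY = {
    "10.9999/genus": Record(
        "10.9999/genus",
        {"landing_page": "https://example.org/genus"},
    ),
    "10.9999/species": Record(
        "10.9999/species",
        {
            "landing_page": "https://example.org/species",
            "metadata": "https://example.org/species.xml",
        },
        parent="10.9999/genus",
    ),
}


def resolve(identifier: str, value_type: str = "landing_page") -> str:
    """Return the target registered for one value type of an identifier."""
    return REGISTRY[identifier].values[value_type]


def ancestors(identifier: str) -> List[str]:
    """Follow parent links upward and return the chain of parent identifiers."""
    chain = []
    parent = REGISTRY[identifier].parent
    while parent is not None:
        chain.append(parent)
        parent = REGISTRY[parent].parent
    return chain


print(resolve("10.9999/species", "metadata"))
print(ancestors("10.9999/species"))
```

Without agreed value types and relationship names, each application would have to guess what a second or third resolution target means, which is the gap the metadata constraints are meant to close.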
Technical infrastructure
The Digital Object Identifier System
Adds dedicated and improved technical infrastructure
Replication servers for RAs, secondary sites, mirror servers, proxy servers all housed in a secure commercial hosting facility
More robust and scalable database
DOI Directory Manager to provide technical oversight and evolutionary growth
Provides a shared resolution service
Global root servers, local Handle servers, clients, proxy servers
Scalable and interoperable
License provides a reference implementation but the database does not scale above a few million handles
Semantic interoperability
The Digital Object Identifier System
Adds semantic interoperability across application space
Feature of advanced DOI applications
Provides metadata kernel to specify entity identified by DOI
Optional tool to map existing schema through a structured ontology
Ensures DOI can be the key in building multi component media objects or managing multiple assets
Data dictionary and application framework
Ensures that DOIs act predictably in applications with defined series
IDF maintains indecs data dictionary and will likely maintain MPEG-21 data dictionary
No requirements as to what is being identified
No assurance of semantic interoperability across resources
Development activities
The Digital Object Identifier System
Adds to this resource for active development of DOI applications and advanced features
Working groups and technical support staff
Use of DOIs in commercial settings
RAs have an incentive to allocate their own resources to develop new features, collaborate with other RAs and share with the wider DOI community
Provides upgrades of the global general-purpose naming system
Costs to replicate a comparable system
The Digital Object Identifier System
Preceding features are part of a turn-key system
RAs provide value added services to their clients
IDF holds production Handle license with right to sublicense
Cost of DOI assignment
Varies across RAs and depends on their business model
Can be free as part of a service offering
Need to add all preceding features not included in the general purpose software
Cost of a production Handle license
Other licenses to enabling technologies
Governance
The Digital Object Identifier System
Independent not-for-profit organization
CNRI provides services under commercial agreement
Elected board and nominated working groups
Open membership
NamesforLife, LLC is a general member
Independent of IDF
Handle System Advisory Committee
Major users and interested parties
IDF is a member
Relevance of names in content
Current web is designed for human-human communication
Future web will rely on machines for information gathering, filtering, and knowledge discovery
Need for semantic metadata for machine discovery and reasoning
Information retrieval
Keeping pace with the field
Volume of content produced annually*
Books - 8 Tbyte
Journals - 2 Tbyte
Semantic technologies
N4L provides persistent semantic disambiguation
Robust and economical
There is a pressing need for automation
*Scherf, et al., (2005) Brief Bioinform. 6: 287
<p><b>VIRTUALLY all microorganisms . . .
. . . We report here the first structure determination, to our knowledge, of
the siderophores from an open-ocean bacterium, alterobactin A and B from
<em>Alteromonas luteoviolacea</em>.
<em>A. luteoviolacea</em> is found in oligotrophic<sup>10</sup> and
coastal<sup>11</sup> waters. Alterobactin A has an exceptionally high
affinity constant for ferric ion. We suggest that at least some marine
microorganisms may have developed higher-affinity iron chelators as part of
an efficient iron uptake mechanism which is more effective than that of their
terrestrial counterparts.</b>
</p>
Find Organism Name in Code
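As a rough illustration of this step, the Python sketch below scans markup for italicized text that looks like a binomial name. It is an assumption-laden sketch, not the N4L name-capture method: it assumes names are marked up in `<em>` elements as in the abstract above, and the pattern accepts only "Genus species" or abbreviated "G. species" forms.

```python
import re

# Text inside <em>...</em>, allowing line breaks inside the element.
EM_PATTERN = re.compile(r"<em>(.*?)</em>", re.DOTALL)
# Assumed shape of a binomial: capitalized genus (optionally abbreviated
# with a period) followed by a lowercase species epithet.
BINOMIAL = re.compile(r"^[A-Z][a-z]*\.? [a-z]+$")


def candidate_names(html):
    """Return italicized strings that look like binomial names, in order."""
    names = []
    for match in EM_PATTERN.finditer(html):
        text = " ".join(match.group(1).split())  # normalize whitespace
        if BINOMIAL.match(text) and text not in names:
            names.append(text)
    return names


abstract = (
    "alterobactin A and B from <em>Alteromonas luteoviolacea</em>. "
    "<em>A. luteoviolacea</em> is found in oligotrophic<sup>10</sup> waters "
    "and was studied <em>in vitro</em>."
)
print(candidate_names(abstract))
```

A pattern match only yields candidates; deciding which candidates are validly published names still requires a lookup against a curated list.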
<p><b>VIRTUALLY all microorganisms . . .
. . . We report here the first structure determination, to our knowledge, of the
siderophores from an open-ocean bacterium, alterobactin A and B from
<n4l:checkupdate doi="10.1601/nm.2821">
<em>Alteromonas luteoviolacea</em></n4l:checkupdate>.
<em>A. luteoviolacea</em> is found in oligotrophic<sup>10</sup> and
coastal<sup>11</sup> waters. Alterobactin A has an exceptionally high affinity
constant for ferric ion. We suggest that at least some marine microorganisms may
have developed higher-affinity iron chelators as part of an efficient iron uptake
mechanism which is more effective than that of their terrestrial counterparts.</b>
</p>
Add N4L Tag Around Each Name to be Tracked
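The tagging step could be automated along the following lines. This is a minimal Python sketch under stated assumptions: the name-to-DOI table holds only the pair shown on the slide, the function name `tag_names` is hypothetical, and only full names marked up in `<em>` elements are wrapped. It does not represent the actual N4L tag libraries.

```python
import re

# Name-to-DOI pair taken from the slide; a real table would come from
# a curated source.
NAME_TO_DOI = {"Alteromonas luteoviolacea": "10.1601/nm.2821"}


def tag_names(html, name_to_doi=NAME_TO_DOI):
    """Wrap each italicized full name in an n4l:checkupdate element."""
    for name, doi in name_to_doi.items():
        # Allow any whitespace (including line breaks) between the words.
        words = r"\s+".join(re.escape(word) for word in name.split())
        pattern = re.compile(r"<em>\s*" + words + r"\s*</em>")

        def wrap(match, doi=doi):
            return (
                '<n4l:checkupdate doi="' + doi + '">'
                + match.group(0)
                + "</n4l:checkupdate>"
            )

        html = pattern.sub(wrap, html)
    return html


before = "<em>Alteromonas luteoviolacea</em>. <em>A. luteoviolacea</em> is found"
print(tag_names(before))
```

As on the slide, the abbreviated form `A. luteoviolacea` is left untagged. The sketch does not check whether a name is already wrapped, so running it twice on the same text would nest the tags.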
Comparing identifiers
A single unambiguous string
A label that identifies an entity
ISBN 0-387-98771-1
ATCC 27126
L-681,572
A numbering scheme
A method of providing consistent syntax to denote a class membership of an entity.
A formal standard or industry convention
ISBN numbers follow an international industry convention
An arbitrary internal system
Collection accession numbers and sample tracking numbers are typically institution specific
Establishes a 1:1 correspondence between labels and members
Enumeration
The number or label is simply a string
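The ISBN example shows what a numbering scheme adds to a bare string: the syntax can be checked. A short Python sketch of the standard ISBN-10 check (weights 10 down to 1, sum divisible by 11) follows; the function name is hypothetical.

```python
def isbn10_is_valid(isbn):
    """Check an ISBN-10 using the weighted mod-11 rule."""
    chars = [c for c in isbn if c not in "- "]  # drop hyphens and spaces
    if len(chars) != 10:
        return False
    total = 0
    for weight, c in zip(range(10, 0, -1), chars):
        if c in "Xx" and weight == 1:
            value = 10          # "X" is allowed only as the check character
        elif c.isdigit():
            value = int(c)
        else:
            return False
        total += weight * value
    return total % 11 == 0


print(isbn10_is_valid("0-387-98771-1"))
```

For 0-387-98771-1 the weighted sum is 286, which is 26 × 11, so the label conforms to the scheme. An institution-specific accession number such as ATCC 27126 offers no comparable check outside the issuing collection.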
Comparing identifiers (cont.)
An infrastructure specification
A syntax by which an identifier can be expressed in a form suitable for use within a specific infrastructure.
Actionable identifiers
URI (URN and URL)
ISBN numbers as UPC/EAN identifiers
Does not mandate a method of creating labels
Does not create a managed environment
A fully implemented identifier system
Includes Unique identifiers
A formalized infrastructure
Management policies for registration, structured interoperable metadata, policy, and governance mechanisms.
Examples
UPC/EAN barcodes and RFID tags
Digital object identifiers (digital identifiers of objects)
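Expressing one identifier in the syntax of two infrastructures can be sketched in Python with the ISBN used earlier in this talk. The function names are hypothetical; the conversions follow the usual conventions (a `urn:isbn:` URN, and an EAN-13 formed from the 978 prefix plus a recomputed check digit).

```python
def isbn10_to_urn(isbn10):
    """Express an ISBN-10 as a URN in the isbn namespace."""
    return "urn:isbn:" + isbn10


def isbn10_to_ean13(isbn10):
    """Express an ISBN-10 as a 13-digit EAN (978 prefix, new check digit)."""
    chars = [c for c in isbn10 if c not in "- "]
    if len(chars) != 10 or not all(c.isdigit() for c in chars[:9]):
        raise ValueError("not an ISBN-10: %r" % isbn10)
    body = "978" + "".join(chars[:9])  # the old ISBN-10 check digit is dropped
    # EAN-13 check digit: weights alternate 1, 3 from the left.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(body))
    return body + str((10 - total % 10) % 10)


print(isbn10_to_urn("0-387-98771-1"))
print(isbn10_to_ean13("0-387-98771-1"))
```

Both outputs denote the same book. Neither conversion creates a registry, metadata, or governance, which is the difference between an infrastructure specification and a fully implemented identifier system.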
Globally unique identifiers (GUIDs)
ARKs
Archival resource keys
Strengths - opaque, require metadata, identify an object
Weakness - weak governance, not in widespread use, based on DNS, no guarantee of persistence, not a standard
LSID
Life science identifiers
Strengths - opaque, identify an object, actionable
Weakness - no governance, not in widespread use, based on DNS, no metadata requirement, no guarantee of persistence, not a standard
PURL/POI
Persistent URL/Persistent Object Identifier
Strengths - opaque, identify an object, actionable, require metadata
Weakness - no governance, not in widespread use, based on DNS, no guarantee of persistence, not a standard
Timeline for next phase of N4L development
Gantt chart: tasks 1.1-1.6, 2.1-2.5, and 3.1-3.4 plotted across twelve quarters (Q1-Q4, three times)
Database migration
Incorporate partner data
Tag libraries
Upgrade web hub
Automate name capture
Ontology & schema development
Deploy fault tolerant system
Deploy to broader publications
Determine FRS
Change message system
Client-side automated message processing
PDF tools
Partner content links
Long-tail storefront
New targets for N4L implementation
Ramifications of misunderstanding a name or label
Wrong assumptions, assertions, or hypotheses
Misdiagnosis of infectious diseases
Misapplication of public policies
Highly significant
Significant
Lost opportunities
Failure to reach customers potentially interested in marketed content, goods, and services at point of need.
The long-tail phenomenon*
Names trigger specific responses
But, the concepts to which names apply are not static
May not always map 1:1
May require expertise for accurate interpretation
NamesforLife
The solution
Leverages recent development in persistent identifier technology
Unique mechanism
Occurrence of a name triggers contextually appropriate retrieval services
Uses embedded N4L-DOIs for content discovery
Provides semantic enablement of existing content at minimal cost to data and content providers
Through a unique combination of
Identifiers
Persistence
Resolution
Competing activities
Technologies
Initiatives
TDWG/GBIF
Focus on museums and herbaria
Not supported by nomenclatural bodies
Advocating use of LSID
Attempting to duplicate the IDF/RA infrastructure to avoid cost of DOI
LSID - lack governance, persistence, limited usage
Use covered in PCT/US 2005/001688
Can use N4L to resolve DOI <-> LSID
CrossRef - bibliographic service, complementary to N4L
TIB - DOIs on earth science data sets
Not directly citable, only on supplementary data
No semantic metadata (Dublin Core)
Semantic web - bio-ontology initiatives
NCBO, OBO, HCLS, GO, MGED
Focus on gene and genome annotation
Not designed for automated reasoning
Interactivity and navigation, heatmaps as a GUI
Accessing the NamesforLife information objects