Biodiversity Information Standards: are we going wrong, or just not quite right?

Post on 24-Feb-2016

Biodiversity Information Standards: are we going wrong, or just not quite right?

Jim Croft

Australian National Herbarium

Centre for Plant Biodiversity Research

Australian National Botanic Gardens

Parks Australia

Taxonomy Research and Information Network

Department of the Environment, Water, Heritage and the Arts

TDWG IN AUSTRALIA

[Map slides: Australian cities and institutions. Cities shown: Hobart, Devonport, Launceston, Adelaide, Perth, Melbourne, Townsville, Armidale, Darwin, Brisbane, Lismore, Orange, Sydney, Canberra, Maroochydore, Gosford, Alice Springs. Slide titles include "INSTITUTIONS – Northern Territory" and "INSTITUTIONS – Queensland".]

• Australian National Insect Collection (CSIRO)
• Australian National Herbarium (CSIRO)
• Australian National Wildlife Collection (CSIRO)
• GAUBA Herbarium
• Australian Biological Resources Study

Australian examples

• Australian Plant Name Index
  – Australian Plant Census
• Australian Fauna Directory
• Australia’s Virtual Herbarium
• Online Zoological Catalogue of Australian Museums
• Flora of Australia On-line
• Atlas of Living Australia
• Identify Life
• Taxonomy Research and Information Network

HISCOM

• Herbarium Information Systems Committee
  – Representatives at TDWG 2008
    – Ben Richardson, Alex Chapman (PERTH)
    – Bill Barker (AD)
    – Alison Vaughan (MEL)
    – Karen Wilson (NSW)
    – Donna Lewis (DNA)
    – Jerry Cooper (CHR, NZ)
    – Helen Thompson (ABRS)
    – Greg Whitbread, Jim Croft (CANB)
  – The crucible of biodiversity informatics creativity

TDWG principle # 0

• A good idea has a thousand fathers

• A bad one is a bastard

TDWG: making anarchy chaos the standard

TDWG principle # VI-a

“Before the beginning of great brilliance, there must be chaos.

Before a brilliant person begins something great, they must look foolish in the crowd.”

- I Ching

TDWG: the art of herding cats

TDWG: changing standards, or making change the standard?

TDWG: Standardizing stuff...

or stuffing standards?

Outline

• What is TDWG?
• TDWG and ‘Standards’
• Where TDWG Standards are needed
• Some TDWG projects
• TDWG Standards compliance
• Tensions for TDWG
• Future

WHAT IS TDWG?

TDWG Mission

• Develop, adopt and promote standards and guidelines for the recording and exchange of data about organisms

• Promote the use of standards through the most appropriate and effective means and

• Act as a forum for discussion through holding meetings and through publications

Who are we?

‘TDWG is us’

Who are we?

• Intersection of specimens, taxonomy, knowledge, information management

• Biologists, taxonomists, computer scientists
  – Each with an interest in the other’s domains
  – Each with something to offer each other’s domains

Who are we?

• If TDWG did not exist, we would have to invent it

• Successful
  – Enduring
  – Popular
  – Moderately well recognized

When are we?

• Phases of TDWG
  – Phase 0 (1985)
    • seemed like a good idea at the time
  – Phase 1 (first decade)
    • Data dictionaries, data models
  – Phase 2 (second decade)
    • E-R models, DIGIR, DwC, XML, etc.
  – Phase 3 (nowish)
    • Schemas, ontologies, RDF
  – Phase 4 (?)
    • ?

Why are we?

• Collaboration and sharing are essential
  – Taxonomy has become too big
  – Too diverse
  – Too complex
  – No one person can do it all
  – A ‘complete’ treatment requires collaboration
  – Collaboration requires consistency, standards

** notes ** Biodiversity Tower of Babel

Why are we?

• Untangle the ‘biodiversity Babel’

• Develop common communication

• Harness efficiency of collaboration

• Economic pressures to reduce duplication

Why are we?

• Science of information meets science of information technology

• Take advantage of new technology

• Taxonomy needs to be seen to be evolving

• “Business as usual is not an option”

Why are we?

• An annual excuse to meet in warm places when it is cold elsewhere?

Where do we fit?

[Figure (xkcd.com), labelled: taxonomists, computerists, TDWG, informaticists]

Where have we come from?

• Frustrated taxonomists
  – Looking for a better way
  – Largely self taught
• Bored computer scientists
  – Looking for excitement, challenge
• Misfits and visionaries
  – In search of a ‘Brave New World’
• Egomaniacs
  – In search of glory, fame, power, riches

What are we now?

• Frustrated taxonomists
  – Looking for a better way
  – Largely self taught
• Bored computer scientists
  – Looking for excitement, challenge
• Misfits and visionaries
  – In search of a ‘Brave New World’
• Egomaniacs
  – In search of glory, fame, power, riches

Where are we going?

?

Where are we going?

• Did we go wrong?
  – Where did we go wrong?
  – Why did we go wrong?
• Lost the plot?
  – Regain credibility?
    • Our community?
    • Our funders?
    • Ourselves?

Where are we going?

• Perceptions of TDWG?
  – First decade
    • Taxonomists organizing their domain
    • Content focused
    • Understandable by taxonomists
  – Second decade
    • Taxonomists reaching limitations
    • Engaging technologists
    • Protocol and systems focussed
    • Opaque to taxonomists
  – Third decade?

Where are we going?

• Perceptions of TDWG?
  – First decade
    • Content
    • Data dictionaries
    • Lists, vocabularies
  – Second decade
    • Protocols
    • Formats, structure
    • Applications
  – Third decade?
    • Ontologies?
    • Semantics?

Where are we going?

• What should TDWG be about?

– The data?

– The technology?

– The applications?

– The community?

TDWG Impediments

• Resources, funds
• Time
• Impetus, will, drive
• Complexity, domain knowledge
• Conservatism
• Rivalry
• Intellectual property, revenue advantage

THE TDWG VISION

A vision for TDWG

• Our domain in biodiversity?
  – Taxonomy?
  – Systematics?
  – Collections?
  – Biodiversity?
  – Publications?
  – Knowledge Management?
  – Knowledge discovery?
  – All of the above?

A vision for TDWG

• Our Community?
  – Herbaria and museums?
  – Researchers?
  – Government and policy?
  – Conservation agencies? NGOs?
  – Natural resource management?
  – Education?
  – Public?
  – All of the above?

A vision for TDWG

• Our questions?
  – What is it? How can I find out?
  – What does it look like?
  – Where does it occur?
  – Was it still there? When?
  – What occurs there with it?
  – What might occur there with it?
  – What is it related to?
  – Who says so?
  – How? Why?
  – All of the above?

A vision for TDWG

• Our Products?
  – Data content standards?
  – Data storage standards?
  – Data communications protocols?
  – Data management applications?
  – Data management infrastructure?
  – Data visualization applications?
  – Data analysis applications?
  – All of the above?

Knowledge pyramid

[Figure: pyramid with levels labelled The Real World, Samples, Data, Information, Knowledge, Wisdom]

TDWG AND STANDARDS

What is a standard?

• In common English:
  – A flag
  – An upright pole or beam
  – A backing for currency
  – American automobile
  – A bush on a long stalk
  – An ideal to be judged against
  – Model of authority or excellence
  – A basis for comparison
  – 1,980 board feet of wood
  – A newspaper
  – An established norm

What is a standard?

• Rarely implies:
  – Requirement
  – Obligation
  – Compulsion
  – Compliance
  – ‘The law’
• But not so ‘technical standards’
  – Specify behaviour
  – Mandate behaviour

What is a standard?

• “an explicit set of requirements to be satisfied by a material, product, or service”

- (ASTM International)

TDWG STANDARDS

TDWG Standards categories

• Technical specification (TS) (3)
  – Protocol, service, procedure, format
• Applicability statement (AS) (1 draft)
  – How a tech. spec. might be applied
• Best current practice (BCP) (0)
  – A description of good behaviour
• Data standard (DS) (0)
  – Content or controlled vocabularies

TDWG Standards status

• Current standard
  – (3)
• Current 2005 Standard
  – (3?)
• Draft Standard
  – (3)
• Prior Standard
  – (7 tech specs; 6 data standards)
• Retired Standard
  – (0)

THE STANDARDS PROCESS

ISO Standards process

• ISO standards are:

  – Consensus
  – Industry wide
  – Voluntary

ISO Standards process

• 0 preliminary
  – Study period underway
• 1 proposal
  – New project under consideration
• 2 preparatory
  – Working draft(s) under consideration
• 3 committee
  – Committee draft(s) under consideration
• 4 approval
  – Final draft standard under consideration
• 5 publication
  – Standard prepared for publication

TDWG Standards process

• TDWG standards are:

  – Consensus
  – Community wide (+/-)
  – Voluntary

TDWG Standards Process

TDWG STANDARDS PRESENT

TDWG standards – present

• ABCD
  – Access to biological collections data
• SDD
  – Structured Descriptive Data
• TCS
  – Taxon Concept Schema

Not bad for 22 years work...

TDWG STANDARDS PAST

TDWG standards - past

• ‘Prior Standards’

• Technical Specs (protocol stds):
  – HISPID 3 (now on v.5)
  – POSS (Plant Occurrence and Status)
  – Economic Botany Data Collection Std
  – Plant Names in Botanical Databases
  – XDF – language for definition and exchange
  – ITF – Botanic Gardens Records
  – DELTA

TDWG standards - past

• ‘Prior Standards’

• Data standards (Content stds)
  – Authors of Plant Names
  – World Geographic Scheme for Plant Distributions
  – Botanico-Periodicum-Huntianum
  – Index Herbariorum
  – Floristic Regions of the World
  – TL2 – Taxonomic Literature and suppl.

TDWG STANDARDS FUTURE

TDWG standards – future

• ‘Draft standards’
  – Real soon now
• Standards documentation spec.
  – The standard way to do standards
• LSID Applicability Statement
  – How to do LSIDs
• NCD
  – Natural Collections Description
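As a rough illustration of what "doing LSIDs" involves at the simplest level, the sketch below splits a Life Science Identifier into its parts, assuming the general form urn:lsid:authority:namespace:object with an optional trailing revision. The function name, the tuple type and the example identifier are hypothetical and are not taken from the TDWG applicability statement.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    """Components of a Life Science Identifier (illustrative type only)."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text: str) -> Lsid:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into parts."""
    parts = text.split(":")
    # Expect the two prefix parts, three mandatory parts, and at most one revision.
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"LSID has an empty component: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(authority, namespace, object_id, revision)


# Hypothetical identifier, for illustration only.
example = parse_lsid("urn:lsid:example.org:names:12345:1")
print(example.authority, example.namespace, example.object_id, example.revision)
```

Parsing is only the syntactic part; resolution, persistence and the data/metadata distinction are where an applicability statement would be expected to do most of its work.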

TDWG standards – future

• Watch this space?

• Observation data
  – Occurrence without specimens?
  – Ecological metadata language
• Phylogenetics data
  – Phylogeny repositories
  – Trees of life
  – Phylocode

TDWG standards – future

• Watch this space?

• SPM – Species Profile Model
  – Online Journals; On-line Floras
  – Interactive Keys

• Images and multimedia

• Ethnobotany ontology

TDWG standards – future

• How are we going to manage this?
  – Activities straddle many standards
  – Potential for duplication, conflict
• Technical Architecture Group
  – Ontologies
  – Vocabularies
  – Conflict identification, resolution
  – Evaluation, advice, recommendations

WHERE TDWG STANDARDS ARE NEEDED

Where are TDWG standards needed?

• Nomenclature
• Taxonomy
• Bibliographic
• Specimens
• Identification
• Description
• Images
• Multimedia
• Occurrence
• Spatial
• Observation
• Molecular
• Phylogeny
• People
• Institutions
• etc.

Where are TDWG standards needed?

• The problem:

• TDWG activities have been activity and discipline based
  – ABCD as an example
    • Names, taxa, specimens, places, people, etc.
• Need to look at data from an ontological perspective
  – Data based
    • Not activity based

TDWG – the 3-legged stool

• (definition of ‘stool’?)

• GUIDs
• Ontologies
• Exchange protocols

TDWG – the 3-legged stool

• Management cliche

• Planning
• Money
• Management
---
• Production
• Marketing
• Administration
---
• etc

TDWG – the 3-legged stool

TDWG STANDARDS COMPLIANCE

TDWG standards compliance

• Pretty poor
  – Within institutions / projects
  – Between institutions / projects

• Partial compliance is not compliance

• Enhancement is not compliance

• Extension is not compliance
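The strict reading above, where partial, enhanced and extended records all fall short of compliance, can be sketched as a simple check. This is illustrative only: the field names and both field sets are hypothetical and are not drawn from any TDWG standard.

```python
# Hypothetical field sets standing in for whatever a real standard would define.
REQUIRED_FIELDS = frozenset({"catalog_number", "scientific_name", "collector"})
OPTIONAL_FIELDS = frozenset({"locality"})
ALLOWED_FIELDS = REQUIRED_FIELDS | OPTIONAL_FIELDS


def compliance_report(record: dict) -> dict:
    """Report missing required fields and fields the standard does not define."""
    keys = set(record)
    missing = sorted(REQUIRED_FIELDS - keys)   # partial compliance
    extensions = sorted(keys - ALLOWED_FIELDS)  # local enhancement or extension
    return {
        "missing": missing,
        "extensions": extensions,
        # Compliant only when nothing is missing and nothing has been added.
        "compliant": not missing and not extensions,
    }


partial = {"catalog_number": "X 1", "scientific_name": "Aus bus"}
extended = {"catalog_number": "X 1", "scientific_name": "Aus bus",
            "collector": "A. Collector", "house_field": "local value"}
print(compliance_report(partial))
print(compliance_report(extended))
```

Whether a real standard should treat extensions this strictly is a policy choice; the sketch only shows that the check is cheap to state once the field sets are agreed.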

TDWG standards compliance

• Why not?
  – Too complicated?
  – Inappropriate?
  – Deficient?
  – Too costly to implement?
  – Conservatism?
  – Apathy?
  – Individual arrogance?
  – Institutional arrogance?

TDWG standards compliance

• Need for stability

• TDWG has a reputation
  – Pursuing the ‘bleeding edge’
  – “Keeping up with the Joneses”
  – Introducing new recommendations before old ones settled
  – Frustrating users
    • Especially smaller institutions

TDWG standards compliance

• Total cost of ownership
  – Ultra technical solutions
    • Rare specialist skills
    • Expensive contractors
  – Maintenance costs
  – Upgrade costs
  – Migration costs
  – Users get stuck

TDWG standards compliance

• What can be done?
  – Rationalization of standards?
  – More control of standards process?
  – Seek ‘appropriate technology’?
    • Not necessarily the best
  – Seek cheaper solutions?
  – Focus on the ontologies, not activities?
  – Apply institutional pressure?
  – Institutional mentorship and support?

THE TENSIONS FOR TDWG

Tensions in TDWG

• Taxonomy / technology
• Innovation / stability
• Innovation / conservatism
• Names / taxonomy
• Names / specimens
• Names / names
• Authority / credit
• Ownership / responsibility
• Data / metadata

Why not?

• Why not web 2.0 / 3.0?

• Why not annotations?

• Why not Wikipedia?

• Why not microformatting?

Disconnects

• Free access / ownership
  – Licensing, attribution, IP, credit
• Taxonomy / specimens
  – The big lie
• Concepts / names
  – Another big lie
• Linking taxa through basionyms
  – Another big lie
• Data / metadata
• Distributed systems vs cache

Metadata

• So-called ‘data about data’

• “One man’s data is another’s metadata”

• Not a good or inspiring look

• Need a common and agreed understanding in TDWG domain

Metadata

• Problem of LSID byte persistence
  – Applies to data
  – Does not apply to metadata
  – Redefine data as metadata?
  – Sophistry?
  – Distorting our ontologies?
• Need to sort this out
• Need to communicate the result

Metadata

Yesterday upon the stair
Metadata wasn't there
It wasn't there again today
How I wish it would go away

The 3 big lies

• Names and specimens
  – That there is some real connection between specimens bearing the same name
  – That distribution maps of specimens bearing the same name are meaningful
  – That identifications bearing the same name represent the same taxon
  – The ‘taxon concept problem’
  – Concept not explicit

The 3 big lies

• Names and concepts
  – That names somehow imply an unambiguous taxon concept
  – That a taxon concept can be inferred from a name
  – An assumption
  – The ‘taxon concept problem’
  – Concept not explicit

The 3 big lies

• Names and types
  – That if we are talking about names based on the same type they are the same taxon concept
  – That lists of names and synonyms based on the same type can be automatically merged
  – The ‘taxon concept problem’
  – Concept not explicit

The 3 big lies

• What can we do?
  – Taxon reporting not unambiguous
  – Our results are at best indicative
    • Users assume or infer concepts
  – Perhaps biggest problem in taxonomy and biodiversity informatics
  – Be absolutely rigorous in talking about names and named concepts
  – Educate taxonomists
  – Educate clients
    • Limitations of data, applications
    • Implications of using data, limitations
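The difference between a name and a named concept can be sketched in a few lines. The example below is a minimal illustration, assuming each identification optionally records the treatment ("according to") that defines the concept; the record type, the specimen identifiers, the name "Aus bus" and the cited treatments are all invented.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional, Tuple


class Identification(NamedTuple):
    """One determination on one specimen (hypothetical record shape)."""
    specimen: str
    name: str
    according_to: Optional[str]  # treatment defining the concept; None if unrecorded


def group_by_name(idents: List[Identification]) -> Dict[str, List[str]]:
    """Group specimens by name string alone, as most reporting does."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for ident in idents:
        groups[ident.name].append(ident.specimen)
    return dict(groups)


def group_by_concept(
    idents: List[Identification],
) -> Dict[Tuple[str, Optional[str]], List[str]]:
    """Group specimens by explicit concept: the name plus who defined it."""
    groups: Dict[Tuple[str, Optional[str]], List[str]] = defaultdict(list)
    for ident in idents:
        groups[(ident.name, ident.according_to)].append(ident.specimen)
    return dict(groups)


# Invented data: one name used in a broad sense, a narrow sense, and an unstated sense.
idents = [
    Identification("SPEC-1", "Aus bus", "Smith 1990"),
    Identification("SPEC-2", "Aus bus", "Jones 2005"),
    Identification("SPEC-3", "Aus bus", None),
]
print(group_by_name(idents))     # one group: the concepts are silently merged
print(group_by_concept(idents))  # three groups: the ambiguity is visible
```

Grouping by name alone yields a single set that looks authoritative, while grouping by concept exposes that the three records may not refer to the same taxon, which is the sense in which name-based results are at best indicative.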

TDWG value for money

• Are we worth it?
  – This meeting cost c. $1,000,000
    • Airfares, accommodation, salaries, etc.
  – What did we accomplish?
    • Tangibles?
    • Intangibles?
  – What have we produced so far?
    • 3 standards, several +/- standards
    • Compliance?
    • A ‘state of mind’?

TDWG value for money

• Can we do it better?
  – Can we do it cheaper, faster?
    • Use the wiki/listserv better
  – Accomplish more?
    • New standards
    • Better standards
  – Produce more?
    • New standards?
    • Retire standards?
    • Rationalize standards?

WHERE TO FROM HERE

Where to from here?

• Tools at our disposal
  – TDWG Executive
  – Technical Architecture Group
  – TDWG working groups
  – On-line forums, lists
  – Web and Wiki
  – On-line Journal

Where to from here?

• Increase TDWG Profile
  – ‘Market penetration’
  – Greater implementation, compliance
  – Attention to smaller institutions
    • ‘the long tail’
  – Multilingual standards
  – Strengthen partnerships, collaboration
    • GBIF, EoL, etc.
    • National initiatives

Where to from here?

• TAG
  – Coordination of standards
  – Ontologies
  – Resolve metadata issues
  – Retire or deprecate standards
• ‘Us’
  – Participation
  – Implementation
  – Compliance

Where to from here?

[Cartoon from xkcd.com]

TDWG – a glass half full

• TDWG has a lot to do
• But it has accomplished a lot
• Without the foundation of TDWG there could be:
  – No AVH
  – No ALA
  – No GBIF
  – No EoL
  – No [name your biodiversity acronym]

TDWG – a glass half full

• TDWG has strong participant support
  – c. 200 participants in TDWG 2008
• Key institutional engagement
  – International
  – National
  – Regional
  – Local
• Increasing demand for products
  – Global change, habitat depletion, etc.

TDWG Mission

• Develop, adopt and promote standards and guidelines for the recording and exchange of data about organisms

• Promote the use of standards through the most appropriate and effective means and

• Act as a forum for discussion through holding meetings and through publications

** notes **

TDWG?