www.ebi.ac.uk/arrayexpress EBI is an Outstation of the European Molecular Biology Laboratory. Anatomy ontology evaluation @ ArrayExpress. Helen Parkinson, PhD



Page 1:

Anatomy ontology evaluation @ ArrayExpress

Helen Parkinson, PhD

Page 2:

Content

• ArrayExpress use cases
• Fuzzy matching of ontology terms
• Data driven ontology building
• Wish list

Page 3:

ArrayExpress: Overview

[Diagram labels: Submit; Hybs; Experiment queries; Public/Private; ATLAS; Summarize; Public Only; Re-annotate; Gene queries; Genes; Cross expt/species queries]

Page 4:

Fuzzy matching of ontology terms – why?

• Clean up ArrayExpress OE and synonym tables
• OE-based integration
• Constrain OEs on data entry/validation
• Improved searches in repository/DW web interface
• Data integration across species, experiments and experimental designs
• Automated mapping of free text to ontology terms for data import

Page 5:

Phonetic Matching

• Precompute phonetic encodings of all terms in the ontology
• Match each target term by comparing these encodings
• Soundex: Robert Russell and Margaret Odell (1918), famously described by Donald Knuth
• Double Metaphone: Lawrence Philips (2000)
• Metaphone: Lawrence Philips
• Most matches are single
• Highest success rate
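The precompute-then-compare approach above can be sketched as follows. This is a minimal illustration using American Soundex, not the ArrayExpress implementation: the function names (`soundex`, `phonetic_key`, `build_index`, `match_term`) and the sample terms are hypothetical, and encoding multi-word terms word by word is an assumption.

```python
from collections import defaultdict

# Standard American Soundex digit groups.
_GROUPS = {"1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r"}
_CODES = {ch: digit for digit, letters in _GROUPS.items() for ch in letters}


def soundex(word):
    """Return the four-character American Soundex code of a single word."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    encoded = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w are skipped and do not separate equal codes
        code = _CODES.get(ch, "")  # vowels and y map to "" and do separate codes
        if code and code != prev:
            encoded += code
        prev = code
    return (encoded + "000")[:4]  # pad or truncate to letter + three digits


def phonetic_key(term):
    """Assumption: a multi-word term is keyed by the Soundex code of each word."""
    return tuple(soundex(w) for w in term.split())


def build_index(ontology_terms):
    """Precompute the phonetic key of every ontology term."""
    index = defaultdict(list)
    for term in ontology_terms:
        index[phonetic_key(term)].append(term)
    return index


def match_term(target, index):
    """Return all ontology terms that share the target's phonetic key."""
    return index.get(phonetic_key(target), [])


# Illustrative terms only; not taken from the ArrayExpress tables.
index = build_index(["liver", "kidney", "cerebral cortex", "hypothalamus"])
print(match_term("kidny", index))            # ['kidney']
print(match_term("cerebrel cortex", index))  # ['cerebral cortex']
print(match_term("spleen", index))           # []
```

A result list of length zero, one, or more than one corresponds to a "none", "single", or "multiple" match respectively.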

Page 6:

Algorithm comparisons

[Chart: SAEL vs. AE OrganismPart. Algorithms compared: Soundex, Metaphone, Double Metaphone, Levenshtein. Axis: 0% to 100%. Legend: none, multiple_bad, multiple_okay, single_bad, single_okay, valid]
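Levenshtein, the one non-phonetic algorithm in the comparison, scores two strings by the number of single-character insertions, deletions, and substitutions needed to turn one into the other. The sketch below shows one way it could be used for term matching; `closest_terms`, the `max_distance=2` threshold, and the sample terms are assumptions for illustration and are not taken from the slides.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution, or free if equal
            ))
        previous = current
    return previous[-1]


def closest_terms(target, ontology_terms, max_distance=2):
    """Return the ontology terms at the smallest distance, if within max_distance.

    max_distance is a hypothetical cut-off chosen for this example.
    """
    target = target.lower()
    scored = [(levenshtein(target, t.lower()), t) for t in ontology_terms]
    best = min((d for d, _ in scored), default=None)
    if best is None or best > max_distance:
        return []
    return [t for d, t in scored if d == best]


# Illustrative terms only.
terms = ["liver", "kidney", "cerebral cortex", "hypothalamus"]
print(closest_terms("hypothalmus", terms))  # ['hypothalamus']
print(closest_terms("root", terms))         # []
```

Unlike the phonetic encodings, edit distance cannot be reduced to a precomputed lookup key, so in this naive form each target is compared against every ontology term.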

Page 7:

Percent matches using automated mapping

Page 8:

Failures to match

• Species (or Kingdom)-specific terms (e.g. plant anatomy)
• Conflated terms (e.g. diseased cell types)
• Compound terms (e.g. "cerebral cortex and hypothalamus")
• Genuinely missing terms
• Esoteric terms less of a priority
• Most trivial misspellings, however, were matched
• Dirty input data

Page 9:

Implications

• Need more terms in some commonly-used ontologies
• Synonyms are important
  • generating less noise
  • better coverage
• Choice of ontology can limit expressivity; this will be frustrating to biologists

Page 10:

Why?

• Clean up ArrayExpress OE and synonym tables
• Add accessions/DB links to these tables
• Constrain OEs on data entry/validation
• Improved searches in repository/DW web interface
• Generate suggestions for new OE terms
• Evaluate domain coverage by a given ontology
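Evaluating domain coverage can be pictured as tallying, for a set of annotation terms, how many are already valid ontology terms and how many find one, several, or no fuzzy candidates. The sketch below is a hedged illustration of that bookkeeping: the category names follow the chart legend, but treating "valid" as an exact match is an assumption, the function names are hypothetical, and the okay/bad split is not modelled because it needs manual review. The standard-library `difflib.get_close_matches` stands in for whichever matcher is actually used.

```python
from collections import Counter
import difflib


def classify(term, ontology_terms, fuzzy_candidates):
    """Assign one annotation term to valid / single / multiple / none."""
    if term in ontology_terms:
        return "valid"  # assumption: "valid" means an exact match
    candidates = fuzzy_candidates(term)
    if not candidates:
        return "none"
    return "single" if len(candidates) == 1 else "multiple"


def coverage(annotations, ontology_terms, fuzzy_candidates):
    """Return the percentage of annotation terms falling in each category."""
    counts = Counter(classify(t, ontology_terms, fuzzy_candidates) for t in annotations)
    total = sum(counts.values())
    return {category: 100.0 * n / total for category, n in counts.items()}


# Illustrative data only; the 0.8 cutoff is an arbitrary choice for the example.
ontology = {"liver", "kidney", "hypothalamus"}
annotations = ["liver", "kidny", "leaf"]


def fuzzy(term):
    return difflib.get_close_matches(term, sorted(ontology), n=5, cutoff=0.8)


print(coverage(annotations, ontology, fuzzy))
```

Terms that land in "none" are candidates for new ontology terms or synonyms, which ties this tally to the suggestion-generation item in the list above.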

Page 11:

ArrayExpress Ontology Development and Future Directions

21.04.23

Developing the Ontology

• Define Scope: ArrayExpress already has some useful structure given the current database, plus a rich source of use cases and competency questions.
• Build: Ontology Capture: Identify key concepts and relationships within our domain and give explicit definitions to these features:
  • Middle-out approach – specify a core of basic terms, then specialise and generalise as required
  • Mappings – text mining approach to do initial semi-automated mappings to external resources for rapid coverage
  • Manual mapping for data warehouse data and selected data sets

Page 12:

Capture to Code: Definitions and Hierarchy

Page 13:

Semantic Roadmap

• Position of the ArrayExpress Experimental Factor Ontology in the ‘bigger picture’

[Diagram labels: AE Ontology; Disease Ontology; Common Anatomy Reference Ontology; Cell Type Ontology; Chemical Entities of Biological Interest (ChEBI); NCI; Various Species Anatomy Ontologies]

• Key is orthogonal coverage, reuse of existing resources and shared frameworks

Page 14:

Wish list

• NOT to build our own anatomy ontology
• CARO extension
• CARO evaluation
• Mapping CARO to relevant multi-species ontologies
• Application of CARO to ArrayExpress data
• Use of CARO in ArrayExpress tools

Page 15:

Acknowledgments

ArrayExpress Production Team

• Anna Farne
• Ele Holloway
• James Malone
• Margus Lukk
• Helen Parkinson
• Tim Rayner
• Faisal Rezwan
• Eleanor Williams
• Mengyao Zhao
• Holly Zheng