MGED Ontology Working Group
MGED4
Boston, MA
Feb. 15, 2002
Chris Stoeckert, Center for Bioinformatics, U. Penn
Helen Parkinson, EBI
Agenda
• Overview of ontologies
• Status of MGED Ontology
• Incorporating ontologies into microarray database annotation forms - Helen Parkinson
• Discussion
  – Annotation experience
  – Use Cases: needs besides retrieving experiments?
  – Issues:
    • Missing concepts? (quick tour of ontology)
    • Relationship between MAGE and MGED ontology
What Does an Ontology Do?
• Captures knowledge
• Creates a shared understanding – between humans and for computers
• Makes knowledge machine processable
• Makes meaning explicit – by definition and context
From Building and Using Ontologies, Robert Stevens, U. of Manchester
What is an Ontology?
[Figure: spectrum of ontology types, in order of increasing formality: Catalog/ID; Terms/glossary; Thesauri ("narrower term" relation); Informal is-a; Formal is-a; Formal instance; Frames (properties); Value restrictions; Disjointness, inverse, part-of…; General logical constraints]
From Building and Using Ontologies, Robert Stevens, U. of Manchester
Uses of Ontology
• Community reference -- neutral authoring.
• Either defining database schema or defining a common vocabulary for database annotation -- ontology as specification.
• Providing common access to information. Ontology-based search by forming queries over databases.
• Understanding database annotation and technical literature.
• Guiding and interpreting analyses and hypothesis generation.
From Building and Using Ontologies, Robert Stevens, U. of Manchester
Components of an Ontology
• Concepts: Class of individuals – The concept Protein and the individual 'human cytochrome C'
• Relationships between concepts
• "Is a kind of" relationship forms a taxonomy
• Other relationships give further structure – "is a part of"
• Axioms – Disjointness, covering, equivalence, …
From Building and Using Ontologies, Robert Stevens, U. of Manchester
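The components listed on this slide can be sketched as a small data structure. This is only an illustrative toy, not how the MGED ontology or any ontology tool is implemented: the class `Ontology`, its method names, and the concepts `Macromolecule`, `NucleicAcid` and `AminoAcidResidue` are assumptions added for the example. Only the concept Protein and the individual 'human cytochrome C' come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Toy ontology: is-a and part-of links, disjointness axioms, individuals."""
    is_a: dict = field(default_factory=dict)         # concept -> set of parent concepts
    part_of: dict = field(default_factory=dict)      # concept -> set of containing concepts
    disjoint: list = field(default_factory=list)     # frozensets of mutually exclusive concepts
    instance_of: dict = field(default_factory=dict)  # individual -> set of asserted concepts

    def ancestors(self, concept):
        """All concepts reachable by following is-a links upward (the taxonomy)."""
        seen, stack = set(), [concept]
        while stack:
            for parent in self.is_a.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def concepts_of(self, individual):
        """Asserted concepts of an individual plus everything they inherit from."""
        result = set(self.instance_of.get(individual, ()))
        for concept in list(result):
            result |= self.ancestors(concept)
        return result

    def violations(self, individual):
        """Disjointness axioms that the individual's concepts break."""
        concepts = self.concepts_of(individual)
        return [axiom for axiom in self.disjoint if axiom <= concepts]


onto = Ontology()
onto.is_a["Protein"] = {"Macromolecule"}        # hypothetical parent concept
onto.is_a["NucleicAcid"] = {"Macromolecule"}    # hypothetical sibling concept
onto.part_of["AminoAcidResidue"] = {"Protein"}  # hypothetical part-of link
onto.disjoint.append(frozenset({"Protein", "NucleicAcid"}))
onto.instance_of["human cytochrome C"] = {"Protein"}

print(sorted(onto.concepts_of("human cytochrome C")))  # ['Macromolecule', 'Protein']
print(onto.violations("human cytochrome C"))           # []
```

Asserting that the same individual is also a `NucleicAcid` would make `violations` return the disjointness axiom, which is the kind of nonsense association an axiom is meant to catch.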
Languages
• Vocabularies using natural language
  – Hand crafted, flexible but difficult to evolve, maintain and keep consistent, with weak semantics
  – Gene Ontology
• Object-based KR: frames
  – Extensively used, good structuring, intuitive. Semantics defined by OKBC standard
  – EcoCyc (uses Ocelot) and RiboWeb (uses Ontolingua)
• Logic-based: Description Logics
  – Very expressive, model is a set of theories, well defined semantics
  – Automatic derived classification taxonomies
  – Concepts are defined and primitive
From Building and Using Ontologies, Robert Stevens, U. of Manchester
Microarray Information to be Captured
Figure from: David J. Duggan et al. (1999) Expression Profiling using cDNA microarrays. Nature Genetics 21: 10-14
MGED Ontology Working Group Goals
1. Identify concepts
2. Collect available controlled vocabularies and ontologies for concepts
3. Define concepts
4. Formalize concept relationships
Relationship of MGED Efforts
[Figure: relationship of MGED efforts. Labels: MAGE, MIAME, DB, External Ontologies/CVs, MGED Ontology; Annotation, Format, Ontologies (External, Internal)]
Ontologies provide common terms and their definitions for describing microarray experiments.
http://www.cbil.upenn.edu/Ontology/
Species Resources
Concept Definitions
Usage of Concepts and Resources for Microarrays
• MIAME glossary
  – Provide definitions for types of information (concepts) listed in MIAME
• MIAME qualifier, value, source
  – Provide pointers to relevant sources that can be used to annotate experiments
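A "qualifier, value, source" entry can be pictured as a simple three-field record. The sketch below is an assumption about one convenient in-memory shape, not a format defined by MIAME or MAGE. The type `QVS` and the helper `values_for` are hypothetical names; the values are taken from the sample annotation example shown in this talk.

```python
from typing import NamedTuple


class QVS(NamedTuple):
    """One 'qualifier, value, source' annotation entry (hypothetical layout)."""
    qualifier: str  # type of information, i.e. a concept defined in the glossary
    value: str      # the term chosen by the annotator
    source: str     # controlled vocabulary or ontology the term was taken from


sample = [
    QVS("Organism", "Mus musculus musculus", "NCBI Taxonomy, id: 39442"),
    QVS("StrainOrLine", "C57BL/6N",
        "International Committee on Standardized Genetic Nomenclature for Mice"),
    QVS("OrganismPart", "Liver", "Mouse Anatomical Dictionary"),
    QVS("Compound", "Fenofibrate", "ChemIDplus, CAS 49562-28-9"),
]


def values_for(annotation, qualifier):
    """Return every value recorded under the given qualifier."""
    return [entry.value for entry in annotation if entry.qualifier == qualifier]


print(values_for(sample, "OrganismPart"))  # ['Liver']
```

Keeping the source next to each value is what lets a reader, or a program, resolve the term against the external resource it came from.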
MIAME Section on Sample Source and Treatment
sample source and treatment ID as used in section 1
organism (NCBI taxonomy)
additional "qualifier, value, source" list; the list includes:
  cell source and type (if derived from primary source(s))
  sex
  age
  growth conditions
  development stage
  organism part (tissue)
  animal/plant strain or line
  genetic variation (e.g., gene knockout, transgenic variation)
  individual
  individual genetic characteristics (e.g., disease alleles, polymorphisms)
  disease state or normal
  target cell type
  cell line and source (if applicable)
  in vivo treatments (organism or individual treatments)
  in vitro treatments (cell culture conditions)
  treatment type (e.g., small molecule, heat shock, cold shock, food deprivation)
  compound
  is additional clinical information available (link)
  separation technique (e.g., none, trimming, microdissection, FACS)
laboratory protocol for sample treatment
An example of microarray sample annotation using the MGED ontology
Susanna A. Sansone, Helen Parkinson, Philippe Rocca-Serra, Chris Stoeckert and Alvis Brazma
MGED Ontology class: Instance [External Reference]
BioMaterialDescription
Biosource Property
Organism: Mus musculus musculus id: 39442 [NCBI Taxonomy]
Age: 7 weeks after birth
DevelopmentStage: Stage 28 [Mouse Anatomical Dictionary]
Sex: Female
StrainOrLine: C57BL/6N [International Committee on Standardized Genetic Nomenclature for Mice]
BiosourceProvider: Charles River, Japan
OrganismPart: Liver [Mouse Anatomical Dictionary]
BioMaterialManipulation
EnvironmentalHistory
CultureCondition
Temperature: 22 ± 2 °C
Humidity: 55 ± 5%
Light: 12 hours light/dark cycle
PathogenTests: Specified pathogen free conditions
Water: ad libitum
Nutrients: MF, Oriental Yeast, Tokyo, Japan
Treatment
CompoundBasedTreatment
(Compound): Fenofibrate, CAS 49562-28-9 [ChemIDplus]
(Treatment_application): in vivo, oral gavage
(Measurement): 100 mg/kg body weight
MAGE BioMaterial Model
MGED Biomaterial Ontology
• Under construction
  – Using OILed (Not wedded to any one tool)
  – Generate multiple formats: RDFS, DAML+OIL
• Define classes, provide relations and constraints, identify instances
• Motivated by MIAME and coordinated with MAGE
http://www.ontoknowledge.org/oil/
Building a Microarray Ontology
http://www.cbil.upenn.edu/Ontology/Build_Ontology2.html
http://mged.sourceforge.net/Ontologies.shtml
Ontology in Browseable Form
Example of Internal Terms
Example of External Terms
Example of Combined Internal and External: Treatment
OWG Use Cases
• Make it easier and more accurate to annotate a microarray experiment.
  – Build forms that provide menus of terms and links to external resources.
  – Only ask for relevant terms and fill in terms that can be inferred.
• Return a summary of all experiments that use a specified type of biosource.
  – Use “age” to select and order experiments
  – Use Mouse Anatomical Dictionary Stage 28 to pick experiments according to “organism part”
• Return a summary of all experiments done examining effects of a specified treatment
  – E.g., Look for “CompoundBasedTreatment”, “in vivo”
  – Select “Compound” based on CAS registry number
  – Order based on “CompoundMeasurement”
• ? Use to check if “MIAME-compliant.”
  – Assess only fields that are relevant
  – Check for proper use of terms
• ? Build gene networks based on biomaterial description
  – Generate a distance metric based on biosource and use in calculation of correlation with gene expression level
  – Generate an error estimation based on biosample (i.e., even when biosources are identical, there will be variation resulting from different treatments)
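The two retrieval use cases can be sketched as filters over annotated experiments. Everything structural here is an assumption made for illustration: the record layout, the field names, the experiment ids, and the second and third records are hypothetical, and no actual database schema or query interface is implied. Only the fenofibrate values (CAS 49562-28-9, in vivo, 100 mg/kg, liver, 7 weeks) come from the sample annotation in this talk.

```python
# Hypothetical annotated experiments; only exp-A mirrors the example annotation.
experiments = [
    {"id": "exp-A", "organism_part": "Liver", "age_weeks": 7,
     "treatment": {"type": "CompoundBasedTreatment", "application": "in vivo",
                   "cas": "49562-28-9", "dose_mg_per_kg": 100}},
    {"id": "exp-B", "organism_part": "Liver", "age_weeks": 12,
     "treatment": {"type": "CompoundBasedTreatment", "application": "in vivo",
                   "cas": "49562-28-9", "dose_mg_per_kg": 30}},
    {"id": "exp-C", "organism_part": "Kidney", "age_weeks": 5, "treatment": None},
]


def by_biosource(exps, organism_part):
    """Experiments on one organism part, ordered by age of the biosource."""
    hits = [e for e in exps if e["organism_part"] == organism_part]
    return sorted(hits, key=lambda e: e["age_weeks"])


def by_compound_treatment(exps, cas, application="in vivo"):
    """Compound-based treatments with a given CAS number, ordered by dose."""
    hits = [
        e for e in exps
        if e["treatment"] is not None
        and e["treatment"]["type"] == "CompoundBasedTreatment"
        and e["treatment"]["application"] == application
        and e["treatment"]["cas"] == cas
    ]
    return sorted(hits, key=lambda e: e["treatment"]["dose_mg_per_kg"])


print([e["id"] for e in by_biosource(experiments, "Liver")])                # ['exp-A', 'exp-B']
print([e["id"] for e in by_compound_treatment(experiments, "49562-28-9")])  # ['exp-B', 'exp-A']
```

Both queries only work because every experiment uses the same class names and the same identifier scheme for the compound, which is the point of annotating with shared terms.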
MGED Ontology Plans
• More Concepts? Improve definitions?
  – Extend to other parts of MIAME
• More instances!
• Add identifiers to all classes (facilitate neutral authoring). Instances?
• Add constraints. Prevent nonsense associations (e.g., only time units for age)
• Write a paper describing and explaining MGED ontology by next meeting with example applications and datasets.
  – Mechanism to establish a consensus “standard.”
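The planned constraint "only time units for age" could look roughly like the check below. This is a sketch under stated assumptions: the table `ALLOWED_UNITS`, the unit spellings, and the function `check_unit` are hypothetical and are not part of the MGED ontology as presented.

```python
# Hypothetical constraint table: ontology class -> units that make sense for it.
TIME_UNITS = {"hours", "days", "weeks", "months", "years"}
ALLOWED_UNITS = {
    "Age": TIME_UNITS,
    "Temperature": {"C"},
    "Humidity": {"%"},
}


def check_unit(class_name, unit):
    """True if the unit is permitted for the class, or the class is unconstrained."""
    allowed = ALLOWED_UNITS.get(class_name)
    return allowed is None or unit in allowed


print(check_unit("Age", "weeks"))  # True
print(check_unit("Age", "mg/kg"))  # False: a nonsense association
```

An annotation form could run such a check before accepting an entry, so that only units valid for the chosen class are offered or stored.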