Upload: raymond-chandler

Post on 17-Jan-2016

TRANSCRIPT

Page 1: 1 Outline Standardization - necessary components –what information should be exchanged –how the information should be exchanged –common terms (ontologies)

1

Outline

• Standardization - necessary components
– what information should be exchanged
– how the information should be exchanged
– common terms (ontologies)
– common ways of describing data processing
– how to query information

• ArrayExpress
– public repository for microarray data
– www.ebi.ac.uk/arrayexpress

2

What information should be exchanged?

• MIAME - Minimum Information About a Microarray Experiment
– informal specification
– paper published in Nature Genetics
– goal - to initiate discussion:
  • which details are important and which may not be

3

Ultimate dream

Minimum information is the following table:

[Figure: a table with genes as rows and samples as columns. Each cell holds a gene expression level (in mRNA counts/cell). Genes carry pointers to (a) well-established gene database(s); samples carry pointers to a well-established sample ontology.]
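
The table on this slide can be sketched as a small data structure. This is only an illustration under stated assumptions: the database name, ontology name, accessions, terms, and expression values below are invented placeholders, not real identifiers or measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneRef:
    """Pointer to an entry in a gene database (names are hypothetical)."""
    database: str
    accession: str


@dataclass(frozen=True)
class SampleRef:
    """Pointer to a term in a sample ontology (names are hypothetical)."""
    ontology: str
    term: str


# Rows of the table: genes. Columns: samples.
genes = [GeneRef("ExampleGeneDB", "G0001"), GeneRef("ExampleGeneDB", "G0002")]
samples = [
    SampleRef("ExampleSampleOntology", "liver"),
    SampleRef("ExampleSampleOntology", "kidney"),
]

# expression[i][j] is the level of genes[i] in samples[j],
# in mRNA counts per cell (invented numbers).
expression = [
    [12.0, 3.5],
    [0.0, 7.25],
]


def level(gene: GeneRef, sample: SampleRef) -> float:
    """Look up one cell of the table."""
    return expression[genes.index(gene)][samples.index(sample)]
```

Everything outside the numeric matrix is a reference into an external resource, which is what would make such tables comparable across laboratories.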

4

Currently: MIAME six parts

1. Experimental design: the set of the hybridisation experiments as a whole

2. Array design: each array used and each element (spot) on the array

3. Samples: samples used, the extract preparation and labeling

4. Hybridizations: procedures and parameters

5. Measurements: images, quantitation, specifications

6. Controls: types, values, specifications
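
As a rough sketch, the six parts can be treated as a checklist that a submission is tested against. The key names and the example submission below are assumptions made for illustration; MIAME itself is an informal specification and does not define such a data structure.

```python
# The six MIAME parts, written as dictionary keys (key names are hypothetical).
MIAME_PARTS = (
    "experimental_design",
    "array_design",
    "samples",
    "hybridizations",
    "measurements",
    "controls",
)


def missing_parts(submission: dict) -> list:
    """Return the MIAME parts that are absent or empty in a submission."""
    return [part for part in MIAME_PARTS if not submission.get(part)]


# An invented, incomplete draft submission.
draft = {
    "experimental_design": "time course, 4 hybridisations",
    "samples": ["s1", "s2"],
    "hybridizations": [],  # present but empty, so it counts as missing
}

print(missing_parts(draft))
# prints ['array_design', 'hybridizations', 'measurements', 'controls']
```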

5

MIAMExpress submission procedure

http://www.ebi.ac.uk/miamexpress

[Figure: submission flow. Create account, then Login, then Pending/New Experiment. The submitter describes Samples 1…n (Sample protocol), Extracts 1…n for each sample (Extraction protocol), Hybridisations (Hyb protocol), Arrays 1…n (Scanning protocol), Data 1…n (Image analysis protocol), and the Combined Experiment Data (Transformation protocol), adds a final free text comment, and submits. The output is MAGE-ML.]

6

How should the information be exchanged?

• MAGE-OM - MicroArray Gene Expression Object Model
– formal specification - UML (Unified Modeling Language) model
– described by a set of diagrams
– standardized through the Object Management Group
– describes the domain of microarray data
– can serve as a source for generating various software artifacts

7

MAGE - brief history

• August 1997 - Life Sciences Research group formed within the Object Management Group

• March 2000 - gene expression RFP issued

• December 2000 - initial submissions of proposals for gene expression data standards:
– EBI (on behalf of MGED) - MAML
– Rosetta (on behalf of GEML community) - GEML + some IDLs
– NetGenics - IDLs

8

MAGE - brief history (2)

• Decision to proceed with a joint submission
• Decision to base the standard on UML
• Submitters’ meetings throughout 2001
• End of January 2002 - MAGE becomes an adopted specification
• October 2002 - MAGE becomes an available specification

• MAGE-ML - XML language - automatically derived from MAGE

• (More than) MIAME-compliant; it is possible to use only a subset
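
The idea of an XML exchange document produced from an object model can be sketched in a few lines. This is a minimal illustration only: the element and attribute names (`ExampleExchange`, `Experiment`, `Sample`, `identifier`, `name`) are invented and are not taken from the MAGE-ML DTD, and the real MAGE-ML is generated from the MAGE model rather than written by hand like this.

```python
import xml.etree.ElementTree as ET


def build_document(experiment_id, sample_names):
    """Serialize a tiny experiment description to an XML string.

    All element and attribute names are hypothetical placeholders.
    """
    root = ET.Element("ExampleExchange")
    experiment = ET.SubElement(root, "Experiment", identifier=experiment_id)
    for name in sample_names:
        ET.SubElement(experiment, "Sample", name=name)
    return ET.tostring(root, encoding="unicode")


# Write the document, then read it back as a receiving database might.
xml_text = build_document("EXP-1", ["s1", "s2"])
parsed = ET.fromstring(xml_text)
names = [sample.get("name") for sample in parsed.iter("Sample")]
```

Because both sides agree on the document structure, the receiver can recover the same objects without knowing anything about the sender's internal database.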

9

MAGE – an example diagram

10

Use case of MAGE: ArrayExpress architecture

[Figure: ArrayExpress architecture. Components shown: ArrayExpress (Oracle); MAGE-OM; MAGE-ML (DTD); MAGE-ML documents; MIAMExpress; data loader; Castor; object/relational mapping; Velocity template engine; web page templates; Java servlets; Tomcat; browser.]

11

ArrayExpress Infrastructure

[Figure: ArrayExpress infrastructure. Components shown: ArrayExpress (Oracle) at EBI; MIAMExpress (MySQL); Expression Profiler (data analysis); queries and submissions over the web; MAGE-ML import and export; data pipelines; local MIAMExpress installations; other microarray databases; external bioinformatics databases; array manufacturers; LIMS; microarray software; data analysis software.]

12

Common terms (ontologies)

• What is an ontology?
– formal model of some domain
– simplest ontologies – controlled vocabularies
– hierarchical, other relations, constraints, …

• MGED Ontology
• maintained by Chris Stoeckert, UPenn
• enables:
– unambiguous annotation
– therefore, queries
• currently sample description
• experiment design description to come
• multiple formats: RDFS, DAML+OIL
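
The step from unambiguous annotation to queries can be sketched with a toy hierarchical vocabulary. The terms and the hierarchy below are invented for illustration and are not taken from the MGED Ontology; the sketch only assumes that each term has at most one broader term.

```python
# Each term maps to its broader term (None for the root). Invented terms.
BROADER = {
    "organism part": None,
    "organ": "organism part",
    "liver": "organ",
    "kidney": "organ",
    "blood": "organism part",
}


def ancestors(term):
    """Return all broader terms of a term, nearest first."""
    chain = []
    parent = BROADER[term]  # raises KeyError for terms outside the vocabulary
    while parent is not None:
        chain.append(parent)
        parent = BROADER[parent]
    return chain


def matches(annotation, query):
    """A sample matches if it is annotated with the query term or a narrower one."""
    return annotation == query or query in ancestors(annotation)


# Samples annotated with controlled terms instead of free text.
annotations = {"sample1": "liver", "sample2": "blood", "sample3": "kidney"}

hits = sorted(s for s, term in annotations.items() if matches(term, "organ"))
# hits is ['sample1', 'sample3']
```

With free-text annotation the same query would depend on spelling and wording; with a controlled vocabulary a term outside it is rejected at annotation time.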

13

Ontologies and ArrayExpress

• Curation team
– led by Helen Parkinson
– currently 5 curators

• Curation tool under development
– management of all relevant ontologies “under one roof”
– support in distributed ontology development
– submission tracking
– accession numbers
– ...

14

Common ways of describing data processing

• no “deliverables” yet

• MAGE can describe data processing
– just syntax, too much free text

• Laboratory Activity Broker process within OMG - common points?

• problem:
– it is possible to come up with a universal framework that can describe all possible scenarios of data processing
– however, how will it be used in real life?

15

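[Figure: two-level model of data processing. Type level: a workflow is built from process types, each with data types in and out and with parameters. Instance level: a workflow enactment is built from process instances (clustering, pattern discovery, visualization, data filtering, ...), each with data in and out and with parameter values.]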

16

Benefits

• compile “best practices” of data analysis

• document what has been done to obtain final results

• enable “high-throughput” data analysis work
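
The distinction between process types and process instances, and the benefit of documenting what has been done, can be sketched as follows. This is a minimal illustration, not a proposed standard: the class names, the two example process types, and the input values are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ProcessType:
    """A kind of processing step with named parameters (hypothetical class)."""
    name: str
    parameters: List[str]
    run: Callable[..., list]


@dataclass
class ProcessInstance:
    """One concrete use of a process type with fixed parameter values."""
    process_type: ProcessType
    parameter_values: Dict[str, float]

    def execute(self, data):
        missing = set(self.process_type.parameters) - set(self.parameter_values)
        if missing:
            raise ValueError(f"missing parameter values: {sorted(missing)}")
        return self.process_type.run(data, **self.parameter_values)


def enact(workflow, data):
    """Run the instances in order and record what was done at each step."""
    log = []
    for step in workflow:
        data = step.execute(data)
        log.append((step.process_type.name, dict(step.parameter_values)))
    return data, log


filtering = ProcessType(
    "data filtering", ["threshold"],
    lambda data, threshold: [x for x in data if x >= threshold],
)
scaling = ProcessType(
    "scaling", ["factor"],
    lambda data, factor: [x * factor for x in data],
)

result, log = enact(
    [ProcessInstance(filtering, {"threshold": 2.0}),
     ProcessInstance(scaling, {"factor": 0.5})],
    [1.0, 2.0, 4.0],
)
```

The `log` returned by `enact` is the machine-readable record of how the final result was obtained, which free-text protocol descriptions cannot provide.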

17

How to query information

• again no “deliverables”

• initial plan - MAGE will include query support
– all methods were dropped - MAGE is a data model only

• ArrayExpress - 2 large components:
– repository - retrieve experiments as units, MAGE-based
– warehouse - gene- and data-oriented queries, work across experiments

• G2G (Jason Stewart) - protocol + query language for distributed queries
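
The difference between the two ArrayExpress components can be sketched with two access patterns over the same data. The accessions, gene names, values, and function names below are invented for illustration and do not reflect the actual ArrayExpress schema or interfaces.

```python
# Invented data: accession -> gene -> measured values.
experiments = {
    "EXP-1": {"geneA": [1.0, 2.0], "geneB": [0.5, 0.4]},
    "EXP-2": {"geneA": [3.0], "geneC": [9.0]},
}


def repository_get(accession):
    """Repository style: return one experiment as a whole unit."""
    return experiments[accession]


def warehouse_gene(gene):
    """Warehouse style: collect values for one gene across all experiments."""
    return {
        accession: data[gene]
        for accession, data in experiments.items()
        if gene in data
    }


whole_experiment = repository_get("EXP-2")
across_experiments = warehouse_gene("geneA")
```

The repository call needs only an accession, while the warehouse call has to look inside every experiment, which is why the two are built as separate components.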

18

[Figure: groups of properties available for queries. Labels shown: ratio, absolute change, confidence measure; name, design element type; species, sample type; bioassay type; performer lab, experiment type; array design name, platform type, provider.]

19

Summary

Problems | Components | ArrayExpress
What to exchange | MIAME | MIAMExpress, a tool that helps to capture important annotations
How to exchange | MAGE | can import / will be able to export MAGE-ML; DB schema based on MAGE
Common terms | MGED ontology | curation team; curation tool being developed
Data processing | ongoing work | data analysis modules built on top (Expression Profiler)
Queries | ongoing work | repository (experiments as units); warehouse (for numeric/gene based queries, cross-experiment)