1 outline standardization - necessary components –what information should be exchanged –how the...

1

Outline

• Standardization - necessary components
– what information should be exchanged
– how the information should be exchanged
– common terms (ontologies)
– common ways of describing data processing
– how to query information

• ArrayExpress
– public repository for microarray data
– www.ebi.ac.uk/arrayexpress

2

What information should be exchanged?

• MIAME - Minimum Information About a Microarray Experiment
– informal specification
– paper published in Nature Genetics
– goal - to initiate discussion:

• which details are important and which may not be

3

Ultimate dream

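Minimum information is the following table:

[Figure: a table with samples along one axis and genes along the other. Cells: gene expression levels (in mRNA counts/cell). Genes carry pointers to (a) well-established gene database(s); samples carry pointers to a well-established sample ontology.]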
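The table described on this slide can be sketched as a small data structure. This is only an illustration: the class name `ExpressionTable`, the gene identifiers and the sample terms are all hypothetical, standing in for pointers into a real gene database and a real sample ontology.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExpressionTable:
    """Genes x samples table of expression levels (mRNA counts per cell)."""
    gene_ids: List[str]        # hypothetical pointers into a gene database
    sample_terms: List[str]    # hypothetical pointers into a sample ontology
    levels: List[List[float]]  # levels[i][j]: gene i measured in sample j

    def __post_init__(self):
        # The table must be rectangular: one row per gene, one column per sample.
        if len(self.levels) != len(self.gene_ids):
            raise ValueError("expected one row per gene")
        for row in self.levels:
            if len(row) != len(self.sample_terms):
                raise ValueError("expected one column per sample")

    def level(self, gene_id: str, sample_term: str) -> float:
        """Look up the expression level of one gene in one sample."""
        i = self.gene_ids.index(gene_id)
        j = self.sample_terms.index(sample_term)
        return self.levels[i][j]


table = ExpressionTable(
    gene_ids=["GENE:0001", "GENE:0002"],
    sample_terms=["SAMPLE:a", "SAMPLE:b"],
    levels=[[12.0, 3.5], [0.0, 48.0]],
)
print(table.level("GENE:0002", "SAMPLE:b"))
```

Everything MIAME asks for beyond this table exists because real measurements are not yet in comparable units such as mRNA counts per cell.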

4

Currently: MIAME six parts

1. Experimental design: the set of the hybridisation experiments as a whole

2. Array design: each array used and each element (spot) on the array

3. Samples: samples used, the extract preparation and labeling

4. Hybridizations: procedures and parameters

5. Measurements: images, quantitation, specifications

6. Controls: types, values, specifications
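As a rough illustration of how the six parts could act as a checklist, the sketch below reports which parts a submission leaves empty. The key names and the `missing_parts` helper are assumptions made for this example; MIAME itself is an informal specification and prescribes no such data format.

```python
# The six MIAME parts, using hypothetical key names.
MIAME_PARTS = (
    "experimental_design",
    "array_design",
    "samples",
    "hybridizations",
    "measurements",
    "controls",
)


def missing_parts(submission: dict) -> list:
    """Return the MIAME parts that are absent or empty in a submission."""
    return [part for part in MIAME_PARTS if not submission.get(part)]


submission = {
    "experimental_design": "toy time course, 4 hybridisations",
    "array_design": "toy array, 100 spots",
    "samples": ["sample 1", "sample 2"],
    "hybridizations": ["hyb 1", "hyb 2"],
    "measurements": [],  # present but empty, so reported as missing
}
print(missing_parts(submission))
```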

5

MIAMExpress submission procedure

http://www.ebi.ac.uk/miamexpress

[Figure: submission workflow. Elements shown: Create account; Login; Pending/New Experiment; Samples 1…n (sample protocol); Extracts 1…n for each sample (extraction protocol); Hybridisations (hyb protocol); Arrays 1…n (scanning protocol); Data 1…n (image analysis protocol); Combined experiment data (transformation protocol); Final free text comment; Submit; MAGE-ML.]

6

How should the information be exchanged?

• MAGE-OM - MicroArray Gene Expression Object Model
– formal specification - UML (Unified Modeling Language) model
– described by a set of diagrams
– standardized through Object Management Group
– describes the domain of microarray data
– can serve as a source for generating various software artifacts

7

MAGE - brief history

• August 1997 - Life Sciences Research group formed within the Object Management Group

• March 2000 - gene expression RFP issued

• December 2000 - initial submissions of proposals for gene expression data standards:
– EBI (on behalf of MGED) - MAML
– Rosetta (on behalf of GEML community) - GEML + some IDLs
– NetGenics - IDLs

8

MAGE - brief history (2)

• Decision to proceed with a joint submission
• Decision to base the standard on UML
• Submitters’ meetings throughout 2001
• End of January 2002 - MAGE becomes an adopted specification
• October 2002 - MAGE becomes an available specification

• MAGE-ML - XML language - automatically derived from MAGE

• (More than) MIAME-compliant; only a subset can be used
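The idea of deriving an XML language mechanically from an object model can be sketched in a few lines. This is a toy, not the MAGE-ML generator: the classes `Experiment` and `Hybridization`, their fields, and the resulting element names are hypothetical and have not been checked against the MAGE-ML DTD.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields
from typing import List


@dataclass
class Hybridization:  # hypothetical model class
    identifier: str
    array: str


@dataclass
class Experiment:  # hypothetical model class
    identifier: str
    name: str
    hybridizations: List[Hybridization]


def to_xml(obj) -> ET.Element:
    """Derive an element from a model object: the class name becomes the tag,
    scalar fields become attributes, list fields become nested elements."""
    element = ET.Element(type(obj).__name__)
    for f in fields(obj):
        value = getattr(obj, f.name)
        if isinstance(value, list):
            container = ET.SubElement(element, f.name)
            for item in value:
                container.append(to_xml(item))
        else:
            element.set(f.name, str(value))
    return element


experiment = Experiment(
    "E-0001",
    "toy experiment",
    [Hybridization("H-1", "A-1"), Hybridization("H-2", "A-1")],
)
xml_text = ET.tostring(to_xml(experiment), encoding="unicode")
print(xml_text)
```

Because the mapping is driven entirely by the model, a change to the model changes the XML vocabulary without hand-editing, which is the property the slide attributes to MAGE-ML.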

9

MAGE – an example diagram

10

Use case of MAGE: ArrayExpress architecture

[Figure: architecture diagram. Components shown: ArrayExpress (Oracle); browser; MIAMExpress; MAGE-ML (DTD); MAGE-OM; MAGE-ML documents; data loader; Velocity template engine; Castor; object/relational mapping; web page templates; Java servlets; Tomcat.]

11

ArrayExpress Infrastructure

[Figure: infrastructure diagram. Components shown: ArrayExpress (Oracle) at EBI; MIAMExpress (MySQL); local MIAMExpress installations; submissions; MAGE-ML import, export; data pipelines carrying MAGE-ML; array manufacturers; LIMS; microarray software; data analysis software; other microarray databases; external bioinformatics databases; Expression Profiler (data analysis); queries; several connections labelled "www".]

12

Common terms (ontologies)

• What is an ontology?
– formal model of some domain
– simplest ontologies – controlled vocabularies
– hierarchical, other relations, constraints, …

• MGED Ontology
• maintained by Chris Stoeckert, UPenn
• enables:
– unambiguous annotation
– therefore, queries

• currently sample description
• experiment design description to come
• multiple formats: RDFS, DAML+OIL
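A minimal sketch of why a controlled, hierarchical vocabulary makes annotation unambiguous and therefore queryable. The terms, the `PARENT` table and the sample names are invented for illustration and are not taken from the MGED Ontology.

```python
# Hypothetical vocabulary: each term maps to its parent term (None for the root).
PARENT = {
    "sample": None,
    "cell line": "sample",
    "tissue": "sample",
    "tumour tissue": "tissue",
}


def validate(term: str) -> str:
    """Reject annotations that are not in the controlled vocabulary."""
    if term not in PARENT:
        raise ValueError(f"unknown term: {term!r}")
    return term


def is_a(term: str, ancestor: str) -> bool:
    """True if term equals ancestor or lies below it in the hierarchy."""
    while term is not None:
        if term == ancestor:
            return True
        term = PARENT[term]
    return False


# Sample annotations, validated against the vocabulary on entry.
annotations = {
    "s1": validate("cell line"),
    "s2": validate("tumour tissue"),
    "s3": validate("tissue"),
}


def samples_annotated_with(ancestor: str) -> list:
    """Query: all samples whose annotation falls under the given term."""
    return sorted(s for s, t in annotations.items() if is_a(t, ancestor))


print(samples_annotated_with("tissue"))
```

With free-text annotation the same query would depend on guessing every spelling and synonym a submitter might have used.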

13

Ontologies and ArrayExpress

• Curation team
– led by Helen Parkinson
– currently 5 curators

• Curation tool under development
– management of all relevant ontologies “under one roof”
– support in distributed ontology development
– submission tracking
– accession numbers
– ...

14

Common ways of describing data processing

• no “deliverables” yet

• MAGE can describe data processing
– just syntax, too much free text

• Laboratory Activity Broker process within OMG - common points?

• problem:
– it is possible to come up with a universal framework that can describe all possible scenarios of data processing
– however, how will it be used in real life?

15

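[Figure: workflow model. A workflow is built from process types and data types (in/out) with parameters; workflow enactment produces process instances (clustering, pattern discovery, visualization, data filtering, ...) that take data and parameter values in and give data out.]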
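One way to read the workflow model is sketched below: process types are reusable steps, and enacting a workflow yields process instances that record which step ran with which parameter values. All names here (`ProcessType`, `ProcessInstance`, `enact`, the two example steps) are hypothetical and do not come from MAGE or any OMG specification.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class ProcessType:
    """A reusable kind of processing step: data in, data out."""
    name: str
    run: Callable[..., Any]  # called as run(data, **parameters)


@dataclass
class ProcessInstance:
    """A record of one enacted step and the parameter values it used."""
    process_type: str
    parameters: Dict[str, Any]


def enact(workflow: List[Tuple[ProcessType, Dict[str, Any]]], data: Any):
    """Run each step in order; return the final data and the instance log."""
    log = []
    for process_type, parameters in workflow:
        data = process_type.run(data, **parameters)
        log.append(ProcessInstance(process_type.name, dict(parameters)))
    return data, log


def filter_low(values, threshold):
    """Example step: drop values below a threshold."""
    return [v for v in values if v >= threshold]


def log2_transform(values):
    """Example step: take the base-2 logarithm of every value."""
    return [math.log2(v) for v in values]


workflow = [
    (ProcessType("data filtering", filter_low), {"threshold": 2.0}),
    (ProcessType("log transformation", log2_transform), {}),
]
result, log = enact(workflow, [1.0, 2.0, 4.0, 8.0])
print(result)
for instance in log:
    print(instance)
```

The log is the structured counterpart of the free-text processing descriptions mentioned earlier: it states what was done, in what order, and with which parameters.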

16

Benefits

• compile “best practices” of data analysis

• document what has been done to obtain final results

• enable “high-throughput” data analysis work

17

How to query information

• again no “deliverables”

• initial plan - MAGE would include query support
– all methods were dropped, leaving only a data model

• ArrayExpress - 2 large components:
– repository - retrieve experiments as units, MAGE-based
– warehouse - gene- & data-oriented queries, work across experiments

• G2G (Jason Stewart) - protocol + query language for distributed queries
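The difference between the two ArrayExpress components can be illustrated with a toy in-memory database. The schema, accession numbers and values below are invented for this sketch and do not reflect the actual ArrayExpress schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE experiment (accession TEXT PRIMARY KEY, description TEXT);
CREATE TABLE measurement (accession TEXT, gene TEXT, sample TEXT, log_ratio REAL);
""")
conn.executemany(
    "INSERT INTO experiment VALUES (?, ?)",
    [("EXP-1", "toy time course"), ("EXP-2", "toy knockout")],
)
conn.executemany(
    "INSERT INTO measurement VALUES (?, ?, ?, ?)",
    [
        ("EXP-1", "geneA", "s1", 1.5),
        ("EXP-1", "geneB", "s1", -0.2),
        ("EXP-2", "geneA", "s7", 2.0),
        ("EXP-2", "geneA", "s8", 0.1),
    ],
)


def fetch_experiment(accession):
    """Repository-style access: return one whole experiment as a unit."""
    description = conn.execute(
        "SELECT description FROM experiment WHERE accession = ?", (accession,)
    ).fetchone()[0]
    rows = conn.execute(
        "SELECT gene, sample, log_ratio FROM measurement "
        "WHERE accession = ? ORDER BY gene, sample",
        (accession,),
    ).fetchall()
    return {"accession": accession, "description": description, "measurements": rows}


def gene_across_experiments(gene, min_log_ratio):
    """Warehouse-style access: one gene, filtered by value, across experiments."""
    return conn.execute(
        "SELECT accession, sample, log_ratio FROM measurement "
        "WHERE gene = ? AND log_ratio >= ? ORDER BY accession, sample",
        (gene, min_log_ratio),
    ).fetchall()


print(fetch_experiment("EXP-1"))
print(gene_across_experiments("geneA", 1.0))
```

The first function never looks inside the data, which is why it can stay close to the exchange model; the second needs the numeric values indexed by gene, which motivates a separate warehouse.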

18

[Figure: groups of queryable properties. Labels shown: ratio, absolute, change, confidence measure; name, design element type; species, sample type; bioassay type; performer lab, exper. type; array design name, platform type, provider.]

19

Summary

Problems | Components | ArrayExpress
What to exchange | MIAME | MIAMExpress, a tool that helps to capture important annotations
How to exchange | MAGE | can import/will be able to export MAGE-ML; DB schema based on MAGE
Common terms | MGED ontology | curation team; curation tool being developed
Data processing | ongoing work | data analysis modules built on top (Expression Profiler)
Queries | ongoing work | repository (experiments as units); warehouse (for numeric/gene-based queries, cross-experiment)
