EBI is an Outstation of the European Molecular Biology Laboratory.
MAGE-TAB - The ArrayExpress Production Experience
Helen Parkinson, PhD
www.ebi.ac.uk/arrayexpress
Content
• All change at ArrayExpress
• Data acquisition
• Validation
• Extension
• Downloads
• Long Term Future
• Tutorial – submitting in MAGETAB format
[Diagram; labels: MAGEML, AE, M.EXPRESS, MAGETABULATOR, Tracking, AE2, MAGETAB migration, MAGETAB]
Data acquisition
• MAGETAB data acquisition is integrated with existing tab2mage submissions
• MAGETAB export is being added to the MIAMExpress system
• All MAGE-ML submissions will be converted to MAGETAB
• We will unify data acquisition on MAGETAB
• We decided to do most curation/validation/ontology matching at the end for MAGETAB submissions
• MAGETAB makes curator edit and user update much easier
• Human readable tab delimited formats = efficient curation
• 1600 experiments processed (1600/3700)
• All curated
• Subset of ArrayExpress MAGETAB data will be re-curated at migration
Automated processing and validation
• Sections
• MAGETAB Column Headers
• MAGETAB Column Orders
• MAGETAB Content – length, terms
• External data files – released monthly
• vs. ArrayExpress content
• MIAME score
• DW candidates
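The column header and column order checks listed above can be sketched in a few lines of Python. This is only an illustration, not the ArrayExpress validation code: the function name `check_sdrf_header` is hypothetical, the header list is a small subset of the SDRF headers in the MAGE-TAB specification, and the ordering rule is simplified to the main node columns.

```python
import csv
import io
import re

# Illustrative subset of SDRF column headers (the specification defines more).
FIXED_HEADERS = {
    "Source Name", "Sample Name", "Extract Name", "Labeled Extract Name",
    "Hybridization Name", "Scan Name", "Array Data File", "Protocol REF",
    "Array Design REF", "Term Source REF", "Material Type", "Label",
    "Description",
}

# Headers that carry a bracketed qualifier, e.g. Characteristics[Organism].
BRACKETED = re.compile(
    r"(Characteristics|Factor Value|Parameter Value|Unit|Comment)\s*\[[^\]]+\]"
)

# Simplified assumption: node columns must appear in this left-to-right order.
NODE_ORDER = [
    "Source Name", "Sample Name", "Extract Name",
    "Labeled Extract Name", "Hybridization Name",
]


def check_sdrf_header(text):
    """Return a list of problems found in the first (header) row of an SDRF."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader, [])
    problems = []

    # Column header check: every header must be known or a valid bracketed form.
    for col in header:
        if col not in FIXED_HEADERS and not BRACKETED.fullmatch(col):
            problems.append(f"unrecognised column header: {col!r}")

    # Column order check: node columns that are present must keep NODE_ORDER.
    positions = [header.index(name) for name in NODE_ORDER if name in header]
    if positions != sorted(positions):
        problems.append("node columns are out of order")

    return problems
```

A call such as `check_sdrf_header(open("example.sdrf.txt").read())` (file name hypothetical) returns an empty list when the header row passes both checks, and a list of messages otherwise.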
Extensibility
• Solexa data • Proteomics• Metabolomics
• Array Genotype data (Gen2Phen)
• Association study data (Gen2Phen, Engage)
• Locus specific SNP data
• Clinical Data
• …..
Downloads
• All ArrayExpress data will be available in MAGETAB format now (exported direct from AE)
• ~90% is currently available and passes checks (issues with MAGE-OM->MAGETAB)
• More ontology term sources will be added incrementally – NCI thesaurus/OBI/ArrayExpress Factor Ontology
• Beta MAGETAB ArrayExpress Bioconductor Module (Huber, Kauffman)
• All MAGETAB generation code is available
• All validation code is available
Ontologies
• Working to develop OBI to replace MGED ontology
• Generating a sample/factor ontology for ArrayExpress based on data content
• Developed in Protégé/OWL format
• Will be served from OLS
• Also mapping to external ontologies for samples, e.g. NCI thesaurus
• Text mining to annotate external data using dictionaries based on NCI thesaurus and some custom ones (GEOimporter, tab2mage->MAGETAB)
• Data import, meta analysis
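Dictionary-based annotation of free text, as mentioned in the text mining bullet, can be sketched as follows. This is a minimal illustration under stated assumptions: the dictionary entries and the `EX:` identifiers are made up for the example and are not taken from the NCI thesaurus or from any ArrayExpress dictionary, and `annotate` is a hypothetical helper name.

```python
import re

# Hypothetical dictionary: synonym -> (preferred term, made-up identifier).
DICTIONARY = {
    "liver": ("liver", "EX:0001"),
    "hepatic tissue": ("liver", "EX:0001"),
    "breast cancer": ("breast carcinoma", "EX:0002"),
    "breast carcinoma": ("breast carcinoma", "EX:0002"),
}


def annotate(text, dictionary=DICTIONARY):
    """Return the distinct (term, identifier) pairs whose synonyms occur in text."""
    lowered = text.lower()
    found = []
    # Check longer synonyms first so multi-word matches are listed first.
    for synonym in sorted(dictionary, key=len, reverse=True):
        # Word boundaries stop "liver" from matching inside "delivered".
        if re.search(r"\b" + re.escape(synonym) + r"\b", lowered):
            term = dictionary[synonym]
            if term not in found:
                found.append(term)
    return found
```

Synonyms that map to the same preferred term are reported once, so a sample description using either "liver" or "hepatic tissue" ends up with the same annotation.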
Future: ArrayExpress and Community
• ArrayExpress Submission in MAGETAB ADF format
• All ArrayExpress ADF in MAGETAB format
• Alpha ArrayExpress-MAGETAB BioConductor MAGETAB importer
• AE2
• AE2 data migration
• More people post their MAGETAB examples and we agree on a gold standard validated set for typical cases
• Community lists of MAGETAB supportive tools where people can register their interests and describe their applications (like GO tools)
• Addressing HLA
• MAGETAB model, firm up the spec
• Decide what factors really are, and whether the MAGE case is still valid – controlled vs uncontrolled variables instead?
• Issues with global variables - inter experiment comparison of compounds needs to know dose even if dose doesn’t vary in an experiment
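The global variable issue in the last bullet can be made concrete with a small sketch. The sample annotations and the helper name `split_variables` are invented for illustration; the point is only that a value which is constant within one experiment (here, dose) is lost if only the varying factors are recorded, yet it is exactly what differs between experiments.

```python
def split_variables(rows):
    """Split annotation keys into those that vary across rows and those that do not.

    rows is a list of dicts, one per sample; every dict is assumed to have
    the same keys.
    """
    varying, constant = {}, {}
    for key in rows[0]:
        values = {row[key] for row in rows}
        if len(values) > 1:
            varying[key] = values
        else:
            constant[key] = values.pop()
    return varying, constant


# Two made-up experiments: time varies within each, dose does not.
experiment_a = [
    {"compound": "aspirin", "dose": "10 mg", "time": "1 h"},
    {"compound": "aspirin", "dose": "10 mg", "time": "4 h"},
]
experiment_b = [
    {"compound": "aspirin", "dose": "50 mg", "time": "1 h"},
    {"compound": "aspirin", "dose": "50 mg", "time": "4 h"},
]

varying_a, constant_a = split_variables(experiment_a)
varying_b, constant_b = split_variables(experiment_b)

# Within each experiment only "time" looks like a factor, but comparing the
# two experiments requires the constant "dose" values to be kept as well.
dose_differs = constant_a["dose"] != constant_b["dose"]
```

Here `varying_a` contains only `time`, while `dose` sits in `constant_a` and `constant_b` with different values, which is why a cross-experiment comparison needs the constant annotations too.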
Acknowledgments
• Anna Farne
• Ele Holloway
• James Malone
• Margus Lukk
ArrayExpress Production Team
• Helen Parkinson
• Tim Rayner
• Faisal Rezwan
• Eleanor Williams
• Mengyao Zhao
• Holly Zheng
• Mohammad Shojatalab
ArrayExpress Development Team
• Funding: EC - FELICS, EMERALD, Gen2Phen, MUGEN; NIH - MAGE grant
Tutorial
• Creation of MAGETAB templates
• Completion of a pre-made template
• Curation
• Scoring and validation templates
• Viewing Data in ArrayExpress
• Backend of the template generation/tracking system
• www.ebi.ac.uk/~parkinso/MAGETAB_tutorial/