SysMO-DB and ISA
Katy Wolstencroft, University of Manchester, UK
Data Exchange in SysMO
Public data sources: model organism databases (e.g. SGD), BRENDA, ...
Data produced by SysMO: SABIO-RK, iChiP, MeMo, ...
Local databases and files: Excel spreadsheets, the most common format for experimental data
[Figure labels: Metadata, Proteomics, Metabolomics, Microarray, Single Cell Data]
Variable descriptions of data
Little adoption of community controlled vocabulary terms
Challenges
Enable data to be easily exchanged and integrated
Preserving project autonomy
Working with existing resources
Wikis; CMS (Alfresco, eGroupWare, MediaWiki); Databases (BASE, maxD); Files and Spreadsheets
Falling in with common work practices
Exploiting existing resources in the community
Extracting Data
[Figure labels: COSMIC, BaCell-SysMO, SysMOLab, MOSES; Alfresco, Wiki, another datastore; JERM]
JERM: “Just Enough Results Model”
Minimum information to exchange data
What type of data is it? Microarray, growth curve, enzyme activity, ...
What was measured? Gene expression, OD, metabolite concentration, ...
What do the values in the datasets mean? Units, time series, repeats, ...
Which experiment does it relate to?
How was the data created? SOPs and protocols
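As a rough illustration of the questions above, the minimum information could be held in a small record with a check for unanswered questions. This is only a sketch: the class name `JermRecord`, its field names and the example values are hypothetical and are not taken from the actual JERM specification.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class JermRecord:
    """Hypothetical container for the minimum information listed on the slide."""
    data_type: str = ""     # what type of data is it, e.g. "growth curve"
    measured: str = ""      # what was measured, e.g. "OD"
    value_meaning: str = "" # units, time series, repeats
    experiment: str = ""    # which experiment the data relates to
    sops: List[str] = field(default_factory=list)  # how the data was created

    def missing(self) -> List[str]:
        """Return the names of the questions that are still unanswered."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Illustrative values only.
record = JermRecord(
    data_type="growth curve",
    measured="OD",
    value_meaning="absorbance, time series, three repeats",
)
print(record.missing())
```

A record with empty entries would be held back from exchange until the remaining questions are answered, which is one simple reading of "just enough".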
The Idea
For each data type: Transcriptomics, Proteomics, Metabolomics, Single Cell Data
Generate and apply: JERM template; JERM extractor for data host; subset registered in SEEK; access / export through JERM interface / template
Define a JERM: top-down analysis of standards; bottom-up analysis of practice
ISA-TAB
For publishing
JERM data needs to be related to SOPs, experimental context and other data
JERM must be “MIBBI” compliant for exporting to public repositories, e.g. microarray data needs to be MIAME compliant
CIMR Core Information for Metabolomics Reporting
MIABE Minimal Information About a Bioactive Entity
MIACA Minimal Information About a Cellular Assay
MIAME Minimum Information About a Microarray Experiment
MIAME/Env MIAME / Environmental transcriptomic experiment
MIAME/Nutr MIAME / Nutrigenomics
MIAME/Plant MIAME / Plant transcriptomics
MIAME/Tox MIAME / Toxicogenomics
MIAPA Minimum Information About a Phylogenetic Analysis
MIAPAR Minimum Information About a Protein Affinity Reagent
MIAPE Minimum Information About a Proteomics Experiment
MIARE Minimum Information About a RNAi Experiment
MIASE Minimum Information About a Simulation Experiment
MIENS Minimum Information about an ENvironmental Sequence
MIFlowCyt Minimum Information for a Flow Cytometry Experiment
MIGen Minimum Information about a Genotyping Experiment
MIGS Minimum Information about a Genome Sequence
MIMIx Minimum Information about a Molecular Interaction Experiment
MIMPP Minimal Information for Mouse Phenotyping Procedures
MINI Minimum Information about a Neuroscience Investigation
MINIMESS Minimal Metagenome Sequence Analysis Standard
MINSEQE Minimum Information about a high-throughput SeQuencing Experiment
MIPFE Minimal Information for Protein Functional Evaluation
MIQAS Minimal Information for QTLs and Association Studies
MIqPCR Minimum Information about a quantitative Polymerase Chain Reaction experiment
MIRIAM Minimal Information Required In the Annotation of biochemical Models
MISFISHIE Minimum Information Specification For In Situ Hybridization and Immunohistochemistry Experiments
STRENDA Standards for Reporting Enzymology Data
TBC Tox Biology Checklist
BioPAX: Biological Pathways Exchange http://www.biopax.org/
FuGE Functional Genomics Experiment
MGED: Microarray Experimental Conditions
http://www.mibbi.org/index.php/MIBBI_portal
Minimum Information Models
Investigation Title         Invasive vs. non-invasive strains of yeast
Experimental Design         individual_genetic_characteristics_design    growth_condition_design
Experimental Factor Name    EF_Genotype                                  EF_GrowthCond
Experimental Factor Type    genotype                                     growth_condition
Person Last Name            Falstaff                                     Shakespeare
Person First Name           John                                         Bill
Person Roles                submitter;investigator                       investigator
Experiment Description      An experiment was performed to...
Protocol Name               Yeast Growth                                 RNA extraction
Protocol Type               grow                                         nucleic_acid_extraction
Protocol Description        S. cerevisiae cultures were grown on...      Total cellular RNA was extracted...
Protocol Parameters         carbon source;temperature
SDRF File                   my_sdrf_file.txt
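The example above appears to follow a row-oriented tabular layout: a field name in the first cell, followed by one or more values. Assuming the file is tab-delimited, as such spreadsheet-friendly formats usually are, a minimal reader could look like the sketch below. The function name `parse_idf` and the embedded sample are illustrative assumptions, not part of any official tooling.

```python
import csv
import io

# Illustrative sample, assumed tab-delimited: first cell is the field name.
SAMPLE = (
    "Investigation Title\tInvasive vs. non-invasive strains of yeast\n"
    "Experimental Factor Name\tEF_Genotype\tEF_GrowthCond\n"
    "Experimental Factor Type\tgenotype\tgrowth_condition\n"
    "Person Last Name\tFalstaff\tShakespeare\n"
    "Person First Name\tJohn\tBill\n"
    "SDRF File\tmy_sdrf_file.txt\n"
)


def parse_idf(text):
    """Map each field name to the list of non-empty values on its row."""
    result = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or not row[0].strip():
            continue  # skip blank rows
        result[row[0].strip()] = [c.strip() for c in row[1:] if c.strip()]
    return result


idf = parse_idf(SAMPLE)
# Values in the same column position belong together, e.g. one person per column.
people = list(zip(idf["Person First Name"], idf["Person Last Name"]))
print(people)
```

Reading by column position is what ties "John" to "Falstaff" and "EF_Genotype" to "genotype"; a reader that ignores position would lose those pairings.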
ISA-TAB
Relating data and its experimental context
Investigation, Study, Assay
TAB = tabular: a format suitable for spreadsheets
“assists in the reporting and local management of experimental metadata (i.e. sample characteristics, technologies used, type of measurements) from studies employing one or a combination of technologies;
facilitates submission to international public repositories of genomics, transcriptomics and proteomics studies”
Originally developed for multiple ‘omics data
Current situation @ EBI
[Figure: existing production systems, ArrayExpress and Pride. Stages: submission, storage, retrieval.
ArrayExpress: transcriptomics data files + required experimental descriptors; MGED standards; MAGE-TAB, MIAMExpress, MAGE-ML.
Pride: proteomics data files + required experimental descriptors; HUPO-PSI standards; ProteomeHarvest, PSI-XML(s).]
NO common representation of complex studies
Independent databases, different metadata representation, format, diverse terminologies etc.
ISA Provides
A common framework for describing how your data relates to its experimental context
A common framework for relating different types of data
ISA Provides
Cross walking between the Omics data stores
Relating microarrays and proteomics etc. if they are part of the same study
Providing a single mechanism for submission to multiple data silos
ISA Defined
Investigation: high level description of the area and the main aims of a project
Study: a particular biological hypothesis or analysis
Assay: specific, individual experiments required to be undertaken together in order to address the study hypotheses
ISA in SysMO
Investigation: main aims of SysMO projects
Analysis of Central Carbon Metabolism of Sulfolobus solfataricus under varying temperatures
Study: a collection of experiments designed to answer a particular biological question
Comparison of S. solfataricus grown at 70 and 80 °C
Assay: individual experiments in the study
Comparison of transcriptome at 70 and 80 °C (cDNA microarray)
Comparison of proteome at 70 and 80 °C (Protein expression profiling)
Enzyme activity tests for S. solfataricus (Assay types)
Intracellular metabolomics of S. solfataricus at 70 and 80 °C (Metabolomics)
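The Investigation / Study / Assay nesting in this example can be sketched as three small record types. The class and attribute names below are assumptions chosen for illustration; they are not the SEEK data model or the ISA-TAB file layout.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    title: str
    assay_type: str
    data_files: List[str] = field(default_factory=list)  # linked data files


@dataclass
class Study:
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    studies: List[Study] = field(default_factory=list)

    def assay_types(self) -> List[str]:
        """Collect the distinct assay types used anywhere in the investigation."""
        return sorted({a.assay_type for s in self.studies for a in s.assays})


study = Study(
    "Comparison of S. solfataricus grown at 70 and 80 degrees C",
    [
        Assay("Comparison of transcriptome at 70 and 80 degrees C", "cDNA microarray"),
        Assay("Comparison of proteome at 70 and 80 degrees C", "protein expression profiling"),
        Assay("Intracellular metabolomics at 70 and 80 degrees C", "metabolomics"),
    ],
)
investigation = Investigation(
    "Analysis of Central Carbon Metabolism of Sulfolobus solfataricus "
    "under varying temperatures",
    [study],
)
print(investigation.assay_types())
```

Because every assay hangs off a study, data from different technologies can be related simply by walking up to the shared study or investigation.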
ISA in SysMO
Assays linked to data files
Data files linked together
Assays and data files linked to protocols and SOPs
ISA data is available to all in consortium
Data files and SOPs may be shared or kept private
Advantages
A common structure across consortium
Can be bundled together with data files to produce a common export format
Allows automated submission to public omics stores: ArrayExpress, Pride etc.
Requires SysMO consortium members to only record metadata once
SEEK + JERM
[Figure labels: Experimental Data; Metadata; People; Projects; Investigation; Study; Assay; Experimental conditions; Factors studied; Models; SOPs; Workflows; Based on ISA-TAB]
Homogenised terminology and values in the datasets themselves
Acknowledgements
SysMO-DB Team
SysMO-PALS
myGrid, EML and JWS Online teams
OMII-UK, Uni Southampton
EMBL-EBI, MCISB