arrayexpress and gene expression atlas: mining functional genomics data emma hastings functional...

89
ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL [email protected] http://www.ebi.ac.uk/~emma/Zagreb/

Upload: angela-golden

Post on 28-Dec-2015

222 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data

Emma HastingsFunctional Genomics Team

EBI-EMBL

[email protected]://www.ebi.ac.uk/~emma/Zagreb/

Page 2: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

2

Services available from EBI

GenomesGenomes

DNA & RNA sequenceDNA & RNA sequence

Gene expressionGene expression

Protein sequenceProtein sequence

Protein families,

motifs and domains

Protein families,

motifs and domains

Protein structureProtein structure

Protein interactionsProtein interactions

Chemical entitiesChemical entities

PathwaysPathways

SystemsSystems

Literature and ontologiesLiterature and ontologies

www.ebi.ac.uk

Page 3: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

ArrayExpress3

Talk structure

Why do we need a database for functional genomics data?

ArrayExpress database

• Archive

• Gene Expression Atlas

Database content

Query the database

Data download

Data submission

Page 4: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

What is functional genomics?

Functional genomics is a field of molecular biology that attempts to make use of the vast wealth of data produced by genomic projects (such as genome sequencing projects) to describe gene (and protein) functions and interactions. Unlike genomics and proteomics, functional genomics focuses on the dynamic aspects such as gene transcription, translation, and protein-protein interactions, as opposed to the static aspects of the genomic information such as DNA sequence or structures. Functional genomics attempts to answer questions about the function of DNA at the levels of genes, RNA transcripts, and protein products.

ArrayExpress & Atlas4

Page 5: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

Components of a functional genomics experiment

• Array design information• Location of each element• Description of each element• Hybridization protocol

• Quantification matrix• Software specifications

• Sample source• Sample treatments• RNA extraction protocol• Labelling protocol

• Control array elements• Normalization method

• Image• Scanning protocol• Software specifications

ArrayArray

Normalized dataNormalized data

Raw dataRaw data

SampleSample SampleSample• Sample source• Sample treatments• Template preparation

LibraryLibrary

• Library preparation

ChipChip

• Cluster amplification

• Sequencing and imaging

Data analysisData analysis

• From images to sequences• Quality Control• Sequence alignment• Assembly• Specific steps depending on the application

Data analysisData analysis

Page 6: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

Why do we need a database for functional genomics data?

ArrayExpress & Atlas6

E-MEXP-2451

Transcription profiling of grape berries during ripening

Grape berries undergo considerable physical and biochemical changes during the ripening process. Ripening is characterized by a number of changes, including the degradation of chlorophyll, an increase in berry deformability, a rapid increase in the level of hexoses in the berry vacuole, an increase in berry volume, the catabolism of organic acids, the development of skin colour, and the formation of compounds that influence flavour, aroma, and therefore, wine quality. The aim of this work is to identify differentially expressed genes during grape ripening by microarray and real-time PCR techniques. Using a custom array of new generation, we analysed the expression of 6000 grape genes from pre-veraison to full maturity, in Vitis vinifera cultivar Muscat of Hamburg, in two different years (2006 and 2007). Five time points per year and two biological replicates per stadium were considered. To reduced intra-plant and inter-plant biological variability, for each ripening stadium we collected around hundred berries from several bunch grapes of five plants of V. vinifera cv Muscat of Hamburg. We will use the real-time PCR technique to validate microarray data.Muscat of Hamburg. We will use the real-time PCR technique to validate microarray data.

E-MEXP-2451

Transcription profiling of grape berries during ripening

Grape berries undergo considerable physical and biochemical changes during the ripening process. Ripening is characterized by a number of changes, including the degradation of chlorophyll, an increase in berry deformability, a rapid increase in the level of hexoses in the berry vacuole, an increase in berry volume, the catabolism of organic acids, the development of skin colour, and the formation of compounds that influence flavour, aroma, and therefore, wine quality. The aim of this work is to identify differentially expressed genes during grape ripening by microarray and real-time PCR techniques. Using a custom array of new generation, we analysed the expression of 6000 grape genes from pre-veraison to full maturity, in Vitis vinifera cultivar Muscat of Hamburg, in two different years (2006 and 2007). Five time points per year and two biological replicates per stadium were considered. To reduced intra-plant and inter-plant biological variability, for each ripening stadium we collected around hundred berries from several bunch grapes of five plants of V. vinifera cv Muscat of Hamburg. We will use the real-time PCR technique to validate microarray data.Muscat of Hamburg. We will use the real-time PCR technique to validate microarray data.

Page 7: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

Why do we need a database for functional genomics data?

ArrayExpress & Atlas7

Page 8: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

ArrayExpresswww.ebi.ac.uk/arrayexpress/

Is a public repository for functional genomics data, mostly generated using microarray or high throughput sequencing (HTS) assays

Serves the scientific community as an archive for data supporting publications, together with GEO at NCBI and CIBEX at DDBJ

Provides easy access to well annotated microarray data in a structured and standardized format

Facilitates the sharing of microarray designs and experimental protocols

Based on FGED standards: MIAME checklist, MAGE-TAB format and MO Ontology.

MINSEQE checklist for HTS data (http://www.mged.org/minseqe/)

ArrayExpress8

Page 9: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

Reporting standards

Data standardization efforts focus on answering 3 major questions:

1. What information do we need to capture?

2. What syntax (or file format) should we use to exchange data?

3. What semantics (or ontology) should we use to best describe its annotation?

ArrayExpress9

Page 10: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

What information do we need to capture?MIAME checklist

Minimal Information About a Microarray Experiment

The 6 most critical elements contributing towards MIAME are:

1. Essential sample annotation including experimental factors and their values (e.g. compound and dose)

2. Experimental design including sample data relationships (e.g. which raw data file relates to which sample, ….)

3. Sufficient array annotation (e.g. gene identifiers, genomic coordinates, probe sequences or array catalog number)

4. Essential laboratory and data processing protocols (e.g. normalization method used)

5. Raw data for each hybridization (e.g. CEL or GPR files)

6. Final normalized data for the set of hybridizations in the experiment

ArrayExpress10

Page 11: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

What information do we need to capture? MINSEQE checklist

Minimal Information about a high-throughput Nucleotide SEQuencing Experiment

The proposed guidelines for MINSEQE are (still work in progress):

1. General information about the experiment

2. Essential sample annotation including experimental factors and their values (e.g. compound and dose)

3. Experimental design including sample data relationships (e.g. which raw data file relates to which sample, ….)

4. Essential experimental and data processing protocols

5. Sequence read data with quality scores, raw intensities and processing parameters for the instrument

6. Final processed data for the set of assays in the experiment

ArrayExpress11

Page 12: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

MAGE-TAB is a simple spreadsheet format that uses a number of different files to capture information about a microarray experiment:

IDF Investigation Description Format file, contains top-level information about the experiment including title, description, submitter contact details and protocols.

SDRF Sample and Data Relationship Format file contains the relationships between samples and arrays, as well as sample properties and experimental factors, as provided by the data submitter.

ADF Array Design Format file, describes the design of an array, i. e. the sequence located at each feature on the array and annotation of the sequences.

Data files Raw and processed data files. The ‘raw’ data files are the files produced by the microarray image analysis software, such as CEL files for Affymetrix or GPR files from GenePix.

The processed data file is a ‘data matrix’ file containing processed values, as provided by the data submitter.

12

What syntax (or file format) should we use to exchange data?

Page 13: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

What semantics (or ontology) should we use to best describe its annotation?

Ontology, which is a formal specification of terms in a particular subject area and the relations among them.

Its purpose is to provide a basic, stable and unambiguous description of such terms and relations in order to avoid improper and inconsistent use of the terminology pertaining to a given domain.

ArrayExpress13

Page 14: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

Experimental factor ontology (EFO)http://www.ebi.ac.uk/efo

Application focused ontology modeling experimental factors (EFs) in AE

Developed to:

Consistent curation – ensure that curators use same vocabulary

Query support (e.g, query for 'cancer' and get also ‘leukemia')

EFs are transformed into an ontological representation, forming classes and relationships between those classes

EFO terms map to multiple existing domain specific ontologies, such as the Disease Ontology and Cell Type Ontology

ArrayExpress14

Page 15: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

ArrayExpress & Atlas15

What semantics (or ontology) should we use to best describe its annotation?

Page 16: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

ArrayExpress16

ArrayExpress – two databases

Page 17: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

ArrayExpress – two databases

ArrayExpress18

Page 18: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

ArrayExpress Archive

22281

ArrayExpress & Atlas19

...and counting...and counting

Page 19: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

How much data in AE Archive?

ArrayExpress20

Page 20: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

ArrayExpress21

Archive by species

Page 21: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

ArrayExpress22

Browsing the AE Archive

Page 22: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

Browsing the AE ArchiveSpecies

investigatedSpecies

investigatedCurated title of

experimentCurated title of

experiment

The date when the data released

The date when the data released

AE unique experiment ID

AE unique experiment ID

ArrayExpress23

Raw sequencing data available in

ENA

Raw sequencing data available in

ENA

The list of experiments retrieved can be printed, saved as Tab-delimited format or exported to Excel or as RSS feed

The list of experiments retrieved can be printed, saved as Tab-delimited format or exported to Excel or as RSS feed

The total number of experiments and assay retrieved

The total number of experiments and assay retrieved

The direct link to raw and processed data. An icon indicates that this type of data is available.

The direct link to raw and processed data. An icon indicates that this type of data is available.

Loaded in Atlas flagLoaded in Atlas flag

Number of assays

Number of assays

Page 23: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

ArrayExpress24

Browsing the AE Archive

Page 24: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

Experimental factor ontology (EFO)Example:

ArrayExpress25

carcinomacarcinoma

acinar cell carcinoma

adrenocortical carcinoma

bladder carcinoma

breast carcinoma

cervical carcinoma

esophageal carcinoma

follicular thyroid carcinoma

hepatocellular carcinoma

pancreatic carcinoma

papillary thyroid carcinoma

renal carcinoma

signet ring cell carcinoma

uterine carcinoma

adenocarcinoma

adenoid cystic carcinoma

endometrioid carcinoma

gastric carcinoma

hereditary leiomyomatosis and renal cell cancer

hypopharyngeal carcinoma

acinar cell carcinoma

adrenocortical carcinoma

bladder carcinoma

breast carcinoma

cervical carcinoma

esophageal carcinoma

follicular thyroid carcinoma

hepatocellular carcinoma

pancreatic carcinoma

papillary thyroid carcinoma

renal carcinoma

signet ring cell carcinoma

uterine carcinoma

adenocarcinoma

adenoid cystic carcinoma

endometrioid carcinoma

gastric carcinoma

hereditary leiomyomatosis and renal cell cancer

hypopharyngeal carcinoma

Page 25: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

Searching AE ArchiveSimple query - EFO

ArrayExpress26

•Matches to exact terms are highlighted in yellow•Matches to child terms in the EFO are highlighted in pink

Page 26: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

Searching AE ArchiveSimple query

Search across all fields:• AE accession number e.g. E-MEXP-568 • Secondary accession numbers e.g. GEO series accession

GSE5389 • Experiment name • Submitter's experiment description • Sample attributes, experimental factor and values, including

species (e.g. GeneticModification, Mus musculus, DREB2C over-expression)

• Publication title, authors and journal name, PubMed ID

Synonyms for terms are always included in searches e.g. 'human' and 'Homo sapiens’

ArrayExpress27

Page 27: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

AE Archive query output

• Matches to exact terms are highlighted in yellow

Page 28: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

AE Archive – experiment view

ArrayExpress29

MIAME or MINSEQE scores show how much the experiment is standard compliant

MIAME or MINSEQE scores show how much the experiment is standard compliant

Experimental factor(s) and its values

Experimental factor(s) and its values

Link to files available.

This varies between sequencing and microarray data. For

microarray experiments you also have array design file.

Link to files available.

This varies between sequencing and microarray data. For

microarray experiments you also have array design file.

Page 29: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

AE Archive – SDRF file

ArrayExpress30

Page 30: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

SDRF file – sample & data relationship

ArrayExpress31

Page 31: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

AE Archive – ADF file

ArrayExpress32

Page 32: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

AE Archive – all files

ArrayExpress33

Page 33: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

ArrayExpress34

AE Archive – all files

Page 34: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

Searching AE ArchiveAdvanced query

Combine search terms• Enter two or more keywords in the search box with the operators AND,

OR or NOT. AND is the default search term; a search for kidney cancer will return hits with a match to ‘kidney' AND ‘cancer’

• Search terms of more than one word must be entered inside quotes otherwise only the first word will be searched for. E.g. “kidney cancer”

Specify fields for searches• Particular fields for searching can also be specified in the format

of fieldname:value

ArrayExpress35

Page 35: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

Searching AE ArchiveAdvanced query - fieldnames

ArrayExpress36

Field name Searches Example

accession Experiment primary or secondary accession accession:E-MEXP-568

array Array design accession or name array:AFFY-2 OR array:Agilent*

ef Experimental factor, the name of the main variables in an experiment.

ef:celltype OR ef:compound

efv Experimental factor value. Has EFO expansion. efv:fibroblast

expdesign Experiment design type expdesign:”dose response”

exptype Experiment type. Has EFO expansion. exptype:RNA-seq

gxa Presence in the Gene Expression Atlas. Only value is gxa:true.

ef:compound AND gxa:true

pmid PubMed identifier pmid:16553887

sa Sample attribute values. Has EFO expansion. sa:wild_type

species Species of the samples. Has EFO expansion. species:”homo sapiens” AND ef:cellline

Page 36: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

Searching AE ArchiveAdvanced query

ArrayExpress37

Filter What is filtered

assaycount:[x TO y] filter on the number of of assays where x <= y and both values are between 0 and 99,999 (inclusive) . To count excluding the values given use curly brackets e.g. assaycount:{1 TO 5} will find experiments with 2-4 assays. Single numbers may also be given e.g. assaycount:10 will find experiments with 10 assays.

efcount:[x TO y] filter on the number of experimental factors

samplecount:[x TO y] filter on the number of samples

sacount:[x TO y] filter on the number of sample attribute categories

rawcount:[x TO y] filter on the number of raw files

fgemcount:[x TO y] filter on the number of final gene expression matrix (processed data) files

miamescore:[x TO y] filter on the MIAME compliance score (maximum score is 5)

date:yyyy-mm-dd filter by release date•date:2009-12-01 - will search for experiments released on 1st of Dec 2009 •date:2009* - will search for experiments released in 2009•date:[2008-01-01 2008-05-31] - will search for experiments released between 1st of Jan and end of May 2008

Filtering experiments by counts of a particular attribute• Experiments fulfilling certain count criteria can also be searched for e.g. having

more than 10 assays (hybridizations)

Page 37: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

ArrayExpress & Atlas38

Searching AE ArchiveAdvanced query – Examples

Page 38: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

Exercise 1

ArrayExpress39

Page 39: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

ArrayExpress40

ArrayExpress – two databases

Page 40: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

ArrayExpress41

The criteria we use for selecting experiments for inclusion in the Atlas are as follows:

• Array designs relating to experiment must be provided to enable re-annotation using Ensembl or Uniprot (or have the potential for this to be done)

• High MIAME scores • Experiment must have 6 or more hybridizations • Sufficient replication and large sample size• EF and EFV must be well annotated• Adequate sample annotation must be provided • Processed data must be provided or raw data which can be

renormalized must be available

Gene Expression AtlasExperiment selection criteria

Page 41: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

ArrayExpress42

New meta-analytical tool for searching gene expression profiles across experiments in AE

Data is taken as normalized by the submitter

Gene-wise linear models (limma) and t-statistics are applied to calculate the strength of genes’ differential expression across conditions across experiments

The result is a two-dimensional matrix where rows correspond to genes and columns correspond to conditions, rather than samples.

The matrix entries are p-values together with a sign, indicating the significance and direction of differential expression

Gene Expression AtlasAtlas construction

Page 42: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

ArrayExpress43

Gene Expression AtlasAtlas construction

Page 43: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

Gene Expression AtlasAtlas construction

Page 44: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

ArrayExpress45

Gene Expression Atlas

Page 45: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

ArrayExpress46

Atlas home pagehttp://www.ebi.ac.uk/gxa/

Query for genesQuery for genes

Query for conditionsQuery for conditionsRestrict query by direction of differential expression

Restrict query by direction of differential expression

The ‘advanced query’ option allows building more complex queries

The ‘advanced query’ option allows building more complex queries

Page 46: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

ArrayExpress47

Atlas home pageThe ‘Genes’ search box

Page 47: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

ArrayExpress48

Atlas home pageThe ‘Conditions’ search box

Page 48: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

ArrayExpress49

Atlas home pageA single gene query

Page 49: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

Atlas gene summary page

ArrayExpress50

Page 50: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

ArrayExpress & Atlas51

Atlas home pageA ‘Conditions’ only query

Page 51: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

ArrayExpress52

Atlas heatmap view

Page 52: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

Atlas list view

Click the ‘expression profile’ link to view the experiment page

Click the ‘expression profile’ link to view the experiment page

ArrayExpress53

Page 53: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

Atlas data download

ArrayExpress54

Page 54: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

ArrayExpress55

Atlas experiment page

Box plot showing differential expression of Mt2,Agxt2l1 and Insig2 across the growth conditions studied. Data is available for 3 separate array probes.

Box plot showing differential expression of Mt2,Agxt2l1 and Insig2 across the growth conditions studied. Data is available for 3 separate array probes.

Search for a specific gene, experimental factor or expression level (up/down)

Search for a specific gene, experimental factor or expression level (up/down)

Page 55: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

ArrayExpress & Atlas56

Atlas experiment page – HTS data

Page 56: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

ArrayExpress57

Atlas gene-condition query

Page 57: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

ArrayExpress58

Atlas query refining

Page 58: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

ArrayExpress59

Atlas gene-condition query

Page 59: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

ArrayExpress60

Atlas query refining

Page 60: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

ArrayExpress61

Atlas query refining

Page 61: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

ArrayExpress62

Exercises 2 & 3

Page 62: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

ArrayExpress63

Data submission to AE

Page 63: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

ArrayExpress64

Data submission to AEwww.ebi.ac.uk/microarray/submissions.html

Page 64: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

MIAMExpress- Overview

• MIAMExpress is a web based tool for submitting microarray data and microarray designs to the ArrayExpress database

• MIAMExpress can be used for experiments up to 50 hybridizations in size

ArrayExpress & Atlas65

Array design (custom)Array design (custom) Experiment Experiment ProtocolProtocol

Page 65: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

MIAMExpress- Getting Started

ArrayExpress & Atlas66

Page 66: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

MIAMExpress- Protocol Submission

ArrayExpress & Atlas67

Change the automatically generated protocol name to something meaningful like Sanger Lab Growth Protocol

Change the automatically generated protocol name to something meaningful like Sanger Lab Growth Protocol

Page 67: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

MIAMExpress- Experiment Submission

ArrayExpress & Atlas68

Experiment description Samples Extracts

Labeledextracts Hybs

Upload raw and norm data

Upload combined data files

Batchloader interface where you fill in your

experiment information in a spreadsheet-like

format

Batchloader interface where you fill in your

experiment information in a spreadsheet-like

format

Web page interface in which a series of

web forms are filled out.

Web page interface in which a series of

web forms are filled out.

Page 68: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

MIAMExpress-Batch Upload Tool

ArrayExpress & Atlas69

Experiment description Samples Extracts

Labeledextracts Hybs

Upload raw and norm data

Upload combined data files

HELPHELP

Page 69: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

MIAMExpress-Batch Upload Tool

ArrayExpress & Atlas70

Click multiply to create as many rows as you need. Hint: If you complete all the values in the first row prior to this it will save you time

Click multiply to create as many rows as you need. Hint: If you complete all the values in the first row prior to this it will save you time

ToolbarToolbar

Cross document functionalityCross document functionality

Page 70: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

MIAMExpress-Batch Upload Tool

ArrayExpress & Atlas71

Labeledextracts

Double click and select one or more samples per extract

Double click and select one or more samples per extract

Page 71: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

MIAMExpress-Batch Upload Tool

ArrayExpress & Atlas72

HybsUpload raw and

norm data

Use the folder icon to select the required data files

Use the folder icon to select the required data files

Affymetrix users submit the .CEL file here (and .EXP if available). For all other submissions the raw data must be a single .txt or .gpr file.

Affymetrix users submit the .CEL file here (and .EXP if available). For all other submissions the raw data must be a single .txt or .gpr file.

Page 72: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

MIAMExpress-Batch Upload Tool

ArrayExpress & Atlas73

HybsUpload raw and

norm dataUpload combined

data files

If you are supplying a Final Gene Expression Data File you must supply a transformation protocol

If you are supplying a Final Gene Expression Data File you must supply a transformation protocol

Page 73: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

MIAMExpress-Data File Formats for Submission

• Submitted normalized data files must contain data from a single hybridization only. If your normalization procedure creates a file containing data from all your hybs then you can submit this as a final gene expression matrix (FGEM).

• A normalization protocol should be submitted along with your normalized data files-be precise when describing how the data was calculated

• A final gene expression matrix (FGEM) or combined data file is a file containing data from several hybridizations.

• The creation of your FGEM must be described in your transformation protocol.

• The format of the FGEM is as follows:• each line corresponds to an array element • each column corresponds to your calculated value

ArrayExpress & Atlas74

Page 74: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

Array Submission- Background

• An array design describes how a microarray was manufactured, what was printed/synthesized at each position on the array and what biological sequences these represent.

• If the array design has already been submitted to ArrayExpress, you do not need to re-submit it.

• If you have used a custom array design you will need to submit the array design as an Array Design Format (ADF) file

• After submitting an array design you can immediately continue with your experiment submission

• The ADF file can be created in any spreadsheet application but must be saved as a tab delimited text file.

ArrayExpress & Atlas75

Page 75: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

MIAMExpress-Array Submission

ArrayExpress & Atlas76

Enter some general information about your array

Enter some general information about your array

Checker- This tool will check for errors in the format and content of an ADF file.

Checker- This tool will check for errors in the format and content of an ADF file.

Page 76: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

MIAMExpress-Array Submission

ArrayExpress & Atlas77

Location and name of each reporter (probe) on the array

Annotation such as the nucleotide sequence or database accession associated with the reporter e.g. RefSeq

Type of reporter (cDNA, oligonucleotide, RNA, DNA etc) that is present

Group of reporter - which are the control and which are the experimental reporters

Optional additional information - this can be used to show that several reporters are associated with the same gene for example, or add a comment about a reporter.

control_biosequence - for example a spike

control_buffer - buffer spotted on the array

control_empty - nothing spotted on the array

control_genomic_DNA - e.g. salmon sperm DNA

control_label - landing lights

control_reporter_size - size standard

control_spike_calibration - spike at varying concentrations

control_unknown_type

Page 77: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

MAGE-TAB Example: IDF

ArrayExpress & Atlas78

Page 78: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

MAGE-TAB Example: SDRF

ArrayExpress & Atlas79

Page 79: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

MAGE-TAB Example: SDRF

ArrayExpress & Atlas80

Page 80: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

MAGE-TAB Submission

ArrayExpress & Atlas81

Submitter is directed to either submit an experiment or download a template to the desktop

Submitter is directed to either submit an experiment or download a template to the desktop

Ontology LinkOntology Link

Indicate submission typeIndicate submission type

Page 81: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

MAGE-TAB Submission

ArrayExpress & Atlas82

Page 82: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

What happens after submission?

• Email confirmation• Curation• The curation team will review your submission and will

email you with any questions.• Possible reopening for editing

• We will send you an accession number when all the required information has been provided.

• We will load your experiment into ArrayExpress and provide you with a reviewer login for viewing the data before it is made public.

ArrayExpress & Atlas83

Page 83: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

Submission of High-throughput sequencing gene expression data

ArrayExpress & Atlas84

Page 84: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

Source Name Material Type Term Source REF Characteristics[Organism] Characteristics[Sex] Characteristics[StrainOrLine] Protocol REFfinch 1 whole_organism MGED Ontology Geospiza fortis male Pinta EXTRACTIONfinch 2 whole_organism MGED Ontology Geospiza fortis male Pinta EXTRACTIONfinch 3 whole_organism MGED Ontology Geospiza fortis male Marchesa EXTRACTIONfinch 4 whole_organism MGED Ontology Geospiza fortis male Marchesa EXTRACTIONfinch 5 whole_organism MGED Ontology Geospiza fortis male Santiago EXTRACTIONfinch 6 whole_organism MGED Ontology Geospiza fortis male Santiago EXTRACTIONfinch 7 whole_organism MGED Ontology Geospiza fortis male Floreana EXTRACTIONfinch 8 whole_organism MGED Ontology Geospiza fortis male Floreana EXTRACTIONfinch 9 whole_organism MGED Ontology Geospiza fortis male Pinzon EXTRACTIONfinch 10 whole_organism MGED Ontology Geospiza fortis male Pinzon EXTRACTION

MAGE-TAB Example: SDRF

Page 85: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

ArrayExpress & Atlas86

MAGE-TAB Example: SDRF

Page 86: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

What needs to be included in the Spreadsheet?• Include Assay Name and Technology Type columns

• Raw files must go in the Array Data File column

 

• A sequencing protocol must be provided.• The sequencing protocol should have a performer- this is used as the run center name. • This protocol must have a Protocol Hardware value saying which sequencing instrument was

used (e.g. 454 GS, Illumina Genome Analyzer, AB SOLiD System• Reference this in the Protocol REF column before the Assay Name column.

• These 4 extra Comment[] columns should be added after Extract Name to provide information about how the library was prepared

• 1. Comment[LIBRARY_LAYOUT] - Specifies whether to expect single or paired reads . The column should contain one of the following: SINGLE or PAIRED

• 2. Comment[LIBRARY_SOURCE] - Specifies the type of source material that is being sequenced. The column should contain one of the following: TRANSCRIPTOMIC, METAGENOMIC, SYNTHETIC, VIRAL RNA, OTHER *

• 3. Comment[LIBRARY_STRATEGY] - Sequencing technique intended for the library. The column should contain one of the following: WGS, WCS, WXS, CLONE, CLONEEND, POOLCLONE, FINISHING, AMPLICON, RNA-Seq, EST, FL-cDNA, CTS, ChIP-Seq, MNase-Seq, DNase-Hypersensitivity, Bisulfite-Seq, MRE-Seq, MeDIP-Seq, MBD-Seq, OTHER *

• 4. Comment[LIBRARY_SELECTION] - Method was used to select and/or enrich the material being sequenced. The column should contain one of the following: RANDOM, PCR, RANDOM PCR, RT-PCR, cDNA, CAGE, RACE, ChIP, MNase DNAse, HMPR, MF, MSLL, 5-methylcytidine antibody, MBD2 protein methyl-CpG binding domain, Hybrid Selection, Reduced Representation, Restriction Digest, size fractionation, CF-S, CF-M, CF-H, CF-T, other, unspecified *

ArrayExpress & Atlas87

Page 87: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

Submissions- Key Point

Don’t be afraid to ask

ArrayExpress & Atlas88

[email protected] [email protected]

Page 88: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

Summary: ArrayExpress

Search by experiment

Search by keywordSearch by keyword

Link to sample properties and

experiment design

Link to sample properties and

experiment design

View experimentView experiment

Search by gene across experiments

Browse results summaryBrowse results summary

Search by gene name,

species and experimental condition

Search by gene name,

species and experimental condition

View expression

under different conditions and

profiles

View expression

under different conditions and

profiles

Page 89: ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data Emma Hastings Functional Genomics Team EBI-EMBL emma@ebi.ac.uk emma/Zagreb

ArrayExpress90

AcknowledgementsArrayExpress/Atlas training  is funded by the European Commission under SLING, grant agreement number 226073 (Integrating Activity) within

Research Infrastructures Action of the FP7 Capacities Specific Programme.