GxDb: a universal tool to collect, analyse, manage and visualize transcriptomic data

Wolfgang Raffelsberger, Raymond Ripp and Laetitia Poidevin

BingGi Days, January 2010


Page 2

Introduction

What is transcriptomics?

-> a high-throughput analysis of gene expression by measuring the amount of mRNA

What are the techniques?

-> DNA microarrays
-> SAGE
-> Differential Display
-> ...

=> large quantities of data

GxDb: an integrative tool to collect, treat, analyze, manage and visualize

Page 3

GxDb is a website and a database

Page 4

Organization of data in GxDb

Sample (ex: mouse wt aged 9 days)
• Individual: name, age, description
• Organism
• Genotype
• Tissue
• Treatment
• SampleCondition

Arraytype (ex: Mouse430_2)

Page 5

Organization of data in GxDb

Arraytype (ex: Mouse430_2)

• RealExp (ex: wt_d9): Arraytype, Sample, CEL files r1, r2, r3
• RealExp 2 (ex: wt_d11): Arraytype, Sample 2, CEL files r3, r4, r5
• RealExp 3 (ex: wt_d13): Arraytype, Sample 3, CEL files r6, r7, r8
• RealExp 4 (ex: wt_d15): Arraytype, Sample 4, CEL files r9, r10, r11

Page 6

Organization of data in GxDb

Experiment
• RealExp: Arraytype, Sample, CEL files r1, r2, r3
• RealExp 2: Arraytype, Sample 2, CEL files r3, r4, r5
• RealExp 3: Arraytype, Sample 3, CEL files r6, r7, r8
• RealExp 4: Arraytype, Sample 4, CEL files r9, r10, r11

Experiment -> Treatment and Analysis protocol ->
• Signal Intensity
• Ratio
• Cluster
• Differentially expressed genes
• Quality
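The hierarchy on these slides (an Experiment groups RealExps; each RealExp ties one Arraytype to one Sample and its replicate CEL files) can be sketched as plain data classes. This is only an illustration of the relationships, not the GxDb schema: the class fields, the experiment name and the CEL file names below are assumptions chosen to mirror the slide examples.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Sample:
    """A biological sample; fields loosely follow the boxes on the slides."""
    name: str            # e.g. "wt_d9"
    organism: str        # e.g. "mouse"
    genotype: str        # e.g. "wt"
    tissue: str = ""
    treatment: str = ""


@dataclass
class RealExp:
    """One Arraytype, one Sample, and the replicate CEL files measured for it."""
    arraytype: str       # e.g. "Mouse430_2"
    sample: Sample
    cel_files: List[str] = field(default_factory=list)


@dataclass
class Experiment:
    """A group of RealExps that are treated and analysed together."""
    name: str
    realexps: List[RealExp] = field(default_factory=list)

    def cel_files(self) -> List[str]:
        # Flatten the replicate files of every RealExp, keeping their order.
        return [f for rx in self.realexps for f in rx.cel_files]


# Hypothetical time course mirroring the slide examples wt_d9 ... wt_d15.
experiment = Experiment(name="wt_time_course")
for day in (9, 11, 13, 15):
    sample = Sample(name=f"wt_d{day}", organism="mouse", genotype="wt")
    files = [f"wt_d{day}_r{i}.CEL" for i in (1, 2, 3)]  # invented file names
    experiment.realexps.append(RealExp("Mouse430_2", sample, files))

print(len(experiment.realexps), "RealExps,", len(experiment.cel_files()), "CEL files")
```

Keeping the Sample immutable (frozen) reflects the idea that the same sample description can be reused by several RealExps.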

Page 7

Treatment and Analysis protocol

1) Normalization
   6 methods: RMA, gcRMA, dChip, MAS5.0, plier, vsn
   => signal intensity

2) Calculate average (between replicates) and ratio

3) Filtering
   - Eliminate probesets that are not expressed in any array of an experiment, based on distribution or call (according to the normalization method)
   - Eliminate probesets with very low changes between condition and reference
       - based on fold change
       - based on standard deviation

4) Statistical analysis
   - method: t-test combined with empirical Bayes for shrinkage
   - estimation of FDR (false discovery rate)
   - tag probesets with differential expression (automatic threshold finding)
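Steps 3 and 4 can be illustrated with a small sketch. It makes several assumptions that the slides do not state: ratios are taken as log2 values, the fold-change threshold is fixed rather than found automatically, the p-values are given as inputs (the moderated t-test itself is not reproduced here), and the FDR is estimated with the Benjamini-Hochberg procedure, which may differ from what GxDb actually uses. Probeset names and numbers are invented.

```python
from typing import Dict, List


def filter_low_change(log2_ratios: Dict[str, float],
                      min_abs_log2: float = 1.0) -> List[str]:
    """Keep probesets whose |log2 ratio| reaches the threshold (2-fold by default)."""
    return [p for p, r in log2_ratios.items() if abs(r) >= min_abs_log2]


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented inputs: log2(condition / reference) and p-values per probeset.
log2_ratios = {"ps1": 2.1, "ps2": 0.1, "ps3": -1.5, "ps4": 1.2, "ps5": -0.2}
p_values = {"ps1": 0.01, "ps2": 0.60, "ps3": 0.04, "ps4": 0.03, "ps5": 0.80}

kept = filter_low_change(log2_ratios)
fdr = benjamini_hochberg([p_values[p] for p in kept])
tagged = [p for p, q in zip(kept, fdr) if q <= 0.05]  # fixed cut-off, an assumption
print(kept, tagged)
```

Filtering before the multiple-testing correction reduces the number of hypotheses, which is one common reason to run step 3 ahead of step 4.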

Page 8

Treatment and Analysis protocol

1) Normalization
2) Calculate average (replicates) and ratio
3) Filtering
4) Statistical analysis

5) Clustering
   tool: Cluspack
   methods: k-means (DPC), mixture models (AIC and BIC)
   => clusters

6) Quality Control Report
   tool: RReportGenerator for Automatic Statistical Analysis, to estimate the quality of arrays
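To make the clustering step concrete, here is a minimal sketch of plain k-means (Lloyd's algorithm) applied to expression profiles. It is not Cluspack and does not reproduce its DPC variant or the mixture models; the starting centers are passed in explicitly, and the four profiles are invented.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], centers: List[Point],
           max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Plain Lloyd's k-means; returns (cluster label per point, final centers)."""
    centers = [list(c) for c in centers]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each profile joins its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda k: squared_distance(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers would not move
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Invented profiles over three time points: two rising, two falling.
profiles = [[1.0, 2.0, 3.0], [1.1, 2.1, 2.9], [3.0, 2.0, 1.0], [2.9, 1.9, 1.2]]
labels, centers = kmeans(profiles, centers=[profiles[0], profiles[2]])
print(labels)
```

In practice the number of clusters and the initial centers have to be chosen by some criterion; that choice is outside this sketch.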

Page 9

Upload form

Page 10

Upload form
Step 1: Selection of Arraytype and Experiment

Page 11

Upload form
Step 1

Create your new experiment

Page 12

Upload form
Step 1

Create your new samples

• Organism
• Genotype
• SampleCondition
• Individual
• TreatmentType
• Treatment
• Tissue
• Sample

Page 13

Upload form
Step 1: Selection of Arraytype and Experiment

Page 14

Upload form
Step 2: Upload of .cel files

Page 15

Upload form
Step 3: Select the sample corresponding to each .cel file

Page 16

Upload form
Step 4: Select the comparisons of interest to calculate ratios

Ratio: condition / reference

Example: C3H_rd1_d10 / C3H_wt_d10
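The ratio defined here can be sketched as the mean signal of the condition divided by the mean signal of the reference, per probeset. The sketch assumes normalized intensities on a linear scale and a simple arithmetic mean over replicates; whether GxDb averages on a linear or log scale is not stated on the slides. The probeset names and intensity values are invented.

```python
from statistics import mean
from typing import Dict, List, Tuple

Replicates = Dict[str, List[float]]  # probeset -> one value per replicate array


def parse_comparison(text: str) -> Tuple[str, str]:
    """Split 'condition / reference' into its two sample names."""
    condition, reference = (part.strip() for part in text.split("/"))
    return condition, reference


def replicate_means(replicates: Replicates) -> Dict[str, float]:
    return {probeset: mean(values) for probeset, values in replicates.items()}


def ratios(condition: Replicates, reference: Replicates) -> Dict[str, float]:
    """Ratio of replicate means, for probesets present in both samples."""
    cond, ref = replicate_means(condition), replicate_means(reference)
    return {p: cond[p] / ref[p] for p in cond if p in ref and ref[p] > 0}


# Invented normalized intensities, three replicate arrays per sample.
signals: Dict[str, Replicates] = {
    "C3H_rd1_d10": {"ps_a": [200.0, 220.0, 180.0], "ps_b": [50.0, 55.0, 45.0]},
    "C3H_wt_d10": {"ps_a": [100.0, 110.0, 90.0], "ps_b": [50.0, 50.0, 50.0]},
}

condition, reference = parse_comparison("C3H_rd1_d10 / C3H_wt_d10")
result = ratios(signals[condition], signals[reference])
print(result)
```

A ratio above 1 means the probeset is more abundant in the condition than in the reference; ratios are often reported as log2 values so that up- and down-regulation are symmetric.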

Page 17

Upload form
Step 5: Launch Treatment and Analysis protocol

Page 18

Upload form
Step 5: Clustering, Quality analysis and loading in database

Page 19

Organization of data in GxDb

• Experiment
• RealExp
• Sample
• Cel file
• Arraytype-Probeset
• Signal Intensity
• Ratio
• Differentially expressed genes
• Clustering
• Quality

Page 20

Query GxDb

Page 21

Query GxDb

• Experiment
• Probeset
• Sample
• RealExp
• Signal Intensity
• Ratio
• Cluster

Page 22

Visualization in GxDb

Time-course of retinal development

Page 23

GxDb resources

GxDb Website
• Upload
• Querying
• Display

http://gx.igbmc.fr

Web Services

GxDb SQL database

/GxData

Café des sciences

QSub

Scheduler

alnitak

Star3, Star4, Star5, Star6, Star7, Star8

Languages used:

PHP (HTML)
 - Upload
 - PipeWork
 - RadarGenerator
 - Fed

R
 - Treatment and analysis protocol
 - RReportGenerator

SQL

Tcl
 - Gx (~ Gscope)
 - Probeset loading

C
 - Cluspack

Page 24

Conclusion and Prospects

• Automated raw-data upload, storage, treatment and analysis
   - multiple treatment protocols
   - multiple clustering methods
   - multiple human and automatic expert analyses
  => Comparisons
  => Analyse the strengths and weaknesses of the different protocols

• Improvement of the website
   - More user-friendly
   - Visualization of clusters, ratios
   - Tools for meta-analysis

• Possibility of uploading data directly from GEO

• Diagnostic report to make the data easier to analyze

• Links to other databases and tools: STRING, GSEA, ...

Page 25

Page 26

Ratio Pipework

• Organism
• Normalization
• Ratio minimum
• Ratio maximum
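The four criteria on this slide (organism, normalization, ratio minimum, ratio maximum) read like the parameters of a database query. The sketch below shows what such a query could look like against an in-memory SQLite database. Everything about the schema is an assumption: only the column names probeset_id, genename and species appear elsewhere in the slides, while the table names, the ratio table and all rows are invented, and the real GxDb database is not necessarily SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE probeset (
    probeset_id TEXT PRIMARY KEY,
    genename    TEXT,
    species     TEXT
);
CREATE TABLE ratio (               -- hypothetical table, not the GxDb schema
    probeset_id   TEXT REFERENCES probeset(probeset_id),
    comparison    TEXT,
    normalization TEXT,
    value         REAL
);
""")

comparison = "C3H_rd1_d10 / C3H_wt_d10"
conn.executemany("INSERT INTO probeset VALUES (?, ?, ?)", [
    ("ps_1", "gene_a", "mouse"),
    ("ps_2", "gene_b", "mouse"),
    ("ps_3", "gene_c", "human"),
])
conn.executemany("INSERT INTO ratio VALUES (?, ?, ?, ?)", [
    ("ps_1", comparison, "RMA", 2.5),
    ("ps_2", comparison, "RMA", 0.9),
    ("ps_3", comparison, "RMA", 3.0),
    ("ps_1", comparison, "MAS5.0", 2.2),
])


def query_ratios(conn, organism, normalization, ratio_min, ratio_max):
    """Return (probeset_id, genename, ratio) rows matching the four criteria."""
    sql = """
        SELECT p.probeset_id, p.genename, r.value
        FROM ratio AS r
        JOIN probeset AS p ON p.probeset_id = r.probeset_id
        WHERE p.species = ? AND r.normalization = ?
          AND r.value BETWEEN ? AND ?
        ORDER BY r.value DESC
    """
    # Parameters are bound by the driver, never pasted into the SQL string.
    return conn.execute(sql, (organism, normalization, ratio_min, ratio_max)).fetchall()


rows = query_ratios(conn, "mouse", "RMA", 2.0, 10.0)
print(rows)
```

Storing the normalization method next to each ratio is what would let the same probeset be compared across treatment protocols.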

Page 27

Advantages of GxDb

• Integration and storage in a unifying format

• Automated raw-data upload, storage, treatment and analysis
   - multiple treatment protocols
   - multiple clustering methods
   - multiple human and automatic expert analyses
  => Comparisons
  => Analyse the strengths and weaknesses of the different protocols

• Facilitated querying and data visualization

Page 28

Page 29

GxDb transcriptomics

• RealExp: Arraytype, Sample, CEL files r1, r2, r3
• RealExp 2: Arraytype, Sample 2, CEL files r3, r4, r5
• RealExp 3: Arraytype, Sample 3, CEL files r6, r7, r8
• RealExp 4: Arraytype, Sample 4, CEL files r9, r10, r11

Page 30

Experiment
• RealExp 1: Arraytype, Sample, CEL files r1, r2, r3
• RealExp 2: Arraytype, Sample, CEL files r1, r2, r3
• RealExp 3: Arraytype, Sample, CEL files r1, r2, r3
• RealExp 4: Arraytype, Sample 4, CEL files r9, r10, r11

Sample
• Individual: name, age, description
• Organism
• Genotype
• Tissue
• Treatment
• SampleCondition

Arraytype: PROBESET, PROBESET 2, PROBESET 3, ... (45000)

Each PROBESET has the fields:
• probeset_id
• genename
• genedescription
• species
• speciessymbol
• representpublicid
• refseqtranscriptid
• gscope_id
• swissprot
• unigene_id
• entrezgene
• ensembl
• mgi
• cytoband
• chromoloc
• omim
• tissuespecificity
• linkeddiseases
• go_biologicalprocess
• go_cellularcomponent
• go_molecularfunction
• pathway
• interpro
• transmembrane

Signal Intensity

Ratio

Cluster

Page 31

GxDb protocol from upload to display

Arraytypes: already exists? If not, create new Arraytype

Sample: already exists? If not, create new Sample with
• existing or new Individual
• existing or new Organism
• existing or new Tissues
• existing or new Genotype
• existing or new Treatment

• Upload your .CEL files
• Enter their association to Arraytypes and Samples
• Define couples of RealExps for the Ratio Calculation
• Fill in the other information for the Experiment

Run Automatic Analysis

Query and Display Results
• Quality Report
• Signal Intensity
• Ratio
• Cluster
• Differentially Expressed Genes
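The "already exists?" decisions at the start of this protocol amount to a get-or-create lookup: reuse the stored Arraytype or Sample if one with that name is present, otherwise create it. A minimal sketch of that logic follows; the Registry class, its method name and the dictionary records are hypothetical and stand in for whatever database access GxDb actually uses.

```python
from typing import Dict, Tuple


class Registry:
    """In-memory stand-in for a table of named records (hypothetical helper)."""

    def __init__(self) -> None:
        self._items: Dict[str, dict] = {}

    def get_or_create(self, name: str, **attributes) -> Tuple[dict, bool]:
        """Return (record, created); existing records are reused unchanged."""
        if name in self._items:
            return self._items[name], False
        record = {"name": name, **attributes}
        self._items[name] = record
        return record, True


arraytypes = Registry()
samples = Registry()

# First upload: neither the array type nor the sample is known yet.
_, created_array = arraytypes.get_or_create("Mouse430_2")
_, created_sample = samples.get_or_create("wt_d9", organism="mouse", genotype="wt")

# Second upload of the same array type: the existing record is reused.
_, created_again = arraytypes.get_or_create("Mouse430_2")

print(created_array, created_sample, created_again)
```

Reusing existing records is what keeps Samples and Arraytypes comparable across Experiments uploaded at different times.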