Bioinformatic analysis of ‘omic’ data in genetic epidemiology studies

Jornades de consultoria estadística i software II
UAB, October 2013

Juan R González
BRGE – Bioinformatics Research Group in Epidemiology
Center for Research in Environmental Epidemiology (CREAL)
www.creal.cat


Page 2: OMICS

DISEASOME (PHENOTYPE)
All the disorders and diseases of an organism, viewed as a whole

Page 3: OMICS

EPIGENOME
changes in gene expression caused by mechanisms other than changes in the DNA sequence
tissue and time specific

TRANSCRIPTOME
gene expression (RNA)
tissue and time specific

GENOME
hereditary information (DNA)
stable
>99% equal between individuals
1.5% coding genes

EXPOSOME
dynamic
diet, metals, air pollution, stress…

METAGENOME (metatranscriptome, virome…)
bacteria and viruses
1-3% of the body’s mass
trillions of microorganisms

PROTEOME
tissue and time specific

DISEASOME (PHENOTYPE)

METABOLOME
tissue and time specific

Page 4: OMICS

Why OMICS?

To identify biomarkers (exposure, disease)
To understand molecular mechanisms

Page 5: OMICS

Metabolomic data [serum/plasma]: mass-spectrometry data (up to 2500 metabolites)
Proteomic data [plasma]: quantification of up to 30 proteins
Transcriptomic data [DNA whole blood]: gene expression and splice variant expression; also information about regulatory non-coding RNAs (snc-RNAs and miRNAs)
Epigenomic data [DNA whole blood]: DNA methylation data
Genomic data [DNA whole blood] (existing GWAS): SNP array data

Page 6: GENOMICS

SNP array data (GWAS)
Imputed SNPs will be analyzed -> 4,000,000 markers
Can be predicted using existing GWAS data

Page 7: MOLECULAR PROFILES

[Figure: molecular profiles of genes across samples (cases and controls), with genes grouped by pathway (Pathway 1)]

Page 8: PIPELINE FOR OMIC DATA

Data pre-processing
Quality control (filtering)
Normalization (control for batch effect)
Statistical analysis
Clustering and enrichment/pathway analysis
Visualization and annotation
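
As a rough illustration of the pre-processing steps listed above, the following Python sketch drops features with too many missing values and then centers each remaining feature within its batch. The function names, the 20% missingness threshold, the toy data and the use of per-batch mean-centering as a batch adjustment are all assumptions made for illustration; none of them come from the slides, and real analyses would normally rely on dedicated packages.

```python
from statistics import mean

def filter_features(data, max_missing=0.2):
    """Quality control: keep features whose fraction of missing values (None)
    does not exceed max_missing (hypothetical threshold)."""
    kept = {}
    for feature, values in data.items():
        frac_missing = sum(v is None for v in values) / len(values)
        if frac_missing <= max_missing:
            kept[feature] = values
    return kept

def center_by_batch(values, batches):
    """Crude batch adjustment: subtract the batch mean from each observed value."""
    batch_means = {}
    for b in set(batches):
        observed = [v for v, bb in zip(values, batches) if bb == b and v is not None]
        batch_means[b] = mean(observed) if observed else 0.0
    return [None if v is None else v - batch_means[b]
            for v, b in zip(values, batches)]

# Hypothetical toy data: two features measured on four samples in two batches.
data = {
    "gene_A": [6.0, 8.0, 10.0, 12.0],
    "gene_B": [None, None, 7.5, 8.1],  # 50% missing, so it fails quality control
}
batches = ["b1", "b1", "b2", "b2"]

clean = filter_features(data)
normalized = {f: center_by_batch(v, batches) for f, v in clean.items()}
print(normalized)
```

Mean-centering removes only additive batch shifts; it is used here because it is easy to follow, not because the slides recommend it.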

Page 9: Step 2: Statistical Analysis

Transcriptome

Microarrays: 6.1, 6.9, 12.4, 11.4, 8.5, …
Generalized linear models (Gaussian)

RNA-seq: 0, 1, 2, …, 2456, …, 34567, …
Generalized linear models (Negative Binomial)
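
To illustrate why RNA-seq counts are usually modelled with a Negative Binomial rather than a Poisson or Gaussian distribution, the sketch below estimates the overdispersion of one gene's counts by the method of moments, assuming the common parameterisation Var = mu + alpha * mu^2. The counts and the function name are hypothetical, and this is not the estimation procedure of any particular package.

```python
from statistics import mean, variance

def nb_dispersion(counts):
    """Method-of-moments estimate of alpha in Var = mu + alpha * mu**2.
    Returns 0.0 when the counts are not overdispersed relative to Poisson."""
    mu = mean(counts)
    var = variance(counts)  # sample variance (n - 1 denominator)
    if mu == 0:
        return 0.0
    return max((var - mu) / mu**2, 0.0)

# Hypothetical read counts for a single gene across six samples.
counts = [0, 1, 2, 30, 250, 17]

mu = mean(counts)
var = variance(counts)
alpha = nb_dispersion(counts)

# Under a Poisson model the variance would equal the mean;
# here it is far larger, which is what the Negative Binomial accommodates.
print(f"mean={mu}, variance={var:.1f}, alpha={alpha:.3f}")
```

Microarray intensities, by contrast, are continuous values on a log scale, which is why a Gaussian model is the usual choice there.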

Page 10: Step 2: Statistical Analysis

Metabolome
Peaks: 11234, 1353, 1234, 12, 455, 122, …
Clustering methods (non-parametric)

Epigenome
CpG islands: 1, 0, 1, 1, 0, 1, 0, …
Beta regression (Beta distribution)
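
The Beta distribution is suited to values strictly between 0 and 1, such as methylation levels summarised as proportions. As a minimal sketch, assuming methylation at one CpG site is available as a proportion per sample, the code below fits Beta(a, b) by the method of moments. The data and the function name are hypothetical, and the sketch fits only the distribution: a full beta regression would also relate the mean to covariates such as case/control status.

```python
from statistics import mean, variance

def fit_beta_moments(proportions):
    """Method-of-moments estimates (a, b) of a Beta distribution.
    Requires 0 < variance < m * (1 - m), where m is the sample mean."""
    m = mean(proportions)
    v = variance(proportions)  # sample variance
    if not 0 < v < m * (1 - m):
        raise ValueError("variance incompatible with a Beta distribution")
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common

# Hypothetical methylation proportions for one CpG site in four samples.
methylation = [0.1, 0.2, 0.3, 0.4]

a, b = fit_beta_moments(methylation)
print(f"a={a:.4f}, b={b:.4f}, fitted mean={a / (a + b):.3f}")
```

By construction the fitted mean a / (a + b) equals the sample mean, which gives a quick sanity check on the estimates.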

Page 11: Step 3: Enrichment/pathway analysis

Gene set (under- or over-expressed)

Page 12: Step 3: Enrichment/pathway analysis

GO, KEGG, “Exposure-related”, …
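
One common way to test whether a gene set (for example a GO term or KEGG pathway) is over-represented among the differentially expressed genes is the hypergeometric test, equivalent to a one-sided Fisher exact test. The slides do not say which test was used, so the sketch below is only an illustration; the function name, argument names and numbers are hypothetical.

```python
from math import comb

def enrichment_pvalue(n_universe, n_in_set, n_selected, n_overlap):
    """Over-representation p-value P(X >= n_overlap), where X is the number of
    gene-set members among n_selected genes drawn without replacement from a
    universe of n_universe genes containing n_in_set gene-set members."""
    upper = min(n_in_set, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_in_set, k) * comb(n_universe - n_in_set, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical example: 20 genes measured, 5 belong to the pathway,
# 4 genes are differentially expressed and 3 of those are in the pathway.
p = enrichment_pvalue(n_universe=20, n_in_set=5, n_selected=4, n_overlap=3)
print(f"p = {p:.4f}")
```

In practice the resulting p-values are computed for many gene sets at once and need correction for multiple testing.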

Page 13: Step 4: Annotation and visualization

Page 14: Step 4: Annotation and visualization

[Figure: Z-values for cases and controls]

Page 15: Step 4: Annotation and visualization

[Figure: panels labelled BMI, SCZ, Asthma-obesity, IQ]

Page 16: Software

www.bioconductor.org

Page 17: Software

GENOMet Network (PI Juan R González, MTM2010-09526-E)
www.creal.cat/jrgonzalez/software.htm

Page 18: Bioinformatics & Data Analysis

Software and statistical methods – SNP arrays
SNPassoc (CRAN) – paper with 190 citations [Bioinformatics] [SNPs]
CNVassoc (CRAN) – 4th most viewed paper [BMC Genomics] [CNVs]
R-GADA (R-forge) – Highly accessed [BMC Bioinformatics] [CNVs]
inveRsion (Bioconductor) – Highly accessed [BMC Bioinformatics] [Inversions]
invClust [Bioinformatics] [Inversions]
BayesGen (CRAN) [Statistics in Medicine] [CNVs & SNPs]
MLPAstats (CRAN) [BMC Bioinformatics] [CNVs]
MAD (R-forge) [BMC Bioinformatics] [Mosaicisms]

Used to analyze ~58,000 genomes [Nat Gen, 2012]

Software and statistical methods – Sequencing
tweeDEseq (Bioconductor) [BMC Bioinformatics] [RNAseq]
GRIAL [Bioinformatics] [Inversions]
MADseq (Bioconductor) [In progress] [Mosaicisms]
RASP (Bioconductor) [In progress] [Alternative Splicing]

Page 19: Limitations

Data storage
Computing time

Page 20: Bioinformatics & Data Analysis

Infrastructure @ CREAL

3 workstations (one more by the end of next month)
WK1: 2 GHz CPU, 32 GB RAM (8 cores)
WK2: 2 GHz CPU, 32 GB RAM (8 cores)
WK3: 2 GHz CPU, 64 GB RAM (24 cores)

Disk space
1 TB (WK1) + 2 TB (WK1) + 6 TB (WK1) + external device 14 TB

Page 21: Limitations

Cloud computing ???

Page 22: Challenges

Page 23: Challenges

Page 24: Bioinformatics & Data Analysis

Human Resources

Luis Pérez-Jurado
Armand Gutiérrez
Benjamín Rodríguez-Santiago

Tonu Esko
Eva Reinmaa
Lili Milani
Andreas Mestpalu

Marta Puig
Mario Cáceres