Artifacts and Effects in Gene Expression Data

Carlo Colantuoni

April 12, 2006


Experimental Artifacts

~200 microarrays, ~100 samples

NIA cDNA Microarray Core Facility

Nylon, P33, 9600 MGC elements


Uncorrected Intensities: MDS Colored by Batch


Removing The Batch Effect

We Will Use These Dimensions for Additional Corrective Transformations

Much Like Red:Green Analysis


Uncorrected Intensities: MDS Colored by Batch


Batch Subtracted Measures: MDS Colored by Batch
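
One simple way to obtain batch-subtracted measures is to subtract, for every gene, the mean of that gene within each batch. The slides do not state the exact transformation that was used, so the sketch below is an assumption; the function name `subtract_batch_means` and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def subtract_batch_means(values, batches):
    """Center every gene within each batch.

    values  : list of arrays, each a list of per-gene measurements
    batches : batch label for each array (same order as values)
    """
    # Group the arrays by their batch label.
    by_batch = defaultdict(list)
    for row, label in zip(values, batches):
        by_batch[label].append(row)
    # Per-batch, per-gene means (zip(*rows) iterates over genes).
    batch_means = {
        label: [mean(gene) for gene in zip(*rows)]
        for label, rows in by_batch.items()
    }
    # Subtract the matching batch mean from every measurement.
    return [
        [v - m for v, m in zip(row, batch_means[label])]
        for row, label in zip(values, batches)
    ]

# Toy data (assumed): 4 arrays x 2 genes, two batches.
arrays = [[8.0, 5.0], [9.0, 6.0], [11.0, 2.0], [13.0, 4.0]]
labels = ["A", "A", "B", "B"]
corrected = subtract_batch_means(arrays, labels)
```

After this step every gene has mean zero within each batch, so an MDS of the corrected values can no longer separate arrays by batch mean alone.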


MDS of All Array Experiments: Subject Replicates


Hybridization Artifacts


A “Simple” Pilot:

2 subjects in replicate = 4 arrays

2-color (reference)

Differing amounts of dye

~48,000 probes


4 arrays: Raw Log Intensities


4 arrays: Raw Linear Intensities


1 array: Ratio v. Intensity


1 array: Ratio v. Intensity
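
For a 2-color array, a ratio v. intensity plot is commonly built from the log ratio of the two channels and their average log intensity. The sketch below assumes base-2 logs and the usual definitions M = log2(R/G) and A = (log2 R + log2 G)/2; the slides do not say which base or definition was used, and the channel values are made up.

```python
import math

def ratio_intensity(red, green):
    """Return one (A, M) pair per probe.

    A = average log2 intensity of the two channels (x-axis)
    M = log2 ratio red/green (y-axis)
    """
    points = []
    for r, g in zip(red, green):
        m = math.log2(r) - math.log2(g)
        a = 0.5 * (math.log2(r) + math.log2(g))
        points.append((a, m))
    return points

# Hypothetical channel intensities for three probes.
red = [1024.0, 256.0, 64.0]
green = [256.0, 256.0, 256.0]
points = ratio_intensity(red, green)
```

If the ratio drifts away from zero as intensity changes, the array shows an intensity-dependent dye effect of the kind this slide illustrates.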


Biological Effects

… or are they?


Big Effects:

Tissue Types and Growth Factor Treatments

Illumina 24K


Smaller Effects:

Correlation of Gene Expression with Biological Indices

pH


PMI


age

Nylon, P33, 10K

Illumina custom, 700


More Subtle Effects:

Differential Gene Expression by Genotype


COMT Val158Met SNP Affects Cognition and Risk for Schizophrenia

COMT enzyme activity

Genetics

Cognition & Disease

Risk for Schizophrenia

Working Memory Performance

Patterns of Cortical Activation

Amphetamine & Tolcapone Response

VV, VM, MM

p<0.00002

Over-Expression of HSP70 in VV Homozygotes


VV-VM

Effect of COMT V158M on Gene Expression

Nylon, P33, 10K

MM-VM

Effect of COMT V158M on Gene Expression

Nylon, P33, 10K

VV-MM

Effect of COMT V158M on Gene Expression

Nylon, P33, 10K

[Plot: VV-VM T-stat v. MM-VM T-stat]

Looking Across Multiple Effects: Age and Genotype


N=15 genes across 80 subjects

p<7.34e-13


Alternative Approaches


COMT Activity as a Function of COMT Genotype

Correlation of COMT Activity with Expression

[Figure: Distribution of Observed (black) and Permuted (blue) Correlations (r). x-axis: Correlation (r); y-axis: Density. N=64]
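
A permuted distribution of correlations like the one in this figure can be produced by shuffling the trait across subjects and recomputing r for every gene. The sketch below is one plausible procedure, not necessarily the one used for the slides; the function names, the number of permutations, and the toy data are assumptions.

```python
import random
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def permuted_correlations(trait, expression_rows, n_perm=100, seed=0):
    """Shuffle the trait across subjects and recompute r for every gene.

    Returns n_perm * len(expression_rows) null correlations.
    """
    rng = random.Random(seed)
    shuffled = list(trait)
    null_r = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks the subject-to-trait pairing
        null_r.extend(pearson_r(shuffled, row) for row in expression_rows)
    return null_r

# Toy data (assumed): one trait value per subject, two genes.
activity = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
genes = [
    [1.1, 1.9, 3.2, 3.9, 5.1, 6.0],  # hypothetical gene that tracks the trait
    [3.0, 1.0, 4.0, 1.0, 5.0, 9.0],  # hypothetical second gene
]
observed = [pearson_r(activity, g) for g in genes]
null = permuted_correlations(activity, genes)
```

Plotting the density of `observed` against the density of `null` gives the black and blue curves; an excess of observed correlations in the tails suggests real association.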


p<0.000089

r=0.45

Acknowledgements

Clinical Brain Disorders Branch, NIMH, NIH

Daniel Weinberger

Section on Neuropathology: Joel Kleinman, Thomas Hyde

Tissue Resources: Mary Herman, Amy Deep-Soboslay, Colleen Lynch

Genotyping: Richard Straub, Bhaskar Kolachana

COMT Activity: Jingshan Chen, Samer Helem

RNA Resources: Johanna Creswell, Claudia Aguirre, Robert Fatula

Jeet Bahra, Isha Khan

Debora Rothmond, Barbara Lipska

Nick Be, Mariam Khan

National Institute on Drug Abuse, NIH, DHHS: William Freed, Elin Lehrmann

National Institute on Aging, NIH, DHHS: Kevin Becker, William Wood, Diane Teichberg

Johns Hopkins School of Public Health, Department of Biostatistics: Scott Zeger, Zhianqan Tan, Rafael Irizarry, Giovanni Parmigiani, Elizabeth Johnson

NHGRI Microarray Facility: Abdel Elkahloun

Iddil Berkov, CBDB


Beyond Individual Genes: Functional Gene Groups

• Borrow statistical power across entire dataset

• Beyond threshold enrichment

• Systematic patterns throughout the dataset

Correlation of Age with Gene Expression

[Figure: Distribution of Observed (black) and Permuted (red+blue) Correlations (r). x-axis: Correlation (r); y-axis: Density]

Over-Expression of HSP70’s in VV Homozygotes

p<7.42e-08

T statistic

3 Statistical Tests:

χ²

Kolmogorov-Smirnov

“Information”

Is THIS …

… Different from THIS?

χ²

[Figure: histogram bins for All Genes (E) and for the Subset of Interest (O)]

χ² is the sum of D values over the histogram bins, where:

$$D = \frac{(O - E)^2}{E}$$
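
The χ² comparison of a gene subset against all genes can be sketched as follows: bin both sets with the same histogram edges, scale the all-genes counts to the subset size to get E, and sum D = (O - E)²/E over the bins. The bin edges, the scaling of E, the skipping of empty bins, and the function names are assumptions made for illustration.

```python
def histogram_counts(values, edges):
    """Count values into bins [edges[i], edges[i+1]); last bin includes its right edge."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    return counts

def chi_square(subset, all_genes, edges):
    """Sum of (O - E)^2 / E over histogram bins."""
    all_counts = histogram_counts(all_genes, edges)
    observed = histogram_counts(subset, edges)
    # Scale the all-genes histogram down to the size of the subset.
    scale = sum(observed) / sum(all_counts)
    total = 0.0
    for o, n_all in zip(observed, all_counts):
        e = n_all * scale
        if e > 0:  # bins empty in the all-genes histogram are skipped
            total += (o - e) ** 2 / e
    return total

# Toy expression values (assumed).
all_genes = [-1.5, -0.5, -0.5, 0.5, 0.5, 0.5, 0.5, 1.5]
subset = [0.5, 0.5, 1.5, 1.5]
edges = [-2.0, -1.0, 0.0, 1.0, 2.0]
stat = chi_square(subset, all_genes, edges)
```

A larger statistic means the subset's histogram departs more from the all-genes histogram; turning it into a p-value (for example by permutation) is not shown here.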


All Genes

Subset of Interest

Kolmogorov-Smirnov
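
The two-sample Kolmogorov-Smirnov statistic is the largest vertical gap between the empirical cumulative distributions of the two sets. The sketch below computes only the statistic, with hypothetical names and toy data; how the slides assessed significance is not stated.

```python
from bisect import bisect_right

def ks_statistic(subset, all_genes):
    """Largest absolute difference between the two empirical CDFs."""
    a, b = sorted(subset), sorted(all_genes)
    d = 0.0
    # The gap can only change at observed data points, so checking those suffices.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

# Toy expression values (assumed).
all_genes = [-1.5, -0.5, -0.5, 0.5, 0.5, 0.5, 0.5, 1.5]
subset = [0.5, 0.5, 1.5, 1.5]
d_stat = ks_statistic(subset, all_genes)
```

Unlike the binned χ² comparison, this needs no choice of histogram bins, but it is most sensitive to shifts near the middle of the distribution.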


All Genes

Subset of Interest

Product of Individual Probabilities

χ² is the sum of D values over the histogram bins, where E is the expected count and O the observed count in each bin:

$$D = \frac{(O - E)^2}{E}$$

$$D_{PCA} = \frac{O - E}{E^{0.5}}$$

[Figure: PCA of gene-group distributions, Dimension #1 v. Dimension #2. Each point is one gene group, labeled by group number and colored by p value (0.0 to >0.1). N = 89 Gene Subsets. Inset histograms show Proportion of Genes v. Log10 Ratio Z-Score for All Genes and for the groups listed below.]

Gene group          #     N    p value
Pent. Phos.         30    20   p<0.001
Fruc. Mann.         51    25   p<0.032
Sphingo-Glycolip.   600   94   p<0.097
IP3                 562   96   p<0.110
Pyrimid. Metabo.    240   44   p<0.996
Riboflavin          740   17   p<0.999
Lipo-Polysacch.     540   3    p<0.079
Lys. Biosyn.        300   4    p<0.107
Aln. Asp.           252   24   p<0.079
C by folate         670   7    p<0.133

The distribution of gene expression values for each gene group is passed to PCA as D^0.5 values and then plotted as a single point in low-dimensional space.

Distance from center indicates deviation from the distribution of all gene expression values in the microarray experiment.

Proximity indicates similarity in the shape of distributions.

$$D = \frac{(O - E)^2}{E}$$

$$D_{PCA} = \frac{O - E}{E^{0.5}}$$
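
The signed quantity (O - E)/E^0.5 keeps the direction of each bin's deviation, and its squares sum to χ². A minimal sketch of building one such vector per gene group is given below; the function name and the bin counts are hypothetical, and the PCA step itself is not shown.

```python
def d_pca_vector(observed, expected):
    """Signed per-bin deviations (O - E) / sqrt(E); bins with E == 0 are set to 0."""
    return [
        (o - e) / e ** 0.5 if e > 0 else 0.0
        for o, e in zip(observed, expected)
    ]

# Hypothetical bin counts for one gene group (O) and the scaled all-genes histogram (E).
observed = [0, 0, 2, 2]
expected = [0.5, 1.0, 2.0, 0.5]

vec = d_pca_vector(observed, expected)
# Squaring and summing the signed values recovers the chi-square statistic.
chi2 = sum(d * d for d in vec)
```

Stacking one vector per gene group gives a groups-by-bins matrix that any PCA routine could take as input. A group whose histogram matches all genes gives a vector of zeros, while groups with similar histogram shapes give similar vectors.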


Analysis of Gene Networks


No Effect of Other COMT SNPs: P3224

[Figure: Distribution of Observed and Permuted T statistics, 1/1-1/2. N=21, N=30]

Distribution of p-values

[Figure: Distribution of p-values from Observed (black) and Permuted Data. x-axis: p-value (0.0 to 1.0); y-axis: Density. N=90]

Correlation of Age with Gene Expression

[Figure: Distribution of Observed (black) and Permuted (red+blue) Correlations (r). x-axis: Correlation (r); y-axis: Density. N=90]

Correlation of Age with Gene Expression

[Figure: negative tail (r from -0.45 to -0.30) of the Distribution of Observed (black) and Permuted (red+blue) Correlations (r). x-axis: Correlation (r); y-axis: Density]

$$\mathrm{FDR} = \frac{\text{False Pos.}}{\text{Total Pos.}} = \frac{\text{Permuted}}{\text{Observed}}$$
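
The ratio FDR = False Pos. / Total Pos. can be estimated at a chosen cutoff by counting how many permuted correlations pass it (averaged over permutations) and dividing by how many observed correlations pass it. The sketch below assumes a cutoff in the negative tail; the function name, the cutoff, and all numbers are made up.

```python
def permutation_fdr(observed, permuted, threshold, n_perm):
    """Estimate FDR for calling genes with r <= threshold.

    observed : correlations from the real data (one per gene)
    permuted : correlations pooled over n_perm permuted datasets
    """
    total_pos = sum(1 for r in observed if r <= threshold)
    # Expected number of false positives in a single dataset.
    false_pos = sum(1 for r in permuted if r <= threshold) / n_perm
    if total_pos == 0:
        return 0.0  # convention chosen here: no calls, no false discoveries
    return min(1.0, false_pos / total_pos)

# Hypothetical correlations for 8 genes, plus two permutations of the same size.
observed_r = [-0.45, -0.40, -0.38, -0.35, -0.10, 0.05, 0.20, 0.30]
permuted_r = [-0.36, -0.20, -0.10, 0.00, 0.10, 0.15, 0.20, 0.30,
              -0.41, -0.25, -0.05, 0.02, 0.08, 0.12, 0.22, 0.33]
fdr = permutation_fdr(observed_r, permuted_r, threshold=-0.35, n_perm=2)
```

In this toy case 4 observed correlations pass the cutoff against an average of 1 per permuted dataset, so roughly a quarter of the calls would be expected to be false.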


Correlation of GFAP Expression with Age

r=0.47

p<0.000002

[Figure: Expression: Log(Ratio), SD Units from Mean (y-axis) v. Age (yr) (x-axis)]

(p<0.02)

2 arrays (4 channels)