Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Page 1: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Bioinformatics

Expression profiling and functional genomics

Part I: Preprocessing

Ad 29/10/2006

Page 2: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

• http://www.esat.kuleuven.ac.be/~kmarchal/
• Course material: course notes + powerpoint files
• Exercises

Page 3: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Overview

MICROARRAY PREPROCESSING

• Gene expression

• Omics era

• Transcript profiling

• Experiment design

• Preprocessing

• Exercises

Page 4: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Gene expression

[Figure: DNA is transcribed (from position +1) into mRNA, which is translated into protein]

Page 5: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Gene expression

Adaptation of a cell to its environment

Adaptation of a cell: response to environmental signals, e.g. response to hormones (cell differentiation)

The cellular response is determined by the genes which are switched on upon a signal

[Figure: bacterial cell receiving Signal 1 and Signal 2 from outside; inside the cell, an FNR box upstream of cytN, cytO, cytQ, cytP]

Page 6: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

The action of genetic networks underlies the observed phenotypic behavior

Gene expression

Page 7: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Overview

MICROARRAY PREPROCESSING

• Gene expression

• Omics era

• Transcript profiling

• Experiment design

• Preprocessing

• Exercises

Page 8: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Functional genomics

Structural Genomics

Comparative Genomics

Page 9: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Omics era

Traditional molecular biology
– Directed toward understanding the role of a particular gene or protein in a molecular biological process
– Northern analysis
– Mutational analysis
– Expression by reporter fusions

Omics era
– Measurement of the expression of thousands of genes or proteins simultaneously
– The function or the expression of a gene in the global context of the cell
– Holistic approaches allow better understanding of fundamental molecular biological processes

Because a gene does not act on its own, it is always embedded in a larger network (systems biology)

Page 10: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Omics era

transcriptomics

[Figure: gene expression scheme (DNA, transcription from +1, mRNA, translation, protein) next to the microarray scheme (reference sample and test sample, RNA, cDNA, detection)]

Page 11: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Omics era

proteomics

[Figure: gene expression scheme (DNA, transcription from +1, mRNA, translation, protein)]

Page 12: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

metabolomics

Omics era

Page 13: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

SYSTEMS BIOLOGY

Consider the cell as a system

Omics era

Page 14: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

SYSTEMS BIOLOGY

Mechanistic insight in the biological system at molecular biological level

High throughput data

Omics era

Page 15: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

• Analysis of such large-scale data is no longer trivial => computational challenges
– Low signal/noise
– High dimensionality

• Simple spreadsheet tools such as Excel are no longer sufficient

• More advanced data mining procedures become necessary

• Another urgent problem is how to store and organize all the information

Bioinformatics

Omics era

Page 16: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Overview

MICROARRAY PREPROCESSING

• Gene expression

• Omics era

• Transcript profiling
– Principle of microarray
– Applications

• Experiment design

• Preprocessing

• Exercises

Page 17: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Transcript profiling

transcriptomics

[Figure: microarray scheme (reference sample and test sample, RNA, cDNA, detection)]

Page 18: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

• Previously: measure expression level of one gene: Northern blot analysis

• Novel techniques: measure expression level of all genes simultaneously => EXPRESSION PROFILING

Principle: hybridisation

mRNA: 5’-UGACCUGACG-3’

cDNA: 3’-ACTGGACTGC-5’

Hybridize: stick together
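The base pairing behind hybridisation can be checked mechanically. The short Python sketch below rebuilds the cDNA strand from the mRNA sequence on this slide; the function name and the lookup table are illustrative choices, not part of the course material.

```python
# Map each RNA base to its complementary DNA base.
RNA_TO_CDNA = {"A": "T", "U": "A", "G": "C", "C": "G"}

def complementary_cdna(mrna_5_to_3: str) -> str:
    """Return the complementary cDNA strand, written 3'->5' so that it
    lines up position by position with the mRNA written 5'->3'."""
    return "".join(RNA_TO_CDNA[base] for base in mrna_5_to_3)

mrna = "UGACCUGACG"
cdna = complementary_cdna(mrna)
print("mRNA 5'-" + mrna + "-3'")
print("cDNA 3'-" + cdna + "-5'")
```

Run on the slide's mRNA, this reproduces the cDNA sequence shown above.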

Transcript profiling

Page 19: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Monitor molecular activities on a global level
– protein levels: proteomics
– enzyme activities
– metabolites
– gene expression (mRNA): transcriptomics = transcript profiling

This gives a general insight into the global behavior of the cell (holistic)

Molecular biological methods

– RT-PCR

– SAGE

– Protein arrays

– Microarray analysis

Transcript profiling

Page 20: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Transcript profiling

EXPERIMENTAL PROCEDURES

SLIDE PRODUCTION: cDNA clones, printing slides

cDNA µA EXPERIMENT: experiment design, sample preparation, hybridization & scanning

DATA ANALYSIS

Page 21: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Transcript profiling

cDNA array: spotted cDNA on a glass slide

Upscaled Northern hybridisation

[Figure: gene (DNA, transcription start +1), transcript (mRNA), cDNA]

Page 22: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Preparation of probes

• Collect cDNA clones

• Amplify target cDNA insert by PCR

• Check yield & specificity by electrophoresis

Spot the PCR products on glass slides

Transcript profiling

Page 23: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Transcript profiling

[Figure: microarray scheme (reference sample and test sample, RNA, cDNA, detection)]

Page 24: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Transcript profiling

1. Cell culture (Signal 1, Signal 2)
2. mRNA isolation
3. Labeling
4. Hybridization + washing
5. Scanning
6. Image analysis: numerical value

Page 25: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

http://www.bio.davidson.edu/courses/genomics/chip/chip.html

Transcript profiling

Page 26: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Superimposed color image

* Transform into color images

* Superimpose color images from R and G channel

good alignment bad alignment

Transcript profiling

Page 27: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

black spots : gene was neither expressed in test nor in control sample

green : gene was only expressed in control sample

red : gene was only expressed in test sample

yellow : gene was expressed both in test and in control sample

Superimposed color image

Transcript profiling

Page 28: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Signal intensity is proportional to the amount of cDNA present in the sample
signal cy3 -> numerical value
signal cy5 -> numerical value

Data analysis

Image analysis

Transcript profiling

Page 29: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Transcript profiling

Data representation

Gene profile

Experiment profile

Page 30: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Spotted DNA microarray High density oligonucleotide array

Transcript profiling

Page 31: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Overview

MICROARRAY PREPROCESSING

• Gene expression

• Omics era

• Transcript profiling

• Experiment design

• Preprocessing

• Exercises

Page 32: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

The appropriate mathematical approach depends on the experimental design

• Comparison of 2 samples (black/white)

• Comparison of multiple arrays

• Global dynamic profiling

• Static experiment: Comparison of samples (mutants, patients)

Experiment Design

Page 33: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Type1: Comparison of 2 samples

Statistical testing

Control sample

Induced sample

Retrieve statistically over or under expressed genes

2 sample design

Experiment Design

Page 34: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Black/white experiment description (array V mice genes)

• Condition 1: pygmy mouse, 10 days old (test)
• Condition 2: normal mouse, 10 days old (ref)

Goal: detect differentially expressed genes

Experiment design (Latin Square)

Array     Replica   Dye 1         Dye 2
Array 1   L         Condition 1   Condition 2
Array 1   R         Condition 1   Condition 2
Array 2   L         Condition 2   Condition 1
Array 2   R         Condition 2   Condition 1

Per gene and per condition, 4 measurements are available

Experiment Design

Page 35: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Measure expression of all genes

• During time (dynamic profile)

• In different conditions

Identify coexpressed genes

Identify mechanism of coregulation

Motif Finding

Clustering

Multiple array design

Experiment Design

Page 36: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Multiple array design

• Study of the mitotic cell cycle of Saccharomyces cerevisiae with oligonucleotide arrays (Cho et al. 1999)
• 15 time points (E=18)
• Time points 90 & 100 min deleted (Zhang et al. 1999, Tavazoie et al. 1999)

Original dataset: 6178 genes

Preprocessing:
• select the 4634 most variable genes (75 % of 6178; the 25 % least variable genes are dropped)
• variance normalized
• adaptive quality based clustering (32 clusters) (95%)
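A minimal sketch of the two preprocessing steps named on this slide, variance-based gene selection followed by variance normalization. It assumes that "variance normalized" means rescaling each gene profile to mean 0 and variance 1; the toy profiles and function names are hypothetical. For the yeast data the fraction would be 0.75 (4634 of 6178 genes).

```python
import statistics

def select_most_variable(profiles, fraction):
    """Keep the given fraction of genes with the highest variance."""
    ranked = sorted(profiles, key=lambda g: statistics.pvariance(profiles[g]),
                    reverse=True)
    n_keep = max(1, round(fraction * len(ranked)))
    return {gene: profiles[gene] for gene in ranked[:n_keep]}

def variance_normalize(profile):
    """Rescale one gene profile to mean 0 and variance 1 (assumed meaning)."""
    mean = statistics.fmean(profile)
    sd = statistics.pstdev(profile)
    if sd == 0:
        return [0.0 for _ in profile]
    return [(x - mean) / sd for x in profile]

# Hypothetical expression profiles over four time points.
profiles = {
    "geneA": [1.0, 5.0, 9.0, 5.0],
    "geneB": [4.0, 4.1, 3.9, 4.0],   # almost flat, will be filtered out
    "geneC": [2.0, 8.0, 2.0, 8.0],
    "geneD": [0.0, 1.0, 0.0, 1.0],
}
selected = select_most_variable(profiles, fraction=0.75)
normalized = {gene: variance_normalize(p) for gene, p in selected.items()}
print(sorted(normalized))
```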

Experiment Design

Page 37: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Reference design: e.g. Spellman dataset

• Reference: unsynchronized cells
• Condition: synchronized cells during the cell cycle at distinct time intervals

Pairing of samples per array (starting with Array 1):

Dye 1, Replica L   Dye 2, Replica L
Condition 1        Condition 19
Condition 2        Condition 19
Condition 3        Condition 19
Condition 4        Condition 19
…

Experiment Design

Page 38: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Loop design

Experiment Design

Page 39: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Overview

MICROARRAY PREPROCESSING

• Gene expression

• Omics era

• Transcript profiling

• Experiment design

• Preprocessing

– Sources of Variation

– General normalization steps

– Slide by slide normalization

– ANOVA normalization

Page 40: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Sources of variation
– Overshine effects
– Dye effect
– Spot effects
– Array effect

Consistent errors

• Consistent errors complicate direct comparison of measurements of the same gene/condition

• Consistent errors need to be removed by preprocessing/normalization

Preprocessing

• Tedious
• Influences downstream measurements

Page 41: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Preprocessing

Dye effect

1. Cell culture (Signal 1, Signal 2)
2. mRNA isolation
3. Labeling
4. Hybridization + washing
5. Scanning
6. Image analysis: numerical value

Page 42: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Dye, condition effect: within slide variation

Measurement error:
– Preparation of mRNA
– Labeling & reverse transcription

Overall signal in one channel more pronounced than in the other channel

Normalization

Global normalization assumption: $\sum \log_2(\mathrm{test}/\mathrm{ref}) = 0$ (summed over all genes on the array)

Preprocessing

Page 43: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Preprocessing

Array effect

1. Cell culture (Signal 1, Signal 2)
2. mRNA isolation
3. Labeling
4. Hybridization + washing
5. Scanning
6. Image analysis: numerical value

Page 44: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

• normalization within slide

• ratio

Differences in global intensity between slides

Comparison between slides impossible

Array effects: between slide variation

Preprocessing

Hybridization differences

Page 45: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Array effects: between slide variation

[Figure: box plots (minvalue, Q1, Q3, maxvalue) for Series1 to Series4; vertical axis from about -9 to 7]

Preprocessing

Page 46: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Spot effect

Measurement error: different quantity of DNA in spot
• Difference in duplicate spots
• Absolute levels between genes incomparable

Ratio: compare differential expression between genes

Gene     test   ref   R/G
Gene 1   4      2     2
Gene 2   8      4     2

Pin main effects: spot effects

Preprocessing

Page 47: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Non-specific Cy5 or Cy3 signal resulting from overshining (= emission from neighboring spots)

Overshine effects: within slide variation

Preprocessing

Background intensity increases with the intensity of the neighboring spots

Page 48: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Removing sources of variation is an obligatory step

• To make comparisons within a slide possible
– E.g. find differentially expressed genes

• To allow interslide comparisons
– E.g. combining the replicates of the original experiment and the color flip

Preprocessing

Page 49: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Overview

MICROARRAY PREPROCESSING

• Gene expression

• Omics era

• Transcript profiling

• Experiment design

• Preprocessing

– Sources of Variation

– General normalization steps

– Slide by slide normalization

– ANOVA normalization

Page 50: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Preprocessing

[Workflow diagram. Array by array approach: background correction, log transformation, filtering, normalization, ratio, test statistic (T-test). ANOVA based approach: background correction, log transformation, filtering, linearisation, bootstrapping.]

Page 51: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

• Background correction compensates for overshining
• Background correction is considered additive
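Because the background is treated as additive, correcting for it amounts to a subtraction per spot. The sketch below is a minimal illustration; clipping negative results to a floor of 0 is an assumption added here, not something the slide prescribes, and the intensities are invented.

```python
def background_correct(foreground, background, floor=0.0):
    """Subtract the local background from each spot's foreground intensity.

    Values below `floor` are clipped (an assumption made for this sketch)."""
    return [max(fg - bg, floor) for fg, bg in zip(foreground, background)]

# Hypothetical foreground and local background intensities for three spots.
foreground = [1200.0, 300.0, 80.0]
background = [100.0, 120.0, 95.0]
corrected = background_correct(foreground, background)
print(corrected)
```

The third spot is darker than its surroundings, which is why some floor or flagging rule is needed before a log transformation.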

Preprocessing: Background correction

Background correction

Page 52: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Preprocessing

[Workflow diagram. Array by array approach: background correction, log transformation, filtering, normalization, ratio, test statistic (T-test). ANOVA based approach: background correction, log transformation, filtering, linearisation, bootstrapping.]

Page 53: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

• Additive error: the absolute level of the error is independent of the measured intensity (high relative error at low expression levels, low relative error at high expression levels).

• Multiplicative error: the absolute error increases with the measured intensity (high absolute error at high levels)

Multiplicative error

Preprocessing: log transformation

Page 54: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

LOG2 transformed intensity values: Multiplicative effects removed, additive effects more pronounced

residuals are constant at high intensities

Additive error: error increases as the signal is lower (intuitively plausible)
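The interplay of the two error types can be made concrete with a small calculation. The sketch assumes a simple two-component error model in which an additive component with fixed standard deviation and a multiplicative component with fixed coefficient of variation are independent; the model and the numbers are illustrative assumptions, not taken from the slides.

```python
import math

def relative_error(intensity, additive_sd, multiplicative_cv):
    """Relative error of a measurement under the assumed two-component model."""
    total_sd = math.sqrt(additive_sd ** 2 + (multiplicative_cv * intensity) ** 2)
    return total_sd / intensity

# Hypothetical error sizes: additive sd of 50 units, multiplicative error of 10 %.
for intensity in (100, 1000, 10000, 100000):
    print(intensity, round(relative_error(intensity, 50.0, 0.1), 4))
```

On a log scale the spread of a measurement is roughly its relative error, so under this model the spread levels off at high intensities and grows as the signal gets weaker, in line with the two statements above.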

Preprocessing: log transformation

Page 55: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Preprocessing: log transformation

Page 56: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Why log2

• ratio (test/ref):
– test>ref: upregulation, range 1…+infinity
– test<ref: downregulation, range 0...1: range of downregulation squashed

• Log2(test/ref) = log2(test) - log2(ref):
– upregulation: range 0…+infinity
– downregulation: range 0…-infinity

              2 fold overexpression   2 fold underexpression
Ratio         2                       0.5
Log2(Ratio)   1                       -1
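The symmetry argument for log2 is easy to verify numerically. A minimal sketch, with made-up intensities:

```python
import math

def log2_ratio(test, ref):
    """Log2 expression ratio, computed as log2(test) - log2(ref)."""
    return math.log2(test) - math.log2(ref)

# A two-fold increase and a two-fold decrease (hypothetical intensities).
up = log2_ratio(8.0, 4.0)      # raw ratio 2
down = log2_ratio(4.0, 8.0)    # raw ratio 0.5
print(up, down)
```

The raw ratios 2 and 0.5 sit at unequal distances from 1, whereas the log2 ratios are equally far from 0 with opposite signs.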

Preprocessing: log transformation

Page 57: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Preprocessing

[Workflow diagram. Array by array approach: background correction, log transformation, filtering, normalization, ratio, test statistic (T-test). ANOVA based approach: background correction, log transformation, filtering, linearisation, bootstrapping.]

Page 58: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Spot detection and signal acquisition

• Spots are identified by image analysis
– Array Vision
– ImaGene
– Matarray

• E.g. the signal is defined as the mean pixel intensity of all pixels in a spot for which the intensity is higher than the local background + 2 SD

• Spots can have different qualities (artifacts)
– Irregular spots
– Spots with excessively large diameter
– Spots which are extremely small
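The example signal definition above can be sketched as follows. The sketch assumes that "local background + 2SD" means the mean of the local background pixels plus twice their standard deviation; the pixel values and the function name are hypothetical, and real image analysis packages may define this differently.

```python
import statistics

def spot_signal(spot_pixels, background_pixels, n_sd=2.0):
    """Mean intensity of the spot pixels that exceed background mean + n_sd * SD.

    Returns None when no pixel passes the threshold."""
    threshold = (statistics.fmean(background_pixels)
                 + n_sd * statistics.stdev(background_pixels))
    bright = [p for p in spot_pixels if p > threshold]
    return statistics.fmean(bright) if bright else None

# Hypothetical pixel intensities inside a spot and in its local background.
background_pixels = [100.0, 110.0, 90.0, 105.0, 95.0]
spot_pixels = [500.0, 520.0, 480.0, 110.0, 105.0]
signal = spot_signal(spot_pixels, background_pixels)
print(signal)
```

Here the two dim pixels fall below the threshold and are excluded from the mean.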

Preprocessing: filtering

Page 59: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Red: >0.1 stdev
Green: >1 stdev
Blue: >2 stdev

Preprocessing: filtering

Page 60: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Filtering:

• Zero values: treat these separately, because the ratio and the log transformation are undefined

• Zero values are interesting in a black/white experiment: genes off in condition 1 versus on in condition 2

Preprocessing: filtering

Page 61: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

• Some genes are only labeled with the green dye, not with the red dye
• If no mRNA of a gene is present, does the green dye bind aspecifically to a spot?

=> Color flip essential to eliminate false positives

Experiment 1 (test = red, ref = green; intensities for replicas L and R): genes seemingly underexpressed

cloneId   L red test   R red test   L green ref   R green ref
26635     2106.563     0            101692.979    10399.822
27141     836.407      0            123838.567    45432.931
27500     803.205      0            111507.935    72379.887
28152     0            1331.273     9263.894      14005.905
28333     0            1255.175     87102.68      9188.587
28756     363.247      0            115771.253    88541.343
30694     924.256      0            22029.599     50306.219

Experiment 2, color flip (test = green, ref = red; intensities for replicas L and R)

cloneId   L green test   R green test   L red ref   R red ref
26635     14376.307      12190.883      0           995.694
27141     14804.307      13242.277      1315.193    762.172
27500     22051.507      18835.761      0           0
28152     29270.26       26939.077      90.713      3402.73
28333     25964.137      22326.256      0           0
28756     14270.607      20442.069      0           1007.763
30694     20150.615      19003.462      4750.326    7988.791
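One way to see why the color flip matters is to compare the direction of change per clone in the two experiments. The sketch below uses the intensities listed on this slide and a deliberately crude rule (mean of the two replicas, test versus reference); that rule is an assumption for illustration, not the procedure used in the course.

```python
# Intensities per clone: (test L, test R, ref L, ref R).
exp1 = {  # test labeled red, reference labeled green
    26635: (2106.563, 0.0, 101692.979, 10399.822),
    27141: (836.407, 0.0, 123838.567, 45432.931),
    27500: (803.205, 0.0, 111507.935, 72379.887),
    28152: (0.0, 1331.273, 9263.894, 14005.905),
    28333: (0.0, 1255.175, 87102.68, 9188.587),
    28756: (363.247, 0.0, 115771.253, 88541.343),
    30694: (924.256, 0.0, 22029.599, 50306.219),
}
exp2 = {  # color flip: test labeled green, reference labeled red
    26635: (14376.307, 12190.883, 0.0, 995.694),
    27141: (14804.307, 13242.277, 1315.193, 762.172),
    27500: (22051.507, 18835.761, 0.0, 0.0),
    28152: (29270.26, 26939.077, 90.713, 3402.73),
    28333: (25964.137, 22326.256, 0.0, 0.0),
    28756: (14270.607, 20442.069, 0.0, 1007.763),
    30694: (20150.615, 19003.462, 4750.326, 7988.791),
}

def direction(test_l, test_r, ref_l, ref_r):
    """Crude direction of change based on the mean of the two replicas."""
    test, ref = (test_l + test_r) / 2, (ref_l + ref_r) / 2
    return "up" if test > ref else "down" if test < ref else "equal"

# Clones whose apparent regulation reverses when the dyes are swapped.
inconsistent = [c for c in exp1 if direction(*exp1[c]) != direction(*exp2[c])]
print(inconsistent)
```

All seven clones look underexpressed in experiment 1 and overexpressed after the flip, which points to a dye artefact rather than a biological effect.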

Preprocessing: filtering

Page 62: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

MICROARRAY PREPROCESSING

• Gene expression

• Omics era

• Transcript profiling

• Experiment design

• Preprocessing– Sources of Variation

– General normalization steps

– Slide by slide normalization

– ANOVA normalization

Overview

Page 63: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Preprocessing

[Workflow diagram. Array by array approach: background correction, log transformation, filtering, normalization, ratio, test statistic (T-test). ANOVA based approach: background correction, log transformation, filtering, linearisation, bootstrapping.]

Page 64: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

• On average ratio red/green should be 1

– Rescale based on average of housekeeping genes

– Rescale based on spikes

– Rescale based on average expression value of the full array (global normalization)

• Methods used for normalization

– linear normalization

– Intensity dependent normalization
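The rescaling options listed above share one mechanism: estimate a shift from a set of genes and subtract it from every log ratio. A minimal sketch, assuming the median is used as the centre (the slide only says "average") and with invented intensities; passing the indices of housekeeping genes or spikes as `reference_genes` gives the first two variants, omitting it gives global normalization.

```python
import math
import statistics

def normalize_log_ratios(red, green, reference_genes=None):
    """Centre log2(R/G) so that the chosen reference genes have median 0.

    reference_genes: indices of housekeeping genes or spikes; all genes
    when None (global normalization)."""
    log_ratios = [math.log2(r / g) for r, g in zip(red, green)]
    indices = range(len(log_ratios)) if reference_genes is None else reference_genes
    shift = statistics.median(log_ratios[i] for i in indices)
    return [m - shift for m in log_ratios]

# Hypothetical intensities in which the red channel is about twice as bright.
red = [200.0, 400.0, 800.0, 1600.0, 100.0]
green = [100.0, 200.0, 400.0, 400.0, 100.0]
normalized = normalize_log_ratios(red, green)
print(normalized)
```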

Preprocessing: normalization

Page 65: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Linear Normalization

[Figure: scatter plots with axes G and R]

Preprocessing: normalization

Page 66: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

– Red and green related by a constant factor
– Calculate factor by linear regression

[Figure: two plots of Log2(ratio) with the value 0 marked]

• Linear normalization factor determined by linear regression

• Filtering to remove outliers in the non-linear range (green values)

• http://afgc.stanford.edu/~finkel/talk.htm
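A minimal sketch of linear normalization, assuming the constant factor is estimated by least-squares regression through the origin (R = k·G); the slides do not specify the regression variant, and the intensities are invented.

```python
def linear_factor(red, green):
    """Least-squares slope k of the line R = k * G through the origin."""
    return sum(r * g for r, g in zip(red, green)) / sum(g * g for g in green)

def linear_normalize(red, green):
    """Divide the red channel by the fitted factor so that R/G is 1 on average."""
    k = linear_factor(red, green)
    return [r / k for r in red]

# Hypothetical intensities with a red channel that is 1.5 times too bright.
green = [100.0, 200.0, 400.0, 800.0]
red = [150.0, 300.0, 600.0, 1200.0]
factor = linear_factor(red, green)
red_normalized = linear_normalize(red, green)
print(factor, red_normalized)
```

A single factor only works when the relation between the channels really is linear over the whole intensity range, which is why outliers in the non-linear range are filtered first.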

Preprocessing: normalization

Page 67: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Linear normalization not straightforward,…

[Figure: Log2(R/G) plotted against (Log2(R) + Log2(G))/2, with a linear fit and a lowess fit]

Preprocessing: normalization

Page 68: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Non-linear intensity dependent normalization

Lowess (Dudoit et al., 2000) : genes seemingly underexpressed due to specific dye effect will be compensated for

Log R and log G recalculated based on the lowess fit

Lowess linearizes and normalizes the data !!!!!
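To make the idea of intensity dependent normalization concrete, here is a simplified lowess-style sketch in plain Python: a locally weighted straight-line fit of M = Log2(R) - Log2(G) against A = (Log2(R) + Log2(G))/2, which is then subtracted from M. It omits the robustness iterations of full lowess, and the span, the helper names and the test data are assumptions made for illustration.

```python
import math

def tricube(u):
    """Tricube kernel weight for a scaled distance u."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0

def local_fit(x, y, x0, span):
    """Weighted straight-line fit around x0, evaluated at x0."""
    k = min(len(x), max(3, math.ceil(span * len(x))))
    # Bandwidth just beyond the k-th nearest neighbour, so it keeps some weight.
    h = sorted(abs(xi - x0) for xi in x)[k - 1] * 1.0001 + 1e-12
    w = [tricube((xi - x0) / h) for xi in x]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)

def lowess_normalize(log_r, log_g, span=0.5):
    """Return log ratios M with the fitted intensity-dependent trend removed."""
    a = [(r + g) / 2 for r, g in zip(log_r, log_g)]
    m = [r - g for r, g in zip(log_r, log_g)]
    return [mi - local_fit(a, m, ai, span) for ai, mi in zip(a, m)]

# Hypothetical data with a dye bias that grows with intensity (M = 0.2 * A - 1).
log_g = [float(v) for v in range(6, 16)]
log_r = [g + (0.2 * g - 1.0) / 0.9 for g in log_g]
normalized = lowess_normalize(log_r, log_g)
print([round(v, 6) for v in normalized])
```

In this artificial case the bias is the only signal, so the normalized log ratios come out at essentially zero.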

Preprocessing: normalization

Page 69: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Intensity dependent normalization

Preprocessing: normalization

Page 70: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Result of the normalization

A. Before normalization

[Figure: box plots (minvalue, Q1, Q3, maxvalue) for Series1 to Series4; vertical axis from about -9 to 7]

B. After normalization

[Figure: box plots (minvalue, Q1, Q3, maxvalue) for RATIO1_NORM, RATIO2_NORM and RATIO3_NORM; vertical axis from -6 to 6]

Preprocessing: normalization

Page 71: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Preprocessing

[Workflow diagram. Array by array approach: background correction, log transformation, filtering, normalization, ratio, test statistic (T-test). ANOVA based approach: background correction, log transformation, filtering, linearisation, bootstrapping.]

Page 72: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

• Compensates for spot effects

• Choice of the reference important

– Intuitive reference (first time point, uninduced sample): intuitive interpretation possible, but ratio often undefined

– Independent reference (reference design, e.g. tissue mixture): ratio defined, but interpretation complicated

Preprocessing: ratio

Page 73: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

• ratio (R/G):
– R>G: upregulation, range 1…+infinity
– R<G: downregulation, range 0...1: range of downregulation squashed

• Log ratio:
– upregulation: range 0…+infinity
– downregulation: range 0…-infinity

              2 fold overexpression   2 fold underexpression
Ratio         2                       0.5
Log2(Ratio)   1                       -1

Preprocessing: ratio

Page 74: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Preprocessing

[Workflow diagram. Array by array approach: background correction, log transformation, filtering, normalization, ratio, test statistic (T-test). ANOVA based approach: background correction, log transformation, filtering, linearisation, bootstrapping.]

Page 75: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Overview further analysis

Raw data => Preprocessing => Preprocessed data

Preprocessed data => Test statistic => Differentially expressed genes

Preprocessed data => Clustering => Clusters of coexpressed genes

Page 76: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Preprocessing

[Workflow diagram. Array by array approach: background correction, log transformation, filtering, normalization, ratio, test statistic (T-test). ANOVA based approach: background correction, log transformation, filtering, linearisation, bootstrapping.]

Page 77: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

I. MAIN EFFECTS + EFFECT OF INTEREST

Model the expression level of each gene as a combination of the different factors (multifactor, linear, fixed levels):

$y_{ijnmk} = \mu + G_i + C_j + A_n + D_m + (GC)_{ij} + \varepsilon_{ijnmk}$

• $\mu$: overall mean
• $G_i$: gene effect (constitutive level of the gene)
• $C_j$: condition effect (mRNA isolation efficiency)
• $A_n$: array effect (hybridisation efficiency)
• $D_m$: dye effect (labeling efficiency)
• $(GC)_{ij}$: GC effect (differential expression due to the altered variety)

Least squares fit:

• subject to restrictions

• contrast of interest: estimate $(GC)_{i1} - (GC)_{i2}$
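For a balanced design, the least squares estimates under sum-to-zero restrictions are simply differences of averages. The sketch below shows this for a reduced model with only the overall mean, gene effect, condition effect and GC interaction; array and dye effects are left out to keep it short, and the data and names are hypothetical.

```python
import statistics

def fit_gene_condition_model(y):
    """Estimate mu, G_i, C_j and (GC)_ij from balanced data y[gene][condition]."""
    genes = list(y)
    conditions = list(next(iter(y.values())))
    cell = {(g, c): statistics.fmean(y[g][c]) for g in genes for c in conditions}
    mu = statistics.fmean(list(cell.values()))
    G = {g: statistics.fmean([cell[g, c] for c in conditions]) - mu for g in genes}
    C = {c: statistics.fmean([cell[g, c] for g in genes]) - mu for c in conditions}
    GC = {(g, c): cell[g, c] - mu - G[g] - C[c] for g in genes for c in conditions}
    return mu, G, C, GC

# Hypothetical log intensities: two genes, two conditions, two replicates each.
y = {
    "gene1": {"c1": [10.0, 10.2], "c2": [12.0, 12.2]},
    "gene2": {"c1": [8.0, 8.2], "c2": [8.0, 8.2]},
}
mu, G, C, GC = fit_gene_condition_model(y)
# Contrast of interest for gene1: (GC)_{i1} - (GC)_{i2}.
contrast_gene1 = GC["gene1", "c1"] - GC["gene1", "c2"]
print(round(mu, 3), round(contrast_gene1, 3))
```

With unbalanced data or the full set of factors, a general least squares solver would be needed instead of plain averages.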

Preprocessing: ANOVA

Page 78: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

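Assumption:

Independent, additive error $\varepsilon_{ijnmk} \sim F$, where F is a distribution with mean 0 and variance $\sigma^2$

$y_{ijnmk} = \mu + G_i + C_j + A_n + D_m + (GC)_{ij} + \varepsilon_{ijnmk}$

Plot the residuals ($y_{estimated} - y_{measured}$) against the estimated intensity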

Preprocessing: ANOVA

Page 79: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

I. MAIN EFFECTS + EFFECT OF INTEREST

$y_{ijnmk} = \mu + G_i + C_j + A_n + D_m + (GC)_{ij} + \varepsilon_{ijnmk}$

Analysis of variance shows the relative contribution of each of these effects

Preprocessing: ANOVA

Page 80: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Advantages:

• Gains more information with fewer observations => derives variation from all measurements made (fewer replicates required, e.g. array effect based on N-1 gene measurements)

• Statistical testing: estimated error can be used for bootstrapping to estimate confidence levels

• No ratios required

Requirements:

• Requires knowledge about experimental effects
• The model used implies that all effects and combinations of effects should be linear
• Bootstrapping: residuals should be normally distributed around zero with constant variance

Preprocessing: ANOVA

Page 81: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

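ANOVA Bootstrap analysis

$y_{ijnmk} = \mu + G_i + C_j + A_n + D_m + (GC)_{ij} + \varepsilon_{ijnmk}$

1. Estimate the error
2. Simulate new datasets based on the estimated error (3000 times): $y^{boot}_{ijnmk} = \hat{y}_{ijnm} + \varepsilon^{*}_{ijnmk}$, with $\varepsilon^{*}$ drawn from the estimated residuals
3. Calculate the factor of interest (GC effect) for each bootstrapped dataset (recalculate ANOVA)
4. Calculate a CI on (GC1-GC2) of N genes based on the 3000 bootstraps
5. Use this interval to test for significant genes

[Figure: axis GC1-GC2 with the value 0 marked]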
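The bootstrap steps can be sketched for a single gene. This is a strongly simplified stand-in for the procedure on the slide: the fitted model is reduced to one mean per condition instead of the full ANOVA, residuals are pooled and resampled with replacement, and a percentile interval is taken. The measurements, the seed and the function name are hypothetical.

```python
import random
import statistics

def bootstrap_contrast_ci(cond1, cond2, n_boot=3000, level=0.99, seed=0):
    """Residual bootstrap of the difference between two condition means."""
    rng = random.Random(seed)
    m1, m2 = statistics.fmean(cond1), statistics.fmean(cond2)
    # Step 1: estimated errors (residuals around the fitted means).
    residuals = [v - m1 for v in cond1] + [v - m2 for v in cond2]
    contrasts = []
    for _ in range(n_boot):
        # Step 2: simulate a new dataset as fitted value + resampled residual.
        boot1 = [m1 + rng.choice(residuals) for _ in cond1]
        boot2 = [m2 + rng.choice(residuals) for _ in cond2]
        # Step 3: recompute the effect of interest.
        contrasts.append(statistics.fmean(boot1) - statistics.fmean(boot2))
    # Step 4: percentile confidence interval.
    contrasts.sort()
    tail = round((1.0 - level) / 2.0 * n_boot)
    return m1 - m2, (contrasts[tail], contrasts[n_boot - 1 - tail])

# Hypothetical log intensities of one gene in two conditions.
estimate, (low, high) = bootstrap_contrast_ci([12.1, 11.8, 12.3, 12.0],
                                              [10.0, 10.3, 9.9, 10.2])
# Step 5: the gene is called significant when the interval excludes 0.
significant = not (low <= 0.0 <= high)
print(round(estimate, 3), significant)
```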

Preprocessing: ANOVA

Page 82: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006
Page 83: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

More Arrays Simultaneously

DATA
• Filtered for zero values
• set 1: unnormalised data

MODELS (Kerr et al. 2000, 2001)
• Model 1 (no spot effects)
• Model 2 (spot effects independent)
• Model 3 (spot effects dependent)

MODELS
• GC effects not confounded with the spot effects
• The type of model does influence the residual error => does influence the bootstrap interval

Preprocessing

Page 85: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

More Arrays Simultaneously

I. MAIN EFFECTS + EFFECT OF INTEREST

$y_{ijnmk} = \mu + G_i + C_j + A_n + D_m + (GC)_{ij} + \varepsilon_{ijnmk}$

• $\mu$: overall mean
• $G_i$: gene effect (constitutive level of the gene)
• $C_j$: condition effect (mRNA isolation efficiency)
• $A_n$: array effect (hybridisation efficiency)
• $D_m$: dye effect (labeling efficiency)
• $(GC)_{ij}$: GC effect (differential expression due to the altered variety)

Preprocessing

Page 86: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

More Arrays Simultaneously

Least squares fit:
• subject to restrictions
• contrast of interest: estimate $(GC)_{i1} - (GC)_{i2}$
• Usual confidence intervals based on normal theory not appropriate

Bootstrap analysis of residuals avoids making distributional assumptions about the error

Assumption:

Independent, additive error $\varepsilon_{ijnmk} \sim F$, where F is a distribution with mean 0 and variance $\sigma^2$

$y_{ijnmk} = \mu + G_i + C_j + A_n + D_m + (GC)_{ij} + \varepsilon_{ijnmk}$

Preprocessing

Page 87: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

More Arrays Simultaneously

Preprocessing

Page 88: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

More Arrays Simultaneously

[Figure: four panels plotted against ŷ: TEST, ARRAY 1; REFERENCE, ARRAY 1; REFERENCE, ARRAY 2; TEST, ARRAY 2]

Preprocessing

Page 89: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

More Arrays Simultaneously

Additive error and non linear effects undermine application of ANOVA

Preprocessing

Page 90: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

More Arrays Simultaneously

[Figure: four panels plotted against ŷ: TEST, ARRAY 1; REFERENCE, ARRAY 1; REFERENCE, ARRAY 2; TEST, ARRAY 2]

Preprocessing

Page 91: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Lowess

$y_{ijnmk} = \mu + G_i + C_j + A_n + D_m + (AG)_{ni} + (RG)_{ki} + (GC)_{ij} + \varepsilon_{ijnmk}$

Bootstrap analysis

99 % confidence interval based on 100 genes, 3000 bootstraps

retained 370 genes (62 T-test p value < 0.01)

ID     Rat_1      Rat_2      Rat_3      Rat_4      p          D_GC_effects
285    -3.31674   -3.20904   -2.08115   -1.62183   0.008818   -2.577397
1076   -1.39327   -2.04573   -1.85822   -2.42609   0.002899   -2.175438
3755   -0.81029   -1.50631   -0.99613   -1.40283   0.005643   -1.245061
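The p column in this table appears consistent with a one-sample t-test of the four log ratios against zero, although the slide does not say how it was computed. The sketch below calculates the t statistic for the three genes and compares it with 5.841, the two-sided critical value for 3 degrees of freedom at the 0.01 level; treating the test as one-sample is an assumption.

```python
import math
import statistics

# Two-sided critical value of Student's t for 3 degrees of freedom, alpha = 0.01.
T_CRIT_DF3_ALPHA_001 = 5.841

def one_sample_t(values):
    """t statistic for testing whether the mean of the values differs from 0."""
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    return statistics.fmean(values) / standard_error

log_ratios = {
    285: [-3.31674, -3.20904, -2.08115, -1.62183],
    1076: [-1.39327, -2.04573, -1.85822, -2.42609],
    3755: [-0.81029, -1.50631, -0.99613, -1.40283],
}
t_values = {gene: one_sample_t(v) for gene, v in log_ratios.items()}
significant = {gene: abs(t) > T_CRIT_DF3_ALPHA_001 for gene, t in t_values.items()}
print({gene: round(t, 2) for gene, t in t_values.items()}, significant)
```

All three genes exceed the critical value, matching p values below 0.01 in the table.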

Preprocessing

Page 92: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Comparison of methods

Methods tested on the pygmy dataset (3750 genes)

1. ANOVA 99 % CI
2. ANOVA 95 % CI
3. SAM
4. T-test
5. Fold test

Retained 360 genes

Construct for each gene a binary profile (e.g. 1 1 1 1 1 for a gene retained by all five methods)

Hierarchically cluster genes based on this profile

Only 8 genes retained by all methods
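The binary profile per gene can be built directly from the gene lists returned by each method. A minimal sketch with invented gene names and selections (the real lists are not given on the slide); the hierarchical clustering step is not shown.

```python
METHODS = ["ANOVA 99% CI", "ANOVA 95% CI", "SAM", "T-test", "Fold test"]

# Hypothetical genes retained by each method (made up for illustration).
selected = {
    "ANOVA 99% CI": {"gene_a", "gene_b"},
    "ANOVA 95% CI": {"gene_a", "gene_b", "gene_c"},
    "SAM": {"gene_a", "gene_c"},
    "T-test": {"gene_a", "gene_b", "gene_c"},
    "Fold test": {"gene_a", "gene_d"},
}

def binary_profiles(selected, methods):
    """Per gene: a tuple with 1 where the method retained it and 0 otherwise."""
    genes = sorted(set().union(*selected.values()))
    return {g: tuple(int(g in selected[m]) for m in methods) for g in genes}

profiles = binary_profiles(selected, METHODS)
retained_by_all = [g for g, profile in profiles.items() if all(profile)]
print(profiles)
print(retained_by_all)
```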

Page 93: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Comparison of methods

Page 94: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

Comparison of methods

Page 95: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006

• Latin Square (mouse data set)

• Reference: normal mouse
• Condition: pygmy mouse
• Two experiments, C=1 and C=2, reflect two sample time points
• 2 batches (B1, B2): not all genes of the genome on one array

Array (A)   Experiment (C)   Batch (B)   Test   Ref
1           1                B1          R      G
2           1                B1          G      R
3           1                B2          R      G
4           1                B2          R      G
5           2                B1          R      G
6           2                B1          G      R
7           2                B2          R      G
8           2                B2          G      R

Transcript profiling Experiment Design