microarray normalization, error models, quality wolfgang huber embl-ebi brixen 16 june 2008
![Page 1: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/1.jpg)
Microarray normalization, error models, quality
Wolfgang Huber, EMBL-EBI, Brixen, 16 June 2008
![Page 2: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/2.jpg)
Oligonucleotide microarrays
![Page 3: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/3.jpg)
Base Pairing
![Page 4: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/4.jpg)
Oligonucleotide microarrays
[Figure: GeneChip (1.28 cm) with a hybridized probe cell (5 µm)]
• Image of array after hybridisation and staining
• Up to 6.5 million different probe patches
• Millions of copies of a specific oligonucleotide probe molecule per patch
• Oligonucleotide probe
• Target: single-stranded cDNA
![Page 5: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/5.jpg)
Probe sets
![Page 6: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/6.jpg)
Terminology for transcription arrays
Each target molecule (transcript) is represented by several oligonucleotides of (intended) length 25 bases.
Probe: one of these 25-mer oligonucleotides
Perfect match (PM): a probe exactly complementary to the target sequence
Mismatch (MM): same as PM but with a single homomeric base change for the middle (13th) base (G-C, A-T)
Probe-pair: a (PM, MM) pair
Probe set: a collection of probe-pairs (e.g. 11) targeting the same transcript

MGED/MIAME: "probe" is ambiguous!
Reporter: the sequence
Feature: a physical patch on the array with molecules intended to have the same reporter sequence (one reporter can be represented by multiple features)
![Page 7: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/7.jpg)
Image analysis
• about 100 pixels per feature
• segmentation
• summarisation into one number representing the intensity level for this feature
CEL file
![Page 8: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/8.jpg)
array data
samples: mRNA from tissue biopsies, cell lines
arrays: probes = gene-specific DNA strands
fluorescent detection of the amount of sample-probe binding
[Table: example expression values for the genes MCAM, LAMA4, CASP4, ALDH4, VIM and ErbB2 (rows) in tissue A, tissue B and tissue C (columns)]
![Page 9: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/9.jpg)
Why do you need ‘normalisation’?
![Page 10: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/10.jpg)
From: lymphoma dataset
vsn package
Alizadeh et al., Nature 2000
bias
![Page 11: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/11.jpg)
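MA-plot

$$M = \log_2 R - \log_2 G = \log_2 \frac{R}{G}$$

$$A = \tfrac{1}{2}\log_2 R + \tfrac{1}{2}\log_2 G = \tfrac{1}{2}\log_2 (RG)$$

[Figure: MA-plot, M (y-axis) against A (x-axis)]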
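As a minimal sketch of how the MA-plot coordinates are obtained, the following R lines compute the log-ratio M = log2(R) - log2(G) and the average log-intensity A = (log2(R) + log2(G)) / 2 for two channels. The intensity values are made up for illustration.

```r
# Hypothetical red (R) and green (G) channel intensities for five features
R <- c(1200, 340, 5600, 80, 15000)
G <- c(1000, 400, 2800, 75, 16000)

M <- log2(R) - log2(G)          # log-ratio
A <- (log2(R) + log2(G)) / 2    # average log-intensity

# An MA-plot is M plotted against A
plot(A, M, pch = 16, xlab = "A", ylab = "M")
abline(h = 0, lty = 2)
```

Features without differential signal scatter around M = 0. An intensity-dependent trend in M is the kind of bias that normalisation is meant to remove.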
![Page 12: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/12.jpg)
[Figure: log2 intensity (0 to 15) for arrays / dyes 1 to 16]
![Page 13: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/13.jpg)
[Figure: density of log2 intensity for 8 arrays from the lymphoma data (Alizadeh 2000)]
![Page 14: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/14.jpg)
log2 Cope et al. Bioinformatics 2003
Non-linearity
![Page 15: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/15.jpg)
ratio compression
Yue et al. (Incyte Genomics), NAR (2001) 29, e41
nominal 3:1
nominal 1:1
nominal 1:3
![Page 16: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/16.jpg)
A complex measurement process lies between mRNA concentrations and intensities
o RNA degradation
o quality of actual probe sequences (vs intended)
o image segmentation
o amplification efficiency
o scratches and spatial gradients on the array
o signal quantification
o reverse transcription efficiency
o cross-talk across features
o signal "preprocessing"
o hybridization efficiency and specificity
o cross-hybridisation
o labeling efficiency
o optical noise
The problem is less that these steps are ‘not perfect’; it is that they vary from array to array, experiment to experiment.
![Page 17: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/17.jpg)
Why do you need statistics?
![Page 18: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/18.jpg)
tumor-normal
Which genes are differentially transcribed?
same-same
log-ratio
![Page 19: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/19.jpg)
Statistics 101:
[Figure: bias and accuracy on one axis, precision and variance on the other]
![Page 20: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/20.jpg)
Basic dogma of data analysis
One can always increase sensitivity at the cost of specificity, or vice versa. The art is to
- optimize both, then
- find the best trade-off.
![Page 21: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/21.jpg)
Questions
How to compare microarray intensities with each other?
How to address measurement uncertainty (“variance”)?
How to calibrate (“normalize”) for biases between samples?
![Page 22: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/22.jpg)
Sources of variation

Systematic (handled by calibration)
o amount of RNA in the biopsy
o efficiencies of RNA extraction, reverse transcription, labeling, fluorescent detection
o similar effect on many measurements
o corrections can be estimated from data

Stochastic (handled by the error model)
o probe purity and length distribution
o spotting efficiency, spot size
o cross-/unspecific hybridization
o stray signal
o too random to be explicitly accounted for
o remain as “noise”
![Page 23: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/23.jpg)
Quantile normalisation
![Page 24: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/24.jpg)
Quantile normalisation
Ben Bolstad 2001
![Page 25: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/25.jpg)
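```r
# Package sources are assumed here: Dilution ships with affydata,
# normalize.quantiles() with preprocessCore.
library("affy")            # exprs(), sampleNames()
library("affydata")        # Dilution example data
library("preprocessCore")  # normalize.quantiles()

data("Dilution")
nq = normalize.quantiles(exprs(Dilution))
nr = apply(exprs(Dilution), 2, rank)
# per array: quantile-normalised value against rank of the raw value
for(i in 1:4)
  plot(nr[,i], nq[,i], pch=".", log="y", xlab="rank",
       ylab="quantile normalized", main=sampleNames(Dilution)[i])
```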
![Page 26: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/26.jpg)
[Figure: density of log2(exprs(Dilution)) before, and density of log2(nq) after quantile normalisation]
![Page 27: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/27.jpg)
Quantile normalisation is: per array rank-transformation followed by replacing ranks with values from a common reference distribution
[Figure: histogram of log2(nq[, 1])]
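The rank-and-replace idea can be sketched in a few lines of base R. This is an illustrative implementation, not the code of any package: the function name `quantile_normalise` is hypothetical, the reference distribution is assumed to be the mean of the sorted columns, ties are broken by order of appearance, and the data are simulated.

```r
# Quantile normalisation of the columns (arrays) of a matrix, base R only.
quantile_normalise <- function(x) {
  stopifnot(is.matrix(x), !anyNA(x))
  # reference distribution: mean of the sorted columns, quantile by quantile
  reference <- rowMeans(apply(x, 2, sort))
  # replace each value by the reference value at its within-array rank
  ranks <- apply(x, 2, rank, ties.method = "first")
  xn <- apply(ranks, 2, function(r) reference[r])
  dimnames(xn) <- dimnames(x)
  xn
}

set.seed(1)
# three simulated "arrays" with different offset and scale (made-up data)
x <- cbind(a1 = rexp(1000, 1/200) + 50,
           a2 = 1.5 * rexp(1000, 1/200) + 80,
           a3 = 0.7 * rexp(1000, 1/200) + 30)
xn <- quantile_normalise(x)

# after normalisation all arrays share the same quantiles
apply(xn, 2, quantile, probs = c(0.25, 0.5, 0.75))
```

Within each array the ordering of the features is unchanged. Only the values attached to the ranks are replaced, which is why the method cannot tell technical from biological shifts in the distribution.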
![Page 28: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/28.jpg)
Quantile normalisation
+ Simple, fast, easy to implement
+ Always works, needs no user interaction / tuning
+ Non-parametric: can correct for quite nasty non-linearities (saturation, background) in the data
- Always "works", even if data are bad / inappropriate
- Conservative: rank transformation loses information, so it may yield less power to detect differentially expressed genes
- Aggressive: when there is an excess of up- (or down) regulated genes, it removes not just technical, but also biological variation
![Page 29: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/29.jpg)
loess normalisation
![Page 30: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/30.jpg)
"loess" normalisation
loess (locally weighted scatterplot smoothing): an algorithm for robust local polynomial regression by W. S. Cleveland and colleagues (AT&T, 1980s), handily available in R
![Page 31: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/31.jpg)
Local polynomial regression
Global polynomial regression

$$y(x) = a_p x^p + \dots + a_2 x^2 + a_1 x + a_0$$

applied to data $(x_1, y_1), \dots, (x_n, y_n)$, with equal weights, resulting in global fit $(a_0, \dots, a_p)$

Local polynomial regression around $v$, with weights $h_b(x - v)$, resulting in local fit $(a_0(v), \dots, a_p(v))$

bandwidth $b$
![Page 32: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/32.jpg)
Robust regression
[Figure: scatterplot of y against x with fitted lines from lm and rlm]

OLS: $\sum_{i=1}^{n} (y_i - f(x_i))^2 \to \min$
M-est.: $\sum_{i=1}^{n} M(y_i - f(x_i)) \to \min$
LTS: $F\{\, y_i - f(x_i) \mid i = 1, \dots, n \,\} \to \min$
![Page 33: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/33.jpg)
C. Loader
Local Regression and Likelihood
Springer Verlag
![Page 34: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/34.jpg)
loess normalisation
before after
• local polynomial regression of M against A
• 'normalised' M-values are the residuals
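The two bullet points can be illustrated with base R's `loess()`. The sketch below uses simulated data with a made-up intensity-dependent bias. The span, the polynomial degree and the robust `family = "symmetric"` option are assumptions chosen for illustration, not settings taken from the slides.

```r
set.seed(2)
n <- 2000
A <- runif(n, 6, 14)                        # average log2 intensity
bias <- 0.8 - 0.15 * (A - 10)^2 / 4         # made-up intensity-dependent bias
M <- bias + rnorm(n, sd = 0.25)             # observed log-ratios, no true change

# robust local regression of M on A ("symmetric" uses biweight re-weighting)
fit <- loess(M ~ A, span = 0.4, degree = 1, family = "symmetric")
M_norm <- residuals(fit)                    # normalised M-values

c(before = median(M), after = median(M_norm))
```

The residuals are centred on zero across the whole intensity range. This is appropriate only if most features are not differentially expressed, or if up- and down-regulation are roughly balanced.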
![Page 35: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/35.jpg)
local polynomial regression normalisation in >2 dimensions
![Page 36: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/36.jpg)
n-dimensional local regression model for microarray normalisation

$$Y_{kij} = \alpha_k + \beta_{ij}(\alpha_k) + \delta_{ik} + \sigma(\alpha_k)\, \varepsilon_{kij}$$

$Y_{kij}$: log-intensity of gene k in condition i, replicate j
$\alpha_k$: baseline value of gene k (A-value)
$\delta_{ik}$: effect of treatment i on gene k
$\beta_{ij}(\alpha_k)$: intensity-dependent normalisation function for array (i, j)
$\sigma(\alpha_k)$: intensity-dependent error scale function
$\varepsilon_{kij}$: i.i.d. error term

An algorithm for fitting this robustly is described (roughly) in the paper (Kepler et al., 2002). The authors provided software only as a compiled binary for Windows, and the method has not found much use.
![Page 37: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/37.jpg)
Estimating relative expression
(fold-changes)
![Page 38: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/38.jpg)
ratios and fold changes
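Fold changes are useful to describe continuous changes in expression
[Figure: conditions A, B, C with values 1000, 1500, 3000; fold changes x1.5 and x3]
But what if the gene is “off” (below detection limit) in one condition?
[Figure: conditions A, B, C with values 0, 200, 3000; fold changes marked "?"]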
![Page 39: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/39.jpg)
ratios and fold changes
The idea of the log-ratio (base 2)
0: no change
+1: up by factor of 2^1 = 2
+2: up by factor of 2^2 = 4
-1: down by factor of 2^-1 = 1/2
-2: down by factor of 2^-2 = 1/4

A unit for measuring changes in expression: assumes that a change from 1000 to 2000 units has a similar biological meaning to one from 5000 to 10000.

What about a change from 0 to 500?
- conceptually
- noise, measurement precision
![Page 40: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/40.jpg)
Many data are measured in definite units:
• time in seconds
• lengths in meters
• energy in Joule, etc.

Climb Mount Plose (2465 m) from Brixen (559 m) with weight of 78 kg, working against a gravitation field of strength 9.81 m/s²:

(2465 - 559) · 78 · 9.81 m kg m/s²
= 1 458 433 kg m² s⁻²
= 1 458.433 kJ

What is wrong with microarray data?
![Page 41: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/41.jpg)
vsn normalisation
![Page 42: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/42.jpg)
Robust affine regression normalisation (n-dim.) with the additive-multiplicative error model
Error model:
Trey Ideker et al. (2000) JCB
David Rocke and Blythe Durbin:
(2001) JCB, (2002) Bioinformatics
Use for robust affine regression normalisation:
W. Huber, Anja von Heydebreck et al. (2002) Bioinformatics
![Page 43: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/43.jpg)
The two-component model

measured intensity = offset + gain × true abundance

$$y_{ik} = a_{ik} + b_{ik}\, x_k$$

$$a_{ik} = a_i + \varepsilon_{ik}$$

$a_i$: per-sample offset
$\varepsilon_{ik}$: additive noise

$$b_{ik} = b_i\, b_k \exp(\eta_{ik})$$

$b_i$: per-sample gain factor
$b_k$: sequence-wise probe efficiency
$\eta_{ik}$: multiplicative noise
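To see what "offset + gain × true abundance" with additive and multiplicative noise implies, here is a small simulation sketch in R. All parameter values (offset 100, gain 2, noise standard deviations 30 and 0.15) are invented for illustration, and both noise terms are assumed to be normal.

```r
set.seed(3)
x <- rep(c(0, 10, 100, 1000, 10000), each = 5000)  # true abundance
a <- 100                                           # offset (made-up value)
b <- 2                                             # gain (made-up value)
eps <- rnorm(length(x), sd = 30)                   # additive noise
eta <- rnorm(length(x), sd = 0.15)                 # multiplicative noise
y <- a + eps + b * x * exp(eta)                    # measured intensity

# standard deviation of y at each abundance level
sds <- tapply(y, x, sd)
round(cbind(mean = tapply(y, x, mean), sd = sds), 1)
```

At low abundance the standard deviation stays near the additive noise level. At high abundance it grows roughly in proportion to the mean. This is the behaviour that neither the raw scale nor the plain logarithm handles across the whole range.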
![Page 44: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/44.jpg)
The two-component model
raw scale log scale
“additive” noise
“multiplicative” noise
B. Durbin, D. Rocke, JCB 2001
![Page 45: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/45.jpg)
Parameterization
two practically equivalent forms ($\eta \ll 1$):

$$y = a + \varepsilon + b \cdot x \cdot (1 + \eta)$$

$$y = a + \varepsilon + b \cdot x \cdot e^{\eta}$$

$a$: average background
o on one array, for one color, the same for all features
o also dependent on the reporter sequence

$\varepsilon$: background fluctuations
o same distribution in whole experiment
o different distributions

$b$: average gain factor
o on one array, for one color, the same for all features
o intensity dependent

$\eta$: gain fluctuations
o same distribution in whole experiment
o different distributions
![Page 46: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/46.jpg)
variance stabilizing transformations
$X_u$: a family of random variables with $E\,X_u = u$, $\operatorname{Var} X_u = v(u)$.

Define

$$f(x) = \int^{x} \frac{du}{\sqrt{v(u)}}$$

Then $\operatorname{Var} f(X_u)$ is approximately independent of $u$.

derivation: linear approximation
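The linear approximation mentioned above can be written out as follows. This is a sketch of the standard delta-method argument, with the target variance set to 1 for convenience. The notation follows the slide: $E\,X_u = u$ and $\operatorname{Var} X_u = v(u)$.

```latex
\begin{align*}
f(X_u) &\approx f(u) + f'(u)\,(X_u - u)
  && \text{(first-order Taylor expansion around } u\text{)} \\
\operatorname{Var} f(X_u) &\approx f'(u)^2 \operatorname{Var} X_u = f'(u)^2\, v(u) \\
f'(u)^2\, v(u) = 1 \;&\Longrightarrow\; f'(u) = \frac{1}{\sqrt{v(u)}}
  \;\Longrightarrow\; f(x) = \int^{x} \frac{du}{\sqrt{v(u)}}
\end{align*}
```

For example, a constant coefficient of variation, $v(u) = c^2 u^2$, gives $f(x) = \log(x)/c$, which is the usual log transformation up to a scale factor.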
![Page 47: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/47.jpg)
[Figure: variance stabilizing transformation f(x); transformed scale (8.0 to 11.0) against raw scale x (0 to 60000)]
![Page 48: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/48.jpg)
variance stabilizing transformations
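$$f(x) = \int^{x} \frac{1}{\sqrt{v(u)}}\, du$$

1.) constant variance (‘additive’): $v(u) = s^2 \;\Rightarrow\; f \propto u$

2.) constant CV (‘multiplicative’): $v(u) \propto u^2 \;\Rightarrow\; f \propto \log u$

3.) offset: $v(u) \propto (u + u_0)^2 \;\Rightarrow\; f \propto \log(u + u_0)$

4.) additive and multiplicative: $v(u) \propto (u + u_0)^2 + s^2 \;\Rightarrow\; f \propto \operatorname{arsinh}\dfrac{u + u_0}{s}$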
![Page 49: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/49.jpg)
the “glog” transformation
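[Figure: f(x) = log(x) (dashed line) and h_s(x) = asinh(x/s) (solid line), for intensity from -200 to 1000]

$$\operatorname{arsinh}(x) = \log\left(x + \sqrt{x^2 + 1}\right)$$

$$\lim_{x \to \infty} \left(\operatorname{arsinh} x - \log x - \log 2\right) = 0$$

P. Munson, 2001
D. Rocke & B. Durbin, ISMB 2002
W. Huber et al., ISMB 2002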
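A short R sketch of the transformation h_s(x) = asinh(x/s) shows its two key properties. It is defined for zero and negative intensities, and for large x it differs from log(x) only by a constant. The helper name `glog` and the scale s = 50 are assumptions made for illustration.

```r
# glog in the form h_s(x) = asinh(x / s); s = 50 is a made-up scale
glog <- function(x, s = 50) asinh(x / s)

x <- c(-200, 0, 200, 1000, 1e5)
glog(x)                       # finite for zero and negative intensities
suppressWarnings(log(x))      # NaN and -Inf at the low end

# for large x, asinh(x) - log(x) - log(2) tends to 0
big <- c(10, 100, 1000, 1e4)
asinh(big) - log(big) - log(2)
```

The last line prints differences that shrink rapidly towards zero, so at high intensities differences of glog values behave like ordinary log-ratios.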
![Page 50: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/50.jpg)
[Figure: three panels, raw scale (difference), log (log-ratio) and glog (generalized log-ratio); the variance is shown with its constant part and its proportional part]
![Page 51: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/51.jpg)
[Figure: difference red-green against rank(average)]
![Page 52: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/52.jpg)
“usual” log-ratio:

$$\log \frac{x_1}{x_2}$$

'glog' (generalized log-ratio):

$$\log \frac{x_1 + \sqrt{x_1^2 + c_1^2}}{x_2 + \sqrt{x_2^2 + c_2^2}}$$

$c_1$, $c_2$ are experiment specific parameters (~level of background noise)
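The following simulation sketch compares the usual log-ratio log(x1/x2) with the generalized log-ratio log((x1 + sqrt(x1^2 + c1^2)) / (x2 + sqrt(x2^2 + c2^2))) at low and at high signal. The noise model, the parameter c = 50 and the abundance levels are all invented for illustration, and the function names are hypothetical.

```r
log_ratio  <- function(x1, x2) log(x1 / x2)
glog_ratio <- function(x1, x2, c1, c2)
  log((x1 + sqrt(x1^2 + c1^2)) / (x2 + sqrt(x2^2 + c2^2)))

set.seed(4)
cc <- 50                                   # made-up background noise level
sim <- function(mu1, mu2, n = 10000) {
  # background-subtracted intensities with multiplicative and additive noise
  x1 <- mu1 * exp(rnorm(n, sd = 0.1)) + rnorm(n, sd = cc)
  x2 <- mu2 * exp(rnorm(n, sd = 0.1)) + rnorm(n, sd = cc)
  lr <- suppressWarnings(log_ratio(x1, x2))
  gr <- glog_ratio(x1, x2, cc, cc)
  c(true = log(mu1 / mu2),
    log_undefined = mean(!is.finite(lr)),  # fraction of unusable log-ratios
    glog_mean = mean(gr), glog_sd = sd(gr))
}
res <- rbind(low = sim(40, 20), high = sim(20000, 10000))
round(res, 3)
```

At high signal the two estimators agree. At low signal a sizeable fraction of ordinary log-ratios is undefined, while the generalized log-ratio is always defined but pulled towards zero. This is the bias that is traded for lower variance.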
![Page 53: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/53.jpg)
Variance Bias Trade-Off
[Figure: estimated log-fold-change against signal intensity, for log and glog]
![Page 54: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/54.jpg)
Variance-bias trade-off and shrinkage estimators
Shrinkage estimators: a general technique in statistics. Pay a small price in bias for a large decrease of variance, so that overall the mean squared error (MSE) is reduced.
Particularly useful if you have few replicates.
Generalized log-ratio is a shrinkage estimator for fold change
![Page 55: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/55.jpg)
Background correction
![Page 56: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/56.jpg)
Background correction
Irizarry et al. Biostatistics 2003
[Figure: panels labelled 0 pM, 500 fM, 750 fM and 1 pM]
![Page 57: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/57.jpg)
Background correction: RMA, Irizarry et al. (2002)

$$PM = B + S$$

$B \sim$ Normal, with mean and sd read off MM values
$S \sim$ Exponential
closed form expression for $E[S \mid PM]$; use this as $\hat{s}$ ($> 0$).
(NB, $P[S > 0] = 1$ is not realistic)
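Under the stated assumptions, PM = B + S with B normal and S exponential, the conditional mean E[S | PM] is the mean of a normal distribution truncated at zero. The R sketch below implements that expression. The function name `rma_bg` and the parameter values are hypothetical, and how mu, sigma and the rate alpha are estimated from real data is not covered here.

```r
# E[S | PM = pm] for PM = B + S, B ~ Normal(mu, sigma^2), S ~ Exponential(rate alpha)
rma_bg <- function(pm, mu, sigma, alpha) {
  a <- pm - mu - alpha * sigma^2
  a + sigma * dnorm(a / sigma) / pnorm(a / sigma)
}

# made-up parameters; in practice they are estimated from the data
mu <- 100; sigma <- 20; alpha <- 1 / 300
pm <- c(60, 100, 150, 500, 5000)
s_hat <- rma_bg(pm, mu, sigma, alpha)
round(cbind(pm, naive = pm - mu, s_hat), 1)
```

Unlike the naive subtraction pm - mu, the estimate is strictly positive even when PM lies below the background mean, so a logarithm can be taken afterwards.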
![Page 58: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/58.jpg)
Background correction:
raw intensities x
o biased background correction: s = E[S|data], then log2(s)
o unbiased background correction: s = x - b, then glog2(s|data)
?
![Page 59: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/59.jpg)
Comparison between RMA and VSN background correction
![Page 60: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/60.jpg)
Summaries for Affymetrix genechip probe sets
![Page 61: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/61.jpg)
Data and notation
$PM_{ikg}$, $MM_{ikg}$ = intensities for perfect match and mismatch probe k for gene g on chip i
i = 1,…, n: one to hundreds of chips
k = 1,…, J: usually 11 probe pairs
g = 1,…, G: tens of thousands of probe sets

Tasks:
o calibrate (normalize) the measurements from different chips (samples)
o summarize for each probe set the probe level data, i.e., 11 PM and MM pairs, into a single expression measure
o compare between chips (samples) for detecting differential expression
![Page 62: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/62.jpg)
Expression measures: MAS 4.0

Affymetrix GeneChip MAS 4.0 software used AvDiff, a trimmed mean:

$$\mathrm{AvDiff} = \frac{1}{\#K} \sum_{k \in K} (PM_k - MM_k)$$

o sort $d_k = PM_k - MM_k$
o exclude highest and lowest value
o $K$ := those pairs within 3 standard deviations of the average
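The AvDiff steps can be sketched in R as below. This is one reading of the slide, not the MAS 4.0 implementation. It assumes that the average and standard deviation are computed after the highest and lowest differences have been excluded. The function name `avdiff` and the intensities are invented.

```r
avdiff <- function(pm, mm) {
  d <- sort(pm - mm)
  d <- d[-c(1, length(d))]                  # exclude lowest and highest value
  keep <- abs(d - mean(d)) <= 3 * sd(d)     # K: pairs within 3 sd of the average
  mean(d[keep])
}

# made-up intensities for one probe set of 11 probe pairs
pm <- c(820, 760, 905, 1110, 640, 700, 15000, 880, 930, 790, 850)
mm <- c(310, 420, 280,  390, 300, 350,   400, 330, 900, 310, 295)
avdiff(pm, mm)
```

Because the differences PM - MM are averaged on the raw scale, the result can become negative when many MM values exceed their PM partners.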
![Page 63: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/63.jpg)
Expression measures: MAS 5.0

Instead of MM, use "repaired" version CT:
CT = MM if MM < PM
CT = PM / "typical log-ratio" if MM >= PM

Signal = weighted mean of the values log(PM - CT); weights follow the Tukey biweight function (location = data median, scale = a fixed multiple of MAD)

[Figure: Tukey biweight weight function, w against x]
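A one-step Tukey biweight location estimate, with the data median as location and a fixed multiple of the MAD as scale, can be sketched in R as follows. The function name, the tuning constant `cutoff = 5` and the small `eps` guarding against a zero MAD are assumptions, and the input values are made up.

```r
# One-step Tukey biweight location estimate.
# cutoff and eps are assumed tuning constants, not taken from the slide.
tukey_biweight <- function(x, cutoff = 5, eps = 1e-4) {
  m <- median(x)                              # location = data median
  s <- median(abs(x - m))                     # MAD (unscaled)
  u <- (x - m) / (cutoff * s + eps)           # distance in units of cutoff * MAD
  w <- ifelse(abs(u) <= 1, (1 - u^2)^2, 0)    # far-away values get weight 0
  sum(w * x) / sum(w)
}

# made-up log(PM - CT) values for one probe set, with one outlier
v <- c(8.1, 8.3, 7.9, 8.0, 8.4, 8.2, 12.5, 8.1, 7.8, 8.2, 8.0)
c(mean = mean(v), biweight = tukey_biweight(v))
```

The outlier at 12.5 pulls the ordinary mean upwards but receives weight zero in the biweight estimate.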
![Page 64: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/64.jpg)
Expression measures: Li & Wong

dChip fits a model for each gene

$$PM_{ki} - MM_{ki} = \phi_k \theta_i + \varepsilon_{ki}, \quad \varepsilon_{ki} \sim N(0, \sigma^2)$$

where
$\theta_i$: expression measure for the gene in sample i
$\phi_k$: probe effect
$\theta_i$ is estimated by maximum likelihood
![Page 65: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/65.jpg)
![Page 66: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/66.jpg)
![Page 67: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/67.jpg)
Expression measures RMA: Irizarry et al. (2002)

dChip:

$$Y_{ki} = \phi_k \theta_i + \varepsilon_{ki}, \quad \varepsilon_{ki} \sim N(0, \sigma^2)$$

RMA:

$$\log_2 Y_{ki} = a_k + b_i + \varepsilon_{ki}$$

$b_i$ is estimated using the robust method median polish (successively remove row and column medians, accumulate terms, until convergence).
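Median polish is available in base R as `medpolish()`. The sketch below fits the additive model log2 Y_ki = a_k + b_i + error to simulated probe-level data for a single probe set and reads off the chip effects. The data, the planted outlier and the variable names are invented, and the background correction and normalisation steps that precede this in RMA are left out.

```r
set.seed(5)
# made-up data for one probe set: 11 probes (rows) x 6 chips (columns)
a <- rnorm(11, sd = 1)                      # probe effects a_k
b <- c(7, 7.2, 9, 9.1, 8, 8.1)              # chip-wise expression b_i (log2 scale)
logY <- outer(a, b, "+") + matrix(rnorm(66, sd = 0.1), 11, 6)
logY[3, 2] <- logY[3, 2] + 5                # one outlying probe intensity

fit <- medpolish(logY, trace.iter = FALSE)
expr_rma <- fit$overall + fit$col           # estimated expression per chip

round(rbind(truth = b + median(a),
            median_polish = expr_rma,
            column_mean = colMeans(logY)), 2)
```

The probe and chip effects are only identified up to a constant, which is why the comparison uses b + median(a). The single outlier shifts the column mean of chip 2 noticeably but hardly affects the median polish estimate.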
![Page 68: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/68.jpg)
However, median (and hence median polish) is not always so robust...
See also: Casneuf T. et al. (2007), In situ analysis of cross-hybridisation on microarrays and the inference of expression correlation. BMC Bioinformatics 2007;8(1): 461
[Figure: two histograms (frequency of x, range -2 to 10), each marked with the median and the trimmed mean (0.15)]
![Page 69: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/69.jpg)
Probe effect adjustment by using gDNA reference
Huber et al., Bioinformatics 2006
![Page 70: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/70.jpg)
Genechip S. cerevisiae Tiling Array
o 4 bp tiling path over complete genome (12 M basepairs, 16 chromosomes)
o Sense and antisense strands
o 6.5 million oligonucleotides
o 5 µm feature size
o manufactured by Affymetrix
o designed by Lars Steinmetz (EMBL & Stanford Genome Center)
![Page 71: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/71.jpg)
RNA Hybridization
![Page 72: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/72.jpg)
Before normalization
![Page 73: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/73.jpg)
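Probe-specific response normalization

$$q_i = \log_2 \frac{y_i}{s_i}$$

$$q_i = \operatorname{glog}_2 \frac{y_i - b(s_i)}{s_i}$$

$$q_i = \operatorname{glog}_2 \frac{PM_i - MM_i}{s_i}$$

remove ‘dead’ probes

[Figure: panels for $\log_2 y_i$, $\log_2 s_i$ and the normalised values $q_i$; S/N values shown on the slide: 3.22, 3.47, 4.04, 4.58, 4.36]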
![Page 74: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/74.jpg)
Probe-specific response normalization
$$q_i = \operatorname{glog}_2 \frac{y_i - b(s_i)}{s_i}$$

$s_i$: probe specific response factor. Estimate taken from DNA hybridization data

$b_i = b(s_i)$: probe specific background term. Estimation: for strata of probes with similar $s_i$, estimate $b$ through a location estimator of the distribution of intergenic probes, then interpolate to obtain a continuous $b(s)$
![Page 75: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/75.jpg)
Estimation of b: joint distribution of (DNA, RNA) values of intergenic PM probes
[Figure: log2 RNA intensity against log2 DNA intensity, with the regions "unannotated transcripts" and "background" and the curve b(s)]
![Page 76: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/76.jpg)
After normalization
![Page 77: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/77.jpg)
Quality assessment
![Page 78: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/78.jpg)
Quality Assessment and Control
• arrayQualityMetrics package by Audrey Kauffmann
• Show example report
![Page 79: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/79.jpg)
References
Bioinformatics and computational biology solutions using R and Bioconductor, R. Gentleman, V. Carey, W. Huber, R. Irizarry, S. Dudoit, Springer (2005).
Variance stabilization applied to microarray data calibration and to the quantification of differential expression. W. Huber, A. von Heydebreck, H. Sültmann, A. Poustka, M. Vingron. Bioinformatics 18 suppl. 1 (2002), S96-S104.
Exploration, Normalization, and Summaries of High Density Oligonucleotide Array Probe Level Data. R. Irizarry, B. Hobbs, F. Collins, …, T. Speed. Biostatistics 4 (2003) 249-264.
Error models for microarray intensities. W. Huber, A. von Heydebreck, and M. Vingron. Encyclopedia of Genomics, Proteomics and Bioinformatics. John Wiley & sons (2005).
Normalization and analysis of DNA microarray data by self-consistency and local regression. T.B. Kepler, L. Crosby, K. Morgan. Genome Biology. 3(7):research0037 (2002)
Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. S. Dudoit, Y.H. Yang, M. J. Callow, T. P. Speed. Technical report # 578, August 2000 (UC Berkeley Dep. Statistics)
A Benchmark for Affymetrix GeneChip Expression Measures. L.M. Cope, R.A. Irizarry, H. A. Jaffee, Z. Wu, T.P. Speed. Bioinformatics (2003).
....many, many more...
![Page 80: Microarray normalization, error models, quality Wolfgang Huber EMBL-EBI Brixen 16 June 2008](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5f5503460f94a3f992/html5/thumbnails/80.jpg)
Acknowledgements
Anja von Heydebreck (Darmstadt), Robert Gentleman (Seattle), Günther Sawitzki (Heidelberg), Martin Vingron (Berlin), Rafael Irizarry (Baltimore), Terry Speed (Berkeley), Judith Boer (Leiden), Anke Schroth (Wiesloch), Friederike Wilmer (Hilden), Jörn Tödling (Cambridge), Lars Steinmetz (Heidelberg), Audrey Kauffmann (Cambridge)