microarray preprocessing and quality assessment wolfgang huber european bioinformatics institute h....
Post on 20-Jan-2016
214 views
TRANSCRIPT
![Page 1: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/1.jpg)
Microarray preprocessing and quality assessmentWolfgang HuberEuropean Bioinformatics Institute
H. Sueltmann DKFZ/MGA
![Page 2: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/2.jpg)
Which genes are differentially transcribed?
same-same tumor-normal
log-ratio
![Page 3: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/3.jpg)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
05
10
15
log
feat
ure
inte
nsity
(a.
u.)
arrays, colour channels
![Page 4: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/4.jpg)
Scatterplot, colored by PCR-plateTwo RZPD Unigene II filters (cDNA nylon membranes)
PCR plates
![Page 5: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/5.jpg)
PCR plates
![Page 6: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/6.jpg)
PCR plates: boxplots
![Page 7: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/7.jpg)
print-tip effects
-0.8 -0.6 -0.4 -0.2 0.0 0.2
0.0
0.2
0.4
0.6
0.8
1.0
41 (a42-u07639vene.txt) by spotting pin
log(fg.green/fg.red)
F
1:11:21:31:42:12:22:32:43:13:23:33:44:14:24:34:4
q (log-ratio)
F(q
)
![Page 8: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/8.jpg)
spotting pin quality decline
after delivery of 3x105 spots
after delivery of 5x105 spots
H. Sueltmann DKFZ/MGA
![Page 9: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/9.jpg)
spatial effectsspatial effects
R Rb R-Rbcolor scale by rank
spotted cDNA arrays, Stanford-type
another array:
print-tip
color scale ~ log(G)
color scale ~ rank(G)
![Page 10: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/10.jpg)
10 20 30 40 50 60
1020
3040
5060
1:nrhyb
1:nr
hyb
1 2 3 4 5 6 7 8 910111213141516171823242526272829303132333435363738737475767778798081828384858687888990919293949596979899100
0.6
0.8
1.0
1.2
1.4
1.6
1.8
Batches: array to array differences dij = mediank |hik -hjk|
arrays i=1…63; roughly sorted by time
![Page 11: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/11.jpg)
A complex measurement process lies between mRNA concentrations and intensities
o tissue contamination
o clone identification and mapping
o image segmentation
o RNA degradation
o PCR yield, contamination
o signal quantification
o amplification efficiency
o spotting efficiency
o ‘background’ correction
o reverse transcription efficiency
o DNA-support binding
o hybridization efficiency and specificity
o other array manufacturing-related issues
The problem is less that these steps are ‘not perfect’; it is that they vary from array to array, experiment to experiment.
![Page 12: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/12.jpg)
Statistics 101:
bias accuracy
p
reci
sio
n
v
aria
nce
![Page 13: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/13.jpg)
Basic dogma of data analysis
Can always increasesensitivity on the cost of specificity,
or vice versa,
the art is to - optimize both- then find the best trade-off.
X
X
X
X
X
X
X
X
X
![Page 14: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/14.jpg)
ratios and fold changes
Fold changes are useful to describe continuous changes in expression
1000
1500
3000
x3
x1.5
A B C
0
200
3000
?
?
A B C
But what if the gene is “off” (below detection limit) in one condition?
![Page 15: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/15.jpg)
ratios and fold changes
The idea of the log-ratio (base 2)0: no change
+1: up by factor of 21 = 2 +2: up by factor of 22 = 4 -1: down by factor of 2-1 = 1/2 -2: down by factor of 2-2 = ¼
What about a change from 0 to 500?- conceptually- noise, measurement precision
A unit for measuring changes in expression: assumes that a change from 1000 to 2000 units has a similar biological meaning to one from 5000 to 10000.
![Page 16: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/16.jpg)
ratio compression
Yue et al., (Incyte
Genomics) NAR (2001)
29 e41
![Page 17: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/17.jpg)
How to compare microarray intensities with each other?
How to address measurement uncertainty (“variance”)?
How to calibrate (“normalize”) for biases between samples?
![Page 18: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/18.jpg)
Sources of variationamount of RNA in the biopsy efficiencies of-RNA extraction-reverse transcription -labeling-fluorescent detection
probe purity and length distributionspotting efficiency, spot sizecross-/unspecific hybridizationstray signal
Calibration Error model
Systematic o similar effect on many measurementso corrections can be estimated from data
Stochastic
o too random to be ex-plicitely accounted for o remain as “noise”
![Page 19: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/19.jpg)
iik ika a
ai per-sample offset
ik additive noise
bi per-sample normalization factor
bk sequence-wise probe efficiency
ik multiplicative noise
exp( )iik k ikb b b
ik ik ik ky a b x
The two component model
measured intensity = offset + gain true abundance
![Page 20: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/20.jpg)
The two-component model
raw scale log scale
“additive” noise
“multiplicative” noise
B. Durbin, D. Rocke, JCB 2001
![Page 21: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/21.jpg)
Parameterization
(1 )
y a b
y a
x
xb e
two practically equivalent forms
(<<1)
a systematic background
same for all probes (per array x color)
per array x color x print-tip group
random background
iid in whole experiment
iid per array
b systematic gain factor
per array x color per array x color x print-tip group
random gain fluctuations
iid in whole experiment
iid per array
![Page 22: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/22.jpg)
Important issues for model fitting
Parameterization (model complexity)variance vs bias
"Heteroskedasticity" (unequal variances) weighted regression or variance stabilizing
transformation
Outliers use a robust method
AlgorithmIf likelihood is not quadratic, need non-linear
optimization. Local minima / concavity of likelihood?
![Page 23: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/23.jpg)
Models are never correct, but some are useful
True relationship:
1 2 22
(0, 0.15 )y x x N
Model: linear dependence Model: quadratic dependence
![Page 24: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/24.jpg)
variance stabilizing transformations
Xu a family of random variables with EXu=u,
VarXu=v(u). Define
var f(Xu ) independent of u
1( )
v( )
x
f x duu
derivation: linear approximation
![Page 25: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/25.jpg)
0 20000 40000 60000
8.0
8.5
9.0
9.5
10
.01
1.0
raw scale
tra
nsf
orm
ed
sca
le
variance stabilizing transformations
f(x)
x
![Page 26: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/26.jpg)
variance stabilizing transformations
1( )
v( )
x
f x duu
1.) constant variance (‘additive’)
2( ) sv u f u
2.) constant CV (‘multiplicative’) 2( ) logv u u f u
4.) additive and multiplicative
2 2 00( ) ( ) arsinh
u uv u u u s f
s
3.) offset2
0 0( ) ( ) log( )v u u u f u u
![Page 27: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/27.jpg)
the “glog” transformation
intensity-200 0 200 400 600 800 1000
- - - f(x) = log(x)
——— hs(x) = asinh(x/s)
2arsinh( ) log 1
arsinh log log2 0limx
x x x
x x
P. Munson, 2001
D. Rocke & B. Durbin, ISMB 2002
W. Huber et al., ISMB 2002
![Page 28: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/28.jpg)
raw scale log glog
difference
log-ratio
generalized
log-ratio
constant partvariance:
proportional part
glog
![Page 29: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/29.jpg)
ii k i k i ka a L a i p e r - s a m p l e o ff s e t
L i k l o c a l b a c k g r o u n d p r o v i d e d b y i m a g e a n a l y s i s
i k ~ N ( 0 , b i2 s 1
2 )
“ a d d i t i v e n o i s e ”
b i p e r - s a m p l en o r m a l i z a t i o n f a c t o r
b k s e q u e n c e - w i s el a b e l i n g e ffi c i e n c y
i k ~ N ( 0 , s 22 )
“ m u l t i p l i c a t i v e n o i s e ”
e x p ( )ii k k i kb b b
i k i k i k i ky a b x
m e a s u r e d i n t e n s i t y = o ff s e t + g a i n * t r u e a b u n d a n c e
parameter estimation (vsn package)
2Yarsinh , (0, )iki
k ki kii
aN c
b
:
o maximum likelihood estimator: straightforward – but sensitive to outliers
o model is for genes that are unchanged; differentially transcribed genes act as outliers.
o robust variant of ML estimator, à la Least Trimmed Sum of Squares regression.
o works well as long many genes are not differentially transcribed (<50% throughout the intensity range)
![Page 30: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/30.jpg)
“usual” log-ratio
'glog' (generalized log-ratio)
1
2
2 21 1 1
2 22 2 2
log
log
xx
x x c
x x c
c1, c2 are experiment specific parameters (~level of background noise)
![Page 31: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/31.jpg)
Variance Bias Trade-Off
Est
imat
ed l
og
-fo
ld-c
han
ge
Signal intensity
logglog
![Page 32: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/32.jpg)
Variance-bias trade-off and shrinkage estimators
Shrinkage estimators:a general technology in statistics:pay a small price in bias for a large decrease of variance, so overall the mean-squared-error (MSE) is reduced.
Particularly useful if you have few replicates.
Generalized log-ratio is a shrinkage estimator for fold change
![Page 33: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/33.jpg)
“Single color normalization”
n red-green arrays (R1, G1, R2, G2,… Rn, Gn)
within/between slidesfor each slide i=1…n
calculate Mi= log(Ri/Gi), Ai= ½ log(Ri*Gi)normalize Mi vs Ai
Then normalize M1…Mn
all at oncenormalize the combined matrix (R, G)
then calculate log-ratios or any other contrast you like
![Page 34: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/34.jpg)
What about non-linear effects?
o Good data operate in the linear regime, where fluorescence intensity increases proportionally to target abundance (see e.g. Affymetrix dilution series)
Two reasons for non-linearity:
o At the high intensity end: saturation/quenching. This can and should be avoided experimentally - loss of data!
o At the low intensity end: background offsets
![Page 35: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/35.jpg)
0 20 40 60 80 100
02
04
06
08
01
00
original scale
x
y
1 2 5 10 20 50 1002
05
01
00
log/log scale
x
y
10 1y x
![Page 36: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/36.jpg)
Non-linear or affine linear?
![Page 37: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/37.jpg)
Response curveLockhart et. al. Nature Biotechnology 14 (1996)
![Page 38: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/38.jpg)
Gene expression matters
![Page 39: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/39.jpg)
Probe set summaries for Affymetrix expression analysis
genechips
![Page 40: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/40.jpg)
Probe set summarization - data and notation
PMijg , MMijg = Intensities for perfect match and mismatch probe j for gene g in chip i
i = 1,…, n one to hundreds of chips
j = 1,…, J usually 11 or 16 probe pairs
g = 1,…, G 6…30,000 probe sets.
Tasks: calibrate (normalize) the measurements from different
chips (samples)summarize for each probe set the probe level data, i.e.,
16 PM and MM pairs, into a single expression measure.
compare between chips (samples) for detecting differential expression.
![Page 41: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/41.jpg)
Expression measures: MAS 4.0
Affymetrix GeneChip MAS 4.0 software used AvDiff, a trimmed mean:
o sort dj = PMj -MMj o exclude highest and lowest valueo J := those pairs within 3 standard
deviations of the average
1( )
# j jj J
AvDiff PM MMJ
![Page 42: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/42.jpg)
Expression measures MAS 5.0
Instead of MM, use "repaired" version CTCT= MM if MM<PM
= PM / "typical log-ratio" if MM>=PM
"Signal" =
Tukey.Biweight (log(PM-CT))
(… median)
Tukey Biweight: B(x) = (1 – (x/c)^2)^2 if |x|<c, 0 otherwise
![Page 43: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/43.jpg)
Expression measures: Li & Wong
dChip fits a model for each gene
where
– i: expression index for gene i
– j: probe sensitivity
Maximum likelihood estimate of MBEI is used as expression measure of the gene in chip i.
Need at least 10 or 20 chips.
Current version works with PMs only.
2, (0, )ij ij i j ij ijPM MM N
![Page 44: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/44.jpg)
Expression measures RMA: Irizarry et al. (2002)
o Estimate one global background value b=mode(MM). No probe-specific background!
o Assume: PM = strue + b
Estimate s0 from PM and b as a conditional expectation E[strue|PM, b].
o Use log2(s).
o Nonparametric nonlinear calibration ('quantile normalization') across a set of chips.
![Page 45: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/45.jpg)
RMA
Additive model for probe effects Pj and expression value Ei
log2Sij = Ei + Pj + εij
Estimate Ei using robust procedure
![Page 46: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/46.jpg)
0 200 400 600
0.00
00.
010
X=B+S
X
Den
sity
B
SX
-2 -1 0 1 2 3
-4-2
0
x
f(x)
log2xglog2x
0 50 100 200
-50
050
150
RMA background correction
xS
0 50 100 200
-50
050
150
VSN background correction
x
Svs
n0 50 100 200
-2-1
01
2
log2
x
log 2
Srm
a
0 50 100 200
-2-1
01
2
glog2
x
glog
2Svs
n
bioc/Courses/bioc_R_intro/vsn_vs_bgcorrect.R
![Page 47: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/47.jpg)
ArrayQualityMetrics
R package by Audrey Kauffmann (EBI)
Collaboration with Alvis Brazma, Misha Kapushesky (ArrayExpress)
EU Project "Emerald" http://www.microarray-quality.org
![Page 48: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/48.jpg)
A probe effect normalisation for tiling arrays
Huber et al., Bioinformatics 2006
![Page 49: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/49.jpg)
Genechip S. cerevisiae Tiling Array
4 bp tiling path over complete genome(12 M basepairs, 16 chromosomes)
Sense and Antisense strands6.5 Mio oligonucleotides 5 m feature size
manufactured by Affymetrixdesigned by Lars Steinmetz (EMBL & Stanford Genome Center)
![Page 50: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/50.jpg)
RNA Hybridization
![Page 51: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/51.jpg)
Before normalization
![Page 52: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/52.jpg)
Probe specific response normali-zation
2log ii
i
yq
s
2
( )glog i i
ii
y b sq
s
2log iy
2log is
remove ‘dead’ probes
2glog
i ii
i
PM MMq
s
S/N
3.22
3.47
4.04
4.58
4.36
![Page 53: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/53.jpg)
Probe-specific response normalization
si probe specific response factor. Estimate taken from DNA hybridization data
bi =b(si ) probe specific background term. Estimation: for strata of probes with similar si, estimate b through location estimator of distribution of intergenic probes, then interpolate to obtain continuous b(s)
2
( )glog i i
ii
y b sq
s
![Page 54: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/54.jpg)
Estimation of b: joint distribution of (DNA, RNA) values of intergenic PM probes
log2 DNA intensity
log
2 R
NA
in
ten
sity unannotated
transcripts
background
b(s)
![Page 55: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/55.jpg)
After normalization
![Page 56: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/56.jpg)
Cross-Hybridization
One or several probes from a probeset hybridize with intended transcripts.
Reasons:
sequence similarity (gene families)
errors in gene annotation (e.g. mis-assignment of exons)
pure chance
Casneuf et al., Bioinformatics 2006
![Page 57: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/57.jpg)
![Page 58: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/58.jpg)
Affymetrix CDF
Custom CDF
without sequence-similar genes (E<10-10 )
all genes
![Page 59: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/59.jpg)
How to disentangle correlation
because of functional relatedness and
because of cross-hybridization?
![Page 60: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/60.jpg)
Meta-correlation
for probe set pair X and Y, the Spearman correlation between
alignment scores of X’s reporters to Y's target transcript
and
the signal correlation coefficient of X's reporters to Y's probe set expression value.
![Page 61: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/61.jpg)
Significant positive metacorrelation in cases of off-target sequence similarity
![Page 62: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/62.jpg)
Metacorrelation examplesProbe set
summariesX' reporters off-target alignment vs
signal correlationX: extensin-like family protein
Y: belongs to a zinc finger family.
0.63
0.89
0.70X, Y: functions unknown
0.62
X: protein kinase
Y: ATP citrate lyase
neighbours
0.3
0.55
![Page 63: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/63.jpg)
Caveat RMA
Simulated periodic probeset
White noise
9 of B,2 of A
RMA (median polish) summariesAB=-0.07AC=0.73
![Page 64: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/64.jpg)
Histogram of x
x
Fre
qu
en
cy
-2 0 2 4 6
0.0
1.0
2.0
3.0
Histogram of x
x
Fre
qu
en
cy
-2 0 2 4 6
0.0
1.0
2.0
3.0
set.seed(0xdada)a=c(rnorm(6), rnorm(5, mean=4))b=a; b[6]=b[7]myhist=function(x){ hist(x, col="orange", breaks=seq(-3,6,by=0.8),ylim=c(0,3)) abline(v=median(x), col="black",lwd=3)}par(mfrow=c(2,1))myhist(a)myhist(b)
Median is robust, but can be sensitive to small wiggles in the data
![Page 65: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/65.jpg)
![Page 66: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/66.jpg)
![Page 67: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/67.jpg)
References
Bioinformatics and computational biology solutions using R and Bioconductor, R. Gentleman, V. Carey, W. Huber, R. Irizarry, S. Dudoit, Springer (2005).
Variance stabilization applied to microarray data calibration and to the quantification of differential expression. W. Huber, A. von Heydebreck, H. Sültmann, A. Poustka, M. Vingron. Bioinformatics 18 suppl. 1 (2002), S96-S104.
Exploration, Normalization, and Summaries of High Density Oligonucleotide Array Probe Level Data. R. Irizarry, B. Hobbs, F. Collins, …, T. Speed. Biostatistics 4 (2003) 249-264.
Error models for microarray intensities. W. Huber, A. von Heydebreck, and M. Vingron. Encyclopedia of Genomics, Proteomics and Bioinformatics. John Wiley & sons (2005).
Differential Expression with the Bioconductor Project. A. von Heydebreck, W. Huber, and R. Gentleman. Encyclopedia of Genomics, Proteomics and Bioinformatics. John Wiley & sons (2005).
![Page 68: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/68.jpg)
Acknowledgements
Anja von Heydebreck (Darmstadt)Robert Gentleman (Seattle)Günther Sawitzki (Heidelberg)Martin Vingron (Berlin)Annemarie Poustka, Holger Sültmann, Andreas
Buness, Markus Ruschhaupt (Heidelberg)Rafael Irizarry (Baltimore)Judith Boer (Leiden) Anke Schroth (Heidelberg)Friederike Wilmer (Hilden)Lars Steinmetz (Heidelberg)Tineke Casneuf (Gent)
![Page 69: Microarray preprocessing and quality assessment Wolfgang Huber European Bioinformatics Institute H. Sueltmann DKFZ/MGA](https://reader035.vdocument.in/reader035/viewer/2022062519/56649d6c5503460f94a4c65c/html5/thumbnails/69.jpg)
This new EMBL initiative promotes cross-disciplinary research. EIPODs are supported by at least two labs at the five EMBL sites in Heidelberg and Hamburg (Germany), Grenoble (France), Hinxton (UK) and Monterotondo (Italy). EIPOD projects connect scientific fields that are usually separate, or transfer techniques to a novel context.
For a list of possible projects and further information please visit: www.embl.org/eipod
You are also encouraged to propose your own interdisciplinary project.
Online application until 31st August 2007
2007 Call for Applications
EMBL
EMBL Interdisciplinary Postdocs - EIPOD