Detection and Restoration of Hybridization Problems in Affymetrix
GeneChip Data by Parametric Scanning
Tomokazu Konishi
Akita Pref. Univ.
A monolith?!
Praise and censure for microarray technology
Admired: the comprehensiveness
Criticized: the low reproducibility
Origin of the failure: the lack of an intelligent framework for data processing
Values without units, obtained from hybridization images
Index   CHI1
1       69715.25
2       55335.52
3       89216.68
4       145717.8
5       128202.2
6       114373
7       75725.44
8       39021.06
9       115491.5
10      97384.47
11      101823
12      194268.7
13      114838.7
14      118911.3
15      45748.05
16      113630.7
17      53177.55
18      65225.8
19      117009.4
20      32688.52
[Figure: excitation light and fluorescence]
Requirements for understanding unitless values
philosophy or metaphysics
framework, e.g. the International System of Units (SI)
Desirable framework for microarrays
• Measurements
  – Standard
  – Scale
• Interpretations
  – direct link to cell functions
Measurements
• Standard
• Scale
Both are available from the data distribution: Parametric Normalization (2003-4)
Data service is available through Skylight Biotech Inc.
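A minimal sketch of reading a standard and a scale off the data distribution, assuming the intensities follow a lognormal distribution once a background value is subtracted, so that log(x − background) is normal. Using the mean of the logs as the standard and their SD as the scale is our illustrative assumption; this is not the Parametric Normalization or SuperNORM implementation. The example uses a few CHI1 values from the table above with a hypothetical background of 0.

```python
import numpy as np

def parametric_z(intensities, background):
    """Convert raw intensities to z-scores under a lognormal assumption.

    Assumes log(x - background) is normally distributed: the mean of the
    logs is used as the standard and their SD as the scale.
    """
    x = np.asarray(intensities, dtype=float) - background
    if np.any(x <= 0):
        raise ValueError("background must be below every intensity")
    logs = np.log(x)
    standard = logs.mean()       # location of the log distribution
    scale = logs.std(ddof=1)     # spread of the log distribution
    return (logs - standard) / scale

# Hypothetical background of 0; values taken from the CHI1 column
z = parametric_z([69715.25, 55335.52, 89216.68, 145717.8, 32688.52], background=0.0)
```

Once every chip is expressed as z-scores, values from different chips share the same standard and scale and can be compared directly.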
Example: human fibroblast data (Iyer et al. 1999 Science)
[Figure: Q-Q plots of ordered response value vs. theoretical z-score (−3 to 3); panels "raw data" and "SuperNORM"]
• A lognormal distribution can be found by subtracting the proper background.
• Raw data: distorted by saturation at high intensities; noise-affected at low intensities.
• SuperNORM: normalized.
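One way to search for a "proper background" is sketched below. It is an assumption, not necessarily the SuperNORM procedure: try candidate backgrounds and keep the one whose sorted log(x − background) values lie closest to a straight line against theoretical normal quantiles, i.e. the straightest Q-Q plot, scored by Pearson correlation. The candidate grid and plotting positions (i − 0.5)/n are illustrative choices.

```python
import numpy as np
from statistics import NormalDist

def normal_quantiles(n):
    """Theoretical z-scores at plotting positions (i - 0.5) / n."""
    nd = NormalDist()
    return np.array([nd.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)])

def qq_linearity(values, background):
    """Correlation between sorted log(values - background) and normal quantiles."""
    logs = np.sort(np.log(values - background))
    return np.corrcoef(normal_quantiles(len(logs)), logs)[0, 1]

def estimate_background(values, candidates):
    """Pick the candidate background that gives the straightest normal Q-Q plot."""
    values = np.asarray(values, dtype=float)
    candidates = np.asarray(candidates, dtype=float)
    if np.any(candidates >= values.min()):
        raise ValueError("every candidate must be below the smallest value")
    scores = [qq_linearity(values, b) for b in candidates]
    return candidates[int(np.argmax(scores))]

# Synthetic check: exact lognormal quantiles shifted by a background of 500
q = normal_quantiles(1000)
values = 500.0 + np.exp(5.0 + q)
best = estimate_background(values, candidates=np.arange(0.0, 503.0, 10.0))
```

A background that is too small bends the lower end of the Q-Q plot, and one that is too large stretches it, so the correlation peaks near the true offset.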
Statistical framework: the standard and the scale are available from the data distribution (Parametric Normalization, 2002-4).
Desirable framework for microarrays
• Measurements (statistical framework)
  – Standard
  – Scale
• Interpretations
  – direct link to cell functions
Link to cell functions
• Most of the factors are well characterized
• Bottom-up approach
  – nucleotide sequence recognition factors
  – controlling the rate-limiting steps
  – concrete physics
The cell is not a black box.
[mRNA]s are held at a balance between the rates of synthesis and degradation: a pseudo-equilibrium in the cytosol.
Interactions among factors change these rates, and energies describe the interactions.
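Energies are proportional to the factors (energies = k [factors]), and the energies determine the [mRNA]:

$$[\mathrm{mRNA}] = k_c \exp\left(\frac{E_d - E_{p0} - \Delta G}{RT}\right), \qquad k_c = \frac{A_p P_0\,[\mathrm{polymerase}]}{A_d}$$

where E_{p0} and E_d are the energies governing the rates of synthesis and degradation, ΔG is the energy change caused by interactions with regulators (a function of [regulator]), R is the gas constant, and T is the absolute temperature.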
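A small numerical illustration, under our own reading of the pseudo-equilibrium: assume the synthesis rate is A_p P_0 [polymerase] exp(−E_p/RT) and the degradation rate is A_d [mRNA] exp(−E_d/RT) (Arrhenius-style forms assumed here). Setting the two rates equal gives [mRNA] = k_c exp((E_d − E_p)/RT) with k_c = A_p P_0 [polymerase]/A_d, so a regulator that lowers E_p by RT ln 2 should double the steady-state level. All energies and the temperature below are hypothetical.

```python
import math

R = 8.314  # gas constant, J / (mol K)

def mrna_level(k_c, e_p, e_d, temp):
    """Pseudo-equilibrium [mRNA] = k_c * exp((E_d - E_p) / (R T)).

    Assumes Arrhenius-style synthesis and degradation rates;
    energies in J/mol, temperature in K.
    """
    return k_c * math.exp((e_d - e_p) / (R * temp))

T = 310.0  # hypothetical: 37 degrees C
base = mrna_level(k_c=1.0, e_p=50_000.0, e_d=45_000.0, temp=T)
# A regulator lowering the synthesis energy by R*T*ln(2)
activated = mrna_level(k_c=1.0, e_p=50_000.0 - R * T * math.log(2), e_d=45_000.0, temp=T)
fold = activated / base
```

Because log [mRNA] is linear in the energies, an energy change that is linear in a factor concentration shows up as a log-linear change in the mRNA level.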
Link to cell functions: Thermodynamic Model of Transcriptome Formation (2005)
(the theory and some supporting evidence)
Desirable framework for microarrays
• Measurements
  – Standard
  – Scale
• Interpretations (physical framework)
  – direct link to cell functions
The framework for microarray data
• science common to physics and biochemistry
• feedback to the wet methods
Parametric Scanning
The GeneChip system
• detects nucleotide hybridization, used to
  – measure mRNA levels
  – find SNPs
• has about 1,000,000 probes (= cells) synthesized in situ, arranged in ~1,000 columns × ~1,000 rows
Pseudo images: comparison with a standard
• scratchy noise
• malfunction region
• air bubble?
Such problems should be removed prior to data analysis.
Ideal standard: a group of chips → brief normalization & trimmed mean → ideal standard
Finding problems
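A sketch of building the ideal standard from a group of chips, assuming the chips are stored as a (chips × probes) array of log intensities. "Brief normalization" is taken here to mean centering each chip on its median and dividing by its scaled MAD, and the trimmed mean drops a hypothetical 20% from each end per probe; both choices are our assumptions.

```python
import numpy as np

def brief_normalize(log_chips):
    """Center each chip (row) on its median and scale by 1.4826 * MAD (assumed method)."""
    med = np.median(log_chips, axis=1, keepdims=True)
    mad = np.median(np.abs(log_chips - med), axis=1, keepdims=True)
    return (log_chips - med) / (1.4826 * mad)

def trimmed_mean(a, cut=0.2, axis=0):
    """Mean after dropping the lowest and highest `cut` fraction along `axis`."""
    s = np.sort(a, axis=axis)
    n = s.shape[axis]
    k = int(n * cut)
    kept = np.take(s, np.arange(k, n - k), axis=axis)
    return kept.mean(axis=axis)

def ideal_standard(log_chips, cut=0.2):
    """Per-probe trimmed mean over briefly normalized chips."""
    return trimmed_mean(brief_normalize(log_chips), cut=cut, axis=0)

rng = np.random.default_rng(0)
chips = rng.normal(8.0, 2.0, size=(10, 1000))
chips[3, :50] += 10.0   # hypothetical defect confined to one chip
std = ideal_standard(chips)
```

Trimming is what keeps a local defect on one chip out of the standard: a plain mean would shift those 50 probes upward by about 0.5, while the trimmed mean discards the outlying chip at each probe.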
Δz = z-score of each chip − z-score of the ideal standard
Normalize Δz with robust parameters.
Distribution of Δz: in >85% of data, Δz ~ N(0, 1²)
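A sketch of the Δz step, assuming each chip and the ideal standard are already per-probe z-scores. "Robust parameters" are taken here to be the median and the interquartile range (IQR / 1.349 estimates the SD of a normal distribution); this specific choice is an assumption. Robust estimates matter because a problem region should not inflate the scale used to judge it.

```python
import numpy as np

def delta_z(chip_z, standard_z):
    """Difference from the standard, rescaled with robust location and scale."""
    d = np.asarray(chip_z, dtype=float) - np.asarray(standard_z, dtype=float)
    q1, med, q3 = np.percentile(d, [25, 50, 75])
    robust_sd = (q3 - q1) / 1.349   # IQR of N(0, 1) is about 1.349
    return (d - med) / robust_sd

rng = np.random.default_rng(1)
standard = rng.normal(size=10_000)
chip = standard + rng.normal(0.0, 0.3, size=10_000) + 0.2
chip[:1000] += 5.0                 # hypothetical problem area (10% of probes)
dz = delta_z(chip, standard)
```

The unaffected probes end up close to N(0, 1²), while the hypothetical problem area stands far out in the tail.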
Scanning the image by using moving windows
• Any single large Δz may simply be a large signal.
• Clusters of large Δz should be extremely rare.
The median in each moving window is challenged by the test.
Distribution of the window medians: N(0, 1²) (central limit theorem)
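A sketch of the scanning step on a 2-D Δz image with square moving windows. The window size (a hypothetical 5 × 5 cells) and the re-standardization of the window medians with their own median and scaled MAD are our assumptions; the source states only that the window medians are compared with N(0, 1²).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def window_medians(dz_image, size=5):
    """Median of Δz in every size x size window (fully inside the image)."""
    windows = sliding_window_view(dz_image, (size, size))
    return np.median(windows, axis=(-2, -1))

def standardize(m):
    """Rescale window medians to roughly N(0, 1) using median and scaled MAD."""
    center = np.median(m)
    mad = np.median(np.abs(m - center))
    return (m - center) / (1.4826 * mad)

rng = np.random.default_rng(2)
image = rng.normal(size=(200, 200))
image[50:70, 100:120] += 3.0        # hypothetical blemish, e.g. an air bubble
z_med = standardize(window_medians(image, size=5))
```

Taking the median inside each window means that an isolated large Δz (possibly a real signal) barely moves the statistic, whereas a cluster such as the blemish above pushes the whole window far into the tail.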
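The test level is set from the expected number of cancellations occurring by chance: expectation = 2.
• Number of windows: n = 500,000
• Double-sided test: threshold = qnorm(1/n) = −4.61, so windows whose median lies beyond ±4.61 are cancelled
• Expected number of chance cancellations: 2 × n × (1/n) = 2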
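The threshold arithmetic above can be checked with Python's standard library, where `NormalDist().inv_cdf` plays the role of R's `qnorm`. The sketch splits the expected count of 2 evenly between the two tails; the helper that cancels every cell covered by a flagged window is our assumption about how window flags map back to cells, consistent with some innocent data being cancelled alongside the problems.

```python
from statistics import NormalDist
import numpy as np

def cancellation_threshold(n_windows, expected=2.0):
    """Two-sided threshold so that about `expected` windows exceed it by chance.

    Each tail gets expected / 2 windows, i.e. a tail probability of
    expected / (2 * n_windows); with expected = 2 this equals qnorm(1 / n).
    """
    return NormalDist().inv_cdf(expected / (2.0 * n_windows))

def cancelled_windows(z_medians, threshold):
    """Boolean mask of windows whose standardized median lies beyond +/- threshold."""
    return np.abs(z_medians) > abs(threshold)

def cancelled_cells(window_mask, size):
    """Mark every cell covered by a flagged size x size window (assumed rule)."""
    rows, cols = window_mask.shape
    cells = np.zeros((rows + size - 1, cols + size - 1), dtype=bool)
    for i, j in zip(*np.nonzero(window_mask)):
        cells[i:i + size, j:j + size] = True
    return cells

thr = cancellation_threshold(500_000)   # about -4.61
```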
[Figure: improvement in reproducibility; PM data, repeated measurements]
Some innocent data are cancelled along with the problems, but only a limited portion!
Now we can handle the .cel data
[Figure: cell-level comparison of exp 1 and exp 2 for two probe sets]
Verification by the cells: finding splicing variants
Cancellation by the dChip package (PM data, repeated measurements)
[Figure: data after cancellation vs. cancelled data]
Improved reproducibility
[Figure: Exp. 1 (log ratio) vs. Exp. 2 (log ratio); panels "MAS 5.0" and "SuperNORM"]
Commercially available from … Skylight Biotech Inc.
http://www.super-norm.com