(2) ratio statistics of gene expression levels and applications to microarray data analysis
DESCRIPTION
(2) Ratio statistics of gene expression levels and applications to microarray data analysis. Bioinformatics, Vol. 18, no. 9, 2002 Yidong Chen, Vishnu Kamat, Edward R. Dougherty, Michael L. Bittner, Paul S. Meltzer1, and Jeffery M. Trent. Outline. Introduction Ratio Statistics - PowerPoint PPT PresentationTRANSCRIPT
(2) Ratio statistics of gene expression levels and applications to microarray data analysis
Bioinformatics, Vol. 18, no. 9, 2002
Yidong Chen, Vishnu Kamat, Edward R. Dougherty, Michael L. Bittner, Paul S. Meltzer1, and Jeffery M. Trent
OutlineOutline
Introduction
Ratio Statistics
Quality Metric for Ratio Statistics
Conclusion
IntroductionIntroduction
Motivation Expression-based analysis for large families of genes
has recently become possible owing to the development of cDNA microarrays, which allow simultaneous measurement of transcript levels for thousands of genes. For each spot on a microarray, signals in two channels must be extracted from their backgrounds. This requires algorithms to extract signals arising from tagged mRNA hybridized to arrayed cDNA locations and algorithms to determine the significance of signal ratios.
IntroductionIntroduction Results 1. estimation of signal ratios from the two channels,
and the significance of those ratios.
2. a refined hypothesis test is considered in which the measured intensities forming the ratio are assumed to be combinations of signal and background. The new method involves a signal-to-noise ratio, and for a high signal-to-noise ratio the new test reduces (with close approximation) to the original test. The effect of low signal-to-noise ratio on the ratio statistics constitutes the main theme of the paper.
3. a quality metric is formulated for spots
Ratio StatisticsRatio Statistics
Consider a microarray having n genes, with red and green fluorescent expression values labeled by
and , respectively.
Hypothesis test:
Assumption:
Ratio Statistics assuming a Ratio Statistics assuming a constant coefficient of variationconstant coefficient of variation
nRRR ,...,, 21 nGGG ,...,, 21
kk
kk
kk
GR
H
GG
RR
c
c
0under
kk
kk
GR
GR
H
H
:
:
1
0
Ratio test statistics:
Assuming and to be normally and
identically distributed, has the density function
Ratio Statistics assuming a Ratio Statistics assuming a constant coefficient of variation constant coefficient of variation (cont.)
kkk GRT /
kR kG
kT
],)1(2
)1(exp[
2)1(
1)1();(
2
2
22
2
tc
t
tc
ttctf
kT
n
i i
i
t
t
nc
12
2
)1(
)1(1ˆ
Ratio Statistics assuming a constant Ratio Statistics assuming a constant coefficient of variation coefficient of variation (cont.)
self-self experiment Duplicate
),log(log)log(log
logloglog
,'/
''
'
kkkk
kkk
GGRR
ttT
ttT
).log(loglog where
)log(1
log1
2
R
R
n
ik
RR
Rn
ck
Ratio Statistics assuming a Ratio Statistics assuming a constant coefficient of variation constant coefficient of variation (cont.)
Confidence interval
1. Integrating the ratio density function
2. The C.I. is determined by the parameter c, one can
either use the par. derived from pre-selected housekeeping genes or a set of duplicate genes.
2
2'log
2log
2'log
2loglog
4c
)()(
Therefore,
GGRRT
Ratio Statistics for low signal-Ratio Statistics for low signal-to-noise ratioto-noise ratio
The actual expression intensity measurement is of the form
kBRkkk BRSRR )(
level backgroundmean theis
and level, background fluoresent theis
, gene of
t measuremenintensity expression theis where
kBRk
k
BR
k
SR
Ratio Statistics for low signal-Ratio Statistics for low signal-to-noise ratio to-noise ratio (cont.)
Null hypothesis of interest:
test statistics:
kkkk GRSGSR HH : : 00
kk SGSR
k
k
k
SR
BRkk
kR
BRSRE
RE
])[(
][
kkk GRT /
Ratio Statistics for low signal-Ratio Statistics for low signal-to-noise ratio to-noise ratio (cont.)
Major difference:
1. the assumption of a constant cv applies to
and , not to and
2. the density of is not applicable
SNR (signal-to-noise ratio)
kSR
kSG kR kG
kT
Assuming that are independent,
SNR (SNR (signal-to-noise ratiosignal-to-noise ratio))
and kk BRSR
2222 )(kkkkk BRSRBRSRR c
k
k
kk
kBR
SR
BRBRk
kR BRE
SRESNR
][
][
2
22
22
2
222
2 1)(
kk
k
k
kk
k
k
kRSR
BR
SR
BRSR
R
RR SNR
ccc
c
The Expression intensity scatter plot
Confidence interval for the Confidence interval for the test statisticstest statistics
Assumption:
k
k
BGkk
BRkk
k
kk BGSG
BRSR
G
RT
)(
)(
BGBGBGp
BRBRBRp
NpN
NpNT
),(),(
),(),(
)( ,under 0 kkkk GRSGSRpH
t.independen and
ddistributenormally are ,,, kkkk BGBRSGSR
Confidence interval for the Confidence interval for the test statistics test statistics (cont.)
Under the assumption of constant cv for the signal (wi
thout the background),
cpp
ratio) std d(backgroun /
ratio) noise-to-(signal /
par.) (variance },max{
BGBR
B
BGBRB
ps
),0(),(
),0(),(
BGBB
BGBB
NcssN
NcssNT
The 99% confidence interval for ratio statistic
1 (b) )1or ( 100 (a)
,2.0
BGBR
c
Correction of background Correction of background estimationestimation
Owing to interaction between the fluorescent signal and background, local-background estimation is often biased.
To estimate the bias difference, we find the relationship between the red and green intensities under the null hypothesis by assuming a linear relation, G = aR+b.
Correction of background Correction of background estimation estimation (cont.)(cont.)
Simulation
1. generate 10,000 data points from exp. dist. with
2,000 to simulate 10,000 gene expression levels,
2. The intensity measurement for each channel is
further simulated by using a normal dist. with mean
intensity from the exp. dist. and a constant cv of 0.2
3. simulate background level by a normal dist.
(1) no bias: background level ~ N (0,100)
(2) some bias: background level ~ N (b,100)
),0(),(
),0(),(
BGp
BGp
NpN
NpNT
Scatter plot of simulated expression data
500 of bias estimation background with points data 10,000 (b)
estimation background from bias no with points data 10,000 )(a
dog-leg effect
Correction of background Correction of background estimation estimation (cont.)(cont.)
G = aR+b
we employ a chi-square fitting method that minimizes
N
k GR
kk
kk
baRG
122
22 ))((
N
k BGBRkk
N
k kkBGBRkk
GRc
RGGRcb
11222
11222
)ˆ2ˆ2)((
)()ˆ2ˆ2)((
Quality Metric for Ratio Quality Metric for Ratio StatisticsStatistics
For a given cDNA target, the following factors affect ratio measurement quality:
(1) Weak fluorescent intensities
(2) A smaller than normal detected target area
(3) A very high local background level
(4) A high standard deviation of target intensity
(1)Fluorescent intensity (1)Fluorescent intensity measurement quality measurement quality
Under the null hypothesis, the signal means are equal, so that
B
R
BGBR
RGR SNRSNR
},max{
},min{
otherwise , 1
6ˆ 2
GR3 ,
ˆ 6
GR
3ˆ 2
GR ,0
obtain to,ˆ and G)/2(R ,estimators
hypothesis-nullby their and replace We
BB
B
B
BR
Iw
(2)Target area measurement (2)Target area measurement quality quality
.target
theof components connectedlargest two theof
area thebe let and tip,-print particular afor
t cDNA targe theofmask of area thebe Let
k
A
A
kT
M
otherwise ,1
20.0 ,
}05.0,/10max{a ,0
by
qualityt measuremen area the define We
./
istarget each of area alproportion The
minmin
min
min
bb
M
a
MTk
sasss
a-s
As
w
AAak
(3)Background flatness quality(3)Background flatness quality
Define background flatness
similarly. defined is and
6 ,0
64 ,3
)6(
4 ,1
where},,min{
BG
BRBRk
BRBRkBRBRBR
kBRBR
BRBRk
BR
BGBRb
w
BR
BRBR
BR
w
www
(4)Signal intensity consistency (4)Signal intensity consistency quality quality
Typical target shap
cv=0.48 cv=0.45 cv=0.31
cv=0.81 cv=0.98 cv=0.59
(4)Signal intensity consistency (4)Signal intensity consistency quality quality (cont.)
9.0 ,1
1.10.9 ,2.0
9.0
1.1 ,0
channels,green and
red for the variationoft coefficienintensity the
between minimun thedenote Letting
min,
min,min,
min,
min,
k
kk
k
s
k
cv
cvcv
cv
w
cv