Yandell © 2003 Wisconsin Symposium III 1
QTLs & Microarrays Overview
Brian S. Yandell
University of Wisconsin-Madison
www.stat.wisc.edu/~yandell/statgen
• what is a QTL?
• what is the goal of your QTL study?
• why worry about multiple QTL?
• why study multiple traits together?
what is a QTL?
• QTL = quantitative trait locus (or loci)
  – trait = phenotype = characteristic of interest
  – quantitative = measured somehow
    • glucose, insulin, gene expression level
    • Mendelian genetics
      – allelic effect + environmental variation
  – locus = location in genome affecting trait
    • gene or collection of tightly linked genes
    • some physical feature of genome
typical phenotype assumptions
• normal "bell-shaped" environmental variation
• genotypic value G_Q is composite of m QTL
• genetic uncorrelated with environment
$$E(Y \mid Q) = G_Q, \qquad \mathrm{var}(Y \mid Q) = \sigma^2, \qquad h^2 = \frac{\mathrm{var}(G_Q)}{\mathrm{var}(G_Q) + \sigma^2}$$
[Figure: data histogram of phenotype values (axis 8 to 20) with genotype classes qq, Qq, QQ]
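As a rough illustration of these assumptions, the sketch below simulates a single QTL in an F2 with normal environmental variation and estimates heritability as var(G_Q) / (var(G_Q) + sigma^2). The genotypic values, the 1:2:1 genotype proportions, the sample size, and the function name are illustrative assumptions, not values from the talk.

```python
import random
import statistics

def simulate_f2_trait(n, geno_values, sigma, seed=0):
    """Simulate phenotypes Y = G_Q + e with e ~ Normal(0, sigma^2).

    geno_values maps genotype labels to genotypic values G_Q
    (hypothetical numbers chosen for illustration).
    """
    rng = random.Random(seed)
    # F2 intercross at one locus: qq, Qq, QQ in 1:2:1 proportions.
    genotypes = rng.choices(["qq", "Qq", "QQ"], weights=[1, 2, 1], k=n)
    g = [geno_values[q] for q in genotypes]
    # Environmental deviation is drawn independently of genotype.
    y = [gq + rng.gauss(0.0, sigma) for gq in g]
    return genotypes, g, y

geno_values = {"qq": 12.0, "Qq": 14.0, "QQ": 16.0}  # assumed values
sigma = 1.0
genotypes, g, y = simulate_f2_trait(20000, geno_values, sigma)

var_g = statistics.pvariance(g)   # sample version of var(G_Q)
var_y = statistics.pvariance(y)   # approximately var(G_Q) + sigma^2
h2_sample = var_g / var_y

# Under 1:2:1 proportions these values give var(G_Q) = 2, so h2 = 2 / 3.
h2_theory = 2.0 / (2.0 + sigma ** 2)
print(round(h2_sample, 3), round(h2_theory, 3))
```

The sample estimate should land close to the theoretical value; the gap shrinks as the number of simulated individuals grows.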
what is the goal of QTL study?
• uncover underlying biochemistry
  – identify how networks function, break down
  – find useful candidates for (medical) intervention
  – epistasis may play key role
  – statistical goal: maximize number of correctly identified QTL
• basic science/evolution
  – how is the genome organized?
  – identify units of natural selection
  – additive effects may be most important (Wright/Fisher debate)
  – statistical goal: maximize number of correctly identified QTL
• select "elite" individuals
  – predict phenotype (breeding value) using suite of characteristics (phenotypes) translated into a few QTL
  – statistical goal: minimize prediction error
why worry about multiple QTL?
• so many possible genetic architectures!
  – number and positions of loci
  – gene action: additive, dominance, epistasis
  – how to efficiently search the model space?
• how to select "best" or "better" model(s)?
  – what criteria to use? where to draw the line?
  – shades of gray: exploratory vs. confirmatory study
  – how to balance false positives, false negatives?
• what are the key "features" of model?
  – means, variances & covariances, confidence regions
  – marginal or conditional distributions
Pareto diagram of QTL effects
[Figure: additive effect (0 to 3) versus rank order of QTL (0 to 30). The five largest effects are marked as major QTL on the linkage map (numbered 1 to 5), followed by minor QTL and then polygenes (modifiers).]
advantages of multiple QTL approach
• improve statistical power, precision
  – increase number of QTL detected
  – better estimates of loci: less bias, smaller intervals
• improve inference of complex genetic architecture
  – patterns and individual elements of epistasis
  – appropriate estimates of means, variances, covariances
    • asymptotically unbiased, efficient
  – assess relative contributions of different QTL
• improve estimates of genotypic values
  – less bias (more accurate) and smaller variance (more precise)
  – mean squared error = MSE = (bias)^2 + variance
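The identity MSE = (bias)^2 + variance can be checked numerically. The sketch below uses a deliberately shrunken sample mean as a hypothetical biased estimator; the estimator, the true value, and the sample sizes are assumptions chosen only to illustrate the decomposition.

```python
import random
import statistics

def mse_decomposition(estimates, truth):
    """Return (mse, bias, variance) for a list of estimates of `truth`."""
    mean_est = statistics.fmean(estimates)
    bias = mean_est - truth
    variance = statistics.pvariance(estimates, mu=mean_est)
    mse = statistics.fmean([(e - truth) ** 2 for e in estimates])
    return mse, bias, variance

rng = random.Random(1)
truth = 2.0     # hypothetical true genotypic value
shrink = 0.8    # shrinking toward zero introduces bias
estimates = []
for _ in range(5000):
    sample = [rng.gauss(truth, 1.0) for _ in range(10)]
    estimates.append(shrink * statistics.fmean(sample))

mse, bias, variance = mse_decomposition(estimates, truth)
# The two printed numbers agree up to floating-point rounding.
print(mse, bias ** 2 + variance)
```

Here the shrunken estimator trades some bias for a smaller variance, which is the kind of trade-off the MSE criterion is meant to summarize.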
epistasis examples
(Doebley Stec Gustus 1995; Zeng pers. comm.)
[Figure: one row of three panels for each of traits 1, 4, and 9. First panel: genotypic value by locus A genotype (aa, Aa, AA), one line per locus B genotype (bb, Bb, BB). Second panel: genotypic value by locus B genotype (bb, Bb, BB), one line per locus A genotype (aa, Aa, AA). Third panel: effect +/- 2 se for a1, d1, a2, d2, iaa, iad, ida, idd.]
• traits 1, 4, 9
  – 1: dom-dom interaction
  – 4: add-add interaction
  – 9: rec-rec interaction
  – (Fisher-Cockerham effects)
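The effect labels on these panels (a1, d1, a2, d2, iaa, iad, ida, idd) can be recovered from a 3 x 3 table of genotypic values by solving a linear system. The sketch below assumes the F2 Cockerham coding (additive -1, 0, 1 and dominance -1/2, 1/2, -1/2 for aa, Aa, AA); the coding choice, the function names, and the example values are assumptions for illustration and are not taken from the slide data.

```python
from fractions import Fraction

GENOS = [0, 1, 2]  # copies of the upper-case allele: aa/Aa/AA or bb/Bb/BB
NAMES = ["mean", "a1", "d1", "a2", "d2", "iaa", "iad", "ida", "idd"]

def coding(g):
    """Assumed F2 Cockerham coding: (additive, dominance) for genotype g."""
    x = Fraction(g - 1)                                  # -1, 0, 1
    z = Fraction(1, 2) if g == 1 else Fraction(-1, 2)    # -1/2, 1/2, -1/2
    return x, z

def design_row(g_a, g_b):
    """Row of the 9-column design matrix for one two-locus genotype."""
    x1, z1 = coding(g_a)
    x2, z2 = coding(g_b)
    return [Fraction(1), x1, z1, x2, z2, x1 * x2, x1 * z2, z1 * x2, z1 * z2]

def solve(a, b):
    """Gauss-Jordan elimination with exact rational arithmetic."""
    n = len(b)
    m = [row[:] + [Fraction(v)] for row, v in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[r][n] for r in range(n)]

def cockerham_effects(table):
    """table[g_a][g_b] = genotypic value; returns the nine parameters."""
    rows = [design_row(g_a, g_b) for g_a in GENOS for g_b in GENOS]
    values = [table[g_a][g_b] for g_a in GENOS for g_b in GENOS]
    return dict(zip(NAMES, solve(rows, values)))

# Hypothetical pure add-add pattern: value = 3 + x1 * x2.
table = [[3 + (g_a - 1) * (g_b - 1) for g_b in GENOS] for g_a in GENOS]
effects = cockerham_effects(table)
print({name: float(value) for name, value in effects.items()})
```

For this made-up table only the mean and iaa come out nonzero, which is what an add-add interaction looks like under this coding. Real data would add standard errors, which this sketch does not attempt.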
Type 2 Diabetes Mellitus
from Unger & Orci FASEB J. (2001) 15,312
Insulin Requirement
from Unger & Orci FASEB J. (2001) 15,312
decompensation
studying diabetes in an F2
• segregating cross of inbred lines
  – B6.ob x BTBR.ob → F1 → F2
  – selected mice with ob/ob alleles at leptin gene (chr 6)
  – measured and mapped body weight, insulin, glucose at various ages
    • (Stoehr et al. 2000 Diabetes)
  – sacrificed at 14 weeks, tissues preserved
• gene expression data
  – Affymetrix microarrays on parental strains, F1
    • key tissues: adipose, liver, muscle, β-cells
    • novel discoveries of differential expression (Nadler et al. 2000 PNAS; Lan et al. 2002 in review; Ntambi et al. 2002 PNAS)
  – RT-PCR on 108 F2 mice liver tissues
    • 15 genes, selected as important in diabetes pathways
    • SCD1, PEPCK, ACO, FAS, GPAT, PPARgamma, PPARalpha, G6Pase, PDI, …
why map gene expression as a quantitative trait?
• cis- or trans-action?
  – does gene control its own expression?
  – evidence for both modes (Brem et al. 2002 Science)
• mechanics of gene expression mapping
  – measure gene expression in intercross (F2) population
  – map expression as quantitative trait (QTL technology)
  – adjust for multiple testing via false discovery rate
• research groups working on expression QTLs
  – review by Cheung and Spielman (2002 Nat Gen Suppl)
  – Kruglyak (Brem et al. 2002 Science)
  – Doerge et al. (Purdue); Jansen et al. (Wageningen)
  – Williams et al. (U KY); Lusis et al. (UCLA)
  – Dumas et al. (2000 J Hypertension)
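The last mechanical step, adjusting for multiple testing via false discovery rate, could look like the following sketch. It uses the Benjamini-Hochberg step-up adjustment as one common choice; the slide does not say which FDR procedure was used, and the p-values below are made up for illustration.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values, one per expression trait tested at a locus.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
qvals = bh_adjust(pvals)
significant = [p for p, q in zip(pvals, qvals) if q <= 0.05]
print(significant)
```

With these made-up inputs, three p-values fall below 0.05 before adjustment that do not survive it, which is the point of controlling the false discovery rate when many expression traits are mapped at once.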
Multiple Interval Mapping
SCD1: multiple QTL plus epistasis!
[Figure: two panels over map positions 0 to 300 spanning chr2, chr5, chr9. Top: LOD profile (0 to 8). Bottom: effect estimates (add=blue, dom=red), range -0.5 to 1.0.]
2-D scan: assumes only 2 QTL!
epistasis LOD
joint LOD
trans-acting QTL for SCD1
(no epistasis yet: see Yi, Xu, Allison 2003)
dominance?
Bayesian model assessment: chromosome QTL pattern for SCD1
[Figure: two panels over model index 1 to 15. Left, "pattern posterior": model posterior (0.00 to 0.15). Right, "Bayes factor ratios": posterior / prior (0.2 to 0.8), points labeled by number of QTL, with reference levels marked "weak" and "moderate".]
Pattern labels, in order of model index:
4:1,2,2*3; 4:1,2*2,3; 5:3*1,2,3; 5:2*1,2,2*3; 5:2*1,2*2,3; 6:3*1,2,2*3; 6:3*1,2*2,3; 5:1,2*2,2*3; 6:4*1,2,3; 6:2*1,2*2,2*3; 2:1,3; 3:2*1,2; 2:1,2; 3:1,2,3; 4:2*1,2,3
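The "posterior / prior" ratio in the Bayes factor panel can be turned into a Bayes factor between two models by dividing one model's ratio by the other's. The sketch below uses made-up posterior and prior probabilities; the numbers, the pattern names, and the function name are assumptions, not values read from the figure.

```python
def bayes_factor(post1, prior1, post2, prior2):
    """Bayes factor of model 1 over model 2 from posterior/prior ratios."""
    return (post1 / prior1) / (post2 / prior2)

# Hypothetical probabilities for two QTL patterns (not from the talk).
posterior = {"pattern_a": 0.15, "pattern_b": 0.05}
prior = {"pattern_a": 0.02, "pattern_b": 0.02}

bf = bayes_factor(posterior["pattern_a"], prior["pattern_a"],
                  posterior["pattern_b"], prior["pattern_b"])
print(bf)
```

When the two priors are equal, as in this made-up case, the Bayes factor reduces to the ratio of the posteriors.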
high throughput dilemma
• want to focus on gene expression network
  – hundreds or thousands of genes/proteins to monitor
  – ideally capture networks in a few dimensions
    • multivariate summaries of multiple traits
  – elicit biochemical pathways
    • (Henderson et al. Hoeschele 2001; Ong Page 2002)
• may have multiple controlling loci
  – allow for complicated genetic architecture
  – could affect many genes in coordinated fashion
  – could show evidence of epistasis
  – quick assessment via interval mapping may be misleading
why study multiple traits together?
• environmental correlation
  – non-genetic, controllable by design
  – historical correlation (learned behavior)
  – physiological correlation (same body)
• genetic correlation
  – pleiotropy
    • one gene, many functions
    • common biochemical pathway, splicing variants
  – close linkage
    • two tightly linked genes
    • genotypes Q are collinear
idea of mapping microarrays
(Jansen Nap 2001)
goal: unravel biochemical pathways
(Jansen Nap 2001)
central dogma via microarrays
(Bochner 2003)
coordinated expression in mouse genome (Schadt et al. 2003)
expression pleiotropy in yeast genome (Brem et al. 2002)
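PC simply rotates & rescales to find major axes of variation
[Figure: two scatter plots. Left: mRNA2 versus mRNA1 on the original axes (-5 to 5). Right: the same points on rotated and rescaled axes, V2 versus V1 (-2 to 2).]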
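The rotate-and-rescale description can be made concrete for two traits. The sketch below, a minimal pure-Python version for the two-variable case only, rotates centered data onto the principal axes of the sample covariance matrix and then rescales each axis to unit variance. The simulated mRNA1 and mRNA2 values and all function names are illustrative assumptions.

```python
import math
import random
import statistics

def pca_2d(x, y):
    """Rotate (x, y) onto principal axes, then rescale to unit variance."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    xc = [v - mx for v in x]
    yc = [v - my for v in y]
    n = len(x)
    sxx = sum(v * v for v in xc) / n
    syy = sum(v * v for v in yc) / n
    sxy = sum(a * b for a, b in zip(xc, yc)) / n
    # Angle of the major axis of the 2 x 2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    pc1 = [c * a + s * b for a, b in zip(xc, yc)]     # rotation
    pc2 = [-s * a + c * b for a, b in zip(xc, yc)]
    var1 = sum(v * v for v in pc1) / n
    var2 = sum(v * v for v in pc2) / n
    v1 = [v / math.sqrt(var1) for v in pc1]           # rescaling
    v2 = [v / math.sqrt(var2) for v in pc2]
    return theta, (var1, var2), v1, v2

rng = random.Random(2)
shared = [rng.gauss(0, 2) for _ in range(500)]        # common signal
mrna1 = [s + rng.gauss(0, 1) for s in shared]
mrna2 = [s + rng.gauss(0, 1) for s in shared]
theta, variances, v1, v2 = pca_2d(mrna1, mrna2)
share_pc1 = variances[0] / sum(variances)
print(round(math.degrees(theta), 1), round(share_pc1, 2))
```

After the transformation V1 and V2 are uncorrelated with unit variance, and the first axis carries most of the variation because both simulated traits share a common signal. With more than two traits one would normally use a linear algebra library rather than this closed-form angle.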
multivariate screen for gene expression mapping
[Figure: panel "principal components" plotting PC2 (22%) versus PC1 (42%); panel "PC1 (red) and SCD (black)".]
mapping first diabetes PC as a trait
[Figure: "hong7pc.bim summaries with pattern ch2,ch5,ch9". Three panels over map positions 0 to 300 spanning ch2, ch5, ch9: loci histogram, additive effect, and dominance effect.]
pFDR for PC1 analysis
[Figure: two panels. Left: pr( H=0 | p>size ) versus relative size of HPD region, annotated "prior probability" and "fraction of posterior found in tails". Right: BH pFDR (-) and size (.) on the left axis and Storey pFDR (-) on the right axis, versus pr( locus in HPD | m>0 ).]