genetics and methodology of twin and family studies
TRANSCRIPT
23rd International Workshop on Statistical Genetics and Methodology of Twin and Family Studies: Advanced course
Pak Sham (director)
Lindon Eaves Jeff Barrett David Evans Goncalo Abecasis Mike Neale Hermine Maes Sarah Medland Dorret Boomsma Danielle Posthuma Meike Bartels
John Hewitt (host) Jeff Lessem Marleen de Moor Stacey Cherny Ben Neale Shaun Purcell Manuel Ferreira Nick Martin Kate Morley Allan McRae
Hunting QTLs – an introduction
Nick MartinQueensland Institute of Medical Research
Boulder workshop: March 2, 2009
Mendel 1865 – genetics of discrete traits
Stature in adolescent twins
Stature
190.0185.0
180.0175.0
170.0165.0
160.0155.0
150.0145.0
Women700
600
500
400
300
200
100
0
Std. Dev = 6.40 Mean = 169.1
N = 1785.00
0
1
2
3
1 Gene 3 Genotypes 3 Phenotypes
0
1
2
3
2 Genes 9 Genotypes 5 Phenotypes
01234567
3 Genes 27 Genotypes 7 Phenotypes
05
1015
20
4 Genes 81 Genotypes 9 Phenotypes
R.A. Fisher, 1918
The explanation of quantitative inheritance in Mendelian terms
unaffected affected
Disease liability
Single threshold
severe
Disease liability
Multiple thresholds
mildnormal mod
Multifactorial Threshold Model of Disease
Complex Trait Model
Disease Phenotype
Commonenvironment
Marker Gene1
Individualenvironment
Polygenicbackground
Gene2
Gene3
Linkage
Linkagedisequilibrium
Mode ofinheritanceLinkage
Association
Using genetics to dissect metabolic pathways: Drosophila eye color
Beadle & Ephrussi, 1936
Finding QTLs
Linkage
Association
First (unequivocal) positional cloning of a complex disease QTL !
Linkage analysis
Thomas Hunt Morgan – discoverer of linkage
Linkage = Co-segregation
A2A4
A3A4
A1A3
A1A2
A2A3
A1A2 A1A4 A3A4 A3A2
Marker allele A1cosegregates withdominant disease
Linkage Markers…
x
1/4 1/4 1/4 1/4
IDENTITY BY DESCENTSib 1
Sib 2
4/16 = 1/4 sibs share BOTH parental alleles IBD = 28/16 = 1/2 sibs share ONE parental allele IBD = 1
4/16 = 1/4 sibs share NO parental alleles IBD = 0
For disease traits (affected/unaffected)Affected sib pairs selected
IBD = 2IBD = 1IBD = 0
1000
250
750
500
Expected 1 2 3 127 310
Markers
For continuous measuresUnselected sib pairs
1.00
0.25
0.75
0.50
IBD = 0 IBD = 1 IBD = 2
Cor
rela
tion
betw
een
sibs
0.00
Twin 1 mole count
Twin 2 mole count
EC
A
Q Q
AC
ErMZ = 1, rDZ = π̂
rMZ = 1, rDZ = 0.5
rMZ = rDZ = 1
q q
a a
c ce e
Linkage for mole counts in Australian twin families
Flat mole count: chromosome 9 linkage in Australian and UK twins
Australia
UK
Linkage for MaxCigs24 in Australia and Finland
AJHG, in press
Linkage Doesn’t depend on “guessing gene” Needs family data ! Works over broad regions and whole genome Only detects large effects (>10%) Requires large samples (10,000’s?) Can’t guarantee close to gene On the whole, has failed to deliver for complex
traits
Association Looks for correlation between specific
alleles and phenotype (trait value, disease risk)
1 2 3 4 51234567890 1234567890 1234567890 1234567890 1234567890
191,044,935 GTGAGTATGG CTTTCTTCCC TCTCCCGCCA CCCCCTGCCC CACACTGCAA Intron 1191,044,985 GCTGCAAACG CGGTACTTTC GGGCTCGCCT TTGACGTTAG GAAACTAGCC191,045,035 TGAGCCTATG CAGGGAAAAA AAATCGAAAA GGTCAATTTG TTAAGTAAGG191,045,085 TTAAATCTGG GTGATGCTCG GGTACAGTTT AAGAACCGAG GGAGACAGTT191,045,135 GATATGAGGG CGGTGGTTGA TGCGCTAAGA AATTGCGGGT TGGCTTTTTG191,045,185 TCCTCCTGCA TTCAAAATGA CATCAGAATC CTGCGGCTGA AGCGCGTCCC rs34774923191,045,235 CAGCATTCAT ACGTTGCATG ATGAGTTCTC ATCAGCTTAC ACAGCTACTG191,045,285 GAAGGTGATG CTCTTGCTGG TTCTGAATAT ACTCGTTTAA AATCCATTTT191,045,335 TGTTTTTTAA TTATAGAGCA GATCTCACCC AGTCCGAATG TGGAACAAAT191,045,385 AATTGTTATG CAGCGTCTGC TTAAAAGAAG TGTCGTAGGT GGGGAAGGAA191,045,435 GAGCGCAGGG GAATCAGTCA CCCACCTCTT TGTACAGTCT CTGGCGTGGT rs10218513191,045,485 CCAGAACCTC CTGCTCTAAA GAGAGAAGCG TGGGCCGGCT CCAGACAGTT191,045,535 CCATGTCTGT CCTTTTCATT AAAGTGCAAA ACGTCTCGGA ATTGTAATTA191,045,585 ACCTTGCAAA CAAACTGATG CCCTTTGTGA GCCAGAAATA GTGTCTGCCT191,045,635 TTTGAACTAA ATTCATTAAC AATTCTTTAA AATACCCTAG TGATTATAGG191,045,685 TAGCCCTGCC CTTAGTTGTA AAACTAGTAG ATACGGTCAG ATTAATGGAC191,045,735 GAAACTGCTC AGTACATGAG GTTTAAATGT TAGGTGGATA AGACTTATTT191,045,785 GAAGAGTTCT TGCTTTGCCT TATGCGGTTT GTCTCTAGTT ACTGGGTGAC191,045,835 TTTATTTGGT AAAAATGCGT TCAGCTGCAG TAGCATATTC AAGTGTTGCT rs2746073191,045,885 AGTTAGTAAT TATCTTTTTA ATTTTTTGTT TTAG
Association studies
Controls Cases Quantitative Traits
Single Nucleotide Polymorphisms
Association More sensitive to small effects Need to “guess” gene/alleles (“candidate
gene”) or be close enough for linkage disequilibrium with nearby loci
May get spurious association (“stratification”) – need to have genetic controls to be convinced
Variation: Single Nucleotide Polymorphisms
Differences (between subjects) in DNA sequence are responsible for (structural) differences in proteins.
Human OCA2 and eye colour
Zhu et al., Twin Research 7:197-210 (2004)
OCA2 Mutations
Missense Mutations
Polymorphisms
G27R S86RC112F R290G
NW273KV H549QP385IM395L
A368V
T404M
A481TI473S
M446VV443I
R720C
W679C
W652R
K614N
T592IK614E
I617L
A724P
A787VS736L
G795RP743L
H/R615 I/T722L/F440
R/Q419
R/W305
D/A257
R290GNW273KV H549Q
P385IM395L
A368V
T404M
A481TI473S
M446VV443I
R720C
W679C
W652R
K614N
T592IK614E
I617L
A724P
A787VS736L
G795RP743L
W679R
H/R615 I/T722L/F440
R/Q419
R/W305
D/A257
N489D
Albinism data base www.cbc.umn.e
OculotaneousAlbinism type 2
LD blocks in OCA2
Association with eye color
Sturm et al., AJHG 82: 424-431, 2008
Sulem et al, Nat Genet 39: 1443, 2007; Kayser et al, AJHG 82: 411-423, 2008; Eiberg et al, Hum Genet 123: 177-187, 2008
A single SNP within intron 86 of HERC2 determines Blue-Brown eye colour
rs12913832 C = Blue rs12913832 T = Brown
HLTF binding
HLTF = Helicase-like transcription factorUses ATP for energy to initiate unwinding of DNA, so allowing transcription
OCA2 promoter
…21 kb
SWI/SNF Chromatin remodelling model for OCA2 gene activation
ATP
Brown eye colour
HLTFPol II
THERC2 intron 86
ADPHLTF
T
Locus Control Region
Heterochromatin
OCA2 open promoter
Pol II
Euchromatin
…21 kb
MITF LEF1
SWI/SNF Chromatin remodelling model for OCA2 gene activation
Blue eye colour
CHERC2 intron 86
OCA2 promoter
…21 kb
ATP
HLTFPol II
Heterochromatin
…21 kb
OCA2 closed promoter
Pol IIHeterochromatinATP
HLTF
MITF LEF1
X
C
Genetic architecture
Effect size
Number of variants
Number of variants
Number of variants Effect size of variants Frequency of risk allele Genetic mode of action
Genetic variance
Allele Frequency
Effect size
Very very Rare
Common
VeryverySmall
Large
What is the genetic architecture of complex genetic disorders?
Not possibleMendelianDisorders
Possible and detectable spectrum of common complex genetic disorders
Not detectable/Not useful
Allele Frequency
Effect size
Very very Rare
Common
VeryverySmall
Large
A potted history of genetic studies
Not possibleMendelianDisorders
Not detectable/Not useful
Linkage studies
Allele Frequency
Effect size
Very very Rare
Common
VeryverySmall
Large
A potted history of genetic studies
Not possibleMendelianDisorders
Not detectable/Not useful
Linkage studies RR >3
Candidate association studies: Effect size RR ~2sample size- hundreds
Association studies until 2007• Candidate regions
• Some successes but generally:– Poor replication– Small sample sizes
• Conclusion – effect sizes are smaller than expected– Poor selection of candidate regions
500 000 - 1. 000 000 SNPsHuman Genome - 3,1x109 Base Pairs
Genome-Wide Association Studies
High density SNP arrays – up to 1 million SNPs
Genome-wide Association Studies• Multiple testing !
– Declare significance at 0.05/500,000 = 10-8
• Follow-up significant SNPs by genotyping in independent replication samples
• Validated associations:- significant across replication samples- same SNP- same allele
5 x 10-8
CACNA1CAnkryin-G (ANK3)
Bipolar GWAS of 10,648 samples
Sample Cases Controls P-valueSTEP 7.4% 5.8% 0.0013WTCCC 7.6% 5.9% 0.0008EXT 7.3% 4.7% 0.0002Total 7.5% 5.6% 9.1×10-9
Sample Case Controls P-valueSTEP 35.7% 32.4% 0.0015WTCCC 35.7% 31.5% 0.0003EXT 35.3% 33.7% 0.0108Total 35.6% 32.4% 7×10-8
X
>1.7 million genotyped and (high confidence) imputed SNPs
Ferreira et al (Nature Genetics, 2008)
GWAS for major depression
PCLO
GWAS for Melanoma Association analysis of SNPs across a region of chromosome 20q11.22 for the combined sample. The x-axis is chromosomal position, the left y-axis –log10(p) for genotyped SNPs. Nature Genetics 2008 Jul;40(7):838-40.
200520062007 first quarter2007 second quarter2007 third quarter2007 fourth quarterFirst quarter 2008
Manolio, Brooks, Collins, J. Clin. Invest., May 2008 Stephen Channock
Functional Classification of 284 SNPs Associated with Complex Traits
0 10 20 30 40 50 60
Other
Intronic
Missense
Synonymous
3' UTR
5' UTR
Percent of Associated SNPs
http://www.genome.gov/gwastudies/
n = 1
n = 2
n = 3
n = 13
n = 119
n = 146
Stephen Channock
Allele Frequency
Effect size
Very very Rare
Common
VeryverySmall
Large
A potted history of genetic studies
Not possibleMendelianDisorders
Not detectable/Not useful
Linkage studies
Candidate association studies: Effect size RR ~2sample size- hundreds
Genome-wide association studies Effect size RR ~1.2Sample size - thousands
Conclusions about genetic architecture of complex genetic traits based on GWAS results
Genetic variation is not tagged by GWAS SNPs
Effect sizes are smaller
Effects sizes of validated variants from 1st 16 GWAS studies
Allele Frequency
Effect size
Very very Rare
Common
VeryverySmall
Large
A potted history of genetic studies
Not possibleMendelianDisorders
Not detectable/Not useful
Linkage studies
Candidate association studies: Effect size RR ~2sample size- hundreds
Genome-wide association studies Effect size RR ~1.2Sample size - thousands
Next Generation GWAS Effect size RR ~1.05Sample size –tens of thousands
GWA of Height
Weedon et al. (in press) Nat Genet
Large numbers are needed to detect QTLs !!!
Collaboration is the name of the game !!!
1914 Cases (WTCCC T2D)4892 Cases (DGI)
6788 Cases (WTCCC HT) 8668 Cases (WTCCC CAD)
2228 Cases (EPIC) 13665 Cases
Significant result
Other loci?
Usual paradigm of association studies
• Identify significant SNPs (not many)• Genotype in independent replication samples• Find causal variants to inform on biology
But only a small proportion of the known genetic variance is explained
What would we expect under a polygenic model of many, many variants each of small effect?
A 5% risk increasing common variant
Relative risk 1.05
Population MAF 0.2
MAF in cases 0.2079
MAF in controls 0.1999
Power α=1×10-6
(N=10,000 cases, 10,000 controls)0.2%
Proportion ranked in top half(N=ISC)
72%
Schizophrenia (ISC) Q-Q plot
Consistent with:
Stratification?
Genotyping bias?
Distribution of true polygenic effects?
Obs
erve
d -lo
g10(
p)
Expected -log10(p)
λ = 1.092
Indexing polygenic variance with large sets of weakly associated alleles
Individuals’“polygenic scores”
Top 20% independent
SNPs
Discovery set
Target set
Score # of “nominal risk
alleles”
Do target cases have a higherallele load?
→ Independent SCZ studies (MGS, O’Donovan)→ Bipolar disorder (STEP-BD, WTCCC)→ Non-psychiatric disease (WTCCC)
→ ISCISC
Douglas Levinson, Pablo Gejman,Jianxin Shi and colleagues
0
0.01
0.02
0.03P < 0.1
P < 0.2
P < 0.3
P < 0.4
P < 0.5
R2 P=2×10-28
0.008
5×10-11
7×10-9
1×10-12
0.71 0.050.30 0.65 0.23 0.06
Schizophrenia Bipolar disorder Non-psychiatric (WTCCC)
CAD CD HT RA T1D T2DMGSEuro.
MGSAf-Am O’Donovan STEP-BD WTCCC
A greater load of “nominal” schizophrenia alleles (from ISC)?
ISC X Test
Can predict bipolar from SzSNPs, but not other diseases
Predictive information on Risk from up to 50% of Top 20% SNPs in a GWAS !
Vink et al, 2009 AJHG in press
Pathway (Ingenuity) analysis of GWAS for smoking
Even for “simple” diseasesthe number of alleles is large
Ischaemic heart disease (LDR) >190 Breast cancer (BRAC1) >300 Colorectal cancer (MLN1) >140
[Science 2004]
Complex disease: common or rare alleles?
Increasing evidence for Common Disease – Rare Variant
hypothesis (CDRV)
62
Human 1M HapMap Coverage by Population
Human 1M CEU (mean 0.96 median 1.0)
Human 1M CHB+JPT (mean 0.95 median 1.0)
Human 1M YRI (mean 0.85 median 1.0)
GENOME COVERAGE ESTIMATED FROM 990,000 HAPMAP SNPs IN HUMAN 1M
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0
>0 >0.1 >0.2 >0.3 >0.4 >0.5 >0.6 >0.7 >0.8 >0.9
MAX r2
CO
VER
AG
E O
F H
APM
AP
REL
EASE
21
~95%~94%
~74%
http://www.genemapping.org/
We also run two journals (1)
• Editor: John Hewitt• Editorial assistant
Christina Hewitt• Publisher: Kluwer
/Plenum• Fully online• http://www.bga.org
We also run two journals (2) Editor: Nick Martin Editorial assistant +
subscriptions: Lorin Grey
Publisher: Australian Academic Press
Fully online http://www.ists.qimr.e
du.au/journal.html