Lecture 25: Association Genetics
November 30, 2012
Announcements
Final exam on Monday, Dec 10 at 11 am, in 3306 LSB
2010 exam and study sheets posted on website
Exam is mostly non-cumulative
Review session on Friday, Dec. 7
Extra credit lab next Wednesday: up to 10 points
Extra credit report due at final exam
Last Time
Quantitative traits
Genetic basis
Heritability
Linking phenotype to genotype
QTL analysis introduction
Limitations of QTL
Today
Association genetics
Effects of population structure
Transmission Disequilibrium Tests
Quantitative Trait Locus Mapping
[Figure: height by genotype at marker B (BB, Bb, bb); cross between Parent 1 (A B C / A B C) and Parent 2 (a b c / a b c), F1 × F1 intercross, and recombinant progeny chromosomes with their genotypes at marker B. Modified from D. Neale]
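The height-by-genotype plot is the idea behind single-marker QTL analysis: if marker B is linked to a QTL, mean height should differ among the BB, Bb and bb classes. A minimal sketch, assuming hypothetical heights for six progeny and a one-way ANOVA F statistic (which would then be compared with an F distribution on k − 1 and n − k degrees of freedom):

```python
from statistics import mean


def single_marker_anova(genotypes, phenotypes):
    """One-way ANOVA of a trait grouped by genotype at one marker.

    Returns the genotype-class means and the F statistic; a large F suggests
    the marker is linked to a QTL affecting the trait.
    """
    groups = {}
    for g, y in zip(genotypes, phenotypes):
        groups.setdefault(g, []).append(y)
    grand_mean = mean(phenotypes)
    n, k = len(phenotypes), len(groups)
    # Variation among genotype classes vs. variation within them
    ss_between = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups.values())
    ss_within = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    return {g: mean(ys) for g, ys in groups.items()}, f_stat


# Hypothetical heights for six progeny genotyped at marker B
genotypes = ["BB", "BB", "Bb", "Bb", "bb", "bb"]
heights = [10, 12, 8, 10, 6, 8]
class_means, f_stat = single_marker_anova(genotypes, heights)
print(class_means, f_stat)
```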
QTL for aggressive behavior in mice
X chromosome
Monoamine Oxidase A (MAOA)
Brodkin et al. 2002
http://people.bu.edu/jcherry/webpage/pheromone.htm
Monoamine Oxidase A (MAOA)
Selectively degrades serotonin, norepinephrine, and dopamine
Located near QTL for aggressive behavior on the X chromosome
Levels of expression affected by a VNTR (minisatellite) locus in the promoter region
Sabol et al. 1998
MAOA and childhood maltreatment
Caspi et al. 2002
Genotype-by-Environment interaction
QTL Limitations
Biased toward detection of large-effect loci
Need very large pedigrees to do this properly
Limited genetic base: QTL may only apply to the two individuals in the cross!
Genotype x Environment interactions rampant: some QTL only appear in certain environments
Huge regions of the genome underlie QTL, usually containing hundreds of genes
How to distinguish among candidates?
Linkage Disequilibrium and Quantitative Trait Mapping
Linkage and quantitative trait locus (QTL) analysis
Need a pedigree and moderate number of molecular markers
Very large regions of chromosomes represented by markers
Association Studies with Natural Populations
No pedigree required
Need large numbers of genetic markers
Small chromosomal segments can be localized
Many more markers are required than in traditional QTL analysis
Cardon and Bell 2001, Nat. Rev. Genet. 2: 91-99
Association Mapping
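[Figure: ancestral chromosomes carrying a causal mutation (*) next to marker alleles T and G; recombination through evolutionary history shuffles the flanking segments, so in present-day chromosomes in a natural population (e.g., * T G, * T A, C G C A, * T G C A) the mutation stays associated mainly with closely linked marker alleles. Slide courtesy of Dave Neale]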
[Figure: height by genotype at a SNP (CC, TC, TT)]
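Association mapping works because recombination erodes linkage disequilibrium (LD) between a causal mutation and markers over many generations, except for markers very close to it. The sketch below computes the standard measures D = p_AB − p_A·p_B and r² = D² / [p_A(1 − p_A)p_B(1 − p_B)], plus the expected decay D_t = (1 − c)^t · D_0 under random mating with recombination fraction c. The haplotypes are hypothetical: "*" marks the mutation, "+" (an assumed label) the non-mutant allele, and T/A the alleles at a nearby marker.

```python
def ld_stats(haplotypes, allele_1, allele_2):
    """Linkage disequilibrium between two biallelic loci.

    haplotypes: sequence of two-character strings; character 0 is the allele at
    locus 1 and character 1 the allele at locus 2. Returns D and r^2 for the
    allele pair (allele_1, allele_2).
    """
    n = len(haplotypes)
    p1 = sum(h[0] == allele_1 for h in haplotypes) / n
    p2 = sum(h[1] == allele_2 for h in haplotypes) / n
    p12 = sum(h[0] == allele_1 and h[1] == allele_2 for h in haplotypes) / n
    d = p12 - p1 * p2
    denom = p1 * (1 - p1) * p2 * (1 - p2)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, r2


def ld_after(d0, c, t):
    """Expected D after t generations of random mating, recombination fraction c."""
    return d0 * (1 - c) ** t


# Hypothetical haplotypes: mutation "*" arose on a chromosome carrying marker allele T
new_mutation = ["*T", "*T", "+A", "+A"]
after_recombination = ["*T", "*A", "+T", "+A"]
print(ld_stats(new_mutation, "*", "T"))         # complete LD
print(ld_stats(after_recombination, "*", "T"))  # no LD left
print(ld_after(0.25, 0.01, 100))                # a tightly linked marker keeps some LD
```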
Candidate Gene Associations vs. Whole Genome Scans
If LD is high and haplotype blocks are conserved, the entire genome can be efficiently scanned for associations with phenotypes
Simplest for case-control studies (e.g., disease, gender)
If LD is low, candidate genes are usually identified a priori, and a limited number are scanned for associations
Biased by existing knowledge
Use "Candidate Regions" from high LD populations, assess candidate genes in low LD populations
[Figure: linkage map with marker positions from 0.0 to 262.9 and a QTL region marked; workflow: QTL → Candidate Region → Candidate Gene Identification]
Human HapMap Project and Whole Genome Scans
LD structure of human Chromosome 19 (www.hapmap.org)
1 common SNP genotyped every 5 kb for 269 individuals
9.2 million SNPs in total
Take advantage of haplotype blocks to efficiently scan genome
NATURE|Vol 437|27 October 2005
Next-Generation Sequencing and Whole Genome Scans
The $1000 genome is on the horizon
Current cost with Illumina HiSeq 2000 is about $2000 for 10X depth
The 1000 genomes project has sequenced thousands of human genomes at low depth
Can detect most polymorphisms with frequency >0.01
True whole genome association studies now possible at a very large scale
http://www.1000genomes.org/
Identifying genetic mechanisms of simple vs. complex diseases
Simple (Mendelian) diseases: caused by a single major gene
High heritability; often can be recognized in pedigrees
Examples: Huntington’s disease, achondroplasia, cystic fibrosis, sickle cell anemia
Tools: linkage analysis, positional cloning
Over 2900 disease-causing genes have been identified thus far: Human Gene Mutation Database: www.hgmd.cf.ac.uk
Complex (non-Mendelian) diseases: caused by the interaction between environmental factors and multiple genes with minor effects
Interactions between genes; low heritability
Examples: heart disease, Type II diabetes, cancer, asthma
Tools: association mapping, SNPs!!
Over 35,000 SNP associations have been identified thus far: http://www.snpedia.com
Slide adapted from Kermit Ritland
Complicating factor: Trait Heterogeneity
The same phenotype can have multiple underlying genetic mechanisms
Slide adapted from Kermit Ritland
Case-Control Example: Diabetes
Knowler et al. (1988) collected data on 4920 members of the Pima and Papago Native American populations in the Southwestern United States
High rate of Type II diabetes in these populations
Found significant associations with an Immunoglobulin G marker (Gm)
Does this indicate underlying mechanisms of disease?
Knowler et al. (1988) Am. J. Hum. Genet. 43: 520
                  Type 2 Diabetes
Gm Haplotype      present   absent   Total
present               8        29       37
absent               92        71      163
Total               100       100      200
Case-control test for association (case = diabetic, control = not diabetic)
Question: Is the Gm haplotype associated with risk of Type 2 diabetes?
(1) Test for an association, with cell counts a = 8, b = 29, c = 92, d = 71 and N = 200:
$$\chi^2_1 = \frac{(ad - bc)^2 N}{(a+c)(b+d)(a+b)(c+d)} = \frac{[(8 \times 71) - (29 \times 92)]^2 (200)}{(100)(100)(37)(163)} = 14.62$$
(2) The chi-square is significant. Therefore, presence of the Gm haplotype seems to confer a reduced occurrence of diabetes.
Slide adapted from Kermit Ritland
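The 1-df chi-square for the 2 × 2 table can be checked directly. This sketch uses the same formula and cell layout as the slide (a, b on the Gm-present row; c, d on the Gm-absent row); the p-value uses the identity P(χ²₁ > x) = erfc(√(x/2)), which holds because a 1-df chi-square is a squared standard normal.

```python
import math


def chi_square_2x2(a, b, c, d):
    """1-df chi-square test for a 2x2 table laid out as

                    case   control
        marker +      a       b
        marker -      c       d

    using chi2 = (ad - bc)^2 N / [(a + c)(b + d)(a + b)(c + d)].
    """
    n = a + b + c + d
    chi2 = (a * d - b * c) ** 2 * n / ((a + c) * (b + d) * (a + b) * (c + d))
    # Upper-tail p-value for 1 df: P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Knowler et al. (1988) counts: rows = Gm haplotype present/absent,
# columns = Type 2 diabetes present (case) / absent (control)
chi2, p = chi_square_2x2(8, 29, 92, 71)
print(round(chi2, 2), p)
```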
Case-control test for association (continued)
Question: Is the Gm haplotype actually associated with risk of Type 2 diabetes?
The real story: stratify by American Indian heritage (0 = little or no Indian heritage; 8 = complete Indian heritage)
Index of Indian heritage   Gm haplotype   Percent with diabetes
0                          Present        17.8
0                          Absent         19.9
4                          Present        28.3
4                          Absent         28.8
8                          Present        35.9
8                          Absent         39.3
Conclusion: The Gm haplotype is NOT a risk factor for Type 2 diabetes, but is a marker of American Indian heritage.
Slide adapted from Kermit Ritland
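The stratified table supports the conclusion by comparing Gm-present and Gm-absent individuals within groups of similar heritage. A small sketch using the percentages from the table: within each stratum the difference is at most a few percentage points, while the diabetes rate roughly doubles from heritage index 0 to 8. (Per-stratum counts are not given, so this is a descriptive comparison, not a significance test.)

```python
# Percent with Type 2 diabetes from the stratified table
# (index of Indian heritage -> Gm haplotype present/absent)
percent_diabetic = {
    0: {"present": 17.8, "absent": 19.9},
    4: {"present": 28.3, "absent": 28.8},
    8: {"present": 35.9, "absent": 39.3},
}


def gm_difference_by_stratum(table):
    """Percentage-point difference (Gm present minus Gm absent) within each stratum."""
    return {stratum: round(rates["present"] - rates["absent"], 1)
            for stratum, rates in table.items()}


def heritage_spread(table, gm_status):
    """Change in percent diabetic from the lowest to the highest heritage stratum."""
    strata = sorted(table)
    return round(table[strata[-1]][gm_status] - table[strata[0]][gm_status], 1)


print(gm_difference_by_stratum(percent_diabetic))  # small within-stratum differences
print(heritage_spread(percent_diabetic, "absent"))  # large change across strata
```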
Population structure and spurious association
Assume populations are historically isolated
One has higher disease frequency by chance
Unlinked loci are differentiated between populations also
Unlinked loci show disease association when populations are lumped together
[Figure: a population with low disease frequency and a population with high disease frequency separated by a gene flow barrier, showing alleles causing susceptibility to disease and alleles at a neutral locus]
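The lumping effect can be reproduced with expected counts. In this sketch the two populations and their frequencies are hypothetical: in each population the neutral marker is independent of disease, so the within-population chi-square is exactly zero, yet pooling a low-disease, low-marker-frequency population with a high-disease, high-marker-frequency one produces a large chi-square.

```python
def chi_square_2x2(a, b, c, d):
    """1-df chi-square for (marker+ case, marker+ control, marker- case, marker- control)."""
    n = a + b + c + d
    return (a * d - b * c) ** 2 * n / ((a + c) * (b + d) * (a + b) * (c + d))


# Hypothetical expected counts for 1000 people per population, with the neutral
# marker independent of disease *within* each population:
#   low-disease population:  marker freq 0.2, disease freq 0.1
#   high-disease population: marker freq 0.8, disease freq 0.4
low = (20, 180, 80, 720)
high = (320, 480, 80, 120)
pooled = tuple(x + y for x, y in zip(low, high))

print(chi_square_2x2(*low))     # 0.0: no association within the population
print(chi_square_2x2(*high))    # 0.0
print(chi_square_2x2(*pooled))  # large: spurious association after lumping
```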
Association Study Limitations
Population structure: differences between cases and controls
Genetic heterogeneity underlying trait
Random error/false positives
Inadequate genome coverage
Poorly-estimated linkage disequilibrium
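Transmission Disequilibrium Test (TDT) (Spielman et al. 1993)
a = # times M transmitted (from heterozygous Mm parents to affected offspring)
b = # times M not transmitted
$$\chi^2_{TDT} = \frac{(a - b)^2}{a + b}$$
Approximately distributed as χ² with 1 degree of freedom
[Figure: example families with parental and affected-offspring genotypes at marker M (Mm, mm)]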
Slide adapted from Kermit Ritland
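The TDT statistic only needs transmission counts from heterozygous parents. A minimal sketch, assuming each heterozygous parent of an affected child is recorded as True (transmitted M) or False (transmitted m); the counts in the example are hypothetical, and the p-value again uses the 1-df identity P(χ²₁ > x) = erfc(√(x/2)).

```python
import math


def tdt(transmissions):
    """Transmission disequilibrium test.

    transmissions: iterable of booleans, one per heterozygous (Mm) parent of an
    affected child; True if that parent transmitted M, False if it transmitted m.
    Returns (a, b, chi2, p) with a = # times M transmitted, b = # times not.
    """
    t = list(transmissions)
    a = sum(t)
    b = len(t) - a
    chi2 = (a - b) ** 2 / (a + b)
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
    return a, b, chi2, p_value


# Hypothetical: 30 heterozygous parents transmitted M, 14 transmitted m
a, b, chi2, p = tdt([True] * 30 + [False] * 14)
print(a, b, chi2, p)
```

Because only within-family transmissions are compared, allele frequency differences between subpopulations do not enter the statistic.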
Transmission Disequilibrium Test (TDT)
Compare diseased offspring genotypes to parental genotypes to test if loci violate Mendelian expectations
Controls for population structure
Compared with “standard” association tests:
Still need to have tight LD, so need many markers
Is not affected by population stratification
Only detects signal if there is both linkage and association; does not depend on mode of inheritance
Uses only affected progeny (and parental genotypes), so method is efficient
Association Tests and Population Structure
Transmission disequilibrium tests have limited power and range of application:
sample size limitations
restricted allelic diversity
“Genomic Control” uses random markers throughout the genome to control for false associations
“Mixed Model” approach allows incorporation of known relatedness and population structure simultaneously
Cardon and Bell 2001 Nature Reviews Genetics 2:91
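Genomic control can be sketched in a few lines, following the idea of Devlin and Roeder (1999): if most markers are unassociated, their 1-df chi-square statistics should have a median near 0.4549; a larger observed median suggests inflation from population structure, and every statistic is divided by the inflation factor λ. Flooring λ at 1 is a common convention assumed here; the input statistics are hypothetical.

```python
from statistics import median

# Median of the chi-square distribution with 1 degree of freedom (about 0.4549)
CHI2_1DF_MEDIAN = 0.4549


def genomic_control(chi2_stats):
    """Estimate an inflation factor lambda from association statistics at many
    markers and divide every statistic by it.

    Assumes most markers are unassociated, so their statistics should follow a
    1-df chi-square distribution in the absence of population structure.
    """
    lam = max(1.0, median(chi2_stats) / CHI2_1DF_MEDIAN)  # never inflate statistics
    return lam, [x / lam for x in chi2_stats]


# Hypothetical statistics whose median is twice the null expectation
lam, adjusted = genomic_control([0.1, 0.5, 0.9098, 2.0, 10.0])
print(lam, adjusted)
```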
ANOVA/Regression Model
$$f(y_i) = \beta x_i + \varepsilon_i$$
f: (monotonic) transformation; y_i: phenotype (response variable) of individual i; β: effect size (regression coefficient); x_i: coded genotype (feature) of individual i; ε_i: error (residual); p(β = 0): p-value for the null hypothesis of no effect
Goal: find the effect size that best explains all (potentially transformed) phenotypes as a linear function of the genotypes, and estimate the probability (p-value) that the data are consistent with the null hypothesis (i.e., no effect)
http://www2.unil.ch/cbg/index.php?title=Genome_Wide_Association_Studies
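A minimal single-SNP regression sketch, assuming genotypes are coded additively as 0/1/2 copies of one allele and f is the identity. Unlike the slide formula it also fits an intercept μ, which is usual in practice. The p-value here uses a normal approximation to the t distribution, reasonable only for large samples; for small samples one would use Student's t with n − 2 degrees of freedom. The data are hypothetical.

```python
import math
from statistics import mean


def single_snp_regression(genotypes, phenotypes):
    """Least-squares fit of phenotype = mu + beta * genotype + error.

    Returns beta, its t statistic, and a two-sided p-value from a normal
    approximation to the t distribution (large samples only).
    """
    n = len(genotypes)
    x_bar, y_bar = mean(genotypes), mean(phenotypes)
    sxx = sum((x - x_bar) ** 2 for x in genotypes)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(genotypes, phenotypes))
    beta = sxy / sxx
    mu = y_bar - beta * x_bar
    rss = sum((y - mu - beta * x) ** 2 for x, y in zip(genotypes, phenotypes))
    se_beta = math.sqrt(rss / (n - 2) / sxx)  # standard error of beta
    t_stat = beta / se_beta
    p_value = math.erfc(abs(t_stat) / math.sqrt(2))  # two-sided, normal approximation
    return beta, t_stat, p_value


# Hypothetical phenotypes for six individuals with genotypes coded 0/1/2
beta, t_stat, p_value = single_snp_regression([0, 0, 1, 1, 2, 2],
                                              [1.0, 1.2, 2.0, 2.2, 3.0, 3.2])
print(beta, t_stat, p_value)
```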
Mixed Model
phenotype (response variable) of individual i = effect of target SNP + family effect (kinship coefficient) + population effect (e.g., admixture coefficient from Structure or values of principal components) + effects of background SNPs
Implemented in the Tassel program (Wednesday in lab)
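Why the population term matters can be shown with a simplified fixed-effects model that includes only the population covariate (a "Q model"); it omits the kinship random effect that the full mixed model adds, and it is not TASSEL's implementation. In this hypothetical data set the phenotype depends only on population membership q, but SNP allele frequency also differs between populations, so the naive SNP effect is nonzero while the adjusted effect is zero.

```python
import numpy as np


def snp_effect(y, snp, covariates=None):
    """OLS estimate of the SNP effect, optionally adjusting for covariates
    (e.g. admixture coefficients or principal components)."""
    columns = [np.ones_like(y), snp]
    if covariates is not None:
        columns.append(covariates)
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]  # coefficient of the SNP column


q = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)    # population membership
snp = np.array([0, 0, 1, 1, 1, 1, 2, 2], dtype=float)  # allele count differs by population
noise = np.array([0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, -0.1])
y = 1.0 + 3.0 * q + noise                                # phenotype depends on population only

naive = snp_effect(y, snp)                 # confounded by population structure
adjusted = snp_effect(y, snp, covariates=q)
print(naive, adjusted)
```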
Commercial Services for Human Genome-Wide SNP Characterization
Assay 1.2 million “tag SNPs” scattered across genome using Illumina BeadArray technology
Ancestry analyses and disease/behavioral susceptibility