Page 1: Lecture 25: Association Geneticssdifazio/popgen_12/lectures/Nov30_assoc.pdf · Approximately distributed as χ2 with 1 degree of freedom Transmission Disequilibrium Test (TDT) (Spiegelman

Lecture 25: Association Genetics

November 30, 2012


Announcements

  Final exam on Monday, Dec 10 at 11 am, in 3306 LSB

  2010 exam and study sheets posted on website

  Exam is mostly non-cumulative

  Review session on Friday, Dec. 7

  Extra credit lab next Wednesday: up to 10 points

  Extra credit report due at final exam


Last Time

 Quantitative traits

 Genetic basis

 Heritability

  Linking phenotype to genotype

 QTL analysis introduction

 Limitations of QTL


Today

  Association genetics

  Effects of population structure

  Transmission Disequilibrium Tests


Quantitative Trait Locus Mapping

[Figure: Parent 1 x Parent 2 cross, F1 x F1 intercross, and progeny haplotypes at three marker loci (A/a, B/b, C/c); progeny genotypes at marker B (BB, Bb, bb) plotted against HEIGHT. Modified from D. Neale]


QTL for aggressive behavior in mice

X chromosome

Monoamine Oxidase A (MAOA)

Brodkin et al. 2002

http://people.bu.edu/jcherry/webpage/pheromone.htm

[Figure: cross diagram showing the parental cross, the F1 cross, and progeny haplotypes at marker loci A/a, B/b, C/c]


Monoamine Oxidase A (MAOA)

  Selectively degrades serotonin, norepinephrine, and dopamine

  Located near QTL for aggressive behavior on the X chromosome

  Levels of expression affected by a VNTR (minisatellite) locus in the promoter region

Sabol et al. 1998


MAOA and childhood maltreatment

Caspi et al. 2002

Genotype-by-Environment interaction


QTL Limitations

 Biased toward detection of large-effect loci

 Need very large pedigrees to do this properly

 Limited genetic base: QTL may only apply to the two individuals in the cross!

 Genotype x Environment interactions rampant: some QTL only appear in certain environments

 Huge regions of genome underlie QTL, usually containing hundreds of genes

 How to distinguish among candidates?


Linkage Disequilibrium and Quantitative Trait Mapping

  Linkage and quantitative trait locus (QTL) analysis

  Need a pedigree and moderate number of molecular markers

  Very large regions of chromosomes represented by markers

  Association Studies with Natural Populations

  No pedigree required

  Need large numbers of genetic markers

  Small chromosomal segments can be localized

  Many more markers are required than in traditional QTL analysis

Cardon and Bell 2001, Nat. Rev. Genet. 2: 91-99


Association Mapping

[Figure: ancestral chromosomes (marked * T G); recombination through evolutionary history; present-day chromosomes in a natural population carrying different combinations of the marked site and nearby alleles. Inset plot: HEIGHT against GENOTYPE (CC, TC, TT). Slide courtesy of Dave Neale]
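The role of recombination through evolutionary history can be illustrated with a short calculation. This is a minimal sketch, assuming the standard random-mating result that linkage disequilibrium D decays as D_t = D_0 (1 - r)^t, where r is the recombination fraction between two loci and t is the number of generations. The formula, the function name `ld_after`, and the example values are assumptions for illustration and do not come from the slide.

```python
def ld_after(d0, r, generations):
    """Expected LD after `generations` of random mating: D_t = D_0 * (1 - r)**t."""
    return d0 * (1.0 - r) ** generations

# Hypothetical starting disequilibrium and recombination fractions.
d0 = 0.25
for r in (0.0001, 0.01, 0.5):
    remaining = ld_after(d0, r, generations=100)
    print(f"r = {r:<6} D after 100 generations = {remaining:.6f}")
```

Under these assumptions only very tightly linked loci retain appreciable LD after many generations, which is consistent with association studies localizing small chromosomal segments while requiring many markers.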


Candidate Gene Associations vs. Whole Genome Scans

  If LD is high and haplotype blocks are conserved, entire genome can be efficiently scanned for associations with phenotypes

  Simplest for case-control studies (e.g., disease, gender)

  If LD is low, candidate genes are usually identified a priori, and a limited number are scanned for associations

  Biased by existing knowledge

  Use "Candidate Regions" from high LD populations, assess candidate genes in low LD populations

[Figure: linkage map with named markers at positions from 0.0 to 262.9, illustrating the progression QTL, Candidate Region, Candidate Gene Identification]


Human HapMap Project and Whole Genome Scans

  LD structure of human Chromosome 19 (www.hapmap.org)

  1 common SNP genotyped every 5 kb for 269 individuals

  9.2 million SNPs in total

  Take advantage of haplotype blocks to efficiently scan genome

NATURE|Vol 437|27 October 2005


Next-Generation Sequencing and Whole Genome Scans

  The $1000 genome is on the horizon

  Current cost with Illumina HiSeq 2000 is about $2000 for 10X depth

  The 1000 genomes project has sequenced thousands of human genomes at low depth

  Can detect most polymorphisms with frequency >0.01

  True whole genome association studies now possible at a very large scale

http://www.1000genomes.org/


Identifying genetic mechanisms of simple vs. complex diseases

  Simple (Mendelian) diseases: Caused by a single major gene

    High heritability; often can be recognized in pedigrees

    Examples: Huntington’s, Achondroplasia, Cystic fibrosis, Sickle Cell Anemia

    Tools: Linkage analysis, positional cloning

    Over 2900 disease-causing genes have been identified thus far: Human Gene Mutation Database: www.hgmd.cf.ac.uk

  Complex (non-Mendelian) diseases: Caused by the interaction between environmental factors and multiple genes with minor effects

    Interactions between genes; low heritability

    Examples: Heart disease, Type II diabetes, Cancer, Asthma

    Tools: Association mapping, SNPs

    Over 35,000 SNP associations have been identified thus far: http://www.snpedia.com

Slide adapted from Kermit Ritland


Complicating factor: Trait Heterogeneity

Same phenotype has multiple genetic mechanisms underlying it

Slide adapted from Kermit Ritland


Case-Control Example: Diabetes

 Knowler et al. (1988) collected data on 4920 individuals from Pima and Papago Native American populations in the southwestern United States

 High rate of Type II diabetes in these populations

 Found significant associations with Immunoglobulin G marker (Gm)

 Does this indicate underlying mechanisms of disease?

Knowler et al. (1988) Am. J. Hum. Genet. 43: 520


Case-control test for association (case=diabetic, control=not diabetic)

Question: Is the Gm haplotype associated with risk of Type 2 diabetes?

                     Type 2 Diabetes
Gm Haplotype     present    absent    Total
present          8  (a)     29 (b)    37
absent           92 (c)     71 (d)    163
Total            100        100       200 (N)

(1) Test for an association

$$ \chi^2_1 = \frac{(ad - bc)^2 N}{(a+c)(b+d)(a+b)(c+d)} = \frac{[(8 \times 71) - (29 \times 92)]^2 (200)}{(100)(100)(37)(163)} = 14.62 $$

(2) Chi-square is significant. Therefore presence of the Gm haplotype seems to confer reduced occurrence of diabetes
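The arithmetic of the 2x2 test can be checked with a few lines of code. This is a minimal sketch using only the Python standard library; the function names are illustrative, and the p-value uses the identity that the upper tail of a χ² distribution with 1 degree of freedom equals erfc(sqrt(χ²/2)).

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for the table
    [[a, b], [c, d]]: (ad - bc)^2 N / ((a+c)(b+d)(a+b)(c+d))."""
    n = a + b + c + d
    numerator = (a * d - b * c) ** 2 * n
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    return numerator / denominator

def p_value_1df(chi2):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))

# Counts from the slide: rows are Gm haplotype present/absent,
# columns are Type 2 diabetes present/absent.
chi2 = chi_square_2x2(8, 29, 92, 71)
print(f"chi-square = {chi2:.2f}, p = {p_value_1df(chi2):.2e}")
```

The statistic reproduces the 14.62 reported on the slide, well above the 3.84 critical value for a test at the 0.05 level with 1 degree of freedom.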

Slide adapted from Kermit Ritland


Case-control test for association (continued)

Question: Is the Gm haplotype actually associated with risk of Type 2 diabetes?

The real story: Stratify by American Indian heritage

Index of Indian heritage    Gm haplotype    Percent with diabetes
0                           Present         17.8
0                           Absent          19.9
4                           Present         28.3
4                           Absent          28.8
8                           Present         35.9
8                           Absent          39.3

0 = little or no Indian heritage; 8 = complete Indian heritage

Conclusion: The Gm haplotype is NOT a risk factor for Type 2 diabetes, but is a marker of American Indian heritage

Slide adapted from Kermit Ritland


Population structure and spurious association

  Assume populations are historically isolated

  One has higher disease frequency by chance

  Unlinked loci are differentiated between populations also

  Unlinked loci show disease association when populations are lumped together

[Figure: two populations separated by a gene flow barrier, one with low disease frequency and one with high disease frequency; symbols mark alleles at a neutral locus and alleles causing susceptibility to disease]
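The way lumping populations creates a false signal can be shown with expected counts. This is a minimal sketch with hypothetical numbers (two populations of 1000 individuals each, with made-up marker and disease frequencies); within each population the marker and the disease are constructed to be independent, so any association in the pooled table comes from population structure alone.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return (a * d - b * c) ** 2 * n / ((a + c) * (b + d) * (a + b) * (c + d))

def independent_table(n, marker_freq, disease_freq):
    """Expected 2x2 counts when marker and disease are independent."""
    a = n * marker_freq * disease_freq              # marker present, diseased
    b = n * marker_freq * (1 - disease_freq)        # marker present, healthy
    c = n * (1 - marker_freq) * disease_freq        # marker absent, diseased
    d = n * (1 - marker_freq) * (1 - disease_freq)  # marker absent, healthy
    return a, b, c, d

# Hypothetical populations: low vs. high disease frequency, and a neutral
# marker whose frequency also differs between them.
pop_low = independent_table(1000, marker_freq=0.2, disease_freq=0.1)
pop_high = independent_table(1000, marker_freq=0.8, disease_freq=0.4)
pooled = tuple(x + y for x, y in zip(pop_low, pop_high))

within_low = chi_square_2x2(*pop_low)
within_high = chi_square_2x2(*pop_high)
pooled_chi2 = chi_square_2x2(*pooled)
print(f"within populations: {within_low:.3f}, {within_high:.3f}")
print(f"pooled: {pooled_chi2:.1f}")
```

With these assumed frequencies the within-population statistics are essentially zero while the pooled statistic is far above the 3.84 threshold, mirroring the Gm example in which stratifying by heritage removed the apparent association.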


Association Study Limitations

 Population structure: differences between cases and controls

 Genetic heterogeneity underlying trait

 Random error/false positives

 Inadequate genome coverage

 Poorly-estimated linkage disequilibrium


Transmission Disequilibrium Test (TDT) (Spielman et al. 1993)

  Compare diseased offspring genotypes to parental genotypes to test if loci violate Mendelian expectations

  Controls for population structure

[Figure: example pedigrees with parent and offspring genotypes Mm and mm at the marker locus]

a = number of times M is transmitted by a heterozygous (Mm) parent to an affected offspring

b = number of times M is not transmitted

$$ \chi^2 = \frac{(a-b)^2}{a+b} $$

Approximately distributed as χ2 with 1 degree of freedom

Slide adapted from Kermit Ritland
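The TDT statistic is simple enough to compute directly from the two transmission counts. This is a minimal sketch using only the standard library; the function name `tdt` and the example counts are hypothetical, and the p-value again uses the 1-df identity p = erfc(sqrt(χ²/2)).

```python
import math

def tdt(transmitted, not_transmitted):
    """TDT statistic (a - b)^2 / (a + b) and its 1-df chi-square p-value.

    `transmitted` (a) and `not_transmitted` (b) count how often heterozygous
    parents did or did not pass allele M to an affected offspring.
    """
    a, b = transmitted, not_transmitted
    if a + b == 0:
        raise ValueError("no informative (heterozygous-parent) transmissions")
    chi2 = (a - b) ** 2 / (a + b)
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p

# Hypothetical counts: M transmitted 60 times, not transmitted 40 times.
tdt_chi2, tdt_p = tdt(60, 40)
print(f"TDT chi-square = {tdt_chi2:.2f}, p = {tdt_p:.4f}")
```

Under Mendelian transmission a and b are expected to be equal, so the statistic measures the excess transmission of M to affected offspring regardless of allele frequencies in the source populations.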


Transmission Disequilibrium Test (TDT)

 Compared with “standard” association tests:

   Still need to have tight LD, so need many markers

   Is not affected by population stratification

 Only detects signal if there is both linkage and association; does not depend on mode of inheritance

 Uses only affected progeny (and parental genotypes), so method is efficient


Association Tests and Population Structure

  Transmission disequilibrium tests have limited power and range of application

    sample size limitations

    restricted allelic diversity

  “Genomic Control” uses random markers throughout genome to control for false associations

  “Mixed Model” approach allows incorporation of known relatedness and population structure simultaneously

Cardon and Bell 2001 Nature Reviews Genetics 2:91
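The genomic control idea can be sketched numerically. The slide does not give a formula, so this example assumes the common median-based estimator: the inflation factor λ is the median χ² statistic at random (presumed unassociated) markers divided by the median of a 1-df χ² distribution (about 0.4549), and candidate statistics are divided by λ. The function name and the list of null statistics are hypothetical.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549

def genomic_control(null_marker_chi2, candidate_chi2):
    """Return (lambda, corrected statistic) using the median-based estimator."""
    lam = statistics.median(null_marker_chi2) / CHI2_1DF_MEDIAN
    lam = max(lam, 1.0)  # convention: only deflate, never inflate, statistics
    return lam, candidate_chi2 / lam

# Hypothetical chi-square values at random markers across the genome.
null_stats = [0.2, 0.5, 0.9, 1.1, 1.4, 2.3, 3.0]
lam, corrected = genomic_control(null_stats, candidate_chi2=14.62)
print(f"lambda = {lam:.2f}, corrected chi-square = {corrected:.2f}")
```

When population structure inflates test statistics genome-wide, λ exceeds 1 and the candidate statistic shrinks accordingly; with no inflation the statistic is left unchanged.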


ANOVA/Regression Model

$$ f(y_i) = \beta x_i + \varepsilon_i $$

  f = (monotonic) transformation

  y_i = phenotype (response variable) of individual i

  β = effect size (regression coefficient)

  x_i = coded genotype (feature) of individual i

  ε_i = error (residual)

  p(β=0) = p-value for the null hypothesis of no effect

Goal: Find the effect size that best explains all (potentially transformed) phenotypes as a linear function of the genotypes, and estimate the probability (p-value) of the data being consistent with the null hypothesis (i.e., no effect)

http://www2.unil.ch/cbg/index.php?title=Genome_Wide_Association_Studies
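A single-marker regression of this kind can be sketched with ordinary least squares. This is a minimal illustration in plain Python, assuming genotypes coded as the count of one allele (0, 1, 2), no phenotype transformation, and an intercept term; the intercept, the function name, and the data are assumptions for illustration. The returned t statistic would be compared with a t distribution on n - 2 degrees of freedom to obtain the p-value for β = 0.

```python
import math

def genotype_regression(genotypes, phenotypes):
    """Least-squares fit of phenotype = intercept + beta * genotype.

    Returns (beta, intercept, t_statistic) where t tests beta = 0.
    """
    n = len(genotypes)
    mean_x = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genotypes, phenotypes))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    # Residual sum of squares and standard error of the slope.
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(genotypes, phenotypes))
    se_beta = math.sqrt(rss / (n - 2) / sxx)
    return beta, intercept, beta / se_beta

# Hypothetical data: genotype coded as number of copies of one allele,
# phenotype is height in arbitrary units.
genotypes = [0, 0, 0, 1, 1, 1, 2, 2, 2]
heights = [10.1, 9.8, 10.4, 11.9, 12.2, 12.0, 14.1, 13.8, 14.0]
beta, intercept, t_stat = genotype_regression(genotypes, heights)
print(f"beta = {beta:.3f}, intercept = {intercept:.3f}, t = {t_stat:.1f}")
```

In a genome scan the same fit is repeated at every marker, which is why the resulting p-values need correction for multiple testing and for population structure.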


Mixed Model

[Equation figure with labeled terms:]

  phenotype (response variable) of individual i

  effect of target SNP

  effects of background SNPs

  Population effect (e.g., admixture coefficient from Structure or values of Principal Components)

  Family effect (kinship coefficient)

Implemented in the Tassel program (Wednesday in lab)


Commercial Services for Human Genome-Wide SNP Characterization

NATURE|Vol 437|27 October 2005

  Assay 1.2 million “tag SNPs” scattered across genome using Illumina BeadArray technology

  Ancestry analyses and disease/behavioral susceptibility