
Some current issues in QTL identification

Lon Cardon, Wellcome Trust Centre for Human Genetics

University of Oxford

Acknowledgements: Goncalo Abecasis, Stacey Cherny, Twin course faculty

Positional Cloning

Genetics
- Sib pairs (LOD)
- Chromosome Region
- Association Study

Genomics
- Physical Mapping/Sequencing
- Candidate Gene Selection/Polymorphism Detection
- Mutation Characterization/Functional Annotation

Inflammatory Bowel Disease Genome Screen

Hampe et al., Am J Hum Genet, 64:808-816, 1999


Susceptibility locus mapped for Crohn’s Disease

Genome Screens for Linkage in Sib-pairs

1997/98:
- Diabetes (IDDM + NIDDM)
- Asthma/atopy
- Osteoporosis
- Obesity
- Multiple Sclerosis
- Rheumatoid arthritis
- Systemic lupus erythematosus
- Ankylosing spondylitis
- Epilepsy
- Inflammatory Bowel Disease
- Celiac Disease
- Psychiatric Disorders (incl. Scz, bipolar)
- Behavioral traits (incl. personality, panic)
- others missed...

1999:
- NIDDM
- Asthma/atopy
- Psoriasis
- Inflammatory Bowel Disease
- Osteoporosis/Bone Mineral Density
- Obesity
- Epilepsy
- Thyroid disease
- Pre-eclampsia
- Blood pressure
- Psychiatric disorders (incl. Scz, bipolar)
- Behavioral traits (incl. smoking, alcoholism, autism)
- Familial combined hyperlipidemia
- Tourette syndrome
- Systemic lupus erythematosus
- others missed…

Human QTL Linkage Gene Identification Successes

0 (well, at least < 5)

Why so few successes in human QTL mapping?

Many valid reasons proposed:
• Phenotypic complexity (not measured well)
• Genetic complexity (many genes of small effect, GxE, epistasis)
• Genotype error
• Sampling design
• Statistical methods
• …

Most linkage studies have been under-powered (and over-hyped)

QTL mapping has very low power!

[Figure: power to detect linkage with 1000 sibs, no parents; markers every 10 cM, each marker H = 0.8; QTL h² = 0.33]

Kruglyak L, Lander ES. (1995). Am J Hum Genet 57: 439-454
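To make concrete what a sib-pair QTL linkage screen estimates, the sketch below simulates sib pairs at a single additive QTL and fits a Haseman-Elston regression of the squared sib-pair trait difference on the proportion of alleles shared identical by descent (IBD). This is a swapped-in classic method, not necessarily the analysis behind the figure. The function names, the fully informative marker (IBD known exactly at the QTL) and every parameter value are assumptions. Under these assumptions, with additive QTL variance σ²_a, the expected slope is −2σ²_a.

```python
import random


def simulate_sib_pairs(n_pairs, p=0.5, a=1.0, sd_e=1.0, seed=1):
    """Hypothetical simulator: sib pairs at one additive QTL, IBD known exactly."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        # Two alleles per parent; 1 is the trait-increasing allele (frequency p).
        father = [1 if rng.random() < p else 0 for _ in range(2)]
        mother = [1 if rng.random() < p else 0 for _ in range(2)]
        # Each sib draws one paternal and one maternal allele at random.
        f1, m1 = rng.randrange(2), rng.randrange(2)
        f2, m2 = rng.randrange(2), rng.randrange(2)
        # Proportion of alleles shared IBD: 0, 0.5 or 1.
        pi_hat = ((f1 == f2) + (m1 == m2)) / 2
        # Additive genotypic value plus independent residual noise.
        y1 = a * (father[f1] + mother[m1]) + rng.gauss(0.0, sd_e)
        y2 = a * (father[f2] + mother[m2]) + rng.gauss(0.0, sd_e)
        pairs.append((pi_hat, y1, y2))
    return pairs


def haseman_elston_slope(pairs):
    """Least-squares slope of (y1 - y2)^2 on the IBD proportion."""
    xs = [pi_hat for pi_hat, _, _ in pairs]
    ys = [(y1 - y2) ** 2 for _, y1, y2 in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# With p = 0.5 and a = 1, sigma2_a = 2 * p * (1 - p) * a**2 = 0.5,
# so the expected slope is -2 * 0.5 = -1.
slope = haseman_elston_slope(simulate_sib_pairs(20000))
print(f"estimated slope: {slope:.2f} (expected about -1.0)")
```

Shrinking `n_pairs` or `a` and rerunning with different seeds shows how widely the estimated slope scatters around its expectation when the QTL effect is modest.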

Increasing power to detect linkage in sib-pairs

• Phenotypic selection

– Carey & Williamson, 1991, AJHG

– Eaves & Meyer, 1994, Behav Genet

– Cardon & Fulker, 1994, AJHG

– Risch & Zhang, 1996, AJHG

Equivalent full-sample N for 200 selected pairs from 10,000 (QTL allele frequency = 0.2)

Gene action   Concordant   Discordant   Combined
Additive      1400         3300         5000
Recessive     6000         3100         9500
Dominant      1400         3100         4400
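The selection designs cited above genotype only sib pairs with extreme phenotypes. A minimal sketch of one such rule, assuming decile cut-offs on the pooled phenotype distribution (the cut-offs, function name and toy data are illustrative assumptions, not taken from the cited papers): pairs with both sibs in the same tail are concordant, pairs with sibs in opposite tails are discordant, and all other pairs are not genotyped.

```python
from statistics import quantiles


def classify_extreme_pairs(pairs, n_bins=10):
    """Split sib pairs into extreme concordant and extreme discordant groups.

    pairs: list of (y1, y2) phenotypes. Cut-offs are the lowest and highest
    quantile boundaries (deciles by default) of all phenotypes pooled.
    Returns indices of selected pairs; unselected pairs are dropped.
    """
    pooled = [y for pair in pairs for y in pair]
    cuts = quantiles(pooled, n=n_bins)
    low, high = cuts[0], cuts[-1]

    def tail(y):
        if y <= low:
            return "low"
        if y >= high:
            return "high"
        return None

    selected = {"concordant": [], "discordant": []}
    for index, (y1, y2) in enumerate(pairs):
        t1, t2 = tail(y1), tail(y2)
        if t1 is None or t2 is None:
            continue  # at least one sib is not extreme: pair not genotyped
        group = "concordant" if t1 == t2 else "discordant"
        selected[group].append(index)
    return selected


# Toy example: phenotypes 0..99 paired as (0, 99), (1, 98), ...
toy = [(k, 99 - k) for k in range(50)]
print(classify_extreme_pairs(toy))
```

Reading the table with this in mind: under additive gene action, the 200 pairs chosen by the combined design carry about as much linkage information as 5000 unselected pairs, a 25-fold gain per genotyped pair.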

Information Score for Additive Gene Action (p = 0.5)

[Figure: information score plotted against the decile rankings of Sib 1 and Sib 2]

Linkage Analysis of QTLs: Summary

• Spotted history. Few, if any, bona fide successes
• Power has been a large problem

• Of the few replicated loci, most have used some form of selection
• EDAC (extreme discordant and concordant) and other selection schemes from large cohorts now underway
• Genome scans coming soon

Promising beginning for QTL linkage mapping


Association Analysis

• Simple genetic basis
– Short unit of resemblance
– Population-specific

• One of the easiest genetic study designs
– Correlate allele frequencies with traits/diseases
– At core of monogenic & oligo/polygenic trait models

• Widely used in the past 20 years
– HLA, candidate genes, pharmacogenetics, positional cloning
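In its simplest case-control form, "correlate allele frequencies with traits/diseases" is a 2 × 2 test of allele counts in cases versus controls. A minimal sketch, assuming a biallelic marker and chromosomes that can be counted independently (Hardy-Weinberg equilibrium); the function name and counts are hypothetical, and the p-value uses the 1-df chi-square identity P(X > x) = erfc(√(x/2)).

```python
import math


def allelic_association_test(case_counts, control_counts):
    """Pearson chi-square (1 df) for allele counts (A, a) in cases vs controls."""
    table = [list(case_counts), list(control_counts)]
    total = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: allele A on 60 of 100 case chromosomes, 40 of 100 controls.
chi2, p = allelic_association_test((60, 40), (40, 60))
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")  # chi2 = 8.00, p = 0.0047
```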

Angiotensin I-Converting Enzyme (ACE)

Keavney et al. (1998) Hum Mol Genet, 7:1745-1751

Evidence for Linkage

[Figure: LOD score (0-10) at each ACE polymorphism: T-5991C, A-5466C, T-3892C, A-240T, T-93C, T1237C, G2215A, I/D, G2350A, 4656(CT)3/2]

Results of ACE analysis using VC association model

[Figure: LOD (0-15) for linkage and for association at the same ten polymorphisms]

Alzheimer's and ApoE4

Roses, Nature 2000

Association Resolution by Position

Roses, Nature 2000

Decay of Linkage Disequilibrium in a Small Set of Genes

Toward a linkage disequilibrium map of the human genome

History of LD studies in humans:

• More than 10 years ago, emphasis mainly on theory: LD measures, decay, population comparisons, …

• 1989: first use of LD for disease mapping: Cystic Fibrosis

• Recent years: gene-based haplotypes used widely for monogenic mapping

• Last 2 years: larger-scale assessment of common alleles in reference populations

LD/haplotype map objective: find regions of high and low ancestral conservation to clarify signal/noise in allelic association studies

Haplotype Map: Data/Interpretations

Distribution of pairwise LD (‘average extent of LD’)

LD differences in genes

Eaves et al, Nat Genet 2000

Taillon-Miller et al, Nat Genet 2000

Stephens et al, Science 2001

Reich et al, Nature 2001

Johnson et al, Nat Genet 2001

Abecasis et al, AJHG 2001

Haplotype Map: Data/Interpretations

Local patterns of LD … Conserved haplotype segments ... ‘Blocks’

5q31. Daly et al, Nat Genet 2001

MHC class II. Jeffreys et al, Nat Genet 2001

Chr21. Patil et al, Science 2001

Current Status: Data/Interpretations

• How to define ‘useful’ LD is still unclear

• Easier to focus on pairwise LD rather than haplotypes. Is this efficient?

• For common alleles, using the D’ measure, LD extends ~50-60 kb on average. For rare alleles, ?

• There is great variability in regional patterns of LD. Explanations and predictors are as yet unknown

• Haplotype blocks are detectable and present broadly

• Size of blocks? How best to define them? Utility of htSNPs?
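How far ‘useful’ LD extends depends on the measure. A minimal sketch of the two pairwise measures discussed here, D’ and r², computed from the four haplotype frequencies of two biallelic loci (alleles A/a and B/b); the function name and example frequencies are hypothetical, and D’ is returned as |D|/D_max so that it lies between 0 and 1.

```python
def ld_measures(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, |D'|, r^2) for two biallelic loci from haplotype frequencies."""
    if abs(p_AB + p_Ab + p_aB + p_ab - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = p_AB + p_Ab  # frequency of allele A
    p_B = p_AB + p_aB  # frequency of allele B
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    d = p_AB - p_A * p_B
    # D_max depends on the sign of D.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    return d, abs(d) / d_max, d * d / denom


print(ld_measures(0.3, 0.2, 0.1, 0.4))  # D = 0.1, D' = 0.5, r^2 about 0.167
print(ld_measures(0.1, 0.0, 0.4, 0.5))  # rare allele: D' = 1 but r^2 about 0.11
```

The second call illustrates the point above: an allele with frequency 0.1 that occurs only on one background gives D’ = 1, yet r² stays near 0.11, so the apparent extent of LD differs sharply between the two measures.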

Human Genome Haplotype Map

1. NIH/TSC/Wellcome Trust funded international collaboration (likely)
- follow-on from human sequencing project & SNP consortium

2. Hierarchical strategy
- ‘sparse map’ first, then finer
- initially use available SNPs

3. Multiple populations
- some family-based, most likely to be unrelateds

4. Aim is to catalog regions of high LD down to a very fine scale (i.e., find big and small blocks)

Human Chromosome 22

• First human chromosome to be “fully” sequenced

• Extensive knowledge of genomic landscape

• Abundance of SNPs and other variants/bp

~34.5 Mb on q-arm; p-arm mostly structural RNA; 679 genes on q-arm (Dunham et al, Nature, 1999)

Samples

• 7 x 3-generation CEPH families
– 77 individuals
– 59 founder chromosomes
– 1505 SNPs successfully genotyped

• 90 unrelated Caucasian individuals
– 1286 SNPs genotyped (1261 overlapping with CEPHs)

• 51 unrelated Estonian individuals
– 908 SNPs genotyped (594 overlapping with CEPHs)

Marker spacing

N = 1505 markers. Median spacing = 15.07 kb. 4 gaps > 200 kb. Smallest = 12 bp; largest = 293 kb.

[Figure: histogram of marker spacing; count (0-600) per spacing bin, from < 5 kb, 5-10 kb, 11-20 kb, … up to 131-140 kb and > 150 kb; N = 1505]
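Spacing statistics like those on this slide (median spacing, gaps above 200 kb, smallest and largest interval) follow directly from the sorted SNP positions. A minimal sketch, assuming positions in base pairs on a single chromosome; the function name and toy positions are hypothetical.

```python
from statistics import median


def spacing_summary(positions_bp, gap_threshold_bp=200_000):
    """Summarise intervals between adjacent markers (positions in base pairs)."""
    ordered = sorted(positions_bp)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return {
        "n_markers": len(ordered),
        "median_kb": median(gaps) / 1000,
        "n_large_gaps": sum(1 for g in gaps if g > gap_threshold_bp),
        "smallest_bp": min(gaps),
        "largest_kb": max(gaps) / 1000,
    }


print(spacing_summary([0, 12, 15_012, 30_012, 330_012]))
```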

Allele frequencies on Chromosome 22: CEPH founders

[Figure: frequency (0-0.25) of SNPs in each allele frequency category: < 0.10, .11-.20, .21-.30, .31-.40, < 0.50]

Variability in Pairwise LD

[Figure: pairwise D’ and r² (0-1.00) plotted against physical distance (0-1000 kb)]

Decay of LD on chromosome 22: means in CEPH, unrelated, combined & Estonian samples

Representing LD along a chromosome

As with several trends in genetics, genotyping technology has outpaced the ability to analyze LD information…

How to characterize regions of ‘interesting’ linkage disequilibrium?

1. Simply examine average levels across a region/chromosome?
2. Fit models to data; look at expectations & specific predictions
3. Consider ‘interesting’ LD tracts as long runs of LD: borrow from extant statistical approaches
4. Look for ‘blocks’ of LD in the genome
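Options 3 and 4 both look for stretches of markers in strong mutual LD. As one hedged illustration (not the procedure used for the chromosome 22 results, and not any published block definition), the sketch below grows blocks greedily along the chromosome, adding the next SNP only if its |D’| with every SNP already in the block reaches a threshold. The threshold of 0.8, the three-SNP minimum and the function name are arbitrary assumptions.

```python
def find_ld_blocks(dprime, threshold=0.8, min_snps=3):
    """Greedy LD blocks from a symmetric matrix of pairwise D' for ordered SNPs.

    Returns (first, last) SNP indices of each block, inclusive.
    """
    n = len(dprime)
    blocks = []
    start = 0
    while start < n:
        end = start
        # Extend while the next SNP is in strong LD with every SNP in the block.
        while end + 1 < n and all(
            abs(dprime[i][end + 1]) >= threshold for i in range(start, end + 1)
        ):
            end += 1
        if end - start + 1 >= min_snps:
            blocks.append((start, end))
        start = end + 1
    return blocks


# Toy |D'| matrix: SNPs 0-2 and 3-5 are in strong LD within, weak LD between.
groups = [0, 0, 0, 1, 1, 1]
toy = [[1.0 if i == j else (0.95 if groups[i] == groups[j] else 0.2)
        for j in range(len(groups))] for i in range(len(groups))]
print(find_ld_blocks(toy))
```

A greedy rule like this is order-dependent and sensitive to single low D’ values, which is one reason block size and definition remain open questions.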

LD Along Chromosome 22

[Figure: two panels plotted against position (Mb, 0-30): Average D’ (0-1.00) and D’ Half-Life (predicted half-life, kb, 0-600)]
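One way to summarise how fast D’ decays with distance is a half-life: the distance at which expected D’ falls to 0.5. A minimal sketch, assuming a simple exponential decay D’ ≈ exp(−k·d) fitted by least squares on log D’ through the origin; this model, the function name and the toy data are illustrative assumptions, not the model behind the plotted predictions.

```python
import math


def dprime_half_life(distances_kb, dprimes):
    """Half-life (kb) of D' under an assumed model D' = exp(-k * d)."""
    # log D' = -k * d, so fit a line through the origin on the log scale.
    points = [(d, math.log(v)) for d, v in zip(distances_kb, dprimes) if v > 0]
    numerator = sum(d * y for d, y in points)
    denominator = sum(d * d for d, _ in points)
    if denominator == 0:
        raise ValueError("need a pair at non-zero distance with D' > 0")
    k = -numerator / denominator
    if k <= 0:
        return math.inf  # no detectable decay in these pairs
    return math.log(2) / k


# Toy marker pairs whose D' halves every 50 kb.
dist = [10, 25, 50, 100, 200]
dp = [0.5 ** (d / 50) for d in dist]
print(round(dprime_half_life(dist, dp), 6))
```

Applied in sliding windows along the chromosome, an estimate like this turns a cloud of pairwise D’ values into a single position-specific number that can be plotted against Mb.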

Disequilibrium Fingerprint

Plus 3 individual blocks:

Position (Mb)   SNPs   Haplotypes   Length
4.6-4.8         11     6            231 kb
8.2-8.4         8      4            264 kb
34.3            11     3            82 kb

Chromosome 22 Haplotype Blocks

Chr22 High LD: 22-27 Mb

Chr22 Low LD: 27-32 Mb

Recombination Pattern on Chromosome 22

[Figure: genetic distance (cM, 0-60; microsatellite map) plotted against sequence position (Mb, 0-35), with a 1 Mb/cM reference line]

Recombination and Gene Density on Chromosome 22

[Figure: genetic distance (cM, 0-60; microsatellite map) plotted against sequence position (Mb, 0-35), with a 1 Mb/cM reference line and gene density overlaid]
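The recombination plots compare the microsatellite genetic map (cM) with sequence position (Mb) against a 1 Mb/cM reference line. A minimal sketch of the underlying calculation, assuming markers whose genetic and physical positions are both known and sorted by sequence position; the function name and toy map are hypothetical. Intervals above 1 cM/Mb recombine more than the reference line implies, intervals below it less.

```python
def local_recombination_rates(markers):
    """cM per Mb between adjacent markers, given (Mb, cM) pairs sorted by Mb."""
    rates = []
    for (mb1, cm1), (mb2, cm2) in zip(markers, markers[1:]):
        if mb2 <= mb1:
            raise ValueError("markers must be sorted by strictly increasing Mb")
        rates.append((cm2 - cm1) / (mb2 - mb1))
    return rates


toy_map = [(0.0, 0.0), (5.0, 5.0), (10.0, 20.0), (15.0, 21.0)]
for (start, _), rate in zip(toy_map, local_recombination_rates(toy_map)):
    label = "above" if rate > 1 else "at or below"
    print(f"interval from {start} Mb: {rate:.1f} cM/Mb ({label} 1 cM/Mb)")
```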

Linkage Disequilibrium Map of Chromosome 22 - Summary -

• LD ‘half-length’ ~ 50 kb, but depends on measure & what is “useful” LD

• Family & unrelated samples yield consistent patterns

• Different analytical tools provide complementary views of long blocks

• 15% of chromosome 22 lies in long LD blocks in these samples (40% in shorter blocks). Why? Selection, selective sweeps? Chromosome structure? Population age?

• LD correlated with gene density, GC content and related repeats. Gene/GC correlations almost entirely collinear with genetic distance.

LD patterns can immediately assist positional association studies:

• Prioritise candidate regions
• Use extant genetic maps and simple repeat structures in design & power

Mapping QTLs in families: Summary

• Linkage and association studies follow directly from fundamental biometrical principles.

• Linkage studies of complex traits can work: All principles of this course apply

- power, study design, careful phenotype selection/modelling, comparison of statistical models

• New information about LD patterns should facilitate association studies

- help form a priori hypotheses and guide replication.

16th Annual Course on Methodology for Twins and Families

Advanced workshop: Boulder, Colorado, March 2001

Monday, 5 March 2001

Eaves 9:00-10:30 Introduction: Cause of human variation

Amos & Heath 11:00-12:00 Basic Statistics: Likelihood models

Lessem 12:00-12:30 Introduction: Computer System P

Eaves & Sham 13:30-15:00 Genetic Theory

Neale, Martin & Boomsma 15:30-17:00 MX practical P

Tuesday, 6 March 2001

Sham 9:00-10:30 Linkage: Basic Principles

Abecasis, Cherny & Cardon 11:00-12:30 IBD estimation: Theory and Practice P

Martin & Maes 13:30-15:00 QTL Linkage Analysis in Sibships P

Eaves 15:30-17:00 Introduction to Bayesian Methods P

Wednesday, 7 March 2001

Neale & Heath 9:00-10:30 Linkage on Selected Samples P

Purcell & Sham 11:00-12:30 Power Calculation in Linkage Analysis P

Boomsma & van Baal 13:30-15:00 Multivariate Applications P

Purcell & Sham 15:30-17:00 Epistasis/Multi-locus modelling P

Thursday, 8 March 2001

Rice & Heath 9:00-10:30 Association Study Principles P

Cherny & Abecasis 11:00-12:30 Family Based Association Studies P

van den Oord 13:30-15:00 Population Stratification and General Association P

Sham & Abecasis 15:30-17:00 Power for Association Analysis P

Friday, 9 March 2001

Cardon & Sham 9:00-10:30 Bioinformatics and Genome Patterns of Disequilibrium

Rice 11:00-12:30 Multiple Testing: Power and Type I Error

Flint 13:30-15:00 Animal models of complex traits

Cherny, Purcell & Abecasis 15:30-17:00 General computational issues P

http://ibgwww.colorado.edu/twins2001/schedule.html
