

Page 1

Trait Mapping

• Recombination Mapping
• SNP mapping

BIO520 Bioinformatics Jim Lund

Page 2

Why do we care about variations?

• underlie phenotypic differences
• cause inherited diseases
• allow tracking human history (ancient and modern)

Page 3

Traits

• Mendelian
– single locus, few alleles
– high penetrance, high expressivity
– e.g. color, enzyme, molecular, genetic diseases (CF, hemophilia…)

• Quantitative
– multiple alleles, multilocus
– variable penetrance, expressivity
– epistasis, environmental effects
– e.g. blood pressure, weight, IQ...

Page 4

Traits: How do we find their basis?

• Association of variance in trait with variance in gene

• Genetic linkage

Page 5

Basic Concepts

[Figure: haplotypes at two SNPs (alleles A/a and B/b) inherited from Parent 1 and Parent 2]

• High LD -> no recombination (r2 = 1): only the haplotypes A B and a b occur, so SNP1 “tags” SNP2

• Low LD -> recombination: many possibilities (A B, a b, A b, a B, etc…)
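As a rough illustration of the high-LD and low-LD cases, the sketch below computes D, D' and r2 from a list of observed two-SNP haplotypes. The function name `ld_stats` and the two toy samples are made up for this example; the formulas are the standard definitions, with upper case arbitrarily treated as the reference allele.

```python
from collections import Counter


def ld_stats(haplotypes):
    """Return (D, D_prime, r2) for two biallelic SNPs.

    `haplotypes` is a list of two-character strings such as "AB" or "ab":
    the first character is the SNP1 allele, the second the SNP2 allele.
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    p_a = sum(c for h, c in counts.items() if h[0] == "A") / n  # freq of A
    p_b = sum(c for h, c in counts.items() if h[1] == "B") / n  # freq of B
    p_ab = counts["AB"] / n                                     # freq of A-B

    d = p_ab - p_a * p_b  # deviation from random association
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    r2 = d * d / denom

    # D' rescales |D| by the largest value it could take at these frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return d, abs(d) / d_max, r2


# Toy samples (hypothetical): only A B and a b, versus all four haplotypes.
high_ld = ["AB"] * 3 + ["ab"] * 3
low_ld = ["AB", "AB", "Ab", "Ab", "aB", "aB", "ab", "ab"]

print(ld_stats(high_ld))
print(ld_stats(low_ld))
```

In the first sample knowing the SNP1 allele fixes the SNP2 allele, which is what "tagging" means; in the second the two SNPs are independent.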

Page 6

Mapping Issues

• Need many arbitrary, polymorphic markers for a dense map
– Molecular markers: RFLP, STS, SNP

• Need many progeny
– 100 progeny for a 1 cM map
– 1000 progeny for a 0.1 cM map (100 kb in mouse)

• Map distance varies (the kb/cM ratio is not constant)
– centromere suppression
– inversion suppression
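The progeny numbers above follow from simple arithmetic, sketched here under two simplifying assumptions that are not stated on the slide: each progeny contributes one informative meiosis, and 1 cM corresponds to a 1% chance of recombination per meiosis. The function names are hypothetical.

```python
def expected_recombinants(n_progeny, interval_cm):
    """Expected number of recombinant progeny within an interval of
    `interval_cm` centimorgans (assumes 1 cM = 1% recombinants)."""
    return n_progeny * interval_cm / 100.0


def prob_at_least_one(n_progeny, interval_cm):
    """Probability of seeing at least one recombinant in the interval."""
    theta = interval_cm / 100.0
    return 1.0 - (1.0 - theta) ** n_progeny


for n, cm in [(100, 1.0), (1000, 0.1)]:
    print(n, cm, expected_recombinants(n, cm), round(prob_at_least_one(n, cm), 3))
```

Both cases on the slide give about one expected recombinant per interval, which is the minimum needed to order markers at that spacing; resolving an interval reliably would take more progeny.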

Page 7

Genetic crosses

• Model organisms, e.g. fungi: no problem

• Humans
– few women bear >5 or >10 children
– controlled breeding problematic

Page 8

Alternate Mapping

• Pedigree analyses
– likelihood estimation
– the original method, now less common

• Population-based mapping
– association studies
– linkage disequilibrium

Page 9

Pedigree Analysis

• Likelihood method (LOD scores)
• LOD 3-4: odds of 1000:1 to 10,000:1 in favor of linkage
– corresponds to a genome-wide p-value of p < .05
• Hard to extend to <1 cM
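A LOD score is the base-10 logarithm of the odds of linkage, so LOD 3 means 1000:1. The sketch below is a minimal two-point version for the simplest case, phase-known meioses scored as recombinant or non-recombinant; real pedigree software handles unknown phase and missing data, which this does not attempt. `lod_score` and `best_theta` are illustrative names.

```python
import math


def lod_score(n_recombinant, n_total, theta):
    """log10 of L(theta) / L(0.5) for phase-known meioses.

    theta is the recombination fraction between marker and trait locus;
    theta = 0.5 is the null hypothesis of no linkage.
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    k = n_recombinant
    log_l_theta = k * math.log10(theta) + (n_total - k) * math.log10(1.0 - theta)
    log_l_null = n_total * math.log10(0.5)
    return log_l_theta - log_l_null


def best_theta(n_recombinant, n_total):
    """Grid search for the theta (0.01 to 0.49) with the highest LOD."""
    grid = [i / 100.0 for i in range(1, 50)]
    return max(grid, key=lambda t: lod_score(n_recombinant, n_total, t))


# Hypothetical data: 2 recombinants among 20 informative meioses.
theta_hat = best_theta(2, 20)
print(theta_hat, round(lod_score(2, 20, theta_hat), 2))
```

With these made-up counts the maximum falls at theta = 0.1 with a LOD just above 3, the conventional threshold quoted above.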

Page 10

Cloning Human Genes

• Positional
• Positional/Candidate
• Candidate Only
• Functional

Page 11

Complex diseases: Association mapping

• Disease gene: D, d
• Marker: M, m

M is associated with D if the probability that an individual has the disease given allele M is much greater than the probability given allele m. Written as: P(D|M) > P(D|m)

Linkage between the gene and marker increases the likelihood of association.

Association can be caused by
– Causation
– Population subdivision
– Statistical artifact
– Linkage disequilibrium

[Figure: disease locus D and markers M1 to M6 along a chromosome]
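One way to make the comparison P(D|M) > P(D|m) concrete is a 2x2 table of disease status against marker allele. The sketch below assumes a hypothetical random population sample (in a case-control design the two conditional probabilities are not directly estimable, although the chi-square test still applies) and uses a plain Pearson chi-square with 1 degree of freedom. All counts and names are invented for illustration.

```python
import math


def association_test(affected_M, affected_m, unaffected_M, unaffected_m):
    """Return (P(D|M), P(D|m), chi-square, p-value) for a 2x2 table."""
    n_M = affected_M + unaffected_M
    n_m = affected_m + unaffected_m
    n_aff = affected_M + affected_m
    n_unaff = unaffected_M + unaffected_m
    total = n_M + n_m

    p_d_given_M = affected_M / n_M
    p_d_given_m = affected_m / n_m

    observed = [affected_M, affected_m, unaffected_M, unaffected_m]
    expected = [
        n_aff * n_M / total,
        n_aff * n_m / total,
        n_unaff * n_M / total,
        n_unaff * n_m / total,
    ]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return p_d_given_M, p_d_given_m, chi2, p_value


print(association_test(60, 40, 40, 60))
```

A small p-value only says the association is unlikely to be chance; it does not distinguish causation, population subdivision and linkage disequilibrium, which are the alternatives listed above.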

Page 12

Association Mapping

• Pedigree sampled
• Many meioses (>10^4)
• Resolution: 10^-5 Morgans (kilobases)
• Limited by number of markers

[Figure: disease locus D and marker M separated by recombination fraction r, traced back over 2N generations]

Page 13

Gene Mapping & the single mutation case

[Figure: disease mutation D and linked marker M on the same chromosome, shown at time t and now]
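The single-mutation picture can be put in numbers with the textbook decay relation D_t = D_0 (1 - r)^t, where r is the recombination fraction between D and M and t is the number of generations since the mutation arose. This relation is not on the slide; it is added here as a standard result for a large random-mating population, and the function names are hypothetical.

```python
import math


def ld_after(d0, r, generations):
    """LD coefficient remaining after `generations` of random mating."""
    return d0 * (1.0 - r) ** generations


def generations_to_halve(r):
    """Number of generations for LD to fall to half its starting value."""
    return math.log(0.5) / math.log(1.0 - r)


# A marker 1 cM away (r = 0.01) versus one 0.001 cM away (r = 1e-5).
for r in (0.01, 1e-5):
    print(r, round(generations_to_halve(r)))
```

Markers very close to the mutation keep their association for tens of thousands of generations, which is why association mapping can reach kilobase resolution when many historical meioses are sampled.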

Page 14

Complicating factors

[Figure: genealogy of sampled individuals; + has the disease]

• Major disease-causing mutation
• Minor disease-causing mutation
• Incomplete penetrance
• Non-genetic cause
• Oversampled

Page 15

Alzheimer's & Apolipoprotein E

Page 16

Definition of QTL?

A quantitative trait locus (QTL) is a locus, or a group of loci, that affects a trait measured on a quantitative (linear) scale. Examples of quantitative traits are blood pressure and grain yield (measured on a balance). These traits are typically affected by more than one gene, and also by the environment. Thus, mapping QTL is not as simple as mapping a single gene that affects a qualitative trait (such as an inborn error of metabolism).

http://gnome.agrenv.mcgill.ca/tinker/pgiv/whatis.htm

Page 17

QTLs: interesting traits

• Heritability often ~0.5
• Traits like:
– Heart disease
– Depression
– Type II diabetes
– High blood pressure
– Arthritis
– Most diseases!

Page 18

QTLs: simple problems

• 30,000 markers
– P-value = 0.01
– 299 false hits, 1 real one
– Correct for multiple testing

• 2 QTLs near one another
– “ghost” QTL between them
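The false-hit count is just markers times per-test threshold. The sketch below reproduces that arithmetic and shows one common correction, Bonferroni, chosen here purely as an example; the slide does not say which correction is intended. Function names are made up.

```python
def expected_false_hits(n_tests, alpha, n_true=0):
    """Expected number of null markers passing a per-test threshold alpha."""
    return (n_tests - n_true) * alpha


def bonferroni_threshold(n_tests, family_alpha=0.05):
    """Per-test threshold that keeps the family-wise error rate at
    family_alpha (conservative when markers are correlated)."""
    return family_alpha / n_tests


print(expected_false_hits(30000, 0.01, n_true=1))  # roughly 300 false hits
print(bonferroni_threshold(30000))
```

Because linked markers are correlated, Bonferroni is stricter than necessary for a genome scan; permutation thresholds are one alternative.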

Page 19

Factors that lead to success in mapping QTLs

• Simple, easily quantified trait
• Genes of major effect
– distinct chromosomal loci
• Well-defined map
• Large numbers of progeny
– inbred
– outbred

Page 20

Significance Thresholds by Permutation (Churchill and Doerge, 1994)

H0: there is no QTL in the tested interval
H1: there is a QTL in the tested interval

1. Permute the data (create the null hypothesis)
2. Perform interval mapping
3. Repeat (1) and (2) many times
4. Choose threshold
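The four steps can be sketched in a few lines. To stay short, this example swaps interval mapping for a much simpler single-marker scan (absolute difference in mean phenotype between two genotype classes coded 0/1, as in a backcross); the permutation logic is the part being illustrated. All names, the toy data and the default of 1000 permutations are assumptions for this sketch.

```python
import random
import statistics


def marker_statistic(genotypes, phenotypes):
    """Absolute difference in mean phenotype between genotype classes 0 and 1."""
    group0 = [p for g, p in zip(genotypes, phenotypes) if g == 0]
    group1 = [p for g, p in zip(genotypes, phenotypes) if g == 1]
    if not group0 or not group1:
        return 0.0
    return abs(statistics.fmean(group1) - statistics.fmean(group0))


def genome_scan_max(markers, phenotypes):
    """Largest test statistic over all markers (stand-in for step 2)."""
    return max(marker_statistic(m, phenotypes) for m in markers)


def permutation_threshold(markers, phenotypes, n_perm=1000, alpha=0.05, seed=0):
    """Genome-wide threshold: the (1 - alpha) quantile of the scan maximum
    when phenotypes are shuffled relative to genotypes."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    null_max = []
    for _ in range(n_perm):                # step 3: repeat many times
        rng.shuffle(shuffled)              # step 1: permute the data
        null_max.append(genome_scan_max(markers, shuffled))  # step 2: scan
    null_max.sort()
    index = min(n_perm - 1, int((1 - alpha) * n_perm))
    return null_max[index]                 # step 4: choose threshold


# Toy data: 40 individuals, 3 markers, a strong effect at the first marker.
markers = [[0] * 20 + [1] * 20, [0, 1] * 20, [0, 0, 1, 1] * 10]
phenotypes = [10.0 * g + 0.1 * (i % 5) for i, g in enumerate(markers[0])]

threshold = permutation_threshold(markers, phenotypes)
observed = genome_scan_max(markers, phenotypes)
print(observed > threshold)
```

Shuffling phenotypes keeps the marker correlation structure intact while breaking any genotype-phenotype link, so the threshold adapts to the actual map and sample rather than relying on a theoretical distribution.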

Page 21

Human SNPs

• About 10 million SNPs exist in human populations where the rarer SNP allele has a frequency of at least 1%.

• A set of associated SNP alleles in a region of a chromosome is called a "haplotype".

• SNPs are arranged in groups
– SNPs within groups show little recombination
– Nonrandom association of SNPs results in only a few common haplotypes
– Patterns capture most of the variation in a region

• The HapMap will describe the common patterns of genetic variation in humans.

• The HapMap Project will identify the associations between SNPs and identify the SNPs that tag them (tagSNPs).

Page 22

SNP identification methods

• Pairwise sequence comparison
• Deep resequencing
• High-throughput mismatch detection methods
– Denaturing high-performance liquid chromatography (DHPLC)
– Single-strand conformational polymorphism (SSCP)

Page 23

HapMap

• Blocks of adjacent SNPs that show little recombination are called haplotype blocks.
• Mean haplotype block length is tens of kb.
• HapMap project started by examining 270 individuals from 4 ethnic groups.
• Now expanding to a more comprehensive sample.

Characterization of haplotype blocks means that fewer SNPs will need to be typed.

500,000 SNPs will identify 90% of haplotype blocks.

Page 24

HapMap Glossary

• LD (linkage disequilibrium): For a pair of SNP alleles, a measure of deviation from random association (i.e., a measure of lack of recombination). Measured by D’, r2, LOD.

• Phased haplotypes: Estimated distribution of SNP alleles across the two chromosome copies. Alleles transmitted from Mom form the maternal haplotype, while Dad’s form the paternal haplotype.

• Tag SNPs: Minimum SNP set to identify a haplotype. r2 = 1 indicates two SNPs are redundant, so each one perfectly “tags” the other.

Page 25

HapMap Project

                      Phase 1                          Phase 2                  Phase 3
Samples & POP panels  269 samples (4 panels)           270 samples (4 panels)   1,115 samples (11 panels)
Genotyping centers    HapMap International Consortium  Perlegen                 Broad & Sanger
Unique QC+ SNPs       1.1 M                            3.8 M (phase I+II)       1.6 M (Affy 6.0 & Illumina 1M)
Reference             Nature (2005) 437:p1299          Nature (2007) 449:p851   Draft Rel. 1 (May 2008)

Page 26

Phase 3 Samples

label  population sample                                                                    # samples  QC+ Draft 1
ASW*   African ancestry in Southwest USA                                                    90         71
CEU*   Utah residents with Northern and Western European ancestry from the CEPH collection  180        162
CHB    Han Chinese in Beijing, China                                                        90         82
CHD    Chinese in Metropolitan Denver, Colorado                                             100        70
GIH    Gujarati Indians in Houston, Texas                                                   100        83
JPT    Japanese in Tokyo, Japan                                                             91         82
LWK    Luhya in Webuye, Kenya                                                               100        83
MEX*   Mexican ancestry in Los Angeles, California                                          90         71
MKK*   Maasai in Kinyawa, Kenya                                                             180        171
TSI    Toscans in Italy                                                                     100        77
YRI*   Yoruba in Ibadan, Nigeria                                                            180        163
Total                                                                                       1,301      1,115

* Population is made of family trios

Page 27

SNP databases

• dbSNP (NCBI)
– 12 million human SNPs
– 5 million validated SNPs
– http://www.ncbi.nlm.nih.gov/SNP/get_html.cgi?whichHtml=overview

• SNP frequency information
• Mapped to the current genome build
• HapMap (haplotypes)

Page 28

How to use markers to find disease?

• problem: genotyping cost precludes using millions of markers simultaneously for an association study

[Figure: genome-wide, dense SNP marker map]

• depends on the patterns of allelic association (haplotypes) in the human genome

• question: how to select from all available markers a subset that captures most mapping information (marker selection, marker prioritization)
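One simple answer to the marker-selection question is greedy tagging: repeatedly pick the SNP that covers the most not-yet-covered SNPs at some r2 cutoff. The sketch below is an illustration of that idea only, not the HapMap project's own procedure; the 0.8 cutoff is a commonly used convention assumed here, and the function names and toy haplotypes are invented.

```python
def r_squared(x, y):
    """r2 between two biallelic SNPs given as equal-length lists of 0/1 alleles."""
    n = len(x)
    p_x = sum(x) / n
    p_y = sum(y) / n
    p_xy = sum(a * b for a, b in zip(x, y)) / n
    denom = p_x * (1 - p_x) * p_y * (1 - p_y)
    if denom == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return (p_xy - p_x * p_y) ** 2 / denom


def greedy_tag_snps(haplotypes, threshold=0.8):
    """Pick SNP indices so every SNP has r2 >= threshold with some tag.

    `haplotypes` is a list of phased haplotypes, each a list of 0/1 alleles.
    """
    n_snps = len(haplotypes[0])
    columns = [[h[j] for h in haplotypes] for j in range(n_snps)]
    covers = {
        i: {j for j in range(n_snps)
            if i == j or r_squared(columns[i], columns[j]) >= threshold}
        for i in range(n_snps)
    }
    untagged = set(range(n_snps))
    tags = []
    while untagged:
        # Choose the untagged SNP covering the most untagged SNPs;
        # ties go to the lowest index.
        best = max(sorted(untagged), key=lambda i: len(covers[i] & untagged))
        tags.append(best)
        untagged -= covers[best]
    return tags


# Toy block: SNPs 0 and 1 are perfectly correlated, SNP 2 is independent.
haps = [[0, 0, 0], [0, 0, 1], [1, 1, 0], [1, 1, 1]]
print(greedy_tag_snps(haps))
```

Greedy selection is not guaranteed to find the smallest possible tag set, but it is easy to reason about and shows how redundancy among SNPs translates into fewer genotyped markers.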

Page 29

The promise for medical genetics

[Figure: a chromosome divided into haplotype blocks, with example sequence CACTACCGACACGACTATTTGGCGTAT]

• within blocks a small number of SNPs are sufficient to distinguish the few common haplotypes, so significant marker reduction is possible

• if the block structure is a general feature of human variation, whole-genome association studies will be possible at a reduced genotyping cost

• this motivated the HapMap project

Gibbs et al. Nature 2003

Page 30

The promise for medical genetics

• Discover genes contributing to complex diseases
• Use these markers to test for inherited disease risk
• Find SNPs associated with drug side effects
• Make drugs safer
• Rescue drugs abandoned due to significant side effects

Page 31

Pathway of Drug Development

• Lead or Target (Clinical Candidate): $100M, 2000
• Animal Model Testing (Toxicity, Efficacy): $0.5M, 100
• Phase I Pre-Clinical (toxicity): $0.5M, 20
• Phase II (efficacy): $5M, 3
• Phase III (efficacy): $50M, 2
• NDA (new drug application): 1
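Reading the right-hand numbers as counts of candidate compounds remaining at each stage (an interpretation, since the slide does not label the column), the attrition between stages is simple division. The stage labels below are shortened and the function names are hypothetical.

```python
# Candidate counts per stage, as read from the slide (assumed interpretation).
pipeline = [
    ("Lead or Target", 2000),
    ("Animal Model Testing", 100),
    ("Phase I", 20),
    ("Phase II", 3),
    ("Phase III", 2),
    ("NDA", 1),
]


def stage_survival(stages):
    """Fraction of candidates surviving each transition between stages."""
    return [
        (prev_name, next_name, next_n / prev_n)
        for (prev_name, prev_n), (next_name, next_n) in zip(stages, stages[1:])
    ]


def fraction_failing_after(stages, stage_name):
    """Fraction of candidates present at `stage_name` that never reach the end."""
    counts = dict(stages)
    return 1.0 - stages[-1][1] / counts[stage_name]


for prev_name, next_name, frac in stage_survival(pipeline):
    print(f"{prev_name} -> {next_name}: {frac:.2f}")
print(fraction_failing_after(pipeline, "Phase I"))
```

Under this reading, 19 of every 20 compounds that enter Phase I never reach an NDA.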

Page 32

Why pharmacogenomics?

• Where do you find the next profitable drug?
– The 19/20 drugs that failed AFTER phase 1, but are still efficacious!

• How do you decrease the cost of clinical trials?
– Don’t enroll people of the “wrong” genotype!

• Only give drugs to patients likely to benefit and at a low genetic risk of side effects!