Page 1: Picking SNPs  Application to Association Studies

Picking SNPs Application to Association Studies

Dana Crawford, PhD

SeattleSNPs PGA
University of Washington

March 20, 2006

Page 2: Picking SNPs  Application to Association Studies

Outline of Tutorial

• Concepts of tagSNPs

• LD and haplotype definitions

• Haplotype blocks and definitions

• Tools to identify tagSNPs

Page 3: Picking SNPs  Application to Association Studies

Why Do We Need tagSNPs?

Whole Genome:

• 15,000,000 SNPs

• 6,000,000 SNPs > 5% MAF

Too Many SNPs to Genotype!

Ex: E2F2

Average Gene:

• 26.5 kb

• 130 SNPs

• 44 SNPs ≥5% MAF

Page 4: Picking SNPs  Application to Association Studies

SNPs Are Correlated (aka linkage disequilibrium)

“the nonindependence of alleles at different sites.” Pritchard and Przeworski 2001

Genotype at one site can predict genotype at another site

A proportion of sites are correlated

Page 5: Picking SNPs  Application to Association Studies

Measuring Pair-wise SNP Correlations

• SNP correlation described by linkage disequilibrium (LD)

• Pair-wise measures of LD: D´ and r²

D´ (related to recombination):

$$D = p_{AB} - p_A p_B, \qquad D' = \frac{D}{D_{\max}}$$

r² (related to power):

$$r^2 = \frac{D^2}{f(A_1)\,f(A_2)\,f(B_1)\,f(B_2)}$$
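The two pair-wise measures can be computed directly from one haplotype frequency and two allele frequencies. The following is a minimal sketch, not taken from the slides: the function name `ld_stats` and its arguments are illustrative, f(A1) and f(A2) are taken to be `p_a` and `1 - p_a` (likewise for site B), and D´ is reported as an absolute value, which is a common convention assumed here.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two biallelic sites.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B (hypothetical argument names)
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both sites must be polymorphic")
    d = p_ab - p_a * p_b
    # D_max depends on the sign of D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # Denominator is f(A1) f(A2) f(B1) f(B2)
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Equal allele frequencies, alleles always together: D' and r^2 both at maximum
print(ld_stats(0.5, 0.5, 0.5))
# Unequal allele frequencies: D' can be 1 while r^2 is well below 1
print(ld_stats(0.2, 0.2, 0.5))
```

The second call illustrates why the two statistics serve different purposes: D´ reaches 1 whenever fewer than four haplotypes are present, while r² also requires matching allele frequencies.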

Page 6: Picking SNPs  Application to Association Studies

LD Statistics: Practical Uses

• r² is inversely related to power: the required sample size scales by 1/r²

r² = 1.0: 1,000 cases, 1,000 controls

r² = 0.80: 1,250 cases, 1,250 controls

• D´ is related to recombination history

D´ = 1: no recombination

D´ < 1: historical recombination
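The 1/r² relationship on this slide is simple arithmetic. A small sketch, assuming a hypothetical helper name `inflated_sample_size` and that the sample size for directly genotyping the causal site is already known:

```python
import math


def inflated_sample_size(n_direct, r2):
    """Sample size needed when testing a marker with the given r^2 to the
    causal site, relative to n_direct for testing the causal site itself."""
    if not 0 < r2 <= 1:
        raise ValueError("r2 must be in (0, 1]")
    # Small tolerance so floating-point noise does not round up spuriously
    return math.ceil(n_direct / r2 - 1e-9)


for r2 in (1.0, 0.8, 0.5):
    print(r2, inflated_sample_size(1000, r2))
```

With 1,000 cases at r² = 1.0, the same calculation gives 1,250 cases at r² = 0.80, matching the numbers on the slide.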

Page 7: Picking SNPs  Application to Association Studies

Where to Find Population LD Statistics

For your gene or region of interest, search

• HapMap www.hapmap.org

• Perlegen genome.perlegen.com

• Environmental Genome Project egp.gs.washington.edu

• SeattleSNPs PGA pga.gs.washington.edu

Page 8: Picking SNPs  Application to Association Studies


Page 9: Picking SNPs  Application to Association Studies

Visualizing Pair-wise LD

Page 10: Picking SNPs  Application to Association Studies

Visualizing Pair-wise LD

Page 11: Picking SNPs  Application to Association Studies

Visualizing Pair-wise LD

[Figure: pair-wise LD plot for USF1]

Page 12: Picking SNPs  Application to Association Studies
Page 13: Picking SNPs  Application to Association Studies
Page 14: Picking SNPs  Application to Association Studies

Visualizing Pair-wise LD

Page 15: Picking SNPs  Application to Association Studies
Page 16: Picking SNPs  Application to Association Studies
Page 17: Picking SNPs  Application to Association Studies

[Figure panels: SeattleSNPs + Perlegen; SeattleSNPs]

Page 18: Picking SNPs  Application to Association Studies

Visualizing Pair-wise LD: Beyond the Gene

Page 19: Picking SNPs  Application to Association Studies

Visualizing Pair-wise LD: Beyond the Gene

Page 20: Picking SNPs  Application to Association Studies
Page 21: Picking SNPs  Application to Association Studies
Page 22: Picking SNPs  Application to Association Studies

SeattleSNPs

Visualizing Pair-wise LD: Beyond the Gene

Page 23: Picking SNPs  Application to Association Studies

Multi-SNP Correlations (aka Haplotypes)

“…a unique combination of genetic markers present in a chromosome.” pg 57 in Hartl & Clark, 1997

Page 24: Picking SNPs  Application to Association Studies

Constructing Haplotypes

[Figure: two-SNP example, SNP 1 (C/T) and SNP 2 (A/G), with haplotypes and pedigree genotypes]

Approaches shown:

• Collect pedigrees

• Somatic cell hybrids (human × rodent hybrid)

• Allele-specific PCR

Page 25: Picking SNPs  Application to Association Studies

Constructing Haplotypes

Examples of Haplotype Inference Software:

EM Algorithm

Haploview http://www.broad.mit.edu/mpg/haploview/index.php

Arlequin http://lgb.unige.ch/arlequin/

PHASE v2.1 http://www.stat.washington.edu/stephens/software.html

HAPLOTYPER http://www.people.fas.harvard.edu/~junliu/Haplo/docMain.htm
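To show what statistical haplotype inference involves, here is a minimal sketch of an EM estimate of haplotype frequencies for two biallelic SNPs. It is an illustration only and is not the implementation used by any of the programs listed; the function names, the 0/1/2 genotype coding, and the fixed iteration count are all assumptions. Only individuals heterozygous at both sites have ambiguous phase, so the E-step splits them between the two possible haplotype pairs.

```python
def alleles(g):
    """Map a genotype coded as the count of allele 1 (0, 1, 2) to two alleles."""
    return (g // 2, (g + 1) // 2)


def em_two_snp(genotypes, n_iter=50):
    """Estimate frequencies of haplotypes (0,0), (0,1), (1,0), (1,1).

    genotypes: list of (g1, g2) pairs, each the count of allele 1 at a SNP.
    """
    haps = [(0, 0), (0, 1), (1, 0), (1, 1)]
    freq = {h: 0.25 for h in haps}
    n_chrom = 2 * len(genotypes)
    for _ in range(n_iter):
        counts = {h: 0.0 for h in haps}
        for g1, g2 in genotypes:
            if g1 == 1 and g2 == 1:
                # E-step: weight the two possible phases by current frequencies
                cis = freq[(0, 0)] * freq[(1, 1)]
                trans = freq[(0, 1)] * freq[(1, 0)]
                total = cis + trans
                w = cis / total if total > 0 else 0.5
                counts[(0, 0)] += w
                counts[(1, 1)] += w
                counts[(0, 1)] += 1 - w
                counts[(1, 0)] += 1 - w
            else:
                # At least one site is homozygous, so phase is unambiguous
                for hap in zip(alleles(g1), alleles(g2)):
                    counts[hap] += 1
        # M-step: renormalise expected counts to frequencies
        freq = {h: counts[h] / n_chrom for h in haps}
    return freq


example = [(0, 0)] * 4 + [(2, 2)] * 4 + [(1, 1)] * 2
print(em_two_snp(example))
```

In this toy data set every unambiguous individual carries either 0-0 or 1-1 haplotypes, so the EM assigns the double heterozygotes almost entirely to that phase as well.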

Page 26: Picking SNPs  Application to Association Studies

Haplotypes in SeattleSNPs

• >250 genes re-sequenced in inflammation response

• 2 populations: European- and African-descent

• PHASE v2.0 results posted on website

• Interactive tool (VH1) to visualize and sort haplotypes

http://pga.gs.washington.edu

Page 27: Picking SNPs  Application to Association Studies

Haplotypes in SeattleSNPs

Page 28: Picking SNPs  Application to Association Studies

Haplotypes in SeattleSNPs

Page 29: Picking SNPs  Application to Association Studies

Haplotypes in SeattleSNPs

Page 30: Picking SNPs  Application to Association Studies

Haplotypes in SeattleSNPs

Page 31: Picking SNPs  Application to Association Studies

Haplotypes in SeattleSNPs

Page 32: Picking SNPs  Application to Association Studies

Haplotypes in SeattleSNPs

Page 33: Picking SNPs  Application to Association Studies

Haplotypes in SeattleSNPs

Page 34: Picking SNPs  Application to Association Studies

Haplotypes in SeattleSNPs

Page 35: Picking SNPs  Application to Association Studies

Haplotypes in SeattleSNPs

Page 36: Picking SNPs  Application to Association Studies

Haplotypes in SeattleSNPs

Page 37: Picking SNPs  Application to Association Studies

Haplotypes in SeattleSNPs

Page 38: Picking SNPs  Application to Association Studies

Using LD and Haplotypes to Pick tagSNPs

• r² is inversely related to power: the required sample size scales by 1/r²

r² = 1.0: 1,000 cases, 1,000 controls

r² = 0.80: 1,250 cases, 1,250 controls

Example: LDSelect in GVS

• D´ is related to recombination history

D´ = 1: no recombination

D´ < 1: historical recombination

Example: Haplotype “blocks”

Page 39: Picking SNPs  Application to Association Studies

Using LD and Haplotypes to Pick tagSNPs

• r² is inversely related to power: the required sample size scales by 1/r²

r² = 1.0: 1,000 cases, 1,000 controls

r² = 0.80: 1,250 cases, 1,250 controls

Example: LDSelect

Discovery genotype data → pair-wise LD → pick tagSNPs

Page 40: Picking SNPs  Application to Association Studies

LDSelect: Using LD to Pick tagSNPs

• Uses SNP discovery data (not haplotypes)

• Finds all correlated SNPs to minimize the total number

• Maintains genetic diversity of locus

Carlson et al. AJHG (2004)
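The idea of grouping correlated SNPs so that fewer need to be genotyped can be sketched as a greedy binning over pair-wise r² values. This is a simplified illustration in the spirit of the approach described above, not the LDSelect program itself; the function name `greedy_ld_bins`, the input format, and the 0.8 threshold are assumptions.

```python
def greedy_ld_bins(snps, r2, threshold=0.8):
    """Group SNPs into bins of mutually informative markers.

    snps: list of SNP names
    r2: dict mapping frozenset({snp_a, snp_b}) -> pair-wise r^2
    Returns a list of dicts with the bin members and the SNPs that can tag the bin.
    """
    def linked(a, b):
        return a == b or r2.get(frozenset((a, b)), 0.0) >= threshold

    remaining = list(snps)
    bins = []
    while remaining:
        # Pick the SNP that exceeds the threshold with the most remaining SNPs
        best = max(remaining, key=lambda s: sum(linked(s, t) for t in remaining))
        members = [t for t in remaining if linked(best, t)]
        # A tagSNP must exceed the threshold with every other SNP in its bin
        tags = [s for s in members if all(linked(s, t) for t in members)]
        bins.append({"members": members, "tagSNPs": tags})
        remaining = [t for t in remaining if t not in members]
    return bins


pairwise = {
    frozenset(("A", "B")): 0.95,
    frozenset(("A", "C")): 0.90,
    frozenset(("B", "C")): 0.60,
    frozenset(("D", "E")): 0.50,
}
for b in greedy_ld_bins(["A", "B", "C", "D", "E"], pairwise):
    print(b)
```

In the toy input, A is correlated with both B and C, so one bin of three SNPs is tagged by A alone, while D and E each form a singleton bin. Genotyping one tagSNP per bin covers all five sites with three assays.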

Page 41: Picking SNPs  Application to Association Studies

TagSNPs Are Population Specific

[Figure panels: CRP in European-Americans; CRP in African-Americans]

Page 42: Picking SNPs  Application to Association Studies

SNP Selection Using GVS

Page 43: Picking SNPs  Application to Association Studies

SNP Selection Using GVS

22 SNPs (>5% MAF), 7 tagSNPs

Page 44: Picking SNPs  Application to Association Studies
Page 45: Picking SNPs  Application to Association Studies

SNP Selection: tagSNP Data

Page 46: Picking SNPs  Application to Association Studies

Side Note: Categorizing tagSNPs

• SNP context: nonrepetitive > repetitive

• Location of SNP: coding > noncoding

• Function: nonsynonymous > synonymous
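When several SNPs in a bin could serve as the tagSNP, the preferences above can be applied as a sort key. A minimal sketch: the `Snp` record, its field names, and the choice to apply the three criteria in the order listed are all assumptions made for illustration, and the SNP names are invented.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    name: str            # hypothetical identifier
    repetitive: bool     # lies in repetitive sequence
    coding: bool         # lies in coding sequence
    nonsynonymous: bool  # changes the amino acid


def priority_key(snp):
    # False sorts before True, so preferred categories come first
    return (snp.repetitive, not snp.coding, not snp.nonsynonymous)


candidates = [
    Snp("snp1", repetitive=True, coding=False, nonsynonymous=False),
    Snp("snp2", repetitive=False, coding=True, nonsynonymous=False),
    Snp("snp3", repetitive=False, coding=True, nonsynonymous=True),
    Snp("snp4", repetitive=False, coding=False, nonsynonymous=False),
]
ranked = sorted(candidates, key=priority_key)
print([s.name for s in ranked])
```

Because Python's sort is stable, SNPs that tie on all three criteria keep their input order.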

Page 47: Picking SNPs  Application to Association Studies

Categorizing tagSNPs

Page 48: Picking SNPs  Application to Association Studies

Haplotypes in Genetic Association Studies

Two main approaches with haplotypes:

Haplotypes → Pick tagSNPs → Genotype samples

Pick tagSNPs → Infer haplotypes → Test for association

Page 49: Picking SNPs  Application to Association Studies

Haplotypes in Genetic Association Studies

Two main approaches with haplotypes:

Haplotypes → Pick tagSNPs → Genotype samples

Pick tagSNPs → Infer haplotypes → Test for association

• Recombination

• Natural selection

• Population history

• Population demography

Haplotype block definition

Page 50: Picking SNPs  Application to Association Studies

Haplotype “Blocks”

• Strong LD

• Few haplotypes

• Represent most chromosomes

Daly et al. Nat. Genet. (2001)

Page 51: Picking SNPs  Application to Association Studies

Block Definitions

• Daly et al. Nat. Genet. (2001)

• D´ [Gabriel et al. Science (2002)]

Page 52: Picking SNPs  Application to Association Studies

Block Definitions

Four-gamete test:

• Possible haplotypes for two sites: A B, a b, A b, a B

• <4 haplotypes observed (e.g., only A B and a b), D´ = 1: block

• 4 haplotypes observed, D´ < 1: boundary
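The four-gamete test is easy to state in code: count the distinct two-site haplotypes and start a new block when all four appear. The sketch below assumes phased haplotypes given as equal-length strings; the function names and the simple left-to-right block scan are illustrative choices, not the procedure of any particular program.

```python
def has_four_gametes(haplotypes, i, j):
    """True if all four two-site haplotypes are observed at sites i and j."""
    return len({(h[i], h[j]) for h in haplotypes}) == 4


def blocks_by_four_gamete(haplotypes):
    """Scan sites left to right; a site that shows four gametes with any
    site already in the current block starts a new block."""
    n_sites = len(haplotypes[0])
    blocks = [[0]]
    for k in range(1, n_sites):
        if any(has_four_gametes(haplotypes, s, k) for s in blocks[-1]):
            blocks.append([k])
        else:
            blocks[-1].append(k)
    return blocks


# Toy phased data: sites 0 and 1 show only two haplotypes (C-A and T-G),
# while site 2 shows all four combinations with each of them.
haps = ["CAG", "TGG", "CAG", "TGA", "CAA"]
print(has_four_gametes(haps, 0, 1))
print(has_four_gametes(haps, 1, 2))
print(blocks_by_four_gamete(haps))
```

Here the first two sites form one block and the third site falls past a boundary, since evidence of historical recombination (four gametes) appears only when it is included.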

Page 53: Picking SNPs  Application to Association Studies

Haplotype Blocks and tagSNPs

Identifying blocks and tagSNPs:

• Manually

• Algorithms – Haploview

Page 54: Picking SNPs  Application to Association Studies

Haplotype Blocks and tagSNPs

IL1B: 19 SNPs (MAF >5%)

4 “common” haplotypes

tagSNPs

Page 55: Picking SNPs  Application to Association Studies

Haplotype Blocks and tagSNPs

Identifying blocks and tagSNPs:

• Manually

• Algorithms – Haploview

Page 56: Picking SNPs  Application to Association Studies

HapMap Data and Haploview

Page 57: Picking SNPs  Application to Association Studies
Page 58: Picking SNPs  Application to Association Studies
Page 59: Picking SNPs  Application to Association Studies
Page 60: Picking SNPs  Application to Association Studies

HapMap Data and Haploview

Page 61: Picking SNPs  Application to Association Studies
Page 62: Picking SNPs  Application to Association Studies
Page 63: Picking SNPs  Application to Association Studies
Page 64: Picking SNPs  Application to Association Studies

Import HapMap Data into Haploview

Page 65: Picking SNPs  Application to Association Studies
Page 66: Picking SNPs  Application to Association Studies
Page 67: Picking SNPs  Application to Association Studies
Page 68: Picking SNPs  Application to Association Studies
Page 69: Picking SNPs  Application to Association Studies

May not be minimal set

Page 70: Picking SNPs  Application to Association Studies

Minimal set of tagSNPs based on r²

Page 71: Picking SNPs  Application to Association Studies

Note: HapMap is not complete variation data

Page 72: Picking SNPs  Application to Association Studies

Variation data, LD, and tagSNPs for ABCE1 in European-Americans

• HapMap: 7 SNPs, 4 tagSNPs

• SeattleSNPs: 35 SNPs, 4 tagSNPs

Page 73: Picking SNPs  Application to Association Studies

Where to Find Tagging Software

HaploBlockFinder http://cgi.uc.edu/cgi-bin/kzhang/haploBlockFinder.cgi

LDSelect http://pga.gs.washington.edu

SNPtagger http://www.well.ox.ac.uk/~xiayi/haplotype/index.html

TagIT http://popgen.biol.ucl.ac.uk/software.html

tagSNPs http://www-rcf.usc.edu/~stram/tagSNPs.html

Haploview http://www.broad.mit.edu/personal/jcbarret/haplo/

Page 74: Picking SNPs  Application to Association Studies

Haplotypes, TagSNPs, and Caveats

• Haplotypes are inferred

• Block-like structure assumed for some software

• Different block definitions

• Block boundaries sensitive to marker density

• Genotype savings may not be great (recombination)

Page 75: Picking SNPs  Application to Association Studies

Common Errors in Association Studies

Bell and Cardon (2001)

• Small sample size

• Subgroup analysis and multiple testing

• Random error

• Poorly matched control group

• Failure to attempt study replication (e.g., second case/control study, gene expression studies)

• Failure to detect LD with adjacent loci

• Overinterpreting results and positive publication bias

• Unwarranted ‘candidate gene’ declaration after identifying association in arbitrary genetic region

Page 76: Picking SNPs  Application to Association Studies

Picking SNPs Application to Association Studies

Summary

• Resources available for pair-wise LD and haplotypes

• Software for tagSNP selection available

• Be aware of the limitations of the approach you choose

• Replication required by several journals

Page 77: Picking SNPs  Application to Association Studies

SeattleSNPs Genotyping Service

• Free genotyping (BeadArray)

• Emphasis on young investigators

• Research related to heart, lung, blood, or sleep disorders

• Moderate to large population samples

• Apply at pga.gs.washington.edu

• Due date: TBA