SNP Discovery and Genotyping Workshop • SNP discovery strategies Debbie Nickerson • Identifying SNPs by association for genotype- phenotype analysis of candidate genes Chris Carlson • Identifying haplotypes for genotype- phenotype analysis of candidate genes Dana Crawford • SNP genotyping strategies Debbie Nickerson

Page 1: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate

SNP Discovery and Genotyping Workshop

• SNP discovery strategies Debbie Nickerson

• Identifying SNPs by association for genotype-phenotype analysis of candidate genes Chris Carlson

• Identifying haplotypes for genotype-phenotype analysis of candidate genes

Dana Crawford

• SNP genotyping strategies Debbie Nickerson

Page 2

SNP Discovery and Genotyping Strategies

Debbie Nickerson - [email protected]

• Overview of Variation in the Human Genome

• SNP Discovery Strategies and Status

• SNP Data in the PGAs

• Genotyping SNPs

Page 3

Total sequence variation in humans

Population size: 6 x 10^9 (diploid)

Mutation rate: 2 x 10^-8 per bp per generation

Expected “hits”: 240 for each bp

Every variant compatible with life exists in the population

BUT: Most are vanishingly rare

Compare 2 haploid genomes: 1 SNP per 1331 bp*

*The International SNP Map Working Group, Nature 409:928 - 933 (2001)
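
The "240 hits" figure follows from the two numbers on this slide. A minimal sketch of the arithmetic, assuming two chromosome copies per person (the variable names are illustrative):

```python
# Values taken from the slide; the factor 2 reflects diploidy.
population_size = 6e9        # people
copies_per_person = 2        # diploid genome
mutation_rate = 2e-8         # per bp per generation

# Expected number of new mutations hitting any single base pair
# across the whole population in one generation.
expected_hits_per_bp = population_size * copies_per_person * mutation_rate
print(round(expected_hits_per_bp))
```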

Page 4

Strategies to Find SNPs

• Mine them from Existing Genome Resources

• Targeted SNP Discovery in Candidate Genes

CardioGenomics - http://www.cardiogenomics.org

InnateImmunity - http://innateimmunity.net

Berkeley PGA - http://pga.lbl.gov

Southwestern - http://pga.swmed.edu

SeattleSNPs - http://pga.mbt.washington.edu

Page 5

Sequence Overlap SNP discovery

Sequence-based SNP Mining

Overlapping reads (the tenth base differs, A versus G):

GTTTAAATAATACTGATCA
GTTTAAATAATACTGATCA
GTTTAAATAGTACTGATCA
GTTTAAATAGTACTGATCA

Genomic DNA: BAC library (BAC Overlap); RRS Library or Sampling (Shotgun Overlap)

mRNA: cDNA Library (EST Overlap)

~ 4.1 Million SNPs Available http://www.ncbi.nlm.nih.gov/SNP/
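
Overlap-based mining can be sketched as a column scan over aligned reads. This is a minimal illustration, not the pipeline used to build dbSNP: it assumes the reads are already aligned and of equal length, and the `min_count` filter is a hypothetical guard against single-read sequencing errors.

```python
from collections import Counter

# The four overlapping reads shown on the slide.
reads = [
    "GTTTAAATAATACTGATCA",
    "GTTTAAATAATACTGATCA",
    "GTTTAAATAGTACTGATCA",
    "GTTTAAATAGTACTGATCA",
]

def candidate_snps(aligned_reads, min_count=2):
    """Return (1-based position, allele counts) for every column in which
    at least two different bases are each seen min_count times or more."""
    hits = []
    for pos, column in enumerate(zip(*aligned_reads), start=1):
        counts = Counter(column)
        supported = {base: n for base, n in counts.items() if n >= min_count}
        if len(supported) >= 2:
            hits.append((pos, supported))
    return hits

snps = candidate_snps(reads)
print(snps)
```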

Page 6

Mining Finds Only A Small Fraction of the SNPs

[Figure: fraction of SNPs discovered (y-axis, 0.0 to 1.0) versus minor allele frequency (x-axis, 0.0 to 0.5), with curves labeled 2, 8, 16, 24, 48 and 96; example alleles A and G]
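
One way to read the figure is through the chance that a SNP shows up at all in a small sample. The sketch below assumes the curve labels are numbers of chromosomes sampled at random, which the transcript does not state; under that assumption a site is discovered only if both alleles appear in the sample.

```python
def detection_probability(maf, n_chromosomes):
    """Probability that both alleles of a SNP with minor allele frequency
    `maf` are seen among n randomly sampled chromosomes."""
    all_minor = maf ** n_chromosomes
    all_major = (1 - maf) ** n_chromosomes
    return 1 - all_minor - all_major

# Rare variants are mostly missed in small samples.
for n in (2, 8, 16, 24, 48, 96):
    print(n, round(detection_probability(0.05, n), 3))
```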

Page 7

Total Estimated SNPs and Fraction in dbSNP

Minimal allele   Expected SNPs   Expected SNP      Expected %
frequency        (millions)      frequency (bp)    in database
1%               11.0            290               11-12
5%               7.1             450               15-17
10%              5.3             600               18-20
20%              3.3             960               21-25
30%              2.0             1570              23-27
40%              0.97            3280              24-28

L. Kruglyak and D. Nickerson, Nat Genet 27:234-236 2001

Page 8

Surfactant B - Locus Link

dbSNP (http://www.ncbi.nlm.nih.gov/SNP/)

Page 9

Surfactant B - dbSNP

Page 10

Confirmation of SNP Resource in New Sample: Potential Pitfalls

[Figure: bar chart, 0% to 100%, by dbSNP discovery method: BAC, RRS, EST, PCR, Other, Any Multiple Report, BRE Multiple Report. Series: Confirmed Multiple Method Report in dbSNP; Confirmed Unique Method Report in dbSNP]

Page 11

Strategies to Find SNPs

• Mine them from Existing Resources

• Targeted SNP Discovery in Candidate Genes

CardioGenomics - http://www.cardiogenomics.org

InnateImmunity - http://innateimmunity.net

Berkeley PGA - http://pga.lbl.gov

Southwestern - http://pga.swmed.edu

SeattleSNPs - http://pga.mbt.washington.edu

Page 12

Sequence-based SNP Identification

Amplify DNA (5’ 3’) -> Sequence -> Phred -> Phrap -> PolyPhred -> Consed -> Analysis

• Sequence: Sequence each end of the fragment
• Phred: Base-calling, Quality determination
• Phrap: Contig assembly, Final quality determination
• PolyPhred: Polymorphism detection
• Consed: Sequence viewing, Polymorphism tagging
• Analysis: Polymorphism reporting, Individual genotyping, Phylogenetic analysis

Example reads: ATAGACG and ATACACG (Homozygotes); ATAGACG / ATACACG (Heterozygote)

Page 13

Sequence-Based Detection and Genotyping of SNPs

Jim Sloan, Tushar Bhangle (PolyPhred)

Matthew Stephens, Paul Scheet (Quality Scores for SNPs)

Phil Green, Brent Ewing, David Gordon (Phred, Phrap, Consed)

Page 14

Page 15

PGA SNPs

• The PGAs provide a validated SNP resource (Allele Frequency Data)

• Novel Views of the Variation Data Emerging
– Pathway Interfaces
– Color Fasta Formats
– Gene Structure Views
– Visual Genotypes
– Linkage Disequilibrium Views
– TagSNPs
– Haplotypes

• Many New Formats Under Development

Page 16

Toward comprehensive association studies

• 5-7 million common variants exist in genome

• Testing all for association is impractical today

• Can the list be reduced without loss of power?

– SNPs in Coding (Amino Acid Changes)

– Linkage disequilibrium (SNPs in other functional regions, e.g. regulatory elements)

Page 17

cSNPs - Both Deep and Average Coverage Available from the PGAs

CD36 - Southwestern PGA - Deep cSNP Discovery Strategy - Healthy, High Cholesterol, High Triglycerides, Congenital Cardiac Abnormalities, Left Ventricular Hypertrophy ...

CD36 - SeattleSNPs PGA - Average cSNP Discovery Strategy - Healthy only

Page 18

SIFT (Sorting Intolerant From Tolerant) Coding Changes

CYP4F2

Trp (W) -> Gly (G): Predicted to be tolerated

Val (V) -> Gly (G): Predicted not to be tolerated

Ng and Henikoff, Genome Res. 2002

Page 19

SNP-Based Association Studies

5’ 3’

Arg-Cys Val-Val

Collins, Guyer, Chakravarti Science 278:1580-81, 1997

Indirect: Use a dense map of SNPs and test for linkage disequilibrium (use association to find functional sites anywhere in the sequence, including non-coding regions)

Page 20

SNP Discovery and Genotyping Workshop

• SNP discovery strategies Debbie Nickerson

• Identifying SNPs by association for genotype-phenotype analysis of candidate genes Chris Carlson

• Identifying haplotypes for genotype-phenotype analysis of candidate genes

Dana Crawford

• SNP genotyping strategies Debbie Nickerson

Page 21

Selecting SNPs for Genotype-Phenotype Analysis

Using Allelic Association (Linkage Disequilibrium)

Christopher Carlson

[email protected]

Page 22

Candidate Gene Association Analysis

• Describe existing genetic variation
– Rare SNPs (deep exonic resequencing)
– Common SNPs (complete resequencing)

• Select a subset of SNPs for genotyping
– cSNPs (amino acid changes)
– htSNPs (resolve haplotypes)
– tagSNPs (patterns of genotype)

• Test for genotype/phenotype correlations

Page 23

SeattleSNPs Resequencing Strategy I

• Resequence the complete genomic region of each gene
– 2000 bp upstream of first exon
– 1500 bp downstream of poly-A signal
– All exons and introns for genes below 35 kbp

Image courtesy of GeneSNPs

Page 24

VG2

• Visual Genotype 2
– Web interface
– Visualize genotypes
– View SNPs by frequency
– Sort on similarity between sites
– Sort on similarity between samples
– Visualize LD

Page 25

SeattleSNPs Resequencing Strategy II

• Resequence candidate genes from inflammation and coagulation pathways

• Resequence 47 individuals
– 24 African American
– 23 European American

Legend: Homozygote common, Heterozygote, Homozygote rare, Missing Data

Page 32

Preliminary Analyses

• Hardy Weinberg Equilibrium
• Population specificity
• Nucleotide diversity
• Pop genetics statistics (e.g. Tajima’s D)
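
As an illustration of the first check, a Hardy-Weinberg test for one SNP can be sketched as a chi-square comparison of observed and expected genotype counts. This is the textbook form, not necessarily the test used by SeattleSNPs; the function name and the example counts are made up.

```python
def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square statistic (1 degree of freedom) comparing observed
    genotype counts with Hardy-Weinberg expectations."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)   # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_AA, n_Aa, n_aa)
    # Skip classes with zero expectation (monomorphic sites).
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

# A statistic above about 3.84 rejects equilibrium at the 5% level.
print(hwe_chi_square(25, 50, 25))   # counts in perfect equilibrium
print(hwe_chi_square(40, 20, 40))   # strong heterozygote deficit
```

A marked deviation in a resequencing data set is often a sign of a genotyping or base-calling problem rather than biology, which is one reason to run this check first.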

Page 33

SNP Selection: cSNPs

• Genotype SNPs which change amino acids
• Genotype other “good story” SNPs
– SNPs in known regulatory elements
– SNPs in Conserved Noncoding Sequences

Image courtesy of GeneSNPs

Page 34

SNP Selection: htSNPs

• Genotype “haplotype tagging” SNPs which resolve existing common haplotypes

Page 36

SNP Selection: tagSNPs

• Resequence a modest number of samples

– Describe patterns of genotype at all common SNPs

– Genotype tagSNPs which efficiently capture existing patterns of genotype

Page 37

Linkage Disequilibrium

SNPs: A B

Haplotype is the pattern of alleles on a single chromosome
– 4 possible haplotypes

Linkage Disequilibrium (LD) describes the allelic association between two SNPs

Two popular LD statistics: D′ and r²
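
The slide names the two statistics without defining them. For reference, the usual textbook definitions are given below; the notation is an assumption of this note, with p_A and p_B the frequencies of one allele at each SNP and p_AB the frequency of the haplotype carrying both.

```latex
\begin{align*}
D &= p_{AB} - p_A\,p_B \\[4pt]
D' &= \frac{|D|}{D_{\max}}, \qquad
D_{\max} =
\begin{cases}
\min\bigl(p_A(1-p_B),\;(1-p_A)\,p_B\bigr) & \text{if } D > 0 \\
\min\bigl(p_A\,p_B,\;(1-p_A)(1-p_B)\bigr) & \text{if } D < 0
\end{cases} \\[4pt]
r^2 &= \frac{D^2}{p_A(1-p_A)\,p_B(1-p_B)}
\end{align*}
```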

Page 38

Complete LD

A B

Unequal allele frequency

Allelic association is as strong as possible
– 3 haplotypes observed
– No detected recombination between SNPs
– Genotype is not perfectly correlated

D′ = 1

r² < 1

Page 39

Perfect LD

A B

Equal allele frequency

Allelic association is as strong as possible– 2 haplotypes observed

– No detected recombination between SNPs

– Genotype is perfectly correlated

D′ = 1

r² = 1
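
The difference between complete and perfect LD can be checked numerically. The sketch below uses the standard definitions of D′ and r² on haplotype frequencies; the function name and the example frequencies are invented for illustration.

```python
def ld_stats(p_AB, p_Ab, p_aB, p_ab):
    """Return (D', r^2) for two biallelic SNPs from the four haplotype
    frequencies, which are expected to sum to 1."""
    p_A = p_AB + p_Ab
    p_B = p_AB + p_aB
    d = p_AB - p_A * p_B
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else 0.0
    return d_prime, r2

# Complete LD: three haplotypes, unequal allele frequencies.
complete = ld_stats(p_AB=0.5, p_Ab=0.0, p_aB=0.3, p_ab=0.2)
# Perfect LD: two haplotypes, equal allele frequencies.
perfect = ld_stats(p_AB=0.6, p_Ab=0.0, p_aB=0.0, p_ab=0.4)
print(complete, perfect)
```

In the first case D′ is 1 but r² is only 0.25, so genotypes at one SNP predict the other poorly; in the second both statistics are 1.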

Page 40

Select SNPs to genotype on the basis of LD

Rational SNP Selection

• Some SNPs are in LD with many other SNPs

• SNPs between a pair of associated SNPs are not necessarily associated with the flanking SNPs

• Some SNPs are in LD with no other SNPs

Page 41

LD SNP Selection Example

CSF3 in European Americans
• 5200 bp
• 17 SNPs

Page 42

LD SNP Selection Example

CSF3 in European Americans
• 5200 bp
• 17 SNPs
• 10 common SNPs (above 10% minor allele frequency)

Page 43

LD Site Selection Algorithm

• Find minimal set of SNPs for assay, such that each SNP is either assayed directly or above the r² threshold with an assayed SNP

• Calculate all pairwise r² values

• Set r² threshold based on power estimates for study
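
A greedy version of this selection can be sketched as follows. It is an illustration only: a greedy pass does not guarantee the truly minimal set, the function name is hypothetical, and the r² matrix in the example is invented.

```python
def ld_select(r2, threshold):
    """Greedy tag selection. r2 is a symmetric matrix (list of lists) of
    pairwise r^2 values. Returns a list of (tag SNP, members of its bin)."""
    remaining = set(range(len(r2)))
    bins = []
    while remaining:
        def covered_by(i):
            # SNP i covers itself plus every untagged SNP above the threshold.
            return {j for j in remaining if j == i or r2[i][j] > threshold}
        # Pick the SNP covering the most untagged SNPs (lowest index on ties).
        best = max(sorted(remaining), key=lambda i: len(covered_by(i)))
        members = covered_by(best)
        bins.append((best, sorted(members)))
        remaining -= members
    return bins

# Toy matrix: SNPs 0-2 are tightly associated, as are SNPs 3-4.
toy_r2 = [
    [1.0, 0.9, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.9, 0.1, 0.1],
    [0.9, 0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.1, 0.8, 1.0],
]
bins = ld_select(toy_r2, threshold=0.64)
print(bins)
```

Here the tag is simply the SNP that seeded the bin; in practice another member of the bin may be preferred for biological reasons or ease of assay design.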

Page 45

CSF3 Site Selection

• Threshold LD: r² > 0.64
– Bin 1: 4 sites
– Bin 2: 4 sites
– Bin 3: 2 sites

• Genotype 1 SNP from each bin, chosen for biological intuition or ease of assay design

Page 46

Power and LD

• Given
– All common SNPs described
– Patterns of LD between common SNPs are known

• Select SNPs such that every SNP is either
– Directly assayed
– Associated with an assayed SNP

• Test for disease associations with assayed SNPs

• Power to detect disease associations at unassayed SNPs depends on r² between assayed and unassayed SNPs
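
A common rule of thumb, not stated on the slide, makes the last point concrete: to keep the same power when a causal SNP is tested only through a tag, the sample size has to grow by roughly a factor of 1/r². The sketch below applies that approximation; the function name and sample sizes are illustrative.

```python
import math

def required_sample_size(n_direct, r2):
    """Approximate sample size needed at a tag SNP to match the power of
    n_direct samples typed at the causal SNP itself (assumes r2 > 0)."""
    return math.ceil(n_direct / r2)

# With r^2 = 0.64, about 56% more samples are needed.
print(required_sample_size(1000, 0.64))
```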

Page 47

LD Selection and Haplotype

• LD selected SNPs provide the highest possible haplotype diversity for a given number of SNPs assayed

• LD selection is robust to recombination and hotspot structure

• LD selection is sensitive to population stratification

Page 48

SNP Selection Summary

• It is possible to test all common variants in a candidate gene directly for risk association (main effects), so that negative (null) results are meaningful

• Caveat: Higher order risks unaddressed
– Haplotype (G X G effects within a locus)
– Epistasis (G X G effects between loci)
– Environment (G X E effects)

Page 49

SNP Discovery and Genotyping Workshop

• SNP discovery strategies Debbie Nickerson

• Identifying SNPs by association for genotype-phenotype analysis of candidate genes Chris Carlson

• Identifying haplotypes for genotype-phenotype analysis of candidate genes

Dana Crawford

• SNP genotyping strategies Debbie Nickerson

Page 50

Identifying Haplotypes for Genotype-Phenotype Analysis

Dana C. [email protected]

Page 51

Outline of discussion

• Constructing or inferring haplotypes

• Haplotype tools available in PGA

• Description of haplotypes in SeattleSNPs genes

• Use of VH1 tool to visually inspect
– Haplotype blocks
– Haplotype diversity
– Hotspots of recombination

• Summary of SeattleSNPs haplotype data

Page 52

What is a Diplotype?

• Humans are diploid

• At each SNP there are two alleles, which are observed as a genotype

• At each gene there are two haplotypes, which are observed as a multi-site genotype, or diplotype
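
Because only the multi-site genotype is observed, several haplotype pairs can be consistent with it. A small sketch, with a hypothetical function name, that enumerates the possibilities for an unphased genotype:

```python
from itertools import product

def compatible_haplotype_pairs(genotype):
    """genotype: list of (allele, allele) tuples, one per SNP, unphased.
    Returns every distinct pair of haplotypes that could produce it."""
    het_sites = [i for i, (a, b) in enumerate(genotype) if a != b]
    pairs = set()
    for choice in product((0, 1), repeat=len(het_sites)):
        pick = dict(zip(het_sites, choice))
        h1 = "".join(g[pick.get(i, 0)] for i, g in enumerate(genotype))
        h2 = "".join(g[1 - pick.get(i, 0)] for i, g in enumerate(genotype))
        pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)

# Heterozygous at both SNPs: two possible diplotypes.
print(compatible_haplotype_pairs([("C", "T"), ("A", "G")]))
# Heterozygous at one SNP only: the phase is unambiguous.
print(compatible_haplotype_pairs([("C", "C"), ("A", "G")]))
```

With k heterozygous sites there are 2^(k-1) candidate pairs, which is why phase has to be resolved experimentally or inferred statistically.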

Page 53

What is a Haplotype?

A: “…a unique combination of genetic markers present in a chromosome.” pg 57 in Hartl & Clark, 1997

VH1 – haplotype visualization tool

Page 54

How Do You Construct Haplotypes?

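1. Collect extended family members

[Pedigree diagram: two-SNP genotypes of family members (C/T, A/G; C/C, A/G; T/T, G/G; C/T, A/A; C/C, A/G), with resolved haplotype pairs shown as C T over A G, T T over G G, and C C over A G]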

Page 55

How Do You Construct Haplotypes?

2. Go from diploid to haploid via somatic cell hybrids

e.g. Patil et al 2001

Page 56

How Do You Construct Haplotypes?

3. Allele-specific PCR

SNP 1 SNP 2

C/T A/G

Page 57

How Do You Construct Haplotypes?

4. Statistical inference

• Clark Algorithm

• EM (Arlequin)

• Partition Ligation (HAPLOTYPER)

• PHASE

Page 58

Clark Algorithm

• Find unambiguous haplotypes
– Homozygotes
– Single Heterozygotes

Page 59

Clark Algorithm

• Find ambiguous diplotypes formed from two unambiguous haplotypes

Page 60

Clark Algorithm

• Find ambiguous diplotypes formed from one unambiguous haplotype and one new haplotype

Page 61

Clark Algorithm

• Iterate until either all haplotypes resolve, or ambiguous haplotypes are inconsistent with any inferred haplotype
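
The steps on the Clark Algorithm slides can be sketched in a few lines. This is a simplified illustration, not Clark's published implementation: genotypes are strings over "0", "1" and "h" (heterozygous), the function names are invented, and the result can depend on the order in which genotypes are visited.

```python
def split_if_unambiguous(genotype):
    """Homozygotes and single heterozygotes give their haplotypes directly."""
    if genotype.count("h") > 1:
        return None
    return genotype.replace("h", "0"), genotype.replace("h", "1")

def complement(genotype, haplotype):
    """Haplotype that pairs with `haplotype` to give `genotype`,
    or None if the two are incompatible."""
    other = []
    for g, h in zip(genotype, haplotype):
        if g == "h":
            other.append("1" if h == "0" else "0")
        elif g == h:
            other.append(g)
        else:
            return None
    return "".join(other)

def clark(genotypes):
    resolved, known = {}, []
    for idx, g in enumerate(genotypes):
        pair = split_if_unambiguous(g)
        if pair:
            resolved[idx] = pair
            known.extend(h for h in pair if h not in known)
    progress = True
    while progress:
        progress = False
        for idx, g in enumerate(genotypes):
            if idx in resolved:
                continue
            pairs = [(h, complement(g, h)) for h in known]
            pairs = [(h, o) for h, o in pairs if o is not None]
            if not pairs:
                continue
            # Prefer a pairing of two known haplotypes over one known, one new.
            pairs.sort(key=lambda pair: pair[1] not in known)
            h, other = pairs[0]
            resolved[idx] = (h, other)
            if other not in known:
                known.append(other)
            progress = True
    unresolved = [i for i in range(len(genotypes)) if i not in resolved]
    return resolved, unresolved

result = clark(["00", "11", "hh"])
print(result)
```

When no homozygote or single heterozygote is present, as in `clark(["hh"])`, the procedure cannot start and the genotype is returned as unresolved.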

Page 62

Haplotype Algorithm Comparison

• Clark
– Intuitive
– Fast

• EM
– Complete solution
– Slightly more accurate than Clark
– Robust to ambiguity

• PHASE
– Complete solution
– Slightly more accurate than EM
– Slow (version 2 faster)

• Haplotyper (Ligation)
– Fast
– Better than Clark
– Less accurate than EM or PHASE
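
To show what the EM approach does, here is a minimal sketch for the simplest case of two SNPs, where only the double heterozygote is ambiguous. It is not the Arlequin implementation; the genotype coding (copies of allele "1" at each SNP) and the function name are assumptions of this example.

```python
def em_two_snp(genotype_counts, iterations=50):
    """Estimate the four haplotype frequencies for two biallelic SNPs.
    genotype_counts maps (a, b) -> number of individuals, where a and b
    count copies (0, 1 or 2) of allele "1" at SNP 1 and SNP 2."""
    n_chrom = 2 * sum(genotype_counts.values())
    fixed = {"00": 0.0, "01": 0.0, "10": 0.0, "11": 0.0}
    for (a, b), n in genotype_counts.items():
        if (a, b) == (1, 1):
            continue  # phase unknown, handled in the E-step
        h1 = ("1" if a == 2 else "0") + ("1" if b == 2 else "0")
        h2 = ("1" if a >= 1 else "0") + ("1" if b >= 1 else "0")
        fixed[h1] += n
        fixed[h2] += n
    n_dh = genotype_counts.get((1, 1), 0)
    freqs = {h: 0.25 for h in fixed}
    for _ in range(iterations):
        # E-step: probability that a double heterozygote is 00/11, not 01/10.
        cis = freqs["00"] * freqs["11"]
        trans = freqs["01"] * freqs["10"]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: re-estimate frequencies from expected haplotype counts.
        counts = dict(fixed)
        counts["00"] += n_dh * w
        counts["11"] += n_dh * w
        counts["01"] += n_dh * (1 - w)
        counts["10"] += n_dh * (1 - w)
        freqs = {h: c / n_chrom for h, c in counts.items()}
    return freqs

freqs = em_two_snp({(0, 0): 4, (2, 2): 4, (1, 1): 2})
print(freqs)
```

Unlike the Clark procedure, every individual receives a (probabilistic) resolution even when no unambiguous genotypes are present.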

Page 63

Haplotype Tools in the PGA

InnateImmunity
• 25 genes re-sequenced in innate immunity pathway
• 4 populations: European- and African-Americans, Hispanics, Asthmatics
• PHASE and Haplotyper results posted on website

http://innateimmunity.net

Page 64

Haplotype Tools in the PGA

SeattleSNPs
• 120 genes re-sequenced in inflammation response
• 2 populations: European- and African-Americans
• PHASE results posted on website
• Interactive tool (VH1) to visualize and sort haplotypes

http://pga.gs.washington.edu

Page 65

Distribution of Haplotypes in 100 SeattleSNPs Genes

[Figure: histogram with axes "Number of haplotypes" (0 to 100) and "Number of genes" (0 to 50); series: AD, ED]

Page 66

Common Haplotypes in 100 SeattleSNPs Genes (Frequency >5%)

             >5% MAF
Population   Average   Range
ED           4.54      1 - 8
AD           4.99      0 - 11

Page 67

Haplotype Sharing Between Populations in 100 SeattleSNPs Genes

[Figure: stacked bar chart (0 to 1) for ED and AD; series: Shared, Non-shared]

Page 68

Number of Haplotypes From Two Different Discovery Strategies

[Figure: bar chart of average number of haplotypes per gene (0 to 35) for AD, ED and Combined; series: All SNPs >5%; Coding SNPs >5%]

Page 69

FGB – African-Americans

Haplotype Structures Are Similar Across Discovery Strategies…

29 SNPs >5% 13 SNPs >5%

Coding SNPs

Page 70

…But, Not For All Genes

F10 – African-Americans

48 SNPs >5% 13 SNPs >5%

Coding SNPs

Page 71

Are Blocks Preserved Using Different Discovery Strategies?

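Fewer “blocks” with fewer SNPs/kb

Yes*, for some:
10% of genes in AD
25% of genes in ED

*>75% of the blocks are preserved

Four-gamete test: [diagram contrasting a pair of sites at which all four gametes are observed (A B, a b, A b, a B) with a pair at which only two are observed (A B, a b)]

HaploBlockFinder; Zhang and Jin 2003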
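
The four-gamete test itself is simple to state in code. The sketch below is a generic illustration and not HaploBlockFinder's implementation; the function name is hypothetical. If all four allele combinations occur at a pair of biallelic sites, a recombination (or recurrent mutation) between them is inferred.

```python
def passes_four_gamete_test(haplotypes, i, j):
    """True if fewer than four distinct allele pairs are observed at
    sites i and j, i.e. no evidence of recombination between them."""
    gametes = {(h[i], h[j]) for h in haplotypes}
    return len(gametes) < 4

no_recombination = passes_four_gamete_test(["AB", "ab", "Ab"], 0, 1)
recombination = passes_four_gamete_test(["AB", "ab", "Ab", "aB"], 0, 1)
print(no_recombination, recombination)
```

A block can then be defined as a run of consecutive sites in which every pair passes the test, which is why sparser SNP sets tend to yield fewer blocks.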

Page 72

Using Visualization Tools (VH1) To Identify Haplotype Blocks

IL10:

• Rare sites removed

• Sorted by related sites

• “Block” structure evident

Page 73

Using VH1 to Identify Highly Divergent Haplotypes

• Some haplotypes are highly divergent

• More likely to have functional consequences?

• Mixed Blessing:
– Easier to detect
– Harder to dissect

Page 74

Using Haplotypes To Identify Hotspots Of Recombination

CD36 haplotypes, sorted by sample

Page 75

Linkage Disequilibrium and Hotspots

Associated Sites

Hotspot in between: sites need to be typed from both ends

CD36

Page 76

Detection of Recombination Hotspots In Candidate Genes

HOTSPOTTER

• Developed by Na Li and Matthew Stephens

• Multilocus model for LD:
– Does not rely on “block-like” patterns
– Relates LD to underlying recombination process

• Incorporated into new version of PHASE (v2.0)

students.washington.edu/lina/software/

Page 77

CD36 – combined population

Page 78

CD36 – AD and ED populations

Page 79

HOTSPOTTER Preliminary Results

15 out of 100 genes have evidence of a hotspot:

AGTR1, APOB, CD36, IL1B, IL21R, IL4, NOS3, PLAUR, PON1, SERPIN45, SELP, SFPA2, SFTPB, VCAM1, VEGF

Page 80

SeattleSNPs Haplotype Summary

• More haplotypes per gene than previously described

• Block structure is preserved across discovery strategies for only a fraction of the genes

• <50% of African-American chromosomes are represented by common shared haplotypes

• Evidence for hotspots of recombination in human genes

Page 81

SNP Discovery and Genotyping Workshop

• SNP discovery strategies Debbie Nickerson

• Identifying SNPs by association for genotype-phenotype analysis of candidate genes Chris Carlson

• Identifying haplotypes for genotype-phenotype analysis of candidate genes

Dana Crawford

• SNP genotyping strategies Debbie Nickerson

Page 82

Ideals for SNP Genotyping

• High Sensitivity - PCR but moving towards direct genomic DNA detection

• High Specificity - Accurate

• Simple process - Easy to automate - High Throughput

• Multiplexing - Perform many assays at once - decrease costs

• Cheap

Page 83

SNP Genotyping

[Diagram: probe and target for the C allele (matched, target base G) and the T allele (mis-matched, target base A)]

Method                          Matched          Mis-Matched
Allele-Specific Hybridization   Hybridize        Fail to hybridize
Polymerase Extension (+ddCTP)   C incorporated   C fails to incorporate
Oligonucleotide Ligation        Ligate           Fail to ligate
Invader                         Cleave           Fail to cleave
Taqman                          Degrade          Fail to degrade
Allele-Specific PCR             Amplify          Fail to amplify

Page 84

SNP Typing Formats

Microtiter Plates - Fluorescence
e.g. Taqman - Good for a few markers - lots of samples - PCR

Size Analysis by Mass or Electrophoresis
e.g. Sequenom or SnapShot - Moderate multiplexing, reducing costs

Arrays - Custom or Universal
e.g. Affymetrix, Illumina or ParAllele - Highly multiplexed - High Throughput - Genotype directly on genomic DNA

Page 85

Taqman

Genotyping with fluorescence-based homogeneous assays (single-tube assay)

A

G

Reporter Quencher

Page 86

Genotype Calling - Cluster Analysis

Page 87

Genotyping by Mass Spectrometry

Multiplex ~ 5 SNPs

Page 88

Comparative Genotyping in Populations

Population 1: Pooled DNA -> PCR Pooled DNA -> Quantitative Assay -> Estimate Allele Frequency (Polymorphism: 60/40)

Population 2: Pooled DNA -> PCR Pooled DNA -> Quantitative Assay -> Estimate Allele Frequency (Polymorphism: 85/15)

Page 89

Pooled Genotyping

Advantages:

Speed, Cost

Major Disadvantages:

Loss of haplotype information

Loss of stratification by phenotype or environmental factors

Page 90

SNP Genotyping

Custom SNP Genotyping Chips:

Page 91

Multiplexed Genotyping - Universal Tag Readouts

[Diagram: each locus-specific sequence (Locus 1, Locus 2) carries a tag sequence (Tag1, Tag2) that pairs with its complement (cTag1, cTag2) on a substrate (bead or chip); Tags 1 to 4 are read out on a chip array or bead array; bases C, T, A, G]

Multiplex ~1,000 SNPs

Not dependent on primary PCR

Illumina, ParAllele

Page 92

Illumina Genotyping - Gap Ligation

Page 93

1,000 SNPs Assayed on 96 Samples

Page 94

SNP Genotyping

Lots of systems - Still costly but dropping

Offering Moderate to High throughputs

Systems vary in price $$ -$$$$

Laboratory Information Management Systems - Key: Track
- Samples
- Assays
- Completion rate
- Reproducibility/Error Analysis