snp resources: variation discovery, hapmap and the egp mark j. rieder department of genome sciences...

43
SNP Resources: Variation SNP Resources: Variation Discovery, HapMap and the EGP Discovery, HapMap and the EGP Mark J. Rieder Mark J. Rieder Department of Genome Sciences Department of Genome Sciences [email protected] [email protected] NIEHS SNPs Workshop NIEHS SNPs Workshop Jan 10-11, 2008 Jan 10-11, 2008

Post on 21-Dec-2015

222 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: SNP Resources: Variation Discovery, HapMap and the EGP Mark J. Rieder Department of Genome Sciences mrieder@u.washington.edu NIEHS SNPs Workshop Jan 10-11,

SNP Resources: Variation Discovery, SNP Resources: Variation Discovery, HapMap and the EGPHapMap and the EGP

Mark J. RiederMark J. RiederDepartment of Genome SciencesDepartment of Genome Sciences

[email protected]@u.washington.edu

NIEHS SNPs WorkshopNIEHS SNPs WorkshopJan 10-11, 2008Jan 10-11, 2008

Page 2: SNP Resources: Variation Discovery, HapMap and the EGP Mark J. Rieder Department of Genome Sciences mrieder@u.washington.edu NIEHS SNPs Workshop Jan 10-11,

Complex inheritance/disease

Variant Gene

Disease

Diabetes Heart Disease SchizophreniaObesity Multiple Sclerosis Celiac DiseaseCancer Asthma Autism

Many OtherGenes

Environment

Two hypotheses:Two hypotheses:1- common disease/common variant?1- common disease/common variant?2- common disease/many rare variants?2- common disease/many rare variants?

Page 3: SNP Resources: Variation Discovery, HapMap and the EGP Mark J. Rieder Department of Genome Sciences mrieder@u.washington.edu NIEHS SNPs Workshop Jan 10-11,

Cases Controls

40% T, 60% C 15% T, 85% C

Multiple Genes

Common Variants

Polymorphic Markers > 300,000 -1,000,000Single Nucleotide Polymorphisms (SNPs)

Strategies for Genetic Analysis

Populations Association Studies

C/C C/T

C/C C/T C/C

C/C

C/TC/C C/C

C/T C/CC/TC/TC/C

Single Gene

Rare Variants

~600 Short Tandem Repeat Markers

FamiliesLinkage Studies

Simple Inheritance Complex Inheritance

Page 4: SNP Resources: Variation Discovery, HapMap and the EGP Mark J. Rieder Department of Genome Sciences mrieder@u.washington.edu NIEHS SNPs Workshop Jan 10-11,

Samples with phenotype data (continuous, case-control) (n = 100's - 1000's)

Genotype samples with commercial 'chips'• Affymetrix - Random SNP design (v.5, v.6)• lllumina - preselected tagSNP design (650Y, 1M)

Perform statistical association with each SNPCalculate p-value for each SNP

Region with plausible statistical association

Strategies for Genetic Analysis (GWAS)

Page 5: SNP Resources: Variation Discovery, HapMap and the EGP Mark J. Rieder Department of Genome Sciences mrieder@u.washington.edu NIEHS SNPs Workshop Jan 10-11,

Development of a genome-wide SNP map: How many SNPs?Development of a genome-wide SNP map: How many SNPs?

Nickerson and Kruglyak, Nature Genetics, 2001

~ 10 million common SNPs (> 1- 5% MAF) - 1/300 bp~ 10 million common SNPs (> 1- 5% MAF) - 1/300 bp

How has SNP discovery progressed toward this goal?How has SNP discovery progressed toward this goal?

Page 6: SNP Resources: Variation Discovery, HapMap and the EGP Mark J. Rieder Department of Genome Sciences mrieder@u.washington.edu NIEHS SNPs Workshop Jan 10-11,

SNP Resources: SNP discovery and catalogingSNP Resources: SNP discovery and cataloging

1. SNP discovery/genotyping: Genome-wide approaches SNP Consortium HapMap

2. The current state of SNP resources (dbSNP)

3. Comprehensive SNP discoveryNIEHS SNPs - Environmental Genome Project

SNP Databases - “How to” Manual for finding SNPs“How to” Manual for finding SNPsIn class - TutorialIn class - Tutorial

Page 7: SNP Resources: Variation Discovery, HapMap and the EGP Mark J. Rieder Department of Genome Sciences mrieder@u.washington.edu NIEHS SNPs Workshop Jan 10-11,

HapMap Project: Create a genome-wide SNP mapHapMap Project: Create a genome-wide SNP map

Genotype SNPs in four populations:

• CEPH (CEU) (Europe - n = 90, trios)• Yoruban (YRI) (Africa - n = 90, trios)• Japanese (JPT) (Asian - n = 45)• Chinese (HCB) (Asian - n =45)

To produce a genome-wide map of common variation

Common Variant/Common Disease

Page 8: SNP Resources: Variation Discovery, HapMap and the EGP Mark J. Rieder Department of Genome Sciences mrieder@u.washington.edu NIEHS SNPs Workshop Jan 10-11,

Phase I -Phase I - 1M SNPs1M SNPs

Phase II -Phase II - 4M SNPs4M SNPsDensity ~ 1 SNP/kbDensity ~ 1 SNP/kb

626 genes626 genes15 Mbp15 Mbp91,000 SNPs91,000 SNPs

Density - 1 SNP/166 bpDensity - 1 SNP/166 bp

Page 9: SNP Resources: Variation Discovery, HapMap and the EGP Mark J. Rieder Department of Genome Sciences mrieder@u.washington.edu NIEHS SNPs Workshop Jan 10-11,

Finding SNPs: Marker Discovery and MethodsFinding SNPs: Marker Discovery and Methods

SNP discovery has proceeded in two distinct phases:SNP discovery has proceeded in two distinct phases:

1 - SNP Identification/Discovery1 - SNP Identification/DiscoveryDefine the allelesDefine the allelesMap this to a unique place in the genomeMap this to a unique place in the genome

2 - SNP Characterization (HapMap genotyping)2 - SNP Characterization (HapMap genotyping)Determination of the genotype in many Determination of the genotype in many

individualsindividualsPopulation frequency of SNPsPopulation frequency of SNPs

Page 10: SNP Resources: Variation Discovery, HapMap and the EGP Mark J. Rieder Department of Genome Sciences mrieder@u.washington.edu NIEHS SNPs Workshop Jan 10-11,

Finding SNPs: Sequence-based SNP MiningFinding SNPs: Sequence-based SNP Mining

Sequence Overlap - SNP DiscoverySequence Overlap - SNP Discovery

GTTACGCCAATACAGGTTACGCCAATACAGGGATCCAGGAGATTACCATCCAGGAGATTACCGTTACGCCAATACAGGTTACGCCAATACAGCCATCCAGGAGATTACCATCCAGGAGATTACC

DNADNASEQUENCINGSEQUENCING

mRNAmRNA

cDNAcDNALibraryLibrary

ESTESTOverlapOverlap

Genomic Genomic

BACBACLibraryLibrary

RRSRRSLibraryLibrary

BACBACOverlapOverlap

ShotgunShotgunOverlapOverlap

RT errors?RT errors?

SequencingSequencingQualityQuality

Page 11: SNP Resources: Variation Discovery, HapMap and the EGP Mark J. Rieder Department of Genome Sciences mrieder@u.washington.edu NIEHS SNPs Workshop Jan 10-11,

Finding SNPs: Sequence-based SNP MiningFinding SNPs: Sequence-based SNP Mining

BAC = Bacterial Artificial ChromosomeBAC = Bacterial Artificial ChromosomePrimary vector for DNA cloning in the HGPPrimary vector for DNA cloning in the HGP

Fragment DNAFragment DNA

DNA from multiple individualsDNA from multiple individuals

Clone large fragments into BACsClone large fragments into BACs (unknown sequence)(unknown sequence)

Sequence and Reassemble Sequence and Reassemble (known sequence)(known sequence) Assembly with other overlapping BACsAssembly with other overlapping BACs

GTTACGCCAATACAGGTTACGCCAATACAGGGATCCAGGAGATTACCATCCAGGAGATTACCGTTACGCCAATACAGGTTACGCCAATACAGCCATCCAGGAGATTACCATCCAGGAGATTACC

Page 12: SNP Resources: Variation Discovery, HapMap and the EGP Mark J. Rieder Department of Genome Sciences mrieder@u.washington.edu NIEHS SNPs Workshop Jan 10-11,

Initiation of project planning (July 2001): Initiation of project planning (July 2001): 2.8 million SNPs (1.4 million validated) - 1/1900 bp2.8 million SNPs (1.4 million validated) - 1/1900 bp

HapMap SNP Discovery: Prior to GenotypingHapMap SNP Discovery: Prior to Genotyping

TACGCCTACGCCTTATAATA TCTCAAAGGAGATAGGAGAT

Generate more SNPs:Generate more SNPs:

Nov 2003 - 5.7 million (2 million validated) - 1/1500 bpNov 2003 - 5.7 million (2 million validated) - 1/1500 bpFeb 2004 - 7.2 million (3.3 million validated) - 1/900 bpFeb 2004 - 7.2 million (3.3 million validated) - 1/900 bp

Other Sources of SNPs:Other Sources of SNPs:Perlegen (Affymetrix chips) SNP data (chr22)Perlegen (Affymetrix chips) SNP data (chr22)Sequence chromatograms from Celera projectSequence chromatograms from Celera project

GTTACGCCGTTACGCCAAATACAGGATCATACAGGATCCCAGGAGATTACCAGGAGATTACC Draft Human GenomeDraft Human Genome

Genomic DNAGenomic DNA(multiple individuals)(multiple individuals)

Sequence and align Sequence and align (reference sequence)(reference sequence)

Random Shotgun SequencingRandom Shotgun Sequencing

Page 13: SNP Resources: Variation Discovery, HapMap and the EGP Mark J. Rieder Department of Genome Sciences mrieder@u.washington.edu NIEHS SNPs Workshop Jan 10-11,

dbSNP -NCBI SNP database

SNP Discovery: dbSNP database

Page 14: SNP Resources: Variation Discovery, HapMap and the EGP Mark J. Rieder Department of Genome Sciences mrieder@u.washington.edu NIEHS SNPs Workshop Jan 10-11,

SNP data submitted to dbSNP: ClusteringSNP data submitted to dbSNP: Clustering

dbSNP processing of SNPsdbSNP processing of SNPs

SNPs submittedSNPs submittedby research communityby research community((ssubmitted SNPs = ss#)ubmitted SNPs = ss#)

Unique mappingUnique mappingto a genome locationto a genome location((reference eference SNP = rs#)NP = rs#)

(by 2hit-2allele)(by 2hit-2allele)

ValidatedValidated

UnvalidatedUnvalidated

Page 15: SNP Resources: Variation Discovery, HapMap and the EGP Mark J. Rieder Department of Genome Sciences mrieder@u.washington.edu NIEHS SNPs Workshop Jan 10-11,

0.00

2.00

4.00

6.00

8.00

10.00

12.00

14.00

Jan-03 Mar-03 Jun-03 Aug-03 Oct-03 Jan-04 Mar-04 Jun-04 Sep-04 Jan-05 Mar-05 Jun-06 Mar-07

dbSNP Release

SNPs(millions)

Total Reference SNPs

Validated SNPs

HapMap SNP DiscoveryHapMap SNP Discovery

HapMap Discovery Increased SNP Density and Validated SNPsHapMap Discovery Increased SNP Density and Validated SNPs

11+ million11+ millionrs SNPsrs SNPs

5.6 million5.6 millionvalidatedvalidatedrs SNPsrs SNPs

Page 16: SNP Resources: Variation Discovery, HapMap and the EGP Mark J. Rieder Department of Genome Sciences mrieder@u.washington.edu NIEHS SNPs Workshop Jan 10-11,

Finding SNPs: Marker Discovery and MethodsFinding SNPs: Marker Discovery and Methods

SNP discovery has proceeded in two distinct phases:SNP discovery has proceeded in two distinct phases:

1 - SNP Identification/Discovery1 - SNP Identification/DiscoveryDefine the allelesDefine the allelesMap this to a unique place in the genomeMap this to a unique place in the genome

2 - SNP Characterization (HapMap genotyping)2 - SNP Characterization (HapMap genotyping)Determination of the genotype in many Determination of the genotype in many

individualsindividualsPopulation frequency of SNPsPopulation frequency of SNPs

Page 17: SNP Resources: Variation Discovery, HapMap and the EGP Mark J. Rieder Department of Genome Sciences mrieder@u.washington.edu NIEHS SNPs Workshop Jan 10-11,

HapMap Project: Create a genome-wide SNP mapHapMap Project: Create a genome-wide SNP map

Genotype SNPs in four populations:

• CEPH (CEU) (Europe - n = 90, trios)• Yoruban (YRI) (Africa - n = 90, trios)• Japanese (JPT) (Asian - n = 45)• Chinese (HCB) (Asian - n =45)

To produce a genome-wide map of common variation

Page 18: SNP Resources: Variation Discovery, HapMap and the EGP Mark J. Rieder Department of Genome Sciences mrieder@u.washington.edu NIEHS SNPs Workshop Jan 10-11,

Finding SNPs: Genotype Data Adds Value to SNPsFinding SNPs: Genotype Data Adds Value to SNPs

Confirms SNP as “real” and “informative”Confirms SNP as “real” and “informative” Minor Allele Frequency (MAF) - common or rareMinor Allele Frequency (MAF) - common or rare MAF differs by different populationMAF differs by different population Detection of SNP x SNP correlations Detection of SNP x SNP correlations

(Linkage Disequilibrium)(Linkage Disequilibrium) Determine tagging SNPs (tagSNPs)Determine tagging SNPs (tagSNPs) Determine haplotypesDetermine haplotypes

Indiv.#1:Indiv.#1: GTTACGCCAATACAG[ GTTACGCCAATACAG[GG//GG]]ATCCAGGAGATTACCATCCAGGAGATTACCIndiv.#2:Indiv.#2: GTTACGCCAATACAG[ GTTACGCCAATACAG[CC//GG]]ATCCAGGAGATTACCATCCAGGAGATTACCIndiv.#3:Indiv.#3: GTTACGCCAATACAG[ GTTACGCCAATACAG[CC//CC]]ATCCAGGAGATTACCATCCAGGAGATTACC

Page 19: SNP Resources: Variation Discovery, HapMap and the EGP Mark J. Rieder Department of Genome Sciences mrieder@u.washington.edu NIEHS SNPs Workshop Jan 10-11,

Interleukin 6 – Collapsed Dataset: Minor Allele Frequencey (MAF) > 10 %

African-Am.African-Am.

European-Am.European-Am.

• Most of the SNPs in the genome are rareMost of the SNPs in the genome are rare• Significant differences in SNP frequency can existSignificant differences in SNP frequency can exist• Correlation between SNPs can be used to select the most informative SNPsCorrelation between SNPs can be used to select the most informative SNPs

Page 20: SNP Resources: Variation Discovery, HapMap and the EGP Mark J. Rieder Department of Genome Sciences mrieder@u.washington.edu NIEHS SNPs Workshop Jan 10-11,

0.00

1.00

2.00

3.00

4.00

5.00

6.00

Jan-03 Mar-03 Jun-03 Aug-03 Oct-03 Jan-04 Mar-04 Jun-04 Sep-04 Jan-05 Mar-05 May-06 Mar-07

dbSNP Release

SNPs(millions)

Validated SNPs

SNPs with Genotypes

dbSNP: All validated SNPs now have been genotyped!dbSNP: All validated SNPs now have been genotyped!

HapMap GenotypingHapMap Genotyping

HapMap release = 4 M validated QC SNPsHapMap release = 4 M validated QC SNPs

5.6 million5.6 millionvalidatedvalidatedrs SNPsrs SNPs

Page 21: SNP Resources: Variation Discovery, HapMap and the EGP Mark J. Rieder Department of Genome Sciences mrieder@u.washington.edu NIEHS SNPs Workshop Jan 10-11,

Development of a genome-wide SNP map: How many SNPs?

Nickerson and Kruglyak, Nature Genetics, 2001

~ 10 million common SNPs (> 1-5% MAF) - 1/300 bp~ 10 million common SNPs (> 1-5% MAF) - 1/300 bp

When will we have them all?When will we have them all?

Feb 2001Feb 2001 - 1.42 million (1/1900 bp) - 1.42 million (1/1900 bp)Nov 2003Nov 2003 - 2.0 million (1/1500 bp) - 2.0 million (1/1500 bp)Feb 2004 Feb 2004 - 3.3 million (1/900 bp)- 3.3 million (1/900 bp)Mar 2007Mar 2007 - 5.6 million (validated - 1/535 bp) - 5.6 million (validated - 1/535 bp)

Page 22: SNP Resources: Variation Discovery, HapMap and the EGP Mark J. Rieder Department of Genome Sciences mrieder@u.washington.edu NIEHS SNPs Workshop Jan 10-11,

Finding SNPs: Sequence-based SNP MiningFinding SNPs: Sequence-based SNP Mining

RANDOM Sequence Overlap - SNP DiscoveryRANDOM Sequence Overlap - SNP Discovery

GTTACGCCAATACAGGTTACGCCAATACAGGGATCCAGGAGATTACCATCCAGGAGATTACCGTTACGCCAATACAGGTTACGCCAATACAGCCATCCAGGAGATTACCATCCAGGAGATTACC

Genomic Genomic

RRSRRSLibraryLibrary

ShotgunShotgunOverlapOverlap

BACBACLibraryLibrary

BACBACOverlapOverlap

DNADNASEQUENCINGSEQUENCING

mRNAmRNA

cDNAcDNALibraryLibrary

ESTESTOverlapOverlap

RandomRandomShotgunShotgun

Align toAlign toReferenceReference

Page 23: SNP Resources: Variation Discovery, HapMap and the EGP Mark J. Rieder Department of Genome Sciences mrieder@u.washington.edu NIEHS SNPs Workshop Jan 10-11,

SNP discovery is dependent on your sample population sizeSNP discovery is dependent on your sample population size

GTTACGCCAATACAGGTTACGCCAATACAGGGATCCAGGAGATTACCATCCAGGAGATTACCGTTACGCCAATACAGGTTACGCCAATACAGCCATCCAGGAGATTACCATCCAGGAGATTACC{{2 chromosomes2 chromosomes

0.0 0.2 0.3 0.4 0.50.10.0

0.5

1.0

Minor Allele Frequency (MAF)

Fra

ctio

n o

f S

NP

s D

isco

vere

d

2

888

HapMap data doesn’t capture HapMap data doesn’t capture low frequency SNPslow frequency SNPs

Page 24: SNP Resources: Variation Discovery, HapMap and the EGP Mark J. Rieder Department of Genome Sciences mrieder@u.washington.edu NIEHS SNPs Workshop Jan 10-11,

SNP Characterization/GenotypingSNP Characterization/Genotyping

Nickerson and Kruglyak, Nature Genetics, 2001

~ 10 million common SNPs (>1-5% MAF) - 1/300 bp~ 10 million common SNPs (>1-5% MAF) - 1/300 bpMar 2007Mar 2007 - 5.6 million (validated/mapped - 1/535 bp) - 5.6 million (validated/mapped - 1/535 bp)

When will we have them all?When will we have them all? HapMap data doesn’t capture low frequency allelesHapMap data doesn’t capture low frequency allelesUsed as a proxy for all common variation through linkage disequilibriumUsed as a proxy for all common variation through linkage disequilibrium

Page 25: SNP Resources: Variation Discovery, HapMap and the EGP Mark J. Rieder Department of Genome Sciences mrieder@u.washington.edu NIEHS SNPs Workshop Jan 10-11,

Targeted SNP DiscoveryTargeted SNP Discovery

Directed analysis: cSNPs

5’ 3’

Complete analysis: cSNPs, Linkage Disequilbrium and Haplotype Data

Arg-Cys Val-Val

5’ 3’Arg-Cys Val-Val

PCR amplicons

PCR amplicons

•Generate SNP data from complete genomic resequencing

(i.e. 5’ regulatory, exon, intron, 3’ regulatory sequence)

Page 26: SNP Resources: Variation Discovery, HapMap and the EGP Mark J. Rieder Department of Genome Sciences mrieder@u.washington.edu NIEHS SNPs Workshop Jan 10-11,

Increasing Sample Size Improves SNP DiscoveryIncreasing Sample Size Improves SNP Discovery

GTTACGCCAATACAGGTTACGCCAATACAGGGATCCAGGAGATTACCATCCAGGAGATTACCGTTACGCCAATACAGGTTACGCCAATACAGCCATCCAGGAGATTACCATCCAGGAGATTACC{{2 chromosomes2 chromosomes

0.0 0.2 0.3 0.4 0.50.10.0

0.5

1.0

Minor Allele Frequency (MAF)

Fra

ctio

n o

f S

NP

s D

isco

vere

d

2

88

4824

16

8

96

HapMapHapMapBased on Based on ~ 8 ~ 8 chromosomeschromosomes

ResequencingResequencing(n=48)(n=48)

Page 27: SNP Resources: Variation Discovery, HapMap and the EGP Mark J. Rieder Department of Genome Sciences mrieder@u.washington.edu NIEHS SNPs Workshop Jan 10-11,

Increasing SNP Density: HapMap ENCODE ProjectIncreasing SNP Density: HapMap ENCODE Project

ENCODEENCODE = = ENCENCyclopedia yclopedia OOf f DDNA NA EElementslementsCatalog all functional elements in 1% of the genome (30 Mb)Catalog all functional elements in 1% of the genome (30 Mb)

10 Regions x 500 kb/region (Pilot Project)10 Regions x 500 kb/region (Pilot Project) David Altschuler (Broad), Richard Gibbs (Baylor)David Altschuler (Broad), Richard Gibbs (Baylor) 16 CEU, 16 YRI, 8 HCB, 8 JPT16 CEU, 16 YRI, 8 HCB, 8 JPT Comprehensive PCR based resequencing across these regionsComprehensive PCR based resequencing across these regions

15,367 dbSNP15,367 dbSNP16,248 New SNPs16,248 New SNPs50% of SNPs in dbSNP50% of SNPs in dbSNP

5 Mb/31,500 SNPs = 5 Mb/31,500 SNPs = 1/160 bp1/160 bp

Page 28: SNP Resources: Variation Discovery, HapMap and the EGP Mark J. Rieder Department of Genome Sciences mrieder@u.washington.edu NIEHS SNPs Workshop Jan 10-11,

Goal: Comprehensively identify all common sequence variation in candidate genes

Initial biological focus: Candidate environmental response genes involved in DNA repair, cell cycle, apoptosis, metabolism, cell signaling, and oxidative stress.

Approach: Direct resequencing of genes

Samples: PDR-90 ethnically diverse individuals representative of U.S. population (397 genes)EGP95-95 samples from four ethnic groups (227 genes)

(24 HapMap Asians, 22 HapMap Europeans, 12 HapMap Yorubans, 15 African Americans, 22 Hispanic)

Page 29: SNP Resources: Variation Discovery, HapMap and the EGP Mark J. Rieder Department of Genome Sciences mrieder@u.washington.edu NIEHS SNPs Workshop Jan 10-11,

Nov 2005 - Zaitlen et al. Genome Research 15:1594-1600Nov 2005 - Zaitlen et al. Genome Research 15:1594-1600

Summary of NIEHS SNPs genotypes in dbSNPSummary of NIEHS SNPs genotypes in dbSNP

626 genes sequenced626 genes sequenced15 Mb scanned15 Mb scanned> 90,000 genotyped SNPs identified> 90,000 genotyped SNPs identified> 8 million genotypes deposited in dbSNP> 8 million genotypes deposited in dbSNP

Page 30: SNP Resources: Variation Discovery, HapMap and the EGP Mark J. Rieder Department of Genome Sciences mrieder@u.washington.edu NIEHS SNPs Workshop Jan 10-11,

Development of a genome-wide SNP map: How many SNPs?Development of a genome-wide SNP map: How many SNPs?

Nickerson and Kruglyak, Nature Genetics, 2001

~ 10 million SNPs (>1% MAF-estimated) - 1/300 bp~ 10 million SNPs (>1% MAF-estimated) - 1/300 bp

NIEHS SNPs = 1/166 bp (n = 95, 4 pops)NIEHS SNPs = 1/166 bp (n = 95, 4 pops)HapMap ENCODE = 1/160 bp (n = 48, 3 pops)HapMap ENCODE = 1/160 bp (n = 48, 3 pops)

Comprehensive resequencing can identify theComprehensive resequencing can identify the vast majority of SNPs in a regionvast majority of SNPs in a region

Page 31: SNP Resources: Variation Discovery, HapMap and the EGP Mark J. Rieder Department of Genome Sciences mrieder@u.washington.edu NIEHS SNPs Workshop Jan 10-11,

SNP Discovery: dbSNP database

Minor Allele Freq. (MAF)

dbSNP (Perlegen/HapMap)

Rarer and population specific SNPs are found by resequencing

60%60%

NIEHS SNPs

15%15%{ 25%25% {

Page 32: SNP Resources: Variation Discovery, HapMap and the EGP Mark J. Rieder Department of Genome Sciences mrieder@u.washington.edu NIEHS SNPs Workshop Jan 10-11,

Pathways

Mismatch repair

Double strand break repair

Apoptosis

Cell cycle

Transcription coupled repair

Page 33: SNP Resources: Variation Discovery, HapMap and the EGP Mark J. Rieder Department of Genome Sciences mrieder@u.washington.edu NIEHS SNPs Workshop Jan 10-11,

nsSNPs are found in ~60% of genesnsSNPs are found in ~60% of genes

Page 34: SNP Resources: Variation Discovery, HapMap and the EGP Mark J. Rieder Department of Genome Sciences mrieder@u.washington.edu NIEHS SNPs Workshop Jan 10-11,

Protein disrupting SNPs in EGP dataProtein disrupting SNPs in EGP data

Page 35: SNP Resources: Variation Discovery, HapMap and the EGP Mark J. Rieder Department of Genome Sciences mrieder@u.washington.edu NIEHS SNPs Workshop Jan 10-11,

SIFT = Sorting Intolerant From TolerantSIFT = Sorting Intolerant From TolerantEvolutionary comparison of non-synonymous SNPsEvolutionary comparison of non-synonymous SNPs

Classifications: Classifications: Tolerant, IntolerantTolerant, IntolerantNg and Henikoff, Genome Research, 12:436-446, 2002

PolyPhen - PolyPhen - PolyPolymorphism morphism PhenPhenotypingotypingStructural protein characteristics and evolutionary comparisonStructural protein characteristics and evolutionary comparison

Classifications:Classifications:benign, possibly damaging, probably damagingbenign, possibly damaging, probably damagingRamensky, et al., Nucleic Acids Research, 30:17:3894-3900, 2002

Computational inference of nsSNP functionComputational inference of nsSNP function

Page 36: SNP Resources: Variation Discovery, HapMap and the EGP Mark J. Rieder Department of Genome Sciences mrieder@u.washington.edu NIEHS SNPs Workshop Jan 10-11,

Deleterious nsSNPs are preferentially low frequencyDeleterious nsSNPs are preferentially low frequency

Page 37: SNP Resources: Variation Discovery, HapMap and the EGP Mark J. Rieder Department of Genome Sciences mrieder@u.washington.edu NIEHS SNPs Workshop Jan 10-11,

EGP and HapMap GenotypingEGP and HapMap GenotypingPDR = 90 ethnically diverse individuals representative of U.S. population

PDRPDRethnically diverseethnically diverse

representation of USrepresentation of USn = 90 (anonymous)n = 90 (anonymous)

397 genes/55,000 SNPs397 genes/55,000 SNPs

European (CEU,n=60)European (CEU,n=60)

African (YRI, n =60)African (YRI, n =60)

Asian (HCB, n = 45)Asian (HCB, n = 45)

Asian (JPT, n = 45)Asian (JPT, n = 45)

Hispanic (n = 60)Hispanic (n = 60)

Afr.-American (n = 62)Afr.-American (n = 62)

Select SNPsSelect SNPs

~7600 SNPs~7600 SNPsIllumina Illumina

GenotypesGenotypesAllele FrequencyAllele Frequency

HWEHWE

EGP p1 panelEGP p1 panel

Page 38: SNP Resources: Variation Discovery, HapMap and the EGP Mark J. Rieder Department of Genome Sciences mrieder@u.washington.edu NIEHS SNPs Workshop Jan 10-11,

Illumina NIEHS SNPs Genotyping

Each well samples 1536 SNPs in one individual

For each HapMap sample 5 x 1536 (7680 genotyped SNPs)

3,000,000 genotypes generated (total ~400 samples)

Page 39: SNP Resources: Variation Discovery, HapMap and the EGP Mark J. Rieder Department of Genome Sciences mrieder@u.washington.edu NIEHS SNPs Workshop Jan 10-11,

NIEHS SNPs Genotype Data

PDR (397 genes)PDR (397 genes)SNPs characterized in six SNPs characterized in six different major populationsdifferent major populations..

Page 40: SNP Resources: Variation Discovery, HapMap and the EGP Mark J. Rieder Department of Genome Sciences mrieder@u.washington.edu NIEHS SNPs Workshop Jan 10-11,

Yoruban

Japanese

CEPH

Chinese

AfricanAmerican

Hispanic

CEPH Japanese ChineseAfricanAmerican Hispanic

0.341 0.9310 . 2820.274

0.8410.626 0.533

0.415

0 . 410 0 . 7610 . 952

0 . 610

0 . 420

0 . 589

0 . 765

Population Allele Frequency CorrelationsIllumina NIEHS SNPs Genotyping

Page 41: SNP Resources: Variation Discovery, HapMap and the EGP Mark J. Rieder Department of Genome Sciences mrieder@u.washington.edu NIEHS SNPs Workshop Jan 10-11,

EGP and HapMap CompatibilityEGP and HapMap Compatibility

Panel 2 (HapMap based, n=95)Panel 2 (HapMap based, n=95)

European (CEU,n=60)European (CEU,n=60)

African (YRI, n =60)African (YRI, n =60)

Asian (HCB, n = 45)Asian (HCB, n = 45)

Asian (JPT, n = 45)Asian (JPT, n = 45)

Hispanic (n = 60)Hispanic (n = 60)

Afr.-American (n = 62)Afr.-American (n = 62)

European (CEU,n=22)European (CEU,n=22)

African (YRI, n =12)African (YRI, n =12)

Afr.-American (n = 15)Afr.-American (n = 15)

Asian (JPT & HCB, n = 24Asian (JPT & HCB, n = 24

Hispanic (n = 22)Hispanic (n = 22)

HapMap/US Pop.HapMap/US Pop.n = 332n = 332

Page 42: SNP Resources: Variation Discovery, HapMap and the EGP Mark J. Rieder Department of Genome Sciences mrieder@u.washington.edu NIEHS SNPs Workshop Jan 10-11,

EGP p2 DNA panelHapMap Integration (~4 million SNPs)

= EGP SNP discovery (1/200 bp)= HapMap SNPs (~1/1000 bp)

High Density Genic Coverage (EGP)High Density Genic Coverage (EGP)

Low Density Genome Coverage (HapMap)Low Density Genome Coverage (HapMap)

Page 43: SNP Resources: Variation Discovery, HapMap and the EGP Mark J. Rieder Department of Genome Sciences mrieder@u.washington.edu NIEHS SNPs Workshop Jan 10-11,

Summary: The Current State of SNP ResourcesSummary: The Current State of SNP Resources

Approximately 10 million Approximately 10 million commoncommon SNPs exist in the human SNPs exist in the human genome genome

(1/300 bp).(1/300 bp).

Random SNP discovery processes generate many SNPs (HapMap)Random SNP discovery processes generate many SNPs (HapMap)

Random approaches to SNP discovery have reached limits of Random approaches to SNP discovery have reached limits of discoverydiscovery

and validation (1/600 bp; 50% SNP validation)and validation (1/600 bp; 50% SNP validation)

Most validated SNPs (5+ million) have been genotyped by the Most validated SNPs (5+ million) have been genotyped by the HapMap (3 pops)HapMap (3 pops)

Resequencing approaches continue to catalog important variants Resequencing approaches continue to catalog important variants (rare (rare

and common not captured by the HapMap)and common not captured by the HapMap)

NIEHS SNPs has generated SNP data on >600 candidate genes NIEHS SNPs has generated SNP data on >600 candidate genes and 90 K SNPs along with large-scale HapMap+ and 90 K SNPs along with large-scale HapMap+ characterization data for of SNPs generated using the PDRcharacterization data for of SNPs generated using the PDR