NHLBI Exome Sequencing Program: Myocardial Infarction Project Team Update
Sekar Kathiresan, MD
April 7, 2010
Myocardial infarction
• Leading cause of death in US
• 565,000 new MI cases annually
• Average age at first MI: males 66 y, females 70 y
Thom, Circulation 2006
ESP MI Project Team
Heart GO, Seattle GO, WHISP, Broad GO
Chris O’Donnell (co-convener)
Bruce Psaty
Debbie Nickerson
Chris Carlson
Sekar Kathiresan (co-convener)
John Spertus
Eric Boerwinkle
Alex Reiner
Becky Jackson
Goncalo Abecasis
Nathaniel Stitziel
Greg Burke Herman Taylor
Gerardo Heiss
Sharon Cresci Shamil Sunyaev
Adrienne Cupples
Russ Tracy
Charles Kooperberg
Ron Do Hooman Allayee
Aaron Folsom
Alex Reiner Deb Farlow David Altshuler
David Herrington
Jacques Rossouw
Stacey Gabriel David Siscovick
Leslie Lange Stanley Hazen Steve Schwartz
Ethan Lange Daniel Rader
Jim Wilson Muredach Reilly
Outline
1. Study design
2. Samples
3. Sequencing Update
1. Study design
Study design informed by three observations
• The younger the age at MI, the greater the heritability
• Selecting the extremes of a trait distribution is likely to improve power
• Genetic discovery may be enhanced by studying multiple ethnicities (e.g., low-frequency PCSK9 nonsense mutations present only in blacks)
What MI age threshold to set?
• ‘Extremeness’
• Availability in population-based cohorts
Lloyd-Jones, Circulation 2009
Males ≤50 years of age & Females ≤60 years of age
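The sex-specific thresholds above amount to a simple eligibility rule. A minimal sketch, assuming sex is coded as "M" or "F" and age at first MI is given in years (the function and variable names are illustrative, not from the project):

```python
# Age-at-first-MI thresholds quoted on the slide: males <= 50, females <= 60.
# The "M"/"F" coding and all names here are assumptions for illustration.
CASE_AGE_THRESHOLD = {"M": 50, "F": 60}


def is_early_onset_case(sex: str, age_at_first_mi: float) -> bool:
    """Return True if a first MI at this age meets the early-onset case definition."""
    return age_at_first_mi <= CASE_AGE_THRESHOLD[sex]


print(is_early_onset_case("M", 48))
print(is_early_onset_case("F", 61))
```

The thresholds are inclusive, matching the "≤" on the slide.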
Which referent group to go with cases?
Young MI
Old Without MI
“hyper-normal” controls
Potential advantages:
1. Controls may be enriched for protective alleles
2. Cases may be enriched for risk alleles
Which referent group to go with cases?
Young MI
Old Without MI
“hyper-normal” controls
1. Highest predicted Framingham risk score
2. Absence of prevalent or incident MI
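One way to read the two control criteria is as a filter followed by a ranking. The sketch below is only an illustration under that reading: the `Participant` record, its field names, and the risk values are hypothetical, and the age cut-off for "old" is not stated on the slide, so it is left out.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    # Hypothetical record layout; not the project's actual phenotype schema.
    participant_id: str
    framingham_risk: float  # predicted risk on an arbitrary illustrative scale
    prevalent_mi: bool
    incident_mi: bool


def select_hypernormal_controls(cohort, n_controls):
    """Keep MI-free participants, then take those with the highest predicted risk."""
    mi_free = [p for p in cohort if not (p.prevalent_mi or p.incident_mi)]
    ranked = sorted(mi_free, key=lambda p: p.framingham_risk, reverse=True)
    return ranked[:n_controls]


cohort = [
    Participant("A", 0.30, False, False),
    Participant("B", 0.45, False, True),   # excluded: incident MI
    Participant("C", 0.12, False, False),
    Participant("D", 0.38, False, False),
]
selected_ids = [p.participant_id for p in select_hypernormal_controls(cohort, 2)]
print(selected_ids)
```

Participant B has the highest predicted risk but is excluded because of the incident MI, which is the point of combining the two criteria.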
2. Samples
Cases and controls
Source of cases:
• FHS
• ARIC
• JHS
• WHISP
• MGH Premature Coronary Artery Disease Study
• Heart Attack Risk in Puget Sound
• Cleveland Clinic GeneBank
• TRIUMPH
• PennCATH
Source of controls:
• FHS
• ARIC
• JHS
• CHS
• WHISP
Allocation of sample number
• Initial allocation by ESP Steering Committee – 800 cases and controls (whites and blacks)
• Subsequently expanded by Steering Committee – ~1300 cases and controls
Sample progress
Between the last Steering Committee meeting (12/09) and March 2010, all samples arrived in the Broad lab!
Samples are being sequenced in the order in which they arrived in the lab
Cohort                     N       Arrived in lab   Status
MGH PCAD, HARPS, TRIUMPH   152     Jan 15, 2010     Off sequencers
PennCATH                   42      Jan 15, 2010     On sequencers
TRIUMPH                    85      Jan 15, 2010     On sequencers
WHISP                      166     Feb 1, 2010      On sequencers
Cleveland Clinic           96      Feb 1, 2010      In lab
CHS                        103     Feb 17, 2010     In lab
FHS                        184     Feb 24, 2010     In lab
JHS                        229     Mar 9, 2010      In lab
ARIC                       212     Mar 24, 2010     In lab; 410 sent
Total                      1,269
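As a quick arithmetic check, the per-cohort counts in the table can be summed and compared with the stated total. The dictionary below simply copies the N column; the variable names are illustrative.

```python
# Per-cohort sample counts copied from the "Sample progress" table.
cohort_counts = {
    "MGH PCAD, HARPS, TRIUMPH": 152,
    "PennCATH": 42,
    "TRIUMPH": 85,
    "WHISP": 166,
    "Cleveland Clinic": 96,
    "CHS": 103,
    "FHS": 184,
    "JHS": 229,
    "ARIC": 212,
}

total = sum(cohort_counts.values())
print(f"{total:,}")  # 1,269, matching the table's Total row
```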
3. Sequencing Update
Sequencing update
March 15 target: Delivery of first 40 samples for MI Project Team
First 40 samples sequenced across exome (all European ancestry)
Mean age at MI, males (N=22): 33 years
Mean age at MI, females (N=18): 37 years
Sample | PF HQ Aligned Bases (Paired & Unpaired) | Target Territory (HS) | Mean Target Coverage (HS) | PCT Target Bases 20x (HS) | PCT Target Bases 30x (HS)
1 10,320,683,544 32,971,708 167.59 89.25 86.31
2 9,606,986,204 32,971,708 154.37 88.32 85.14
3 11,010,551,751 32,971,708 181.64 89.62 86.88
4 11,167,296,563 32,971,708 189.78 90.48 88.01
5 10,631,043,988 32,971,708 173.34 89.55 86.73
6 12,068,244,230 32,971,708 208.18 90.24 87.92
7 11,777,326,372 32,971,708 192.87 90.32 87.74
8 10,350,201,075 32,971,708 169.24 89.73 86.84
9 12,201,227,331 32,971,708 197.17 90.21 87.70
10 10,300,748,023 32,971,708 166.04 89.68 86.80
11 9,581,075,563 32,971,708 159.42 88.83 85.81
12 11,696,544,575 32,971,708 191.39 89.56 86.97
13 12,138,358,605 32,971,708 197.11 89.55 86.94
14 9,906,318,743 32,971,708 147.49 88.88 85.57
15 8,873,299,764 32,971,708 129.13 87.98 84.60
16 10,150,639,745 32,971,708 165.97 88.87 85.93
17 10,999,051,041 32,971,708 174.50 89.86 87.09
18 12,525,435,711 32,971,708 203.56 90.32 87.91
19 11,750,069,741 32,971,708 189.34 89.17 86.50
20 10,616,102,095 32,971,708 173.44 88.98 86.10
21 9,883,832,298 32,971,708 159.95 89.43 86.40
22 11,421,540,649 32,971,708 191.53 89.66 87.05
23 9,647,339,469 32,971,708 150.35 88.73 85.63
24 12,491,458,855 32,971,708 213.70 91.12 88.85
25 11,214,832,975 32,971,708 180.40 89.89 87.17
26 10,722,047,765 32,971,708 175.01 89.93 87.20
27 11,592,214,047 32,971,708 186.55 89.45 86.73
28 13,630,697,959 32,971,708 219.26 90.50 88.15
29 10,848,040,416 32,971,708 176.13 88.87 86.07
30 12,857,424,099 32,971,708 207.22 89.57 87.04
31 10,423,913,384 32,971,708 171.02 89.75 86.93
32 10,165,910,403 32,971,708 159.60 88.47 85.50
33 11,014,794,689 32,971,708 179.47 89.93 87.24
34 11,589,784,066 32,971,708 187.08 89.39 86.69
35 7,313,314,264 32,971,708 125.73 85.46 81.16
36 10,577,442,893 32,971,708 175.70 89.34 86.60
37 12,483,255,877 32,971,708 200.99 89.54 86.93
38 12,734,293,183 32,971,708 203.62 89.77 87.21
39 11,464,914,441 32,971,708 183.10 89.68 87.07
40 12,750,019,766 32,971,708 201.08 90.19 87.63
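Rows in this layout are straightforward to parse for summary statistics. The sketch below is a hedged illustration: it uses only three rows copied from the table above (samples 1, 2, and 35), and the field names are assumptions chosen to mirror the column headers.

```python
import statistics

# Three rows copied verbatim from the table (samples 1, 2, and 35).
ROWS = """\
1 10,320,683,544 32,971,708 167.59 89.25 86.31
2 9,606,986,204 32,971,708 154.37 88.32 85.14
35 7,313,314,264 32,971,708 125.73 85.46 81.16
"""


def parse_row(line):
    """Split one whitespace-delimited row into typed fields (names are illustrative)."""
    sample, bases, territory, mean_cov, pct20, pct30 = line.split()
    return {
        "sample": int(sample),
        "aligned_bases": int(bases.replace(",", "")),
        "target_territory": int(territory.replace(",", "")),
        "mean_coverage": float(mean_cov),
        "pct_20x": float(pct20),
        "pct_30x": float(pct30),
    }


records = [parse_row(line) for line in ROWS.splitlines() if line.strip()]
mean_cov = statistics.mean(r["mean_coverage"] for r in records)
lowest = min(records, key=lambda r: r["pct_20x"])
print(round(mean_cov, 2), lowest["sample"])
```

Feeding all 40 rows through the same parser would give cohort-wide figures. The values printed here describe only the three rows shown.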
GSA Analysis Tearsheet
Analysis Team ([email protected]) in Genome Sequencing and Analysis
Program in Medical and Population Genetics, Broad Institute
ESP: EOMI Project Summary (C315)
Used samples: 147
Unused samples: 0 low quality, 0 with no usable lanes, 53 in flight
Sequencing protocol: Hybrid selection
Bait design: whole_exome_agilent_designed_120
Target size: 32,206,937 bases
Sequencing Summary
Sequencer: Illumina GA2
Used lanes: 433/436
Unused lanes: 3 rejected by sequencing, 0 by analysis
Used lanes/sample: 3.0 ± 0.2 (median=3)
Lane parities: 433 paired
Read lengths: 75.8 ± 0.2
Sequencing dates: 2010-03-03 to 2010-03-15
Bases Summary (excluding unused lanes/samples)
                       Per lane       Per sample
Reads                  65M ± 6M       190M ± 24M
Used bases             3.5B ± 0.4B    6B ± 1B
Target coverage        63x ± 11x      177x ± 37x
% loci > 10x covered   85% ± 11%      90% ± 14%
% loci > 20x covered   76% ± 12%      86% ± 15%
% loci > 30x covered   68% ± 11%      83% ± 15%
Variant Summary
              Found    Est. FP rate
All SNPs      ~115K    ~5%
Known SNPs    ~57K     <1%
Novel SNPs    ~57K     ~10%
Indels/CNVs   (functionality not available yet)
3/9/10
[Figures: Breakdown of functional SNPs; Ti/Tv ratio for known and novel SNPs]
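The variant summary's approximate counts and estimated false-positive rates imply rough numbers of false-positive calls. The sketch below is back-of-the-envelope arithmetic only: the tearsheet values are rounded, and the "<1%" rate for known SNPs is treated as an upper bound of 1%.

```python
# (category, approx. SNPs found, approx. estimated false-positive rate in percent)
# Values are the rounded tearsheet figures; "<1%" is entered as 1, an upper bound.
variant_summary = [
    ("All SNPs", 115_000, 5),
    ("Known SNPs", 57_000, 1),
    ("Novel SNPs", 57_000, 10),
]

# Integer percentages keep the arithmetic exact.
expected_fp = {name: found * pct / 100 for name, found, pct in variant_summary}

for name, count in expected_fp.items():
    print(f"{name}: about {count:,.0f} expected false positives")
```

On these rounded inputs, nearly all expected false positives fall among the novel SNPs, which is consistent with the higher estimated rate for that category.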
Summary
• Exome sequencing study designed to evaluate extremes of MI trait distribution
• ESP MI Project study samples have all been assembled at sequencing center
• Robust sequencing results for first 40 exomes
Next steps
• Assemble phenotypes from all contributing sites
• Deposit phenotypes into dbGaP anteroom
• 1st freeze with 400 cases and controls – end June
• Define initial analysis plan (Goncalo, Shamil, Stat Gen Working Group)
• Decision re 96 MI cases sequenced through Exome 1
• Decision re additional controls sent by cohorts