The Ashkenazi Genome Project
![Page 1: The Ashkenazi Genome Project](https://reader035.vdocument.in/reader035/viewer/2022062222/5681674a550346895ddbfae2/html5/thumbnails/1.jpg)
The Ashkenazi Genome Project
Shai Carmi
Pe’er lab, Columbia University
and
The Ashkenazi Genome Consortium (TAGC)
Personal Genomes & Medical Genomics
Cold Spring Harbor, NY
November 2012
![Page 2: The Ashkenazi Genome Project](https://reader035.vdocument.in/reader035/viewer/2022062222/5681674a550346895ddbfae2/html5/thumbnails/2.jpg)
Recent History of Ashkenazi Jews
• Mediterranean origin (?)
• Ca. 1000: Small communities in N. France, Rhineland
• Migration east
• Expansion
• ~10M today, mostly in US and Israel
• Relative isolation
![Page 3: The Ashkenazi Genome Project](https://reader035.vdocument.in/reader035/viewer/2022062222/5681674a550346895ddbfae2/html5/thumbnails/3.jpg)
Ashkenazi Jewish Genetics
• Recently, AJ shown to be a genetically distinct group
• Close to Middle-Eastern & South-European populations
Price et al., PLoS Genetics 2008. Olshen et al., BMC Genetics 2008. Need et al., Genome Biology 2009. Kopelman et al., BMC Genetics 2009.
Behar et al., Nature 2010. Bray et al., PNAS 2010. Guha et al., Genome Biology 2012.
300 Jewish individuals; SNP arrays
[Figure: population groups labeled AJ, Jewish non-AJ, Middle-Eastern, and Europeans. Atzmon et al., AJHG 2010.]
![Page 4: The Ashkenazi Genome Project](https://reader035.vdocument.in/reader035/viewer/2022062222/5681674a550346895ddbfae2/html5/thumbnails/4.jpg)
Recent Demography & IBD
In small populations, common ancestors are likely recent.
[Figure: individuals A and B.]
![Page 5: The Ashkenazi Genome Project](https://reader035.vdocument.in/reader035/viewer/2022062222/5681674a550346895ddbfae2/html5/thumbnails/5.jpg)
Recent Demography & IBD
In small populations, common ancestors are likely recent.
[Figure: individuals A and B; a shared segment.]
• Many long haplotypes identical-by-descent (IBD)
• IBD is highly informative on recent history!
• IBD common in AJ (Gusev et al., MBE 2011).
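The statement that common ancestors are likely recent in small populations can be made concrete with a minimal sketch. It assumes an idealized constant-size Wright-Fisher population of N diploid individuals (an assumption, not something stated on the slide), in which two lineages find a common ancestor in any given generation with probability 1/(2N). The function names and the example sizes are illustrative only.

```python
def prob_common_ancestor_within(generations: int, diploid_size: int) -> float:
    """Probability that two lineages coalesce within `generations` generations.

    Assumes a constant-size Wright-Fisher population (a simplifying assumption).
    """
    per_generation = 1.0 / (2 * diploid_size)
    return 1.0 - (1.0 - per_generation) ** generations


def expected_generations_to_common_ancestor(diploid_size: int) -> float:
    """Mean of the geometric coalescence time: 2N generations."""
    return 2.0 * diploid_size


# Illustrative sizes only: a small and a large population.
for size in (500, 50_000):
    p = prob_common_ancestor_within(30, size)
    print(f"N={size}: P(common ancestor within 30 generations) = {p:.4f}")
```

Under these assumptions the smaller population gives a much higher chance of a common ancestor in the last 30 generations. Recent ancestors leave fewer meioses to break up shared haplotypes, which is why long IBD segments are expected to be more common in small populations.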
![Page 6: The Ashkenazi Genome Project](https://reader035.vdocument.in/reader035/viewer/2022062222/5681674a550346895ddbfae2/html5/thumbnails/6.jpg)
AJ Genetic History
Expansion rate ≈34% per generation
[Figure: effective size N versus time t (years ago). Labeled values: 2,300; 45,000; 270; 4,300,000. Time axis marks: 800 years ago; present.]
Palamara et al., AJHG 2012
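As a rough consistency check on the figure's numbers, the sketch below computes how many generations of growth at 34% per generation are needed to go from 270 to 4,300,000. It reads the figure labels as a bottleneck of effective size 270 about 800 years ago followed by expansion to 4,300,000 at present, and it assumes discrete compounding, N_next = N × (1 + rate). Both are assumptions made here for illustration. The function name is hypothetical.

```python
import math


def generations_to_grow(start_size: float, end_size: float, rate: float) -> float:
    """Generations needed to grow from start_size to end_size.

    Assumes discrete compounding per generation: N_next = N * (1 + rate).
    """
    return math.log(end_size / start_size) / math.log(1.0 + rate)


# Values read from the slide; their pairing is an assumption (see lead-in).
generations = generations_to_grow(270, 4_300_000, 0.34)
years_per_generation = 800 / generations
print(f"{generations:.1f} generations, {years_per_generation:.1f} years per generation")
```

With these inputs the growth phase spans about 33 generations, which over 800 years corresponds to roughly 24 years per generation.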
High potential for genetic studies!
[Figure: power of imputation by IBD. Y-axis: % additional information potential (0% to 100%). X-axis: # of sequenced individuals (0 to 500). Series: WTCCC, AJ_SCZ, AJUK.]
![Page 7: The Ashkenazi Genome Project](https://reader035.vdocument.in/reader035/viewer/2022062222/5681674a550346895ddbfae2/html5/thumbnails/7.jpg)
The Ashkenazi Genome Consortium
Phase I:
• 58 AJ personal genomes (86 under way)
• ~60yo, healthy controls
• Unrelated, PCA-validated AJ
• Technology: Complete Genomics
Goal:
• Sequence to high coverage hundreds of healthy AJ
  o Use as a reference panel for association studies, imputation, and clinical interpretation
  o Understand population history and functional genetic variation in AJ
![Page 8: The Ashkenazi Genome Project](https://reader035.vdocument.in/reader035/viewer/2022062222/5681674a550346895ddbfae2/html5/thumbnails/8.jpg)
Quality Control

| Property | Genome (exome) |
| --- | --- |
| Coverage | ~55x |
| Fraction called | 96.5±0.003% (98%) |
| Fraction with coverage > 20x | 92.4±0.018% (94.9%) |
| Concordance with SNP array | 99.87±0.1% |
| Ti/Tv ratio | 2.14±0.003 (3.05) |

[Figure: Ti/Tv.]
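For readers unfamiliar with the Ti/Tv metric, here is a minimal sketch of how a transition/transversion ratio can be computed from a list of single-base substitutions. Transitions are purine-to-purine (A↔G) or pyrimidine-to-pyrimidine (C↔T) changes; all other substitutions are transversions. The function names and the toy input are illustrative and are not taken from the TAGC pipeline.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref: str, alt: str) -> bool:
    """True when both bases are purines or both are pyrimidines."""
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ti_tv_ratio(snps) -> float:
    """Ti/Tv for an iterable of (ref, alt) single-base substitutions."""
    transitions = 0
    transversions = 0
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        if ref == alt:
            continue  # not a substitution; skip
        if is_transition(ref, alt):
            transitions += 1
        else:
            transversions += 1
    if transversions == 0:
        raise ValueError("no transversions; ratio is undefined")
    return transitions / transversions


# Toy input: 4 transitions (A>G, C>T, G>A, T>C) and 2 transversions (A>C, T>G).
example_snps = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "A"), ("T", "G"), ("T", "C")]
print(ti_tv_ratio(example_snps))
```

A ratio close to the value expected for the region type (genome-wide versus exome) is commonly used as a sanity check on SNP calls. Random errors tend to pull the ratio toward 0.5, because each base has two possible transversions and only one possible transition.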
![Page 9: The Ashkenazi Genome Project](https://reader035.vdocument.in/reader035/viewer/2022062222/5681674a550346895ddbfae2/html5/thumbnails/9.jpg)
Variant Statistics & Comparison to Europeans
[Figure: per-genome variant statistics for TAGC versus 14 Flemish genomes (Belgium). Panels: All SNPs (M); Het/hom; Insertions, Deletions, and MNPs (k).]
Similar results in 13 CG European public genomes.
![Page 10: The Ashkenazi Genome Project](https://reader035.vdocument.in/reader035/viewer/2022062222/5681674a550346895ddbfae2/html5/thumbnails/10.jpg)
Comparison to Europeans
• Allele frequency spectrum:
  – No excess singletons.
  – Slight excess of doubletons.
• More novel SNPs in AJ (3.8% vs. 3.1%).
[Figure: allele frequency spectrum, with singletons and doubletons marked.]
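The allele frequency spectrum referred to above counts sites by how many copies of the alternate allele appear in the sample: singletons have one copy, doubletons two. The sketch below shows one simple way to tabulate it. It assumes genotypes are given as per-individual alternate-allele counts (0, 1, or 2); the function name and the toy data are illustrative.

```python
from collections import Counter


def allele_count_spectrum(genotypes) -> Counter:
    """Count polymorphic sites by number of alternate-allele copies.

    `genotypes` is a list of sites; each site is a list of per-individual
    alternate-allele counts (0, 1, or 2). Sites where the alternate allele
    is absent or fixed in the sample are skipped.
    """
    spectrum = Counter()
    for site in genotypes:
        copies = sum(site)
        if 0 < copies < 2 * len(site):
            spectrum[copies] += 1
    return spectrum


# Toy data: 6 sites genotyped in 4 individuals.
toy_genotypes = [
    [0, 1, 0, 0],  # singleton
    [1, 1, 0, 0],  # doubleton (two heterozygotes)
    [0, 0, 0, 2],  # doubleton (one homozygote)
    [0, 0, 0, 0],  # alternate allele absent: skipped
    [2, 2, 2, 2],  # alternate allele fixed: skipped
    [1, 0, 0, 0],  # singleton
]
spectrum = allele_count_spectrum(toy_genotypes)
print("singletons:", spectrum[1], "doubletons:", spectrum[2])
```

Comparing the singleton and doubleton classes between two cohorts requires the same sample size in both, since the spectrum depends on the number of chromosomes sampled.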
![Page 11: The Ashkenazi Genome Project](https://reader035.vdocument.in/reader035/viewer/2022062222/5681674a550346895ddbfae2/html5/thumbnails/11.jpg)
Quality Control (2)
False positive rate assessment by runs of homozygosity (ROH):
• Assume hets in high-confidence ROH are FP.
• Genome-wide extrapolation: ~20,000 per genome.
• QC:
  – Discard putatively low-quality variants
  – Discard HWE violations, low call rate
FP after QC: ~5,000 per genome.
[Figure: paternal and maternal haplotypes within an ROH, with hets marked.]
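The genome-wide extrapolation can be sketched as a simple rate calculation: count heterozygous calls inside high-confidence runs of homozygosity, divide by the total length of those runs, and scale to the genome length. The sketch assumes the false positive rate per base is uniform across the genome, which is a simplification. The function name and the input numbers are hypothetical and are not the values used by the authors.

```python
def extrapolate_false_positives(
    hets_in_roh: int, roh_length_bp: int, genome_length_bp: int
) -> float:
    """Scale the het count observed inside ROH to the whole genome.

    Assumes every het inside a high-confidence ROH is a false positive and
    that the per-base false positive rate is uniform genome-wide.
    """
    if roh_length_bp <= 0:
        raise ValueError("roh_length_bp must be positive")
    return hets_in_roh * genome_length_bp / roh_length_bp


# Hypothetical toy numbers, chosen only to keep the arithmetic easy to follow.
estimate = extrapolate_false_positives(
    hets_in_roh=5, roh_length_bp=1_000_000, genome_length_bp=1_000_000_000
)
print(f"Estimated false positives genome-wide: {estimate:.0f}")
```

The same function can be applied before and after filtering to see how much a QC step reduces the estimated false positive count.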
![Page 12: The Ashkenazi Genome Project](https://reader035.vdocument.in/reader035/viewer/2022062222/5681674a550346895ddbfae2/html5/thumbnails/12.jpg)
Applicability to Clinical Genomics
• Variants of unknown significance
  – Technical false positives
  – True variants without health impact
[Figure: novel variants per sample, in two panels (Total; Non-synonymous). Bars: All, After QC, Not in panel (annotated "Not in TAGC").]
![Page 13: The Ashkenazi Genome Project](https://reader035.vdocument.in/reader035/viewer/2022062222/5681674a550346895ddbfae2/html5/thumbnails/13.jpg)
Demographic Inference
• Use allele frequency spectrum and coalescent simulations.
• Assume the demographic model previously mentioned.
• Parameters qualitatively similar to those inferred from IBD.
• Bottleneck 35 generations before present (gbp) of size 500; pre-bottleneck size 90,000.
[Figure: % sites on a logarithmic axis (0.1 to 100).]
![Page 14: The Ashkenazi Genome Project](https://reader035.vdocument.in/reader035/viewer/2022062222/5681674a550346895ddbfae2/html5/thumbnails/14.jpg)
Summary
• IBD reveals AJ population bottleneck and expansion and potential for genetic studies.
• High-quality genomes sequenced by TAGC indicate utility in clinical setting.
• Confirm demography and demonstrate subtle differences from Europeans.
• Ongoing analysis:
  – Imputation power using TAGC vs. 1kG as ref panels
  – Local ancestry inference
  – Functional variants; AJ disease genes
  – Mobile element insertions
![Page 15: The Ashkenazi Genome Project](https://reader035.vdocument.in/reader035/viewer/2022062222/5681674a550346895ddbfae2/html5/thumbnails/15.jpg)
Thank you!
TAGC consortium members:
Columbia University Computer Science: Itsik Pe’er, Pier Francesco Palamara
  Undergrads: Fillan Grady, Ethan Kochav, James Xue
  IT: Shlomo Hershkop
Long-Island Jewish Medical Center: Todd Lencz, Semanti Mukherjee, Saurav Guha
Columbia University Medical Center: Lorraine Clark, Xinmin Liu
Albert Einstein College of Medicine: Gil Atzmon, Harry Ostrer
Mount Sinai School of Medicine: Inga Peter, Laurie Ozelius
Memorial Sloan Kettering Cancer Center: Ken Offit, Vijai Joseph
Yale School of Medicine: Judy Cho, Ken Hui, Monica Bowen
The Hebrew University of Jerusalem: Ariel Darvasi
Funding: Human Frontiers Science program.
VIB, Gent, Belgium: Herwig Van Marck, Stephane Plaisance
Complete Genomics: Jason Laramie
![Page 16: The Ashkenazi Genome Project](https://reader035.vdocument.in/reader035/viewer/2022062222/5681674a550346895ddbfae2/html5/thumbnails/16.jpg)
Formal Inference Using IBD
• Assume a population of historical size [symbol lost].
• Total shared segments of length [symbol lost]: [formula lost]
[Figure: individuals A and B; a shared segment.]
Palamara et al., AJHG 2012
• Detect IBD in sample → Infer history [symbol lost].
![Page 17: The Ashkenazi Genome Project](https://reader035.vdocument.in/reader035/viewer/2022062222/5681674a550346895ddbfae2/html5/thumbnails/17.jpg)
Data processing
• CGA tools VCF generator: called sites only.
• Correct multi-nucleotide substitution bug.
• Compress, index, and distribute.
• Generate high-quality genotypes set for population genetic analyses.
  – Remove indels and multi-nucleotide substitutions.
  – Remove low-quality SNPs.
  – Remove multi-allelic SNPs.
  – Remove half-calls.
  – Remove SNPs with high no-call rate.
  – Remove SNPs not in Hardy-Weinberg equilibrium.
  – Remove monomorphic reference SNPs.
  – Remove an inbred individual.
  – Format as Plink file.
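Three of the site-level filters listed above (high no-call rate, Hardy-Weinberg violations, monomorphic reference sites) can be sketched as follows. The slides do not say which HWE test or which thresholds were used. The Pearson chi-square test and the cutoffs below are assumptions chosen for illustration, and an exact test is often preferred for small samples. All function and parameter names are hypothetical.

```python
def hwe_chi_square(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Pearson chi-square statistic (1 d.f.) for deviation from HWE."""
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        raise ValueError("no called genotypes")
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_ref, n_het, n_hom_alt)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def passes_filters(
    n_hom_ref: int,
    n_het: int,
    n_hom_alt: int,
    n_no_call: int,
    max_no_call_rate: float = 0.1,  # hypothetical threshold
    max_chi_square: float = 10.83,  # about p = 0.001 at 1 d.f.; hypothetical choice
) -> bool:
    """Apply no-call, monomorphic-reference, and HWE filters to one SNP."""
    n_called = n_hom_ref + n_het + n_hom_alt
    total = n_called + n_no_call
    if n_called == 0 or n_no_call / total > max_no_call_rate:
        return False  # too many missing genotypes
    if n_het == 0 and n_hom_alt == 0:
        return False  # every called genotype is homozygous reference
    return hwe_chi_square(n_hom_ref, n_het, n_hom_alt) <= max_chi_square


print(passes_filters(25, 50, 25, 0))  # genotype counts match HWE expectations
print(passes_filters(50, 0, 50, 0))   # no heterozygotes at allele frequency 0.5
```

In practice these filters would be applied per site across the whole cohort before exporting to Plink format; this sketch only shows the per-site decision.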
![Page 18: The Ashkenazi Genome Project](https://reader035.vdocument.in/reader035/viewer/2022062222/5681674a550346895ddbfae2/html5/thumbnails/18.jpg)
Variant statistics

| Statistic | Per genome (exome) |
| --- | --- |
| Total SNPs | 3.4M (22k) |
| Novel SNPs | 3.7% (4%) |
| Het/hom ratio | 1.64 (1.67) |
| Insertions count | 223k (246) |
| Deletions count | 237k (218) |
| Substitutions count | 83k (374) |
| Synonymous SNPs | 10525 |
| Non-synonymous SNPs | 9695 |
| Nonsense SNPs | 71 |
| Other disrupting | 241 |
| CNV count | 336 |
| SV count | 1486 |
| MEI count | 3475 |