Perspectives of identifying Korean genetic variations
DESCRIPTION
Single nucleotide polymorphisms (SNPs) are the most frequent type of genetic variation in the human genome. SNPs are among the best-characterized genetic markers and are useful for research on human disease genomics as well as human population stratification. More recently, a type of structural variation in the genome, copy number variation (CNV), has received attention in the hope that it can provide genetic information that SNPs cannot. To gain insight into Korean-specific genetic markers, we analyzed 54,794 SNPs from 159 individuals in 10 regions of Korea (CheonAn, NaJu, GimJe, UlSan, Jeju, YeonCheon, JeCheon, GoRyeong, GyeongJu, PyeongChang) together with data obtained from 1,629 individuals in the Pan-Asia data set (70 populations). In addition, we analyzed a considerable number of CNVs typed from 16 pairs of twins in Korea. We identified several informative SNP markers that are valuable for distinguishing Koreans from other ethnic groups. The distribution of identity by descent (IBD) distance within a large Korean family also provided a way to examine relationships between individuals. Another interesting finding is the difference in CNV patterns between identical twins. A possible further benefit is the application of genotype data to predicting individual phenotypes (such as pigmentation, eye color, hair color, height, and blood type), in the hope of building a composite sketch (montage) of, or even identifying, individuals such as criminal suspects from genotype data in the near future. In this presentation, I will discuss the results of this study, which may represent the most comprehensive characterization of the Korean genome to date.
TRANSCRIPT
Perspectives of identifying Korean genetic variations
Chang Bum Hong
Center for Genome Science, KNIH, KCDC
Dec 1, 2009
Contents
• Population Structure Based on SNP Genotypes
• SNP Based Kinship
• Identify Monozygotic Twins using CNV
• SNP Based Physical Traits
Population Structure Based on SNP Genotypes
♥
Hello
♥
こんにちは (Hello)
Objectives
• Basic study of Korean population stratification
• Evidence of gene flow between Korea and neighboring countries
• Informative markers for East Asians
East Asia
Body mass index
Waist-hip ratio
Height
Blood pressure
Pulse rate
Bone density
0.1% difference between individuals
> 10M SNPs in population
Confounding in genetic studies
Dataset            SNPs        Individuals   Populations
PASNP              54,794      1,928         75
HGDP               2,834~      1,056         52
HapMap             1,481,135   1,397         11
SGVP               268,667     292           3
Korean             58,625      159           10
China (Yanbian)    58,625      16            1
Japan (Kobe)       58,625      5             1
Korea-Japan        58,625      6             1
Vietnam            58,625      16            1
Korean-Vietnam     58,625      8             1
Cambodia           58,625      16            1
Mongol             58,625      16            1
East Asia - Public genotype data
a. Pan-Asian SNP Consortium (http://www4a.biotec.or.th/PASNP)
b. Singapore Genome Variation Project (http://www.nus-cme.org.sg/SGVP)
HGDP (Human Genome Diversity Project)
PASNP (Pan-Asian SNP Consortium)
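[Map: Korean sampling regions GyeongJu, UlSan, GoRyeong, GimJe, NaJu, Jeju, YeonCheon, CheonAn, JeCheon, PyeongChang; map labels SE, SW, MW]
Sample sizes shown on the map: 15 in one region and 16 in each of the other nine regions
Average age > 70 years, long settlement
Affymetrix 50K Xba
58,960 SNPs
Additional populations: China (Yanbian), Japan (Kobe), Korea-Japan, Vietnam, Korean-Vietnam, Cambodia, Mongol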
Korean Data
Quality Control
58,960 SNPs, n = 242 (Korean n = 159)
autosomal 54,794 SNPs
Filters:
• high missing individual genotype call rate (> 3%, mind 0.03)
• high missing genotype call rate (> 4%, geno 0.04)
• low MAF (< 0.01, maf 0.01)
• Hardy-Weinberg test (p < 1 × 10^-6, hwe 0.000001)
n = 230 (Korean n = 153), 46,559 SNPs
join HapMap CHB: n = 367 (230 + 137), 26,189 SNPs
join HapMap JPT: n = 480 (367 + 113), 25,796 SNPs
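The four filters on the Quality Control slide can be sketched in plain Python. This is only an illustration under stated assumptions: genotypes are coded 0/1/2 with `None` for a missing call, the function names `qc_filter` and `hwe_chi2_p` are hypothetical, and the Hardy-Weinberg check uses a 1-df chi-square approximation, which may differ from the test used in the study.

```python
import math

def hwe_chi2_p(n_aa, n_ab, n_bb):
    """Chi-square (1 df) p-value for Hardy-Weinberg equilibrium (approximation)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2))

def qc_filter(genotypes, mind=0.03, geno=0.04, maf=0.01, hwe=1e-6):
    """genotypes: one list per individual; entries are 0/1/2 or None (missing).
    Returns (kept individual indices, kept SNP indices)."""
    n_snps = len(genotypes[0])
    # Drop individuals whose missing call rate exceeds `mind`
    kept_ind = [i for i, row in enumerate(genotypes)
                if sum(g is None for g in row) / n_snps <= mind]
    kept_snps = []
    for j in range(n_snps):
        calls = [genotypes[i][j] for i in kept_ind]
        called = [g for g in calls if g is not None]
        if not called:
            continue
        if 1 - len(called) / len(calls) > geno:   # per-SNP missing rate
            continue
        freq = sum(called) / (2 * len(called))
        if min(freq, 1 - freq) < maf:             # minor allele frequency
            continue
        counts = (called.count(0), called.count(1), called.count(2))
        if hwe_chi2_p(*counts) < hwe:             # Hardy-Weinberg deviation
            continue
        kept_snps.append(j)
    return kept_ind, kept_snps
```

The defaults mirror the thresholds listed on the slide (mind 0.03, geno 0.04, maf 0.01, hwe 0.000001); the order in which the filters are applied here is an assumption.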
Missing genotype individuals
[Figure panels: before QC, 58,960 SNPs, All Asian; before QC, 58,960 SNPs, Korean. Labeled regions: GyeongJu, GoRyeong, GimJe]
Population         SNPs      Individuals   After QC
Korean             46,559    159           153
China (Yanbian)    46,559    16            16
Japan (Kobe)       46,559    5             2
Korea-Japan        46,559    6             4
Vietnam            46,559    16            16
Korean-Vietnam     46,559    8             8
Cambodia           46,559    16            16
Mongol             46,559    16            15
Total                        242           230
Quality Control All Asian
Relatedness between the 153 Korean Individuals (10 Regions)
PCA analysis using autosomal 46,559 SNP markers (n = 153, Korean)
[Plot labels: GyeongJu, UlSan, JeJu, GimJe, NaJu, GoRyeong, CheonAn, JeCheon, PyeongChang, YeonCheon]
LD-based SNP Pruning
• Generate a subset of SNPs that are in approximate linkage equilibrium
• Slide a window of 50 SNPs and calculate LD
• Select representative SNPs that have low LD (R² ≤ 0.2)
[Diagram: 50-SNP windows and 5-SNP shifts; First Step, Second Step]
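The pruning idea on this slide can be illustrated with a small greedy sketch. It assumes genotypes coded 0/1/2 with no missing calls, a 50-SNP window moved 5 SNPs at a time, and an r² threshold of 0.2; the function names are hypothetical, and dropping the later SNP of a correlated pair is a simplification that may not match the tool used in the study.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no LD signal
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)

def ld_prune(snps, window=50, step=5, threshold=0.2):
    """snps: per-SNP genotype vectors in chromosomal order.
    Returns the indices of the SNPs that are retained."""
    keep = [True] * len(snps)
    start = 0
    while start < len(snps):
        end = min(start + window, len(snps))
        idx = [i for i in range(start, end) if keep[i]]
        for pos, i in enumerate(idx):
            if not keep[i]:
                continue
            for j in idx[pos + 1:]:
                # Drop the later SNP of any pair in high LD
                if keep[j] and r_squared(snps[i], snps[j]) > threshold:
                    keep[j] = False
        start += step
    return [i for i, k in enumerate(keep) if k]
```

For example, two identical genotype vectors have r² = 1, so only the first of them survives pruning.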
PCA analysis using pruned 23,290 SNP markers (n=153)
PCA analysis using 46,559 SNP markers (n=153)
PCA using Pruned SNPs
Fst (fixation index): a measure of genetic differentiation among subpopulations, based on allele frequencies
Fst of population
Tishkoff SA, Kidd KK. (2004). Nature Genetics Supplement 36:S21-S27.
0 ≤ Fst ≤ 0.05: negligible
Fst ≥ 0.25: large genetic differentiation
Fst = 1: complete isolation
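As a rough illustration of how an Fst value relates to allele frequencies, the sketch below computes Wright's Fst = (Ht - Hs) / Ht for two equally weighted populations, pooling heterozygosities across SNPs. The function name `fst_two_pops` is hypothetical, and this simple estimator is an assumption; the study may have used a different one (for example, one that corrects for sample size).

```python
def fst_two_pops(p1, p2):
    """p1, p2: allele frequencies per SNP in population 1 and population 2.
    Returns Fst = (Ht - Hs) / Ht, pooled over SNPs."""
    hs_sum = 0.0  # mean within-population expected heterozygosity
    ht_sum = 0.0  # expected heterozygosity of the pooled population
    for a, b in zip(p1, p2):
        mean_p = (a + b) / 2
        hs_sum += (2 * a * (1 - a) + 2 * b * (1 - b)) / 2
        ht_sum += 2 * mean_p * (1 - mean_p)
    if ht_sum == 0:
        return 0.0  # no variation at all, so no measurable differentiation
    return (ht_sum - hs_sum) / ht_sum
```

With this estimator, identical allele frequencies give 0 and fixed opposite alleles give 1, the two ends of the scale listed above.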
Paired Fst values for Korean Population Groups
SNPs Showing Significant Differences in Genotype Frequencies between Korea and Jeju (a)
SNPs with P values less than 10^-3 are listed
a. P values from the Cochran-Armitage trend test of genotype frequencies
b. The KARE are indicated
Differences between Korea (9 Regions) and Jeju
PCA analysis using 46,559 SNP markers (n=230)
Substructure of East Asian descent
[Plot labels: Mongol, Korea, Kobe, Korea-Vietnam, Korea-Japan, Vietnam, Cambodia, YanBian]
International HapMap
POP   Num_samples   Num_SNPs_QC   Num_SNPs_QC_poly
ASW   87            1623986       1543115
CEU   165           1623122       1397814
CHB   137           1626122       1341772
CHD   109           1620198       1311767
GIH   101           1630857       1408904
JPT   113           1634041       1294406
LWK   110           1625159       1526783
MEX   86            1604948       1453054
MKK   184           1611733       1532002
TSI   102           1632607       1419970
YRI   203           1625669       1493761
HapMap 3 Release 3
PCA analysis using 25,796 SNPs(n = 480)
Substructure with HapMap
PCA analysis using pruned 8,347 SNPs(n = 480)
[Plot labels: Mongol, Jeju, Kobe, JPT-HapMap, CHB-HapMap, Korea-Vietnam, Korea-Japan, Vietnam, Cambodia, YanBian]
PCA analysis of East Asian descent
Illustration of geographic correspondence of ethnic group locations
Relationship between Eigenvector values and Latitude
R² = 0.8621, y = 36.65 + 166.33x
[Values shown on plot: 37.53, 47.81, 39.98, 14.72]
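The fitted line on this slide can be applied, and a similar fit reproduced, with a few lines of Python. The sketch assumes x is an eigenvector (principal component) value and y is latitude in degrees, which is how the slide title reads; `fit_line` and `predict_latitude` are hypothetical names, and no data points from the study are used.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x.
    Returns (intercept, slope, r_squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = (sxy * sxy) / (sxx * syy)
    return intercept, slope, r_squared

def predict_latitude(pc_value, intercept=36.65, slope=166.33):
    """Apply the regression line reported on the slide."""
    return intercept + slope * pc_value
```

Under these assumptions, an eigenvector value of 0 maps to a latitude of 36.65, the intercept of the reported line.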
EAS-AIMs (Ancestry Informative Markers)
Calculate the In value using infocalc
1) All populations (KOR, CHB, JPT, MON, CAM): top 300 SNPs
2) Korean and Japanese: top 900 SNPs
3) Korean and Chinese: top 900 SNPs
4) Korean and Vietnamese: top 900 SNPs
3,000 East Asian Ancestry Informative Markers
Best performance: 1,500 SNPs using PCA
List of East Asian Ancestry Informative Markers
a. All Asian (Korea, China, Vietnam, Cambodia, Mongol)
In, informativeness for assignment
Ia, informativeness for ancestry coefficients
ORCA, optimal rate of correct assignment
3,000 East Asian AIM
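To make the In statistic concrete, the sketch below evaluates the informativeness-for-assignment formula for one locus from per-population allele frequencies: In = Σ_j [ -p_j ln p_j + Σ_i (p_ij / K) ln p_ij ], where p_j is the mean frequency of allele j over the K populations. This is a minimal illustration with a hypothetical function name, not the infocalc program itself, and it assumes equally weighted populations.

```python
import math

def informativeness_in(freqs):
    """freqs: K lists, one per population, each holding the allele
    frequencies at a single locus (each list sums to 1).
    Returns In, using natural logarithms and 0 * ln 0 = 0."""
    k = len(freqs)
    n_alleles = len(freqs[0])
    total = 0.0
    for j in range(n_alleles):
        p_mean = sum(pop[j] for pop in freqs) / k
        if p_mean > 0:
            total -= p_mean * math.log(p_mean)
        for pop in freqs:
            if pop[j] > 0:
                total += pop[j] / k * math.log(pop[j])
    return total
```

A SNP with the same allele frequencies in every population scores 0, while a SNP fixed for different alleles in two populations scores ln 2, so ranking SNPs by this value and taking the top entries gives a candidate marker list.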
PCA analysis using 1,500 AIMs
PCA analysis using 1,500 random SNPs
AIM Sets for determining East Asia
A, Human Genome Diversity Project. B, SNP information with allele frequency
Wiki-Based SNP Annotation
SNP Based Kinship
♥ ♥ ♥♥
Identity-by-state (IBS) sharing
Exclude individuals from pairs of samples identified as cryptic first-degree relatives (parent-offspring, twins, or siblings concordant for phenotype), or as more distant relationships if clusters were linked by a first-degree relative (Science, 2007)
Pair from same population
Individual 1 A/C G/T A/G A/A G/G
Individual 2 C/C T/T A/G C/C G/G
IBS 1 1 2 0 2
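The IBS row in the table above counts how many alleles the two individuals share at each SNP. A minimal sketch, assuming unphased genotypes written as "A/C" strings and using hypothetical function names:

```python
from collections import Counter

def ibs(g1, g2):
    """Number of alleles shared identical-by-state (0, 1, or 2)
    between two unphased genotypes such as "A/C" and "C/C"."""
    a = Counter(g1.split("/"))
    b = Counter(g2.split("/"))
    # Multiset intersection counts each shared allele copy once
    return sum((a & b).values())

def ibs_profile(ind1, ind2):
    """Per-SNP IBS values for two individuals typed at the same SNPs."""
    return [ibs(x, y) for x, y in zip(ind1, ind2)]

individual_1 = ["A/C", "G/T", "A/G", "A/A", "G/G"]
individual_2 = ["C/C", "T/T", "A/G", "C/C", "G/G"]
print(ibs_profile(individual_1, individual_2))  # [1, 1, 2, 0, 2]
```

Averaging these per-SNP values over many markers gives a pairwise similarity that is high for identical twins or duplicate samples and lower for unrelated pairs.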
Identical twin or redundant samples
Cryptic first-degree relatives
autosomal 60,959 SNPs (n=608, unrelated individuals + 5 families)
Siblings
Grandparent-grandchild
Uncle-nephew
IBS value in Korean large family
Identify Monozygotic Twins using CNV
♥ ♥
Twins CNV (Copy Number Variation)
24 families (24 monozygotic twins and their parents or brothers)
Agilent Human CNV Microarray 244K X 2 array
[Figure: CNV gain and loss in twin and parent samples; regions shown: chr1, chr2, chrX]
SNP Based Physical Traits
♥ ♥
SNPedia & Promethease
April 16, 2009 in Seoul
SNPedia
http://www.snpedia.com
Promethease Report
Pictures of Lilly: 23andMe Contest
Thank you
Questions?
Hong ChangBum
Center for Genome Science, NIH, KCDC
http://cgs.cdc.go.kr
http://ksnp.cdc.go.kr