Perspectives of identifying Korean genetic variations

Chang Bum Hong, Center for Genome Science, KNIH, KCDC
Dec 1, 2009

Uploaded by hong-changbum on 22-Nov-2014


DESCRIPTION

A single nucleotide polymorphism (SNP) is the most frequent type of genetic variation in the human genome. SNPs are among the best-characterized genetic markers and are useful both for research on the genomics of human disease and for studying human population stratification. More recently, a type of structural variation in the genome, copy number variation (CNV), has received attention in the hope that it can provide genetic information that SNPs cannot. To gain insight into Korean-specific genetic markers, we analyzed 54,794 SNPs from 159 individuals in 10 regions of Korea (CheonAn, NaJu, GimJe, UlSan, Jeju, YeonCheon, JeCheon, GoRyeong, GyeongJu, PyeongChang) together with data from 1,629 individuals in the Pan-Asian (70 populations) data set. In addition, we analyzed a considerable number of CNVs typed from 16 pairs of twins in Korea. We identified several informative SNP markers that are valuable for distinguishing Koreans from other ethnic groups. Investigating the distribution of identity-by-descent (IBD) distance within a large Korean family also provided a way to examine relationships between individuals. Another interesting finding was the difference in CNV patterns between identical twins. A further potential benefit of this study is the use of genotype data to infer individual phenotypes (such as pigmentation, eye color, hair color, height, and blood type), in the hope of producing composite sketches of, or even identifying, individuals (such as criminal suspects) from genotype data in the near future. In this presentation, I will discuss results from this study, which may represent the most comprehensive characterization of the Korean genome to date.

TRANSCRIPT

Page 1

Perspectives of identifying Korean genetic variations
Chang Bum Hong, Center for Genome Science, KNIH, KCDC
Dec 1, 2009

Page 2

Contents

• Population Structure Based on SNP Genotypes

• SNP Based Kinship

• Identify Monozygotic Twins using CNV

• SNP Based Physical Traits

Page 3

Population Structure Based on SNP Genotypes

Hello

こんにちは (Japanese for "hello")

Page 4

Objectives
• Basic study of Korean population stratification
• Evidence of gene flow between Korea and neighboring countries
• Informative markers for East Asian populations

East Asia

Page 5

Body mass index, waist-hip ratio, height, blood pressure, pulse rate, bone density

0.1% difference between individuals
> 10M SNPs in the population
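For a sense of scale, the 0.1% figure can be turned into an approximate count of differing positions. This is a back-of-the-envelope sketch that assumes a haploid human genome of about 3.1 billion base pairs, a number that does not appear on the slide; the function name is hypothetical.

```python
# Back-of-the-envelope scale of "0.1% difference between individuals".
# Assumption (not from the slide): haploid genome size of ~3.1 billion bp.
GENOME_SIZE_BP = 3_100_000_000
DIFFERENCE_FRACTION = 0.001  # 0.1%


def expected_differences(genome_size_bp: int, fraction: float) -> int:
    """Approximate number of positions at which two genomes differ."""
    return round(genome_size_bp * fraction)


if __name__ == "__main__":
    n = expected_differences(GENOME_SIZE_BP, DIFFERENCE_FRACTION)
    print(f"~{n:,} differing positions between two individuals")
```

Under this assumption, any two people differ at roughly 3 million positions, a small fraction of the more than 10M SNPs that vary across the population as a whole.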

Page 6

Confounding in genetic studies

Page 7

East Asia - Public genotype data

Dataset            SNPs        Individuals   Populations
--------------------------------------------------------
PASNP (a)          54,794      1,928         75
HGDP               2,834~      1,056         52
HapMap             1,481,135   1,397         11
SGVP (b)           268,667     292           3
Korean             58,625      159           10
China (Yanbian)    58,625      16            1
Japan (Kobe)       58,625      5             1
Korea-Japan        58,625      6             1
Vietnam            58,625      16            1
Korean-Vietnam     58,625      8             1
Cambodia           58,625      16            1
Mongol             58,625      16            1

a. Pan-Asian SNP Consortium (http://www4a.biotec.or.th/PASNP)
b. Singapore Genome Variation Project (http://www.nus-cme.org.sg/SGVP)

Page 8

HGDP (Human Genome Diversity Project)

Page 9

PASNP (Pan-Asian SNP Consortium)

Page 10

Korean Data

Regions (10): GyeongJu, UlSan, Goryeong, GimJe, NaJu, Jeju, YeonCheon, Cheonan, JeCheon, PyeongChang (map labels: SE, SW, MW)
Samples per region: 15 in one region and 16 in each of the other nine (159 in total)
Subjects: average age > 70 years, long-term settlement in the region
Platform: Affymetrix 50K Xba, 58,960 SNPs
Other populations: China (Yanbian), Japan (Kobe), Korea-Japan, Vietnam, Korean-Vietnam, Cambodia, Mongol

Page 11

Quality Control

58,960 SNPs, n = 242 (Korean n = 159)
→ autosomal SNPs: 54,794
→ filters:
• high individual missing genotype rate (> 3%, mind 0.03)
• high SNP missing genotype rate (> 4%, geno 0.04)
• low MAF (< 0.01, maf 0.01)
• Hardy-Weinberg test (p < 1x10^-6, hwe 0.000001)
→ n = 230 (Korean n = 153), 46,559 SNPs
→ join HapMap CHB: n = 367 (230 + 137), 26,189 SNPs
→ join HapMap JPT: n = 480 (367 + 113), 25,796 SNPs
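The filter names in parentheses (mind, geno, maf, hwe) match PLINK's --mind, --geno, --maf and --hwe options. Below is a minimal, standalone sketch of what those four filters do; it is an illustration, not the pipeline the author ran. Assumptions: genotypes are coded as allele counts 0/1/2 with None for a missing call, individuals are filtered before SNPs, and the Hardy-Weinberg check uses a 1-degree-of-freedom chi-square test in place of the exact test that PLINK uses by default. All function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Genotype = Optional[int]  # 0, 1 or 2 copies of an allele; None = missing call


def missing_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of missing calls (an empty list counts as fully missing)."""
    if not calls:
        return 1.0
    return sum(c is None for c in calls) / len(calls)


def minor_allele_freq(calls: Sequence[Genotype]) -> float:
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    p = sum(observed) / (2 * len(observed))
    return min(p, 1 - p)


def hwe_chisq_p(calls: Sequence[Genotype]) -> float:
    """Hardy-Weinberg p-value from a 1-df chi-square test (an approximation)."""
    observed = [c for c in calls if c is not None]
    n = len(observed)
    counts = [observed.count(g) for g in (0, 1, 2)]
    p = (counts[1] + 2 * counts[2]) / (2 * n) if n else 0.0
    expected = [n * (1 - p) ** 2, 2 * n * p * (1 - p), n * p ** 2]
    if min(expected) == 0:
        return 1.0  # monomorphic or empty SNP: the test is undefined
    chi2 = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df


def run_qc(matrix: List[List[Genotype]], mind: float = 0.03, geno: float = 0.04,
           maf: float = 0.01, hwe: float = 1e-6) -> Tuple[List[int], List[int]]:
    """matrix[i][j] is individual i at SNP j; returns kept individual/SNP indices."""
    keep_ind = [i for i, row in enumerate(matrix) if missing_rate(row) <= mind]
    keep_snp = []
    for j in range(len(matrix[0])):
        column = [matrix[i][j] for i in keep_ind]
        if missing_rate(column) > geno:
            continue
        if minor_allele_freq(column) < maf:
            continue
        if hwe_chisq_p(column) < hwe:
            continue
        keep_snp.append(j)
    return keep_ind, keep_snp
```

With only a handful of SNPs, a single missing call already exceeds the 3% individual threshold, which is why the toy example in the test drops the individual with one missing genotype under the default settings.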

Page 12

Missing genotype individuals (before QC, 58,960 SNPs)
Figure panels: All Asian; Korean (labeled regions: GyeongJu, GoRyeong, GimJe)

Page 13

Quality Control All Asian

Population         SNPs      Individuals   After QC
----------------------------------------------------
Korean             46,559    159           153
China (Yanbian)    46,559    16            16
Japan (Kobe)       46,559    5             2
Korea-Japan        46,559    6             4
Vietnam            46,559    16            16
Korean-Vietnam     46,559    8             8
Cambodia           46,559    16            16
Mongol             46,559    16            15
Total                        242           230

Page 14

Relatedness between the 153 Korean (10 region) Individuals

Figure: PCA analysis using autosomal 46,559 SNP markers (n=153, Korean); legend: GyeongJu, UlSan, JeJu, GimJe, NaJu, GoRyeong, CheonAn, JeCheon, PyeongChang, YeonCheon
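PCA plots like this one summarize genome-wide genotypes along a few axes. The sketch below computes only the first principal component in pure Python, to show the idea rather than reproduce the study's analysis (the slides do not name the PCA software). Assumptions: no missing genotypes, allele counts coded 0/1/2, each SNP centred by 2p and scaled by sqrt(2p(1-p)) (a common normalization for genotype PCA), and power iteration on the individual-by-individual covariance matrix. Function names and the toy genotypes are hypothetical.

```python
import math
import random
from typing import List, Sequence


def standardize(matrix: Sequence[Sequence[int]]) -> List[List[float]]:
    """Centre each SNP by 2p and scale by sqrt(2p(1-p)); drop monomorphic SNPs."""
    n = len(matrix)
    columns = []
    for j in range(len(matrix[0])):
        col = [matrix[i][j] for i in range(n)]
        p = sum(col) / (2 * n)
        if p in (0.0, 1.0):
            continue  # no variation, so no information about structure
        sd = math.sqrt(2 * p * (1 - p))
        columns.append([(g - 2 * p) / sd for g in col])
    if not columns:
        raise ValueError("no polymorphic SNPs")
    return [[c[i] for c in columns] for i in range(n)]  # back to individual rows


def covariance(rows: List[List[float]]) -> List[List[float]]:
    """Individual-by-individual matrix X X^T / m for m standardized SNPs."""
    m = len(rows[0])
    return [[sum(a * b for a, b in zip(ri, rj)) / m for rj in rows] for ri in rows]


def top_eigenvector(mat: List[List[float]], iters: int = 200, seed: int = 0) -> List[float]:
    """Power iteration: repeatedly multiply by mat and renormalize."""
    rng = random.Random(seed)
    v = [rng.random() for _ in mat]
    for _ in range(iters):
        w = [sum(a * b for a, b in zip(row, v)) for row in mat]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def pc1(matrix: Sequence[Sequence[int]]) -> List[float]:
    """Unit vector proportional to each individual's PC1 score (sign is arbitrary)."""
    return top_eigenvector(covariance(standardize(matrix)))


if __name__ == "__main__":
    # Two hypothetical groups of two individuals each, four SNPs.
    genotypes = [[0, 0, 0, 1], [0, 0, 1, 0], [2, 2, 1, 1], [2, 2, 2, 0]]
    print([round(x, 3) for x in pc1(genotypes)])
```

Individuals from the same group land on the same side of zero on PC1, which is the pattern a regional cluster would show on the slide's plot.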

Page 15

LD-based SNP Pruning
• Generate a subset of SNPs that are in approximate linkage equilibrium
• Slide a window of 50 SNPs and calculate LD within it
• Select representative SNPs with low LD (r² ≤ 0.2)

Figure: a 50-SNP window shifted by 5 SNPs between steps (First Step, Second Step)
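The window size (50 SNPs), the 5-SNP shift and the r² ≤ 0.2 cut-off shown here are the same three parameters that PLINK's --indep-pairwise option takes. The sketch below is an independent, simplified illustration of sliding-window pruning, not PLINK's exact algorithm. Assumptions: r² is the squared Pearson correlation of allele counts, and when a pair exceeds the threshold the later SNP is dropped (real tools may choose differently). Function names are hypothetical.

```python
from typing import List, Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation of two allele-count vectors (0/1/2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treat as unlinked
    return sxy * sxy / (sxx * syy)


def prune_ld(snps: Sequence[Sequence[int]], window: int = 50, step: int = 5,
             r2_max: float = 0.2) -> List[int]:
    """Return indices of SNPs kept after sliding-window LD pruning."""
    n = len(snps)
    removed = [False] * n
    start = 0
    while start < n:
        end = min(start + window, n)
        for i in range(start, end):
            if removed[i]:
                continue
            for j in range(i + 1, end):
                if not removed[j] and r_squared(snps[i], snps[j]) > r2_max:
                    removed[j] = True  # keep the earlier SNP of the pair
        if end == n:
            break
        start += step  # shift the window and test the new pairs
    return [i for i in range(n) if not removed[i]]
```

Because only SNPs inside the same window are compared, two correlated SNPs farther apart than the window can both survive; the test below shows this with a window of 2.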

Page 16

PCA using Pruned SNPs

Figure panels: PCA analysis using 46,559 SNP markers (n=153); PCA analysis using pruned 23,290 SNP markers (n=153)

Page 17

Fst of population

Fst (fixation index): a measure of genetic differentiation (in allele frequencies) among subpopulations

Tishkoff SA and Kidd KK. (2004). Nature Genetics Supplement 36:S21-S27.

0 ≤ Fst ≤ 0.05: negligible
Fst ≥ 0.25: great genetic differentiation
Fst = 1: complete isolation
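To make the definition concrete, here is a minimal sketch of Wright's Fst for a single biallelic SNP, Fst = (H_T - H_S) / H_T, where H_T is the expected heterozygosity of the pooled population and H_S the average expected heterozygosity within subpopulations. Assumptions: subpopulations are weighted equally and this simple estimator is used; the slide does not say which Fst estimator the study applied. The labels use the thresholds on the slide (≤ 0.05 negligible, ≥ 0.25 great differentiation, 1 complete isolation); the function names are hypothetical.

```python
from typing import Sequence


def fst_biallelic(freqs: Sequence[float]) -> float:
    """Wright's Fst for one biallelic SNP.

    freqs: frequency of the same allele in each subpopulation (equal weights).
    """
    k = len(freqs)
    p_bar = sum(freqs) / k
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_t == 0:
        return 0.0  # SNP fixed everywhere: no differentiation to measure
    h_s = sum(2 * p * (1 - p) for p in freqs) / k  # mean within-group heterozygosity
    return (h_t - h_s) / h_t


def describe(fst: float) -> str:
    """Label an Fst value with the thresholds shown on the slide."""
    if fst >= 1.0:
        return "complete isolation"
    if fst >= 0.25:
        return "great genetic differentiation"
    if fst <= 0.05:
        return "negligible"
    return "intermediate (range not labeled on the slide)"
```

For example, allele frequencies of 0.2 and 0.4 in two groups give Fst = 1/21 ≈ 0.048, which falls in the "negligible" band.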

Page 18

Pairwise Fst values for Korean population groups

0 ≤ Fst ≤ 0.05: negligible
Fst ≥ 0.25: great genetic differentiation
Fst = 1: complete isolation

Page 19

Differences between Korea (9 regions) and Jeju

SNPs Showing Significant Differences in Genotype Frequencies between Korea and Jeju (a)
SNPs with P values less than 10^-3 are listed.

a. P values for the Cochran-Armitage trend test of genotype frequencies
b. The KARE are indicated
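The Cochran-Armitage trend test compares genotype counts between two groups while assuming an additive trend across the three genotypes. A minimal sketch follows, using the standard statistic with genotype weights (0, 1, 2) and a two-sided normal p-value. Assumptions: in this setting one group would be Jeju and the other the remaining nine regions; the function name and the toy counts are hypothetical.

```python
import math
from typing import Sequence


def cochran_armitage(group1: Sequence[int], group2: Sequence[int],
                     weights: Sequence[float] = (0, 1, 2)) -> float:
    """Two-sided p-value of the Cochran-Armitage trend test.

    group1/group2: genotype counts (AA, Aa, aa) in each group.
    """
    r1, r2 = sum(group1), sum(group2)
    n = r1 + r2
    col = [a + b for a, b in zip(group1, group2)]  # genotype totals
    k = len(weights)
    # Trend statistic T = sum_i w_i (N_1i * R_2 - N_2i * R_1)
    t = sum(w * (a * r2 - b * r1) for w, a, b in zip(weights, group1, group2))
    # Var(T) under the null hypothesis of no trend
    var = sum(weights[i] ** 2 * col[i] * (n - col[i]) for i in range(k))
    var -= 2 * sum(weights[i] * weights[j] * col[i] * col[j]
                   for i in range(k) for j in range(i + 1, k))
    var *= r1 * r2 / n
    if var == 0:
        return 1.0  # no genotype variation: nothing to test
    z = t / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
```

Identical genotype distributions give p = 1, while strongly shifted distributions such as (30, 10, 0) versus (0, 10, 30) fall far below the slide's 10^-3 cut-off.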

Page 20

Substructure of East Asian descent

Figure: PCA analysis using 46,559 SNP markers (n=230); legend: Mongol, Korea, Kobe, Korea-Vietnam, Korea-Japan, Vietnam, Cambodia, YanBian

Page 21

International HapMap

HapMap 3 Release 3

POP   Num_samples   Num_SNPs_QC   Num_SNPs_QC_poly
--------------------------------------------------
ASW   87            1623986       1543115
CEU   165           1623122       1397814
CHB   137           1626122       1341772
CHD   109           1620198       1311767
GIH   101           1630857       1408904
JPT   113           1634041       1294406
LWK   110           1625159       1526783
MEX   86            1604948       1453054
MKK   184           1611733       1532002
TSI   102           1632607       1419970
YRI   203           1625669       1493761

Page 22

Substructure with HapMap

Figure panels: PCA analysis using 25,796 SNPs (n = 480); PCA analysis using pruned 8,347 SNPs (n = 480)
Legend: Mongol, Jeju, Kobe, JPT-HapMap, CHB-HapMap, Korea-Vietnam, Korea-Japan, Vietnam, Cambodia, YanBian

Page 23

PCA analysis of East Asian descent

Figure: illustration of the geographic correspondence of ethnic group locations; labels: Korea-Vietnam, Korea-Japan, Jeju, Kobe, JPT-HapMap, Yanbian, Mongol, CHB-HapMap, Vietnam, Cambodia

Page 24

Relationship between Eigenvector values and Latitude

Linear fit: y = 36.65 + 166.33x, R² = 0.8621
Values labeled on the plot: 37.53, 47.81, 39.98, 14.72
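A fitted line and R² like the ones on this slide come from an ordinary least-squares regression. A minimal sketch of that computation follows; the function name and toy data are hypothetical, and the slide does not state which software produced the fit.

```python
from typing import Sequence, Tuple


def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares fit y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx  # slope
    a = my - b * mx  # intercept: the line passes through (mean x, mean y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1 - ss_res / ss_tot
```

R² is the share of the variance in y explained by the line, so the slide's R² = 0.8621 means the fit accounts for about 86% of the variation in the plotted values.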

Page 25

EAS-AIMs (Ancestry Informative Markers)

Calculate In values using infocalc
1) All populations (KOR, CHB, JPT, MON, CAM): top 300 SNPs
2) Korean and Japanese: top 900 SNPs
3) Korean and Chinese: top 900 SNPs
4) Korean and Vietnamese: top 900 SNPs

→ 3,000 East Asian Ancestry Informative Markers
→ Best-performing 1,500 SNPs selected using PCA
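As we understand it, In (informativeness for assignment) is the statistic of Rosenberg et al. (2003) that infocalc reports: for one locus, In = Σ_j ( -p̄_j ln p̄_j + Σ_i (p_ij / K) ln p_ij ), where p_ij is the frequency of allele j in population i, K is the number of populations, and p̄_j is the mean of p_ij over populations. The sketch below computes In and ranks SNPs by it, as in the "top N SNPs" steps on the slide. Assumptions: natural logarithms (the log base rescales the values but not the ranking), and the function names and SNP IDs are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def informativeness_for_assignment(freqs: Sequence[Sequence[float]]) -> float:
    """In for one locus; freqs[i][j] = frequency of allele j in population i."""
    k = len(freqs)
    total = 0.0
    for j in range(len(freqs[0])):
        p_bar = sum(freqs[i][j] for i in range(k)) / k
        if p_bar > 0:
            total -= p_bar * math.log(p_bar)  # entropy of the pooled frequency
        for i in range(k):
            p = freqs[i][j]
            if p > 0:  # 0 * log 0 is taken as 0
                total += p / k * math.log(p)
    return total


def top_markers(snp_freqs: Dict[str, Sequence[Sequence[float]]], n_top: int) -> List[str]:
    """IDs of the n_top SNPs with the highest In."""
    scores = {snp: informativeness_for_assignment(f) for snp, f in snp_freqs.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n_top]
```

A SNP with identical frequencies in every population scores 0, while a SNP fixed for different alleles in two populations reaches the maximum of ln 2 for two populations.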

Page 26

3,000 East Asian AIMs

List of East Asian Ancestry Informative Markers
a. All Asian (Korea, China, Vietnam, Cambodia, Mongol)

In, informativeness for assignment
Ia, informativeness for ancestry coefficients
ORCA, optimal rate of correct assignment

Page 27

AIM Sets for determining East Asia

Figure panels: PCA analysis using 1,500 AIMs; PCA analysis using 1,500 random SNPs

Page 28

KDGV (Korean Database of Genomic Variants)

http://ksnp.cdc.go.kr

Page 29

Page 30

Wiki-Based SNP Annotation

Figure panels: A, Human Genome Diversity Project; B, SNP information with allele frequency

Page 31

SNP Based Kinship

Page 32

Identity-by-state (IBS) sharing
Exclude individuals from pairs of samples identified as cryptic first-degree relatives (parent-offspring, twins, or siblings concordant for phenotype), or with more distant relationships if clusters were linked by a first-degree relative (Science, 2007)

Pair from same population

Individual 1   A/C   G/T   A/G   A/A   G/G
Individual 2   C/C   T/T   A/G   C/C   G/G
IBS            1     1     2     0     2
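The IBS row in the example counts how many alleles the two genotypes share at each locus, ignoring allele order. A minimal sketch reproducing that row follows. Assumptions: genotypes are written as unordered strings such as "A/C", and the genome-wide summary is the proportion of alleles shared across loci; the function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def ibs_per_locus(g1: Sequence[str], g2: Sequence[str]) -> List[int]:
    """Alleles shared identical-by-state (0, 1 or 2) at each locus."""
    shared = []
    for a, b in zip(g1, g2):
        # Multiset intersection keeps the minimum count of each allele.
        common = Counter(a.split("/")) & Counter(b.split("/"))
        shared.append(sum(common.values()))
    return shared


def mean_ibs(g1: Sequence[str], g2: Sequence[str]) -> float:
    """Proportion of alleles shared IBS across all loci (0 to 1)."""
    scores = ibs_per_locus(g1, g2)
    return sum(scores) / (2 * len(scores))


if __name__ == "__main__":
    ind1 = ["A/C", "G/T", "A/G", "A/A", "G/G"]
    ind2 = ["C/C", "T/T", "A/G", "C/C", "G/G"]
    print(ibs_per_locus(ind1, ind2))  # [1, 1, 2, 0, 2]
    print(mean_ibs(ind1, ind2))  # 0.6
```

Over many thousands of SNPs, unusually high mean IBS flags pairs such as duplicate samples, identical twins, or close relatives.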

Page 33

Figure: autosomal 60,959 SNPs (n=608, unrelated individuals + 5 families); labeled groups: identical twins or redundant samples; cryptic first-degree relatives

Page 34

IBS values in a large Korean family

Labeled relationships: siblings; grandparent-grandchild; uncle-nephew/niece

Page 35

Identify Monozygotic Twins using CNV

Page 36

Twins CNV (Copy Number Variation): 24 families (24 monozygotic twins and their parent or brothers)
Platform: Agilent Human CNV Microarray 244K × 2 array

Figure: Region: chr1 (tracks: twin, parent; calls: gain, loss)
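One simple way to flag candidate CNV differences between identical twins is to list calls in one twin that have no overlapping call of the same type (gain or loss) in the other. The sketch below is a hypothetical illustration, not the study's method: it assumes CNV calls are (chromosome, start, end, type) tuples with half-open coordinates, and real analyses would also need minimum-overlap rules and probe-level support. The function names and coordinates are invented for illustration.

```python
from typing import List, Sequence, Tuple

# (chromosome, start, end, call type) with half-open coordinates [start, end)
CNV = Tuple[str, int, int, str]


def overlaps(a: CNV, b: CNV) -> bool:
    """True if two calls share a chromosome, a call type, and at least one base."""
    return a[0] == b[0] and a[3] == b[3] and a[1] < b[2] and b[1] < a[2]


def discordant_calls(twin_a: Sequence[CNV], twin_b: Sequence[CNV]) -> List[CNV]:
    """Calls in twin_a with no overlapping same-type call in twin_b."""
    return [c for c in twin_a if not any(overlaps(c, d) for d in twin_b)]


if __name__ == "__main__":
    a = [("chr1", 100, 200, "gain"), ("chr1", 500, 800, "loss")]
    b = [("chr1", 150, 250, "gain")]
    print(discordant_calls(a, b))  # [('chr1', 500, 800, 'loss')]
```

Running the comparison in both directions captures calls unique to either twin.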

Page 37

Region: chr2

Page 38

Region: chrX

Page 39

SNP Based Physical Traits

Page 40

SNPedia & Promethease

April 16, 2009 in Seoul

SNPedia: http://www.snpedia.com

Page 41

Promethease Report

Page 42

Pictures of Lilly: 23andMe Contest

Page 43

Thank you

Page 44

Questions?

Hong ChangBum
Center for Genome Science, NIH, KCDC
http://cgs.cdc.go.kr
http://ksnp.cdc.go.kr