chapter 31 high resolution phylogeographic map of y ... volume-journal/t-anth-00-special...

CHAPTER 31

High Resolution Phylogeographic Map of Y-ChromosomesReveal the Genetic Signatures of Pleistocene Origin of

Indian Populations

R. Trivedi, Sanghamitra Sahoo, Anamika Singh, G. Hima Bindu, Jheelam Banerjee,Manuj Tandon, Sonali Gaikwad, Revathi Rajkumar, T Sitalaximi, Richa Ashma,

G. B. N. Chainy and V. K. Kashyap

INTRODUCTION

The origins of modern humans in South Asiahave been obscure. Archeological and paleo-anthropological evidences are few and fragmen-tary. Human remains dating back to the LatePleistocene provide limited but conclusive evi-dence for early human occupation in the Indiansubcontinent (Deraniyagala, 1992; Kennedy,2000; James et al., 2005). A number of artefacts ofMiddle and Upper Paleolithic cultures in NarmadaValley and the remains of Acheulian culture havebeen extensively found through out South Asia.Mesolithic microliths and evidences of Neolithicsettlements found in diverse parts of thesubcontinent also testify towards occupation ofIndia by early humans (Misra, 2001). Most of theprevailing genetic records further corroborate withthe hypothesis that Homo sapiens colonizedSouth Asia as a part of an early southern dispersalfrom Africa (Quintana-Murci et al., 1999; Cann,2001; Macaulay et al., 2005). This paper examinesthe current genetic diversity of Indian Y chromo-somes in context to place the genetic origin/(s)and time of settlement of the earliest humanpopulations in India.

The present-day populations of India belongto 4635 endogamous communities (Singh, 1998)and speak as many as 350 living languages(ethnologue), which fall under the four majorsupra-language families, i.e., Indo-European,Dravidian, Sino-Tibetan and Austric. The natureof extensive diversity among varied groupsreported with 54 classical markers showed a typi-cally north-south geographic division of popula-tions and placed Indians closer to Europeanpopulations than either with east-Asians orAfricans in the genetic distance trees (Cavalli-Sforza et al., 1994). A number of studies based onmt DNA, Y-chromosome and other nuclear DNAmarkers have invariably supported these obser-vations. Numerous surveys of genetic variationhave generally portrayed the differences between

caste and tribes, and the extent of gene flowamong ranked caste clusters. Most of the studiesconclude that maternal gene pool Indianpopulations are proto-Asian in origin with limitedwest-Eurasian admixture. While the Y-chromo-somes of the caste populations were found to bemore similar to Europeans than Asians; withgreater west-Eurasian admixture in castes ofhigher rank (Bamshad et al., 2001), recent studiesprovide congruent evidence against any majorinflux of Indo-European speakers into the Indiangene pool and have ascertained a late PleistoceneSouth Asian origin for majority of Indianpopulations (Sahoo et al., 2006; Sengupta, 2006).These new findings are consistent with archeo-botanical evidences (Fuller, 2003) and linguisticdata (Renfrew, 1989) which suggest a recentcommon root for Elamite and Dravidic languages.It is hypothesized that the same prehistoric genepool of southern Asian Pleistocene coastal settlersfrom Africa provided inocula for both Indian castesand tribes, and subsequent diversification of thegene pools was probably due to the genetic imprintslaid down by later migrants, such as Huns, Greeks,Kushans, Moghuls, and others (Kivisild et al.,2003). However, much speculations remain aboutwhich of the population groups are amongst theearliest settlers of the Indian subcontinent. Whilethe Austro-Asiatic tribes have been presumed tobe descendants of the early modern humansbased on nucleotide diversity of mitochondrialM haplogroup (Roychoudhary et al., 2001; Basuet al., 2003), analysis of the Indian Y chromosomesundertaken in this study depicts a differentscenario.

In this study, we have assessed a total of 1434unrelated male individuals belonging to 87different Indian populations, of which 936 Y-chromosomes have been previously analysed for38 Y-SNP markers (Sahoo et al., 2006). Additional216 samples are included in the present analysis,while Y-chromosomal haplogroup data for 282additional samples from seven other Indian

394 R. TRIVEDI, SANGHAMITRA SAHOO, ANAMIKA SINGH, G. HIMA BINDU ET AL.

populations were collated from the literature(Kivisild et al., 2003; Cordaux et al., 2004). Thepresent study is based on simultaneous analysisof 38 SNP and 20-STR markers on the Y-chromo-some to provide the age estimates and describetheir phylogeographic distribution. Apart fromdetermining antiquity of various populationsgroups in South Asia, we also discuss the geneticstructure and peopling of the subcontinent inlight of present molecular evidences.

SUBJECTS AND METHODS

Populations Analyzed

A total of 1152 unrelated male individualsbelonging to 80 different populations wereanalyzed in the present study. Samples includepopulations from various linguistic families (Indo-European, Austro-Asiatic, Dravidian and Tibeto-Burman) and sixteen geographical areas of India.Blood samples were collected with informedconsent using a protocol approved by the ethicalcommittee of CFSL, Kolkata. DNA was extractedusing standard protocols (Sambrook, 1989) fromperipheral blood lymphocytes. Informationconcerning their geographic origin, linguistic andsocio-ethnic affiliation for each population isgiven in table 1. Additional data on 282 samplesfrom seven Indian populations (Punjab,Konkanstha Brahmin, Koya, Yerava, Mullukunan,Kuruchian, Koraga) and from 76 populations ofWestern Europe (51), Russia (281), Middle East(102), Caucasus (122), Central Asia (584), Siberia(66), North East Asia (334), South East Asia (552),Oceania (225), Pakistan (691) and Sri Lanka (39)were collated from literature and included in thegenetic distance analysis.

Markers Analyzed

38 binary polymorphisms included in thepresent analysis have been previously described(Sahoo et al. 2006). Analysis of 20 Y-STRs (twelvetetranucleotide repeats DYS19, DYS385a/b, DYS389I/II, DYS390, DYS391, DYS393, DYS460, H4,DYS437 and DYS439 and three trinucleotiderepeats, DYS392, DYS388 and DYS426, twodinucleotide YCAIIa/b, two pentanucleotiderepeat loci, DYS438 and DYS447 and a hexa-repeatnucleotide DYS448) was carried out in the sameDNA samples by using an in-house standardizedprotocol using primers described elsewhere

(Butler et al., 2002). Y-STR haplotypes wereconstructed in a sequential order of loci keepingan ascending numerical order for the minimalhaplotype to facilitate Y-chromosome compari-sons with other world populations.

Statistical Analyses

Several population genetic parameters,including mean haplogroup and haplotypediversity and their standard errors, mean numberof pair wise differences (MPD), pairwise F

ST values

for haplogroups and associated p values werecalculated using ARLEQUIN ver.2.0 softwarepackage (Schneider et al. 2000). To test fordifferences in the proportions, χ2- test forsignificance was employed.

Apportionment of genetic differences amongvarious socio-ethnic, geographic and linguisticgroups at different levels of hierarchical sub-divisions; between individuals within populations,between populations within groups and betweengroups of populations were calculated usinganalysis of molecular variance (AMOVA)(Excoffier et al., 1992). To examine the factor/(s)responsible for genetic differentiation of Y-chromosomes, AMOVA was done both on binarymarkers as well as with Y-STRs within the lineages.Significance levels of the genetic variancecomponents as well as Ö

ST values were estimated

by using 10000 iterations.Median-joining network algorithm of

haplogroup associated haplotypes (Bandelt et al.,1999; Forster et al., 2000) was performed usingthe software NETWORK 4.1.0.8 version (LifeSciences and Engineering Technology SolutionsWeb site), with epsilon value set to zero. Fornetwork calculation, seven Y-STR (DYS19,DYS389I, DYS389II, DYS390, DYS391, DYS392and DYS393) loci were used, where weightage toeach locus was given according to the estimatedvariance. Y-STR loci with highest variance wasgiven the lowest weights.

To estimate the time to the most recentcommon ancestor (TMRCA), we calculated theages to STR variation within the correpondinghaplogroup observed in the Indian populationsusing the average square difference (ASD)method. We used the same seven Y-STRs as thoseused in Network analysis and and a generationtime of 25 years and mutation rate of 6.9 X 10-4 asdescribed by (Zhivotovsky et al., 2004).

Neighbor-joining tree based on FST values of

395 EARLIEST SETTLERS OF INDIAN SUBCONTINENT

87 Indian populations were used to illustrate thegenetic affinity between the studied groups usingMEGA 3.0. Genetic relationship betweenpopulations of India and other parts of the worldwas estimated based on pairwise geneticdistances (FST values) calculated from haplogroupfrequencies. Multidimensional scaling (MDS)analysis of pairwise FST values was performedusing XL STAT pro 7.5 to decipher the geneticaffinities of populations. The Indian populationswere pooled into their regional boundaries forcomparison of genetic similarity with worldpopulations, to obtain a better resolution in theMDS plot.

RESULTS

Approximately 1152 individuals belonging to80 extant human populations from 16geographical regions of India were analysed with38 Y-SNPs and 20 Y-STR markers to evaluate thepossibility that Austro-Asiatic speakers are theearliest settlers of the Indian subcontinent. 24different paternal lineages were observed, out ofwhich, haplogroups H-M69, R1a1-M17, R2-M124and O2a-M95 together account for 69% of thepaternal diversity in South Asia. Another 20.9%of the genetic variation in Indian males isdescribed by haplogroups L-M11, J2-M172, O3e-M134, K2-M70, F-M89 and C-RPS4Y

711, while the

presence of other haplogroups- R1b3-M269 andG-M201 could be attributed to recent admixturewith Europeans.

Haplogroups and Extent of Y-ChromosomeDiversity in Indians

The Y-SNPs used in this study were based onprevious reports of polymorphisms in Eurasianand Oceanic populations. Overall haplogroupdiversity among Indians was relatively high whencompared to European or East Asian populations.Indian populations depicted diversity values from0.133 to a high of 0.914, with Austro-Asiatic andTibeto-Burman tribes generally showing reduceddiversity (Table 1). Twenty-five Dravidianpopulations showed a higher mean haplogroupdiversity (0.723± 0.083) compared to Indo-European speakers (0.684± 0.079) represented bythirty endogamous groups. South Indian groups;Andhra Brahmins, Kallar, Raju, Chenchu andLambadi displayed high lineage diversity values,while populations of North India typically

demonstrated lower mean haplogroup diversity(0.569 ± 0.104). The distribution of the lineagesand Y-STR diversity within the haplogroups aredescribed in detail.

Haplogroup H-M69

Majority of males analyzed from differentgeographic regions of India (23%) carried theM69C haplotype, which is additionally definedby M52C mutation. Distribution of HaplogroupH showed a north-south gradient (24.4% to27.4%), however geographically its totalfrequency was highest (44.4%) in populations ofwestern India (Table 2a). Among the 23% carryingH lineage in their Y-chromosomes, most of themwere representatives of south India (27.4%),speaking Dravidian languages (30.0%). In socio-ethnic groups, the frequency was 27.6% in lowercaste groups, while in the tribal groups itaccounted for 21.2% of their paternal variation.However, the pattern of distribution did not varystatistically between the Dravidian and Indo-European speaking tribes or caste cluster (Table2b). 206 distinct 20-Y-STR haplotype profilesdeciphered out of 221 individuals carried a meanpairwise difference of 12.66 (Table 3), where noneof the hapotypes were shared between groups.In the median–joining network analysis with 7-Y-STRs associated with M69C/ M52C lineagebranch, majority of Y-chromosome STR haplo-types are connected by one-or two-step mutationevents (Fig. 2a). Kora, an Indo-European speakingtribal group from eastern India, branched out ofthe network with more than three mutation steps.The Y-STR based coalescence time of haplogroupH1 chromosomes was estimated to be ~ 43,556years (Table 4).

Haplogroup R1a1-M17

Haplogroup R1a1-M17 characterizes 17.5% ofthe Indians. The Indo-European speakersdemonstrated a significantly higher proportionof this lineage as compared to populationsbelonging to Dravidian linguistic family (29.7%vs. 11.7%; χ2= 7.82, p<0.05) (Table 2a). With theexception of Lodha, Nepali and Bhutia, all otherAustro-Asiatic and Tibeto-Burman speakers lackthis haplogroup in their Y-chromsomes. While theIndo-European and Dravidian caste group depictsignificant variation (χ2= 12.5, p<0.01), the tribalgroups are more akin. Distribution of M17 lineage

396 R. TRIVEDI, SANGHAMITRA SAHOO, ANAMIKA SINGH, G. HIMA BINDU ET AL.T

able

1:

Des

crip

tion

of

the

Ind

ian

pop

ula

tion

s in

clu

ded

in

th

is s

tud

y

Pop

ulat

ion

Cod

eSt

ate

Reg

ion

La

ng

ua

ge

Soci

al S

tatu

sH

iera

rchy

Hap

logr

oup

Div

ersi

ty

1A

ND

HR

AB

RA

HM

INA

NB

AN

DH

RA

PR

AD

ES

HS

outh

Dra

vidi

anC

AS

TE

Upp

er0.

8538

± 0

.050

42

CH

EN

CH

UC

HU

AN

DH

RA

PR

AD

ES

HS

outh

Dra

vidi

anT

RIB

E0.

8474

± 0

.043

73

KA

MM

A C

HA

UD

HA

RY

KM

CA

ND

HR

A P

RA

DE

SH

Sou

thD

ravi

dian

CA

ST

EL

ower

0.64

33 ±

0.1

078

4K

AP

PU

NA

IDU

KP

NA

ND

HR

A P

RA

DE

SH

Sou

thD

ravi

dian

CA

ST

EL

ower

0.57

37 ±

0.1

213

5K

OM

AT

IK

OM

AN

DH

RA

PR

AD

ES

HS

outh

Dra

vidi

anC

AS

TE

Low

er0.

5000

± 0

.122

26

LA

MB

AD

IL

MD

AN

DH

RA

PR

AD

ES

HS

outh

Indo

-Eur

opea

nT

RIB

E0.

8789

± 0

.043

27

NA

IKP

OD

GO

ND

NP

GA

ND

HR

A P

RA

DE

SH

Sou

thD

ravi

dian

TR

IBE

0.74

21 ±

0.0

584

8R

AJU

RA

JA

ND

HR

A P

RA

DE

SH

Sou

thD

ravi

dian

CA

ST

EM

iddl

e0.

8596

± 0

.039

39

RE

DD

YR

DY

AN

DH

RA

PR

AD

ES

HS

outh

Dra

vidi

anC

AS

TE

Mid

dle

0.80

22 ±

0.0

687

10

YE

RU

KU

LA

YE

RA

ND

HR

A P

RA

DE

SH

Sou

thD

ravi

dian

TR

IBE

0.63

16 ±

0.0

875

11

AD

I PA

SI

AD

IA

RU

NA

CH

AL

PR

AD

ES

HN

orth

-Eas

tT

ibet

o-B

urm

anT

RIB

E0.

3556

± 0

.159

11

2B

IHA

RB

RA

HM

INB

BH

BIH

AR

Eas

tIn

do-E

urop

ean

CA

ST

EU

pper

0.45

10 ±

0.1

174

13

BH

UM

IHA

RB

HU

BIH

AR

Eas

tIn

do-E

urop

ean

CA

ST

EM

iddl

e0.

6368

± 0

.115

11

4R

AJP

UT

BR

JB

IHA

RE

ast

Indo

-Eur

opea

nC

AS

TE

Upp

er0.

4394

± 0

.158

11

5K

AY

AS

TH

AB

KY

BIH

AR

Eas

tIn

do-E

urop

ean

CA

ST

EM

iddl

e0.

7363

± 0

.074

81

6Y

AD

AV

YA

VB

IHA

RE

ast

Indo

-Eur

opea

nC

AS

TE

Low

er0.

6786

± 0

.122

01

7K

UR

MI

KU

IB

IHA

RE

ast

Indo

-Eur

opea

nC

AS

TE

Low

er0.

8590

± 0

.063

31

8B

AN

IYA

BA

NB

IHA

RE

ast

Indo

-Eur

opea

nC

AS

TE

Low

er0.

7273

± 0

.067

91

9G

UJ

PAT

EL

PA

TG

UJR

AT

Wes

tIn

do-E

urop

ean

CA

ST

EM

iddl

e0.

7778

± 0

.110

02

0H

P R

AJP

UT

HR

JH

PN

ort

hIn

do-E

urop

ean

CA

ST

EU

pper

0.91

43 ±

0.0

425

21

OR

OA

NO

RO

JHA

RK

HA

ND

Eas

tD

ravi

dian

TR

IBE

0.40

91 ±

0.1

333

22

HO

HO

JHA

RK

HA

ND

Eas

tA

ustr

o-A

siat

icT

RIB

E0.

0000

± 0

.000

02

3B

HU

MIJ

BH

JJH

AR

KH

AN

DE

ast

Aus

tro-

Asi

atic

TR

IBE

0.45

71 ±

0.1

406

24

KH

AR

IAK

RA

JHA

RK

HA

ND

Eas

tA

ustr

o-A

siat

icT

RIB

E0.

5333

± 0

.180

12

5M

UN

DA

MU

NJH

AR

KH

AN

DE

ast

Aus

tro-

Asi

atic

TR

IBE

0.14

29 ±

0.1

188

26

BIR

HO

RB

IRJH

AR

KH

AN

DE

ast

Aus

tro-

Asi

atic

TR

IBE

0.13

33 ±

0.1

123

27

SA

NT

HA

LSA

NJH

AR

KH

AN

DE

ast

Aus

tro-

Asi

atic

TR

IBE

0.00

00 ±

0.0

000

28

IYE

NG

AR

IYN

KA

RN

AT

AK

AS

outh

Dra

vidi

anC

AS

TE

Upp

er0.

7485

± 0

.061

02

9L

ING

AY

AT

LY

NK

AR

NA

TA

KA

Sou

thD

ravi

dian

CA

ST

EU

pper

0.83

33 ±

0.0

691

30

GO

WD

AG

OW

KA

RN

AT

AK

AS

outh

Dra

vidi

anC

AS

TE

Low

er0.

9524

± 0

.095

53

1B

HO

VI

BH

VK

AR

NA

TA

KA

Sou

thD

ravi

dian

CA

ST

EL

ower

0.75

24 ±

0.0

918

32

CH

RIS

TIA

NC

HR

KA

RN

AT

AK

AS

outh

Dra

vidi

anC

AS

TE

Low

er0.

8333

± 0

.059

73

3M

US

LIM

MU

SK

AR

NA

TA

KA

Sou

thD

ravi

dian

CA

ST

EL

ower

0.83

33 ±

0.2

224

34

KU

RU

VA

KU

RK

AR

NA

TA

KA

Sou

thD

ravi

dian

TR

IBE

0.74

36 ±

0.0

909

35

DE

SA

ST

H B

RA

HM

IND

SBM

AH

AR

AS

TR

AW

est

Indo

-Eur

opea

nC

AS

TE

Upp

er0.

8421

± 0

.065

73

6C

HIT

PAV

AN

BR

AH

MIN

CH

BM

AH

AR

AS

TR

AW

est

Indo

-Eur

opea

nC

AS

TE

Upp

er0.

9048

± 0

.040

53

7M

AR

AT

HA

MA

TM

AH

AR

AS

TR

AW

est

Indo

-Eur

opea

nC

AS

TE

Mid

dle

0.80

00 ±

0.0

681

38

DH

AN

GA

RD

GR

MA

HA

RA

ST

RA

Wes

tIn

do-E

urop

ean

CA

ST

EL

ower

0.81

67 ±

0.0

571

39

PAW

AR

AP

WR

MA

HA

RA

ST

RA

Wes

tIn

do-E

urop

ean

TR

IBE

0.74

17 ±

0.1

053

40

KA

TK

AR

IK

TK

MA

HA

RA

ST

RA

Wes

tIn

do-E

urop

ean

TR

IBE

0.82

46 ±

0.0

648

41

MA

DIA

GO

ND

MG

DM

AH

AR

AS

TR

AW

est

Dra

vidi

anT

RIB

E0.

6264

± 0

.109

8

397 EARLIEST SETTLERS OF INDIAN SUBCONTINENTT

able

1:

Con

td...

.

Pop

ulat

ion

Cod

eSt

ate

Reg

ion

La

ng

ua

ge

Soci

al S

tatu

sH

iera

rchy

Hap

logr

oup

Div

ersi

ty

42

MA

HA

DE

O K

OL

IM

KL

MA

HA

RA

ST

RA

Wes

tIn

do-E

urop

ean

TR

IBE

0.76

36 ±

0.0

833

43

MA

RA

MR

AM

IZO

RA

MN

orth

-Eas

tT

ibet

o-B

urm

anT

RIB

E0.

5333

± 0

.051

54

4H

MA

RH

MR

MIZ

OR

AM

Nor

th-E

ast

Tib

eto-

Bur

man

TR

IBE

0.27

89 ±

0.1

235

45

LA

IL

AI

MIZ

OR

AM

Nor

th-E

ast

Tib

eto-

Bur

man

TR

IBE

0.54

55 ±

0.0

615

46

LU

SE

IL

US

MIZ

OR

AM

Nor

th-E

ast

Tib

eto-

Bur

man

TR

IBE

0.40

94 ±

0.1

002

47

KU

KI

KU

KM

IZO

RA

MN

orth

-Eas

tT

ibet

o-B

urm

anT

RIB

E0.

6818

± 0

.091

04

8M

AN

IPU

RI

MU

SL

IMM

MS

MIZ

OR

AM

Nor

th-E

ast

Tib

eto-

Bur

man

CA

ST

EL

ower

0.83

33 ±

0.0

980

49

OR

IYA

BR

AH

MIN

OB

HO

RIS

SAE

ast

Indo

-Eur

opea

nC

AS

TE

Upp

er0.

8043

± 0

.069

75

0K

AR

AN

KR

NO

RIS

SAE

ast

Indo

-Eur

opea

nC

AS

TE

Mid

dle

0.64

71 ±

0.0

953

51

KH

AN

DA

YA

TK

DY

OR

ISSA

Eas

tIn

do-E

urop

ean

CA

ST

EM

iddl

e0.

7564

± 0

.097

45

2G

OP

EG

PE

OR

ISSA

Eas

tIn

do-E

urop

ean

CA

ST

EL

ower

0.83

33 ±

0.0

720

53

PAR

OJA

PR

JO

RIS

SAE

ast

Dra

vidi

anT

RIB

E0.

8667

± 0

.048

35

4JU

AN

GJU

NO

RIS

SAE

ast

Aus

tro-

Asi

atic

TR

IBE

0.00

00 ±

0.0

000

55

SAO

RA

SAR

OR

ISSA

Eas

tA

ustr

o-A

siat

icT

RIB

E0.

6784

± 0

.088

45

6N

EPA

LI

NE

PS

IKK

IMN

orth

-Eas

tT

ibet

o-B

urm

an/

CA

ST

EU

pper

0.90

48 ±

0.1

033

Indo

-Eur

opea

n5

7B

HU

TIA

BH

TS

IKK

IMN

orth

-Eas

tT

ibet

o-B

urm

anT

RIB

E0.

5000

± 0

.265

25

8C

HA

KK

LIA

RC

HK

TA

MIL

NA

DU

Sou

thD

ravi

dian

CA

ST

EL

ower

0.79

12 ±

0.0

673

59

KA

LL

AR

KA

LT

AM

IL N

AD

US

outh

Dra

vidi

anC

AS

TE

Mid

dle

0.87

88 ±

0.0

751

60

VA

NN

IYA

RV

AN

TA

MIL

NA

DU

Sou

thD

ravi

dian

CA

ST

EM

iddl

e0.

8333

± 0

.059

76

1PA

LL

AR

PAL

TA

MIL

NA

DU

Sou

thD

ravi

dian

CA

ST

EL

ower

0.70

00 ±

0.0

896

62

GO

UN

DE

RG

OU

TA

MIL

NA

DU

Sou

thD

ravi

dian

CA

ST

EU

pper

0.71

24 ±

0.0

650

63

IRU

LA

RIR

UT

AM

IL N

AD

US

outh

Dra

vidi

anT

RIB

E0.

7576

± 0

.122

16

4K

AN

YA

KU

BJ

BR

AH

MIN

KK

BU

TT

AR

PR

AD

ES

HN

ort

hIn

do-E

urop

ean

CA

ST

EU

pper

0.69

09 ±

0.1

276

65

UP

JA

TU

PJ

UT

TA

R P

RA

DE

SH

No

rth

Indo

-Eur

opea

nC

AS

TE

Upp

er0.

0000

± 0

.000

06

6U

P T

HA

KU

RU

PT

UT

TA

R P

RA

DE

SH

No

rth

Indo

-Eur

opea

nC

AS

TE

Upp

er0.

5357

± 0

.123

26

7K

HA

TR

IK

HT

UT

TA

R P

RA

DE

SH

No

rth

Indo

-Eur

opea

nC

AS

TE

Mid

dle

0.28

57 ±

0.1

964

68

BH

OK

SH

AB

KS

UT

TA

R P

RA

DE

SH

No

rth

Tib

eto-

Bur

man

/T

RIB

E0.

6222

± 0

.138

3In

do-E

urop

ean

69

UP

KU

RM

IU

PK

UT

TA

R P

RA

DE

SH

No

rth

Indo

-Eur

opea

nC

AS

TE

Low

er0.

0000

± 0

.000

07

0T

HA

RU

TH

RU

TT

AR

PR

AD

ES

HN

ort

hT

ibet

o-B

urm

an/

TR

IBE

0.80

00 ±

0.1

721

Indo

-Eur

opea

n7

1JA

UN

SA

RI

JUS

UT

TA

R P

RA

DE

SH

No

rth

Tib

eto-

Bur

man

/T

RIB

E0.

7333

±

0.1

552

Indo

-Eur

opea

n7

2M

AH

ISH

IYA

MSY

WE

ST

BE

NG

AL

Eas

tIn

do-E

urop

ean

CA

ST

EM

iddl

e0.

8684

±

0.0

489

73

NA

MA

SU

DR

AN

MS

WE

ST

BE

NG

AL

Eas

tIn

do-E

urop

ean

CA

ST

EL

ower

0.90

00 ±

0

.035

57

4B

AU

RI

BA

UW

ES

T B

EN

GA

LE

ast

Indo

-Eur

opea

nC

AS

TE

Low

er0.

6526

±

0.0

648

75

MA

HE

LI

MH

LW

ES

T B

EN

GA

LE

ast

Aus

tro-

Asi

atic

TR

IBE

0.82

11 ±

0

.058

67

6K

AR

MA

LI

KR

MW

ES

T B

EN

GA

LE

ast

Aus

tro-

Asi

atic

TR

IBE

0.29

24 ±

0

.127

47

7K

OR

AK

OR

WE

ST

BE

NG

AL

Eas

tIn

do-E

urop

ean

TR

IBE

0.75

79 ±

0

.049

57

8L

OD

HA

LO

DW

ES

T B

EN

GA

LE

ast

Aus

tro-

Asi

atic

TR

IBE

0.84

21 ±

0

.059

57

9E

ZH

AV

A H

IND

UE

ZH

KE

RA

LA

Sou

thD

ravi

dian

CA

ST

EL

ower

0.80

56 ±

0

.088

98

0N

AIR

NA

RK

ER

AL

AS

outh

Dra

vidi

anC

AS

TE

Upp

er0.

0000

±

0.0

000


Tab

le 2

a: C

omp

reh

ensi

ve h

aplo

grou

p f

req

uen

cy d

ata

amon

g li

ngu

isti

c, g

eogr

aph

ic a

nd

soc

ial

cate

gori

es o

f In

dia

S

ampl

e Si

zeC

DF

*G

H*

H1

H2

J2*

K*

K2

LL

1M

NO

*

TO

TA

L I

ND

IA1

15

20

.01

40

.00

40

.03

00

.00

10

.06

90

.15

90

.00

20

.05

10

.03

80

.03

10

.04

50

.01

00

.00

00

.00

00

.00

3L

an

gu

ag

eIN

DO

-EU

RO

PE

AN

51

80

.01

20

.00

40

.02

70

.00

20

.07

90

.18

30

.00

20

.05

80

.02

90

.03

30

.03

50

.00

20

.00

00

.00

00

.00

0D

RA

VID

IAN

39

30

.02

00

.00

00

.04

80

.00

00

.08

90.

209

0.0

03

0.0

56

0.0

41

0.0

43

0.0

84

0.0

28

0.0

00

0.0

00

0.0

00

AU

ST

RO

-AS

IAT

IC1

40

0.0

14

0.0

00

0.0

14

0.0

00

0.0

21

0.0

43

0.0

00

0.0

50

0.0

43

0.0

14

0.0

07

0.0

00

0.0

00

0.0

00

0.0

07

TIB

ET

O-B

UR

MA

N1

01

0.0

00

0.0

30

0.0

00

0.0

00

0.0

10

0.0

00

0.0

00

0.0

00

0.0

69

0.0

00

0.0

00

0.0

00

0.0

00

0.0

00

0.0

20

Geo

grap

hyN

OR

TH

18

00

.00

00

.00

60

.01

10

.00

60

.10

60

.13

90

.00

00

.07

80

.00

00

.00

00

.01

70

.00

00

.00

00

.00

00

.00

0W

ES

T1

35

0.0

37

0.0

00

0.0

07

0.0

00

0.0

81

0.35

60

.00

70

.08

10

.00

00

.00

00

.09

60

.00

00

.00

00

.00

00

.00

0E

AS

T3

57

0.0

11

0.0

00

0.0

48

0.0

00

0.0

56

0.1

06

0.0

00

0.0

36

0.0

53

0.0

48

0.0

20

0.0

03

0.0

00

0.0

00

0.0

03

NO

RT

H-E

AS

T1

08

0.0

00

0.0

37

0.0

00

0.0

00

0.0

09

0.0

00

0.0

00

0.0

00

0.0

83

0.0

00

0.0

00

0.0

00

0.0

00

0.0

00

0.0

19

SO

UT

H3

72

0.0

19

0.0

00

0.0

40

0.0

00

0.0

78

0.19

40

.00

30

.05

60

.04

30

.05

10

.07

80

.03

00

.00

00

.00

00

.00

0So

cial

Hie

rarc

hyU

PP

ER

CA

ST

E2

11

0.0

09

0.0

05

0.0

19

0.0

00

0.0

43

0.1

85

0.0

05

0.1

00

0.0

24

0.0

00

0.0

95

0.0

19

0.0

00

0.0

00

0.0

00

MID

DL

E C

AS

TE

17

50

.00

60

.00

00

.05

10

.00

00

.04

00

.17

10

.00

00

.09

70

.04

00

.01

70

.03

40

.02

30

.00

00

.00

00

.00

0L

OW

ER

CA

ST

E2

61

0.0

08

0.0

00

0.0

46

0.0

00

0.1

07

0.1

69

0.0

00

0.0

31

0.0

50

0.0

46

0.0

54

0.0

00

0.0

00

0.0

00

0.0

00

TR

IBE

S5

05

0.0

22

0.0

08

0.0

20

0.0

02

0.0

71

0.1

39

0.0

02

0.0

26

0.0

38

0.0

42

0.0

24

0.0

08

0.0

00

0.0

00

0.0

06

Sam

ple

Size

O2

aO

2a

1O

3O

3e

P*

R*

R1

R1

aR

1a

1R

1b

3R

2

TO

TA

L I

ND

IA1

15

20

.14

90

.00

10

.00

10

.02

60

.02

70

.01

00

.01

10

.00

20

.17

50

.00

50

.13

5L

an

gu

ag

eIN

DO

-EU

RO

PE

AN

51

80

.01

00

.00

00

.00

20

.00

60

.03

90

.02

10

.00

80

.00

40.

297

0.0

12

0.1

37

DR

AV

IDIA

N3

93

0.0

23

0.0

00

0.0

00

0.0

00

0.0

08

0.0

00

0.0

23

0.0

00

0.1

17

0.0

00

0.2

09

AU

ST

RO

-AS

IAT

IC1

40

0.72

90

.00

00

.00

00

.00

00

.03

60

.00

00

.00

00

.00

00

.00

70

.00

00

.01

4T

IBE

TO

-BU

RM

AN

10

10.

554

0.0

10

0.0

00

0.2

67

0.0

30

0.0

00

0.0

00

0.0

00

0.0

10

0.0

00

0.0

00

Geo

grap

hyN

OR

TH

18

00

.00

00

.00

00

.00

60

.01

70

.00

00

.01

10

.00

00

.00

60.

483

0.0

06

0.1

11

WE

ST

13

50

.00

00

.00

00

.00

00

.00

00

.03

00

.02

20

.00

00

.00

00

.19

30

.00

00

.08

9E

AS

T3

57

0.32

50

.00

00

.00

00

.00

00

.04

50

.01

40

.00

60

.00

30

.10

40

.00

00

.12

0N

OR

TH

-EA

ST

10

80.

519

0.0

09

0.0

00

0.2

50

0.0

46

0.0

00

0.0

09

0.0

00

0.0

19

0.0

00

0.0

00

SO

UT

H3

72

0.0

00

0.0

00

0.0

00

0.0

00

0.0

16

0.0

03

0.0

27

0.0

00

0.1

34

0.0

13

0.2

15

Soci

al H

iera

rchy

UP

PE

R C

AS

TE

21

10

.00

00

.00

00

.00

00

.00

00

.01

90

.01

40

.00

50

.00

50.

360

0.0

05

0.0

90

MID

DL

E C

AS

TE

17

50

.00

00

.00

00

.00

00

.00

00

.02

90

.01

10

.02

90

.00

00.

263

0.0

00

0.1

89

LO

WE

R C

AS

TE

26

10

.00

40

.00

00

.00

00

.00

40

.02

30

.00

40

.02

30

.00

00

.15

70

.00

00.

276

TR

IBE

S5

05

0.33

90

.00

20

.00

20

.05

70

.03

20

.01

00

.00

20

.00

20

.07

70

.01

00

.06

1

399 EARLIEST SETTLERS OF INDIAN SUBCONTINENTT

able

2b

: C

omp

arat

ive

hap

logr

oup

fre

qu

ency

dat

a in

soc

io-e

thn

ic g

rou

ps

of I

nd

ia

Tr

ibe

Dra

vidi

an C

aste

Indo

-Eur

opea

n C

aste

No.

of

popu

lati

ons

11

87

85

41

09

88

19

25

Sam

ple

size

.1

79

12

69

21

08

72

58

13

71

32

11

71

15

Tot

al (

in %

)C

1.1

5.6

1.9

0.7

1.5

0.9

0.9

0.4

1.1

D3

.30

.9F

*2

.23

.21

.92

.81

0.3

5.1

1.5

2.6

4.3

5.6

2.7

G0

.9H

*2

.81

2.7

13

.95

.65

.28

.83

.83

.41

3.0

7.1

6.6

H1

7.3

22

.22

6.9

31

.91

0.3

18

.21

2.1

20

.51

6.5

20

.21

6.2

H2

0.9

1.4

0.4

J2*

3.9

3.2

1.9

8.3

15

.52

.21

1.4

6.8

4.3

6.7

7.7

K*

3.4

7.1

4.3

4.2

1.7

2.2

5.1

6.1

2.6

3.6

K2

2.2

11

.12

.85

.21

0.4

1.1

3.3

L0

.64

.84

.61

5.3

5.2

9.5

6.8

2.6

0.9

10

.13

.6L

13

.24

.26

.90

.82

.60

.3O

*0

.62

.2O

2a5

7.0

7.1

59

.84

.6O

2a1

1.1

O3

0.9

O3e

28

.32

.8P

*5

.60

.84

.61

.70

.71

.53

.41

.70

.72

.2R

*1

.71

.92

.31

.70

.91

.6R

10

.95

.24

.41

.73

.40

.5R

1a0

.60

.80

.3R

1a1

0.6

11

.91

.12

0.4

15

.31

0.3

10

.24

8.5

34

.22

3.5

11

.63

6.0

R1b

34

.60

.80

.3R

21

0.6

7.1

2.8

11

.12

2.4

38

.08

.31

7.1

17

.42

7.3

14

.0

AA

DR

TB

IEU

pper

Mid

dle

Low

erU

pper

Mid

dle

Low

erC

aste

_DR

Cas

te _

IE

PF

HO

3O

2a

LK

K2

R1

a1

CJ2

R2

No

of

Mal

es2

92

42

21

26

16

25

53

63

61

91

16

52

13

6N

o of

Hap

loty

pes

27

24

20

62

41

44

53

36

36

18

81

65

11

31

Lin

eage

div

ersi

ty0.

995±

1.00

0 ±

0.99

9 ±

0.99

3 ±

0.99

7 ±

0.99

8 ±

1.00

0 ±

1.00

0 ±

0.99

9 ±

1.00

0 ±

0.99

9 ±

0.99

9±0

.01

00

.01

20

.00

00

.01

20

.00

10

.00

30

.00

60

.00

60

.00

00

.02

20

.00

40

.00

1M

PD

14.1

9 ±

13

.91±

12.6

6±7.

37 ±

11.5

5±13

.13

±12

.93

±14

.18

±12

.05±

13.4

5±14

.04

±13

.04±

6.5

56

.47

5.7

33

.56

5.2

66

.00

5.9

66

.51

5.4

76

.38

6.4

05

.90

Av.

gen

e di

vers

ity

0.70

9 ±

0.69

5 ±

0.63

3±0.

368

±0.

577±

0.69

1 ±

0.71

8 ±

0.70

9 ±

0.60

2 ±

0.70

7 ±

0.70

2 ±

0.65

2±0

.36

40

.36

00

.31

70

.19

80

.29

10

.35

00

.36

80

.36

10

.30

20

.37

60

.35

50

.32

7H

aplo

type

sha

ring

Wit

hin

2N

on

10

21

02

No

nN

on

3N

on

12

Bet

wee

nN

on

No

n2

No

nN

on

No

nN

on

No

nN

on

No

nN

on

1

Tab

le 3

: Y

-Ch

rom

osom

e m

icro

sate

llit

e d

iver

sity

wit

hin

th

e tw

elve

maj

or l

inea

ges

fou

nd

in

In

dia


also showed a decreasing geographic cline alongthe latitude; its frequency was highest (approx.50%) among the populations of Bihar and UttarPradesh, where almost 60% of the Upper andMiddle caste groups harbored R1a1 Y-chromo-somes. Out of the 191 males that carried R1a1haplogroup, 188 unique 20-YSTR haplotypes wereobserved (h=0.999). While no haplotype wasshared between populations, intra-populationvariation was observed within Jat, Bhoksha andYerukula and the mean pairwise differencebetween all the Y-STRs was found to be high(12.05) (Table 3). The median–joining networkanalysis, however, revealed that populations ofneighbouring area shared few of the haplotypes(Fig 2b). (Passarino et al., 2002) reported tworegion specific allele pattern associated withinM17 among Europeans; DYS19=15 and YCAIIa,b=19,21 was specific to the R1a1s in WesternEurope, while Eastern European R1a1s typicallyharbored allele16 for DYS19 and 19,23 for YCAIIa,b. In our dataset, although, allele 15 and 16 atDYS19 were the two most common alleles withsignificant difference in their frequency (χ2= 4.66,p<0.05), they did not reveal any specificgeographical, socio-ethnic or linguistic patternin their distribution. The TMRCA of thoseindividuals harboring R1a1 Y-chromosome isestimated around ~32 KYA (Table 4).

Haplogroup O2a-M95

Haplogroup M95, which forms the majorSouth-East Asian male lineage, (Su et al. 2000,Karafet et al 2001) accounts for 15% of the Y-chromosome variation in India. It is however,localized to the eastern part of the subcontinent,restricted among the Austro-Asiatic speakers(72.9%) and Tibeto-Burman speaking tribes(56.4%) of NE India (Table 2a). Although thishaplogroup was also detected in Indo-Europeanand Dravidian speakers (3.3% in total), itspresence in them could be sufficiently attributedto admixture from Austro-Asiatic speakingneighbors living in close vicinity. While thislineage is completely fixed in Juang, Ho andSanthal, it is observed that the frequency inTibeto-Burman tribes varied 25% in Kuki to 80%in Hmar. Surprisingly however, none of theHimalayish branch of Tibeto-Burman speakers;Nepali, Bhutia, Tharu, Jaunsari and Bhokshaharbor this haplogroup (Fig 1). Although the Y-chromosomes were rather similar (FST =0.03,

p<0.05), none of the Y-STRs were shared betweengroups (Table 3). A comparison of Indian M-95 Y-STR haplotypes with populations of SE Asiaincluding Java, Borneo, Taiwan and Malayrevealed that the Austro-Asiatic speakers ofIndian subcontinent showed closer affinity to theSE Asians than their Tibeto-Burman speakingneighbors (F

CT =0.43 vs 4.15, respectively) (data

not shown). To further investigate therelationships between O2a Y-chromosome in theAustro-Asiatic and Tibeto-Burman speakers, amedian-joining network of 27 discrete haplotypesof 151 individuals was constructed (Fig 2c). Thisnetwork exhibited two distinct clusters ofhaplotypes with considerable haplotype sharingbetween the two linguistic families; however theAustro-Asiatic speakers depicted more diversehaplotypes compared to the Tibeto-Burmans. TheTMRCA of all M95T chromosomes was estimatedto be ~35,795 years (Table 4).

Haplogroup R2-M124

Our analysis revealed that haplogroup R2characterizes 13.5% of the Indian Y-chromo-somes and its frequency among Dravidianspeakers was comparable to that of haplogroupH (20.9%) and significantly different from Indo-European and Austro-Asiatic speakers (χ2= 16.2,d=3, p<0.05). While the distribution acrossvarious geographic regions was almost uniform,significant differentiation was observed along thesocial groups (χ2= 18.7, d=3, p<0.05); a decreasinggradient was discernible as one moved up thecaste hierarchy (Table 2a). Although tribescontributed only 7.4% of the total R2 lineage, itwas proportionately distributed between theAustro-Asiatic and Dravidian tribes (Table 2b).Extensive analysis of its distribution betweennorth and south Indian populations showed thatwhile there was marginal difference among middleand lower caste groups of north India (17.1 and17.4 % respectively), a clear gradient was observedamong south Indians, where the frequencydeclined by more than one-half from lower to uppercaste groups. Analysis of 20-Y-STRs within theR2 lineage revealed that three haplotypes wereshared; one between Kamma Chaudhary andKappu Naidu, both lower caste Dravidian speakersfrom Andhra Pradesh and two within Karmali andPallar populations. Network analysis (Fig 2d)depicted that a large number haplotypes wereshared between populations of south India, while


Locus -wise variance R1a1 H C O2a R2

DYS19 0.549 0.497 1.183 0.293 0.770DYS3891 0.793 0.760 0.783 0.383 0.649DYS3892 0.960 0.872 1.400 0.620 1.211DYS390 1.173 2.246 1.983 2.314 1.375DYS 391 1.081 2.533 1.762 0.791 0.966DYS392 1.009 0.708 1.662 1.620 1.749DYS393 0.710 0.921 0.917 0.995 1.051Average 0.896 1.220 1.384 1.002 1.110Age estimates in years 32,015.31 43,556.12 49,438.78 35,795.92 39,647.96

Table 4: Y-Chromosome haplogroup variances and TMRCA estimated on seven Y-STR loci

the populations of eastern India harbored morediscrete Y-STR haplotypes. The TMRCA forM124T was estimated to be ~39,647 years (Table 4).

Haplogroup L-M11

The overall frequency of haplogroup L-M11in the Indian populations was estimated to be5.6%, while sporadic occurrence of this lineagehas earlier been described among Indo-Europeanspeakers of Caucasus, Middle East, Europe anda maximum of 4.3% in Central Asia (Semino et al.,2000; Wells et al., 2001). Dravidian speakingpopulations harbored a significantly higherpercentage of L haplogroup compared to the Indo-European speakers, 11.2 and 3.7% respectively(χ2= 3.77, d=1, p=0.05). While frequencies wererather comparable in the lower caste groups ofnorth and south, middle and upper castepopulations of south India demonstratedrelatively higher frequencies than northern castegroups (Table 2a). The L-network and high MPD(13.13) revealed results in congruence withAMOVA suggesting that no clear geographic,linguistic or social pattern could be discernedamong the Y-STR haplotypes (data not shown).

Haplogroup J2-M172

Haplogroup J2-M172 is the major lineage ofMiddle East/Mediterranean and its frequencydecreases into Europe. Among the studied Indianpopulations, M172G exhibited a total frequencyof 5.1%, where it was uniformly distributed amongthe three major linguistic families. Except for theTibeto-Burman speaking tribes of northeast India,where this lineage was totally absent, no specificcline could be deciphered among the otherseventy-three mainland populations. In the socialcategories, upper and middle caste populationsharbor a significantly higher percentage of J2

lineage (~ 10%) as compared to ~3% in lower castepopulations and tribal groups (χ2= 7.74, d=3,p=0.05). While the distribution was proportionateamong upper caste groups of north and southIndia, the difference was more discrete (6.8% vs15.5%) among middle caste groups of theseregions (Table 2b). The Near Eastern populationsharboring M172 Y-chromosomes are characteri-zed by a very high frequency of DYS388 alleleswith e”15 repeats, while more than 70% of themales examined under this study displayed alleleswith repeat motifs d”14. Network analysis showeda large number of divergent haplotypes even with7-YSTRs and only two reticulations. Lodha, theonly Austro-Asiatic tribe that harbored J2 lineagealso displayed very diverse haplotype profiles.The genetic structure estimated with AMOVAshowed that although the extent of Y-STR diffe-rentiation in the populations carrying this lineagewas approximately 3%, geography, language orposition in the social hierarchy could notstatistically delineate the studied Indian popu-lations. Because this marker is associated withthe spread of agriculture, we further estimatedvariance in the agricultural groups and observeda marginal difference of 1% in Y-chromosomes oflandowner and labourer communities harboringthe J2 lineage (data not shown).

Haplogroup O3-M122 and O3e-M134

The M122 haplogroup and its sub-lineageM134 are among the predominant and widespreadlineages of East Asia (Su et al., 2000; Karafetet al., 2001; Shi et al., 2005). In the Indian males, itwas detectable at frequencies less than 3% andwas largely restricted among the Tibeto-Burmanspeakers of North-East. It was sporadicallypresent among tribal groups of north India,particularly Tharu (Fig 1), probably due to recentadmixture with neighboring Tibeto-Burman


speakers of Nepal and China. A clear delineationalong the language family was observed in itsdistribution; where it was completely absent amongthe Tani speakers (Adi Pasi), while the Naga-Kuki-Chin branch of Tibeto-Burman speakerscontributed the entire 26.7% of O3 lineage. Themean pairwise difference between Y-STRs and thelineage diversity was low at 7.37 and 0.993,respectively, compared to its sister clade O2a.

Haplogroup K2-M70

Haplogroup K2 occurs on a M9G backgroundand is reported to occur in populations of NearEast and Europe (Underhill et al. 2000). In ourstudy, it was found only in the eastern andsouthern regions of the country, adding to anoverall frequency of 3.1%. Although it was presentin the three major linguistic families, the statistical

difference in its distribution was insignificant.However, its distribution depicted an inverserelation as one moved up the social ladder, withthe upper caste populations completely lackingthe M70 lineage in their Y-chromosomes (Table2a). This lineage was predominant amongst thelower caste groups of east (Bauri) and tribalgroups of south, particularly Yerukula, contri-buting approx. 60% of the total K2 chromosomes.Within the lineage, 36 distinct Y-STR haplotypes,with a very high mean pairwise difference (14.18)between them, depicted the absence of populationstructure due to language, geography or ethnicity.

Haplogroup C- M130 (RPS4Y711)

The RPS4Y711T forms the second major clusterin Asia and Australo-Melanesia and hasreportedly spread into North America (Wells

Fig. 1. Y-Chromosome haplogroups and their frequency distribution in different regionalpopulations of India

Indian Ocean

Arabian Sea


Fig. 2a. Median-Joining network of H haplogroup individuals, based on seven Y-STR haplotypes.Circles represent haplotypes and have an area proportional to frequency. Colour represents the fourgeographic regions of India (Red: South; Blue: North; Green: West; Yellow: East)

Fig. 2b. Median-Joining network of R1a1 haplogroup individuals, based on seven Y-STR haplotypes.Circles represent haplotypes and have an area proportional to frequency. Colour represents the fourgeographic regions of India (Red: South; Blue: North; Green: West; Yellow: East)


Fig. 2c. Median-Joining network of O2a haplogroup individuals, based on seven Y-STR haplotypes.Circles represent haplotypes and have an area proportional to frequency. Colour represents the Austro-Asiatic (Black) and Tibeto-Burman (White) linguistic families of India

Fig. 2d. Median-Joining network of R2 haplogroup individuals, based on seven Y-STR haplotypes. Circlesrepresent haplotypes and have an area proportional to frequency. Colour represents the four geographicregions of India (Red: South; Blue: North; Green: West; Yellow: East)


No. of Groups % FST* % FSC* % FCT*

Table 5: Genetic Differentiation in Indians at different levels of hierarchy based on Y-SNP Data

Total 72.89 0.271 27.11Geography 5 71.22 0.287 19.09 0.211 9.69 0.096Regional 14 71.95 0.28 13.48 0.157 14.57 0.145Language 4 69.16 0.308 15.53 0.183 15.31 0.153

a 4 69.88 0.301 17.17 0.197 12.96 0.129Social CS vs TR$ 2 69.79 0.302 21.73 0.237 8.48 0.084

b 4 71.49 0.285 21.89 0.234 6.63 0.066c 4 82.29 0.177 16.21 0.164 1.51 0.015

Castes UP vs MD vs LW# 3 83.73 0.162 14.65 0.148 1.61 0.016

*All values are statistically significant at p<0.05$ CS: Castes: TR: Tribes# UP: Upper castes; MD: Middle castes; LW: Lower Castesa: Includes Karmali and Maheli under Austro-Asiatic; Tharu under Tibeto-Burman language familyb: Includes Upper, Middle, Lower Castes and Tribesc: Includes Upper, Middle, Lower Castes and excludes Austro-Asiatic and Tibeto-Burman Tribes

Within Population Among Population Among GroupsWithin Groups

et al., 2001; Underhill et al., 2000; Karafet et al.,1999; Underhill et al., 2001). In the present study,this lineage was found spread all along the coastalbelt in populations of Maharastra, Tamilnadu,Andhra Pradesh, Orissa and West Bengal, at anaverage frequency of 1.4%, and noticeably absentin the populations of North and North-East.Although this lineage was present in high fre-quency in tribes compared to caste groups, itsdistribution in them was not statistically significant(χ2= 2.25, d=1, p>0.05). We observed 16 discrete Y-STR haplotypes, with mean pairwise differencebetween haplotypes of 13.45 (Table 3). Althoughthe total number of individuals carrying RPS4Y

711T

was too small (n=16) to make accurate evolutionaryinferences about its origin within South Asia, theTMRCA of Indian RPS4Y

711T individuals was

estimated to be ~ 49,438 years (Table 4).

Other Haplogroups Observed in Indians

Haplogroup F, which is major and the mostparaphyletic subcluster of M168 lineages wasubiquitous in its distribution along geographic,linguistic and socio-ethnic boundaries of Indiaand was observed in approximately 3% of thestudied males (Table 2a). Haplogroup D, amonophyletic branch of M168 lineage, definedby an Alu insertion and M174C mutation, on theother hand, was restricted in Bhutia and Tharutribal groups. Its presence among them is mostlikely due to gene flow from Tibet, where thishaplogroup has earlier been reported. Majorhaplogroups K*, P*, R*, R1, R1a contributeapproximately 2-3% of the total Indian Y-

chromosomes and there was no difference in itsdistribution pattern among castes or tribes oramong different geographic regions. Although, wedetected a few European–specific haplogroups Gand R1b3 in Indians, none of our studied samplesshowed the presence of haplogroup K3-M147, N-M231 or I-M170, which are the other highlypredominant haplogroups of Europe.

Genetic Structure of the Indian Populations

Analysis of molecular variance revealed thatthe extent of genetic differentiation was highamong Indians; percent variation among differentgroups added up to 27.11%, suggesting that genepool of India males was highly structured. Toidentify factor/(s) responsible for this compart-mentalization of Y-chromosomes, populationgenetic structure was analyzed on haplogroupfrequency data in a hierarchical mode: withinpopulations, among populations and amonggroups of populations, pooled according togeography, linguistic family and their socio-ethnicposition (Table 5). The amount of genetic variationamong five major geographical regions was lesserthan the percent due to variation amongpopulations within regions (Öct= 0.096 and Ösc=0.211, respectively). Further analysis revealed thatalmost 14.57% of among group variation was dueto regional boundaries which also defined theirlinguistic affinities (language sub-families). Theincrease in “among group variation” and “amongpopulations within groups” was non-significantwhen only four major linguistic families were usedas a criterion for grouping Indian populations.


High Fct and Fsc values suggest that althoughsignificant structuring occurs within the popu-lations of India, they could not be partitioned eithergeographically or linguistically. Apportion-ment ofpopulations into two broad socio-ethnic group;caste and tribes depicted only 8% differencebetween them, which further decreased to 6.63%when the caste populations were further resolvedinto upper, middle and lower groups. Amongthemselves, the caste populations were not verydifferent, harboring only 1.6% variation among them.

Genetic Relationships among the Indians andwith World Populations

To investigate the extent of genetic relation-ships among India populations, pairwise F

STdistances were estimated on the Y-haplogroupfrequencies. On the whole, populations clusteredaccording to their Y-chromosome lineages. Twodistinct clusters of Indo-European and Dravidianspeakers were discernible in the NJ tree on 87Indian populations, where except for a fewdeviations most of the populations clusteredwithin their linguistic family. Austro-Asiatic andTibeto-Burman speakers harboring O2a Y-chromosome lineage described a separate cluster,while Tharu grouped with Mara, Lai and Kukicarrying O3e lineage, and formed a branch distantfrom other Tibeto-Burman tribes (Fig. 3). MDSplot also substantiated the genetic proximity ofAustro-Asiatic and Tibeto-Burman speakers tothe populations of South East Asia. Most of theother Indian populations were closer to the Indo-European speakers of Central Asia and EasternEurope (Russia and Siberia) but distant frompopulations of Western Europe, while popu-lations of Middle East and Caucasus regionformed a separate cluster in the MDS plot. Popu-lations of Uttar Pradesh, Bihar and Punjab weremoderately distant from other Indo-Europeanspeakers, while those of Pakistan remained bet-ween Indians, Central Asia and Russia (Fig. 4).

DISCUSSION

This comprehensive study of Y-chromosomediversity within India aims to identify evolu-tionary events (founder effects, gene flow andgenetic drift) and factors (geographical, linguisticand cultural barriers) that might have produced ahigh degree (27%) of genetic differentiation amongthe Indian patrilines. Here we also evaluate some

of the suggested theories of occupation of Indiansubcontinent by modern humans and populationhistories, in the light of current molecular geneticevidences.

Phylogeography of Indian Y-Chromosomes

India is a relict area, which is likely to haveserved as an incubator during the early dispersalof modern humans out of East Africa (Quintana-Murci et al., 1999; Cann, 2001) and a treasure-house of ancient population genetic signaturesin its gene pool. This is reflected in the 24different haplogroups which were observed inthe present Y-chromosome analysis of 1434Indian males. Overall haplogroup diversity amongIndian populations was relatively high (0.893) incontrast to other European or East Asianpopulations, but was closer to that of Central Asia.This pattern of high NRY diversity (Y-SNP and Y-STR) indicates an early settlement of the Indiansubcontinent by anatomically modern humans.Four haplogroups; H= 23%; R1a1=17.5%;O2a=15% and R2=13.5%, form major paternallineage of Indians and together account for ~70%of their Y-chromosomes. Being largely restrictedto the Indian subcontinent, haplogroup H isassumed to be associated with the eastwardexpansion of M89 Y-chromosomes from theLeventine corridor, which also carried the two latePleistocene mt DNA haplogroups, U2 and U7 intoIndia31. Although the M69 Y-chromosomes areparticularly predominant among the Dravidianspeakers of south India, its fairly uniformdistribution across different regions and socio-ethnic groups of India suggests deep time depthfor these lineage clusters.

Based on the predominance of M17 lineageamong diverse linguistic families (Indo-European,Altaic, Uralic and Caucasian) and geographicregions (Central Asia, Europe, Caucasus, MiddleEast), (Wells et al., 2001; Underhill et al., 2000;Karafet et al., 1999; Rosser et al., 2000; Nebelet al., 2000) it has been associated with the Kurganculture, domestication of horses and spread ofIndo-European languages, all which supposedlyoriginated in southern Russia/Ukraine and subse-quently extended to Europe, Central Asia around3000 B.C (Wells et al., 2001). Its presence in Indiahas been linked to the “Aryan” migration andsubsequent spread of Indo-European languages,appearance of iron and Painted Grey Ware culturein North West frontier (Cavalli-Sforza et al., 1994).


Fig. 3: Genetic relationship among populations of India based on FST distancesestimated on Y-Haplogroup frequencies


However, antiquity and geographic origin of thislineage still remains contentious. Our studyreveals that this lineage is present in a significantproportion among the Indian Indo-Europeanspeakers, and is proportionately high amongupper caste groups (Table 2a). The dispersion ofthis lineage into the southern tribal groups(Kivisild et al., 2003, Cordaux et al., 2004) and thefact that it is proportionately distributed betweenthe Dravidian and Indo-European tribal groupsprovides significant evidence against any majorinflux of Indo-European speakers that could havedrastically changed the Indian male gene pool(Sahoo et al., 2006; Sengupta et al., 2006). Thehigh average STR variance (0.896) and TMRCAsupports a rapid population growth and expan-sion of M17 Y-chromosomes, which contributedM17 lineages both to Central Asian nomads andSouth Asian tribes much before the Indo-European introgression into India. Another sub-lineage of M173, R1b3-M269 is present at appre-ciable frequencies 14.5% in Turkey (Cinnioglu etal. 2004) and at considerable frequency in Europe(Cruciani et al., 2002), while it is detected at rela-

tively low frequency (1.9%) in India, substantia-ting a recent and limited admixture with westEuropeans.

The observed high frequency of R2 Y-chromo-somes in Indians, which is equivalent to that ofhaplogroup H among Dravidian speakers, corro-borates previous reports suggesting its Indianorigin (Cordaux et al., 2004). The deep coalescencetime for R2 lineages, dating back to Late Pleisto-cene, supports its indigenous origin. OutsideIndia, it is found in Iran and Central Asia (3.3%)and among Roma Gypsies of Europe, known tohave historical evidence of their migration fromIndia (Wells et al., 2001). Within India, while it ispredominant in both eastern and southernregions, its distribution pattern is rather patchyin east (Sahoo et al., 2006). It is most likely thatgenetic drift or bottleneck has reduced thepaternal diversity of Karmali, which contributes28% of the eastern R2 lineages. This populationalthough considered to be Austro-Asiatic speak-er, does not present any evidence of O2a Y-chromosome lineage, portraying a distinctlydifferent history.

Fig. 4. Genetic relationship between populations of India and world estimated fromY-Chromosome haplogroup frequencies represented in MDS plot


On an average, the patterns of NRY haplo-group variation of Indians reflect that populationsof the subcontinent are not very distinct fromeach and probably have share a few commonpaternal ancestors. The lineage diversity wassmall for Austro-Asiatic and Tibeto-Burmanspeakers and most of them harbored singlelineage, indicating a founder paternal source forthese endogamous groups, which are confinedto the eastern and north-eastern regions of India.Haplogroup diversities were rather high forpopulations of south India (average of 0.740),giving concordant evidences of a relative earlysettlement, growth and expansion of populationsliving in southern India.

Traces of Ancient Migration of Modern Humans

Recent studies provide substantial evidencesin favor of the southern route hypothesis for thedispersal of modern human ~ 60-75 kya from thehorn of Africa along the tropical coast of IndianOcean to reach insular South East Asia andOceania (Cann, 2001; Stringer, 2000). A strong Y-chromosome support to this model is thedistribution of haplogroup C lineages in Asia(Kivisild et al., 2003), Australo-Melanesia andNorth America (Karafet et al., 1999). Althoughpresent in low frequencies in Indian subcontinent,(Bamshad et al., 2001; Sengupta et al., 2006;Kivisild et al., 2003; Cordaux, 2004; Wells, 2006;Ramana et al., 2001) it is largely distributed alongthe coastal regions, with a few patchy occu-rrences in Punjab. However, the persistence ofM130 lineages mostly among the south Indians,Pakistan (Qamar et al., 2002) and Sri Lanka(Kivisild et al., 2003) provides indirect evidencein support of the southern route of migration byearly modern humans. We have previouslysuggested that lack of haplogroup C sub-lineages(M217, M38 and M8) is indicative of theindigenous origin of most Indian populations andargues against the theory of Aryan migration fromCentral Asia (Sahoo et al., 2006). The presentanalysis showed none of the deletions (DYS390.1or DYS 390.3) associated with Australian orPolynesian C* chromosomes, contesting theclaims of link between India and Australianaboriginals (Redd et al., 2002). The age estimateof approximately 49KYA years in Indian samplesindicates a probable Indian origin of this lineage.However, until further analysis and age estimatesin other world populations are known, it cannot

be conclusively proven if RPS4YT mutation arosein India or arrived with the earliest migrants afterit arose somewhere in west Asia, from where itwas finally lost or diluted during the Upper Paleo-lithic expansion of modern humans (Underhillet al., 2001).

Genesis of Caste Structure and Influence ofMigrations on the Indian Gene Pool

The main feature of Indian society is that it ishighly structured by social factors such as castesystem, in which birth determines the position inthe society, mode of subsistence (occupation),and choice of marriage partners. However, genesisof caste system in India is ambiguous, since manyof the caste groups are known to have tribalorigins (Kosambi, 1964). Further the migration ofGreeks, Huns, Arabs, Chinese, Turks, Persians,Portuguese and others have made understandingthe nature of population structure more complex.mt DNA analysis from different geographic regionand social status showed that maternal haplo-groups in India are derived from a limited numberof founder lineages of M and N clades supportinga common proto-Asian ancestry with limited geneflow from later migrants (Kivisild et al., 2003; Basuet al., 2003). Our study reveals that there is virtuallyno genetic difference in the Y-chromosomesbetween the caste groups and tribes (Table 5).Whatever minor difference is present is largely dueto haplogroup O2a, contributed exclusively by theAustro-Asiatic and Tibeto-Burman tribes. Ourpresent analysis (AMOVA and haplogroupfrequency distribution in populations excludingthe Austro-Asiatic tribes of Jharkhand and Orissaand Tibeto-Burman tribes from Northeast)provides congruent evidence in support to thehypothesis that populations in India largely derivetheir gene pool from the common Pleistocenesettlers. High frequency of J2 and R1a1 lineagesmirror a greater influence of Indo-Europeanmigrants on upper caste populations of Gangeticplains compared to the peninsular southernregions. However, these skewed frequencies alsosuggest that the indigenous populations receivedlimited external gene flow Europe, Central and WestAsia. This is also supported by mt DNA haplo-groups that depict Indian-specific lineages with alimited contribution from both west and eastEurasian populations (Metspalu et al., 2004). Asimilar trend of J2 and R1a1 among castepopulations would probably provide a simplistic


assumption that agriculture was brought along withcaste system by the Indo-European speakers as aresult of demic diffusion of early farmers fromsouthwestern Iran, Fertile Cresent and Anatolia(Quintana-Murci et al., 2001; Cordaux et al., 2004).However, the absence of other Neolithic markersof early farmers, M35 and M201; that are prevalentin Europe, Anatolia, South Caucasus and Iran(Semino et al., 2000; Underhill et al., 2001) amongIndians, in addition to the frequency of M172 insouthern and western India and its persistence insouth India and tribal groups (Table 2a) questionsthe validity of this hypothesis. Agriculture in Indiaprobably arose as two independent events; onethat was a consequence of earliest migration thatbrought the Dravidian speakers and another muchlater through spread of rice cultivators from SEAsia (Fuller, 2003; Diamond et al., 2003).

Insights into Origin of Austro-Asiatic andTibeto-Burman Speakers

The origin of two language families, Austro-Asiatic and Tibeto-Burman in India is of particularinterest and has received considerable attention(Basu et al., 2003; Cordaux et al., 2004). In thepresent study, analysis of eleven Austro-Asiaticand seven Tibeto-Burman tribes from the easternand north-eastern region of India, establishes thatthe male gene pool of these groups are distinctlydifferent from other mainland tribal populations(Table 2a). An overall low Y-STR haplotypediversity and complete fixation of O2a in some ofthe aboriginal tribes (Ho, Santhal, Juang, Birhorand Munda) suggests that these tribes probablyexperienced a major demographic event, such ascommon founder effect followed by a bottleneckthat greatly reduced the Y-chromosome diversityin the Austro-Asiatic tribes of eastern India (Table1). In contrast to Austro-Asiatic tribes, the Tibeto-Burman tribes harbor both O2a and O3e lineagesin their Y-chromosomes. Interestingly, the twobranches of the Tibeto-Burman language;Himalayish and Naga-Kuki-Chin, could bedistinctly identified from their Y-chromosomes.The former depicts influence of Tibetan gene pool,marked by the presence of Haplogroup D lineagesin Bhutia and Tharu (Sahoo et al., 2006), while theother linguistic branch harbors O3e lineages. Thepredominance of O haplogroup and its sub-lineages in populations of East Asia suggest aSE Asian origin of Indian Austro-Asiatic andTibeto-Burman speakers. We hypothesize that

the Tibeto-Burman speakers came as a number ofmigratory events, while the Austro-Asiatic tribesprobably arrived in India as a single event. Thetwo groups probably migrated into India atdifferent time period is evident from the absenceof O3e lineages among Austro-Asiatic speakers,which probably are the earliest immigrants of thetwo. Presence of an Austro-Asiatic speaking tribe,Khasi, among the Tibeto-Burman speaking neigh-bors in the northeast corroborates this assump-tion. While the Tibeto-Burman speakers broughtin a number of East Asian maternal lineages (A,B5b, F1b, M8c, M8z) (Metspalu et al., 2004),absence of these lineages in Austro-Asiatic tribes(Thangaraj et al., 2005; Sahoo, 2006a) portraystwo different scenarios. First, the earlier exodusfrom South East Asia was probably a major male–mediated migration into India, or that the femalegene pool of the migrating East Asians is comple-tely lost among the Austro-Asiatic tribes. Addi-tional confirmation to this hypothesis is providedwith evidences of agricultural expansions fromtheir homelands in China, at different times andover different geographic ranges. Austro-Asiaticsare presumed to have spread west and south fromsouthern China into the Indian subcontinent andMalay Peninsula and brought rice cultivation withthem (Higham, 2003; Bellwood, 2004). The geneticevidence revealed in this study is consistent withanthropological records (Guha, 1935), whichsuggests that Sino-Tibetans dispersed from theYellow River and came into India through twodifferent routes; one from Burma probablybrought the Naga-Kuki-Chin language and O3eY-chromosomes and the other from Himalayas,which carried the YAP lineages into northernregions of subcontinent.

Age of Human Occupation in India- Austro-Asiatic or Dravidians as First Settlers?

Based on socio-cultural and linguisticevidences (Thapar, 1995; Pattanayak, 1998) andresults based on mt DNA HVSI nucleotidediversity and highest frequencies of mito-chondrial M haplogroup (Roychoudhary et al.,2001; Basu et al., 2003), it was asserted thatAustro-Asiatic tribes are the earliest settlers inIndia. The present comprehensive Y-chromosomeanalysis, which includes populations of alllinguistic and socio-ethnic affiliations, however,suggests people of south India as the originalsettlers of the subcontinent. The total lineage


diversity and distribution of Indian-specific Y-chromosome haplogroups (H, L, C, R1a1 andR2) in different geographical and socio-linguistic layers of the Indian populationsprovides substantial support in favor of thishypothesis. This theory also gathers adequateevidence from presence of the coastal marker,RPS4Y, in the south Indian tribes, who probablyrepresent remnants of the modern humanmigration out of Africa that took the southernroute to Australia. Any possibility that Austro-Asiatic speakers could have dispersed fromIndia is also eliminated based on the differentialdistribution of O2a Y-chromosomes in southernChina and India and the complete absence ofEast-Asian specific mt DNA lineages in Austro-Asiatic and Dravidian speakers of India. mt DNAhaplogroups of Indian Austro-Asiatic speakersare instead, probably a sub-group of theirDravidian neighbors (unpublished data,Kashyap et al). Recent archeo-logical andlinguistic evidences corroborate a Neolithicexpansion of Austro-Asiatic languages fromYangtze River basin (Higham, 2003) and ourpresent study supports an east-west clinalexpansion of Austro-Asiatic males from SouthEast Asia, which was not associated with anyfemale gene flow. Further, deeper coalescenceage for the Y-chromosome haplogroups C, H,R2 compared to O2a is consistent withhypothesis that Austro-Asiatic speakers cannotbe consi-dered as the earliest settlers of SouthAsia.

CONCLUSIONS

We find that genetic variation in India ischaracterized by a high Y-chromosomediversity, which is reflected by a greatercorrespondence with linguistic groups of India.Our results demonstrate India as a hotspot bothas an important source and recipient of major Y-chromosome lineages of the world. Haplogroupdistribution and AMOVA results provide tandemevidence in support a common Pleistoceneorigin of Indian populations, which wassubsequently followed by migrations of Austro-Asiatic speaking tribal males from SE Asia. TheTibeto-Burman populations were later migrantswho took two different routes and carried bothmale and female lineages specific to East Asia.Based on deep coalescence age estimates of H,R2 and C Y-chromosome lineages, their diversity

and distribution pattern, our data suggests anearly Pleistocene settlement of South Asia byDravidian speaking south Indian populations; theAustro-Asiatic speakers migrated much later fromSE Asia and probably contributed only paternallineages while amalgamating with the aboriginalpopulations of the region.

ACKNOWLEDGEMENTS

We express our appreciation to all the originaldonors who made this study possible. This studywas made possible through facilities provided atCFSL, Kolkata. We acknowledge all researcherswhose valuable data was used for this study. TheSS, AS, JB, MT, SG, RR, RA are grateful to theDirectorate of Forensic Sciences, MHA for theSenior Research Fellowship. GHB and TS arerecipients of Senior Research Fellowship fromCSIR, India. This research was supported by afinancial grant to CFSL, Kolkata under the Xth FiveYear Plan of the Govt. of India.

Electronic –Database InformationURLs for the data mentioned in this article are

as follows:XL STAT pro 7.5, http://www.xlstat.comNetwork 4.1, http://www.fluxus-engineering. comhttp://www.ethnologue.com

REFERENCES

Bamshad, M., Kivisild, T., Watkins, W.S., Dixon, M.E.,Ricker, C.E., Rao, B.B., Naidu, J.M., Prasad, B.V.,Reddy, P.G., Rasanayagam, A., Papiha, S.S., Villems,R., Redd, A.J., Hammer, M.F., Nguyen, S.V., Carroll,M.L., Batzer, M.A. and Jorde, L.B.: Genetic evidenceon the origins of Indian caste populations. GenomeRes., 11: 994-1004 (2001).

Bandelt, H.J., Forster, P. and Rohl, A.: Median-joiningnetworks for inferring intraspecific phylogenies. Mol.Biol. Evol., 16: 37-48 (1999)

Basu, A., Mukherjee, N., Roy, S., Sengupta, S., Banerjee,S., Chakraborty, M., Dey, B., Roy, M., Roy, B.,Bhattacharyya, N.P., Roychoudhury, S., Majumder,P.P.: Ethnic India: A genomic view, with specialreference to peopling and structure. Genome Res.,13: 2277-2290 (2003).

Bellwood P (2004) Tracking the spreads of farming beyondthe fertile Crescent: Europe and Asia. In: FirstFarmers: The Origins of Agricultural Societies. pp87 (2003).

Butler, J.M., Schoske, R., Vallone, P.M., Kline, M.C., Redd,A.J. and Hammer, M.F.: A novel multiplex forsimultaneous amplification of 20 Y chromosome STRmarkers. Forensic Sci. Int., 129: 10-24 (1989) (2002).

Cann, R.L.: Genetic clues to the dispersal of humanpopulations: Retracing the past from the present.Science, 291: 1742-1748 (2001).


Cavalli-Sforza, L.L., Menozzi, P. and Piazza, A.: TheHistory and Geography of Human Genes. PrincetonUniversity Press, Princeton, pp 208-213 (1994).

Cinnioglu, C., King, R., Kivisild, T., Kalfoglu, E., Atasoy,S., Cavalleri, G.L., Lillie, A.S., Roseman, C.C., Lin,A.A., Prince, K., Oefner, P.J., Shen, P., Semino, O.,Cavalli-Sforza, L.L. and Underhill, P.A.: ExcavatingY-chromosome haplotype strata in Anatolia. Hum.Genet., 114: 127-148 (2004).

Cordaux, R., Aunger, R., Bentley, G., Nasidze, I., Sirajuddin,S.M. and Stoneking, M.: Independent origins ofIndian caste and tribal paternal lineages. Curr. Biol.,14: 231-235 (2004).

Cordaux, R., Deepa, E., Vishwanathan, H. and Stoneking,M.: Genetic evidence for the demic diffusion ofagriculture to India. Science, 304: 1125 (2004).

Cordaux, R., Weiss, G., Saha, N. and Stoneking, M.: Thenortheast Indian passageway: A barrier or corridorfor human migrations? Mol. Biol. Evol., 21: 1525-1533 (2004).

Cruciani, F., Santolamazza, P., Shen, P., Macaulay, V.,Moral, P., Olckers, A., Modiano, D., Holmes, S.,Destro-Bisol, G., Coia, V., Wallace, D.C., Oefner,P.J., Torroni, A., Cavalli-Sforza, L.L., Scozzari, R.,and Underhill, P.A.: A back migration from Asia tosub-Saharan Africa is supported by high-resolutionanalysis of human Y-chromosome haplotypes. Am.J. Hum. Genet., 70: 1197-1214 (2002).

Deraniyagala, S.U.: The Prehistory of Sri Lanka: AnEcological Perspective. Department of theArcheological Survey, Government of Sri Lanka,Colombo (1992).

Diamond, J. and Bellwood, P.: Farmers and their languages:the first expansions. Science, 300:597-602 (2003)

Excoffier L, Smouse PE, Quattro JM (1992) Analysis ofmolecular variance inferred from metric dis-tancesamong DNA haplotypes: application to humanmitochondrial DNA restriction data. Genetics, 131:479-491 (2000).

Forster, P., Rohl, A., Lunnemann, P., Brinkmann, C.,Zerjal, T., Tyler-Smith, C. and Brinkmann, B.: Ashort tandem repeat-based phylogeny for the humanY chromosome. Am. J. Hum. Genet., 67: 182-196(2000).

Fuller, D.: An agricultural perspective on Dravidianhistorical linguistics: archaeological crop packages,livestock and Dravidian crop vocabulary. Pp. 191-213. In: Examining the Farming/LanguageDispersal Hypothesis. P. Bellwood and C., Renfrew(Eds.) McDonald Institute for ArchaeologicalResearch, Cambridge. (2003).

Guha, B.S.: The racial affinities of the people of India.In: Census of India, 1931, Part III-Ethnographical(1935).

Higham, C.: Languages and farming dispersals: Austro-Asiatic languages and rice cultivation. In: Examiningthe Farming/Language Dispersal Hypothesis. P.Bellwood and C., Renfrew (Eds.). McDonald Institutefor Archaeological Research, Cambridge (2003).

James, H.V.A. and Petraglia, M.D.: Modern Human originsand the evolution of behavior in the later PleistoceneRecord of South Asia. Curr. Anthropol., 46 Supp:S3-S27

Karafet, T., Xu, L., Du, R., Wang, W., Feng, S., Wells,R.S., Redd, A.J., Zegura, S.L. and Hammer, M.F.:Paternal Population History of East Asia: Sources,

Patterns and Microevolutionary Processes. Am. J.Hum. Genet., 69: 615-628 (2001).

Karafet, T.M., Zegura, S.L., Posukh, O., Osipova, L.,Bergen, A., Long, J., Goldman, D., Klitz, W.,Harihara, S., de, Knijff, P., Wiebe, V., Griffiths, R.C.,Templeton, A.R. and Hammer, M.F.: Ancestral Asiansource(s) of new world Y-chromosome founderhaplotypes. Am. J. Hum. Genet., 64: 817-831(1999).

Kennedy, K.: God, Apes and Fossil Men:Paleoanthropology in South Asia. University ofMichigan Press, Ann. Arbor (2000).

Kivisild, T., Rootsi, S., Metspalu, M., Mastana, S.,Kaldma, K., Parik, J., Metspalu, E., Adojaan, M.,Tolk, H.V., Stepanov, V., Golge, M., Usanga, E.,Papiha, S.S., Cinnioglu, C., King, R., Cavalli-Sforza,L., Underhill, P.A. and Villems, R.: The geneticheritage of the earliest settlers persists both in Indiantribal and caste populations. Am. J. Hum. Genet.,72: 313-332 (2003).

Kosambi, D.D.: The Culture and Civilization of AncientIndia in Historical Outline. Vikas Publishing HousePvt. Ltd., New Delhi (1964)

Macaulay, V., Hill, C., Achilli, A., Rengo, C., Clarke, D.,Meehan, W., Blackburn, J., Semino, O., Scozzari,R., Cruciani, F., Taha, A., Shaari, N.K., Raja, J.M.,Ismail, P., Zainuddin, Z., Goodwin, W., Bulbeck, D.,Bandelt, H.J., Oppenheimer, S., Torroni, A., Richards,M.: Single, rapid coastal settlement of Asia revealedby analysis of complete mitochondrial genomes.Science, 308: 1034-1036 (2005).

Metspalu, M., Kivisild, T., Metspalu, E., Parik, J.,Hudjashov, G., Kaldma, K., Serk, P., Karmin, M.,Behar, D.M., Gilbert, M.T., Endicott, P., Mastana,S., Papiha, S.S., Skorecki, K., Torroni, A. and Villems,R.: Most of the extant mtDNA boundaries in southand southwest Asia were likely shaped during theinitial settlement of Eurasia by anatomically modernhumans. BMC Genet., 5: 26 (2004).

Misra, V.N.: Prehistoric human colonization of India. J.Biosci., 26: 491-531 (2005).

Nebel, A., Filon, D., Weiss, D.A., Weale, M., Faerman,M., Oppenheim, A. and Thomas, M.G.: High-resolution Y chromosome haplotypes of Israeli andPalestinian Arabs reveal geographic substructure andsubstantial overlap with haplotypes of Jews. Hum.Genet., 107: 630-641 (2000).

Passarino, G., Cavalleri, G.L., Lin, A.A., Cavalli-Sforza,L.L., Borresen-Dale, A.L. and Underhill, P.A.:Different genetic components in the Norwegianpopulation revealed by the analysis of mtDNA andY chromosome polymorphisms. Eur. J. Hum. Genet.,10: 521-529 (2002).

Pattanayak, D.P.: The language heritage of India. pp: 95-99. In: The Indian Human Heritage. Balasubramanianand N.A., Rao (Eds.), Hyderabad (1998).

Qamar, R., Ayub, Q., Mohyuddin, A., Helgason, A.,Mazhar, K., Mansoor, A., Zerjal, T., Tyler-Smith,C. and Mehdi, S.Q.: Y-chromosomal DNA variationin Pakistan. Am. J. Hum. Genet., 70: 1107-1124(2002).

Quintana-Murci, L., Krausz, C., Zerjal, T., Sayar, S.H.,Hammer, M.F., Mehdi, S.Q., Ayub, Q., Qamar, R.,Mohyuddin, A., Radhakrishna, U., Jobling, M.A.,Tyler-Smith, C. and McElreavey, K.: Y-chromosomelineages trace diffusion of people and languages in


southwestern Asia. Am. J. Hum. Genet., 68: 537-542 (2001).

Quintana-Murci, L., Semino, O., Bandelt, H.J., Passarino,G., McElreavey, K. and Santachiara-Benerecetti,A.S.: Genetic evidence of an early exit of Homosapiens sapiens from Africa through eastern Africa.Nat. Genet., 23: 437-441 (1999).

Ramana, G.V., Su, B., Jin, L., Singh, L., Wang, N.,Underhill, P. and Chakraborty, R.: Y-chromosomeSNP haplotypes suggest evidence of gene flowamong caste, tribe, and the migrant Siddipopulations of Andhra Pradesh, South India. Eur. J.Hum. Genet., 9: 695-700 (2001).

Redd, A.J., Roberts-Thomson, J., Karafet, T., Bamshad,M., Jorde, L.B., Naidu, J.M., Walsh, B. andHammer, M.F.: Gene flow from the Indiansubcontinent to Australia: evidence from the Ychromosome. Curr. Biol., 12: 673-677 (2002).

Renfrew, C.: The origins of Indo-European languages.Sci. Am., 261: 82-90 (1989)

Rosser, Z.H., Zerjal, T., Hurles, M.E., Adojaan, M.,Alavantic, D., Amorim, A., Amos, W., et al.: Y-chromosomal diversity in Europe is clinal andinfluenced primarily by geography, rather than bylanguage. Am. J. Hum. Genet., 67: 1526-1543(2000).

Roychoudhary, S., Roy, S., Basu, A., Banerjee, R.,Vishwanathan, H., Usha Rani, M.V., Sil, S.K., Mitra,M., Majumder, P.P.: Genomic structures andpopulation histories of linguistically distinct tribalgroups of India. Hum. Genet., 109: 339-50 (2001).

Sahoo, S. and Kashyap, V.K.: Phylogeography ofmitochondrial DNA and Y-Chromosomehaplogroups reveal asymmetric gene flow inpopulations of Eastern India. Am. J. Phys.Anthropol. (in press) (2006).

Sahoo, S., Singh, A., Himabindu, G., Banerjee, J.,Sitalaximi, T., Gaikwad, S., Trivedi, R., Endicott,P., Kivisild, T., Metspalu, M., Villems, R. andKashyap, V.K.: A prehistory of Indian Ychromosomes: evaluating demic diffusion scenarios.Proc. Natl. Acad. Sci. U S A., 103: 843-848 (2006).

Sambrook, J., Fritsch, E.F., Maniatis, T.: MolecularCloning. A Laboratory Manual. 2nd Ed. CSHL Press,Cold Spring Harbor, N.Y. (1989).

Schneider, S., Roessli, D. and Excoffier, L.: ARLEQUINver 2.0. A Software for Population Genetics DataAnalysis., Genetics and Biometry Laboratory,University of Geneva, Switzerland

Semino, O., Passarino, G., Oefner, P.J., Lin, A.A.,Arbuzova, S., Beckman, L.E., De, Benedictis. G.,Francalacci, P., Kouvatsi, A., Limborska, S.,Marcikiae, M., Mika, A., Mika, B., Primorac, D.,Santachiara-Benerecetti, A.S., Cavalli-Sforza, L.L.and Underhill, P.A.: The genetic legacy of PaleolithicHomo sapiens sapiens in extant Europeans: a Ychromosome perspective. Science, 290: 1155-1159(2000).

Sengupta, S., Zhivotovsky, L.A., King, R., Mehdi, S.Q.,Edmonds, C.A., Chow, C.E., Lin, A.A., Mitra, M.,

Sil, S.K., Ramesh, A., Usha Rani, M.V., Thakur, C.M.,Cavalli-Sforza, L.L., Majumder, P.P. and Underhill,P.A.: Polarity and temporality of high-resolution Y-chromosome distributions in India identify bothindigenous and exogenous expansions and revealminor genetic influence of central Asian pastoralists.Am. J. Hum. Genet., 78: 202-221 (2006).

Shi, H., Dong, Y.L., Wen, B., Xiao, C.J., Underhill, P.A.,Shen, P.D., Chakraborty, R., Jin, L. and Su, B.: Y-chromosome evidence of southern origin of the EastAsian-specific haplogroup O3-M122. Am J HumGenet., 77: 408-419 (2005).

Singh, K.S: India’s Communities. National Series. Peopleof India. Oxford University Press, New Delhi (1998)

Stringer, C.: Coasting out of Africa. Nature. 405: 24-25(2000).

Su, B., Xiao, C., Deka, R., Seielstad, M.T., Kangwanpong,D., Xiao, J., Lu, D., Underhill, P., Cavalli-Sforza, L.,Chakraborty, R., Jin, L.: Y chromosome haplotypesreveal prehistorical migrations to the Himalayas.Hum Genet., 107: 582-590 (2000).

Thangaraj, K., Sridhar, V., Kivisild, T., Reddy, A.G.,Chaubey, G., Singh, V.K., Kaur, S., Agarawal, P., Rai,A., Gupta, J., Mallick, C.B., Kumar, N., Velavan,T.P., Suganthan, R., Udaykumar, D., Kumar, R.,Mishra, R., Khan, A., Annapurna, C. and Singh, L.:Different population histories of the Mundari- andMon-Khmer-speaking Austro-Asiatic tribes inferredfrom the mtDNA 9-bp deletion/insertion polymor-phism in Indian populations. Hum. Genet., 116: 507-517 (2005).

Thapar, R.: The first millennium B.C. in the northernIndia. Pp: 80-141. In: Recent Perspective of EarlyIndian History. R Thapar (Ed.) Bombay (1995).

Underhill, P., Passarino, G., Lin, A.A., Shen, P., Mirazón,Lahr, M., Foley, R.A., Oefner, P.J. and Cavalli-Sforza,L.L.: The phylogeography of the Y chromosomebinary haplotypes and the origins of modern humanpopulations. Ann. Hum. Genet., 65: 43-62 (2001).

Underhill, P.A., Shen, P., Lin, A.A., Jin, L., Passarino, G,.Yang, W.H., Kauffman, E., Bonne-Tamir, B.,Bertranpetit, J., Francalacci, P., Ibrahim, M., Jenkins,T., Kidd, J.R., Mehdi, S.Q., Seielstad, M.T., Wells,R.S., Piazza, A., Davis, R.W., Feldman, M.W., Cavalli-Sforza, L.L. and Oefner, P.J.: Y chromosome sequencevariation and the history of human populations. Nat.Genet., 26: 358-361 (2000).

Wells, R.S., Yuldasheva, N., Ruzibakiev, R., Underhill, P.,Evseeva, I., Blue-Smith, J., Jin, L., et al.: The Eurasianheartland: A continental perspective on Y-chromosome diversity. Proc. Natl. Acad. Sci. USA98: 10244-10249 (2001).

Zhivotovsky, L.A., Underhill, P.A., Cinnioglu, C., Kayser,M., Morar, B., Kivisild, T., Scozzari, R., Cruciani, F.,Destro-Bisol, G., Spedini, G., Chambers, G., Herrera,R.J., Yong, K.K., Gresham, D., Tournev, I., Feldman,M.W. and Kalaydjieva, L: The Effective MutationRate at Y Chromosome Short Tandem Repeats, withApplication to Human Population-Divergence Time.Am. J. Hum. Genet., 74: 50-61 (2004).


KEYWORDS Population Genetics. People of India. Linguistic Groups. Migration

ABSTRACT Paleoanthropological evidence indicates that modern humans reached South Asia in one of the firstdispersals out of Africa, which were later followed by migrations from different parts of the world. The variation of38 binary and 20 microsatellite polymorphisms on the non-recombining part of the Y-chromosome was examined in1434 male subjects of 87 different populations of India to investigate various hypothesis of migration and peoplingof South Asia. This survey revealed a total of 24 paternal lineages, of which haplogroups H, R1a1, O2a and R2accounted for approximately 70% of the Indian Y-Chromosomes. The high NRY diversity value (0.893) andcoalescence age of approx. 45-50 KYA for H and C haplogroups indicated an early settlement of the subcontinent bymodern humans. Haplogroup frequency and AMOVA results provide congruent evidence in support a commonPleistocene origin of Indian populations, with limited influence of Indo-European gene pool on the Indian society.The differential Y-chromosome and mt DNA pattern in the two Austric speakers of India signaled that an earliermale–mediated exodus from South East Asia largely involved the Austro-Asiatic tribes, while the Tibeto-Burmanmales migrated with females through two different routes; one from Burma probably brought the Naga-Kuki-Chinlanguage and O3e Y-chromosomes and the other from Himalayas, which carried the YAP lineages into northernregions of subcontinent. Based on distribution of Y-chromosome haplogroups (H, C, O2a, and R2) and deep coalescingtime depths for these paternal lineages, we suggest that the present day Dravidian speaking populations of South Indiaare the descendants of earliest Pleistocene settlers while Austro-Asiatic speakers came from SE Asia in a latermigration event.

Authors’ Addresses: R. Trivedi, Sanghamitra Sahoo, Anamika Singh, G. Hima Bindu, Jheelam

Banerjee, Manuj Tandon, Sonali Gaikwad, Revathi Rajkumar, T. Sitalaximi, Richa Ashma, R. Trivediand V. K. Kashyap, National DNA Analysis Centre, Central Forensic Science Laboratory, 30, GorachandRoad, Kolkata 700 014, West Bengal, IndiaSanghamitra Sahoo and V. K. Kashyap, National Institute of Biologicals, A-32, Sector 62, InstitutionalArea, NOIDA 201307, Uttar Pradesh, IndiaTelephone: +91-120-2400027, Fax: +91-120-2403014, E-mail: [email protected]. B. N. Chainy, PG Department of Biotechnology,Vani Vihar, Utkal University, Bhubaneswar 751 004,Orissa, India

© Kamla-Raj Enterprises 2007 Anthropology Today: Trends, Scope and ApplicationsAnthropologist Special Volume No. 3: 393-414 (2007) Veena Bhasin & M.K. Bhasin, Guest Editors

chapter 31 high resolution phylogeographic map of y ... volume-journal/t-anth-00-special...

Documents