CHAPTER 31
High Resolution Phylogeographic Map of Y-Chromosomes Reveal the Genetic Signatures of Pleistocene Origin of Indian Populations
R. Trivedi, Sanghamitra Sahoo, Anamika Singh, G. Hima Bindu, Jheelam Banerjee, Manuj Tandon, Sonali Gaikwad, Revathi Rajkumar, T. Sitalaximi, Richa Ashma, G. B. N. Chainy and V. K. Kashyap
INTRODUCTION
The origins of modern humans in South Asia have been obscure. Archaeological and palaeoanthropological evidence is sparse and fragmentary. Human remains dating back to the Late Pleistocene provide limited but conclusive evidence for early human occupation of the Indian subcontinent (Deraniyagala, 1992; Kennedy, 2000; James et al., 2005). Artefacts of Middle and Upper Paleolithic cultures in the Narmada Valley and remains of Acheulian culture have been found extensively throughout South Asia. Mesolithic microliths and evidence of Neolithic settlements found in diverse parts of the subcontinent also testify to the occupation of India by early humans (Misra, 2001). Most of the prevailing genetic records further corroborate the hypothesis that Homo sapiens colonized South Asia as part of an early southern dispersal from Africa (Quintana-Murci et al., 1999; Cann, 2001; Macaulay et al., 2005). This paper examines the current genetic diversity of Indian Y chromosomes in order to place the genetic origin(s) and time of settlement of the earliest human populations in India.
The present-day populations of India belong to 4635 endogamous communities (Singh, 1998) and speak as many as 350 living languages (Ethnologue), which fall under four major supra-language families: Indo-European, Dravidian, Sino-Tibetan and Austric. The extensive diversity among these groups, as reported with 54 classical markers, showed a typical north-south geographic division of populations and placed Indians closer to European populations than to either East Asians or Africans in genetic distance trees (Cavalli-Sforza et al., 1994). A number of studies based on mtDNA, Y-chromosome and other nuclear DNA markers have invariably supported these observations. Numerous surveys of genetic variation have generally portrayed the differences between castes and tribes, and the extent of gene flow among ranked caste clusters. Most of the studies conclude that the maternal gene pool of Indian populations is proto-Asian in origin with limited west-Eurasian admixture. While the Y-chromosomes of the caste populations were found to be more similar to Europeans than to Asians, with greater west-Eurasian admixture in castes of higher rank (Bamshad et al., 2001), recent studies provide congruent evidence against any major influx of Indo-European speakers into the Indian gene pool and have ascertained a late Pleistocene South Asian origin for the majority of Indian populations (Sahoo et al., 2006; Sengupta, 2006). These new findings are consistent with archaeobotanical evidence (Fuller, 2003) and linguistic data (Renfrew, 1989), which suggest a recent common root for Elamite and Dravidic languages. It is hypothesized that the same prehistoric gene pool of southern Asian Pleistocene coastal settlers from Africa provided inocula for both Indian castes and tribes, and that subsequent diversification of the gene pools was probably due to the genetic imprints laid down by later migrants, such as Huns, Greeks, Kushans, Moghuls and others (Kivisild et al., 2003). However, much speculation remains about which of the population groups are amongst the earliest settlers of the Indian subcontinent. While the Austro-Asiatic tribes have been presumed to be descendants of the early modern humans based on nucleotide diversity of the mitochondrial M haplogroup (Roychoudhary et al., 2001; Basu et al., 2003), analysis of the Indian Y chromosomes undertaken in this study depicts a different scenario.
In this study, we have assessed a total of 1434 unrelated male individuals belonging to 87 different Indian populations, of which 936 Y-chromosomes have been previously analysed for 38 Y-SNP markers (Sahoo et al., 2006). An additional 216 samples are included in the present analysis, while Y-chromosomal haplogroup data for 282 additional samples from seven other Indian
394 R. TRIVEDI, SANGHAMITRA SAHOO, ANAMIKA SINGH, G. HIMA BINDU ET AL.
populations were collated from the literature (Kivisild et al., 2003; Cordaux et al., 2004). The present study is based on simultaneous analysis of 38 SNP and 20 STR markers on the Y-chromosome to provide age estimates for the haplogroups and describe their phylogeographic distribution. Apart from determining the antiquity of various population groups in South Asia, we also discuss the genetic structure and peopling of the subcontinent in light of present molecular evidence.
SUBJECTS AND METHODS
Populations Analyzed
A total of 1152 unrelated male individuals belonging to 80 different populations were analyzed in the present study. Samples include populations from various linguistic families (Indo-European, Austro-Asiatic, Dravidian and Tibeto-Burman) and sixteen geographical areas of India. Blood samples were collected with informed consent using a protocol approved by the ethical committee of CFSL, Kolkata. DNA was extracted from peripheral blood lymphocytes using standard protocols (Sambrook, 1989). Information on the geographic origin and the linguistic and socio-ethnic affiliation of each population is given in Table 1. Additional data on 282 samples from seven Indian populations (Punjab, Konkanstha Brahmin, Koya, Yerava, Mullukunan, Kuruchian, Koraga) and from 76 populations of Western Europe (51), Russia (281), Middle East (102), Caucasus (122), Central Asia (584), Siberia (66), North East Asia (334), South East Asia (552), Oceania (225), Pakistan (691) and Sri Lanka (39) were collated from the literature and included in the genetic distance analysis.
Markers Analyzed
The 38 binary polymorphisms included in the present analysis have been previously described (Sahoo et al., 2006). Analysis of 20 Y-STRs (twelve tetranucleotide repeats, DYS19, DYS385a/b, DYS389I/II, DYS390, DYS391, DYS393, DYS460, H4, DYS437 and DYS439; three trinucleotide repeats, DYS392, DYS388 and DYS426; two dinucleotide repeats, YCAIIa/b; two pentanucleotide repeats, DYS438 and DYS447; and one hexanucleotide repeat, DYS448) was carried out on the same DNA samples with an in-house standardized protocol, using primers described elsewhere (Butler et al., 2002). Y-STR haplotypes were constructed in a sequential order of loci, keeping an ascending numerical order for the minimal haplotype, to facilitate Y-chromosome comparisons with other world populations.
Statistical Analyses
Several population genetic parameters, including mean haplogroup and haplotype diversity and their standard errors, mean number of pairwise differences (MPD), pairwise FST values for haplogroups and the associated p values, were calculated using the ARLEQUIN ver. 2.0 software package (Schneider et al., 2000). To test for differences in proportions, the χ² test for significance was employed.
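As an illustration of the haplogroup diversity values reported per population, the following minimal Python sketch applies the usual unbiased gene diversity estimator, h = n/(n-1) × (1 - Σp²), where p is the frequency of each haplogroup among n chromosomes. We assume this is the estimator behind the tabulated values; the function name and the sample lists are hypothetical and are not data from the study.

```python
from collections import Counter


def haplogroup_diversity(haplogroups):
    """Unbiased gene diversity: h = n/(n-1) * (1 - sum(p_i^2)).

    `haplogroups` is a list with one haplogroup label per sampled chromosome.
    """
    n = len(haplogroups)
    if n < 2:
        raise ValueError("at least two chromosomes are needed")
    counts = Counter(haplogroups)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical samples for illustration only (not data from the study).
fixed_sample = ["O2a"] * 10              # a single lineage is fixed
mixed_sample = ["H", "H", "R1a1", "R2"]  # three lineages among four males

print(haplogroup_diversity(fixed_sample))
print(haplogroup_diversity(mixed_sample))
```

A population in which one lineage is fixed has a diversity of zero under this estimator, which is the pattern expected for populations reported with a diversity of 0.0000.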
Apportionment of genetic differences among various socio-ethnic, geographic and linguistic groups at different levels of hierarchical subdivision (between individuals within populations, between populations within groups, and between groups of populations) was calculated using analysis of molecular variance (AMOVA) (Excoffier et al., 1992). To examine the factor(s) responsible for genetic differentiation of Y-chromosomes, AMOVA was performed on both the binary markers and the Y-STRs within the lineages. Significance levels of the genetic variance components and of the ΦST values were estimated using 10000 iterations.
Median-joining network analysis of haplogroup-associated haplotypes (Bandelt et al., 1999; Forster et al., 2000) was performed using NETWORK version 4.1.0.8 (Life Sciences and Engineering Technology Solutions Web site), with the epsilon value set to zero. For network calculation, seven Y-STR loci (DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392 and DYS393) were used, and each locus was weighted according to its estimated variance: the loci with the highest variance were given the lowest weights.
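The exact weighting scheme is not given in the text. One simple way to realise "highest variance, lowest weight" is sketched below in Python: each locus receives an integer weight proportional to the inverse of its variance, scaled so that the most variable locus has weight 1. The scaling rule and the function name are assumptions made for illustration; the variances are those listed for haplogroup H in Table 4.

```python
def inverse_variance_weights(variances):
    """Integer locus weights inversely proportional to variance.

    The most variable locus gets weight 1. This scaling is an illustrative
    assumption, not the scheme documented by the authors. All variances
    must be positive.
    """
    max_var = max(variances.values())
    return {locus: max(1, round(max_var / v)) for locus, v in variances.items()}


# Locus-wise variances for haplogroup H, taken from Table 4.
H_VARIANCES = {
    "DYS19": 0.497, "DYS389I": 0.760, "DYS389II": 0.872, "DYS390": 2.246,
    "DYS391": 2.533, "DYS392": 0.708, "DYS393": 0.921,
}

for locus, weight in inverse_variance_weights(H_VARIANCES).items():
    print(locus, weight)
```

Under this rule the slowly varying DYS19 receives the largest weight and the highly variable DYS390 and DYS391 the smallest, so that changes at stable loci count more when the network is built.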
To estimate the time to the most recent common ancestor (TMRCA), we calculated ages from the STR variation within the corresponding haplogroup observed in the Indian populations using the average square difference (ASD) method. We used the same seven Y-STRs as in the network analysis, a generation time of 25 years and a mutation rate of 6.9 × 10⁻⁴, as described by Zhivotovsky et al. (2004).
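The age calculation can be sketched as the mean locus-wise variance divided by the mutation rate and multiplied by the generation time. The Python example below uses the haplogroup H variances from Table 4; the function name and parameters are illustrative assumptions and not the authors' software. As an arithmetic observation only, the ages listed in Table 4 are reproduced when the rate is taken as 7 × 10⁻⁴, whereas the stated 6.9 × 10⁻⁴ gives values about 1.4% higher.

```python
def tmrca_years(locus_variances, mutation_rate, generation_years=25):
    """Age estimate = mean locus variance / mutation rate * generation time.

    Treats the locus-wise variance as the ASD estimate, as Table 4 appears
    to do; this is a sketch under that assumption.
    """
    mean_variance = sum(locus_variances.values()) / len(locus_variances)
    return mean_variance / mutation_rate * generation_years


# Locus-wise variances for haplogroup H, taken from Table 4.
H_VARIANCES = {
    "DYS19": 0.497, "DYS389I": 0.760, "DYS389II": 0.872, "DYS390": 2.246,
    "DYS391": 2.533, "DYS392": 0.708, "DYS393": 0.921,
}

# Rate rounded to 7e-4: matches the tabulated age for haplogroup H.
print(round(tmrca_years(H_VARIANCES, 7e-4), 2))
# Rate as stated in the text (6.9e-4): slightly older estimate.
print(round(tmrca_years(H_VARIANCES, 6.9e-4), 2))
```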
A neighbor-joining tree based on FST values of
395 EARLIEST SETTLERS OF INDIAN SUBCONTINENT
87 Indian populations was constructed using MEGA 3.0 to illustrate the genetic affinity between the studied groups. The genetic relationship between populations of India and other parts of the world was estimated from pairwise genetic distances (FST values) calculated from haplogroup frequencies. Multidimensional scaling (MDS) analysis of pairwise FST values was performed using XLSTAT Pro 7.5 to decipher the genetic affinities of populations. To obtain better resolution in the MDS plot, the Indian populations were pooled by region for comparison with world populations.
RESULTS
A total of 1152 individuals belonging to 80 extant human populations from 16 geographical regions of India were analysed with 38 Y-SNPs and 20 Y-STR markers to evaluate the possibility that Austro-Asiatic speakers are the earliest settlers of the Indian subcontinent. Twenty-four different paternal lineages were observed, of which haplogroups H-M69, R1a1-M17, R2-M124 and O2a-M95 together account for 69% of the paternal diversity in South Asia. Another 20.9% of the genetic variation in Indian males is described by haplogroups L-M11, J2-M172, O3e-M134, K2-M70, F-M89 and C-RPS4Y711, while the presence of other haplogroups, R1b3-M269 and G-M201, could be attributed to recent admixture with Europeans.
Haplogroups and Extent of Y-Chromosome Diversity in Indians
The Y-SNPs used in this study were based on previous reports of polymorphisms in Eurasian and Oceanic populations. Overall haplogroup diversity among Indians was relatively high when compared to European or East Asian populations. Indian populations showed diversity values from 0.133 to a high of 0.914, with Austro-Asiatic and Tibeto-Burman tribes generally showing reduced diversity (Table 1). Twenty-five Dravidian populations showed a higher mean haplogroup diversity (0.723 ± 0.083) than Indo-European speakers (0.684 ± 0.079), represented by thirty endogamous groups. South Indian groups (Andhra Brahmin, Kallar, Raju, Chenchu and Lambadi) displayed high lineage diversity values, while populations of North India typically demonstrated lower mean haplogroup diversity (0.569 ± 0.104). The distribution of the lineages and the Y-STR diversity within the haplogroups are described in detail below.
Haplogroup H-M69
Haplogroup H was the most frequent lineage: 23% of the males analyzed from different geographic regions of India carried the M69C haplotype, which is additionally defined by the M52C mutation. The distribution of haplogroup H showed a north-south gradient (24.4% to 27.4%); geographically, however, its total frequency was highest (44.4%) in populations of western India (Table 2a). Among the 23% carrying the H lineage in their Y-chromosomes, most were representatives of south India (27.4%), speaking Dravidian languages (30.0%). Among socio-ethnic groups, the frequency was 27.6% in lower caste groups, while in the tribal groups it accounted for 21.2% of the paternal variation. However, the pattern of distribution did not vary statistically between the Dravidian and Indo-European speaking tribes or caste clusters (Table 2b). The 206 distinct 20-Y-STR haplotype profiles identified among 221 individuals had a mean pairwise difference of 12.66 (Table 3), and none of the haplotypes were shared between groups. In the median-joining network analysis with the 7 Y-STRs associated with the M69C/M52C lineage branch, the majority of Y-chromosome STR haplotypes are connected by one- or two-step mutation events (Fig. 2a). Kora, an Indo-European speaking tribal group from eastern India, branched out of the network with more than three mutation steps. The Y-STR based coalescence time of haplogroup H1 chromosomes was estimated to be ~43,556 years (Table 4).
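The mean pairwise difference quoted above can be illustrated with a short Python sketch that counts, for every pair of haplotypes, the number of loci at which they differ, and averages over all pairs. Counting differing loci (rather than differences in repeat number) is an assumption; it is consistent with Table 3, where the average gene diversity equals the MPD divided by the number of loci (for haplogroup H, 12.66 / 20 = 0.633). The haplotypes below are hypothetical two-locus examples, not data from the study.

```python
from itertools import combinations


def mean_pairwise_differences(haplotypes):
    """Mean number of loci that differ, averaged over all pairs of haplotypes.

    Each haplotype is a tuple of repeat counts, one entry per locus.
    """
    pairs = list(combinations(haplotypes, 2))
    if not pairs:
        raise ValueError("at least two haplotypes are needed")
    total = sum(sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs)
    return total / len(pairs)


# Hypothetical two-locus haplotypes for illustration only.
sample = [(14, 13), (14, 13), (15, 13), (15, 12)]

mpd = mean_pairwise_differences(sample)
n_loci = len(sample[0])
print(mpd)            # mean number of differing loci per pair
print(mpd / n_loci)   # average gene diversity over loci
```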
Haplogroup R1a1-M17
Haplogroup R1a1-M17 characterizes 17.5% of the Indians. The Indo-European speakers showed a significantly higher proportion of this lineage than populations belonging to the Dravidian linguistic family (29.7% vs. 11.7%; χ² = 7.82, p<0.05) (Table 2a). With the exception of Lodha, Nepali and Bhutia, all other Austro-Asiatic and Tibeto-Burman speakers lack this haplogroup in their Y-chromosomes. While the Indo-European and Dravidian caste groups show significant variation (χ² = 12.5, p<0.01), the tribal groups are more alike. Distribution of M17 lineage
Table 1: Description of the Indian populations included in this study
No. | Population | Code | State | Region | Language | Social Status | Hierarchy | Haplogroup Diversity
1 | ANDHRA BRAHMIN | ANB | ANDHRA PRADESH | South | Dravidian | CASTE | Upper | 0.8538 ± 0.0504
2 | CHENCHU | CHU | ANDHRA PRADESH | South | Dravidian | TRIBE | - | 0.8474 ± 0.0437
3 | KAMMA CHAUDHARY | KMC | ANDHRA PRADESH | South | Dravidian | CASTE | Lower | 0.6433 ± 0.1078
4 | KAPPU NAIDU | KPN | ANDHRA PRADESH | South | Dravidian | CASTE | Lower | 0.5737 ± 0.1213
5 | KOMATI | KOM | ANDHRA PRADESH | South | Dravidian | CASTE | Lower | 0.5000 ± 0.1222
6 | LAMBADI | LMD | ANDHRA PRADESH | South | Indo-European | TRIBE | - | 0.8789 ± 0.0432
7 | NAIKPOD GOND | NPG | ANDHRA PRADESH | South | Dravidian | TRIBE | - | 0.7421 ± 0.0584
8 | RAJU | RAJ | ANDHRA PRADESH | South | Dravidian | CASTE | Middle | 0.8596 ± 0.0393
9 | REDDY | RDY | ANDHRA PRADESH | South | Dravidian | CASTE | Middle | 0.8022 ± 0.0687
10 | YERUKULA | YER | ANDHRA PRADESH | South | Dravidian | TRIBE | - | 0.6316 ± 0.0875
11 | ADI PASI | ADI | ARUNACHAL PRADESH | North-East | Tibeto-Burman | TRIBE | - | 0.3556 ± 0.1591
12 | BIHAR BRAHMIN | BBH | BIHAR | East | Indo-European | CASTE | Upper | 0.4510 ± 0.1174
13 | BHUMIHAR | BHU | BIHAR | East | Indo-European | CASTE | Middle | 0.6368 ± 0.1151
14 | RAJPUT | BRJ | BIHAR | East | Indo-European | CASTE | Upper | 0.4394 ± 0.1581
15 | KAYASTHA | BKY | BIHAR | East | Indo-European | CASTE | Middle | 0.7363 ± 0.0748
16 | YADAV | YAV | BIHAR | East | Indo-European | CASTE | Lower | 0.6786 ± 0.1220
17 | KURMI | KUI | BIHAR | East | Indo-European | CASTE | Lower | 0.8590 ± 0.0633
18 | BANIYA | BAN | BIHAR | East | Indo-European | CASTE | Lower | 0.7273 ± 0.0679
19 | GUJ PATEL | PAT | GUJRAT | West | Indo-European | CASTE | Middle | 0.7778 ± 0.1100
20 | HP RAJPUT | HRJ | HP | North | Indo-European | CASTE | Upper | 0.9143 ± 0.0425
21 | ORAON | ORO | JHARKHAND | East | Dravidian | TRIBE | - | 0.4091 ± 0.1333
22 | HO | HO | JHARKHAND | East | Austro-Asiatic | TRIBE | - | 0.0000 ± 0.0000
23 | BHUMIJ | BHJ | JHARKHAND | East | Austro-Asiatic | TRIBE | - | 0.4571 ± 0.1406
24 | KHARIA | KRA | JHARKHAND | East | Austro-Asiatic | TRIBE | - | 0.5333 ± 0.1801
25 | MUNDA | MUN | JHARKHAND | East | Austro-Asiatic | TRIBE | - | 0.1429 ± 0.1188
26 | BIRHOR | BIR | JHARKHAND | East | Austro-Asiatic | TRIBE | - | 0.1333 ± 0.1123
27 | SANTHAL | SAN | JHARKHAND | East | Austro-Asiatic | TRIBE | - | 0.0000 ± 0.0000
28 | IYENGAR | IYN | KARNATAKA | South | Dravidian | CASTE | Upper | 0.7485 ± 0.0610
29 | LINGAYAT | LYN | KARNATAKA | South | Dravidian | CASTE | Upper | 0.8333 ± 0.0691
30 | GOWDA | GOW | KARNATAKA | South | Dravidian | CASTE | Lower | 0.9524 ± 0.0955
31 | BHOVI | BHV | KARNATAKA | South | Dravidian | CASTE | Lower | 0.7524 ± 0.0918
32 | CHRISTIAN | CHR | KARNATAKA | South | Dravidian | CASTE | Lower | 0.8333 ± 0.0597
33 | MUSLIM | MUS | KARNATAKA | South | Dravidian | CASTE | Lower | 0.8333 ± 0.2224
34 | KURUVA | KUR | KARNATAKA | South | Dravidian | TRIBE | - | 0.7436 ± 0.0909
35 | DESASTH BRAHMIN | DSB | MAHARASTRA | West | Indo-European | CASTE | Upper | 0.8421 ± 0.0657
36 | CHITPAVAN BRAHMIN | CHB | MAHARASTRA | West | Indo-European | CASTE | Upper | 0.9048 ± 0.0405
37 | MARATHA | MAT | MAHARASTRA | West | Indo-European | CASTE | Middle | 0.8000 ± 0.0681
38 | DHANGAR | DGR | MAHARASTRA | West | Indo-European | CASTE | Lower | 0.8167 ± 0.0571
39 | PAWARA | PWR | MAHARASTRA | West | Indo-European | TRIBE | - | 0.7417 ± 0.1053
40 | KATKARI | KTK | MAHARASTRA | West | Indo-European | TRIBE | - | 0.8246 ± 0.0648
41 | MADIA GOND | MGD | MAHARASTRA | West | Dravidian | TRIBE | - | 0.6264 ± 0.1098
42 | MAHADEO KOLI | MKL | MAHARASTRA | West | Indo-European | TRIBE | - | 0.7636 ± 0.0833
43 | MARA | MRA | MIZORAM | North-East | Tibeto-Burman | TRIBE | - | 0.5333 ± 0.0515
44 | HMAR | HMR | MIZORAM | North-East | Tibeto-Burman | TRIBE | - | 0.2789 ± 0.1235
45 | LAI | LAI | MIZORAM | North-East | Tibeto-Burman | TRIBE | - | 0.5455 ± 0.0615
46 | LUSEI | LUS | MIZORAM | North-East | Tibeto-Burman | TRIBE | - | 0.4094 ± 0.1002
47 | KUKI | KUK | MIZORAM | North-East | Tibeto-Burman | TRIBE | - | 0.6818 ± 0.0910
48 | MANIPURI MUSLIM | MMS | MIZORAM | North-East | Tibeto-Burman | CASTE | Lower | 0.8333 ± 0.0980
49 | ORIYA BRAHMIN | OBH | ORISSA | East | Indo-European | CASTE | Upper | 0.8043 ± 0.0697
50 | KARAN | KRN | ORISSA | East | Indo-European | CASTE | Middle | 0.6471 ± 0.0953
51 | KHANDAYAT | KDY | ORISSA | East | Indo-European | CASTE | Middle | 0.7564 ± 0.0974
52 | GOPE | GPE | ORISSA | East | Indo-European | CASTE | Lower | 0.8333 ± 0.0720
53 | PAROJA | PRJ | ORISSA | East | Dravidian | TRIBE | - | 0.8667 ± 0.0483
54 | JUANG | JUN | ORISSA | East | Austro-Asiatic | TRIBE | - | 0.0000 ± 0.0000
55 | SAORA | SAR | ORISSA | East | Austro-Asiatic | TRIBE | - | 0.6784 ± 0.0884
56 | NEPALI | NEP | SIKKIM | North-East | Tibeto-Burman/Indo-European | CASTE | Upper | 0.9048 ± 0.1033
57 | BHUTIA | BHT | SIKKIM | North-East | Tibeto-Burman | TRIBE | - | 0.5000 ± 0.2652
58 | CHAKKLIAR | CHK | TAMIL NADU | South | Dravidian | CASTE | Lower | 0.7912 ± 0.0673
59 | KALLAR | KAL | TAMIL NADU | South | Dravidian | CASTE | Middle | 0.8788 ± 0.0751
60 | VANNIYAR | VAN | TAMIL NADU | South | Dravidian | CASTE | Middle | 0.8333 ± 0.0597
61 | PALLAR | PAL | TAMIL NADU | South | Dravidian | CASTE | Lower | 0.7000 ± 0.0896
62 | GOUNDER | GOU | TAMIL NADU | South | Dravidian | CASTE | Upper | 0.7124 ± 0.0650
63 | IRULAR | IRU | TAMIL NADU | South | Dravidian | TRIBE | - | 0.7576 ± 0.1221
64 | KANYAKUBJ BRAHMIN | KKB | UTTAR PRADESH | North | Indo-European | CASTE | Upper | 0.6909 ± 0.1276
65 | UP JAT | UPJ | UTTAR PRADESH | North | Indo-European | CASTE | Upper | 0.0000 ± 0.0000
66 | UP THAKUR | UPT | UTTAR PRADESH | North | Indo-European | CASTE | Upper | 0.5357 ± 0.1232
67 | KHATRI | KHT | UTTAR PRADESH | North | Indo-European | CASTE | Middle | 0.2857 ± 0.1964
68 | BHOKSHA | BKS | UTTAR PRADESH | North | Tibeto-Burman/Indo-European | TRIBE | - | 0.6222 ± 0.1383
69 | UP KURMI | UPK | UTTAR PRADESH | North | Indo-European | CASTE | Lower | 0.0000 ± 0.0000
70 | THARU | THR | UTTAR PRADESH | North | Tibeto-Burman/Indo-European | TRIBE | - | 0.8000 ± 0.1721
71 | JAUNSARI | JUS | UTTAR PRADESH | North | Tibeto-Burman/Indo-European | TRIBE | - | 0.7333 ± 0.1552
72 | MAHISHIYA | MSY | WEST BENGAL | East | Indo-European | CASTE | Middle | 0.8684 ± 0.0489
73 | NAMASUDRA | NMS | WEST BENGAL | East | Indo-European | CASTE | Lower | 0.9000 ± 0.0355
74 | BAURI | BAU | WEST BENGAL | East | Indo-European | CASTE | Lower | 0.6526 ± 0.0648
75 | MAHELI | MHL | WEST BENGAL | East | Austro-Asiatic | TRIBE | - | 0.8211 ± 0.0586
76 | KARMALI | KRM | WEST BENGAL | East | Austro-Asiatic | TRIBE | - | 0.2924 ± 0.1274
77 | KORA | KOR | WEST BENGAL | East | Indo-European | TRIBE | - | 0.7579 ± 0.0495
78 | LODHA | LOD | WEST BENGAL | East | Austro-Asiatic | TRIBE | - | 0.8421 ± 0.0595
79 | EZHAVA HINDU | EZH | KERALA | South | Dravidian | CASTE | Lower | 0.8056 ± 0.0889
80 | NAIR | NAR | KERALA | South | Dravidian | CASTE | Upper | 0.0000 ± 0.0000
Table 2a: Comprehensive haplogroup frequency data among linguistic, geographic and social categories of India
Category | Sample Size | C | D | F* | G | H* | H1 | H2 | J2* | K* | K2 | L | L1 | M | N | O*
TOTAL INDIA | 1152 | 0.014 | 0.004 | 0.030 | 0.001 | 0.069 | 0.159 | 0.002 | 0.051 | 0.038 | 0.031 | 0.045 | 0.010 | 0.000 | 0.000 | 0.003
Language
INDO-EUROPEAN | 518 | 0.012 | 0.004 | 0.027 | 0.002 | 0.079 | 0.183 | 0.002 | 0.058 | 0.029 | 0.033 | 0.035 | 0.002 | 0.000 | 0.000 | 0.000
DRAVIDIAN | 393 | 0.020 | 0.000 | 0.048 | 0.000 | 0.089 | 0.209 | 0.003 | 0.056 | 0.041 | 0.043 | 0.084 | 0.028 | 0.000 | 0.000 | 0.000
AUSTRO-ASIATIC | 140 | 0.014 | 0.000 | 0.014 | 0.000 | 0.021 | 0.043 | 0.000 | 0.050 | 0.043 | 0.014 | 0.007 | 0.000 | 0.000 | 0.000 | 0.007
TIBETO-BURMAN | 101 | 0.000 | 0.030 | 0.000 | 0.000 | 0.010 | 0.000 | 0.000 | 0.000 | 0.069 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.020
Geography
NORTH | 180 | 0.000 | 0.006 | 0.011 | 0.006 | 0.106 | 0.139 | 0.000 | 0.078 | 0.000 | 0.000 | 0.017 | 0.000 | 0.000 | 0.000 | 0.000
WEST | 135 | 0.037 | 0.000 | 0.007 | 0.000 | 0.081 | 0.356 | 0.007 | 0.081 | 0.000 | 0.000 | 0.096 | 0.000 | 0.000 | 0.000 | 0.000
EAST | 357 | 0.011 | 0.000 | 0.048 | 0.000 | 0.056 | 0.106 | 0.000 | 0.036 | 0.053 | 0.048 | 0.020 | 0.003 | 0.000 | 0.000 | 0.003
NORTH-EAST | 108 | 0.000 | 0.037 | 0.000 | 0.000 | 0.009 | 0.000 | 0.000 | 0.000 | 0.083 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.019
SOUTH | 372 | 0.019 | 0.000 | 0.040 | 0.000 | 0.078 | 0.194 | 0.003 | 0.056 | 0.043 | 0.051 | 0.078 | 0.030 | 0.000 | 0.000 | 0.000
Social Hierarchy
UPPER CASTE | 211 | 0.009 | 0.005 | 0.019 | 0.000 | 0.043 | 0.185 | 0.005 | 0.100 | 0.024 | 0.000 | 0.095 | 0.019 | 0.000 | 0.000 | 0.000
MIDDLE CASTE | 175 | 0.006 | 0.000 | 0.051 | 0.000 | 0.040 | 0.171 | 0.000 | 0.097 | 0.040 | 0.017 | 0.034 | 0.023 | 0.000 | 0.000 | 0.000
LOWER CASTE | 261 | 0.008 | 0.000 | 0.046 | 0.000 | 0.107 | 0.169 | 0.000 | 0.031 | 0.050 | 0.046 | 0.054 | 0.000 | 0.000 | 0.000 | 0.000
TRIBES | 505 | 0.022 | 0.008 | 0.020 | 0.002 | 0.071 | 0.139 | 0.002 | 0.026 | 0.038 | 0.042 | 0.024 | 0.008 | 0.000 | 0.000 | 0.006
Table 2a (continued)
Category | Sample Size | O2a | O2a1 | O3 | O3e | P* | R* | R1 | R1a | R1a1 | R1b3 | R2
TOTAL INDIA | 1152 | 0.149 | 0.001 | 0.001 | 0.026 | 0.027 | 0.010 | 0.011 | 0.002 | 0.175 | 0.005 | 0.135
Language
INDO-EUROPEAN | 518 | 0.010 | 0.000 | 0.002 | 0.006 | 0.039 | 0.021 | 0.008 | 0.004 | 0.297 | 0.012 | 0.137
DRAVIDIAN | 393 | 0.023 | 0.000 | 0.000 | 0.000 | 0.008 | 0.000 | 0.023 | 0.000 | 0.117 | 0.000 | 0.209
AUSTRO-ASIATIC | 140 | 0.729 | 0.000 | 0.000 | 0.000 | 0.036 | 0.000 | 0.000 | 0.000 | 0.007 | 0.000 | 0.014
TIBETO-BURMAN | 101 | 0.554 | 0.010 | 0.000 | 0.267 | 0.030 | 0.000 | 0.000 | 0.000 | 0.010 | 0.000 | 0.000
Geography
NORTH | 180 | 0.000 | 0.000 | 0.006 | 0.017 | 0.000 | 0.011 | 0.000 | 0.006 | 0.483 | 0.006 | 0.111
WEST | 135 | 0.000 | 0.000 | 0.000 | 0.000 | 0.030 | 0.022 | 0.000 | 0.000 | 0.193 | 0.000 | 0.089
EAST | 357 | 0.325 | 0.000 | 0.000 | 0.000 | 0.045 | 0.014 | 0.006 | 0.003 | 0.104 | 0.000 | 0.120
NORTH-EAST | 108 | 0.519 | 0.009 | 0.000 | 0.250 | 0.046 | 0.000 | 0.009 | 0.000 | 0.019 | 0.000 | 0.000
SOUTH | 372 | 0.000 | 0.000 | 0.000 | 0.000 | 0.016 | 0.003 | 0.027 | 0.000 | 0.134 | 0.013 | 0.215
Social Hierarchy
UPPER CASTE | 211 | 0.000 | 0.000 | 0.000 | 0.000 | 0.019 | 0.014 | 0.005 | 0.005 | 0.360 | 0.005 | 0.090
MIDDLE CASTE | 175 | 0.000 | 0.000 | 0.000 | 0.000 | 0.029 | 0.011 | 0.029 | 0.000 | 0.263 | 0.000 | 0.189
LOWER CASTE | 261 | 0.004 | 0.000 | 0.000 | 0.004 | 0.023 | 0.004 | 0.023 | 0.000 | 0.157 | 0.000 | 0.276
TRIBES | 505 | 0.339 | 0.002 | 0.002 | 0.057 | 0.032 | 0.010 | 0.002 | 0.002 | 0.077 | 0.010 | 0.061
Table 2b: Comparative haplogroup frequency data in socio-ethnic groups of India
Columns, in order: Tribe (AA, DR, TB, IE) | Dravidian Caste (Upper, Middle, Lower) | Indo-European Caste (Upper, Middle, Lower) | Total, in % (Caste_DR, Caste_IE)
No. of populations | 11 | 8 | 7 | 8 | 5 | 4 | 10 | 9 | 8 | 8 | 19 | 25
Sample size | 179 | 126 | 92 | 108 | 72 | 58 | 137 | 132 | 117 | 115 | - | -
Haplogroup | Frequencies in % (non-empty cells only, listed in column order)
C | 1.1, 5.6, 1.9, 0.7, 1.5, 0.9, 0.9, 0.4, 1.1
D | 3.3, 0.9
F* | 2.2, 3.2, 1.9, 2.8, 10.3, 5.1, 1.5, 2.6, 4.3, 5.6, 2.7
G | 0.9
H* | 2.8, 12.7, 13.9, 5.6, 5.2, 8.8, 3.8, 3.4, 13.0, 7.1, 6.6
H1 | 7.3, 22.2, 26.9, 31.9, 10.3, 18.2, 12.1, 20.5, 16.5, 20.2, 16.2
H2 | 0.9, 1.4, 0.4
J2* | 3.9, 3.2, 1.9, 8.3, 15.5, 2.2, 11.4, 6.8, 4.3, 6.7, 7.7
K* | 3.4, 7.1, 4.3, 4.2, 1.7, 2.2, 5.1, 6.1, 2.6, 3.6
K2 | 2.2, 11.1, 2.8, 5.2, 10.4, 1.1, 3.3
L | 0.6, 4.8, 4.6, 15.3, 5.2, 9.5, 6.8, 2.6, 0.9, 10.1, 3.6
L1 | 3.2, 4.2, 6.9, 0.8, 2.6, 0.3
O* | 0.6, 2.2
O2a | 57.0, 7.1, 59.8, 4.6
O2a1 | 1.1
O3 | 0.9
O3e | 28.3, 2.8
P* | 5.6, 0.8, 4.6, 1.7, 0.7, 1.5, 3.4, 1.7, 0.7, 2.2
R* | 1.7, 1.9, 2.3, 1.7, 0.9, 1.6
R1 | 0.9, 5.2, 4.4, 1.7, 3.4, 0.5
R1a | 0.6, 0.8, 0.3
R1a1 | 0.6, 11.9, 1.1, 20.4, 15.3, 10.3, 10.2, 48.5, 34.2, 23.5, 11.6, 36.0
R1b3 | 4.6, 0.8, 0.3
R2 | 10.6, 7.1, 2.8, 11.1, 22.4, 38.0, 8.3, 17.1, 17.4, 27.3, 14.0
Table 3: Y-Chromosome microsatellite diversity within the twelve major lineages found in India
Parameter | P | F | H | O3 | O2a | L | K | K2 | R1a1 | C | J2 | R2
No. of males | 29 | 24 | 221 | 26 | 162 | 55 | 36 | 36 | 191 | 16 | 52 | 136
No. of haplotypes | 27 | 24 | 206 | 24 | 144 | 53 | 36 | 36 | 188 | 16 | 51 | 131
Lineage diversity | 0.995 ± 0.010 | 1.000 ± 0.012 | 0.999 ± 0.000 | 0.993 ± 0.012 | 0.997 ± 0.001 | 0.998 ± 0.003 | 1.000 ± 0.006 | 1.000 ± 0.006 | 0.999 ± 0.000 | 1.000 ± 0.022 | 0.999 ± 0.004 | 0.999 ± 0.001
MPD | 14.19 ± 6.55 | 13.91 ± 6.47 | 12.66 ± 5.73 | 7.37 ± 3.56 | 11.55 ± 5.26 | 13.13 ± 6.00 | 12.93 ± 5.96 | 14.18 ± 6.51 | 12.05 ± 5.47 | 13.45 ± 6.38 | 14.04 ± 6.40 | 13.04 ± 5.90
Av. gene diversity | 0.709 ± 0.364 | 0.695 ± 0.360 | 0.633 ± 0.317 | 0.368 ± 0.198 | 0.577 ± 0.291 | 0.691 ± 0.350 | 0.718 ± 0.368 | 0.709 ± 0.361 | 0.602 ± 0.302 | 0.707 ± 0.376 | 0.702 ± 0.355 | 0.652 ± 0.327
Haplotype sharing: within | 2 | None | 10 | 2 | 10 | 2 | None | None | 3 | None | 1 | 2
Haplotype sharing: between | None | None | 2 | None | None | None | None | None | None | None | None | 1
also showed a decreasing geographic cline along the latitude; its frequency was highest (approx. 50%) among the populations of Bihar and Uttar Pradesh, where almost 60% of the Upper and Middle caste groups harbored R1a1 Y-chromosomes. Among the 191 males that carried the R1a1 haplogroup, 188 unique 20-Y-STR haplotypes were observed (h = 0.999). While no haplotype was shared between populations, intra-population variation was observed within Jat, Bhoksha and Yerukula, and the mean pairwise difference between all the Y-STRs was found to be high (12.05) (Table 3). The median-joining network analysis, however, revealed that populations of neighbouring areas shared a few of the haplotypes (Fig. 2b). Passarino et al. (2002) reported two region-specific allele patterns associated with M17 among Europeans: DYS19 = 15 and YCAIIa,b = 19,21 were specific to the R1a1 chromosomes of Western Europe, while Eastern European R1a1 chromosomes typically harbored allele 16 for DYS19 and 19,23 for YCAIIa,b. In our dataset, although alleles 15 and 16 at DYS19 were the two most common alleles, with a significant difference in their frequency (χ² = 4.66, p<0.05), they did not reveal any specific geographical, socio-ethnic or linguistic pattern in their distribution. The TMRCA of the individuals harboring R1a1 Y-chromosomes is estimated at ~32 KYA (Table 4).
Haplogroup O2a-M95
Haplogroup M95, which forms the major South-East Asian male lineage (Su et al., 2000; Karafet et al., 2001), accounts for 15% of the Y-chromosome variation in India. It is, however, localized to the eastern part of the subcontinent and largely restricted to the Austro-Asiatic speakers (72.9%) and the Tibeto-Burman speaking tribes (56.4%) of NE India (Table 2a). Although this haplogroup was also detected in Indo-European and Dravidian speakers (3.3% in total), its presence in them could be sufficiently attributed to admixture from Austro-Asiatic speaking neighbors living in close vicinity. While this lineage is completely fixed in Juang, Ho and Santhal, its frequency in Tibeto-Burman tribes varied from 25% in Kuki to 80% in Hmar. Surprisingly, however, none of the Himalayish branch of Tibeto-Burman speakers (Nepali, Bhutia, Tharu, Jaunsari and Bhoksha) harbor this haplogroup (Fig. 1). Although the Y-chromosomes were rather similar (FST = 0.03, p<0.05), none of the Y-STR haplotypes were shared between groups (Table 3). A comparison of Indian M95 Y-STR haplotypes with populations of SE Asia, including Java, Borneo, Taiwan and Malay, revealed that the Austro-Asiatic speakers of the Indian subcontinent showed closer affinity to the SE Asians than did their Tibeto-Burman speaking neighbors (FCT = 0.43 vs 4.15, respectively) (data not shown). To further investigate the relationships between O2a Y-chromosomes in the Austro-Asiatic and Tibeto-Burman speakers, a median-joining network of 27 discrete haplotypes of 151 individuals was constructed (Fig. 2c). This network exhibited two distinct clusters of haplotypes with considerable haplotype sharing between the two linguistic families; however, the Austro-Asiatic speakers showed more diverse haplotypes than the Tibeto-Burmans. The TMRCA of all M95T chromosomes was estimated to be ~35,795 years (Table 4).
Haplogroup R2-M124
Our analysis revealed that haplogroup R2 characterizes 13.5% of the Indian Y-chromosomes; its frequency among Dravidian speakers was comparable to that of haplogroup H (20.9%) and significantly different from that among Indo-European and Austro-Asiatic speakers (χ² = 16.2, d.f. = 3, p<0.05). While the distribution across various geographic regions was almost uniform, significant differentiation was observed among the social groups (χ² = 18.7, d.f. = 3, p<0.05); a decreasing gradient was discernible as one moved up the caste hierarchy (Table 2a). Although tribes contributed only 7.4% of the total R2 lineage, it was proportionately distributed between the Austro-Asiatic and Dravidian tribes (Table 2b). Extensive analysis of its distribution between north and south Indian populations showed that while there was only a marginal difference between middle and lower caste groups of north India (17.1% and 17.4%, respectively), a clear gradient was observed among south Indians, where the frequency declined by more than one-half from lower to upper caste groups. Analysis of 20 Y-STRs within the R2 lineage revealed that three haplotypes were shared: one between Kamma Chaudhary and Kappu Naidu, both lower caste Dravidian speakers from Andhra Pradesh, and two within the Karmali and Pallar populations. Network analysis (Fig. 2d) showed that a large number of haplotypes were shared between populations of south India, while the populations of eastern India harbored more discrete Y-STR haplotypes. The TMRCA for M124T was estimated to be ~39,647 years (Table 4).
Table 4: Y-Chromosome haplogroup variances and TMRCA estimated on seven Y-STR loci
Locus-wise variance | R1a1 | H | C | O2a | R2
DYS19 | 0.549 | 0.497 | 1.183 | 0.293 | 0.770
DYS389I | 0.793 | 0.760 | 0.783 | 0.383 | 0.649
DYS389II | 0.960 | 0.872 | 1.400 | 0.620 | 1.211
DYS390 | 1.173 | 2.246 | 1.983 | 2.314 | 1.375
DYS391 | 1.081 | 2.533 | 1.762 | 0.791 | 0.966
DYS392 | 1.009 | 0.708 | 1.662 | 1.620 | 1.749
DYS393 | 0.710 | 0.921 | 0.917 | 0.995 | 1.051
Average | 0.896 | 1.220 | 1.384 | 1.002 | 1.110
Age estimates in years | 32,015.31 | 43,556.12 | 49,438.78 | 35,795.92 | 39,647.96
Haplogroup L-M11
The overall frequency of haplogroup L-M11 in the Indian populations was estimated to be 5.6%, while sporadic occurrence of this lineage has earlier been described among Indo-European speakers of the Caucasus, Middle East and Europe, with a maximum of 4.3% in Central Asia (Semino et al., 2000; Wells et al., 2001). Dravidian speaking populations harbored a significantly higher percentage of the L haplogroup than the Indo-European speakers, 11.2% and 3.7% respectively (χ² = 3.77, d.f. = 1, p = 0.05). While frequencies were rather comparable in the lower caste groups of north and south, middle and upper caste populations of south India showed relatively higher frequencies than northern caste groups (Table 2a). The L network and the high MPD (13.13) were congruent with AMOVA, suggesting that no clear geographic, linguistic or social pattern could be discerned among the Y-STR haplotypes (data not shown).
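The χ² values quoted for comparisons of proportions can be reproduced arithmetically by treating the percentages themselves as observed values and their mean as the expected value (d.f. = number of groups minus one). The Python sketch below shows this for the two-group comparisons reported for haplogroups L and R1a1. This is an observation from the published numbers, not a documented description of the authors' procedure, and the function name is hypothetical. Because sample sizes do not enter this calculation, a contingency-table test on the underlying counts would give a different statistic.

```python
def chi_square_equal_expected(observed):
    """Chi-square statistic against an equal expected value in every group.

    `observed` holds one value per group (here, haplogroup frequencies in %).
    The expected value for each group is the mean of the observed values.
    """
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


# Haplogroup L: Dravidian vs Indo-European speakers (11.2% vs 3.7%).
print(round(chi_square_equal_expected([11.2, 3.7]), 3))
# Haplogroup R1a1: Indo-European vs Dravidian speakers (29.7% vs 11.7%).
print(round(chi_square_equal_expected([29.7, 11.7]), 3))
```

The two results agree with the reported 3.77 and 7.82 to within rounding of the last digit.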
Haplogroup J2-M172
Haplogroup J2-M172 is the major lineage of the Middle East/Mediterranean, and its frequency decreases into Europe. Among the studied Indian populations, M172G exhibited a total frequency of 5.1% and was uniformly distributed among the three major linguistic families. Except for the Tibeto-Burman speaking tribes of northeast India, where this lineage was totally absent, no specific cline could be deciphered among the other seventy-three mainland populations. In the social categories, upper and middle caste populations harbor a significantly higher percentage of the J2 lineage (~10%) than lower caste populations and tribal groups (~3%) (χ² = 7.74, d.f. = 3, p = 0.05). While the distribution was proportionate among upper caste groups of north and south India, the difference was more pronounced (6.8% vs 15.5%) among middle caste groups of these regions (Table 2b). The Near Eastern populations harboring M172 Y-chromosomes are characterized by a very high frequency of DYS388 alleles with ≥15 repeats, while more than 70% of the males examined in this study displayed alleles with ≤14 repeats. Network analysis showed a large number of divergent haplotypes even with 7 Y-STRs and only two reticulations. Lodha, the only Austro-Asiatic tribe that harbored the J2 lineage, also displayed very diverse haplotype profiles. The genetic structure estimated with AMOVA showed that although the extent of Y-STR differentiation in the populations carrying this lineage was approximately 3%, geography, language or position in the social hierarchy could not statistically delineate the studied Indian populations. Because this marker is associated with the spread of agriculture, we further estimated variance in the agricultural groups and observed a marginal difference of 1% between the Y-chromosomes of landowner and labourer communities harboring the J2 lineage (data not shown).
Haplogroup O3-M122 and O3e-M134
The M122 haplogroup and its sub-lineage M134 are among the predominant and widespread lineages of East Asia (Su et al., 2000; Karafet et al., 2001; Shi et al., 2005). In the Indian males, it was detectable at frequencies less than 3% and was largely restricted to the Tibeto-Burman speakers of the North-East. It was sporadically present among tribal groups of north India, particularly Tharu (Fig. 1), probably due to recent admixture with neighboring Tibeto-Burman speakers of Nepal and China. A clear delineation along the language family was observed in its distribution: it was completely absent among the Tani speakers (Adi Pasi), while the Naga-Kuki-Chin branch of Tibeto-Burman speakers contributed the entire 26.7% of O3 lineage. The mean pairwise difference between Y-STRs and the lineage diversity were low at 7.37 and 0.993, respectively, compared to its sister clade O2a.
Haplogroup K2-M70
Haplogroup K2 occurs on an M9G background and is reported to occur in populations of the Near East and Europe (Underhill et al., 2000). In our study, it was found only in the eastern and southern regions of the country, adding to an overall frequency of 3.1%. Although it was present in the three major linguistic families, the statistical difference in its distribution was insignificant. However, its distribution depicted an inverse relation as one moved up the social ladder, with the upper caste populations completely lacking the M70 lineage in their Y-chromosomes (Table 2a). This lineage was predominant amongst the lower caste groups of the east (Bauri) and tribal groups of the south, particularly Yerukula, contributing approx. 60% of the total K2 chromosomes. Within the lineage, 36 distinct Y-STR haplotypes, with a very high mean pairwise difference (14.18) between them, depicted the absence of population structure due to language, geography or ethnicity.
Haplogroup C-M130 (RPS4Y711)
The RPS4Y711T forms the second major cluster in Asia and Australo-Melanesia and has reportedly spread into North America (Wells
Fig. 1. Y-Chromosome haplogroups and their frequency distribution in different regional populations of India
Fig. 2a. Median-Joining network of H haplogroup individuals, based on seven Y-STR haplotypes. Circles represent haplotypes and have an area proportional to frequency. Colour represents the four geographic regions of India (Red: South; Blue: North; Green: West; Yellow: East)
Fig. 2b. Median-Joining network of R1a1 haplogroup individuals, based on seven Y-STR haplotypes. Circles represent haplotypes and have an area proportional to frequency. Colour represents the four geographic regions of India (Red: South; Blue: North; Green: West; Yellow: East)
Fig. 2c. Median-Joining network of O2a haplogroup individuals, based on seven Y-STR haplotypes. Circles represent haplotypes and have an area proportional to frequency. Colour represents the Austro-Asiatic (Black) and Tibeto-Burman (White) linguistic families of India
Fig. 2d. Median-Joining network of R2 haplogroup individuals, based on seven Y-STR haplotypes. Circles represent haplotypes and have an area proportional to frequency. Colour represents the four geographic regions of India (Red: South; Blue: North; Green: West; Yellow: East)
Table 5: Genetic Differentiation in Indians at different levels of hierarchy based on Y-SNP Data
                                          Within Population    Among Population     Among Groups
                                                               Within Groups
                          No. of Groups   %        FST*        %        FSC*        %        FCT*
Total                                     72.89    0.271       27.11
Geography                 5               71.22    0.287       19.09    0.211       9.69     0.096
Regional                  14              71.95    0.28        13.48    0.157       14.57    0.145
Language                  4               69.16    0.308       15.53    0.183       15.31    0.153
  a                       4               69.88    0.301       17.17    0.197       12.96    0.129
Social CS vs TR$          2               69.79    0.302       21.73    0.237       8.48     0.084
  b                       4               71.49    0.285       21.89    0.234       6.63     0.066
  c                       4               82.29    0.177       16.21    0.164       1.51     0.015
Castes UP vs MD vs LW#    3               83.73    0.162       14.65    0.148       1.61     0.016
*All values are statistically significant at p<0.05
$ CS: Castes; TR: Tribes
# UP: Upper castes; MD: Middle castes; LW: Lower castes
a: Includes Karmali and Maheli under Austro-Asiatic; Tharu under Tibeto-Burman language family
b: Includes Upper, Middle, Lower Castes and Tribes
c: Includes Upper, Middle, Lower Castes and excludes Austro-Asiatic and Tibeto-Burman Tribes
et al., 2001; Underhill et al., 2000; Karafet et al., 1999; Underhill et al., 2001). In the present study, this lineage was found spread all along the coastal belt in populations of Maharashtra, Tamil Nadu, Andhra Pradesh, Orissa and West Bengal, at an average frequency of 1.4%, and was noticeably absent in the populations of the North and North-East. Although this lineage was present at a higher frequency in tribes than in caste groups, the difference was not statistically significant (χ2 = 2.25, d.f. = 1, p > 0.05). We observed 16 discrete Y-STR haplotypes, with a mean pairwise difference between haplotypes of 13.45 (Table 3). Although the total number of individuals carrying RPS4Y711T was too small (n=16) to make accurate evolutionary inferences about its origin within South Asia, the TMRCA of Indian RPS4Y711T individuals was estimated to be ~49,438 years (Table 4).
Other Haplogroups Observed in Indians
Haplogroup F, which is the major and most paraphyletic subcluster of M168 lineages, was ubiquitous in its distribution along geographic, linguistic and socio-ethnic boundaries of India and was observed in approximately 3% of the studied males (Table 2a). Haplogroup D, a monophyletic branch of the M168 lineage defined by an Alu insertion and the M174C mutation, was on the other hand restricted to the Bhutia and Tharu tribal groups. Its presence among them is most likely due to gene flow from Tibet, where this haplogroup has earlier been reported. Major haplogroups K*, P*, R*, R1 and R1a contribute approximately 2-3% of the total Indian Y-chromosomes, and there was no difference in their distribution pattern among castes or tribes or among different geographic regions. Although we detected a few European-specific haplogroups, G and R1b3, in Indians, none of our studied samples showed the presence of haplogroup K3-M147, N-M231 or I-M170, which are the other highly predominant haplogroups of Europe.
Genetic Structure of the Indian Populations
Analysis of molecular variance revealed that the extent of genetic differentiation was high among Indians; percent variation among different groups added up to 27.11%, suggesting that the gene pool of Indian males was highly structured. To identify the factor(s) responsible for this compartmentalization of Y-chromosomes, population genetic structure was analyzed on haplogroup frequency data in a hierarchical mode: within populations, among populations and among groups of populations, pooled according to geography, linguistic family and their socio-ethnic position (Table 5). The amount of genetic variation among the five major geographical regions was less than the percent due to variation among populations within regions (ΦCT = 0.096 and ΦSC = 0.211, respectively). Further analysis revealed that almost 14.57% of among-group variation was due to regional boundaries, which also defined their linguistic affinities (language sub-families). The increase in “among group variation” and “among populations within groups” was non-significant when only four major linguistic families were used as a criterion for grouping Indian populations.
High FCT and FSC values suggest that although significant structuring occurs within the populations of India, they could not be partitioned either geographically or linguistically. Apportionment of populations into two broad socio-ethnic groups, castes and tribes, depicted only 8% difference between them, which further decreased to 6.63% when the caste populations were further resolved into upper, middle and lower groups. Among themselves, the caste populations were not very different, harboring only 1.6% variation among them.
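The fixation indices in Table 5 follow from the three hierarchical variance components by the standard AMOVA definitions (Excoffier et al., 1992). The sketch below shows how the percentages map to FST, FSC and FCT, using the geography row as input. The function name is illustrative, and because the inputs are the rounded percentages printed in the table, the results can differ from the tabulated indices in the third decimal place.

```python
def fixation_indices(within_pops, among_pops_within_groups, among_groups):
    """Return (FST, FSC, FCT) from hierarchical AMOVA variance components.

    Components may be given as percentages; only their ratios matter.
    """
    total = within_pops + among_pops_within_groups + among_groups
    fst = (among_pops_within_groups + among_groups) / total
    fsc = among_pops_within_groups / (within_pops + among_pops_within_groups)
    fct = among_groups / total
    return fst, fsc, fct


# Percent variation for the five geographic regions (Table 5, Geography row).
fst, fsc, fct = fixation_indices(71.22, 19.09, 9.69)
print(f"FST = {fst:.4f}, FSC = {fsc:.4f}, FCT = {fct:.4f}")
```

FSC is normalized by the variance within groups only, which is why it can exceed FCT even when the among-group percentage is of similar size.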
Genetic Relationships among the Indians and with World Populations
To investigate the extent of genetic relationships among Indian populations, pairwise FST distances were estimated on the Y-haplogroup frequencies. On the whole, populations clustered according to their Y-chromosome lineages. Two distinct clusters of Indo-European and Dravidian speakers were discernible in the NJ tree on 87 Indian populations, where except for a few deviations most of the populations clustered within their linguistic family. Austro-Asiatic and Tibeto-Burman speakers harboring the O2a Y-chromosome lineage described a separate cluster, while Tharu grouped with Mara, Lai and Kuki carrying the O3e lineage, and formed a branch distant from other Tibeto-Burman tribes (Fig. 3). The MDS plot also substantiated the genetic proximity of Austro-Asiatic and Tibeto-Burman speakers to the populations of South East Asia. Most of the other Indian populations were closer to the Indo-European speakers of Central Asia and Eastern Europe (Russia and Siberia) but distant from populations of Western Europe, while populations of the Middle East and Caucasus region formed a separate cluster in the MDS plot. Populations of Uttar Pradesh, Bihar and Punjab were moderately distant from other Indo-European speakers, while those of Pakistan remained between Indians, Central Asia and Russia (Fig. 4).
DISCUSSION
This comprehensive study of Y-chromosome diversity within India aims to identify evolutionary events (founder effects, gene flow and genetic drift) and factors (geographical, linguistic and cultural barriers) that might have produced a high degree (27%) of genetic differentiation among the Indian patrilines. Here we also evaluate some of the suggested theories of occupation of the Indian subcontinent by modern humans and population histories, in the light of current molecular genetic evidence.
Phylogeography of Indian Y-Chromosomes
India is a relict area, which is likely to have served as an incubator during the early dispersal of modern humans out of East Africa (Quintana-Murci et al., 1999; Cann, 2001) and a treasure-house of ancient population genetic signatures in its gene pool. This is reflected in the 24 different haplogroups which were observed in the present Y-chromosome analysis of 1434 Indian males. Overall haplogroup diversity among Indian populations was relatively high (0.893) in contrast to European or East Asian populations, but was closer to that of Central Asia. This pattern of high NRY diversity (Y-SNP and Y-STR) indicates an early settlement of the Indian subcontinent by anatomically modern humans. Four haplogroups (H = 23%, R1a1 = 17.5%, O2a = 15% and R2 = 13.5%) form the major paternal lineages of Indians and together account for ~70% of their Y-chromosomes. Being largely restricted to the Indian subcontinent, haplogroup H is assumed to be associated with the eastward expansion of M89 Y-chromosomes from the Levantine corridor, which also carried the two late Pleistocene mt DNA haplogroups, U2 and U7, into India. Although the M69 Y-chromosomes are particularly predominant among the Dravidian speakers of south India, their fairly uniform distribution across different regions and socio-ethnic groups of India suggests deep time depth for these lineage clusters.
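Haplogroup diversity values such as the 0.893 quoted above are commonly computed as Nei's unbiased gene diversity, h = n/(n-1) x (1 - Σp²). The sketch below assumes that estimator, which this section does not name explicitly, and uses hypothetical counts purely for illustration; they are not data from the study.

```python
def gene_diversity(counts):
    """Nei's unbiased gene diversity from a list of haplogroup counts.

    ASSUMPTION: this estimator is a common choice for haplogroup diversity;
    the section does not state which estimator was used.
    """
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two sampled chromosomes")
    sum_sq_freq = sum((c / n) ** 2 for c in counts)
    return n / (n - 1) * (1.0 - sum_sq_freq)


# Hypothetical population of 20 males carrying four haplogroups.
mixed = gene_diversity([10, 5, 3, 2])
# Hypothetical population fixed for a single haplogroup.
fixed = gene_diversity([12])
print(f"mixed: {mixed:.3f}, fixed: {fixed:.3f}")
```

A population fixed for one lineage scores zero, while diversity approaches one as many haplogroups occur at similar frequencies.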
Based on the predominance of the M17 lineage among diverse linguistic families (Indo-European, Altaic, Uralic and Caucasian) and geographic regions (Central Asia, Europe, Caucasus, Middle East) (Wells et al., 2001; Underhill et al., 2000; Karafet et al., 1999; Rosser et al., 2000; Nebel et al., 2000), it has been associated with the Kurgan culture, domestication of horses and spread of Indo-European languages, all of which supposedly originated in southern Russia/Ukraine and subsequently extended to Europe and Central Asia around 3000 B.C. (Wells et al., 2001). Its presence in India has been linked to the “Aryan” migration and subsequent spread of Indo-European languages, and to the appearance of iron and Painted Grey Ware culture in the North West frontier (Cavalli-Sforza et al., 1994).
Fig. 3: Genetic relationship among populations of India based on FST distances estimated on Y-Haplogroup frequencies
However, the antiquity and geographic origin of this lineage still remain contentious. Our study reveals that this lineage is present in a significant proportion among the Indian Indo-European speakers, and is proportionately high among upper caste groups (Table 2a). The dispersion of this lineage into the southern tribal groups (Kivisild et al., 2003; Cordaux et al., 2004) and the fact that it is proportionately distributed between the Dravidian and Indo-European tribal groups provide significant evidence against any major influx of Indo-European speakers that could have drastically changed the Indian male gene pool (Sahoo et al., 2006; Sengupta et al., 2006). The high average STR variance (0.896) and TMRCA support a rapid population growth and expansion of M17 Y-chromosomes, which contributed M17 lineages both to Central Asian nomads and South Asian tribes much before the Indo-European introgression into India. Another sub-lineage of M173, R1b3-M269, is present at an appreciable frequency (14.5%) in Turkey (Cinnioglu et al., 2004) and at considerable frequency in Europe (Cruciani et al., 2002), while it is detected at relatively low frequency (1.9%) in India, substantiating a recent and limited admixture with west Europeans.
The observed high frequency of R2 Y-chromosomes in Indians, which is equivalent to that of haplogroup H among Dravidian speakers, corroborates previous reports suggesting its Indian origin (Cordaux et al., 2004). The deep coalescence time for R2 lineages, dating back to the Late Pleistocene, supports its indigenous origin. Outside India, it is found in Iran and Central Asia (3.3%) and among the Roma Gypsies of Europe, for whom there is historical evidence of migration from India (Wells et al., 2001). Within India, while it is predominant in both eastern and southern regions, its distribution pattern is rather patchy in the east (Sahoo et al., 2006). It is most likely that genetic drift or a bottleneck has reduced the paternal diversity of Karmali, which contributes 28% of the eastern R2 lineages. This population, although considered to be Austro-Asiatic speaking, does not present any evidence of the O2a Y-chromosome lineage, portraying a distinctly different history.
Fig. 4. Genetic relationship between populations of India and world estimated from Y-Chromosome haplogroup frequencies represented in MDS plot
On average, the patterns of NRY haplogroup variation of Indians reflect that populations of the subcontinent are not very distinct from each other and probably share a few common paternal ancestors. The lineage diversity was small for Austro-Asiatic and Tibeto-Burman speakers and most of them harbored a single lineage, indicating a founder paternal source for these endogamous groups, which are confined to the eastern and north-eastern regions of India. Haplogroup diversities were rather high for populations of south India (average of 0.740), giving concordant evidence of a relatively early settlement, growth and expansion of populations living in southern India.
Traces of Ancient Migration of Modern Humans
Recent studies provide substantial evidence in favor of the southern route hypothesis for the dispersal of modern humans ~60-75 kya from the Horn of Africa along the tropical coast of the Indian Ocean to reach insular South East Asia and Oceania (Cann, 2001; Stringer, 2000). A strong Y-chromosome support to this model is the distribution of haplogroup C lineages in Asia (Kivisild et al., 2003), Australo-Melanesia and North America (Karafet et al., 1999). Although present in low frequencies in the Indian subcontinent (Bamshad et al., 2001; Sengupta et al., 2006; Kivisild et al., 2003; Cordaux, 2004; Wells, 2006; Ramana et al., 2001), it is largely distributed along the coastal regions, with a few patchy occurrences in Punjab. However, the persistence of M130 lineages mostly among the south Indians, Pakistan (Qamar et al., 2002) and Sri Lanka (Kivisild et al., 2003) provides indirect evidence in support of the southern route of migration by early modern humans. We have previously suggested that lack of haplogroup C sub-lineages (M217, M38 and M8) is indicative of the indigenous origin of most Indian populations and argues against the theory of Aryan migration from Central Asia (Sahoo et al., 2006). The present analysis showed none of the deletions (DYS390.1 or DYS390.3) associated with Australian or Polynesian C* chromosomes, contesting the claims of a link between India and Australian aboriginals (Redd et al., 2002). The age estimate of approximately 49 kya in Indian samples indicates a probable Indian origin of this lineage. However, until further analysis and age estimates in other world populations are known, it cannot be conclusively proven whether the RPS4YT mutation arose in India or arrived with the earliest migrants after it arose somewhere in west Asia, from where it was finally lost or diluted during the Upper Paleolithic expansion of modern humans (Underhill et al., 2001).
Genesis of Caste Structure and Influence of Migrations on the Indian Gene Pool
The main feature of Indian society is that it is highly structured by social factors such as the caste system, in which birth determines the position in society, mode of subsistence (occupation) and choice of marriage partners. However, the genesis of the caste system in India is ambiguous, since many of the caste groups are known to have tribal origins (Kosambi, 1964). Further, the migrations of Greeks, Huns, Arabs, Chinese, Turks, Persians, Portuguese and others have made understanding the nature of population structure more complex. mt DNA analysis from different geographic regions and social status showed that maternal haplogroups in India are derived from a limited number of founder lineages of M and N clades, supporting a common proto-Asian ancestry with limited gene flow from later migrants (Kivisild et al., 2003; Basu et al., 2003). Our study reveals that there is virtually no genetic difference in the Y-chromosomes between the caste groups and tribes (Table 5). Whatever minor difference is present is largely due to haplogroup O2a, contributed exclusively by the Austro-Asiatic and Tibeto-Burman tribes. Our present analysis (AMOVA and haplogroup frequency distribution in populations excluding the Austro-Asiatic tribes of Jharkhand and Orissa and Tibeto-Burman tribes from the Northeast) provides congruent evidence in support of the hypothesis that populations in India largely derive their gene pool from the common Pleistocene settlers. The high frequency of J2 and R1a1 lineages mirrors a greater influence of Indo-European migrants on upper caste populations of the Gangetic plains compared to the peninsular southern regions. However, these skewed frequencies also suggest that the indigenous populations received limited external gene flow from Europe, Central and West Asia. This is also supported by mt DNA haplogroups that depict Indian-specific lineages with a limited contribution from both west and east Eurasian populations (Metspalu et al., 2004).
A similar trend of J2 and R1a1 among caste populations would probably provide a simplistic assumption that agriculture was brought along with the caste system by the Indo-European speakers as a result of demic diffusion of early farmers from southwestern Iran, the Fertile Crescent and Anatolia (Quintana-Murci et al., 2001; Cordaux et al., 2004). However, the absence among Indians of other Neolithic markers of early farmers, M35 and M201, which are prevalent in Europe, Anatolia, South Caucasus and Iran (Semino et al., 2000; Underhill et al., 2001), in addition to the frequency of M172 in southern and western India and its persistence in south India and tribal groups (Table 2a), questions the validity of this hypothesis. Agriculture in India probably arose as two independent events: one that was a consequence of the earliest migration that brought the Dravidian speakers, and another much later through the spread of rice cultivators from SE Asia (Fuller, 2003; Diamond et al., 2003).
Insights into Origin of Austro-Asiatic and Tibeto-Burman Speakers
The origin of two language families, Austro-Asiatic and Tibeto-Burman, in India is of particular interest and has received considerable attention (Basu et al., 2003; Cordaux et al., 2004). In the present study, analysis of eleven Austro-Asiatic and seven Tibeto-Burman tribes from the eastern and north-eastern regions of India establishes that the male gene pool of these groups is distinctly different from other mainland tribal populations (Table 2a). An overall low Y-STR haplotype diversity and complete fixation of O2a in some of the aboriginal tribes (Ho, Santhal, Juang, Birhor and Munda) suggest that these tribes probably experienced a major demographic event, such as a common founder effect followed by a bottleneck that greatly reduced the Y-chromosome diversity in the Austro-Asiatic tribes of eastern India (Table 1). In contrast to Austro-Asiatic tribes, the Tibeto-Burman tribes harbor both O2a and O3e lineages in their Y-chromosomes. Interestingly, the two branches of the Tibeto-Burman language, Himalayish and Naga-Kuki-Chin, could be distinctly identified from their Y-chromosomes. The former depicts the influence of the Tibetan gene pool, marked by the presence of haplogroup D lineages in Bhutia and Tharu (Sahoo et al., 2006), while the other linguistic branch harbors O3e lineages. The predominance of the O haplogroup and its sub-lineages in populations of East Asia suggests a SE Asian origin of Indian Austro-Asiatic and Tibeto-Burman speakers. We hypothesize that the Tibeto-Burman speakers came as a number of migratory events, while the Austro-Asiatic tribes probably arrived in India as a single event. That the two groups probably migrated into India at different time periods is evident from the absence of O3e lineages among Austro-Asiatic speakers, who probably are the earlier immigrants of the two. The presence of an Austro-Asiatic speaking tribe, Khasi, among the Tibeto-Burman speaking neighbors in the northeast corroborates this assumption. While the Tibeto-Burman speakers brought in a number of East Asian maternal lineages (A, B5b, F1b, M8c, M8z) (Metspalu et al., 2004), the absence of these lineages in Austro-Asiatic tribes (Thangaraj et al., 2005; Sahoo, 2006a) portrays two different scenarios: either the earlier exodus from South East Asia was probably a major male-mediated migration into India, or the female gene pool of the migrating East Asians is completely lost among the Austro-Asiatic tribes. Additional confirmation of this hypothesis is provided by evidence of agricultural expansions from their homelands in China, at different times and over different geographic ranges. Austro-Asiatics are presumed to have spread west and south from southern China into the Indian subcontinent and Malay Peninsula and brought rice cultivation with them (Higham, 2003; Bellwood, 2004). The genetic evidence revealed in this study is consistent with anthropological records (Guha, 1935), which suggest that Sino-Tibetans dispersed from the Yellow River and came into India through two different routes: one from Burma, which probably brought the Naga-Kuki-Chin language and O3e Y-chromosomes, and the other from the Himalayas, which carried the YAP lineages into northern regions of the subcontinent.
Age of Human Occupation in India: Austro-Asiatic or Dravidians as First Settlers?
Based on socio-cultural and linguistic evidence (Thapar, 1995; Pattanayak, 1998) and results based on mt DNA HVSI nucleotide diversity and the highest frequencies of mitochondrial M haplogroup (Roychoudhary et al., 2001; Basu et al., 2003), it was asserted that Austro-Asiatic tribes are the earliest settlers in India. The present comprehensive Y-chromosome analysis, which includes populations of all linguistic and socio-ethnic affiliations, however, suggests people of south India as the original settlers of the subcontinent. The total lineage diversity and distribution of Indian-specific Y-chromosome haplogroups (H, L, C, R1a1 and R2) in different geographical and socio-linguistic layers of the Indian populations provide substantial support in favor of this hypothesis. This theory also gathers adequate evidence from the presence of the coastal marker, RPS4Y, in the south Indian tribes, who probably represent remnants of the modern human migration out of Africa that took the southern route to Australia. Any possibility that Austro-Asiatic speakers could have dispersed from India is also eliminated based on the differential distribution of O2a Y-chromosomes in southern China and India and the complete absence of East-Asian specific mt DNA lineages in Austro-Asiatic and Dravidian speakers of India. mt DNA haplogroups of Indian Austro-Asiatic speakers are instead probably a sub-group of those of their Dravidian neighbors (unpublished data, Kashyap et al.). Recent archeological and linguistic evidence corroborates a Neolithic expansion of Austro-Asiatic languages from the Yangtze River basin (Higham, 2003), and our present study supports an east-west clinal expansion of Austro-Asiatic males from South East Asia, which was not associated with any female gene flow. Further, the deeper coalescence age for the Y-chromosome haplogroups C, H and R2 compared to O2a is consistent with the hypothesis that Austro-Asiatic speakers cannot be considered as the earliest settlers of South Asia.
CONCLUSIONS
We find that genetic variation in India is characterized by a high Y-chromosome diversity, which is reflected by a greater correspondence with linguistic groups of India. Our results demonstrate India as a hotspot, both as an important source and as a recipient of major Y-chromosome lineages of the world. Haplogroup distribution and AMOVA results provide tandem evidence in support of a common Pleistocene origin of Indian populations, which was subsequently followed by migrations of Austro-Asiatic speaking tribal males from SE Asia. The Tibeto-Burman populations were later migrants who took two different routes and carried both male and female lineages specific to East Asia. Based on deep coalescence age estimates of H, R2 and C Y-chromosome lineages, their diversity and distribution pattern, our data suggest an early Pleistocene settlement of South Asia by Dravidian speaking south Indian populations; the Austro-Asiatic speakers migrated much later from SE Asia and probably contributed only paternal lineages while amalgamating with the aboriginal populations of the region.
ACKNOWLEDGEMENTS
We express our appreciation to all the original donors who made this study possible. This study was made possible through facilities provided at CFSL, Kolkata. We acknowledge all researchers whose valuable data were used for this study. SS, AS, JB, MT, SG, RR and RA are grateful to the Directorate of Forensic Sciences, MHA, for the Senior Research Fellowship. GHB and TS are recipients of Senior Research Fellowships from CSIR, India. This research was supported by a financial grant to CFSL, Kolkata under the Xth Five Year Plan of the Govt. of India.
Electronic-Database Information
URLs for the data mentioned in this article are as follows:
XL STAT pro 7.5, http://www.xlstat.com
Network 4.1, http://www.fluxus-engineering.com
http://www.ethnologue.com
REFERENCES
Bamshad, M., Kivisild, T., Watkins, W.S., Dixon, M.E., Ricker, C.E., Rao, B.B., Naidu, J.M., Prasad, B.V., Reddy, P.G., Rasanayagam, A., Papiha, S.S., Villems, R., Redd, A.J., Hammer, M.F., Nguyen, S.V., Carroll, M.L., Batzer, M.A. and Jorde, L.B.: Genetic evidence on the origins of Indian caste populations. Genome Res., 11: 994-1004 (2001).
Bandelt, H.J., Forster, P. and Rohl, A.: Median-joining networks for inferring intraspecific phylogenies. Mol. Biol. Evol., 16: 37-48 (1999).
Basu, A., Mukherjee, N., Roy, S., Sengupta, S., Banerjee, S., Chakraborty, M., Dey, B., Roy, M., Roy, B., Bhattacharyya, N.P., Roychoudhury, S. and Majumder, P.P.: Ethnic India: A genomic view, with special reference to peopling and structure. Genome Res., 13: 2277-2290 (2003).
Bellwood, P.: Tracking the spreads of farming beyond the Fertile Crescent: Europe and Asia. In: First Farmers: The Origins of Agricultural Societies. pp 87 (2004).
Butler, J.M., Schoske, R., Vallone, P.M., Kline, M.C., Redd, A.J. and Hammer, M.F.: A novel multiplex for simultaneous amplification of 20 Y chromosome STR markers. Forensic Sci. Int., 129: 10-24 (2002).
Cann, R.L.: Genetic clues to the dispersal of human populations: Retracing the past from the present. Science, 291: 1742-1748 (2001).
Cavalli-Sforza, L.L., Menozzi, P. and Piazza, A.: The History and Geography of Human Genes. Princeton University Press, Princeton, pp 208-213 (1994).
Cinnioglu, C., King, R., Kivisild, T., Kalfoglu, E., Atasoy, S., Cavalleri, G.L., Lillie, A.S., Roseman, C.C., Lin, A.A., Prince, K., Oefner, P.J., Shen, P., Semino, O., Cavalli-Sforza, L.L. and Underhill, P.A.: Excavating Y-chromosome haplotype strata in Anatolia. Hum. Genet., 114: 127-148 (2004).
Cordaux, R., Aunger, R., Bentley, G., Nasidze, I., Sirajuddin, S.M. and Stoneking, M.: Independent origins of Indian caste and tribal paternal lineages. Curr. Biol., 14: 231-235 (2004).
Cordaux, R., Deepa, E., Vishwanathan, H. and Stoneking, M.: Genetic evidence for the demic diffusion of agriculture to India. Science, 304: 1125 (2004).
Cordaux, R., Weiss, G., Saha, N. and Stoneking, M.: The northeast Indian passageway: A barrier or corridor for human migrations? Mol. Biol. Evol., 21: 1525-1533 (2004).
Cruciani, F., Santolamazza, P., Shen, P., Macaulay, V., Moral, P., Olckers, A., Modiano, D., Holmes, S., Destro-Bisol, G., Coia, V., Wallace, D.C., Oefner, P.J., Torroni, A., Cavalli-Sforza, L.L., Scozzari, R. and Underhill, P.A.: A back migration from Asia to sub-Saharan Africa is supported by high-resolution analysis of human Y-chromosome haplotypes. Am. J. Hum. Genet., 70: 1197-1214 (2002).
Deraniyagala, S.U.: The Prehistory of Sri Lanka: An Ecological Perspective. Department of the Archeological Survey, Government of Sri Lanka, Colombo (1992).
Diamond, J. and Bellwood, P.: Farmers and their languages: The first expansions. Science, 300: 597-602 (2003).
Excoffier, L., Smouse, P.E. and Quattro, J.M.: Analysis of molecular variance inferred from metric distances among DNA haplotypes: Application to human mitochondrial DNA restriction data. Genetics, 131: 479-491 (1992).
Forster, P., Rohl, A., Lunnemann, P., Brinkmann, C., Zerjal, T., Tyler-Smith, C. and Brinkmann, B.: A short tandem repeat-based phylogeny for the human Y chromosome. Am. J. Hum. Genet., 67: 182-196 (2000).
Fuller, D.: An agricultural perspective on Dravidian historical linguistics: Archaeological crop packages, livestock and Dravidian crop vocabulary. Pp. 191-213. In: Examining the Farming/Language Dispersal Hypothesis. P. Bellwood and C. Renfrew (Eds.). McDonald Institute for Archaeological Research, Cambridge (2003).
Guha, B.S.: The racial affinities of the people of India. In: Census of India, 1931, Part III-Ethnographical (1935).
Higham, C.: Languages and farming dispersals: Austro-Asiatic languages and rice cultivation. In: Examining the Farming/Language Dispersal Hypothesis. P. Bellwood and C. Renfrew (Eds.). McDonald Institute for Archaeological Research, Cambridge (2003).
James, H.V.A. and Petraglia, M.D.: Modern Human origins and the evolution of behavior in the later Pleistocene Record of South Asia. Curr. Anthropol., 46 Supp: S3-S27 (2005).
Karafet, T., Xu, L., Du, R., Wang, W., Feng, S., Wells, R.S., Redd, A.J., Zegura, S.L. and Hammer, M.F.: Paternal Population History of East Asia: Sources, Patterns and Microevolutionary Processes. Am. J. Hum. Genet., 69: 615-628 (2001).
Karafet, T.M., Zegura, S.L., Posukh, O., Osipova, L., Bergen, A., Long, J., Goldman, D., Klitz, W., Harihara, S., de Knijff, P., Wiebe, V., Griffiths, R.C., Templeton, A.R. and Hammer, M.F.: Ancestral Asian source(s) of new world Y-chromosome founder haplotypes. Am. J. Hum. Genet., 64: 817-831 (1999).
Kennedy, K.: God, Apes and Fossil Men: Paleoanthropology in South Asia. University of Michigan Press, Ann Arbor (2000).
Kivisild, T., Rootsi, S., Metspalu, M., Mastana, S., Kaldma, K., Parik, J., Metspalu, E., Adojaan, M., Tolk, H.V., Stepanov, V., Golge, M., Usanga, E., Papiha, S.S., Cinnioglu, C., King, R., Cavalli-Sforza, L., Underhill, P.A. and Villems, R.: The genetic heritage of the earliest settlers persists both in Indian tribal and caste populations. Am. J. Hum. Genet., 72: 313-332 (2003).
Kosambi, D.D.: The Culture and Civilization of Ancient India in Historical Outline. Vikas Publishing House Pvt. Ltd., New Delhi (1964).
Macaulay, V., Hill, C., Achilli, A., Rengo, C., Clarke, D., Meehan, W., Blackburn, J., Semino, O., Scozzari, R., Cruciani, F., Taha, A., Shaari, N.K., Raja, J.M., Ismail, P., Zainuddin, Z., Goodwin, W., Bulbeck, D., Bandelt, H.J., Oppenheimer, S., Torroni, A. and Richards, M.: Single, rapid coastal settlement of Asia revealed by analysis of complete mitochondrial genomes. Science, 308: 1034-1036 (2005).
Metspalu, M., Kivisild, T., Metspalu, E., Parik, J., Hudjashov, G., Kaldma, K., Serk, P., Karmin, M., Behar, D.M., Gilbert, M.T., Endicott, P., Mastana, S., Papiha, S.S., Skorecki, K., Torroni, A. and Villems, R.: Most of the extant mtDNA boundaries in south and southwest Asia were likely shaped during the initial settlement of Eurasia by anatomically modern humans. BMC Genet., 5: 26 (2004).
Misra, V.N.: Prehistoric human colonization of India. J. Biosci., 26: 491-531 (2001).
Nebel, A., Filon, D., Weiss, D.A., Weale, M., Faerman, M., Oppenheim, A. and Thomas, M.G.: High-resolution Y chromosome haplotypes of Israeli and Palestinian Arabs reveal geographic substructure and substantial overlap with haplotypes of Jews. Hum. Genet., 107: 630-641 (2000).
Passarino, G., Cavalleri, G.L., Lin, A.A., Cavalli-Sforza, L.L., Borresen-Dale, A.L. and Underhill, P.A.: Different genetic components in the Norwegian population revealed by the analysis of mtDNA and Y chromosome polymorphisms. Eur. J. Hum. Genet., 10: 521-529 (2002).
Pattanayak, D.P.: The language heritage of India. Pp. 95-99. In: The Indian Human Heritage. Balasubramanian and N.A. Rao (Eds.), Hyderabad (1998).
Qamar, R., Ayub, Q., Mohyuddin, A., Helgason, A., Mazhar, K., Mansoor, A., Zerjal, T., Tyler-Smith, C. and Mehdi, S.Q.: Y-chromosomal DNA variation in Pakistan. Am. J. Hum. Genet., 70: 1107-1124 (2002).
Quintana-Murci, L., Krausz, C., Zerjal, T., Sayar, S.H., Hammer, M.F., Mehdi, S.Q., Ayub, Q., Qamar, R., Mohyuddin, A., Radhakrishna, U., Jobling, M.A., Tyler-Smith, C. and McElreavey, K.: Y-chromosome lineages trace diffusion of people and languages in southwestern Asia. Am. J. Hum. Genet., 68: 537-542 (2001).
Quintana-Murci, L., Semino, O., Bandelt, H.J., Passarino, G., McElreavey, K. and Santachiara-Benerecetti, A.S.: Genetic evidence of an early exit of Homo sapiens sapiens from Africa through eastern Africa. Nat. Genet., 23: 437-441 (1999).
Ramana, G.V., Su, B., Jin, L., Singh, L., Wang, N., Underhill, P. and Chakraborty, R.: Y-chromosome SNP haplotypes suggest evidence of gene flow among caste, tribe, and the migrant Siddi populations of Andhra Pradesh, South India. Eur. J. Hum. Genet., 9: 695-700 (2001).
Redd, A.J., Roberts-Thomson, J., Karafet, T., Bamshad, M., Jorde, L.B., Naidu, J.M., Walsh, B. and Hammer, M.F.: Gene flow from the Indian subcontinent to Australia: evidence from the Y chromosome. Curr. Biol., 12: 673-677 (2002).
Renfrew, C.: The origins of Indo-European languages. Sci. Am., 261: 82-90 (1989).
Rosser, Z.H., Zerjal, T., Hurles, M.E., Adojaan, M., Alavantic, D., Amorim, A., Amos, W., et al.: Y-chromosomal diversity in Europe is clinal and influenced primarily by geography, rather than by language. Am. J. Hum. Genet., 67: 1526-1543 (2000).
Roychoudhary, S., Roy, S., Basu, A., Banerjee, R., Vishwanathan, H., Usha Rani, M.V., Sil, S.K., Mitra, M. and Majumder, P.P.: Genomic structures and population histories of linguistically distinct tribal groups of India. Hum. Genet., 109: 339-350 (2001).
Sahoo, S. and Kashyap, V.K.: Phylogeography of mitochondrial DNA and Y-Chromosome haplogroups reveal asymmetric gene flow in populations of Eastern India. Am. J. Phys. Anthropol. (in press) (2006).
Sahoo, S., Singh, A., Himabindu, G., Banerjee, J., Sitalaximi, T., Gaikwad, S., Trivedi, R., Endicott, P., Kivisild, T., Metspalu, M., Villems, R. and Kashyap, V.K.: A prehistory of Indian Y chromosomes: evaluating demic diffusion scenarios. Proc. Natl. Acad. Sci. USA, 103: 843-848 (2006).
Sambrook, J., Fritsch, E.F. and Maniatis, T.: Molecular Cloning. A Laboratory Manual. 2nd Ed. CSHL Press, Cold Spring Harbor, N.Y. (1989).
Schneider, S., Roessli, D. and Excoffier, L.: ARLEQUIN ver 2.0. A Software for Population Genetics Data Analysis. Genetics and Biometry Laboratory, University of Geneva, Switzerland.
Semino, O., Passarino, G., Oefner, P.J., Lin, A.A., Arbuzova, S., Beckman, L.E., De Benedictis, G., Francalacci, P., Kouvatsi, A., Limborska, S., Marcikiae, M., Mika, A., Mika, B., Primorac, D., Santachiara-Benerecetti, A.S., Cavalli-Sforza, L.L. and Underhill, P.A.: The genetic legacy of Paleolithic Homo sapiens sapiens in extant Europeans: a Y chromosome perspective. Science, 290: 1155-1159 (2000).
Sengupta, S., Zhivotovsky, L.A., King, R., Mehdi, S.Q., Edmonds, C.A., Chow, C.E., Lin, A.A., Mitra, M., Sil, S.K., Ramesh, A., Usha Rani, M.V., Thakur, C.M., Cavalli-Sforza, L.L., Majumder, P.P. and Underhill, P.A.: Polarity and temporality of high-resolution Y-chromosome distributions in India identify both indigenous and exogenous expansions and reveal minor genetic influence of central Asian pastoralists. Am. J. Hum. Genet., 78: 202-221 (2006).
Shi, H., Dong, Y.L., Wen, B., Xiao, C.J., Underhill, P.A., Shen, P.D., Chakraborty, R., Jin, L. and Su, B.: Y-chromosome evidence of southern origin of the East Asian-specific haplogroup O3-M122. Am. J. Hum. Genet., 77: 408-419 (2005).
Singh, K.S.: India’s Communities. National Series. People of India. Oxford University Press, New Delhi (1998).
Stringer, C.: Coasting out of Africa. Nature, 405: 24-25 (2000).
Su, B., Xiao, C., Deka, R., Seielstad, M.T., Kangwanpong, D., Xiao, J., Lu, D., Underhill, P., Cavalli-Sforza, L., Chakraborty, R. and Jin, L.: Y chromosome haplotypes reveal prehistorical migrations to the Himalayas. Hum. Genet., 107: 582-590 (2000).
Thangaraj, K., Sridhar, V., Kivisild, T., Reddy, A.G., Chaubey, G., Singh, V.K., Kaur, S., Agarawal, P., Rai, A., Gupta, J., Mallick, C.B., Kumar, N., Velavan, T.P., Suganthan, R., Udaykumar, D., Kumar, R., Mishra, R., Khan, A., Annapurna, C. and Singh, L.: Different population histories of the Mundari- and Mon-Khmer-speaking Austro-Asiatic tribes inferred from the mtDNA 9-bp deletion/insertion polymorphism in Indian populations. Hum. Genet., 116: 507-517 (2005).
Thapar, R.: The first millennium B.C. in the northern India. Pp. 80-141. In: Recent Perspective of Early Indian History. R. Thapar (Ed.). Bombay (1995).
Underhill, P., Passarino, G., Lin, A.A., Shen, P., Mirazón Lahr, M., Foley, R.A., Oefner, P.J. and Cavalli-Sforza, L.L.: The phylogeography of the Y chromosome binary haplotypes and the origins of modern human populations. Ann. Hum. Genet., 65: 43-62 (2001).
Underhill, P.A., Shen, P., Lin, A.A., Jin, L., Passarino, G., Yang, W.H., Kauffman, E., Bonne-Tamir, B., Bertranpetit, J., Francalacci, P., Ibrahim, M., Jenkins, T., Kidd, J.R., Mehdi, S.Q., Seielstad, M.T., Wells, R.S., Piazza, A., Davis, R.W., Feldman, M.W., Cavalli-Sforza, L.L. and Oefner, P.J.: Y chromosome sequence variation and the history of human populations. Nat. Genet., 26: 358-361 (2000).
Wells, R.S., Yuldasheva, N., Ruzibakiev, R., Underhill, P., Evseeva, I., Blue-Smith, J., Jin, L., et al.: The Eurasian heartland: A continental perspective on Y-chromosome diversity. Proc. Natl. Acad. Sci. USA, 98: 10244-10249 (2001).
Zhivotovsky, L.A., Underhill, P.A., Cinnioglu, C., Kayser, M., Morar, B., Kivisild, T., Scozzari, R., Cruciani, F., Destro-Bisol, G., Spedini, G., Chambers, G., Herrera, R.J., Yong, K.K., Gresham, D., Tournev, I., Feldman, M.W. and Kalaydjieva, L.: The Effective Mutation Rate at Y Chromosome Short Tandem Repeats, with Application to Human Population-Divergence Time. Am. J. Hum. Genet., 74: 50-61 (2004).
KEYWORDS Population Genetics. People of India. Linguistic Groups. Migration
ABSTRACT Paleoanthropological evidence indicates that modern humans reached South Asia in one of the first dispersals out of Africa, which were later followed by migrations from different parts of the world. The variation of 38 binary and 20 microsatellite polymorphisms on the non-recombining part of the Y-chromosome was examined in 1434 male subjects of 87 different populations of India to investigate various hypotheses of migration and peopling of South Asia. This survey revealed a total of 24 paternal lineages, of which haplogroups H, R1a1, O2a and R2 accounted for approximately 70% of the Indian Y-chromosomes. The high NRY diversity value (0.893) and coalescence age of approx. 45-50 KYA for H and C haplogroups indicated an early settlement of the subcontinent by modern humans. Haplogroup frequency and AMOVA results provide congruent evidence in support of a common Pleistocene origin of Indian populations, with limited influence of the Indo-European gene pool on Indian society. The differential Y-chromosome and mt DNA pattern in the two Austric speakers of India signaled that an earlier male-mediated exodus from South East Asia largely involved the Austro-Asiatic tribes, while the Tibeto-Burman males migrated with females through two different routes: one from Burma, which probably brought the Naga-Kuki-Chin language and O3e Y-chromosomes, and the other from the Himalayas, which carried the YAP lineages into northern regions of the subcontinent. Based on the distribution of Y-chromosome haplogroups (H, C, O2a, and R2) and deep coalescing time depths for these paternal lineages, we suggest that the present-day Dravidian-speaking populations of South India are the descendants of the earliest Pleistocene settlers, while Austro-Asiatic speakers came from SE Asia in a later migration event.
Authors’ Addresses: R. Trivedi, Sanghamitra Sahoo, Anamika Singh, G. Hima Bindu, Jheelam Banerjee, Manuj Tandon, Sonali Gaikwad, Revathi Rajkumar, T. Sitalaximi, Richa Ashma and V. K. Kashyap, National DNA Analysis Centre, Central Forensic Science Laboratory, 30, Gorachand Road, Kolkata 700 014, West Bengal, India
Sanghamitra Sahoo and V. K. Kashyap, National Institute of Biologicals, A-32, Sector 62, Institutional Area, NOIDA 201307, Uttar Pradesh, India
Telephone: +91-120-2400027, Fax: +91-120-2403014, E-mail: [email protected]
G. B. N. Chainy, PG Department of Biotechnology, Vani Vihar, Utkal University, Bhubaneswar 751 004, Orissa, India
© Kamla-Raj Enterprises 2007 Anthropology Today: Trends, Scope and Applications
Anthropologist Special Volume No. 3: 393-414 (2007) Veena Bhasin & M.K. Bhasin, Guest Editors