atkinsongray2006a how old is ie lang fam

Upload: anthony-lee-zhang

Post on 09-Apr-2018


  • 8/7/2019 AtkinsonGray2006a How old is IE lang Fam

    1/20

    91

    How Old is the Indo-European Language Family?

    Chapter 8

    How Old is the Indo-European Language Family? Illumination or More Moths to the Flame?

    Quentin D. Atkinson & Russell D. Gray

    1. An electric light on a summer night

    The origin of Indo-European has recently been described as 'one of the most intensively studied, yet still most recalcitrant problems of historical linguistics' (Diamond & Bellwood 2003, 601). Despite over 200 years of scrutiny, scholars have been unable to locate the origin of Indo-European definitively in time or place. Theories have been put forward advocating ages ranging from 4000 to 23,000 years (Otte 1997), with hypothesized homelands including Central Europe (Devoto 1962), the Balkans (Diakonov 1984), and even India (Kumar 1999). Mallory (1989) acknowledges 14 distinct homeland hypotheses since 1960 alone. He rather colourfully remarks that

    the quest for the origins of the Indo-Europeans has all the fascination of an electric light in the open air on a summer night: it tends to attract every species of scholar or would-be savant who can take pen to hand (Mallory 1989, 143).

    Unfortunately, archaeological, genetic and linguistic research on Indo-European origins has so far proved inconclusive. Whilst numerous theories of Indo-European origin have been proposed, they have proven difficult to test. In this chapter, we outline how techniques derived from evolutionary biology can be adapted to test between competing hypotheses about the age of the Indo-European language family. We argue that these techniques are a useful supplement to traditional methods in historical linguistics. This chapter is a development and extension of previous work on the application of phylogenetic methods to the study of language evolution (Gray & Atkinson 2003; Atkinson & Gray 2006; Atkinson et al. 2005).

    2. Two theories

    There are currently two main theories about the origin of Indo-European. The first theory, put forward by Marija Gimbutas (1973a,b) on the basis of linguistic and archaeological evidence, links Proto-Indo-European (the hypothesized ancestral Indo-European tongue) with the Kurgan culture of southern Russia and the Ukraine. The Kurgans were a group of semi-nomadic, pastoralist, warrior-horsemen who expanded from their homeland in the Russian steppes during the fifth and sixth millennia BP, conquering Danubian Europe, Central Asia and India, and later the Balkans and Anatolia. This expansion is thought to roughly match the accepted ancestral range of Indo-European (Trask 1996). As well as the apparent geographical congruence between Kurgan and Indo-European territories, there is linguistic evidence for an association between the two cultures. Words for supposed Kurgan technological innovations are notably consistent across widely divergent Indo-European sub-families. These include terms for 'wheel' (*rotho, *kʷ(e)kʷlo), 'axle' (*akslo), 'yoke' (*jugo), 'horse' (*ekwo) and 'to go, transport in a vehicle' (*wegh: Mallory 1989; Campbell 2004). It is argued that these words and associated technologies must have been present in the Proto-Indo-European culture and that they were likely to have been Kurgan in origin. Hence, the argument goes, the Indo-European language family is no older than 5000–6000 BP. Mallory (1989) argues for a similar time and place of Indo-European origin, a region around the Black Sea about 5000–6000 BP (although he is more cautious and refrains from identifying Proto-Indo-European with a specific culture such as the Kurgans).

    The second theory, proposed by archaeologist Colin Renfrew (1987), holds that Indo-European languages spread, not with marauding Russian horsemen, but with the expansion of agriculture from Anatolia between 8000 and 9500 years ago. Radiocarbon analysis of the earliest Neolithic sites across Europe provides a fairly detailed chronology of agricultural dispersal. This archaeological evidence indicates that agriculture spread from Anatolia, arriving in Greece at some time during the ninth millennium BP and reaching as far as the British Isles by 5500 BP (Gkiasta et al. 2003). Renfrew maintains that the linguistic argument


    for the Kurgan theory is based on only limited evidence for a few enigmatic Proto-Indo-European word forms. He points out that parallel semantic shifts or widespread borrowing can produce similar word forms across different languages without requiring that an ancestral term was present in the proto-language. Renfrew also challenges the idea that Kurgan social structure and technology was sufficiently advanced to allow them to conquer whole continents in a time when even small cities did not exist. Far more credible, he argues, is that Proto-Indo-European spread with the spread of agriculture, a scenario that is also thought to have occurred across the Pacific (Bellwood 1991; 1994), Southeast Asia (Glover & Higham 1996) and sub-Saharan Africa (Holden 2002). On the basis of linguistic evidence, Diakonov (1984) also argues for an early Indo-European spread with agriculture but places the homeland in the Balkans, a position that may be reconcilable with Renfrew's theory.

    The debate about Indo-European origins thus centres on archaeological evidence for two population expansions, both implying very different timescales: the Kurgan theory with a date of 5000–6000 BP, and the Anatolian theory with a date of 8000–9500 BP. One way of potentially resolving the debate is to look outside the archaeological record for independent evidence that allows us to test between these two time depths. Genetic studies offer one potential source of evidence. Unfortunately, due to problems associated with admixture, slow rates of genetic change and the relatively recent timescales involved, genetic analyses have been unable to resolve the debate (Cavalli-Sforza et al. 1994; Rosser et al. 2000). Another potential line of evidence is contained in the languages themselves, and it is the linguistic evidence we shall turn to now.

    3. The demise of glottochronology and the rise of computational biology

    Languages, like genes, chronicle their evolutionary history. Languages, however, change much faster than genes and so contain more information at shallower time depths. Conventional means of linguistic inquiry, like the comparative method, are able to infer ancestral relationships between languages but cannot provide an absolute estimate of time depth. An alternative approach is Morris Swadesh's (1952; 1955) lexicostatistics and its derivative 'glottochronology'. These methods use lexical data to determine language relationships and to estimate absolute divergence times. Lexicostatistical methods infer language trees on the basis of the percentage of shared cognates between languages: the more similar the languages, the more closely they are related. Words are judged to be cognate if they can be shown to be related via a pattern of systematic sound correspondences and have similar meanings (see Fig. 8.1 for some examples). This information can be used to construct evolutionary language trees.

    Glottochronology is an extension of this approach to estimate divergence times under the assumption of a 'glottoclock', or constant rate of language change. The following formula can be used to relate language similarity to time along an exponential decay curve:

    $$t = \frac{\log C}{2 \log r}$$

    where t is time depth in millennia, C is the percentage of cognates shared and r is the 'universal' constant or rate of retention (the expected proportion of cognates remaining after 1000 years of separation: Swadesh 1955). Usually, analyses are restricted to the Swadesh word list, a collection of 100–200 basic meanings that are thought to be relatively culturally universal, stable and resistant to borrowing. These include kinship terms (e.g. mother, father), terms for body parts (e.g. hand, mouth, hair), numerals and basic verbs (e.g. to drink, to sleep, to burn). For the Swadesh 200-word list, a value of 81 per cent is often used for r.
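    As a minimal sketch of the glottochronological calculation described above, the formula can be written in Python. The function name and the example input are illustrative assumptions; C is entered here as a proportion rather than a percentage, and the default r of 0.81 is the value quoted in the text for the 200-word list.

```python
import math


def glottochronology_time(shared: float, retention: float = 0.81) -> float:
    """Return time depth t in millennia from t = log C / (2 log r).

    shared    -- C, proportion of cognates shared by two languages (0 < C <= 1)
    retention -- r, proportion of cognates retained per lineage per millennium
    """
    if not 0.0 < shared <= 1.0:
        raise ValueError("shared must be a proportion in (0, 1]")
    if not 0.0 < retention < 1.0:
        raise ValueError("retention must be a proportion in (0, 1)")
    return math.log(shared) / (2.0 * math.log(retention))


# Two languages that each retained 81 per cent of the ancestral list are
# expected to share about 0.81 * 0.81 of it, i.e. roughly one millennium apart.
print(round(glottochronology_time(0.81 * 0.81), 3))
```

    The factor of 2 in the denominator reflects that both descendant lineages lose cognates independently; the sketch inherits the constant-rate assumption that the rest of this section criticizes.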

    Linguists have identified a number of serious problems with the glottochronological approach:

    1. Much of the information in the lexical data is lost when word information is reduced to percentage similarity scores between languages (Steel et al. 1988).

    2. The methods used to construct evolutionary trees from language distance matrices have been shown to produce inaccurate results, particularly where rates of change vary (Blust 2000).

    3. Languages do not always evolve at a constant rate. Bergsland & Vogt (1962) compared present-day languages with their archaic forms and found evidence for significant rate variation between languages. For example, Icelandic and Norwegian were compared to their common ancestor, Old Norse, spoken roughly 1000 years ago. Norwegian has retained 81 per cent of the vocabulary of Old Norse, correctly suggesting an age of approximately 1000 years. However, Icelandic has retained over 95 per cent of the Old Norse vocabulary, falsely suggesting that Icelandic split from Old Norse less than 200 years ago.

    4. Languages do not always evolve in a tree-like manner (Bateman et al. 1990; Hjelmslev 1958). Borrowing between languages can produce erroneous (or, in extreme cases, meaningless) language trees. Also, widespread borrowing can bias divergence time estimates by making languages seem more similar (and hence younger) than they really are.


    These problems have led many linguists to completely abandon any attempt to derive dates from lexical data. For example, Clackson (2000, 451) claims that the data and methods do not allow the question 'When was Proto-Indo-European spoken?' to be answered in any really meaningful or helpful way.

    Fortunately, none of these problems are unique to linguistics. It is ironic that whilst computational methods in historical linguistics have fallen out of favour over the last half-century, computational biology has thrived. In much the same way as linguists use information about current and historically attested languages to infer their history, evolutionary biologists use DNA sequence, morphological and sometimes behavioural data to construct evolutionary trees of biological species. Questions of relatedness and divergence dates are of interest to biologists just as they are to linguists. As a result biologists must also deal with the problems outlined above: nucleotide sequence information is lost when data is analyzed as distance matrices (Steel et al. 1988); distance-based tree-building methods may not accurately reconstruct phylogeny (Kuhner & Felsenstein 1994); different genes (and nucleotides) evolve at different rates and these rates can vary through time (Excoffier & Yang 1999); and finally, evolution is not always tree-like due to phenomena such as hybridization and horizontal gene transfer (Faguy & Doolittle 2000).

    Despite these obstacles, computational methods have revolutionized evolutionary biology. Rather than giving up and declaring that time-depth estimates are intractable, biologists have developed techniques to overcome each problem. Here, we describe how these biological methods can be adapted and applied to lexical data to answer the question 'How old is the Indo-European language family?'

    4. From word lists to binary matrices

    In order to estimate phylogenies accurately we need to overcome the problem of information loss encountered in lexicostatistics and glottochronology. This requires a large data set with individual character state information for each language. Lexical data are ideal because there are a large number of well-studied characters available and these can be divided into meaningful evolutionary units known as cognate sets (as described above, words are judged to be cognate if they can be shown to be related via a pattern of systematic sound correspondences and have similar meaning). Cognate words from different languages can be grouped into cognate sets that reflect patterns of inheritance. Owing to the possibility of unintuitive or misleading similarities between words from different languages, expert knowledge of the sound changes involved is required in order to make cognacy judgements accurately. For example, knowledge of regular sound correspondences between the languages is required to ascertain that the English word when is cognate with Greek pote of the same meaning. Conversely, English have is not cognate with Latin habere despite similar word form and meaning.

    To estimate tree topology and branch lengths accurately requires a large amount of data. Our data was taken from the Dyen et al. (1992) Indo-European lexical database, which contains expert cognacy judgements for 200 Swadesh list terms in 95 languages. Dyen et al. (1997) identified eleven languages as less reliable and hence they were not included in the analysis presented here. Three extinct languages (Hittite, Tocharian A and Tocharian B) were added to the database in an attempt to improve the resolution of basal relationships in the inferred phylogeny. Multiple references were used to corroborate cognacy judgements (Adams 1999; Gamkrelidze & Ivanov 1995; Guterbock & Hoffner 1986; Hoffner 1967; Tischler 1973; 1997). For each meaning in the database, languages were grouped into cognate sets. Some examples are shown in Figure 8.1.

    By restricting analyses to basic vocabulary such as the Swadesh word list the influence of borrowing can be minimized. For example, although English is a Germanic language, it has borrowed around 60 per cent of its total lexicon from French and Latin. However, only about 6 per cent of English entries in the Swadesh 200-word list are clear Romance language borrowings (Embleton 1986). Known borrowings were not coded as cognate in the Dyen et al. database. For example, the English word mountain was not coded as cognate with French montagne, since it was obviously borrowed from French into English after the Norman invasion. Any remaining reticulation can be detected using biological methods such as split decomposition, which can identify conflicting signal. The issue of borrowing in lexical data is discussed in more detail by Holden & Gray (Chapter 2 this volume; see also Bryant et al. 2005).

    We can represent the information in Figure 8.1 most simply as binary characters in a matrix, where the presence or absence of a particular cognate set in a particular language is denoted by a '1' or '0' respectively. Figure 8.2 shows a binary representation of the cognate information from Figure 8.1. Using this coding procedure we produced a matrix of 2449 cognacy judgements across 87 languages. Alternative coding methods are also possible, such as representing the data as 200 meaning categories each with multiple character states. It has been argued that semantic categories are the fundamental 'objects' of linguistic


    change (see Evans et al., Chapter 9 this volume) and that binary coding of the presence or absence of cognate sets is thus inappropriate. However, cognate sets constitute discrete, relatively unambiguous heritable evolutionary units with a birth and death (see Nicholls & Gray, Chapter 14 this volume) and there is no reason to suppose they are any more or less fundamental to language evolution than semantic categories. Further, coding the data as semantic categories makes it difficult to deal with polymorphisms (i.e. when a language has more than one word for a given meaning, e.g. for the meaning 'sea' German has both See and Meer). It also significantly increases the number of parameters required to model the process of evolution. Pagel (2000) points out that, if each word requires a different set of rate parameters, then for just 200 words in 40 languages there are 1278 parameters to estimate. We thus used a binary coding of cognate presence/absence information to represent linguistic change in our analysis.

    As well as avoiding the problem of information loss, analyzing cognate presence/absence information allows us to explicitly model the evolutionary process. Unlike lexicostatistics and glottochronology, we do not count the number of cognates shared between languages, nor do we calculate pairwise distances between languages. Instead, the distribution of cognates is mapped onto an evolutionary language tree (see Fig. 8.3), and likely character state changes are inferred across the whole tree.
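    The binary coding step described in this section can be sketched in a few lines of Python. The cognate-set numbers below are transcribed from Figure 8.1; the data structure and the function name are illustrative assumptions, not the procedure used to build the full 87-language matrix.

```python
from typing import Dict, List, Set

# Cognate-set numbers per language and meaning, transcribed from Figure 8.1.
COGNATES: Dict[str, Dict[str, Set[int]]] = {
    "English": {"here": {1}, "sea": {5}, "water": {9}, "when": {12}},
    "German": {"here": {1}, "sea": {5, 6}, "water": {9}, "when": {12}},
    "French": {"here": {2}, "sea": {6}, "water": {10}, "when": {12}},
    "Italian": {"here": {2}, "sea": {6}, "water": {10}, "when": {12}},
    "Greek": {"here": {3}, "sea": {7}, "water": {11}, "when": {12}},
    "Hittite": {"here": {4}, "sea": {8}, "water": {9}, "when": {12}},
}


def to_binary_matrix(cognates: Dict[str, Dict[str, Set[int]]]) -> Dict[str, List[int]]:
    """Code each cognate set as present (1) or absent (0) in each language."""
    # Columns are all cognate sets seen in any language, in numerical order.
    all_sets = sorted({s for meanings in cognates.values()
                       for sets in meanings.values() for s in sets})
    matrix = {}
    for language, meanings in cognates.items():
        present = set().union(*meanings.values())
        matrix[language] = [int(s in present) for s in all_sets]
    return matrix


for language, row in to_binary_matrix(COGNATES).items():
    print(f"{language:8s}", *row)
```

    Note how the German polymorphism for 'sea' (See and Meer) simply yields two 1s in the German row, which is the property the text cites in favour of binary coding.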

    5. Models are lies that lead us to the truth

    When biologists model evolution, they lie: they lie about the independence of character state changes across sites; they lie about the homogeneity of substitution mechanisms; and they lie about the importance of selection pressure on substitution rates. But these are lies that lead us to the truth. Biological research is based on a strategy of model-building and statistical inference that has proved highly successful (Hillis

    English        here¹         sea⁵           water⁹     when¹²
    German         hier¹         See⁵, Meer⁶    Wasser⁹    wann¹²
    French         ici²          mer⁶           eau¹⁰      quand¹²
    Italian        qui², qua²    mare⁶          acqua¹⁰    quando¹²
    Modern Greek   edo³          thalassa⁷      nero¹¹     pote¹²
    Hittite        ka⁴           aruna⁸         watar⁹     kuwapi¹²

    Figure 8.1. Selection of languages and Swadesh list terms. Cognacy is indicated by the numbers in superscript.

    Meaning       here         sea          water     when
    Cognate set   1  2  3  4   5  6  7  8   9  10 11  12
    English       1  0  0  0   1  0  0  0   1  0  0   1
    German        1  0  0  0   1  1  0  0   1  0  0   1
    French        0  1  0  0   0  1  0  0   0  1  0   1
    Italian       0  1  0  0   0  1  0  0   0  1  0   1
    Greek         0  0  1  0   0  0  1  0   0  0  1   1
    Hittite       0  0  0  1   0  0  0  1   1  0  0   1

    Figure 8.2. Cognate sets from Figure 8.1 expressed in a binary matrix showing cognate presence (1) or absence (0).

    Figure 8.3. Character states for cognate sets 5 (black) and 6 (grey) from Figure 8.1 are shown mapped onto a hypothetical tree. Implied character state changes can then be reconstructed on the tree. The black and grey bands show a likely point at which cognate sets 5 and 6 were gained. We can use this information to evaluate possible evolutionary scenarios.


    1992; Hillis et al. 1996; Pagel 1999). The goal for biologists is not to construct a model so complex that it captures every nuance and vagary of the evolutionary process, but rather to find the simplest model available that can reliably estimate the parameters of interest.

    Model choice is thus a balance between over- and under-fitting parameters (Burnham & Anderson 1998). Adding extra parameters can improve the apparent fit of a model to data; however, sampling error is also increased because there are more unknown parameters to estimate (Swofford et al. 1996). Depending on the question we are trying to answer, this added uncertainty can prevent us from estimating the model parameters from the data to within a useful margin of error. In many cases, adding just a few extra parameters can create a computationally intractable problem. Conversely, a model that is too simple can produce biased results if it fails to capture an important part of the process (Burnham & Anderson 1998). There is thus a compromise between biased estimates and variance inflation.

    The strategy that has proved successful in biology is to start with a simple model that captures some of the fundamental processes involved and increase the complexity as necessary. For example, nucleotide substitution models range from a simple equal rates model (Jukes & Cantor 1969), to more complex models that allow for differences in transition/transversion rates, unequal character state frequencies, site-specific rates, and auto-correlation between sites (Swofford et al. 1996). Although even the most complicated models are simplifications of the process of evolution, often the simplest substitution model captures enough of what is going on to allow biologists to extract a meaningful signal from the data. Levins (1966) gives three reasons why we should use a simple model. First, violations of the assumptions of the model are expected to cancel each other out. Second, small errors in the model should result in small errors in the conclusions. And third, by comparing multiple models with reality we can determine which aspects of the process are important.

    Likelihood evolutionary modelling has become the method of choice in phylogenetics (Swofford et al. 1996). The likelihood approach to phylogenetic reconstruction allows us to explicitly model the process of language evolution. The method is based on the premise that we should favour the tree topology/topologies and branch lengths that make our observed data most likely, given the data and assumptions of our model, i.e. we should favour the tree with the highest likelihood score. We can evaluate possible tree topologies for a given model and data by modelling the sequence of cognate gains and losses across the trees.

    Likelihood models have a number of advantages over other approaches. First, we can work with explicit models of evolution and test between competing models. The assumptions of the method are thus overt and easily verifiable. Second, we can increase the complexity of the model as required. For example, as explained below, we were able to test for the influence of rate variation between cognate sets and, as a result, incorporate this into the analysis using a gamma distribution. And third, model parameters can be estimated from the data itself, thus avoiding restrictive a priori assumptions about the evolutionary processes involved (Pagel 1997).

    Likelihood models of evolution are usually expressed as a rate matrix representing the relative rates of all possible character state changes. Here, we are interested in the processes of cognate gain and loss, respectively represented by 0 to 1 changes and 1 to 0 changes on the tree (see Fig. 8.3). We can model this process effectively with a relatively simple two-state time-reversible model of lexical evolution (shown in Fig. 8.4). We extended this simple model by adding a gamma shape parameter (α) to allow rates of change to vary between cognate sets according to a gamma distribution. This was implemented after a likelihood ratio test (Goldman 1993) showed that adding the gamma shape parameter significantly improved the ability of the model to explain the data (χ² = 108, df = 1, p


    the process of linguistic change, we explicitly reject Warnow et al.'s (Chapter 7 this volume) counsel of despair, that language evolution is so idiosyncratic and unconstrained that inferring divergence dates is impossible. Language evolution is subject to real-world constraints, such as human language acquisition, expressiveness, intelligibility, and generation time. We cannot help but quote Ringe et al. (2002, 61) on this point:

    Languages replicate themselves (and thus survive from generation to generation) through a process of native-language acquisition by children. Importantly for historical linguistics, that process is tightly constrained.

    These constraints create underlying commonalities in the evolutionary process that we can, and should, be trying to model.

    Evans et al. (Chapter 10 this volume) argue that our model is patently inappropriate because it assumes that all characters are independent. In biology, this is known as the I.I.D. (identically and independently distributed) assumption. Evans et al. correctly point out that the I.I.D. assumption is violated when individual meanings in the Swadesh word list are broken up into characters representing multiple cognate sets. Specifically, if a particular cognate set is present in a language, it will be less likely that other cognate sets for the same meaning will also be present. However, we do not think that this lack of independence biases our results. The issue of independence will be dealt with in detail in section 10.2.
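    The two-state model of cognate gain and loss discussed in this section can be illustrated with a short sketch. The parameter names (gain, loss) and the closed-form transition probabilities are the textbook solution for a two-state continuous-time Markov chain; they are assumptions made for illustration and are not taken from Figure 8.4 or from the software used in the analysis.

```python
import math
from typing import List


def transition_probabilities(gain: float, loss: float, t: float) -> List[List[float]]:
    """Return P(t) for a two-state Markov chain (0 = absent, 1 = present).

    gain -- instantaneous rate of 0 -> 1 changes (cognate gain)
    loss -- instantaneous rate of 1 -> 0 changes (cognate loss)
    t    -- branch length
    """
    total = gain + loss
    pi1 = gain / total          # stationary probability of presence
    pi0 = loss / total          # stationary probability of absence
    decay = math.exp(-total * t)
    return [
        [pi0 + pi1 * decay, pi1 * (1.0 - decay)],   # starting from state 0
        [pi0 * (1.0 - decay), pi1 + pi0 * decay],   # starting from state 1
    ]


# Hypothetical rates and branch length, chosen only to show the call.
for row in transition_probabilities(gain=0.2, loss=0.8, t=0.5):
    print([round(p, 4) for p in row])
```

    Each row sums to 1, and the chain satisfies detailed balance (pi0 * P01 = pi1 * P10), which is what 'time-reversible' means for this model.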

    6. Bayesian inference of phylogeny

    It is not usually computationally feasible to evaluate the likelihood of all possible language trees: for 87 languages there are over 1 × 10^155 possible rooted trees. Further, the vast number of possibilities combined with finite data means that inferring a single tree will be misleading, since there will always be uncertainty in the topology and branch lengths. If we are to use our results to test hypotheses we need to use heuristic methods to search through 'tree-space' and quantify this phylogenetic uncertainty. Bayesian inference is an alternative approach to phylogenetic analysis that allows us to draw inferences from a large amount of data using powerful probabilistic models without searching for the 'optimal tree' (Huelsenbeck et al. 2001). In this approach trees are sampled according to their posterior probabilities. The posterior probability of a tree (the probability of the tree given the priors, data and the model) is related by Bayes' theorem to its likelihood score (the probability of the data given the tree) and its prior probability (a reflection of any prior knowledge about tree topology that is to be included in the analysis). Unfortunately, we cannot evaluate this function analytically. However, we can use a Markov Chain Monte Carlo (MCMC: Metropolis et al. 1953) algorithm to generate a sample of trees in which the frequency distribution of the sample is an approximation of the posterior probability distribution of the trees (Huelsenbeck et al. 2001). To do this, we used MrBayes, a Bayesian phylogenetic inference programme (Huelsenbeck & Ronquist 2001).
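    The size of tree-space quoted above can be checked with a short calculation. The sketch below assumes the standard count of rooted, bifurcating, labelled trees, (2n - 3)!! for n taxa; the function name is illustrative.

```python
def rooted_tree_count(n_taxa: int) -> int:
    """Number of rooted, bifurcating, labelled trees: (2n - 3)!!."""
    if n_taxa < 2:
        return 1
    count = 1
    for k in range(3, 2 * n_taxa - 2, 2):   # odd factors 3, 5, ..., 2n - 3
        count *= k
    return count


n_trees = rooted_tree_count(87)
# The number of decimal digits gives the order of magnitude.
print(len(str(n_trees)))
```

    Because the count grows super-exponentially with the number of languages, exhaustive evaluation is ruled out long before 87 taxa, which is why a sampling approach is needed.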

    MrBayes uses MCMC algorithms to search through the realm of possible trees. From a random starting tree, changes are proposed to the tree topology, branch lengths and model parameters according to a specified prior distribution of the parameters. The changes are either accepted or rejected based on the likelihood of the resulting evolutionary reconstruction, i.e. reconstructions that give higher likelihood scores tend to be favoured. In this way the chain quickly goes from sampling random trees to sampling those trees which best explain the data. After an initial 'burn-in' period, trees begin to be sampled in proportion to their likelihood given the data. This produces a distribution of trees. A useful way to summarize this distribution is with a consensus tree or consensus network (Holland & Moulton 2003)

    Figure 8.5. The gamma distribution, used to model rate variation between sites. Three possible values for α are shown. For small values of α (e.g. α = 0.5), most cognate sets evolve slowly, but a few can evolve at higher rates. As α increases, the distribution becomes more peaked and symmetrical around a rate of 1, i.e. rates become more equal (e.g. α = 50).


    depicting uncertainty in the reconstructed relationships. These graphs are, however, just useful pictorial summaries of the analysis. The fundamental output of the analysis is the distribution of trees.
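    The accept/reject logic of MCMC sampling can be illustrated with a toy Metropolis sampler. This is a sketch under strong simplifying assumptions, not the MrBayes algorithm: there are only three candidate 'trees', their unnormalised posterior weights are made up, and proposals are drawn uniformly rather than by modifying topology or branch lengths.

```python
import random
from collections import Counter
from typing import Dict, List


def metropolis_sample(weights: Dict[str, float], steps: int,
                      burn_in: int, seed: int = 1) -> List[str]:
    """Sample states in proportion to their unnormalised posterior weights."""
    rng = random.Random(seed)
    states = list(weights)
    current = rng.choice(states)          # random starting "tree"
    sample = []
    for step in range(steps):
        proposal = rng.choice(states)     # symmetric (uniform) proposal
        # Accept with probability min(1, posterior ratio).
        if rng.random() < weights[proposal] / weights[current]:
            current = proposal
        if step >= burn_in:               # discard the burn-in period
            sample.append(current)
    return sample


# Hypothetical unnormalised posterior weights for three candidate trees.
toy_weights = {"tree_A": 1.0, "tree_B": 2.0, "tree_C": 7.0}
sample = metropolis_sample(toy_weights, steps=20000, burn_in=1000)
frequencies = {t: n / len(sample) for t, n in Counter(sample).items()}
print(frequencies)
```

    After burn-in, the sampled frequencies should approximate the normalised weights (about 0.1, 0.2 and 0.7), which is the sense in which the sample of trees approximates the posterior distribution.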

    The consensus network from a Bayesian sample distribution of 100 trees is shown in Figure 8.6. The values next to splits give an indication of the uncertainty associated with each split (the posterior probability, derived from the percentage of trees in the Bayesian distribution that contain the split). For example, the value 41 next to the parallel lines separating Italic and Celtic from the rest of the Indo-European sub-families indicates that that split was present in 41 per cent of the trees in the sample distribution. Similarly, the split grouping Italic and Germanic languages was present in 46 per cent of the sample distribution.
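    The support values described above are simple proportions over the sampled trees. A minimal sketch, assuming each tree has already been reduced to the set of groupings it contains (each grouping represented by one side of the split); the toy sample is hypothetical and does not reproduce the values in Figure 8.6.

```python
from typing import FrozenSet, Iterable, List, Set

Split = FrozenSet[str]


def split_support(trees: List[Set[Split]], split: Iterable[str]) -> float:
    """Percentage of sampled trees that contain the given grouping."""
    target = frozenset(split)
    hits = sum(1 for tree in trees if target in tree)
    return 100.0 * hits / len(trees)


# Hypothetical sample of four trees, each reduced to the groupings it contains.
toy_trees = [
    {frozenset({"Italic", "Celtic"})},
    {frozenset({"Italic", "Germanic"})},
    {frozenset({"Italic", "Celtic"})},
    {frozenset({"Celtic", "Germanic"})},
]
print(split_support(toy_trees, ["Italic", "Celtic"]))
```

    Conflicting groupings, such as Italic with Celtic versus Italic with Germanic, can each receive appreciable support, which is what a consensus network displays and a single consensus tree hides.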

    7. Rate variation and estimating dates

    There are at least two types of rate variation in lexical evolution. First, rate variation can occur between cognates. For example, even in the Swadesh word list, the Indo-European word for five is highly conserved (1 cognate set) whilst the word for dirty is highly variable (27 cognate sets). This is akin to site-specific rate variation in biology. Biologists can account for

    Figure 8.6. Consensus network from the Bayesian MCMC sample of trees. Values express the posterior probability of each split (values above 90 per cent are not indicated). A threshold of 10 per cent was used to draw this splits graph, i.e. only those splits occurring in at least 10 per cent of the observed trees are shown in the graph. Branch lengths represent the median number of reconstructed substitutions per site across the sample distribution.


    this type of rate variation by allowing a distribution of rates. As mentioned above, we used a model of cognate evolution that allowed for gamma distributed rate variation between cognates.
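    The behaviour of the gamma distribution described in Figure 8.5 can be reproduced with a short sketch. It assumes the usual parameterization for among-site rate variation, in which the shape and rate parameters are both α so that the mean rate is 1; the function name is illustrative.

```python
import math


def gamma_rate_density(rate: float, alpha: float) -> float:
    """Density of a gamma distribution with shape alpha and mean 1."""
    if rate <= 0.0 or alpha <= 0.0:
        raise ValueError("rate and alpha must be positive")
    # Work in log space so that large alpha does not overflow.
    log_density = (alpha * math.log(alpha) - math.lgamma(alpha)
                   + (alpha - 1.0) * math.log(rate) - alpha * rate)
    return math.exp(log_density)


# Compare the density at a slow, an average and a fast rate.
for alpha in (0.5, 1.0, 50.0):
    values = [gamma_rate_density(r, alpha) for r in (0.1, 1.0, 2.0)]
    print(alpha, [round(v, 4) for v in values])
```

    With α = 0.5 most of the density sits at slow rates with a long tail of fast ones, whereas with α = 50 nearly all of it is concentrated near a rate of 1.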

    Second, rates of lexical evolution can vary through time and between lineages. Clearly this will cause problems if we are trying to estimate absolute divergence times on the inferred phylogenies since inferred branch lengths are not directly proportional to time. Again, biologists have developed a number of methods for dealing with this type of rate variation. We used the penalized-likelihood method of rate smoothing implemented in r8s (version 1.7; Sanderson 2002a), to allow for rate variation across each tree. Sanderson (2002b) has shown that, under conditions of rate variation, the penalized-likelihood rate-smoothing algorithm performs significantly better than methods that assume a constant rate of evolution.

    In order to infer absolute divergence times, we first need to calibrate rates of evolution by constraining the age of known points on each tree in accordance with historically attested dates. For example, the Romance languages (derived from Latin) probably began to diverge prior to the fall of the Roman Empire. We can thus constrain the age of the node corresponding to the most recent common ancestor of the Romance languages to within the range implied by our historical knowledge (see Fig. 8.7). We constrained the age of 14 such nodes on the tree in accordance with historical evidence (see Atkinson & Gray 2006). These known node ages were then combined with branch-length information to estimate rates of evolution across each tree. The penalized-likelihood model allows rates to vary across the tree whilst incorporating a 'roughness penalty' that costs the model more if rates vary excessively from branch to branch. This procedure allows us to derive age estimates for each node on the tree. Figure 8.8 shows the consensus tree for the initial Bayesian sample distribution of 1000 trees,¹ with branch lengths drawn proportional to time. The posterior probability values above each internal branch give an indication of the uncertainty associated with each clade on the consensus tree (the percentage of trees in the Bayesian distribution that contain the clade). For example, the value 67 above the branch leading to the Italo-Celto-Germanic clade indicates that that clade was present in 67 per cent of the trees in the sample distribution. We can derive age estimates from this tree, including an age of 8700 BP at the base of the tree, within the range predicted by the Anatolian farming theory of Indo-European origin.

    A single divergence time, with no estimate of the error associated with the calculation, is of limited value. To test between historical hypotheses we need some measure of the error associated with the date estimates. Specifically, uncertainty in the phylogeny gives rise to a corresponding uncertainty in age estimates. In order to account for phylogenetic uncertainty we estimated the age at the base of the trees in the post-burn-in Bayesian MCMC sample to produce a probability distribution for the age of Indo-European. One advantage of the Bayesian framework is that prior knowledge about language relationships can be incorporated into the analysis. In order to eliminate trees that conflict with known Indo-European language groupings, the original 1000-tree sample was filtered using a constraint tree representing these known language groupings [(Anatolian, Tocharian, (Greek, Armenian, Albanian, (Iranian, Indic), (Slavic, Baltic), ((North Germanic, West Germanic), Italic, Celtic)))]. This constraint tree was consistent with the majority-rule consensus tree generated from the entire Bayesian sample distribution. The filtered distribution of divergence time estimates was then used to create a confidence interval for the age of the Indo-European language family. This distribution could then be compared with the age ranges implied by the two main theories of Indo-European origin (see Fig. 8.9). The results are clearly consistent with the Anatolian hypothesis.
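    The comparison between the distribution of basal ages and the two theories can be sketched as follows. The age ranges are those stated in the text; the list of ages is hypothetical and is not the chapter's result, and the helper names are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Age ranges (years BP) implied by the two theories, as given in the text.
HYPOTHESES: Dict[str, Tuple[float, float]] = {
    "Kurgan": (5000.0, 6000.0),
    "Anatolian": (8000.0, 9500.0),
}


def proportion_in_range(ages: List[float], low: float, high: float) -> float:
    """Fraction of basal age estimates falling inside [low, high]."""
    return sum(low <= age <= high for age in ages) / len(ages)


def summarize(ages: List[float]) -> Dict[str, float]:
    """Proportion of the sampled basal ages consistent with each theory."""
    return {name: proportion_in_range(ages, low, high)
            for name, (low, high) in HYPOTHESES.items()}


# Hypothetical basal ages (years BP), one per filtered tree; not real results.
toy_ages = [7900.0, 8300.0, 8700.0, 9100.0, 9600.0]
print(summarize(toy_ages))
```

    In the analysis itself each age would come from rate-smoothing one tree in the filtered Bayesian sample, so the spread of ages reflects phylogenetic uncertainty rather than a single point estimate.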

    Not all historically attested language splits were used in our analysis. One means of validating our

    Figure 8.7. The Romance languages (derived from Latin) probably began to diverge prior to the fall of the Roman Empire. We can thus constrain the age of the point on the tree which corresponds to this divergence event (2). Using this rationale 14 nodes were constrained, including an Iberian-French node (1) and a Germanic node (3).


    Figure 8.8. Majority-rule consensus tree from the initial Bayesian MCMC sample of 1000 trees (Gray & Atkinson 2003). Values above each branch indicate uncertainty (posterior probability) in the tree as a percentage. Branch lengths are proportional to time. Shaded bars represent the age range proposed by the two main theories: the Anatolian theory (grey bar) and the Kurgan theory (hatched bar). The basal age (8700 BP) supports the Anatolian theory.

    methodology is to produce divergence time distributions for nodes that were not constrained in the analysis and compare this to the historically attested time of divergence. For example, Figure 8.10 shows the inferred divergence time distributions for the North and West Germanic subgroups. The grey band in


    these figures indicates the likely age of each subgroup based on the historical record. The age estimates for the North Germanic clade correspond with written evidence for the break-up of these languages between AD 900 and AD 1250. Similarly, estimated ages of the West Germanic clade are consistent with historical evidence dating the Anglo-Saxon migration to the British Isles about 1500 years ago.

    8. Testing robustness

    A key part of any Bayesian phylogenetic analysis is an assessment of the robustness of the inferences. To do this we tested the effect of altering a number of different parameters and assumptions of the method.

    8.1. Bayesian priors

    Initializing each Bayesian MCMC chain required the specification of a starting tree and prior parameters ('priors') for the analysis. The sample Bayesian distribution was the product of ten separate runs from different random starting trees. Divergence time and topology results for each of the separate runs were consistent. Other test analyses were run using a range of priors for parameters controlling the rate matrix, branch lengths, gamma distribution and character state frequencies. The inferred tree phylogeny and branch lengths did not noticeably change when priors were altered.

    8.2. Cognacy judgements

    The Dyen et al. (1992) database contained information about the certainty of cognacy judgements. Words were coded as 'cognate' or 'doubtful cognates'. In the initial analysis we included all cognate information in an effort to maximize any phylogenetic signal. However, we wanted to test the robustness of our results to changes in the stringency of cognacy decisions. For this reason, the analysis was repeated with doubtful cognates excluded. This produced a similar age range to the initial analysis, indicating that our results were robust to errors in cognacy judgements (see Fig. 8.11).

    Figure8.9.FrequencydistributionofbasalageestimatesfromfilteredBayesianMCMCsampleoftreesfortheinitialassumptionset(n=435).Themajorityruleconsensustreefortheentire(unfiltered)sampleisshownintheupperle.

    Figure8.10.FrequencydistributionofageestimatesfortheNorthandWestGermanicsubgroupsacrossfilteredBayesianMCMCsampleoftrees(n=433).ThegreybandsindicatethehistoricallyaAestedtimeofdivergence.

    Figure8.11.FrequencydistributionofbasalageestimatesfromfilteredBayesianMCMCsampleoftreesforanalysiswithdoubtfulcognatesexcluded(n=433).Themajorityruleconsensustreefortheentiresampleisshownintheupperle.


8.3. Calibrations and constraint trees

At the conference on which this volume is based, a question was raised about the age constraint used for Insular Celtic. It was suggested that whilst we used a maximum age constraint of 2750 BP, an age constraint of between 2200 BP and 1800 BP would have been more suitable. We do not wish to engage in debates about the correct age of the Insular Celtic divergence; however, a reanalysis of the data using the suggested ages serves to demonstrate the robustness of our results to variations in age constraints. Figure 8.12 shows the distribution of divergence times using the much later Celtic age constraints. Clearly, our results are robust to alterations in this age constraint. In fact, the step-by-step removal of each of the 14 age constraints on the consensus tree revealed that divergence time estimates were robust to calibration errors across the tree. For 13 nodes, the reconstructed age was within 390 years of the original constraint range. Only the reconstructed age for Hittite showed an appreciable variation from the constraint range. This may be attributable to the effect of missing data associated with extinct languages. Reconstructed ages at the base of the tree ranged from 10,400 BP with the removal of the Hittite age constraint, to 8500 BP with the removal of the Iranian group age constraint. The results are highly robust to calibration errors because of the large number of age constraints we used to calibrate rates of lexical evolution across the tree.

We also wanted to be sure that the constraint tree used to filter the Bayesian distribution of trees had not systematically biased our results. Figure 8.13 shows the divergence time distribution for the initial data set after filtering using a minimum set of topological constraints [(Anatolian, Tocharian, (Greek, Armenian, Albanian, (Iranian, Indic), (Slavic, Baltic), (North Germanic, West Germanic), Italic, Celtic))]. Again, the divergence time distribution was consistent with the Anatolian farming theory.
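Filtering a sample of trees against topological constraints can be pictured as keeping only those trees in which each required group forms a clade. The sketch below is a toy illustration under stated assumptions: trees are rooted and written as nested tuples of names, and the function names and the two example trees are hypothetical. It is not the software used for the analysis.

```python
def clades(tree):
    """Return (leaf set of `tree`, set of leaf sets of all its internal nodes)."""
    if isinstance(tree, str):          # a leaf is just a name
        return frozenset([tree]), set()
    leaves, found = frozenset(), set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    found.add(leaves)                  # the node itself defines a clade
    return leaves, found


def satisfies(tree, constraints):
    """True if every constrained group appears as a clade in `tree`."""
    _, found = clades(tree)
    return all(frozenset(group) in found for group in constraints)


def filter_trees(trees, constraints):
    """Keep only the trees compatible with all constraints."""
    return [t for t in trees if satisfies(t, constraints)]


# Two hypothetical sampled trees over a handful of group names.
tree_a = ("Hittite", (("Greek", "Armenian"),
                      (("Slavic", "Baltic"), ("Iranian", "Indic"))))
tree_b = ("Hittite", (("Greek", "Slavic"),
                      (("Armenian", "Baltic"), ("Iranian", "Indic"))))

# Two of the groupings named in the minimum constraint set above.
constraints = [("Slavic", "Baltic"), ("Iranian", "Indic")]
kept = filter_trees([tree_a, tree_b], constraints)
```

Only `tree_a` survives, because `tree_b` splits Slavic from Baltic. A smaller constraint set rejects fewer trees, which is why a minimum set is a useful check that the filter itself is not driving the result.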

Figure 8.12. Frequency distribution of basal age estimates from filtered Bayesian MCMC sample of trees using revised Celtic age constraint of between 1800 BP and 2200 BP. The majority-rule consensus tree for the entire sample is shown in the upper left.

Figure 8.13. Frequency distribution of basal age estimates from filtered Bayesian MCMC sample of trees using minimum set of topological constraints [(Anatolian, Tocharian, (Greek, Armenian, Albanian, (Iranian, Indic), (Slavic, Baltic), (North Germanic, West Germanic), Italic, Celtic))] (n = 670). The majority-rule consensus tree for the entire sample is shown in the upper left.

Figure 8.14. Frequency distribution of basal age estimates from filtered Bayesian MCMC sample of trees with information about missing cognates included (n = 620). The majority-rule consensus tree for the entire sample is shown in the upper left.

8.4. Missing data

Another possible bias was the effect of missing data. Some of the languages in the Dyen et al. (1992) database may have contained fewer cognates because information about these languages was missing. For example, the three extinct languages (Hittite, Tocharian A & Tocharian B) are derived from a limited range of source texts and it is possible that some cognates were missed because the terms were not referred to in the source text. This may have biased divergence time estimates by falsely increasing basal branch lengths. Nicholls & Gray (Chapter 14 this volume) point out that we should expect fewer cognates to be present in the languages at the base of the tree anyway: the fact that Hittite has 94 cognates whilst most languages have around 200 does not necessarily imply that data is missing. Nonetheless, we tested for the effect of missing data by including information about whether or not the word for a particular term was missing from the database. If we could not rule out the possibility that a cognate was absent from a language because it had not been found or recorded, then that cognate was coded as missing (represented by a '?' in the matrix). Encoding missing cognate information in this way means that we can account for uncertainty in the data itself using the likelihood model: the unknown states become parameters to be estimated. Analyzing this recoded data also produced an age range consistent with the Anatolian theory (see Fig. 8.14).
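The recoding rule can be made concrete with a small sketch. It assumes a simplified layout in which each cognate set belongs to one meaning, and a language either attests a set ('1'), lacks it although a word for that meaning is recorded ('0'), or has no recorded word for the meaning at all ('?'). The identifiers `code_language`, `fire_A` and so on are hypothetical.

```python
def code_language(cognate_sets, attested, missing_meanings):
    """Return one matrix row as a string of '1', '0' and '?'.

    cognate_sets     -- list of (set_id, meaning) pairs, one per matrix column
    attested         -- set_ids for which the language has a reflex
    missing_meanings -- meanings with no recorded word in this language
    """
    row = []
    for set_id, meaning in cognate_sets:
        if set_id in attested:
            row.append("1")            # cognate present
        elif meaning in missing_meanings:
            row.append("?")            # absence cannot be ruled in or out
        else:
            row.append("0")            # a different word is recorded
    return "".join(row)


# Hypothetical columns: two cognate sets for each of two meanings.
cognate_sets = [("fire_A", "fire"), ("fire_B", "fire"),
                ("head_A", "head"), ("head_B", "head")]

# A language with a recorded word for 'fire' but none for 'head'.
row = code_language(cognate_sets, attested={"fire_A"},
                    missing_meanings={"head"})
```

Here the 'head' columns are coded '?' rather than '0', so the gap in the source texts is not counted as evidence that the cognates were lost.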

8.5. Root of Indo-European

Finally, we tested the effect of the rooting point for the trees. In the previous analyses, trees were rooted with Hittite. Although this is consistent with independent linguistic analyses (Gamkrelidze & Ivanov 1995; Rexová et al. 2003), other potential root points are possible. It could be claimed that a Hittite root biases age estimates in favour of the Anatolian hypothesis. We thus reran the rate-smoothing analysis rooting the consensus tree in Figure 8.8 with Balto-Slavic, Greek, Tocharian and Indo-Iranian groups. In all four cases the estimated divergence time increased to between 9500 BP and 10,700 BP.

9. Discussion

The time-depth estimates reported here are consistent with the times predicted by a spread of language with the expansion of agriculture from Anatolia. The branching pattern and dates of internal nodes are broadly consistent with archaeological evidence indicating that between the tenth and sixth millennia BP a culture based on cereal cultivation and animal husbandry spread from Anatolia into Greece and the Balkans and then out across Europe and the Near East (Gkiasta et al. 2003; Renfrew 1987). Hittite appears to have diverged from the main Proto-Indo-European stock around 8700 years ago, perhaps reflecting the initial migration out of Anatolia. Indeed, this date exactly matches estimates for the age of Europe's first agricultural settlements in southern Greece (Renfrew 1987). Following the initial split, the language tree shows the formation of separate Tocharian, Greek, and then Armenian lineages, all before 6000 BP, with all of the remaining language families formed by 4000 BP.

We note that the received linguistic orthodoxy (Indo-European is only 6000 years old) does approximately fit the divergence dates we obtained for most of the branches of the tree. Only the basal branches leading to Hittite, Tocharian, Greek and Armenian are well beyond this age. Interestingly, the date range hypothesized for the Kurgan expansion does correspond to a rapid period of divergence on the consensus tree. According to the divergence time estimates shown in Figure 8.8, many of the major Indo-European subfamilies (Indo-Iranian, Balto-Slavic, Germanic, Italic and Celtic) diverged between six and seven thousand years ago, intriguingly close to the hypothesized time of the Kurgan expansion. Thus it seems possible that there were two distinct phases in the spread of Indo-European: an initial phase, involving the movement of Indo-European with agriculture, out of Anatolia into Greece and the Balkans some 8500 years ago; and a second phase (perhaps the Kurgan expansion) which saw the subsequent spread of Indo-European languages across the rest of Europe and east, into Persia and Central Asia.

10. Response to our critics

10.1. The potential pitfalls of linguistic palaeontology

A number of linguists have claimed that linguistic palaeontology offers a compelling reason why the arguments we have presented must be wrong: Proto-Indo-Europeans are claimed to have had a word for wheel (*kʷ(e)kʷlo-) but wheels did not exist in Europe 9000 years ago. The case is based on a widespread distribution of apparently related words for 'wheel' in Indo-European languages. This is often presented as a knock-down argument against any age of Indo-European older than 5000 to 6000 years (when wheels first appear in the archaeological record). However, there are at least two alternative explanations for the distribution of terms associated with 'wheel' and wheeled transport.

First, independent semantic innovations from a common root are a likely mechanism by which we can account for the supposed Proto-Indo-European reconstructions associated with wheeled transport (Trask 1996; Watkins 1969). Linguists can reconstruct word forms with much greater certainty than their meanings. For example, upon the development of wheeled transport, words derived from the Proto-Indo-European term *kwel- (meaning 'to turn, rotate') may have been independently co-opted to describe the wheel. On the basis of the reconstructed ages shown in Figure 8.8, as few as three such semantic innovations around the sixth millennium BP could have accounted for the attested distribution of terms related to *kʷ(e)kʷlo- 'wheel' (one shift just before the break-up of the Italic-Celtic-Germanic-Balto-Slavic-Indo-Iranian lineage, one shift in the Greek-Armenian lineage, and one shift [or borrowing] in the Tocharian lineage).

The second possible explanation for the distribution of terms pertaining to wheeled vehicles is widespread borrowing. Good ideas spread. Terms associated with a new technology are often borrowed along with the technology. The spread of wheeled transport across Europe and the Near East 5000–6000 years ago seems a likely candidate for borrowing of this sort. Linguists are able to identify many borrowings (particularly more recent ones) on the basis of the presence or absence of certain systematic sound correspondences. However, our date estimates suggest that most of the major Indo-European groups were just beginning to diverge when the wheel was introduced. We would thus expect the currently attested forms of any borrowed terms to look as if they were inherited from Proto-Indo-European; they may thus be impossible to reliably identify.

Both of these arguments are discussed in more detail elsewhere (Watkins 1969; Renfrew 1987; Atkinson & Gray 2006). It suffices to say that both the power and the pitfalls of linguistic palaeontology are well known. We are disappointed that in their rush to dismiss our paper in the media, otherwise scholarly and responsible linguists have claimed much greater certainty for their semantic reconstructions than is justifiable. This does not mean that we think there is no issue here. Ideally, we should aim to synthesize all lines of evidence relating to the age of Indo-European. Ringe (unpublished manuscript) presents a careful summary of the terms related to wheeled vehicles in Indo-European. He argues that words for 'thill' (a pole that connects a yoke or harness to a vehicle) and 'yoke' can confidently be reconstructed for Proto-Indo-European. He notes that reflexes of *kʷ(e)kʷlo- 'wheel' have not been found in Anatolian languages but exist in Tocharian A and B and other Indo-European languages, and hence can be reconstructed for the common ancestor of all non-Anatolian Indo-European languages. Ringe claims that the specific forms of these words make parallel semantic changes or borrowing extremely implausible. It would be extremely useful to attempt to quantify just how unlikely such alternative scenarios are. Until all the assumptions of these arguments are formalized, and the probability of alternative scenarios quantified, it will remain difficult to synthesize all the different lines of evidence on the age of Indo-European.

10.2. Independence of characters

As mentioned in section 5, Evans et al. (Chapter 10 this volume) claim our evolutionary model of binary character evolution is 'patently inappropriate' because it assumes independence between characters when our characters are clearly not independent. However, we do not believe that any violation of independence necessarily biases our time-depth estimates. We note that the assumption of independence does not hold for nucleotide or amino acid sequence data either. For example, compensating substitutions in ribosomal RNA sequences result in correlation between paired sites in stem regions (Felsenstein 2004). However, biologists still get reasonably accurate estimates of phylogeny despite violations of this assumption. In fact, nothing in the Evans et al. paper demonstrates that coding the data as binary characters, rather than as multistate characters, will produce biased results. Pagel & Meade (Chapter 15 this volume) demonstrated that, on the contrary, binary and multistate coded data produce trees that differ in length by a constant of proportionality. In other words, the binary and multistate trees are just scaled versions of one another. Since we estimate rates of evolution for each tree using the branch lengths of that tree, scaling the branch lengths does not affect our results. Pagel & Meade (Chapter 15 this volume) also approximated the effect of violations of the independence assumption on the MCMC analysis by 'heating' the likelihood scores. They inferred that violations of independence would produce higher posterior probability values but would have little effect on the consensus tree topology. This means that we may have underestimated the error due to phylogenetic uncertainty but our estimates will not be biased towards any particular date.

Finally, treating cognate sets as the fundamental unit of lexical evolution does not, as Evans et al. (Chapter 10 this volume) argue, constitute an 'extreme violation' of the independence assumption. Almost all of the languages in the Dyen et al. (1992) database contain polymorphisms, meaning that for a given language there exist multiple words of the same meaning. The polymorphisms in our data are a reflection of the nature of lexical evolution. Specifically, they demonstrate a lack of strict dependence between cognate sets within meaning categories, i.e. a word with a given meaning can arise in a language that already has a word of that meaning. Models of lexical evolution that do not allow polymorphisms (e.g. Ringe et al. 2002) could also be labelled as 'patently inappropriate' because they assume that for a word to arise in a language any existing words with that meaning must be concomitantly lost from the language. This is not always the case. Ringe et al. (2002) note that although the words 'small' and 'little' have very similar meanings, they have persisted together in English for over a thousand years. Our binary coding procedure allows us to represent such polymorphisms with ease. The presence of polymorphisms means that dependencies between cognate sets are not as strong as Evans et al. claim. A further factor that weakens the dependencies between the cognate sets arises from the 'thinning' process that occurs in lexical evolution. The observed cognate sets do not represent the full complement of actual cognates that arose in Indo-European (see Nicholls & Gray Chapter 14 this volume). Some cognates that existed in the past will not have persisted into present-day languages and any unique cognates were not included in the analysis. This thinning of the cognates also acts to reduce dependencies between characters in the analysis and thus further weakens any effect of violations of independence. Further research by Atkinson et al. (2005) using synthetic data has shown that violations of the independence assumption do not significantly affect date estimates.
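The scaling argument in this section (that multiplying every branch length by a constant leaves date estimates unchanged when rates are estimated from the same tree) can be checked numerically. The sketch below deliberately simplifies matters: it assumes a single calibration and one constant rate, whereas the analysis described in this chapter used rate smoothing across many calibrations. All numbers and names are hypothetical.

```python
def estimate_age(depth_target, depth_calibrated, calibration_age):
    """Age of a target node from one calibration under a constant rate.

    depth_* are root-to-tip path lengths (expected changes per character)
    below the target node and below the calibrated node respectively.
    """
    rate = depth_calibrated / calibration_age   # changes per character per year
    return depth_target / rate


# Hypothetical branch-length depths on a tree from binary-coded data.
binary_age = estimate_age(depth_target=0.30, depth_calibrated=0.05,
                          calibration_age=1500.0)

# The same tree with every branch multiplied by an arbitrary constant k,
# standing in for a tree inferred from multistate-coded data.
k = 2.7
multistate_age = estimate_age(depth_target=k * 0.30,
                              depth_calibrated=k * 0.05,
                              calibration_age=1500.0)
```

Because `k` multiplies both the target depth and the estimated rate, it cancels, and the two ages agree up to floating-point error.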

10.3. Confidence in lexical data

From a phylogenetic viewpoint the lexicon is a tremendously attractive source of data because of the large number of possible characters it affords. However, we are aware that many historical linguists are sceptical of inferences based purely on lexical data. Garrett (Chapter 12 this volume) argues that borrowing of lexical terms, or advergence, within the major Indo-European subgroups could have distorted our results. He identifies a number of cases where an ancestral term has been replaced by a different term in all or some of the daughter languages, presumably via borrowing:

Thus Latin ignis 'fire' has been replaced by reflexes of Latin focus 'hearth' throughout Romance, and archaic Sanskrit hanti 'kills' has been replaced by reflexes of a younger Sanskrit form marayati throughout Indo-Aryan.

Garrett argues correctly that, where a word has been borrowed across a subgroup after the initial divergence of the group, our method will infer that the word evolved in the branch leading up to that subgroup (see Latin focus example: Fig. 8.15a). This will falsely inflate the branch lengths below the subgroups and deflate branch lengths within each group. Since we estimate rates of evolution on the basis of within-group branch lengths, it is argued that we will underestimate rates of change and hence overestimate divergence times lower in the tree. However, this argument requires that two special assumptions hold. First, any borrowing must occur across a whole subgroup and only across a whole subgroup. When terms are not borrowed across the whole group there is no systematic bias to infer changes in the branch leading to the group. Depending on the distribution of borrowed terms, advergence can even produce the opposite effect, falsely inflating branch lengths within subgroups and hence causing us to underestimate divergence times. It seems unlikely that all or even most borrowed terms were borrowed across an entire subgroup. Garrett highlighted 16 instances of borrowing within Indo-European subgroups.² These were presumably selected because they were thought to reflect the sort of advergence pattern that would bias our results. Of these, at least 6 are unlikely to favour inferred language change at the base of a subgroup.³ Figure 8.15b shows the example of the Romance term for 'head'.

Figure 8.15. a) Parsimony character trace for reflexes of Latin focus (originally 'hearth' but borrowed as 'fire') on the Romance consensus tree. Black indicates presence of the character, grey indicates absence and dashed indicates uncertainty. This shows borrowing across the whole Romance subgroup: evolutionary change is inferred at the base of the subgroup with no change within the subgroup, falsely inflating divergence time estimates. b) As with (a) but for reflexes of Latin testa (originally 'cup, jar, shell' but borrowed as 'head'). Here the borrowing is not across the whole Romance subgroup: evolutionary change is inferred within the subgroup.

Second, even if we accept the first assumption, we must assume that the proposed process of advergence is unique to contemporary languages. As Garrett (Chapter 12 this volume) puts it, this requires 'the unscientific assumption that linguistic change in the period for which we have no direct evidence was radically different from change we can study directly'. Rather than arguing that borrowing was rare at one stage and then suddenly became common across all of the major lineages at about the same time, it seems more plausible to suggest that borrowing has always occurred. If the same process of advergence in related languages has always occurred then the effect of shifting implied changes to more ancestral branches will be propagated down the tree such that there should be no net effect on divergence time calculation. For example, borrowing within Italic may shift inferred changes from the more modern branches to the branch leading to Italic, but borrowing between Proto-Italic and its contemporaries will also shift inferred changes from this branch to ancestral branches. This means that whilst we may incorrectly reconstruct some Proto-Indo-European roots, our divergence time calculation will not be affected. We maintain that although advergence has undoubtedly occurred throughout the history of Indo-European, and that this may have affected our trees, this effect is likely to be random and there is no reason to think it will have significantly biased our results. Atkinson et al. (2005) analyzed synthetic data with simulated borrowing, and found that date estimates were highly robust to even high levels of borrowing.

Ringe et al. (2002) argue that non-lexical characters such as grammatical and phonological features are less likely to be borrowed (although they also note that parallel changes in phonological and morphological characters are possible). To avoid potential problems due to lexical borrowing they coded 15 phonological and 22 morphological characters as strict constraints in their analyses (they did not throw out the remaining 333 lexical characters). While we agree that phonological and morphological characters would be very useful, we believe there are good reasons to trust the inferences based on the lexical data in our case. The Dyen et al. (1992) data has had much of the known borrowing filtered from it. Further, the relationships we infer between Indo-European languages are remarkably similar to those inferred by linguists using the comparative method. Our results are not only consistent with accepted language relationships, but also reflect acknowledged uncertainties, such as the position of Albanian. Our time-depth estimates for internal nodes of the Indo-European tree are also congruent with known historical events (i.e. when constraints were removed step-by-step from each of the 13 internal constraint points, the reconstructed ages were within 390 years of the original constraint range: Gray & Atkinson 2003). Significantly, if we constrain our trees to fit the Ringe et al. (2002) topology we get very similar date estimates to our initial consensus tree topology. In short, there is nothing to indicate that either our tree topologies or date estimates have been seriously distorted by the use of just lexical data.

Determined critics might still claim that the remaining undetected lexical borrowing that undoubtedly exists in the Dyen et al. data (see Nicholls & Gray Chapter 14 this volume) has led us to make erroneous time-depth inferences at the root of the Indo-European tree. The Swadesh 100-word list is expected to be more resistant to change and less prone to borrowing than the 200-word list (Embleton 1991; McMahon & McMahon 2003). If undetected borrowing has biased our tree topology and divergence time estimates then the 100-word list might be expected to produce different estimates. To assess this possibility we repeated the analysis using only the Swadesh 100-word list items. Figure 8.16 shows the results of this analysis. Predictably, with a smaller data set, variance in the age estimates increased. However, the resulting age range was still consistent with the Anatolian theory of Indo-European origin. Interestingly, the majority-rule consensus tree (shown in Fig. 8.17) is slightly different to that obtained from the full data set. It contains a Balto-Slavic-Indo-Iranian group and an Italo-Celtic group. Ringe et al.'s (2002) compatibility analysis also found these clades. The low posterior probability values for these groups mean that we should not over-interpret the certainty of these deeper relationships, but clearly the possibility that undetected lexical borrowing is obscuring some of the deeper relationships would repay further attention. We emphasize that this possible borrowing does not appear, however, to affect our time-depth estimates for the root of the tree.

Figure 8.16. Frequency distribution of basal age estimates from filtered Bayesian MCMC sample of trees using Swadesh 100-word list items only (n = 97).

It is interesting to note that whilst our methodology produced consistent results using the Swadesh 100- and 200-word lists, Tischler's (1973) glottochronological analysis was affected by the choice of word list. Tischler generated Indo-European divergence times using pairwise distance comparisons between languages under the assumption of constant rates of lexical replacement. Using the Swadesh 200-word list, he calculated that the core Indo-European languages (Greek, Italic, Balto-Slavic, Germanic and Indo-Iranian) diverged around 5500 BP whilst Hittite diverged from the common stock around 8400 BP. This is in striking agreement with the timing depicted in Figure 8.8. However, the same calculation using the Swadesh 100-word list produced a Hittite divergence time of almost 11,000 BP. Other inferred divergence times were also older. Tischler favoured the 200-word list results because they tended to be more consistent and were based on a larger sample size. However, the disparate 100-word list ages led Tischler to conclude that the divergence times for Hittite (and a number of other peripheral Indo-European languages, including Albanian, Armenian and Old Irish) were in fact anomalous and he instead favoured an age for Indo-European of between 5000 and 6000 years, reflecting the break-up of the core languages. He explained the apparent earlier divergence of Hittite, Albanian, Old Irish and Armenian as an artefact of borrowing with non-Indo-European languages or increased rates of change.

Figure 8.17. Majority-rule consensus tree (unfiltered) for Swadesh 100-word list items only. Values above each branch express uncertainty (posterior probability) in the tree as a percentage.
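For readers unfamiliar with pairwise dating under a constant replacement rate, the textbook glottochronological formula is t = ln C / (2 ln r), where C is the proportion of shared cognates between two languages and r is the assumed proportion of the list retained per millennium. The sketch below implements that standard formula only; it is an assumption that this matches the general form of Tischler's calculation, and the retention values used (0.805 and 0.86, figures commonly quoted for the 200- and 100-word lists) and the shared fraction are illustrative, not taken from his study.

```python
import math


def glottochronology_time(shared_fraction, retention_rate):
    """Separation time in millennia: t = ln(C) / (2 * ln(r))."""
    if not 0.0 < shared_fraction <= 1.0:
        raise ValueError("shared_fraction must be in (0, 1]")
    if not 0.0 < retention_rate < 1.0:
        raise ValueError("retention_rate must be in (0, 1)")
    return math.log(shared_fraction) / (2.0 * math.log(retention_rate))


# Sanity check: two languages that each retain a fraction r of the list
# after one millennium are expected to share r * r of it.
one_millennium = glottochronology_time(0.805 ** 2, 0.805)

# The same observed share dated with two different assumed retention rates.
t_low_retention = glottochronology_time(0.30, 0.805)
t_high_retention = glottochronology_time(0.30, 0.86)
```

The inferred date depends directly on the single assumed rate, so any list-specific or lineage-specific departure from that rate feeds straight into the date. Approaches that estimate rates from calibrated trees and allow them to vary avoid committing to one fixed value.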

11. Conclusion

The analyses we have presented here are far from the last word on the vexed issue of Indo-European origins. We expect that 'every species of scholar and would-be savant who can take pen to hand' will still be drawn to the question of Indo-European origins. However, in contrast to some of the more pessimistic claims of our critics, we do not think that estimating the age of the Indo-European language family is an intractable problem. Some of these critics have argued that it is hard enough to get the tree topology correct, let alone branch lengths or divergence times. From this point of view all efforts to estimate dates should be abandoned until we can get the tree exactly right. We think that would be a big mistake. It would prematurely close off legitimate scientific inquiry. The probability of getting the one 'perfect phylogeny' from the 6.66 × 10¹⁵² possible unrooted trees for 87 languages is rather small. Fortunately we do not need to get the tree exactly correct in order to make accurate date estimates. Using the Bayesian phylogenetic approach we can calculate divergence dates over a distribution of most probable trees, integrating out uncertainty in the phylogeny. We acknowledge that estimating language divergence dates is difficult, but maintain it is possible if the following conditions are satisfied:

a) a data set of sufficient size and quality can be assembled to enable the tree and its associated branch lengths to be estimated with sufficient accuracy;

b) most of the borrowing is removed from the data;

c) an appropriate statistical model of character evolution is used (it should contain sufficient parameters to give accurate estimates but not be over-parameterized);

d) multiple nodes on the tree are calibrated with reliable age ranges;

e) uncertainty in the estimation of tree topology and branch lengths is incorporated into the analysis;

f) variation in the rate of linguistic evolution is accommodated in the analysis.

The analyses of Indo-European divergence dates we have outlined above go a long way to meeting these requirements. The Dyen et al. (1997) data set we used in our analyses contains over two thousand carefully coded cognate sets (condition a). Dyen et al. excluded known borrowings from these sets (condition b). The two-state, time-reversible model of cognate gains and losses with gamma distributed rate heterogeneity produced accurate trees (i.e. congruent with the results of the comparative method and known historical relationships)⁴ (condition c). When the branch lengths were combined with the large number of well-calibrated nodes (condition d), the estimated divergence dates were also in line with known historical events. The Bayesian MCMC approach allowed us to incorporate phylogenetic uncertainty into our analyses (condition e), and to investigate the consequences of variations in the priors, tree rooting, and stringency in cognate judgements. Finally, rate smoothing allowed us to estimate divergence dates without the assumption of a strict glottoclock (condition f). We challenge our critics to find any paper on molecular divergence dates that uses as many calibration points, investigates the impact of so many different assumptions, or goes to the same lengths to validate its results.
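The figure of roughly 6.66 × 10¹⁵² unrooted trees for 87 languages follows from the standard count of unrooted, fully bifurcating trees on n labelled tips, (2n − 5)!! = 3 × 5 × … × (2n − 5). The short sketch below reproduces the number; the function name is ours, chosen for illustration.

```python
def count_unrooted_trees(n_taxa):
    """Number of unrooted, fully bifurcating trees on n labelled tips."""
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    count = 1
    for odd in range(3, 2 * n_taxa - 4, 2):   # 3 * 5 * ... * (2n - 5)
        count *= odd
    return count


n_trees_87 = count_unrooted_trees(87)
print(f"{float(n_trees_87):.2e}")             # prints 6.66e+152
```

Python integers are arbitrary precision, so the product is exact; converting to a float is only for display.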

In the words of W.S. Holt, history is 'a damn dim candle over a damn dark abyss'. Although we see reason for careful scholarship when attempting to estimate language divergence dates, we see no justification for pessimism here. Far from dancing around the question of Indo-European origins like moths around a flame, with the light of computational phylogenetic methods we can illuminate the past.

Notes

1. Ten million post-burn-in trees were generated using MrBayes (Huelsenbeck & Ronquist 2001). To ensure that consecutive samples were independent, only every 10,000th tree was sampled from this distribution, producing a sample size of 1000.
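The thinning arithmetic in Note 1 can be illustrated as follows. This is a generic sketch of chain thinning, not MrBayes itself; the helper name and the short demonstration chain are hypothetical.

```python
def thin_chain(samples, interval):
    """Keep every `interval`-th item of a chain, counting from 1."""
    if interval < 1:
        raise ValueError("interval must be a positive integer")
    return samples[interval - 1::interval]


# Figures from Note 1: ten million trees, every 10,000th retained.
total_trees = 10_000_000
interval = 10_000
retained = total_trees // interval

# The same rule on a short stand-in chain of 100 numbered states.
demo = thin_chain(list(range(1, 101)), 10)
```

Widely spaced samples are less autocorrelated than consecutive MCMC states, which is the reason given in the note for discarding the intermediate trees.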

2. The proposed borrowings were: in Romance 'ear', 'fire', 'liver', 'count', 'eat', 'head' and 'narrow'; in Germanic 'leaf', 'sharp' and 'think'; and in Indic 'kill', 'night', 'play', 'suck', 'flower' and 'liver'. We note that this list was not intended by Garrett to be a comprehensive account of all possible borrowings.

3. Borrowings that are unlikely to favour inferred language change at the base of a subgroup or that would favour inferred language change within a subgroup are: in Romance 'ear', 'head', 'narrow'; in Germanic 'leaf'; and in Indic 'flower' and 'liver'.

4. We do, however, agree that the question of model specification would repay further investigation (see Nicholls & Gray Chapter 14 this volume; Pagel Chapter 15 this volume; Atkinson et al. 2005).

References

Adams, D.Q., 1999. Dictionary of Tocharian B. (Leiden Studies in Indo-European 10.) Amsterdam: Rodopi. Available via online database at S. Starostin & A. Lubotsky (eds.), Database Query to dictionary of Tocharian B. http://iiasnt.leidenuniv.nl/ied/index2.html.

    Atkinson,Q.D.&R.D.Gray,2006.Areaccuratedatesanintractableproblemforhistoricallinguistics?inMappingourncestry:PhylogeneticMethodsinnthropologyandPrehistory,eds.C.Lipo,M.OBrien,S.Shennan&M.Collard.Chicago(IL):Aldine,26996.

    Atkinson,Q.D.,G. Nicholls,D. Welch&R.D.Gray,2005.Fromwordstodates:waterintowine,mathemagicorphylogeneticinference?TransactionsofthePhilologicalSociety103(2),193219.

    Bateman,R.,I.Goddard,R.OGrady,V.Funk,R.Mooi,W.Kress&.Cannell,1990.Speakingofforkedtongues:Thefeasibilityofreconcilinghumanphylogenyandthehistoryoflanguage.Currentnthropology31,124.

    Bellwood,.,1991.TheAustronesiandispersalandtheoriginoflanguages.Scientificmerican265,8893.

    Bellwood, ., 1994.An archaeologists view of languagemacrofamily relationships. Oceanic Linguistics 33,

    391406.Bergsland,K.&H.Vogt,1962.Onthevalidityofgloochronology.Currentnthropology3,11553.

    Blust,R.,2000.Whylexicostatisticsdoesntwork:theUniversal Constant hypothesis and the Austronesianlanguages,inTimeDepthinHistoricalLinguistics,eds.C.Renfrew,A.McMahon&L.Trask.(apersintherehistory of Languages.) Cambridge: McDonaldInstituteforArchaeologicalResearch,31132.

    Bryant,D.,F.Filimon& R.D.Gray,2005.Untanglingourpast:acificselement,phylogenetictreesandAustronesianlanguages,inTheEvolutionofCulturalDiversity:Phylogeneticpproaches,eds.R.Mace,C.Holden&S.Shennan.London:UCLress,6985.

    Burnham,K..&D.R.Anderson,1998.ModelSelectionandInference: a Practical InformationTheoretic pproach.

    NewYork(NY):Springer.Campbell,L.,2004.HistoricalLinguistics:anIntroduction.2nd

    edition.Edinburgh:EdinburghUniversityress.CavalliSforza,L.L.,.Menozzi&A.iazza,1994.TheHis

    toryandGeographyofHumanGenes.rinceton(NJ):rincetonUniversityress.

    Clackson,J.,2000.TimedepthinIndoEuropean,inTimeDepth in Historical Linguistics, eds. C. Renfrew, A.McMahon & L. Trask. (apers in therehistory ofLanguages.)Cambridge:McDonaldInstituteforArchaeologicalResearch,44154.

    Devoto,G.,1962.OriginiIndeuropeo.Florence:InstitutoItalianodireistoriaItaliana.

    Diakonov,I.M.,1984.OntheoriginalhomeofthespeakersofIndoEuropean. Sovietnthropologyandrchaeology23,587.

    Diamond,J. & . Bellwood, 2003.Farmers andtheirlanguages:thefirstexpansions.Science300,597.

    Dyen,I.,J.B.Kruskal&.Black,1992.nIndoeuropeanClassification:aLexicostatisticalExperiment. (Transactions82(5).)hiladelphia(A):AmericanhilosophicalSociety.

    Dyen, I., J.B. Kruskal & . Black, 1997. FILE IEDATA1.Availableathp://www.ntu.edu.au/education/langs/ielex/IEDATA1.

    Embleton,S.,1986.StatisticsinHistoricalLinguistics.Bochum:Brockmeyer.

    Embleton, S.M., 1991. Mathematical methods of geneticclassification,inSprungfromSomeCommonSource,eds.S.L.Lamb&E.D.Mitchell.Stanford(CA):StanfordUniversityress,36588

    Excoffier,L. & Z. Yang,1999. Substitution ratevariationamong sites inmitochondrialhypervariableregionIofhumansandchimpanzees.MolecularBiologyandEvolution16,135768.

    Faguy,D.M.&W.F.Doolile,2000.Horizontaltransferofcatalaseperoxidasegenesbetweenarchaeaandpathogenicbacteria.TrendsinGenetics16,1967.

    Felsenstein,J.,2004.InferringPhylogenies .Sunderland(MA):Sinauer.

    Gamkrelidze,T.V.&V.V.Ivanov,1995. IndoEuropeanandtheIndoEuropeans:aReconstructionandHistoricalnalysisofaProtoLanguageandProtoCulture.Berlin:MoutondeGruyter.

    Gimbutas, M., 1973a. Old Europe c. 70003500 , the

    earliestEuropeanculturesbeforetheinfiltrationoftheIndoEuropeanpeoples.JournalofIndoEuropeanStudies1,120.

    Gimbutas,M.,1973b.ThebeginningoftheBronzeAgeinEuropeandtheIndoEuropeans35002500.JournalofIndoEuropeanStudies1,163214.

    Gkiasta,M.,T.Russell,S.Shennan&J.Steele,2003.NeolithictransitioninEurope:theradiocarbonrecordrevisited.ntiquity77,4562.

    Glover,I.&C.Higham,1996.NewevidenceforricecultivationinS.,S.E.andE.Asia,in TheOriginsandSpreadofgricultureandPastoralisminEurasia,ed.D.Harris.Cambridge:Blackwell,41342.

    Goldman,N.,1993.StatisticaltestsofmodelsofDNAsubstitution.JournalofMolecularEvolution36,18298.

    Gray,R.D.&Q.D.Atkinson,2003.Languagetreedivergence

    timessupporttheAnatoliantheoryofIndoEuropeanorigin.Nature426,4359.

    Guterbock,H.G.&H.A.Hoffner,1986. TheHiAiteDictionaryoftheOrientalInstituteoftheUniversityofChicago.Chicago(IL):TheInstitute.

    Hillis,D.M.,1992.Experimentalphylogeneticsgenerationofaknownphylogeny.Science255,58992.

    Hillis,D.M.,C.Moritz&B.K.Marble,1996.MolecularSystematics.2ndedition.Sunderland(MA):Sinauer.

    Hjelmslev,L.,1958. EssaiduneCritiquede laMethodediteGloAochronologique.ProceedingsoftheThirtysecondInternationalCongressofmericanists,Copenhagen,1956.Copenhagen:Munksgaard.

    Hoffner, H.A., 1967. An English-Hittite Dictionary. New Haven (CT): American Oriental Society.

    Holden, C.J., 2002. Bantu language trees reflect the spread of farming across Sub-Saharan Africa: a maximum-parsimony analysis. Proceedings of the Royal Society of London Series B 269, 793–9.

    Holland, B. & V. Moulton, 2003. Consensus networks: a method for visualising incompatibilities in collections of trees, in Algorithms in Bioinformatics, WABI 2003, eds. G. Benson & R. Page. Berlin: Springer-Verlag, 165–76.

    Huelsenbeck, J.P. & F. Ronquist, 2001. MRBAYES: Bayesian inference of phylogeny. Bioinformatics 17, 754–5.

    Huelsenbeck, J.P., F. Ronquist, R. Nielsen & J.P. Bollback, 2001. Bayesian inference of phylogeny and its impact on evolutionary biology. Science 294, 2310–14.

    Jukes, T.H. & C.R. Cantor, 1969. Evolution of protein molecules, in Mammalian Protein Metabolism, vol. 3, ed. M.N. Munro. New York (NY): Academic Press, 21–132.

    Kuhner, M.K. & J. Felsenstein, 1994. A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates. Molecular Biology and Evolution 11, 459–68.

    Kumar, V.K., 1999. Discovery of Dravidian as the Common Source of Indo-European. Retrieved Sept. 27th 2002 from http://www.datanumeric.com/dravidian/.

    Levins, R., 1966. The strategy of model building in population biology. American Scientist 54, 421–31.

    Mallory, J.P., 1989. In Search of the Indo-Europeans: Languages, Archaeology and Myth. London: Thames & Hudson.

    McMahon, A. & R. McMahon, 2003. Finding families: quantitative methods in language classification. Transactions of the Philological Society 101, 7–55.

    Metropolis, N., A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller & E. Teller, 1953. Equations of state calculations by fast computing machines. Journal of Chemical Physics 21, 1087–91.

    Otte, M., 1997. The diffusion of modern languages in prehistoric Eurasia, in Archaeology and Language, eds. R. Blench & M. Spriggs. London: Routledge, 74–81.

    Pagel, M., 1997. Inferring evolutionary processes from phylogenies. Zoologica Scripta 26, 331–48.

    Pagel, M., 1999. Inferring the historical patterns of biological evolution. Nature 401, 877–84.

    Pagel, M., 2000. Maximum-likelihood models for glottochronology and for reconstructing linguistic phylogenies, in Time Depth in Historical Linguistics, eds. C. Renfrew, A. McMahon & L. Trask. (Papers in the Prehistory of Languages.) Cambridge: The McDonald Institute for Archaeological Research, 413–39.

    Renfrew, C., 1987. Archaeology and Language: the Puzzle of Indo-European Origins. London: Cape.

    Rexová, K., D. Frynta & J. Zrzavy, 2003. Cladistic analysis of languages: Indo-European classification based on lexicostatistical data. Cladistics 19, 120–27.

    Ringe, D., n.d. Proto-Indo-European Wheeled Vehicle Terminology. Unpublished manuscript.

    Ringe, D., T. Warnow & A. Taylor, 2002. Indo-European and computational cladistics. Transactions of the Philological Society 100, 59–129.

    Rosser, Z.H., T. Zerjal, M.E. Hurles et al., 2000. Y-chromosomal diversity in Europe is clinal and influenced primarily by geography, rather than by language. American Journal of Human Genetics 67, 1526–43.

    Sanderson, M., 2002a. R8s, Analysis of Rates of Evolution, version 1.50. http://ginger.ucdavis.edu/r8s/

    Sanderson, M., 2002b. Estimating absolute rates of evolution and divergence times: a penalized likelihood approach. Molecular Biology and Evolution 19, 101–9.

    Steel, M., M. Hendy & D. Penny, 1988. Loss of information in genetic distances. Nature 333, 494–5.

    Swadesh, M., 1952. Lexicostatistic dating of prehistoric ethnic contacts. Proceedings of the American Philosophical Society 96, 453–63.

    Swadesh, M., 1955. Towards greater accuracy in lexicostatistic dating. International Journal of American Linguistics 21, 121–37.

    Swofford, D.L., G.J. Olsen, P.J. Waddell & D.M. Hillis, 1996. Phylogenetic Inference, in Molecular Systematics, eds. D.M. Hillis, C. Moritz & B.K. Mable. 2nd edition. Sunderland (MA): Sinauer, 407–514.

    Tischler, J., 1973. Glottochronologie und Lexicostatistik. Innsbruck: Innsbrucker Verlag.

    Tischler, J., 1997. Hethitisch-Deutsches Worterverzeichnis. Dresden: Probedruck.

    Trask, R.L., 1996. Historical Linguistics. New York (NY): Arnold.

    Watkins, C., 1969. Indogermanische Grammatik III/1. Geschichte der Indogermanischen Verbalflexion. Heidelberg: Carl Winter Verlag.
