1 inferring patterns of folktale diffusion using genomic data 2 3 4 … · 2017. 8. 2. · 1 1...
TRANSCRIPT
1
Inferringpatternsoffolktalediffusionusinggenomicdata1
SIAppendix2
Eugenio Bortolini, Luca Pagani, Enrico R. Crema, Stefania Sarno, Chiara Barbieri, Alessio Boattini,Marco3Sazzini, SaraGraça da Silva,GessicaMartini,MaitMetspalu,Davide Pettener, Donata Luiselli, Jamshid J.4Tehrani 5
2
Contents1
1EXTENDEDDATASETDESCRIPTION....................................................................................................32
2DISTANCES........................................................................................................................................33
2.1SNPFILTERING...................................................................................................................................34
2.2GENETIC,FOLKTALE,ETHNOLINGUISTICANDGEOGRAPHICDISTANCES............................................................45
2.2.1GENETICDISTANCE...................................................................................................................................46
2.2.2FOLKTALEDISTANCE.................................................................................................................................47
2.2.3GEOGRAPHICDISTANCE.............................................................................................................................58
2.2.4EUCLIDEANDISTANCES..............................................................................................................................59
3SPACEMIX..........................................................................................................................................510
4.NEIGHBORNET................................................................................................................................1211
5.AMOVA..........................................................................................................................................1312
6.BIAS-CORRECTEDDISTANCECORRELATIONANDPARTIALDISTANCECORRELATION.......................1413
7.EXPLORINGASSOCIATIONBETWEENVARIABLESOVERCUMULATIVEGEOGRAPHICDISTANCE.......1514
8.DIFFUSIONOFMOSTPOPULARTALESANDESTIMATIONOFPOSSIBLEFOCALPOINTSFROMSPATIAL15
DISTRIBUTION....................................................................................................................................1616
17
18
3
1Extendeddatasetdescription1
Folktale datawere sourced from theAarne ThompsonUther (ATU) Index - a catalogue of over 2,0002
distinct “international tale types” distributed amongmore than 200 cultures [22]. Each international3
type represents an independent, self-contained storyline comprising a combination of motifs (e.g.4
specific events, characters, or artefacts) that is recognizably stable across cultures.We constructed a5
datasetrecordingthecross-culturaldistributionsoftwogroupsoffolktales:'AnimalTales'(ATU1–299),6
which featurenon-humanprotagonists, as typifiedbyAesop's fables, and 'TalesofMagic' (ATU300–7
749),whichconcernbeingsorobjectswithsupernaturalpowers,suchasfairies,witchesormagicrings8
[22].Wefocusedonthesetwogenresbecausetheyarethemostrichlydocumentedandmostculturally9
widespreadgroupsoftalesintheATUIndex.10
73ofthe198societies inwhichthetaleswererecordedcouldbematchedwithpopulationsforwhich11
wholegenomesequenceswereavailable(TableS2-I).Ofthese,33(DatasetMAIN)wereselectedbasedon12
athresholdofminimumrichness(i.e.thoseexhibitingatleast5folktales;TableS2-II)andthepresence13
ofviablegeneticproxies.Eachpopulationwasunivocallydescribedbyastringlistingthepresence(1)or14
absence(0)ofanyoftheincluded596folktales(TableS2-II).15
In addition to DatasetMAIN we generate an additional subset which is functional to testing explicit16
hypotheses, i.e. DatasetEURASIA (N=30) which does not include the 3 African population present in17
DatasetMAIN(TableS1-II,i.e.Congolese,Tanzanian,andWestAfrican); 18
2Distances19
2.1SNPfiltering20
Thewholegenomesequencesusedinthisstudyweregenerated,QCedandphasedaspartofabroader21
study[21].Thebulkof~39MSNPswereusedtocalculatethestatisticsdescribedbelow.22
23
4
2.2Genetic,Folktale,EthnolinguisticandGeographicdistances1
2.2.1Geneticdistance2
Geneticdistanceswereestimatedbytheaveragepairwisedistancesbetweentwogenomes,one from3
eachpopulation.Geneticdistance for (i,j)pairsofpopulations representedbymore thanonegenome4
eachwascalculatedastheaverageofallpossible(i,j)pairsofgenomes.Asaconsequencethediagonal5
ofthegeneticdistancematrixwasnotconstrainedtobezero(TableS2-3.1-3).6
2.2.2Folktaledistance7
Sincetheoriginaldataset(TableS1-I)andDatasetMAIN(TableS1-II)comprisebinaryevidenceofpresence8
(1)orabsence (0)of a given folktale ina setofworldwidepopulations,wecalculate folktaledistance9
betweenpopulationsasanasymmetricpairwiseJaccarddistance2.SymmetricJaccarddistancebetween10
populationAandpopulationBiscalculatedas11
12
13
in other words as the ratio between the number of differences and the sum of similarities and14
differences which can be identified by comparing A and B. In the present work, we assume Xij = 015
(absence of the jth tale in the ith population of datasetX) to be the ancestral state. Accordingly,we16
adopt an asymmetrical coefficient that does not consider absence of the jth tale in two sampled17
populations iandk(Xij+Xkj=0)asaninstanceofhomology(forthesubstantial limitsposedbydouble18
zerostoinferenceinecologyandrelateddisciplinespleaserefertoLegendreandLegendre3).19
Therefore, in a datasetX formedbyn rows each representing a populationunivocally describedby a20
stringofpresence(1)/absence(0)valuesofJfolktales,weeliminatedoublezerosandcalculatepairwise21
folktaledistance(F𝜹)betweenpopulationXiandpopulationXkas22
23
wheresquareIversonbracketsequal1iftheirinternalconditionissatisfiedand0ifitisnotsatisfied.The24
resultingvalue is the ratiobetweenthenumberof inter-populationdifferences,and thesumof inter-25
populationdifferencesandsimilaritiesbasedonthepresenceofthejthfolktaleinbothpopulations.26
5
2.2.3Geographicdistance1
GeographicdistanceswerecalculatedaspairwiseGreatCircleDistanceusingthepackagegdistanceinR2
[42] and by constraining the hypothesisedmovement of people through onewaypoint located in the3
SinaiPeninsula.Coordinates(longitudeandlatitudeindecimaldegrees)expressingthelocationofeach4
populationcomprisedinTableS2-5identifytheassumedcentreoftheareaoccupiedbyagivenfolkloric5
traditionasdefinedbyATUindex.6
2.2.4Euclideandistances7
In order to perform bias corrected and partial distance Correlation, folktale, genetic, and geographic8
distances were transformed into their exact Euclidean representations (as indicated in Szekely et al9
2007, 2013). The original folktale and genetic distance matrices were scaled through Classic10
Multidimensional Scaling using the function cmdscale in R and following the procedure for exact11
representation presented by Szekely et al. (2013a). Euclidean distances were computed from the12
obtained n-2 number of descriptors using the function dist in R with method set to “euclidean”.13
Euclideanrepresentationofgeographicdistancewasinsteadobtainedbyreprojectingtheoriginalsetof14
coordinatesonaplaneusingtwo-pointequidistantprojectionthroughthefunctionspTransforminthe15
packagespinR(PebesmaandBivand2005;Bivandetal.2013).ActualEuclideandistancebetweenthe16
newsetofcoordinateswascomputedusingthefunctionradishinthepackagefieldsinR(Nychkaetal.17
2016).18
19
3SpaceMix20
WeperformedtwoindependentSpaceMix(Bradburdetal.2013)analysesaimedatretrievingthe21
“genetic”andthe“folktale”spatialpositionsofourEurasiansamples.Forthegeneticrun,weusedthe22
126,554SNPsofchromosome22thatwerevariableinatleastoneof30sampleseachrepresentingone23
ofourstudiedpopulations.Giventhehighnumberofavailablemarkerswedeemedasingle24
chromosometobesufficienttoyieldreliableresults.25
ThegeographicinformationwasobtainedfromTableS2-5.1,thegeneticinformationwasinputtedusing26
thecount/totaloptionandSpacemixwasusedwiththedefaultparameters:27
run.spacemix.analysis(n.fast.reps=10,fast.MCMC.ngen=1e5,fast.model.option="target",28
long.model.option="source_and_target",data.type="counts",sample.frequencies=NULL,29
6
mean.sample.sizes=NULL,counts=count,sample.sizes=total,sample.covariance=NULL,1
target.spatial.prior.scale=NULL,source.spatial.prior.scale=NULL,spatial.prior.X.coordinates=coord[,1],2
spatial.prior.Y.coordinates=coord[,2],round.earth=FALSE,long.run.initial.parameters=NULL,3
k=nrow(count),loci=ncol(count),ngen=1e6,printfreq=1e2,samplefreq=1e3,mixing.diagn.freq=50,4
savefreq=1e5,directory=NULL,prefix="OurPrefix")5
Thisyieldeda“geno-geographic”positioningforeachofthe30samples.6
Thesameprocedurewasreplicatedusingthistime“folktale”informationasinputdata.Particularlywe7
generatedapseudo-geneticfilewhereeachpopulationwastypedasasingleindividualandthe8
presenceofagivenfolktalewasregisteredasanhomozygoustrait.9
The“geno-geographic”and“folk-geographic”coordinateshencegeneratedwerecomparedwiththe10
actualgeographiccoordinates,denotingatendencyforthefolk-geographiccoordinatestoapproximate11
betterthanthegeno-geographiconestheactualgeographiclocationsofthesampledpopulations12
(FigureS1-3.1).13
14
FigureS1-3.1SpaceMixanalysesforeachoftheEurasianpopulationsshowingthegeographic(dot),15geno-geographic(G)andfolk-geographic(F)coordinatesjoinedbywhitesegments.16
17
18
19
20
7
1
2
3
8
1
9
1
10
1
11
1
2
12
4.NeighborNet1
ANeighbornet analysiswas also carriedout to further explore the impactof geographyand linguistic2
ancestryonthedistributionoffolktalesinthepresentdataset.Theanalysisyieldedthefollowinggraph3
(Fig. S1-4.1), exhibiting a certain degree of spatial clustering in addition to proximity and reticulation4
among linguistic close relatives (e.g. within the Indo-European family, the Semitic family, and the5
Mongolianfamily).Overall,thespatialstructureinthedatasetseemstobestronger(e.g.thepositionof6
Hungarian, Japanese/Chinese). Some language families, notably Turkic, Uralic and Caucasian, are7
scatteredacrossthenetwork.Thedegreeofreticulationisquitehigh,astestifiedbytherelevantquartet8
statistics(deltascore=0.317;Q-residualscore=0.002335),suggestingthatculturaladmixtureprocesses9
betweendemesmayhaveanimportantrole.10
Fig. S1-4.1.Neighbornet graph based on folktale distance.Linguistic color code: red = Turkic; blue =11
Indo-European;pink=Sino-Tibetan;purple=Caucasian;turquoise=Eskimo-Aleut;orange=Semitic;12
lightgreen=Uralic;darkgreen=Japonic,brown=Mongolian;black=Austro-asiatic13
14
15
Abkhaz
Tuva
Eskimo
Turkmen
Dagestan
Vietnamese
Azerbaijan
BurmeseBuryat
MongolianLebanese
SaudiArabian
Jordanian
Armenian
Mordvinian
Yakut
Chuvash
Ossetian
Byelorussian
Russian
UkrainianLithuanian
HungarianFrench
Italian
ChineseJapanese
Iranian
Tadzhik
Uzbek
0.1
13
5.AMOVA1
WeuseAnalysisofMolecularVariation(AMOVA;Excoffieretal.1992)to formallyassesthe impactof2
ethnolinguisticboundariesonbothgeneticandcultural(folktale)variability.Todothis,weassignedeach3
populationtoanethnolinguisticgroup(derivedfromEthnologue;SItableS2-9.1),andusedthefunction4
amova in the package pegas in R to run the analysis (Paradis 2010). AMOVA is commonly used in5
populationgeneticstoassessthedegreeofpopulationstructureinametapopulation.Inotherwords,it6
measuresthedegreeofvariabilityexistingbetweenpredeterminedgroupsasopposedtotheamountof7
variability observed within each group. AMOVA returns a summary statistic (PhiST) obtained by8
computing the ratio between intergroup diversity estimate and total diversity in themetapopulation.9
AlthoughitisderivedfromthemoregeneralclassofFSTmeasures,PhiSTevaluatessymmetricdistance10
matriceswhileFSTisbasedoncorrelationsbetweenindividualvariantfrequencies.Suchmeasureshave11
already been successfully adopted to investigate co-evolutionary patterns in genetic and cultural12
datasets(Belletal.2009;Rseszuteketal.2012),inculturaldatasetsalone(Shennanetal2015),andin13
one case on the distribution of folktale variants in Europe (Ross et al. 2013). Results of these works14
consistently show that average intergroup cultural dissimilarity is stronger than genetic dissimilarity15
measuredonthesamesetofdemes,whilePhiSTvaluesobtainedforculturalmarkersareusually ina16
rangecomprisedbetween0.02(musicaldiversity;Rseszuteketal.2012)andhigherlevelsforvariantsof17
individual folktales (PhiST =0.09; Ross et al. 2013), or personal ornaments (PhiST =0.109) and pottery18
(PhiST=0.134)inNeolithicEurope(Shennanetal2015).Ourresultsconfirmthisdifferentialimpacton19
geneticvariabilityontheonehand,andculturalvariabilityontheother.20
21
22 23 24 25 26 27 28 29 30 31 32 33 34
14
6.Bias-correcteddistancecorrelationandpartialdistancecorrelation1
2Distancecorrelation isameasureofstatisticaldependencebetweentwovariableswhich isspecifically3
suitedfortestingsuchhypothesisonpairsofsymmetricdistancematrices. Itsvalueequalszero ifand4
only if thetwovariablesarestatistically independent(Székelyetal.2007). Inaddition:a) theresulting5
statisticsarenotboundto linearmodels.Onthecontrary,theyaresensitivetoalltypesofdependent6
relationships,includingnonlinearandnonmonotoneones(Székelyetal2007);b)itisnotpronetothe7
sameproblems raised for standard andpartialMantel tests (Guillot and Rousset 2013); and c) it is8
usually preferred to othermeasures of nonlinear association when small or practical sample size are9
concerned(Gorfinetal2011).10
Giventwodistanceordissimilaritymatrices,standarddistancecorrelationperformsdoublecenteringof11
eachmatrix by subtracting row and column average to rows and columns of the originalmatrix, and12
addingthegrandmeanofthedistancematrixtotheresults,sothatallcolumnsandrowsoftheresulting13
matricessumtozero.Distancecorrelationbetweentwosuchscaledmatrices-asinPearson’sProduct-14
momentcorrelationcoefficient is computedbydividing thedistanceCovarianceby theproductof the15
respective distance standard deviations. Distance Covariance is obtained by computing the summed16
cross-product between the two double-centeredmatrices and averaging it over squared sample size.17
Distance correlation is thereforenot the correlationbetweenoriginal distances. It is insteadbasedon18
cross-productsbetweenscaledmomentobtainedbydouble-canteringtheoriginalmatrices.19
Inthepresentpaper,weperformBias-CorrectedDistanceCorrelation(SzekelyandRizzo2013)suitedfor20
biggersamplesize(inthepresentstudywehave435observationwhenallpairsareconsideredover3021
populations,whichisexactlytheexamplesizeprovidedbytheauthorsanddevelopersofthemethod),22
Thismethod corrects forpotential limitationsoforiginaldistanceCovarianceanddistanceCorrelation23
measures(Székelyetal.2007)whendimensiontendstoinfinity,andisbasedonanunbiasedestimator24
or thesquareddistancepopulationcovariance.Asuitedt-testof independence isoffered. Inaddition,25
weperformPartialdistancecorrelation(Szekelyetal2013a)toassessthe impactofonevariableover26
another,whilecontrollingfortheeffectofathirdvariable.Forcalculatingpartialdistancecorrelationthe27
standard double-centering used in standard and bias-corrected distance correlation is replaced by a28
different centering technique named U-centering and based on the demonstration that such29
transformed distance and dissimilarity matrices have a corresponding U-centered Euclidean30
representation inHilbert space (ageneralizationofEuclideanplanewitha finiteor infinitenumberof31
15
dimensions;Szekelyetal2013a).Simulationstudiesconfirmthatthejointsignificancetestcontrolstype1
Ierrorrateat itsnominal level,andoutperformspartialcorrelationandpartialManteltest intermsof2
power(Szekelyetal2013a).3
4
7. Exploringassociationbetweenvariablesover cumulative geographic5
distance6
Table S1-7.1 Model comparison over cumulative geographic distance. Results report Pearson’s7
product-momentcorrelationcoefficientsplottedinFig.2andobtainedwithoriginaldistancematrices.8
9
10
11
12
13
14
15
16
17
Table 1: Model selection at cumulative geographic distances
Km N Best AIC �
Geno
�
Geo
L
Geno
L
Geo
w
Geno
w
Geo
Main 528 All -2389.8 4.2 17.5 0.12 0.00016 0.99 0.01
Eurasia 435 All -1895.48 0.09 2.29 0.96 0.32 0.75 0.25
2000 115 Geno -438.94 0.00 22.61 1.00 0.00 1.00 0.00
4000 249 All -1052.72 5.90 21.82 0.05 0.00 1.00 0.00
6000 343 All -1484.20 2.40 4.70 0.30 0.09 0.76 0.24
8000 412 All -1798.36 1.66 1.86 0.44 0.39 0.53 0.47
10000 435 Geno -1890.30 0.00 2.40 1.00 0.30 0.77 0.23
12000 435 All -1895.48 0.09 2.29 0.96 0.32 0.75 0.25
pdcor p
folktale˜genetic+geographic -0.11 1.00
folktale˜geographic+genetic 0.26 0.00
folktaleL˜geneticL+geographic 0.17 0.00
folkL˜geo+genoL 0.25 0.00
N bindist r.folk p.folk r.folk(Lw) p.folk(Lw) r.folk p.folk r.folk(Lw) p.folk(Lw)
geo geo geo geo geno geno geno(Lw) geno(Lw)
1 114 2000 0.15 0.06 0.15 0.06 0.27 <0.001 0.26 <0.001
2 245 4000 0.36 <0.001 0.64 <0.001 0.16 0.01 0.28 <0.001
3 339 6000 0.31 <0.001 0.70 <0.001 0.13 <0.001 0.40 <0.001
4 411 8000 0.28 <0.001 0.69 <0.001 0.23 <0.001 0.48 <0.001
5 433 10000 0.24 <0.001 0.62 <0.001 0.17 <0.001 0.52 <0.001
6 435 12000 0.31 <0.001 0.57 <0.001 0.20 <0.001 0.55 <0.001
N bindist r.folk p.folk r.folk(Lw) p.folk(Lw) r.folk p.folk r.folk(Lw) p.folk(Lw)
geo geo geo geo geno geno geno(Lw) geno(Lw)
115 2000 0.16 0.09 0.21 0.03 0.45 <0.001 0.40 <0.001
249 4000 0.24 <0.001 0.58 <0.001 0.34 <0.001 0.40 <0.001
343 6000 0.22 <0.001 0.68 <0.001 0.23 <0.001 0.51 <0.001
412 8000 0.22 <0.001 0.67 <0.001 0.22 <0.001 0.55 <0.001
434 10000 0.19 <0.001 0.64 <0.001 0.20 <0.001 0.55 <0.001
435 12000 0.19 <0.001 0.64 <0.001 0.20 <0.001 0.55 <0.001
3
16
8.Diffusionofmostpopulartalesandestimationofpossiblefocalpoints1
fromspatialdistribution2
3
Weidentify19“mostpopular”tales, i.e. folktalesthatarepresent inat least30populationsoutof604
OldWorldpopulationsavailableintheoriginalpresence/absencematrix(TableS2-10,S2-11),whichare5
brieflysummarizedbelow:6
7
o ATU155‘TheUngratefulSnakeReturnedtoCaptivity’:Asnake(oranotherdangerousanimal)8
attacksamanwhorescuesit,andispunishedbyotheranimals.9
o ATU300‘TheDragonSlayer’:Amanrescuesabeautifulmaidenfromadragon/monster,often10
withthehelpofhisdogs.Later,heexposesanimposterwhoclaimscreditforthedeed.11
o ATU301‘TheThreeStolenPrincesses’:Amanrescuesthreewomenfromapit.Hiscompanions12
betrayhimbyleavinghiminthepitandstealingthegirls.Withtheaidofaspirittheheroflies13
upandexposeshiscompanions,marryingtheyoungestgirl.14
o ATU303‘TheTwins,OrBloodBrothers’:Aherorescuesawomanandmarriesher,butislater15
bewitched.Histwinbrothersetsouttofindhim,andismistakenbythewomanforherhusband.16
Thetwinreleaseshisbrother,whokillshiminajealousrage,mistakenlybelievinghimtohave17
seducedhiswife.Thetwinislaterresuscitated.18
o ATU 313 ‘TheMagic Flight’: A man elopes with the daughter of a demon or king. She uses19
magicalobjectstoobstructtheirpursuersandtheyescape.20
o ATU 314 ‘Goldener’: A golden-haired man marries the king’s daughter. He is mocked by his21
brothers-in-law,butsucceedsincompletingheroicdeedswheretheyfailandismadetheheir.22
o ATU325‘TheMagicianandhisApprentice’:Aboyisgiventoamagiciantobehisapprentice.23
Theboylearnstheartofsorceryandfreeshimselffromhismasterafterabattleinwhichthey24
transformintoasuccessionofdifferentkindsofanimals.25
o ATU400‘TheManonaQuestforhisLostWife’:Amiscellaneousgroupofstoriesconcerninga26
manwho is separated fromhiswife during an adventure.When he finds her she is about to27
marryanotherman,butheproveshisidentitytoherandtheyarereconciled.28
17
o ATU403‘TheBlackandtheWhiteBride’:Agirlistomarrytheking,butherstepmothertriesto1
kill her and replace her with her own daughter. The girl proves her identity to the king and2
exposesherstepsisterasanimposter.3
o ATU 480 ‘The Kind and Unkind Girls’: A girl goes on a journey and is kind to those she4
encounters.Sheisrewarded.Herstepmothersendsherowndaughteronthesamejourneybut5
sheisunkindandgetspunished.6
o ATU531‘TheCleverHorse’:Amiscellaneousgroupofstoriesinwhichayoungmanishelpedto7
completesomenear-impossibletasksbyatalkinghorseandmarriesaprincess.8
o ATU550‘Bird,HorseandPrincess’:Threebrothersaresentonaquestbytheirfathertocatcha9
magicbird.Theyoungestbrothersucceedsbutisbetrayedbytheothertwowhotrytoclaimthe10
prize.Withthehelpofamagicalanimaltheheroexposeshisbrothers.11
o ATU554 ‘TheGratefulAnimals’:Amanhelpsa seriesof animals,who reciprocatebyhelping12
himtocompleteaseriesofnear-impossibletasks.13
o ATU560 ‘TheMagic Ring’: A boy acquires amagic ring that grants himwishes. Hemarries a14
princess,whostealstheringtoelopewithher lover.Theboyrecoversaringandpunisheshis15
faithlesswifeandherlover.16
o ATU 563 ‘The Table, the Donkey and the Stick’: A man acquires magical objects from a17
supernatural being. He is cheated out of them and given plain objects in their place, but18
managestorecoverhispossessionsandpunishthecheat.19
o ATU613‘TheTwoTravellers’:After losinganargumentwithhiscompanion,amanisblinded.20
He learns the secrets of birds and recovers his sight as well as gaining new powers. His21
companionimitateshimandispunishedbythebirds.22
o ATU670‘TheManWhoUnderstandsAnimalLanguages’:Asnaketeachesamanthelanguages23
ofanimalsonconditionhekeepsitasecret.Theman’swifenagshimtoteachherbutherefuses24
afterbeingwarnedoftheconsequencesbyamaleanimal(usuallyarooster).25
o ATU700 ‘Thumbling’:Acouplewish forachildandaregivena tinyboythroughsupernatural26
means.Theboyislostandgoesonaseriesofadventuresuntilheisreunitedwithhisparents.27
o ATU707‘TheThreeGoldenChildren’:Awomanmarriesakingandgivesbirthtothreechildren,28
whoarestolenbyherjealoussisters.Whentheygrowupthechildrengoonaquesttofindtheir29
parentsandeventuallyexposethesisters.30
18
One possible explanation for the wide dispersion of these tales is that they spread through the1
disseminationofwritten texts,whichwouldallow themto travelmuch furtherand inamuchshorter2
period than would be possible solely through the vectors of human dispersal and traditional oral3
transmission.Suchaprocesswouldmostlikelyhavecoincidedwiththeemergenceofthefairytaleasa4
popular literarygenre inthesixteenthandseventeenthcenturies (Bottigheimer2014).Althoughmany5
folktaleswereincorporatedintoliteraryworkspriortothisperiod(forexample,inmedievalromances),6
thedevelopmentofnew, cheapprinting technologies togetherwith thegrowthof international trade7
networksandEuropeancolonialismwouldhaveallowedtalestocirculatetomuchwideraudiencesthan8
waspreviouslypossible(ibid.).9
Infact,sevenofthetaleslistedabovewerepublishedinmajorfairytalecollectionsduringthisperiod,10
includingGiovanni Francesco Straparola's LePiacevoliNotti in 1550-55 (ATU314,ATU325,ATU670),11
Giambattista Basile's Lo cunto de li cunti in 1634 (ATU 301, ATU 480, ATU 560), Charles Perrault's12
Histoiresoucontesdutempspasséin1697(ATU480,ATU700).However,oneobviousquestionraised13
by the hypothesis that these tales spread via textual transmission iswhy the other stories contained14
within these collections did not achieve similarly wide cross-cultural distributions. Secondly, while15
literaryversionshaveundoubtedlymadeamajorcontributiontothemodernformsofthesetales,there16
is compelling evidence that they were derived from already well-established and widespread oral17
traditions, rather thantheotherwayround (Ben-Amosetal.2010).Forexample,ATU301 ‘TheThree18
StolenPrincesses’occursinGreekandIndianmythsthatlongpredateBasile’sItalianfairytaleof1634.19
Similarly,versionsofATU325‘TheMagicianandHisPupil’,ATU560‘TheMagicRing’andATU670‘The20
ManWhoUnderstandsAnimal Languages’appear in IndianandMiddleEasternsources (including the21
RamayanaandOneThousandandOneNights) thatare clearly independentof laterEuropean literary22
versionsofthesetales(Thompson1977).23
Analternativeexplanationforthedistributionofthesetales isthattheyreflectsignaturesofdemicor24
cultural diffusion that aremore ancient than the other patterns detected in the dataset. In order to25
characterizethesesignatures,wesoughttoidentifypossiblecentersoforiginanddispersalforthetales.26
To do so, we assessed the amount of linear correlation between geographic distance from each27
populationexhibitingagiven taleand thedistributionof theproportionof the remainingpopulations28
displayingthattaleoverthesamegeographicgradient.Morespecifically,webinnedpairsofpopulations29
intofixedintervalsofgeographicdistance(2000Km),andforeachbinwecalculatedtheproportionof30
populationsexhibitingagivenfolktale.Wethencalculatelinearcorrelationbetweenthedistributionof31
19
percentagesobtained foreachgeographicbinandgeographicdistance fromall thepopulations in the1
dataset that exhibit that folktale. All the above mentioned analyses have been performed in R. The2
assumptionisthat-ifweenvisagealong-rangeandancientdiffusionprocesswhosevectorsaresolely3
humandispersalandtraditionalculturaltransmission-weexpecttoobtainadistance-decaypatterning4
with higher percentages indicating potential “origin populations” from which increasingly lower5
percentagesdepartformingaclinaltrendovergeographicdistance.Toavoidlosinginformationwedid6
notfocusonsinglecoordinatepairsexhibitingthehighestvaluesforeachtale,andplottedinsteadthe7
distributionof correlation coefficientsonamap inorder to visuallydefine themostprobable areaof8
origin (centresoforiginexhibiting the lowest correlationcoefficients; FigureS8-I).Probability surfaces9
wereobtainedbyinterpolatingcorrelationcoefficientscomputedforeachpopulationusingthefunction10
producingplatesplineinterpolationofthefieldspackageinR[47].11
The resulting potential patterns of diffusion are represented in Figure S6-I. Although each of these12
patternspresentssomespecificities,somegeneraltrendscanbefound-assummarizedinthemaintext.13
Inparticular,fourmainmulti-directionalwavesofdiffusioncanbehypothezised:14
1. PotentialAfricanorigin(e.g.ATU670)15
2. SouthwardspreadfromnorthernEurasia(e.g.ATU700)16
3. EasternEuropeanorigin(e.g.ATU301,303,313)17
4. Middle-Eastern/Caucasianorigin(e.g.ATU314,400,480,560)18
19
While further research is needed to verify these patterns (for example, by reconstructing the20
evolutionaryhistoriesofvariantsofeachtaletypetotestwhethertheymatchthedispersalscenarios),21
the results have significant implications for current understandings about the origins of international22
folktale traditions. In particular, they suggest a less Euro-centric view of tale origins than traditional23
"historic-geographic"reconstructionsbasedonthefrequencyofvariants(i.e.thenumberofversionsof24
agiventaletyperecordedineachpopulation)andchronologyofliteraryversions.Amajorproblemwith25
thisapproachisthatconclusionsaboutatale'soriginsmayoftenbeskewedbythestrongEuropeanbias26
in both the richness of the folktale and literary records. For example, ATU 300 'TheDragon Slayer' –27
whichhas beenproposed tobe theoriginal archetype storyline fromwhich all fairy tales arederived28
(Propp 1968)– was previously believed to have originated in medieval western Europe, most likely29
France,wheretheearliestknownversionswererecorded.Ouranalysisinsteadsuggeststhatthistale–30
togetherwith the related taleATU313 'The Twins'mayhave arrived inwestern Europe from further31
20
east,either fromtheregionofmoderndayUkraineandBelarus,orofTurkeyandKurdistan.Similarly,1
whereasfolkloristshaveclaimedthatATU670'TheManWhoUnderstandsAnimalLanguages'originated2
in Europe and was transported to Africa through colonialism, our findings reverse the direction of3
transmissionandsuggestthatthetaleprobablyaroseinNorthAfrica. 4
21
1
22
FigureS1-8-I-Plotoftheprobableareasoforiginofthe19“mostpopular”tales:Probabilitysurfaces1
have been obtained interpolating correlation coefficients computed for each population. Grey dots2
indicatepopulationsthatdonotexhibitthespecifictaleofinterest.1)Tale155;2)Tale300;3)Tale301;3
4)Tale303;5)Tale313;6)Tale314;7)Tale325;8)Tale400;9)Tale403;10)Tale480;11)Tale531;12)4
Tale550;13)Tale554;14)Tale560;15)Tale563;16)Tale613;17)Tale670;18)Tale700;19)Tale707.5