Inference of demographic histories of natural populations using sequence data Coalescence, Mutation and Recombination Richard Durbin [email protected] Cesky Krumlov 24/1/18


Page 1: Inference of demographic histories of natural populations ...evomicsorg.wpengine.netdna-cdn.com/wp-content/... · Inference of demographic histories of natural populations using sequence

Inference of demographic histories of natural populations using sequence data

Coalescence, Mutation and Recombination

Richard Durbin, Cesky Krumlov, 24/1/18

What I mean by demography

1. Population size going back in time
   – Actually “effective population size” Ne(t)
      • We will come back to what this means
   – Approximate time range 10k – 1M years ago
      • Again we will see why
2. Population structure
   – Subpopulations and when they split (and merged?)

• Based on explicit evolutionary models
   – Relate patterns of (shared) genetic variation accumulated since a common ancestor to history

Tree on two sequences

• Gustave Malécot (1940s)
• Coalescence is joining together, in our case going backwards in time
• Chance of coalescence per generation is 1/N
• TMRCA is exponentially distributed with mean N
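The two statements above (a 1/N chance per generation, and an exponential TMRCA with mean N) can be checked against each other numerically. The following is a minimal sketch, not part of the original slides; the function names and the population size used in the demo are illustrative assumptions.

```python
import math

def prob_coalesced_by(t, N):
    """Probability that two lineages have coalesced within t generations,
    when every generation carries an independent 1/N chance of coalescence."""
    return 1.0 - (1.0 - 1.0 / N) ** t

def exponential_approx(t, N):
    """Continuous-time approximation: TMRCA ~ exponential with mean N."""
    return 1.0 - math.exp(-t / N)

def mean_tmrca(N, horizon_factor=50):
    """Mean of the geometric waiting time, summed out to horizon_factor * N
    generations (the tail beyond that point is negligible)."""
    p = 1.0 / N
    return sum(t * p * (1.0 - p) ** (t - 1)
               for t in range(1, horizon_factor * N + 1))

if __name__ == "__main__":
    N = 1_000  # illustrative population size, not a value from the slides
    for t in (100, 1_000, 3_000):
        print(t, round(prob_coalesced_by(t, N), 4), round(exponential_approx(t, N), 4))
    print("mean TMRCA in generations:", round(mean_tmrca(N), 2))
```

The discrete (geometric) and continuous (exponential) versions agree closely once N is large, which is why the slides treat them interchangeably.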

Probability of observing a mutation

• To see a mutation, it must have happened on one of the branches since the common ancestor
• P(observed mutation) = 2Tµ
• E(observed difference rate) = θπ = 2Nµ
• Humans are diploid, so θ = 4Neµ, where Ne is the effective population size
• For humans, θπ = ~0.001
   – 1/800 – 1/1200 depending on population
• Hard to measure Ne and µ independently…
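To make the last bullet concrete, here is a small sketch of how θ = 4Neµ turns an observed diversity into an effective size once a mutation rate is assumed. The mutation rates below are illustrative assumptions, not values given on this slide; the point is that the inferred Ne scales inversely with whichever µ is chosen.

```python
def effective_size_from_theta(theta, mu):
    """Solve theta = 4 * Ne * mu for Ne (diploid population)."""
    return theta / (4.0 * mu)

if __name__ == "__main__":
    theta = 0.001  # approximate human pairwise diversity, as on the slide
    for mu in (1.25e-8, 2.5e-8):  # assumed per-bp per-generation rates
        print(f"mu = {mu:g}  ->  Ne = {effective_size_from_theta(theta, mu):,.0f}")
```

Halving the assumed mutation rate doubles the inferred Ne, so diversity data alone constrain only the product Neµ.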

Effective population size

• Lots of mystique/angst about this
   – Our definition is arguably at the core of the concept
      • the reciprocal of the probability of sharing a parent in the previous generation
      • = 1/coalescence rate
• Why this is different from census population size:
   – Many consequences occur over large numbers of generations (often of order Ne generations): long-term averaging
   – Structure generates non-random patterns of coalescence, and non-independence between generations
   – Maybe only a small percentage of individuals breed
   – Selection favours some individuals over others
• But it is always something of this form that we get at by population genetic analysis

Segments of fixed TMRCA are separated by recombination

[Figure: segments along the genome with different TMRCA. Labels: Past; Mutations; Recombination in some ancestor]

Pairwise Sequentially Markovian Coalescent

Li and Durbin (2010): Inference of human population history from individual genome sequences

Hidden Markov Model

PSMC Hidden Markov Model

• Move from left to right in the genome
   – Let P(x|t) = prob(data up to x | TMRCA at x = t)
   – Calculate P(x+1|t) = (∑s P(x|s) r(t|s)) e(x)
      • e(x) = “emission at x” = 2𝜇t if a het, else (1 − 2𝜇t)
      • r(t|s) = prob(recombination from TMRCA s to t)
               = 2𝜌s prob(coalesce back to t) + (1 − 2𝜌s) if t = s
        where prob(coalesce back to t) depends on N(t’), t’ < s,t
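The recursion above can be sketched as one step of an HMM forward pass over discretised TMRCA values. This is a simplified illustration, not the PSMC implementation: the `coalesce_back` matrix (the probability of re-coalescing at time t after a recombination at TMRCA s) is passed in as a hypothetical input, whereas in PSMC it is derived from N(t); all names and numbers are assumptions made for the example.

```python
def emission(is_het, t, mu):
    """e(x): 2*mu*t if the site is heterozygous, else 1 - 2*mu*t."""
    p_het = 2.0 * mu * t
    return p_het if is_het else 1.0 - p_het

def transition_matrix(times, rho, coalesce_back):
    """r(t|s) = 2*rho*s * q(t|s) + (1 - 2*rho*s) if t == s.
    coalesce_back[s][t] plays the role of q(t|s); each row should sum to 1."""
    T = len(times)
    r = [[0.0] * T for _ in range(T)]
    for s in range(T):
        p_rec = 2.0 * rho * times[s]
        for t in range(T):
            r[s][t] = p_rec * coalesce_back[s][t]
            if s == t:
                r[s][t] += 1.0 - p_rec
    return r

def forward_step(prev, trans, times, is_het, mu):
    """P(x+1|t) = (sum_s P(x|s) r(t|s)) * e(x) for every time bin t."""
    T = len(times)
    return [
        sum(prev[s] * trans[s][t] for s in range(T)) * emission(is_het, times[t], mu)
        for t in range(T)
    ]

if __name__ == "__main__":
    times = [100.0, 1_000.0, 10_000.0]   # illustrative TMRCA bins (generations)
    uniform_q = [[1.0 / 3] * 3 for _ in range(3)]  # placeholder for q(t|s)
    trans = transition_matrix(times, rho=1e-8, coalesce_back=uniform_q)
    state = [1.0 / 3] * 3
    for is_het in (False, False, True, False):
        state = forward_step(state, trans, times, is_het, mu=1.25e-8)
    print(state)
```

A heterozygous site shifts weight towards the older bins, because the emission 2𝜇t grows with t; that is the signal the model uses to place each genomic segment in time.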

Markov assumption

• This model assumes that
      data to the left of x | TMRCA at x = t
  is independent of
      data to the right of x | TMRCA at x = t
• For standard mixing populations this is a very good assumption
   – Sequentially Markovian Coalescent approximation, McVean & Cardin 2005

PSMC-HMM reconstructs individual history

• Pairwise Sequentially Markovian Coalescent – Hidden Markov Model
• Data simulated using ms (Hudson)
• Model the coalescent time t by e.g. 50 discrete bins, spread logarithmically
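As an illustration of what "spread logarithmically" means, the sketch below builds bin boundaries with a constant ratio between neighbours. It is a generic geometric spacing, not the exact discretisation used by PSMC; the time range in the demo is an assumption chosen only to show the shape.

```python
def log_spaced_bins(t_min, t_max, n_bins):
    """Return n_bins + 1 boundaries from t_min to t_max with a constant
    ratio between consecutive boundaries."""
    ratio = (t_max / t_min) ** (1.0 / n_bins)
    return [t_min * ratio ** i for i in range(n_bins + 1)]

if __name__ == "__main__":
    # Illustrative range in generations; 50 bins as mentioned on the slide.
    bounds = log_spaced_bins(100.0, 100_000.0, 50)
    print([round(b) for b in bounds[:5]], "...", round(bounds[-1]))
```

Log spacing gives narrow bins for recent times and wide bins for ancient times, matching the roughly exponential distribution of coalescence times.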

Single human genome with bootstrap

Human population history, with Neanderthals

[Figure: axis label "Scaled coalescent time". Credit: Heng Li]

Advances since the original PSMC

1. Use SMC’ model, which correctly handles recombinations coalescing back to the same ancestor (Schiffels, …)
   – Minor tweak to equations, but significant
   – Can now fit recombination:mutation ratio
   – Implemented in MSMC/MSMC2
2. Time speed-up: linear not quadratic in number of time slices (Harris, ... Song, 2014)

Coalescent Ne(t) reflects ancestral structure as well as population size

• PSMC actually measures λ = 1/coalescence rate
• Structure can also change coalescence rate
   – Li & Durbin supplement
   – Olivier Mazet … Chikhi

[Figure: N-island model. Migration between islands controls coalescent rate]

Human population history, with Neanderthals

[Figure annotations: rise of anatomically modern humans, 100-200 kya; origin of Homo, 1.5-2 Mya. Credit: Heng Li]

Brief introduction to another system

Dramatic recent radiations of haplochromine cichlids in the African rift valley great lakes

• Lake Malawi: ~500 species within last 1M years

So far we have sequenced ~80 species at 15-20x coverage

Lake Malawi cichlid PSMC

[Figure: PSMC curves for Alticorpus, Aulonocara, Buccochromis, Copadichromis, Lethrinops, Mylochromis, Nimbochromis, Otopharynx, Placidochromis, Trematocranus. Axes: population size (10^4 to 10^8) against time in years (10^4 to 10^6). Annotation: (very) approximate 4x, 4x, 4x]

Is structure associated with speciation?

• Perhaps
   – Ideas of hybrid speciation, reuse of alleles selected in different environments, hybrid swarms and gene flow
• But this is another talk…

Might structure be (partly) identifiable in the PSMC model?

• The inferred values N(t) have dimension T, the number of time bins
• But the transition matrix M has dimension T^2
• Currently we derive M from N by theory assuming panmixia
   – Is there a richer theory for structured populations?
   – How to parameterise structural complexity S(t) at time t, with associated theory for M(N,S)
• Or can we fit the transition matrix M unconstrained?
   – Then search for evidence of structure within it
   – And/or do goodness of fit?

Adding another sequence

• Chance of coalescence per generation from three sequences is 3/N
• Once we have a coalescence we are back to the situation with two sequences
• From i sequences chance is i(i-1)/(2N)

Digression: Coalescent model (Kingman, 1980)
A distribution on trees

• T(i) ~ exponential with mean 2N/(i(i-1))

Properties of the coalescent

• As we add extra sequences, they are increasingly likely to coalesce very fast, and increasingly unlikely to affect the full TMRCA
• Trees are very variable
   – E.g. 4 samples on 6 leaves

The expected height of the tree for many samples is only twice that with two samples
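The "only twice" claim follows from summing the expected waiting times while i lineages remain, each with mean 2N/(i(i-1)). A short sketch of that sum, with function names and the demo value of N chosen purely for illustration:

```python
def expected_interval(i, N):
    """Expected time (generations) during which exactly i lineages remain."""
    return 2.0 * N / (i * (i - 1))

def expected_tree_height(n, N):
    """Expected TMRCA of n sequences: sum of intervals for i = n, ..., 2.
    The sum telescopes to 2N * (1 - 1/n)."""
    return sum(expected_interval(i, N) for i in range(2, n + 1))

if __name__ == "__main__":
    N = 10_000  # illustrative value
    for n in (2, 3, 10, 100, 1_000):
        print(n, round(expected_tree_height(n, N)))
```

With two sequences the expected height is N; as n grows it approaches 2N but never exceeds it, and about half of the total height is spent waiting for the last two lineages to coalesce.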

Relationship between forwards in time (Wright-Fisher) and backwards in time (Coalescent) models

[Figure: population evolution forwards; coalescent tree backwards]

The coalescent tree describes a sample from the forward process
Kingman coalescent generates an “exact” sample from Wright-Fisher

Genetic variation in a sample

• Mutations occur at random on the tree
   – Separation of sources of randomness
      • Random demography tree structure from coalescent
      • Random sampling of mutations on the tree

Watterson’s theta

Let S be the number of mutations = segregating sites

E(S) ~ θ log n
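The approximation E(S) ~ θ log n comes from the exact constant-size result E(S) = θ · a_n, where a_n = 1 + 1/2 + … + 1/(n-1). The sketch below, with illustrative function names and numbers, computes a_n, shows how close it is to log n, and inverts the relation to give Watterson's estimator.

```python
import math

def harmonic_number(n):
    """a_n = sum_{i=1}^{n-1} 1/i for a sample of n sequences."""
    return sum(1.0 / i for i in range(1, n))

def expected_segregating_sites(theta, n):
    """E(S) = theta * a_n under the constant-size neutral coalescent."""
    return theta * harmonic_number(n)

def watterson_theta(S, n):
    """Watterson's estimator: theta_W = S / a_n."""
    return S / harmonic_number(n)

if __name__ == "__main__":
    for n in (2, 10, 100, 1_000):
        print(n, round(harmonic_number(n), 3), round(math.log(n), 3))
    # Illustrative numbers: 120 segregating sites in a sample of 20 sequences.
    print("theta_W:", round(watterson_theta(120, 20), 2))
```

Because a_n grows only logarithmically, doubling the sample size adds few new segregating sites, which matches the earlier point that extra sequences coalesce quickly into the existing tree.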

Distribution of variant allele frequencies

• Density of mutations with frequency i in a sample of n is θ/i
   – site frequency spectrum (SFS)
• 1/f distribution of population allele frequencies
• Population minor allele frequency distribution of a difference observed between two sequences is flat
   – Probability (1/f) · 2f(1-f) = 2(1-f), folded at ½ is 2

Relaxation of assumptions (1)

• E.g. change in population size changes the site frequency spectrum
• Tajima’s D
   – Sensitive to number of rare mutations, so change in Ne
   – If D is positive there is a deficiency of rare mutations
      • Excess recent coalescences, recent small Ne - selection

[Figure: example trees for D>0 and D<0]

This is the basis of SFS-based demography inference
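For readers who want to see how Tajima's D responds to the shape of the SFS, here is a compact sketch using the standard normalising constants from Tajima (1989). The input format (an unfolded SFS as a list of counts) and the example spectra are assumptions made for illustration.

```python
import math

def tajimas_d(sfs, n):
    """Tajima's D from an unfolded SFS.
    sfs[i-1] = number of sites where the derived allele occurs i times
    in a sample of n sequences, for i = 1 .. n-1."""
    S = sum(sfs)
    if S == 0:
        return 0.0
    # Mean pairwise differences: a site at count i differs in i*(n-i) pairs.
    pi = sum(c * 2.0 * i * (n - i) / (n * (n - 1))
             for i, c in enumerate(sfs, start=1))
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))

if __name__ == "__main__":
    n = 10
    neutral = [100.0 / i for i in range(1, n)]       # expected theta/i shape
    excess_rare = [200.0] + [100.0 / i for i in range(2, n)]
    deficit_rare = [20.0] + [100.0 / i for i in range(2, n)]
    for name, sfs in (("neutral", neutral), ("excess rare", excess_rare),
                      ("deficit rare", deficit_rare)):
        print(name, round(tajimas_d(sfs, n), 3))
```

The θ/i spectrum gives D of zero, an excess of singletons pushes D negative, and a deficiency of rare variants pushes it positive, as stated on the slide.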

Fig. 2 The expected site frequency spectrum (SFS) of the derived allele (the new mutation arisen in the population) for three different demographic models: (i) a population that has been of constant size throughout history; (ii) a model previously fit to the derived allele frequency spectrum of Europeans, which includes an out-of-Africa population bottleneck and a second, more recent, population bottleneck (21); and (iii) the same two-bottleneck model of European history with the addition of recent exponential growth from a population size of 10,000 at the advent of agriculture to an extant effective population size of 10,000,000, which amounts to 1.7% growth per generation during the last 400 generations.

A Keinan, A G Clark Science 2012;336:740-743
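The growth figure quoted in the caption can be verified with one line of arithmetic: growing from 10,000 to 10,000,000 over 400 generations requires a constant per-generation factor of 1000^(1/400). A minimal check (the function name is illustrative):

```python
def per_generation_growth(n_start, n_end, generations):
    """Constant per-generation growth rate r such that
    n_start * (1 + r) ** generations == n_end."""
    return (n_end / n_start) ** (1.0 / generations) - 1.0

if __name__ == "__main__":
    r = per_generation_growth(10_000, 10_000_000, 400)
    print(f"growth per generation: {100 * r:.2f}%")
```

This gives a rate of about 1.7% per generation, in agreement with the caption.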

Individuals in human outbred populations still carry many variants not in the large sequence data sets (1000 Genomes etc.)

• Exponential population growth in last 10,000 years gives long tips to the tree
• In “big” populations, tips are hundreds of generations long, so tens of thousands of private variants per sample, hundreds functional

This behaviour is very dependent on population structure.

In genetic isolates the recent effective population size is smaller, and the tips are shorter

What about recombination?

• If points on the genome are very close, e.g. adjacent, they share the same tree
• If points are very far, their trees are sampled from the coalescent independently
• What happens in between?
• A recombination in the ancestor of a modern sequence made it out of two separate sequences, one contributing to the left and one to the right

Recombination changes the tree as you move along the sequence

Typically recombination rate is comparable to or larger than the mutation rate: both ~10^-8/bp/gen in human
So “gene tree” varies every site in mixing populations

Ancestral Recombination Graph (ARG)

• The Ancestral Recombination Graph describes the way that individual sequences in a population are related
   – At a locus, sequences are related by a tree
   – Ancestral recombinations change the tree as you move along the chromosome

   a ..C..G..A..   0 0 0
   b ..T..G..C..   1 0 1
   c ..T..A..A..   1 1 0
   d ..T..A..C..   1 1 1

[Figure: local trees 1, 2 and 3 on leaves a b c d, and the ARG with recombination node R. “Prune and graft” operation going left to right]

Coalescent with recombination

• ARG is a structure (data type)
• The probability distribution over ARGs that arises when recombination is added to the standard (Wright-Fisher) model is called the Coalescent with Recombination
   – Hudson’s ms software is the classic simulator
   – New msprime from Jerome Kelleher MUCH faster
• Now two possible events going backwards in time
   – Coalescence: which merges two sequences
      • For i sequences, rate is i(i-1)/2N
   – Recombination: which splits a sequence into two
      • For i sequences, rate is iLρ
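The two competing rates above determine both how long we wait for the next event going backwards in time and which kind of event it is. The sketch below only evaluates those quantities; it is not a simulator such as ms or msprime, and the parameter values in the demo are illustrative assumptions.

```python
def event_rates(i, N, L, rho):
    """Per-generation rates with i lineages: coalescence i(i-1)/(2N) and
    recombination i*L*rho (L = sequence length in bp, rho = rate per bp)."""
    coalescence = i * (i - 1) / (2.0 * N)
    recombination = i * L * rho
    return coalescence, recombination

def prob_next_event_is_coalescence(i, N, L, rho):
    """With competing exponential clocks, the next event is a coalescence
    with probability rate_coal / (rate_coal + rate_rec)."""
    coalescence, recombination = event_rates(i, N, L, rho)
    return coalescence / (coalescence + recombination)

def expected_wait(i, N, L, rho):
    """Expected generations until the next event of either kind."""
    coalescence, recombination = event_rates(i, N, L, rho)
    return 1.0 / (coalescence + recombination)

if __name__ == "__main__":
    # Illustrative values: N = 10,000, rho = 1e-8 per bp per generation.
    for L in (1_000, 10_000, 1_000_000):
        p = prob_next_event_is_coalescence(2, 10_000, L, 1e-8)
        print(f"L = {L:>9,} bp: P(coalescence first) = {p:.3f}")
```

For short segments coalescence nearly always comes first, so the segment has a single tree; for long segments recombination dominates, which is why the tree changes along the chromosome.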

Extending to multiple sequences

• The recent time limit of ~20kya for PSMC is set because we run out of recent coalescences between two haplotypes
• If we add more haplotypes, then there are more recent coalescences and we could look at more recent history
• But, … the hidden state is then a tree (with branch lengths): impractical to model fully
   – MCMC is notoriously difficult

Option 1: First coalescence of one sequence to the tree of the others

• This is related to the Li and Stephens model (or Stephens and Donnelly) – chromopainter

[Figure: two trees, each marked with coalescence times t1, t2, t3, t4]

Problem: Coalescence of chosen sequence to the others depends on the number of lineages M(t) remaining at time t

• M(t) is a random variable, and we need the entire history of M(t) to calculate transition probabilities q
• Huge increase in state space and/or this breaks Markov assumptions

[Figure: tree marked with coalescence times t1, t2, t3, t4]

MCMC approach: ARGweaver

• Repeatedly remove a sequence* and add it back, sampling conditional on remaining ARG
• HMM: sample with forward-backward algorithm

Genome-wide inference of ancestral recombination graphs. Rasmussen MD, Hubisz MJ, Gronau I, Siepel A. PLoS Genet. 10:e1004342 (2014)

• Costly – use for inference given history

Option 2: first coalescence between any pair

• This remains (approximately) Markov
• State space is O(M^2 T) – pair of states and time they coalesce
   – But transition updates are only O(M^2 T^2), because transitions are memoryless
• Emissions from Xij are singletons on i or j
   – Non-singletons that are discrepant between i and j wipe out density at Xij

MSMC

Stephan Schiffels and Durbin (Nature Genetics, 2015)

MSMC can fit both population size history and separation history

• Separation via the (scaled) ratio of coalescence between and within populations

Access more recent history

Use lower mutation rate here ~0.5 x 10^-9/year

Divergence between populations

[Figure panels: First Coalescence within Population 1; First Coalescence within Population 2; First Coalescence across both populations]

• Idea: Infer separate coalescence rates within and between populations
• MSMC can infer separate coalescence rates within and between populations
• Given rates within populations, λ11(t) and λ22(t), and across populations, λ12(t), compute relative gene flow as the ratio

   m(t) = λ12(t) / ( [λ11(t) + λ22(t)] / 2 )
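The ratio m(t) is simple to evaluate once the three coalescence-rate curves are available on a shared time grid. The sketch below assumes the rates are supplied as plain lists; it does not read MSMC output files, and the numbers in the demo are invented for illustration.

```python
def relative_gene_flow(lam11, lam22, lam12):
    """m(t) = lam12(t) / ((lam11(t) + lam22(t)) / 2) for each time bin."""
    return [l12 / ((l11 + l22) / 2.0)
            for l11, l22, l12 in zip(lam11, lam22, lam12)]

if __name__ == "__main__":
    # Invented rates for four time bins, ordered from recent to ancient.
    lam11 = [2.0e-4, 1.5e-4, 1.0e-4, 1.0e-4]
    lam22 = [1.0e-4, 1.0e-4, 1.0e-4, 1.0e-4]
    lam12 = [0.0, 2.5e-5, 8.0e-5, 1.0e-4]
    print([round(m, 2) for m in relative_gene_flow(lam11, lam22, lam12)])
```

Values near 0 in recent bins and near 1 in ancient bins are the signature of two populations that descend from one mixed ancestral population.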

Testing gene flow inference with simulated split

☜ m(t) = 1: perfectly mixed

☜ m(t) = 0: perfectly split

4 haplotypes: good for splits 50-200kya. 8 haplotypes: good for splits 5-50kya.

Separation history

Alternatives to MSMC

• MSMC2 (Schiffels: in Malaspinas 2016 / unpub.)
   – Run PSMC’ on all pairs of sequences independently
   – Multiply the likelihoods – Composite likelihood
      • Assumes the pairs are independent, which is false
      • But gives unbiased estimation (though overconfident)
• SMC++ (Terhorst, Kamm, Song: Nat Gen 2017)
   – Pair, with p(het | other sequences)
   – Very cool – works even on genotype data!
   – But there are approximation problems analogous to those in MSMC – not a panacea

Using rare variants to infer demographic history

• Rare variants contain information about recent population history and structure
• Shown here: number of doubletons shared among European samples
• We would like to estimate population split times and population sizes from the frequency of rare variants

[Figure: doubleton sharing among CEU, FIN, GBR, IBS, TSI. 1000 Genomes Project, Phase 3]

Compare to ChromoPainter data

Ancient samples from Hinxton

12884A, Iron age

12883A, Saxon

12880A, Iron age

12881A, Saxon

12885A, Saxon

More samples from Linton/Oakington

Sharing patterns between ancient and modern samples

• Difference between Anglo-Saxon and Iron Age sharing with NED and IBS consistent across different Allele Counts
• Small but significant differences also within modern Britain (UK10K): Samples from Wales and Scotland share fewer rare variants with Dutch people

Estimates of Anglo-Saxon contribution to modern British genomes

• Suggests ~30% Saxon contribution to samples in East of England, and ~20% to UK10K samples from Wales and Scotland
• Consistent with 20-40% indirect estimate from POBI (Peoples of the British Isles) study

The rare allele coalescent

• Goal: Estimate demographic history (population sizes and split times) from rare variants
• Compute likelihood of demographic model given a distribution of rare variants

RareCoal model

• Idea: Define recursion equations for probability of observing i derived alleles in population k: [equation shown on slide]
• Given a demographic model, propagate this probability backwards in time to get full likelihood of the data.
• Key simplification: Treat number of ancestral alleles over time as average (mean-field approximation): [equation shown on slide]

Test inference with simulated data

[Figure: simulated and estimated population trees, values as labelled on the figure]

                              Simulated            Estimated
Time (generations)            100, 200, 300, 400   92, 200, 299, 388
Scaled population sizes       10, 5, 2, 4, 5       10.3, 5, 2, 4, 5
Scaled population sizes       0.4, 0.5, 0.2, 1     0.47, 0.5, 0.22, 0.87

Fits (100 samples per pop.):

Pattern       Count (real)   Count (predicted)
0,1,0,2,1     1114           1159
2,1,0,0,0     140585         139657
1,0,2,0,0     1138           1205
thousands of rows…

Fitting population sizes and split times separates drift from divergence -> different from Treemix, qpGraph etc.

European Tree (Fits)

Placing ancient samples on the tree

• Plots show the likelihood for merging the population N=1 sample onto the tree as a heatmap

More direct calculation of the likelihood of the joint site frequency spectrum with momi

Jack Kamm … Song 2016, and unpublished

• Complexity of ancestral allele state is reduced by using Moran model
• Use Automatic Differentiation to calculate gradients to maximise likelihood over demography with (limited) gene flow

Momi applied to central Asian data

• Include ancient samples
   – Condition ascertainment on modern/deep samples
      • Total branch length on these
   – Random allele sampling for low coverage samples
• Estimates split times
• Bootstrap for confidence intervals
   – But beware model misspecification

with Peter de Barros Damgaard, Rui Martiniano, Martin Sikora, Eske Willerslev et al.

Momi calculations

• To calculate P(x1,x2,x3,…)
   – Set leaves to Indicator(xi), e.g. [0,0,1,0…0] for xi=2
   – Propagate likelihoods up tree (“tree-peeling”)
• Can correspondingly calculate the expectation of any multi-linear function of allele counts
   – 𝔼[f1(x1)f2(x2)f3(x3)…]
      • by setting leaf i to [fi(0), fi(1), . . . , fi(ni)]
   – Works because propagation is linear

Examples

• Total branch length ∝ chance of any mutation
   – fi(j) = 1, vector is [1,1,1…1]
• TMRCA for pop i (i arbitrary unless ancient model)
   – fi(j) = j/ni, vector is [0,0.2,0.4,0.6,0.8,1] for ni=5
   – fk(j) = 1, k ≠ i
• f3 = 𝔼[(X1-X3)(X2-X3)], f4 = 𝔼[(X1-X2)(X3-X4)]
   – Requires terms such as 𝔼[X1X2] for which
   – f1(j) = j/n1, f2(j) = j/n2, fk(j) = 1, k > 2
• Also numerators, denominators of FST, Tajima’s D
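The leaf-vector idea can be illustrated on a toy case with two populations. In the sketch below the joint distribution P(x1, x2) is a small hypothetical table standing in for what tree-peeling under a demographic model would provide; none of the function names correspond to the momi API. Swapping the leaf vectors changes which quantity the same linear calculation returns.

```python
def indicator_leaf(x, n):
    """Leaf vector selecting allele count x out of n: [0, .., 1, .., 0]."""
    return [1.0 if j == x else 0.0 for j in range(n + 1)]

def ones_leaf(n):
    """Leaf vector f(j) = 1, which marginalises the population out."""
    return [1.0] * (n + 1)

def frequency_leaf(n):
    """Leaf vector f(j) = j/n, the sample allele frequency."""
    return [j / n for j in range(n + 1)]

def bilinear_expectation(joint, leaf1, leaf2):
    """sum over (a, b) of joint[a][b] * leaf1[a] * leaf2[b]."""
    return sum(joint[a][b] * leaf1[a] * leaf2[b]
               for a in range(len(leaf1)) for b in range(len(leaf2)))

if __name__ == "__main__":
    # Hypothetical joint distribution of derived-allele counts, n1 = n2 = 2.
    joint = [[0.5, 0.1, 0.0],
             [0.1, 0.1, 0.1],
             [0.0, 0.0, 0.1]]
    n1 = n2 = 2
    print("P(x1=1, x2=2):", bilinear_expectation(
        joint, indicator_leaf(1, n1), indicator_leaf(2, n2)))
    print("E[X1 X2]:     ", bilinear_expectation(
        joint, frequency_leaf(n1), frequency_leaf(n2)))
    print("E[X1]:        ", bilinear_expectation(
        joint, frequency_leaf(n1), ones_leaf(n2)))
```

Terms such as 𝔼[X1X2], needed for f3 and f4, come from the same machinery as a single SFS entry, only with frequency vectors at two leaves and vectors of ones elsewhere.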

Summary

• PSMC(’) estimates demography from a single pair of sequences
   – Sample size is in length not number
   – Quite a clean model
   – Major issue is population structure
• MSMC, MSMC2, SMC++ use additional samples to get at more recent times
• RareCoal/Momi use coalescent modelling of the SFS on more samples to estimate trees
   – With limited modelled gene flow for Momi

Experimental design

• (Sequence) data collection costs money
• We always need to make decisions in how to sample and sequence
   – Number of samples
   – Number of populations
   – Depth of sequencing
   – Whole Genome Shotgun or RADseq or Exomes…

1000 Genomes Project

• Pilot (a very long time ago!)
   – 2 trios at high depth 30x
      • Phasing, accurate single-sample genotype calling, mutation rates
   – 3 populations x 60 samples at low depth 2-4x + exomes
• Main project
   – 26 populations of ~100 (2504 total) at 6-8x (+ exomes)
   – (150 trios at high depth – but who remembers them?)

Malawi cichlid sequencing

• Phase 1
   – Three trios at 30x: mutation rate estimation, controls
   – ~70 species at 15-20x, additional samples for some at 8-12x
• Phase 2
   – 7 sets of 20 at 15x
   – More species
   – Some sets of 24 or 48 to address specific questions
• Massoko GWAS (Turner)
   – 200 samples at 4x + 100 samples for replication

Low coverage sequencing strategy

• Typically one needs to sequence at ~30x depth to find (almost) all variants in a sample
• To find low frequency variants we want to sequence many samples
• Spread sequence across more samples
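A rough feel for why ~30x is needed per sample, and what is lost at low depth, comes from a simple Poisson model of read sampling. The sketch below is an idealisation under stated assumptions: read depth at a site is Poisson with the given mean, each read samples either haplotype with probability 1/2, and a heterozygous variant counts as detected once at least `min_alt_reads` reads carry the alternative allele. Real callers are more sophisticated and also have to deal with sequencing and mapping errors.

```python
import math

def prob_detect_het(mean_depth, min_alt_reads=2):
    """P(at least min_alt_reads reads carry the alt allele at a het site),
    with alt-read count ~ Poisson(mean_depth / 2)."""
    lam = mean_depth / 2.0
    p_too_few = sum(math.exp(-lam) * lam ** k / math.factorial(k)
                    for k in range(min_alt_reads))
    return 1.0 - p_too_few

if __name__ == "__main__":
    for depth in (2, 4, 8, 15, 30):
        print(f"{depth:>2}x: {prob_detect_het(depth):.4f}")
```

Under these assumptions a single 4x sample detects only a little over half of its heterozygous sites, which is why low coverage designs rely on pooling evidence across many samples and on imputation.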

Phase 1 power and genotyping accuracy

[Figure panels: SNP detection; Genotyping accuracy. Credit: Hyun Min Kang (U Michigan)]

Calling from low coverage sequence

• Multi-sample call sites with samtools or GATK
• Obtain genotype likelihoods at each site in each sample (also samtools or GATK)
   – Likelihood = P(data | genotype)
• Combine in an imputation framework using BEAGLE (Browning), or MINIMAC (Abecasis), or perhaps STITCH (Mott)?
• Phase using SHAPEIT2 (Marchini) or EAGLE2 (Loh)
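To illustrate what "Likelihood = P(data | genotype)" means at a single site, here is a toy binomial model with a fixed per-read error rate. It is an assumption-laden sketch, not the model used by samtools or GATK, which also take base and mapping qualities into account.

```python
from math import comb

def genotype_likelihoods(n_ref, n_alt, error=0.01):
    """P(read counts | genotype) for genotypes carrying 0, 1 or 2 alt alleles.
    Each read shows the alt allele with probability
    (g/2) * (1 - error) + (1 - g/2) * error."""
    n = n_ref + n_alt
    likelihoods = []
    for g in (0, 1, 2):
        p_alt = (g / 2.0) * (1.0 - error) + (1.0 - g / 2.0) * error
        likelihoods.append(comb(n, n_alt) * p_alt ** n_alt * (1.0 - p_alt) ** n_ref)
    return likelihoods

if __name__ == "__main__":
    for n_ref, n_alt in ((1, 0), (2, 2), (0, 3), (3, 1)):
        hom_ref, het, hom_alt = genotype_likelihoods(n_ref, n_alt)
        print(f"ref={n_ref} alt={n_alt}: "
              f"RR={hom_ref:.4g} RA={het:.4g} AA={hom_alt:.4g}")
```

With one or two reads the likelihoods are far from decisive (a single reference read is only about twice as likely under hom-ref as under het), which is why low coverage pipelines pass likelihoods rather than hard calls to the imputation step.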

Sequencing depth

• 30x is standard for near-complete accuracy
   – Sufficient to estimate mutation rates in trios (need several trios for most species)
• 15x is good enough for SNPs (~97%), not quite so good for indels (perhaps 90-95%)
• 4-8x gives good low coverage imputation as in previous slides
• People have used 1-2x, but this is hard work…
• 60x+ is necessary for subclonal structure, e.g. cancer, high ploidy
• In a cross, sequence the founders to high depth, and the F2/F3 to low depth (1x or less is fine) and impute using STITCH or other Richard Mott tools