Tracking the genetic legacy of past human populations through the grid
Nicolas Ray, University of Bern / CMPG (University of Geneva & UNEP/GRID-Europe)
ECSAC09, Veli Lošinj, August 26th 2009
Human migrations
[Figure: map of human migrations, adapted from Cavalli-Sforza & Feldman, 2003; labels on the map include [12,000] and [55,000]]
Homo sapiens sapiens
1. Better understand human evolution
• Origin of modern humans (when, where, how many?)
• Relationship with other members of the Homo genus
2. Distinguish between the effects of demography and those of selection (biomedical applications)
Gene-specific factors: mutations, recombination, selection
A complex past demography: fluctuation in effective pop. size, substructure, migrations
Observed patterns of genetic diversity in contemporary populations
50 loci in non-genic regions (Chen and Li, 2001)
• About 500 bp each, 24,425 bp in total
• 30 individuals: 10 Africans, 8 Asians, 12 Amerindians
• Chimpanzee sequenced to estimate mutation rates, assuming a 6 My divergence time
Statistical Evaluation of Alternative Models of Human Evolution
Nelson Fagundes, Nicolas Ray, Mark Beaumont, Samuel Neuenschwander, Francisco Salzano, Sandro Bonatto, and Laurent Excoffier. 2007. PNAS, 104: 17614-17619
Models
[Figure: eight models of human evolution, each drawn over three populations (AF, AS, AM) with time on the vertical axis: AFRIG, ASIG, ASEG, AFREG, MRE1S, MRE2S, MREBIG, MREBEG. The models are grouped into African replacement, Assimilation and Multiregional evolution.]
Model parameters and priors
[Figure: model parameters and priors, drawn over three populations: Africa, Asia, Americas]
Simulations
Coalescence theory
A retrospective model of population genetics
Traces all copies of a gene in a sample from a population to a single ancestral copy shared by all members (MRCA)
Assumes no recombination, no selection
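As an illustration of the coalescent idea above, the following minimal Python sketch draws the waiting times between successive coalescent events for a sample of gene copies and sums them to obtain the time to the MRCA. It assumes the standard neutral coalescent in a single constant-size population, with time measured in units of 2N generations. The function names and the sample size of 30 are illustrative choices, not taken from the slides.

```python
import random

def simulate_coalescent_times(n_samples, rng):
    """Waiting times between successive coalescent events.

    Assumption: standard neutral coalescent, constant population size,
    no recombination; time is in units of 2N generations.
    """
    times = []
    k = n_samples
    while k > 1:
        rate = k * (k - 1) / 2.0          # number of lineage pairs that can coalesce
        times.append(rng.expovariate(rate))
        k -= 1                            # each coalescence merges two lineages into one
    return times

def tmrca(n_samples, rng):
    """Time to the most recent common ancestor of the whole sample."""
    return sum(simulate_coalescent_times(n_samples, rng))

def expected_tmrca(n_samples):
    """Theoretical expectation of the TMRCA, in the same time units."""
    return 2.0 * (1.0 - 1.0 / n_samples)

rng = random.Random(1)
n = 30                                    # illustrative sample size
replicates = [tmrca(n, rng) for _ in range(20000)]
mean_tmrca = sum(replicates) / len(replicates)
print(mean_tmrca, expected_tmrca(n))
```

Going backward in time, each event merges two lineages, so a sample of n copies reaches its single ancestral copy after n - 1 events.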
Simulated genealogy
[Figure: simulated genealogy, with time on the vertical axis]
Mutation model
[Figure: mutations placed on the branches of the genealogy, producing sequences such as TCCTTGTA…ATTGGT and ACCGAGTA…GTTGGT]
Summary statistics
– Within population:
  • S,
– Between populations:
  • Pairwise FST
  • Global FST
– Globally:
  • S,
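Two of the statistics listed above can be sketched in a few lines of Python. The sketch reads S as the number of segregating sites. Pairwise FST is computed with a pairwise-difference estimator, 1 - Hw/Hb, where Hw and Hb are the mean numbers of differences within and between populations. The slides do not say which estimator was used, so this choice, the function names and the toy sequences are all assumptions.

```python
from itertools import combinations

def segregating_sites(seqs):
    """S: number of alignment columns showing more than one allele."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)

def pairwise_diff(a, b):
    """Number of positions at which two aligned sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def mean_within(seqs):
    """Mean number of pairwise differences within one population."""
    pairs = list(combinations(seqs, 2))
    return sum(pairwise_diff(a, b) for a, b in pairs) / len(pairs)

def mean_between(pop1, pop2):
    """Mean number of pairwise differences between two populations."""
    total = sum(pairwise_diff(a, b) for a in pop1 for b in pop2)
    return total / (len(pop1) * len(pop2))

def pairwise_fst(pop1, pop2):
    """Assumed estimator: FST = 1 - Hw / Hb."""
    hw = (mean_within(pop1) + mean_within(pop2)) / 2.0
    hb = mean_between(pop1, pop2)
    return 1.0 - hw / hb if hb > 0 else 0.0

# Toy aligned sequences (hypothetical data, for illustration only).
pop_a = ["ACGTAC", "ACGTAC", "ACGTTC"]
pop_b = ["ACCTAG", "ACCTAG", "ACGTAG"]

print(segregating_sites(pop_a))           # within population
print(segregating_sites(pop_a + pop_b))   # globally
print(pairwise_fst(pop_a, pop_b))         # between populations
```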
Approximate Bayesian Computations (ABC)
The rejection-sampling approach:
• Calculate summary statistics (S) for the observed data set
• Draw parameter values φ’ from prior distributions, and use them to simulate data
• Calculate summary statistics (S’) on the simulated data set and compare them to the observations: δ = ||S - S’|| (Euclidean distance)
• Accept φ’ if δ is smaller than a chosen tolerance, otherwise reject the sample
The ABC approach (Beaumont et al. 2002):
• Modification: a local regression is added within the set of accepted φ’ values
Neuenschwander (2006)
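The rejection-sampling steps above can be sketched in Python on a toy problem. Here a simple Gaussian simulator stands in for the demographic and genetic simulations, and the mean and variance stand in for the genetic summary statistics. The prior is uniform on [0, 10]. The simulator, prior bounds, tolerance and function names are all illustrative assumptions, and the local-regression adjustment of Beaumont et al. (2002) is not included.

```python
import math
import random

def simulate(phi, n_obs, rng):
    """Toy simulator (assumption): n_obs draws from a normal with mean phi."""
    return [rng.gauss(phi, 1.0) for _ in range(n_obs)]

def summary_stats(data):
    """Summary statistics S: sample mean and sample variance."""
    mean = sum(data) / len(data)
    var = sum((x - mean) ** 2 for x in data) / (len(data) - 1)
    return (mean, var)

def euclidean(s_obs, s_sim):
    """delta = ||S - S'||."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s_obs, s_sim)))

def abc_rejection(s_obs, n_obs, n_sims, tolerance, rng):
    accepted = []
    for _ in range(n_sims):
        phi = rng.uniform(0.0, 10.0)                 # draw phi' from the prior
        s_sim = summary_stats(simulate(phi, n_obs, rng))
        if euclidean(s_obs, s_sim) <= tolerance:     # accept phi' if delta is small
            accepted.append(phi)
    return accepted

rng = random.Random(42)
observed = simulate(4.0, 50, rng)                    # pretend these are the real data
s_obs = summary_stats(observed)
posterior_sample = abc_rejection(s_obs, 50, 20000, 0.3, rng)
posterior_mean = sum(posterior_sample) / len(posterior_sample)
print(len(posterior_sample), posterior_mean)
```

A smaller tolerance gives a closer approximation to the posterior but accepts fewer simulations, which is why the number of simulations needed grows quickly.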
Computational issues
1-10 mio.
Computer clusters
UBELIX (>500 nodes)
Zooblythii (~40 nodes)
For ABC, 5 mio. demographic simulations are necessary to obtain robust parameter estimates
Each demographic simulation is followed by n genetic simulations (n = number of loci)
Example
8 simple models, 50 loci, 30 individuals: 2 CPU-years
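To put the 2 CPU-years of the example into perspective, this small Python sketch converts a CPU-time budget into idealised wall-clock time. It assumes perfect parallel speed-up, one CPU per node and a 365-day year. These are simplifications, and the function name is hypothetical.

```python
def wall_clock_days(cpu_years, n_cpus):
    """Idealised wall-clock time in days, assuming perfect parallel speed-up."""
    return cpu_years * 365.0 / n_cpus

# 2 CPU-years on a single CPU, on ~40 nodes and on ~500 nodes.
for n_cpus in (1, 40, 500):
    print(n_cpus, "CPUs:", round(wall_clock_days(2.0, n_cpus), 2), "days")
```

Real runs would take longer because of queueing, job submission overhead and failed jobs.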
Relative probabilities of models of human evolution
[Figure: the eight models (AFRIG, ASIG, ASEG, AFREG, MRE1S, MRE2S, MREBIG, MREBEG; populations AF, AS, AM), grouped into African replacement, Assimilation and Multiregional evolution, annotated with relative probabilities. Values shown: 0.218, 0.461, 0.422, 0.048, 0.069, 0.958, 0.042, 0.091, 0.909, 0.001, 0.781.]
[Figure: diagram of the model parameters (NA-AF, NbMH, NbAS, NbAM, NAF, NAS, NAM, TMH, TAS, TAM; annotations include 2e-6, 1e-5, 3e-5, 5e-5 and 8,000, 4,000, 1,600), shown with posterior density plots for three times. Each plot has a y-axis labelled "density"; x-axis ticks run from 5,000 to 8,000, from 2,000 to 4,000 and from 400 to 1,600 respectively.]
Speciation time: 142 Kya (104 – 186)
Out-of-Africa time: 51.1 Kya (40.1 – 70.9)
Americas colonization time: 10.3 Kya (7.6 – 15.9)
A complex demography
[Figure: map of human migrations, adapted from Cavalli-Sforza & Feldman, 2003; labels on the map include [10,000] and [55,000]]
demographic and spatial expansions
population bottlenecks
fast migration events
population isolation
secondary contacts
From environment to demography
Spatial resolution: 100 km
[Figure: maps of carrying capacity (low to high) and friction (low to high)]
Demographic simulations
stepping-stone model (cellular automata)
[Figure: grid of cells (demes), with a plot of pop. size against time for one cell or deme]
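A stepping-stone simulation of this kind can be sketched in Python as follows. Each generation, every deme grows logistically towards its carrying capacity and then sends a fixed fraction of its population to its nearest neighbours. This is a simplified illustration, not SPLATCHE's actual code. The growth rule, the equal split of emigrants among neighbours, the omission of friction and all parameter values are assumptions.

```python
def step(grid, k, r, m):
    """One generation: logistic growth in every deme, then nearest-neighbour migration.

    grid: population size per cell; k: carrying capacity per cell (must be > 0);
    r: growth rate; m: fraction of each deme that emigrates per generation.
    """
    rows, cols = len(grid), len(grid[0])
    # Logistic growth towards the carrying capacity of each cell.
    grown = [[n + r * n * (k[i][j] - n) / k[i][j] for j, n in enumerate(row)]
             for i, row in enumerate(grid)]
    new = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            neighbours = [(i + di, j + dj)
                          for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1))
                          if 0 <= i + di < rows and 0 <= j + dj < cols]
            emigrants = m * grown[i][j]
            new[i][j] += grown[i][j] - emigrants          # residents that stay
            for a, b in neighbours:
                new[a][b] += emigrants / len(neighbours)  # equal split (assumption)
    return new

rows, cols = 5, 5
capacity = [[100.0] * cols for _ in range(rows)]          # uniform K (assumption)
grid = [[0.0] * cols for _ in range(rows)]
grid[0][0] = 10.0                                         # founders in one corner deme
history = []                                              # pop. size over time, origin deme
for generation in range(200):
    grid = step(grid, capacity, r=0.5, m=0.1)
    history.append(grid[0][0])

print(round(history[0], 2), round(history[-1], 2), round(grid[4][4], 2))
```

Starting from a single occupied deme, the population spreads as a wave across the grid until every cell sits near its carrying capacity.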
SPLATCHE
SPatiaL And Temporal Coalescences in Heterogeneous Environment
(http://cmpg.unibe.ch/software/splatche)
Vegetation maps
[Figures: present potential vegetation; vegetation at the Last Glacial Maximum]
Ray and Adams. 2001. Internet Archaeology 11
Taking into account altitudes
Expert system
Demography and spatial expansion
[Figure: map of population density]
Dynamic vegetation
[Figure: vegetation states LGM, intermediate and PP; plot of N[t], the number of people per cell (0 to 280), against generations (0 to 4'000)]
Genetic simulations
Computational issues
A fully spatially-explicit model using 500 loci in 800 individuals:
10 CPU-years
Adding long-distance dispersal:
20 CPU-years
SPLATCHE on the grid
early 2005: joined the Biomed VO of the EGEE project
mid 2005: tested on GILDA test bed, and deployed on the Grid
since late 2005: testing and improvement
since mid 2006: production mode and optimization
Use of SPLATCHE on the grid
N simulations
Independent simulations:
- the more CPUs, the better
- job failures are not that bad
GRID
Posterior distribution of demographic/genetic parameters of interest
Statistical tools
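The workflow above relies on the simulations being independent. The following Python sketch shows why job failures are tolerable in that setting. It uses a local thread pool as a stand-in for the grid, and it retries failed jobs a few times before dropping them. The `run_simulation` function, the 20% failure rate and the retry limit are hypothetical. No grid middleware is involved.

```python
import random
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_simulation(job_id, attempt):
    """Hypothetical stand-in for one independent simulation job."""
    rng = random.Random(job_id * 1000 + attempt)
    if rng.random() < 0.2:                        # mimic an occasional job failure
        raise RuntimeError(f"job {job_id} failed")
    return job_id, rng.gauss(0.0, 1.0)            # placeholder simulation output

def run_all(n_jobs, max_workers=8, max_attempts=3):
    """Run all jobs in parallel, resubmitting failed ones up to max_attempts times."""
    results, lost = {}, []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pending = {pool.submit(run_simulation, j, 0): (j, 0) for j in range(n_jobs)}
        while pending:
            retries = {}
            for future in as_completed(pending):
                job_id, attempt = pending[future]
                try:
                    _, value = future.result()
                    results[job_id] = value
                except RuntimeError:
                    if attempt + 1 < max_attempts:
                        retry = pool.submit(run_simulation, job_id, attempt + 1)
                        retries[retry] = (job_id, attempt + 1)
                    else:
                        lost.append(job_id)       # give up; other jobs are unaffected
            pending = retries
    return results, lost

results, lost = run_all(200)
print(len(results), "completed,", len(lost), "lost")
```

Because no job depends on another, a lost job only reduces the number of simulated points passed to the statistical tools and does not invalidate the rest.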
Optimizations
5 mio. simulations
GRID
• Reduction of the number of simulations (Daniel Wegmann): by MCMC. Promising results (~10 times fewer simulations)
• Submission time: multi-threaded application using up to 30 RBs (used for the WISDOM project)
• Fetching time of job outputs: in-house multi-threaded solution for checking status and getting outputs
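The slides do not specify which MCMC scheme was used to reduce the number of simulations. One well-known way to combine ABC with MCMC is likelihood-free MCMC, in which a random-walk proposal is accepted only if data simulated under it fall close enough to the observations. The Python sketch below illustrates that general idea on a toy Gaussian problem. The simulator, the uniform prior on [0, 10], the tuning values and the function names are all assumptions, and it should not be read as the method referred to above.

```python
import random

def simulate_summary(phi, rng, n_obs=50):
    """Toy simulator (assumption): summary statistic is the mean of n_obs normal draws."""
    data = [rng.gauss(phi, 1.0) for _ in range(n_obs)]
    return sum(data) / n_obs

def abc_mcmc(s_obs, start, n_steps, tolerance, step_sd, rng, lo=0.0, hi=10.0):
    """Likelihood-free MCMC with a symmetric proposal and a uniform prior on [lo, hi].

    With these choices the Metropolis-Hastings ratio is 1 inside the prior range,
    so a proposal is accepted whenever its simulated summary is within tolerance.
    """
    chain = [start]
    phi = start
    for _ in range(n_steps):
        proposal = phi + rng.gauss(0.0, step_sd)       # symmetric random-walk proposal
        if lo <= proposal <= hi:                        # zero prior density outside
            if abs(simulate_summary(proposal, rng) - s_obs) <= tolerance:
                phi = proposal
        chain.append(phi)                               # a rejected move repeats the state
    return chain

rng = random.Random(7)
s_obs = simulate_summary(4.0, rng)                      # pretend this is the observed statistic
chain = abc_mcmc(s_obs, start=s_obs, n_steps=5000, tolerance=0.2, step_sd=0.3, rng=rng)
kept = chain[500:]                                      # discard burn-in
estimate = sum(kept) / len(kept)
print(round(s_obs, 3), round(estimate, 3))
```

Compared with rejection sampling from the prior, the chain proposes values near already-accepted ones, so fewer simulations are spent in regions of negligible posterior probability.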
Geographic origin of human dispersal
Ray et al. (2005) Genome Research
Mutations surfing during a range expansion
• Some mutations can travel with the wave of advance
• New mutations can reach high frequencies
• More pronounced in small populations
Klopfstein, Currat and Excoffier (2006) MBE 23(3): 482-490
Selection?
(2005) Science 309 (5741)
Currat, Excoffier, Maddison, Otto, Ray, Whitlock and Yeaman (2006) Science 313:172a
Interactions among populations
Interaction between modern humans and Neanderthals in Europe
Currat & Excoffier (2004), PLoS Biol.
Cane toad invasion in Australia
[Figure: map of the cane toad invasion in Australia, with a scale bar in kilometers (120, 0, 120, 240, 360), dated labels (1982, 1988, 1992, 1995, 1996, 1997, 1998, 1999) and site labels (KDM, NW, B, T, RE)]
Initial introduction site in Australia: GORDONVALE (1935)
Plausible introduction site 1: LAGOON CREEK (first sighting: 1979)
Plausible introduction site 2: NORMANTON (first sighting: 1964)
Estoup, A., Baird, S. J. E., Ray, N., Currat, M., Cornuet, J.-M., Santos, F., Beaumont, M. A. and L. Excoffier. Combining genetic, historical and geographic data to reconstruct the dynamics of the bioinvasion of cane toad Bufo marinus. In prep
Take-home message