Lecture 2: Population Structure
02-715 Advanced Topics in Computational Genomics
What is population structure?
• Population Structure
– A set of individuals characterized by some measure of genetic distinction
– A "population" is usually characterized by a distinct distribution over genotypes
– Example
[Figure: distributions over the genotypes aa, aA, AA for Population 1 and Population 2]
Motivation
• Reconstructing individual ancestry: The Genographic Project
– https://genographic.nationalgeographic.com/genographic/index.html
• Studying human migration
– Out of Africa
– Multi-regional hypothesis
• Study of various traits
– Lactose intolerance
– Origins in Europe?
– Infer from
  • Migration studies
  • Mutation studies in populations
[Figure: human migration at 200,000, 50,000, 30,000, and 10,000 years ago. Source: https://genographic.nationalgeographic.com/genographic/index.html]
Overview
• Background
– Hardy-Weinberg Equilibrium
– Genetic drift
– Wright's $F_{ST}$
• Inferring population structure from genotype data
– Structure (Falush et al., 2003)
– Matrix factorization/dimensionality reduction methods (Engelhardt & Stephens, 2010)
Hardy-Weinberg Equilibrium
• Hardy-Weinberg Equilibrium
– Under random mating, both allele and genotype frequencies in a population remain constant over generations.
– Assumptions of the standard random-mating model
  • Diploid organism
  • Sexual reproduction
  • Nonoverlapping generations
  • Random mating
  • Large population size
  • Equal allele frequencies in the sexes
  • No migration/mutation/selection
– Chi-square test for Hardy-Weinberg equilibrium
Hardy-Weinberg Equilibrium
• p, q: allele frequencies of A and a
• D, H, R: genotype frequencies for AA, Aa, aa, respectively.
– $D = p^2$
– $H = 2pq$
– $R = q^2$
Hardy-Weinberg Equilibrium
• The genotype and allele frequencies of the offspring
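The offspring frequencies follow from the definitions of p, q, D, H, and R. The derivation below is a sketch that assumes a parent generation with arbitrary genotype frequencies D, H, R (summing to 1) and random mating; the subscript 1 for the offspring generation is notation introduced here for illustration.

```latex
\begin{align*}
p &= D + \tfrac{1}{2}H, \qquad q = R + \tfrac{1}{2}H, \qquad p + q = 1 \\
D_1 &= p^2, \qquad H_1 = 2pq, \qquad R_1 = q^2 \\
p_1 &= D_1 + \tfrac{1}{2}H_1 = p^2 + pq = p(p + q) = p
\end{align*}
```

Under these assumptions the genotype frequencies reach the Hardy-Weinberg proportions after one generation of random mating, and the allele frequency does not change.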
Testing Whether Hardy-Weinberg Equilibrium Holds
• Chi-square test
– Null hypothesis: HWE holds in the observed data
– Test if the null hypothesis is violated in the data by comparing the observed genotype frequencies (in the parent generation) with the expected frequencies (in the offspring generation)
Testing Whether Hardy-Weinberg Equilibrium Holds
| Genotype | AA | Aa | aa | Total |
|---|---|---|---|---|
| Observed | 224 | 64 | 6 | 294 |
| Expected | ? | ? | ? | 294 |
Testing Whether Hardy-Weinberg Equilibrium Holds
| Genotype | AA | Aa | aa | Total |
|---|---|---|---|---|
| Observed | 224 | 64 | 6 | 294 |
| Expected | 222.9 | 66.2 | 4.9 | 294 |
Step 1: Compute allele frequencies from the observed data
$$p = \frac{224 \times 2 + 64}{294 \times 2} = 0.871, \qquad q = 1 - p = 0.129$$
Step 2: Compute the expected genotype frequencies
$$\text{Expected}(AA) = p^2 n = 0.8707^2 \times 294 = 222.9$$
Step 3: Compute the test statistic
$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} = \frac{(224 - 222.9)^2}{222.9} + \frac{(64 - 66.2)^2}{66.2} + \frac{(6 - 4.9)^2}{4.9} = 0.32$$
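The three steps can be reproduced in a few lines. This is a minimal sketch using only the Python standard library; the function name `hwe_chi_square` is hypothetical, and the choice of 1 degree of freedom for the p-value (3 genotype classes, minus 1, minus 1 estimated allele frequency) is an assumption that the slides do not state.

```python
import math


def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square test of Hardy-Weinberg equilibrium at one biallelic locus."""
    n = n_AA + n_Aa + n_aa
    # Step 1: allele frequencies from the observed genotype counts.
    p = (2 * n_AA + n_Aa) / (2 * n)
    q = 1.0 - p
    # Step 2: expected genotype counts under HWE (p^2 n, 2pq n, q^2 n).
    expected = [p * p * n, 2 * p * q * n, q * q * n]
    observed = [n_AA, n_Aa, n_aa]
    # Step 3: chi-square test statistic.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Assumption: 1 degree of freedom. For 1 d.f. the upper tail of the
    # chi-square distribution equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return p, expected, chi2, p_value


p, expected, chi2, p_value = hwe_chi_square(224, 64, 6)
print(round(p, 3), [round(e, 1) for e in expected], round(chi2, 2))
```

With the counts from the table, the rounded allele frequency, expected counts, and test statistic agree with the values on the slide. A large p-value means the data give no evidence against HWE.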
Genetic Drift
• The change in allele frequencies in a population due to random sampling
• Neutral process unlike natural selection
– But genetic drift can eliminate an allele from the given population.
• The effect of genetic drift is larger in a small population
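A small simulation illustrates why drift matters more in small populations. The sketch below assumes a Wright-Fisher style model (each of the 2N gene copies in the next generation is drawn at random from the current pool); that model, the function names, and the parameter values are illustrative choices, not something specified in the lecture.

```python
import random


def wright_fisher(pop_size, p0, generations, rng):
    """Allele-frequency trajectory under pure drift (hypothetical sketch)."""
    copies = 2 * pop_size  # diploid population: 2N gene copies
    p = p0
    trajectory = [p]
    for _generation in range(generations):
        # Random sampling: each new gene copy carries allele A with probability p.
        count = sum(1 for _copy in range(copies) if rng.random() < p)
        p = count / copies
        trajectory.append(p)
    return trajectory


def count_fixed_or_lost(pop_size, replicates=10, generations=100, seed=0):
    """Number of replicate populations in which the allele was fixed or lost."""
    rng = random.Random(seed)
    finals = [wright_fisher(pop_size, 0.5, generations, rng)[-1]
              for _replicate in range(replicates)]
    return sum(1 for p in finals if p in (0.0, 1.0))


small = count_fixed_or_lost(pop_size=10)
large = count_fixed_or_lost(pop_size=500)
print("replicates fixed or lost: N=10 ->", small, ", N=500 ->", large)
```

Once the frequency hits 0 or 1 it stays there, which is how drift eliminates an allele without any selection.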
Population Divergence
• Wright's $F_{ST}$
– Statistic used to quantify the extent of divergence among multiple populations relative to the overall genetic diversity
– Summarizes the average deviation of a collection of populations away from the mean
– $F_{ST} = \mathrm{Var}(p_k) / \left(p'(1 - p')\right)$
  • $p'$: the overall frequency of an allele across all subpopulations
  • $p_k$: the allele frequency within population $k$
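The formula is easy to evaluate by hand or in code. The sketch below assumes equally sized subpopulations, so that $p'$ is the unweighted mean of the $p_k$ and the variance is the plain population variance; both choices, and the function name `fst`, are assumptions made for illustration.

```python
def fst(subpop_freqs):
    """Wright's F_ST for one allele, given per-subpopulation frequencies p_k."""
    k = len(subpop_freqs)
    p_bar = sum(subpop_freqs) / k  # p': overall allele frequency (unweighted mean)
    if p_bar in (0.0, 1.0):
        raise ValueError("F_ST is undefined when the allele is fixed or absent overall")
    var_p = sum((p - p_bar) ** 2 for p in subpop_freqs) / k  # Var(p_k)
    return var_p / (p_bar * (1.0 - p_bar))


print(fst([0.5, 0.5]))  # identical subpopulations: no divergence
print(fst([0.2, 0.8]))  # moderately diverged subpopulations
print(fst([0.0, 1.0]))  # each subpopulation fixed for a different allele
```

Identical subpopulations give 0 and subpopulations fixed for different alleles give 1, which brackets the intermediate case.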
Scenarios of How Populations Evolve
Methods for Learning Population Structure from Genetic Markers
• Low-dimensional projection
– Matrix-factorization-based methods (Patterson et al., PLoS Genetics 2006)
• Model-based clustering
– STRUCTURE (Pritchard et al., Genetics 2000)
Low-dimensional Projections
• Genetic data is very large
– Number of markers may range from a few hundred to hundreds of thousands
– Thus each individual is described by a high-dimensional vector of marker configurations
– A low-dimensional projection allows easy visualization
• Allows projection of individuals into a low-dimensional space
• Usually projected to 2 dimensions to allow visualization
Matrix Factorization and Population Structure
• Matrix factorization for learning population structure
Genotype Data (N×P matrix) = Individuals' ancestry proportions (N×K matrix) × Subpopulation Allele Frequencies (K×P matrix)
– N: number of samples
– P: number of genotyped markers
– K: number of subpopulations
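To make the shapes concrete, the sketch below multiplies a small ancestry-proportion matrix (N×K) by a subpopulation allele-frequency matrix (K×P). The numbers are made up for illustration, and the product is read here as each individual's expected allele frequency at each marker, which is one common interpretation rather than something the slide spells out.

```python
def matmul(a, b):
    """Plain-Python matrix product of a (n x k) and b (k x p)."""
    return [[sum(a[i][m] * b[m][j] for m in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


# Hypothetical example with N = 3 individuals, K = 2 subpopulations, P = 4 markers.
ancestry = [
    [1.0, 0.0],  # individual entirely from subpopulation 1
    [0.0, 1.0],  # individual entirely from subpopulation 2
    [0.5, 0.5],  # admixed individual
]
allele_freqs = [
    [0.9, 0.1, 0.5, 0.2],  # subpopulation 1
    [0.1, 0.8, 0.5, 0.6],  # subpopulation 2
]

expected = matmul(ancestry, allele_freqs)  # N x P
for row in expected:
    print([round(v, 2) for v in row])
```

The admixed individual's row is the average of the two subpopulation rows, weighted by its ancestry proportions.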
Unifying Framework of Matrix Factorization
• PCA
– Based on eigendecomposition: columns of Λ are orthogonal, rows of F are orthonormal.
– Works well for the case of isolation-by-distance (continuous variation of populations among individuals)
• Admixture
– Based on probability models: rows of Λ and columns of F should sum to 1.
– Works well if the individuals are admixtures of discretely separated populations
• Sparse factor model
– Sparsity via automatic relevance determination prior
Principal Component Analysis
• Most common form of factor analysis
• The new variables/dimensions ...
– Are linear combinations of the original ones
– Are uncorrelated with one another
  • Orthogonal in original dimension space
– Capture as much of the original variance in the data as possible
– Are called Principal Components
What are the new axes?
[Figure: scatter plot with axes Original Variable A and Original Variable B, showing the directions PC 1 and PC 2]
• Orthogonal directions of greatest variance in data
• Projections along PC1 discriminate the data most along any one axis
Principal Components
• First principal component is the direction of greatest variability (covariance) in the data
• Second is the next orthogonal (uncorrelated) direction of greatest variability
– So first remove all the variability along the first component, and then find the next direction of greatest variability
• And so on …
Dimensionality Reduction
Can ignore the components of lesser significance.
You do lose some information, but if the eigenvalues are small, you don't lose much
– n dimensions in original data
– calculate n eigenvectors and eigenvalues
– choose only the first p eigenvectors, based on their eigenvalues
– final data set has only p dimensions
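As a rough illustration of these steps, the sketch below finds only the first principal component, using power iteration on the sample covariance matrix in place of a full eigendecomposition. Power iteration is a technique swapped in here to keep the example dependency-free; it assumes the largest eigenvalue is strictly dominant and nonzero. The function name and toy data are hypothetical.

```python
import math


def first_principal_component(data, iterations=100):
    """Return (direction, eigenvalue, scores) for the first PC of `data`."""
    n = len(data)
    dim = len(data[0])
    # Center each variable.
    means = [sum(row[j] for row in data) / n for j in range(dim)]
    centered = [[row[j] - means[j] for j in range(dim)] for row in data]
    # Sample covariance matrix (dim x dim).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(dim)]
           for a in range(dim)]
    # Power iteration: repeatedly apply cov and renormalize.
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(dim)) for a in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the variance captured along v.
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(dim))
                     for a in range(dim))
    # Project the centered data onto v: one coordinate per sample.
    scores = [sum(r[j] * v[j] for j in range(dim)) for r in centered]
    return v, eigenvalue, scores


# Toy data lying exactly on the line y = 2x, so one component captures everything.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
direction, eigenvalue, scores = first_principal_component(data)
print([round(x, 4) for x in direction], round(eigenvalue, 4))
```

Keeping only `scores` reduces the two original dimensions to one; for real genotype data one would keep the first few components, chosen by their eigenvalues.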
PCA Analysis (Cavalli-Sforza, 1978)
• Plot of geographical distribution of 3 PCs (intensity proportional to value of each component)
– First: blue
– Second: green
– Third: red
Discrete/Admixed Populations
[Figure: Loading (population) 1, Loading 2, and Loading 3 as estimated by SFA, PCA, and Admixture]
Analysis of European Genotype Data
[Figure: results from PCA, SFAm, and Admixture]
Probabilistic Models for Population Structure
• Mixture model
– Cluster individuals into K populations
• Admixture model
– The genotypes of each individual are an admixture of multiple ancestor populations
– Assumes alleles are in linkage equilibrium
• Linkage model
– Model recombination, correlation in alleles across chromosome
• Organizing data into clusters such that there is
  • high intra-cluster similarity
  • low inter-cluster similarity
• Informally, finding natural groupings among objects.
[Figure: scatter plot, both axes from 0 to 5, with cluster centers k1, k2, k3]
• For a pre-defined number of clusters K, initialize K centers randomly
[Figure: scatter plot, both axes from 0 to 5, with cluster centers k1, k2, k3]
• Iterate between the following two steps
– Assign all objects to the nearest center.
– Move a center to the mean of its members.
[Figure: scatter plot, both axes from 0 to 5, with cluster centers k1, k2, k3]
• After moving centers, re-assign the objects…
[Figure: scatter plot, both axes from 0 to 5, with cluster centers k1, k2, k3]
• After moving centers, re-assign the objects to nearest centers.
• Move a center to the mean of its new members.
[Figure: final positions of cluster centers k1, k2, k3]
• Re-assign and move centers, until no objects changed membership.
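The assign-and-move loop can be sketched as follows. To keep the example deterministic, the initial centers are passed in explicitly rather than chosen at random; the function name, the toy points, and the handling of empty clusters are assumptions made for illustration.

```python
def kmeans(points, centers, max_iter=100):
    """K-means on 2-D points, starting from the given initial centers."""
    assignment = None
    for _ in range(max_iter):
        # Step 1: assign every object to its nearest center (squared Euclidean distance).
        new_assignment = [
            min(range(len(centers)),
                key=lambda k: (x - centers[k][0]) ** 2 + (y - centers[k][1]) ** 2)
            for x, y in points
        ]
        # Stop when no object changed membership.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 2: move each center to the mean of its members.
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                new_centers.append((sum(m[0] for m in members) / len(members),
                                    sum(m[1] for m in members) / len(members)))
            else:
                new_centers.append(centers[k])  # assumption: an empty cluster keeps its center
        centers = new_centers
    return centers, assignment


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (4.0, 4.0), (4.5, 5.0), (5.0, 4.5)]
centers, assignment = kmeans(points, centers=[(0.0, 0.0), (5.0, 5.0)])
print(assignment, centers)
```

Because the result depends on the starting centers, practical use typically repeats the procedure from several random initializations.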
Soft-Clustering of Individuals into Three Clusters with Gaussian Mixture Model
Probability of membership in each cluster:

| | Cluster 1 | Cluster 2 | Cluster 3 | Sum |
|---|---|---|---|---|
| Individual 1 | 0.1 | 0.4 | 0.5 | 1 |
| Individual 2 | 0.8 | 0.1 | 0.1 | 1 |
| Individual 3 | 0.7 | 0.2 | 0.1 | 1 |
| Individual 4 | 0.10 | 0.05 | 0.85 | 1 |
| Individual 5 | … | … | … | 1 |
| Individual 6 | … | … | … | 1 |
| Individual 7 | … | … | … | 1 |
| Individual 8 | … | … | … | 1 |
| Individual 9 | … | … | … | 1 |
| Individual 10 | … | … | … | 1 |

• Each individual can be assigned to more than one cluster with a certain probability.
• For each individual, the probabilities for all clusters should sum to 1 (i.e., each row should sum to 1).
• Each cluster is explained by a cluster center variable (i.e., cluster mean)
Mixture Model
• The goal is to discover K clusters for K populations from the N×J genotype matrix (N: # of samples, J: # of loci) ($x_{i,n}$ in the plate diagram below)
• Assume K populations (clusters)
• θ = Distribution over populations
– Mixing proportions in mixture model
• β = Distribution over alleles at each locus in each population
– Mixture component model in mixture model
• To generate an individual's genome
– All individuals share the same θ
– Sample $z_i$ from Multinomial(θ)
– For each locus
  • Sample $x_{i,n}$ from β corresponding to the population chosen by $z_i$
[Figure: plate diagram with variables α, θ, z, $x_{i,n}$, $\beta_k$, and λ, and plates i = 1…J, n = 1…N, k = 1…K]
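The generative process of the mixture model can be sketched in code. The example assumes biallelic loci with one sampled allele per locus (coded 0 or 1), so that `beta[k][j]` is the frequency of allele 1 at locus j in population k; that encoding, the function names, and the parameter values are assumptions for illustration only.

```python
import random


def sample_categorical(probs, rng):
    """Draw an index with the given probabilities (a single multinomial trial)."""
    u = rng.random()
    cumulative = 0.0
    for index, prob in enumerate(probs):
        cumulative += prob
        if u < cumulative:
            return index
    return len(probs) - 1  # guard against floating-point round-off


def generate_mixture(theta, beta, n_individuals, rng):
    """Generate (population label, alleles) pairs under the mixture model."""
    data = []
    for _ in range(n_individuals):
        # One population label per individual, drawn from the shared theta.
        z = sample_categorical(theta, rng)
        # Every locus of this individual uses the allele frequencies of population z.
        genome = [1 if rng.random() < beta[z][j] else 0
                  for j in range(len(beta[z]))]
        data.append((z, genome))
    return data


theta = [0.7, 0.3]              # mixing proportions over K = 2 populations
beta = [[0.9, 0.1, 0.8, 0.2],   # allele-1 frequencies in population 0
        [0.1, 0.9, 0.2, 0.8]]   # allele-1 frequencies in population 1
for z, genome in generate_mixture(theta, beta, 5, random.Random(0)):
    print(z, genome)
```

The key restriction is visible in the loop: the label is drawn once per individual, so the whole genome comes from a single population.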
Admixture Model
• Relax the assumption of one population per individual in mixture model
• Individuals can be assigned to multiple different populations at different loci
The Admixture Model
• β = Distribution over alleles
– One per population-locus pair
• To generate an individual's genome
– Sample $\theta_n$ from Dirichlet(α)
– For each locus
  • Sample $z_{i,n}$ from Multinomial($\theta_n$)
  • Sample $x_{i,n}$ from β corresponding to the population chosen by $z_{i,n}$
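A sketch of this generative process for one individual follows. It again assumes biallelic loci coded 0 or 1 with `beta[k][j]` the frequency of allele 1 at locus j in population k, and it draws the Dirichlet sample by normalizing independent Gamma variates; the encoding and all names are illustrative assumptions.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_admixed_individual(alpha, beta, rng):
    """Return (theta_n, z, x) for one individual under the admixture model."""
    n_loci = len(beta[0])
    populations = range(len(alpha))
    # Individual-specific ancestry proportions.
    theta_n = sample_dirichlet(alpha, rng)
    # A separate population label for every locus.
    z = [rng.choices(populations, weights=theta_n)[0] for _ in range(n_loci)]
    # Each allele is drawn from the frequencies of the population chosen at that locus.
    x = [1 if rng.random() < beta[z[j]][j] else 0 for j in range(n_loci)]
    return theta_n, z, x


alpha = [1.0, 1.0]
beta = [[0.9, 0.1, 0.8, 0.2, 0.7],
        [0.1, 0.9, 0.2, 0.8, 0.3]]
theta_n, z, x = generate_admixed_individual(alpha, beta, random.Random(0))
print([round(t, 2) for t in theta_n], z, x)
```

Unlike the mixture model, the population label is drawn inside the per-locus step, so different loci of one individual can trace to different populations.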
Structure Model
• Hypothesis: Modern populations are created by an intermixing of ancestral populations.
• An individual's genome contains contributions from one or more ancestral populations.
• The contributions of populations can be different for different individuals.
• Other assumptions
– Hardy-Weinberg equilibrium
– No linkage disequilibrium
– Markers are i.i.d. (independent and identically distributed)
Linkage Model
• From admixture model, replace the assumption that the ancestry labels $z_{il}$ for individual i, locus l are independent with the assumption that adjacent $z_{il}$ are correlated.
• Use Poisson process to model the correlation between neighboring alleles
– $d_l$: distance between locus l and locus l+1
– r: recombination rate
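One way to write the resulting dependence between adjacent ancestry labels is the transition probability below. It is a sketch in the form used by Falush et al. (2003), rewritten with this lecture's symbols; $\theta_{ik}$, individual $i$'s ancestry proportion from population $k$, is notation borrowed from the admixture model and is an assumption here.

```latex
\Pr(z_{i,l+1} = k' \mid z_{il} = k) =
\begin{cases}
e^{-d_l r} + \left(1 - e^{-d_l r}\right)\theta_{ik'} & \text{if } k' = k \\[4pt]
\left(1 - e^{-d_l r}\right)\theta_{ik'} & \text{otherwise}
\end{cases}
```

Here $e^{-d_l r}$ is the probability that no Poisson event (ancestry breakpoint) falls between loci $l$ and $l+1$. When $d_l r$ is large this term vanishes and each $z_{il}$ is drawn from $\theta_i$ independently of its neighbor.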
Linkage Model
• As recombination rate r goes to infinity, all loci become independent and linkage model becomes admixture model.
• Recombination rate r can be viewed as being related to the number of generations since admixture occurred.
• Use MCMC algorithm to fit the unknown parameters.
Population Structure from Ancestry Proportion of Each Individual
• How to display population structure?
Genetic Structure of Human Populations (Rosenberg et al., Science 2002)
[Figure: ancestral proportion of each individual, grouped by region: Africa, Europe, Mid-East, Cent./S. Asia, East Asia, Oceania]
Population of Origin Assignments of a Single Individual
[Figure panels: True origin; Estimated origin (unphased data); Estimated origin (phased data)]
Comparison of Different Methods
| | PCA | Model-based Clustering |
|---|---|---|
| Advantages | Statistical tests for significance of results (Patterson et al. 2006); easy visualization | Generative process that explicitly models admixture; clustering is probabilistic, so it is possible to assign a confidence level to clusters |
| Disadvantages | No intuition about underlying processes | Computationally more demanding; based on assumptions of evolutionary models (Structure: no models of mutation or recombination; recombination added in extension by Falush et al.) |