evolutionsssykim/teaching/f11/slides/lecture11.pdf · 2011. 10. 4. · correcting for ascertainment...
TRANSCRIPT
Evolution
02-715 Advanced Topics in Computational Genomics
Ascertainment Bias
• SNP discovery phase
  – Assume SNPs have been ascertained in an alignment of different sequences of fixed depth d
  – The final sample size is n
  – Ascertainment condition: the locus was variable in the ascertainment sample
Correcting for Ascertainment Bias
• Likelihood for allele frequencies after conditioning on ascertainment (i.e., unobserved true allele frequencies)
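A minimal sketch of the conditioning step, assuming the simplest scheme: a SNP is ascertained when it is variable in a random subsample of d of the n typed chromosomes. Under that assumption the probability of ascertainment for a locus with x copies of an allele is 1 − [C(x,d) + C(n−x,d)] / C(n,d), and the corrected frequency spectrum reweights the observed counts by the inverse of that probability. The function names and the example counts are hypothetical and chosen only for illustration.

```python
from math import comb

def prob_ascertained(x, n, d):
    """P(locus is variable in a random subsample of d chromosomes),
    given that x of the n typed chromosomes carry the allele."""
    # The subsample is monomorphic if all d draws carry the allele
    # or none of them do; math.comb returns 0 when d exceeds x or n - x.
    monomorphic = comb(x, d) + comb(n - x, d)
    return 1.0 - monomorphic / comb(n, d)

def corrected_spectrum(counts, n, d):
    """counts maps x (1..n-1) to the number of ascertained SNPs with x copies.
    Returns estimates of the true spectrum, p_x proportional to
    counts[x] / P(ascertained | x)."""
    weights = {x: c / prob_ascertained(x, n, d) for x, c in counts.items()}
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}

# Hypothetical counts with n = 20 and d = 5 (the sizes used in the simulation slide).
observed = {1: 25, 10: 97}
print(corrected_spectrum(observed, n=20, d=5))
```

Rare alleles are easily missed by a small discovery panel, so the correction shifts weight toward low-frequency classes.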
Various Extensions
• Variation in d
  – The value of d at each locus is not known, but the distribution of d among loci is known
• Allele frequencies in the ascertainment sample are unknown
  – The ascertainment sample may not have been included in the final typed sample
  – Ascertainment condition: variability in the ascertainment sample and variability in the typed sample
Correcting for Ascertainment Bias (Nielsen et al., 2004)
• Illustration through a simulation study (20 genes, 10,000 SNPs, 5 genes for ascertainment)
Ascertainment Bias from HapMap Analysis
Cross-species Sequence Analysis
• Functional regions of genomes are conserved across species
• Cross-species sequence conservation is believed to occur because of negative (purifying) selection
• About 5% or more of bases in mammalian genomes are under purifying selection
• Protein-coding genes account for only about 1.5% of bases, so much of the sequence under purifying selection is noncoding
Phylo-HMM
• Parse aligned sequences into two classes
  – Conserved vs. nonconserved
• Maximum likelihood estimation of the parameters of the Phylo-HMM
Phylo-HMM
• μ, ν: transition probabilities
• States
  – c: conserved region
  – n: non-conserved region
• ψn, ψc: emission probabilities as a tree
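The two-state chain can be sketched as follows. The slide does not say which transition each symbol governs, so this sketch assumes μ is the probability of leaving the conserved state (c to n) and ν the probability of entering it (n to c). Under that assumption the stationary fraction of conserved sites is ν / (μ + ν) and the expected length of a conserved run is 1 / μ. The function name is hypothetical.

```python
def phylo_hmm_summary(mu, nu):
    """Transition matrix and two summaries of a two-state (c, n) Markov chain.

    Assumption: mu = P(c -> n), nu = P(n -> c).
    """
    transition = {
        "c": {"c": 1.0 - mu, "n": mu},
        "n": {"c": nu, "n": 1.0 - nu},
    }
    coverage = nu / (mu + nu)      # stationary probability of state c
    expected_length = 1.0 / mu     # mean run length in state c (geometric)
    return transition, coverage, expected_length

transition, coverage, expected_length = phylo_hmm_summary(mu=0.1, nu=0.05)
print(transition, coverage, expected_length)
```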
Phylo-HMM
• ψn, ψc: emission probabilities as a phylogenetic model
  – Identical phylogenetic model structure for the two states
  – ρ: scaling factor for branch lengths, 0 ≤ ρ ≤ 1
    • Average substitution rate
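To make the role of ρ concrete, here is a sketch of an emission probability for one alignment column, computed with Felsenstein's pruning algorithm. It assumes the conserved state uses the same tree as the non-conserved state with every branch length multiplied by ρ. The Jukes-Cantor substitution model is swapped in for simplicity and is not necessarily the model used in the lecture. The tree, branch lengths, and function names are hypothetical.

```python
import math

BASES = "ACGT"

def jc69(t):
    """Jukes-Cantor: (P(same base), P(one specific different base)) after length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e, 0.25 - 0.25 * e

def partial(node, column, scale):
    """Likelihood of the leaves below `node`, for each possible base at `node`.

    A leaf is a species name; an internal node is a tuple of (child, branch_length) pairs.
    """
    if isinstance(node, str):
        return [1.0 if b == column[node] else 0.0 for b in BASES]
    result = [1.0] * 4
    for child, length in node:
        below = partial(child, column, scale)
        same, diff = jc69(scale * length)
        total = sum(below)
        for i in range(4):
            result[i] *= same * below[i] + diff * (total - below[i])
    return result

def column_likelihood(tree, column, scale=1.0):
    """Emission probability of one column; scale=rho for state c, 1.0 for state n."""
    return sum(0.25 * v for v in partial(tree, column, scale))

# Hypothetical three-species tree with made-up branch lengths.
tree = (((("human", 0.1), ("mouse", 0.3)), 0.2), ("chicken", 0.5))
column = {"human": "A", "mouse": "A", "chicken": "A"}
print(column_likelihood(tree, column, scale=0.3), column_likelihood(tree, column, scale=1.0))
```

An invariant column is more probable under the scaled (conserved) tree, which is what lets the HMM separate the two states.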
Datasets
• Vertebrate species
  – Human, mouse, rat, chicken, Fugu rubripes
  – Alignment with human as reference sequence
• Insect species
  – Three species of Drosophila and Anopheles gambiae
  – Alignment with D. melanogaster as reference sequence
• Two species of Caenorhabditis
  – Alignment with C. elegans as reference sequence
• Seven species of Saccharomyces
  – Alignment with S. cerevisiae as reference sequence
Phylogenetic Models: Assumed Topologies and Estimated Branch Lengths
Estimated Conserved Elements
• More complex organisms have more conserved regions outside of coding regions
[Figure panels: Vertebrate, Insect, Worm, Yeast]
Conservation Around GRIA2 in Human
Extreme Conservation
• Extreme conservation at the 3’ end of the ELAVL4 gene
Key Observations
• Conserved regions
  – 3%-8% of the human genome conserved in vertebrates and other mammals
  – 37-53% in D. melanogaster
  – 18-37% in C. elegans
  – 47-68% in S. cerevisiae
• Highly conserved elements (HCEs)
  – 42% of HCEs overlap with exons in vertebrate genomes
  – >93% for insects, worms, yeasts
• Extreme conservation in 3’ UTRs
  – Post-transcriptional regulation?
• HCEs in intron regions
  – Enriched for RNA secondary structure: encoding functional RNAs?
Phylogenetics vs. Population Genetics
• Phylogenetics
  – Assumes a single correct species phylogeny that holds across genomes
  – Ignores variation among individuals of the same species, or assumes negligible variability within species
  – Reduces the entire population of a species to a single individual
• Population genetics
  – Usually concerned with within-species variation in genomes
  – Individuals within a species are related by genealogies
Siepel, A. Genome Res. 19(11):1929-41. 2009. Phylogenomics of primates and their ancestral populations.
Population-aware Phylogenetics
• Primate species
  – Divergence time is short relative to ancestral population sizes
  – Phylogenetics assumptions do not hold
  – Non-negligible population genetic effects
• Interspecies comparison, taking into account selective forces within species, ancestral populations, modes of speciation
Phylogeny of Primates
Siepel, A. Genome Res. 19(11):1929-41. 2009. Phylogenomics of primates and their ancestral populations.
Darwin’s Phylogeny
Genealogies in Wright-Fisher Model
Population Genetic Interpretation of Speciation
• T: coalescent time
• τ: speciation time
Population Genetic Interpretation of Speciation
• τ >> Ne:
  – Divergence between individual chromosomes serves as an estimate of speciation time
  – The phylogenetics assumption holds
Population Genetic Interpretation of Speciation
• τ << Ne:
  – Coalescent time dominates
  – Equivalent to the coalescent in population genetics
Population Genetic Interpretation of Speciation
• τ ~ Ne:
  – Both ancestral population dynamics and interspecies divergence must be considered
  – Population-aware phylogenetics
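The three regimes can be illustrated numerically. This sketch assumes a diploid ancestral population of constant effective size Ne and measures τ in generations. Under the standard coalescent, two lineages then take 2·Ne generations on average to coalesce, so the expected divergence time between one chromosome from each species is τ + 2·Ne. The function name and the input values are hypothetical.

```python
def coalescent_share(tau, n_e):
    """Fraction of the expected pairwise divergence time due to coalescence
    in the ancestral population rather than to time since speciation.

    Assumptions: tau in generations; diploid population with constant size n_e,
    so the expected coalescent time for two lineages is 2 * n_e generations.
    """
    expected_coalescent = 2.0 * n_e
    return expected_coalescent / (tau + expected_coalescent)

# Illustrative (made-up) values for the three regimes on the slides.
for label, tau, n_e in [("tau >> Ne", 1e6, 1e4),
                        ("tau ~ Ne", 2e5, 5e4),
                        ("tau << Ne", 1e3, 1e5)]:
    print(label, round(coalescent_share(tau, n_e), 3))
```

When the share is near zero, sequence divergence tracks speciation time and the phylogenetic simplification is harmless. When it is near one, the data mostly reflect ancestral polymorphism.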
Three-Species Phylogeny
• Three species X, Y, and Z with speciation times τ and coalescent times T
  – X: human
  – Y: chimpanzee
  – Z: gorilla
• Black phylogeny: discordant with the phylogeny among the three species
• Gray phylogeny: concordant with the phylogeny among the three species
• ILS: incomplete lineage sorting with deep coalescence
Three-Species Phylogeny
• When Nxy and Nxyz are small, τxy and τxyz approximate the divergence times well
• Otherwise, the coalescent times Txy and Txyz need to be taken into account
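How much Nxy matters can be sketched with the standard three-species coalescent result. Assuming a diploid XY ancestral population of constant size Nxy and times in generations, the X and Y lineages fail to coalesce between τxy and τxyz with probability exp(−(τxyz − τxy) / (2·Nxy)). If they fail, the three lineages enter the XYZ ancestor and each of the three possible pairings is equally likely to coalesce first. The function name, dictionary keys, and input values are hypothetical.

```python
import math

def gene_tree_probabilities(tau_xy, tau_xyz, n_xy):
    """Probabilities of the four gene-tree classes for species ((X, Y), Z).

    Assumptions: times in generations; diploid ancestral population of
    constant effective size n_xy between the two speciation events.
    """
    delta = tau_xyz - tau_xy
    deep = math.exp(-delta / (2.0 * n_xy))  # X and Y lineages reach the XYZ ancestor
    return {
        "concordant_no_ils": 1.0 - deep,    # X and Y coalesce in the XY ancestor
        "concordant_deep": deep / 3.0,      # X and Y coalesce first, but in the XYZ ancestor
        "discordant_xz": deep / 3.0,        # X and Z coalesce first
        "discordant_yz": deep / 3.0,        # Y and Z coalesce first
    }

# Made-up values: an internal branch of 2 * Nxy * ln(2) generations.
probs = gene_tree_probabilities(0.0, 2 * 1e4 * math.log(2), 1e4)
print(probs)
```

With a small Nxy or a long internal branch, the first class dominates and τ alone describes the divergence. Otherwise a substantial fraction of sites follow deep-coalescence genealogies.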
Ancestral Recombination Graph for Three Individuals
[Figure panels: Ancestral Recombination Graph; Phylogenetic Ancestral Recombination Graph]
Coal-HMM (Hobolth et al., 2009)
• Four states corresponding to different phylogenies with ILS
• Transitions to other states correspond to recombinations
Coal-HMM
• HC1 state (with no ILS) explains only ~50% of sites
• The remaining states explain the other ~50%, split roughly equally among them
What if We Ignore Incomplete Lineage Sorting
• Aligned human (Hom), chimpanzee (Pan), gorilla (Gor), orangutan (Pon) sequences
• Two different estimated lineages
• Without consideration of ILS, substitution rates are overestimated