666.20 -- the lander-green algorithmcsg.sph.umich.edu/abecasis/class/666.20.pdf · 2006-03-30 ·...

The Lander-Green Algorithm Biostatistics 666 Lecture 22



The Lander-Green Algorithm

Biostatistics 666, Lecture 22


Last Lecture … Relationship Inference

Likelihood of genotype data

Adapt calculation to different relationships
• Siblings
• Half-siblings
• Unrelated individuals

Importance of modeling error


Today …

The Lander-Green Algorithm

Multipoint analysis in general pedigrees

The basis of modern pedigree analysis packages


Hidden Markov Model

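[Diagram: hidden IBD states $I_1 \to I_2 \to I_3 \to \dots \to I_M$, linked by transition probabilities $P(I_2 \mid I_1)$, $P(I_3 \mid I_2)$, …; each state $I_j$ emits the observed genotypes $X_j$ with probability $P(X_j \mid I_j)$.]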

IBD states form a Hidden Markov Chain along the chromosome …


Fundamental Calculations

Enumerate possible IBD states

Transition probability for neighboring IBD states

Probability of genotype data given IBD state


Lander-Green Algorithm

More general definition for I, the “IBD vector”

Probability of genotypes given “IBD vector”

Transition probabilities for the “IBD vectors”

$$L = \sum_{I_1} \cdots \sum_{I_m} P(I_1) \prod_{i=2}^{m} P(I_i \mid I_{i-1}) \prod_{i=1}^{m} P(X_i \mid I_i)$$
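The likelihood sums over every sequence of hidden states, but the Markov structure lets the same sum be accumulated one marker at a time. The sketch below is only an illustration with made-up numbers: the two-state chain, the value of theta and the emission table `emit` are all assumptions, and the function names are hypothetical. It compares a brute-force sum over all paths with the forward recursion.

```python
import itertools
import numpy as np

def likelihood_bruteforce(prior, trans, emit):
    """Sum P(I_1) * prod P(I_i | I_{i-1}) * prod P(X_i | I_i) over every state path."""
    m, k = emit.shape  # m markers, k hidden states
    total = 0.0
    for path in itertools.product(range(k), repeat=m):
        p = prior[path[0]] * emit[0, path[0]]
        for i in range(1, m):
            p *= trans[path[i - 1], path[i]] * emit[i, path[i]]
        total += p
    return total

def likelihood_forward(prior, trans, emit):
    """Same sum, accumulated marker by marker (forward recursion)."""
    v = prior * emit[0]
    for i in range(1, emit.shape[0]):
        v = (v @ trans) * emit[i]
    return v.sum()

# Toy inputs (assumed): 2 hidden states, 3 markers.
theta = 0.1
prior = np.array([0.5, 0.5])
trans = np.array([[1 - theta, theta], [theta, 1 - theta]])
emit = np.array([[0.2, 0.7], [0.5, 0.1], [0.3, 0.3]])  # emit[i, s] = P(X_i | I_i = s)

print(likelihood_bruteforce(prior, trans, emit))
print(likelihood_forward(prior, trans, emit))
```

The brute-force version visits k^m paths, while the forward version costs roughly m matrix-vector products, which is why a large number of markers is manageable.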


Part I

“IBD Vectors”

Inheritance Vectors

Descent Graphs

Gene Flow Pattern


“IBD Vector” Specifications

Specify IBD between all individuals

Must be compact

Must allow calculation of:
• Conditional probabilities for neighboring markers
• Probability of observed genotypes


“IBD Vector”

Specify the outcome of each meiosis
• Which of the two parental alleles was transmitted?

Implies founder allele carried by each individual

Implies whether a pair of chromosomes is identical-by-descent


For any pedigree, consider …

What are the meioses?

What are the possible outcomes for the entire set of meioses?


Example …


What we are doing ...

Listing meioses

Alternating outcomes

The outcomes of all meioses define our “IBD vector”

[Figure: four panels, V1 to V4, labeled with the outcomes of Meiosis 1 and Meiosis 2.]


Example … Descent Graph

[Figure: descent graph with founder alleles labeled A, B (first row), C, D, E, F (second row) and G, H (third row).]

This is one representation of inheritance in the pedigree. The 8 founder alleles are labeled A-H, and their descent through the pedigree is specified once we fix the outcome of each meiosis.


So far …

A set of 2n binary digits specifies IBD in a pedigree with n non-founders

There are $2^{2n}$ such sets …

Next, must calculate the probability of the observed genotypes for each one…
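As a small illustration of the counting argument, the sketch below enumerates every set of meiotic outcomes for a pedigree with n non-founders. The function name `inheritance_vectors` is hypothetical, and which parental allele a 0 or a 1 stands for is an arbitrary labeling assumed here.

```python
from itertools import product

def inheritance_vectors(n_nonfounders):
    """Return an iterator over all tuples of 2n binary meiotic outcomes."""
    return product((0, 1), repeat=2 * n_nonfounders)

# A sib pair with both parents: 2 non-founders, hence 4 meioses.
vectors = list(inheritance_vectors(2))
print(len(vectors))  # 16
print(vectors[5])    # (0, 1, 0, 1)
```

Each additional non-founder multiplies the number of vectors by four, which is what limits the approach to relatively small pedigrees.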


Part II

Probability of Observed Genotypes

Founder Allele Graph

Founder Allele Frequencies


Founder Allele Graphs / Sets

Calculated for each marker individually

List of founder alleles compatible with:
• Observed genotypes for all individuals
• A particular gene flow pattern

Likelihood of each set is a product of allele frequencies


Observed Genotypes

For each family

For each marker

Some pattern of observed genotypes

[Figure: pedigree in which some individuals are untyped (? ?) and the others have genotype 1 1.]


Gene flow pattern

In turn, specify gene flow throughout the pedigree

For each individual, we know precisely what founder allele they carry


Combine the two…

Conditional on gene flow…

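Founder allele states are restricted
• In this case, there is only one founder allele set: {1, 1, 1, ?}

Likelihood is a product of allele frequencies
• $P(\text{allele 1})^3 \, P(\text{any allele})$

[Figure: the same pedigree with the founder alleles labeled 1, 1, 1 and ? under this gene flow pattern.]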


Finding founder allele sets

Group founder alleles transmitted to the same genotyped individuals

If a founder allele passes through a homozygote or different heterozygotes
• Its state is either fixed or impossible
• Fixes state of other alleles in the group


No. of Possible States for Grouped Founder Alleles

No compatible states

One Possible State
• If >1 founder allele passes through a person with a homozygous genotype or two heterozygous persons

Two Possible States For Each Allele
• Observed genotypes are all identical and heterozygous

Every marker allele is possible
• Founder alleles that are not passed to genotyped persons


Example: Observed Genotypes

[Figure: pedigree with observed genotypes 1 2, 1 2, 1 2, 1 2, 1 3 and 2 4.]


Example … Descent Graph

[Figure: descent graph with founder alleles A-H, annotated with the observed genotypes {1,2}, {1,2}, {1,2}, {1,2}, {1,3} and {2,4}.]


Possible founder allele states…

Founder Alleles in Group    Corresponding Allele States    Probability
(B)                         (any allele)                   1
(A,C,E)                     (1,2,1) or (2,1,2)             P(1)²P(2) + P(2)²P(1)
(D,F,G,H)                   (1,2,3,4)                      P(1)P(2)P(3)P(4)
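To make the table concrete, the sketch below multiplies the group probabilities for this example. The allele frequencies in `freq` are invented for illustration and the helper `group_probability` is hypothetical. Each group contributes the sum, over its compatible state assignments, of the product of allele frequencies.

```python
from math import prod

# Assumed allele frequencies, chosen only for illustration.
freq = {1: 0.4, 2: 0.3, 3: 0.2, 4: 0.1}

def group_probability(states):
    """Sum over compatible assignments of the product of allele frequencies."""
    return sum(prod(freq[a] for a in assignment) for assignment in states)

# Founder allele groups from the example; None marks an unconstrained group.
groups = {
    ("B",): None,                             # any allele, probability 1
    ("A", "C", "E"): [(1, 2, 1), (2, 1, 2)],  # two compatible assignments
    ("D", "F", "G", "H"): [(1, 2, 3, 4)],     # a single compatible assignment
}

likelihood = 1.0
for alleles, states in groups.items():
    likelihood *= 1.0 if states is None else group_probability(states)

print(likelihood)
```

With these frequencies the (A,C,E) group contributes 0.4²·0.3 + 0.3²·0.4 = 0.084, and the (D,F,G,H) group contributes 0.0024.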


Lander-Green inheritance vector

$2^{2n}$ elements

Meiotic outcomes specified in index bits

Stores probability of genotypes for each set of meiotic outcomes

0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111

L L L L L L L L L L L L L L L L


So far …

Generalized the “IBD vector”

Probability of observed genotypes

Next step: Transition probabilities
• HMM to combine information along the genome


Part III

Transition Probabilities

Recombination Fraction

Changes in IBD Along Chromosome


With one meiosis

$$T = \begin{bmatrix} 1-\theta & \theta \\ \theta & 1-\theta \end{bmatrix}$$


With two meioses

$$T^{\otimes 2} = \begin{bmatrix} (1-\theta)\,T & \theta\,T \\ \theta\,T & (1-\theta)\,T \end{bmatrix}$$


With two meioses

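$$T^{\otimes 2} = \begin{bmatrix}
(1-\theta)^2 & \theta(1-\theta) & \theta(1-\theta) & \theta^2 \\
\theta(1-\theta) & (1-\theta)^2 & \theta^2 & \theta(1-\theta) \\
\theta(1-\theta) & \theta^2 & (1-\theta)^2 & \theta(1-\theta) \\
\theta^2 & \theta(1-\theta) & \theta(1-\theta) & (1-\theta)^2
\end{bmatrix}$$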


With three meioses

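$$T^{\otimes 3} = \begin{bmatrix} (1-\theta)\,T^{\otimes 2} & \theta\,T^{\otimes 2} \\ \theta\,T^{\otimes 2} & (1-\theta)\,T^{\otimes 2} \end{bmatrix}$$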


With three meioses

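$$T^{\otimes 3} = \begin{bmatrix}
(1-\theta)^3 & \theta(1-\theta)^2 & \theta(1-\theta)^2 & \theta^2(1-\theta) & \theta(1-\theta)^2 & \theta^2(1-\theta) & \theta^2(1-\theta) & \theta^3 \\
\theta(1-\theta)^2 & (1-\theta)^3 & \theta^2(1-\theta) & \theta(1-\theta)^2 & \theta^2(1-\theta) & \theta(1-\theta)^2 & \theta^3 & \theta^2(1-\theta) \\
\theta(1-\theta)^2 & \theta^2(1-\theta) & (1-\theta)^3 & \theta(1-\theta)^2 & \theta^2(1-\theta) & \theta^3 & \theta(1-\theta)^2 & \theta^2(1-\theta) \\
\theta^2(1-\theta) & \theta(1-\theta)^2 & \theta(1-\theta)^2 & (1-\theta)^3 & \theta^3 & \theta^2(1-\theta) & \theta^2(1-\theta) & \theta(1-\theta)^2 \\
\theta(1-\theta)^2 & \theta^2(1-\theta) & \theta^2(1-\theta) & \theta^3 & (1-\theta)^3 & \theta(1-\theta)^2 & \theta(1-\theta)^2 & \theta^2(1-\theta) \\
\theta^2(1-\theta) & \theta(1-\theta)^2 & \theta^3 & \theta^2(1-\theta) & \theta(1-\theta)^2 & (1-\theta)^3 & \theta^2(1-\theta) & \theta(1-\theta)^2 \\
\theta^2(1-\theta) & \theta^3 & \theta(1-\theta)^2 & \theta^2(1-\theta) & \theta(1-\theta)^2 & \theta^2(1-\theta) & (1-\theta)^3 & \theta(1-\theta)^2 \\
\theta^3 & \theta^2(1-\theta) & \theta^2(1-\theta) & \theta(1-\theta)^2 & \theta^2(1-\theta) & \theta(1-\theta)^2 & \theta(1-\theta)^2 & (1-\theta)^3
\end{bmatrix}$$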


In general …

Transition matrix is patterned

Transition probability depends on:
• No. of meioses where outcome changed
• No. of meioses where outcome did not change

Product of powers of θ and (1 – θ)
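The pattern can be checked numerically. The sketch below, with hypothetical function names and an arbitrary theta, builds the transition matrix as a Kronecker power of the single-meiosis matrix and compares it with the closed form θ^d (1 – θ)^(n – d), where d is the number of meioses whose outcome changed (the Hamming distance between the two state indices).

```python
from functools import reduce
import numpy as np

def transition_matrix(theta, n_meioses):
    """Kronecker power of the single-meiosis matrix T."""
    T = np.array([[1 - theta, theta], [theta, 1 - theta]])
    return reduce(np.kron, [T] * n_meioses)

def transition_prob(i, j, theta, n_meioses):
    """theta^d * (1 - theta)^(n - d), with d = number of differing bits of i and j."""
    d = bin(i ^ j).count("1")
    return theta ** d * (1 - theta) ** (n_meioses - d)

theta, n = 0.1, 3  # assumed values for illustration
M = transition_matrix(theta, n)
closed_form = np.array([[transition_prob(i, j, theta, n) for j in range(2 ** n)]
                        for i in range(2 ** n)])
print(np.allclose(M, closed_form))  # True
```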


Recursive Formulation

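$$T^{\otimes (n+1)} = \begin{bmatrix} (1-\theta)\,T^{\otimes n} & \theta\,T^{\otimes n} \\ \theta\,T^{\otimes n} & (1-\theta)\,T^{\otimes n} \end{bmatrix}$$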
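One way the block structure can be exploited is to multiply a vector by the transition matrix without ever forming the matrix. The sketch below is an illustration under that assumption, not necessarily how any particular package does it, and `apply_transition` is a hypothetical name. It splits the vector in half, transforms each half with the smaller matrix, and mixes the halves with weights θ and 1 – θ.

```python
import numpy as np

def apply_transition(v, theta):
    """Return v @ T^{(x)n} for a vector of length 2^n, using the block recursion."""
    v = np.asarray(v, dtype=float)
    if v.size == 1:
        return v.copy()  # the zero-meiosis transition matrix is [1]
    half = v.size // 2
    top = apply_transition(v[:half], theta)     # first half times the smaller matrix
    bottom = apply_transition(v[half:], theta)  # second half times the smaller matrix
    return np.concatenate([(1 - theta) * top + theta * bottom,
                           theta * top + (1 - theta) * bottom])

# Usage with an arbitrary vector over 2 meioses (4 states).
print(apply_transition([1.0, 0.0, 0.0, 0.0], 0.1))
```

For a vector of length N this takes on the order of N log N operations instead of the N² needed for an explicit matrix-vector product.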


Lander-Green Markov Model

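Transition matrix $T^{\otimes 2n}$

$$v_{\ell \mid 1..\ell} = \left(v_{\ell-1 \mid 1..\ell-1}\, T^{\otimes 2n}\right) \circ v_\ell$$

$$v_{\ell \mid \ell..m} = \left(v_{\ell+1 \mid \ell+1..m}\, T^{\otimes 2n}\right) \circ v_\ell$$

$$v_{\ell \mid 1..m} = \left(v_{\ell-1 \mid 1..\ell-1}\, T^{\otimes 2n}\right) \circ v_\ell \circ \left(v_{\ell+1 \mid \ell+1..m}\, T^{\otimes 2n}\right)$$

$$T = \begin{bmatrix} 1-\theta & \theta \\ \theta & 1-\theta \end{bmatrix}$$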
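The three recursions can be sketched directly with element-wise products. In the code below, `single` holds one per-marker likelihood vector per marker, `T` is the full transition matrix (assumed identical between all adjacent markers for simplicity), and the function names are hypothetical. Because T is symmetric, the same row-vector product serves for both the left and the right conditional.

```python
import numpy as np

def conditionals(single, T):
    """Left and right conditional vectors for each marker."""
    m = len(single)
    left, right = [None] * m, [None] * m
    left[0] = single[0].copy()
    for l in range(1, m):
        left[l] = (left[l - 1] @ T) * single[l]
    right[m - 1] = single[m - 1].copy()
    for l in range(m - 2, -1, -1):
        right[l] = (right[l + 1] @ T) * single[l]
    return left, right

def full_vector(single, left, right, T, l):
    """Combine the data at marker l with all markers to its left and right."""
    v = single[l].copy()
    if l > 0:
        v = v * (left[l - 1] @ T)
    if l < len(single) - 1:
        v = v * (right[l + 1] @ T)
    return v

# Toy usage (assumed numbers): 2 meioses, 3 markers.
T1 = np.array([[0.9, 0.1], [0.1, 0.9]])
T = np.kron(T1, T1)
single = [np.array([0.5, 0.0, 0.25, 0.25]),
          np.array([0.1, 0.1, 0.4, 0.4]),
          np.array([0.3, 0.3, 0.3, 0.1])]
left, right = conditionals(single, T)
print([full_vector(single, left, right, T, l).sum() for l in range(3)])
```

Summing the combined vector over states gives the same total at every marker, which is the full likelihood up to the constant prior on the first state.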


All The Ingredients To …

Single Marker

Left Conditional

Right Conditional

Full Likelihood


Appropriate Problems

Large number of markers
• Analysis of >5,000 markers possible

Relatively small pedigrees
• 20-30 individuals
• 2x larger pedigrees for the X chromosome

Why?


So far …

Key components for Lander-Green

Extending definition of IBD vector

Probability of genotypes given IBD

Transition probabilities

Next: Practical applications!


Lander-Green Algorithm

More general definition for I, the “IBD vector”

Probability of genotypes given “IBD vector”

Transition probabilities for the “IBD vectors”

$$L = \sum_{I_1} \cdots \sum_{I_m} P(I_1) \prod_{i=2}^{m} P(I_i \mid I_{i-1}) \prod_{i=1}^{m} P(X_i \mid I_i)$$


Reading

Historically, two key papers:

• Lander and Green (1987) PNAS 84:2363-7

• Kruglyak, Daly, Reeve-Daly, Lander (1996) Am J Hum Genet 58:1347-63