
Haplotyping Algorithms

Qunyuan Zhang

Division of Statistical Genomics

GEMS Course M21-621

Computational Statistical Genetics

Mar. 29, 2012

https://dsgweb.wustl.edu/qunyuan/presentations/Haplotyping_GEMS_2012.ppt


Questions

WHAT is a haplotype?

WHY study haplotypes?

WHY use algorithms for haplotyping?

HOW? (Data, Hypotheses, Algorithms)


WHAT is Haplotype?

A haplotype (Greek haploos = simple) is a combination of alleles at multiple linked loci that are transmitted together. Haplotype may refer to as few as two loci or to an entire chromosome depending on the number of recombination events that have occurred between a given set of loci. The term haplotype is a portmanteau of "haploid genotype."

In a second meaning, haplotype is a set of single nucleotide polymorphisms (SNPs) on a single chromatid that are statistically associated. It is thought that these associations, and the identification of a few alleles of a haplotype block, can unambiguously identify all other polymorphic sites in its region. Such information is very valuable for investigating the genetics behind common diseases, and is collected by the International HapMap Project.

From http://en.wikipedia.org/wiki/Haplotype

Haplotype = Genotype of Haploid

Haplotype 1: C G
Haplotype 2: T A
Genotype: CT GA

Genotype: Aa Bb
Haplotypes: Ab//aB

Genotype: Aa Bb
Haplotypes: AB//ab


WHY Study Haplotype?

An efficient way to represent genetic variation/polymorphism, useful in genomics, population genetics, and genetic epidemiology

Population evolution

LD analysis

Missing genotype imputation

IBD estimation

Tag marker (SNP) selection

Multi-locus linkage & association

WHY use algorithms for haplotyping?

Most current molecular genotyping techniques mix DNA from the two homologous chromosomes and report only diploid genotypes (a mixture of the two haplotypes).

genotype (AaBb) => haplotype (Ab//aB or AB//ab)?

Some molecular techniques can measure haplotypes directly, but they are expensive (money, labor, time ...), especially for genome-wide studies.

So, at least for now, we need algorithms ...

Ambiguity of Haplotype

Haplotypic ambiguity/uncertainty arises when ≥2 markers/loci are heterozygous and their genetic phase is unknown.

Genotype        Haplotypes
AA BB           AB//AB
Aa bb           Ab//ab
Aa Bb           Ab//aB or AB//ab
Aa Bb Cc        ABC//abc, ABc//abC, AbC//aBc or Abc//aBC
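The table above can be reproduced by brute force. The following is a minimal sketch, assuming genotypes are given as one two-letter string per locus; the function name `haplotype_pairs` is made up for this illustration. A genotype with k heterozygous loci yields 2^(k-1) distinct haplotype pairs.

```python
from itertools import product


def haplotype_pairs(genotype):
    """Return all unordered haplotype pairs compatible with an unphased genotype.

    `genotype` is a list of two-character strings, one per locus, e.g. ["Aa", "Bb"].
    """
    het = [i for i, (x, y) in enumerate(genotype) if x != y]
    # Keep the first heterozygous locus fixed so mirror-image pairs are not repeated.
    free = het[1:]
    pairs = set()
    for flips in product([False, True], repeat=len(free)):
        flip_at = dict(zip(free, flips))
        h1, h2 = [], []
        for i, (x, y) in enumerate(genotype):
            if flip_at.get(i, False):
                x, y = y, x
            h1.append(x)
            h2.append(y)
        pairs.add("//".join(sorted(["".join(h1), "".join(h2)])))
    return sorted(pairs)


for g in (["AA", "BB"], ["Aa", "bb"], ["Aa", "Bb"], ["Aa", "Bb", "Cc"]):
    print(" ".join(g), "->", haplotype_pairs(g))
```

Only the two-locus and three-locus heterozygous genotypes return more than one pair, which is exactly where phase is ambiguous.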

Rule-based Approaches (Parsimony & Phylogeny)

Search for an optimal set of haplotypes that satisfies specific rules

Parsimony Approaches: Clark's Algorithm

1. List all unambiguous haplotypes

2. Resolve ambiguous individuals one by one using the listed haplotypes

3. If an individual is only half-resolved, add the new haplotype to the list

4. Repeat 2 & 3

5. Stop when no one else can be resolved

Example

Haplotype list: ABC, abc, abC, Abc (abC and Abc are added as the individuals below are resolved)

AaBbCC => ABC//abC

AABbCc => ABC//Abc

Continue ... until no one else can be resolved

Parsimony rules: maximum resolution of genotypes and/or minimum set of haplotypes

Clark, 1990, Mol. Biol. Evol., 7(2): 111-122
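The five steps can be sketched in a few lines of Python. This is an illustrative greedy version, not Clark's published implementation: the function names and the genotype encoding (one two-letter string per locus) are assumptions, and the result depends on the order in which individuals and haplotypes are visited.

```python
def resolve_with(genotype, known):
    """Explain `genotype` with the known haplotype; return the complement or None."""
    comp = []
    for (x, y), allele in zip(genotype, known):
        if allele == x:
            comp.append(y)
        elif allele == y:
            comp.append(x)
        else:
            return None  # the known haplotype is incompatible at this locus
    return "".join(comp)


def clark(genotypes):
    """Greedy parsimony phasing; each genotype is a list such as ["Aa", "Bb", "CC"]."""
    known, phased = [], {}
    # Step 1: individuals with at most one heterozygous locus are unambiguous.
    for idx, g in enumerate(genotypes):
        if sum(x != y for x, y in g) <= 1:
            h1 = "".join(x for x, _ in g)
            h2 = "".join(y for _, y in g)
            phased[idx] = (h1, h2)
            for h in (h1, h2):
                if h not in known:
                    known.append(h)
    # Steps 2-5: keep resolving ambiguous individuals until a pass makes no progress.
    progress = True
    while progress:
        progress = False
        for idx, g in enumerate(genotypes):
            if idx in phased:
                continue
            for h in known:
                comp = resolve_with(g, h)
                if comp is not None:
                    phased[idx] = (h, comp)
                    if comp not in known:
                        known.append(comp)  # step 3: add the new haplotype
                    progress = True
                    break
    return phased, known


sample = [["AA", "BB", "CC"], ["aa", "bb", "cc"], ["Aa", "Bb", "CC"], ["AA", "Bb", "Cc"]]
phased, known = clark(sample)
print(phased)
print(known)
```

With the two homozygous individuals supplying ABC and abc, the sketch resolves AaBbCC as ABC//abC and AABbCc as ABC//Abc, matching the example on the slide. Individuals that never match a listed haplotype stay unresolved.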

Phylogeny Approaches

Given a set of genotypes, find a set of explaining haplotypes that defines a perfect phylogeny.

Perfect Phylogeny Haplotype (PPH) rule: coalescent rule (no recombination; infinite-sites mutation, i.e., each site mutates at most once)

D. Gusfield. 2002. Proc. of the 6th Annual Inter. Conf. on Res. in Comput. Mol. Biology, p166–175.

Probability-based Approaches (EM & MCMC)

Calculate the probability of haplotypes conditional on genotypes: Pr(H|G)=?

Data Structure for Haplotyping

Genotypes (rows: subjects 1, 2, 3, ...; columns: loci A, B, C, ...)

Subject   Locus A   Locus B   Locus C   ...
1         G1,A      G1,B      G1,C      ...
2         G2,A      G2,B      G2,C      ...
3         G3,A      G3,B      G3,C      ...
...       ...       ...       ...       ...

Between loci (A, B, C): linkage, haplotypes
Between subjects: genetic relationship
Gene/haplotype frequencies, HWE, LD

HWE & LD

Hardy-Weinberg Equilibrium (HWE) vs. Hardy-Weinberg Disequilibrium (HWD)
HWE: random combination of alleles from the same locus
Under HWE, allele freq. determines genotype freq.
HWE => Pr(AA)=Pr(A)*Pr(A), Pr(aa)=Pr(a)*Pr(a), Pr(Aa)=2*Pr(A)*Pr(a)

Linkage Equilibrium (LE) vs. Linkage Disequilibrium (LD)
LE: random combination of alleles from different loci
LD: association between alleles from different loci
Under LE, allele freq. determines haplotype freq.
LE => Pr(ABC)=Pr(A)*Pr(B)*Pr(C)
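The two product rules are easy to check numerically. The sketch below uses hypothetical function names; the coefficient D = Pr(AB) - Pr(A)*Pr(B) is a standard LD measure that is not defined on the slide and is added here only as an illustration of departure from LE.

```python
def hwe_genotype_freqs(p_A):
    """Genotype frequencies at one biallelic locus under HWE."""
    p_a = 1.0 - p_A
    return {"AA": p_A * p_A, "Aa": 2 * p_A * p_a, "aa": p_a * p_a}


def le_haplotype_freq(allele_freqs):
    """Haplotype frequency under LE: the product of its allele frequencies."""
    prob = 1.0
    for p in allele_freqs:
        prob *= p
    return prob


def ld_coefficient(p_AB, p_A, p_B):
    """D = Pr(AB) - Pr(A)*Pr(B); D is 0 under linkage equilibrium."""
    return p_AB - p_A * p_B


print(hwe_genotype_freqs(0.6))
print(le_haplotype_freq([0.5, 0.4, 0.1]))
print(ld_coefficient(0.25, 0.5, 0.5), ld_coefficient(0.01, 0.5, 0.5))
```

A haplotype frequency of 0.25 with allele frequencies of 0.5 gives D = 0 (LE), while a frequency of 0.01 with the same allele frequencies gives a strongly negative D (LD).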

Genetic Relationship (R) & Linkage (r)

[Figure: example pedigrees with genotypes AaBb, AABB and aabb]
AaBb, phase unknown: AB//ab or aB//Ab
AaBb, phase resolved through relatives: AB//ab
Offspring phase: (if r=0) AB//ab; (if r>0) AB//ab, Ab//aB

Recombination rate (r)
r = 0, complete linkage
0 < r < 0.5, incomplete linkage
r = 0.5, no linkage

Haplotyping & Conditional Probability

AaBB: Pr(AB//aB)=1

AABb: Pr(AB//Ab)=1

AaBb: Pr(AB//ab)=0.5, Pr(Ab//aB)=0.5

Population sample of genotypes:

AABB, aabb, AABB, aabb, AABB, AABb, aabb
AaBB, aabb, AABB, AABB, AABB, AABB, aabb
aabb, AABB, AABB, AABB, AaBb, AABB, aabb
aabb, AABB, AABB, aabb, AABB, aabb, AABB ...

Pr(AB//ab)=Pr(Ab//aB)=0.5 ?

HWE or HWD?

LD or LE?

P(H|G)=?

P(H|G, R, r)=?

EM Algorithm for unrelated individuals

Pr(H|G,F)=?

Pr(AB)=0.25, Pr(Ab)=0.25, Pr(aB)=0.25, Pr(ab)=0.25

OR

Pr(AB)=0.01, Pr(Ab)=0.49, Pr(aB)=0.49, Pr(ab)=0.01

AaBb: Pr(AB//ab)=? Pr(Ab//aB)=?

Excoffier et al., 1995, Mol. Biol. Evol., 12(5): 921-927

Hawley et al., 1995, J Hered., 86:409-411 (software: HAPLO)

Likelihood: L(G|F)

Haplotypes: $H = (H_1, H_2, \ldots, H_h)$

Haplotype frequencies: $F = (f_1, f_2, \ldots, f_h)$, with the constraint $\sum_{i=1}^{h} f_i = 1$

Genotypes: $G = (G_1, G_2, \ldots, G_g)$

Haplotype-genotype compatibility index of the k-th individual:

$$c_{ab,k} = \begin{cases} 1 & \text{if } (H_a // H_b) \Rightarrow G_k \\ 0 & \text{if } (H_a // H_b) \not\Rightarrow G_k \end{cases}$$

Probability of the k-th individual's G given F & HWE:

$$\Pr(G_k \mid F) = \sum_{a=1}^{h} \sum_{b=1}^{h} c_{ab,k}\, f_a f_b$$

Joint likelihood of G given F:

$$L(G \mid F) = \prod_{k=1}^{g} \Pr(G_k \mid F) = \prod_{k=1}^{g} \sum_{a=1}^{h} \sum_{b=1}^{h} c_{ab,k}\, f_a f_b$$

F=? => Max. L(G|F)

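EM Algorithm: Maximum Likelihood Estimation of Haplotype Freq.

Lagrange multiplier: to find $\max\{q(x)\}$ subject to $g(x) = c$, define

$$Q(x, \lambda) = q(x) + \lambda\,(g(x) - c), \qquad \frac{\partial Q}{\partial x} = 0, \quad \frac{\partial Q}{\partial \lambda} = 0$$

For haplotype frequencies:

$$g(F) = \sum_{i=1}^{h} f_i - 1 = 0$$

$$q(F) = \log(L(G \mid F)) = \sum_{k=1}^{g} \log\Big(\sum_{a=1}^{h} \sum_{b=1}^{h} c_{ab,k}\, f_a f_b\Big)$$

$$Q(F, \lambda) = \sum_{k=1}^{g} \log\Big(\sum_{a=1}^{h} \sum_{b=1}^{h} c_{ab,k}\, f_a f_b\Big) + \lambda\Big(\sum_{i=1}^{h} f_i - 1\Big)$$

Partial derivative equations:

$$\frac{\partial Q}{\partial f_i} = 0 \;\; (i = 1, \ldots, h), \qquad \frac{\partial Q}{\partial \lambda} = 0$$

Solution:

$$f_i = \frac{1}{2g} \sum_{k=1}^{g} \frac{\sum_{a=1}^{h} \sum_{b=1}^{h} z_{iab}\, c_{ab,k}\, f_a f_b}{\sum_{a=1}^{h} \sum_{b=1}^{h} c_{ab,k}\, f_a f_b}$$

EM recursion:

$$f_i^{(t+1)} = \frac{1}{2g} \sum_{k=1}^{g} \frac{\sum_{a=1}^{h} \sum_{b=1}^{h} z_{iab}\, c_{ab,k}\, f_a^{(t)} f_b^{(t)}}{\sum_{a=1}^{h} \sum_{b=1}^{h} c_{ab,k}\, f_a^{(t)} f_b^{(t)}}$$

z=1 if i in (a,b), or z=0 (a homozygous pair $H_i // H_i$ carries two copies of $H_i$ and counts twice, so the frequencies sum to 1)

c=1 if (a,b)=>G, or c=0

$$F^{(0)} \rightarrow \Pr(H_{a,b} \mid G, F^{(0)}) \rightarrow F^{(1)} \rightarrow \Pr(H_{a,b} \mid G, F^{(1)}) \rightarrow \cdots \rightarrow F^{(t)} \rightarrow \Pr(H_{a,b} \mid G, F^{(t)}) \rightarrow F^{(t+1)} \rightarrow \cdots$$

Prior => Expectation => Maximization => E ... M => E => M ...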
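A compact Python sketch of this recursion follows. It is an illustration under stated assumptions, not the code of HAPLO or any other package: genotypes are encoded as one two-letter string per locus, the starting point F(0) is uniform, and the names `compatible_pairs` and `em_haplotype_freqs` are hypothetical. Enumerating ordered pairs (a, b) mirrors the double sum over a and b, so a homozygous pair automatically contributes two copies.

```python
from itertools import product


def compatible_pairs(genotype):
    """Ordered haplotype pairs (a, b) with c_ab = 1 for this unphased genotype."""
    options = [[(x, y)] if x == y else [(x, y), (y, x)] for x, y in genotype]
    pairs = []
    for combo in product(*options):
        a = "".join(x for x, _ in combo)
        b = "".join(y for _, y in combo)
        pairs.append((a, b))
    return pairs


def em_haplotype_freqs(genotypes, n_iter=100, tol=1e-10):
    """Estimate haplotype frequencies from unrelated individuals by EM."""
    pairs_per_ind = [compatible_pairs(g) for g in genotypes]
    haps = sorted({h for pairs in pairs_per_ind for pair in pairs for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}  # F(0): uniform start (an assumption)
    g = len(genotypes)
    for _ in range(n_iter):
        expected = dict.fromkeys(haps, 0.0)
        for pairs in pairs_per_ind:
            # E-step: weight each compatible pair by f_a * f_b and normalise.
            weights = [freq[a] * freq[b] for a, b in pairs]
            total = sum(weights)  # Pr(G_k | F)
            for (a, b), w in zip(pairs, weights):
                expected[a] += w / total
                expected[b] += w / total
        # M-step: expected haplotype counts divided by the 2g chromosomes.
        new_freq = {h: expected[h] / (2 * g) for h in haps}
        change = max(abs(new_freq[h] - freq[h]) for h in haps)
        freq = new_freq
        if change < tol:
            break
    return freq


sample = [["AA", "BB"]] * 4 + [["aa", "bb"]] * 4 + [["Aa", "Bb"]] * 2
print(em_haplotype_freqs(sample))
```

In this toy sample the homozygotes only support AB and ab, so the estimates for Ab and aB shrink toward zero and the double heterozygotes are effectively phased as AB//ab. Like any EM run, the result can depend on the starting frequencies.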

Posterior Probability of Haplotype

$$\Pr(H_{a,b} \mid G_k, F) = \frac{\Pr(H_{a,b} \mid G_k) \cdot \Pr(F_{a,b})}{\sum_{(a,b)} \Pr(H_{a,b} \mid G_k) \cdot \Pr(F_{a,b})}$$

$$\Pr(F_{a,b}) = \Pr(H_a) \cdot \Pr(H_b) = f_a \cdot f_b$$

Prior Prob.: $\Pr(H_{a,b} \mid G_k)$

Posterior Prob.: $\Pr(H_{a,b} \mid G_k, F)$

Example

G: DdEe

H: $H_1 = DE$, $H_2 = De$, $H_3 = dE$, $H_4 = de$

$\Pr(H_{1,4} \mid G) = \Pr(DE//de \mid DdEe) = 0.5$

$\Pr(H_{2,3} \mid G) = \Pr(De//dE \mid DdEe) = 0.5$

F: $f_1, f_2, f_3, f_4 = 0.4, 0.1, 0.1, 0.4$

$$\Pr(H_{1,4} \mid G, F) = \frac{\Pr(H_{1,4} \mid G) \cdot f_1 \cdot f_4}{\Pr(H_{1,4} \mid G) \cdot f_1 \cdot f_4 + \Pr(H_{2,3} \mid G) \cdot f_2 \cdot f_3} = \frac{0.5 \cdot 0.4 \cdot 0.4}{0.5 \cdot 0.4 \cdot 0.4 + 0.5 \cdot 0.1 \cdot 0.1} = \frac{0.08}{0.08 + 0.005} = 0.9412$$

$$\Pr(H_{2,3} \mid G, F) = 0.0588$$
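The DdEe example can be verified with a few lines of Python. This is a minimal sketch; the function name `phase_posterior` and the dictionary layout are assumptions made for the illustration.

```python
def phase_posterior(prior, freq):
    """Combine Pr(H_ab | G) with f_a * f_b and normalise over all compatible pairs."""
    weights = {pair: p * freq[pair[0]] * freq[pair[1]] for pair, p in prior.items()}
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}


# Haplotype frequencies F and the prior phase probabilities for genotype DdEe.
freq = {"DE": 0.4, "De": 0.1, "dE": 0.1, "de": 0.4}
prior = {("DE", "de"): 0.5, ("De", "dE"): 0.5}

posterior = phase_posterior(prior, freq)
for (a, b), p in posterior.items():
    print(f"{a}//{b}: {p:.4f}")
```

The two posterior probabilities come out as 0.9412 for DE//de and 0.0588 for De//dE, so knowing the haplotype frequencies turns an even prior into a confident phase call.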

Limitation of EM Algorithm

For a diploid (2n) organism, a genotype with L heterozygous markers may have 2^L possible haplotypes, so EM is impractical for large L

When L=20, 2^L=1,048,576 ... a large space of F

Only suitable for a small number of loci, 2~12

Subsetting approaches (partition-ligation, block partitioning, etc.) have been used to reduce the computational burden ...

MCMC

Markov Chain Monte Carlo Algorithm for unrelated individuals

by sampling from Pr(H|G,F)

Stephens et al., 2001, Am. J. Hum. Genet., 68:978-989 (software: PHASE)

Markov Chain

$$H^{(0)} = (H_{G_1}^{(0)}, H_{G_2}^{(0)}, \ldots, H_{G_k}^{(0)}, \ldots, H_{G_g}^{(0)})$$

$$\Pr(H_{G_1} \mid G_1, H_{-G_1}^{(0)}) \;\Rightarrow\; (H_{G_1}^{(1)}, H_{G_2}^{(0)}, \ldots, H_{G_k}^{(0)}, \ldots, H_{G_g}^{(0)})$$

$$\Pr(H_{G_2} \mid G_2, H_{-G_2}) \;\Rightarrow\; (H_{G_1}^{(1)}, H_{G_2}^{(1)}, \ldots, H_{G_k}^{(0)}, \ldots, H_{G_g}^{(0)})$$

$$\cdots$$

$$H^{(1)} = (H_{G_1}^{(1)}, H_{G_2}^{(1)}, \ldots, H_{G_k}^{(1)}, \ldots, H_{G_g}^{(1)})$$

$$\Pr(H_{G_1} \mid G_1, H_{-G_1}^{(1)}) \;\Rightarrow\; (H_{G_1}^{(2)}, H_{G_2}^{(1)}, \ldots, H_{G_k}^{(1)}, \ldots, H_{G_g}^{(1)})$$

$$\cdots$$

$$H^{(t)} = (H_{G_1}^{(t)}, H_{G_2}^{(t)}, \ldots, H_{G_k}^{(t)}, \ldots, H_{G_g}^{(t)}), \qquad t = 1, \ldots, N$$

Here $H_{-G_k}$ denotes the current haplotypes of all individuals except the k-th.

MCMC Estimation

Random sampling based on $\Pr(H_{G_k} \mid G_k, H_{-G_k})$

Repeat many times

After getting close to the stationary distribution of P(H|G)

Collect samples

Average over samples
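The sampling loop can be illustrated with a deliberately simplified Gibbs sampler. This sketch is not the PHASE algorithm: instead of a coalescent-based conditional, it assumes a pseudo-count rule in which a pair (a, b) is drawn with probability proportional to (n_a + alpha) * (n_b + alpha), where n_a and n_b count the haplotypes currently assigned to the other individuals. The names `gibbs_phase`, `unordered_pairs` and `alpha` are hypothetical.

```python
import random
from collections import Counter
from itertools import product


def unordered_pairs(genotype):
    """All distinct haplotype pairs compatible with an unphased genotype."""
    options = [[(x, y)] if x == y else [(x, y), (y, x)] for x, y in genotype]
    pairs = set()
    for combo in product(*options):
        a = "".join(x for x, _ in combo)
        b = "".join(y for _, y in combo)
        pairs.add(tuple(sorted((a, b))))
    return sorted(pairs)


def gibbs_phase(genotypes, n_sweeps=200, burn_in=50, alpha=0.1, seed=0):
    """Return, per individual, the sampled frequency of each haplotype pair."""
    rng = random.Random(seed)
    candidates = [unordered_pairs(g) for g in genotypes]
    state = [rng.choice(c) for c in candidates]          # H(0): random start
    counts = Counter(h for pair in state for h in pair)  # haplotype counts in H
    tallies = [Counter() for _ in genotypes]
    for sweep in range(n_sweeps):
        for k, cands in enumerate(candidates):
            counts.subtract(state[k])                    # remove individual k from H
            # Assumed conditional: weight proportional to (n_a + alpha) * (n_b + alpha).
            weights = [(counts[a] + alpha) * (counts[b] + alpha) for a, b in cands]
            state[k] = rng.choices(cands, weights=weights)[0]
            counts.update(state[k])                      # put the new pair back
            if sweep >= burn_in:
                tallies[k][state[k]] += 1                # collect samples after burn-in
    n_kept = n_sweeps - burn_in
    return [{pair: n / n_kept for pair, n in t.items()} for t in tallies]


sample = [["AA", "BB"]] * 10 + [["aa", "bb"]] * 10 + [["Aa", "Bb"]]
result = gibbs_phase(sample)
print(result[20])
```

Averaging the collected samples gives an empirical posterior for each individual. In this toy data set the single double heterozygote is assigned AB//ab in nearly every retained sweep because the other individuals carry only AB and ab.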

Transition Probability $\Pr(H_{G_k} \mid G_k, H_{-G_k})$

Coalescent hypothesis, mutation rate, M haplotypes

Given $H^{(t)}$ of L loci for all G:

list $H = (H_1, H_2, \ldots, H_m)$

count $n = (n_1, n_2, \ldots, n_m)$

pick $G_k$

remove $H_{a,b}^{(t),G_k}$ from H

if $G_k \not\Rightarrow H_i$, then $p_i = 0$

if $G_k \Rightarrow H_i$, then $G_k \Rightarrow (H_i, H_i')$ and check $H_i'$: $p_i$ is computed from the counts $n_i$, $n_j$ and M, depending on whether $H_i'$ equals a listed haplotype $H_j$ (formulas not legible in the transcript)

Finally get $p = (p_1, p_2, \ldots)$

Construct the new pair $H_{a,b}^{(t+1),G_k}$: pick a listed haplotype with prob. based on p, or randomly choose a phase (probabilities not legible in the transcript)

Add the newly constructed haplotype to list H, pick $G_{k+1}$ ...

Subsetting loci, reducing time

EM vs. MCMC

EM
Search F, Max. L(G|F)
Haplo. freq. => Haplo. construction
Maximum likelihood approach
"Analytical" posterior distribution
Fewer loci
Convergence: local maximum

MCMC
Sample from Pr(H|G,F)
Haplo. construction => Haplo. freq.
Sampling approach
"Empirical" posterior distribution
More loci
Better convergence: whole parameter space (more computer time)

EM Algorithm for family data

(no recombination, r=0)

Pr(H{fam.}|G,R,F)=?

Rohde et al., 2001, Human Mutation, 17: 289-295 (software: HAPLO)

Becher et al., 2004, Genetic Epidemiology, 27:21-32 (software: FAMHAP)

O'Connell, 2000, Genetic Epidemiology, 19(Suppl 1):S64-S70 (software: ZAPLO)

Haplotype Configuration of Family

Genotypes

Parent 1   Parent 2   Child
AaBb       AaBb       AaBb

Possible Haplotype Configurations

Parent 1   Parent 2   Child
AB//ab     AB//ab     AB//ab
Ab//aB     Ab//aB     Ab//aB
AB//ab     AB//ab     Ab//aB   (recombinant; as r=0 or nearly 0, impossible or very low prob., ignored)
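The non-recombinant configurations can be enumerated mechanically. The sketch below assumes a single trio, r = 0, and genotypes encoded as one two-letter string per locus; the function names are hypothetical and are not taken from HAPLO, FAMHAP or ZAPLO.

```python
from itertools import product


def phasings(genotype):
    """Unordered haplotype pairs compatible with an unphased genotype."""
    options = [[(x, y)] if x == y else [(x, y), (y, x)] for x, y in genotype]
    found = set()
    for combo in product(*options):
        a = "".join(x for x, _ in combo)
        b = "".join(y for _, y in combo)
        found.add(tuple(sorted((a, b))))
    return sorted(found)


def trio_configurations(parent1, parent2, child):
    """Phase configurations of a trio when recombination is ignored (r = 0)."""
    child_phasings = set(phasings(child))
    configs = set()
    for p1, p2 in product(phasings(parent1), phasings(parent2)):
        # With r = 0 each parent transmits one of its two haplotypes intact.
        for h1, h2 in product(p1, p2):
            kid = tuple(sorted((h1, h2)))
            if kid in child_phasings:
                configs.add((p1, p2, kid))
    return sorted(configs)


g = ["Aa", "Bb"]
configs = trio_configurations(g, g, g)
for p1, p2, kid in configs:
    print("//".join(p1), "//".join(p2), "->", "//".join(kid))
```

For three AaBb genotypes only two configurations survive, the same two non-recombinant rows listed on the slide. Mixed parental phases (AB//ab with Ab//aB) cannot produce an AaBb child without recombination.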

EM Algorithm: Haplotype Freq. Estimation using Nuclear Families

Unrelated individuals:

$$f_i^{(t+1)} = \frac{1}{2g} \sum_{k=1}^{g} \frac{\sum_{a=1}^{h} \sum_{b=1}^{h} z_{iab}\, c_{ab,k}\, f_a^{(t)} f_b^{(t)}}{\sum_{a=1}^{h} \sum_{b=1}^{h} c_{ab,k}\, f_a^{(t)} f_b^{(t)}}$$

Nuclear families:

$$f_i^{(t+1)} = \frac{1}{4 N_{fam}} \sum_{fam=1}^{N_{fam}} \frac{\sum_{a_1=1}^{h} \sum_{b_1=1}^{h} \sum_{a_2=1}^{h} \sum_{b_2=1}^{h} z_{i,a_1 b_1 a_2 b_2}\, c_{a_1 b_1 a_2 b_2, fam}\, f_{a_1}^{(t)} f_{b_1}^{(t)} f_{a_2}^{(t)} f_{b_2}^{(t)}}{\sum_{a_1=1}^{h} \sum_{b_1=1}^{h} \sum_{a_2=1}^{h} \sum_{b_2=1}^{h} c_{a_1 b_1 a_2 b_2, fam}\, f_{a_1}^{(t)} f_{b_1}^{(t)} f_{a_2}^{(t)} f_{b_2}^{(t)}}$$

Tips:

Only use parents to calculate haplotype freq. (f)

Use parents' + children's info to determine compatibility (c)

EM Algorithm: Haplotype Freq. Estimation for General Pedigrees

$$f_i^{(t+1)} = \frac{1}{2n'} \sum_{fam=1}^{N_{fam}} \frac{\sum_{a_1, b_1, a_2, b_2, \ldots, a_n, b_n} z_{i,a_1 b_1 a_2 b_2 \ldots a_n b_n}\, c_{a_1 b_1 a_2 b_2 \ldots a_n b_n, fam}\, f_{a_1}^{(t)} f_{b_1}^{(t)} f_{a_2}^{(t)} f_{b_2}^{(t)} \cdots f_{a_n}^{(t)} f_{b_n}^{(t)}}{\sum_{a_1, b_1, a_2, b_2, \ldots, a_n, b_n} c_{a_1 b_1 a_2 b_2 \ldots a_n b_n, fam}\, f_{a_1}^{(t)} f_{b_1}^{(t)} f_{a_2}^{(t)} f_{b_2}^{(t)} \cdots f_{a_n}^{(t)} f_{b_n}^{(t)}}$$

where n is the number of founders in a family and $n'$ is the total number of founders over all families (so $2n'$ founder haplotypes are counted).

Tips:

Only use founders to calculate haplotype freq. (f)

Use all members (founders & non-founders) to determine compatibility (c)

Discard configurations with very small probabilities to save time

Posterior Probability of Haplotype Configuration

Nuclear Family

$$\Pr(H_{fam,k} \mid G_{fam}, F) = \frac{\Pr(H_{fam,k} \mid G_{fam}) \cdot \Pr(F_{parents})}{\sum_{\text{all configs.}} \Pr(H_{fam,k} \mid G_{fam}) \cdot \Pr(F_{parents})}$$

$$\Pr(F_{parents}) = \Pr(H_{a_1}) \cdot \Pr(H_{b_1}) \cdot \Pr(H_{a_2}) \cdot \Pr(H_{b_2}) = f_{a_1} \cdot f_{b_1} \cdot f_{a_2} \cdot f_{b_2}$$

(subscript 1: Dad, subscript 2: Mom)

General Family

$$\Pr(H_{fam,k} \mid G_{fam}, F) = \frac{\Pr(H_{fam,k} \mid G_{fam}) \cdot \Pr(F_{founders})}{\sum_{\text{all configs.}} \Pr(H_{fam,k} \mid G_{fam}) \cdot \Pr(F_{founders})}$$

$$\Pr(F_{founders}) = \prod_{j=1}^{N_{founders}} \Pr(H_{a_j}) \cdot \Pr(H_{b_j}) = \prod_{j=1}^{N_{founders}} f_{a_j} \cdot f_{b_j}$$

A Middle Summary ... Subject-oriented Algorithms

Joint Prob. / Likelihood over loci A, B, C:

Unrelated individuals: indiv. by indiv.

Families (r=0): family by family

Large/General Pedigree & Allowing Recombination (r>0) ?

Next ... Locus-oriented Algorithm (Lander-Green)

For Large/General Pedigree Data & Allowing Recombination (r>0)

Joint Prob. / Likelihood: locus by locus (A, B, C) within a pedigree

Inheritance Vector (V) of a pedigree

Lander & Green, 1987, Proc. Natl. Acad. Sci., 84: 2363-2367

Kruglyak et al., 1996, Am. J. Hum. Genet., 58:1347-1363 (software: GENEHUNTER)

Abecasis et al., 2005, Am. J. Hum. Genet., 77:754-767 (software: MERLIN)

Sobel et al., 1996, Am. J. Hum. Genet., 58:1323-1337 (software: SIMWALK2)


Inheritance Vector & Haplotype

5: AaBb

1101 AB//ab 1101

1101 Ab//aB 1111

Lander-Green Algorithm

One pedigree; loci A, B, C, ...

Hidden states (inheritance vectors): VA => VB => VC => ...

Transition Prob. = f(r): Pr(VB|VA), Pr(VC|VB), ..., Pr(Vt+1|Vt)

Observations (genotypes): GA, GB, GC, ...

Emission Prob.: Pr(GA|VA), Pr(GB|VB), Pr(GC|VC)
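The transition probability as a function of r can be sketched as follows. The sketch assumes the usual Lander-Green form, in which each bit of the inheritance vector (one per meiosis) switches independently with probability r between adjacent loci, so that Pr(V_next | V_current) = r^d * (1 - r)^(n - d) for n bits with d of them changed. A single sex-averaged r and the function names are assumptions made for the illustration.

```python
from itertools import product


def transition_prob(v_from, v_to, r):
    """Pr(V_next | V_current) for inheritance vectors given as bit strings."""
    changed = sum(a != b for a, b in zip(v_from, v_to))  # meioses that switched
    same = len(v_from) - changed
    return (r ** changed) * ((1.0 - r) ** same)


def transition_matrix(n_bits, r):
    """All inheritance vectors of length n_bits and their transition matrix."""
    states = ["".join(bits) for bits in product("01", repeat=n_bits)]
    matrix = [[transition_prob(s, t, r) for t in states] for s in states]
    return states, matrix


print(transition_prob("1101", "1101", 0.1))  # no meiosis switched
print(transition_prob("1101", "1111", 0.1))  # one meiosis switched
states, matrix = transition_matrix(4, 0.1)
print(len(states), sum(matrix[0]))
```

With r = 0 the vector cannot change between loci, and with r = 0.5 every vector is equally likely at the next locus, which matches the complete-linkage and no-linkage limits. Emission probabilities Pr(G|V) depend on the pedigree structure and are left out of this sketch.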

Lander-Green Algorithm Based (or Similar) Approaches

Kruglyak et al., 1996, Am. J. Hum. Genet., 58:1347-1363 (software: GENEHUNTER)
Viterbi algorithm, the best haplotype configuration

Sobel et al., 1996, Am. J. Hum. Genet., 58:1323-1337 (software: SIMWALK2)
MCMC: Annealing & Metropolis Process

Abecasis et al., 2005, Am. J. Hum. Genet., 77:754-767 (software: MERLIN)
Allowing LD & Marker Cluster/Block

Haplotyping based on sequencing data

(can be done for an individual subject with no population data)

Rationale

Bansal et al. Genome Res. 2008 August; 18(8): 1336–1346.

Data Structure

Bansal et al. Genome Res. 2008 August; 18(8): 1336–1346.

Algorithms

ML

Or MCMC when H space is huge

Bansal et al. Genome Res. 2008 August; 18(8): 1336–1346.

Prob(sequence | haplotype)

Labeled parts of the equation:

observed sequence

haplotype

Sequencing/mapping error

Indicator: =1 if observed sequence X matches the assumed haplotype, =0 otherwise (for the j-th variant site of the i-th fragment)

Bansal et al. Genome Res. 2008 August; 18(8): 1336–1346.
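The slide's equation is not preserved in the transcript, so the sketch below assumes a commonly used form of the fragment likelihood: each covered variant site contributes (1 - q) when the observed allele matches the assumed haplotype and q when it does not, where q is the sequencing/mapping error probability at that site. The 0/1 allele coding, the dictionary layout and the function names are assumptions for the illustration.

```python
def fragment_likelihood(fragment, haplotype, error):
    """Pr(fragment | haplotype) with independent per-site error probabilities.

    fragment: {site index: observed allele}, alleles coded 0/1
    haplotype: list of 0/1 alleles over all variant sites
    error: {site index: error probability q}
    """
    prob = 1.0
    for j, allele in fragment.items():
        match = 1 if allele == haplotype[j] else 0  # the 0/1 indicator
        q = error[j]
        prob *= match * (1.0 - q) + (1 - match) * q
    return prob


def fragment_likelihood_pair(fragment, haplotype, error):
    """Average over the two haplotypes of an individual (equal sampling assumed)."""
    complement = [1 - a for a in haplotype]
    return 0.5 * (fragment_likelihood(fragment, haplotype, error)
                  + fragment_likelihood(fragment, complement, error))


fragment = {0: 0, 1: 1}        # a read covering variant sites 0 and 1
error = {0: 0.01, 1: 0.01}
print(fragment_likelihood(fragment, [0, 1, 1], error))
print(fragment_likelihood(fragment, [1, 0, 0], error))
print(fragment_likelihood_pair(fragment, [0, 1, 1], error))
```

A fragment that matches the assumed haplotype at both sites is far more likely than one that mismatches at both, and multiplying such terms over all fragments gives the quantity that ML or MCMC searches over.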

Markov Chain

Sampling H from .

Bansal et al. Genome Res. 2008 August; 18(8): 1336–1346.

Practices

(1) If a child's genotype at 4 loci is AaBbCcDD, list all possible haplotype pairs of the child and calculate the probability of each pair, given no extra information.

(2) If you know that his/her father's genotype is also AaBbCcDD and the mother's is AaBbCCDD, list all possible haplotype configurations of his/her family and calculate the probability of each configuration. (Assume recombination rate r=0)

(3) If you know the population haplotype frequencies below:
ABCD (0.2), ABcD (0.1), AbcD (0.1), aBCD (0.1), aBcD (0.2), abcD (0.3)
calculate the posterior probabilities in (1).

Within a week, send your answers to (E-mail: [email protected])