Hidden Markov Models (1)


Page 1: Hidden Markov Models  (1)

Hidden Markov Models (1)

Brief review of discrete time finite Markov Chain
Hidden Markov Model
Examples of HMM in Bioinformatics
Estimations

Basic Local Alignment Search Tool (BLAST)
The strategy
Important parameters
Example
Variations

Page 2: Hidden Markov Models  (1)

BLAST
“Basic Local Alignment Search Tool”

Given a sequence, how to find its ‘relatives’ in a database of millions of sequences?

An exhaustive search of the database using pairwise alignment is computationally infeasible.

BLAST is a heuristic method that finds good, but not necessarily optimal, matches. It is fast.

Basis: good alignments should contain “words” that match each other very well (identical or with very high alignment score).

Page 3: Hidden Markov Models  (1)

BLAST

Pandit et al. In Silico Biology 4, 0047

Page 4: Hidden Markov Models  (1)

BLAST
The scoring matrix. Example: BLOSUM62

The expected score of two random sequences aligned to each other is negative. High scores of alignments between two random sequences follow the Extreme Value Distribution.

Σ_{i,j} p_i p_j S_ij < 0

Page 5: Hidden Markov Models  (1)

BLAST

The general idea of the original BLAST:

Find word matches in the database. This is done efficiently using a pre-processed database and a hashing technique.

Extend the match to find longer local alignments.

Report results exceeding a predefined threshold.
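To make the strategy concrete, below is a minimal Python sketch of the seed-and-extend idea (not the NCBI implementation): the database sequence is pre-indexed by words, the query's words are looked up in that index, and each hit is extended ungapped while the running score stays within a drop-off of the best score seen so far. The match/mismatch scores and the drop-off value are illustrative assumptions; for proteins, BLAST actually seeds on neighborhood words scoring at least t against the query word, whereas exact word matches are used here to keep the sketch short.

```python
# A minimal sketch of the BLAST seeding-and-extension idea (illustration only).
from collections import defaultdict

def score(a, b):
    return 2 if a == b else -1              # toy match/mismatch scores

def index_words(db_seq, w):
    """Hash every w-letter word of the database sequence to its positions."""
    table = defaultdict(list)
    for i in range(len(db_seq) - w + 1):
        table[db_seq[i:i + w]].append(i)
    return table

def extend_hit(query, db_seq, qpos, dpos, w, xdrop=5):
    """Ungapped extension of a word hit to the right, with an X-drop stop rule."""
    best = cur = sum(score(a, b) for a, b in zip(query[qpos:qpos + w],
                                                 db_seq[dpos:dpos + w]))
    qi, di = qpos + w, dpos + w
    while qi < len(query) and di < len(db_seq) and cur > best - xdrop:
        cur += score(query[qi], db_seq[di])
        best = max(best, cur)
        qi, di = qi + 1, di + 1
    # (a full implementation would also extend to the left)
    return best

def blast_like_search(query, db_seq, w=3):
    table = index_words(db_seq, w)
    hits = []
    for q in range(len(query) - w + 1):
        for d in table.get(query[q:q + w], []):   # exact word match as seed
            hits.append((q, d, extend_hit(query, db_seq, q, d, w)))
    return sorted(hits, key=lambda h: -h[2])      # best extensions first

print(blast_like_search("ACGTTGCA", "TTACGTTGCATT"))
```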

Page 6: Hidden Markov Models  (1)

http://www.cbi.pku.edu.cn/images/blast_algorithm.gif

Page 7: Hidden Markov Models  (1)

BLAST

Two key parameters:

w: word size

Normally a word size of 3~5 is used for proteins. A word size of 3 yields 20^3 = 8000 possible words. A word size of 11~12 is used for nucleotides.

t: threshold of word matching score

Higher t → fewer word matches → lower sensitivity / higher specificity

Page 8: Hidden Markov Models  (1)

BLAST

Page 9: Hidden Markov Models  (1)

BLAST

Threshold for extension

Threshold for reporting significant hit

Page 10: Hidden Markov Models  (1)

BLAST
How significant is a hit?

S: the raw alignment score.

S’: the bit score. A normalized score. Bit scores from different alignments, or different scoring matrices are comparable.

S’ = (λS - ln K) / ln 2
K and λ: constants, adjusting for the scoring matrix

E: the expected value. The number of chance alignments with scores equivalent to or better than S’.

E = mN·2^(-S’)

m: query size
N: size of database
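The two formulas above can be applied directly, as in the short sketch below. The λ and K values are placeholders for illustration only (real values depend on the scoring matrix and gap penalties), and the query and database sizes in the example are made up.

```python
import math

# Convert a raw alignment score S to a bit score S' and an E-value,
# following the formulas on the slide:
#   S' = (lambda*S - ln K) / ln 2        E = m * N * 2**(-S')
# lambda and K below are illustrative placeholders, not official constants.

def bit_score(S, lam=0.267, K=0.041):
    return (lam * S - math.log(K)) / math.log(2)

def e_value(S, m, N, lam=0.267, K=0.041):
    return m * N * 2 ** (-bit_score(S, lam, K))

# Example (made-up numbers): raw score 100 for a 250-residue query
# against a database of 5e8 residues.
print(bit_score(100))
print(e_value(100, m=250, N=5e8))
```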

Page 11: Hidden Markov Models  (1)

BLAST

Blasting a sample sequence.

Page 12: Hidden Markov Models  (1)

BLAST

………

Page 13: Hidden Markov Models  (1)

BLAST

Page 14: Hidden Markov Models  (1)

BLAST
The “two-hit” method.

Two sequences with good similarity should share >1 word.

Only extend matches when two hits occur on the same diagonal within a certain distance.

This reduces sensitivity, so it is paired with a lower per-word threshold t.
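A possible sketch of the two-hit rule, assuming word hits are given as (query position, database position) pairs; hits on the same diagonal (database position minus query position) within a window trigger an extension. The window size A and word size w are assumed values.

```python
# Sketch of the two-hit rule: trigger an extension only when two
# non-overlapping word hits fall on the same diagonal within A positions.

def two_hit_seeds(hits, w=3, A=40):
    """hits: list of (query_pos, db_pos) word matches."""
    last_on_diag = {}
    seeds = []
    for qpos, dpos in sorted(hits, key=lambda h: h[1]):
        diag = dpos - qpos
        prev = last_on_diag.get(diag)
        if prev is not None and w <= dpos - prev <= A:
            seeds.append((qpos, dpos))      # second hit triggers extension
        last_on_diag[diag] = dpos
    return seeds

print(two_hit_seeds([(0, 5), (10, 15), (3, 30)]))   # -> [(10, 15)]
```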

Page 15: Hidden Markov Models  (1)

BLAST

There are many variants with different specialties. For example

Gapped BLAST

Allows the detection of a gapped alignment.

PSI-BLAST

“Position Specific Iterated BLAST”
To detect weak relationships.
Constructs a Position Specific Score Matrix automatically.

……

Page 16: Hidden Markov Models  (1)

Discrete time finite Markov Chain

Possible states: a finite discrete set S = {E1, E2, ..., Es}

From time t to t+1, make stochastic movement from one state to another

The Markov Property:

At time t, the process is at Ej. Then at time t+1, the probability that it is at Ek depends only on Ej.

The temporally homogeneous transition probabilities property:

Prob(Ej → Ek) is independent of time t.

Page 17: Hidden Markov Models  (1)

Discrete time finite Markov Chain

The transition probability matrix:

Pij=Prob(Ei → Ej)

An N-step transition: Pij(N) = Prob(Ei → ... → Ej in N steps).

The transition matrix in full:

P = | p11  p12  ...  p1s |
    | p21  p22  ...  p2s |
    | ...  ...  ...  ... |
    | ps1  ps2  ...  pss |

It can be shown that P(N) = P^N.
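A quick numerical check of P(N) = P^N, sketched with numpy on a small made-up two-state transition matrix (the same numbers as the HMM example later in these slides):

```python
import numpy as np

# Toy transition matrix: rows sum to 1.
P = np.array([[0.9, 0.1],
              [0.8, 0.2]])

# Two-step transition probabilities: the (i, j) entry of P @ P is
# Prob(E_i -> E_j in 2 steps), summing over the intermediate state.
P2 = P @ P
P5 = np.linalg.matrix_power(P, 5)   # five-step transition matrix, P^(5) = P^5
print(P2)
print(P5)
```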

Page 18: Hidden Markov Models  (1)

Discrete time finite Markov Chain

Consider the two-step transition probability from Ei to Ej:

pij(2) = Σ_k pik pkj

This is the (i, j)-th element of P^2. So, the two-step transition matrix is P(2) = P^2.

Extending this argument, we have P(N) = P^N.
----------------------------------------------------------------------------
• Absorbing states: pii = 1. Once the chain enters such a state, it stays there. We don't consider these here.

Page 19: Hidden Markov Models  (1)

Discrete time finite Markov Chain

The stationary state:

The probability of being at each state stays constant over time.

π = (π1, π2, π3, ..., πs), with πi ≥ 0 and Σ_i πi = 1, is the stationary state if

πj = Σ_i πi pij for all j (that is, π = πP).

For finite, aperiodic, irreducible Markov chains, π exists and is unique.

Periodic: a state can only be returned to at t0, 2t0, 3t0, ..., where t0 > 1.
Irreducible: any state can eventually be reached from any state.
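Two standard ways to compute π numerically, sketched below for the same toy matrix: power iteration (for an aperiodic, irreducible chain, π^(0) P^n converges to π) and the left eigenvector of P with eigenvalue 1.

```python
import numpy as np

# Toy transition matrix (same as above); find pi with pi = pi P.
P = np.array([[0.9, 0.1],
              [0.8, 0.2]])

# Power iteration: repeatedly apply P to an initial distribution.
pi = np.array([0.5, 0.5])
for _ in range(100):
    pi = pi @ P
print(pi)                      # approx. stationary distribution

# Eigenvector approach: left eigenvector of P for eigenvalue 1.
vals, vecs = np.linalg.eig(P.T)
v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
print(v / v.sum())             # normalized to sum to 1
```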

Page 20: Hidden Markov Models  (1)

Discrete time finite Markov Chain

π = πP = πP^2 = ... = πP^n

Note:

Markov chain is an elegant model in many situations in bioinformatics, e.g. evolutionary process, sequence analysis etc.

BUT the two Markov assumptions may not hold true.

Page 21: Hidden Markov Models  (1)

Hidden Markov Model

More general than Markov model.

A discrete time Markov Model with extra features.

[Diagram: a Markov chain with states E1, E2, E3; an initiation step chooses the starting state, and each state produces emissions such as O1, O2.]

Page 22: Hidden Markov Models  (1)

Hidden Markov Model

The emissions are probabilistic, state-dependent and time-independent.

Example: a two-state chain with states E1 and E2 emitting symbols A and B.

[Diagram] Transition probabilities: E1 → E2: 0.1, E2 → E1: 0.8 (hence E1 → E1: 0.9 and E2 → E2: 0.2).
Emission probabilities: E1 emits A with probability 0.5 and B with 0.5; E2 emits A with 0.25 and B with 0.75.

We don’t observe the states of the Markov Chain, but we observe the emissions.

If both E1 and E2 have the same chance to initiate the chain, and we observe sequence “BBB”, what is the most likely state sequence that the chain went through?

Page 23: Hidden Markov Models  (1)

Hidden Markov Model

Let Q denote the state sequence, and O denote the observed emission. Our goal is to find:

argmax_Q P(Q | O)

P(Q | O) = P(O | Q) P(Q) / P(O) ∝ P(O | Q) P(Q)

O = “BBB”. For each state sequence Q, the value of P(O | Q) P(Q):

E1->E1->E1: 0.5*0.9*0.9*0.5*0.5*0.5 = 0.050625
E1->E1->E2: 0.5*0.9*0.1*0.5*0.5*0.75 = 0.0084375
E1->E2->E1: 0.5*0.1*0.8*0.5*0.75*0.5 = 0.0075
E1->E2->E2: 0.5*0.1*0.2*0.5*0.75*0.75 = 0.0028125
E2->E1->E1: 0.5*0.8*0.9*0.75*0.5*0.5 = 0.0675
E2->E1->E2: 0.5*0.8*0.1*0.75*0.5*0.75 = 0.01125
E2->E2->E1: 0.5*0.2*0.8*0.75*0.75*0.5 = 0.0225
E2->E2->E2: 0.5*0.2*0.2*0.75*0.75*0.75 = 0.0084375
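The same enumeration can be reproduced by brute force; the probabilities below are taken from the example chain on the previous slide.

```python
from itertools import product

# Brute-force evaluation of P(O|Q) P(Q) for every state path,
# reproducing the table above for the observation "BBB".
init  = {"E1": 0.5, "E2": 0.5}
trans = {("E1", "E1"): 0.9, ("E1", "E2"): 0.1,
         ("E2", "E1"): 0.8, ("E2", "E2"): 0.2}
emit  = {"E1": {"A": 0.5,  "B": 0.5},
         "E2": {"A": 0.25, "B": 0.75}}

O = "BBB"
for path in product(["E1", "E2"], repeat=len(O)):
    p = init[path[0]] * emit[path[0]][O[0]]
    for t in range(1, len(O)):
        p *= trans[(path[t - 1], path[t])] * emit[path[t]][O[t]]
    print("->".join(path), p)
```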

Page 24: Hidden Markov Models  (1)

Hidden Markov Model

Parameters: θ = {π, P, B}

π: initial distribution
P: transition probabilities
B: emission probabilities

Common questions:

How to efficiently calculate the probability of the emissions: P(O | θ)

How to find the most likely hidden state sequence: argmax_Q P(Q | O, θ)

How to find the most likely parameters: argmax_θ P(O | θ)

Page 25: Hidden Markov Models  (1)

Hidden Markov Models in Bioinformatics

A generative model for multiple sequence alignment.

Page 26: Hidden Markov Models  (1)

Hidden Markov Models in Bioinformatics

Data segmentation in array-based copy number analysis.

R package aCGH in Bioconductor.

Page 27: Hidden Markov Models  (1)

Hidden Markov Models in Bioinformatics

[Diagram: a three-state copy-number HMM with hidden states Amplified, Normal, and Deleted.]

An over-simplified model.

Page 28: Hidden Markov Models  (1)

The Forward and Backward Algorithm

P(O | θ) = Σ_Q P(O | Q, θ) P(Q | θ)

If there are a total of S states, and we study an emission of T steps of the chain, then #(all possible Q) = S^T.

When either S or T is big, direct calculation is infeasible.

Let O^(t) = (O1, O2, ..., Ot) denote the emissions up to step t.

Let q^(t) denote the hidden state at time t.

Let bi denote the emission probabilities from state Ei.

Consider the “forward variables”:

α(t, i) = P(O^(t), q^(t) = Ei)

i.e., the probability of the emissions up to step t and the chain being at state Ei at step t.

Page 29: Hidden Markov Models  (1)

The Forward and Backward Algorithm

At step 1: α(1, i) = πi bi(O1)

At step t+1: α(t+1, i) = [ Σ_{k=1..S} α(t, k) p_ki ] bi(O_{t+1})

… … …

By induction, we know all α(T, i).

Then P(O) can be found by P(O) = Σ_{i=1..S} α(T, i).

A total of about 2TS^2 computations are needed.
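A compact sketch of the forward recursion for the two-state example used in these slides; the variable names follow the slide notation.

```python
import numpy as np

# Forward recursion: alpha(t+1, i) = [sum_k alpha(t, k) p_ki] * b_i(O_{t+1}).
pi = np.array([0.5, 0.5])                   # initial distribution
P  = np.array([[0.9, 0.1],                  # transition probabilities p_ij
               [0.8, 0.2]])
B  = {"A": np.array([0.5, 0.25]),           # emission probabilities b_i(symbol)
      "B": np.array([0.5, 0.75])}

def forward(obs):
    alpha = pi * B[obs[0]]                  # alpha(1, i) = pi_i b_i(O_1)
    for o in obs[1:]:
        alpha = (alpha @ P) * B[o]          # alpha(t+1, i)
    return alpha

alpha_T = forward("BBB")
print(alpha_T, alpha_T.sum())               # P(O) = sum_i alpha(T, i)
```

The printed total, P("BBB") ≈ 0.179, matches the sum of the eight brute-force terms enumerated earlier.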

Page 30: Hidden Markov Models  (1)

The Forward and Backward Algorithm

Back to our simple example, find P(“BBB”)

Now, what saved us the computing time? It is the reuse of shared components. And what allowed us to do that?

α(1, 1) = 0.5*0.5
α(1, 2) = 0.5*0.75

α(2, 1) = [α(1, 1)*0.9 + α(1, 2)*0.8] * 0.5
α(2, 2) = [α(1, 1)*0.1 + α(1, 2)*0.2] * 0.75

α(3, 1) = [α(2, 1)*0.9 + α(2, 2)*0.8] * 0.5
α(3, 2) = [α(2, 1)*0.1 + α(2, 2)*0.2] * 0.75

The short memory of the Markov Chain.

Page 31: Hidden Markov Models  (1)

The Forward and Backward Algorithm

The backward algorithm:

From T-1 to 1, we can iteratively calculate:

β(t, i) = P(O_{t+1}, O_{t+2}, ..., O_T | q^(t) = Ei)

i.e., the probability of the emissions after step t, given that the chain is at state Ei at step t.

β(T, i) = 1

β(t, i) = Σ_{k=1..S} p_ik b_k(O_{t+1}) β(t+1, k)
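The backward recursion, sketched in the same style for the two-state example.

```python
import numpy as np

# Backward recursion: beta(t, i) = sum_k p_ik b_k(O_{t+1}) beta(t+1, k),
# initialized with beta(T, i) = 1.
P = np.array([[0.9, 0.1],
              [0.8, 0.2]])
B = {"A": np.array([0.5, 0.25]),
     "B": np.array([0.5, 0.75])}

def backward(obs):
    beta = np.ones(2)                       # beta(T, i) = 1
    betas = [beta]
    for o in reversed(obs[1:]):             # t = T-1 down to 1
        beta = P @ (B[o] * beta)            # beta(t, i)
        betas.append(beta)
    return list(reversed(betas))            # betas[t-1] = beta(t, .)

print(backward("BBB")[1])                   # beta(2, i), as on the next slide
```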

Page 32: Hidden Markov Models  (1)

The Forward and Backward Algorithm

Back to our simple example, observing “BBB”, we have

β(3, 1) = 1
β(3, 2) = 1

β(2, 1) = P(O3 = “B” | q^(2) = E1)
        = p11 b1(“B”) β(3, 1) + p12 b2(“B”) β(3, 2)
        = 0.9*0.5*1 + 0.1*0.75*1

… …

Page 33: Hidden Markov Models  (1)

Posterior state probabilities

P(O, q^(t) = Ei) = P(O1, O2, ..., Ot, q^(t) = Ei) P(O_{t+1}, ..., O_T | O1, O2, ..., Ot, q^(t) = Ei)
                 = P(O1, O2, ..., Ot, q^(t) = Ei) P(O_{t+1}, ..., O_T | q^(t) = Ei)
                 = α(t, i) β(t, i)

P(O) = Σ_{i=1..S} α(T, i) = Σ_{i=1..S} α(t, i) β(t, i) for any t

P(q^(t) = Ei | O) = P(O, q^(t) = Ei) / P(O) = α(t, i) β(t, i) / Σ_i α(t, i) β(t, i)
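Putting the two passes together gives the posterior state probabilities; below is a self-contained sketch for the two-state example, restating the equations above in code.

```python
import numpy as np

# Posterior state probabilities gamma(t, i) = alpha(t, i) beta(t, i) / P(O).
pi = np.array([0.5, 0.5])
P  = np.array([[0.9, 0.1], [0.8, 0.2]])
B  = {"A": np.array([0.5, 0.25]), "B": np.array([0.5, 0.75])}

def posterior(obs):
    alphas, betas = [pi * B[obs[0]]], [np.ones(2)]
    for o in obs[1:]:                              # forward pass
        alphas.append((alphas[-1] @ P) * B[o])
    for o in reversed(obs[1:]):                    # backward pass
        betas.append(P @ (B[o] * betas[-1]))
    betas.reverse()
    p_obs = alphas[-1].sum()                       # P(O)
    return [a * b / p_obs for a, b in zip(alphas, betas)]

for t, gamma in enumerate(posterior("BBB"), start=1):
    print(t, gamma)                                # each gamma(t, .) sums to 1
```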

Page 34: Hidden Markov Models  (1)

The Viterbi Algorithm

To find: argmax_Q P(Q | O)

Note that

max_Q P(Q | O) = max_Q P(Q, O) / P(O)
argmax_Q P(Q | O) = argmax_Q P(Q, O)

So, to maximize the conditional probability, we can simply maximize the joint probability.

Define:

δ_t(i) = max_{q1, q2, ..., q_{t-1}} P(q1, q2, ..., q_{t-1}, q_t = Ei, O^(t))

Page 35: Hidden Markov Models  (1)

The Viterbi Algorithm

Then our goal becomes:

max_Q P(Q, O) = max_i δ_T(i)

where, as before, δ_t(i) = max_{q1, ..., q_{t-1}} P(q1, q2, ..., q_{t-1}, q_t = Ei, O^(t)).

Initiation: δ_1(i) = πi bi(O1)
Induction:  δ_{t+1}(i) = [ max_k δ_t(k) p_ki ] bi(O_{t+1})

So, this problem is solved by dynamic programming.
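A sketch of the Viterbi recursion with back-pointers for the two-state example; it reproduces the δ table on the next slide and recovers the path E2 → E1 → E1 found by brute force earlier.

```python
import numpy as np

# Viterbi: delta_{t+1}(i) = [max_k delta_t(k) p_ki] * b_i(O_{t+1}),
# with back-pointers to recover the most likely state path.
pi = np.array([0.5, 0.5])
P  = np.array([[0.9, 0.1], [0.8, 0.2]])
B  = {"A": np.array([0.5, 0.25]), "B": np.array([0.5, 0.75])}
states = ["E1", "E2"]

def viterbi(obs):
    delta = pi * B[obs[0]]                         # delta_1(i) = pi_i b_i(O_1)
    pointers = []
    for o in obs[1:]:
        trans = delta[:, None] * P                 # entry [k, i] = delta_t(k) p_ki
        pointers.append(trans.argmax(axis=0))      # best predecessor for each i
        delta = trans.max(axis=0) * B[o]           # delta_{t+1}(i)
    path = [int(delta.argmax())]
    for ptr in reversed(pointers):                 # trace back
        path.append(int(ptr[path[-1]]))
    return float(delta.max()), [states[i] for i in reversed(path)]

# Most likely path for "BBB": E2 -> E1 -> E1, with joint probability 0.0675.
print(viterbi("BBB"))
```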

Page 36: Hidden Markov Models  (1)

The Viterbi Algorithm

δ_t(i) for the observation “BBB”:

Time:       1          2         3
State E1:   0.5*0.5    0.15      0.0675
State E2:   0.5*0.75   0.05625   0.01125

The calculation proceeds from the left border to the right border, whereas in sequence alignment it proceeds from the upper-left corner toward the right and lower borders.

Again, why is this efficient calculation possible? The short memory of the Markov Chain!

Page 37: Hidden Markov Models  (1)

The Estimation of parameters

When the topology of an HMM is known, how do we find the initial distribution, the transition probabilities and the emission probabilities, from a set of emitted sequences?

Difficulties:

Usually the number of parameters is not small. Even the simple chain in our example has 5 parameters.

Normally the landscape of the likelihood function is bumpy. Finding the global optimum is hard.

argmax_θ P(O^(1), O^(2), ..., O^(d) | θ), where {O^(1), ..., O^(d)} is the set of observed emission sequences.

Page 38: Hidden Markov Models  (1)

The Estimation of parameters

The general idea of the Baum-Welch method:

(1) Start with some initial values.
(2) Re-estimate the parameters.
    - The previously mentioned forward- and backward-probabilities will be used.
    - It is guaranteed that the likelihood of the data improves with the re-estimation.
(3) Repeat (2) until a local optimum is reached.

We will discuss the details next time.
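As a preview, here is a compact sketch of one possible implementation of this loop for the two-state example. The re-estimation step uses the standard Baum-Welch (EM) updates built from the forward and backward variables; their derivation is deferred to the next lecture, so treat this purely as an illustration (the observation sequence and iteration count are arbitrary).

```python
import numpy as np

# A compact sketch of Baum-Welch re-estimation for a discrete HMM
# (illustration only; update formulas are the standard EM re-estimates).
symbols = {"A": 0, "B": 1}

def baum_welch(obs, pi, P, B, n_iter=10):
    O = np.array([symbols[o] for o in obs])
    T, S = len(O), len(pi)
    for _ in range(n_iter):
        # forward and backward passes under the current parameters
        alpha = np.zeros((T, S)); beta = np.ones((T, S))
        alpha[0] = pi * B[:, O[0]]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ P) * B[:, O[t]]
        for t in range(T - 2, -1, -1):
            beta[t] = P @ (B[:, O[t + 1]] * beta[t + 1])
        p_obs = alpha[-1].sum()                            # likelihood P(O)
        gamma = alpha * beta / p_obs                       # P(q_t = i | O)
        xi = (alpha[:-1, :, None] * P[None, :, :] *
              (B[:, O[1:]].T * beta[1:])[:, None, :]) / p_obs
        # re-estimation
        pi = gamma[0]
        P = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        B = np.zeros_like(B)
        for s in range(B.shape[1]):
            B[:, s] = gamma[O == s].sum(axis=0) / gamma.sum(axis=0)
    return pi, P, B, p_obs          # p_obs: likelihood before the last update

pi0 = np.array([0.5, 0.5])
P0  = np.array([[0.9, 0.1], [0.8, 0.2]])
B0  = np.array([[0.5, 0.5], [0.25, 0.75]])   # rows: states, cols: A, B
print(baum_welch("ABBABBB", pi0, P0, B0))
```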

To be continued