The University of Manchester COMP14112 Lecture 11: Markov Chains, HMMs and Speech (Revision)


Page 1

The University of Manchester
COMP14112
Lecture 11
Markov Chains, HMMs and Speech
Revision

Page 2

What have we covered in the speech lectures?

• Extracting features from raw speech data
• Classification and the naive Bayes classifier
• Training
• Sequence data
• Markov models
• Hidden Markov models

Page 3

1. Features and data

• We have to represent sensory information in a useful way: sound waves and robot sensor data are two examples.
• Good “features” are domain specific, but we often end up with a vector of numbers called a feature vector or data point
• For speech we use MFCC features derived from segmented data
• Methods for processing the feature vectors are general
• Probabilistic approaches are popular – not the only approach, but certainly a leading one

Page 4

2. Classification

• Given a data point x, what class does it belong to?
• You constructed probabilistic classifiers in Labs 2 and 3 to distinguish between “yes” and “no”
• You should know what makes a good classifier – how would you assess its performance?
• Lots of applications – one of the key AI tools

Page 5

2.1 Probabilistic classification

• For a data point x …
– Estimate the probability density p(x|Ci) for each class i
– Apply Bayes’ theorem:

p(C1|x) = p(x|C1) p(C1) / Σi p(x|Ci) p(Ci)

– Apply classification rule: for two classes, p(C1|x) > 0.5 ⇒ class of x = C1
• Multiple classes? Assign x to the class with the largest posterior p(Ci|x)
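As a concrete illustration of the rule above, here is a minimal Python sketch. The likelihood and prior values are invented for the example, not taken from the course:

```python
def posterior(likelihoods, priors, k):
    # p(Ck|x) = p(x|Ck) p(Ck) / sum_i p(x|Ci) p(Ci)
    evidence = sum(l * p for l, p in zip(likelihoods, priors))
    return likelihoods[k] * priors[k] / evidence

# Two classes with equal priors; hypothetical densities p(x|C1)=0.9, p(x|C2)=0.3
p_c1 = posterior([0.9, 0.3], [0.5, 0.5], 0)
# p_c1 = 0.45 / 0.60 = 0.75, which exceeds 0.5, so x is assigned to class C1
```

The same function handles multiple classes: compute `posterior(..., k)` for each k and pick the largest.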

Page 6

2.2 Naïve Bayes classifier

• The naïve Bayes assumption can be used if data are vectors
– Feature vector components are conditionally independent given the class
– See lecture notes and Lab 2 for application to time averaged MFCC features derived from speech
– Examples sheet 6 for discrete valued data example

p(x|Ci) = p(x1|Ci) p(x2|Ci) … p(xd|Ci)
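The factorisation above translates directly into code. This sketch assumes Gaussian per-component densities, as in the Lab's treatment of MFCC features; the feature values, means and variances are invented:

```python
import math

def gaussian_pdf(x, mean, var):
    # 1-D normal density
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def naive_bayes_likelihood(x, means, variances):
    # p(x|Ci) = p(x1|Ci) p(x2|Ci) ... p(xd|Ci): product of per-component densities
    p = 1.0
    for xj, m, v in zip(x, means, variances):
        p *= gaussian_pdf(xj, m, v)
    return p

# Hypothetical 3-component feature vector and per-class parameters
lik = naive_bayes_likelihood([0.2, 1.1, -0.4], [0.0, 1.0, 0.0], [1.0, 0.5, 2.0])
```

Computing one such likelihood per class and plugging them into Bayes' theorem gives the posterior for each class.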

Page 7

2.3 1-D Classification

• You’ve seen some example classification rules
• For 1-D data, a single feature x

Page 8

2.4 n-D Classification

• For 2-D data with feature vector x = [x1, x2]

Page 9

3. Training

• When we fit a probability density or probabilistic model to data, we have an example of training
• In the Labs, you’ve seen data being used to estimate parameters of a normal distribution and an HMM
• The data that’s used for this is training data
• Training is fundamental to machine learning, a large and important area of research in CS
• NB the performance of the Lab classifier would have improved with more training data
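For the normal-distribution case mentioned above, training reduces to estimating a mean and variance from the training data. A minimal sketch, with invented sample values:

```python
def fit_normal(samples):
    # maximum-likelihood estimates of the mean and variance of a 1-D normal
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    return mean, var

mean, var = fit_normal([2.0, 4.0, 6.0])
# mean = 4.0, var = ((-2)^2 + 0^2 + 2^2) / 3 = 8/3
```

With more training data these estimates become more reliable, which is one reason the Lab classifier would have improved.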

Page 10

4. Sequence data

• In some cases the data arrives in a sequence
– We used speech data
• Other examples
– Video
– Sequential games
• Anything real-time
– DNA sequence data

Page 11

5. Markov chains

• You should know
– Definition of a first order Markov process:

p(st|s1, s2, …, st−1) = p(st|st−1)

– Parameters are transition probabilities
– Normalisation condition
– Can be represented as a directed graph or a transition matrix
– Can be unfolded in time to show all paths of a fixed length (Examples sheet 7 and past paper)
– How to do a simple probabilistic calculation
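A simple probabilistic calculation of the kind listed above can be done with the transition matrix stored as nested dictionaries. The states and probabilities here are invented for illustration:

```python
# Hypothetical first-order Markov chain; each row gives p(next | current)
trans = {
    "START": {"a": 0.6, "b": 0.4},
    "a": {"a": 0.2, "b": 0.5, "END": 0.3},
    "b": {"a": 0.7, "END": 0.3},
}

# Normalisation condition: each row of transition probabilities sums to 1
for row in trans.values():
    assert abs(sum(row.values()) - 1.0) < 1e-9

def sequence_prob(states):
    # Probability of one particular path: the product of its transition probabilities
    p = 1.0
    for s, t in zip(states, states[1:]):
        p *= trans[s].get(t, 0.0)
    return p

p = sequence_prob(["START", "a", "b", "END"])
# p = 0.6 * 0.5 * 0.3 = 0.09
```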

Page 12

5. Markov chains

[Diagram: Markov chain with START and END states and phoneme states “hh”, “b”, “ay”; transition probabilities 0.5, 0.5, 0.5, 0.5 and 0.25 are shown, the rest are missing.]

• What are the missing numbers?
• Unroll the model for exactly three time steps
• What is the probability that the sequence will be “hi”?
• What is the probability that a sequence of length 3 will be “hi”?

Page 13

5. Markov chains

• Naïve application of probabilistic calculations is prohibitively slow in Markov chains
• In the lectures we saw a more efficient method based on recursion (Examples sheet 8)
• Don’t need to remember the recursive algorithm used there, but should be able to apply it to a similar example
• Computationally efficient algorithms are very important – imagine what happens when a problem is scaled up.
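The efficiency point can be illustrated with a sketch of the recursive idea (not the exact algorithm from the lectures): instead of enumerating every path, keep one probability per state and update it step by step. The states and numbers are invented:

```python
def state_probs(trans, init, steps):
    # p_t(s) = sum over s' of p_{t-1}(s') * p(s|s').
    # Cost grows linearly with the number of steps, whereas enumerating
    # every path grows exponentially.
    probs = dict(init)
    for _ in range(steps):
        new = {}
        for s, p in probs.items():
            for t, q in trans.get(s, {}).items():
                new[t] = new.get(t, 0.0) + p * q
        probs = new
    return probs

# Hypothetical two-state chain
trans = {"a": {"a": 0.5, "b": 0.5}, "b": {"a": 0.1, "b": 0.9}}
probs = state_probs(trans, {"a": 1.0}, 2)
# After two steps: p(a) = 0.5*0.5 + 0.5*0.1 = 0.30, p(b) = 0.5*0.5 + 0.5*0.9 = 0.70
```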

Page 14

6. Hidden Markov models

• HMMs have two parts
– Markov chain model of states. The parameters of the Markov chain model are the transition probabilities: p(st|st−1)
– Emission probability distribution for feature vectors: p(xt|st)
– In Lab 3 this is a normal density parameterised by mean and variance for each component of x

Page 15

6. Hidden Markov models

• In Lab 3 you explored three things
– Training: constructing an HMM from labelled data (what is labelled data?)
– Classification: using the Forward algorithm to calculate p(x1, x2, …, xT|Ci) and plugging it into Bayes’ theorem
– Decoding: using the Viterbi algorithm to find the most likely path through the hidden states
• You should be able to understand the tasks, but don’t have to recall details of the algorithms
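A minimal sketch of the Forward algorithm for a discrete toy HMM. The Lab uses Gaussian emissions over MFCC vectors; the states, observations and probabilities below are invented:

```python
def forward(obs, states, start, trans, emit):
    # alpha_t(s) = p(x_t|s) * sum over s' of alpha_{t-1}(s') * p(s|s')
    alpha = {s: start[s] * emit[s](obs[0]) for s in states}
    for x in obs[1:]:
        alpha = {s: emit[s](x) * sum(alpha[sp] * trans[sp][s] for sp in states)
                 for s in states}
    # p(x_1, ..., x_T) = sum over final states of alpha_T(s)
    return sum(alpha.values())

states = ["s1", "s2"]
start = {"s1": 0.6, "s2": 0.4}
trans = {"s1": {"s1": 0.7, "s2": 0.3}, "s2": {"s1": 0.4, "s2": 0.6}}
emit = {"s1": lambda x: 0.9 if x == "hot" else 0.1,
        "s2": lambda x: 0.2 if x == "hot" else 0.8}

p = forward(["hot", "cold"], states, start, trans, emit)
# p = 0.209
```

Like the Markov chain recursion, this sums over all hidden paths in time linear in the sequence length; the result p(x1, …, xT|Ci) is what gets plugged into Bayes' theorem for classification.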

Page 16

6. Hidden Markov models

• Simple example of decoding (Lab 3) is removing the silence from speech signals
• The data without silence is easier to classify (as in Lab 2)

[Diagram: HMM with START and STOP states; “sil” states flank the “yes” and “no” word states, with transition probabilities including 1.0, 0.96, 0.04, 0.02, 0.01 and 0.99.]
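A sketch of Viterbi-style decoding for a toy silence/speech model. The structure is only loosely inspired by the diagram above: the emission model (thresholded frame energy) and all numbers are invented:

```python
def viterbi(obs, states, start, trans, emit):
    # delta_t(s): probability of the best path ending in state s at time t
    delta = {s: start[s] * emit[s](obs[0]) for s in states}
    back = []
    for x in obs[1:]:
        ptr = {}
        new = {}
        for s in states:
            best = max(states, key=lambda sp: delta[sp] * trans[sp][s])
            ptr[s] = best
            new[s] = emit[s](x) * delta[best] * trans[best][s]
        delta = new
        back.append(ptr)
    # Backtrack from the best final state to recover the full state sequence
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

states = ["sil", "speech"]
start = {"sil": 1.0, "speech": 0.0}
trans = {"sil": {"sil": 0.8, "speech": 0.2},
         "speech": {"sil": 0.2, "speech": 0.8}}
# Hypothetical emission model: frame "energy" above 0.5 suggests speech
emit = {"sil": lambda e: 0.98 if e < 0.5 else 0.02,
        "speech": lambda e: 0.02 if e < 0.5 else 0.98}

path = viterbi([0.1, 0.8, 0.9, 0.2], states, start, trans, emit)
# path = ['sil', 'speech', 'speech', 'sil']
```

Frames decoded as "sil" can then be dropped, leaving only the speech frames for classification.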

Page 17

7. Applications to speech

• Survey of tasks and performance (Examples sheet 5)
• Segmentation and MFCC features
• Phonemes and phoneme HMMs
• Triphones
• Decoding speech
• Simple language models

Page 18

Other applications

• These methods can be generalised to many applications
– TrueSkill ranking system in Xbox Live
• http://research.microsoft.com/mlp/trueskill/
– Vision applications
• http://videolectures.net/mlss09uk_blake_cv
– Speech
• http://videolectures.net/mlss09uk_bishop_ibi
– Medicine
• Probabilistic “graphical models” to update probability of illness given symptoms
– Biology
• Standard way to determine gene function and location of genes in DNA sequence

Page 19

How to revise

• Work through Examples class sheets and past paper(s)
• Make sure you understand the relationship between the labs and the notes
• Notes, lectures and example sheet solutions are on the course website
http://intranet.cs.man.ac.uk/csonly/courses/COMP10412/