homework 3: naive bayes classification. bayesian networks reading assignment: s. wooldridge,...

87
Homework 3: Naive Bayes Classification

Upload: darrell-lawson

Post on 17-Dec-2015

226 views

Category:

Documents


4 download

TRANSCRIPT

Page 1: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

Homework 3: Naive Bayes Classification

Page 2: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

Bayesian Networks

Reading assignment:

S. Wooldridge, Bayesian Belief Networks

(linked from course webpage)

Page 3: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)
Page 4: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

A patient comes into a doctor’s office with a fever and a bad cough.

Hypothesis space H:

h1: patient has flu

h2: patient does not have flu

Data D:

coughing = true, fever = true, smokes = true

Page 5: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

Naive Bayes

flu

cough fever

Cause

Effects

Page 6: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

What if attributes are not independent?

flu

cough fever

Page 7: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

What if more than one possible cause?

flu

cough fever

smokes

Page 8: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

In principle, the full joint distribution can be used to answer any question about probabilities of these combined parameters.

However, size of full joint distribution scales exponentially with number of parameters so is expensive to store and to compute with.

Full joint probability distribution

Fever Fever Fever Fever

flu p1 p2 p3 p4

flu p5 p6 p7 p8

cough cough Sum of all boxesis 1.

fever fever fever fever

flu p9 p10 p11 p12

flu p13 p14 p15 p16

cough cough

smokes

smokes

Page 9: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

For example, what if we had another attribute, “allergies”?

How many probabilities would we need to specify?

Full joint probability distribution

Fever Fever Fever Fever

flu p1 p2 p3 p4

flu p5 p6 p7 p8

cough cough

fever fever fever fever

flu p9 p10 p11 p12

flu p13 p14 p15 p16

cough cough

smokes

smokes

Page 10: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

Fever Fever Fever Fever

flu p1 p2 p3 p4

flu p5 p6 p7 p8

cough cough

fever fever fever fever

flu p9 p10 p11 p12

flu p13 p14 p15 p16

cough cough

smokes

smokes

Allergy

Allergy

Fever Fever Fever Fever

flu p17 p18 p19 p20

flu p21 p22 p23 p24

cough cough

fever fever fever fever

flu p25 p26 p27 p28

flu p29 p30 p31 p32

cough cough

smokes

smokes

Allergy

Allergy

Page 11: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

Fever Fever Fever Fever

flu p1 p2 p3 p4

flu p5 p6 p7 p8

cough cough

fever fever fever fever

flu p9 p10 p11 p12

flu p13 p14 p15 p16

cough cough

smokes

smokes

Allergy

Allergy

Fever Fever Fever Fever

flu p17 p18 p19 p20

flu p21 p22 p23 p24

cough cough

fever fever fever fever

flu p25 p26 p27 p28

flu p29 p30 p31 p32

cough cough

smokes

smokes

Allergy

Allergy

But can reduce this if we know which variables are conditionally independent

Page 12: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

Bayesian networks

• Idea is to represent dependencies (or causal relations) for all the variables so that space and computation-time requirements are minimized.

smokes flu

cough fever

“Graphical Models”

Allergies

Page 13: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)
Page 14: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)
Page 15: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

Bayesian Networks = Bayesian Belief Networks = Bayes Nets

Bayesian Network: Alternative representation for complete joint probability distribution

“Useful for making probabilistic inference about models domains characterized by inherent complexity and uncertainty”

Uncertainty can come from:– incomplete knowledge of domain– inherent randomness in behavior in domain

Page 16: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

true 0.01

false 0.99

flu

smoke

cough fever

true false

true 0.9 0.1

false 0.2 0.8

fever

flu

true 0.2

false 0.8

smoke

true false

True True 0.95 0.05

True False 0.8 0.2

False True 0.6 0.4

false false 0.05 0.95

coughsmokeflu

flu

Conditional probabilitytables for each node

Example:

Page 17: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

Inference in Bayesian networks

• If network is correct, can calculate full joint probability distribution from network.

where parents(Xi) denotes specific values of parents of Xi.

Page 18: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

flu

cough fever

Naive Bayes Example

Page 19: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

Example

• Calculate

Page 20: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

Example

• Calculate

Page 21: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

In general...

• If network is correct, can calculate full joint probability distribution from network.

where parents(Xi) denotes specific values of parents of Xi.

But need efficient algorithms to do this (e.g., “belief propagation”, “Markov Chain Monte Carlo”).

))(|(),...,(1

1 i

d

iid XparentsXPXXP

Page 22: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

Example from the reading:

Page 23: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

What is the probability that Student A is late?

What is the probability that Student B is late?

Page 24: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

What is the probability that Student A is late?

What is the probability that Student B is late?

Unconditional (“marginal”) probability. We don’t know if there is a train strike.

Page 25: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

What is the probability that Student A is late?

What is the probability that Student B is late?

Unconditional (“marginal”) probability. We don’t know if there is a train strike.

Page 26: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

Now, suppose we know that there is a train strike. How does this revise the probability that the students are late?

Page 27: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

Now, suppose we know that there is a train strike. How does this revise the probability that the students are late?

Evidence: There is a train strike.

Page 28: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

Now, suppose we know that Student A is late.

How does this revise the probability that there is a train strike?

How does this revise the probability that Student B is late?

Notion of “belief propagation”.

Evidence: Student A is late.

Page 29: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

Now, suppose we know that Student A is late.

How does this revise the probability that there is a train strike?

How does this revise the probability that Student B is late?

Notion of “belief propagation”.

Evidence: Student A is late.

Page 30: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

Now, suppose we know that Student A is late.

How does this revise the probability that there is a train strike?

How does this revise the probability that Student B is late?

Notion of “belief propagation”.

Evidence: Student A is late.

Page 31: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

Another example from the reading:

pneumonia smoking

temperature cough

Page 32: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

What is P(cough)?

In-class exercises

Page 33: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

Three types of inference

• Diagnostic: Use evidence of an effect to infer probability of a cause. – E.g., Evidence: cough=true. What is P(pneumonia | cough)?

• Causal inference: Use evidence of a cause to infer probability of an effect– E.g., Evidence: pneumonia=true. What is P(cough | pneumonia)?

• Inter-causal inference: “Explain away” potentially competing causes of a shared effect. – E.g., Evidence: smoking=true. What is P(pneumonia | cough and

smoking)?

Page 34: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

• Diagnostic: Evidence: cough=true. What is P(pneumonia | cough)?

pneumonia smoking

temperature cough

Page 35: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

• Diagnostic: Evidence: cough=true. What is P(pneumonia | cough)?

pneumonia smoking

temperature cough

Page 36: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

• Causal: Evidence: pneumonia=true. What is P(cough | pneumonia)?

pneumonia smoking

temperature cough

Page 37: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

• Causal: Evidence: pneumonia=true. What is P(cough | pneumonia)?

pneumonia smoking

temperature cough

Page 38: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

• Inter-causal: Evidence: smoking=true. What is P(pneumonia | cough and smoking)?

pneumonia smoking

temperature cough

Page 39: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

“Explaining away”

Page 40: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

Math we used:

• Definition of conditional probability:

• Bayes Theorem

• Unconditional (marginal) probability

• Probability inference in Bayesian networks:

Page 41: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

Complexity of Bayesian Networks

For n random Boolean variables:

• Full joint probability distribution: 2n entries

• Bayesian network with at most k parents per node:

– Each conditional probability table: at most 2k entries– Entire network: n 2k entries

Page 42: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

What are the advantages of Bayesian networks?

• Intuitive, concise representation of joint probability distribution (i.e., conditional dependencies) of a set of random variables.

• Represents “beliefs and knowledge” about a particular class of situations.

• Efficient (approximate) inference algorithms

• Efficient, effective learning algorithms

Page 43: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

Issues in Bayesian Networks

• Building / learning network topology

• Assigning / learning conditional probability tables

• Approximate inference via sampling

Page 44: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

Real-World Example: The Lumière Project at Microsoft Research

• Bayesian network approach to answering user queries about Microsoft Office.

• “At the time we initiated our project in Bayesian information retrieval, managers in the Office division were finding that users were having difficulty finding assistance efficiently.”

• “As an example, users working with the Excel spreadsheet might have required assistance with formatting “a graph”. Unfortunately, Excel has no knowledge about the common term, “graph,” and only considered in its keyword indexing the term “chart”.

Page 45: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)
Page 46: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

• Networks were developed by experts from user modeling studies.

Page 47: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)
Page 48: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

• Offspring of project was Office Assistant in Office 97, otherwise known as “clippie”.

http://www.youtube.com/watch?v=bt-JXQS0zYc

Page 49: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

The famous “sprinkler” example(J. Pearl, Probabilistic Reasoning in Intelligent Systems,

1988)

Page 50: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

Recall rule for inference in Bayesian networks:

Example: What is

Page 51: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

In-Class Exercise

Page 52: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

More Exercises

What is P(Cloudy| Sprinkler)?

What is P(Cloudy | Rain)?

What is P(Cloudy | Wet Grass)?

Page 53: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

More Exercises

What is P(Cloudy| Sprinkler)?

Page 54: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

More Exercises

What is P(Cloudy| Rain)?

Page 55: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

More Exercises

What is P(Cloudy| Wet Grass)?

Page 56: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

General question: What is P(X|e)?

Notation convention: upper-case letters refer to random variables; lower-case letters refer to specific values of those variables

Exact Inference in Bayesian Networks

Page 57: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

More Exercises

1. Suppose you observe it is cloudy and raining. What is the

probability that the grass is wet?

2. Suppose you observe the sprinkler to be on and the grass is

wet. What is the probability that it is raining?

3. Suppose you observe that the grass is wet and it is raining. What is the probability that it is cloudy?

Page 58: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

General question: Given query variable X and observed evidence variable values e, what is P(X | e)?

Page 59: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

Example: What is P(C |W, R)?

Page 60: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)
Page 61: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

• Worst-case complexity is exponential in n (number of nodes)

• Problem is having to enumerate all possibilities for many variables.

s

rswPcsPcrPcP ),|()|()|()(

Page 62: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

Can reduce computation by computing terms only once and storing for future use.

E.g., “variable elimination algorithm”. (We won’t cover this.)

Page 63: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

• In general, however, exact inference in Bayesian networks is too expensive.

Page 64: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

Approximate inference in Bayesian networks

Instead of enumerating all possibilities, sample to estimate probabilities.

X1 X2 X3 Xn

...

Page 65: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

Direct Sampling

• Suppose we have no evidence, but we want to determine P(C,S,R,W) for all C,S,R,W.

• Direct sampling: – Sample each variable in topological order,

conditioned on values of parents.

– I.e., always sample from P(Xi | parents(Xi))

Page 66: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

1. Sample from P(Cloudy). Suppose returns true.

2. Sample from P(Sprinkler | Cloudy = true). Suppose returns false.

3. Sample from P(Rain | Cloudy = true). Suppose returns true.

4. Sample from P(WetGrass | Sprinkler = false, Rain = true). Suppose returns true.

Here is the sampled event: [true, false, true, true]

Example

Page 67: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

• Suppose there are N total samples, and let NS (x1, ..., xn) be the observed frequency of the specific event x1, ..., xn.

• Suppose N samples, n nodes. Complexity O(Nn).

• Problem 1: Need lots of samples to get good probability estimates.

• Problem 2: Many samples are not realistic; low likelihood.

),...,(),...,(

lim 11

nnS

NxxP

N

xxN

),...,(),...,(

11

nnS xxP

N

xxN

Page 68: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

Markov Chain Monte Carlo Sampling

• One of most common methods used in real applications.

• Uses idea of Markov blanket of a variable Xi:

– parents, children, children’s other parents

• Fact: By construction of Bayesian network, a node is conditionally independent of its non-descendants, given its parents.

Page 69: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

• Proposition: A node Xi is conditionally independent of all other nodes in the network, given its Markov blanket.

What is the Markov Blanket of Rain?

What is the Markov blanket of Wet Grass?

Page 70: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

Markov Chain Monte Carlo (MCMC) Sampling Algorithm

• Start with random sample from variables, with evidence variables fixed: (x1, ..., xn). This is the current “state” of the algorithm.

• Next state: Randomly sample value for one non-evidence variable Xi , conditioned on current values in “Markov Blanket” of Xi.

Page 71: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

Example

• Query: What is P(Rain | Sprinkler = true, WetGrass = true)?

• MCMC: – Random sample, with evidence variables fixed:

[Cloudy, Sprinkler, Rain, WetGrass]

= [true, true, false, true]

– Repeat:

1. Sample Cloudy, given current values of its Markov blanket: Sprinkler = true, Rain = false. Suppose result is false. New state: [false, true, false, true]

2. Sample Rain, given current values of its Markov blanket:

Cloudy = false, Sprinkler = true, WetGrass = true. Suppose

result is true. New state: [false, true, true, true].

Page 72: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

• Each sample contributes to estimate for query

P(Rain | Sprinkler = true, WetGrass = true)

• Suppose we perform 100 such samples, 20 with Rain = true and 80 with Rain = false.

• Then answer to the query is

Normalize (20,80) = .20,.80

• Claim: “The sampling process settles into a dynamic equilibrium in which the long-run fraction of time spent in each state is exactly proportional to its posterior probability, given the evidence.”

– That is: for all variables Xi, the probability of the value xi of Xi appearing in a sample is equal to P(xi | e).

• Proof of claim: Reference on request

Page 73: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

Issues in Bayesian Networks

• Building / learning network topology

• Assigning / learning conditional probability tables

• Approximate inference via sampling

• Incorporating temporal aspects (e.g., evidence changes from one time step to the next).

Page 74: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

Learning network topology

• Many different approaches, including:

– Heuristic search, with evaluation based on information theory measures

– Genetic algorithms

– Using “meta” Bayesian networks!

Page 75: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

Learning conditional probabilities

• In general, random variables are not binary, but real-valued

• Conditional probability tables conditional probability distributions

• Estimate parameters of these distributions from data

• If data is missing on one or more variables, use “expectation maximization” algorithm

Page 76: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

Speech Recognition

• Task: Identify sequence of words uttered by speaker, given acoustic signal.

• Uncertainty introduced by noise, speaker error, variation in pronunciation, homonyms, etc.

• Thus speech recognition is viewed as problem of probabilistic inference.

Page 77: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

Speech Recognition

• So far, we’ve looked at probabilistic reasoning in static environments.

• Speech: Time sequence of “static environments”. – Let X be the “state variables” (i.e., set of non-evidence

variables) describing the environment (e.g., Words said during time step t)

– Let E be the set of evidence variables (e.g., features of acoustic signal).

Page 78: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

– The E values and X joint probability distribution changes over time.

t1: X1, e1

t2: X2 , e2

etc.

Page 79: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

• At each t, we want to compute P(Words | S).

• We know from Bayes rule:

• P(S | Words), for all words, is a previously learned “acoustic model”.

– E.g. For each word, probability distribution over phones, and for each phone, probability distribution over acoustic signals (which can vary in pitch, speed, volume).

• P(Words), for all words, is the “language model”, which specifies prior probability of each utterance.

– E.g. “bigram model”: probability of each word following each other word.

)()|()|( WordsPWordsPWordsP SS

Page 80: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

• Speech recognition typically makes three assumptions:

1. Process underlying change is itself “stationary”

i.e., state transition probabilities don’t change

2. Current state X depends on only a finite history of previous states (“Markov assumption”).– Markov process of order n: Current state depends

only on n previous states.

3. Values et of evidence variables depend only on current state Xt. (“Sensor model”)

Page 81: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

From http://www.cs.berkeley.edu/~russell/slides/

Page 82: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

From http://www.cs.berkeley.edu/~russell/slides/

Page 83: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

Hidden Markov Models

• Markov model: Given state Xt, what is probability of transitioning to next state Xt+1 ?

• E.g., word bigram probabilities give

P (wordt+1 | wordt )

• Hidden Markov model: There are observable states (e.g., signal S) and “hidden” states (e.g., Words). HMM represents probabilities of hidden states given observable states.

Page 84: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

From http://www.cs.berkeley.edu/~russell/slides/

Page 85: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

From http://www.cs.berkeley.edu/~russell/slides/

Page 86: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

Example: “I’m firsty, um, can I have something to dwink?”

From http://www.cs.berkeley.edu/~russell/slides/

Page 87: Homework 3: Naive Bayes Classification. Bayesian Networks Reading assignment: S. Wooldridge, Bayesian Belief Networks (linked from course webpage)

From http://www.cs.berkeley.edu/~russell/slides/