
  • TDT4171 Artificial Intelligence Methods Lecture 3 & 4 – Probabilistic Reasoning Over Time

    Norwegian University of Science and Technology

    Helge Langseth IT-VEST 310

    [email protected]


  • Outline

    1 Leftovers from last time
        Inference

    2 Probabilistic Reasoning over Time
        Set-up
        Basic speech recognition
        Inference: Filtering, prediction, smoothing
        Inference for Hidden Markov models
        Kalman Filters
        Dynamic Bayesian networks
        Summary

    3 Speech recognition
        Speech as probabilistic inference
        Speech sounds
        Word sequences


  • Leftovers from last time

    Summary from last time

    Bayes nets provide a natural representation for (causally induced) conditional independence

    Topology + CPTs = compact representation of joint distribution

    Generally easy to construct – also for non-experts

    Canonical distributions (e.g., noisy-OR) = compact representation of CPTs

    Announcements

    The first assignment is due next Friday

    Deliver it using It’s Learning

    There will be no lecture next week!


  • Leftovers from last time Inference

    Inference tasks

    Simple queries: compute the posterior marginal P(X_i | E = e), e.g., P(NoGas | Gauge = empty, Lights = on, Starts = false)

    Conjunctive queries: P(X_i, X_j | E = e) = P(X_i | E = e) P(X_j | X_i, E = e)

    Optimal decisions: decision networks include utility information; probabilistic inference required for P(outcome | action, evidence)

    Value of information: which evidence to seek next?

    Sensitivity analysis: which probability values are most critical?

    Explanation: why do I need a new starter motor?



  • Leftovers from last time Inference

    Inference tasks – Inference by enumeration

    Slightly intelligent way to sum out variables from the joint without actually constructing its explicit representation.

    Simple query on the burglary network:

    P(B | j, m) = P(B, j, m) / P(j, m) = α P(B, j, m) = α Σ_e Σ_a P(B, e, a, j, m)

    [Figure: the burglary network, with nodes B (Burglary), E (Earthquake), A (Alarm), J (JohnCalls), M (MaryCalls)]

    Rewrite full joint entries using product of CPT entries:

    P(B | j, m) = α Σ_e Σ_a P(B) P(e) P(a | B, e) P(j | a) P(m | a) = α P(B) Σ_e P(e) Σ_a P(a | B, e) P(j | a) P(m | a)

    Recursive depth-first enumeration: O(n) space, O(d^n) time


  • Leftovers from last time Inference

    Enumeration algorithm

    function Enumeration-Ask(X, e, bn) returns a distribution over X
        inputs: X, the query variable
                e, observed values for variables E
                bn, a Bayesian network with variables {X} ∪ E ∪ Y

        Q(X) ← a distribution over X, initially empty
        for each value x_i of X do
            extend e with value x_i for X
            Q(x_i) ← Enum-All(Vars[bn], e)
        return Normalize(Q(X))

    function Enum-All(vars, e) returns a real number
        if Empty?(vars) then return 1.0
        Y ← First(vars)
        if Y has value y in e
            then return P(y | Pa(Y)) × Enum-All(Rest(vars), e)
            else return Σ_y P(y | Pa(Y)) × Enum-All(Rest(vars), e_y)
                 where e_y is e extended with Y = y


  • Leftovers from last time Inference

    Evaluation tree

    [Figure: evaluation tree for P(b | j, m), branching on e and a, with CPT entries such as P(b) = .001, P(e) = .002, P(a | b, e) = .95, P(j | a) = .90, P(m | a) = .70 at the nodes; the subtree P(j | a) P(m | a) appears once for each value of e]

    Enumeration is inefficient, as we have repeated computation of, e.g., P(j | a) P(m | a) for each value of e. ⇒ Nice to know that better methods are available. . .


  • Leftovers from last time Inference

    Summary of Chapter 14

    Bayes nets provide a natural representation for (causally induced) conditional independence

    Topology + CPTs = compact representation of joint

    Generally easy to construct – also for non-experts

    Canonical distributions (e.g., noisy-OR) = compact representation of CPTs

    Efficient inference calculations are available (but the good ones are outside the scope of this course)

    What you should know:

    How to build models (and verify them using Conditional Independence and Causality)

    What drives the . . .
        model-building burden
        complexity of inference


  • Probabilistic Reasoning over Time Set-up

    Time and uncertainty

    Motivation: The world changes; we need to track and predict it

    Static (Vehicle diagnosis) vs. Dynamic (Diabetes management)

    Basic idea: copy state and evidence variables for each time step

    Rain_t = Does it rain at time t?

    This assumes discrete time; the step size depends on the problem. Here, a timestep is presumably one day.



  • Probabilistic Reasoning over Time Set-up

    Markov processes (Markov chains)

    If we want to construct a Bayes net from these variables, then what are the parents?

    Assume we have observations of Rain_0, Rain_1, . . . , Rain_t and want to predict whether or not it rains on day t + 1: P(Rain_{t+1} | Rain_0, Rain_1, . . . , Rain_t). Try to build a BN over Rain_0, Rain_1, . . . , Rain_{t+1}:

    P(Rain_{t+1}) ≠ P(Rain_{t+1} | Rain_t); base it on Rain_t. P(Rain_{t+1} | Rain_t) ≈ P(Rain_{t+1} | Rain_t, Rain_{t−1}) (Do you agree?)

    First-order Markov process: P(Rain_{t+1} | Rain_0, . . . , Rain_t) = P(Rain_{t+1} | Rain_t)

    “The future is conditionally independent of the past given the present”



    k-th-order Markov process: P(Rain_{t+1} | Rain_0, . . . , Rain_t) = P(Rain_{t+1} | Rain_{t−k+1}, . . . , Rain_t)

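    A minimal sketch of the first-order case (the transition probabilities below are hypothetical, not from the slides): one-step prediction needs only the current belief and P(Rain_{t+1} | Rain_t).

        # One-step prediction in a first-order Markov chain over the Boolean Rain_t.
        # Hypothetical transition model: P(Rain_{t+1} = true | Rain_t)
        P_RAIN_GIVEN_RAIN = 0.7
        P_RAIN_GIVEN_DRY = 0.3

        def predict_next(p_rain_today):
            """Return P(Rain_{t+1} = true) from the current belief P(Rain_t = true)."""
            return (p_rain_today * P_RAIN_GIVEN_RAIN
                    + (1.0 - p_rain_today) * P_RAIN_GIVEN_DRY)

        belief = 1.0                      # we observed rain today (t = 0)
        for k in range(1, 4):             # predict a few days ahead
            belief = predict_next(belief)
            print(f"P(Rain at day {k}) = {belief:.3f}")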

  • Probabilistic Reasoning over Time Set-up

    Markov processes as Bayesian networks

    If we want to construct a Bayes net from these variables, then what are the parents?

    Markov assumption: X_t depends on a bounded subset of X_{0:t−1}

    First-order Markov process: P(X_t | X_{0:t−1}) = P(X_t | X_{t−1})

    Second-order Markov process: P(X_t | X_{0:t−1}) = P(X_t | X_{t−2}, X_{t−1})

    [Figure: first-order and second-order Markov chains over X_{t−2}, . . . , X_{t+2}]


  • Probabilistic Reasoning over Time Set-up

    Is a first-order Markov process suitable?

    First-order Markov assumption not exactly true in real world!

    Possible fixes:

    1 Increase the order of the Markov process
    2 Augment the state, e.g., add Temp_t, Pressure_t

    State augmentation is enough!

    Any k-th-order Markov process can be expressed as a first-order Markov process – we focus on first-order processes from now on.

    “Proof”:

    1 Assume for simplicity that the process contains only the variable X, and that we have a second-order Markov process.
    2 Create a new variable X′_t that is identical to X_{t−1}.
    3 Let X_{t+1} have both X_t and X′_t as parents.
    4 Do this for all t. The augmented model is a first-order Markov process (a small sketch follows below).

  • Probabilistic Reasoning over Time Basic speech recognition

    Speech as probabilistic inference

    How can we recognize speech?

    Speech signals are noisy, variable, ambiguous

    What is the most likely word sequence, given the speech signal?

    Why not choose Words to maximize P(Words | signal)? Use Bayes’ rule:

    P(Words | signal) = α P(signal | Words) P(Words)

    I.e., this decomposes into an acoustic model + a language model

    Need to be able to do the required calculations!!
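    As a toy illustration of this decomposition (the candidate word sequences and all probabilities below are made up, not from the slides), recognition becomes scoring each candidate by acoustic likelihood times language-model prior:

        # P(Words | signal) ∝ P(signal | Words) * P(Words)
        # Hypothetical scores; a real recognizer gets these from HMMs / n-gram models.
        acoustic = {"recognize speech": 0.0004, "wreck a nice beach": 0.0007}   # P(signal | Words)
        language = {"recognize speech": 0.03,   "wreck a nice beach": 0.0001}   # P(Words)

        posterior_unnorm = {w: acoustic[w] * language[w] for w in acoustic}
        best = max(posterior_unnorm, key=posterior_unnorm.get)
        print(best)   # "recognize speech": the language model outweighs the acoustic score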
