Knowledge Representation & Reasoning, Lecture #4. UIUC CS 498: Section EA. Professor: Eyal Amir. Fall Semester 2005. (Based on slides by Lise Getoor and Alvaro Cardenas (UMD), in turn based on slides by Nir Friedman (Hebrew U))


TRANSCRIPT

Page 1:

Knowledge Representation & Reasoning, Lecture #4

UIUC CS 498: Section EA

Professor: Eyal Amir

Fall Semester 2005 (Based on slides by Lise Getoor and Alvaro Cardenas (UMD), in turn based on slides by Nir Friedman (Hebrew U))

Page 2:

Today and Next Class

1. Probabilistic graphical models

2. Treewidth methods:
   1. Variable elimination
   2. Clique tree algorithm

3. Applications du jour: Sensor Networks

Page 3:

Probabilistic Representation of Knowledge

• Knowledge that is deterministic
  – If there is rain, there are clouds:

    Clouds ∨ ¬Rain   (i.e., Rain ⇒ Clouds)

• Knowledge that includes uncertainty
  – If there are clouds, there is a chance for rain

• Probabilistic knowledge
  – If there are clouds, rain has probability 0.3:

    Pr(Rain=True | Clouds=True) = 0.3

• How do we write probabilistic knowledge?

Page 4:

How to Represent Probabilistic Knowledge?

• A probability distribution is a function from measurable sets of events to [0,1]

  Pr : Pow(Ω) → [0,1]

  – Example: domain Ω = {T,F} × {T,F}, with random variables Rain, Clouds

• If the domain is discrete, Pr specifies probabilities for all 2^|Ω| sets

• But it is enough to represent only |Ω| values:

  Pr(a1 ∪ a2) = Pr(a1) + Pr(a2) − Pr(a1 ∩ a2)

• This is not good enough if |Ω| = 2^#rv's (rv's = Random Variables) and #rv's is large

Page 5:

Example: How Many RV’s?

Page 6:

Independent Random Variables

• Two variables X and Y are independent if
  – P(X = x | Y = y) = P(X = x) for all values x, y
  – That is, learning the value of Y does not change the prediction of X

• If X and Y are independent then – P(X,Y) = P(X|Y)P(Y) = P(X)P(Y)

• In general, if X1,…,Xp are independent, then P(X1,…,Xp)= P(X1)...P(Xp)

– Requires O(p) parameters
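To make the parameter count concrete, here is a minimal sketch (variable count, names, and numbers are made up for illustration) that stores only the p marginals and evaluates any joint probability by the product rule above, instead of storing all 2^p joint entries:

```python
# p independent binary variables: p parameters instead of 2^p joint entries.
# The marginals below are hypothetical.
marginals = [0.1, 0.5, 0.3, 0.9]          # marginals[i] = P(Xi = 1)

def joint(assignment):
    """P(X1,...,Xp) = P(X1) * ... * P(Xp) under full independence."""
    p = 1.0
    for prob_one, value in zip(marginals, assignment):
        p *= prob_one if value == 1 else 1.0 - prob_one
    return p

print(joint([1, 0, 0, 1]))   # 0.1 * 0.5 * 0.7 * 0.9
print(2 ** len(marginals))   # size of the explicit joint table (16 entries)
```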

Page 7:

Conditional Independence

• Unfortunately, most of random variables of interest are not independent of each other

• A more suitable notion is that of conditional independence

• Two variables X and Y are conditionally independent given Z if
  – P(X = x | Y = y, Z = z) = P(X = x | Z = z) for all values x, y, z
  – That is, learning the value of Y does not change the prediction of X once we know the value of Z

– notation: I ( X , Y | Z )

Page 8:

Example: Family trees (Pedigree)

Noisy stochastic process:

• A node represents an individual's genotype

• Modeling assumption: ancestors can affect descendants' genotype only by passing genetic material through intermediate generations

Homer

Bart

Marge

Lisa Maggie

Page 9:

Markov Assumption

• We now make this independence assumption more precise for directed acyclic graphs (DAGs)

• Each random variable X is independent of its non-descendants, given its parents Pa(X)

• Formally, I(X, NonDesc(X) | Pa(X))

(Figure: a node X with parents Y1 and Y2; the ancestors, descendants, and non-descendants of X are labeled)

Page 10:

Markov Assumption Example

• In this example:
  – I ( E, B )
  – I ( B, {E, R} )
  – I ( R, {A, B, C} | E )
  – I ( A, R | B, E )
  – I ( C, {B, E, R} | A )

(Graph: Earthquake → Radio, Earthquake → Alarm, Burglary → Alarm, Alarm → Call)

Page 11:

I-Maps

• A DAG G is an I-Map of a distribution P if all Markov assumptions implied by G are satisfied by P
  (assuming G and P both use the same set of random variables)

Examples:

X Y

x  y  P(x,y)
0  0  0.25
0  1  0.25
1  0  0.25
1  1  0.25

X Y

x  y  P(x,y)
0  0  0.2
0  1  0.3
1  0  0.4
1  1  0.1
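As a quick sanity check of these two examples, the sketch below (the function name and encoding are ours) tests whether P(x,y) = P(x)P(y) holds for each table, i.e., whether a graph asserting I(X,Y) is an I-Map of it:

```python
def is_independent(joint, tol=1e-9):
    """Check P(x,y) = P(x)P(y) for a joint table given as {(x, y): prob}."""
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    px = {x: sum(joint[(x, y)] for y in ys) for x in xs}
    py = {y: sum(joint[(x, y)] for x in xs) for y in ys}
    return all(abs(joint[(x, y)] - px[x] * py[y]) <= tol for x in xs for y in ys)

uniform = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
skewed  = {(0, 0): 0.2,  (0, 1): 0.3,  (1, 0): 0.4,  (1, 1): 0.1}

print(is_independent(uniform))  # True:  a graph asserting I(X,Y) is an I-Map
print(is_independent(skewed))   # False: such a graph is not an I-Map of this P
```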

Page 12:

Factorization

• Given that G is an I-Map of P, can we simplify the representation of P?

• Example:

• Since I(X,Y), we have that P(X|Y) = P(X)

• Applying the chain rule: P(X,Y) = P(X|Y) P(Y) = P(X) P(Y)

• Thus, we have a simpler representation of P(X,Y)

X Y

Page 13:

Factorization Theorem

Thm: if G is an I-Map of P, then P(X1,…,Xp) = Π_i P(Xi | Pa(Xi))

Proof:
• wlog. X1,…,Xp is an ordering consistent with G
• By the chain rule: P(X1,…,Xp) = Π_i P(Xi | X1,…,Xi-1)
• Since G is an I-Map, I(Xi, NonDesc(Xi) | Pa(Xi))
• From the assumption: {X1,…,Xi-1} ⊆ Pa(Xi) ∪ NonDesc(Xi) and Pa(Xi) ⊆ {X1,…,Xi-1}
• Hence, I(Xi, {X1,…,Xi-1} \ Pa(Xi) | Pa(Xi))
• We conclude, P(Xi | X1,…,Xi-1) = P(Xi | Pa(Xi))

Page 14:

Factorization Example

P(C,A,R,E,B) = P(B)P(E|B)P(R|E,B)P(A|R,B,E)P(C|A,R,B,E)

Earthquake

Radio

Burglary

Alarm

Call

versus P(C,A,R,E,B) = P(B) P(E) P(R|E) P(A|B,E) P(C|A)
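To see the saving concretely, here is a brief sketch that evaluates the factored form P(B)P(E)P(R|E)P(A|B,E)P(C|A) for this network; the CPT numbers are made up for illustration:

```python
# Hypothetical CPTs for the Burglary/Earthquake/Alarm/Radio/Call network.
P_B = {True: 0.01, False: 0.99}
P_E = {True: 0.02, False: 0.98}
P_R_given_E = {True: 0.9, False: 0.01}                         # P(R=True | E)
P_A_given_BE = {(True, True): 0.95, (True, False): 0.94,
                (False, True): 0.29, (False, False): 0.001}    # P(A=True | B, E)
P_C_given_A = {True: 0.7, False: 0.05}                         # P(C=True | A)

def bernoulli(p_true, value):
    """P(X = value) given P(X = True)."""
    return p_true if value else 1.0 - p_true

def joint(c, a, r, e, b):
    """P(C,A,R,E,B) = P(B) P(E) P(R|E) P(A|B,E) P(C|A)."""
    return (P_B[b] * P_E[e] *
            bernoulli(P_R_given_E[e], r) *
            bernoulli(P_A_given_BE[(b, e)], a) *
            bernoulli(P_C_given_A[a], c))

# 2^5 = 32 joint entries are induced by only 1 + 1 + 2 + 4 + 2 = 10 CPT parameters.
print(joint(c=True, a=True, r=False, e=False, b=True))
```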

Page 15:

Consequences

• We can write P in terms of “local” conditional probabilities

If G is sparse
  – that is, |Pa(Xi)| < k,
then each conditional probability can be specified compactly
  – e.g. for binary variables, these require O(2^k) params
and the representation of P is compact
  – linear in the number of variables

Page 16:

Summary

We defined the following concepts
• The Markov Independencies of a DAG G

– I (Xi , NonDesc(Xi) | Pai )

• G is an I-Map of a distribution P– If P satisfies the Markov independencies implied by G

We proved the factorization theorem
• if G is an I-Map of P, then

  P(X1,…,Xn) = Π_i P(Xi | Pai)

Page 17:

• Let Markov(G) be the set of Markov Independencies implied by G

• The factorization theorem shows:

  G is an I-Map of P  ⟹  P(X1,…,Xn) = Π_i P(Xi | Pai)

• We can also show the opposite:

  Thm: P(X1,…,Xn) = Π_i P(Xi | Pai)  ⟹  G is an I-Map of P

Conditional Independencies  ⟺  Factorization

Page 18:

Proof (Outline)

Example:  X → Y,  X → Z

P(Z | X,Y) = P(X,Y,Z) / P(X,Y)
           = P(X) P(Y|X) P(Z|X) / ( P(X) P(Y|X) )
           = P(Z | X)

Page 19:

Implied Independencies

• Does a graph G imply additional independencies as a consequence of Markov(G)?

• We can define a logic of independence statements

• Some axioms:
  – I( X ; Y | Z )  ⟹  I( Y ; X | Z )
  – I( X ; Y1, Y2 | Z )  ⟹  I( X ; Y1 | Z )

Page 20:

d-separation

• A procedure d-sep(X; Y | Z, G) that given a DAG G, and sets X, Y, and Z returns either yes or no

• Goal: d-sep(X; Y | Z, G) = yes iff I(X;Y|Z) follows

from Markov(G)

Page 21:

Paths

• Intuition: dependency must “flow” along paths in the graph

• A path is a sequence of neighboring variables

Examples:
• R ← E → A ← B
• C ← A ← E → R

Earthquake

Radio

Burglary

Alarm

Call

Page 22:

Paths

• We want to know when a path is
  – active: creates dependency between end nodes
  – blocked: cannot create dependency between end nodes

• We want to classify situations in which paths are active.

Page 23:

Path Blockage

Three cases:
– Common cause (R ← E → A)

Blocked: when E is given.  Active (unblocked): when E is not given.

Page 24:

Path Blockage

Three cases:
– Common cause
– Intermediate cause (E → A → C)

Blocked: when A is given.  Active (unblocked): when A is not given.

Page 25:

Path Blockage

Three cases:
– Common cause
– Intermediate cause
– Common effect (E → A ← B, with descendant C)

Blocked: when neither A nor any of its descendants is given.  Active (unblocked): when A or one of its descendants is given.

Page 26:

Path Blockage -- General Case

A path is active, given evidence Z, if
• Whenever we have the configuration A → B ← C, B or one of its descendants is in Z
• No other nodes in the path are in Z

A path is blocked, given evidence Z, if it is not active.

Page 27:

A

– d-sep(R,B)?

Example

E B

C

R

Page 28:

– d-sep(R,B) = yes
– d-sep(R,B|A)?

Example

E B

A

C

R

Page 29:

– d-sep(R,B) = yes
– d-sep(R,B|A) = no
– d-sep(R,B|E,A)?

Example

E B

A

C

R

Page 30:

d-Separation

• X is d-separated from Y, given Z, if all paths from a node in X to a node in Y are blocked, given Z.

• Checking d-separation can be done efficiently (linear time in the number of edges)
  – Bottom-up phase: mark all nodes whose descendants are in Z
  – X-to-Y phase: traverse (BFS) all edges on paths from X to Y and check if they are blocked
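Below is a naïve checker that follows the path-based definition directly (it enumerates all simple paths, so it is exponential, unlike the linear-time marking procedure above); the graph encoding and function names are ours, and it merely reproduces the R/B examples from the surrounding slides:

```python
def descendants(g, node):
    """Set containing `node` and all of its descendants in a DAG given as
    {node: list of children}."""
    seen, stack = set(), [node]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(g.get(n, []))
    return seen

def all_paths(g, x, y):
    """All simple paths between x and y in the undirected skeleton of g."""
    nbrs = {}
    for u, cs in g.items():
        for c in cs:
            nbrs.setdefault(u, set()).add(c)
            nbrs.setdefault(c, set()).add(u)
    def walk(path):
        if path[-1] == y:
            yield list(path)
            return
        for n in nbrs.get(path[-1], ()):
            if n not in path:
                yield from walk(path + [n])
    yield from walk([x])

def d_separated(g, x, y, z):
    """True iff every path from x to y is blocked given evidence set z."""
    z = set(z)
    for path in all_paths(g, x, y):
        active = True
        for i in range(1, len(path) - 1):
            a, v, b = path[i - 1], path[i], path[i + 1]
            collider = v in g.get(a, []) and v in g.get(b, [])   # a -> v <- b
            if collider:
                ok = bool(descendants(g, v) & z)   # v or a descendant observed
            else:
                ok = v not in z                    # chain/fork node unobserved
            if not ok:
                active = False
                break
        if active:
            return False      # found an active path
    return True

# Alarm network from the slides: E -> R, E -> A, B -> A, A -> C
g = {"E": ["R", "A"], "B": ["A"], "A": ["C"], "R": [], "C": []}
print(d_separated(g, "R", "B", set()))        # True  (d-sep(R,B) = yes)
print(d_separated(g, "R", "B", {"A"}))        # False (d-sep(R,B|A) = no)
print(d_separated(g, "R", "B", {"E", "A"}))   # True
```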

Page 31:

Soundness

Thm: If
  – G is an I-Map of P
  – d-sep( X; Y | Z, G ) = yes

then
  – P satisfies I( X; Y | Z )

Informally: Any independence reported by d-separation is satisfied by underlying distribution

Page 32:

Completeness

Thm: If d-sep( X; Y | Z, G ) = no

• then there is a distribution P such that
  – G is an I-Map of P
  – P does not satisfy I( X; Y | Z )

Informally: Any independence not reported by d-separation might be violated by the underlying distribution

• We cannot determine this by examining the graph structure alone

Page 33:

Summary: Structure

• We explored DAGs as a representation of conditional independencies:

– Markov independencies of a DAG

– Tight correspondence between Markov(G) and the factorization defined by G

– d-separation, a sound & complete procedure for computing the consequences of the independencies

– Notion of minimal I-Map

– P-Maps

• This theory is the basis for defining Bayesian networks

Page 34:

Inference

• We now have compact representations of probability distributions:– Bayesian Networks– Markov Networks

• Network describes a unique probability distribution P

• How do we answer queries about P?

• We use inference as a name for the process of computing answers to such queries

Page 35:

Today

1. Probabilistic graphical models

2. Treewidth methods:
   1. Variable elimination
   2. Clique tree algorithm

3. Applications du jour: Sensor Networks

Page 36:

Queries: Likelihood

• There are many types of queries we might ask
• Most of these involve evidence

– An evidence e is an assignment of values to a set E of variables in the domain

– Without loss of generality E = { Xk+1, …, Xn }

• Simplest query: compute probability of evidence

• This is often referred to as computing the likelihood of the evidence

P(e) = Σ_{x1} … Σ_{xk} P(x1, …, xk, e)

Page 37:

Queries: A posteriori belief

• Often we are interested in the conditional probability of a variable given the evidence

• This is the a posteriori belief in X, given evidence e

• A related task is computing the term P(X, e)
  – i.e., the likelihood of e and X = x for each value x of X

P(X = x | e) = P(X = x, e) / P(e) = P(X = x, e) / Σ_x P(X = x, e)

Page 38:

A posteriori belief

This query is useful in many cases:

• Prediction: what is the probability of an outcome given the starting condition– Target is a descendent of the evidence

• Diagnosis: what is the probability of disease/fault given symptoms– Target is an ancestor of the evidence

• The direction of the edges between variables does not restrict the direction of the queries

Page 39:

Queries: MAP

• In this query we want to find the maximum a posteriori assignment for some variable of interest (say X1,…,Xl )

• That is, x1,…,xl maximize the probability

P(x1,…,xl | e)

• Note that this is equivalent to maximizing

P(x1,…,xl, e)
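A brute-force sketch of a MAP query over a tiny, made-up joint table: it maximizes P(x1,…,xl, e) by summing out the remaining variables (all names and numbers are hypothetical):

```python
from itertools import product

VARS = ["A", "B", "C"]           # hypothetical binary variables
# Hypothetical joint distribution P(A,B,C), indexed by (a, b, c); sums to 1.
P = {vals: p for vals, p in zip(product([0, 1], repeat=3),
     [0.06, 0.14, 0.02, 0.08, 0.09, 0.21, 0.12, 0.28])}

def map_assignment(query_vars, evidence):
    """argmax over query_vars of P(query_vars, evidence),
    summing out all remaining variables."""
    qi = [VARS.index(v) for v in query_vars]
    best, best_p = None, -1.0
    for qvals in product([0, 1], repeat=len(query_vars)):
        p = 0.0
        for full, prob in P.items():
            if all(full[VARS.index(v)] == val for v, val in evidence.items()) \
               and all(full[i] == qv for i, qv in zip(qi, qvals)):
                p += prob
        if p > best_p:
            best, best_p = dict(zip(query_vars, qvals)), p
    return best, best_p

print(map_assignment(["A"], {"C": 1}))   # most likely value of A given C = 1
```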

Page 40:

Queries: MAP

We can use MAP for:

• Classification – find most likely label, given the evidence

• Explanation – What is the most likely scenario, given the

evidence

Page 41:

Complexity of Inference

Thm:

Computing P(X = x) in a Bayesian network is NP-hard

Not surprising, since we can simulate Boolean gates.

Page 42:

Approaches to inference

• Exact inference
  – Inference in simple chains
  – Variable elimination
  – Clustering / join tree algorithms

• Approximate inference – later in the semester
  – Stochastic simulation / sampling methods
  – Markov chain Monte Carlo methods
  – Mean field theory – your presentation

Page 43:

Variable Elimination

General idea:
• Write the query in the form

  P(X1, e) = Σ_{xk} … Σ_{x3} Σ_{x2} Π_i P(xi | pai)

• Iteratively
  – Move all irrelevant terms outside of the innermost sum
  – Perform the innermost sum, getting a new term
  – Insert the new term into the product
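A minimal, self-contained sketch of this procedure over explicit factor tables (binary variables only; the factor representation, names, and CPT numbers are ours, not from the slides):

```python
from itertools import product

# A factor is a pair (vars, table): vars is a tuple of variable names and
# table maps a tuple of values (one per variable, in order) to a number.
DOMAIN = [0, 1]   # for this sketch, every variable is binary

def multiply(f1, f2):
    """Pointwise product of two factors."""
    v1, t1 = f1
    v2, t2 = f2
    out_vars = v1 + tuple(v for v in v2 if v not in v1)
    table = {}
    for vals in product(DOMAIN, repeat=len(out_vars)):
        a = dict(zip(out_vars, vals))
        table[vals] = t1[tuple(a[v] for v in v1)] * t2[tuple(a[v] for v in v2)]
    return out_vars, table

def sum_out(factor, var):
    """Sum a variable out of a factor, producing a new (smaller) factor."""
    vars_, t = factor
    out_vars = tuple(v for v in vars_ if v != var)
    table = {}
    for vals, p in t.items():
        a = dict(zip(vars_, vals))
        key = tuple(a[v] for v in out_vars)
        table[key] = table.get(key, 0.0) + p
    return out_vars, table

def eliminate(factors, order):
    """Variable elimination: for each variable in `order`, multiply the
    factors that mention it, sum it out, and put the new factor back."""
    factors = list(factors)
    for var in order:
        involved = [f for f in factors if var in f[0]]
        factors = [f for f in factors if var not in f[0]]
        prod = involved[0]
        for f in involved[1:]:
            prod = multiply(prod, f)
        factors.append(sum_out(prod, var))
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    return result

# Toy chain A -> B -> C with made-up CPTs; compute P(C) by eliminating A, then B.
fA  = (("A",), {(0,): 0.6, (1,): 0.4})                                     # P(A)
fBA = (("B", "A"), {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8})   # P(B|A)
fCB = (("C", "B"), {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.5, (1, 1): 0.5})   # P(C|B)

print(eliminate([fA, fBA, fCB], ["A", "B"]))   # a factor over C only: P(C)
```

The choice of elimination order affects only the size of the intermediate factors, not the final answer, which is the point developed in the next slides.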

Page 44:

Example

Visit to Asia

Smoking

Lung Cancer    Tuberculosis

Abnormality in Chest

Bronchitis

X-Ray Dyspnea

• “Asia” network:

Page 45:

V S

LT

A B

X D

P(v,s,t,l,a,b,x,d) = P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)

• We want to compute P(d)
• Need to eliminate: v,s,x,t,l,a,b

Initial factors

“Brute force approach”

P(d) = Σ_v Σ_s Σ_t Σ_l Σ_a Σ_b Σ_x P(v,s,t,l,a,b,x,d)

Complexity is exponential in the size of the graph: O(N^T), where T = number of variables and N = number of states for each variable.

Page 46:

V S

LT

A B

X D

• We want to compute P(d)
• Need to eliminate: v,s,x,t,l,a,b

Initial factors:

P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)

Eliminate: v

Compute: fv(t) = Σ_v P(v) P(t|v)

⇒ fv(t) P(s) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)

Note: fv(t) = P(t). In general, the result of elimination is not necessarily a probability term.

Page 47:

V S

LT

A B

X D

• We want to compute P(d)
• Need to eliminate: s,x,t,l,a,b

Initial factors:

P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
⇒ fv(t) P(s) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)

Eliminate: s

Compute: fs(b,l) = Σ_s P(s) P(b|s) P(l|s)

⇒ fv(t) fs(b,l) P(a|t,l) P(x|a) P(d|a,b)

Summing over s results in a factor with two arguments, fs(b,l). In general, the result of elimination may be a function of several variables.

Page 48:

V S

LT

A B

X D

• We want to compute P(d)
• Need to eliminate: x,t,l,a,b

Initial factors:

P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
⇒ fv(t) P(s) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
⇒ fv(t) fs(b,l) P(a|t,l) P(x|a) P(d|a,b)

Eliminate: x

Compute: fx(a) = Σ_x P(x|a)

⇒ fv(t) fs(b,l) fx(a) P(a|t,l) P(d|a,b)

Note: fx(a) = 1 for all values of a !!

Page 49:

V S

LT

A B

X D

• We want to compute P(d)
• Need to eliminate: t,l,a,b

Initial factors:

P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
⇒ fv(t) P(s) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
⇒ fv(t) fs(b,l) P(a|t,l) P(x|a) P(d|a,b)
⇒ fv(t) fs(b,l) fx(a) P(a|t,l) P(d|a,b)

Eliminate: t

Compute: ft(a,l) = Σ_t fv(t) P(a|t,l)

⇒ fs(b,l) fx(a) ft(a,l) P(d|a,b)

Page 50:

V S

LT

A B

X D

• We want to compute P(d)
• Need to eliminate: l,a,b

Initial factors:

P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
⇒ fv(t) P(s) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
⇒ fv(t) fs(b,l) P(a|t,l) P(x|a) P(d|a,b)
⇒ fv(t) fs(b,l) fx(a) P(a|t,l) P(d|a,b)
⇒ fs(b,l) fx(a) ft(a,l) P(d|a,b)

Eliminate: l

Compute: fl(a,b) = Σ_l fs(b,l) ft(a,l)

⇒ fx(a) fl(a,b) P(d|a,b)

Page 51:

V S

LT

A B

X D

• We want to compute P(d)
• Need to eliminate: a, b

Initial factors:

P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
⇒ fv(t) P(s) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
⇒ fv(t) fs(b,l) P(a|t,l) P(x|a) P(d|a,b)
⇒ fv(t) fs(b,l) fx(a) P(a|t,l) P(d|a,b)
⇒ fs(b,l) fx(a) ft(a,l) P(d|a,b)
⇒ fx(a) fl(a,b) P(d|a,b)

Eliminate: a, b

Compute: fa(b,d) = Σ_a fx(a) fl(a,b) P(d|a,b),   then   fb(d) = Σ_b fa(b,d)

⇒ fa(b,d)  ⇒  fb(d) = P(d)

Page 52:

V S

LT

A B

X D

P(v)P(s)P(t | v)P(l | s)P(b | s)P(a | t,l)P(x | a)P(d | a,b)

• Different elimination ordering: need to eliminate a,b,x,t,v,s,l

• Initial factors (the full product above)

• Intermediate factors, in the order they are created:

ga (l,t,d,b,x)

gb (l,t,d,x,s)

gx (l, t,d,s)

gt (l,t,s,v)

gv (l,d,s)

gs(l,d)

gl (d)


Complexity is exponential in the size of the factors!

Page 53:

Variable Elimination

• We now understand variable elimination as a sequence of rewriting operations

• Actual computation is done in elimination step

• Exactly the same computation procedure applies to Markov networks

• Computation depends on order of elimination

Page 54:

Markov Network (Undirected Graphical Models)

• A graph with hyper-edges (multi-vertex edges)

• Every hyper-edge e=(x1…xk) has a potential function fe(x1…xk)

• The probability distribution is

P(X1,…,Xn) = (1/Z) Π_{e∈E} fe(x1,…,xk)

where  Z = Σ_{x1} … Σ_{xn} Π_{e∈E} fe(x1,…,xk)
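A tiny illustration of this definition: three binary variables with two pairwise potentials (all numbers made up), with Z computed by brute-force enumeration:

```python
from itertools import product

# Hypothetical Markov network over binary X1, X2, X3 with hyper-edges
# (here ordinary edges) (X1,X2) and (X2,X3).
potentials = {
    ("X1", "X2"): {(0, 0): 1.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 2.0},
    ("X2", "X3"): {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0},
}
VARS = ["X1", "X2", "X3"]

def unnormalized(assign):
    """Product of potentials Π_e f_e(...) for a full assignment dict."""
    p = 1.0
    for edge, table in potentials.items():
        p *= table[tuple(assign[v] for v in edge)]
    return p

# Z = sum over all assignments of the product of potentials
Z = sum(unnormalized(dict(zip(VARS, vals)))
        for vals in product([0, 1], repeat=len(VARS)))

def prob(assign):
    """P(X1,...,Xn) = (1/Z) * Π_e f_e(...)."""
    return unnormalized(assign) / Z

print(prob({"X1": 1, "X2": 1, "X3": 1}))
```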

Page 55:

Complexity of variable elimination

• Suppose in one elimination step we compute

  f_x(y1,…,yk) = Σ_x f'_x(x, y1,…,yk),   where   f'_x(x, y1,…,yk) = Π_{i=1..m} f_i(x, y_{i,1},…,y_{i,li})

This requires
• m · |Val(X)| · Π_i |Val(Yi)| multiplications
  – For each value of x, y1,…,yk, we do m multiplications
• |Val(X)| · Π_i |Val(Yi)| additions
  – For each value of y1,…,yk, we do |Val(X)| additions

Complexity is exponential in the number of variables in the intermediate factor.

Page 56:

Undirected graph representation

• At each stage of the procedure, we have an algebraic term that we need to evaluate
• In general this term is of the form:

  P(x1,…,xk) = Σ_{y1} … Σ_{yn} Π_i f_i(Zi)

  where the Zi are sets of variables
• We now plot a graph where there is an undirected edge X–Y if X,Y are arguments of some factor
  – that is, if X,Y are in some Zi
• Note: this is the Markov network that describes the probability on the variables we did not eliminate yet

Page 57:

Chordal Graphs

• elimination ordering ⟹ undirected chordal graph

Graph:
• Maximal cliques are factors in elimination
• Factors in elimination are cliques in the graph
• Complexity is exponential in the size of the largest clique in the graph

LT

A B

X

V S

D

V S

LT

A B

X D

Page 58:

Induced Width

• The size of the largest clique in the induced graph is thus an indicator for the complexity of variable elimination

• This quantity is called the induced width of a graph according to the specified ordering

• Finding a good ordering for a graph is equivalent to finding the minimal induced width of the graph
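A small sketch of these ideas, assuming the graph is given as an adjacency dict: it computes the induced width of a given ordering and builds an ordering with the greedy min-degree heuristic (one common heuristic; the slides do not prescribe a specific one). The usage example uses the moralized Asia graph from the variable-elimination slides:

```python
from itertools import combinations

def induced_width(adj, order):
    """Largest number of neighbors a node has at the moment it is eliminated,
    when nodes are eliminated in `order` (eliminating a node connects its
    remaining neighbors, as in variable elimination)."""
    g = {v: set(ns) for v, ns in adj.items()}
    width = 0
    for v in order:
        nbrs = g.pop(v)
        width = max(width, len(nbrs))
        for a, b in combinations(nbrs, 2):   # connect remaining neighbors
            g[a].add(b)
            g[b].add(a)
        for n in nbrs:
            g[n].discard(v)
    return width

def min_degree_order(adj):
    """Greedy heuristic: repeatedly eliminate a node of minimum current degree.
    Finding the optimal ordering (minimal induced width) is NP-hard."""
    g = {v: set(ns) for v, ns in adj.items()}
    order = []
    while g:
        v = min(g, key=lambda u: len(g[u]))
        order.append(v)
        nbrs = g.pop(v)
        for a, b in combinations(nbrs, 2):
            g[a].add(b)
            g[b].add(a)
        for n in nbrs:
            g[n].discard(v)
    return order

# Moralized "Asia" graph (edges V-T, S-L, S-B, T-A, L-A, T-L, A-X, A-D, B-D, A-B).
edges = [("V", "T"), ("S", "L"), ("S", "B"), ("T", "A"), ("L", "A"),
         ("T", "L"), ("A", "X"), ("A", "D"), ("B", "D"), ("A", "B")]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

order = min_degree_order(adj)
print(order, induced_width(adj, order))
```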

Page 59:

PolyTrees

• A polytree is a network where there is at most one path from one variable to another

Thm:
• Inference in a polytree is linear in the representation size of the network
  – This assumes a tabular CPT representation

A

CB

D E

F G

H

Page 60:

Today

1. Probabilistic graphical models

2. Treewidth methods:
   1. Variable elimination
   2. Clique tree algorithm

3. Applications du jour: Sensor Networks

Page 61:

Junction Tree

• Why junction tree?
  – More efficient for some tasks than variable elimination
  – We can avoid cycles if we turn highly-interconnected subsets of the nodes into "supernode" clusters

• Objective
  – Compute P(V = v | E = e)
  – v is a value of a variable V, and e is evidence for a set of variables E

Page 62:

Properties of Junction Tree

• An undirected tree
• Each node is a cluster (nonempty set) of variables
• Running intersection property:
  – Given two clusters X and Y, all clusters on the path between X and Y contain X ∩ Y
• Separator sets (sepsets):
  – Intersection of the adjacent clusters

Example:  ABD --AD-- ADE --DE-- DEF   (cluster ABD, sepset DE)

Page 63:

Potentials

• Potentials:
  – Denoted by φ_X : X → R+ ∪ {0}  (a nonnegative number for each instantiation of the variables X)

• Marginalization
  – φ_X = Σ_{Y\X} φ_Y, the marginalization of φ_Y into X  (for X ⊆ Y)

• Multiplication
  – φ_Z = φ_X φ_Y, the multiplication of φ_X and φ_Y  (with Z = X ∪ Y)

Page 64:

Properties of Junction Tree

• Belief potentials:
  – Map each instantiation of clusters or sepsets into a real number

• Constraints:
  – Consistency: for each cluster X and neighboring sepset S,  Σ_{X\S} φ_X = φ_S
  – The joint distribution:  P(U) = Π_i φ_{Xi} / Π_j φ_{Sj}

Page 65:

Properties of Junction Tree

• If a junction tree satisfies the properties, it follows that:
  – For each cluster (or sepset) X,  φ_X = P(X)
  – The probability distribution of any variable V, using any cluster (or sepset) X that contains V:  P(V) = Σ_{X\{V}} φ_X

Page 66:

Building Junction Trees

DAG  →  Moral Graph  →  Triangulated Graph  →  Identifying Cliques  →  Junction Tree

Page 67:

Constructing the Moral Graph

A

B

D

C

E

G

F

H

Page 68:

Constructing The Moral Graph

• Add undirected edges to all co-parents which are not currently joined –Marrying parents

A

B

D

C

E

G

F

H

Page 69:

Constructing The Moral Graph

• Add undirected edges to all co-parents which are not currently joined –Marrying parents

• Drop the directions of the arcs

A

B

D

C

E

G

F

H
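A short sketch of moralization, assuming the DAG is given as a parents dict; since the edges of the A-H example are only shown in the figure, the usage example below uses the Asia network from the earlier slides instead:

```python
from itertools import combinations

def moralize(parents):
    """Moral graph of a DAG given as {node: list of parents}: connect
    ("marry") all co-parents of each node, then drop edge directions.
    Returns {node: set of undirected neighbors}."""
    nodes = set(parents) | {p for ps in parents.values() for p in ps}
    moral = {n: set() for n in nodes}
    for child, ps in parents.items():
        for p in ps:                          # keep each edge, undirected
            moral[child].add(p)
            moral[p].add(child)
        for a, b in combinations(ps, 2):      # marry co-parents
            moral[a].add(b)
            moral[b].add(a)
    return moral

# "Asia" network from the variable-elimination example:
asia = {"V": [], "S": [], "T": ["V"], "L": ["S"], "B": ["S"],
        "A": ["T", "L"], "X": ["A"], "D": ["A", "B"]}
m = moralize(asia)
print(sorted(m["T"]))   # ['A', 'L', 'V']  -- T and L get married (parents of A)
print(sorted(m["A"]))   # ['B', 'D', 'L', 'T', 'X']  -- A and B married (parents of D)
```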

Page 70:

Triangulating

• An undirected graph is triangulated iff every cycle of length > 3 contains an edge that connects two nonadjacent nodes

A

B

D

C

E

G

F

H

Page 71:

Identifying Cliques

• A clique is a subgraph of an undirected graph that is complete and maximal

A

B

D

C

E

G

F

H

Cliques: ABD, ADE, ACE, DEF, CEG, EGH

Page 72:

Junction Tree

• A junction tree is a subgraph of the clique graph that – is a tree – contains all the cliques– satisfies the running intersection property

Cliques: ABD, ADE, ACE, CEG, EGH, DEF

Junction tree edges (with sepsets): ABD--ADE (AD), ADE--ACE (AE), ACE--CEG (CE), CEG--EGH (EG), ADE--DEF (DE)

Page 73:

Principle of Inference

DAG  →  Junction Tree  →  (Initialization)  →  Inconsistent Junction Tree  →  (Propagation)  →  Consistent Junction Tree  →  (Marginalization)  →  P(V = v | E = e)

Page 74:

Example: Create Join Tree

HMM with 2 time steps:  X1 → X2, with observations X1 → Y1 and X2 → Y2

Junction Tree:

(X1,Y1) --[X1]-- (X1,X2) --[X2]-- (X2,Y2)

Page 75:

Example: Initialization

Variable   Associated Cluster   Potential function
X1         X1,Y1                φ(X1,Y1) ← P(X1)
Y1         X1,Y1                φ(X1,Y1) ← P(X1) P(Y1|X1)
X2         X1,X2                φ(X1,X2) ← P(X2|X1)
Y2         X2,Y2                φ(X2,Y2) ← P(Y2|X2)

Junction tree:  (X1,Y1) --[X1]-- (X1,X2) --[X2]-- (X2,Y2)

Page 76:

Example: Collect Evidence

• Choose arbitrary clique, e.g. X1,X2, where all potential functions will be collected.

• Recursively call the neighboring cliques for messages:

• 1. Call X1,Y1.
  – 1. Projection:  φ(X1) = Σ_{Y1} φ(X1,Y1) = Σ_{Y1} P(X1,Y1) = P(X1)
  – 2. Absorption:  φ(X1,X2) ← φ(X1,X2) · φ(X1) / φ_old(X1) = P(X2|X1) · P(X1) = P(X1,X2)

Page 77:

Example: Collect Evidence (cont.)

• 2. Call X2,Y2:
  – 1. Projection:  φ(X2) = Σ_{Y2} φ(X2,Y2) = Σ_{Y2} P(Y2|X2) = 1
  – 2. Absorption:  φ(X1,X2) ← φ(X1,X2) · φ(X2) / φ_old(X2) = P(X1,X2) · 1 = P(X1,X2)

Page 78:

Example: Distribute Evidence

• Pass messages recursively to neighboring nodes

• Pass message from X1,X2 to X1,Y1:
  – 1. Projection:  φ(X1) = Σ_{X2} φ(X1,X2) = Σ_{X2} P(X1,X2) = P(X1)
  – 2. Absorption:  φ(X1,Y1) ← φ(X1,Y1) · φ(X1) / φ_old(X1) = P(X1,Y1) · P(X1) / P(X1) = P(X1,Y1)

Page 79:

Example: Distribute Evidence (cont.)

• Pass message from X1,X2 to X2,Y2:
  – 1. Projection:  φ(X2) = Σ_{X1} φ(X1,X2) = Σ_{X1} P(X1,X2) = P(X2)
  – 2. Absorption:  φ(X2,Y2) ← φ(X2,Y2) · φ(X2) / φ_old(X2) = P(Y2|X2) · P(X2) / 1 = P(Y2,X2)

Page 80:

Example: Inference with evidence

• Assume we want to compute: P(X2|Y1=0,Y2=1) (state estimation)

• Assign likelihoods to the potential functions during initialization:

φ(X1,Y1) = P(X1, Y1=0)  if Y1 = 0,   and 0 if Y1 ≠ 0

φ(X2,Y2) = P(Y2=1 | X2)  if Y2 = 1,   and 0 if Y2 ≠ 1
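To check what the junction tree will compute here, a brute-force version of the same query P(X2 | Y1=0, Y2=1) for a 2-step binary HMM; all CPT numbers below are made up:

```python
# Hypothetical CPTs for a 2-time-step binary HMM (X1 -> X2, X1 -> Y1, X2 -> Y2).
P_X1 = {0: 0.6, 1: 0.4}
P_X2_given_X1 = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}  # key (x2, x1)
P_Y_given_X = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.2, (1, 1): 0.8}    # key (y, x)

def joint(x1, y1, x2, y2):
    """P(x1, y1, x2, y2) from the factorization P(X1) P(Y1|X1) P(X2|X1) P(Y2|X2)."""
    return (P_X1[x1] * P_Y_given_X[(y1, x1)] *
            P_X2_given_X1[(x2, x1)] * P_Y_given_X[(y2, x2)])

# State estimation: P(X2 | Y1=0, Y2=1) by summing out X1 and normalizing.
unnorm = {x2: sum(joint(x1, 0, x2, 1) for x1 in (0, 1)) for x2 in (0, 1)}
Z = sum(unnorm.values())           # = P(Y1=0, Y2=1), the likelihood of the evidence
posterior = {x2: p / Z for x2, p in unnorm.items()}
print(posterior)
```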

Page 81:

Example: Inference with evidence (cont.)

• Repeating the same steps as in the previous case, we obtain:

φ(X1,Y1) = P(X1, Y1=0, Y2=1)  if Y1 = 0,   and 0 if Y1 ≠ 0

φ(X1) = P(X1, Y1=0, Y2=1)

φ(X1,X2) = P(X1, Y1=0, X2, Y2=1)

φ(X2) = P(Y1=0, X2, Y2=1)

φ(X2,Y2) = P(Y1=0, X2, Y2=1)  if Y2 = 1,   and 0 if Y2 ≠ 1

Page 82:

Next Time

• Inference with Propositional Logic

• Later in the semester: (a) Approximate Probabilistic Inference via

sampling: Gibbs, Priority, MCMC

(b) Approximate Probabilistic Inference using a close, simpler distribution

Page 83:

THE END

Page 84:

Example: Naïve Bayesian Model

• A common model in early diagnosis:– Symptoms are conditionally independent given the disease (or

fault)

• Thus, if
  – X1,…,Xp denote the symptoms exhibited by the patient (headache, high fever, etc.) and
  – H denotes the hypothesis about the patient's health

• then, P(X1,…,Xp,H) = P(H)P(X1|H)…P(Xp|H),

• This naïve Bayesian model allows a compact representation
  – It does, however, embody strong independence assumptions
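A compact sketch of this model with a binary hypothesis and three binary symptoms (all numbers are made up), computing the posterior P(H | X1,…,Xp) by normalizing P(H) · Π_i P(Xi|H):

```python
# Hypothetical naive Bayes model: binary hypothesis H and three binary symptoms.
P_H = {0: 0.99, 1: 0.01}                      # prior over the disease
P_X_given_H = [                               # one CPT per symptom: P(Xi=1 | H)
    {0: 0.05, 1: 0.90},
    {0: 0.10, 1: 0.70},
    {0: 0.01, 1: 0.60},
]

def posterior(symptoms):
    """P(H | X1..Xp) using P(H, X1..Xp) = P(H) * prod_i P(Xi | H)."""
    joint = {}
    for h in (0, 1):
        p = P_H[h]
        for cpt, x in zip(P_X_given_H, symptoms):
            p_xi_1 = cpt[h]
            p *= p_xi_1 if x == 1 else (1.0 - p_xi_1)
        joint[h] = p
    z = sum(joint.values())
    return {h: p / z for h, p in joint.items()}

print(posterior([1, 1, 0]))   # belief about H given the observed symptoms
```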

Page 85:

Elimination on Trees

• Formally, for any tree, there is an elimination ordering with induced width = 1

Thm

• Inference on trees is linear in number of variables