TRANSCRIPT
Probabilistic Reasoning
ECE457 Applied Artificial Intelligence, Spring 2007, Lecture #9
Outline
- Bayesian networks
- D-separation and independence
- Inference
Reading: Russell & Norvig, sections 14.1 to 14.4
Recall the Story from FOL
Anyone passing their 457 exam and winning the lottery is happy. Anyone who studies or is lucky can pass all their exams. Bob did not study but is lucky. Anyone who's lucky can win the lottery.
Is Bob happy?
Add Probabilities
Anyone passing their 457 exam and winning the lottery has a 99% chance of being happy. Anyone only passing their 457 exam has an 80% chance of being happy, someone only winning the lottery has a 60% chance, and someone who does neither has a 20% chance. Anyone who studies has a 90% chance of passing their exams. Anyone who's lucky has a 50% chance of passing their exams. Anyone who's both lucky and who studied has a 99% chance of passing, but someone who didn't study and is unlucky has a 1% chance of passing. There's a 20% chance that Bob studied, but a 75% chance that he'll be lucky. Anyone who's lucky has a 40% chance of winning the lottery, while an unlucky person only has a 1% chance of winning.
What's the probability of Bob being happy?
Probabilities in the Story
Examples of probabilities in the story:
- P(Lucky) = 0.75
- P(Study) = 0.2
- P(PassExam|Study) = 0.9
- P(PassExam|Lucky) = 0.5
- P(Win|Lucky) = 0.4
- P(Happy|PassExam,Win) = 0.99
Some variables directly affect others! Can we build a graphical representation of the dependencies and conditional independencies between variables?
Bayesian Network
Also called a belief network:
- Directed acyclic graph
- Nodes represent variables
- Edges represent conditional relationships
- Concise representation of any full joint probability distribution
[Figure: network with nodes Lucky, Study, PassExam, Win, Happy; edges Lucky→PassExam, Study→PassExam, Lucky→Win, PassExam→Happy, Win→Happy]
Bayesian Network
- Nodes with no parents have prior probabilities
- Nodes with parents have conditional probability tables, for all truth value combinations of their parents
[Figure: the same network, with a probability table attached to each node]
Bayesian Network
[Figure: the network with its probability tables]

P(L) = 0.75
P(S) = 0.2

P(W|L):
L   P(W)
F   0.01
T   0.4

P(E|L,S):
L  S   P(E)
F  F   0.01
T  F   0.5
F  T   0.9
T  T   0.99

P(H|W,E):
W  E   P(H)   P(¬H)
F  F   0.2    0.8
T  F   0.6    0.4
F  T   0.8    0.2
T  T   0.99   0.01
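These tables translate directly into code. Below is a minimal Python sketch (my own illustration, not from the lecture) encoding the priors and CPTs as dictionaries keyed by the parents' truth values; storing only P(node = true) is enough, since P(node = false) is the complement.

```python
# Priors and conditional probability tables for the example network.
# Each CPT maps a tuple of parent truth values to P(node = True).
priors = {
    'L': 0.75,   # P(Lucky)
    'S': 0.20,   # P(Study)
}
cpt_win = {      # P(Win | Lucky)
    (False,): 0.01,
    (True,):  0.40,
}
cpt_exam = {     # P(PassExam | Lucky, Study)
    (False, False): 0.01,
    (True,  False): 0.50,
    (False, True):  0.90,
    (True,  True):  0.99,
}
cpt_happy = {    # P(Happy | Win, PassExam)
    (False, False): 0.20,
    (True,  False): 0.60,
    (False, True):  0.80,
    (True,  True):  0.99,
}

# The complement rule recovers the other column of each table:
print(1 - cpt_happy[(True, True)])   # P(not Happy | Win, PassExam) = 0.01
```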
Bayesian Network
[Figure: a larger example network with nodes a through z]
Chain Rule
Recall the chain rule:
P(A,B) = P(A|B)P(B)
P(A,B,C) = P(A|B,C)P(B,C)
P(A,B,C) = P(A|B,C)P(B|C)P(C)
P(A1,A2,…,An) = P(A1|A2,…,An)P(A2|A3,…,An)…P(An-1|An)P(An)
P(A1,A2,…,An) = ∏i=1..n P(Ai|Ai+1,…,An)
Chain Rule
If we know the value of a node's parents, we don't care about more distant ancestors; their influence is included through the parents. A node is conditionally independent of its predecessors given its parents. More generally, a node is conditionally independent of its non-descendants given its parents.
Updated chain rule:
P(A1,A2,…,An) = ∏i=1..n P(Ai|parents(Ai))
Chain Rule Example
Probability that Bob is happy because he won the lottery and passed his exam, because he's lucky but did not study:
P(H,W,E,L,¬S) = P(H|W,E) · P(W|L) · P(E|L,¬S) · P(L) · P(¬S)
P(H,W,E,L,¬S) = 0.99 · 0.4 · 0.5 · 0.75 · 0.8
P(H,W,E,L,¬S) ≈ 0.12
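As a sanity check, here is a short self-contained Python sketch (my own, not lecture code) that multiplies out the factors of the updated chain rule for this particular assignment.

```python
def joint(l, s, e, w, h):
    """P(L=l, S=s, E=e, W=w, H=h) via the network's chain rule factorization."""
    p_l = 0.75 if l else 0.25                       # P(L) or P(not L)
    p_s = 0.20 if s else 0.80                       # P(S) or P(not S)
    p_e = {(False, False): 0.01, (True, False): 0.50,
           (False, True): 0.90, (True, True): 0.99}[(l, s)]
    p_w = {False: 0.01, True: 0.40}[l]
    p_h = {(False, False): 0.20, (True, False): 0.60,
           (False, True): 0.80, (True, True): 0.99}[(w, e)]
    # The tables give P(node = True | parents); complement if the value is False.
    if not e: p_e = 1 - p_e
    if not w: p_w = 1 - p_w
    if not h: p_h = 1 - p_h
    return p_h * p_w * p_e * p_l * p_s

# Happy, won, passed, lucky, did not study:
print(round(joint(l=True, s=False, e=True, w=True, h=True), 4))   # 0.1188, about 0.12
```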
Constructing Bayesian Nets
Build from the top down:
- Start with root nodes
- Add children
- Go down to the leaves
[Figure: the example network built from Lucky and Study down to Happy]
Constructing Bayesian Nets
What happens if we build with the wrong order? The network becomes needlessly complicated. Node ordering is important!
[Figure: the same five variables connected in a different order]
Connections
We can understand dependence in a network by considering how evidence is transmitted through it. Information entered at one node propagates to descendants and ancestors through connected nodes, provided no node in the path already has evidence (in which case the propagation stops there).
Serial Connection
- Study and Happy are dependent
- Study and Happy are independent given PassExam
Intuitively, the only way Study can affect Happy is through PassExam.
[Figure: the example network; the relevant path is Study → PassExam → Happy]
Converging Connection
- Lucky and Study are independent
- Lucky and Study are dependent given PassExam
Intuitively, Lucky can be used to explain away Study.
[Figure: the example network; the relevant connection is Lucky → PassExam ← Study]
Diverging Connection
- Win and PassExam are dependent
- Win and PassExam are independent given Lucky
Intuitively, Lucky can explain both Win and PassExam; Win and PassExam can affect each other by changing the belief in Lucky.
[Figure: the example network; the relevant connection is Win ← Lucky → PassExam]
D-Separation
Determines whether two variables are independent given some other variables. X is independent of Y given Z if X and Y are d-separated given Z. X is d-separated from Y if, for every (undirected) path between X and Y, there exists a node Z on the path for which either:
- the connection is serial or diverging and there is evidence for Z, or
- the connection is converging and there is no evidence for Z or any of its descendants.
D-Separation
[Figure: three path-blocking cases between X and Y. Serial (X → Z → Y) and diverging (X ← Z → Y) connections block the path if Z is in evidence; a converging connection (X → Z ← Y) blocks the path if neither Z nor any descendant Z2 is in evidence]
D-Separation
Can be computed in linear time using a depth-first-search algorithm, giving a fast way to know whether two nodes are independent. It allows us to infer whether learning the value of one variable might give us information about another variable, given what we already know. All d-separated variables are independent, but not all independent variables are d-separated.
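One standard linear-time formulation of this check is the "Bayes-ball" reachability procedure (Shachter, 1998); the two-phase version sketched below in Python is my own illustration, not the lecture's code. It returns every node with an active trail from x given evidence z, so X and Y are d-separated given Z exactly when Y is not in the result.

```python
def reachable(parents, x, z):
    """Nodes with an active trail from x given evidence set z.
    `parents` maps each node to the list of its parents."""
    children = {n: [] for n in parents}
    for n, ps in parents.items():
        for p in ps:
            children[p].append(n)

    # Phase 1: collect the evidence nodes and all their ancestors.
    ancestors, stack = set(), list(z)
    while stack:
        y = stack.pop()
        if y not in ancestors:
            ancestors.add(y)
            stack.extend(parents[y])

    # Phase 2: traverse active trails from x, tracking travel direction.
    visited, result = set(), set()
    queue = [(x, 'up')]                  # 'up' = arriving from a child
    while queue:
        y, d = queue.pop()
        if (y, d) in visited:
            continue
        visited.add((y, d))
        if y not in z:
            result.add(y)
        if d == 'up' and y not in z:
            queue += [(p, 'up') for p in parents[y]]
            queue += [(c, 'down') for c in children[y]]
        elif d == 'down':
            if y not in z:               # serial/diverging: evidence blocks
                queue += [(c, 'down') for c in children[y]]
            if y in ancestors:           # converging: evidence below opens
                queue += [(p, 'up') for p in parents[y]]
    return result

net = {'L': [], 'S': [], 'W': ['L'], 'E': ['L', 'S'], 'H': ['W', 'E']}
print('E' in reachable(net, 'W', {'L'}))   # False: Win, PassExam d-separated given Lucky
print('S' in reachable(net, 'L', set()))   # False: Lucky, Study independent a priori
print('S' in reachable(net, 'L', {'E'}))   # True: evidence at PassExam connects them
```

The three printed checks reproduce the converging and diverging connection slides above.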
D-Separation Exercise
[Figure: an example network with nodes a through j]
If we observe a value for node g, what other nodes are updated? Nodes f, h, and i.
If we observe a value for node a, what other nodes are updated? Nodes b, c, d, e, f.
D-Separation Exercise
[Figure: the same network with nodes a through j]
Given an observation of c, are nodes a and f independent? Yes.
Given an observation of i, are nodes g and j independent? No.
Other Independence Criteria
A node is conditionally independent of its non-descendants given its parents. Recall this from the updated chain rule.
[Figure: the large example network, with one node's parents highlighted]
Other Independence Criteria
A node is conditionally independent of all others in the network given its parents, children, and children's parents. This set is called its Markov blanket.
[Figure: the large example network, with one node's Markov blanket highlighted]
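Reading a Markov blanket off a network only needs the parent lists. A small sketch (my own, using the same parents-dictionary representation as the earlier snippets):

```python
def markov_blanket(parents, x):
    """Parents, children, and children's other parents of node x."""
    children = [n for n, ps in parents.items() if x in ps]
    blanket = set(parents[x]) | set(children)
    for c in children:
        blanket |= set(parents[c])       # co-parents ("spouses")
    blanket.discard(x)                   # x is not in its own blanket
    return blanket

net = {'L': [], 'S': [], 'W': ['L'], 'E': ['L', 'S'], 'H': ['W', 'E']}
print(markov_blanket(net, 'W'))   # {'L', 'E', 'H'}: parent, child, child's other parent
```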
Inference in Bayesian Network
Compute the posterior probability of a query variable given an observed event.
P(A1,A2,…,An) = ∏i=1..n P(Ai|parents(Ai))
- Observed evidence variables E = E1,…,Em
- Query variable X
- Between them: nonevidence (hidden) variables Y = Y1,…,Yl
- The belief network is X ∪ E ∪ Y
Inference in Bayesian Network
We want P(X|E).
Recall Bayes' theorem: P(A|B) = P(A,B) / P(B)
P(X|E) = α P(X,E)
Recall marginalization: P(Ai) = Σj P(Ai,Bj)
P(X|E) = α ΣY P(X,E,Y)
Recall the chain rule: P(A1,A2,…,An) = ∏i=1..n P(Ai|parents(Ai))
P(X|E) = α ΣY ∏A∈{X}∪E∪Y P(A|parents(A))
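The last line is inference by enumeration: sum the full joint over the hidden variables Y, then normalize with α. Below is a direct (exponential-time) Python sketch of that formula over the example network's tables; it is my own illustration, not the course's implementation.

```python
from itertools import product

parents = {'L': (), 'S': (), 'W': ('L',), 'E': ('L', 'S'), 'H': ('W', 'E')}
cpt = {   # P(node = True | parent values)
    'L': {(): 0.75},
    'S': {(): 0.20},
    'W': {(False,): 0.01, (True,): 0.40},
    'E': {(False, False): 0.01, (True, False): 0.50,
          (False, True): 0.90, (True, True): 0.99},
    'H': {(False, False): 0.20, (True, False): 0.60,
          (False, True): 0.80, (True, True): 0.99},
}

def enumerate_ask(x, evidence):
    """P(x | evidence) as a {True: p, False: p} distribution."""
    dist = {}
    for x_val in (True, False):
        fixed = dict(evidence, **{x: x_val})
        hidden = [n for n in parents if n not in fixed]
        total = 0.0
        for values in product((True, False), repeat=len(hidden)):
            world = dict(fixed, **dict(zip(hidden, values)))
            p = 1.0
            for node in parents:          # chain rule over the network
                pt = cpt[node][tuple(world[q] for q in parents[node])]
                p *= pt if world[node] else 1 - pt
            total += p
        dist[x_val] = total
    alpha = 1 / sum(dist.values())        # normalization constant
    return {v: alpha * p for v, p in dist.items()}
```

For instance, enumerate_ask('W', {}) reproduces the marginal computed in Example #1 below, and enumerate_ask('W', {'H': True}) the posterior in Example #2.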
Inference Example
[Figure: the example network with its probability tables]

P(L) = 0.75
P(S) = 0.2

P(W|L):
L   P(W)
F   0.01
T   0.4

P(E|L,S):
L  S   P(E)
F  F   0.01
T  F   0.5
F  T   0.9
T  T   0.99

P(H|W,E):
W  E   P(H)
F  F   0.2
T  F   0.6
F  T   0.8
T  T   0.99
Inference Example #1
With only the information from the network (and no observations), what's the probability that Bob won the lottery?
P(W) = Σl P(W,l)
P(W) = Σl P(W|l)P(l)
P(W) = P(W|L)P(L) + P(W|¬L)P(¬L)
P(W) = 0.4·0.75 + 0.01·0.25
P(W) = 0.3025
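In code this marginalization is one line of arithmetic (values from the tables above); the more general enumerate_ask sketch from the previous slide gives the same number.

```python
p_w = 0.4 * 0.75 + 0.01 * 0.25   # P(W|L)P(L) + P(W|not L)P(not L)
print(round(p_w, 4))             # 0.3025
```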
Inference Example #2
Given that we know that Bob is happy, what's the probability that Bob won the lottery?
From the network, we know:
P(h,e,w,s,l) = P(l)P(s)P(e|l,s)P(w|l)P(h|w,e)
We want to find:
P(W|H) = α Σl Σs Σe P(l)P(s)P(e|l,s)P(W|l)P(H|W,e)
P(¬W|H) is also needed, to normalize.
Inference Example #2
Terms of the sum for W = true:
l  s  e   P(s)  P(l)  P(e|l,s)  P(W|l)  P(H|W,e)  Product
F  F  F   0.8   0.25  0.99      0.01    0.6       0.001188
T  F  F   0.8   0.75  0.5       0.4     0.6       0.072
F  T  F   0.2   0.25  0.1       0.01    0.6       0.00003
T  T  F   0.2   0.75  0.01      0.4     0.6       0.00036
F  F  T   0.8   0.25  0.01      0.01    0.99      0.0000198
T  F  T   0.8   0.75  0.5       0.4     0.99      0.1188
F  T  T   0.2   0.25  0.9       0.01    0.99      0.0004455
T  T  T   0.2   0.75  0.99      0.4     0.99      0.058806
P(W|H) = α · 0.2516493
Inference Example #2
Terms of the sum for W = false:
l  s  e   P(s)  P(l)  P(e|l,s)  P(¬W|l)  P(H|¬W,e)  Product
F  F  F   0.8   0.25  0.99      0.99     0.2        0.039204
T  F  F   0.8   0.75  0.5       0.6      0.2        0.036
F  T  F   0.2   0.25  0.1       0.99     0.2        0.00099
T  T  F   0.2   0.75  0.01      0.6      0.2        0.00018
F  F  T   0.8   0.25  0.01      0.99     0.8        0.001584
T  F  T   0.8   0.75  0.5       0.6      0.8        0.144
F  T  T   0.2   0.25  0.9       0.99     0.8        0.03564
T  T  T   0.2   0.75  0.99      0.6      0.8        0.07128
P(¬W|H) = α · 0.328878
Inference Example #2
P(W|H) = α ⟨0.2516493, 0.328878⟩
P(W|H) = ⟨0.4335, 0.5665⟩
Note that P(W|H) = 0.4335 > P(W) = 0.3025: the probability of Bob having won the lottery has increased by 13.1 percentage points thanks to our knowledge that he is happy!
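These numbers can be double-checked mechanically; the short snippet below (my own, not the lecture's) simply re-adds the two product columns above and normalizes, matching ⟨0.4335, 0.5665⟩. The enumerate_ask sketch from the earlier slide, called as enumerate_ask('W', {'H': True}), agrees.

```python
w_true  = [0.001188, 0.072, 0.00003, 0.00036, 0.0000198, 0.1188, 0.0004455, 0.058806]
w_false = [0.039204, 0.036, 0.00099, 0.00018, 0.001584, 0.144, 0.03564, 0.07128]
alpha = 1 / (sum(w_true) + sum(w_false))   # normalize over W = true/false
print(round(alpha * sum(w_true), 4),       # 0.4335
      round(alpha * sum(w_false), 4))      # 0.5665
```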
Expert Systems
Bayesian networks are used to implement expert systems: diagnostic systems that contain subject-specific knowledge. The knowledge (nodes, relationships, probabilities) is typically provided by human experts. The system observes evidence by asking questions to the user, then infers the most likely conclusion.
Pathfinder
An expert system for medical diagnosis of lymph-node diseases, built as a very large Bayesian network:
- Over 60 diseases
- Over 100 features of lymph nodes
- Over 30 features for clinical information
It took a lot of work from medical experts:
- 8 hours to define features and diseases
- 35 hours to build the network topology
- 40 hours to assess the probabilities
Pathfinder
One node for each disease; assumes the diseases are mutually exclusive and exhaustive. The domain is large and hard to handle, so several small networks were built individually for diagnostic tasks, then combined into a single large network.
Pathfinder
Testing the network: 53 test cases (real diagnostics). Diagnostic accuracy was as good as a medical expert's.
Assumptions
Learning agent. Environment:
- Fully observable / Partially observable
- Deterministic / Strategic / Stochastic
- Sequential
- Static / Semi-dynamic
- Discrete / Continuous
- Single agent / Multi-agent
Assumptions Updated
We can handle a new combination!
- Fully observable & Deterministic: no uncertainty (map of Romania)
- Fully observable & Stochastic: games of chance (Monopoly, Backgammon)
- Partially observable & Deterministic: logic (Wumpus World)
- Partially observable & Stochastic: probabilistic reasoning (this lecture)