TRANSCRIPT
Probabilistic Reasoning
ECE457 Applied Artificial Intelligence, Spring 2007, Lecture #9
Outline
- Bayesian networks
- D-separation and independence
- Inference
Reading: Russell & Norvig, sections 14.1 to 14.4
Recall the Story from FOL
Anyone passing their 457 exam and winning the lottery is happy. Anyone who studies or is lucky can pass all their exams. Bob did not study but is lucky. Anyone who's lucky can win the lottery.
Is Bob happy?
Add Probabilities
Anyone passing their 457 exam and winning the lottery has a 99% chance of being happy. Anyone only passing their 457 exam has an 80% chance of being happy, someone only winning the lottery has a 60% chance, and someone who does neither has a 20% chance. Anyone who studies has a 90% chance of passing their exams. Anyone who's lucky has a 50% chance of passing their exams. Anyone who's both lucky and who studied has a 99% chance of passing, but someone who didn't study and is unlucky has a 1% chance of passing. There's a 20% chance that Bob studied, but a 75% chance that he'll be lucky. Anyone who's lucky has a 40% chance of winning the lottery, while an unlucky person only has a 1% chance of winning.
What's the probability of Bob being happy?
Probabilities in the Story
Examples of probabilities in the story:
- P(Lucky) = 0.75
- P(Study) = 0.2
- P(PassExam|Study) = 0.9
- P(PassExam|Lucky) = 0.5
- P(Win|Lucky) = 0.4
- P(Happy|PassExam,Win) = 0.99
Some variables directly affect others! Can we build a graphical representation of the dependencies and conditional independencies between variables?
Bayesian Network
Also called a belief network:
- Directed acyclic graph
- Nodes represent variables
- Edges represent conditional relationships
- Concise representation of any full joint probability distribution
[Figure: network with nodes Lucky, Study, PassExam, Win, Happy; edges Lucky→PassExam, Study→PassExam, Lucky→Win, PassExam→Happy, Win→Happy]
Bayesian Network
- Nodes with no parents have prior probabilities
- Nodes with parents have conditional probability tables, for all truth value combinations of their parents
[Figure: the same network, with a probability table attached to each node]
Bayesian Network
[Figure: the network with its probability tables]

P(L) = 0.75
P(S) = 0.2

P(W|L):
L   P(W)
F   0.01
T   0.4

P(E|L,S):
L  S   P(E)
F  F   0.01
T  F   0.5
F  T   0.9
T  T   0.99

P(H|W,E):
W  E   P(H)   P(¬H)
F  F   0.2    0.8
T  F   0.6    0.4
F  T   0.8    0.2
T  T   0.99   0.01
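These tables translate directly into code. Below is a minimal Python sketch (my own illustration, not from the lecture) encoding the priors and CPTs as dictionaries keyed by the parents' truth values; storing only P(node = true) is enough, since P(node = false) is the complement.

```python
# Priors and conditional probability tables for the example network.
# Each CPT maps a tuple of parent truth values to P(node = True).
priors = {
    'L': 0.75,   # P(Lucky)
    'S': 0.20,   # P(Study)
}
cpt_win = {      # P(Win | Lucky)
    (False,): 0.01,
    (True,):  0.40,
}
cpt_exam = {     # P(PassExam | Lucky, Study)
    (False, False): 0.01,
    (True,  False): 0.50,
    (False, True):  0.90,
    (True,  True):  0.99,
}
cpt_happy = {    # P(Happy | Win, PassExam)
    (False, False): 0.20,
    (True,  False): 0.60,
    (False, True):  0.80,
    (True,  True):  0.99,
}

# The complement rule recovers the other column of each table:
print(1 - cpt_happy[(True, True)])   # P(not Happy | Win, PassExam) = 0.01
```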
Bayesian Network
[Figure: a larger example network with nodes a through z]
Chain Rule
Recall the chain rule:
P(A,B) = P(A|B)P(B)
P(A,B,C) = P(A|B,C)P(B,C)
P(A,B,C) = P(A|B,C)P(B|C)P(C)
P(A1,A2,…,An) = P(A1|A2,…,An)P(A2|A3,…,An)…P(An-1|An)P(An)
P(A1,A2,…,An) = ∏i=1..n P(Ai|Ai+1,…,An)
Chain Rule
If we know the value of a node's parents, we don't care about more distant ancestors; their influence is included through the parents. A node is conditionally independent of its predecessors given its parents. More generally, a node is conditionally independent of its non-descendants given its parents.
Updated chain rule:
P(A1,A2,…,An) = ∏i=1..n P(Ai|parents(Ai))
Chain Rule Example
Probability that Bob is happy because he won the lottery and passed his exam, because he's lucky but did not study:
P(H,W,E,L,¬S) = P(H|W,E) · P(W|L) · P(E|L,¬S) · P(L) · P(¬S)
P(H,W,E,L,¬S) = 0.99 · 0.4 · 0.5 · 0.75 · 0.8
P(H,W,E,L,¬S) ≈ 0.12
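As a sanity check, here is a short self-contained Python sketch (my own, not lecture code) that multiplies out the factors of the updated chain rule for this particular assignment.

```python
def joint(l, s, e, w, h):
    """P(L=l, S=s, E=e, W=w, H=h) via the network's chain rule factorization."""
    p_l = 0.75 if l else 0.25                       # P(L) or P(not L)
    p_s = 0.20 if s else 0.80                       # P(S) or P(not S)
    p_e = {(False, False): 0.01, (True, False): 0.50,
           (False, True): 0.90, (True, True): 0.99}[(l, s)]
    p_w = {False: 0.01, True: 0.40}[l]
    p_h = {(False, False): 0.20, (True, False): 0.60,
           (False, True): 0.80, (True, True): 0.99}[(w, e)]
    # The tables give P(node = True | parents); complement if the value is False.
    if not e: p_e = 1 - p_e
    if not w: p_w = 1 - p_w
    if not h: p_h = 1 - p_h
    return p_h * p_w * p_e * p_l * p_s

# Happy, won, passed, lucky, did not study:
print(round(joint(l=True, s=False, e=True, w=True, h=True), 4))   # 0.1188, about 0.12
```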
Constructing Bayesian Nets
Build from the top down:
- Start with root nodes
- Add children
- Go down to the leaves
[Figure: the example network built from Lucky and Study down to Happy]
Constructing Bayesian Nets
What happens if we build with the wrong order? The network becomes needlessly complicated. Node ordering is important!
[Figure: the same five variables connected in a different order]
Connections
We can understand dependence in a network by considering how evidence is transmitted through it. Information entered at one node propagates to descendants and ancestors through connected nodes, provided no node in the path already has evidence (in which case the propagation stops there).
Serial Connection
- Study and Happy are dependent
- Study and Happy are independent given PassExam
Intuitively, the only way Study can affect Happy is through PassExam.
[Figure: the example network; the relevant path is Study → PassExam → Happy]
Converging Connection
- Lucky and Study are independent
- Lucky and Study are dependent given PassExam
Intuitively, Lucky can be used to explain away Study.
[Figure: the example network; the relevant connection is Lucky → PassExam ← Study]
Diverging Connection
- Win and PassExam are dependent
- Win and PassExam are independent given Lucky
Intuitively, Lucky can explain both Win and PassExam; Win and PassExam can affect each other by changing the belief in Lucky.
[Figure: the example network; the relevant connection is Win ← Lucky → PassExam]
D-Separation
Determines whether two variables are independent given some other variables. X is independent of Y given Z if X and Y are d-separated given Z. X is d-separated from Y if, for every (undirected) path between X and Y, there exists a node Z on the path for which either:
- the connection is serial or diverging and there is evidence for Z, or
- the connection is converging and there is no evidence for Z or any of its descendants.
D-Separation
[Figure: three path-blocking cases between X and Y. Serial (X → Z → Y) and diverging (X ← Z → Y) connections block the path if Z is in evidence; a converging connection (X → Z ← Y) blocks the path if neither Z nor any descendant Z2 is in evidence]
D-Separation
Can be computed in linear time using a depth-first-search algorithm, giving a fast way to know whether two nodes are independent. It allows us to infer whether learning the value of one variable might give us information about another variable, given what we already know. All d-separated variables are independent, but not all independent variables are d-separated.
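One standard linear-time formulation of this check is the "Bayes-ball" reachability procedure (Shachter, 1998); the two-phase version sketched below in Python is my own illustration, not the lecture's code. It returns every node with an active trail from x given evidence z, so X and Y are d-separated given Z exactly when Y is not in the result.

```python
def reachable(parents, x, z):
    """Nodes with an active trail from x given evidence set z.
    `parents` maps each node to the list of its parents."""
    children = {n: [] for n in parents}
    for n, ps in parents.items():
        for p in ps:
            children[p].append(n)

    # Phase 1: collect the evidence nodes and all their ancestors.
    ancestors, stack = set(), list(z)
    while stack:
        y = stack.pop()
        if y not in ancestors:
            ancestors.add(y)
            stack.extend(parents[y])

    # Phase 2: traverse active trails from x, tracking travel direction.
    visited, result = set(), set()
    queue = [(x, 'up')]                  # 'up' = arriving from a child
    while queue:
        y, d = queue.pop()
        if (y, d) in visited:
            continue
        visited.add((y, d))
        if y not in z:
            result.add(y)
        if d == 'up' and y not in z:
            queue += [(p, 'up') for p in parents[y]]
            queue += [(c, 'down') for c in children[y]]
        elif d == 'down':
            if y not in z:               # serial/diverging: evidence blocks
                queue += [(c, 'down') for c in children[y]]
            if y in ancestors:           # converging: evidence below opens
                queue += [(p, 'up') for p in parents[y]]
    return result

net = {'L': [], 'S': [], 'W': ['L'], 'E': ['L', 'S'], 'H': ['W', 'E']}
print('E' in reachable(net, 'W', {'L'}))   # False: Win, PassExam d-separated given Lucky
print('S' in reachable(net, 'L', set()))   # False: Lucky, Study independent a priori
print('S' in reachable(net, 'L', {'E'}))   # True: evidence at PassExam connects them
```

The three printed checks reproduce the converging and diverging connection slides above.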
D-Separation Exercise
[Figure: an example network with nodes a through j]
If we observe a value for node g, what other nodes are updated? Nodes f, h, and i.
If we observe a value for node a, what other nodes are updated? Nodes b, c, d, e, f.
D-Separation Exercise
[Figure: the same network with nodes a through j]
Given an observation of c, are nodes a and f independent? Yes.
Given an observation of i, are nodes g and j independent? No.
Other Independence Criteria
A node is conditionally independent of its non-descendants given its parents. Recall this from the updated chain rule.
[Figure: the large example network, with one node's parents highlighted]
Other Independence Criteria
A node is conditionally independent of all others in the network given its parents, children, and children's parents. This set is called its Markov blanket.
[Figure: the large example network, with one node's Markov blanket highlighted]
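Reading a Markov blanket off a network only needs the parent lists. A small sketch (my own, using the same parents-dictionary representation as the earlier snippets):

```python
def markov_blanket(parents, x):
    """Parents, children, and children's other parents of node x."""
    children = [n for n, ps in parents.items() if x in ps]
    blanket = set(parents[x]) | set(children)
    for c in children:
        blanket |= set(parents[c])       # co-parents ("spouses")
    blanket.discard(x)                   # x is not in its own blanket
    return blanket

net = {'L': [], 'S': [], 'W': ['L'], 'E': ['L', 'S'], 'H': ['W', 'E']}
print(markov_blanket(net, 'W'))   # {'L', 'E', 'H'}: parent, child, child's other parent
```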
Inference in Bayesian Network
Compute the posterior probability of a query variable given an observed event.
P(A1,A2,…,An) = ∏i=1..n P(Ai|parents(Ai))
- Observed evidence variables E = E1,…,Em
- Query variable X
- Between them: nonevidence (hidden) variables Y = Y1,…,Yl
- The belief network is X ∪ E ∪ Y
Inference in Bayesian Network
We want P(X|E).
Recall Bayes' theorem: P(A|B) = P(A,B) / P(B)
P(X|E) = α P(X,E)
Recall marginalization: P(Ai) = Σj P(Ai,Bj)
P(X|E) = α ΣY P(X,E,Y)
Recall the chain rule: P(A1,A2,…,An) = ∏i=1..n P(Ai|parents(Ai))
P(X|E) = α ΣY ∏A∈{X}∪E∪Y P(A|parents(A))
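The last line is inference by enumeration: sum the full joint over the hidden variables Y, then normalize with α. Below is a direct (exponential-time) Python sketch of that formula over the example network's tables; it is my own illustration, not the course's implementation.

```python
from itertools import product

parents = {'L': (), 'S': (), 'W': ('L',), 'E': ('L', 'S'), 'H': ('W', 'E')}
cpt = {   # P(node = True | parent values)
    'L': {(): 0.75},
    'S': {(): 0.20},
    'W': {(False,): 0.01, (True,): 0.40},
    'E': {(False, False): 0.01, (True, False): 0.50,
          (False, True): 0.90, (True, True): 0.99},
    'H': {(False, False): 0.20, (True, False): 0.60,
          (False, True): 0.80, (True, True): 0.99},
}

def enumerate_ask(x, evidence):
    """P(x | evidence) as a {True: p, False: p} distribution."""
    dist = {}
    for x_val in (True, False):
        fixed = dict(evidence, **{x: x_val})
        hidden = [n for n in parents if n not in fixed]
        total = 0.0
        for values in product((True, False), repeat=len(hidden)):
            world = dict(fixed, **dict(zip(hidden, values)))
            p = 1.0
            for node in parents:          # chain rule over the network
                pt = cpt[node][tuple(world[q] for q in parents[node])]
                p *= pt if world[node] else 1 - pt
            total += p
        dist[x_val] = total
    alpha = 1 / sum(dist.values())        # normalization constant
    return {v: alpha * p for v, p in dist.items()}
```

For instance, enumerate_ask('W', {}) reproduces the marginal computed in Example #1 below, and enumerate_ask('W', {'H': True}) the posterior in Example #2.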
Inference Example
[Figure: the example network with its probability tables]

P(L) = 0.75
P(S) = 0.2

P(W|L):
L   P(W)
F   0.01
T   0.4

P(E|L,S):
L  S   P(E)
F  F   0.01
T  F   0.5
F  T   0.9
T  T   0.99

P(H|W,E):
W  E   P(H)
F  F   0.2
T  F   0.6
F  T   0.8
T  T   0.99
Inference Example #1
With only the information from the network (and no observations), what's the probability that Bob won the lottery?
P(W) = Σl P(W,l)
P(W) = Σl P(W|l)P(l)
P(W) = P(W|L)P(L) + P(W|¬L)P(¬L)
P(W) = 0.4·0.75 + 0.01·0.25
P(W) = 0.3025
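In code this marginalization is one line of arithmetic (values from the tables above); the more general enumerate_ask sketch from the previous slide gives the same number.

```python
p_w = 0.4 * 0.75 + 0.01 * 0.25   # P(W|L)P(L) + P(W|not L)P(not L)
print(round(p_w, 4))             # 0.3025
```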
Inference Example #2
Given that we know that Bob is happy, what's the probability that Bob won the lottery?
From the network, we know:
P(h,e,w,s,l) = P(l)P(s)P(e|l,s)P(w|l)P(h|w,e)
We want to find:
P(W|H) = α Σl Σs Σe P(l)P(s)P(e|l,s)P(W|l)P(H|W,e)
P(¬W|H) is also needed, to normalize.
Inference Example #2
Terms of the sum for W = true:
l  s  e   P(s)  P(l)  P(e|l,s)  P(W|l)  P(H|W,e)  Product
F  F  F   0.8   0.25  0.99      0.01    0.6       0.001188
T  F  F   0.8   0.75  0.5       0.4     0.6       0.072
F  T  F   0.2   0.25  0.1       0.01    0.6       0.00003
T  T  F   0.2   0.75  0.01      0.4     0.6       0.00036
F  F  T   0.8   0.25  0.01      0.01    0.99      0.0000198
T  F  T   0.8   0.75  0.5       0.4     0.99      0.1188
F  T  T   0.2   0.25  0.9       0.01    0.99      0.0004455
T  T  T   0.2   0.75  0.99      0.4     0.99      0.058806
P(W|H) = α · 0.2516493
Inference Example #2
Terms of the sum for W = false:
l  s  e   P(s)  P(l)  P(e|l,s)  P(¬W|l)  P(H|¬W,e)  Product
F  F  F   0.8   0.25  0.99      0.99     0.2        0.039204
T  F  F   0.8   0.75  0.5       0.6      0.2        0.036
F  T  F   0.2   0.25  0.1       0.99     0.2        0.00099
T  T  F   0.2   0.75  0.01      0.6      0.2        0.00018
F  F  T   0.8   0.25  0.01      0.99     0.8        0.001584
T  F  T   0.8   0.75  0.5       0.6      0.8        0.144
F  T  T   0.2   0.25  0.9       0.99     0.8        0.03564
T  T  T   0.2   0.75  0.99      0.6      0.8        0.07128
P(¬W|H) = α · 0.328878
Inference Example #2
P(W|H) = α ⟨0.2516493, 0.328878⟩
P(W|H) = ⟨0.4335, 0.5665⟩
Note that P(W|H) = 0.4335 > P(W) = 0.3025: the probability of Bob having won the lottery has increased by 13.1 percentage points thanks to our knowledge that he is happy!
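These numbers can be double-checked mechanically; the short snippet below (my own, not the lecture's) simply re-adds the two product columns above and normalizes, matching ⟨0.4335, 0.5665⟩. The enumerate_ask sketch from the earlier slide, called as enumerate_ask('W', {'H': True}), agrees.

```python
w_true  = [0.001188, 0.072, 0.00003, 0.00036, 0.0000198, 0.1188, 0.0004455, 0.058806]
w_false = [0.039204, 0.036, 0.00099, 0.00018, 0.001584, 0.144, 0.03564, 0.07128]
alpha = 1 / (sum(w_true) + sum(w_false))   # normalize over W = true/false
print(round(alpha * sum(w_true), 4),       # 0.4335
      round(alpha * sum(w_false), 4))      # 0.5665
```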
Expert Systems
Bayesian networks are used to implement expert systems: diagnostic systems that contain subject-specific knowledge. The knowledge (nodes, relationships, probabilities) is typically provided by human experts. The system observes evidence by asking questions to the user, then infers the most likely conclusion.
Pathfinder
An expert system for medical diagnosis of lymph-node diseases, built as a very large Bayesian network:
- Over 60 diseases
- Over 100 features of lymph nodes
- Over 30 features for clinical information
It took a lot of work from medical experts:
- 8 hours to define features and diseases
- 35 hours to build the network topology
- 40 hours to assess the probabilities
Pathfinder
One node for each disease; assumes the diseases are mutually exclusive and exhaustive. The domain is large and hard to handle, so several small networks were built individually for diagnostic tasks, then combined into a single large network.
Pathfinder
Testing the network: 53 test cases (real diagnostics). Diagnostic accuracy was as good as a medical expert's.
Assumptions
Learning agent. Environment:
- Fully observable / Partially observable
- Deterministic / Strategic / Stochastic
- Sequential
- Static / Semi-dynamic
- Discrete / Continuous
- Single agent / Multi-agent
Assumptions Updated
We can handle a new combination!
- Fully observable & Deterministic: no uncertainty (map of Romania)
- Fully observable & Stochastic: games of chance (Monopoly, Backgammon)
- Partially observable & Deterministic: logic (Wumpus World)
- Partially observable & Stochastic: probabilistic reasoning (this lecture)