Probabilistic Reasoning ECE457 Applied Artificial Intelligence Spring 2007 Lecture #9


Page 1: Probabilistic Reasoning

Probabilistic Reasoning

ECE457 Applied Artificial Intelligence, Spring 2007, Lecture #9

Page 2: Outline

Outline
- Bayesian networks
- D-separation and independence
- Inference

Russell & Norvig, sections 14.1 to 14.4

Page 3: Recall the Story from FOL

Anyone passing their 457 exam and winning the lottery is happy. Anyone who studies or is lucky can pass all their exams. Bob did not study but is lucky. Anyone who's lucky can win the lottery.

Is Bob happy?

Page 4: Add Probabilities

Anyone passing their 457 exam and winning the lottery has a 99% chance of being happy. Anyone only passing their 457 exam has an 80% chance of being happy, someone only winning the lottery has a 60% chance, and someone who does neither has a 20% chance. Anyone who studies has a 90% chance of passing their exams. Anyone who's lucky has a 50% chance of passing their exams. Anyone who's both lucky and who studied has a 99% chance of passing, but someone who didn't study and is unlucky has a 1% chance of passing. There's a 20% chance that Bob studied, but a 75% chance that he'll be lucky. Anyone who's lucky has a 40% chance of winning the lottery, while an unlucky person only has a 1% chance of winning.

What's the probability of Bob being happy?

Page 5: Probabilities in the Story

Examples of probabilities in the story:
- P(Lucky) = 0.75
- P(Study) = 0.2
- P(PassExam|Study) = 0.9
- P(PassExam|Lucky) = 0.5
- P(Win|Lucky) = 0.4
- P(Happy|PassExam,Win) = 0.99

Some variables directly affect others! Can we build a graphical representation of the dependencies and conditional independencies between variables?

Page 6: Bayesian Network

Also called a belief network:
- Directed acyclic graph
- Nodes represent variables
- Edges represent conditional relationships
- Concise representation of any full joint probability distribution

[Figure: network with edges Lucky → Win, Lucky → PassExam, Study → PassExam, Win → Happy, PassExam → Happy]

Page 7: Bayesian Network

- Nodes with no parents have prior probabilities
- Nodes with parents have conditional probability tables, with one entry for each truth-value combination of their parents

[Figure: the same network, with Lucky and Study as parentless nodes]

Page 8: Bayesian Network

P(L) = 0.75
P(S) = 0.2

L | P(W)
--+------
F | 0.01    = P(W|¬L)
T | 0.4     = P(W|L)

L | S | P(E)
--+---+------
F | F | 0.01    = P(E|¬L,¬S)
T | F | 0.5     = P(E|L,¬S)
F | T | 0.9     = P(E|¬L,S)
T | T | 0.99    = P(E|L,S)

W | E | P(H) | P(¬H)
--+---+------+------
F | F | 0.2  | 0.8
T | F | 0.6  | 0.4
F | T | 0.8  | 0.2
T | T | 0.99 | 0.01

[Figure: the network from Page 6, with each table attached to its node]
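
These tables are easy to transcribe into code for experimentation. Below is a minimal Python sketch (the variable names are our own, not from the slides) encoding each CPT as a dictionary keyed by the parents' truth values:

```python
# Priors for the parentless nodes Lucky and Study
P_L = 0.75   # P(Lucky)
P_S = 0.2    # P(Study)

# CPTs keyed by the truth values of each node's parents
P_W = {True: 0.4, False: 0.01}    # P(Win | Lucky)
P_E = {(True, True): 0.99,        # P(PassExam | Lucky, Study)
       (True, False): 0.5,
       (False, True): 0.9,
       (False, False): 0.01}
P_H = {(True, True): 0.99,        # P(Happy | Win, PassExam)
       (True, False): 0.6,
       (False, True): 0.8,
       (False, False): 0.2}

# The complements, e.g. P(¬H | W, E), are just 1 minus these entries.
```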

Page 9: Bayesian Network

[Figure: a much larger example network, with nodes labelled a through z]

Page 10: Chain Rule

Recall the chain rule:
P(A,B) = P(A|B)P(B)
P(A,B,C) = P(A|B,C)P(B,C)
P(A,B,C) = P(A|B,C)P(B|C)P(C)
P(A1,A2,…,An) = P(A1|A2,…,An)P(A2|A3,…,An)…P(An-1|An)P(An)
P(A1,A2,…,An) = ∏_{i=1}^{n} P(Ai|Ai+1,…,An)

Page 11: Chain Rule

If we know the value of a node's parents, we don't care about more distant ancestors; their influence is included through the parents. A node is conditionally independent of its predecessors given its parents. More generally, a node is conditionally independent of its non-descendants given its parents.

Updated chain rule:
P(A1,A2,…,An) = ∏_{i=1}^{n} P(Ai|parents(Ai))

Page 12: Chain Rule Example

Probability that Bob is happy because he won the lottery and passed his exam, because he's lucky but did not study:

P(H,W,E,L,¬S) = P(H|W,E) * P(W|L) * P(E|L,¬S) * P(L) * P(¬S)
P(H,W,E,L,¬S) = 0.99 * 0.4 * 0.5 * 0.75 * 0.8
P(H,W,E,L,¬S) = 0.1188 ≈ 0.12
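
As a quick sanity check, this product is one line of Python (a sketch with our own names; the factors are the CPT entries from Page 8):

```python
# P(H, W, E, L, ¬S) via the updated chain rule, factor by factor:
# P(H|W,E) * P(W|L) * P(E|L,¬S) * P(L) * P(¬S)
joint = 0.99 * 0.4 * 0.5 * 0.75 * (1 - 0.2)
print(joint)  # 0.1188, which the slide rounds to 0.12
```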

Page 13: Constructing Bayesian Nets

Build from the top down:
- Start with root nodes
- Add children
- Go down to the leaves

[Figure: the example network built in this order: Lucky and Study first, then Win and PassExam, then Happy]

Page 14: Constructing Bayesian Nets

What happens if we build with the wrong order? The network becomes needlessly complicated. Node ordering is important!

[Figure: the same five variables built in a different order, giving a more complicated network]

Page 15: Connections

We can understand dependence in a network by considering how evidence is transmitted through it:
- Information entered at one node
- Propagates to descendants and ancestors through connected nodes
- Provided no node in the path already has evidence (in which case we would stop the propagation)

Page 16: Serial Connection

- Study and Happy are dependent
- Study and Happy are independent given PassExam

Intuitively, the only way Study can affect Happy is through PassExam.

[Figure: the chain Study → PassExam → Happy highlighted in the network]

Page 17: Converging Connection

- Lucky and Study are independent
- Lucky and Study are dependent given PassExam

Intuitively, Lucky can be used to explain away Study.

[Figure: the connection Lucky → PassExam ← Study highlighted in the network]

Page 18: Diverging Connection

- Win and PassExam are dependent
- Win and PassExam are independent given Lucky

Intuitively, Lucky can explain both Win and PassExam. Win and PassExam can affect each other by changing the belief in Lucky.

[Figure: the connection Win ← Lucky → PassExam highlighted in the network]

Page 19: D-Separation

D-separation determines whether two variables are independent given some other variables. X is independent of Y given Z if X and Y are d-separated given Z. X is d-separated from Y if, for every (undirected) path between X and Y, there exists a node Z on the path for which either:
- The connection is serial or diverging and there is evidence for Z, or
- The connection is converging and there is no evidence for Z or any of its descendants

Page 20: D-Separation

[Figure: the three connection types between X and Y. Serial (X → Z → Y) and diverging (X ← Z → Y): Z blocks the path if it is in evidence. Converging (X → Z ← Y): Z blocks the path if neither it nor any of its descendants (Z2) is in evidence.]

Page 21: D-Separation

- Can be computed in linear time using a depth-first-search algorithm (a brute-force version is sketched below)
- A fast algorithm to know if two nodes are independent
- Allows us to infer whether learning the value of a variable might give us information about another variable, given what we already know
- All d-separated variables are independent, but not all independent variables are d-separated
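
To make the blocking rules concrete, here is a Python sketch of our own: a brute-force path enumeration, not the linear-time DFS algorithm the slide mentions, but fine for small networks like the lecture's example. It applies the Page 19 rules to every undirected path:

```python
from itertools import chain

def d_separated(dag, x, y, evidence):
    """True if x and y are d-separated given the evidence set.

    dag maps each node to the list of its parents.
    """
    children = {n: [] for n in dag}
    for n, parents in dag.items():
        for p in parents:
            children[p].append(n)

    def descendants(n):
        seen, stack = set(), list(children[n])
        while stack:
            c = stack.pop()
            if c not in seen:
                seen.add(c)
                stack.extend(children[c])
        return seen

    def paths(node, visited):
        # All simple undirected paths from node to y.
        if node == y:
            yield visited
            return
        for nxt in chain(dag[node], children[node]):
            if nxt not in visited:
                yield from paths(nxt, visited + [nxt])

    for path in paths(x, [x]):
        blocked = False
        for prev, node, nxt in zip(path, path[1:], path[2:]):
            converging = prev in dag[node] and nxt in dag[node]
            if converging:
                # Converging: blocked unless node or a descendant is observed
                if node not in evidence and not (descendants(node) & evidence):
                    blocked = True
                    break
            elif node in evidence:
                # Serial or diverging: blocked when the middle node is observed
                blocked = True
                break
        if not blocked:
            return False  # an active path connects x and y
    return True

dag = {"Lucky": [], "Study": [], "Win": ["Lucky"],
       "PassExam": ["Lucky", "Study"], "Happy": ["Win", "PassExam"]}
print(d_separated(dag, "Lucky", "Study", set()))         # True  (Page 17)
print(d_separated(dag, "Lucky", "Study", {"PassExam"}))  # False (Page 17)
print(d_separated(dag, "Win", "PassExam", {"Lucky"}))    # True  (Page 18)
```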

Page 22: D-Separation Exercise

If we observe a value for node g, what other nodes are updated? Nodes f, h and i.

If we observe a value for node a, what other nodes are updated? Nodes b, c, d, e, f.

[Figure: exercise network with nodes a through j]

Page 23: D-Separation Exercise

Given an observation of c, are nodes a and f independent? Yes.

Given an observation of i, are nodes g and j independent? No.

[Figure: the same exercise network with nodes a through j]

Page 24: Other Independence Criteria

A node is conditionally independent of its non-descendants given its parents. Recall this from the updated chain rule.

[Figure: a large example network illustrating this criterion for a node z]

Page 25: Other Independence Criteria

A node is conditionally independent of all others in the network given its parents, children, and children's parents. This set of nodes is called its Markov blanket.

[Figure: the same network, with the Markov blanket of node z shaded]
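
The definition is mechanical enough to compute directly. A minimal sketch (our own helper function, reusing the parents-list representation from the d-separation sketch above):

```python
def markov_blanket(dag, node):
    """Parents, children, and children's other parents of `node`.

    dag maps each node to the list of its parents.
    """
    children = [c for c, parents in dag.items() if node in parents]
    blanket = set(dag[node]) | set(children)
    for c in children:
        blanket |= set(dag[c])   # the children's other parents
    blanket.discard(node)
    return blanket

dag = {"Lucky": [], "Study": [], "Win": ["Lucky"],
       "PassExam": ["Lucky", "Study"], "Happy": ["Win", "PassExam"]}
print(markov_blanket(dag, "PassExam"))
# -> {'Lucky', 'Study', 'Happy', 'Win'}
```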

Page 26: Inference in Bayesian Network

Compute the posterior probability of a query variable given an observed event.

P(A1,A2,…,An) = ∏_{i=1}^{n} P(Ai|parents(Ai))

- Observed evidence variables E = E1,…,Em
- Query variable X
- Between them: nonevidence (hidden) variables Y = Y1,…,Yl
- Together these make up the belief network: X ∪ E ∪ Y

Page 27: Inference in Bayesian Network

We want P(X|E).

Recall Bayes' theorem: P(A|B) = P(A,B) / P(B)
So P(X|E) = α P(X,E)

Recall marginalization: P(Ai) = Σ_j P(Ai,Bj)
So P(X|E) = α Σ_Y P(X,E,Y)

Recall the chain rule: P(A1,A2,…,An) = ∏_{i=1}^{n} P(Ai|parents(Ai))
So P(X|E) = α Σ_Y ∏_{A ∈ X∪E∪Y} P(A|parents(A))

Page 28: Inference Example

[Figure: the network from Page 6 with its CPTs]

P(L) = 0.75
P(S) = 0.2

L | P(W)
--+------
F | 0.01
T | 0.4

L | S | P(E)
--+---+------
F | F | 0.01
T | F | 0.5
F | T | 0.9
T | T | 0.99

W | E | P(H)
--+---+------
F | F | 0.2
T | F | 0.6
F | T | 0.8
T | T | 0.99

Page 29: Inference Example #1

With only the information in the network (and no observations), what's the probability that Bob won the lottery?

P(W) = Σ_l P(W,l)
P(W) = Σ_l P(W|l)P(l)
P(W) = P(W|L)P(L) + P(W|¬L)P(¬L)
P(W) = 0.4*0.75 + 0.01*0.25
P(W) = 0.3025
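
A quick check of the arithmetic in plain Python:

```python
# P(W) by marginalizing over Lucky: P(W|L)P(L) + P(W|¬L)P(¬L)
p_w = 0.4 * 0.75 + 0.01 * 0.25
print(p_w)  # 0.3025
```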

Page 30: Inference Example #2

Given that we know that Bob is happy, what's the probability that Bob won the lottery?

From the network, we know:
P(h,e,w,s,l) = P(l)P(s)P(e|l,s)P(w|l)P(h|w,e)

We want to find:
P(W|H) = α Σ_l Σ_s Σ_e P(l)P(s)P(e|l,s)P(W|l)P(H|W,e)

P(¬W|H) is also needed, to normalize.

Page 31: Inference Example #2

Terms of the sum for W = true (each column gives the probability of that row's truth values, e.g. the P(s) entry on a row with s = F is P(¬s) = 0.8):

l | s | e | P(s) | P(l) | P(e|l,s) | P(W|l) | P(H|W,e) | product
--+---+---+------+------+----------+--------+----------+----------
F | F | F | 0.8  | 0.25 | 0.99     | 0.01   | 0.6      | 0.001188
T | F | F | 0.8  | 0.75 | 0.5      | 0.4    | 0.6      | 0.072
F | T | F | 0.2  | 0.25 | 0.1      | 0.01   | 0.6      | 0.00003
T | T | F | 0.2  | 0.75 | 0.01     | 0.4    | 0.6      | 0.00036
F | F | T | 0.8  | 0.25 | 0.01     | 0.01   | 0.99     | 0.0000198
T | F | T | 0.8  | 0.75 | 0.5      | 0.4    | 0.99     | 0.1188
F | T | T | 0.2  | 0.25 | 0.9      | 0.01   | 0.99     | 0.0004455
T | T | T | 0.2  | 0.75 | 0.99     | 0.4    | 0.99     | 0.058806

P(W|H) = α * 0.2516493

Page 32: Inference Example #2

Terms of the sum for W = false:

l | s | e | P(s) | P(l) | P(e|l,s) | P(¬W|l) | P(H|¬W,e) | product
--+---+---+------+------+----------+---------+-----------+----------
F | F | F | 0.8  | 0.25 | 0.99     | 0.99    | 0.2       | 0.039204
T | F | F | 0.8  | 0.75 | 0.5      | 0.6     | 0.2       | 0.036
F | T | F | 0.2  | 0.25 | 0.1      | 0.99    | 0.2       | 0.00099
T | T | F | 0.2  | 0.75 | 0.01     | 0.6     | 0.2       | 0.00018
F | F | T | 0.8  | 0.25 | 0.01     | 0.99    | 0.8       | 0.001584
T | F | T | 0.8  | 0.75 | 0.5      | 0.6     | 0.8       | 0.144
F | T | T | 0.2  | 0.25 | 0.9      | 0.99    | 0.8       | 0.03564
T | T | T | 0.2  | 0.75 | 0.99     | 0.6     | 0.8       | 0.07128

P(¬W|H) = α * 0.328878

Page 33: Inference Example #2

P(W|H) = α <0.2516493, 0.328878>
P(W|H) = <0.4335, 0.5665>

Note that P(W|H) > P(W), because P(W|L) > P(W|¬L): the probability of Bob having won the lottery has increased by 13.1 percentage points (from 0.3025 to 0.4335) thanks to our knowledge that he is happy!
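
The two tables above can be verified in a few lines of code. A minimal enumeration sketch (our own names; the CPT numbers come from Page 8) that reproduces the normalized result:

```python
from itertools import product

# Priors and CPTs from Page 8
P_L, P_S = 0.75, 0.2
P_W = {True: 0.4, False: 0.01}                    # P(W | L)
P_E = {(False, False): 0.01, (True, False): 0.5,  # P(E | L, S)
       (False, True): 0.9,  (True, True): 0.99}
P_H = {(False, False): 0.2, (True, False): 0.6,   # P(H | W, E)
       (False, True): 0.8,  (True, True): 0.99}

def bern(p, value):
    # Probability that a Boolean variable takes `value`, given P(true) = p
    return p if value else 1.0 - p

def joint(l, s, e, w, h):
    # Full joint distribution via the updated chain rule
    return (bern(P_L, l) * bern(P_S, s) * bern(P_E[(l, s)], e)
            * bern(P_W[l], w) * bern(P_H[(w, e)], h))

# Sum out the hidden variables l, s, e with H observed true, then normalize
unnorm = {w: sum(joint(l, s, e, w, True)
                 for l, s, e in product((False, True), repeat=3))
          for w in (True, False)}
alpha = 1.0 / sum(unnorm.values())
print({w: round(alpha * p, 4) for w, p in unnorm.items()})
# -> {True: 0.4335, False: 0.5665}, matching the slides
```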

Page 34: Expert Systems

- Bayesian networks are used to implement expert systems
- Diagnostic systems that contain subject-specific knowledge
- Knowledge (nodes, relationships, probabilities) typically provided by human experts
- The system observes evidence by asking the user questions, then infers the most likely conclusion

Page 35: Pathfinder

Expert system for medical diagnosis of lymph-node diseases. A very large Bayesian network:
- Over 60 diseases
- Over 100 features of lymph nodes
- Over 30 features for clinical information

A lot of work from medical experts:
- 8 hours to define features and diseases
- 35 hours to build the network topology
- 40 hours to assess the probabilities

Page 36: Pathfinder

One node for each disease; assumes the diseases are mutually exclusive and exhaustive.

The large domain was hard to handle:
- Several small networks for individual diagnostic tasks were built
- Then combined into a single large network

Page 37: Pathfinder

Testing the network: 53 test cases (real diagnostics). Diagnostic accuracy as good as a medical expert.

Page 38: Assumptions

Learning agent. Environment:
- Fully observable / Partially observable
- Deterministic / Strategic / Stochastic
- Sequential
- Static / Semi-dynamic
- Discrete / Continuous
- Single agent / Multi-agent

Page 39: Assumptions Updated

We can handle a new combination!
- Fully observable & Deterministic: no uncertainty (map of Romania)
- Fully observable & Stochastic: games of chance (Monopoly, Backgammon)
- Partially observable & Deterministic: logic (Wumpus World)
- Partially observable & Stochastic: probabilistic reasoning (this lecture)