jir´ı vomlelˇ - avcr.czstaff.utia.cas.cz/vomlel/slides/presentace-karny.pdf · 2005-09-02 ·...

27
Some applications of Bayesian networks Jiˇ ı Vomlel Institute of Information Theory and Automation Academy of Sciences of the Czech Republic This presentation is available at http://www.utia.cas.cz/vomlel/ 1

Upload: others

Post on 17-Mar-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Jir´ı Vomlelˇ - avcr.czstaff.utia.cas.cz/vomlel/slides/presentace-karny.pdf · 2005-09-02 · Bayesian network •a directed acyclic graph G = (V, E) •each node i ∈ V corresponds

Some applications of Bayesian networks

Ji rı Vomlel

Institute of Information Theory and AutomationAcademy of Sciences of the Czech Republic

This presentation is available athttp://www.utia.cas.cz/vomlel/

1

Page 2: Jir´ı Vomlelˇ - avcr.czstaff.utia.cas.cz/vomlel/slides/presentace-karny.pdf · 2005-09-02 · Bayesian network •a directed acyclic graph G = (V, E) •each node i ∈ V corresponds

Contents• Brief introduction to Bayesian networks

• Typical tasks that can be solved using Bayesian networks

• 1: Medical diagnosis (a very simple example)

• 2: Decision making maximizing expected utility (another simpleexample)

• 3: Adaptive testing (a case study)

• 4: Decision-theoretic troubleshooting (a commercial product)

2

Page 3: Jir´ı Vomlelˇ - avcr.czstaff.utia.cas.cz/vomlel/slides/presentace-karny.pdf · 2005-09-02 · Bayesian network •a directed acyclic graph G = (V, E) •each node i ∈ V corresponds

Bayesian network• a directed acyclic graph G = (V, E)

• each node i ∈ V corresponds to a random variable Xi with afinite set Xi of mutually exclusive states

• pa(i) denotes the set of parents of node i in graph G

• to each node i ∈ V corresponds a conditional probability tableP(Xi | (X j) j∈pa(i))

• the DAG implies conditional independence relations between(Xi)i∈V

• d-separation (Pearl, 1986) can be used to read the CI relationsfrom the DAG

3

Page 4: Jir´ı Vomlelˇ - avcr.czstaff.utia.cas.cz/vomlel/slides/presentace-karny.pdf · 2005-09-02 · Bayesian network •a directed acyclic graph G = (V, E) •each node i ∈ V corresponds

Using the chain rule we have that:

P((Xi)i∈V) = ∏i∈V

P(Xi | Xi−1, . . . , X1)

Assume an ordering of Xi , i ∈ V such that if j ∈ pa(i) then j < i.From the DAG we can read conditional independence relations

Xi ⊥⊥ Xk | (X j) j∈pa(i) for i ∈ V and k < i and k 6∈ pa(i)

Using the conditional independence relations from the DAG we get

P((Xi)i∈V) = ∏i∈V

P(Xi | (X j) j∈pa(i)) .

It is the joint probability distribution represented by the Bayesiannetwork.

4

Page 5: Jir´ı Vomlelˇ - avcr.czstaff.utia.cas.cz/vomlel/slides/presentace-karny.pdf · 2005-09-02 · Bayesian network •a directed acyclic graph G = (V, E) •each node i ∈ V corresponds

Example:X1 X2P(X1) P(X2)

P(X3 | X1)

P(X4 | X2)

P(X6 | X3 , X4)

P(X9 | X6)

P(X8 | X7 , X6)

P(X5 | X1)

P(X7 | X5)

X5

X7

X3

X8

X6

X9

X4

P(X1, . . . , X9) =

= P(X9|X8, . . . , X1) · P(X8|X7, . . . , X1) · . . . · P(X2|X1) · P(X1)

= P(X9|X6) · P(X8|X7, X6) · P(X7|X5) · P(X6|X4, X3)

·P(X5|X1) · P(X4|X2) · P(X3|X1) · P(X2) · P(X1)

5

Page 6: Jir´ı Vomlelˇ - avcr.czstaff.utia.cas.cz/vomlel/slides/presentace-karny.pdf · 2005-09-02 · Bayesian network •a directed acyclic graph G = (V, E) •each node i ∈ V corresponds

Typical use of Bayesian networks

• to model and explain a domain.

• to update beliefs about states of certain variables when someother variables were observed, i.e., computing conditionalprobability distributions, e.g., P(X23|X17 = yes, X54 = no).

• to find most probable configurations of variables

• to support decision making under uncertainty

• to find good strategies for solving tasks in a domain withuncertainty.

6

Page 7: Jir´ı Vomlelˇ - avcr.czstaff.utia.cas.cz/vomlel/slides/presentace-karny.pdf · 2005-09-02 · Bayesian network •a directed acyclic graph G = (V, E) •each node i ∈ V corresponds

Simplified diagnostic exampleWe have a patient.Possible diagnoses: tuberculosis, lung cancer, bronchitis.

7

Page 8: Jir´ı Vomlelˇ - avcr.czstaff.utia.cas.cz/vomlel/slides/presentace-karny.pdf · 2005-09-02 · Bayesian network •a directed acyclic graph G = (V, E) •each node i ∈ V corresponds

We don’t know anything about the pa-tient

Patient is a smoker.

8

Page 9: Jir´ı Vomlelˇ - avcr.czstaff.utia.cas.cz/vomlel/slides/presentace-karny.pdf · 2005-09-02 · Bayesian network •a directed acyclic graph G = (V, E) •each node i ∈ V corresponds

Patient is a smoker. ... and he complains about dyspnoea

9

Page 10: Jir´ı Vomlelˇ - avcr.czstaff.utia.cas.cz/vomlel/slides/presentace-karny.pdf · 2005-09-02 · Bayesian network •a directed acyclic graph G = (V, E) •each node i ∈ V corresponds

Patient is a smoker and complainsabout dyspnoea

... and his X-ray is positive

10

Page 11: Jir´ı Vomlelˇ - avcr.czstaff.utia.cas.cz/vomlel/slides/presentace-karny.pdf · 2005-09-02 · Bayesian network •a directed acyclic graph G = (V, E) •each node i ∈ V corresponds

Patient is a smoker and complainsabout dyspnoea and his X-ray is pos-itive

... and he visited Asia recently

11

Page 12: Jir´ı Vomlelˇ - avcr.czstaff.utia.cas.cz/vomlel/slides/presentace-karny.pdf · 2005-09-02 · Bayesian network •a directed acyclic graph G = (V, E) •each node i ∈ V corresponds

Application 2 :Decision makingThe goal: maximize expected utility

Hugin example: mildew4.net

12

Page 13: Jir´ı Vomlelˇ - avcr.czstaff.utia.cas.cz/vomlel/slides/presentace-karny.pdf · 2005-09-02 · Bayesian network •a directed acyclic graph G = (V, E) •each node i ∈ V corresponds

Fixed and Adaptive Test Strategies

wrong

correct wrong

wrong correct

wrong

wrong

correct

correctwrong wrongcorrectcorrectcorrect

Q5

Q4

Q3

Q2

Q1

Q2

Q1

Q3

Q4

Q5

Q6

Q7

Q8

Q9

Q10

Q6 Q8

Q7

Q8

Q4 Q7 Q10

Q6

Q7

Q9

13

Page 14: Jir´ı Vomlelˇ - avcr.czstaff.utia.cas.cz/vomlel/slides/presentace-karny.pdf · 2005-09-02 · Bayesian network •a directed acyclic graph G = (V, E) •each node i ∈ V corresponds

X3

X1

X3

X3

X2

X3

X2

X1

X2

X1

X2

X2

X3

X1

X1

For all nodes n of a strategy s wehave defined:

• evidence en, i.e. outcomes ofsteps performed to get to noden,

• probability P(en) of getting tonode n, and

• utility f (en) being a real num-ber.

Let L(s) be the set of terminalnodes of strategy s.Expected utility of strategy isE f (s) = ∑`∈L(s) P(e`) · f (e`).

14

Page 15: Jir´ı Vomlelˇ - avcr.czstaff.utia.cas.cz/vomlel/slides/presentace-karny.pdf · 2005-09-02 · Bayesian network •a directed acyclic graph G = (V, E) •each node i ∈ V corresponds

X3

X1

X3

X3

X2

X3

X2

X1

X2

X1

X2

X2

X3

X1

X1

Strategy s? is optimal iff it maxi-mizes its expected utility.

Strategy s is myopically optimal iffeach step of strategy s is selectedso that it maximizes expected utilityafter the selected step is performed(one step look ahead).

15

Page 16: Jir´ı Vomlelˇ - avcr.czstaff.utia.cas.cz/vomlel/slides/presentace-karny.pdf · 2005-09-02 · Bayesian network •a directed acyclic graph G = (V, E) •each node i ∈ V corresponds

Application 3 : Adaptive test of basicoperations with fractions

Examples of tasks:

T1:( 3

4 ·56

)− 1

8 = 1524 −

18 = 5

8 −18 = 4

8 = 12

T2: 16 + 1

12 = 212 + 1

12 = 312 = 1

4

T3: 14 · 1 1

2 = 14 ·

32 = 3

8

T4:( 1

2 ·12

)·( 1

3 + 13

)= 1

4 ·23 = 2

12 = 16 .

16

Page 17: Jir´ı Vomlelˇ - avcr.czstaff.utia.cas.cz/vomlel/slides/presentace-karny.pdf · 2005-09-02 · Bayesian network •a directed acyclic graph G = (V, E) •each node i ∈ V corresponds

Elementary and operational skillsCP Comparison (common nu-

merator or denominator)

12 > 1

3 , 23 > 1

3

AD Addition (comm. denom.) 17 + 2

7 = 1+27 = 3

7

SB Subtract. (comm. denom.) 25 −

15 = 2−1

5 = 15

MT Multiplication 12 ·

35 = 3

10

CD Common denominator(

12 , 2

3

)=

(36 , 4

6

)CL Cancelling out 4

6 = 2·22·3 = 2

3

CIM Conv. to mixed numbers 72 = 3·2+1

2 = 3 12

CMI Conv. to improp. fractions 3 12 = 3·2+1

2 = 72

17

Page 18: Jir´ı Vomlelˇ - avcr.czstaff.utia.cas.cz/vomlel/slides/presentace-karny.pdf · 2005-09-02 · Bayesian network •a directed acyclic graph G = (V, E) •each node i ∈ V corresponds

Misconceptions

Label Description Occurrence

MAD ab + c

d = a+cb+d 14.8%

MSB ab −

cd = a−c

b−d 9.4%

MMT1 ab ·

cb = a·c

b 14.1%

MMT2 ab ·

cb = a+c

b·b 8.1%

MMT3 ab ·

cd = a·d

b·c 15.4%

MMT4 ab ·

cd = a·c

b+d 8.1%

MC a bc = a·b

c 4.0%

18

Page 19: Jir´ı Vomlelˇ - avcr.czstaff.utia.cas.cz/vomlel/slides/presentace-karny.pdf · 2005-09-02 · Bayesian network •a directed acyclic graph G = (V, E) •each node i ∈ V corresponds

Student model

CP

ACD

HV2 HV1

AD SB CMI CIM CL CD MT

MMT1 MMT2 MMT3 MMT4MCMAD MSB

ACMI ACIM ACL

19

Page 20: Jir´ı Vomlelˇ - avcr.czstaff.utia.cas.cz/vomlel/slides/presentace-karny.pdf · 2005-09-02 · Bayesian network •a directed acyclic graph G = (V, E) •each node i ∈ V corresponds

Evidence model for task T1(34· 5

6

)− 1

8=

1524

− 18

=58− 1

8=

48

=12

T1 ⇔ MT & CL & ACL & SB & ¬MMT3 & ¬MMT4 & ¬MSB

CL

MMT4

MSB

SB

MMT3

ACL MT

T1

X1

P(X1 | T1)

Hugin: model-hv-2.net

20

Page 21: Jir´ı Vomlelˇ - avcr.czstaff.utia.cas.cz/vomlel/slides/presentace-karny.pdf · 2005-09-02 · Bayesian network •a directed acyclic graph G = (V, E) •each node i ∈ V corresponds

Using information gain as the utility function

“The lower the entropy of a probability distribution the more we know.”

H (P(X)) = −∑x

P(X = x) · log P(X = x)

0

0.5

1

0 0.5 1

entr

opy

probability

Information gain in a node n of a strategy

IG(en) = H(P(S))− H(P(S | en))

21

Page 22: Jir´ı Vomlelˇ - avcr.czstaff.utia.cas.cz/vomlel/slides/presentace-karny.pdf · 2005-09-02 · Bayesian network •a directed acyclic graph G = (V, E) •each node i ∈ V corresponds

Skill Prediction Quality

74

76

78

80

82

84

86

88

90

92

0 2 4 6 8 10 12 14 16 18 20

Qua

lity

of s

kill

pred

icti

ons

Number of answered questions

adaptiveaverage

descendingascending

22

Page 23: Jir´ı Vomlelˇ - avcr.czstaff.utia.cas.cz/vomlel/slides/presentace-karny.pdf · 2005-09-02 · Bayesian network •a directed acyclic graph G = (V, E) •each node i ∈ V corresponds

Application 4: Troubleshooting

Dezide Advisor customized to a specific portal, seen from the user’sperspective through a web browser.

23

Page 24: Jir´ı Vomlelˇ - avcr.czstaff.utia.cas.cz/vomlel/slides/presentace-karny.pdf · 2005-09-02 · Bayesian network •a directed acyclic graph G = (V, E) •each node i ∈ V corresponds

Application 2: Troubleshooting - Light print problem

FF3

F2

F1

F4

FaultsActions

A3

A2

A1

Q1

Problem

Questions

• Problems: F1 Distribution problem, F2 Defective toner, F3

Corrupted dataflow, and F4 Wrong driver setting.

• Actions: A1 Remove, shake and reseat toner, A2 Try anothertoner, and A3 Cycle power.

• Questions: Q1 Is the configuration page printed light?

24

Page 25: Jir´ı Vomlelˇ - avcr.czstaff.utia.cas.cz/vomlel/slides/presentace-karny.pdf · 2005-09-02 · Bayesian network •a directed acyclic graph G = (V, E) •each node i ∈ V corresponds

Troubleshooting strategy

A1 = no

A2 = yes

Q1 = no

A1 = yesA2 = yes

Q1 = yes

A1 = yes

A2 = no

A1 = noA2 = no

A2

Q1

A1

A2 A1

The task is to find a strategy s ∈ S minimising expected cost of repair

ECR(s) = ∑`∈L(s)

P(e`) · ( t(e`) + c(e`) ) .

25

Page 26: Jir´ı Vomlelˇ - avcr.czstaff.utia.cas.cz/vomlel/slides/presentace-karny.pdf · 2005-09-02 · Bayesian network •a directed acyclic graph G = (V, E) •each node i ∈ V corresponds

Expected cost of repair for a given strategy

A1 = no

A2 = yes

Q1 = no

A1 = yesA2 = yes

Q1 = yes

A1 = yes

A2 = no

A1 = noA2 = no

A2

Q1

A1

A2 A1

ECR(s) =

P(Q1 = no, A1 = yes) ·(cQ1 + cA1

)+P(Q1 = no, A1 = no, A2 = yes) ·

(cQ1 + cA1 + cA2

)+P(Q1 = no, A1 = no, A2 = no) ·

(cQ1 + cA1 + cA2 + cCS

)+P(Q1 = yes, A2 = yes) ·

(cQ1 + cA2

)+P(Q1 = yes, A2 = no, A1 = yes) ·

(cQ1 + cA2 + cA1

)+P(Q1 = yes, A2 = no, A1 = no) ·

(cQ1 + cA2 + cA1 + cCS

)

Demo: www.dezide.com Products/Demo/‘‘Try out expert mode’’

26

Page 27: Jir´ı Vomlelˇ - avcr.czstaff.utia.cas.cz/vomlel/slides/presentace-karny.pdf · 2005-09-02 · Bayesian network •a directed acyclic graph G = (V, E) •each node i ∈ V corresponds

Commercial applications of Bayesian networks ineducational testing and troubleshooting

• Hugin Expert A/S.software product: Hugin - a Bayesian network tool.http://www.hugin.com/

• Educational Testing Service (ETS)the world’s largest private educational testing organizationResearch unit doing research on adaptive tests using Bayesiannetworks: http://www.ets.org/research/

• SACSO ProjectSystems for Automatic Customer Support Operations- research project of Hewlett Packard and Aalborg University.The troubleshooter offered as DezisionWorks by Dezide Ltd.http://www.dezide.com/

27