Introduction Credal sets Credal networks Modelling observations Decision making Inference algorithms Other guys
Imprecise Probabilistic Graphical Models: Credal Networks and Other Guys
Alessandro Antonucci & Cassio de Campos & Giorgio Corani
{alessandro,cassio,giorgio}@idsia.ch
SIPTA School on Imprecise Probability
Durham, September 5, 2010
-
Imprecise Probability Group @ IDSIA
IDSIA = Dalle Molle Institute for AI
Imprecise Probability Group (1 professor, 4 researchers, 1 PhD student)
Theory of imprecise probability
Probabilistic graphical models
Data mining and classification
Observations modelling (missing data)
Data fusion and filtering
Applications to environmental modelling, military decision making, risk analysis, bioinformatics, biology, tracking, vision, . . .
[Timeline figure (IDSIA, 1990 to 2010): Marco Zaffalon, Alessandro Antonucci, Cassio de Campos, Giorgio Corani, Alessio Benavoli, Denis Mauà]
-
Probabilistic Graphical Models
aka decomposable multivariate probabilistic models
(whose decomposability is induced by independence)

[Graph figure: nodes X1, X2, X3 / X4, X5 / X6, X7, X8; the global model φ(X1,X2,X3,X4,X5,X6,X7,X8) decomposes into the local models φ(X1,X2,X4), φ(X2,X3,X5), φ(X4,X6,X7), φ(X5,X7,X8)]

φ(X1, X2, X3, X4, X5, X6, X7, X8) = φ(X1, X2, X4) ⊗ φ(X2, X3, X5) ⊗ φ(X4, X6, X7) ⊗ φ(X5, X7, X8)

undirected graphs: precise/imprecise Markov random fields
directed graphs: Bayesian/credal networks
mixed graphs: chain graphs
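To make the decomposition concrete, here is a minimal sketch with hypothetical binary variables X1..X8 and arbitrary positive local tables; ⊗ is just pointwise multiplication of the factors.

```python
import random
from itertools import product

random.seed(0)

# hypothetical local factors over binary X1..X8: one table per scope
def rand_table(k):
    return {s: random.random() for s in product((0, 1), repeat=k)}

phi_124, phi_235, phi_467, phi_578 = (rand_table(3) for _ in range(4))

def global_phi(x1, x2, x3, x4, x5, x6, x7, x8):
    """Global model as the product (⊗) of the four local models."""
    return (phi_124[x1, x2, x4] * phi_235[x2, x3, x5]
            * phi_467[x4, x6, x7] * phi_578[x5, x7, x8])
```

Storing four tables with 8 entries each, instead of one table with 2^8 entries, is exactly the saving the factorization buys.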
-
[Exe #1] Fault trees (Vesely et al., 1981)
brake fails = [ pads ∨ ( sensor ∧ controller ∧ actuator ) ]
device failures are independent
[Fault tree figure: an OR gate combines "pads fails" with the output of an AND gate over "sensor fails", "controller fails" and "actuator fails"; the OR gate output is "brake fails"]

Boolean case: pads = 0, sensor = controller = actuator = 1 ⇒ AND gate = 1 ⇒ brake fails = 1

Precise case: P(pads fails) = .2, P(actuator fails) = 1, P(sensor fails) = P(controller fails) = .8
AND gate: 1 × .8 × .8 = .64
P(brake fails) = .2 × .64 + .8 × .64 + .2 × .36 = .712

Imprecise case: P(pads fails) ∈ [0, .2], P(actuator fails) = 1, P(sensor fails), P(controller fails) ∈ [.8, 1]
AND gate: [.64, 1]
P(brake fails) ∈ [.64, 1]
With [.7, 1] for sensor and controller instead: P(brake fails) ∈ [.49, 1]
Indecision!
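A minimal sketch of the interval propagation (the function name is hypothetical): since P(brake fails) is multilinear and monotone increasing in each independent component failure probability, its bounds over interval assessments are attained at corner combinations, so brute-force corner enumeration suffices here.

```python
from itertools import product

def brake_fail_bounds(pads, sensor, controller, actuator):
    """Bounds on P(brake fails) when each independent component's
    failure probability is only known to lie in an interval (lo, hi)."""
    vals = []
    for p, s, c, a in product(pads, sensor, controller, actuator):
        p_and = s * c * a                # sensor AND controller AND actuator
        p_or = p + p_and - p * p_and     # pads OR the AND-gate output
        vals.append(p_or)
    return min(vals), max(vals)

# precise assessments are degenerate intervals
print(brake_fail_bounds((.2, .2), (.8, .8), (.8, .8), (1, 1)))  # ≈ (.712, .712)
# imprecise assessments
print(brake_fail_bounds((0, .2), (.8, 1), (.8, 1), (1, 1)))     # ≈ (.64, 1.0)
```

This shortcut works only because the query is monotone in each parameter; general credal-network queries need the inference algorithms discussed later.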
-
Outline
Motivations for imprecise probability
Credal sets (basic concepts and operations)
Independence relations
Credal networks
Modelling observations/missingness
Decision making
Inference algorithms
Other probabilistic graphical models
Conclusions
-
Three different levels of knowledge
FIFA '10 final match between Holland and Spain. Result for Holland after regular time? Win, draw or loss?
DETERMINISM
The Dutch goalkeeper is unbeatable and Holland always scores a goal.
Holland (certainly) wins:
(P(Win), P(Draw), P(Loss)) = (1, 0, 0)

UNCERTAINTY
Win is two times more probable than draw, and draw three times more probable than loss:
(P(Win), P(Draw), P(Loss)) = (.6, .3, .1)

IMPRECISION
Win is more probable than draw, and draw more probable than loss:
P(Win) > P(Draw), P(Draw) > P(Loss)
(P(Win), P(Draw), P(Loss)) = (α/3 + β + γ/2, α/3 + γ/2, α/3)
for all α, β, γ such that α > 0, β > 0, γ > 0, α + β + γ = 1
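The IMPRECISION parametrization can be checked numerically: P(Win) − P(Draw) = β > 0, P(Draw) − P(Loss) = γ/2 > 0, and the three entries sum to α + β + γ = 1. A small sketch (the function name is hypothetical):

```python
import random

def ordered_outcome_dist(alpha, beta, gamma):
    """(alpha, beta, gamma) on the open simplex -> (P(Win), P(Draw), P(Loss))."""
    return (alpha / 3 + beta + gamma / 2, alpha / 3 + gamma / 2, alpha / 3)

# spot-check: interior simplex points yield normalized, strictly ordered distributions
random.seed(1)
for _ in range(1000):
    a, b = sorted(random.random() for _ in range(2))
    alpha, beta, gamma = a, b - a, 1 - b          # a random simplex point
    if min(alpha, beta, gamma) < 1e-9:
        continue
    w, d, l = ordered_outcome_dist(alpha, beta, gamma)
    assert w > d > l > 0 and abs(w + d + l - 1) < 1e-9
```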
-
Three different levels of knowledge (ii)
DETERMINISM → UNCERTAINTY → IMPRECISION
(informativeness decreases, expressiveness increases from left to right)

Determinism: propositional (Boolean) logic
Uncertainty: Bayesian theory of probability
Imprecision: Walley's theory of coherent lower previsions

Natural embedding (de Cooman): the less expressive theories are only special cases of the more expressive ones
Precise probability is recovered in the limit when learning from large (complete) data sets
Imprecision arises from small/incomplete data, expert's (qualitative) knowledge, vague/incomplete observations
-
From CLPs to Credal Sets

Modelling knowledge about X, taking values in X

Bayesian / precise:
probability distribution p : X → ℝ, with p(x) ≥ 0 ∀x ∈ X and Σ_{x∈X} p(x) = 1
coherent linear prevision P : L(X) → ℝ, with P(f + g) = P(f) + P(g) and P(f) ≥ inf f, ∀f, g ∈ L(X)
correspondence: P(f) = Σ_{x∈X} p(x) f(x), and P(x) = P(I_{x})

Credal / imprecise:
set of distributions K(X) = { p(X) | constraints }, convex and closed
coherent lower prevision P : L(X) → ℝ, with
  P(f) ≥ min f ∀f ∈ L(X)
  P(λf) = λ P(f) ∀λ > 0
  P(f + g) ≥ P(f) + P(g)

Lower envelope theorem: ∃ a set M of linear previsions s.t. P(f) = inf_{P∈M} P(f) ∀f ∈ L(X)
(a set and its convex closure have the same lower envelope!)
-
Credal sets (Levi, 1980)

A closed convex set K(X) of probability mass functions
Equivalently described by its extreme points ext[K(X)]
Focus on categorical variables (|X| < +∞) and finitely generated CSs (polytopes, |ext[K(X)]| < +∞)

V-representation: enumerate the extreme points
H-representation: linear constraints on the probabilities
(conversion between the two: LRS software, Avis & Fukuda)

Given a function (gamble) f(X), the lower expectation is:
E[f(X)] := min_{P(X)∈K(X)} Σ_{x∈X} P(x) f(x)
LP task: the optimum is attained at an extreme point!
E[f(X)] = min_{P(X)∈ext[K(X)]} Σ_{x∈X} P(x) f(x)
-
Credal Sets over Boolean Variables

Boolean X, values in X = {x, ¬x}.
Determinism ≡ degenerate mass function; e.g., X = x ⟺ P(X) = [1, 0].
Uncertainty ≡ probability mass function P(X) = [p, 1−p] with p ∈ [0, 1].
Imprecision ≡ credal set on the probability simplex, e.g.,
K(X) ≡ { P(X) = [p, 1−p] | .4 ≤ p ≤ .7 }.
A CS over a Boolean variable cannot have more than two vertices!
ext[K(X)] = { [.7, .3], [.4, .6] }

[Figure: the simplex over (P(x), P(¬x)), with the degenerate P(X) = [1, 0] at a corner and the credal set as the segment between the vertices P(X) = [.7, .3] and P(X) = [.4, .6].]
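The interval credal set from the slide can be written down directly: over a Boolean variable the CS is a segment, so its (at most) two vertices sit at the bounds of p. A small sketch using the slide's numbers:

```python
# The Boolean credal set K(X) = { [p, 1-p] : 0.4 <= p <= 0.7 } from the slide.
# A credal set over a Boolean variable is a segment, hence has at most two
# vertices, obtained at the bounds of p.

def boolean_cs_vertices(p_low, p_high):
    """Extreme points of { [p, 1-p] : p_low <= p <= p_high }."""
    return [[p_high, 1 - p_high], [p_low, 1 - p_low]]

ext_K = boolean_cs_vertices(0.4, 0.7)   # vertices [.7, .3] and [.4, .6]

# Lower/upper probability of x as min/max over the vertices.
lower_x = min(P[0] for P in ext_K)
upper_x = max(P[0] for P in ext_K)
```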
-
Geometric Representation of CSs (ternary variables)

Ternary X (e.g., X = {win, draw, loss}).
P(X) ≡ a point in the space (simplex); no bound on |ext[K(X)]|.

Modelling ignorance: the uniform mass function P0(x) = 1/|ΩX| models
indifference; ignorance is modelled instead by the vacuous credal set
K0(X) = { P(X) | Σ_x P(x) = 1, P(x) ≥ 0 }.

Expert qualitative knowledge: win is more probable than draw, which is
more probable than loss.

Learning from small datasets: learning with multiple priors
(e.g., IDM with s = 2).

Learning from incomplete data: considering all the possible explanations
of the missing data.

From natural language to linear constraints on probabilities (Walley, 1991):
  extremely probable         P(x) ≥ 0.98
  very high probability      P(x) ≥ 0.9
  highly probable            P(x) ≥ 0.85
  very probable              P(x) ≥ 0.75
  has a very good chance     P(x) ≥ 0.65
  quite probable             P(x) ≥ 0.6
  is probable (likely)       P(x) ≥ 0.5
  has a good chance          0.4 ≤ P(x) ≤ 0.85
  is improbable (unlikely)   P(x) ≤ 0.5
  is somewhat unlikely       P(x) ≤ 0.4
  is very unlikely           P(x) ≤ 0.25
  has little chance          P(x) ≤ 0.2
  is highly improbable       P(x) ≤ 0.15
  has very low probability   P(x) ≤ 0.1
  is extremely unlikely      P(x) ≤ 0.02

[Figure: the simplex over (P(win), P(draw), P(loss)), showing a precise mass function P(X) = [.6, .3, .1], a credal set K(X), the uniform P0(X) and the vacuous K0(X).]

Example: previous Spain vs. Holland matches (Holland 4 wins, 1 draw, Spain 3 wins):
1957: Spain vs. Holland 5−1
1973: Holland vs. Spain 3−2
1980: Spain vs. Holland 1−0
1983: Spain vs. Holland 1−0
1983: Holland vs. Spain 2−1
1987: Spain vs. Holland 1−1
2000: Spain vs. Holland 1−2
2001: Holland vs. Spain 1−0
2005: Spain vs. Holland ∗−∗ (missing)
2008: Holland vs. Spain ∗−∗ (missing)
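The "learning with multiple priors" bullet can be made concrete with the match counts above. A sketch of the standard Imprecise Dirichlet Model bounds, lower P(x) = n_x/(N+s) and upper P(x) = (n_x+s)/(N+s), with s = 2 as on the slide; the choice of counting from Holland's viewpoint is an assumption for illustration, and the two matches with missing outcomes are simply ignored here:

```python
# IDM probability intervals for the slide's match data:
# Holland 4 wins, 1 draw, Spain 3 wins, over N = 8 complete matches.
# Lower/upper bounds: n_x / (N + s) and (n_x + s) / (N + s), with s = 2.

def idm_interval(n_x, N, s=2.0):
    """IDM lower/upper probability for an outcome observed n_x times out of N."""
    return (n_x / (N + s), (n_x + s) / (N + s))

counts = {"win": 4, "draw": 1, "loss": 3}   # from Holland's viewpoint (assumed)
N = sum(counts.values())                     # 8 complete matches
intervals = {x: idm_interval(n, N) for x, n in counts.items()}
# e.g. P(win) lies in [4/10, 6/10]
```

Treating the missing outcomes more conservatively (the "all possible explanations" bullet) would widen these intervals further by also varying the counts over the completions of the missing records.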
-
Basic operations with credal sets

Joint:
  precise (mass functions): P(X,Y)
  imprecise (credal sets):  K(X,Y)

Marginalization:
  precise:   P(X) s.t. P(x) = Σ_y P(x,y)
  imprecise: K(X) = { P(X) | P(x) = Σ_y P(x,y), P(X,Y) ∈ K(X,Y) }

Conditioning:
  precise:   P(X|y) s.t. P(x|y) = P(x,y) / Σ_x P(x,y)
  imprecise: K(X|y) = { P(X|y) | P(x|y) = P(x,y) / Σ_x P(x,y), P(X,Y) ∈ K(X,Y) }

Combination:
  precise:   P(x,y) = P(x|y) P(y)
  imprecise: K(X|Y) ⊗ K(Y) = { P(X,Y) | P(x,y) = P(x|y) P(y), P(X|y) ∈ K(X|y), P(Y) ∈ K(Y) }
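For finitely generated CSs the marginalization above is easy to carry out on the V-representation: marginalization is a linear map, so K(X) is the convex hull of the marginals of the extreme points of K(X,Y) (some of these marginals may turn out to be redundant, i.e. not vertices of K(X)). A minimal sketch with assumed toy numbers:

```python
# Marginalizing a finitely generated credal set via its vertices.
# Joint mass functions over Boolean X and Y are stored as P[x][y].

def marginalize_vertex(P_xy):
    """Marginal P(X): P(x) = sum_y P(x, y)."""
    return [sum(row) for row in P_xy]

# Two illustrative extreme points of K(X, Y) (assumed, not from the slides).
ext_K_xy = [
    [[0.3, 0.2], [0.1, 0.4]],
    [[0.1, 0.1], [0.5, 0.3]],
]

# K(X) is the convex hull of these marginals.
marginals = [marginalize_vertex(P) for P in ext_K_xy]
```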
-
Basic operations with credal sets (vertices)

It suffices to work with the extreme points ext[·] of the credal sets:

Joint:
  Credal set: K(X,Y)
  Extremes:  K(X,Y) = CH{ P_j(X,Y) }_{j=1}^{n_v}

Marginalization:
  Credal set: K(X) = { P(X) : P(x) = ∑_y P(x,y), P(X,Y) ∈ K(X,Y) }
  Extremes:  K(X) = CH{ P(X) : P(x) = ∑_y P(x,y), P(X,Y) ∈ ext[K(X,Y)] }

Conditioning:
  Credal set: K(X|y) = { P(X|y) : P(x|y) = P(x,y) / ∑_{x′} P(x′,y), P(X,Y) ∈ K(X,Y) }
  Extremes:  K(X|y) = CH{ P(X|y) : P(x|y) = P(x,y) / ∑_{x′} P(x′,y), P(X,Y) ∈ ext[K(X,Y)] }

Combination:
  Credal set: K(X|Y) ⊗ K(Y) = { P(X,Y) : P(x,y) = P(x|y) P(y), P(X|y) ∈ K(X|y), P(Y) ∈ K(Y) }
  Extremes:  K(X|Y) ⊗ K(Y) = CH{ P(X,Y) : P(x,y) = P(x|y) P(y), P(X|y) ∈ ext[K(X|y)], P(Y) ∈ ext[K(Y)] }

[EXE] Prove it!
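The claim behind the exercise can also be checked numerically. A minimal sketch for marginalization (toy numbers of my own, binary X so that the marginal credal set is just an interval): marginalization is linear in P(X,Y), so the marginal of any convex mixture of vertices is the same mixture of the vertices' marginals, and generic elements of K(X,Y) never leave the interval spanned by the vertex marginals.

```python
import random

# Toy credal set K(X,Y): convex hull of three vertices over
# (x,y) in {0,1}x{0,1}, listed as [P(0,0), P(0,1), P(1,0), P(1,1)].
vertices = [[0.1, 0.2, 0.3, 0.4],
            [0.4, 0.1, 0.1, 0.4],
            [0.25, 0.25, 0.25, 0.25]]

def marginal_x0(P):
    """P(x=0) = sum over y of P(x=0, y)."""
    return P[0] + P[1]

# Interval obtained by marginalizing only the extreme points.
lo = min(marginal_x0(V) for V in vertices)
hi = max(marginal_x0(V) for V in vertices)

# Random mixtures (generic elements of K(X,Y)) stay inside [lo, hi],
# because marginalization commutes with convex combination.
random.seed(0)
for _ in range(1000):
    w = [random.random() for _ in vertices]
    s = sum(w)
    P = [sum(wi / s * V[k] for wi, V in zip(w, vertices)) for k in range(4)]
    assert lo - 1e-12 <= marginal_x0(P) <= hi + 1e-12
print(lo, hi)
```

The same linearity argument does not apply verbatim to conditioning, which is a ratio; there the result holds because conditioning maps extreme points of K(X,Y) onto a set whose convex hull contains the conditioned set (the exercise above).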
-
[Exe #2] An imprecise bivariate (graphical?) model

Two Boolean variables: Smoker (S), Lung Cancer (C)
Eight “Bayesian” physicians, each one assessing P_j(S,C):
K(S,C) = CH{ P_j(S,C) }_{j=1}^{8}

Compute:
  Marginal: K(S)
  Conditioning: K(C|S) := { K(C|s), K(C|¬s) }
  Combination (marginal extension): K′(C,S) := K(C|S) ⊗ K(S)
Is this an (I)PGM?

[Figure: two-node graph, Smoker and Cancer]

j   P_j(s,c)   P_j(s,¬c)   P_j(¬s,c)   P_j(¬s,¬c)
1   1/8        1/8         3/8         3/8
2   1/8        1/8         9/16        3/16
3   3/16       1/16        3/8         3/8
4   3/16       1/16        9/16        3/16
5   1/4        1/4         1/4         1/4
6   1/4        1/4         3/8         1/8
7   3/8        1/8         1/4         1/4
8   3/8        1/8         3/8         1/8

[Figure: the assessments P_j plotted in the probability simplex with vertices (1,0,0,0), (0,1,0,0), (0,0,1,0), (0,0,0,1)]
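A worked check for the exercise (a sketch: the joint assessments are taken from the table above, while the code and its variable names are mine):

```python
from fractions import Fraction as F

# The eight physicians' joint assessments P_j(S, C) from the table,
# as (P(s,c), P(s,~c), P(~s,c), P(~s,~c)).
assessments = [
    (F(1, 8),  F(1, 8),  F(3, 8),  F(3, 8)),
    (F(1, 8),  F(1, 8),  F(9, 16), F(3, 16)),
    (F(3, 16), F(1, 16), F(3, 8),  F(3, 8)),
    (F(3, 16), F(1, 16), F(9, 16), F(3, 16)),
    (F(1, 4),  F(1, 4),  F(1, 4),  F(1, 4)),
    (F(1, 4),  F(1, 4),  F(3, 8),  F(1, 8)),
    (F(3, 8),  F(1, 8),  F(1, 4),  F(1, 4)),
    (F(3, 8),  F(1, 8),  F(3, 8),  F(1, 8)),
]

# Marginal K(S): P(s) = P(s,c) + P(s,~c) at each vertex.
marg = sorted({p_sc + p_snc for p_sc, p_snc, _, _ in assessments})
print("P(s) extremes:", [str(p) for p in marg])       # ['1/4', '1/2']

# Conditionals K(C|s) and K(C|~s): P(c|s) = P(s,c) / P(s), etc.
cond_s  = sorted({p_sc / (p_sc + p_snc) for p_sc, p_snc, _, _ in assessments})
cond_ns = sorted({p_nsc / (p_nsc + p_nsnc) for _, _, p_nsc, p_nsnc in assessments})
print("P(c|s) extremes:", [str(p) for p in cond_s])   # ['1/2', '3/4']
print("P(c|~s) extremes:", [str(p) for p in cond_ns]) # ['1/2', '3/4']

# Marginal extension K(C|S) ⊗ K(S): the 2*2*2 = 8 combinations of these
# extremes reproduce exactly the eight assessments.
recombined = {(cs * m, (1 - cs) * m, cns * (1 - m), (1 - cns) * (1 - m))
              for m in marg for cs in cond_s for cns in cond_ns}
print(recombined == set(assessments))                 # True
```

The computation suggests the intended answer: since combining the extremes of K(S) and K(C|S) recovers exactly the eight assessments, K′(C,S) = K(S,C), i.e. the joint credal set factorises over the two-node graph.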