Probabilistic Reasoning
R&N: 13.2-13.6, 14.1-14.5; P&M: 8.1-8.4, 8.6

Jacques Fleuriot
University of Edinburgh, School of Informatics
[email protected]



Probabilities and Bayesian Inference

Note: This week's lectures will review the following at a quick pace:

1. Basic Properties of Probability

2. Bayes’ Rule and Inference

3. Product Probability Spaces

And then cover:

4. Representing Distributions with Bayesian Networks

5. How to Construct the Network

6. Inference using Networks

7. ProbLog: Logic programming with probabilities


Probabilities

Probabilities can be used to model "objectively" the situation in the world.

A person with a cold sneezes (say, at least once a day) with probability 75%.

The objective interpretation is that this is a truly random event: every person with a cold may or may not sneeze, and does so with probability 75%.

Or . . .

Probabilities can model our "subjective" belief. For example: if we know that a person has a cold, then we believe they will sneeze with probability 75%. The probabilities relate to an agent's state of knowledge and change with new evidence.

We will use the subjective interpretation, though probability theory itself is independent of this choice.


Probabilities: Some Terminology I

Elementary Event: an outcome of some experiment, e.g. a coin comes up Heads (H) when tossed.

Sample Space: the SET of possible elementary events, e.g. {H, T}. If the sample space S is finite, we denote the number of elementary events in S by n(S).

Example: For one throw of a die, the sample space is given by:

S = {1, 2, 3, 4, 5, 6}

and so n(S) = 6.


Probabilities: Some Terminology II

Event: a subset of the sample space, i.e. if an event is denoted by E then

E ⊆ S

and also

n(E) ≤ n(S)

Example: Let
Eo be the event "the number is odd" when a die is rolled; then Eo = {1, 3, 5} and n(Eo) = 3.
Ee be the event "the number is less than 3" when a die is rolled; then Ee = {1, 2} and n(Ee) = 2.


Probabilities: Running Example I

We throw 2 dice (each with 6 sides and of uniform construction).

Each elementary event is a pair of numbers (a, b).

The sample space, i.e. the set of elementary events, is

{(1, 1), (1, 2), . . . , (6, 6)}

Events, i.e. subsets of the sample space, include:
E1: outcomes where the first die has value 1.
E2: outcomes where the sum of the two numbers is even.
E3: outcomes where the sum of the two numbers is at least 11.

Events A and B are Mutually Exclusive if A ∩ B = ∅ (the empty set).

All pairs of elementary events are mutually exclusive.


Probabilities: Running Example II

In the example:

Events E1 and E3 are mutually exclusive.
Events E1 and E2 are not.
Events E2 and E3 are not.


Classical Definition of Probability

If the sample space S consists of a finite (but non-zero) number of equally likely outcomes, then the probability of an event E, written P(E), is defined as

P(E) = n(E)/n(S)

This definition satisfies the (Kolmogorov) axioms of probability:

1. P(E) ≥ 0 for any event E, since n(E) ≥ 0 and n(S) > 0
2. P(S) = n(S)/n(S) = 1
3. If A and B are mutually exclusive: P(A ∪ B) = P(A) + P(B)

This is because A ∩ B = ∅, so n(A ∪ B) = n(A) + n(B), and therefore

P(A ∪ B) = n(A ∪ B)/n(S) = (n(A) + n(B))/n(S) = n(A)/n(S) + n(B)/n(S) = P(A) + P(B)

Note: P(A ∩ B) is also denoted by P(A, B) and P(A and B).


Probability Distributions

It is sufficient to specify the probability of elementary events.

Other probabilities can be computed from these.

In the example: each die gives a result 1, 2, 3, 4, 5, 6 with probability 1/6, independently of the other die. So each elementary event has probability 1/36.

P(E1) = P((1, 1)) + . . . + P((1, 6)) = 6/36 = 1/6

P(E2) = P((1, 1)) + P((1, 3)) + P((1, 5)) + P((2, 2)) + P((2, 4)) + P((2, 6)) + . . . = 1/2

P(E3) = P((5, 6)) + P((6, 5)) + P((6, 6)) = 3/36 = 1/12
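These numbers are easy to check by brute-force enumeration of the 36 elementary events. A minimal Python sketch (the names S, prob, E1, E2, E3 are our own, not from the slides):

from itertools import product

# Sample space: all 36 ordered pairs (a, b), each with probability 1/36.
S = list(product(range(1, 7), repeat=2))

def prob(event):
    # P(E) = n(E) / n(S) for equally likely outcomes.
    return sum(1 for e in S if event(e)) / len(S)

E1 = lambda e: e[0] == 1               # first die has value 1
E2 = lambda e: (e[0] + e[1]) % 2 == 0  # sum of the two numbers is even
E3 = lambda e: e[0] + e[1] >= 11       # sum is at least 11

print(prob(E1))  # 0.1666... = 1/6
print(prob(E2))  # 0.5 = 1/2
print(prob(E3))  # 0.0833... = 1/12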


Properties of Probabilities

We know that P(A) = n(A)/n(S).

Since event A is a subset of sample space S,

0 ≤ n(A) ≤ n(S), i.e. 0 ≤ n(A)/n(S) ≤ 1

So, we have 0 ≤ P(A) ≤ 1.

If P(A) = 0 then the event cannot occur, e.g. for mutually exclusive events A and B we have P(A ∩ B) = P(∅) = 0.

If P(A) = 1 then the event is certain to occur.


Properties of Probabilities

Let Ā denote the event "A does not occur".

P(Ā) = n(Ā)/n(S)
     = (n(S) − n(A))/n(S)
     = 1 − n(A)/n(S)
     = 1 − P(A)

So, P(Ā) = 1 − P(A).


Conditional Probability

If A and B are two events and P(B) ≠ 0, then the probability of A given that B has already occurred is written P(A|B) and defined as

P(A|B) = P(A ∩ B)/P(B)

Where does this formula come from? Since we know that event B has already occurred, the sample space for event A given event B is B:

P(A|B) = n(A ∩ B)/n(B)
       = (n(A ∩ B)/n(S)) / (n(B)/n(S))
       = P(A ∩ B)/P(B)


Conditional Probability: Example

P(E1 ∩ E2) = P((1, 1)) + P((1, 3)) + P((1, 5)) = 1/12
P(E2 ∩ E3) = P((6, 6)) = 1/36
P(E1 ∩ E3) = P(∅) = 0

P(E1|E2) = P(E1 ∩ E2)/P(E2) = (1/12)/(1/2) = 1/6

P(E2|E3) = P(E2 ∩ E3)/P(E3) = (1/36)/(1/12) = 1/3

P(E1|E3) = P(E1 ∩ E3)/P(E3) = 0/(1/12) = 0
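Continuing the enumeration sketch from earlier, these conditional probabilities can be checked the same way (cond_prob is our helper; it assumes the S, prob, and event definitions from the previous snippet):

def cond_prob(a, b):
    # P(A|B) = P(A and B) / P(B), computed by counting within S.
    return prob(lambda e: a(e) and b(e)) / prob(b)

print(cond_prob(E1, E2))  # 0.1666... = 1/6
print(cond_prob(E2, E3))  # 0.3333... = 1/3
print(cond_prob(E1, E3))  # 0.0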


More on Conditional Probability

The conditional probability rule is often written and used in the following form:

P(A ∩ B) = P(A|B)P(B)

Since P(A ∩ B) = P(B ∩ A), we also have:

P(A ∩ B) = P(B|A)P(A)

and hence P(A|B)P(B) = P(B|A)P(A).

If A and B are mutually exclusive events then, as P(A ∩ B) = 0 and P(B) ≠ 0 (from the definition of conditional probability), it follows that P(A|B) = 0.

Next, we look at an important result: Bayes' Theorem.


Bayes’ Theorem

We know that P(A ∩ B) = P(A|B)P(B) = P(B|A)P(A).

Reorganizing, we get:

P(A|B) = P(A)P(B|A)/P(B)

Dice Example: P(E2|E1) = P(E2)P(E1|E2)/P(E1) = (1/2)(1/6)/(1/6) = 1/2

This will be the basis of our Bayesian inference and learning procedures!


Independent Events I

If the occurrence or non-occurrence of an event A does not influence in any way the probability of an event B, then event B is (statistically) independent of event A, and we have:

P(A|B) = P(A)

Since

P(A|B)P(B) = P(B|A)P(A)
⇒ P(A)P(B) = P(B|A)P(A)
⇒ P(B|A) = P(B) also holds

But we also know that

P(A ∩ B) = P(A|B)P(B)


Independent Events II

So, A and B independent also means that

P(A ∩ B) = P(A)P(B)

In the Dice Example:
Events E1 and E2 are statistically independent.
Events E2 and E3 are not.
Events E1 and E3 are not.

Independence can be used to simplify computations!


Independence: Example

Sometimes we know from the probabilistic experiment that certain events are "physically" independent. In this case they are also statistically independent.

Event E4: outcomes where the second die has value 4.

Since each throw of the die is independent (the first throw does not change the probabilities of outcomes of the second throw), E1 and E4 are physically, and hence also statistically, independent.

This implies: P(E1 ∩ E4) = P(E1)P(E4) = 1/36. This can be verified using the equations above.


Probability Distributions

A random variable associates a value with each possible outcome of an experiment, e.g. X = H or X = T for the outcomes of tossing a coin.

An elementary event is an assignment of values to all variables.

A probability distribution describes the probability of every elementary event and can be represented using a probability table.

A Joint Probability Distribution contains information about the probabilities of all variables and the connections between them.


Joint Distribution: Example

Two variables A and B, each of which can take values 0 or 1 (i.e. boolean variables).

The table has one column for each variable.

The table has 2 × 2 = 4 rows.

A  B  P(A,B)
0  0  v00
0  1  v01
1  0  v10
1  1  v11

P(A = 0, B = 0) = v00
P(A = 1, B = 0) = v10

Particular probabilities are denoted by P().
The table itself is denoted by P(A,B).


Joint Probability Distribution: Another Example

We have 4 variables describing the situation of a person with an allergy to cats:
Cold: the person has a cold
Cat: there is a cat in the vicinity of the person
Allergy: the person is showing an allergic reaction
Sneeze: the person sneezed

Each of these is Boolean, taking values 0 or 1.

The joint distribution can be described in a table of size 2^4 = 16.


The Joint Distribution

Cold  Cat  Allergy  Sneeze  P(Cold, Cat, Allergy, Sneeze)
0     0    0        0       0.84645
0     0    0        1       0.0
0     0    1        0       0.004455
0     0    1        1       0.040095
0     1    0        0       0.0018
0     1    0        1       0.0
0     1    1        0       0.00072
0     1    1        1       0.00648
1     0    0        0       0.009405
1     0    0        1       0.084645
1     0    1        0       0.000225
1     0    1        1       0.004725
1     1    0        0       0.00002
1     1    0        1       0.00018
1     1    1        0       0.00004
1     1    1        1       0.00076


Marginal Distribution I

This is an induced probability distribution for one or more variables, obtained from a joint distribution.

To compute the marginal distribution of one or more variables, we ignore the other variables.

Example:

A  B  P(A,B)
0  0  v00
0  1  v01
1  0  v10
1  1  v11

A  P(A)
0  v00 + v01
1  v10 + v11

B  P(B)
0  v00 + v10
1  v01 + v11


Marginal Distribution II

How do we "ignore variables"?

Example: We ignore B when computing the marginal distribution for A by summing it out. This is described generically by:

P(A) = Σ_{v∈{0,1}} P(A, B = v)

Similarly, for the marginal distribution of B, we have:

P(B) = Σ_{v∈{0,1}} P(A = v, B)


Marginal Distribution III

And for particular values:

P(A = 0) = Σ_{v=0}^{1} P(A = 0 and B = v)

Dice Example:

P(X1 = 1) = Σ_{v=1}^{6} P(X1 = 1 and X2 = v) = 1/6

Allergy Example:

P(Cold = 1) = Σ . . . = 0.1
P(Cold = 0) = Σ . . . = 0.9
P(Sneeze = 1) = Σ . . . = 0.136885
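These marginals can be read off the full joint table mechanically. A small Python sketch, encoding the 16-row allergy joint from the earlier slide as a dictionary (joint, VARS, and marginal are our own names):

# Joint distribution P(Cold, Cat, Allergy, Sneeze):
# key = (cold, cat, allergy, sneeze), value = probability.
joint = {
    (0,0,0,0): 0.84645,  (0,0,0,1): 0.0,
    (0,0,1,0): 0.004455, (0,0,1,1): 0.040095,
    (0,1,0,0): 0.0018,   (0,1,0,1): 0.0,
    (0,1,1,0): 0.00072,  (0,1,1,1): 0.00648,
    (1,0,0,0): 0.009405, (1,0,0,1): 0.084645,
    (1,0,1,0): 0.000225, (1,0,1,1): 0.004725,
    (1,1,0,0): 0.00002,  (1,1,0,1): 0.00018,
    (1,1,1,0): 0.00004,  (1,1,1,1): 0.00076,
}

VARS = ("Cold", "Cat", "Allergy", "Sneeze")

def marginal(var, value):
    # Sum out all the other variables: P(var = value).
    i = VARS.index(var)
    return sum(p for event, p in joint.items() if event[i] == value)

print(marginal("Cold", 1))    # ≈ 0.1
print(marginal("Sneeze", 1))  # ≈ 0.136885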


Marginal Distribution: Another Example

C  A  B  P(A,B|C)
0  0  0  v000
0  0  1  v001
0  1  0  v010
0  1  1  v011
1  0  0  v100
1  0  1  v101
1  1  0  v110
1  1  1  v111

Sum out B:

C  A  P(A|C)
0  0  v000 + v001
0  1  v010 + v011
1  0  v100 + v101
1  1  v110 + v111

P(A|C) = Σ_v P(A, B = v|C)


Inference using the Joint

Compute probabilities of events:
P(Cold = 1 and Sneeze = 1) = Σ . . . = 0.0902875

Causal Inference:
P(Sneeze = 1|Cold = 1) = 0.0902875/0.1 = 0.902875

Diagnostic Inference:
P(Cold = 1|Sneeze = 1) = 0.0902875/0.136885 = 0.66

Inter-Causal Inference:
P(Cold = 1|Sneeze = 1 and Allergy = 1)
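All of these queries reduce to sums over the joint table. A sketch continuing the joint dictionary from the previous snippet (prob_of and prob_given are our helpers, not lecture notation):

def prob_of(fixed):
    # P(assignment): add up the joint entries consistent with `fixed`.
    return sum(p for event, p in joint.items()
               if all(event[VARS.index(v)] == val for v, val in fixed.items()))

def prob_given(query, evidence):
    # P(query | evidence) = P(query and evidence) / P(evidence).
    return prob_of({**query, **evidence}) / prob_of(evidence)

print(prob_given({"Sneeze": 1}, {"Cold": 1}))                # ≈ 0.903 (causal)
print(prob_given({"Cold": 1}, {"Sneeze": 1}))                # ≈ 0.66 (diagnostic)
print(prob_given({"Cold": 1}, {"Sneeze": 1, "Allergy": 1}))  # ≈ 0.105 (inter-causal)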


Normalization

P(Cold = 1|Sneeze = 1) = P(Sneeze = 1|Cold = 1)P(Cold = 1)/P(Sneeze = 1) = A/α

P(Cold = 0|Sneeze = 1) = P(Sneeze = 1|Cold = 0)P(Cold = 0)/P(Sneeze = 1) = B/α

But A/α + B/α = 1, so α = A + B and

P(Cold = 1|Sneeze = 1) = A/(A + B)

We will often use normalization to avoid computing the denominator.


Conditional Independence

Events A and B are statistically independent given event C iff

P(A ∩ B|C) = P(A|C)P(B|C)

This is equivalent to the condition P(A|B ∩ C) = P(A|C).

As with ordinary independence, this can simplify computations.

Event E5: outcomes where at least one die has value 6.

P(E2|E3) = 1/3
P(E2|E3 and E5) = 1/3


Conditional Independence: Derivation

P(A ∩ B|C) = P(A ∩ B ∩ C)/P(C)
           = P(A|B ∩ C)P(B ∩ C)/P(C)
           = P(A|B ∩ C)P(B|C)P(C)/P(C)
           = P(A|B ∩ C)P(B|C)

So, P(A|C)P(B|C) = P(A|B ∩ C)P(B|C)

⇒ P(A|B ∩ C) = P(A|C)


Independence in Joint Probability Distributions

In joint probability distributions we can express a more general form of independence.

X1 and X2 are independent iff P(X1|X2) = P(X1).

This means that for all v1 and v2,

P(X1 = v1|X2 = v2) = P(X1 = v1)

And similarly for conditional independence:
X1 and X2 are independent given X3 iff P(X1|X2, X3) = P(X1|X3).

This means that for all v1, v2, v3,

P(X1 = v1|X2 = v2, X3 = v3) = P(X1 = v1|X3 = v3)


Product Probability Spaces

We have n variables, X1, . . . , Xn.

Each variable Xi ranges over a finite set of values vi,1, . . . , vi,ki.

We can write a big table with n columns and ∏_i ki rows describing the probability of every elementary event.

The table grows exponentially with n. This is not feasible unless n is very small.


Bayesian Networks

Allow us to represent distributions more compactly.

Take advantage of the structure available in a domain.

Basic idea: represent dependence and independence explicitly.

If P(X1 ∩ X2) = P(X1)P(X2) then we can use two 1-dimensional tables instead of a 2-dimensional table.

If each variable has 6 values, this means 12 entries instead of 36!


Example

Edges represent "direct influence".

Assume that Cat and Cold do not depend on other variables.
Allergy depends only on Cat.
Sneeze depends on Cold and Allergy.
Sneeze depends on Cat, BUT only through Allergy.

With each node we associate a conditional probability table.

[Figure: network with nodes Cat, Cold, Allergy, Sneeze and edges Cat → Allergy, Allergy → Sneeze, Cold → Sneeze]


Example II

The network structure expresses independence of variables.

P(Cat|Cold) = P(Cat)
P(Allergy|Cat, Cold) = P(Allergy|Cat)
P(Sneeze|Allergy, Cat, Cold) = P(Sneeze|Allergy, Cold)

More generally, the joint distribution can be expressed as the product of the distributions in the network. Example:

P(Cat, Cold, Allergy, Sneeze) = P(Cat)P(Cold)P(Allergy|Cat)P(Sneeze|Allergy, Cold)


Example III

P(Cat = 0)  P(Cat = 1)
0.99        0.01

P(Cold = 0)  P(Cold = 1)
0.90         0.10

Cat  P(Allergy = 0)  P(Allergy = 1)
0    0.95            0.05
1    0.20            0.80

Cold  Allergy  P(Sneeze = 0)  P(Sneeze = 1)
0     0        1.00           0.00
0     1        0.10           0.90
1     0        0.10           0.90
1     1        0.05           0.95


Example: Joint Distribution in Terms of CPDs I

We can use the conditional probability rule (in its various forms) and Bayes' Theorem to express joint distributions in terms of conditional probability distributions (stored with the nodes of a Bayesian network as Conditional Probability Tables (CPTs)).

Example: Express P(Cat, Cold, Allergy, Sneeze) in terms of the known conditional probability distributions (tables):

P(Cat), P(Cold), P(Allergy|Cat), P(Sneeze|Allergy, Cold)


Example: Joint Distribution in Terms of CPDs II

P(Cat, Cold, Allergy, Sneeze)

= P(Sneeze|Allergy, Cat, Cold)P(Allergy, Cat, Cold)
  (by the conditional probability rule applied to distributions)

= P(Sneeze|Allergy, Cold)P(Allergy|Cat, Cold)P(Cat, Cold)
  (by the conditional probability rule, and Sneeze is independent of Cat given Allergy and Cold)

= P(Sneeze|Allergy, Cold)P(Allergy|Cat)P(Cat|Cold)P(Cold)
  (by the conditional probability rule, and Allergy is independent of Cold given Cat)

= P(Sneeze|Allergy, Cold)P(Allergy|Cat)P(Cat)P(Cold)
  (since Cat and Cold are independent)


Another Useful Result

P(A, B|C) = P(A|B, C)P(B|C)   (†)

Proof:

RHS = P(A|B, C)P(B|C)
    = (P(A, B, C)/P(B, C)) · (P(B, C)/P(C))
    = P(A, B, C)/P(C)
    = P(A, B|C)P(C)/P(C)
    = P(A, B|C) = LHS

Also, for particular values:

P(A = a, B = b|C = c) = P(A = a|B = b, C = c)P(B = b|C = c)


Bayesian Networks I

We have n variables, X1, . . . , Xn.

The graph has n nodes, each corresponding to one variable.

The graph is acyclic; there is no directed path from Xi to itself.

With each node we associate a conditional probability table (CPT) describing P(Xi | parents(Xi)).

The joint probability distribution can be expressed as the product of the distributions in the network:

P(X1, X2, . . . , Xn) = ∏_{i=1}^{n} P(Xi | parents(Xi))


Bayesian Networks II

P(Cat = 0, Cold = 1, Allergy = 0, Sneeze = 1)
= P(Cat = 0)P(Cold = 1)P(Allergy = 0|Cat = 0)P(Sneeze = 1|Allergy = 0, Cold = 1)
= 0.99 · 0.1 · 0.95 · 0.9 = 0.084645

So to represent a distribution we need to represent only the network and the CPTs.

If for all nodes the number of parents is small, then all CPTs are small and we have a compact representation.
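A minimal sketch of this factorisation in Python, encoding the four CPTs of the allergy network as dictionaries (the encoding and the names p_cat, bern, joint_bn are our own):

# CPTs, stored as P(node = 1 | parent values).
p_cat = 0.01
p_cold = 0.10
p_allergy = {0: 0.05, 1: 0.80}           # key: Cat
p_sneeze = {(0, 0): 0.00, (0, 1): 0.90,  # key: (Cold, Allergy)
            (1, 0): 0.90, (1, 1): 0.95}

def bern(p1, v):
    # P(X = v) for a boolean variable with P(X = 1) = p1.
    return p1 if v == 1 else 1 - p1

def joint_bn(cat, cold, allergy, sneeze):
    # P(Cat, Cold, Allergy, Sneeze) as the product of CPT entries.
    return (bern(p_cat, cat) * bern(p_cold, cold)
            * bern(p_allergy[cat], allergy)
            * bern(p_sneeze[(cold, allergy)], sneeze))

print(joint_bn(cat=0, cold=1, allergy=0, sneeze=1))  # 0.084645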


How to Construct a Network? I

We can represent any distribution using a network, by repeated application of P(A, B) = P(A|B)P(B):

Choose an ordering of the variables X1, . . . , Xn.

For i = 1 to n:
  add Xi to the network with P(Xi|X1, . . . , Xi−1).

The joint distribution is then:

P(X1, X2, . . . , Xn) = ∏_{i=1}^{n} P(Xi|X1, . . . , Xi−1)

But this is no improvement, as Xn is connected to all its predecessors (so we need a huge table for it).


How to Construct a Network? II

Instead, at each stage choose a subset such that parents(Xi) ⊆ {X1, . . . , Xi−1}.

The choice of parents must satisfy

P(Xi|X1, . . . , Xi−1) = P(Xi | parents(Xi))

so that Xi is independent of its other predecessors given its parents.

If parents(Xi) is small then the representation is compact. For example, if |parents(Xi)| ≤ 3 for all Xi, then instead of 2^n entries we have n · 2^3 = 8n entries!

How should we order the variables?

Causal links tend to produce small representations.
Choose "root causes" first, then continue with the causal structure as much as possible.


Example

You are at work; neighbour John calls to say your home alarm is ringing, but neighbour Mary doesn't call. Sometimes the alarm is set off by minor earthquakes. Is there a burglar?

Variables and ordering:

Burglary, Earthquake, Alarm, JohnCalls, MaryCalls

Network topology reflects "causal" knowledge:


The Network

[Figure: Burglary → Alarm ← Earthquake; Alarm → JohnCalls; Alarm → MaryCalls]

P(B) = .001        P(E) = .002

B  E  P(A)
t  t  .95
t  f  .94
f  t  .29
f  f  .001

A  P(J)
t  .90
f  .05

A  P(M)
t  .70
f  .01


What if we choose another ordering?

Variable ordering (1):

MaryCalls, JohnCalls, Alarm, Burglary, Earthquake

Variable ordering (2):

MaryCalls, JohnCalls, Earthquake, Burglary, Alarm

[Figure: the networks (a) and (b) obtained with orderings (1) and (2) respectively, over the nodes MaryCalls, JohnCalls, Alarm, Burglary, Earthquake]


How to Construct a Network

We may work with domain experts to decide on the structure and then find the values of the probabilities.

Choosing a "bad" order can have adverse effects:

Large probability tables.
"Unnatural" dependencies, which are in turn hard to estimate.

Luckily, experts are often good at identifying causal structure, and prefer giving probability judgements for causal rules.

Learning the CPTs, and even the structure, is possible.


Computing with Bayes Nets

We have seen how to reconstruct the joint distribution from the network.

Can we compute other probabilities efficiently?

P(Cold = 1 and Sneeze = 1) = ?
P(Sneeze = 1|Cold = 1) = ?
P(Cold = 1|Sneeze = 1) = ?
P(Cold = 1|Sneeze = 1 and Allergy = 1) = ?


Computing a Marginal Distribution I

We want to compute P(Cold, Sneeze).

By the conditional probability rule:

P(Cold, Sneeze) = P(Sneeze|Cold)P(Cold)

Recall that we have the following CPTs attached to the nodes of our network:

P(Cat), P(Cold), P(Allergy|Cat), P(Sneeze|Allergy, Cold)


Computing a Marginal Distribution II

We need to compute P(Sneeze|Cold):

P(Sneeze|Cold)

= Σ_{v∈{0,1}} P(Sneeze, Allergy = v|Cold)
  (by def. of marginal distribution; we sum Allergy out)

= Σ_{v∈{0,1}} P(Sneeze|Allergy = v, Cold)P(Allergy = v|Cold)
  (by result (†))

= Σ_{v∈{0,1}} P(Sneeze|Allergy = v, Cold)P(Allergy = v)
  (since Allergy is independent of Cold)

We can read out P(Sneeze|Allergy = v, Cold) for all v using our CPT for P(Sneeze|Allergy, Cold).


Computing a Marginal Distribution III

We now need to compute the marginal distribution P(Allergy):

P(Allergy = v)

= Σ_{v2∈{0,1}} P(Allergy = v, Cat = v2)
  (by def. of marginal distribution; we sum Cat out)

= Σ_{v2∈{0,1}} P(Allergy = v|Cat = v2)P(Cat = v2)
  (by the conditional probability rule)


Computing a Marginal Distribution IV

So, we have now expressed all computations in terms of the CPTs in the network. We can write the computation fully as:

P(Cold, Sneeze) = (Σ_{v∈{0,1}} P(Sneeze|Allergy = v, Cold) Σ_{v2∈{0,1}} P(Allergy = v|Cat = v2)P(Cat = v2)) P(Cold)

In general, "sum out" the variables that do not appear in the query, and try to maintain small tables along the way.
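A sketch of this computation in Python, reusing the CPT dictionaries (p_cat, p_cold, p_allergy, p_sneeze) and the bern helper from the earlier factorisation snippet:

def p_allergy_marg(v):
    # Marginal P(Allergy = v): sum Cat out of P(Allergy|Cat)P(Cat).
    return sum(bern(p_allergy[c], v) * bern(p_cat, c) for c in (0, 1))

def p_sneeze_given_cold(s, c):
    # P(Sneeze = s | Cold = c): sum Allergy out (Allergy is independent of Cold).
    return sum(bern(p_sneeze[(c, a)], s) * p_allergy_marg(a) for a in (0, 1))

def p_cold_sneeze(c, s):
    # P(Cold = c, Sneeze = s) = P(Sneeze = s | Cold = c) P(Cold = c).
    return p_sneeze_given_cold(s, c) * bern(p_cold, c)

print(p_allergy_marg(1))          # 0.0575
print(p_sneeze_given_cold(1, 1))  # 0.902875
print(p_cold_sneeze(1, 1))        # 0.0902875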


Resulting Distributions

P(Allergy = 0)  P(Allergy = 1)
0.9425          0.0575

Cold  P(Sneeze = 0)  P(Sneeze = 1)
0     0.94825        0.05175
1     0.097125       0.902875

Cold  Sneeze  P(Sneeze, Cold)
0     0       0.853425
0     1       0.046575
1     0       0.0097125
1     1       0.0902875


Causal Inference

To compute P(Sneeze = 1|Cold = 1):

First compute P(Allergy) as in the previous example.

Then compute

P(Sneeze = 1|Cold = 1) = Σ_{v∈{0,1}} P(Sneeze = 1|Allergy = v, Cold = 1)P(Allergy = v)


Diagnostic Inference

Use Bayes' Rule to compute P(Cold = 1|Sneeze = 1):

A1 = P(Cold = 1|Sneeze = 1) = P(Sneeze = 1|Cold = 1)P(Cold = 1)/P(Sneeze = 1) = N1/P(Sneeze = 1)

A0 = P(Cold = 0|Sneeze = 1) = P(Sneeze = 1|Cold = 0)P(Cold = 0)/P(Sneeze = 1) = N0/P(Sneeze = 1)

But A0 + A1 = 1, so

P(Cold|Sneeze = 1) = (P(Cold = 0|Sneeze = 1), P(Cold = 1|Sneeze = 1)) = (N0/(N0 + N1), N1/(N0 + N1)),

using normalization.


Diagnostic Inference

From the CPT we have:
P(Cold = 0) = 0.90 and P(Cold = 1) = 0.10

P(Sneeze = 1|Cold = 0) = 0.05175
P(Sneeze = 1|Cold = 1) = 0.902875
(computed as in the previous example, using P(Allergy))

N0 = 0.05175 · 0.90 = 0.046575
N1 = 0.902875 · 0.10 = 0.0902875

P(Cold = 1|Sneeze = 1) = N1/(N0 + N1) = 0.66
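The normalization step takes only a few lines of code, continuing the same sketch:

n0 = p_sneeze_given_cold(1, 0) * bern(p_cold, 0)  # 0.046575
n1 = p_sneeze_given_cold(1, 1) * bern(p_cold, 1)  # 0.0902875
print(n1 / (n0 + n1))                             # ≈ 0.66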


Inference in Burglary Example

Use B, E, A, J, M to denote the variables.

Compute P(A = 1|E = 1, J = 1):

P(A = 1|E = 1, J = 1) = P(J = 1|A = 1, E = 1)P(A = 1|E = 1)/P(J = 1|E = 1)

by computing and normalising P(J = 1|A = b, E = 1)P(A = b|E = 1) for b ∈ {0, 1}:

P(A = 1|E = 1) = 0.001 · 0.95 + 0.999 · 0.29 = 0.29066
P(A = 0|E = 1) = 0.70934

P(J = 1|A = 1, E = 1)P(A = 1|E = 1) = 0.90 · 0.29066 = 0.261594
P(J = 1|A = 0, E = 1)P(A = 0|E = 1) = 0.05 · 0.70934 = 0.035467

P(A = 1|E = 1, J = 1) = 0.261594/(0.261594 + 0.035467) = 0.88

Exercise: Work out the details of all the computations.


Inference in Burglary Example: Derivation I

P(A = 1|E = 1, J = 1)

= P(A = 1, E = 1, J = 1)/P(J = 1, E = 1)

= P(J = 1|A = 1, E = 1)P(A = 1, E = 1)/P(J = 1, E = 1)

= P(J = 1|A = 1, E = 1)P(A = 1|E = 1)P(E = 1)/(P(J = 1|E = 1)P(E = 1))

= P(J = 1|A = 1, E = 1)P(A = 1|E = 1)/P(J = 1|E = 1)


Inference in Burglary Example: Derivation II

P(A = 1|E = 1)

= P(A = 1, E = 1)/P(E = 1)

by def. of marginal distribution:

= Σ_{v∈{0,1}} P(A = 1, E = 1, B = v)/P(E = 1)

= Σ_{v∈{0,1}} P(A = 1|E = 1, B = v)P(E = 1, B = v)/P(E = 1)

since E (earthquake) and B (burglary) are independent:

= Σ_{v∈{0,1}} P(A = 1|E = 1, B = v)P(E = 1)P(B = v)/P(E = 1)


Inference in Burglary Example: Derivation III

= Σ_{v∈{0,1}} P(A = 1|E = 1, B = v)P(B = v)

by expanding the summation:

= P(A = 1|E = 1, B = 0)P(B = 0) + P(A = 1|E = 1, B = 1)P(B = 1)

by the probabilities from the CPTs in the network:

= (0.29)(0.999) + (0.95)(0.001) = 0.29066
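The whole computation fits in a few lines of Python, using the CPT numbers from the network figure (the variable names are our own):

p_b = 0.001                         # P(Burglary = 1)
p_a = {(1, 1): 0.95, (1, 0): 0.94,  # P(Alarm = 1 | B, E)
       (0, 1): 0.29, (0, 0): 0.001}
p_j = {1: 0.90, 0: 0.05}            # P(JohnCalls = 1 | Alarm)

# P(A = 1 | E = 1): sum Burglary out (B is independent of E).
p_a1_e1 = sum(p_a[(b, 1)] * (p_b if b else 1 - p_b) for b in (0, 1))

# Unnormalised P(J = 1 | A = a)P(A = a | E = 1) for a in {0, 1};
# John's call depends on E only through the alarm.
n1 = p_j[1] * p_a1_e1
n0 = p_j[0] * (1 - p_a1_e1)

print(p_a1_e1)         # 0.29066
print(n1 / (n0 + n1))  # ≈ 0.88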


Inference

In general the situation is more complicated than in our simple network: the problem is NP-hard.

We can always compute via the joint, but this may not be efficient.

Efficient algorithms are known for graphs with polytree structure. These are singly connected networks: there are no cycles, even ignoring the direction of edges.

Otherwise, try to turn the graph into a tree, or use simulation methods to approximate the probability.

Simulation is not guaranteed to give good answers, but works well in some cases.


Inference by Simulation

To compute P(X = v1|Y = v2) = P(X = v1, Y = v2)/P(Y = v2):

Repeat many times:
  choose random values for all nodes using the CPTs,
  first for nodes with no parents,
  then for nodes whose parents have already been chosen;
  if Y = v2 was chosen then:
    Ycounter = Ycounter + 1
    if X = v1 was chosen then Xcounter = Xcounter + 1

Return Xcounter/Ycounter.

This may require many rounds if Y = v2 occurs with low probability.
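A minimal rejection-sampling sketch for the allergy network, sampling parents before children (it reuses the CPT dictionaries and bern helper from the earlier snippets; random seeding is left to the reader):

import random

def sample_network():
    # Draw one full assignment in topological order.
    cat = int(random.random() < p_cat)
    cold = int(random.random() < p_cold)
    allergy = int(random.random() < p_allergy[cat])
    sneeze = int(random.random() < p_sneeze[(cold, allergy)])
    return {"Cat": cat, "Cold": cold, "Allergy": allergy, "Sneeze": sneeze}

def estimate(x, v1, y, v2, rounds=100_000):
    # Estimate P(x = v1 | y = v2) by counting within accepted samples.
    x_count = y_count = 0
    for _ in range(rounds):
        s = sample_network()
        if s[y] == v2:
            y_count += 1
            if s[x] == v1:
                x_count += 1
    return x_count / y_count  # assumes y = v2 occurred at least once

print(estimate("Cold", 1, "Sneeze", 1))  # ≈ 0.66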


Simulation by Likelihood Weighting

Likelihood weighting improves the simulation by forcing the evidence values (Y = v2) to be chosen.

Initialise the weight w = 1.

During sampling, if a node Xi corresponds to an evidence variable, choose its given value vi.

The current round will then not have full weight; instead,

w ← w · P(Xi = vi | values chosen for parents)
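A sketch of likelihood weighting for the query worked through in the example that follows, P(Allergy = 1|Cold = 1, Sneeze = 1), again reusing the allergy CPT dictionaries (the helper names are ours):

import random

def weighted_sample():
    # One sample with the evidence Cold = 1, Sneeze = 1 forced.
    w = p_cold                   # evidence Cold = 1: w <- w * P(Cold = 1)
    cat = int(random.random() < p_cat)
    allergy = int(random.random() < p_allergy[cat])
    w *= p_sneeze[(1, allergy)]  # evidence Sneeze = 1, given Cold = 1 and Allergy
    return allergy, w

def lw_estimate(rounds=100_000):
    cg = ca = 0.0  # general counter CG and Allergy = 1 counter CA
    for _ in range(rounds):
        allergy, w = weighted_sample()
        cg += w
        if allergy == 1:
            ca += w
    return ca / cg

print(lw_estimate())  # ≈ 0.06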


Example I

Compute P(Allergy = 1|Cold = 1, Sneeze = 1).

Set up a general counter CG and a counter CA for Allergy = 1 (both initialised to zero).

Here we illustrate just one sample:

Choose Cold = 1, so w ← 0.1.
Choose Cat randomly. In the following we assume 0 was chosen.
Choose Allergy using P(Allergy|Cat = 0). In the following we assume 0 was chosen.
Choose Sneeze = 1 and update w ← w · P(Sneeze = 1|Cold = 1, Allergy = 0) = 0.1 · 0.9 = 0.09.


Example II

The round is over. We forced the evidence to succeed, but Allergy = 1 did not succeed:

CG ← CG + 0.09
CA ← CA (no change)

After many rounds, return CA/CG.

This is faster than simple simulation, but may still need a long time to get good answers.


Probabilistic Logic Programming (PLP)

There is growing interest in combining probability and logic to deal with complex relational domains.

Probabilistic logic programs extend logic programs by enabling some of the facts to be annotated with probabilities.

Related field: Statistical Relational Learning, which extends probabilistic graphical models like Bayesian networks with logical and relational representations.

Many inference and learning tasks that use graphical models can be expressed as PLP programs.

Example PLP languages include ProbLog, ICL, PRISM, etc.


ProbLog Syntax in a Nutshell

A ProbLog program consists of two parts: a set of ground probabilistic facts, and a logic program.¹

A ground probabilistic fact, written p::f, is a ground fact f annotated with a probability p.

Probabilistic statements are of the form p::f(X1,X2,...,Xn) :- body, where body is a conjunction of calls to non-probabilistic facts. Such a statement defines the domains of the variables X1, X2, ..., Xn.

An atom that unifies with a ground probabilistic fact is called a probabilistic atom, while an atom that unifies with the head of some rule in the logic program is called a derived atom.

The set of probabilistic atoms must be disjoint from the set of derived atoms.

All variables in the head of a rule should also appear in a positive literal in the body of the rule.

¹ https://dtai.cs.kuleuven.be/problog/index.html


Burglary Example revisited in ProbLog

A possible (propositional) ProbLog program for computing P(A = 1|E = 1, J = 1):

0.01::burglary.
0.02::earthquake.

0.95::alarm :- burglary, earthquake.
0.94::alarm :- burglary, \+ earthquake.
0.29::alarm :- \+ burglary, earthquake.
0.01::alarm :- \+ burglary, \+ earthquake.

0.9::calls_john :- alarm.
0.05::calls_john :- \+ alarm.
0.7::calls_mary :- alarm.
0.01::calls_mary :- \+ alarm.

evidence(calls_john, true).
evidence(earthquake, true).

query(alarm).

[Figure: the burglary network and its CPTs, as shown earlier]

Note: \+ denotes negation.


Example: A first-order ProbLog program

A slightly different (extended) version of the burglary example.

person(john).

person(mary).

0.01::burglary.

0.02::earthquake.

0.95::alarm :- burglary, earthquake.

0.94::alarm :- burglary, \+ earthquake.

0.29::alarm :- \+ burglary, earthquake.

0.01::alarm :- \+ burglary, \+ earthquake.

0.8::calls(X) :- alarm, person(X).

0.1::calls(X) :- \+ alarm, person(X).

evidence(calls(john),true).

evidence(calls(mary),true).

query(burglary).

query(earthquake).

Assume there are N people and each person independently calls the police with a certain probability, depending on the alarm ringing or not: if the alarm rings, the probability of calling is 0.8, otherwise it is 0.1. The case for N = 2 is shown.


Diagnostic: P(Cold = 1|Sneeze = 1)

Our allergy example revisited as ProbLog program:

0.01::cat.

0.10::cold.

0.8::allergy :- cat.

0.05::allergy :- \+ cat.

0.95::sneeze :- cold, allergy.

0.90::sneeze :- \+ cold, allergy.

0.90::sneeze :- cold, \+ allergy.

evidence(sneeze,true).

query(cold).

Question: Why do we need only 3 clauses to capture P(Sneeze|Cold, Allergy)?


Exercise

A cab was involved in a hit-and-run accident at night. Two cab companies, the Green and the Blue, operate in the city. You are given the following data:

85% of the cabs in the city are Green and 15% are Blue.

A witness identified the cab as Blue. The court tested the reliability of the witness under the circumstances that existed on the night of the accident and concluded that the witness correctly identifies each of the two colours 80% of the time and fails 20% of the time.

1. Express the problem as a belief network and work out the probability that the cab involved in the accident was Blue.

2. Express the problem as a ProbLog program (you may check your answer using the online engine on the ProbLog webpage).

(Problem from D. Kahneman, Thinking, Fast and Slow, 2011)
