
Page 1: Probabilistic Graphical Models (source: assets.disi.unitn.it/uploads/doctoral_school/...)

Probabilistic Graphical Models
Lecture 1: Introduction and Probability Basics

M. Jaeger

Aalborg University

Trento, May 2015 Probability 1 / 39

Page 2:

Introduction

Page 3:

Introduction Example 1: SLAM

Simultaneous Localization and Mapping (SLAM): learn a map of the environment and locate the current position within it.

Page 4:

Introduction Example 1: SLAM cont.

A probabilistic graphical model for the SLAM problem:

[Figure: dynamic Bayesian network with nodes cont0, pos0, sens0; cont1, pos1, sens1; cont2, pos2, sens2; cont3, pos3, sens3; and a shared map node]

cont_t: control input at time t
pos_t: position at time t
sens_t: sensor reading at time t
map: map of the environment

◮ Determine the most probable position given map, controls, and sensor readings
◮ Determine the most probable map given positions, controls, and sensor readings

S. Thrun, W. Burgard, and D. Fox: A probabilistic approach to concurrent mapping and localization for mobile robots. Autonomous Robots 5 (3-4), 253-271, 1998.

Page 5:

Introduction Example 2: Image Segmentation

(source: http://pubs.niaaa.nih.gov/publications/arh313/243-246.htm)

◮ Divide the image into a small number of regions representing structurally similar areas

Page 6:

Introduction Example 2: Image Segmentation cont.

A PGM for image segmentation:

[Figure: grid-structured model with segment nodes seg5,7, seg5,8, seg6,7, seg6,8 and color nodes rgb5,7, rgb5,8, rgb6,7, rgb6,8]

seg_{i,j}: segment index of pixel (i, j)
rgb_{i,j}: color value of pixel (i, j)

◮ Determine most probable segmentation given the color values

Y. Zhang, M. Brady, and S. Smith: Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm. IEEE Transactions on Medical Imaging, 20(1), 45-57, 2001.

Page 7:

Introduction Example 3: Statistical Semantics

Given a collection of texts:

Goal: automatically learn semantic descriptors for documents and words that support document clustering, text understanding, information retrieval, ...

Page 8:

Introduction Example 3: Statistical Semantics

The Probabilistic Latent Semantic Analysis (PLSA) model:

[Figure: PLSA graphical model with nodes Document, Topic, Word]

Word occurrences in documents are observed. Topics are latent attributes of word occurrences in documents.

T. Hofmann: Unsupervised Learning by Probabilistic Latent Semantic Analysis. Machine Learning, 42, 177-196, 2001.

Page 9:

Introduction Example 4: Bioinformatics

Micro-array gene expression data:

Which genes are expressed under which conditions? Which are co-regulated, or functionally dependent?

Page 10:

Introduction Example 4: Bioinformatics

Bayesian network showing dependencies among gene expression levels:

[Figure: Bayesian network over the gene nodes DBG5, LKI3, FFR3, TSW2, IUJ8, ERK3, AA7, RDE6, NQO2, JSW5, PLR9, BDO4, MNW7, KID1]

N. Friedman, M. Linial, I. Nachman, and D. Pe’er: Using Bayesian networks to analyze expression data. Journal of Computational Biology, 7 (3-4), 601-620, 2000.

Page 11:

Introduction Statistical Machine Learning

Common ground in the 4 examples:

◮ Learn a probabilistic model from data (using statistical learning techniques)
◮ Apply probabilistic inference algorithms to use the models for prediction (classification, regression) and structure analysis (clustering, segmentation)

Advantages of probabilistic/statistical methods:

◮ Principled quantification of prediction uncertainties
◮ Robust and principled techniques for dealing with incomplete information and missing data

Page 12:

Introduction Probabilistic Graphical Models

Need: probabilistic models that
◮ can represent distributions over high-dimensional state spaces
◮ support efficient learning and inference techniques

Probabilistic Graphical Models
◮ support a structured specification of high-dimensional distributions in terms of low-dimensional factors
◮ the structured representation can be exploited for efficient learning and inference algorithms (sometimes ...)
◮ the graphical representation gives human-friendly design and description possibilities

Page 13:

Introduction This Course: Objective and Contents

Objective

◮ Provide a general introduction to the principles and techniques of probabilistic graphical models
◮ Enable understanding of scientific papers that use PGMs in a specific scientific context

Contents

1: Introduction and Probability Basics
2: Bayesian Networks – Syntax and Semantics
3: Bayesian Networks – Inference
4: Approximate Inference
5: Markov Networks
6: Parameter Learning
7: Structure Learning
8: Temporal Models
9: Latent Variable Models
10: Beyond Graphs

Page 14:

Introduction Literature

D. Koller and N. Friedman: Probabilistic Graphical Models. MIT Press, 2009

C. M. Bishop: Pattern Recognition and Machine Learning. Springer, 2006

K. Murphy: Machine Learning: A Probabilistic Perspective. MIT Press, 2012

Page 16:

Probability Basics

Page 18:

Probability Basics Probabilities as frequencies

The probability of tossing an even number with a die is 1/2:

Frequency of even numbers: 16/30 = 0.5333. In a sequence of 100 tosses, the frequency of ’even’ is expected to be even closer to 0.5.

The set of possible outcomes is called the sample space:

S = {1, 2, 3, 4, 5, 6}.

A subset of S is called an event:

Event         | Subset of S
even          | {2, 4, 6}
1             | {1}
multiple of 3 | {3, 6}
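The frequency interpretation above can be simulated directly; a minimal sketch (variable names are mine, not from the slides):

```python
import random

# Sample space and the event "even" for one die, as on the slide.
S = [1, 2, 3, 4, 5, 6]
even = {2, 4, 6}

random.seed(0)
tosses = [random.choice(S) for _ in range(100)]

# Relative frequency of 'even'; by the law of large numbers this is
# typically close to the probability 1/2 for long toss sequences.
freq_even = sum(t in even for t in tosses) / len(tosses)
assert 0.3 < freq_even < 0.7
```

With more tosses the frequency concentrates ever more tightly around 0.5.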

Page 19:

Probability Basics The laws of frequency probabilities

Notation

- S: the sample space
- A, B, ...: events
- Pf(A): the frequency of occurrences of A in a (large) set of observed outcomes (the probability of A)

Laws: (frequency) probabilities obey the following rules:
◮ 0 ≤ Pf(A) ≤ 1
◮ Pf(S) = 1
◮ If A ∩ B = ∅, then Pf(A) + Pf(B) = Pf(A ∪ B)

Page 20:

Probability Basics Probabilities as Beliefs

Measuring a subjective belief: let A be any proposition whose truth value is currently unknown, but will later become known (e.g.: “it will rain tomorrow”, “a Democratic president will be elected in 2016”, “Chievo Verona will score a goal vs. AC Fiorentina on 31/05/15”). Consider a betting ticket:

Ticket: GLOBAL GAMBLING INC. shall pay to the owner of this ticket $1 if A happens.

How much are you willing to pay for this ticket? (At least $0!)
For how much are you willing to sell this ticket? (Certainly for $1!)
What is the price at which you would just as well buy or sell? (In between $0 and $1!)

Page 22:

Probability Basics Ticket trading and Dutch books

Consider two agents: the elicitor (E) and the subject (S). Both E and S are in possession of tickets for various propositions A, B, .... Now:

◮ E asks S for a price for tickets for each of the propositions A, B, ....
◮ After S has set prices for all propositions, S must be ready to either buy from E or sell to E tickets at these prices.

The price set by S for proposition A, denoted Pb(A), is a measure of S’s belief in the happening of A (S’s subjective probability for A).

E can make a Dutch book against S if S has set prices Pb(A), Pb(B), ... for some propositions such that E can make a combination of buying/selling deals with S, so that E will gain from these deals (and S will lose) under all possible combinations of outcomes for the propositions involved.

Page 24:

Probability Basics Dutch book theorem

Proposition | S’s price Pb | E decides to
A           | 0.4          | buy
B           | 0.3          | buy
A ∨ B       | 0.8          | sell

Outcome of propositions | gain from buying/selling | payout (tickets bought − tickets sold) | total gain for E (= S’s loss)
A ∧ B                   | 0.1                      | 2 − 1                                  | 1.1
A ∧ ¬B                  | 0.1                      | 1 − 1                                  | 0.1
¬A ∧ B                  | 0.1                      | 1 − 1                                  | 0.1
¬A ∧ ¬B                 | 0.1                      | 0 − 0                                  | 0.1

Dutch Book Theorem (de Finetti)

E can make a Dutch book against S if and only if S’s prices do not obey

◮ 0 ≤ Pb(A) ≤ 1
◮ Pb(S) = 1 (S the sure proposition)
◮ If A ∧ B is an impossible proposition, then Pb(A) + Pb(B) = Pb(A ∨ B)

Literature: http://plato.stanford.edu/entries/dutch-book/
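The gains in the table can be checked mechanically; a small sketch of the slide’s Dutch book (the function name is mine):

```python
from itertools import product

# S's incoherent prices from the slide: Pb(A) + Pb(B) != Pb(A or B).
price = {"A": 0.4, "B": 0.3, "A_or_B": 0.8}

def elicitor_gain(a, b):
    """E buys tickets A and B, sells ticket A-or-B; return E's total gain."""
    # Trading gain: E receives 0.8 for the sold ticket, pays 0.4 + 0.3.
    trade = price["A_or_B"] - price["A"] - price["B"]
    # Payouts: tickets E bought pay E $1 each, the sold ticket pays S $1.
    payout = int(a) + int(b) - int(a or b)
    return trade + payout

gains = [elicitor_gain(a, b) for a, b in product([True, False], repeat=2)]
# E gains in every one of the four outcomes, so S loses no matter what.
assert all(g > 0 for g in gains)
```

Reproducing the table: the gains are 1.1 for A ∧ B and 0.1 in the other three cases.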

Page 25:

Probability Basics Beliefs and Frequencies

◮ Frequencies and rational beliefs follow the same laws
◮ We need only one probability calculus to deal with both
◮ Differences:
  ◮ We can have beliefs about everything!
  ◮ Frequency-probabilities need to be based on repeatable sampling/observation procedures.

From frequencies to beliefs:

Pf(max. temperature on Aug. 1 > 25°C) = 0.35 leads to Pb(it will be warmer than 25°C on 01/08/2010) = 0.35.

This kind of reasoning pattern is called direct inference. However, it is not always possible to base subjective probabilities on well-defined frequencies (what is the probability that global warming will raise the sea levels by more than 1m by the year 2100?).

Page 26:

Probability Basics Probabilities: the Mathematical Model

A probability distribution on a finite sample space S is a function P() that assigns to every event A ⊆ S a number P(A) ∈ [0, 1] (the probability of A), such that

◮ P(S) = 1
◮ If A ∩ B = ∅, then P(A) + P(B) = P(A ∪ B) (finite additivity)

[Figure: Venn diagram in S with disjoint events A, B and overlapping events C, D]

P(A ∪ B) = P(A) + P(B)
P(C ∪ D) = P(C) + P(D) − P(C ∩ D)

Infinite Sample Spaces

◮ Probabilities are only assigned to measurable subsets A ⊆ S
◮ Countable additivity: for Ai (i ∈ N) pairwise disjoint:

P(∪_{i∈N} Ai) = Σ_{i∈N} P(Ai)
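These axioms can be verified on the die example; a minimal sketch (the helper P is mine, defined as the uniform distribution):

```python
from fractions import Fraction

# Uniform probability distribution on the die sample space.
S = {1, 2, 3, 4, 5, 6}

def P(event):
    return Fraction(len(event & S), len(S))

A, B = {2, 4, 6}, {1}          # disjoint events
assert P(S) == 1                # normalization
assert A & B == set()
assert P(A) + P(B) == P(A | B)  # finite additivity

# Inclusion-exclusion for overlapping events, as in the Venn diagram:
C, D = {2, 4, 6}, {3, 6}
assert P(C | D) == P(C) + P(D) - P(C & D)
```

Using `Fraction` keeps every probability exact, so the axioms hold as equalities rather than up to rounding.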

Page 27:

Probability Basics Conditional probabilities

For two events A and B we define the conditional probability of A given B:

P(A | B) = P(A ∩ B) / P(B)

Examples:

P({4} | {2, 4, 6}) = P({4}) / P({2, 4, 6}) = (1/6) / (3/6) = 1/3

P(even | {4, 5, 6}) = P({4, 6}) / P({4, 5, 6}) = 2/3

P(zero | roulette wheel is fair) = P(zero ∩ roulette wheel is fair) / P(roulette wheel is fair) (= ?/?) = 1/37.

Page 28:

Probability Basics Fundamental rules

Notation: in probability expressions, A, B stands for the intersection A ∩ B, or the conjunction “A and B”.

The fundamental rule:

P(A, B) = P(A | B) P(B)

The fundamental rule, conditioned:

P(A, B | C) = P(A | B, C) P(B | C)

Bayes’ rule:

P(B | A) = P(A | B) P(B) / P(A)

Bayes’ rule, conditioned:

P(B | A, C) = P(A | B, C) P(B | C) / P(A | C)
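Both rules can be checked numerically on the die example; a small sketch (the helpers P and P_cond are mine):

```python
from fractions import Fraction

# Uniform distribution on the die sample space.
S = {1, 2, 3, 4, 5, 6}

def P(e):
    return Fraction(len(e & S), len(S))

def P_cond(a, b):
    """Conditional probability P(a | b) = P(a and b) / P(b)."""
    return P(a & b) / P(b)

A = {4, 5, 6}   # "high"
B = {2, 4, 6}   # "even"

# Fundamental rule: P(A, B) = P(A | B) P(B)
assert P(A & B) == P_cond(A, B) * P(B)
# Bayes' rule: P(B | A) = P(A | B) P(B) / P(A)
assert P_cond(B, A) == P_cond(A, B) * P(B) / P(A)
```

Exact fractions make the identities hold with equality: here P(B | A) = (2/6)/(3/6) = 2/3.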

Page 29:

Random Variables and Independence

Page 30:

RVs and Independence Random Variables

Sample spaces are usually defined by random variables, each given by a name X and a value space Val(X):

X                   | Val(X)
Temperature         | R
CO2 level 2100      | R
US President 2016   | {democratic, republican, other}
MiddleEastPeace2020 | {yes, no}
Population          | N

A set of random variables X1, ..., Xn defines the sample space

S = Val(X1) × ··· × Val(Xn)

A (marginal) event of the form Xi = xi is the subset

Val(X1) × ··· × {xi} × ··· × Val(Xn) ⊆ S

Page 31:

RVs and Independence Terminology and Notation

Terminology

◮ Discrete random variable: RV with a finite value space (in probability theory also: countably infinite value space)
◮ Continuous random variable: RV with value space R

Notation

◮ Upper case letters X, Y, Z for (generic) random variables, corresponding lower case letters x, y, z for their values
◮ Boldface X = (X1, ..., Xk), x = (x1, ..., xk) for tuples of random variables and values
◮ X = x stands for the component-wise assignment Xi = xi (i = 1, ..., k)

Terminology: the joint distribution of the RVs X is a distribution P on the sample space

S = Val(X) = Val(X1) × ··· × Val(Xk)

The marginal distribution of Xi is P restricted to events of the form Xi = xi.

Page 33:

RVs and Independence Tables and Marginals

Tabular specification of a joint distribution for discrete RVs (contingency table):

MEP2020 \ USPres16 | dem  | rep  | oth
yes                | 0.12 | 0.10 | 0.08
no                 | 0.23 | 0.30 | 0.17

Marginal distributions:

MEP2020 \ USPres16  | dem  | rep  | oth  | (MEP2020 marginal)
yes                 | 0.12 | 0.10 | 0.08 | 0.30
no                  | 0.23 | 0.30 | 0.17 | 0.70
(USPres16 marginal) | 0.35 | 0.40 | 0.25 |
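Marginalization is just summing out the other variables; a sketch using the slide’s table (the dict representation and function name are mine):

```python
# Joint P(MEP2020, USPres16) from the table above.
joint = {("yes", "dem"): 0.12, ("yes", "rep"): 0.10, ("yes", "oth"): 0.08,
         ("no", "dem"): 0.23, ("no", "rep"): 0.30, ("no", "oth"): 0.17}

def marginal(joint, axis):
    """Sum out all variables except the one at position `axis`."""
    m = {}
    for key, p in joint.items():
        m[key[axis]] = m.get(key[axis], 0.0) + p
    return m

mep = marginal(joint, 0)   # row sums: P(MEP2020)
pres = marginal(joint, 1)  # column sums: P(USPres16)
assert abs(mep["yes"] - 0.30) < 1e-9
assert abs(pres["dem"] - 0.35) < 1e-9
```

The same summing-out pattern generalizes directly to joints over any number of variables.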

Page 34:

RVs and Independence 3-way Tables

Tabular representation of the joint distribution P(A, B, C) of three variables A, Val(A) = {a1, a2}, B, Val(B) = {b1, b2, b3}, C, Val(C) = {c1, c2}:

        b1             b2             b3
A       c1     c2      c1     c2      c1     c2
a1      0.10   0.04    0.02   0.17    0.09   0.13   | P(A=a1) = 0.55
a2      0.07   0.11    0.04   0.09    0.12   0.02   | P(A=a2) = 0.45

P(B=b1) = 0.32, P(B=b2) = 0.32, P(B=b3) = 0.36; P(C=c1) = 0.44, P(C=c2) = 0.56

or as a flat table:

A, B, C    | P
a1, b1, c1 | 0.10
a1, b1, c2 | 0.04
...        | ...
a2, b3, c2 | 0.02

Page 36:

RVs and Independence Conditional Distributions

Conditional distribution of A given B = b1 and C = c2:

P(A | B = b1, C = c2):

P(A = a1 | B = b1, C = c2) = P(A = a1, B = b1, C = c2) / P(B = b1, C = c2) = 0.04 / 0.15 = 0.2666

P(A = a2 | B = b1, C = c2) = P(A = a2, B = b1, C = c2) / P(B = b1, C = c2) = 0.11 / 0.15 = 0.7333

Conditional distribution of A given B and C:

P(A | B, C):

      b1               b2               b3
A     c1      c2       c1      c2       c1      c2
a1    0.588   0.2666   0.333   0.654    0.428   0.866
a2    0.411   0.7333   0.666   0.346    0.571   0.133

Again a function on Val(A, B, C)!

Conditional distribution of A given B and C:

P(A | B,C):B

b1 b2 b3C C C

A c1 c2 c1 c2 c1 c2a1 0.588 0.2666 0.333 0.654 0.428 0.866a2 0.411 0.7333 0.666 0.346 0.571 0.133

Again a function on Val(A,B,C)!
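Conditioning is slicing the joint and renormalizing; a sketch over the 3-way table from the earlier slide (the dict encoding and function name are mine):

```python
# Joint P(A, B, C) from the 3-way table.
joint = {("a1", "b1", "c1"): 0.10, ("a1", "b1", "c2"): 0.04,
         ("a1", "b2", "c1"): 0.02, ("a1", "b2", "c2"): 0.17,
         ("a1", "b3", "c1"): 0.09, ("a1", "b3", "c2"): 0.13,
         ("a2", "b1", "c1"): 0.07, ("a2", "b1", "c2"): 0.11,
         ("a2", "b2", "c1"): 0.04, ("a2", "b2", "c2"): 0.09,
         ("a2", "b3", "c1"): 0.12, ("a2", "b3", "c2"): 0.02}

def conditional_A(joint, b, c):
    """P(A | B=b, C=c): slice the joint at (b, c) and renormalize over A."""
    slice_ = {a: p for (a, bb, cc), p in joint.items() if (bb, cc) == (b, c)}
    z = sum(slice_.values())            # = P(B=b, C=c)
    return {a: p / z for a, p in slice_.items()}

dist = conditional_A(joint, "b1", "c2")
assert abs(dist["a1"] - 0.04 / 0.15) < 1e-9   # the 0.2666 entry
assert abs(sum(dist.values()) - 1.0) < 1e-9   # a proper distribution over A
```

Calling `conditional_A` for all six (b, c) pairs reproduces the whole P(A | B, C) table.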

Page 37:

RVs and Independence Basic Rules for Variables

According to the fundamental rule, for all a ∈ Val(A) and all b ∈ Val(B):

P(A = a, B = b) = P(A = a | B = b) · P(B = b)

This can be written as an equation between the functions P(A, B), P(A | B), P(B) on Val(A) × Val(B):

P(A, B) = P(A | B) · P(B)

Similarly, the conditioned version of the fundamental rule:

P(A, B | C) = P(A | B, C) · P(B | C)

and Bayes’ rule (plain and conditioned):

P(B | A) = P(A | B) · P(B) / P(A)

P(B | A, C) = P(A | B, C) · P(B | C) / P(A | C)

Page 38:

Independence

Page 39:

Independence Example: Football statistics

Results for Bayern München and SC Freiburg in seasons 2001/02 and 2003/04 (not counting the matches München vs. Freiburg):

Val(München) = Val(Freiburg) = {Win, Draw, Loss}

2001/02
München:  LWDWWWWWWWWLDLDLDLWLDWWWDWDDWWWW
Freiburg: WLLDDWLDWDWLLLDDLWDDLLDLLLLLLWLW

2003/04
München:  WDWWLDWWDWLWWDDWDWLWWWDDWWWLWWLL
Freiburg: LDDWDWLWLLLWWLWLWLLDWLDDWDLLLWLD

Summary:

München \ Freiburg | W  | D  | L  | total
W                  | 12 | 9  | 15 | 36
D                  | 3  | 4  | 9  | 16
L                  | 6  | 4  | 2  | 12
total              | 21 | 17 | 26 |

Page 42:

Independence Independence of Outcomes

Counts normalized to probabilities:

P(München, Freiburg), with the product of marginals P(München) · P(Freiburg) shown in parentheses:

München \ Freiburg | W             | D             | L             | München marginal
W                  | .1875 (.1845) | .1406 (.1494) | .2344 (.2284) | .5625
D                  | .0468 (.0820) | .0625 (.0664) | .1406 (.1015) | .25
L                  | .0937 (.0615) | .0625 (.0498) | .0312 (.0761) | .1875
Freiburg marginal  | .3281         | .2656         | .4062         |

Explanation: the outcome of Freiburg’s game is independent of the outcome of München’s game; therefore the probabilities of combinations of outcomes are (approximately) the products of the probabilities of the individual outcomes.
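The comparison between the joint and the product of marginals can be computed from the match counts; a sketch (the dict encoding is mine):

```python
# Counts of (München, Freiburg) result pairs from the summary table.
counts = {("W", "W"): 12, ("W", "D"): 9, ("W", "L"): 15,
          ("D", "W"): 3,  ("D", "D"): 4, ("D", "L"): 9,
          ("L", "W"): 6,  ("L", "D"): 4, ("L", "L"): 2}
n = sum(counts.values())  # 64 matches in total
joint = {k: c / n for k, c in counts.items()}

# Marginals for each club.
pm = {m: sum(p for (mm, _), p in joint.items() if mm == m) for m in "WDL"}
pf = {f: sum(p for (_, ff), p in joint.items() if ff == f) for f in "WDL"}

# Largest deviation between the joint and the product of marginals:
dev = max(abs(joint[(m, f)] - pm[m] * pf[f]) for m in "WDL" for f in "WDL")
# Close to, but not exactly, zero: an empirical joint from a finite sample.
assert dev < 0.06
```

The small residual deviation is the sampling noise the later slide on equivalent formulations refers to.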

Page 43:

Independence Independent Variables

Let P be a joint distribution of variables A1, ..., An. The variables Ai and Ak are independent (according to P) if for all ai,j ∈ Val(Ai) and ak,h ∈ Val(Ak):

P(Ai = ai,j, Ak = ak,h) = P(Ai = ai,j) · P(Ak = ak,h).

Written as an equation for distributions:

P(Ai, Ak) = P(Ai) · P(Ak).

◮ Similarly for 3 or more variables
◮ A set of RVs is independent if every finite subset is independent

Page 44:

Independence Example

Pairwise independence does not imply independence:

◮ Random variables Xi (i = 1, ..., n; n ≥ 2) and B with Val(Xi) = Val(B) = {0, 1}
◮ The Xi represent a sequence of independent coin tosses:

P(X = x) = (1/2)^n for all x ∈ {0, 1}^n

◮ B is a parity bit:

P(B = Σ_{i=1}^n Xi mod 2) = 1

Then for all i and x ∈ {0, 1}:

P(B = 1 | Xi = x) = P(B = 1) = 0.5

But for all x ∈ {0, 1}^n:

0.5 = P(B = 1) ≠ P(B = 1 | X = x) ∈ {0, 1}
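The parity-bit claim can be verified by exhaustive enumeration; a sketch for n = 3 (the helper P is mine):

```python
from itertools import product

n = 3
# All equally likely outcomes (x, b): coin tosses x and the parity bit b.
outcomes = [(x, sum(x) % 2) for x in product((0, 1), repeat=n)]
p = 1 / 2 ** n  # probability of each toss sequence

def P(pred):
    return sum(p for x, b in outcomes if pred(x, b))

# Pairwise: B is independent of each single Xi ...
for i in range(n):
    for v in (0, 1):
        pB1 = P(lambda x, b: x[i] == v and b == 1) / P(lambda x, b: x[i] == v)
        assert abs(pB1 - 0.5) < 1e-9
# ... but B is a deterministic function of the whole tuple X:
for x0 in product((0, 1), repeat=n):
    pB1 = P(lambda x, b: x == x0 and b == 1) / P(lambda x, b: x == x0)
    assert pB1 in (0.0, 1.0)
```

So every pair {Xi, B} is independent, yet the full set {X1, ..., Xn, B} is not.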

Page 45:

Independence Equivalent Formulations of Independence

The following are equivalent for two variables A,B:

P(A, B) = P(A) P(B)
P(A | B) = P(A)
P(B | A) = P(B)

M =            W                     D                     L
P(M):          .5625                 .25                   .1875
P(M | F = W):  .1875/.3281 = .571    .0468/.3281 = .143    .0937/.3281 = .285
P(M | F = D):  .1406/.2656 = .529    .0625/.2656 = .235    .0625/.2656 = .235
P(M | F = L):  .2344/.4062 = .577    .1406/.4062 = .346    .0312/.4062 = .077

Knowing the outcome of Freiburg’s game does not change the probabilities for München’s game (in fact not completely true here, but discrepancies between e.g. P(M = L | F = L) and P(M = L) can be explained by the small number of games from which the frequencies were computed).

Page 46:

Independence Compact Specifications by Independence

Independence properties can greatly simplify the specification of a distribution:

M \ F      | W     | D     | L     | M marginal
W          |       |       |       | .5625
D          |       |       |       | .25
L          |       |       |       | .1875
F marginal | .3281 | .2656 | .4062 |

If M and F are independent, the interior cells need not be specified: each joint entry is the product of the corresponding marginals.

Page 47:

Conditional Independence

Page 48:

Conditional Independence Example

Joint distribution for variables

Sex: Val(Sex) = {male, female}
Hair length: Val(Hair length) = {long, short}
Stature: Val(Stature) = {≥ 1.68, ≤ 1.68}

                       Sex = male         Sex = female
Stature \ Hair length  long     short     long     short
≥ 1.68                 0.0441   0.3969    0.2142   0.0918
≤ 1.68                 0.0049   0.0441    0.1428   0.0612

P(Hair length, Stature), with marginals and the product P(Hair length) · P(Stature) shown in parentheses:

Stature \ Hair length  long              short             | Stature marginal
≥ 1.68                 0.2583 (0.3032)   0.4887 (0.4437)   | 0.747
≤ 1.68                 0.1477 (0.1027)   0.1053 (0.1502)   | 0.253
Hair length marginal   0.406             0.594             |

Hair length and Stature are not independent.

Page 49:

Conditional Independence Example Continued

P(Hair length, Stature | Sex = female), with the product P(Hair length | Sex = female) · P(Stature | Sex = female) shown in parentheses:

Stature \ Hair length    long          short         | P(Stature | female)
≥ 1.68                   0.42 (0.42)   0.18 (0.18)   | 0.6
≤ 1.68                   0.28 (0.28)   0.12 (0.12)   | 0.4
P(Hair length | female)  0.7           0.3           |

Hair length and Stature are independent given Sex = female.
Also: Hair length and Stature are independent given Sex = male.
Hence: Hair length and Stature are independent given Sex.
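Conditional independence can be checked directly from the joint table; a sketch with the stature values relabeled "tall"/"small" for readability (the labels and helper P are mine):

```python
# Joint P(Sex, Hair length, Stature) from the earlier table;
# "tall" stands for >= 1.68, "small" for the other stature value.
joint = {("male", "long", "tall"): 0.0441, ("male", "short", "tall"): 0.3969,
         ("male", "long", "small"): 0.0049, ("male", "short", "small"): 0.0441,
         ("female", "long", "tall"): 0.2142, ("female", "short", "tall"): 0.0918,
         ("female", "long", "small"): 0.1428, ("female", "short", "small"): 0.0612}

def P(pred):
    return sum(p for k, p in joint.items() if pred(*k))

# Marginally, Hair length and Stature are dependent:
p_long = P(lambda s, h, t: h == "long")
p_tall = P(lambda s, h, t: t == "tall")
assert abs(P(lambda s, h, t: h == "long" and t == "tall") - p_long * p_tall) > 0.01

# Given Sex, they are independent: P(H, T | sex) = P(H | sex) P(T | sex).
for sex in ("male", "female"):
    psex = P(lambda s, h, t: s == sex)
    for h0 in ("long", "short"):
        for t0 in ("tall", "small"):
            lhs = P(lambda s, h, t: s == sex and h == h0 and t == t0) / psex
            rhs = (P(lambda s, h, t: s == sex and h == h0) / psex) * \
                  (P(lambda s, h, t: s == sex and t == t0) / psex)
            assert abs(lhs - rhs) < 1e-9
```

The first assertion reproduces the "not independent" finding of the previous slide; the loop verifies independence within each value of Sex.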

Page 50:

Conditional Independence Conditionally Independent Variables

Let P be a joint distribution of variables A = A1, ..., An, B = B1, ..., Bm, C = C1, ..., Ck. The variables A are conditionally independent of the variables B given C, if

P(A, B | C) = P(A | C) · P(B | C)

Equivalently:

P(A | B, C) = P(A | C)

Page 51:

Conditional Independence The Chain Rule

For any joint distribution P of variables A1, . . . ,An:

P(A1, ..., An) = P(An | A1, ..., An−1) · P(An−1 | A1, ..., An−2) ··· P(A2 | A1) · P(A1).

Proof by repeated application of the fundamental rule.

For i = 1, ..., n let Pa(Ai) be a subset of {A1, ..., Ai−1} such that

P(Ai | A1, ..., Ai−1) = P(Ai | Pa(Ai)).

Then:

P(A1, ..., An) = ∏_{i=1}^n P(Ai | Pa(Ai)).

This is also called the chain rule for Bayesian networks.
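The factorized form can be turned into code; a sketch for the Sex/Hair length/Stature example with Pa(Hair length) = Pa(Stature) = {Sex}, using conditional probabilities I derived from the earlier tables (the "tall"/"small" labels are mine):

```python
from itertools import product

# Factors of the chain rule with Pa(Hair) = Pa(Stature) = {Sex}:
# P(Sex, Hair, Stature) = P(Sex) P(Hair | Sex) P(Stature | Sex).
p_sex = {"male": 0.49, "female": 0.51}
p_hair = {("long", "male"): 0.1, ("short", "male"): 0.9,
          ("long", "female"): 0.7, ("short", "female"): 0.3}
p_stat = {("tall", "male"): 0.9, ("small", "male"): 0.1,
          ("tall", "female"): 0.6, ("small", "female"): 0.4}

def joint(s, h, t):
    """Joint probability as the product of the three factors."""
    return p_sex[s] * p_hair[(h, s)] * p_stat[(t, s)]

# The factors define a proper joint distribution ...
total = sum(joint(s, h, t) for s, h, t in
            product(p_sex, ("long", "short"), ("tall", "small")))
assert abs(total - 1.0) < 1e-9
# ... that reproduces the original table, e.g. P(female, long, tall) = 0.2142.
assert abs(joint("female", "long", "tall") - 0.2142) < 1e-9
```

Note the economy: 2 + 4 + 4 conditional parameters instead of the 8 entries of the full joint table, exactly the saving the Bayesian-network factorization provides.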

Page 52:

Conditional Independence Example

Chain rule applied to Sex,Hair length, Stature:

P(Sex, Hair length, Stature) = P(Stature | Hair length, Sex) · P(Hair length | Sex) · P(Sex).

Using conditional independence:

P(Sex, Hair length, Stature) = P(Stature | Sex) · P(Hair length | Sex) · P(Sex).

Graphical representation:

[Figure: Bayesian network with arrows Sex → Hair length and Sex → Stature]
