1 bn semantics 1 graphical models – 10708 carlos guestrin carnegie mellon university september 15...

37
1 BN Semantics 1 Graphical Models – 10708 Carlos Guestrin Carnegie Mellon University September 15 th , 2008 Readings: K&F: 3.1, 3.2, 3.3 10-708 – Carlos Guestrin 2006-2008

Upload: grant-reynold-hicks

Post on 13-Dec-2015

216 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 1 BN Semantics 1 Graphical Models – 10708 Carlos Guestrin Carnegie Mellon University September 15 th, 2008 Readings: K&F: 3.1, 3.2, 3.3 10-708 –  Carlos

1

BN Semantics 1

Graphical Models – 10708

Carlos Guestrin

Carnegie Mellon University

September 15th, 2008

Readings:K&F: 3.1, 3.2, 3.3

10-708 – Carlos Guestrin 2006-2008

Page 2: 1 BN Semantics 1 Graphical Models – 10708 Carlos Guestrin Carnegie Mellon University September 15 th, 2008 Readings: K&F: 3.1, 3.2, 3.3 10-708 –  Carlos

10-708 – Carlos Guestrin 2008 2

Let’s start on BNs…

Consider P(Xi) Assign probability to each xi 2 Val(Xi)

Independent parameters

Consider P(X1,…,Xn) How many independent parameters if |Val(Xi)|=k?

Page 3: 1 BN Semantics 1 Graphical Models – 10708 Carlos Guestrin Carnegie Mellon University September 15 th, 2008 Readings: K&F: 3.1, 3.2, 3.3 10-708 –  Carlos

10-708 – Carlos Guestrin 2008 3

What if variables are independent?

What if variables are independent? (Xi Xj), 8 i,j

Not enough!!! (See homework 1 ) Must assume that (X Y), 8 X,Y subsets of {X1,…,Xn}

Can write P(X1,…,Xn) = i=1…n P(Xi)

How many independent parameters now?

Page 4: 1 BN Semantics 1 Graphical Models – 10708 Carlos Guestrin Carnegie Mellon University September 15 th, 2008 Readings: K&F: 3.1, 3.2, 3.3 10-708 –  Carlos

10-708 – Carlos Guestrin 2008 4

Conditional parameterization – two nodes

Grade is determined by Intelligence

Page 5: 1 BN Semantics 1 Graphical Models – 10708 Carlos Guestrin Carnegie Mellon University September 15 th, 2008 Readings: K&F: 3.1, 3.2, 3.3 10-708 –  Carlos

10-708 – Carlos Guestrin 2008 5

Conditional parameterization – three nodes

Grade and SAT score are determined by Intelligence

(G S | I)

Page 6: 1 BN Semantics 1 Graphical Models – 10708 Carlos Guestrin Carnegie Mellon University September 15 th, 2008 Readings: K&F: 3.1, 3.2, 3.3 10-708 –  Carlos

10-708 – Carlos Guestrin 2006-2008 6

The naïve Bayes model – Your first real Bayes Net

Class variable: C Evidence variables: X1,…,Xn

assume that (X Y | C), 8 X,Y subsets of {X1,…,Xn}

Page 7: 1 BN Semantics 1 Graphical Models – 10708 Carlos Guestrin Carnegie Mellon University September 15 th, 2008 Readings: K&F: 3.1, 3.2, 3.3 10-708 –  Carlos

10-708 – Carlos Guestrin 2006-2008 7

What you need to know (From last class)

Basic definitions of probabilities

Independence

Conditional independence

The chain rule

Bayes rule

Naïve Bayes

Page 8: 1 BN Semantics 1 Graphical Models – 10708 Carlos Guestrin Carnegie Mellon University September 15 th, 2008 Readings: K&F: 3.1, 3.2, 3.3 10-708 –  Carlos

10-708 – Carlos Guestrin 2006-2008 8

This class

We’ve heard of Bayes nets, we’ve played with Bayes nets, we’ve even used them in your research

This class, we’ll learn the semantics of BNs, relate them to independence assumptions encoded by the graph

Page 9: 1 BN Semantics 1 Graphical Models – 10708 Carlos Guestrin Carnegie Mellon University September 15 th, 2008 Readings: K&F: 3.1, 3.2, 3.3 10-708 –  Carlos

10-708 – Carlos Guestrin 2006-2008 9

Causal structure

Suppose we know the following: The flu causes sinus inflammation Allergies cause sinus inflammation Sinus inflammation causes a runny nose Sinus inflammation causes headaches

How are these connected?

Page 10: 1 BN Semantics 1 Graphical Models – 10708 Carlos Guestrin Carnegie Mellon University September 15 th, 2008 Readings: K&F: 3.1, 3.2, 3.3 10-708 –  Carlos

10-708 – Carlos Guestrin 2006-2008 10

Possible queries

Flu Allergy

Sinus

Headache Nose

Inference

Most probable explanation

Active data collection

Page 11: 1 BN Semantics 1 Graphical Models – 10708 Carlos Guestrin Carnegie Mellon University September 15 th, 2008 Readings: K&F: 3.1, 3.2, 3.3 10-708 –  Carlos

10-708 – Carlos Guestrin 2006-2008 11

Car starts BN

18 binary attributes

Inference P(BatteryAge|Starts=f)

218 terms, why so fast? Not impressed?

HailFinder BN – more than 354 = 58149737003040059690390169 terms

Page 12: 1 BN Semantics 1 Graphical Models – 10708 Carlos Guestrin Carnegie Mellon University September 15 th, 2008 Readings: K&F: 3.1, 3.2, 3.3 10-708 –  Carlos

10-708 – Carlos Guestrin 2006-2008 12

Factored joint distribution - Preview

Flu Allergy

Sinus

Headache Nose

Page 13: 1 BN Semantics 1 Graphical Models – 10708 Carlos Guestrin Carnegie Mellon University September 15 th, 2008 Readings: K&F: 3.1, 3.2, 3.3 10-708 –  Carlos

10-708 – Carlos Guestrin 2006-2008 13

Number of parameters

Flu Allergy

Sinus

Headache Nose

Page 14: 1 BN Semantics 1 Graphical Models – 10708 Carlos Guestrin Carnegie Mellon University September 15 th, 2008 Readings: K&F: 3.1, 3.2, 3.3 10-708 –  Carlos

10-708 – Carlos Guestrin 2006-2008 14

Key: Independence assumptions

Flu Allergy

Sinus

Headache Nose

Knowing sinus separates the symptom variables from each other

Page 15: 1 BN Semantics 1 Graphical Models – 10708 Carlos Guestrin Carnegie Mellon University September 15 th, 2008 Readings: K&F: 3.1, 3.2, 3.3 10-708 –  Carlos

10-708 – Carlos Guestrin 2006-2008 15

(Marginal) Independence

Flu and Allergy are (marginally) independent

More Generally:

Flu = t Flu = f

Allergy = t

Allergy = f

Allergy = t

Allergy = f

Flu = t

Flu = f

Page 16: 1 BN Semantics 1 Graphical Models – 10708 Carlos Guestrin Carnegie Mellon University September 15 th, 2008 Readings: K&F: 3.1, 3.2, 3.3 10-708 –  Carlos

10-708 – Carlos Guestrin 2006-2008 16

Conditional independence

Flu and Headache are not (marginally) independent

Flu and Headache are independent given Sinus infection

More Generally:

Page 17: 1 BN Semantics 1 Graphical Models – 10708 Carlos Guestrin Carnegie Mellon University September 15 th, 2008 Readings: K&F: 3.1, 3.2, 3.3 10-708 –  Carlos

10-708 – Carlos Guestrin 2006-2008 17

The independence assumption

Flu Allergy

Sinus

Headache Nose

Local Markov Assumption:A variable X is independentof its non-descendants given its parents and only its parents (Xi NonDescendantsXi | PaXi)

Page 18: 1 BN Semantics 1 Graphical Models – 10708 Carlos Guestrin Carnegie Mellon University September 15 th, 2008 Readings: K&F: 3.1, 3.2, 3.3 10-708 –  Carlos

10-708 – Carlos Guestrin 2006-2008 18

Explaining away

Flu Allergy

Sinus

Headache Nose

Local Markov Assumption:A variable X is independentof its non-descendants given its parents and only its parents(Xi NonDescendantsXi | PaXi)

Page 19: 1 BN Semantics 1 Graphical Models – 10708 Carlos Guestrin Carnegie Mellon University September 15 th, 2008 Readings: K&F: 3.1, 3.2, 3.3 10-708 –  Carlos

10-708 – Carlos Guestrin 2006-2008 19

What about probabilities?Conditional probability tables (CPTs)

Flu Allergy

Sinus

Headache Nose

Page 20: 1 BN Semantics 1 Graphical Models – 10708 Carlos Guestrin Carnegie Mellon University September 15 th, 2008 Readings: K&F: 3.1, 3.2, 3.3 10-708 –  Carlos

10-708 – Carlos Guestrin 2006-2008 20

Joint distribution

Flu Allergy

Sinus

Headache Nose

Why can we decompose? Local Markov Assumption!

Page 21: 1 BN Semantics 1 Graphical Models – 10708 Carlos Guestrin Carnegie Mellon University September 15 th, 2008 Readings: K&F: 3.1, 3.2, 3.3 10-708 –  Carlos

10-708 – Carlos Guestrin 2006-2008 21

A general Bayes net

Set of random variables

Directed acyclic graph

CPTs

Joint distribution:

Local Markov Assumption: A variable X is independent of its non-descendants given its

parents and only its parents – (Xi NonDescendantsXi | PaXi)

Page 22: 1 BN Semantics 1 Graphical Models – 10708 Carlos Guestrin Carnegie Mellon University September 15 th, 2008 Readings: K&F: 3.1, 3.2, 3.3 10-708 –  Carlos

10-708 – Carlos Guestrin 2006-2008 22

Announcements

Homework 1: Out wednesday Due in 2 weeks – beginning of class! It’s hard – start early, ask questions

Collaboration policy OK to discuss in groups Tell us on your paper who you talked with Each person must write their own unique paper No searching the web, papers, etc. for answers, we trust you

want to learn

Audit policy No sitting in, official auditors only, see couse website

Don’t forget to register to the mailing list at: https://mailman.srv.cs.cmu.edu/mailman/listinfo/10708-announce

Page 23: 1 BN Semantics 1 Graphical Models – 10708 Carlos Guestrin Carnegie Mellon University September 15 th, 2008 Readings: K&F: 3.1, 3.2, 3.3 10-708 –  Carlos

10-708 – Carlos Guestrin 2006-2008 23

Questions????

What distributions can be represented by a BN?

What BNs can represent a distribution?

What are the independence assumptions encoded in a BN? in addition to the local Markov assumption

Page 24: 1 BN Semantics 1 Graphical Models – 10708 Carlos Guestrin Carnegie Mellon University September 15 th, 2008 Readings: K&F: 3.1, 3.2, 3.3 10-708 –  Carlos

10-708 – Carlos Guestrin 2006-2008 24

Today: The Representation Theorem – Joint Distribution to BN

Joint probabilitydistribution:Obtain

BN: Encodes independenceassumptions

If conditionalindependencies

in BN are subset of conditional

independencies in P

Page 25: 1 BN Semantics 1 Graphical Models – 10708 Carlos Guestrin Carnegie Mellon University September 15 th, 2008 Readings: K&F: 3.1, 3.2, 3.3 10-708 –  Carlos

10-708 – Carlos Guestrin 2006-2008 25

Today: The Representation Theorem – BN to Joint Distribution

If joint probabilitydistribution:

BN: Encodes independenceassumptions

Obtain

Then conditionalindependencies

in BN are subset of conditional

independencies in P

Page 26: 1 BN Semantics 1 Graphical Models – 10708 Carlos Guestrin Carnegie Mellon University September 15 th, 2008 Readings: K&F: 3.1, 3.2, 3.3 10-708 –  Carlos

10-708 – Carlos Guestrin 2006-2008 26

Let’s start proving it for naïve Bayes –

From joint distribution to BN Independence assumptions:

Xi independent given C

Let’s assume that P satisfies independencies must prove that P factorizes according to BN: P(C,X1,…,Xn) = P(C) i P(Xi|C)

Use chain rule!

Page 27: 1 BN Semantics 1 Graphical Models – 10708 Carlos Guestrin Carnegie Mellon University September 15 th, 2008 Readings: K&F: 3.1, 3.2, 3.3 10-708 –  Carlos

10-708 – Carlos Guestrin 2006-2008 27

Let’s start proving it for naïve Bayes –

From BN to joint distribution Let’s assume that P factorizes according to the BN:

P(C,X1,…,Xn) = P(C) i P(Xi|C)

Prove the independence assumptions: Xi independent given C

Actually, (X Y | C), 8 X,Y subsets of {X1,…,Xn}

Page 28: 1 BN Semantics 1 Graphical Models – 10708 Carlos Guestrin Carnegie Mellon University September 15 th, 2008 Readings: K&F: 3.1, 3.2, 3.3 10-708 –  Carlos

10-708 – Carlos Guestrin 2006-2008 28

Today: The Representation Theorem

BN: Encodes independenceassumptions

Joint probabilitydistribution:Obtain

If conditionalindependencies

in BN are subset of conditional

independencies in P

If joint probabilitydistribution: Obtain

Then conditionalindependencies

in BN are subset of conditional

independencies in P

Page 29: 1 BN Semantics 1 Graphical Models – 10708 Carlos Guestrin Carnegie Mellon University September 15 th, 2008 Readings: K&F: 3.1, 3.2, 3.3 10-708 –  Carlos

10-708 – Carlos Guestrin 2006-2008 29

Local Markov assumption & I-maps

Flu Allergy

Sinus

Headache Nose

Local Markov Assumption:A variable X is independentof its non-descendants given its parents and only its parents (Xi NonDescendantsXi | PaXi)

Local independence assumptions in BN structure G:

Independence assertions of P:

BN structure G is an I-map (independence map) if:

Page 30: 1 BN Semantics 1 Graphical Models – 10708 Carlos Guestrin Carnegie Mellon University September 15 th, 2008 Readings: K&F: 3.1, 3.2, 3.3 10-708 –  Carlos

10-708 – Carlos Guestrin 2006-2008 30

Factorized distributions

Given Random vars X1,…,Xn

P distribution over vars BN structure G over same vars

P factorizes according to G if

Flu Allergy

Sinus

Headache Nose

Page 31: 1 BN Semantics 1 Graphical Models – 10708 Carlos Guestrin Carnegie Mellon University September 15 th, 2008 Readings: K&F: 3.1, 3.2, 3.3 10-708 –  Carlos

10-708 – Carlos Guestrin 2006-2008 31

BN Representation Theorem – I-map to factorization

Joint probabilitydistribution:Obtain

If conditionalindependencies

in BN are subset of conditional

independencies in P

G is an I-map of P P factorizes

according to G

Page 32: 1 BN Semantics 1 Graphical Models – 10708 Carlos Guestrin Carnegie Mellon University September 15 th, 2008 Readings: K&F: 3.1, 3.2, 3.3 10-708 –  Carlos

10-708 – Carlos Guestrin 2006-2008 32

BN Representation Theorem – I-map to factorization: Proof

Local Markov Assumption:A variable X is independentof its non-descendants given its parents and

only its parents (Xi NonDescendantsXi | PaXi)

ALL YOU NEED:

Flu Allergy

Sinus

Headache Nose

ObtainG is an I-map of P

P factorizes according to G

Page 33: 1 BN Semantics 1 Graphical Models – 10708 Carlos Guestrin Carnegie Mellon University September 15 th, 2008 Readings: K&F: 3.1, 3.2, 3.3 10-708 –  Carlos

10-708 – Carlos Guestrin 2006-2008 33

Defining a BN

Given a set of variables and conditional independence assertions of P

Choose an ordering on variables, e.g., X1, …, Xn For i = 1 to n

Add Xi to the network

Define parents of Xi, PaXi, in graph as the minimal

subset of {X1,…,Xi-1} such that local Markov assumption holds – Xi independent of rest of {X1,…,Xi-1}, given parents PaXi

Define/learn CPT – P(Xi| PaXi)

Page 34: 1 BN Semantics 1 Graphical Models – 10708 Carlos Guestrin Carnegie Mellon University September 15 th, 2008 Readings: K&F: 3.1, 3.2, 3.3 10-708 –  Carlos

10-708 – Carlos Guestrin 2006-2008 34

BN Representation Theorem – Factorization to I-map

G is an I-map of P P factorizes

according to G

If joint probabilitydistribution: Obtain

Then conditionalindependencies

in BN are subset of conditional

independencies in P

Page 35: 1 BN Semantics 1 Graphical Models – 10708 Carlos Guestrin Carnegie Mellon University September 15 th, 2008 Readings: K&F: 3.1, 3.2, 3.3 10-708 –  Carlos

10-708 – Carlos Guestrin 2006-2008 35

BN Representation Theorem – Factorization to I-map: Proof

G is an I-map of P P factorizes

according to G

If joint probabilitydistribution: Obtain

Then conditionalindependencies

in BN are subset of conditional

independencies in P

Homework 1!!!!

Page 36: 1 BN Semantics 1 Graphical Models – 10708 Carlos Guestrin Carnegie Mellon University September 15 th, 2008 Readings: K&F: 3.1, 3.2, 3.3 10-708 –  Carlos

10-708 – Carlos Guestrin 2006-2008 36

The BN Representation Theorem

If joint probability

distribution:Obtain

Then conditionalindependencies

in BN are subset of conditional

independencies in P

Joint probabilitydistribution:Obtain

If conditionalindependencies

in BN are subset of conditional

independencies in P

Important because: Every P has at least one BN structure G

Important because: Read independencies of P from BN structure G

Page 37: 1 BN Semantics 1 Graphical Models – 10708 Carlos Guestrin Carnegie Mellon University September 15 th, 2008 Readings: K&F: 3.1, 3.2, 3.3 10-708 –  Carlos

10-708 – Carlos Guestrin 2006-2008 37

Acknowledgements

JavaBayes applet http://www.pmr.poli.usp.br/ltd/Software/javabayes/

Home/index.html