

Bayesian Networks – (Structure) Learning

Machine Learning – CSE446
Carlos Guestrin
University of Washington
May 31, 2013

©Carlos Guestrin 2005-2013

Review

- Bayesian Networks
  - Compact representation for probability distributions
  - Exponential reduction in the number of parameters
- Fast probabilistic inference
  - As shown in demo examples
  - Compute P(X | e)
- Today
  - Learn BN structure

[Figure: example BN with Flu and Allergy as parents of Sinus, and Sinus as parent of Headache and Nose]


Learning Bayes nets

- Data: samples x(1), …, x(m)
- From the data, learn both:
  - the structure (the graph)
  - the parameters (the CPTs, P(Xi | PaXi))

Learning the CPTs

- Data: samples x(1), …, x(m)
- For each discrete variable Xi, estimate P(Xi | PaXi) by counting:
  P̂(Xi = x | PaXi = u) = Count(Xi = x, PaXi = u) / Count(PaXi = u)
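The maximum-likelihood (counting) CPT estimate can be sketched in a few lines of Python. The data layout, a list of dicts mapping variable names to values, is an illustrative assumption, not the course's code:

```python
from collections import Counter

def learn_cpt(data, child, parents):
    """Maximum-likelihood CPT P(child | parents) estimated by counting.
    data: list of samples, each a dict {variable_name: value}.
    Returns {(parent_values_tuple, child_value): probability}."""
    joint, marg = Counter(), Counter()
    for sample in data:
        u = tuple(sample[p] for p in parents)
        joint[(u, sample[child])] += 1
        marg[u] += 1
    return {(u, x): n / marg[u] for (u, x), n in joint.items()}

# Toy samples over the lecture's variables (values invented for illustration).
data = [
    {"Flu": 1, "Allergy": 0, "Sinus": 1},
    {"Flu": 1, "Allergy": 0, "Sinus": 1},
    {"Flu": 1, "Allergy": 0, "Sinus": 0},
    {"Flu": 0, "Allergy": 0, "Sinus": 0},
]
cpt = learn_cpt(data, "Sinus", ["Flu", "Allergy"])
print(cpt[((1, 0), 1)])  # 2 of the 3 samples with Flu=1, Allergy=0 have Sinus=1
```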


Information-theoretic interpretation of maximum likelihood 1

- Given structure, the log likelihood of the data: log P(D | θ̂, G)

Information-theoretic interpretation of maximum likelihood 2

- Given structure, the log likelihood of the data (derivation continued)


Information-theoretic interpretation of maximum likelihood 3

- Given structure, the log likelihood of the data (derivation concluded)
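The board work on these three slides is not captured in the transcript; the standard derivation they sketch, written out, is:

```latex
\begin{align*}
\log \hat{P}(\mathcal{D} \mid \hat{\theta}, G)
  &= m \sum_i \sum_{x_i,\, \mathbf{u}_i} \hat{P}(x_i, \mathbf{u}_i)\,
     \log \hat{P}(x_i \mid \mathbf{u}_i)
     \qquad \text{($\mathbf{u}_i$ ranges over values of } \mathrm{Pa}_{X_i}\text{)} \\
  &= m \sum_i \hat{I}(X_i; \mathrm{Pa}_{X_i}) \;-\; m \sum_i \hat{H}(X_i)
\end{align*}
```

Since the entropies Ĥ(Xi) do not depend on the graph, maximizing the likelihood over structures is equivalent to maximizing the sum of empirical mutual informations Î(Xi; PaXi).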

Decomposable score

- Log data likelihood
- Decomposable score:
  - Decomposes over families in the BN (a node and its parents)
  - Will lead to significant computational efficiency!
  - Score(G : D) = Σi FamScore(Xi | PaXi : D)
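The decomposable score can be computed directly from counts; the data layout (dicts of variable values) is an illustrative assumption. FamScore here is the maximum-likelihood family score, i.e. the family's contribution to the log likelihood:

```python
import math
from collections import Counter

def fam_score(data, child, parents):
    """Maximum-likelihood family score FamScore(child | parents : D):
    the family's contribution to the log likelihood of the data."""
    joint, marg = Counter(), Counter()
    for s in data:
        u = tuple(s[p] for p in parents)
        joint[(u, s[child])] += 1
        marg[u] += 1
    # sum over samples of log P_hat(child | parents), grouped by counts
    return sum(n * math.log(n / marg[u]) for (u, _), n in joint.items())

def score(data, graph):
    """Decomposable score: Score(G : D) = sum_i FamScore(Xi | PaXi : D).
    graph maps each variable to its list of parents."""
    return sum(fam_score(data, x, pa) for x, pa in graph.items())

# Toy data where Sinus mostly tracks Flu.
data = [{"Flu": f, "Sinus": f} for f in (0, 0, 1, 1)] + [{"Flu": 1, "Sinus": 0}]
s_empty = score(data, {"Flu": [], "Sinus": []})
s_edge = score(data, {"Flu": [], "Sinus": ["Flu"]})
print(s_edge > s_empty)  # True: adding a parent never hurts the ML score
```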


How many trees are there?

- n^(n−2) labeled trees over n variables (Cayley's formula): super-exponentially many
- Nonetheless, an efficient algorithm finds the optimal tree

Scoring a tree 1: equivalent trees

- Because mutual information is symmetric, trees with the same undirected skeleton have the same score: edge directions do not matter


Scoring a tree 2: similar trees

- The score decomposes over edges, so trees that differ in a single edge differ in score only by the mutual-information terms of the swapped edges

Chow-Liu tree learning algorithm 1

- For each pair of variables Xi, Xj:
  - Compute the empirical distribution: P̂(xi, xj) = Count(xi, xj) / m
  - Compute the mutual information:
    Î(Xi; Xj) = Σxi,xj P̂(xi, xj) log [ P̂(xi, xj) / (P̂(xi) P̂(xj)) ]
- Define a graph:
  - Nodes X1, …, Xn
  - Edge (i, j) gets weight Î(Xi; Xj)


Chow-Liu tree learning algorithm 2

- Optimal tree BN:
  - Compute the maximum-weight spanning tree
  - Directions in the BN: pick any node as the root; breadth-first search defines the edge directions
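Both steps, plus the pairwise mutual-information computation from the previous slide, fit in a short sketch. The data layout (rows of integer tuples) and the use of Prim's algorithm for the maximum-weight spanning tree are implementation choices, not prescribed by the slides:

```python
import math
import random
from collections import Counter, deque

def mutual_information(data, i, j):
    """Empirical mutual information I(Xi; Xj) from samples (rows of tuples)."""
    m = len(data)
    pij = Counter((row[i], row[j]) for row in data)
    pi = Counter(row[i] for row in data)
    pj = Counter(row[j] for row in data)
    # sum over observed (a, b): P(a,b) * log[ P(a,b) / (P(a) P(b)) ]
    return sum((c / m) * math.log(c * m / (pi[a] * pj[b]))
               for (a, b), c in pij.items())

def chow_liu(data, n):
    """Tree BN over n variables: maximum-weight spanning tree on
    mutual-information edge weights, then edges oriented away from root 0."""
    w = {(i, j): mutual_information(data, i, j)
         for i in range(n) for j in range(i + 1, n)}
    # Prim's algorithm for the maximum-weight spanning tree.
    in_tree, edges = {0}, []
    while len(in_tree) < n:
        best = max((e for e in w if (e[0] in in_tree) != (e[1] in in_tree)),
                   key=lambda e: w[e])
        edges.append(best)
        in_tree.update(best)
    # Orient: pick node 0 as root; BFS defines parent -> child directions.
    adj = {i: [] for i in range(n)}
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    parent, q = {0: None}, deque([0])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                q.append(v)
    return parent  # maps each node to its single parent (None for the root)

# Toy chain X0 -> X1 -> X2: each variable copies its parent 90% of the time.
random.seed(0)
rows = []
for _ in range(500):
    x0 = random.randint(0, 1)
    x1 = x0 if random.random() < 0.9 else 1 - x0
    x2 = x1 if random.random() < 0.9 else 1 - x1
    rows.append((x0, x1, x2))
tree = chow_liu(rows, 3)
print(tree)  # recovers the chain: {0: None, 1: 0, 2: 1}
```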

Structure learning for general graphs

- In a tree, a node has at most one parent
- Theorem: the problem of learning a BN structure with at most d parents is NP-hard for any (fixed) d > 1
- Most structure learning approaches use heuristics
  - (Quickly) describe the two simplest heuristics


Learn BN structure using local search

- Start from the Chow-Liu tree
- Local search, possible moves:
  - Add an edge
  - Delete an edge
  - Reverse an edge
- Score candidates using BIC

Learn Graphical Model Structure using LASSO

- Graph structure is about selecting parents for each node
- If no independence assumptions are made, each CPT depends on all other variables
- With independence assumptions, each CPT depends only on a few key variables
- One approach to structure learning: sparse logistic regression!
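A minimal sketch of the idea for one node: regress Sinus on the other variables with an L1 (LASSO) penalty and read the selected parents off the nonzero weights. The toy data-generating model and the proximal-gradient (ISTA) solver are illustrative assumptions, not the lecture's code:

```python
import numpy as np

def l1_logistic(X, y, lam, lr=0.5, iters=5000):
    """L1-penalized logistic regression via proximal gradient (ISTA).
    Returns (weights, intercept); nonzero weights mark selected parents.
    The intercept is left unpenalized."""
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted probabilities
        resid = p - y
        w -= lr * (X.T @ resid) / m              # gradient step
        b -= lr * resid.mean()
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # soft-threshold
    return w, b

# Toy data (an assumed model, not from the lecture): Sinus depends on Flu
# and Allergy but not on Headache.
rng = np.random.default_rng(0)
m = 2000
flu = rng.integers(0, 2, m)
allergy = rng.integers(0, 2, m)
headache = rng.integers(0, 2, m)  # independent of Sinus
p_sinus = 0.05 + 0.6 * np.maximum(flu, allergy)
sinus = (rng.random(m) < p_sinus).astype(float)

X = np.column_stack([flu, allergy, headache]).astype(float)
w, b = l1_logistic(X, sinus, lam=0.02)
print(np.round(w, 2))  # Flu and Allergy weights clearly nonzero; Headache ~ 0
```

The L1 penalty drives the weight of the irrelevant variable (Headache) to zero, so the surviving features are exactly the selected parents.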


What you need to know about learning BN structures

- Decomposable scores
  - Maximum likelihood
  - Information-theoretic interpretation
- Best tree (Chow-Liu)
- Beyond tree-like models: NP-hard
- Use heuristics, such as:
  - Local search
  - LASSO

Markov Decision Processes (MDPs)

Machine Learning – CSE446
Carlos Guestrin
University of Washington
June 3, 2013


Reinforcement Learning

Training by feedback

Learning to act

- Reinforcement learning
- An agent:
  - Makes sensor observations
  - Must select actions
  - Receives rewards
    - positive for “good” states
    - negative for “bad” states

[Ng et al. ’05]


Markov Decision Process (MDP) Representation

- State space: joint state x of the entire system
- Action space: joint action a = {a1, …, an} for all agents
- Reward function: total reward R(x, a)
  - sometimes the reward depends on the action
- Transition model: dynamics of the entire system, P(x’ | x, a)

Discount Factors

People in economics and probabilistic decision-making use discount factors all the time. The “discounted sum of future rewards” with discount factor γ is:

  (reward now)
  + γ (reward in 1 time step)
  + γ² (reward in 2 time steps)
  + γ³ (reward in 3 time steps)
  + … (an infinite sum)
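For a finite horizon this sum is easy to compute directly; a sketch:

```python
def discounted_return(rewards, gamma):
    """Discounted sum of a finite reward sequence: sum_t gamma**t * r_t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

gamma = 0.9
# A constant reward of 1 per step is worth 1 / (1 - gamma) = 10 in the limit,
# which the truncated sum approaches as the horizon grows.
print(discounted_return([1.0] * 500, gamma))  # very close to 10.0
```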


The Academic Life

Define:
  VA = expected discounted future rewards starting in state A
  VB = expected discounted future rewards starting in state B
  VT = expected discounted future rewards starting in state T
  VS = expected discounted future rewards starting in state S
  VD = expected discounted future rewards starting in state D

How do we compute VA, VB, VT, VS, VD?

States and their immediate rewards:
  A. Assistant Prof: 20
  B. Assoc. Prof: 60
  T. Tenured Prof: 400
  S. On the Street: 10
  D. Dead: 0

Assume discount factor γ = 0.9.

[Figure: transition diagram among the five states; the transition probabilities (0.7, 0.6, 0.3, 0.2, …) label its edges]
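Collecting the values into a vector V, the immediate rewards into R, and the transition probabilities into a matrix P, the definitions above give the linear system V = R + γPV, solved by V = (I − γP)⁻¹R. A numpy sketch; the transition matrix here is a hypothetical reconstruction, since the diagram's exact edge probabilities are not recoverable from the transcript:

```python
import numpy as np

# States: A (Assistant Prof), B (Assoc. Prof), T (Tenured Prof),
#         S (On the Street), D (Dead).
R = np.array([20.0, 60.0, 400.0, 10.0, 0.0])  # immediate reward per state
gamma = 0.9

# Hypothetical transition matrix P[i, j] = P(next = j | current = i);
# the probabilities below are assumed for illustration only.
P = np.array([
    [0.6, 0.2, 0.0, 0.2, 0.0],  # A: stay, promoted, -, on the street, -
    [0.0, 0.6, 0.2, 0.2, 0.0],  # B: stay, tenured, on the street
    [0.0, 0.0, 0.7, 0.0, 0.3],  # T: stay, dead
    [0.0, 0.0, 0.0, 0.7, 0.3],  # S: stay, dead
    [0.0, 0.0, 0.0, 0.0, 1.0],  # D: absorbing
])

# Solve V = R + gamma * P @ V  <=>  (I - gamma * P) V = R
V = np.linalg.solve(np.eye(5) - gamma * P, R)
for name, v in zip("ABTSD", V):
    print(f"V{name} = {v:.1f}")
```

Whatever probabilities the diagram actually carries, the same one-line solve applies once P is filled in.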