

Introduction to Graphical Models

Yung-Kyun Noh

Seoul National University

Pattern Recognition and Machine Learning Winter School

Seoul National University Museum, Seoul

2018. 1. 24-25. (Wed.-Thu.)

GENERALIZATION FOR PREDICTION

Probabilistic Assumption and Bayes Classification

• Bayes Error

[Figure: two class-conditional densities p1(x) and p2(x) over the data space; their overlap determines the Bayes error]

Learning

• Data

• Learning: learn a prediction function f from data

( H : hypothesis set/candidate set from which f is chosen)

• Prediction: predict for unseen data (regularity assumption)

Quantify the Evaluation

• Measure of quality: expected loss, defined with a loss function ℓ

• Estimated error: the average loss on the observed data

• Examples

– Classification

– Regression
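In standard notation (a reconstruction; the slide's own formulas did not survive extraction), the expected loss and the estimated error are

$$R(f) = \mathbb{E}_{(x,y)}\left[\ell(f(x), y)\right], \qquad \hat{R}(f) = \frac{1}{N}\sum_{i=1}^{N} \ell(f(x_i), y_i),$$

with the 0-1 loss $\ell(f(x), y) = \mathbf{1}[f(x) \neq y]$ for classification and the squared loss $\ell(f(x), y) = (f(x) - y)^2$ for regression.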

Consistent Learner

• A consistent learner satisfies:

• Caution:

– The definition of consistency is not <uniform convergence>
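In standard form (a reconstruction of the lost condition): a learner returning $\hat{f}_N \in H$ from $N$ samples is consistent if

$$R(\hat{f}_N) \;\to\; \inf_{f \in H} R(f) \quad \text{in probability as } N \to \infty.$$

Uniform convergence, $\sup_{f \in H} |\hat{R}(f) - R(f)| \to 0$, is a stronger sufficient condition, not the definition itself.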

Quiz 1

• Consider a hypothesis set H which satisfies the condition above.

Explain why learning with H is not consistent even though it satisfies that condition.

What is the possible problem in this case?

Consistency and Bayes Error

• Minimizing expected error (objective) vs. minimizing estimated error

[Figure: training error vs. model complexity, e.g. for a linear classifier with regularization: error axis with the Bayes error marked above 0; larger H (complex classifier) vs. smaller H (simple classifier); the optimal parameter is fit with N data under loss function ℓ]

Consistency and Bayes Error

• Consistent learner with many data

[Figure: train error vs. model complexity, smaller to larger H, e.g. a linear classifier with regularization; the minimum error of the consistent learner stays above the Bayes error]

Consistency does not necessarily imply achieving the Bayes error!!!

Terms Related to Confining H

• Linear model

VC-dim for classification = Dimensionality + 1

• Small number of parameters

• Large margin

• Regularization

• Bias-Variance trade-off

• Generalization ability, overfitting

Many terms are theoretically connected


Learning for Prediction

• Method:

– Learn from labeled examples and classify an unseen datum

[Based on the assumption of regularity]

[Figure: labeled training images (car, plane, ship, horse, …) → learned classifier → label for a new unseen datum (a horse)]

Representation of Data

• Each datum is one point in a data space: $\mathbf{x} = [1, 2, 5, 10, \dots]^T$

[Figure: the vector's elements are the coordinates of a point in the data space]

Classification

Bayes Optimal Classifier

• Our ultimate goal is not zero error, but the (optimal) Bayes error.

Figure credit: Masashi Sugiyama

GRAPHICAL MODELS

Gaussian Random Variable

Principal axes are the eigenvector directions of the covariance matrix $\Sigma$; the eigenvalues $\lambda_i$ give the variances along these axes.
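For reference, the multivariate Gaussian density (standard form, since the slide's formula was lost):

$$p(\mathbf{x}) = \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^\top \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})\right).$$

Its level sets are ellipsoids whose principal axes point along the eigenvectors $\mathbf{u}_i$ of $\Sigma$, with lengths proportional to $\sqrt{\lambda_i}$.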

Covariance Matrix and Projection

With covariance $\Sigma$ and eigenvalues $\lambda_i$, projecting onto a unit direction $\mathbf{w}$ gives a scalar with variance $\mathbf{w}^\top \Sigma\, \mathbf{w}$.
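A one-line check of the projected variance (standard derivation): for a unit vector $\mathbf{w}$,

$$\mathrm{Var}(\mathbf{w}^\top \mathbf{x}) = \mathbf{w}^\top \Sigma\, \mathbf{w} = \sum_i \lambda_i (\mathbf{w}^\top \mathbf{u}_i)^2,$$

so projecting onto the eigenvector $\mathbf{u}_i$ gives variance exactly $\lambda_i$.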

Parameter Estimation

• Maximum Likelihood Estimation

Data: $\{\mathbf{x}_i\}_{i=1}^{N}$

Mean vector: $\hat{\boldsymbol{\mu}} = \frac{1}{N}\sum_{i=1}^{N} \mathbf{x}_i$

Covariance matrix: $\hat{\Sigma} = \frac{1}{N}\sum_{i=1}^{N} (\mathbf{x}_i - \hat{\boldsymbol{\mu}})(\mathbf{x}_i - \hat{\boldsymbol{\mu}})^T$
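A minimal numpy sketch of these two estimators (function and variable names are mine, not from the lecture):

```python
import numpy as np

def gaussian_mle(X):
    """ML estimates for a Gaussian; X is an (N, D) array of samples."""
    N = X.shape[0]
    mu = X.mean(axis=0)              # mean vector: (1/N) * sum_i x_i
    Xc = X - mu                      # centered data
    Sigma = (Xc.T @ Xc) / N          # ML covariance (divides by N, not N - 1)
    return mu, Sigma

# Example: estimate from 1000 samples of a 2-D Gaussian
rng = np.random.default_rng(0)
X = rng.multivariate_normal([0.0, 1.0], [[2.0, 0.5], [0.5, 1.0]], size=1000)
mu_hat, Sigma_hat = gaussian_mle(X)
```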

Naïve Bayes As An Extreme Case

• Naïve Bayes

– Simply ignore all correlations and dependencies between variables:

$p(\mathbf{x}) = \prod_{d=1}^{D} p_d(x_d)$

• True decomposition (chain rule):

$p(\mathbf{x}) = p(x_1)\,p(x_2 \mid x_1)\,p(x_3 \mid x_1, x_2) \cdots p(x_D \mid x_1, \dots, x_{D-1})$

which Naïve Bayes replaces by $p_1(x_1)\,p_2(x_2) \cdots p_D(x_D)$.


Number of Parameters

• D = 1000

– Number of parameters of a Gaussian:

1000 + 1001*(1000)/2 = 501,500


Independence

• $p(\mathbf{x}) = p_1(\mathbf{x}_1)\,p_2(\mathbf{x}_2)$ with $D = D_1 + D_2$, $D_1 = 500$, $D_2 = 500$

– Number of parameters:

500 + 501*(500)/2 + 500 + 501*(500)/2 = 251,500

• Incorporating one independence can reduce the number of parameters by half.
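A quick sanity check of both counts (a sketch; the helper name is mine). A $D$-dimensional Gaussian needs $D$ mean entries plus $D(D+1)/2$ distinct covariance entries:

```python
def gaussian_params(D):
    # D mean entries + D(D+1)/2 distinct covariance entries
    return D + D * (D + 1) // 2

print(gaussian_params(1000))                        # 501500: one joint Gaussian
print(gaussian_params(500) + gaussian_params(500))  # 251500: two independent blocks
```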

Independence

• Correlation and independence

Naïve Bayes?

Mixture of Gaussians?

Independence in a Gaussian means no correlation: $x_a \perp\!\!\!\perp x_b$

Conditional Independence

$x_a \perp\!\!\!\perp x_b \mid x_c$ vs. $x_a \perp\!\!\!\perp x_b$

Graphical Models

• We utilize probabilities that are represented by the graph structure (directed and undirected).

Use probabilistic independencies and conditional independencies that can be captured by the graph structure.

Directed Graphical Models

• Factorization of a (large) joint pdf

• For given data, make a model for each decomposed probability, then estimate parameters separately.
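The factorization referred to here is presumably the standard one over each node's parents in the DAG:

$$p(x_1, \dots, x_D) = \prod_{i=1}^{D} p\big(x_i \mid \mathrm{pa}(x_i)\big),$$

e.g. for the chain $X_1 \to X_2 \to X_3$, $p(x_1, x_2, x_3) = p(x_1)\, p(x_2 \mid x_1)\, p(x_3 \mid x_2)$, and each factor can be modeled and fit separately.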

D-Separations

Causal path ($X_1 \to X_2 \to X_3$): $X_1 \perp\!\!\!\perp X_3 \mid X_2$

Common cause ($X_2 \leftarrow X_1 \to X_3$): $X_2 \perp\!\!\!\perp X_3 \mid X_1$

Common effect ($X_1 \to X_3 \leftarrow X_2$): $X_1 \not\perp\!\!\!\perp X_2 \mid X_3$, although $X_1 \perp\!\!\!\perp X_2$
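A one-step check of the causal-path case (standard reasoning, reconstructed): with $p(x_1, x_2, x_3) = p(x_1)\,p(x_2 \mid x_1)\,p(x_3 \mid x_2)$,

$$p(x_1, x_3 \mid x_2) = \frac{p(x_1)\,p(x_2 \mid x_1)\,p(x_3 \mid x_2)}{p(x_2)} = p(x_1 \mid x_2)\,p(x_3 \mid x_2),$$

so $X_1 \perp\!\!\!\perp X_3 \mid X_2$. The analogous step fails in the common-effect case because conditioning on the child $X_3$ couples its parents ("explaining away").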

D-Separations

$X_1 \to X_2 \to X_3$: $X_1 \perp\!\!\!\perp X_3 \mid X_2$

[Figure: the same chain, illustrating the path blocked when $X_2$ is observed]

Discrete Random Variables

Inference with Discrete Variables

• Marginalize $X_5$

• Marginalize $X_3$

• Marginalize $X_2$

From the joint distribution, the variables are summed out one at a time.
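The worked tables did not survive, but the elimination idea can be sketched on a small example. A minimal chain $X_1 \to X_2 \to X_3$ with illustrative CPTs (the graph and numbers are my own, not the slides'):

```python
import numpy as np

# Chain X1 -> X2 -> X3, all binary; CPTs are illustrative.
p1 = np.array([0.6, 0.4])                  # p(x1)
p2_1 = np.array([[0.7, 0.3], [0.2, 0.8]])  # p(x2 | x1), rows indexed by x1
p3_2 = np.array([[0.9, 0.1], [0.4, 0.6]])  # p(x3 | x2), rows indexed by x2

# Eliminate one variable at a time instead of building the full joint:
m2 = p1 @ p2_1    # sum_x1 p(x1) p(x2|x1)  ->  p(x2)
m3 = m2 @ p3_2    # sum_x2 p(x2) p(x3|x2)  ->  p(x3)
print(m3)

# Same answer from the explicit joint, for comparison:
joint = p1[:, None, None] * p2_1[:, :, None] * p3_2[None, :, :]
print(joint.sum(axis=(0, 1)))
```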

Continuous Random Variables

• Each probability density is a Gaussian

Continuous Random Variables

• Jointly Gaussian with 6 variables

– Need 6 + 6*7/2 = 27 parameters

• Using a graphical model:

– Need only the parameters of each conditional factor

– We need to estimate only 19 parameters


Gaussian Random Variable – Marginal

Gaussian Random Variable – Conditional
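The formulas on these slides were lost; the standard identities are as follows. Partition a jointly Gaussian vector as $\mathbf{x} = (\mathbf{x}_a, \mathbf{x}_b)$ with mean $(\boldsymbol{\mu}_a, \boldsymbol{\mu}_b)$ and covariance blocks $\Sigma_{aa}, \Sigma_{ab}, \Sigma_{ba}, \Sigma_{bb}$. Then

$$p(\mathbf{x}_a) = \mathcal{N}(\mathbf{x}_a \mid \boldsymbol{\mu}_a, \Sigma_{aa}),$$

$$p(\mathbf{x}_a \mid \mathbf{x}_b) = \mathcal{N}\big(\mathbf{x}_a \mid \boldsymbol{\mu}_a + \Sigma_{ab}\Sigma_{bb}^{-1}(\mathbf{x}_b - \boldsymbol{\mu}_b),\; \Sigma_{aa} - \Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba}\big).$$

Both the marginal and the conditional of a Gaussian are again Gaussian, which is what makes the Kalman filter below tractable.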

KALMAN FILTER

Filtering

• Linear Dynamical Systems (LDS) with Gaussian noise

[Figure: state chain $X_1 \to X_2 \to \cdots \to X_{k-1} \to X_k$ with an observation $Y_t$ attached to each state $X_t$]

Kalman Filter

Filtering: marginalizing $x_0, \dots, x_{T-1}$

“Conditional marginalization”: marginalization from the left

Kalman Filter

Unconstrained distribution

Also, for the joint density if necessary

Kalman Filter

Marginalizing $x_0$

Constrained distribution: the mean and covariance are the same as in the unconstrained distribution; what matters is the covariance of the joint density.

Kalman Filter

Marginalizing $x_1, \dots, x_{T-1}$

What matters is how we can find the covariance of the joint density function!!

Kalman Filter

… Filtering: “conditional marginalization”, marginalization from the left

Why filtering? Once we know the current filtered distribution, we don’t have to know (or keep) the past observations.

Kalman Filter

• Time update

• Measurement update

The time update is similar to the unconstrained distribution calculation!
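The equations themselves were lost; in the usual LDS notation $x_{t+1} = A x_t + w_t$, $w_t \sim \mathcal{N}(0, Q)$ and $y_t = C x_t + v_t$, $v_t \sim \mathcal{N}(0, R)$, the standard form of the two updates is:

Time update: $\hat{x}_{t+1|t} = A\,\hat{x}_{t|t}$, $\quad P_{t+1|t} = A\,P_{t|t}\,A^\top + Q$.

Measurement update: $K_{t+1} = P_{t+1|t}\,C^\top \big(C\,P_{t+1|t}\,C^\top + R\big)^{-1}$, $\quad \hat{x}_{t+1|t+1} = \hat{x}_{t+1|t} + K_{t+1}\big(y_{t+1} - C\,\hat{x}_{t+1|t}\big)$, $\quad P_{t+1|t+1} = (I - K_{t+1} C)\,P_{t+1|t}$.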


Kalman Filter

• Summing up

Measurement update (conditional density)

Kalman Filter

• With different notation,

• Alternative form of $K_{t+1}$
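A minimal numpy sketch of one filtering iteration in the standard form above (the constant-velocity example is my own illustration, not from the slides):

```python
import numpy as np

def kalman_step(x, P, y, A, Q, C, R):
    """One Kalman iteration: time update, then measurement update."""
    # Time update (predict)
    x_pred = A @ x
    P_pred = A @ P @ A.T + Q
    # Measurement update (correct)
    S = C @ P_pred @ C.T + R              # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)   # Kalman gain K_{t+1}
    x_new = x_pred + K @ (y - C @ x_pred)
    P_new = (np.eye(len(x)) - K @ C) @ P_pred
    return x_new, P_new

# Illustrative 1-D constant-velocity model: state = (position, velocity)
A = np.array([[1.0, 1.0], [0.0, 1.0]])
C = np.array([[1.0, 0.0]])                # only the position is observed
Q, R = 0.01 * np.eye(2), np.array([[0.5]])
x, P = np.zeros(2), np.eye(2)
for y in [1.1, 2.0, 2.9, 4.2]:            # noisy position measurements
    x, P = kalman_step(x, P, np.array([y]), A, Q, C, R)
```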

Kalman Filter

http://afrl.dodlive.mil/2012/04/03/the-kalman-filter-a-tool-we-use-everyday/

http://www.wpafb.af.mil/News/Article-Display/Article/401306/computer-mouse-kalman-filter-trace-origins-to-air-force-basic-research-funding/

Markov Random Field

• Undirected Graph

[Figure: undirected graph over $X_1, \dots, X_5$ with edges $X_1$-$X_2$, $X_1$-$X_3$, $X_3$-$X_4$, $X_3$-$X_5$, $X_4$-$X_5$]

If there is a direct edge between $X_i$ and $X_j$: $X_i \not\perp\!\!\!\perp X_j \mid X_{\setminus i,j}$

If there is no direct edge between $X_i$ and $X_j$: $X_i \perp\!\!\!\perp X_j \mid X_{\setminus i,j}$

Ex. $X_1 \not\perp\!\!\!\perp X_3 \mid X_2, X_4, X_5$ and $X_1 \perp\!\!\!\perp X_5 \mid X_3$

Joint Distribution

• Product of functions on cliques

$$P(X_1, X_2, X_3, X_4, X_5) = \frac{1}{Z}\,\psi_{1,2}(X_1, X_2)\,\psi_{1,3}(X_1, X_3)\,\psi_{3,4,5}(X_3, X_4, X_5)$$

$$\left( Z = \sum_{X_1, X_2, X_3, X_4, X_5} \psi_{1,2}(X_1, X_2)\,\psi_{1,3}(X_1, X_3)\,\psi_{3,4,5}(X_3, X_4, X_5) \right)$$

Cliques: $\{X_1, X_2\}$, $\{X_1, X_3\}$, $\{X_3, X_4, X_5\}$

The set of distributions satisfying MRF conditions (Markov random field)
= the set of distributions decomposed by cliques (Gibbs random field)
(Hammersley-Clifford Theorem)
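A brute-force sketch of this normalization for binary variables (the potential tables here are illustrative assumptions, not the lecture's):

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
# Nonnegative potentials on the cliques {X1,X2}, {X1,X3}, {X3,X4,X5}
psi12 = rng.random((2, 2))
psi13 = rng.random((2, 2))
psi345 = rng.random((2, 2, 2))

def unnorm(x):
    x1, x2, x3, x4, x5 = x
    return psi12[x1, x2] * psi13[x1, x3] * psi345[x3, x4, x5]

# Partition function: sum the clique product over all 2^5 assignments
Z = sum(unnorm(x) for x in itertools.product([0, 1], repeat=5))

def P(x):
    return unnorm(x) / Z   # normalized joint P(X1, ..., X5)
```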

More Fancy Models

• Latent Variable Model

• Filtering

[Figure: a latent chain $Z_1 \to Z_2 \to \cdots \to Z_{k-1} \to Z_k$ emitting observations $Y_1, \dots, Y_k$, plus several small latent-variable graphs over $X$, $Y$, $Z$]

Topic Models

http://machinelearning.snu.ac.kr/NIPS2015/NIPS2015_Accepted/nipsnice.html

http://cs.stanford.edu/people/karpathy/nipspreview/

Naïve Bayes for Classification

$$P(X, Y) = P(Y)\,P(X_1 \mid Y) \cdots P(X_D \mid Y)$$

[Figure: class node $Y$ with arrows to each feature $X_d$]
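A minimal sketch of classification with this factorization, for discrete features (the array layout and names are my own):

```python
import numpy as np

def nb_predict(x, prior, cond):
    """Naive Bayes: argmax_y P(y) * prod_d P(x_d | y).

    prior: (K,) class probabilities P(Y = y).
    cond:  (K, D, V) array with cond[y, d, v] = P(X_d = v | Y = y).
    x:     length-D vector of feature values in {0, ..., V-1}.
    """
    log_post = np.log(prior)
    for d, v in enumerate(x):
        log_post = log_post + np.log(cond[:, d, v])  # add log-factors per class
    return int(np.argmax(log_post))
```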

Undirected Graphical Models

Independent

Undirected Graphical Models

• Find all maximal cliques:

Undirected Graphical Models

• Potential functions on cliques ($X_1, X_2, \dots$: maximal cliques)

$$P(X) = \frac{1}{Z} \prod_c \psi_c(X_c)$$

Partition function $Z$: a sum over all configurations for discrete variables, an integral for continuous variables.

Undirected Graphical Models

Ex. A potential on the 2-clique $(X_3, X_4)$:

X3  X4  Φ(X3, X4)
 1   1   1
 1   0   0
 0   1   0
 0   0   3

[A second example table of potential values over three binary variables did not survive extraction.]

Estimating Parameters

With the graphical model, the joint over four binary variables is specified by the clique tables: 8 entries for the 3-variable clique and 4 entries for the 2-clique, i.e. 12 parameters.

Without the graphical model: 15 parameters (2^4 - 1).

[Tables of the binary assignments for each clique did not survive extraction.]

Conditional Independence

The conditional probability reduces to $P(X_3 = 1 \mid X_2 = 1)$.

Conditional Independence

[Tables comparing the conditional probability computed with different additional conditioning variables did not survive extraction.]

All answers are the same.

Marginalization

Any good property, like the clique decomposition?

Marginalization

Any decomposition with $P(X_D \mid X_1, \dots, X_{i-1})$?

Marginalization

$P(X_D \mid X_1, \dots, X_{i-1})$

Marginalize $X_i$

Introducing Latent Variables

• Issue: how can a model be simplified as much as possible while keeping enough flexibility to capture the true dependency?

Expectation-Maximization Algorithm

• Parameter estimation with latent variables

– We don’t have data for latent variables

• E-step:

– Values for the latent variables are obtained as expectations under the current parameter values.

• M-step:

– With the expected latent variables, parameters are obtained by maximizing the likelihood.

• The E-step and M-step are repeated back and forth until the likelihood converges.

Expectation-Maximization Algorithm

• Gaussian mixture model

Parameters: $\pi_j, \boldsymbol{\mu}_j, \Sigma_j$ for $j = 1, \dots, K$

Unknown (latent) variables: $z_i$ for $i = 1, \dots, N$

We are given $\mathbf{x}_i$ for $i = 1, \dots, N$

E-step: compute the distribution of each $z_i$ (responsibilities) using the current parameters

Expectation-Maximization Algorithm

M-step: estimate the parameters $\pi_j, \boldsymbol{\mu}_j, \Sigma_j$ for $j = 1, \dots, K$, using the responsibilities.

Iterate until the likelihood converges.
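A compact numpy/scipy sketch of the two steps for a Gaussian mixture (a minimal illustration in the notation above, not the lecture's code):

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

def em_gmm(X, K, iters=100):
    """EM for a Gaussian mixture; X is (N, D)."""
    N, D = X.shape
    rng = np.random.default_rng(0)
    pi = np.full(K, 1.0 / K)                           # mixing weights pi_j
    mu = X[rng.choice(N, size=K, replace=False)].copy()  # init means at data points
    Sigma = np.stack([np.eye(D)] * K)
    for _ in range(iters):
        # E-step: responsibilities r[i, j] = P(z_i = j | x_i, current params)
        r = np.column_stack(
            [pi[j] * mvn.pdf(X, mu[j], Sigma[j]) for j in range(K)])
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate pi_j, mu_j, Sigma_j from the responsibilities
        Nk = r.sum(axis=0)
        pi = Nk / N
        mu = (r.T @ X) / Nk[:, None]
        for j in range(K):
            Xc = X - mu[j]
            Sigma[j] = (r[:, j, None] * Xc).T @ Xc / Nk[j]
    return pi, mu, Sigma
```

In practice one monitors the log-likelihood and stops when it no longer increases, matching the convergence criterion on slide 75.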

Gaussian Mixture Model With EM

[Figure: C. Bishop 2007, Figure 9.8(a)-(f)]

ANY QUESTIONS?

THANK YOU

Yung-Kyun Noh

[email protected]