parameter estimation - i-systems.github.io/hse545/machine learning all/11...
TRANSCRIPT
Parameter Estimation
Industrial AI Lab.
Generative Model
• Linear model with additive Gaussian noise:
  𝑦 = 𝜔ᵀ𝑥 + 𝜀,   𝜀 ~ 𝒩(0, 𝜎²)
[Figure: generative view of 𝑦 given 𝑥, with weights 𝜔 and noise variance 𝜎²]
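The generative model above can be sketched numerically. A minimal Python sketch, with illustrative values for 𝜔 and 𝜎 that are not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative values (not from the slides): true weights and noise level
w = np.array([1.0, 2.0])        # omega (bias and slope)
sigma = 0.5                     # noise std, eps ~ N(0, sigma^2)

m = 100
X = np.column_stack([np.ones(m), rng.uniform(0, 5, m)])  # rows are [1, x]
eps = rng.normal(0.0, sigma, m) # Gaussian noise
y = X @ w + eps                 # y = omega^T x + eps
```

Parameter estimation then asks the reverse question: given only `X` and `y`, recover 𝜔 and 𝜎².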
Maximum Likelihood Estimation (MLE)
• Estimate parameters 𝜃 = (𝜔, 𝜎²) that maximize the likelihood under a generative model
– Given: observed data
– Given: a generative model structure (assumption)
Maximum Likelihood Estimation (MLE)
• Find parameters 𝜔 and 𝜎 that maximize the likelihood of the observed data
• Likelihood: ℒ(𝜃) = 𝑃(𝐷 ∣ 𝜃) = ∏ᵢ 𝑃(𝑑ᵢ ∣ 𝜃)
• Perhaps the simplest (but widely used) parameter estimation method
Drawn from a Gaussian Distribution
• You will often see the following derivation, starting from the Gaussian log-likelihood:
  ℓ(𝜇, 𝜎) = ∑ᵢ log 𝒩(𝑑ᵢ; 𝜇, 𝜎²) = −(𝑚/2) log(2𝜋𝜎²) − ∑ᵢ (𝑑ᵢ − 𝜇)² / (2𝜎²)
• To maximize, set ∂ℓ/∂𝜇 = 0 and ∂ℓ/∂𝜎 = 0, which gives
  𝜇̂ = (1/𝑚) ∑ᵢ 𝑑ᵢ,   𝜎̂² = (1/𝑚) ∑ᵢ (𝑑ᵢ − 𝜇̂)²
• BIG Lesson
– We often compute a mean and a variance to summarize data statistics
– Implicitly, we are assuming the data set is Gaussian distributed
– Good news: the sample mean is approximately Gaussian by the central limit theorem
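The derivation above can be sanity-checked numerically. A minimal sketch with illustrative data drawn from 𝒩(3, 2²): the sample mean and the (1/𝑚) sample variance maximize the log-likelihood, so no grid point should score higher.

```python
import numpy as np

rng = np.random.default_rng(1)
d = rng.normal(3.0, 2.0, 500)   # observed data (illustrative: true mu=3, sigma=2)
m = len(d)

# Closed-form MLE from setting dl/dmu = 0 and dl/dsigma = 0
mu_hat = d.mean()               # (1/m) * sum d_i
var_hat = d.var()               # (1/m) * sum (d_i - mu_hat)^2  (note 1/m, not 1/(m-1))

def log_lik(mu, var):
    # Gaussian log-likelihood: -(m/2) log(2 pi var) - sum (d_i - mu)^2 / (2 var)
    return -0.5 * m * np.log(2 * np.pi * var) - np.sum((d - mu) ** 2) / (2 * var)

# No point on a coarse grid should beat the closed-form solution
grid_best = max(log_lik(mu, v)
                for mu in np.linspace(2, 4, 41)
                for v in np.linspace(2, 6, 41))
```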
Numerical Example
• Compute the likelihood function, then
– Maximize the likelihood function
– Adjust the mean and variance of the Gaussian to maximize the product of the data likelihoods
Numerical Example for Gaussian
When Mean is Unknown
When Variance is Unknown
Probabilistic Machine Learning
• I personally believe this is a more fundamental way of looking at machine learning
• Maximum Likelihood Estimation (MLE)
• Maximum a Posteriori (MAP)
• Probabilistic Regression
• Probabilistic Classification
• Probabilistic Clustering
• Probabilistic Dimension Reduction
Maximum Likelihood Estimation (MLE)
Linear Regression: A Probabilistic View
• Linear regression model with Gaussian (normal) errors:
  𝑦ᵢ = 𝜔ᵀ𝑥ᵢ + 𝜀ᵢ,   𝜀ᵢ ~ 𝒩(0, 𝜎²)
Linear Regression: A Probabilistic View
• BIG Lesson
– Maximizing the Gaussian likelihood is the same as least-squares optimization
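The equivalence in the BIG Lesson can be checked numerically. A minimal sketch with illustrative data: the least-squares solution from the normal equations matches gradient descent on the (negative) Gaussian log-likelihood, because both minimize the same sum of squared errors.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative data from y = omega^T x + eps, eps ~ N(0, sigma^2)
X = np.column_stack([np.ones(50), rng.uniform(0, 5, 50)])
w_true = np.array([1.0, 2.0])
y = X @ w_true + rng.normal(0, 0.5, 50)

# Least-squares solution via the normal equations
w_ls = np.linalg.solve(X.T @ X, X.T @ y)

# MLE under Gaussian noise: maximizing the log-likelihood in omega is
# minimizing sum_i (y_i - omega^T x_i)^2 -- the same objective.
# Gradient descent on that objective converges to the same weights.
w_mle = np.zeros(2)
for _ in range(5000):
    grad = -2 * X.T @ (y - X @ w_mle)
    w_mle -= 1e-4 * grad
```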
Maximum a Posteriori (MAP)
Data Fusion with Uncertainties
• Learning Theory (Reza Shadmehr, Johns Hopkins University) – youtube link
• Two sensors measure the same quantity 𝑥, giving readings 𝑦𝑎 and 𝑦𝑏
• In a matrix form: 𝑦 = 𝐶𝑥 + 𝜀,   𝜀 ~ 𝒩(0, 𝑅),   with 𝑦 = [𝑦𝑎, 𝑦𝑏]ᵀ
• Find the maximum-likelihood estimate:
  𝑥̂𝑀𝐿 = (𝐶ᵀ𝑅⁻¹𝐶)⁻¹𝐶ᵀ𝑅⁻¹𝑦
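The ML fusion formula above reduces, for two scalar sensors, to a precision-weighted average. A minimal sketch with illustrative sensor readings and variances:

```python
import numpy as np

# Two noisy sensors reading the same scalar x (illustrative numbers)
y = np.array([2.3, 1.9])            # readings y_a, y_b
sigma2 = np.array([0.5, 0.2])       # variances sigma_a^2, sigma_b^2

C = np.ones((2, 1))                 # measurement model y = C x + eps
R = np.diag(sigma2)                 # noise covariance R

# x_ML = (C^T R^-1 C)^-1 C^T R^-1 y
Rinv = np.linalg.inv(R)
x_ml = np.linalg.solve(C.T @ Rinv @ C, C.T @ Rinv @ y).item()

# Equivalent precision-weighted average: weight each sensor by 1/sigma^2
wts = (1 / sigma2) / np.sum(1 / sigma2)
x_pw = float(wts @ y)

# The fused variance is smaller than either sensor's variance
var_fused = 1 / np.sum(1 / sigma2)
```

Note how the fused estimate leans toward the more accurate sensor (𝑦𝑏 here), and its variance is below both sensor variances, which is the point of the next slide.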
Data Fusion with Less Uncertainty
• Summary
• BIG Lesson:
– Two sensors are better than one sensor ⟹ less uncertainty
– Accuracy (uncertainty) information about each sensor is also important
[Figure: fused estimate 𝑥̂𝑀𝐿 lies between 𝜇𝑎 and 𝜇𝑏 – midway when 𝜎𝑎² = 𝜎𝑏², closer to 𝜇𝑏 when 𝜎𝑎² > 𝜎𝑏²]
1D Examples
• Example of two rulers
• How the brain combines human measurements from both haptic and visual channels
Data Fusion with 1D Example
Data Fusion with 2D Example
Maximum a Posteriori Estimation (MAP)
• Choose 𝜃 that maximizes the posterior probability of 𝜃 (i.e., the probability of 𝜃 in light of the observed data)
• The posterior probability of 𝜃 is given by Bayes' rule:
  𝑃(𝜃 ∣ 𝐷) = 𝑃(𝐷 ∣ 𝜃) 𝑃(𝜃) / 𝑃(𝐷)
– 𝑃(𝜃): prior probability of 𝜃 (before seeing any data)
– 𝑃(𝐷 ∣ 𝜃): likelihood
– 𝑃(𝐷): probability of the data (independent of 𝜃)
• Bayes' rule lets us update our belief about 𝜃 in light of the observed data
Maximum a Posteriori Estimation (MAP)
• In MAP, we usually maximize the log of the posterior probability
• For multiple observations 𝐷 = {𝑑₁, 𝑑₂, ⋯ , 𝑑𝑚}:
  𝜃𝑀𝐴𝑃 = arg max𝜃 [ ∑ᵢ log 𝑃(𝑑ᵢ ∣ 𝜃) + log 𝑃(𝜃) ]
• Same as MLE except for the extra log-prior term
• MAP allows us to incorporate prior knowledge about 𝜃 into its estimation
MAP for mean of a univariate Gaussian
• Suppose 𝜃 is a random variable with prior 𝜃 ~ 𝒩(𝜇, 1²), encoding prior knowledge (𝜃 unknown; 𝜇 and 𝜎² known)
– Observations 𝐷 = {𝑑₁, 𝑑₂, ⋯ , 𝑑𝑚}: conditionally independent given 𝜃
– Joint probability: 𝑃(𝐷, 𝜃) = 𝑃(𝜃) ∏ᵢ 𝑃(𝑑ᵢ ∣ 𝜃)
MAP for mean of a univariate Gaussian
• MAP: choose 𝜃𝑀𝐴𝑃 = arg max𝜃 𝑃(𝜃 ∣ 𝐷)
MAP for mean of a univariate Gaussian
MAP for mean of a univariate Gaussian
• ML interpretation: 𝜃𝑀𝐴𝑃 interpolates between the prior mean 𝜇 and the sample mean 𝑋̄
– 𝜃𝑀𝐴𝑃 = 𝜇 when 𝑚 = 0 (no data), and 𝜃𝑀𝐴𝑃 → 𝑋̄ as 𝑚 → ∞
• BIG Lesson: a prior acts as data (extra pseudo-observations)
• Note: examples of prior knowledge
– Education
– Getting older (experience)
– School ranking
MAP for mean of a univariate Gaussian
• Example) Experiment in class: which one do you think is heavier?
– with eyes closed
– with visual inspection
– with haptic (touch) inspection
MAP Python code
• Suppose 𝜃 is a random variable with prior 𝜃 ~ 𝒩(𝜇, 1²), encoding prior knowledge (𝜃 unknown; 𝜇 and 𝜎² known)
– MAP for the mean of a univariate Gaussian
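The code on the "MAP Python code" slides did not survive extraction. A minimal sketch, not the original, assuming (as on the slide) a prior 𝜃 ~ 𝒩(𝜇, 1²) and observations 𝑑ᵢ ∣ 𝜃 ~ 𝒩(𝜃, 𝜎²), with illustrative numbers:

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed setup (matching the slide): prior theta ~ N(mu, 1^2),
# observations d_i | theta ~ N(theta, sigma^2) with known sigma^2
mu = 0.0                         # prior mean
sigma2 = 2.0                     # known observation variance (illustrative)
theta_true = 1.5                 # illustrative ground truth
m = 20
d = rng.normal(theta_true, np.sqrt(sigma2), m)

# Setting d/dtheta [sum log N(d_i; theta, sigma2) + log N(theta; mu, 1)] = 0
# gives a precision-weighted combination of the prior mean and the data:
theta_map = (mu / 1.0 + d.sum() / sigma2) / (1.0 / 1.0 + m / sigma2)

# Sanity check against a grid search over the log posterior
thetas = np.linspace(-2, 4, 3001)
log_post = (-((d[:, None] - thetas[None, :]) ** 2).sum(axis=0) / (2 * sigma2)
            - (thetas - mu) ** 2 / 2)
theta_grid = thetas[np.argmax(log_post)]
```

The closed form shows the slide's BIG Lesson directly: the prior contributes like extra observations, so 𝜃𝑀𝐴𝑃 is pulled from the sample mean toward 𝜇, and the pull vanishes as 𝑚 grows.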
Object Tracking in Computer Vision
• Optional
• Lecture: Introduction to Computer Vision by Prof. Aaron Bobick at Georgia Tech
Kernel Density Estimation
• A non-parametric estimate of a density
• Lecture: Learning Theory (Reza Shadmehr, Johns Hopkins University)
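A kernel density estimate places a small kernel (a Gaussian bump here) at every data point and averages them; no parametric form is assumed. A minimal sketch with illustrative bimodal data that a single Gaussian would fit poorly:

```python
import numpy as np

rng = np.random.default_rng(4)
# Illustrative bimodal data (mixture of two Gaussians)
d = np.concatenate([rng.normal(-2, 0.5, 100), rng.normal(1, 1.0, 100)])

def kde(x, data, h):
    # Gaussian-kernel density estimate: the average of a Gaussian bump
    # of bandwidth h centered at every data point
    u = (x[:, None] - data[None, :]) / h
    k = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    return k.sum(axis=1) / (len(data) * h)

xs = np.linspace(-6, 6, 1201)
p = kde(xs, d, h=0.3)           # bandwidth h controls smoothness
```

The bandwidth h plays the role the variance plays in the parametric Gaussian fit: too small and the estimate is spiky, too large and the two modes blur into one.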