parameter estimation - i-systems.github.io/hse545/machine learning all/11...
TRANSCRIPT
Parameter Estimation
Industrial AI Lab.
Generative Model
• Linear model with additive Gaussian noise:
  𝑦 = 𝜔ᵀ𝑥 + 𝜀,   𝜀 ~ 𝒩(0, 𝜎²)
[Figure: generative view of 𝑦 given 𝑥, with weights 𝜔 and noise variance 𝜎²]
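The generative model above can be sketched numerically. A minimal Python sketch, with illustrative values for 𝜔 and 𝜎 that are not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative values (not from the slides): true weights and noise level
w = np.array([1.0, 2.0])        # omega (bias and slope)
sigma = 0.5                     # noise std, eps ~ N(0, sigma^2)

m = 100
X = np.column_stack([np.ones(m), rng.uniform(0, 5, m)])  # rows are [1, x]
eps = rng.normal(0.0, sigma, m) # Gaussian noise
y = X @ w + eps                 # y = omega^T x + eps
```

Parameter estimation then asks the reverse question: given only `X` and `y`, recover 𝜔 and 𝜎².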
Maximum Likelihood Estimation (MLE)
• Estimate parameters 𝜃 = (𝜔, 𝜎²) that maximize the likelihood under a generative model
– Given: observed data
– Given: a generative model structure (assumption)
Maximum Likelihood Estimation (MLE)
• Find parameters 𝜔 and 𝜎 that maximize the likelihood of the observed data
• Likelihood: ℒ(𝜃) = 𝑃(𝐷 ∣ 𝜃) = ∏ᵢ 𝑃(𝑑ᵢ ∣ 𝜃)
• Perhaps the simplest (but widely used) parameter estimation method
Drawn from a Gaussian Distribution
• You will often see the following derivation, starting from the Gaussian log-likelihood:
  ℓ(𝜇, 𝜎) = ∑ᵢ log 𝒩(𝑑ᵢ; 𝜇, 𝜎²) = −(𝑚/2) log(2𝜋𝜎²) − ∑ᵢ (𝑑ᵢ − 𝜇)² / (2𝜎²)
• To maximize, set ∂ℓ/∂𝜇 = 0 and ∂ℓ/∂𝜎 = 0, which gives
  𝜇̂ = (1/𝑚) ∑ᵢ 𝑑ᵢ,   𝜎̂² = (1/𝑚) ∑ᵢ (𝑑ᵢ − 𝜇̂)²
• BIG Lesson
– We often compute a mean and a variance to summarize data statistics
– Implicitly, we are assuming the data set is Gaussian distributed
– Good news: the sample mean is approximately Gaussian by the central limit theorem
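The derivation above can be sanity-checked numerically. A minimal sketch with illustrative data drawn from 𝒩(3, 2²): the sample mean and the (1/𝑚) sample variance maximize the log-likelihood, so no grid point should score higher.

```python
import numpy as np

rng = np.random.default_rng(1)
d = rng.normal(3.0, 2.0, 500)   # observed data (illustrative: true mu=3, sigma=2)
m = len(d)

# Closed-form MLE from setting dl/dmu = 0 and dl/dsigma = 0
mu_hat = d.mean()               # (1/m) * sum d_i
var_hat = d.var()               # (1/m) * sum (d_i - mu_hat)^2  (note 1/m, not 1/(m-1))

def log_lik(mu, var):
    # Gaussian log-likelihood: -(m/2) log(2 pi var) - sum (d_i - mu)^2 / (2 var)
    return -0.5 * m * np.log(2 * np.pi * var) - np.sum((d - mu) ** 2) / (2 * var)

# No point on a coarse grid should beat the closed-form solution
grid_best = max(log_lik(mu, v)
                for mu in np.linspace(2, 4, 41)
                for v in np.linspace(2, 6, 41))
```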
Numerical Example
• Compute the likelihood function, then
– Maximize the likelihood function
– Adjust the mean and variance of the Gaussian to maximize the product of the data likelihoods
Numerical Example for Gaussian
When Mean is Unknown
When Variance is Unknown
Probabilistic Machine Learning
• I personally believe this is a more fundamental way of looking at machine learning
• Maximum Likelihood Estimation (MLE)
• Maximum a Posteriori (MAP)
• Probabilistic Regression
• Probabilistic Classification
• Probabilistic Clustering
• Probabilistic Dimension Reduction
Maximum Likelihood Estimation (MLE)
Linear Regression: A Probabilistic View
• Linear regression model with Gaussian (normal) errors:
  𝑦ᵢ = 𝜔ᵀ𝑥ᵢ + 𝜀ᵢ,   𝜀ᵢ ~ 𝒩(0, 𝜎²)
Linear Regression: A Probabilistic View
• BIG Lesson
– Maximizing the Gaussian likelihood is the same as least-squares optimization
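The equivalence in the BIG Lesson can be checked numerically. A minimal sketch with illustrative data: the least-squares solution from the normal equations matches gradient descent on the (negative) Gaussian log-likelihood, because both minimize the same sum of squared errors.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative data from y = omega^T x + eps, eps ~ N(0, sigma^2)
X = np.column_stack([np.ones(50), rng.uniform(0, 5, 50)])
w_true = np.array([1.0, 2.0])
y = X @ w_true + rng.normal(0, 0.5, 50)

# Least-squares solution via the normal equations
w_ls = np.linalg.solve(X.T @ X, X.T @ y)

# MLE under Gaussian noise: maximizing the log-likelihood in omega is
# minimizing sum_i (y_i - omega^T x_i)^2 -- the same objective.
# Gradient descent on that objective converges to the same weights.
w_mle = np.zeros(2)
for _ in range(5000):
    grad = -2 * X.T @ (y - X @ w_mle)
    w_mle -= 1e-4 * grad
```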
Maximum a Posteriori (MAP)
Data Fusion with Uncertainties
• Learning Theory (Reza Shadmehr, Johns Hopkins University) – youtube link
• Two sensors measure the same quantity 𝑥, giving readings 𝑦𝑎 and 𝑦𝑏
• In a matrix form: 𝑦 = 𝐶𝑥 + 𝜀,   𝜀 ~ 𝒩(0, 𝑅),   with 𝑦 = [𝑦𝑎, 𝑦𝑏]ᵀ
• Find the maximum-likelihood estimate:
  𝑥̂𝑀𝐿 = (𝐶ᵀ𝑅⁻¹𝐶)⁻¹𝐶ᵀ𝑅⁻¹𝑦
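The ML fusion formula above reduces, for two scalar sensors, to a precision-weighted average. A minimal sketch with illustrative sensor readings and variances:

```python
import numpy as np

# Two noisy sensors reading the same scalar x (illustrative numbers)
y = np.array([2.3, 1.9])            # readings y_a, y_b
sigma2 = np.array([0.5, 0.2])       # variances sigma_a^2, sigma_b^2

C = np.ones((2, 1))                 # measurement model y = C x + eps
R = np.diag(sigma2)                 # noise covariance R

# x_ML = (C^T R^-1 C)^-1 C^T R^-1 y
Rinv = np.linalg.inv(R)
x_ml = np.linalg.solve(C.T @ Rinv @ C, C.T @ Rinv @ y).item()

# Equivalent precision-weighted average: weight each sensor by 1/sigma^2
wts = (1 / sigma2) / np.sum(1 / sigma2)
x_pw = float(wts @ y)

# The fused variance is smaller than either sensor's variance
var_fused = 1 / np.sum(1 / sigma2)
```

Note how the fused estimate leans toward the more accurate sensor (𝑦𝑏 here), and its variance is below both sensor variances, which is the point of the next slide.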
Data Fusion with Less Uncertainty
• Summary
• BIG Lesson:
– Two sensors are better than one sensor ⟹ less uncertainty
– Accuracy (uncertainty) information about each sensor is also important
[Figure: fused estimate 𝑥̂𝑀𝐿 lies between 𝜇𝑎 and 𝜇𝑏 – midway when 𝜎𝑎² = 𝜎𝑏², closer to 𝜇𝑏 when 𝜎𝑎² > 𝜎𝑏²]
1D Examples
• Example of two rulers
• How the brain combines human measurements from both haptic and visual channels
Data Fusion with 1D Example
Data Fusion with 2D Example
Maximum a Posteriori Estimation (MAP)
• Choose 𝜃 that maximizes the posterior probability of 𝜃 (i.e., the probability of 𝜃 in light of the observed data)
• The posterior probability of 𝜃 is given by Bayes' rule:
  𝑃(𝜃 ∣ 𝐷) = 𝑃(𝐷 ∣ 𝜃) 𝑃(𝜃) / 𝑃(𝐷)
– 𝑃(𝜃): prior probability of 𝜃 (before seeing any data)
– 𝑃(𝐷 ∣ 𝜃): likelihood
– 𝑃(𝐷): probability of the data (independent of 𝜃)
• Bayes' rule lets us update our belief about 𝜃 in light of the observed data
Maximum a Posteriori Estimation (MAP)
• In MAP, we usually maximize the log of the posterior probability
• For multiple observations 𝐷 = {𝑑₁, 𝑑₂, ⋯ , 𝑑𝑚}:
  𝜃𝑀𝐴𝑃 = arg max𝜃 [ ∑ᵢ log 𝑃(𝑑ᵢ ∣ 𝜃) + log 𝑃(𝜃) ]
• Same as MLE except for the extra log-prior term
• MAP allows us to incorporate prior knowledge about 𝜃 into its estimation
MAP for mean of a univariate Gaussian
• Suppose 𝜃 is a random variable with prior 𝜃 ~ 𝒩(𝜇, 1²), encoding prior knowledge (𝜃 unknown; 𝜇 and 𝜎² known)
– Observations 𝐷 = {𝑑₁, 𝑑₂, ⋯ , 𝑑𝑚}: conditionally independent given 𝜃
– Joint probability: 𝑃(𝐷, 𝜃) = 𝑃(𝜃) ∏ᵢ 𝑃(𝑑ᵢ ∣ 𝜃)
MAP for mean of a univariate Gaussian
• MAP: choose 𝜃𝑀𝐴𝑃 = arg max𝜃 𝑃(𝜃 ∣ 𝐷)
MAP for mean of a univariate Gaussian
MAP for mean of a univariate Gaussian
• ML interpretation: 𝜃𝑀𝐴𝑃 interpolates between the prior mean 𝜇 and the sample mean 𝑋̄
– 𝜃𝑀𝐴𝑃 = 𝜇 when 𝑚 = 0 (no data), and 𝜃𝑀𝐴𝑃 → 𝑋̄ as 𝑚 → ∞
• BIG Lesson: a prior acts as data (extra pseudo-observations)
• Note: examples of prior knowledge
– Education
– Getting older (experience)
– School ranking
MAP for mean of a univariate Gaussian
• Example) Experiment in class: which one do you think is heavier?
– with eyes closed
– with visual inspection
– with haptic (touch) inspection
MAP Python code
• Suppose 𝜃 is a random variable with prior 𝜃 ~ 𝒩(𝜇, 1²), encoding prior knowledge (𝜃 unknown; 𝜇 and 𝜎² known)
– MAP for the mean of a univariate Gaussian
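The code on the "MAP Python code" slides did not survive extraction. A minimal sketch, not the original, assuming (as on the slide) a prior 𝜃 ~ 𝒩(𝜇, 1²) and observations 𝑑ᵢ ∣ 𝜃 ~ 𝒩(𝜃, 𝜎²), with illustrative numbers:

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed setup (matching the slide): prior theta ~ N(mu, 1^2),
# observations d_i | theta ~ N(theta, sigma^2) with known sigma^2
mu = 0.0                         # prior mean
sigma2 = 2.0                     # known observation variance (illustrative)
theta_true = 1.5                 # illustrative ground truth
m = 20
d = rng.normal(theta_true, np.sqrt(sigma2), m)

# Setting d/dtheta [sum log N(d_i; theta, sigma2) + log N(theta; mu, 1)] = 0
# gives a precision-weighted combination of the prior mean and the data:
theta_map = (mu / 1.0 + d.sum() / sigma2) / (1.0 / 1.0 + m / sigma2)

# Sanity check against a grid search over the log posterior
thetas = np.linspace(-2, 4, 3001)
log_post = (-((d[:, None] - thetas[None, :]) ** 2).sum(axis=0) / (2 * sigma2)
            - (thetas - mu) ** 2 / 2)
theta_grid = thetas[np.argmax(log_post)]
```

The closed form shows the slide's BIG Lesson directly: the prior contributes like extra observations, so 𝜃𝑀𝐴𝑃 is pulled from the sample mean toward 𝜇, and the pull vanishes as 𝑚 grows.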
Object Tracking in Computer Vision
• Optional
• Lecture: Introduction to Computer Vision by Prof. Aaron Bobick at Georgia Tech
Kernel Density Estimation
• A non-parametric estimate of a density
• Lecture: Learning Theory (Reza Shadmehr, Johns Hopkins University)
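A kernel density estimate places a small kernel (a Gaussian bump here) at every data point and averages them; no parametric form is assumed. A minimal sketch with illustrative bimodal data that a single Gaussian would fit poorly:

```python
import numpy as np

rng = np.random.default_rng(4)
# Illustrative bimodal data (mixture of two Gaussians)
d = np.concatenate([rng.normal(-2, 0.5, 100), rng.normal(1, 1.0, 100)])

def kde(x, data, h):
    # Gaussian-kernel density estimate: the average of a Gaussian bump
    # of bandwidth h centered at every data point
    u = (x[:, None] - data[None, :]) / h
    k = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    return k.sum(axis=1) / (len(data) * h)

xs = np.linspace(-6, 6, 1201)
p = kde(xs, d, h=0.3)           # bandwidth h controls smoothness
```

The bandwidth h plays the role the variance plays in the parametric Gaussian fit: too small and the estimate is spiky, too large and the two modes blur into one.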