![Page 1: P(B|A) * P(A) P(B) prior · 2019-12-10 · towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society of London, 53:370-418 …by no means](https://reader031.vdocument.in/reader031/viewer/2022041816/5e5b38f6bbea6d029153e124/html5/thumbnails/1.jpg)
$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$
Bayes, Thomas (1763) An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society of London, 53:370-418
…by no means merely a curious speculation in the doctrine of chances, but necessary to be solved in order to a sure foundation for all our reasonings concerning past facts, and what is likely to be hereafter…. necessary to be considered by any that would give a clear account of the strength of analogical or inductive reasoning…
Bayes' rule
We call P(A) the "prior" and P(A|B) the "posterior".
Other Forms of Bayes Rule
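$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B \mid A)\, P(A) + P(B \mid {\sim}A)\, P({\sim}A)}$$

$$P(A \mid B \land X) = \frac{P(B \mid A \land X)\, P(A \land X)}{P(B \land X)}$$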
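As a minimal sketch of the first form, the function below computes P(A|B) from P(A), P(B|A), and P(B|~A), expanding P(B) over the two cases A and ~A. The function name and the input numbers are hypothetical and chosen only for illustration.

```python
def posterior(prior_a: float, p_b_given_a: float, p_b_given_not_a: float) -> float:
    """Return P(A|B) using Bayes' rule with P(B) expanded over A and ~A."""
    # P(B) = P(B|A) P(A) + P(B|~A) P(~A)
    evidence = p_b_given_a * prior_a + p_b_given_not_a * (1.0 - prior_a)
    return p_b_given_a * prior_a / evidence


# Hypothetical inputs: a rare event A and an imperfect indicator B.
p_a_given_b = posterior(prior_a=0.01, p_b_given_a=0.9, p_b_given_not_a=0.05)
print(p_a_given_b)
```

Even with a fairly reliable indicator, a small prior keeps the posterior well below P(B|A), which is the reason the prior appears explicitly in the rule.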
Parameter Estimation
• How to estimate parameters from data?
Maximum Likelihood Principle:
Choose the parameters that maximize the probability of the observed data!
Maximum Likelihood Estimation Recipe
1. Use the log-likelihood
2. Differentiate with respect to the parameters
3. Equate to zero and solve*

*Often requires numerical approximation (no closed-form solution)
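The recipe can be sketched numerically for a hypothetical model of independent two-outcome trials with success probability `theta`. The sketch applies Newton's method to the derivative of the log-likelihood; Newton's method is an assumption made here (one of several possible solvers), as are the function name and the counts. It assumes both counts are positive. This model happens to have a closed-form answer, successes / (successes + failures), which serves as a check on the numerical result.

```python
def newton_mle(successes: int, failures: int, theta: float = 0.5,
               tol: float = 1e-12, max_iter: int = 50) -> float:
    """Solve d/dtheta log-likelihood = 0 for a two-outcome model by Newton's method."""
    eps = 1e-9  # keep theta strictly inside (0, 1) so the divisions stay finite
    for _ in range(max_iter):
        # Step 2 of the recipe: derivative of s*log(theta) + f*log(1 - theta)
        score = successes / theta - failures / (1.0 - theta)
        # Second derivative, needed for the Newton update
        curvature = -successes / theta**2 - failures / (1.0 - theta) ** 2
        step = score / curvature
        # Step 3 of the recipe: move toward the point where the derivative is zero
        theta = min(max(theta - step, eps), 1.0 - eps)
        if abs(step) < tol:
            break
    return theta


estimate = newton_mle(successes=7, failures=13)
print(estimate)
```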
An Example
• Let’s start with the simplest possible case
  – Single observed variable
  – Flipping a bent coin
• We observe:
  – Sequence of heads or tails
  – HTTTTTHTHT
• Goal:
  – Estimate the probability that the next flip comes up heads
Assumptions
• Fixed parameter θ
  – Probability that a flip comes up heads
• Each flip is independent
  – Doesn’t affect the outcome of other flips
• (IID) Independent and Identically Distributed
Example
• Let’s assume we observe the sequence:
  – HTTTTTHTHT
• What is the best value of θ?
  – Probability of heads
• Intuition: should be 0.3 (3 out of 10)
• Question: how do we justify this?
Maximum Likelihood Principle
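• The value of θ which maximizes the probability of the observed data is best!
• Based on our assumptions, the probability of “HTTTTTHTHT” is:

$$P(D \mid \theta) = \theta\,(1-\theta)(1-\theta)(1-\theta)(1-\theta)(1-\theta)\,\theta\,(1-\theta)\,\theta\,(1-\theta) = \theta^{3}(1-\theta)^{7}$$

This is the likelihood function.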
Maximum Likelihood Principle
• Probability of “HTTTTTHTHT” as a function of θ
[Figure: the likelihood plotted against θ, with its maximum marked at θ = 0.3]
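The curve can be reproduced with a short sketch that evaluates the likelihood of the observed sequence on a grid of θ values and picks the largest. The grid resolution of 0.01 and the variable names are assumptions made for illustration.

```python
flips = "HTTTTTHTHT"
n_heads = flips.count("H")
n_tails = flips.count("T")


def likelihood(theta: float) -> float:
    """Probability of the observed sequence under IID flips with P(heads) = theta."""
    return theta**n_heads * (1.0 - theta) ** n_tails


# Evaluate on an evenly spaced grid over [0, 1] and keep the best grid point.
grid = [i / 100 for i in range(101)]
best_theta = max(grid, key=likelihood)
print(n_heads, n_tails, best_theta)
```

A grid search only finds the best point on the grid; it agrees with the intuition of 3 out of 10 here because 0.3 is itself a grid point.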
Maximum Likelihood value of θ
Log Identities
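The slide's own equations did not survive extraction, so the following is a reconstruction under the stated assumptions (3 heads and 7 tails, IID flips, likelihood θ³(1 − θ)⁷), not a copy of the original derivation. It lists the two log identities used and then applies the recipe: take the log, differentiate, set to zero, and solve.

```latex
\begin{align*}
\log(ab) &= \log a + \log b, \qquad \log(a^{n}) = n \log a \\
\log P(D \mid \theta) &= \log\!\left[\theta^{3}(1-\theta)^{7}\right] = 3\log\theta + 7\log(1-\theta) \\
\frac{d}{d\theta}\log P(D \mid \theta) &= \frac{3}{\theta} - \frac{7}{1-\theta} = 0 \\
3(1-\theta) &= 7\theta \quad\Longrightarrow\quad \theta_{ML} = \frac{3}{10} = \frac{\#H}{\#H + \#T}
\end{align*}
```

This justifies the intuitive answer of 0.3: the maximum likelihood estimate is the observed fraction of heads.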
The problem with Maximum Likelihood
• What if the coin doesn’t look very bent?
  – Should be somewhere around 0.5?
• What if we saw 3,000 heads and 7,000 tails?
  – Should this really be the same as 3 out of 10?
• Maximum Likelihood
  – No way to quantify our uncertainty.
  – No way to incorporate our prior knowledge!

Q: how to deal with this problem?
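A small sketch makes the second bullet concrete: the maximum likelihood estimate is identical for 3 heads out of 10 and for 3,000 heads out of 10,000, even though the larger sample rules out θ = 0.5 far more strongly. The log-likelihood gap relative to θ = 0.5 is a quantity chosen here for illustration and does not appear in the slides; the function names are likewise hypothetical.

```python
import math


def mle(heads: int, tails: int) -> float:
    """Maximum likelihood estimate of P(heads): the observed fraction of heads."""
    return heads / (heads + tails)


def log_likelihood(theta: float, heads: int, tails: int) -> float:
    """Log-probability of a sequence with the given counts under IID flips."""
    return heads * math.log(theta) + tails * math.log(1.0 - theta)


estimates = []
gaps = []
for heads, tails in [(3, 7), (3000, 7000)]:
    theta_hat = mle(heads, tails)
    # How much better the estimate explains the data than a fair coin would.
    gap = log_likelihood(theta_hat, heads, tails) - log_likelihood(0.5, heads, tails)
    estimates.append(theta_hat)
    gaps.append(gap)
    print(heads, tails, theta_hat, gap)
```

The two point estimates coincide while the gaps differ by a factor of 1,000, so the point estimate alone carries no information about how much evidence supports it.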
Bayesian Parameter Estimation
• Let’s just treat θ like any other variable
• Put a prior on it!
  – Encode our prior knowledge about possible values of θ using a probability distribution
• Now consider two probability distributions:
Posterior Over θ
My rule is so cool!
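The slide's equation was lost in extraction. A reconstruction, assuming it is Bayes' rule applied with A = θ and B = D (the observed flips), reads:

```latex
P(\theta \mid D) = \frac{P(D \mid \theta)\, P(\theta)}{P(D)} \;\propto\; P(D \mid \theta)\, P(\theta)
```

Here P(θ) is the prior, P(D | θ) is the likelihood function, and P(D) does not depend on θ, so it acts only as a normalizing constant.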
How can we encode prior knowledge?
• Example: The coin doesn’t look very bent
  – Assign higher probability to values of θ near 0.5
• Solution: The Beta Distribution
  – Its parameters α and β are hyper-parameters
  – Gamma is a continuous generalization of the factorial function
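The density itself did not survive extraction. The standard form of the Beta distribution, written with the hyper-parameters α and β, is given below as a reference; the notation is an assumption about what the slide showed.

```latex
\mathrm{Beta}(\theta \mid \alpha, \beta)
  = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,
    \theta^{\alpha - 1} (1 - \theta)^{\beta - 1},
  \qquad 0 \le \theta \le 1,\quad \alpha, \beta > 0
```

For positive integers n, Γ(n) = (n − 1)!, which is the sense in which Gamma generalizes the factorial.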
Beta Distribution
[Figure: density plots of Beta(5,5), Beta(100,100), Beta(0.5,0.5), and Beta(1,1)]
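The shapes of the four densities can be checked with a short sketch that evaluates the standard Beta density using only the Python standard library. The function name and the two evaluation points (θ = 0.5 and θ = 0.1) are assumptions chosen for illustration.

```python
import math


def beta_pdf(theta: float, alpha: float, beta: float) -> float:
    """Standard Beta density at theta, for 0 < theta < 1 and alpha, beta > 0."""
    # Work in log space so large hyper-parameters such as 100 do not overflow.
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    log_kernel = (alpha - 1.0) * math.log(theta) + (beta - 1.0) * math.log(1.0 - theta)
    return math.exp(log_norm + log_kernel)


for alpha, beta in [(5, 5), (100, 100), (0.5, 0.5), (1, 1)]:
    at_center = beta_pdf(0.5, alpha, beta)
    near_edge = beta_pdf(0.1, alpha, beta)
    print(f"Beta({alpha},{beta}): density at 0.5 = {at_center:.4f}, at 0.1 = {near_edge:.4f}")
```

Beta(5,5) and Beta(100,100) both peak at 0.5, the latter much more sharply; Beta(1,1) is flat; Beta(0.5,0.5) puts more density near the edges than at the center.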
MAP Estimate
$$\theta_{MAP} = \arg\max_{\theta} P(\theta \mid D) = \frac{\#H + \alpha - 1}{\#T + \#H + \alpha + \beta - 2}$$

• Add-N smoothing
• Pseudo-counts
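The MAP formula can be sketched directly in code. The function name is hypothetical, and the Beta(5,5) prior is borrowed from the earlier plots purely as an example. The formula gives an interior maximum when #H + α and #T + β both exceed 1, which holds for the values used here.

```python
def map_estimate(heads: int, tails: int, alpha: float, beta: float) -> float:
    """MAP estimate of P(heads) under a Beta(alpha, beta) prior."""
    # alpha - 1 and beta - 1 act as pseudo-counts added to the observed counts.
    return (heads + alpha - 1.0) / (heads + tails + alpha + beta - 2.0)


small_sample = map_estimate(heads=3, tails=7, alpha=5, beta=5)
large_sample = map_estimate(heads=3000, tails=7000, alpha=5, beta=5)
flat_prior = map_estimate(heads=3, tails=7, alpha=1, beta=1)
print(small_sample, large_sample, flat_prior)
```

With 10 flips the prior pulls the estimate noticeably toward 0.5; with 10,000 flips the data dominate and the estimate is close to 0.3. Under the flat Beta(1,1) prior the pseudo-counts are zero and the MAP estimate reduces to the maximum likelihood estimate.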