What is Bayesian statistics and how is it different?
TRANSCRIPT
Goal
● Clarify the difference between “classical” and Bayesian statistics
● Lay out the pros and cons of this “attitude”
One sentence definition
Bayesian statistics is a mathematical framework to update beliefs as you observe more data.
Movie cliché: Am I pregnant?
● What did I do in the past month?
– Forms a prior belief of whether I am pregnant
● The missing period
– Data!
● Belief is updated as more data is observed!
Bayesian terminology
● Prior: your belief about pregnancy before seeing new data
● Data: missing period
● Posterior: your belief, updated after seeing the data
How do we formalize this update?
● “Pregnant” is an uncertain event with two outcomes: Yes or No
● “Days delayed of period” is a data point
– If (Pregnant = Yes), delayed ~ 30*9 days (roughly nine months)
– If (Pregnant = No), it might come sooner
Mathematical framework
● “Pregnant” is a random variable:
– P(Pregnant = Yes) = X
– P(Pregnant = No) = (1 - X)
● “Days delayed of period” is another random variable!
– P(days delay >= 7 days | Pregnant) = 1
– P(days delay >= 7 days | Not Pregnant) = Y
Simplify
● Start with the objective:
Am I pregnant? i.e. what is P(Pregnant | Data)?
● Note that all the numbers we know are of the form P( **** | Pregnant) or P( **** | Not Pregnant)
Conditional Probability!
P(Pregnant | Data)
= P(Data | Pregnant) P(Pregnant) / P(Data)
Immediate implication:
● If your prior says you cannot be pregnant, i.e. P(Pregnant) = 0, your belief cannot be changed!
“Bayes Rule”
P(Pregnant | Data)
= P(Data | Pregnant) P(Pregnant) / P(Data)
= P(Data | Pregnant) P(Pregnant) / [ P(Data | Pregnant) P(Pregnant) + P(Data | Not Pregnant) P(Not Pregnant) ]
Why add more numbers?
P(Data) was hard to compute, so chop it into pieces we know!
P(Data): Big Issue for Bayesians
● “Pregnant” is binary, which made this really easy
● In general, a lot of “tricks” are trying to either
– solve for P(Data), e.g. belief propagation in graphical models
– get around it, e.g. sampling (MCMC) or approximation (Variational Bayes)
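For reference, when the unknown is a continuous parameter rather than a binary event, the two-term sum in the denominator becomes an integral. The symbol θ below is a generic parameter introduced here only for illustration; it does not appear in the talk.

```latex
P(\text{Data}) = \int P(\text{Data} \mid \theta)\, P(\theta)\, d\theta
```

This integral rarely has a closed form, which is why the approaches listed above either exploit structure to compute it or avoid computing it at all.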
Back to the key question:
P(Pregnant | Data)
= P(Data | Pregnant) P(Pregnant) / [ P(Data | Pregnant) P(Pregnant) + P(Data | Not Pregnant) P(Not Pregnant) ]
= 1 * X / [ 1 * X + Y * (1 - X) ]
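The closed form above can be checked numerically. The following is a minimal Python sketch; the function name `posterior_pregnant` and the example values of X and Y are illustrative assumptions, not numbers from the talk.

```python
def posterior_pregnant(x: float, y: float) -> float:
    """Return P(Pregnant | delay >= 7 days).

    x is the prior P(Pregnant); y is P(delay >= 7 days | Not Pregnant).
    Assumes P(delay >= 7 days | Pregnant) = 1, as on the slide.
    """
    evidence = 1.0 * x + y * (1.0 - x)  # P(Data) via total probability
    if evidence == 0.0:
        raise ValueError("P(Data) is zero: the data is impossible under this model")
    return 1.0 * x / evidence


# Hypothetical value Y = 0.1, tried with three different priors.
for prior in (0.0, 0.05, 0.5):
    print(prior, posterior_pregnant(prior, y=0.1))
```

Note that a prior of exactly 0 yields a posterior of exactly 0 regardless of Y, which is the "belief cannot be changed" implication in code form.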
Can add more data... almost for free!
● Notice “Data” is quite general:
– Can add pregnancy test strip data to further update beliefs!
– Treat the previous posterior as the new prior, then update the same way!
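Sequential updating can be sketched as repeated calls to the same function. This is a rough illustration, not the speaker's code: the starting prior, the value of Y, and the test strip's sensitivity and false positive rate are all hypothetical, and the sketch assumes the two pieces of data are conditionally independent given pregnancy status.

```python
def update(prior: float, p_data_if_pregnant: float, p_data_if_not: float) -> float:
    """Apply Bayes' rule once for a binary hypothesis and return the posterior."""
    numerator = p_data_if_pregnant * prior
    evidence = numerator + p_data_if_not * (1.0 - prior)
    return numerator / evidence


prior = 0.05  # hypothetical starting belief

# Data 1: period at least 7 days late.
# P(data | Pregnant) = 1 as on the slides; Y = 0.1 is an assumed value.
after_late_period = update(prior, 1.0, 0.1)

# Data 2: positive test strip.
# Assumed 99% sensitivity and 5% false positive rate (hypothetical numbers).
after_test = update(after_late_period, 0.99, 0.05)

print(after_late_period, after_test)
```

Under the independence assumption, the order in which the two observations are processed does not change the final posterior.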
So... what's the big deal?
● Your belief matters a lot!
– Your prior changes the outcome
● Your prior and my prior may be different
What “could” a bad Frequentist do?
● Calculate the p-value for you, i.e. P(period at least this late | Not Pregnant)
● Declare that you're pregnant if this is <= 5%
● The declaration has a 5% false positive rate and some false negative rate
● Issue: Not as relevant to you! Rates are for all the people using this procedure...not specific to your case!
“not as relevant”?
● There's no consideration of your specific case
– There was no P(Pregnant) in the p-value calculation
– You could be really sure that you're not pregnant... it doesn't change the calculation!
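To make this concrete, the sketch below holds the p-value fixed and varies only the prior. All numbers are hypothetical, and the function name `posterior` is introduced here for illustration; P(late | Pregnant) = 1 follows the earlier slides.

```python
def posterior(prior: float, p_late_if_pregnant: float = 1.0,
              p_late_if_not: float = 0.05) -> float:
    """P(Pregnant | late period) for a given prior, via Bayes' rule."""
    numerator = p_late_if_pregnant * prior
    return numerator / (numerator + p_late_if_not * (1.0 - prior))


# The p-value is P(period at least this late | Not Pregnant).
# No prior appears in it, so it is the same for every person.
p_value = 0.05  # hypothetical

for prior in (0.001, 0.05, 0.5):
    decision = "declare pregnant" if p_value <= 0.05 else "no declaration"
    print(f"prior={prior}: procedure -> {decision}; "
          f"posterior={posterior(prior, p_late_if_not=p_value):.3f}")
```

The threshold procedure returns the same declaration on every row, while the posterior moves substantially with the prior.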
What would a Frequentist say?
● P(Pregnant) = 100% or 0%
– Fixed but unknown
– NOT uncertain
● ...Not actually interested in a single event
– Probabilities are defined for repeated events
– Will not write down P(Pregnant | Data)
– For your one case, anything could be true
● Would say “Go talk to a doctor”
Key difference
● “Attitude”
– What can be a random variable?
● Bayesian: Uncertain events
● Frequentist: Repeatable events
Implications of this attitude
● Bayesian:
– Can incorporate prior knowledge easily
– Can update beliefs easily
– Can tackle a wider class of problems since probabilities are “beliefs”
– Must specify a model
– Your belief can be different from mine, so our answers will be different!
Implications of this attitude
● Frequentist:
– Probabilities are more objective
– Harder to cheat
– Has non-parametric methods
– Focused on repeatable events
– Prior knowledge is introduced in an ad hoc format
– Usually need lots of data
In the end...
● Frequentists and Bayesians use the same rules of probability
● The difference lies in the set-up: “What is random?”
– Bayesians: uncertainty in knowledge
– Frequentists: intrinsic randomness