Introduction to Bayesian Statistics
Harry R. Erwin, PhD
School of Computing and Technology
University of Sunderland
Resources
• Albert, Jim (2007) Bayesian Computation with R, Springer.
• Ntzoufras, Ioannis (2009) Bayesian Modeling Using WinBUGS, Wiley.
• Kéry, Marc (2010) Introduction to WinBUGS for Ecologists, Academic Press.
Topics
• Probability
• Bayes’ Theorem
• Bayesian Statistics
Basic Definitions
• Suppose Ω is a sample space—the set of outcomes, ω, of an experiment—for example the possible results of flipping a coin or rolling a die.
• Suppose F is the collection of possible events (subsets of Ω) involving outcomes in Ω, including:
– An empty event, Ø, with no outcomes belonging to it.
– Simple events consisting of single outcomes.
– Complex events consisting of multiple alternative outcomes (e.g., rolling an even number on a six-sided die).
• An event in F that combines the outcomes in two other events, A and B, is called the union of A and B and is written A∪B.
• An event in F made up of the outcomes present in both A and B is known as the intersection of A and B and is written A∩B.
Probability Measures
• A probability measure, P, is a function that assigns to each event in F a real number between 0 and 1, called the probability of the event, that satisfies the following requirements:
– P(Ω) = 1.
– If A and B are two disjoint events in F (that is, A∩B = Ø), then P(A∪B) = P(A) + P(B). (This rule must also be true for any countable number of pairwise disjoint events.)
Conditional Probability
• The “conditional probability of B given A”, written P(B|A), describes the probability of an outcome being in B given that it is known to be in A.
• P(B|A) = P(A∩B)/P(A).
• For example, let A be the even rolls of a fair six-sided die, and B the rolls that are a multiple of three.
• A = {2,4,6}, B = {3,6}, A∩B = {6}, and P(B|A) = P(A∩B)/P(A) = (1/6)/(1/2) = 1/3.
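On a finite sample space with equally likely outcomes, this can be checked by counting. A minimal R sketch of the die example (the names here are chosen for illustration):

```r
# Conditional probability by counting, for the fair-die example above.
omega <- 1:6                   # sample space: faces of a fair die
A <- c(2, 4, 6)                # even rolls
B <- c(3, 6)                   # multiples of three
p <- function(E) length(E) / length(omega)   # uniform probability measure
p(intersect(A, B)) / p(A)      # P(B|A) = (1/6)/(1/2) = 1/3
```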
Probability Models
• The triple <Ω, F, P> is called a probability model.
• Some theorems can be easily proven:
1. Let Ø be an event in which there are no outcomes. Then P(Ø) = 0.
2. Define ¬A to be the event consisting of all the outcomes not in A. Then P(¬A) = 1 − P(A). (A and ¬A are disjoint and A∪¬A = Ω, so P(A) + P(¬A) = P(Ω) = 1.)
3. If A∩B = Ø, then P(A∪B) = P(A) + P(B).
Bayes’ Theorem
• Bayes’ theorem is a provable consequence of these axioms:
• P(A|B) = P(B|A)P(A)/P(B)
• This follows because P(A∩B) = P(A|B)P(B) = P(B|A)P(A).
• That is, the probability of A given B is the probability of B given A, multiplied by the probability of A and divided by the probability of B.
• Also, P(A|B) ∝ P(B|A)P(A)
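A quick numerical check in R, reusing the die events from the previous slide:

```r
# Bayes' theorem on the die example: A = even rolls, B = multiples of three.
p_A <- 1/2; p_B <- 1/3
p_B_given_A <- 1/3                  # computed on the earlier slide
p_B_given_A * p_A / p_B             # P(A|B) = 1/2
# Direct check: A∩B = {6}, so P(A|B) = P(A∩B)/P(B) = (1/6)/(1/3) = 1/2.
```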
What Are Bayesian Statistics?
• Bayesian statistics are the working out of the implications of Bayes’ Theorem.
• They allow you to deduce the posterior (after) probability distribution of an event if you know the prior (before) probability distribution of the event and have some additional information.
• It’s a theorem, so it is always true.
Why is Bayes’ Theorem Useful?
• If the prior probability distribution is ‘vague’ or ‘noninformative’, you can incrementally add information to produce a posterior distribution that reflects just that information. That posterior distribution is very similar to the one you would come up with using classical statistics.
• If you start with real information in your prior, that is also taken into account, which is even more useful.
Density Functions
• You often have a ‘density’ function that is a good model of how events are distributed. A few typical density functions (each evaluated in the R sketch after this list) include:
1. The binomial distribution (one parameter, the probability of a ‘heads’, once the number of trials is fixed)
2. The Poisson distribution (one parameter, the mean number of occurrences in a unit time interval)
3. The exponential distribution (one parameter, the rate)
4. The normal distribution (two parameters, the mean and the variance)
5. The uniform distribution (two parameters, the beginning and the end of the interval)
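For reference, each of these densities has a built-in R function; the parameter values below are purely illustrative:

```r
dbinom(7, size = 10, prob = 0.5)  # binomial: P(7 heads in 10 fair flips)
dpois(3, lambda = 2)              # Poisson: P(3 events) when the mean is 2
dexp(1.5, rate = 2)               # exponential density at 1.5
dnorm(0.5, mean = 0, sd = 1)      # normal (note: R takes sd, not variance)
dunif(0.25, min = 0, max = 1)     # uniform on [0, 1]
```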
Likelihood
• Suppose you have an event, ω, drawn from a process described by a density. The probability of the event is then the value of the density for that event.
• This is the ‘likelihood’ of that event.
• If you have multiple samples, the corresponding likelihood is the product of the density values for each of the events.
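A minimal R illustration, with hypothetical data and a normal density:

```r
# Likelihood of several samples as a product of density values.
x <- c(1.2, 0.8, 1.5)                          # hypothetical observations
prod(dnorm(x, mean = 1, sd = 1))               # likelihood
sum(dnorm(x, mean = 1, sd = 1, log = TRUE))    # log-likelihood, numerically safer
```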
Maximum Likelihood
• Suppose you have a probability density function, f(x, θ), where θ is a parameter, such as the mean, that you want to estimate. If you have n data samples, x₁, …, xₙ, the most likely value of θ is the one that maximizes the value of the likelihood.
• Mathematically, the likelihood is Π f(xᵢ, θ), the joint distribution of the sample: the product of all the f(xᵢ, θ) values.
• You can often use calculus to find the maximizing θ; otherwise you can maximize numerically, as sketched below.
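A minimal numerical version in R, assuming a normal model with known standard deviation and simulated data:

```r
# Maximum likelihood for a normal mean, sd assumed known (= 1).
set.seed(1)
x <- rnorm(25, mean = 3, sd = 1)                     # simulated data
negloglik <- function(theta) -sum(dnorm(x, mean = theta, sd = 1, log = TRUE))
optimize(negloglik, interval = c(-10, 10))$minimum   # ≈ mean(x), the calculus answer
```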
Bayes and Maximum Likelihood
• Suppose you have a prior distribution, f(θ), and some data, described by a likelihood function, li(data|θ). The posterior distribution, f(θ|data), can be calculated by applying Bayes’ Theorem:
– f(θ|data) ∝ li(data|θ)f(θ)
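A minimal sketch of this in R, using a grid approximation with a normal prior and a normal likelihood (the model and all values are chosen here for illustration):

```r
# Grid approximation of f(theta|data) ∝ li(data|theta) f(theta).
set.seed(1)
x     <- rnorm(10, mean = 2, sd = 1)              # data
theta <- seq(-5, 5, length.out = 1001)            # grid over the parameter
prior <- dnorm(theta, mean = 0, sd = 2)           # f(theta)
lik   <- sapply(theta, function(t) prod(dnorm(x, mean = t, sd = 1)))
post  <- prior * lik / sum(prior * lik)           # normalized over the grid
theta[which.max(post)]                            # posterior mode
```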
Worked Example
• 51 smokers among 83 cases of lung cancer.
• 23 smokers among 70 disease-free controls.
• P(smoker|case) = 51/83
• P(smoker|control) = 23/70
• P(case|smoker) = P(smoker|case)P(case) / (P(smoker|case)P(case) + P(smoker|control)P(control))
• Relative risk = RR = P(case|smoker)/P(case|nonsmoker) = 1.87
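A sketch of the arithmetic in R, assuming equal prior probabilities P(case) = P(control) = 0.5 (an assumption of mine; the slide does not state the priors used, and the quoted 1.87 matches the ratio P(smoker|case)/P(smoker|control)):

```r
# The smoking example under equal priors, so the 0.5s cancel.
p_s_case <- 51/83; p_s_ctrl <- 23/70    # P(smoker|case), P(smoker|control)
p_case_s  <- p_s_case / (p_s_case + p_s_ctrl)                    # P(case|smoker)
p_case_ns <- (1 - p_s_case) / ((1 - p_s_case) + (1 - p_s_ctrl))  # P(case|nonsmoker)
p_case_s / p_case_ns      # risk ratio ≈ 1.79 under these priors
p_s_case / p_s_ctrl       # ≈ 1.87, the figure quoted on the slide
```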