
Page 1: Introduction to Bayesian Statistics

Introduction to Bayesian Statistics

Harry R. Erwin, PhD

School of Computing and Technology

University of Sunderland

Page 2: Introduction to Bayesian Statistics

Resources

• Albert, Jim (2007) Bayesian Computation with R, Springer.

• Ntzoufras, Ioannis (2009) Bayesian Modeling Using WinBUGS, Wiley.

• Kéry, Marc (2010) Introduction to WinBUGS for Ecologists, Academic Press.

Page 3: Introduction to Bayesian Statistics

Topics

• Probability

• Bayes’ Theorem

• Bayesian Statistics

Page 4: Introduction to Bayesian Statistics

Basic Definitions

• Suppose Ω is a sample space—the set of outcomes, ω, of an experiment—for example the possible results of flipping a coin or rolling a die.

• Suppose F is the collection of possible events (subsets of Ω) involving outcomes in Ω, including:– An empty event, Ø, with no outcomes belonging to it.– Simple events consisting of single outcomes.– Complex events consisting of multiple alternative outcomes (e.g., rolling an even number on a six-sided die).

• An event in F that combines the outcomes in two other events, A and B, is called the union of A and B and is written A∪B.

• An event in F made up of the outcomes present in both A and B is known as the intersection of A and B and is written A∩B.
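
As a quick illustration, these set operations are built into base R; a minimal sketch for the die example (the event definitions here are my own, for illustration):

    omega <- 1:6                      # sample space: faces of a six-sided die
    A <- c(2, 4, 6)                   # event: roll an even number
    B <- c(3, 6)                      # event: roll a multiple of three
    union(A, B)                       # A ∪ B -> 2 4 6 3
    intersect(A, B)                   # A ∩ B -> 6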

Page 5: Introduction to Bayesian Statistics

Probability Measures

• A probability measure, P, is a function that assigns to each event in F a real number between 0 and 1, called the probability of the event, satisfying the following requirements:– P(Ω) = 1.– If A and B are two disjoint events in F, that is, A∩B = Ø, then P(A∪B) = P(A)+P(B). (This rule must also hold for any countable number of pairwise disjoint events.)
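
A minimal R sketch of these axioms for a fair die, assuming equally likely outcomes (the helper function P is mine, not from the slides):

    P <- function(A) length(A) / 6    # uniform measure on a fair die
    P(1:6)                            # P(Ω) = 1
    A <- c(1, 2); B <- c(5, 6)        # disjoint events: A ∩ B = Ø
    P(union(A, B)) == P(A) + P(B)     # additivity holds: TRUE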

Page 6: Introduction to Bayesian Statistics

Conditional Probability

• The “conditional probability of B given A”, written P(B|A), describes the probability of an outcome being in B given that it is known to be in A.

• P(B|A) = P(A∩B)/P(A).

• For example, let A be even rolls of a fair six-sided die, and B be rolls that are a multiple of three.

• A = {2,4,6}, B = {3,6}, A∩B = {6}, and P(B|A) = P(A∩B)/P(A) = (1/6)/(1/2) = 1/3.
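
The same computation in R, reusing the uniform die measure sketched earlier (an illustrative assumption, not code from the slides):

    P <- function(A) length(A) / 6
    A <- c(2, 4, 6); B <- c(3, 6)
    P(intersect(A, B)) / P(A)         # P(B|A) = (1/6)/(1/2) = 1/3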

Page 7: Introduction to Bayesian Statistics

Probability Models

• The triple <Ω, F, P> is called a probability model.

• Some theorems can be easily proven:1. Let Ø be an event in which there are no outcomes. Then P(Ø) = 0.2. Define ¬A to be the event consisting of all the outcomes not in A. Then P(¬A) = 1 – P(A).3. If A∩B = Ø, then P(A∪B) = P(A)+P(B).
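
A quick numeric check of the complement rule in R, under the same fair-die assumption as before:

    omega <- 1:6
    P <- function(A) length(A) / 6
    A <- c(2, 4, 6)
    P(setdiff(omega, A)) == 1 - P(A)  # P(¬A) = 1 − P(A): TRUE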

Page 8: Introduction to Bayesian Statistics

Bayes’ Theorem

• Bayes’ theorem is a provable consequence of these axioms (Wikipedia):

P(A|B) = P(B|A)P(A)/P(B)

• That is, the probability of A given B is the probability of B given A multiplied by the probability of A and divided by the probability of B.

• Also, P(A|B) ∝ P(B|A)P(A)
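
As a sketch, Bayes’ theorem is a one-line R function; the function name and the example numbers are mine, chosen to match the die events used earlier (P(B|A) = 1/3, P(A) = 1/2, P(B) = 1/3):

    bayes <- function(p_b_given_a, p_a, p_b) p_b_given_a * p_a / p_b
    bayes(1/3, 1/2, 1/3)              # P(A|B) = (1/6)/(1/3) = 1/2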

Page 9: Introduction to Bayesian Statistics

What Are Bayesian Statistics?

• Bayesian statistics are the working out of the implications of Bayes’ Theorem.

• They allow you to deduce the posterior (‘afterwards’) probability distribution of an event if you know the prior (‘before’) probability distribution of the event and have some additional information.

• It’s a theorem, so it is always true.

Page 10: Introduction to Bayesian Statistics

Why is Bayes’ Theorem Useful?

• If the prior probability distribution is ‘vague’ or ‘noninformative’, you can incrementally add information to produce a posterior distribution that reflects just that information. Such a posterior is very similar to the distribution you would arrive at using classical statistics.

• If you start with real information in your prior, that is also taken into account, which is even more useful.

Page 11: Introduction to Bayesian Statistics

Density Functions

• You often have a ‘density’ function that is a good model of how events are distributed. A few typical density functions include (see the R sketch after this list):1. The binomial distribution (one parameter, the probability of a ‘heads’)2. The Poisson distribution (one parameter, the mean number of occurrences in a unit time interval)3. The exponential distribution (one parameter, the rate)4. The normal distribution (two parameters, the mean and the variance)5. The uniform distribution (two parameters, the beginning and the end)
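
All five densities are built into R; a minimal sketch evaluating each one (the parameter values are arbitrary, chosen only for illustration):

    dbinom(1, size = 1, prob = 0.5)   # binomial: P(one head in one flip)
    dpois(2, lambda = 3)              # Poisson: P(2 events), mean 3 per interval
    dexp(1, rate = 2)                 # exponential density at 1, rate 2
    dnorm(0, mean = 0, sd = 1)        # normal density at 0 (note: R takes sd, not variance)
    dunif(0.5, min = 0, max = 1)      # uniform density at 0.5 on [0, 1]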

Page 12: Introduction to Bayesian Statistics

Likelihood

• Suppose you have an event, ω, drawn from a process described by a density. The probability of the event is then the value of the density for that event.

• This is the ‘likelihood’ of that event.

• If you have multiple samples, the corresponding likelihood is the product of the density values for each of the events.
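
In R the product of density values is a one-liner; a sketch assuming normally distributed samples (the data and parameters are made up):

    x <- c(4.8, 5.1, 5.3)             # hypothetical samples
    prod(dnorm(x, mean = 5, sd = 1))  # likelihood of the sample under N(5, 1)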

Page 13: Introduction to Bayesian Statistics

Maximum Likelihood

• Suppose you have a probability density function, f(x, θ), where θ is a parameter, such as the mean, that you want to estimate. If you have n data samples, x1, …, xn, the most likely value of θ is the one that maximizes the value of the likelihood.

• Mathematically, the likelihood is Πf(xi, θ): the joint distribution of the sample, i.e., the product of all the f(xi, θ) values.

• You can often use calculus to solve for θ.
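
A hedged R sketch of maximum likelihood for a normal mean, maximizing the log-likelihood numerically with optimize (the data are made up; for the normal mean the maximizer is just the sample mean, which makes the result easy to check):

    x <- c(4.8, 5.1, 5.3)
    loglik <- function(theta) sum(dnorm(x, mean = theta, sd = 1, log = TRUE))
    optimize(loglik, interval = c(0, 10), maximum = TRUE)$maximum  # ≈ mean(x)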

Page 14: Introduction to Bayesian Statistics

Bayes and Maximum Likelihood

• Suppose you have a prior distribution, f(θ), and some data, described by a likelihood function, li(data|θ). The posterior distribution, f(θ|data), can be calculated by applying Bayes’ Theorem:– f(θ|data) ∝ li(data|θ) f(θ)
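
A minimal R sketch of this relation, using a grid approximation for a coin’s heads probability; the grid, the data (7 heads in 10 flips), and the flat prior are my assumptions, in the spirit of Albert’s Bayesian Computation with R:

    theta <- seq(0.001, 0.999, length.out = 999)  # grid over the parameter
    prior <- rep(1, length(theta))                # flat ('noninformative') prior
    lik   <- dbinom(7, size = 10, prob = theta)   # likelihood of 7 heads in 10 flips
    post  <- lik * prior                          # posterior ∝ likelihood × prior
    post  <- post / sum(post)                     # normalize over the grid
    theta[which.max(post)]                        # posterior mode ≈ 0.7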

Page 15: Introduction to Bayesian Statistics

Worked Example

• 51 smokers in 83 cases of lung cancer

• 23 smokers in 70 disease-free controls

• P(smoker|case) = 51/83

• P(smoker|control) = 23/70

• P(case|smoker) = P(smoker|case)P(case)/(P(smoker|case)P(case)+P(smoker|control)P(control))

• Relative risk = RR = P(case|smoker)/P(case|nonsmoker) = 1.87
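
A sketch of this calculation in R. Note that the slide’s formula needs a prevalence P(case), which the slides do not state; the value below is my assumption, and the resulting RR depends on it:

    p_smoker_case    <- 51 / 83       # P(smoker|case)
    p_smoker_control <- 23 / 70       # P(smoker|control)
    p_case <- 0.5                     # assumed prevalence (not given on the slide)
    p_case_smoker <- p_smoker_case * p_case /
      (p_smoker_case * p_case + p_smoker_control * (1 - p_case))
    p_case_nonsmoker <- (1 - p_smoker_case) * p_case /
      ((1 - p_smoker_case) * p_case + (1 - p_smoker_control) * (1 - p_case))
    p_case_smoker / p_case_nonsmoker  # RR; the value varies with the assumed p_case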