Statistical Methods for Astronomy (ircamera.as.arizona.edu/Astr_518/Statistics-Lecture2.pdf)

Page 1: Statistical Methods for Astronomy

● Lecture 1
● Why do we need statistics?
● Definitions
● Statistical distributions
    ○ Binomial Distribution
    ○ Poisson Distribution
    ○ Gaussian Distribution
● Central Limit theorem
● Least Squares
    ○ chi-squared
    ○ significance
● Lecture 2
● Your Statistical Toolbox
● Bayes' theorem
● F-test
● KS-test
● Monte Carlo method
● transforming deviates

“If your experiment needs statistics, you ought to have done a better experiment.” - Ernest Rutherford

Page 2: References

● “Data Reduction and Error Analysis”, Bevington and Robinson
● “Practical Statistics for Astronomers”, Wall and Jenkins
● “Numerical Recipes”, Press et al.
● “Understanding Data Better with Bayesian and Global Statistical Methods”, Press, 1996 (on astro-ph)

Page 3: Another look at the problem

● Knowing the distribution allows us to predict what we will observe.
● We often know what we have observed and want to determine what that tells us about the distribution.

Page 4: Bayesian Statistics

● “Frequentist” approaches are computationally easy, but often solve the inverse of the problem we want.
● Bayesian approaches use both the data and any “prior” information to develop a “posterior” distribution.
    ○ Allows calculation of parameter uncertainty more directly.
    ○ More easily incorporates outside information.

Page 5: An Example

● I flip a coin 10 times and obtain 7 heads. What is the probability of flipping heads?
    ○ A frequentist statistician would say 0.7.
    ○ A Bayesian statistician might define a prior probability with mean = 0.5 and sigma = 0.2 (for example).

Who would you side with?

Page 6: Obtaining the Posterior Distribution

● Bayes' Theorem states:

$$P(B \mid A) = \frac{P(A \mid B)\, P(B)}{P(A)}$$

P(A | B) should be read as “probability of A given B”.

● A is typically the data, so P(A) = P(data); B is the statistic we want to know.
● P(B) is the “prior” information we may know about the experiment.
● P(data) is just a normalization constant, so

$$P(B \mid \mathrm{data}) \propto P(\mathrm{data} \mid B)\, P(B)$$
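As a rough illustration of posterior ∝ likelihood × prior, the sketch below applies it to a coin that shows 7 heads in 10 flips, with a Gaussian prior of mean 0.5 and sigma 0.2 on the heads probability p. The function name `coin_posterior`, the grid of 1001 points, and the truncation of the prior to the interval 0 to 1 are assumptions made for this example, not part of the lecture.

```python
import math

def coin_posterior(heads=7, flips=10, prior_mean=0.5, prior_sigma=0.2, steps=1001):
    """Return (grid, posterior) for the heads probability p on a uniform grid.

    Hypothetical helper: the Gaussian prior is simply truncated to [0, 1].
    """
    grid = [i / (steps - 1) for i in range(steps)]
    unnormalized = []
    for p in grid:
        # P(data | p): binomial likelihood, up to a constant factor
        likelihood = p**heads * (1.0 - p)**(flips - heads)
        # P(p): Gaussian prior, up to a constant factor
        prior = math.exp(-0.5 * ((p - prior_mean) / prior_sigma)**2)
        unnormalized.append(likelihood * prior)
    total = sum(unnormalized)  # plays the role of P(data), the normalization
    posterior = [u / total for u in unnormalized]
    return grid, posterior

grid, posterior = coin_posterior()
post_mean = sum(p * w for p, w in zip(grid, posterior))
p_map = grid[max(range(len(grid)), key=posterior.__getitem__)]
print(f"posterior mean = {post_mean:.3f}, most probable p = {p_map:.3f}")
```

With these inputs the posterior peaks between the prior mean (0.5) and the raw frequency (0.7), which is the compromise the Bayesian approach is expected to give.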

Page 7: Using Bayes' theorem

● Assume we are looking for faint companions, and expect them to be around 1% of the stars we observe.
● From putting in fake companions we know that we can detect planets 90% of the time.
● We also know that we see “false” planets in 3% of the observations.
● What is the probability that an object we see is actually a planet?

$$P(\mathrm{planet}) = 0.01, \qquad P(\mathrm{det.} \mid \mathrm{planet}) = 0.9, \qquad P(\mathrm{det.} \mid \mathrm{no\ planet}) = 0.03$$

$$P(\mathrm{planet} \mid \mathrm{det.}) = \frac{P(\mathrm{det.} \mid \mathrm{planet})\, P(\mathrm{planet})}{P(\mathrm{det.})}$$

$$P(\mathrm{planet} \mid \mathrm{det.}) = \frac{P(\mathrm{det.} \mid \mathrm{planet})\, P(\mathrm{planet})}{P(\mathrm{det.} \mid \mathrm{planet})\, P(\mathrm{planet}) + P(\mathrm{det.} \mid \mathrm{no\ planet})\, P(\mathrm{no\ planet})}$$

$$P(\mathrm{planet} \mid \mathrm{det.}) = \frac{0.9 \times 0.01}{0.9 \times 0.01 + 0.03 \times 0.99} = 0.23$$
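The arithmetic of the planet example can be checked with a few lines of Python. This is only a sketch; the function name `posterior_planet` and its argument names are chosen here for illustration.

```python
def posterior_planet(p_planet, p_det_given_planet, p_det_given_no_planet):
    """Bayes' theorem for a detection: returns P(planet | det.)."""
    p_no_planet = 1.0 - p_planet
    # Total probability of a detection, real or false
    p_det = (p_det_given_planet * p_planet
             + p_det_given_no_planet * p_no_planet)
    return p_det_given_planet * p_planet / p_det

p = posterior_planet(0.01, 0.9, 0.03)
print(round(p, 2))  # 0.23
```

Even with a 90% detection rate, most detections are false because real companions are rare compared with the 3% false-alarm rate.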

Page 8: General Bayesian Guidance

● Focuses on probability rather than accept/reject.
● Bayesian approaches allow you to calculate the probability that the parameters lie in a given range of values in a more straightforward way.
● A common concern about Bayesian statistics is that it is subjective. This is not necessarily a problem.
● Bayesian techniques are generally more computationally intensive, but this is rarely a drawback for modern computers.

Page 9: Hypothesis Testing

● Hypothesis testing uses some metric to determine whether two data sets, or a data set and a model, are distinct.
● Typically, the problem is set up so that the hypothesis is that the data sets are consistent (the null hypothesis).
● A probability is calculated that the value found would be obtained again with another sample.
● Based on the required level of confidence, the hypothesis is rejected or accepted.

Page 10: Are two data sets drawn from the same distribution?

● The “t” statistic quantifies the likelihood that the means are the same.
● The “F” statistic quantifies the likelihood that the variances of two data sets are the same.
● Consider two data sets, x and y, with n and m data points:

$$t = \frac{\bar{x} - \bar{y}}{s\sqrt{1/m + 1/n}}, \qquad s^2 = \frac{n S_x + m S_y}{n + m}$$

$$F = \frac{\sum_i (x_i - \bar{x})^2 / (n - 1)}{\sum_j (y_j - \bar{y})^2 / (m - 1)}$$

Page 11: Student's t test

● Calculate the t statistic. A perfect agreement is t = 0.
● Evaluate the probability of obtaining t greater than the calculated value.

$$t = \frac{\bar{x} - \bar{y}}{s\sqrt{1/m + 1/n}}, \qquad s^2 = \frac{n S_x + m S_y}{n + m}$$
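A minimal sketch of the t statistic as written on the slide follows. The slide does not define S_x and S_y; the code assumes they are the mean squared deviations of each sample (sums of squares divided by n and m). The more common textbook pooled form divides the combined sum of squares by n + m - 2 instead of n + m, so treat the result as illustrative. The name `t_statistic` is hypothetical.

```python
import math

def t_statistic(x, y):
    """t statistic for two samples, following the slide's formula.

    Assumption: S_x and S_y are sums of squared deviations divided by
    the sample sizes n and m (the slide does not define them).
    """
    n, m = len(x), len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / m
    s_x = sum((xi - x_bar) ** 2 for xi in x) / n
    s_y = sum((yj - y_bar) ** 2 for yj in y) / m
    s2 = (n * s_x + m * s_y) / (n + m)
    return (x_bar - y_bar) / math.sqrt(s2 * (1.0 / m + 1.0 / n))

t = t_statistic([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
print(f"t = {t:.3f}")
```

Turning t into a probability requires the Student's t distribution with the appropriate degrees of freedom, which is not in the Python standard library and is left out here.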

Page 12: F test

● Calculate the F statistic.
● Calculate the probability that F is greater than the calculated value.

$$F = \frac{\sum_i (x_i - \bar{x})^2 / (n - 1)}{\sum_j (y_j - \bar{y})^2 / (m - 1)}$$
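The F statistic is the ratio of the two unbiased sample variances, which the standard library computes directly. The sketch below assumes x has n points and y has m points; `f_statistic` is a name chosen for this example.

```python
import statistics

def f_statistic(x, y):
    """Ratio of sample variances, each computed with an (N - 1) divisor."""
    return statistics.variance(x) / statistics.variance(y)

f = f_statistic([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0])
print(f)  # 0.25
```

Here the second sample is the first scaled by 2, so its variance is 4 times larger and F = 0.25. Converting F to a probability requires the F distribution with (n - 1, m - 1) degrees of freedom, which is not covered by this sketch.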

Page 13: The Kolmogorov-Smirnov Test

● Calculate the cumulative distribution function for your model, C_model(x).
● Calculate the cumulative distribution function for your data, C_data(x).
● Find the maximum of |C_model(x) - C_data(x)|.
● The variable, x, must be continuous to use the K-S test.
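The steps above can be sketched in a few lines. The data's cumulative distribution is a staircase that jumps by 1/n at each sorted data point, so the largest gap to the model must occur just before or just after one of those jumps. The names `ks_statistic` and `uniform_cdf` are assumptions for this example, and the uniform model is only a stand-in.

```python
def ks_statistic(data, model_cdf):
    """Maximum absolute difference between the empirical and model CDFs."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        c_model = model_cdf(x)
        # The empirical CDF is (i - 1)/n just below x and i/n at x
        d = max(d, i / n - c_model, c_model - (i - 1) / n)
    return d

def uniform_cdf(x):
    """Model CDF for a uniform distribution on [0, 1]."""
    return min(max(x, 0.0), 1.0)

d = ks_statistic([0.25, 0.5, 0.75], uniform_cdf)
print(d)  # 0.25
```

The significance of a given maximum difference depends on the number of data points and is usually looked up from the K-S distribution, which this sketch does not attempt.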

Page 14: K-S test example

[Figure: K-S test example, with the distance D marked]

Page 15: Monte Carlo Simulation

● Often we may find it easiest just to replicate an experiment or observation in the computer.
● In general these tools are referred to as “Monte Carlo” methods.
● The general idea is to simulate randomness and reproduce observations for comparison with data.
● First we need a random number sequence.

Page 16: Creating Random numbers

● Generating a proper random sequence of numbers is a whole topic in itself. Numerical Recipes discusses this in some detail.
● A simple example of a random number generator is the sequence:

$$I_{j+1} = a\, I_j \bmod m$$

where a and m are large numbers. The initial value of I_j is the seed; a given seed always gives exactly the same sequence of random numbers.
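A minimal sketch of such a generator is shown below. The constants a = 16807 and m = 2^31 - 1 are an assumed choice (the "minimal standard" values), not ones given on the slide, and dividing by m to obtain values between 0 and 1 is likewise an assumption of this example. A generator this simple is for illustration only.

```python
A = 16807          # multiplier (assumed "minimal standard" choice)
M = 2**31 - 1      # modulus, a prime

def lcg(seed, count):
    """Return `count` integers from I_{j+1} = A * I_j mod M, starting at `seed`."""
    values = []
    state = seed
    for _ in range(count):
        state = (A * state) % M
        values.append(state)
    return values

def uniform_deviates(seed, count):
    """Scale the integer sequence into the open interval (0, 1)."""
    return [value / M for value in lcg(seed, count)]

print(lcg(1, 2))               # [16807, 282475249]
print(uniform_deviates(1, 3))
```

Calling `lcg` twice with the same seed returns the same list, which is what makes a simulation reproducible.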

Page 17: Random Numbers

● The example gives a “uniform” distribution of random numbers. That is,

$$P(x)\,dx = \begin{cases} dx & \text{if } 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$$

● We would like useful distributions, such as Poisson, etc. To do so, we need to transform the random numbers.

Page 18: Transformation Method

● Starting from the law for transformation of probabilities:

$$|p(y)\,dy| = |p(x)\,dx|$$

● We can rewrite to solve for the probability we want:

$$\left|\frac{dx}{dy}\right| = \frac{p(y)}{p(x)}$$

For a uniform deviate x, p(x) = 1, so

$$p(y) = \left|\frac{dx}{dy}\right|$$

1. Need to integrate the probability distribution.
2. Solve for the new variable (y) in terms of the uniform variable (x).

Page 19: Example

● I want to simulate the time between the arrival of photons at the detector. This is given by an exponential probability distribution:

$$P(t)\,dt = e^{-t}\,dt$$

● Use the transformation of probabilities:

$$e^{-t} = \left|\frac{dx}{dt}\right|$$

● Need to integrate:

$$\int e^{-t}\,dt = \int dx \quad\Rightarrow\quad e^{-t} = x \quad\Rightarrow\quad t = -\ln x$$

● A random number in the range 0 to 1 will be transformed to one which can be between infinity and 0.
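The transformation t = -ln x can be tried directly. The sketch below uses Python's built-in `random` module as the source of uniform deviates, which is an assumption of this example, and checks that the simulated arrival times have a mean near 1, as expected for P(t) = e^{-t}.

```python
import math
import random

def exponential_deviate(rng):
    """Transform one uniform deviate into an exponential deviate, t = -ln(x)."""
    x = 1.0 - rng.random()  # uniform in (0, 1]; avoids taking log(0)
    return -math.log(x)

rng = random.Random(0)  # fixed seed so the run is reproducible
times = [exponential_deviate(rng) for _ in range(100000)]
mean_time = sum(times) / len(times)
print(f"mean waiting time = {mean_time:.3f}")
```

Values of x near 1 map to t near 0, and values of x near 0 map to very long waiting times.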

Page 20: Limitations

● Transformation methods are limited to analytical probability distributions.
● One also needs to be able to integrate the probability distribution and invert the equation to solve for the new variable.
● Often one of these criteria is not satisfied. You can still generate useful random numbers using the rejection method.

Page 21: Rejection Method

● Generate two uniform random deviates, x and y.
● Adjust x to span the range of values expected for the random number (x' = f(x)).
● Compare the value of y to the value of the probability distribution at x' (y' = p(x')).
● If y < y', use the value of x' in your simulation; if y > y', reject this pair and start over.
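A minimal sketch of the rejection method follows, in which a draw is kept when y lies under the curve p(x'). It assumes y is scaled to run from 0 to the peak value of the distribution (`pdf_max`), a detail the slide leaves implicit. The target p(x) = 2x on the interval 0 to 1 and all function names are chosen here purely for illustration.

```python
import random

def rejection_sample(pdf, x_min, x_max, pdf_max, count, rng):
    """Draw `count` samples from `pdf` on [x_min, x_max] by rejection.

    Assumption: pdf_max is at least as large as the peak of pdf on the interval.
    """
    samples = []
    while len(samples) < count:
        x_prime = x_min + (x_max - x_min) * rng.random()  # x' = f(x)
        y = pdf_max * rng.random()                        # uniform in [0, pdf_max)
        if y < pdf(x_prime):                              # keep points under the curve
            samples.append(x_prime)
    return samples

def triangle_pdf(x):
    """Example target distribution: p(x) = 2x on [0, 1]."""
    return 2.0 * x

rng = random.Random(1)  # fixed seed so the run is reproducible
samples = rejection_sample(triangle_pdf, 0.0, 1.0, 2.0, 20000, rng)
sample_mean = sum(samples) / len(samples)
print(f"sample mean = {sample_mean:.3f} (expected 2/3 for p(x) = 2x)")
```

The price of the method is wasted draws: for this target, about half of the (x', y) pairs are rejected.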