Introduction to Statistical Inference Brandon Wales

Upload: doannhan

Post on 30-Mar-2018


Page 1: Introduction to Statistical Inferencegradquant.ucr.edu/.../Introduction-to-Statistical-InferenceDesign.pdf · Introduction A coin is tossed 25 times; can we determine if it is unbiased

Introduction

A coin is tossed 25 times; can we determine whether it is unbiased (probability of heads = 0.5)? The idea behind inferential statistics is that there is an unknown parameter (such as the probability of heads) and we want to use collected data to make inferences about it.

Descriptive vs. Inferential

Descriptive statistics are quantities used to describe the data (often sample data); they are computed from the data alone and do not depend on unknown parameters. Inferential statistics often uses descriptive statistics, but its objective is to answer the question: could this outcome have occurred by chance?

Major Sources of Error

If we had a fair coin and the tosses were random, would we always get half heads and half tails? We would not, because sampling itself introduces error, called sampling error. This error is caused by chance and cannot be controlled.
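A small simulation can make sampling error concrete. The sketch below is only an illustration: the helper name `count_heads`, the seed, and the choice of ten repetitions are assumptions made for this example, not part of the presentation. It repeats the 25-toss experiment with a fair coin and prints the number of heads in each run.

```python
import random


def count_heads(n_tosses, p_heads=0.5, rng=None):
    """Simulate n_tosses coin tosses and return the number of heads."""
    if rng is None:
        rng = random.Random()
    return sum(1 for _ in range(n_tosses) if rng.random() < p_heads)


# Fixed seed (arbitrary) so that the run is repeatable.
rng = random.Random(2018)

# Repeat the 25-toss experiment ten times with a fair coin.
head_counts = [count_heads(25, rng=rng) for _ in range(10)]
print(head_counts)
```

With 25 tosses an exact half-and-half split is impossible, and the counts typically vary from run to run even though the coin is fair.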

Major Sources of Error

Imagine someone flipping a fair coin in such a way that the outcome is always heads. This is considered sample bias. For this presentation, we will assume that all data were collected at random and that there is no significant source of sample bias.

Descriptive Statistics

Descriptive statistics include (but are not limited to) the mean, variance (and standard deviation), minimum, maximum, and median. They are used to characterize data, which will be essential to inferential statistics. Does anyone know of any other descriptive statistics?

Means

The mean is often referred to as a measure of central tendency, or location. A mean of 6 describes the location of the data but not its spread. For example, both of these sets of numbers have a mean of 6 but are very different: 5, 6, 7 and 1, 6, 11.

Variances

Variances are used to characterize the spread in the data. Since the variance is a measure of spread, the variance of 5, 6, 7 should be less than that of 1, 6, 11 (we will show how to calculate this in an example). Note: the mean and variance together fully characterize a normal distribution.

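Calculating Descriptive Statistics

The notation often used for the sample mean is $\bar{X}$ and for the sample variance is $S^2$:

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \quad \text{or} \quad \frac{X_1 + X_2 + X_3 + \dots + X_n}{n}$$

$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1} \quad \text{or} \quad \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \dots + (X_n - \bar{X})^2}{n-1}$$

*Calculate $\bar{X}$ and $S^2$ for: 5, 6, 7 and 1, 6, 11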
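The exercise can be checked with a few lines of Python. This is a minimal sketch that follows the two formulas directly; the function names `sample_mean` and `sample_variance` are chosen for this example only.

```python
def sample_mean(xs):
    """Sample mean: the sum of the observations divided by n."""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """Sample variance: squared deviations from the mean, divided by n - 1."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


for data in ([5, 6, 7], [1, 6, 11]):
    print(data, "mean =", sample_mean(data), "variance =", sample_variance(data))
```

Both sets have a mean of 6, while the variances are 1 and 25, which matches the idea that 1, 6, 11 is far more spread out than 5, 6, 7.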

Probability Distributions

There are numerous types of distributions, depending on the behavior of the variable.

A coin flip has two outcomes, so the number of heads in a fixed number of flips follows a binomial distribution. The number of accidents on the freeway is count data, which often follows a Poisson distribution. The average height of a large sample of people approximately follows a normal distribution.
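For the coin example, binomial probabilities can be computed directly from the standard binomial formula. The sketch below assumes 25 independent tosses of a fair coin; the function name `binomial_pmf` and the two events printed are illustrative choices, not taken from the slides.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n = 25  # number of tosses, as in the introductory example
probs = [binomial_pmf(k, n, 0.5) for k in range(n + 1)]

print(f"P(exactly 12 heads)  = {probs[12]:.4f}")
print(f"P(20 or more heads)  = {sum(probs[20:]):.4f}")
```

Probabilities like these are what later allow us to ask whether an observed number of heads is plausible for a fair coin.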

Central Limit Theorem

Under some regularity conditions, the mean of a large enough sample is approximately normally distributed. Since the mean and variance fully characterize a normal distribution, we can use the sample mean and sample variance to help us characterize the approximation.
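A simulation gives a feel for this result. The sketch below is an illustration under assumptions chosen for this example: samples of size 50 drawn from an exponential distribution with mean 1 and variance 1 (a clearly non-normal distribution), repeated 2000 times with an arbitrary fixed seed.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed (arbitrary) for a repeatable run
n = 50                  # size of each sample (assumed "large enough")
reps = 2000             # number of repeated samples

# Each sample comes from an exponential distribution with mean 1 and
# variance 1, which is strongly skewed and clearly not normal.
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

center = statistics.fmean(sample_means)  # expected to be close to 1
spread = statistics.stdev(sample_means)  # expected to be close to 1 / sqrt(n)
standard_error = 1 / math.sqrt(n)

# For a normal distribution, about 95% of values fall within 1.96 standard
# deviations of the mean, so we check that fraction for the sample means.
within = sum(abs(m - 1.0) <= 1.96 * standard_error for m in sample_means) / reps

print(f"mean of sample means:  {center:.3f}")
print(f"sd of sample means:    {spread:.3f} (theory: {standard_error:.3f})")
print(f"fraction within 1.96 standard errors: {within:.3f}")
```

If the normal approximation is reasonable, the last fraction should land near 0.95 even though the individual observations are far from normal.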

T distribution

Under the assumption that the data are approximately normal, we can use a T distribution (with specified degrees of freedom) to calculate the probabilities that are used for inferential statistics.

If we know the data are exactly normal and we know the exact variance, then the distribution of the sample mean will be exactly normal.
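To see how a T distribution differs from a normal distribution, the two densities can be evaluated side by side. This is a sketch using the standard density formulas; the function names and the degrees of freedom shown (4 and 24) are choices made for this illustration.

```python
from math import exp, lgamma, pi, sqrt


def normal_pdf(x):
    """Density of the standard normal distribution."""
    return exp(-x * x / 2) / sqrt(2 * pi)


def t_pdf(x, df):
    """Density of the T distribution with df degrees of freedom."""
    coef = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


print(" x    normal    T(4)      T(24)")
for x in (0.0, 1.0, 2.0, 3.0):
    print(f"{x:3.1f}  {normal_pdf(x):.4f}    {t_pdf(x, 4):.4f}    {t_pdf(x, 24):.4f}")
```

The T density is lower at the center and higher in the tails, and it moves closer to the normal density as the degrees of freedom grow.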

Comparison

Why do we care?

Inferential statistics relies on calculating, under some assumptions, the probability that the observed data happened by chance. We can often take advantage of the data being approximately normal to calculate these probabilities. If the data are not approximately normal, more complicated distributions are often used, but the idea is still the same.

Notation

Name                  Sample       Population
Mean                  $\bar{X}$    $\mu$
Variance              $S^2$        $\sigma^2$
Standard Deviation    $S$          $\sigma$

Hypothesis Testing

Hypothesis testing is the link between descriptive statistics and determining whether a descriptive statistic is the result of chance. We will work out hypothesis tests in simpler cases, but the ideas carry over to more complicated cases.

Hypothesis Testing Framework

1. Set up the null and alternative hypotheses.
2. Calculate a test statistic (often using common descriptive statistics).
3. Make a rejection decision using the p-value based on the test statistic.
4. Draw a conclusion based on the rejection decision.

Setting up Null and Alternative Hypothesis

I claim the average height of a UCR student is greater than 5.75 feet (5 feet, 9 inches). The hypotheses are:

$H_0: \mu = 5.75$ ($\mu \le 5.75$ has an identical test)
$H_a: \mu > 5.75$

The null hypothesis is $H_0$ and the alternative hypothesis is $H_a$.

Structure of Null and Alternative

• $H_0$ has the equality sign, and there is never an equality sign in $H_a$.
• $H_a$ can be one of 3 different things (for this example): $H_a: \mu < 5.75$; $H_a: \mu \ne 5.75$; $H_a: \mu > 5.75$.
• $H_a$ reflects the question being asked.

Why are these incorrect?

$H_0: \mu > 5.75$
$H_a: \mu = 5.75$

$H_0: \mu = 5.75$
$H_a: \mu \ge 5.75$

$H_0: \bar{X} = 5.75$
$H_a: \bar{X} > 5.75$

Calculating a Test Statistic

Let’s say that we collected the heights of a sample of 25 UCR students and found $\bar{X} = 5.9$ and $S = 0.75$.

Our test statistic would be: $T_{n-1} = \dfrac{\bar{X} - \mu_0}{S / \sqrt{n}}$

How is this test statistic formed and why do we use it?

Test Statistic

We are using this test statistic because:
- We are assuming mean height is approximately normal
- We do not have the population standard deviation $\sigma$, so our approximation is a T distribution
- The UCR students were randomly sampled
- *Uniformly Most Powerful

What do we do if our assumption is not met?

Rejection Rule

Our T test statistic is calculated to be:

$$T_{24} = \frac{5.9 - 5.75}{0.75 / \sqrt{25}} = \frac{0.15}{0.15} = 1$$
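The same arithmetic can be written as a short function. This is a sketch only; the function name `one_sample_t` and its argument names are assumptions made for this example, while the numbers are the ones from the UCR height example.

```python
from math import sqrt


def one_sample_t(sample_mean, mu0, sample_sd, n):
    """Return the one-sample T statistic and its degrees of freedom."""
    standard_error = sample_sd / sqrt(n)  # S / sqrt(n)
    t_stat = (sample_mean - mu0) / standard_error
    return t_stat, n - 1


# Sample of 25 students with mean 5.9 and standard deviation 0.75,
# tested against the hypothesized mean of 5.75.
t_stat, df = one_sample_t(5.9, 5.75, 0.75, 25)
print(f"T_{df} = {t_stat:.3f}")  # prints "T_24 = 1.000"
```

Writing the standard error as its own step makes it clear that the statistic measures how many standard errors the sample mean lies from the hypothesized mean.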

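What is a p-value? A p-value is the probability, assuming $H_0$ is true, of obtaining a test statistic at least as extreme as the one we observed.

Since $H_a: \mu > 5.75$, this is a right-tailed test. Our data give $T = 1$, so the p-value is the probability of a value greater than this arising by chance under $H_0$: $P(T_{24} > 1)$. We will use a T table to calculate this.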

P-value and Level of Significance

• Since we have a one-tailed test, our T-value of 1 is between 0.685 and 1.318 (the T-table values for 24 degrees of freedom). This implies that our p-value is between 0.1 and 0.25.

• The level of significance, $\alpha$, determines our threshold for what we believe is chance.
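The table lookup can be cross-checked numerically. The sketch below is one possible approach, not the method used in the presentation: it integrates the T density with Simpson's rule using only the Python standard library. The function names and the number of integration steps are assumptions made for this example.

```python
from math import exp, lgamma, pi, sqrt


def t_pdf(x, df):
    """Density of the T distribution with df degrees of freedom."""
    coef = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=1000):
    """Approximate P(T_df > t) for t >= 0 using Simpson's rule on [0, t]."""
    if t < 0:
        raise ValueError("this sketch only handles t >= 0")
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    # The density is symmetric about 0, so half the probability lies above 0.
    return 0.5 - area_0_to_t


p_value = t_upper_tail(1.0, 24)
print(f"P(T_24 > 1) is approximately {p_value:.3f}")
```

The result is roughly 0.16, which falls inside the interval from 0.1 to 0.25 read from the T table.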

Level of Significance

• While the level of significance is chosen by the researcher, a typical level of significance is $\alpha = 0.05$.
• If our p-value is less than $\alpha$, we are saying that our data indicate the result is not by chance, and we reject $H_0$.
• If our p-value is greater than $\alpha$, we say that we do not have enough evidence to reject $H_0$.

Decision and Conclusion

Since our p-value is between 0.1 and 0.25, it is greater than 0.05, so we fail to reject $H_0$. There is insufficient evidence to indicate that $\mu > 5.75$. Does this mean we support that $\mu = 5.75$?

Conclusions

• While we did not have enough evidence to indicate $\mu > 5.75$, we are not stating that $\mu = 5.75$ or $\mu \le 5.75$.
• There could be a number of reasons why we did not have enough evidence, including a sample size that is too small or incorrect assumptions.
• While it is a possibility that $\mu = 5.75$, our conclusion does not reflect that possibility.

Summary

First, start out with the null and alternative hypotheses:

Are we testing means, variances, or another unknown parameter?

Calculate an appropriate test statistic based on assumptions:

Are the data normal, approximately normal, etc.?

Summary

• A p-value is calculated based on the test statistic and assumptions.
• Pick a level of significance (usually $\alpha = 0.05$).
• Decision rule:

Reject $H_0$ if p-value $\le \alpha$
Do not reject $H_0$ if p-value $> \alpha$
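The decision rule is simple enough to state as a function. This is a minimal sketch; the function name `decide` and the returned strings are chosen for this illustration.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 when the p-value is at most alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    return "reject H0" if p_value <= alpha else "fail to reject H0"


# In the height example the p-value lies between 0.1 and 0.25, so any
# value in that range leads to the same decision at alpha = 0.05.
print(decide(0.16))
print(decide(0.01))
```

For the height example this gives "fail to reject H0", in line with the conclusion reached from the T table.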

Summary

Conclusion:

Reject $H_0$: There is sufficient evidence to indicate {whatever is stated by $H_a$}.

Fail to reject $H_0$: There is insufficient evidence to indicate {whatever is stated by $H_a$}.

Failure to reject $H_0$ does not mean that we support $H_0$.

Note

There are many different types of test statistics that correspond to specific assumptions. While the test statistics may rely on more complicated distributions and probabilities, the hypothesis testing framework is the same.