Statistics Introductory
Introduction to Hypothesis Testing

Oscar BARRERA

April 10, 2018

Introduction to Hypothesis Testing

Oscar BARRERA Statistics Intermediate

Inferential Statistics and Hypothesis Testing

So far we have dealt mainly with descriptive statistics. Now let's go into inferential statistics and hypothesis testing.

Definition 1. Inferential statistics
Statistical procedures whose outcomes can be used to make inferences about what would likely be found in a population.

Definition 2. Hypothesis testing
The process of determining whether the weight of the evidence is sufficient for an outcome to be statistically significant.

Example:

Say we provide the supplement Baguette to a sample of college freshmen and give a placebo to another sample of freshmen.
We find that, at the end of the year, the freshmen who took Baguette have a higher mean GPA than the students who took the placebo.
There appears to be a relationship between taking Baguette versus a placebo and GPA.

To assess this relationship we use inferential statistics.

IMPORTANT NOTE
Two things may be numerically different, but that does not mean the difference is statistically significant.
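
To see why a numerical difference alone is not enough, the following sketch draws two samples from the same simulated population, so any gap between their means comes from sampling alone. The population values (mean 2.80, standard deviation 1), the sample size, the seed, and the function name are illustrative assumptions, not data from a real study.

```python
import random
import statistics


def sample_mean_gap(n=25, mu=2.80, sigma=1.0, seed=0):
    """Draw two samples from the SAME normal population; return both means and their gap."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    group_a = [rng.gauss(mu, sigma) for _ in range(n)]
    group_b = [rng.gauss(mu, sigma) for _ in range(n)]
    mean_a = statistics.fmean(group_a)
    mean_b = statistics.fmean(group_b)
    return mean_a, mean_b, mean_a - mean_b


mean_a, mean_b, gap = sample_mean_gap()
print(f"mean A = {mean_a:.2f}, mean B = {mean_b:.2f}, gap = {gap:+.2f}")
```

The two means will almost never be identical even though no treatment was applied, which is why a test is needed before calling a difference significant.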

Inferential statistics boil down to a test statistic, which is the ratio of the size of a relationship or effect to some measurement of sampling error.

$$\text{test statistic} = \frac{\text{effect}}{\text{error}}$$

We generally want the ratio of effect to error to be large. But how large does the ratio need to be for a relationship to be statistically significant?
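
As a rough illustration of the ratio, the sketch below holds a hypothetical effect fixed and varies the error term. The function name and the numbers are assumptions chosen only to show that the same effect yields a larger test statistic when sampling error is smaller.

```python
def effect_to_error_ratio(effect, error):
    """Generic test statistic: size of the effect divided by the sampling error."""
    if error <= 0:
        raise ValueError("error must be positive")
    return effect / error


effect = 0.5  # hypothetical effect size
for error in (1.0, 0.5, 0.25):  # hypothetical amounts of sampling error
    ratio = effect_to_error_ratio(effect, error)
    print(f"effect={effect}, error={error}, ratio={ratio:.1f}")
```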

In any study, one must explicitly state the predictions of the null and alternate hypotheses:

The null hypothesis (H0) predicts that the expected relationship will not be observed.
The alternate hypothesis (H1) predicts that the expected relationship will be observed.

From the Baguette example:

The null hypothesis predicts that students taking Baguette will have GPAs that do not differ from the GPAs of students taking a placebo.
The alternate hypothesis predicts that students taking Baguette will have GPAs that do differ from those of students taking a placebo.

$$H_0: \mu_{\text{baguette}} = \mu_{\text{placebo}}$$

$$H_1: \mu_{\text{baguette}} \neq \mu_{\text{placebo}}$$

The symbols used to state the null and alternate hypotheses should be in terms of their populations. Why?

REMEMBER
Inferential statistics use sample data to make inferences about what should be found in the population from which the sample came; thus, hypotheses should reflect the inferences.

Null Hypothesis Significance Testing (NHST)

Some Characteristics

NHST may seem a little strange, convoluted, and flawed at first. Indeed, some critics of hypothesis testing believe it should be banned.

Null hypothesis significance testing is a tool for making inferences from sample statistics about unknown parameters.
Unfortunately, some people do not take into account many of the parameters of NHST and decide a relationship exists when the relationship is weak.
In NHST, you start by assuming the null hypothesis is true: without any evidence to the contrary, you assume there is no relationship between the variables under study.

$$p(\text{Outcome} \mid H_0 = \text{True})$$

That is, the probability of observing our result if the null hypothesis were true.

Critically

$p(\text{Outcome} \mid H_0 = \text{True})$ does NOT tell you the probability that the null is true; it is the probability of observing the result assuming the null is true.

If the probability of observing a particular outcome, given that the null is true, is low, the null hypothesis is rejected.

If the probability of observing the data is very low (assuming the null is true), then the null is unlikely to be correct and we reject the null hypothesis.
Once the null hypothesis is rejected, we accept the alternate hypothesis.
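
The idea of computing a probability under the assumption that the null is true can be sketched with a deliberately simple case. The coin-flip scenario, the counts (60 heads in 100 flips), and the function name below are hypothetical and do not come from the Baguette example; note also that in practice the "outcome" is taken to mean a result at least as extreme as the one observed.

```python
from math import comb


def prob_at_least(k, n, p_null=0.5):
    """P(X >= k) for X ~ Binomial(n, p_null): how likely a result this extreme is if H0 is true."""
    return sum(
        comb(n, i) * p_null**i * (1 - p_null) ** (n - i)
        for i in range(k, n + 1)
    )


# H0: the coin is fair. Observed (hypothetical) outcome: 60 heads in 100 flips.
p_value = prob_at_least(60, 100)
print(f"p(60 or more heads | fair coin) = {p_value:.4f}")
```

The number printed is a statement about the data under the null, not the probability that the coin is fair.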

But at what point is $p(\text{Outcome} \mid H_0 = \text{True})$ low enough that the null hypothesis can be rejected?

The conventionally accepted level of statistical significance is p = .05 or less; this threshold is called the alpha level (α).

Assuming the null hypothesis is true, if the probability of observing an outcome is p = .05 or less, then it is unlikely this result would be obtained if the null were true, so the null is rejected.

Important:
The alpha level is not the probability that the null hypothesis is true or false; it is the cutoff applied to the probability of observing a particular outcome given that the null hypothesis is true.
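
The decision rule can be written down as a small sketch. The function name and the example p-values are assumptions for illustration; the comparison follows the "p = .05 or less" wording of the rule.

```python
def nhst_decision(p_value, alpha=0.05):
    """Reject H0 when the p-value is at or below alpha; otherwise retain H0."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    return "reject H0" if p_value <= alpha else "retain H0"


print(nhst_decision(0.03))  # hypothetical p-value below alpha
print(nhst_decision(0.20))  # hypothetical p-value above alpha
```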

The z-test

The z-test compares a sample mean to a population mean (µ) to determine whether the difference between the means is statistically significant. To use the z-test:

A sample must be drawn from a population with a known mean (µ) and standard deviation (σ).
The z-test then calculates a z-score for the sample mean with respect to the population mean.
The greater the difference between the sample mean and the population mean, the larger the z-score, and the less likely it is that the sample mean is statistically equivalent to the population mean.

Back to the example

Assume we predict that taking Baguette will have an effect on freshman GPA.

We randomly sample n = 25 freshmen taking Baguette.
At the end of their freshman year we ask for their GPA, and we find this sample has a mean GPA of 3.20.
Let's say we also know that the GPA of all freshmen at this university (including the 25 students in the sample) has mean µ = 2.80 and standard deviation σ = 1.
Because we know µ and σ, we can use the z-test to determine whether the difference between the sample mean and µ is statistically significant.

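Now we have to distinguish between directional (one-tailed) and non-directional (two-tailed) hypotheses.

Directionality is related to the alternate hypothesis:
A directional alternate hypothesis predicts the sample mean will be, specifically, greater than or less than the population mean.
A non-directional alternate hypothesis predicts only that the sample mean will differ from the population mean, without specifying the direction.

From the Baguette example:

Directional:

$$H_0: \mu_{\text{Bag}} = 2.80$$

$$H_1: \mu_{\text{Bag}} > 2.80$$

Non-directional:

$$H_0: \mu_{\text{Bag}} = 2.80$$

$$H_1: \mu_{\text{Bag}} \neq 2.80$$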

Alpha level

Two important characteristics:
The alpha level is a probability; it is the probability associated with statistical significance.
The alpha level is an area under the tail end of a distribution.
So far so good: the alpha level is just a probability under a distribution.
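
Because the alpha level is a tail area, it can be converted into a cutoff on the z scale. The sketch below does this with Python's standard-library `statistics.NormalDist`; the function name is an assumption, and splitting α evenly across both tails in the two-tailed case is the usual convention rather than something stated above.

```python
from statistics import NormalDist


def critical_z(alpha=0.05, two_tailed=False):
    """z cutoff beyond which the tail area (one tail, or both tails combined) equals alpha."""
    tail = alpha / 2 if two_tailed else alpha  # two-tailed: alpha is split between the tails
    return NormalDist().inv_cdf(1 - tail)


print(f"one-tailed: reject H0 if z > {critical_z():.3f}")
print(f"two-tailed: reject H0 if |z| > {critical_z(two_tailed=True):.3f}")
```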

[Figure: Directional (one-tailed) test]

[Figure: Non-directional (two-tailed) test]

The z-test calculates the z-score of a sample mean with respect to a population mean. The z statistic is simply the difference between a sample mean and its population mean divided by the standard error of the sampling distribution of the mean (the standard error of the mean):

$$z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$$

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{1}{\sqrt{25}} = \frac{1}{5} = 0.2$$
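
The same arithmetic can be checked with a minimal sketch that plugs in the figures from the Baguette example (sample mean 3.20, µ = 2.80, σ = 1, n = 25). The function and parameter names are assumptions made for illustration.

```python
from math import sqrt


def z_statistic(sample_mean, mu, sigma, n):
    """z = (sample mean - population mean) / (sigma / sqrt(n))."""
    standard_error = sigma / sqrt(n)  # standard error of the mean
    return (sample_mean - mu) / standard_error


z = z_statistic(sample_mean=3.20, mu=2.80, sigma=1.0, n=25)
print(f"z = {z:.2f}")
```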

$$z = \frac{3.20 - 2.80}{0.2} = 2$$

This value (z = 2) is the obtained value (z_O), or test statistic, and tells us that the sample mean (3.20) is two standard errors above the population mean of 2.80.

Result
For a directional hypothesis, the one-tailed probability for z = 2 is p = .0228, which is less than the alpha level (.05).
For a non-directional hypothesis, look up the probability in column 3 of the z-table and then double that value (2 × .0228 = .0456). This p-value (p = .0456) is also less than the alpha level (.05).

Thus, for either alternate hypothesis there was a significant difference between the sample mean and the population mean.
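
The table lookup can be reproduced with the standard normal distribution in Python's standard library. This is a sketch under the assumption that the observed direction matches the predicted one in the one-tailed case; the function name is hypothetical. The exact two-tailed value is about .0455; the .0456 above comes from doubling the rounded table entry .0228.

```python
from statistics import NormalDist


def z_test_p_value(z, two_tailed=True):
    """Tail probability of a z at least this extreme under a standard normal null distribution."""
    # One-tailed use assumes the result fell in the predicted direction.
    upper_tail = 1 - NormalDist().cdf(abs(z))
    return 2 * upper_tail if two_tailed else upper_tail


p_one = z_test_p_value(2.0, two_tailed=False)
p_two = z_test_p_value(2.0, two_tailed=True)
print(f"one-tailed p = {p_one:.4f}, two-tailed p = {p_two:.4f}")
```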

Type 1 and 2 errors

Is hypothesis testing perfect?

Sadly, the answer is no. It is possible to make a mistake, because NHST is based on probabilities.

α = .05 means it is unlikely you would observe the outcome with the null being true, but there is still a small chance you would observe these data even with the null being true.
The sample mean varies from sample to sample, so you might obtain a statistically significant result simply due to random selection.

There are four combinations of your two statistical decisions regarding the null hypothesis and two possible realities regarding the null hypothesis. These are illustrated in the table below:

Decision        H0 is true (reality)    H0 is false (reality)
Reject H0       Type I Error            Correct decision
Retain H0       Correct decision        Type II Error

A Type I Error is committed when the null hypothesis is rejected, but in reality the null hypothesis is true and should be retained.
To decrease the chance of making a Type I Error, a smaller α can be selected (e.g., .01, .001, .0001).
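
A small simulation can make the Type I Error rate concrete. The sketch below repeatedly samples from a population in which the null is true and counts how often a two-tailed z-test rejects it anyway. The population values reuse the Baguette figures (µ = 2.80, σ = 1, n = 25); the number of trials, the seed, and the function name are assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def type_i_error_rate(trials=5000, n=25, mu=2.80, sigma=1.0, alpha=0.05, seed=1):
    """Fraction of simulated studies that reject H0 even though H0 is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed cutoff
    standard_error = sigma / sqrt(n)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]  # drawn with H0 true
        z = (fmean(sample) - mu) / standard_error
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


rate = type_i_error_rate()
print(f"simulated Type I error rate: {rate:.3f}")
```

The simulated rate should land close to α, and lowering `alpha` in the call lowers it accordingly.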

A Type II Error is committed when you fail to reject (that is, you retain) the null hypothesis, but in reality the null hypothesis is false and should be rejected.
The probability of making a Type II Error is called beta (β).
The probability of not making a Type II Error is equal to 1 − β; this is the power of a statistical test.
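
For a z-test, β and power can be computed directly once a specific true population mean is assumed. The sketch below does this for an upper-tailed test; treating 3.20 as the true mean of Baguette takers is purely a hypothetical assumption for illustration, as is the function name.

```python
from math import sqrt
from statistics import NormalDist


def one_tailed_power(true_mean, mu0, sigma, n, alpha=0.05):
    """Power of an upper-tailed z-test when the population mean is really true_mean."""
    standard_error = sigma / sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha)
    shift = (true_mean - mu0) / standard_error
    # Under this alternative, z ~ Normal(shift, 1); power is P(z > z_crit).
    return 1 - NormalDist(mu=shift, sigma=1).cdf(z_crit)


power = one_tailed_power(true_mean=3.20, mu0=2.80, sigma=1.0, n=25)
beta = 1 - power  # probability of a Type II Error under the assumed true mean
print(f"power = {power:.3f}, beta = {beta:.3f}")
```

When the assumed true mean equals the null value, the function returns α, which matches the Type I Error rate.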
