lecture 11_hypothesis testing

Probability, Statistics and Random Processes IC 210 Hypothesis Testing-1 Reference: Introductory statistics By Prem S. Mann available on Moodle – Chapter 9

Upload: ajaymeena

Post on 17-Sep-2015


  • Probability, Statistics and Random Processes (IC 210): Hypothesis Testing-1

    Reference: Introductory Statistics by Prem S. Mann (available on Moodle), Chapter 9

    lecture 7

  • Inferential Statistics

    Researchers use inferential statistics to address two broad goals:

    1. Estimate the value of population parameters

    2. Hypothesis testing

    Statistics: (1) model, (2) estimation, (3) hypothesis test.

  • Hypothesis testing

    The purpose of hypothesis testing is to determine whether there is enough statistical evidence in favor of a certain belief about a parameter.

    For example:

    1. A soft-drink company may claim that, on average, its cans contain 12 ounces of soda. A government agency may want to test whether or not such cans do contain, on average, 12 ounces of soda. Here we are to test a hypothesis about the population mean μ.

    2. According to some survey, 75% of the total charitable contributions in 2008 were given by individuals. An economist wants to check whether this percentage still holds this year. Here we are to test a hypothesis about the population proportion p.

  • Hypothesis testing

    Hypothesis testing is designed to detect significant differences: differences that are unlikely to have occurred by random chance alone.

    In the one-sample case, we compare a random sample (from a large group) to a population.

    We compare a sample statistic to a population parameter to see if there is a significant difference.

  • Nonstatistical Hypothesis Testing

    A criminal trial is an example of hypothesis testing without the statistics. Based on the available evidence, the judge or jury will make one of two possible decisions:

    1. The defendant is innocent or not guilty

    2. The defendant is guilty

    At the outset of the trial, the person is presumed not guilty. The prosecutor's efforts are to prove that the person has committed the crime and, hence, is guilty.

  • Nonstatistical Hypothesis Testing

    The null hypothesis is denoted by H0:

    H0: The person is not guilty

    The alternative hypothesis is denoted by H1:

    H1: The person is guilty

    In statistics, "the person is not guilty" is called the null hypothesis, and "the person is guilty" is called the alternative hypothesis. At the beginning of the trial it is assumed that the person is not guilty. The null hypothesis is usually the hypothesis that is assumed to be true to begin with.

  • Nonstatistical Hypothesis Testing

    Therefore, convicting the defendant is called rejecting the null hypothesis in favor of the alternative hypothesis. That is, the jury is saying that there is enough evidence to conclude that the defendant is guilty (i.e., there is enough evidence to support the alternative hypothesis).

    In statistics, the null hypothesis states that a given claim (or statement) about a population parameter is true.

  • Example: soft drink

    A soft-drink company claims that, on average, its cans contain 12 ounces of soda. In reality, this claim may not be true. However, we will initially assume that the company's claim is true (that is, the company is not guilty of cheating and lying). To test the claim of the soft-drink company, the null hypothesis is that the company's claim is true.

    H0: μ = 12 ounces

    The null hypothesis can also be written as μ ≥ 12 ounces, because the company's claim will still be true if the cans contain, on average, more than 12 ounces.

    H1: μ < 12 ounces

  • How do we judge the plausibility of the null hypothesis?

    The sample mean should be plausible under the sampling distribution of the mean.

    The further the observed value is from the mean of the expected distribution, the more significant the difference.

  • Plausibility of the null hypothesis

    The plausibility of the null hypothesis is judged by computing the probability p of observing a sample mean that is at least as deviant from the population mean as the value we have observed.

  • Plausibility of the null hypothesis

    This computation is simplified by converting to z-scores. Under the assumption of normality, we can determine this probability from a standard normal table.
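
    A minimal Python sketch of this computation, assuming the population standard deviation is known and using the standard library's normal distribution in place of a printed table. The numbers (a hypothesized mean of 12, σ = 0.3, n = 36, and a sample mean of 11.9) are hypothetical illustration values, not taken from the lecture.

```python
from math import sqrt
from statistics import NormalDist

def z_score(sample_mean, mu0, sigma, n):
    """Standardize a sample mean under H0: mu = mu0 (sigma assumed known)."""
    return (sample_mean - mu0) / (sigma / sqrt(n))

def two_sided_p_value(z):
    """Probability of a z at least this far from 0 in either direction."""
    return 2 * NormalDist().cdf(-abs(z))

# Hypothetical illustration values (not from the lecture).
z = z_score(sample_mean=11.9, mu0=12.0, sigma=0.3, n=36)
p = two_sided_p_value(z)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

    A small p here would mean the observed sample mean is implausible under the null hypothesis.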

  • Two Types of Error (in non-statistical example)

    The person has not committed the crime but is declared guilty. In this case, the court has made an error by punishing an innocent person. In statistics, this kind of error is called a type I or an α (alpha) error.

    The person has committed the crime but, because of lack of evidence, is declared not guilty. In this case, the court has committed an error by setting a guilty person free. In statistics, this kind of error is called a type II or a β (beta) error.

  • Two Types of Error (statistical example)

    A type I error occurs when H0 is actually true (that is, the cans do contain, on average, 12 ounces of soda), but it just happens that we draw a sample with a mean much less than 12 ounces and we wrongfully reject the null hypothesis H0.

    The value of α, called the significance level of the test, represents the probability of making a type I error. In other words, α is the probability of rejecting the null hypothesis when in fact it is true.

    α = P(H0 is rejected | H0 is true)

    Note: the size of the rejection region depends on the value assigned to α.

  • Two Types of Error (statistical example)

    A type II error occurs when the null hypothesis is actually false (that is, the soda contained in all cans is, on average, less than 12 ounces), but it happens by chance that we draw a sample with a mean that is close to or greater than 12 ounces and we wrongfully fail to reject H0.

    The value of β represents the probability of making a type II error, that is, the probability that H0 is not rejected when H0 is false.

    β = P(H0 is not rejected | H0 is false)

    The value of 1 - β is called the power of the test. It represents the probability of not making a type II error.
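
    To make α, β, and power concrete, here is a small Python sketch for a left-tailed z test of the soda claim. Everything numeric in it (σ = 0.3, n = 36, and an assumed true mean of 11.9 ounces) is hypothetical and chosen only for illustration; the lecture does not give these values.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical setup (not from the lecture): left-tailed z test of
# H0: mu = 12 against H1: mu < 12 with known sigma.
mu0, sigma, n = 12.0, 0.3, 36
alpha = 0.05
mu_true = 11.9  # assumed true mean, used only to evaluate beta

se = sigma / sqrt(n)
std_normal = NormalDist()

# Reject H0 when the sample mean falls below this cutoff.
cutoff = mu0 + std_normal.inv_cdf(alpha) * se

# beta = P(H0 is not rejected | true mean is mu_true)
beta = 1 - NormalDist(mu_true, se).cdf(cutoff)
power = 1 - beta
print(f"cutoff = {cutoff:.3f}, beta = {beta:.3f}, power = {power:.3f}")
```

    Lowering `alpha` moves the cutoff further into the left tail, which raises `beta` for the same sample size.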

  • Two Types of Error

    H0: Innocent

    Jury Trial
                     Actual Situation
    Verdict          Innocent          Guilty
    Innocent         Correct           Error
    Guilty           Error             Correct

    Hypothesis Test
                     Actual Situation
    Decision         H0 True                             H0 False
    Accept H0        1 - α                               Type II error (β), false negative
    Reject H0        Type I error (α), false positive    Power (1 - β)


  • Type I and Type II Errors

    Type I error (false rejection error): rejecting a true null hypothesis. The probability associated with this error is equal to α.

    Type II error (false acceptance error): failing to reject a false null hypothesis. The probability associated with this error is β.

    The two probabilities are inversely related: for a fixed sample size, decreasing one increases the other.

                                   Actual Situation
    Researcher's Decision          Null Hypothesis is True    Null Hypothesis is False
    Accept the Null Hypothesis     Correct decision           Type II error
    Reject the Null Hypothesis     Type I error               Correct decision

  • Note

    By rejecting H0, we are saying that the difference between the value of μ stated in H0 and the value of x̄ obtained from the sample is too large to have occurred because of sampling error alone. Consequently, this difference is real.

    By not rejecting H0, we are saying that the difference between the value of μ stated in H0 and the value of x̄ obtained from the sample is small and may have occurred because of sampling error alone.

  • Tailed Tests

    Two-tailed hypothesis test: a hypothesis test in which the region of rejection falls equally within both tails of the sampling distribution.

    One-tailed hypothesis test: a hypothesis test in which the alternative is stated in such a way that the probability of making a Type I error is entirely in one tail of the sampling distribution.

    Right-tailed test: a one-tailed test in which the sample outcome is hypothesized to be at the right tail of the sampling distribution.

    Note: whether a test is two-tailed or one-tailed is determined by the sign in the alternative hypothesis (≠ for two-tailed, < for left-tailed, > for right-tailed).

  • Two-Tailed Tests

    Example: According to a survey conducted in 2008, the backpacks of a sample of sixth graders in schools weighed an average of 18.4 pounds. A magazine wants to check whether or not this mean has changed since that survey.

    H0: the mean weight has not changed, μ = 18.4

    H1: the mean weight has changed, μ ≠ 18.4

  • Right-tailed test

    Example: The average price of homes in New Jersey was $461,216 in 2007. Suppose a real estate researcher wants to check whether the current mean price of homes there is higher than $461,216.

    H0: μ = $461,216

    H1: μ > $461,216

  • Left-tailed test

    Example: The company claims that its soft-drink cans, on average, contain 12 ounces of soda. However, if these cans contain less than the claimed amount of soda, then the company can be accused of cheating. Suppose a consumer agency wants to test whether the mean amount of soda per can is less than 12 ounces.

    H0: μ = 12 ounces (the mean is equal to 12 ounces)

    H1: μ < 12 ounces (the mean is less than 12 ounces)

  • One-tail vs. Two-tail Test


  • Hypothesis tests: Type I and type II errors

    Type I error: H0 rejected when H0 is true.

    Type II error: H0 not rejected when H0 is false.

    Significance level: α is the probability of committing a Type I error.

    [Figure: rejection regions for a one-sided test and a two-sided test]

  • Example: Metal Cylinder Production

    The machine that produces metal cylinders is set to make cylinders with a diameter of 50 mm.

    The two-sided hypotheses of interest are H0: μ = 50 versus HA: μ ≠ 50, where the null hypothesis states that the machine is calibrated correctly.

  • Example: Car Fuel Efficiency

    A manufacturer claims that its cars achieve an average of at least 35 miles per gallon in highway driving.

    The one-sided (left-tailed test) hypotheses of interest are H0: μ ≥ 35 versus H1: μ < 35. The null hypothesis states that the manufacturer's claim regarding the fuel efficiency of its cars is correct.

  • Approaches to Hypothesis Testing

    There are two approaches to test whether the sample mean supports the alternative hypothesis (H1):

    1. The rejection region method

    2. The p-value method

  • The Rejection Region Method

    The rejection region is a range of values such that if the test statistic falls within that range, the null hypothesis is rejected in favour of the alternative hypothesis.

  • Steps in rejection region method

    1. Construct appropriate hypotheses.

    2. Determine the test statistic to be used.

    3. Determine the critical value.

    4. Compare the test statistic with the critical value. Reject the null hypothesis if the test statistic falls in the rejection region (for a right-tailed test, if the former is greater than the latter).

    5. Make an appropriate conclusion.

  • Calculating Test Statistics

    For one-sample tests, use the Z test statistic if the population is normal and σ is known, or if the sample size is large.

    For one-sample tests, use the t statistic if the population standard deviation σ is not known or if the sample size is small (less than 30).
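
    For reference, the standard one-sample forms of these two statistics are sketched below. The slide text does not show them, so the notation is an assumption: x̄ is the sample mean, μ0 the value of the mean stated in H0, σ the population standard deviation, s the sample standard deviation, and n the sample size.

```latex
z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}},
\qquad
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \quad \text{with } n - 1 \text{ degrees of freedom}
```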

  • Procedure

    First, we find the critical value(s) of z from the normal distribution table for the given significance level α.

    Then we find the value of the test statistic z for the observed value of the sample statistic.

    Finally, we compare these two values and make a decision.

    Remember, if the test is one-tailed, there is only one critical value of z, and it is obtained by using the value of α, which gives the area in the left or right tail of the normal distribution curve depending on whether the test is left-tailed or right-tailed, respectively. However, if the test is two-tailed, there are two critical values of z, and they are obtained by using an area of α/2 in each tail of the normal distribution curve.
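
    The procedure can be sketched in Python as follows, using the standard library's inverse normal CDF instead of a printed table. The function names `critical_values` and `reject_h0` and the tail labels are hypothetical helpers introduced here for illustration.

```python
from statistics import NormalDist

def critical_values(alpha, tail):
    """Critical z value(s) for a 'left', 'right', or 'two' tailed test."""
    z = NormalDist()
    if tail == "left":
        return (z.inv_cdf(alpha),)
    if tail == "right":
        return (z.inv_cdf(1 - alpha),)
    if tail == "two":
        c = z.inv_cdf(1 - alpha / 2)  # area alpha/2 in each tail
        return (-c, c)
    raise ValueError("tail must be 'left', 'right', or 'two'")

def reject_h0(z_stat, alpha, tail):
    """True if the observed z statistic falls in the rejection region."""
    crit = critical_values(alpha, tail)
    if tail == "left":
        return z_stat < crit[0]
    if tail == "right":
        return z_stat > crit[0]
    return z_stat < crit[0] or z_stat > crit[1]

print(critical_values(0.05, "two"))
print(reject_h0(-2.0, 0.05, "left"))
```

    Keeping the critical-value lookup separate from the decision mirrors the first and last steps of the procedure.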

  • Hypothesis Setups for Testing a Mean (μ)

  • Hypothesis Setups for Testing a Proportion (p)

  • Problem: A used car dealer says that the mean price of a 1995 Ford F-150 Super Cab is at least $16,500. You suspect this claim is incorrect and find that a random sample of 14 similar vehicles has a mean price of $15,700 and a standard deviation of $1250. Is there enough evidence to reject the dealer's claim at α = 0.05?

  • Solution:

    The claim is that the mean price is at least $16,500, so the hypotheses are H0: μ ≥ $16,500 (claim) and H1: μ < $16,500.

    Because the test is a left-tailed test, the level of significance is α = 0.05. There are d.f. = 14 - 1 = 13 degrees of freedom, and the critical value (from the table) is t = -1.771. The rejection region is t < -1.771.

    Using the t-test, the standardized test statistic is:

    t0 = (x̄ - μ) / (s / √n) = (15,700 - 16,500) / (1250 / √14) ≈ -2.39

    The graph shows the location of the rejection region and the standardized test statistic, t0. Since t0 < t, t0 is in the rejection region and we reject the null hypothesis. There is enough evidence at the 5% level of significance to reject the claim that the mean price of a 1995 Ford F-150 Super Cab is at least $16,500.
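
    As a check on the arithmetic, the test statistic can be computed in a few lines of Python. This is a sketch using only the numbers given in the problem; the critical value -1.771 is copied from the t table rather than computed, since the standard library has no t distribution.

```python
from math import sqrt

# Values from the problem statement.
mu0, x_bar, s, n = 16500, 15700, 1250, 14
t_crit = -1.771  # left-tail critical value, d.f. = 13, alpha = 0.05 (from the t table)

t0 = (x_bar - mu0) / (s / sqrt(n))
reject = t0 < t_crit  # left-tailed test: reject when t0 is below the critical value
print(f"t0 = {t0:.2f}, reject H0: {reject}")
```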

  • Example: An industrial company claims that the mean pH level of the water in a nearby river is 6.8. You randomly select 19 water samples and measure the pH of each. The sample mean and standard deviation are 6.7 and 0.24, respectively. Is there enough evidence to reject the company's claim at α = 0.05? Assume the population is normally distributed.

  • The claim is that the mean pH level is 6.8. So, the null and alternative hypotheses are:

    H0: μ = 6.8 (claim) and Ha: μ ≠ 6.8

    Because the test is a two-tailed test with level of significance α = 0.05, the area in each tail is α/2 = 0.025. There are d.f. = 19 - 1 = 18 degrees of freedom, and the critical values are -t = -2.101 and t = 2.101. The rejection regions are t < -2.101 and t > 2.101.

    Using the t-test, the standardized test statistic is:

    t0 = (x̄ - μ) / (s / √n) = (6.7 - 6.8) / (0.24 / √19) ≈ -1.82

    The graph shows the location of the rejection region and the standardized test statistic, t0. Because t0 is not in the rejection region, you should decide not to reject the null hypothesis. There is not enough evidence at the 5% level of significance to reject the claim that the mean pH is 6.8.
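
    The same check for this two-tailed example, again as a sketch that takes the critical value 2.101 from the t table instead of computing it:

```python
from math import sqrt

# Values from the example.
mu0, x_bar, s, n = 6.8, 6.7, 0.24, 19
t_crit = 2.101  # two-tailed critical value, d.f. = 18, alpha = 0.05 (from the t table)

t0 = (x_bar - mu0) / (s / sqrt(n))
reject = abs(t0) > t_crit  # two-tailed test: reject in either tail
print(f"t0 = {t0:.3f}, reject H0: {reject}")
```

    Comparing `abs(t0)` with the positive critical value covers both rejection regions at once.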

  • t distribution table

  • Probability Values

    Z statistic (obtained): the test statistic computed by converting a sample statistic (such as the mean) to a Z score. The formula for obtaining Z varies from test to test.

    P value: the probability associated with the obtained value of Z.

  • The p-Value Approach

    In this procedure, we find a probability value such that a given null hypothesis is rejected for any α (significance level) greater than this value and is not rejected for any α less than this value.

    In this approach, we calculate the p-value for the test, which is defined as the smallest level of significance at which the given null hypothesis is rejected.

    Using this p-value, we state the decision. If we have a predetermined value of α, then we compare the value of p with α and make a decision.

  • Probability Values

  • Probability Values

    Alpha (α): the level of probability at which the null hypothesis is rejected. It is customary to set alpha at the .05, .01, or .001 level.

  • Example: Normal Body Temperature

    What is normal body temperature? Is it actually 37.6 °C (on average)?

    State the null and alternative hypotheses:

    H0: μ = 37.6 °C

    Ha: μ ≠ 37.6 °C

  • Example: Normal Body Temp (cont.)

    Data: random sample of n = 18 normal body temperatures

    37.2 36.8 38.0 37.6 37.2 36.8 37.4 38.7 37.2 36.4 36.6 37.4 37.0 38.2 37.6 36.1 36.2 37.5

    Summarize the data with a test statistic:

    Variable       n     Mean     SD      SE       t0      P
    Temperature    18    37.22    0.68    0.161    2.38    0.029
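
    The summary values can be reproduced from the raw data with Python's standard library. This is a sketch for checking the table; the p-value is not recomputed because the standard library has no t distribution. The computed statistic is negative because the sample mean lies below 37.6, and the table reports its magnitude.

```python
from math import sqrt
from statistics import mean, stdev

temps = [37.2, 36.8, 38.0, 37.6, 37.2, 36.8, 37.4, 38.7, 37.2,
         36.4, 36.6, 37.4, 37.0, 38.2, 37.6, 36.1, 36.2, 37.5]
mu0 = 37.6  # hypothesized mean from H0

n = len(temps)
x_bar = mean(temps)
s = stdev(temps)         # sample SD (n - 1 in the denominator)
se = s / sqrt(n)         # standard error of the mean
t0 = (x_bar - mu0) / se  # one-sample t statistic

print(f"n = {n}, mean = {x_bar:.2f}, SD = {s:.2f}, SE = {se:.3f}, t0 = {t0:.2f}")
```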

  • STUDENT'S t DISTRIBUTION TABLE

    Degrees of freedom    Two-tailed probability (p value)
                          0.10      0.05      0.01
    1                     6.314     12.706    63.657
    5                     2.015     2.571     4.032
    10                    1.813     2.228     3.169
    17                    1.740     2.110     2.898
    20                    1.725     2.086     2.845
    24                    1.711     2.064     2.797
    25                    1.708     2.060     2.787
    ∞                     1.645     1.960     2.576


  • Example: Normal Body Temp (cont.)

    Find the p-value:

    d.f. = n - 1 = 18 - 1 = 17

    p-value = 0.029

    From the t table: t17,.025 = 2.11; calculated |t0| = 2.38

    Since |t0| > t, reject the null hypothesis.

    [Figure: t distribution with rejection regions t < -2.11 and t > +2.11; t0 lies in the rejection region]

  • Example: Normal Body Temp (cont.)

    Decide whether or not the result is statistically significant based on the p-value: using α = 0.05 as the level of significance criterion, the results are statistically significant because 0.029 is less than 0.05. In other words, we can reject the null hypothesis.

    Report the conclusion: we can conclude, based on these data, that the mean temperature in the human population does not equal 37.6 °C.

  • Example using p-value

    We want to see whether our data confirm a specific hypothesis.

    Example: NYC Blackout Baby Boom. Data are births per day from two weeks in August 1966, tested against the usual birth rate in NYC (430 births/day).

    1. Formulate your hypotheses: we need a null hypothesis and an alternative hypothesis.

    2. Calculate the test statistic: the test statistic summarizes the difference between the data and your null hypothesis.

    3. Find the p-value for the test statistic: how probable is your data if the null hypothesis is true?

  • Null and Alternative Hypotheses

    Null hypothesis (H0): no effect or no change in the population.

    Alternative hypothesis (Ha): real difference or real change in the population.

    If there is a large discrepancy between the data and the null hypothesis, then we will reject the null hypothesis.

    NYC dataset: μ = mean birth rate in Aug. 1966. The null hypothesis is that the blackout has no effect on the birth rate, so August 1966 should be the same as any other month.

    H0: μ = 430 (usual birth rate for NYC)

    Ha: μ ≠ 430

  • Test Statistic

    The test statistic measures the difference between the observed data and the null hypothesis: how many standard deviations is our observed sample value from the hypothesized value?

    For our birth rate dataset, the observed sample mean is 433.6 and our hypothesized mean is 430.

    Assume the population variance σ² equals the sample variance s², so that the test statistic is T = (x̄ - μ0) / (s / √n).

  • p-value

    The p-value is the probability of observing a sample value at least as extreme as ours if the null hypothesis is true.

    If the null hypothesis is true, then the test statistic T follows a standard normal distribution.

    If our alternative hypothesis were one-sided (Ha: μ > 430), then our p-value would be 0.367. Since our alternative hypothesis is two-sided, our p-value is the sum of both tail probabilities (0.734).

    [Figure: standard normal curve with tail probability 0.367 below T = -0.342 and 0.367 above T = 0.342]
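
    These tail probabilities can be reproduced from T = 0.342 with the standard normal CDF, as in the sketch below. The results may differ from the slide's 0.367 and 0.734 in the third decimal place, presumably because the slide values come from a printed table with z rounded to two decimals.

```python
from statistics import NormalDist

T = 0.342  # test statistic reported in the lecture

one_sided_p = 1 - NormalDist().cdf(T)        # Ha: mu > 430
two_sided_p = 2 * NormalDist().cdf(-abs(T))  # Ha: mu != 430

print(f"one-sided p = {one_sided_p:.3f}, two-sided p = {two_sided_p:.3f}")
```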

  • Statistical Significance

    Is the test statistic T = 0.342 statistically significant?

    If the p-value is smaller than α, we say the difference is statistically significant at level α.

    The α-level is also used as a threshold for rejecting the null hypothesis (most common α = 0.05).

    If the p-value < α, we reject the null hypothesis that there is no change or difference.

    The p-value = 0.734 for the NYC data, so we cannot reject the null hypothesis at an α-level of 0.05.

    The difference between the null hypothesis and our data is not statistically significant.

    The data do not support the idea that there was a different birth rate than usual for the first two weeks of August 1966.

  • Tests and Intervals

    There is a close connection between confidence intervals and two-sided hypothesis tests.

    A 100C% confidence interval contains likely values for a population parameter, like the population mean μ. The interval is centered around the sample mean x̄, and its width is a multiple of the standard error of x̄.

    An α-level hypothesis test rejects the null hypothesis that μ = μ0 if the test statistic T has a p-value less than α.

  • Tests and Intervals

    If our confidence level C is equal to 1 - α, where α is the level of the hypothesis test, then we have the following connection between tests and intervals:

    A two-sided hypothesis test rejects the null hypothesis (μ = μ0) if our hypothesized value μ0 falls outside the confidence interval for μ.

    So, if we have already calculated a confidence interval for μ, then we can test any hypothesized value μ0 just by checking whether or not μ0 is in the interval!

  • Example: NYC blackout baby boom

    Births per day from two weeks in August 1966.

    The difference between our sample mean x̄ and the population mean μ0 = 430 had a p-value of 0.734, so we did not reject the null hypothesis at an α-level of 0.05.

    We could have also calculated a 100(1 - α)% = 95% confidence interval for μ.

    Since our hypothesized μ0 = 430 is within our interval of likely values, we do not reject the null hypothesis. If the hypothesis were μ0 = 410, then we would reject it!

  • Example: Hypothesis Test for Calcium

    Let μ be the mean calcium intake for people below the poverty line.

    The null hypothesis is that calcium intake for people below the poverty line is not different from the RDA: μ = μ0 = 850 mg/day.

    Two-sided alternative hypothesis: μ ≠ 850 mg/day.

    To calculate the test statistic, we need to know the population standard deviation of daily calcium intake. From a previous study, we know σ = 188 mg.

    We need the p-value: if μ0 = 850, what is the probability that we get a sample mean as extreme as (or more extreme than) 747?

  • p-value for Calcium

    We have a two-sided alternative, so the p-value includes standard normal probabilities on both sides.

    Looking up the probability in the table, we see that the two-sided p-value is 0.010 + 0.010 = 0.02.

    Since the p-value is less than 0.05, we can reject the null hypothesis.

    Conclusion: people below the poverty line have significantly (at the α = 0.05 level) lower calcium intake than the RDA.

    [Figure: standard normal curve with tail probability 0.010 below T = -2.32 and 0.010 above T = 2.32]

  • Confidence Interval for Calcium

    Alternatively, we calculate a confidence interval for the calcium intake of people below the poverty line.

    Use confidence level 100C = 100(1 - α) = 95%. A 95% confidence level means the critical value is Z* = 1.96.

    Since our hypothesized value μ0 = 850 mg is not in the 95% confidence interval, we can reject that hypothesis right away!
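
    The agreement between the two-sided test and the confidence interval can be sketched in Python for the calcium example. The transcript does not state the sample size, so the standard error is back-computed here from the reported T = -2.32 rather than from σ/√n; that step is an assumption made for illustration only.

```python
from statistics import NormalDist

mu0, x_bar = 850.0, 747.0  # hypothesized and observed means (mg/day)
T = -2.32                  # test statistic reported in the lecture

# The sample size is not stated, so the standard error is back-computed
# from the reported T rather than from sigma / sqrt(n).
se = (x_bar - mu0) / T

p_value = 2 * NormalDist().cdf(-abs(T))  # two-sided p-value
z_star = NormalDist().inv_cdf(0.975)     # about 1.96 for 95% confidence
ci = (x_bar - z_star * se, x_bar + z_star * se)
reject_by_ci = not (ci[0] <= mu0 <= ci[1])

print(f"p = {p_value:.3f}, 95% CI = ({ci[0]:.0f}, {ci[1]:.0f}), reject H0: {reject_by_ci}")
```

    Both routes give the same decision at α = 0.05: the p-value is below 0.05 and 850 lies outside the interval.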

  • Cautions about Hypothesis Tests

    1. Statistical significance does not necessarily mean real significance. If the sample size is large, even very small differences can have a low p-value.

    2. Lack of significance does not necessarily mean that the null hypothesis is true. If the sample size is small, there could be a real difference that we are not able to detect.

    3. Many assumptions went into our hypothesis tests. The presence of outliers, low sample sizes, etc. make our assumptions less realistic.

    We will try to address some of these problems next class.
