IUPUI/SPEA (Fall 2007) K300 (4392) Statistical Techniques

K300 (4392) Statistical Techniques (Fall 2007) Assignment 5: T-test Comment (260 points, Due November 5)

Instructor: Hun Myoung Park [email protected], (317) 274-0573

Please read the following instructions carefully. If you have any problem with any of the questions, please contact the instructor.

• Download the SPSS data set from OnCourse or the course web page at http://www.masil.org/method/statistics.html
• Conduct the t-test in question using SPSS and print out the output.
• Write down answers on the SPSS output. Use separate sheets if you really need to or are asked to do so.
• Hand in this assignment by Monday, November 5.
• You may ask your classmates about using SPSS, but you MAY NOT discuss with other classmates when answering questions. Remember the Student Code of Conduct and SPEA policies. Please talk to me if you have any problem.

Instructor’s Comment on Assignment 5:

Some of you did not read the instructions above, or ignored them consciously or unconsciously. I did not include the instructions for fun.

First, you may not discuss the questions with other classmates; you may not copy classmates’ work or allow others to copy yours. This is cheating and a violation of the IUPUI Student Code of Conduct and SPEA Policies. I apply a strict standard on this issue.

Second, I asked you to answer on the SPSS output because I want you to know how SPSS produces statistics for you. Many of you failed to replicate the statistics. SPSS reports all the statistics needed for the assignments; why didn’t you use them? The SPSS output tells you whether your answers are correct. If you get a different statistic, your computation is incorrect. As shown in the sample for Assignment 5, indicate the statistics needed for each computation by drawing a line to the proper formula.

Third, all necessary formulas and examples are well illustrated in the PowerPoint slides (t-test) and http://www.masil.org/documents/ttest.pdf. If you skipped them, you lost a good opportunity to get a higher score.

Fourth, I told you not to wait until the last minute; you may not be able to complete assignments in an hour. Some of you must have been in a hurry to meet the deadline and thus made many silly mistakes.

Finally, write down question numbers before answering and do not skip any question, or you will lose many points.

Hypothesis testing

Many of you do not seem to fully understand the logic of hypothesis testing. You should read through lecture notes 1 http://www.masil.org/teach/k300/Hypothesis_Test.pdf (revised version) and 2 http://www.masil.org/teach/k300/Hypothesis_Test2.pdf. If you do not understand hypothesis testing, you may not, I am 99 percent sure, get good scores in
following assignments and the final exam. Keep in mind that hypothesis testing is the most important topic in K300. Let me summarize the three approaches to hypothesis testing.

The test statistic approach compares a test statistic (TS) with the critical value (CV). If |TS|>|CV|, you may reject the null hypothesis. To keep it simple, compare just the magnitudes of TS and CV. If |TS|>|CV|, TS is farther from zero than CV, implying that the observation (TS) from the sample is unlikely if the null hypothesis is true. Therefore, you should change your conjecture (the null hypothesis).

In the p-value approach, you compare the p-value with the significance level (test size or alpha). If the p-value is smaller than alpha, reject the null hypothesis, because the p-value represents the level of risk you take when you reject the null hypothesis. A small p-value means you are relatively safe in rejecting it. For example, a p-value of .003 means that, if the null hypothesis were true, a sample at least this extreme would turn up only .3% of the time; the risk of making a wrong decision (rejecting a null hypothesis that is true) is small, so it is safe to reject the null hypothesis. A p-value of .7 means that such a sample would turn up 70% of the time under a true null hypothesis; it is too risky to reject the null hypothesis.

The confidence interval approach constructs the confidence interval and checks whether the hypothesized value is within the interval. The 95 percent confidence interval means “you are 95 percent sure that the population mean lies in the interval.” If the hypothesized value is outside the interval, your conjecture that the population mean equals the hypothesized value is hard to maintain. Therefore, you need to reject your conjecture (the null hypothesis).

The three approaches lead to the same conclusion.
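
The agreement of the three approaches can be checked numerically. The following is only a sketch, not the course's SPSS procedure: the summary statistics (16 families, mean 3.68, s = 0.8) are made up for illustration, the function names are hypothetical, and the t distribution is integrated numerically so that nothing beyond the Python standard library is assumed.

```python
import math

def t_cdf(x, df, steps=2000):
    """P(T <= x) for Student's t, via Simpson's rule on the density."""
    # Normalizing constant of the t density, computed on the log scale.
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    pdf = lambda u: math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))
    a = abs(x)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3          # area under the density between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(alpha, df):
    """Two-tailed critical value c with P(|T| > c) = alpha, by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if 2 * (1 - t_cdf(mid, df)) > alpha:
            lo = mid              # tail area still too big: move outward
        else:
            hi = mid
    return (lo + hi) / 2

def one_sample_t(n, mean, s, mu0, alpha=0.05):
    """Two-tailed one-sample t-test, decided three ways from summary statistics."""
    df = n - 1
    se = s / math.sqrt(n)                     # standard error of the mean
    ts = (mean - mu0) / se                    # test statistic
    cv = t_critical(alpha, df)                # critical value
    p = 2 * (1 - t_cdf(abs(ts), df))          # two-tailed p-value
    ci = (mean - cv * se, mean + cv * se)     # (1 - alpha) confidence interval
    return {
        "ts": ts, "cv": cv, "p": p, "ci": ci,
        "reject_by_ts": abs(ts) > cv,
        "reject_by_p": p < alpha,
        "reject_by_ci": not (ci[0] <= mu0 <= ci[1]),
    }

# Hypothetical sample: 16 families, mean size 3.68, sample s.d. 0.8,
# tested against the national average of 3.18.
result = one_sample_t(n=16, mean=3.68, s=0.8, mu0=3.18)
for key, value in result.items():
    print(key, value)
```

The three `reject_by_*` entries always carry the same value for a given sample, because all three decisions are computed from the same TS, standard error, and t distribution.
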
If one approach gives you a conclusion that differs from those of the other approaches (e.g., you reject the null hypothesis in the p-value approach but not in the test statistic approach), something must be wrong. Some of you drew different conclusions depending on the approach; this cannot be correct. You need to go back and check what is wrong with your statistical inferences.

Other Issues

Some of you still appear to have difficulty stating the null and alternative hypotheses. PLEASE DO NOT use English letters (sample statistics) in hypotheses. Once you obtain sample data, you already know the sample statistics. Why would you test sample statistics that you already know? The goal of statistics is to learn about parameters (population properties) using the sample statistics. A hypothesis, by definition, is about something you do not know and want to know. If you already know something, it should not be a hypothesis. You already know Indianapolis is in the state of Indiana; do you still want to test whether Indianapolis is in Indiana? Please read the PowerPoint slides and lecture notes on hypotheses. The t-test slide explicitly lists the hypotheses for the four types of t-test. Why are you losing points on this obvious question?
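
As an illustration of the point about Greek letters, hypotheses for a two-tailed test might be written as follows. This is a sketch only: the symbols \mu_0, \mu_1, and \mu_2 are generic placeholders rather than notation copied from the slides, and the `align*` environment assumes the amsmath package.

```latex
\begin{align*}
% One sample: population mean against a hypothesized value \mu_0
H_0 &: \mu = \mu_0           & H_a &: \mu \neq \mu_0 \\
% Two independent samples: difference of two population means
H_0 &: \mu_1 - \mu_2 = 0     & H_a &: \mu_1 - \mu_2 \neq 0
\end{align*}
% Not a hypothesis: a statement such as \bar{x} = \mu_0, because the
% sample mean \bar{x} is already known once the data are collected.
```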

In the fourth step of hypothesis testing, I would first write down TS and CV, the p-value and alpha, or the confidence interval, depending on the approach, and then make a decision. See the table in lecture note 1. Please DO NOT use “confirm” or “accept”; say “reject” or “do not reject.” Keep in mind that we NEVER KNOW whether a hypothesis is true or false; we can only evaluate the likelihood or plausibility of the hypothesis. Nobody can confirm that a hypothesis is true. There is always a possibility, even if only .0000000000001%, that our conclusion is not correct.

The fifth step of hypothesis testing is substantive interpretation. Do not just say “reject” or “do not reject the null hypothesis.” What does “substantive” mean? You need to go back to step 1 of hypothesis testing and ask, “What do I want to know using this test?” One example is “The average family size is not equal to the national average of 3.18 (p<.036).” Please, please DO NOT ADD “There is not sufficient evidence to support the claim that”; this is a redundant and prosaic expression that you should avoid. Save your paper and pencil.

Do not confuse a p-value with a critical value. A p-value is a probability: the risk you take when you reject the null hypothesis. A probability cannot be larger than 1 or smaller than zero. Some of you reported a p-value larger than 1; how come? In a two-tailed test, a p-value is the sum of the two tail areas beyond −|TS| and +|TS|. A p-value should be compared to the test size (alpha). A critical value is a value in the probability distribution that bounds the rejection regions; that is, the area under the curve from negative infinity to negative CV plus the area from positive CV to positive infinity equals the significance level. CV should be compared to TS. DO NOT compare a p-value with CV; do not compare a monkey with a carrot!

σ is a population standard deviation. A standard deviation you compute from the sample is a sample standard deviation, which is denoted as s. A variance is a standard deviation squared; this is its definition.
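
The relationships described above can be written compactly. This is a sketch for the two-tailed case only; T stands for a random variable following the test's distribution under the null hypothesis, CV is taken as the positive critical value, and the `align*` environment assumes the amsmath package.

```latex
\begin{align*}
\alpha &= P\left(|T| \ge CV \mid H_0\right)
  && \text{(area of the two rejection regions)} \\
p &= P\left(|T| \ge |TS| \mid H_0\right)
  && \text{(area of the two tails beyond the observed TS)} \\
|TS| > CV &\iff p < \alpha, \qquad 0 \le p \le 1 \\
s^2 &= \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2,
  \qquad s = \sqrt{s^2}
\end{align*}
```

The third line is why TS is paired with CV and the p-value with alpha: each pair measures the same comparison, one on the scale of the statistic and the other on the scale of probability.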
t statistics follow the t distribution with the appropriate degrees of freedom. Do not look up the standard normal distribution table to get the CV of a t-test. Pay attention to the header of the table for one-tailed and two-tailed tests, and do not read the wrong line. Test statistics for comparing proportions follow the normal distribution because a binomial distribution is approximated by the normal distribution when N is large. The F statistic of ANOVA follows the F distribution with two degrees of freedom, since the statistic is a ratio of variance 1 to variance 2. In the F distribution table, read the column for the first degrees of freedom (numerator) and then the row for the second degrees of freedom (denominator).

In the independent sample t-test, you use the F-test to examine whether the two groups have equal variances before conducting the t-test. This F-test is different from the F-test of ANOVA. In fact, the t-test is the core of the independent sample t-test; the F-test serves to determine the correct standard error of the difference between the two means. If the null hypothesis of equal variance is not rejected, you need to use the pooled variance. Otherwise, you need to use the individual variances and approximate the degrees of freedom. Again, the F-test for equal variance is a component of the independent sample t-test. See the flow chart in the t-test slide.
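
The two branches of the independent sample t-test can be sketched as follows. This is an illustration under stated assumptions, not the procedure from the slides: the function name and the summary statistics are hypothetical, the F ratio is formed with the larger variance on top, the degrees of freedom in the unequal-variance branch use the Satterthwaite approximation, and the outcome of the F-test is passed in as a flag rather than computed.

```python
import math

def independent_t(n1, mean1, s1, n2, mean2, s2, equal_variance):
    """t statistic and degrees of freedom for two independent samples.

    equal_variance is the outcome of the F-test for equal variance:
    True if its null hypothesis was not rejected.
    """
    v1, v2 = s1 ** 2, s2 ** 2                 # variance = s.d. squared
    f_ratio = max(v1, v2) / min(v1, v2)       # assumption: larger variance on top
    if equal_variance:
        # Pooled variance: both groups estimate one common variance.
        df = n1 + n2 - 2
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / df
        se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    else:
        # Individual variances, with approximated degrees of freedom.
        a, b = v1 / n1, v2 / n2
        se = math.sqrt(a + b)
        df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return {"f_ratio": f_ratio, "se": se, "df": df,
            "ts": (mean1 - mean2) / se}

# Hypothetical groups: n=10, mean 5.0, s 2.0 versus n=12, mean 3.0, s 1.0.
pooled_case = independent_t(10, 5.0, 2.0, 12, 3.0, 1.0, equal_variance=True)
welch_case = independent_t(10, 5.0, 2.0, 12, 3.0, 1.0, equal_variance=False)
print(pooled_case)
print(welch_case)
```

Both calls use the same data; only the standard error and the degrees of freedom change, which is the sense in which the F-test is a component of the t-test rather than a separate analysis.
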
