statistics for education research lecture 2 normal distributions & sampling distribution of...


Upload: wesley-dennis

Post on 30-Dec-2015


Statistics for Education Research

Lecture 2

Normal Distributions & Sampling Distribution of Means

Instructor: Dr. Tung-hsien He

[email protected]@tea.ntue.edu.tw

Normal Distributions

1. Theoretically, any variable measured an infinite number of times would tend to display a normal distribution.
2. Normal distributions are described by a mathematical equation (4.1, p. 86).
3. Not every distribution of a variable will match the normal distribution exactly.
4. The distributions of many variables, however, will approximate the normal distribution.
5. Normal distributions have various shapes and curves (p. 88; note that these distributions are all normal distributions).
6. The shape of a normal distribution is determined by its mean and standard deviation.
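The slide cites the equation by number only. For reference, the standard form of the normal density is given here, with \mu as the mean and \sigma as the standard deviation. The textbook's equation 4.1 may use different notation, so treat the symbols as an assumption:

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)
```

The mean \mu sets where the curve is centered, and \sigma sets how wide and flat it is.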

Features of Normal Distributions

1. Unimodal: there is only one mode.
2. Symmetrical.
3. Bell-shaped.
4. Maximum height at the mean.
5. Values on the X axis are continuous.
6. Asymptotic to the X axis: the curve never touches the X axis.
7. Shape is determined by the mean and SD (see p. 88, Figures B/C: Which distributions have larger SDs? Why are they all normal distributions?).
8. The number of normal distributions is infinite (since any mean paired with any positive SD defines one).
9. After a variable has been tested an infinite number of times, the scores of all the tests will approximate the normal distribution.
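Several of these features can be checked numerically. The sketch below is only an illustration: the function name `normal_pdf` and the example values (mean 85, SDs of 10 and 20) are assumptions chosen for the demonstration, not taken from the textbook figures.

```python
import math

def normal_pdf(x, mean, sd):
    """Height of the normal curve at x for a given mean and SD."""
    coeff = 1.0 / (sd * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mean) ** 2) / (2.0 * sd ** 2))

# Hypothetical example values: same mean, two different SDs.
mean = 85
for sd in (10, 20):
    peak = normal_pdf(mean, mean, sd)        # height at the mean
    left = normal_pdf(mean - sd, mean, sd)   # one SD below the mean
    right = normal_pdf(mean + sd, mean, sd)  # one SD above the mean
    print(f"SD={sd}: peak={peak:.4f}, left={left:.4f}, right={right:.4f}")
```

The heights one SD below and one SD above the mean are equal (symmetry), the height at the mean is the largest of the three, and the curve with the larger SD is lower and wider.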

Standard Normal Distribution

1. The distribution of normally distributed standard scores (z scores).
2. Formula: see 4.2, p. 89.
3. Mean = 0
4. SD = 1
5. The shape (proportions) of the distribution: see Figure 4.3, p. 90.
6. Determining proportions: see Table C.1, p. 618.
7. Determining percentiles by using the standard normal distribution: see Table C.2, p. 621.
   E.g.: the 70th percentile with mean = 85, SD = 20 (think about how a z score is computed):
   Step 1: Check Table C.2 on p. 621.
   Step 2: Find B (the larger area) = .70.
   Step 3: Find the z that corresponds to B = .70 -> z = .5244.
   Step 4: Use the z score formula: (? - 85)/20 = .5244 -> ? = 95.488.
8. Determining percentile ranks:
   E.g.: the PR of raw score 102 with mean = 85, SD = 20:
   Step 1: Use the z score formula: z = (102 - 85)/20 -> z = 0.85.
   Step 2: Check Table C.1 on p. 618.
   Step 3: Find z = .85; the area between the mean and z = .3023.
   Step 4: 0.5 + 0.3023 = 0.8023 -> PR of 102 = 80.23.
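The two table lookups in items 7 and 8 can be reproduced without the printed tables. This is a minimal sketch using Python's standard-library `statistics.NormalDist`; the variable names are illustrative, and the mean and SD are the values from the slide's examples.

```python
from statistics import NormalDist

MEAN, SD = 85, 20            # values from the slide's examples
std_normal = NormalDist()    # standard normal: mean 0, SD 1

# Item 7: raw score at the 70th percentile.
z_70 = std_normal.inv_cdf(0.70)       # z with 70% of the area below it
score_70 = MEAN + z_70 * SD           # invert z = (x - mean) / SD

# Item 8: percentile rank of a raw score of 102.
z_102 = (102 - MEAN) / SD             # standardize the raw score
pr_102 = std_normal.cdf(z_102) * 100  # area below z, as a percentage

print(f"z for 70th percentile = {z_70:.4f}, raw score = {score_70:.3f}")
print(f"z for 102 = {z_102:.2f}, percentile rank = {pr_102:.2f}")
```

`inv_cdf` plays the role of reading the table from area to z, and `cdf` the role of reading it from z to area, so the "0.5 + area between mean and z" step is folded into a single call.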

Normal Distribution: the fundamental assumption for inferential statistics

a. A variable will be normally distributed if it is tested an infinite number of times.
b. The selected sample(s) must represent the characteristics of the population from which they are selected (i.e., the sample(s) must match the population).
c. Random selection (accompanied by random assignment) is the only way to guarantee the representativeness of the sample(s).

Chain of Reasoning in Inferential Statistics

a. Parameters: populations
b. Statistics: samples
c. Test hypotheses about parameters: two types of hypotheses are tested, the null hypothesis and the alternative hypothesis.
d. Estimate parameters based on statistics.
e. Inferential statistics is all about "sampling" and "hypothesis testing".

Two Types of Samples: Probability Samples vs. Nonprobability Samples

Probability Samples (random, not haphazard, samples)

1. Simple Random Sampling: every member of the target population has an identical chance of being selected.
   a. Sampling with replacement: selected subjects are returned to the pool and can be chosen again in the next selection.
   b. Sampling without replacement: selected subjects are excluded from the next selection.
   c. The two methods yield different probabilities.
   d. Using SPSS to randomly select cases (i.e., the random seed).
2. Systematic Sampling: choosing every kth member of a list that contains all members of a population.
3. Cluster Sampling: clusters (naturally formed groups) are randomly selected from the population of clusters.
4. Stratified Random Sampling:
   a. Used to select samples from a heterogeneous population that contains several subpopulations (strata);
   b. Strata have to be well defined;
   c. A random sample of members is selected from each stratum.

Nonprobability Samples (no random selection is involved):

1. Purposive Samples:
   a. Start with a large group of potential subjects;
   b. Use screening criteria to select those subjects who meet the criteria.
2. Convenience Samples: selecting whomever you want as subjects.
3. Quota Samples:
   a. Decide on X percent of one kind of subject and Y percent of another kind;
   b. Go out and select whoever fits these kinds to be subjects.
4. Snowball Samples:
   a. Start with a convenience sample;
   b. Recruit others related to this sample, such as family members, friends, ...
5. Returned Questionnaires in Surveys (Why?)
6. Volunteers

Sampling Distribution of the Mean: Key Concept & Assumption for Inferential Statistics

a. Meaning: the distribution of all possible means of samples of a given size that have been selected and tested an infinite number of times (theoretically possible only).
b. Interpretation: Theoretically, samples of a given size can be drawn from a population an infinite number of times, and each time a sample is drawn, it has a corresponding mean. Because samples can be drawn infinitely (only theoretically possible), there will be an infinite number of sample means. The sampling distribution of the mean is the distribution of these means.
c. Properties:
   1. Shape:
      (a) As sample size increases, the sampling distribution of the mean for simple random samples of n cases approximates a normal distribution.
      (b) If the sample size is no smaller than 30 (n >= 30), the sampling distribution of the mean will approximate a normal distribution.
   2. A normal distribution.
   3. Variance & standard deviation of the sampling distribution of the mean:
      (a) Variance: 6.8, p. 150
      (b) Standard deviation (standard error): 6.9, p. 150
   4. The mean of the sampling distribution of the mean equals μ, the population mean.
   5. As sample size (n) increases, the variability of the sampling distribution of the mean decreases (Why?).
   6. See Figures 6.7 and 6.8 on p. 152 & p. 153 for the standard sampling distribution of the mean (z score).
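An infinite number of samples cannot be drawn in practice, but a simulation with a few thousand samples shows the same properties. The sketch below is an illustration under stated assumptions: the population is hypothetical (20,000 uniform scores, deliberately not normal), and the helper name `sample_means` and the draw count are arbitrary.

```python
import random
import statistics

rng = random.Random(0)
# Hypothetical, deliberately non-normal population (uniform on 0..100).
population = [rng.uniform(0, 100) for _ in range(20_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

def sample_means(n, draws=2_000):
    """Means of `draws` simple random samples of size n (without replacement)."""
    return [statistics.fmean(rng.sample(population, n)) for _ in range(draws)]

print(f"population mean = {mu:.2f}, population SD = {sigma:.2f}")
for n in (5, 30, 100):
    means = sample_means(n)
    print(f"n={n:3d}  mean of means={statistics.fmean(means):6.2f}  "
          f"SD of means={statistics.pstdev(means):5.2f}  "
          f"sigma/sqrt(n)={sigma / n ** 0.5:5.2f}")
```

In each row the mean of the sample means should sit close to the population mean, and the SD of the sample means should track sigma/sqrt(n), shrinking as n grows. That shrinkage answers the "Why?" in property 5: n appears in the denominator of the standard error.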

Hypothesis Testing (a must-know concept for understanding inferential statistics):

1. Meaning: making inferences about the nature of the population on the basis of observations of a sample drawn from the population.
2. Logic: if the difference between the hypothesized value of the population mean and the sample mean is computed and found to be very large, the hypothesis is rejected.
3. See Figure 7.1 on p. 166 for a detailed explanation.

Types of Hypothesis

1. A hypothesis is a conjecture about one or more population parameters.
2. Testing a specific hypothesis does not prove or disprove the conjecture; it only tells how likely (i.e., the probability) this hypothesis may be true.

a. Null Hypothesis:
   (1) Symbol: Ho
   (2) It means "no relation between variables": the relation index is equal to 0.
   (3) It can also mean "no difference in means": mean1 is equal to mean2.
   (4) E.g. 1: Ho: μ = 455
   (5) E.g. 2: the mean of the experimental group is equal to the mean of the control group.
   (6) If Ho is retained, it means:
       (a) the null hypothesis is very likely to be true (or to happen) at a certain level of confidence;
       (b) the probability for the null hypothesis to be true is very high;
       (c) there is no relation between two variables;
       (d) the difference between two means is so small and nonsignificant that it can be disregarded (i.e., the difference may be caused by sampling error).

b. Alternative Hypothesis:
   (1) Symbol: Ha
   (2) Relations between variables exist (i.e., the relation index is not equal to 0); two or more means differ from each other (i.e., mean1 is not equal to mean2).
   (3) It stands against the null hypothesis.
   (4) It is the researchers' expected outcome (the researchers' hypothesis).
   (5) Example: Ha: μ ≠ 455
   (6) If Ha is retained, it means:
       (a) The null hypothesis (Ho) is rejected.
       (b) A significant relation or significant differences are detected.
       (c) Ha is very likely to be true (or to happen) at a certain level of confidence.
       (d) The probability for the alternative hypothesis to be true is very high.
       (e) The difference between two means is so significant that the two means are very likely to be different.

Meaning of Accepting Ho: μ = 455 but Rejecting Ha: μ ≠ 455 when the sample mean is found to be 454:

1. Condition: We assume (hypothesize) that μ of a population is 455. We formulate the following hypotheses:
   Ho: μ = 455, Ha: μ ≠ 455
   Then we select and test a sample, and find its sample mean = 454.
   Based on the sample mean, it is very likely that the population mean is 455. So, Ho is retained and Ha is rejected.
2. Interpretation:
   a. Because the difference (μ - X bar = 455 - 454 = 1) between the expected mean and the observed mean is very small, i.e., nonsignificant, Ho is retained and Ha is rejected.
   b. Since we formulated two hypotheses, namely Ho: μ = 455 and Ha: μ ≠ 455, and Ho is retained, we can draw a conclusion: it is very likely that the population mean is 455.
   c. Where does the difference of 1 come from? It comes from sampling error!

Meaning of Rejecting Ho: μ = 455 but Accepting Ha: μ ≠ 455 when the sample mean is found to be 80,000:

1. Condition: We assume (hypothesize) that μ of a population is 455. We formulate the following hypotheses:
   Ho: μ = 455, Ha: μ ≠ 455
   Then we select and test a sample, and find its sample mean = 80,000.
   Based on the sample mean, it is very unlikely that the population mean is 455. So, Ho is rejected and Ha is retained.
2. Interpretation:
   a. Because it is very unlikely that the population mean is 455, Ho is rejected in favor of Ha.
   b. Because the difference between the expected population mean and the observed mean is very large, i.e., significant, Ho is rejected (i.e., Ho: μ = 455, Ha: μ ≠ 455). Thus, we can reach the following conclusion: it is very unlikely that the mean of the population is 455.

Rule of Thumb:
1. If the difference between the expected mean and the observed mean is very small, i.e., nonsignificant, you should retain Ho and reject Ha.
2. If the difference between the expected mean and the observed mean is very large, i.e., significant, you should reject Ho and retain Ha.

To retain or to reject Ho, that is the question (see p. 166):

a. Since it is impossible to know the true μ of a population, we can only hypothesize its value. Suppose we hypothesize μ to be 455 and the population standard deviation, σ, to be 100. Then we select a sample of 144 subjects and find its sample mean = 535. We formulate the following hypotheses:
   Ho: μ = 455, Ha: μ ≠ 455

Q: Based on the sample mean, should we reject or retain Ho?

(a) At first sight, we should reject Ho because the difference between the observed 535 and the hypothesized 455 is 80, which seems very large. But that is only our feeling. What will statistics tell us? A few key points need to be taken into account before we can answer this question:

(b) The sampling distribution of means is our solution to this question because:
   (1) the mean of the sampling distribution of means is the population mean;
   (2) the sampling distribution of means is a normal distribution, and the standard deviation of the sampling distribution of means (the standard error) = σ/√n, where σ (population SD) = 100 (hypothesized) and n (number of subjects) = 144;
   (3) When we select a sample and get its mean, we don't expect this mean to be perfectly equal to μ, particularly when we do not know what μ is. But if we sampled the population an infinite number of times, the mean of all those sample means, that is, the mean of the sampling distribution of means, would equal the population mean.
   (4) Why may the observed mean differ from the population mean? Because whenever we draw a sample, we incur "sampling error" (any sampling procedure yields sampling error, including random selection). Thus, the observed sample mean comes from two sources: "sampling error + true population mean".

(c) In our example, the observed sample mean is 535 but we hypothesize the population mean to be 455. The difference between the two numbers is 80. Since observed sample mean = sampling error + true population mean, there are at least two possible explanations for why the observed mean is 535:
   (1) Possible Explanation 1:
       true population mean = 455, sampling error = 80.
       Thus, Ho: μ = 455.
       So, we retain Ho: μ = 455 and reject Ha: μ ≠ 455.
   (2) Possible Explanation 2:
       true population mean ≠ 455, sampling error ≠ 80 (i.e., either > or < 80), but sampling error + true population mean = 535.
       Thus, Ha: μ ≠ 455.
       So, we reject Ho: μ = 455 and retain Ha: μ ≠ 455.

(d) Now we can weigh the two possible explanations by estimating how probable a sampling error as large as 80 is.
(e) Remember:
   (1) the sampling distribution of means is a normal distribution whose mean is the population mean;
   (2) the standard deviation (standard error) of the sampling distribution of means is σ/√n.
(f) Now we can compute the probability of observing 535 when it is placed in the sampling distribution of means whose mean is hypothesized to be 455.
   General form: Z = (x1 - x2)/SD -> Z score of 535:
   x1 = 535
   x2 = μ = 455
   SD = standard error = σ/√n = 100/√144 = 100/12 = 8.33
   Z = (X bar - μ)/(σ/√n) = (535 - 455)/8.33 = 80/8.33 = 9.6

(g) What does Z = 9.6 tell us?
   (1) Check the standard normal distribution graph. Z = 9.6 falls extremely far out on the right tail of the distribution. In other words, the probability of a Z as large as 9.6 occurring in the standard normal distribution is extremely low.
   (2) Check the z score table to find the larger portion for Z = 9.6. 1 - the larger portion = the probability of a Z as large as 9.6 when μ = 455. The higher this probability, the more likely Ho is to be retained.
   (3) For z = 3.2905 (see Table C.2, p. 621), the larger area = .9995, so the probability = 1 - .9995 = .0005. The probability for z = 9.6 will be much smaller than .0005.
   (4) In other words, for a population whose mean is hypothesized to be 455 (0 in the z score distribution), the probability of a sample mean as large as 535 (9.6 in the z score distribution) is less than 0.0005. Thus, based on our sample mean = 535, it is extremely unlikely that the population mean is 455.

(h) Results:
   Look at the two hypotheses:
   Ho: μ = 455
   Ha: μ ≠ 455
   The probability of obtaining a sample mean this extreme if Ho were true is less than 0.0005.
(i) Conclusion:
   (1) Given the sample mean of 535, the probability associated with the hypothesized population mean of 455 is less than 0.0005. Because this probability is too low, Ho: μ = 455 should NOT be retained; it should be rejected. Instead, Ha: μ ≠ 455 should be retained. Thus, based on our sample mean, 535, it is extremely unlikely that the population mean is 455.
   (2) Hence, the difference between the hypothesized population mean and the observed sample mean is so large and so significant that sampling error cannot be used to explain it (in this case, the sampling error should be less than 80).
   (3) The probability associated with Ho: μ = 455 is smaller than .05 [p < .05]; the difference between the sample mean and the assumed population mean is statistically significant. The population mean should not be 455.
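The arithmetic of this worked example can be checked with a short script. It is a sketch of the lecture's one-sample z computation using only the numbers given above; the variable names are illustrative. The tail area uses `math.erfc` because, this far out in the tail, subtracting a cumulative probability from 1 would round to zero in floating point.

```python
import math

mu_0 = 455     # hypothesized population mean (Ho)
sigma = 100    # hypothesized population SD
n = 144        # sample size
x_bar = 535    # observed sample mean

standard_error = sigma / math.sqrt(n)    # 100 / 12
z = (x_bar - mu_0) / standard_error      # 80 / standard error

# Upper-tail area beyond z under the standard normal curve.
p_upper = 0.5 * math.erfc(z / math.sqrt(2))

print(f"SE = {standard_error:.2f}, z = {z:.1f}, upper-tail p = {p_upper:.3g}")
print("reject Ho" if p_upper < 0.0005 else "retain Ho")
```

The computed tail probability is many orders of magnitude below the 0.0005 bound read from the table for z = 3.2905, which is why the table alone can only say "less than 0.0005".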

Criterion for Rejecting Ho:Criterion for Rejecting Ho:1. Researchers can set up the 1. Researchers can set up the region of region of

rejectionrejection (in the normal distribution) to (in the normal distribution) to reject a Ho. reject a Ho.

2. This proportion of area is referred to 2. This proportion of area is referred to ““level of significancelevel of significance” and notated as ” and notated as , , i.e., i.e., is a is a probability)probability)

3. It equals 3. It equals the maximum probabilitythe maximum probability of of rejecting Horejecting Ho..

4. In the field of language education, 4. In the field of language education, is is usually set at usually set at 0.050.05. . can also be set at 0.1, can also be set at 0.1, 0.01, 0.001, or even smaller. 0.01, 0.001, or even smaller.

5. Depending on 5. Depending on the type of distributionthe type of distribution used, a cutting used, a cutting value, i.e., value, i.e., critical valuescritical values in the statistic term, will be in the statistic term, will be computed. computed.

6. For the z distribution, if α = .05, then the critical value of the z score for rejecting or retaining Ho will be 1.6449 (check Table C.2 on p. 621).
Step 1: 1 - .05 = .95
Step 2: Find B (large area) = .950 in Table C.2
Step 3: Find the z score corresponding to B = .950
Step 4: z = 1.6449
If α = 0.025, the critical z score will be 1.96.
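The table lookup in Steps 1 to 4 can also be reproduced numerically. The sketch below is only an illustration, assuming Python's standard-library `statistics.NormalDist` in place of Table C.2; the helper name `critical_z` is hypothetical.

```python
from statistics import NormalDist


def critical_z(alpha: float) -> float:
    """Return the z score that leaves an upper-tail area of alpha.

    This mirrors Steps 1-4: compute 1 - alpha, then find the z score
    whose cumulative (large) area equals that value.
    """
    return NormalDist().inv_cdf(1 - alpha)


z_05 = critical_z(0.05)    # area B = .950
z_025 = critical_z(0.025)  # area B = .975
print(round(z_05, 4), round(z_025, 2))  # prints: 1.6449 1.96
```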

7. The smaller the value of α:
(1) the smaller the rejection area
(2) the more difficult it is to reject Ho
(3) the less likely Ho is to be rejected
(4) the more likely Ho is to be retained

But
(5) the easier it is to reject Ha
(6) the more likely Ha is to be rejected
(7) the more difficult it is to accept Ha
(8) the less likely Ha is to be accepted
(9) the larger the critical value.

8. The smaller α is, the more conservative it is. In the field of medicine, α is conventionally set at a very conservative level such as 0.01 or 0.001, because an extremely conservative α makes it very unlikely that Ho will be rejected. Thus, in order to reject Ho in favor of Ha, the difference between two means must be very, very large. In medical experiments, taking a new drug must make a huge difference in patients in order to reject Ho and accept Ha. So, a very conservative α is used.

9. If the p-value is lower than the level of significance, i.e., α, the observed result would be too unlikely if Ho were true. Thus, Ho should be rejected. E.g., p = 0.00034 < α = 0.05, i.e., p < 0.05: the support for retaining Ho is too weak, so Ho is rejected and Ha is accepted. Results can be written as: The difference is statistically significant at the level of 0.05.

10. If the p-value is larger than the level of significance, i.e., α, the observed result is quite compatible with Ho. E.g., p = 0.45 > α = 0.05, i.e., p > 0.05: there is not enough evidence against Ho, so Ho is retained and Ha is rejected. Results can be written as: The difference is not significant at the level of 0.05.
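The decision rule in items 9 and 10 amounts to a single comparison. A minimal sketch, using the two p-values from the examples above; the function name `decide` is hypothetical.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the rejection criterion: reject Ho only when p < alpha."""
    if p_value < alpha:
        return "reject Ho (statistically significant)"
    return "retain Ho (not significant)"


decision_small_p = decide(0.00034)  # the p-value from item 9
decision_large_p = decide(0.45)     # the p-value from item 10
print(decision_small_p)
print(decision_large_p)
```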

Important Note:
The p-value is the probability of obtaining a sample result at least as extreme as the observed one if Ho is true. The smaller the p-value, the weaker the case for retaining Ho.

Errors in Hypothesis Testing:
1. Meaning: Whether Ho is rejected or retained, researchers take the risk of making an error in their decision.
2. Two Types of Errors:
(1) Type I Error:
a. Meaning: Rejecting a true null hypothesis
b. Explanation: Researchers decide to reject a null hypothesis, but this null hypothesis is actually true.

c. Reasons: Researchers reject Ho because the sample mean falls in the rejection area. That is, the probability of obtaining such a sample mean when Ho is true is very low (but not 0). However, Ho may still be true, although this is very unlikely. Because this probability is not 0 (e.g., p-value = 0.0001), there is still a very slight chance (i.e., 0.0001) that a true Ho produces a sample mean this extreme. Researchers decide to reject Ho because of this low probability, not because the probability is zero. Thus, when Ho is rejected, researchers may make a Type I error, since Ho can be true even though it has been rejected.

(2) Type II Error:
a. Meaning: Retaining a false null hypothesis
b. Explanation: Researchers decide to retain a null hypothesis, but this null hypothesis is actually false.

c. Reasons: Researchers retain Ho because the sample mean does not fall in the rejection area. That is, the sample result is quite compatible with Ho (e.g., p = 0.99). However, Ho may still be false, because a false Ho can also produce a sample mean outside the rejection area. Thus, when Ho is retained, researchers may make a Type II error, since Ho can be false even though it has been retained.
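The Type I error can be made concrete with a small simulation. The sketch below is an illustration only: it assumes a population in which Ho: μ = 455 really is true, with a hypothetical known σ = 100 and hypothetical samples of n = 25, and counts how often a one-tailed z test at α = .05 rejects Ho anyway. The rejection rate should land close to α.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(0)  # fixed seed so the run is repeatable

MU_0 = 455     # hypothesized (and, here, true) population mean
SIGMA = 100    # hypothetical population SD, assumed known
N = 25         # hypothetical sample size
ALPHA = 0.05

z_cv = NormalDist().inv_cdf(1 - ALPHA)  # one-tailed critical value
standard_error = SIGMA / sqrt(N)

trials = 4000
rejections = 0
for _ in range(trials):
    sample = [random.gauss(MU_0, SIGMA) for _ in range(N)]
    sample_mean = sum(sample) / N
    z = (sample_mean - MU_0) / standard_error
    if z > z_cv:  # sample mean falls in the rejection area
        rejections += 1  # Ho is true, so every rejection is a Type I error

type_i_rate = rejections / trials
print(f"Type I error rate: {type_i_rate:.3f}")
```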

Level of Confidence [信心水準]: In a survey study, the complement of the α value (1 - α).
(a) Meaning: the degree of confidence that researchers have in not making an error when they decide to reject Ho.
(b) Interpretation: if α = .05, researchers accept that when they reject a null hypothesis that is actually true, they will be making a mistake about 5 times out of 100. That is, they are 95% confident in their results of rejecting Ho.

Levels of Significance vs. Types of Errors:
(1) If the value of α is reduced from 0.05 to 0.01, the probability of making a Type I error decreases, whereas the probability of making a Type II error increases.
(2) If the value of α is raised from 0.01 to 0.05, the probability of making a Type I error increases, whereas the probability of making a Type II error decreases.
(3) α = 0.05 is more likely to reject Ho than α = 0.01 (Why?)
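The trade-off in (1) and (2) can be checked for a one-tailed z test. This is a sketch under stated assumptions: the true population mean is taken to lie 2 standard errors above the hypothesized mean (a hypothetical value chosen only for illustration), and `type_ii_error` is a hypothetical helper name.

```python
from statistics import NormalDist

std_normal = NormalDist()


def type_ii_error(alpha: float, shift_in_se: float) -> float:
    """Probability of retaining a false Ho in a one-tailed z test.

    shift_in_se is the distance between the true mean and the
    hypothesized mean, measured in standard errors (an assumption).
    """
    z_cv = std_normal.inv_cdf(1 - alpha)
    # Ho is retained whenever the observed z stays below z_cv.
    return std_normal.cdf(z_cv - shift_in_se)


beta_05 = type_ii_error(0.05, shift_in_se=2.0)
beta_01 = type_ii_error(0.01, shift_in_se=2.0)
print(f"alpha = 0.05 -> Type II error = {beta_05:.3f}")
print(f"alpha = 0.01 -> Type II error = {beta_01:.3f}")
```

Lowering α pushes the critical value outward, so a false Ho is retained more often; this is also why α = 0.05 rejects Ho more readily than α = 0.01.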

5. One-Tailed [單尾] vs. Two-Tailed [雙尾] Hypothesis Testing:
(1) One-Tailed (Directional): Ho: μ ≤ 455; Ha: μ > 455
(2) Two-Tailed (Non-Directional): Ho: μ = 455; Ha: μ ≠ 455
(3) The critical values [決斷值] for rejecting Ho are different: Figure 7.6 (p. 178), Figures 7.7 & 7.8 (pp. 180/181). When α = .05, the one-tailed critical value is z = 1.645, whereas the two-tailed critical value is z = ±1.96.

(4) A one-tailed test is more likely to reject Ho (Why?)
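One way to see why: at the same α, the one-tailed test puts the whole rejection area in one tail, so its critical value is smaller. A minimal sketch, where the observed z score of 1.80 is a hypothetical value chosen to fall between the two critical values:

```python
from statistics import NormalDist

alpha = 0.05
z_one_tailed = NormalDist().inv_cdf(1 - alpha)      # all of alpha in one tail
z_two_tailed = NormalDist().inv_cdf(1 - alpha / 2)  # alpha split over two tails

z_observed = 1.80  # hypothetical observed z score

reject_one_tailed = z_observed > z_one_tailed
reject_two_tailed = abs(z_observed) > z_two_tailed

print(round(z_one_tailed, 3), round(z_two_tailed, 2))  # prints: 1.645 1.96
print(reject_one_tailed, reject_two_tailed)            # prints: True False
```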

Standard Sampling Distribution of the Mean: Using Student's t distribution if σ is unknown:
(1) Standard error: s/√n, where s is the standard deviation of the sample

Student's t distributions:
a. For small samples, the sampling distribution of the mean departs considerably from the normal distribution;

b. a family of distributions;
c. as the sample size increases (n ≥ 30), the sampling distribution of the mean approximates the normal distribution;

d. t distributions have a mean equal to 0 and an SD slightly larger than 1, which approaches 1 as df increases (much like the z-score distribution)

e. All testing procedures are identical to those for the z-score distribution

f. Each t distribution is related to a degree of freedom (df) [自由度]: n - 1
(1) df: the number of elements of data that are free to vary in calculating a statistic.
(2) Why df: each t distribution corresponds to a df.
(3) If one restriction is added, the number of degrees of freedom is reduced by one.

(4) X̄ = ΣX/n, but s = √[Σ(X - X̄)²/(n - 1)] -> X̄ is added as a restriction; thus, df = n - 1

(5) When df increases, the t distribution approximates the normal distribution; at about df = 29, the two are close enough to be treated as practically the same. Thus, as a rule of thumb, techniques that rely on this approximation call for a df of at least 29, that is, n of at least 30.
g. Critical values for t distributions can be found in Table C.3, p. 622.
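The n - 1 restriction in item f can be illustrated with a few numbers. The scores below are hypothetical; the sketch shows that deviations from the sample mean always sum to zero (so only n - 1 of them are free to vary) and that dividing by n - 1 matches the standard library's `statistics.stdev`.

```python
from math import sqrt
from statistics import stdev

scores = [2.0, 2.5, 3.0, 3.5]  # hypothetical sample
n = len(scores)

sample_mean = sum(scores) / n
deviations = [x - sample_mean for x in scores]

# The mean acts as a restriction: once n - 1 deviations are known,
# the last one is fixed because all deviations must sum to zero.
deviation_sum = sum(deviations)

# Sample standard deviation with df = n - 1 in the denominator.
s = sqrt(sum(d ** 2 for d in deviations) / (n - 1))

print(deviation_sum)   # prints: 0.0
print(round(s, 4))     # prints: 0.6455
```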

Statistical Precision:
(1) the inverse of a standard error;
(2) the smaller a standard error is, the greater the statistical precision;
(3) as the sample size is increased, the precision is increased accordingly.
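Because the standard error is s/√n, precision grows with the square root of the sample size. A minimal sketch, assuming a fixed sample SD of 0.54 and three illustrative sample sizes (both are assumptions made only for this illustration):

```python
from math import sqrt


def precision(s: float, n: int) -> float:
    """Statistical precision, defined as the inverse of the standard error."""
    standard_error = s / sqrt(n)
    return 1 / standard_error


s = 0.54  # assumed sample standard deviation
precisions = {n: precision(s, n) for n in (20, 80, 320)}

for n, value in precisions.items():
    print(f"n = {n:3d}  precision = {value:.2f}")
```

Quadrupling the sample size doubles the precision, since the standard error is halved.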

Example:
Scenario: A researcher hypothesizes that the GPA of student athletes is less than 2.5. To test this hypothesis, the researcher selects 20 subjects and finds that the GPA mean of this sample is 2.45, s = 0.54, s² = 0.29; α is set at the 0.05 level. What inferences can the researcher make?

Computation:
1. A t distribution is used since the population variance is not known.

2. Hypotheses: Ho: μ = 2.5; Ha: μ < 2.5 (one-tailed, since the researcher expects a GPA below 2.5)
3. t = (X̄ - μ) / standard error
4. standard error = s/√n -> 0.54/√20 = 0.12
5. t = (2.45 - 2.50)/0.12 = -0.42
6. α = 0.05, df = 19, one-tailed t critical value = -1.729
7. t = -0.42 > -1.729, outside the Ho rejection area; thus, Ho should not be rejected but retained -> μ = 2.5

8. μ = 2.5, p > .05 means:
a. The difference between the observed mean and the hypothesized mean is very small (i.e., non-significant);

b. The probability for sampling error to account for this difference is higher than 5%;

c. The p-value, i.e., the probability of obtaining a sample mean at least this far below 2.5 if Ho is true, is larger than .05
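The computation steps of the GPA example can be traced in a few lines. This is a sketch, not a general t-test routine: the critical value -1.729 is taken directly from the slide (Table C.3, df = 19, one-tailed) rather than computed, and the variable names are illustrative.

```python
from math import sqrt

n = 20
sample_mean = 2.45
s = 0.54
mu_0 = 2.50              # hypothesized population mean under Ho
t_critical = -1.729      # from the slide: alpha = .05, df = 19, one-tailed

df = n - 1
standard_error = s / sqrt(n)                # about 0.12
t = (sample_mean - mu_0) / standard_error   # about -0.41 without rounding

# Ha: mu < 2.5, so Ho is rejected only if t falls below the critical value.
reject_h0 = t < t_critical

print(f"df = {df}, SE = {standard_error:.2f}, t = {t:.2f}")
print("reject Ho" if reject_h0 else "retain Ho")  # prints: retain Ho
```

The slide reports t = -0.42 because it divides by the rounded standard error 0.12; with the unrounded value the statistic is about -0.41, and the decision is the same.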

Confidence Interval (CI: 信賴區間): The estimation of μ
(a) CI: A range of values that we are confident contains the population parameter (i.e., μ).
(b) CI = X̄ ± (t_cv) × standard error

(c) E.g.: A researcher hypothesizes that the GPA of student athletes is less than 2.5. To test this hypothesis, the researcher selects 20 subjects and finds that the GPA mean of this sample is 2.45, s = 0.54, s² = 0.29; α is set at the 0.05 level. What is the CI?

Computation:
1. standard error = s/√n -> 0.54/√20 = 0.12
2. Critical value of t when n = 20, df = 19, α = 0.05, two-tailed -> t_cv = 2.093
3. CI95 = 2.45 ± (2.093 × 0.12) = 2.45 ± 0.25 = (2.20, 2.70)
Interpretation: We are 95% confident that μ falls between 2.20 and 2.70.
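The same interval can be reproduced step by step. A minimal sketch, again taking the two-tailed critical value 2.093 from the slide (Table C.3, df = 19) instead of computing it; variable names are illustrative.

```python
from math import sqrt

n = 20
sample_mean = 2.45
s = 0.54
t_cv = 2.093  # from the slide: alpha = .05, df = 19, two-tailed

standard_error = s / sqrt(n)
margin = t_cv * standard_error   # half-width of the interval

ci_lower = sample_mean - margin
ci_upper = sample_mean + margin

print(f"CI95 = ({ci_lower:.2f}, {ci_upper:.2f})")  # prints: CI95 = (2.20, 2.70)
```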

Statistics' A, B, & C
A: In theory, an infinite number of samples can be selected from a population. All the possible sample means will be normally distributed.

B: The mean of all the possible sample means is the population mean; when Ho is tested, it is taken to be the hypothesized population mean.

C: For a particular sample mean, there is a corresponding probability. This probability indicates how likely a sample mean at least that extreme would be if the null hypothesis were true.
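Points A and B can be explored with a small simulation. The sketch below is an illustration under stated assumptions: the population is taken to be normal with mean 455 and a hypothetical SD of 100, and 2,000 samples of a hypothetical size n = 25 stand in for "all possible samples".

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(1)  # fixed seed so the run is repeatable

MU = 455      # population mean
SIGMA = 100   # hypothetical population SD
N = 25        # hypothetical sample size
SAMPLES = 2000

# Draw many samples and record each sample mean.
sample_means = [
    mean(random.gauss(MU, SIGMA) for _ in range(N))
    for _ in range(SAMPLES)
]

mean_of_means = mean(sample_means)
sd_of_means = stdev(sample_means)
theoretical_se = SIGMA / sqrt(N)  # standard error of the mean = 20

print(f"mean of sample means: {mean_of_means:.1f} (population mean {MU})")
print(f"SD of sample means:   {sd_of_means:.1f} (theoretical SE {theoretical_se:.1f})")
```

The mean of the sample means should sit close to the population mean, and their spread should sit close to σ/√n.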