


Selected Nonparametric and Parametric Statistical Tests for Two-Sample Cases¹

The t-statistic is used to test for differences between the means of two groups. The grouping variable is categorical, and the data for the dependent variable are interval scaled. The following table shows alternative statistical techniques that can be used to analyze this type of data when different levels of measurement are available.

The t-distribution was developed by W. S. Gosset (1908)¹. As an employee of the Guinness brewery in Dublin, Gosset was not permitted to publish research findings under his own name, and so adopted the pseudonym "Student". The t-distribution, as it was first designated, has been known under a variety of names, including Student's distribution and Student's t-distribution.

The t-distribution revolutionized statistics and the ability to work with small samples. Prior to this time, statistical work was based largely on the value of z, which was used to designate a point on the normal distribution where population parameters were known. The z value is the deviation of the sample mean from the mean of the population, expressed in units of the standard error (the population standard deviation divided by the square root of the sample size) within a normally distributed population.²

The purpose of the z value is to express the amount of deviation between the sample mean and the population mean, and so to permit inferences as to whether the sample mean belongs to the population in question. The mean and variance of the population (μ and σ²), about which we wish to make inferences, are rarely known. The z test assumes this state of perfect knowledge, which is difficult to justify in actual use. The t statistic does not require the population variance needed for the z test, but instead uses the sample variance (and sample standard deviation).
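As a minimal sketch of the contrast drawn above, the following Python computes a z value when σ is assumed known, and the analogous statistic with the sample standard deviation substituted for σ. The function names and the data are illustrative assumptions, not part of the original tutorial, and the one-sample form is used only because it is the simplest case.

```python
import math
from statistics import NormalDist, mean, stdev


def z_statistic(sample, mu, sigma):
    """z = (sample mean - mu) / (sigma / sqrt(n)); requires the population sigma."""
    n = len(sample)
    return (mean(sample) - mu) / (sigma / math.sqrt(n))


def one_sample_t_statistic(sample, mu):
    """Same form as z, but the sample standard deviation replaces sigma."""
    n = len(sample)
    return (mean(sample) - mu) / (stdev(sample) / math.sqrt(n))


def two_tailed_p_from_z(z):
    """Two-tailed probability of a value at least as extreme as z under N(0, 1)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical data and population values, made up for illustration only.
sample = [10.0, 12.0, 14.0, 16.0, 18.0]
z = z_statistic(sample, mu=12.0, sigma=4.0)
t = one_sample_t_statistic(sample, mu=12.0)
p = two_tailed_p_from_z(z)
print(z, t, p)
```

The t value would be referred to a t-distribution with n − 1 degrees of freedom rather than to the normal distribution; the standard library has no t-distribution, so that step is left out of the sketch.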

The t-distribution is symmetrical about its mean and approximately normal in shape. It is centered at 0, and for large samples its variance approaches 1 (for ν > 2 degrees of freedom the variance is ν/(ν − 2), which is slightly greater than 1).
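The centering and variance of the t-distribution can be checked with a small simulation. This is only a sketch: it builds t variates from the textbook construction (a standard normal divided by the square root of a chi-square over its degrees of freedom), and the function names, seed, and number of draws are arbitrary assumptions.

```python
import math
import random


def draw_t(df, rng):
    """One Student-t variate: standard normal over sqrt(chi-square / df)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square with df degrees of freedom
    return z / math.sqrt(chi2 / df)


def simulated_moments(df, draws=50_000, seed=1):
    """Return the simulated mean and variance of a t-distribution with df d.f."""
    rng = random.Random(seed)
    values = [draw_t(df, rng) for _ in range(draws)]
    m = sum(values) / draws
    v = sum((x - m) ** 2 for x in values) / draws
    return m, v


for df in (10, 100):
    m, v = simulated_moments(df)
    # The theoretical variance is df / (df - 2) for df > 2.
    print(f"df={df:>3}  mean={m:+.3f}  variance={v:.3f}  theory={df / (df - 2):.3f}")
```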

¹ November 1, 2011 Version: This tutorial is edited from the BIOMED statistical package t-test program, as developed under a National Science Foundation grant.


The central limit theorem tells us that the sampling distribution of all possible sample means, x̄, approaches normality as the sample size increases. This is true even when the population is not normally distributed. The t-distribution rapidly approaches the shape of a normal distribution as sample size increases, and by a common rule of thumb it is treated as normal when n ≥ 30.
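The behavior described by the central limit theorem can be illustrated with a short simulation. The sketch below assumes an exponential population (mean 1, standard deviation 1), chosen only because it is clearly non-normal; the function names, seed, and sample counts are likewise illustrative.

```python
import math
import random


def sample_means(n, samples=5_000, seed=7):
    """Means of repeated samples of size n drawn from an exponential population."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(samples)]


def skewness(values):
    """Moment-based skewness; it is 0 for a symmetric distribution such as the normal."""
    k = len(values)
    m = sum(values) / k
    m2 = sum((v - m) ** 2 for v in values) / k
    m3 = sum((v - m) ** 3 for v in values) / k
    return m3 / m2 ** 1.5


means_2 = sample_means(2)
means_30 = sample_means(30)

center_30 = sum(means_30) / len(means_30)
spread_30 = math.sqrt(sum((v - center_30) ** 2 for v in means_30) / len(means_30))
skew_2 = skewness(means_2)
skew_30 = skewness(means_30)

# The sample means centre on the population mean, their spread is close to
# sigma / sqrt(n), and the skewness shrinks as n grows.
print(center_30, spread_30, 1 / math.sqrt(30))
print(skew_2, skew_30)
```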

The t-distribution is used to make inferences concerning the difference between the two population means, μ₁ and μ₂. The specific statistical theory used relates to the distribution of differences between two sets of independent sample means, that is, the sampling distribution of x̄₁ − x̄₂.
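For reference, the standard results for this sampling distribution can be written out as follows. This is a sketch under the assumption of two independent random samples of sizes n₁ and n₂ from populations with variances σ₁² and σ₂²; the notation is ours rather than the tutorial's.

\[
E(\bar{x}_1 - \bar{x}_2) = \mu_1 - \mu_2,
\qquad
\operatorname{Var}(\bar{x}_1 - \bar{x}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}
\]

Because σ₁² and σ₂² are rarely known, the t-test replaces them with estimates from the samples, either pooled or separate.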

MATHEMATICAL COMPUTATIONS FOR THE T-TEST

Statistical analysis programs compute t-statistics and associated probability levels for the equality of the means of two groups, based on both pooled and separate variance estimates. An F-statistic and associated probability level for the equality of the group variances are also computed. Groups may be defined by specifying the codes to be included. Several dependent variables may often be analyzed concurrently. Paired comparison t-ratios may be obtained through the use of IF and RECODE commands (SPSS).

Typical t-test computations include:

1. F-ratio of variance
2. t-value (based on pooled variance estimate)
3. t-value (based on separate variance estimate)
4. Two-tailed probability levels for each t and for the F
5. Means
6. Standard deviations
7. Standard error of the means
8. Number of observations included in computing 5-7 above

The choice between the pooled and the separate variance estimate depends on whether the variances of the two groups can be treated as equal, which can be checked with Levene's test or O'Brien's test. When the variances are judged equal, the two sample variances are pooled into a single estimate, with each weighted by its degrees of freedom; when they are judged unequal, the separate variance estimate is used. The pooled standard deviation is given by:

\[ S_p = \sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}} \]

COMPUTATIONAL PROCEDURE

Each problem is divided into two groups: an X and a Y category. For each analysis, the number of non-missing observations, the mean, the standard deviation, and the standard error are computed for each variable in each category. The t-values, F-values, and corresponding probability levels for the between-category comparison are computed for each variable.
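The per-category summaries and the pooled estimate can be sketched in a few lines of Python. The helper names (`describe`, `pooled_sd`) and the data are hypothetical and chosen for illustration; the sample standard deviation uses the n − 1 divisor.

```python
import math
from statistics import mean, stdev


def describe(values):
    """Return n, mean, standard deviation, and standard error for one category."""
    n = len(values)
    sd = stdev(values)  # divides by n - 1
    return {"n": n, "mean": mean(values), "sd": sd, "se": sd / math.sqrt(n)}


def pooled_sd(x, y):
    """Pooled standard deviation: variances weighted by their degrees of freedom."""
    n1, n2 = len(x), len(y)
    s1, s2 = stdev(x), stdev(y)
    return math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))


# Hypothetical observations for the X and Y categories.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]
print(describe(x))
print(describe(y))
print(pooled_sd(x, y))
```

With equal group sizes, as here, the weighting reduces to a simple average of the two variances; with unequal sizes the larger group contributes more.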

 


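STEP 1: X_ijk, where i = 1, 2, …, n indexes the observations; j = 1, 2, …, (p + q) indexes the variables; and k = 1, 2 indexes the category (1 = x category; 2 = y category).

STEP 2: Mean Computation

\[ \bar{X}_{jk} = \frac{\sum_i x_{ijk}}{n_{jk}} \]

Variance

\[ S_{jk}^2 = \frac{\sum_i (x_{ijk} - \bar{X}_{jk})^2}{n_{jk} - 1} \]

Standard Deviation

\[ S_{jk} = \sqrt{S_{jk}^2} \]

Standard error (of Mean)

\[ SE_{jk} = \frac{S_{jk}}{\sqrt{n_{jk}}} \]

F-ratio of variance

\[ F_j = \begin{cases} S_{j1}^2 / S_{j2}^2 & \text{if } S_{j1}^2 \ge S_{j2}^2 \\ S_{j2}^2 / S_{j1}^2 & \text{if } S_{j1}^2 < S_{j2}^2 \end{cases} \]

Degrees of freedom based on pooled variance estimate:

\[ D_p = \max(n_{j1} - 1, 0) + \max(n_{j2} - 1, 0) \]

t based on pooled variance estimate:

\[ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left(\frac{1}{n_1} + \frac{1}{n_2}\right) \frac{S_1^2 (n_1 - 1) + S_2^2 (n_2 - 1)}{(n_1 - 1) + (n_2 - 1)}}} \]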
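The F-ratio and the pooled variance t can be sketched as below. The function names and the data are assumptions made for illustration, and the code handles a single variable, so the j subscript is dropped. Probability levels are not computed, because the Python standard library has no t or F distribution.

```python
import math
from statistics import mean, variance


def f_ratio(x, y):
    """Larger sample variance over the smaller, so that F >= 1."""
    v1, v2 = variance(x), variance(y)
    return v1 / v2 if v1 >= v2 else v2 / v1


def pooled_t(x, y):
    """t and degrees of freedom based on the pooled variance estimate."""
    n1, n2 = len(x), len(y)
    df = max(n1 - 1, 0) + max(n2 - 1, 0)
    pooled_var = (variance(x) * (n1 - 1) + variance(y) * (n2 - 1)) / df
    t = (mean(x) - mean(y)) / math.sqrt((1 / n1 + 1 / n2) * pooled_var)
    return t, df


# Hypothetical observations for the two categories.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]
t_pooled, df_pooled = pooled_t(x, y)
print(f_ratio(x, y), t_pooled, df_pooled)
```

The sketch assumes at least two observations per group and a nonzero variance in each; a production routine would need to guard against both cases.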

 

 


t based on separate variance estimate:

\[ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} \]

Degrees of freedom based on separate variance estimate:

\[ D = \frac{1}{\frac{1}{n_{j1} - 1}\left(\frac{SE_{j1}^2}{SE_{j1}^2 + SE_{j2}^2}\right)^2 + \frac{1}{n_{j2} - 1}\left(\frac{SE_{j2}^2}{SE_{j1}^2 + SE_{j2}^2}\right)^2} \]

• In case of pooling, the pooled point estimate and pooled standard error are calculated. In case of the separate variance estimate, the difference between the two groups is used as the data. The mean and standard error of the difference are computed.

• The F-value is the ratio of the mean square error (mean of the sum of squares) for each of the two groups, that is, the ratio of the two sample variances with the larger in the numerator.

• The P-value is the significance level corresponding to the computed F-value. If this P-value is less than or equal to the significance level of the test, then we reject the null hypothesis of equal variances.

• The two groups are defined by the cut-off value selected. The cases are assigned to the two groups as per the cut-off value.

Footnotes

¹ Student (1908), "The Probable Error of a Mean", Biometrika, 6:1.

² The normal probability distribution is defined by the equation:

\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left[\frac{x - \mu}{\sigma}\right]^2} \]

and

\[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \]
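The t and degrees of freedom based on the separate variance estimate can be sketched in the same way. The function name and the data are illustrative assumptions. The degrees-of-freedom expression is the one commonly known as the Welch-Satterthwaite approximation, and its result is generally not an integer.

```python
import math
from statistics import mean, variance


def separate_variance_t(x, y):
    """t and approximate degrees of freedom based on separate variance estimates."""
    n1, n2 = len(x), len(y)
    se1_sq = variance(x) / n1  # squared standard error of the first mean
    se2_sq = variance(y) / n2  # squared standard error of the second mean
    total = se1_sq + se2_sq
    t = (mean(x) - mean(y)) / math.sqrt(total)
    df = 1.0 / (
        (se1_sq / total) ** 2 / (n1 - 1) + (se2_sq / total) ** 2 / (n2 - 1)
    )
    return t, df


# Hypothetical observations for the two categories.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]
t_sep, df_sep = separate_variance_t(x, y)
print(t_sep, df_sep)
```

With equal group sizes the separate and pooled t values coincide, but the separate variance degrees of freedom are smaller whenever the two variances differ, which makes the test more conservative.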
