Statistical Applications through SPSS
S. Ali Raza Naqvi
Variables:
A quantity that changes its value from time to time, place to place, and person to person is called a variable. If probabilities are attached to the values of a variable, it is called a random variable.
For example
If we say x = 1, x = 7, or x = -6, then x is a variable; but if a variable appears in the following way, with a probability attached to each value, then it is known as a random variable.
x P(x)
1 0.2
2 0.3
3 0.1
4 0.4
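As a sketch (not part of the original notes), the table above can be treated as a discrete distribution in plain Python; the names below are illustrative:

```python
# Sketch (not from the original notes): the probability table above as a
# discrete random variable.
dist = {1: 0.2, 2: 0.3, 3: 0.1, 4: 0.4}

# A valid probability distribution must sum to 1.
total_probability = sum(dist.values())

# Expected value E(X) = sum of x * P(x) over all values of x.
expected_value = sum(x * p for x, p in dist.items())
```

Here the probabilities sum to 1 and the expected value works out to 2.7.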
Population:
The entire collection of objects or individuals under study is called a population. A population may be finite or infinite: if its elements are countable it is known as a finite population, and if its elements are uncountable it is called an infinite population.
For example:
Population of MBA students at IUGC (Finite Population)
Population of the University teachers in Pakistan (Finite Population)
Population of trees (Infinite Population)
Population of sea life (Infinite Population)
The population is also categorized in two ways.
1. Homogeneous population
2. Heterogeneous population
Homogeneous Population:
If all the population elements have the same properties, then the population is known as a homogeneous population.
For example: Population of shops, Population of houses, Population of boys, Population of rice in a box etc.
Heterogeneous Population:
If the population elements do not all have the same properties, then the population is known as a heterogeneous population.
For example: Population of MBA students (male and female), population of plants, etc.
Parameter:
A constant computed from the population, or a population characteristic, is known as a parameter.
For Example:
Population mean µ, population standard deviation σ, and the coefficients of skewness and kurtosis for the population.
Statistic:
A constant computed from the sample, or a sample characteristic, is known as a statistic.
For Example:
Sample mean x̄, sample standard deviation s, and the coefficients of skewness and kurtosis for the sample.
Estimator:
A sample statistic used to estimate a population parameter is known as an estimator.
For Example:
The sample mean is used to estimate the population mean, so the sample mean is also called an estimator of the population mean.
The sample variance is used to estimate the population variance, so the sample variance is also called an estimator of the population variance.
Hypothesis:
An assumption about a population parameter that is tested on the basis of sample information is called a hypothesis, and the procedure is called hypothesis testing.
These assumptions are established by generating two complementary statements, the null and alternative hypotheses, in such a manner that if one statement is found wrong, the other is automatically selected as correct.
Types of Hypothesis:
1) Null Hypothesis:
A statement, or the first thought about the parameter value, is called a null hypothesis. Statistically, we can say that a null hypothesis is a statement that contains an equality sign, such as:
H0: µ = µ0
H0: µ ≤ µ0
H0: µ ≥ µ0
As is clear from the above statements, there are two types of null hypothesis.
1- Simple null hypothesis
2- Composite null hypothesis
1-Simple Null Hypothesis:
If a null hypothesis is based on a single value (i.e., it contains only the equality sign), then it is called a simple null hypothesis.
For Example H0: µ = µ0
Phrases
Average rainfall in the United States of America during 1999 was 200 mm.
The average concentrations of two substances are the same.
The IQ levels of MBA and BBA students are the same.
IQ level is independent of education level.
2-Composite Null Hypothesis:
If a null hypothesis is based on an interval of the parameter value (i.e., it contains a less-than or greater-than sign together with the equality sign), then it is called a composite null hypothesis.
For Example H0: µ ≤ µ0
H0: µ ≥ µ0
Phrases
The mean height of BBA students is at most 70 inches.
The performance of PhD students is at most the same as that of MBA students.
Variability in a data set must be non-negative (greater than or equal to zero).
2) Alternative Hypothesis:
A statement automatically generated against the established null hypothesis is called an alternative hypothesis.
For Example:
Null Hypothesis     Alternative Hypothesis
H0: µ = µ0          H1: µ ≠ µ0 (or, one-sided, H1: µ > µ0 or H1: µ < µ0)
H0: µ ≤ µ0          H1: µ > µ0
H0: µ ≥ µ0          H1: µ < µ0
It is clear from the above stated alternatives that there are two different types of alternatives.
1- One tailed or One sided alternative hypothesis
2- Two tailed or two sided alternative hypothesis
1-One-tailed Alternative Hypothesis:
If an alternative contains either a greater-than (>) or a less-than (<) sign in its statement, then it is known as a one-tailed alternative hypothesis.
For Example: H1: µ > µ0 or H1: µ < µ0
Phrases
Average rainfall in Pakistan is higher than average rainfall in Jakarta.
Inzamam is a more consistent player than Shahid Afridi.
Waseem Akram is a better bowler than McGrath.
Gold prices are dependent on oil prices.
2-Two-tailed Alternative Hypothesis:
If an alternative contains only an unequal (≠) sign in its statement, then it is known as a two-tailed alternative hypothesis.
For Example: H1: µ ≠ µ0
Phrases
The concentrations of the two substances are not the same.
There is a significant difference between the wheat production of Sind and Punjab.
The consistency of the KSE and SSE is not the same.
In one-tailed alternatives, the total chance of type I error remains on only one side of the normal curve.
In two-tailed alternatives, the total chance of type I error is divided between the two sides of the normal curve.
Probabilities Associated with Decisions:

                 Ho is True                           Ho is False
Accept Ho        Correct Decision (1 - α)             False Decision: Type II Error (β)
Reject Ho        False Decision: Type I Error (α)     Correct Decision (1 - β)
It is clear from the above table that both errors cannot be minimized at the same time: an increase is observed in the type II error when the type I error is minimized.
P-Value:
It is the minimum value of alpha (α) at which a true null hypothesis can be rejected. As it is a value of α, it can be explained as the minimum probability of type I error associated with a hypothesis while it is being tested. Therefore, it is used in two ways: one in decision making, and the other to determine the probability of type I error associated with the test.
Decision Rule on the basis of p - value:
Reject Ho if p – value < 0.05
Accept Ho if p – value ≥ 0.05
For example:
If the p-value for a test appears as 0.01, it indicates that our null hypothesis is to be rejected, and that there is only a 1% chance of rejecting a true null hypothesis. This can further be explained as being 99% confident in the rejection of the null hypothesis; or we can say that we can reject this null hypothesis at α = 1%, i.e., at the 99% confidence level.
If the p-value for a test appears as 0.21, it indicates that our null hypothesis is to be accepted, since rejecting it would carry a 21% chance of rejecting a true null hypothesis. In other words, this null hypothesis could only be rejected at α = 21%.
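The decision rule above is mechanical enough to express in a few lines of Python (a sketch, not part of the original notes; the function name is illustrative):

```python
# Sketch (not from the original notes): the decision rule stated above,
# with alpha = 0.05 as the default threshold.
def decide(p_value, alpha=0.05):
    """Reject H0 if the p-value < alpha, otherwise accept H0."""
    return "reject H0" if p_value < alpha else "accept H0"

decision_one = decide(0.01)   # first example above: p = 0.01
decision_two = decide(0.21)   # second example above: p = 0.21
```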
T-test:-
A t-test is a statistical hypothesis test in which the test statistic has a Student's t distribution if the null hypothesis is true. It is applied when the population is assumed to be normally distributed but the sample sizes are small enough that the statistic on which inference is based is not normally distributed because it relies on an uncertain estimate of standard deviation rather than on a precisely known value.
Uses of T-test:-
Among the most frequently used t tests are:
A test of whether the mean of a normally distributed population has a value specified in a null hypothesis.
A test of the null hypothesis that the means of two normally distributed populations are equal. Given two data sets, each characterized by its mean, standard deviation, and number of data points, we can use some kind of t-test to determine whether the means are distinct, provided that the underlying distributions can be assumed to be normal. There are different versions of the t-test depending on whether the two samples are:
o Unpaired, independent of each other (e.g., individuals randomly assigned into two groups, measured after an intervention and compared with the other group), or
o Paired, so that each member of one sample has a unique relationship with a particular member of the other sample (e.g., the same people measured before and after an intervention).
Interpretation of the results:-
If the calculated p-value is below the threshold chosen for statistical significance (usually the 0.10, the 0.05, or 0.01 level), then the null hypothesis which usually states that the two groups do not differ is rejected in favor of an alternative hypothesis, which typically states that the groups do differ.
Another common use is a test of whether the slope of a regression line differs significantly from 0.
Statistical Analysis of the t-test:-
The formula for the t-test is a ratio. The top part of the ratio is just the difference between the two means or averages. The bottom part is a measure of the variability or dispersion of the scores. This formula is essentially another example of the signal-to-noise metaphor in research: the difference between the means is the signal that, in this case, we think our program or treatment introduced into the data; the bottom part of the formula is a measure of variability that is essentially noise that may make it harder to see the group
difference. The formula for the t-test shows how the numerator and denominator are related to the two distributions.
The top part of the formula is easy to compute -- just find the difference between the means. The bottom part is called the standard error of the difference. To compute it, we take the variance for each group and divide it by the number of people in that group; we add these two values and then take their square root. The t-value will be positive if the first mean is larger than the second and negative if it is smaller.
Once we compute the t-value, we have to look it up in a table of significance to test whether the ratio is large enough to say that the difference between the groups is unlikely to have been a chance finding. To test the significance, we need to set a risk level (called the alpha level). In most social research, the rule of thumb is to set the alpha level at .05. This means that five times out of a hundred we would find a statistically significant difference between the means even if there was none (i.e., by chance). We also need to determine the degrees of freedom (df) for the test: in the t-test, the degrees of freedom are the number of persons in both groups minus 2. Given the alpha level, the df, and the t-value, we can look the t-value up in a standard table of significance (available as an appendix in the back of most statistics texts) to determine whether it is large enough to be significant. If it is, we can conclude that the difference between the means of the two groups is significant (even given the variability).
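The signal-to-noise computation described above can be sketched in Python (a toy example with made-up groups, not part of the original notes):

```python
import math
from statistics import mean, variance

# Sketch (not from the original notes): the signal-to-noise ratio described
# above, computed for two small made-up groups.
group1 = [5, 6, 7, 8, 9]   # hypothetical "treatment" scores
group2 = [1, 2, 3, 4, 5]   # hypothetical "control" scores

signal = mean(group1) - mean(group2)                 # difference between means
noise = math.sqrt(variance(group1) / len(group1)     # standard error of the
                  + variance(group2) / len(group2))  # difference
t_value = signal / noise
df = len(group1) + len(group2) - 2                   # persons in both groups minus 2
```

For these numbers the means differ by 4, the standard error is 1, so t = 4 with 8 degrees of freedom.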
Calculations:-
a) Independent one-sample t-test
In testing the null hypothesis that the population mean is equal to a specified value μ0, one uses the following statistic:
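The statistic itself did not survive extraction; consistent with the description of s, n, and the degrees of freedom that follows, the standard one-sample t statistic is:

```latex
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}
```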
Where “s” is the sample standard deviation of the sample and “n” is the sample size. The degrees of freedom used in this test is “n – 1”.
b) Independent two-sample t-test:-
A) Equal sample sizes, equal variance
This test is only used when both:
the two sample sizes (that is, the n or number of participants of each group) are equal;
It can be assumed that the two distributions have the same variance.
Violations of these assumptions are discussed below.
The t statistic to test whether the means are different can be calculated as follows:
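The formula did not survive extraction; for equal group sizes n and a pooled (grand) standard deviation, the standard form consistent with the description that follows is:

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \cdot \sqrt{2/n}},
\qquad
s_p = \sqrt{\frac{s_1^2 + s_2^2}{2}}
```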
where s_p is the grand standard deviation (or pooled standard deviation), subscript 1 = group one, and subscript 2 = group two. The denominator of t is the standard error of the difference between the two means.
For significance testing, the degrees of freedom for this test are n1 + n2 − 2, where n1 = number of participants in group 1 and n2 = number of participants in group 2.
B) Unequal sample sizes, unequal variance
This test is used only when the two sample sizes are unequal and the variance is assumed to be different. See also Welch's t test. The t statistic to test whether the means are different can be calculated as follows:
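The statistic itself was lost in extraction; the standard Welch t statistic referred to here is:

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}
```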
where n1 is the number of participants in group one and n2 is the number of participants in group two.
In this case, variance is not a pooled variance. For use in significance testing, the distribution of the test statistic is approximated as being an ordinary Student's t distribution with the degrees of freedom calculated using
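The equation was lost in extraction; the standard Welch–Satterthwaite approximation for the degrees of freedom is:

```latex
\nu = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^{2}}
           {\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}
```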
This is called the Welch-Satterthwaite equation. Note that the true distribution of the test statistic actually depends (slightly) on the two unknown variances.
This test can be used as either a one-tailed or two-tailed test.
c) Dependent t-test for paired samples:-
This test is used when the samples are dependent; that is, when there is only one sample that has been tested twice (repeated measures) or when there are two samples that have been matched or "paired".
For this equation, the differences between all pairs must be calculated. The pairs are either one person's pre-test and post-test scores, or pairs of persons matched into meaningful groups (for instance, drawn from the same family or age group). The average (X̄D) and standard deviation (sD) of those differences are used in the equation. The constant μ0 is non-zero if you want to test whether the average of the differences is significantly different from μ0. The degrees of freedom used are N – 1.
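The equation referred to above was lost in extraction; the standard paired t statistic matching that description is:

```latex
t = \frac{\bar{X}_D - \mu_0}{s_D / \sqrt{N}}
```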
Example # 01
Analysis through SPSS:-
A) One-sample t-test:-
SPSS needs:-
1) The data should be in numerical form (i.e., a numerical variable).
2) A test value, which is the hypothetical value against which we are going to test.
To analyze the one-sample t-test, I have used the employees' salaries of an organization. For this purpose, I have selected a sample of 474 employees of the company.
The hypotheses are:
a) The null hypothesis states that the average salary of the employees is equal to 30,000.
H0: µ = 30,000
b) The alternative hypothesis states that the average salary of the employees is not equal to 30,000.
HA: µ ≠ 30,000
Method:
Enter the data in the data editor, with the variable labeled as employee's current salary. Now click on Analyze, which will produce a drop-down menu; choose Compare Means from it and click on One-Sample T Test. A dialogue box appears, in which all the input variables appear on the left-hand side. From this box we have to select the variable to be analyzed, which in our case is the current salaries of the employees. The variables can be selected for analysis by transferring them to the Test Variable(s) box. Next, change the value in the Test Value box, which originally appears as 0, to the one against which you are testing the sample mean. In this case, this value would be 30000. Now click on OK to run the analysis.
Pictorial Representation
Analyze → Compare Means → One-Sample T Test → Drag Test Variable (Scale) → Give Test Value → OK
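Outside SPSS, the same one-sample test statistic can be sketched in Python (a toy example with made-up salaries, not the 474-employee data set):

```python
import math
from statistics import mean, stdev

# Sketch (not from the original notes): a one-sample t statistic computed
# by hand on a small hypothetical salary sample, test value 30000.
salaries = [28000, 32000, 35000, 30000, 31000]  # hypothetical data
test_value = 30000

n = len(salaries)
standard_error = stdev(salaries) / math.sqrt(n)
t_value = (mean(salaries) - test_value) / standard_error
df = n - 1
```

The sign and size of t_value are then compared against the t distribution with df degrees of freedom, just as SPSS does internally.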
SPSS output:-
One-Sample Statistics

                 N     Mean         Std. Deviation   Std. Error Mean
Current Salary   474   $34,419.57   $17,075.661      $784.311
Interpretation:-
In the above table, N shows the total number of observations. The average salary of the employees is 34,419.57, the standard deviation of the data is 17,075.661, and the standard error of the mean is 784.311.
One-Sample Test
Test Value = 30000

                 t       df    Sig. (2-tailed)   Mean Difference   95% CI of the Difference
                                                                   Lower       Upper
Current Salary   5.635   473   .000              $4,419.568        $2,878.40   $5,960.73
Interpretation:-
From the above table we can observe that:
i) The t value is positive, which shows that the sample mean is greater than the hypothesized value.
ii) Degrees of freedom = (N − 1) = 473.
iii) The p-value is 0.000, which is less than 0.05.
iv) The difference between the sample mean and the hypothesized mean is 4,419.568.
v) The confidence interval has lower and upper limits of 2,878.40 and 5,960.73 respectively; the confidence interval does not contain zero.
Decision:-
On the basis of the following observations, I reject the null hypothesis and accept the alternative hypothesis. I am almost 100% sure of my decision.
i) The p-value is 0.000, which is less than 0.05.
ii) The confidence interval does not contain zero.
Comments:-
The average salary of employees is not equal to “30,000”.
Example # 02
B) Independent t-test:-
SPSS needs:-
1) Two variables are required: one should be numerical and the other categorical with two levels.
To analyze the independent t-test, I have used the employees' salaries of an organization. For this purpose, I have selected a sample of 474 employees of the company, containing both males and females. In my analysis, males are coded as "m" and females as "f".
The hypotheses are:
a) The null hypothesis states that the average salary of male employees is equal to the average salary of female employees.
H0: µm = µf
i.e., µm − µf = 0
b) The alternative hypothesis states that the average salary of male employees is not equal to the average salary of female employees.
HA: µm ≠ µf
i.e., µm − µf ≠ 0
Method:
Enter the data in the data editor, with the variables labeled as employee's current salary and gender respectively. Click on Analyze, which will produce a drop-down menu; choose Compare Means from it and click on Independent-Samples T Test. A dialogue box appears, in which all the input variables appear on the left-hand side. To perform the independent-samples t-test, transfer the dependent variable into the Test Variable(s) box and transfer the variable that identifies the groups into the Grouping Variable box. In this case, the current salary of the employees is the dependent variable to be analyzed and should be transferred into the Test Variable(s) box by clicking on the first arrow in the middle of the two boxes. Gender is the variable that will identify the groups of employees, and it should be transferred into the Grouping Variable box.
Once the grouping variable is transferred, the Define Groups button, which was earlier inactive, turns active. Click on it to define the two groups. In this case, group 1 represents the male employees ("m") and group 2 the female employees ("f"). Therefore, enter these codes against group 1 and group 2 and click Continue. Now click on OK to run the analysis.
Pictorial Representation
Analyze → Compare Means → Independent-Samples T Test → Drag Test & Grouping Variable → Define Groups → OK
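SPSS reports two t statistics, one assuming equal variances (pooled) and one not (Welch). Both can be sketched by hand in Python (a toy example with made-up groups, not the employee data):

```python
import math
from statistics import mean, variance

# Sketch (not from the original notes): pooled vs Welch t statistics for
# two made-up groups whose variances clearly differ.
group1 = [10, 20, 30, 40]       # hypothetical group with large spread
group2 = [21, 22, 23, 24, 25]   # hypothetical group with small spread

n1, n2 = len(group1), len(group2)
v1, v2 = variance(group1), variance(group2)
diff = mean(group1) - mean(group2)

# Pooled ("equal variances assumed"): one common variance estimate.
pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_pooled = diff / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Welch ("equal variances not assumed"): each group keeps its own variance.
t_welch = diff / math.sqrt(v1 / n1 + v2 / n2)
```

When the variances differ, as here, the two statistics disagree, which is why Levene's test is consulted first to decide which row of the SPSS output to read.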
SPSS output:-
Group Statistics

                 Gender   N     Mean         Std. Deviation   Std. Error Mean
Current Salary   Male     258   $41,441.78   $19,499.214      $1,213.968
                 Female   216   $26,031.92   $7,558.021       $514.258
Interpretation:-
Through the above table we can observe that:
i) The total number of males is 258 and of females is 216.
ii) The mean salary of male employees is 41,441.78 and of female employees is 26,031.92.
iii) The standard deviation of male employees' salaries is 19,499.214 and of female employees' salaries is 7,558.021.
iv) The standard error of the mean for male employees' salaries is 1,213.968 and for female employees' salaries is 514.258.
Independent Samples Test
Current Salary

Levene's Test for Equality of Variances: F = 119.669, Sig. = .000

t-test for Equality of Means:
                              t        df        Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       10.945   472       .000              $15,409.862       $1,407.906              $12,643.322    $18,176.401
Equal variances not assumed   11.688   344.262   .000              $15,409.862       $1,318.400              $12,816.728    $18,002.996
Interpretation:-
The above table has two parts, (a) Levene's test and (b) the t-test, through which we can observe that:
i) The F value is 119.669 with a significance value of 0.000, which is less than 0.05.
ii) On the basis of the p-value of Levene's test, we conclude that the variances of the two populations are not equal, so the "equal variances not assumed" row is used.
iii) The t value is positive, which shows that the mean salary of male employees is greater than the mean salary of female employees.
iv) Degrees of freedom = 344.262.
v) The p-value is 0.000, which is less than 0.05.
vi) The difference between the two population means is 15,409.862.
vii) The standard error of the difference between the two means is 1,318.400.
viii) The confidence interval has lower and upper limits of 12,816.728 and 18,002.996 respectively; the confidence interval does not contain zero.
Decision:-
On the basis of the following observations, I reject the null hypothesis and accept the alternative hypothesis. I am almost 100% sure of my decision.
i) The p-value is 0.000, which is less than 0.05.
ii) The confidence interval does not contain zero.
Comments:-
The average salaries of male & female employees are not equal.
Example # 03
C) Paired t-test:-
SPSS needs:-
1) Two numerical variables are required, which should be equal in length.
To analyze the paired t-test, I used the beginning and current salaries of the employees of an organization. For this purpose, I have selected a sample of 474 employees of the organization.
The hypotheses are:
a) The null hypothesis states that the average current salary of the employees is equal to their average beginning salary.
H0: µcurrent = µbeginning
i.e., µd = 0
b) The alternative hypothesis states that the average current salary of the employees is not equal to their average beginning salary.
HA: µcurrent ≠ µbeginning
i.e., µd ≠ 0
Method:
Enter the data in the data editor and the variables are labeled as employee's current and beginning salary respectively. Click on Analyze which will produce a drop down menu, choose Compare means from that and click on Paired-samples t test, a dialogue box appears, in which all the input variables appear in the left-hand side of that box. From this box we have to select variables, which are to be computed. The two variables computed in our case are Current and Beginning salaries. Select these together and they will immediately appear in the box at the bottom labeled current selection. They are simultaneously highlighted in the box in which they originally appeared. Once the variables are selected the arrow at the center becomes active. The variables can be transferred to the Paired-Variables box by clicking on this arrow. They will appear in the box as Current-Beginning. Now click on OK to run the analysis.
Pictorial Representation
Analyze → Compare Means → Paired-Samples T Test → Drag Paired Variables (Scale) → OK
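The paired test works on the differences between the two measurements, as described in the paired t-test section above. A Python sketch with made-up before/after scores (not the salary data):

```python
import math
from statistics import mean, stdev

# Sketch (not from the original notes): a paired t statistic on made-up
# before/after scores, computed from the pairwise differences.
before = [10, 12, 14, 16, 18]   # hypothetical pre-test scores
after = [12, 15, 14, 18, 21]    # hypothetical post-test scores

diffs = [a - b for a, b in zip(after, before)]  # one difference per pair
n = len(diffs)
t_value = mean(diffs) / (stdev(diffs) / math.sqrt(n))
df = n - 1
```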
SPSS output:-
Paired Samples Statistics

                            Mean         N     Std. Deviation   Std. Error Mean
Pair 1   Current Salary     $34,419.57   474   $17,075.661      $784.311
         Beginning Salary   $17,016.09   474   $7,870.638       $361.510
Interpretation:-
Through the above table we can observe that:
i) The mean values of the current and beginning salaries are 34,419.57 and 17,016.09 respectively.
ii) The total number in each group is 474.
iii) The standard deviations of the current and beginning salaries are 17,075.661 and 7,870.638 respectively.
iv) The standard errors of the mean of the current and beginning salaries are 784.311 and 361.510 respectively.
Paired Samples Correlations

                                      N     Correlation   Sig.
Pair 1   Current & Beginning Salary   474   .880          .000
Interpretation:-
Through the above table we can observe that:
i) The total number of pairs is 474.
ii) The correlation of 0.88 shows that the two variables are highly correlated, which indicates that employees who had a higher beginning salary also have a higher current salary.
iii) The p-value is 0.000, which is less than 0.05.
Paired Samples Test

                                       Mean          Std. Deviation   Std. Error Mean   95% CI Lower   95% CI Upper   t        df    Sig. (2-tailed)
Pair 1   Current - Beginning Salary    $17,403.481   $10,814.620      $496.732          $16,427.407    $18,379.555    35.036   473   .000
Interpretation:-
Through the above table we can observe that:
i) The mean of the paired differences is 17,403.481.
ii) The standard deviation of the differences is 10,814.620.
iii) The standard error of the mean of the differences is 496.732.
iv) The confidence interval has lower and upper limits of 16,427.407 and 18,379.555 respectively; the confidence interval does not contain zero.
v) The t value is 35.036.
vi) Degrees of freedom = (N − 1) = 473.
vii) The p-value is 0.000, which is less than 0.05.
Decision:-
On the basis of the following observations, I reject the null hypothesis and accept the alternative hypothesis. I am almost 100% sure of my decision.
i) The p-value is 0.000, which is less than 0.05.
ii) The confidence interval does not contain zero.
Comments:-
The mean difference between the two paired variables, i.e., the current and beginning salaries, is significant; the two means are not the same.
One-Way ANOVA
ANOVA is a commonly used statistical method for making simultaneous comparisons between two or more population means; it yields values that can be tested to determine whether a significant relation exists between the variables. Its simplest form is one-way ANOVA, which involves one dependent variable and a single factor (independent variable).
Data Source: C:\SPSSEVAL\Employee Data
Variables: Here we analyze two different variables by one-way ANOVA, i.e.
A) Current salary of the employees.
B) Employment category.
Hypothesis:
H0: µ1 = µ2 = µ3
HA: at least one mean is not equal.
SPSS Need: SPSS needs two types of variables for analyzing one-way ANOVA:
Numerical variable (Scale). Categorical variable (with more than two categories).
Method:
First of all enter the data in the data editor and the variables are labeled as employee's current salary and employment category respectively. Click on Analyze which will produce a drop down menu, choose Compare means from that and click on One-Way ANOVA, a dialogue box appears, in which all the input variables appear in the left-hand side of that box. To perform one-way ANOVA, transfer the dependent variable into the box labeled Dependent List and all factoring variable into the box labeled Factor. In our case Current salary is the dependent variable and should be transferred to the dependent list box by clicking on the first arrow in the middle of the two boxes. Employment Category is the factoring variable and should be transferred to the factor box by clicking on the second arrow and then click OK to run the analysis.
If the null hypothesis is rejected, ANOVA only tells us that the population means are not all equal. Multiple comparisons are used to assess which group means differ from which others, once the overall F-test shows that at least one difference exists. Many tests are listed under Post Hoc in SPSS; the LSD (Least Significant Difference) and Tukey tests are among the most commonly used.
Pictorial Representation
Analyze → Compare Means → One-Way ANOVA → Drag Dependent List & Factors → Post Hoc (Optional) → OK
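The Between Groups / Within Groups decomposition that the ANOVA output reports can be sketched by hand in Python (a toy example with made-up numbers, not the employee data):

```python
from statistics import mean

# Sketch (not from the original notes): the between/within sum-of-squares
# decomposition of one-way ANOVA for three made-up groups.
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # hypothetical data
all_values = [x for g in groups for x in g]
grand_mean = mean(all_values)

# Between-groups SS: variability of the group means (known reasons).
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
# Within-groups SS: variability around each group mean (random error).
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)
f_value = (ss_between / df_between) / (ss_within / df_within)
```

A large F means the between-groups signal dominates the within-groups noise, which is exactly the situation in the employee-salary output below.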
Output:
The ANOVA table gives the test results for the analysis of one-way ANOVA. The results are given in three rows. The first row, labeled Between Groups, gives the variability due to the different designations of the employees (known reasons). The second row, labeled Within Groups, gives the variability due to random error (unknown reasons), and the third row gives the total variability. In this case, the F-value is 434.481 and the corresponding p-value is less than 0.05. Therefore we can safely reject the null hypothesis and conclude that the average salary of the employees is not the same in all three categories.
Post Hoc Tests
Multiple Comparisons
Dependent Variable: Current Salary
LSD

(I) Employment Category   (J) Employment Category   Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower   95% CI Upper
Clerical                  Custodial                 -$3,100.349             $2,023.760   .126   -$7,077.06     $876.37
                          Manager                   -$36,139.258*           $1,228.352   .000   -$38,552.99    -$33,725.53
Custodial                 Clerical                  $3,100.349              $2,023.760   .126   -$876.37       $7,077.06
                          Manager                   -$33,038.909*           $2,244.409   .000   -$37,449.20    -$28,628.62
Manager                   Clerical                  $36,139.258*            $1,228.352   .000   $33,725.53     $38,552.99
                          Custodial                 $33,038.909*            $2,244.409   .000   $28,628.62     $37,449.20

*. The mean difference is significant at the .05 level.
ANOVA
Current Salary

                 Sum of Squares     df    Mean Square       F         Sig.
Between Groups   89438483925.943    2     44719241962.972   434.481   .000
Within Groups    48478011510.397    471   102925714.459
Total            137916495436.340   473
The post-hoc test presents the results of the comparisons between all possible pairs. Since we have three groups, a total of six pairs is possible, of which three are mirror images. The p-value for the Clerical - Manager and Custodial - Manager comparisons is shown as 0.000, whereas it is 0.126 for the Clerical - Custodial comparison. This means that the average current salaries of Clerical and Manager employees, as well as of Custodial and Manager employees, are significantly different, whereas the salaries of Clerical and Custodial employees are not significantly different.
Conclusion: Our null hypothesis is rejected, and we conclude that the three means are not all the same. To identify which mean differs from the others we used the LSD test, and we conclude that the mean salary of managers is significantly different from the other two means, whereas the other two means are not significantly different from each other.
Two-Way ANOVA
In two-way analysis, we have two independent variables (known factors) and we are interested in knowing their effect on the same dependent variable.
Data Source: C:\SPSSEVAL\Carpet
Variables: Here we analyze two categorical variables together with a numerical variable by two-way ANOVA, i.e.
A) Preference (Numerical)
B) Package design (Categorical)
C) Brand (Categorical)
Hypothesis:
For Brand:   H0: µi = µj for all i & j
             HA: µi ≠ µj for at least one pair (i, j)
For Package: H0′: µi = µj for all i & j
             HA′: µi ≠ µj for at least one pair (i, j)
SPSS Need: SPSS needs two types of variables for analyzing two-way ANOVA:
One numerical variable (scale).
Two categorical variables (with more than two levels).
Method:
First of all, enter the data in the data editor and label the variables Preference, Brand, and Package design. Click on Analyze, which will produce a drop-down menu; choose General Linear Model from it and click on Univariate. A dialogue box appears in which all the input variables are listed on the left-hand side. To perform two-way ANOVA, transfer the dependent variable (Preference) into the box labeled Dependent Variable and the factor variables (Brand & Package) into the box labeled Fixed Factor(s). After defining all variables, click on OK to run the analysis.
If the null hypothesis is rejected, multiple comparisons are used to assess which group means differ from which others, once the overall F-test shows that at least one difference exists. Many tests are listed under Post Hoc in SPSS; LSD (Least Significant Difference) and Tukey's test are among the most commonly used.
Pictorial Representation
Analyze → General Linear Model → Univariate → Drag Dependent Variable & Fixed Factors → Post Hoc → OK
Output:
Between-Subjects Factors

                         Value Label   N
Package design   1.00    A*            9
                 2.00    B*            6
                 3.00    C*            7
Brand name       1.00    K2R           7
                 2.00    Glory         7
                 3.00    Bissell       8
This table shows the value labels under each category and the frequency of each value label. We have a total of six value labels under package design and brand name.
Tests of Between-Subjects Effects
Dependent Variable: Preference

Source    Type III Sum of Squares    df   Mean Square   F        Sig.
package   537.231                     2   268.616       16.883   .000
brand     36.108                      2   18.054        1.135    .351
Error     206.833                    13   15.910
Total     3758.000                   22

a. R Squared = .763 (Adjusted R Squared = .617)
The above table gives the test results for the two-way ANOVA. The results are given in four rows.
The first row, labeled package, gives the variability due to the different package designs of the carpets, which may affect the customers' preferences (known reason).
The second row, labeled brand, gives the variability due to the different brand names (known reason).
The third row, labeled Error, gives the variability due to random error, which also affects the customers' preferences (unknown reasons).
The fourth row gives the total variability in the customers' preferences due to both known and unknown reasons.
In this case, the F-value for package design is 16.883, and the corresponding p-value is less than 0.05. Therefore we can safely reject the null hypothesis for package design and conclude that the average preference is not the same for all packages.
The F-value for brand name is 1.135, and the corresponding p-value is greater than 0.05. So we fail to reject the null hypothesis for brand and conclude that the average brand preferences are approximately the same.
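The decomposition behind this table can be sketched directly for a balanced layout. The sketch below uses hypothetical preference scores (not the carpet data, which is unbalanced and which SPSS handles with Type III sums of squares); only the two main effects are computed, so the error term here absorbs any interaction.

```python
import numpy as np
from scipy.stats import f as f_dist

# Hypothetical balanced layout: preference scores for 3 package designs
# (first axis) x 3 brands (second axis), 2 observations per cell.
# Illustrative values only -- NOT the carpet data.
y = np.array([
    [[22, 24], [21, 23], [20, 22]],   # package A
    [[12, 14], [11, 13], [12, 12]],   # package B
    [[13, 15], [14, 12], [13, 13]],   # package C
], dtype=float)

a, b, n = y.shape                      # factor levels and replicates
grand = y.mean()

# Main-effect sums of squares for a balanced design
ss_a = b * n * ((y.mean(axis=(1, 2)) - grand) ** 2).sum()   # package
ss_b = a * n * ((y.mean(axis=(0, 2)) - grand) ** 2).sum()   # brand
ss_total = ((y - grand) ** 2).sum()
ss_error = ss_total - ss_a - ss_b      # error absorbs any interaction here

df_a, df_b = a - 1, b - 1
df_error = a * b * n - 1 - df_a - df_b
ms_error = ss_error / df_error

f_a = (ss_a / df_a) / ms_error
f_b = (ss_b / df_b) / ms_error
p_a = f_dist.sf(f_a, df_a, df_error)
p_b = f_dist.sf(f_b, df_b, df_error)
print(f"package: F = {f_a:.3f}, p = {p_a:.4f}")
print(f"brand:   F = {f_b:.3f}, p = {p_b:.4f}")
```

With these illustrative numbers the pattern matches the interpretation above: the package effect is significant while the brand effect is not.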
Post Hoc Tests
Package design
As our null hypothesis for package design is rejected, multiple comparisons are used to assess which group mean differs from the others. The Multiple Comparisons table gives the results for each pair of value labels under the package design category.
The Post-Hoc test presents the results of the comparisons between all possible pairs. Since we have three groups, a total of six pairs are possible, of which three are mirror images, leaving three unique comparisons. The p-value for the A* – B* and A* – C* comparisons is shown as .000, whereas it is .322 for the B* – C* comparison. This means that the average preference differs significantly between package designs A* and B* as well as between A* and C*, whereas it does not differ significantly between B* and C*.
Conclusion: As our null hypothesis for package design is rejected, we conclude that the mean preferences for the package designs are not all the same. To identify which mean differs from the others we used the LSD test, and we conclude that the mean preference for A* is significantly different from the other two means, whereas the other two means do not differ significantly from each other.
But in the case of brand name, we fail to reject the null hypothesis and conclude that all mean brand preferences are the same, so there is no need for multiple comparisons in the case of brand.
Multiple Comparisons
Dependent Variable: Preference
LSD

(I) Package   (J) Package   Mean Difference   Std. Error   Sig.   95% CI         95% CI
design        design        (I-J)                                 Lower Bound    Upper Bound
A*            B*            11.5556*          2.10226      .000   7.0139         16.0972
A*            C*            9.2698*           2.01015      .000   4.9272         13.6125
B*            A*            -11.5556*         2.10226      .000   -16.0972       -7.0139
B*            C*            -2.2857           2.21914      .322   -7.0799        2.5085
C*            A*            -9.2698*          2.01015      .000   -13.6125       -4.9272
C*            B*            2.2857            2.21914      .322   -2.5085        7.0799

Based on observed means.
*. The mean difference is significant at the .05 level.

Chi-Square Test
The chi-square test is commonly used to test hypotheses regarding:
1. Goodness of fit
2. Association / independence of attributes
It is denoted by "χ2" and its degrees of freedom are "n-1", where
n = Number of categories
The chi-square distribution is positively skewed, so the test has a one-tailed critical region in the right tail of the curve, and the value of χ2 is always positive.
Chi-Square Goodness of Fit Test
The chi-square goodness of fit test is used when the distribution is non-normal and the sample size is small (less than 30). With SPSS's default of equal expected frequencies, the chi-square goodness of fit test determines whether the distribution follows a uniform distribution or not.
Data Source: C:\SPSSEVAL\Carpet
Variables: Here we are interested in analyzing a numerical variable, i.e.
Price (Numerical)
Hypothesis:
H0: Fit is good. (Data follow a uniform distribution / prices are uniform.)
HA: Fit is not good. (Data do not follow a uniform distribution / prices are not uniform.)
SPSS Need: SPSS needs one categorical or numerical variable for analyzing the chi-square goodness of fit test.
Graphical Representation:
[Histogram of Price: Price on the x-axis, Frequency on the y-axis. Mean = 2.00, Std. Dev. = 0.87287, N = 22]
Explanation of Graph
From the above graph we see that our numerical variable (price) is on the x-axis and its frequency is on the y-axis. The mean and standard deviation of the 22 observations are 2.00 and 0.87287 respectively.
The graph clearly shows that the selected numerical variable, price, does not follow a normal distribution, so we use the chi-square goodness of fit test to determine whether the sample under investigation has been drawn from a population that follows some specified distribution.
Method:
First of all, enter the data in the data editor and label the variable Price. Click on Analyze, which will produce a drop-down menu; choose Nonparametric Tests from it and click on Chi-Square. A dialogue box appears in which all the input variables are listed on the left-hand side. Select the variable you want to analyze; the arrow between the two boxes becomes active, and you can transfer the variable to the box labeled Test Variable List by clicking on the arrow. In this case our test variable is Price, and it should be transferred to the Test Variable List box. You can also click on the Options button if you want the descriptive statistics of the tested variable. Now click on OK to run the analysis.
Pictorial Representation
Analyze → Nonparametric Tests → Chi-Square → Define Test Variable List → OK
Output
Price

         Observed N   Expected N   Residual
$1.19    8            7.3          .7
$1.39    6            7.3          -1.3
$1.59    8            7.3          .7
Total    22
The first column of the above table shows the three categories of the price variable.
The column labeled Observed N gives the actual number of cases falling in different categories of test variable, which is directly obtained from the data given.
The column labeled Expected N gives the expected number of cases that should fall in each category of the test variable.
The column labeled Residual gives the difference between observed and expected frequencies of each category, and it is commonly known as Error.
Test Statistics

                Price
Chi-Square(a)   .364
df              2
Asymp. Sig.     .834

a. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 7.3.
The above table gives the test results for the chi-square goodness of fit test. In this case the chi-square value is 0.364 with 2 degrees of freedom. The p-value for the test is shown as 0.834, which is greater than 0.05, so we fail to reject our null hypothesis that the fit is good.
Conclusion: The test results are not statistically significant at the 5% level of significance, so the data do not provide evidence against our null hypothesis, and we conclude that our test variable (price) follows a uniform distribution.
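The same test can be reproduced from the observed frequencies in the output table above. A minimal scipy sketch: when no expected frequencies are supplied, `chisquare` assumes they are equal, which matches SPSS's uniform-fit setup.

```python
from scipy.stats import chisquare

# Observed counts of the three price points, taken from the output table
observed = [8, 6, 8]          # $1.19, $1.39, $1.59

# With no expected frequencies supplied, chisquare assumes equal expected
# counts (22 / 3 = 7.33 per category), i.e. a uniform distribution.
stat, p = chisquare(observed)
print(f"chi-square = {stat:.3f}, df = {len(observed) - 1}, p = {p:.3f}")
# -> chi-square = 0.364, df = 2, p = 0.834, matching the SPSS output
```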
Chi-Square Test for Independence
The chi-square test for independence is used to test the hypothesis that two categorical variables are independent of each other. A small chi-square statistic indicates that the null hypothesis is correct and that the two variables are independent of each other.
Data Source: C:\SPSSEVAL\Employee Data
Variables: Here we analyze two different categorical variables, i.e.
A) Gender of the employees (Categorical)
B) Designation of the employees (Categorical)
Hypothesis:
H0: Designation is independent of sex.
HA: Designation is not independent of sex.
SPSS Need: SPSS needs two categorical variables for analyzing the chi-square test for independence.
Method:
First of all, enter the data in the data editor and label the variables Gender and Designation. Click on Analyze, which will produce a drop-down menu; choose Descriptive Statistics from it and click on Crosstabs. A dialogue box appears in which all the input variables are listed on the left-hand side. Select the variable you want to form the rows of your contingency table and transfer it to the box labeled Row(s); transfer the other variable to the box labeled Column(s). In this case we transfer Gender to the box labeled Row(s) and Designation to the box labeled Column(s). Next, click on the Statistics button, which brings up a dialogue box. Here tick the first box, labeled Chi-square, and click Continue to return to the previous screen. Click on OK to run the analysis.
Pictorial Representation
Analyze → Descriptive Statistics → Crosstabs → Drag Row and Column Variables → Tick Chi-Square → OK
Output
Gender * Employment Category Crosstabulation
Count

                  Employment Category
          Clerical   Custodial   Manager   Total
Female    206        0           10        216
Male      157        27          74        258
Total     363        27          84        474
Cross tabulation is used to examine the variation in categorical data; it is a cross-measuring analysis. Above, we cross-examine the gender and designation of the employees.
We take the designation of the employees in the columns and the gender of the employees in the rows, and we have a total of 474 observations.
The results are given in two rows; the first row shows the number of female employees in each employment category.
The second row shows the number of male employees in each employment category.
Chi-Square Tests

                      Value     df   Asymp. Sig. (2-sided)
Pearson Chi-Square    79.277a   2    .000
N of Valid Cases      474

a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 12.30.
The above table gives the test results for the chi-square test for independence. The first row, labeled Pearson Chi-Square, shows that the value of χ2 is 79.277 with 2 degrees of freedom. The two-tailed p-value is shown as .000, which is less than 0.05, so we can reject our null hypothesis and conclude that designation is not independent of sex.
Conclusion: The test results are statistically significant at the 5% level of significance, and the data provide sufficient evidence to conclude that the designation of the employees is not independent of their sex; the p-value is effectively zero, so the evidence for rejecting the null hypothesis is very strong.
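The Pearson chi-square above can be reproduced from the crosstabulation counts with a short scipy sketch; the function also returns the expected counts, so the footnote's minimum expected count can be checked too.

```python
from scipy.stats import chi2_contingency

# Observed counts from the Gender * Employment Category crosstabulation
#             Clerical  Custodial  Manager
table = [[206,        0,       10],    # Female
         [157,       27,       74]]    # Male

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.3f}, df = {dof}, p = {p:.2e}")
print(f"minimum expected count = {expected.min():.2f}")   # 12.30, as in footnote a
```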
Second Approach
Consider a case in which the raw data are not available and only the table labeled Gender * Employment Category Crosstabulation in the above output is given. On the basis of that table you can easily obtain the same result as above by using the SPSS Weight Cases option. Below we briefly explain how to enter the data on the basis of the table and obtain the desired results.
Method
First of all, in the Variable View of SPSS define three variables and label them Gender, Employment Category, and Value. Now, in the Data View of SPSS, enter the data in a different manner. We see that the table contains two rows and three columns: in the rows we have two categories, Female and Male, and in the columns we have three categories, Clerical, Custodial, and Manager. Both female and male employees fall into the three employment categories.
So in the Data View we simply enter the row data (Gender), opposite it the column data (Employment Category), and the corresponding frequencies in the Value column. The resulting data view is shown in the picture below.
[Screenshot: Data View with one row per Gender × Employment Category combination and its frequency in the Value column]
After defining the data, click on Data, which will produce a drop-down menu, and choose Weight Cases from it. A dialogue box appears in which all the variables are listed on the left-hand side. Tick Weight cases by and move Value into the box labeled Frequency Variable by clicking on the arrow between the two boxes. Now click OK to return to the previous window.
The further process is the same as described above: define Gender in the rows and Employment Category in the columns, tick Chi-square under the Statistics button, and click OK to run the analysis. When the output appears, you will see that SPSS gives the same result as we found earlier from the raw data.
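The weight-cases layout described above can be mimicked with pandas as a sketch: enter one row per cell with its frequency, then rebuild the contingency table by summing the frequencies per cell.

```python
import pandas as pd

# One row per Gender x Employment Category cell, frequency in Value --
# the same layout as the SPSS weight-cases data entry described above.
data = pd.DataFrame({
    "Gender":   ["Female", "Female", "Female", "Male", "Male", "Male"],
    "Category": ["Clerical", "Custodial", "Manager"] * 2,
    "Value":    [206, 0, 10, 157, 27, 74],
})

# Rebuild the contingency table by summing the frequencies per cell
table = pd.crosstab(data["Gender"], data["Category"],
                    values=data["Value"], aggfunc="sum")
print(table)
```

The rebuilt table can then be passed to a chi-square test exactly as if the raw data had been available.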
Regression Analysis
Regression is the relationship between selected values of an independent variable and observed values of a dependent variable, from which the most probable value of the dependent variable can be predicted for any value of the independent variable.
The use of regression to make quantitative predictions of one variable from the values of another variable is called regression analysis. The following types of regression may be used by the researcher:
Linear regression
Multiple linear regression
Quadratic / curvilinear regression
Logistic / binary logistic regression
Multivariate logistic regression
Linear Regression
When one dependent variable depends on a single independent variable, the relationship is called linear regression, and its model is given by

y = a + bx

where y is the dependent variable, x is the independent variable, a is the regression constant, and b is the regression coefficient.
Regression Coefficient
The regression coefficient is a measure of how strongly the independent variable predicts the dependent variable. There are two types of regression coefficients:
Un-standardized coefficients
Standardized coefficients, commonly known as Beta
The un-standardized coefficients can be used in the equation as coefficients of the different independent variables, along with the constant term, to predict the value of the dependent variable.
The standardized coefficient, however, is measured in standard deviations. A Beta value of 2 associated with a particular independent variable indicates that a change of 1 standard deviation in that independent variable will result in a change of 2 standard deviations in the dependent variable.
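The relation between the two kinds of coefficients can be sketched numerically: for simple regression, Beta equals the unstandardized slope b rescaled by the ratio of the standard deviations. The data below are randomly generated for illustration only.

```python
import numpy as np

# Randomly generated (x, y) data -- illustrative only
rng = np.random.default_rng(0)
x = rng.normal(10, 2, size=200)
y = 3.0 + 1.5 * x + rng.normal(0, 1, size=200)

# Unstandardized slope b from the least-squares formula
b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

# Standardized coefficient (Beta): the slope after z-scoring both
# variables, i.e. b rescaled by the ratio of standard deviations.
beta = b * np.std(x, ddof=1) / np.std(y, ddof=1)
print(f"b = {b:.3f}, Beta = {beta:.3f}")
```

For simple regression, Beta coincides with the correlation coefficient between x and y.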
Data Source: C:\SPSSEVAL\Employee Data
Variables: Here we are interested in analyzing two numerical variables, i.e.
Current salary (Numerical)
Beginning salary (Numerical)
Hypothesis:
H0: The regression coefficient is zero.
HA: The regression coefficient is not zero.
SPSS Need: SPSS needs two numerical variables, and both should be scale variables.
Method:
The given data is entered in the data editor and the variables are labeled as current salary and beginning salary. Click on Analyze which will produce a drop down menu, choose Regression from that and click on Linear, a dialogue box appears, in which all the input variables appear in the left-hand side of that box. Transfer the dependent variable into the right-hand side box labeled Dependent. Transfer the independent variable into the box labeled Independent(s). In our case, current salary is a dependent variable and beginning salary is an independent variable. Next we have to select the method for analysis in the box labeled Method. SPSS gives five options here: Enter, Stepwise, Remove, Forward, and Backward. In the absence of a strong theoretical reason for using a particular method, Enter should be used. The box labeled Selection variable is used if we want to restrict the analysis to cases satisfying particular selection criteria. The box labeled Case labels is used for designating a variable to identify points on plots.
After making the appropriate selections, click on the Statistics button. This will produce a dialogue box labeled Linear Regression: Statistics. Tick the statistics you want in the output. The Estimates option gives the estimates of the regression coefficients. The Model fit option gives the fit indices for the overall model. Besides these, the R squared change option is used to get the incremental R-square value when the model changes. Other options are not commonly used. Click on the Continue button to return to the main dialogue box.
The Plots button in the main dialogue box may be used for producing histograms and normal probability plots of residual. The Save button can be used to save statistics like predicted values, residuals, and distances. The options button can be used to specify the criteria for stepwise regression.
Now click on OK in the main dialogue box to run the analysis.
Pictorial Representation
Analyze → Regression → Linear → Define DV & IV → Plots → Tick Histogram & Normal Probability Plot → OK
OUTPUT
Variables Entered/Removed(b)

Model   Variables Entered     Variables Removed   Method
1       Beginning Salary(a)   .                   Enter

a. All requested variables entered.
b. Dependent Variable: Current Salary
The above table tells us about the independent variable and the regression method used. Here we see that the independent variable, beginning salary, is entered for the analysis, as we selected the Enter method.
Model Summary(b)

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .880a   .775       .774                $8,115.356

a. Predictors: (Constant), Beginning Salary
b. Dependent Variable: Current Salary
This table gives the R-value, which represents the correlation between the observed and predicted values of the dependent variable. R-Square is called the coefficient of determination, and it indicates the adequacy of the model. Here the value of R-Square is 0.775, which means the independent variable in the model can explain 77.5% of the variance in the dependent variable. Adjusted R-Square corrects R-Square for the number of predictors in the model and gives a more accurate picture of model fitness.
ANOVA(b)

Model 1      Sum of Squares      df    Mean Square       F          Sig.
Regression   106831048750.13       1   106831048750.1    1622.118   .000a
Residual     31085446686.216     472   65858997.217
Total        137916495436.34     473

a. Predictors: (Constant), Beginning Salary
b. Dependent Variable: Current Salary
The above table gives the ANOVA results for the regression. The results are given in three rows. The first row, labeled Regression, gives the variability in the model due to known reasons. The second row, labeled Residual, gives the variability due to random error or unknown reasons. The F-value in this case is 1622.118 and the p-value is .000, which is less than 0.05, so we reject our null hypothesis and conclude that the regression is significant, i.e. beginning salary explains a significant part of the variation in the current salary of the employees.
Coefficients(a)

                    Unstandardized Coefficients    Standardized Coefficients
Model 1             B           Std. Error         Beta                        t        Sig.
(Constant)          1928.206    888.680                                        2.170    .031
Beginning Salary    1.909       .047               .880                        40.276   .000

a. Dependent Variable: Current Salary
The above table gives the regression constant and coefficient and their significance. The regression coefficient and constant can be used to construct an ordinary least squares (OLS) equation and also to test the hypothesis about the independent variable. Using the regression coefficient and the constant term given under the column labeled B, one can construct the OLS equation for predicting the current salary, i.e.

Current salary = 1928.206 + (1.909) (Beginning salary)

Now we test our hypothesis. We see that the p-value for the regression coefficient of beginning salary is .000, which is less than 0.05, so we can reject our null hypothesis and conclude that the regression coefficient is not zero.
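The OLS computation behind this table can be sketched directly from the least-squares formulas. The salary pairs below are hypothetical (not the Employee Data file), and the final line simply applies the fitted SPSS equation quoted above.

```python
import numpy as np

# Hypothetical beginning/current salary pairs -- illustrative only,
# NOT the SPSS Employee Data file.
begin   = np.array([13500, 16500, 12000, 21000, 27000, 15000], dtype=float)
current = np.array([27000, 32000, 25000, 41000, 54000, 30000], dtype=float)

# Ordinary least squares: b = Sxy / Sxx, a = ybar - b * xbar
xbar, ybar = begin.mean(), current.mean()
b = ((begin - xbar) * (current - ybar)).sum() / ((begin - xbar) ** 2).sum()
a = ybar - b * xbar
print(f"current salary = {a:.3f} + {b:.3f} * beginning salary")

# Applying the fitted SPSS equation from the Coefficients table:
predict = lambda bs: 1928.206 + 1.909 * bs
print(f"predicted current salary at $20,000 beginning: {predict(20000):,.2f}")
```

The hand-computed a and b agree with a library least-squares fit on the same data.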
Charts
Histogram
[Histogram of the regression standardized residuals. Dependent Variable: Current Salary. Mean = -3.17E-16, Std. Dev. = 0.999, N = 474]
The above histogram of standardized residuals shows the mean and standard deviation of the residuals in the model. The mean and standard deviation are approximately 0 and 1 respectively, which indicates that the residuals behave like standard normal errors and the model fits well.
Normal P –P Plot of Regression Standardized Residual
[Normal P-P plot: Observed Cum Prob on the x-axis, Expected Cum Prob on the y-axis. Dependent Variable: Current Salary]
The above normal probability plot of the regression standardized residuals shows that the plotted points lie close to the diagonal line, which supports the adequacy of the fitted model.
Scatter Plot
[Scatter plot: Regression studentized deleted (press) residual on the x-axis, Current Salary on the y-axis. Dependent Variable: Current Salary]
The above scatter plot also shows the adequacy of the fitted model: the points are scattered and do not follow any particular pattern, so we can say that the fitted model shows no sign of systematic error.
Multiple Regression (Hierarchical Method)
Multiple regression is the most commonly used technique to assess the relationship between one dependent variable and several independent variables. There are three major types of multiple regression, i.e.
Standard multiple regression
Hierarchical or sequential regression
Stepwise or statistical regression
Data Source: C:\SPSSEVAL\Employee Data
Variables: Here we are interested in analyzing four numerical variables, i.e.
Current salary (Numerical)
Beginning salary (Numerical)
Educational level (Numerical)
Months since hire (Numerical)
Hypothesis:
H0: The regression coefficients are zero.
HA: The regression coefficients are not zero.
SPSS Need: SPSS needs more than two numerical variables, all of which should be scale variables.
Method:
The method for analyzing multiple regression is the same as discussed earlier for linear regression. The only change in the case of multiple regression is that we have one dependent variable along with three independent variables. Here current salary is the dependent variable, whereas beginning salary, educational level, and months since hire are the independent variables.
So here we transfer current salary into the box labeled Dependent, and beginning salary, educational level, and months since hire into the box labeled Independent(s).
The further procedure and the use of advanced options for extra results were discussed earlier for linear regression. After making the appropriate selections, click on OK to run the analysis.
OUTPUT
Variables Entered/Removedb
Beginning Salarya . Enter
Educational Level (years)a . Enter
Months since Hirea . Enter
Model1
2
3
Variables EnteredVariablesRemoved Method
All requested variables entered.a.
Dependent Variable: Current Salaryb.
The above table shows that beginning salary was entered in model one, followed by educational level in model two, followed by months since hire in model three. Note that model one includes only beginning salary as an independent variable, model two includes beginning salary and educational level, and model three includes beginning salary, educational level, and months since hire. The Enter method is used to assess all three models.
Model Summary

                                                                      Change Statistics
Model   R       R Square   Adjusted    Std. Error of    R Square   F Change   df1   df2   Sig. F
                           R Square    the Estimate     Change                            Change
1       .880a   .775       .774        $8,115.356       .775       1622.118   1     472   .000
2       .890b   .792       .792        $7,796.524       .018       40.393     1     471   .000
3       .895c   .801       .800        $7,645.998       .008       19.728     1     470   .000

a. Predictors: (Constant), Beginning Salary
b. Predictors: (Constant), Beginning Salary, Educational Level (years)
c. Predictors: (Constant), Beginning Salary, Educational Level (years), Months since Hire
The above table shows the R-values along with change statistics for the three models in different rows. Under Change Statistics, the first column, labeled R Square Change, gives the change in the R-square value between the models. The last column, labeled Sig. F Change, tests whether there is a significant improvement in the model as we introduce additional independent variables; in other words, it tells us whether the inclusion of additional independent variables at each step helps to explain significant additional variance in the dependent variable. The R Square Change value in row three is 0.008, which means that the inclusion of months since hire, after beginning salary and educational level, helps to explain an additional 0.8% of the variance in the current salary of the employees. The p-values for all three models fall in the critical region, so we reject our null hypothesis and conclude that the regression coefficients are not zero.
Coefficients(a)

                               Unstandardized Coefficients   Standardized Coefficients
Model                          B           Std. Error        Beta                        t        Sig.
1   (Constant)                 1928.206    888.680                                       2.170    .031
    Beginning Salary           1.909       .047              .880                        40.276   .000
2   (Constant)                 -7808.714   1753.860                                      -4.452   .000
    Beginning Salary           1.673       .059              .771                        28.423   .000
    Educational Level (years)  1020.390    160.550           .172                        6.356    .000
3   (Constant)                 -19986.5    3236.616                                      -6.175   .000
    Beginning Salary           1.689       .058              .779                        29.209   .000
    Educational Level (years)  966.107     157.924           .163                        6.118    .000
    Months since Hire          155.701     35.055            .092                        4.442    .000

a. Dependent Variable: Current Salary
The above table gives the regression coefficients and related statistics for the three models separately in different rows. These regression coefficients and constants can be used to construct ordinary least squares (OLS) equations and also to test the hypotheses about the independent variables. Using the regression coefficients and the constant terms given under the column labeled B, one can construct the OLS equations for predicting the current salary of the employees for the three models, i.e.

MODEL 1: CS = 1928.206 + (1.909) (BS)
MODEL 2: CS = -7808.714 + (1.673) (BS) + (1020.390) (EL)
MODEL 3: CS = -19986.5 + (1.689) (BS) + (966.107) (EL) + (155.701) (MSH)

Now we test our hypothesis. We see that the p-values for the regression coefficients in all three models are less than 0.05, so we can reject our null hypothesis and conclude that the regression coefficients are not zero.
Conclusion: Using the hierarchical method for multiple regression, we conclude that the model adequacy increases as each independent variable is introduced, but the increase in adequacy from including educational level (1.8%) is greater than the increase from including months since hire (0.8%). Since the p-values lie in the critical region, we reject our null hypothesis and conclude that the regression coefficients for all three models are not zero.
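The hierarchical logic above (nested models, R-square change, and its F-test) can be sketched with numpy on synthetic data; the variable names and generating values below are illustrative, not the Employee Data file.

```python
import numpy as np
from scipy.stats import f as f_dist

def r_squared(X, y):
    """Fit OLS with an intercept and return R-square."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return 1 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()

# Synthetic data -- names and generating values are illustrative only
rng = np.random.default_rng(1)
n = 200
begin  = rng.normal(17000, 4000, n)
educ   = rng.integers(8, 21, n).astype(float)
months = rng.normal(80, 10, n)
current = 2000 + 1.9 * begin + 1000 * educ + 150 * months + rng.normal(0, 3000, n)

# Nested (hierarchical) models: R-square can only grow as predictors enter
r2_1 = r_squared(begin[:, None], current)
r2_2 = r_squared(np.column_stack([begin, educ]), current)
r2_3 = r_squared(np.column_stack([begin, educ, months]), current)

# Sig. F Change for adding one predictor (model 2 -> model 3)
df2 = n - 3 - 1
f_change = (r2_3 - r2_2) / ((1 - r2_3) / df2)
p_change = f_dist.sf(f_change, 1, df2)
print(f"R2: {r2_1:.3f} -> {r2_2:.3f} -> {r2_3:.3f}, Sig. F Change = {p_change:.4f}")
```

The F-change statistic here follows the same formula SPSS uses for the Sig. F Change column: the R-square increment divided by the residual proportion of the fuller model, each scaled by its degrees of freedom.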
Charts:
This model also produces three diagrams for the standardized residuals: a histogram, a normal probability plot, and a scatter plot. The charts and their interpretation are almost the same as discussed for linear regression, so we do not describe them again.
Curvilinear / Quadratic Regression
When the regression equation is nonlinear, i.e. quadratic or of higher order, the relationship is called curvilinear or quadratic regression. There may be more than one dependent variable depending on one independent variable.
Data Source: C:\SPSSEVAL\Employee Data
Variables: Here we are interested in analyzing three numerical variables, i.e.
Current salary (Numerical)
Beginning salary (Numerical)
Educational level (Numerical)
Hypothesis:
H0: The regression coefficient is zero.
HA: The regression coefficient is not zero.
SPSS Need: SPSS needs two numerical variables, and both should be scale variables.
Method:
The given data are entered in the data editor and the variables are labeled Current salary, Beginning salary, and Educational level. Click on Analyze, which will produce a drop-down menu; choose Regression from it and click on Curve Estimation. A dialogue box appears in which all the input variables are listed on the left-hand side. Transfer the dependent variables into the right-hand box labeled Dependent(s) and the independent variable into the box labeled Independent. In our case, current salary and beginning salary are the dependent variables and educational level is the independent variable.
Now choose an appropriate model you want by ticking its box appearing below the window labeled Curve Estimation. In this case we choose Quadratic model by ticking its corresponding box.
The Save button can be used to save statistics like predicted values, residuals, and predicted intervals. Now click on OK in the main dialogue box to run the analysis.
Pictorial Representation
Analyze → Regression → Curve Estimation → Define DVs and IV → Tick Quadratic → OK
OUTPUT
Model Description

Model Name                                           MOD_2
Dependent Variable    1                              Current Salary
                      2                              Beginning Salary
Equation              1                              Quadratic
Independent Variable                                 Educational Level (years)
Constant                                             Included
Variable Whose Values Label Observations in Plots    Unspecified
Tolerance for Entering Terms in Equations            .0001
The above table gives the description of the model. In this case we have two dependent variables i.e. Current salary and Beginning salary along with one independent variable i.e. Educational level (years).
Case Processing Summary
                        N
Total Cases             474
Excluded Cases(a)       0
Forecasted Cases        0
Newly Created Cases     0
a. Cases with a missing value in any variable are excluded from the analysis.
The above table shows the number of cases that fall in the selected model. In our case the total number of cases is 474, with no excluded or missing cases.
The Model Summary and Parameter Estimates table below gives the test results for the quadratic regression. The R value shows the correlation between the observed and expected values of the dependent variable. In this case the F-value is 337.246 with a significance level of 0.000, which is less than 0.05. This means that our value falls in the critical region, so we can reject our null hypothesis and conclude that the regression coefficients are not zero.
Scatter Plots
Model Summary and Parameter Estimates
Dependent Variable: Current Salary

            Model Summary                          Parameter Estimates
Equation    R Square   F         df1   df2   Sig.   Constant    b1         b2
Quadratic   .589       337.246   2     471   .000   85438.237   -12428.5   612.950

The independent variable is Educational Level (years).
[Scatter plot: Current Salary (in $) against Educational Level (years), showing the Observed values and the fitted Quadratic curve.]
[Scatter plot: Beginning Salary (in $) against Educational Level (years), showing the Observed values and the fitted Quadratic curve.]
The above charts for the dependent variables clearly show that the residual values are not scattered randomly but follow a particular pattern, which means that the fitted model is not good.
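Whatever the quality of fit, the fitted equation from the Parameter Estimates table can be evaluated directly. Below is an illustrative Python sketch, not SPSS output; the coefficients are copied from the table above and the `predicted_salary` helper is a hypothetical name of our own:

```python
# Evaluating the fitted quadratic from the Parameter Estimates table:
# y-hat = 85438.237 - 12428.5*x + 612.950*x**2,
# where x is Educational Level (years) and y-hat is predicted Current Salary.
b0, b1, b2 = 85438.237, -12428.5, 612.950

def predicted_salary(years):
    """Predicted Current Salary for a given Educational Level (hypothetical helper)."""
    return b0 + b1 * years + b2 * years ** 2

# With b2 > 0 the parabola opens upward; its minimum is at x = -b1 / (2 * b2).
turning_point = -b1 / (2 * b2)
print(round(turning_point, 2))         # about 10.14 years
print(round(predicted_salary(16), 2))  # predicted salary at 16 years of education
```

Because b2 is positive, predicted salary falls until roughly 10 years of education and then rises increasingly steeply, which is the curvature the Quadratic model captures.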
Likert Type Scaling
Likert type scaling is a method used to assign ranks to categorical data in order to make the categorical data meaningful when we have to apply a statistical test to it. Through this scaling approach the assigned ranks can be treated as numerical values.
Suppose we have to collect data about awareness, preference, usage, likeness and dislikeness, as well as agreement with a statement, which are returned in qualitative form, and we have to record the responses in a way that can be analyzed statistically. In this situation we use Likert type scaling.
Data Source:
RUN \\temp\temp\Ali Raza\Mateen.sav
Hypothesis:
H0: µMale = µFemale
HA: µMale ≠ µFemale
Variables:
Here we are interested to analyze two categorical variables i.e.
Gender (Categorical)
Preference of cellular service with respect to network coverage (Categorical but treated as Numerical)
Here we consider the preference of cellular service as a numerical variable and statistically test the hypothesis that the mean preference of male and female over cellular service with respect to network coverage is the same. The method we use to test the above hypothesis is the Independent samples t-test.
Method:
Enter the data in the data editor and the variables are labeled as Gender and preference. Click on Analyze which will produce a drop down menu, choose Compare means from that and click on independent samples t-test, a dialogue box appears, in which all the input variables appear in the left-hand side of that box. To perform the independent samples t-test, transfer the dependent variable into the test variable box and transfer the variable that identifies the groups into the grouping variable box. In this case, the Preference is the dependent variable to be analyzed and should be transferred into test variable box. Gender is the variable which will identify the groups and it should be transferred into the grouping variable box.
Once the grouping variable is transferred, the define groups button which was earlier inactive turns active. Click on it to define the two groups. In this case group1 represents Male and group2 represents female. Therefore put 1 in the box against group1 and 2 in the box against group2 and click continue. Now click on OK to run the analysis.
Pictorial Representation: Analyze → Compare Means → Independent-Samples T Test → Drag Test & Grouping Variable → Define Groups → OK
OUTPUT
Group Statistics
Wide network coverage motivates the individual to prefer a particular cellular service
Gender    N     Mean   Std. Deviation   Std. Error Mean
Male      100   4.12   .891             .089
Female    100   4.32   .618             .062
This table contains the descriptive statistics for both groups. We have taken 200 observations for the independent samples t-test, of which 100 belong to the male category and 100 to the female category. The column labeled Mean shows that the mean preference of cellular service with respect to network coverage for both groups is approximately 4. This means that both groups Agree that wide network coverage motivates an individual to prefer a particular cellular service.
The Independent Samples Test table below contains the test statistics for the independent samples t-test.
Levene's Test: The table contains two sets of results, the first assuming equal variances in the two groups and the second assuming unequal variances. Levene's test tells us which statistic to use when analyzing the equality of means. The p-value for Levene's test is 0.100, which is greater than 0.05. Therefore, the statistic associated with equal variances assumed should be used for the t-test for equality of means of two independent populations.
P-Value: shows that the value of our test statistic does not fall in the critical region, i.e. 0.067 > 0.05, so we cannot reject our Null Hypothesis, i.e. µMale = µFemale.
Independent Samples Test
Wide network coverage motivates the individual to prefer a particular cellular service

Levene's Test for Equality of Variances: F = 2.730, Sig. = .100

t-test for Equality of Means
                              t        df       Sig.         Mean         Std. Error   95% CI of the Difference
                                                (2-tailed)   Difference   Difference   Lower    Upper
Equal variances assumed       -1.845   198      .067         -.200        .108         -.414    .014
Equal variances not assumed   -1.845   176.31   .067         -.200        .108         -.414    .014
Conclusion: The test results are not statistically significant at the 95% confidence level, so the data do not provide sufficient evidence to reject the Null Hypothesis; we conclude that the mean preference of cellular service with respect to network coverage for male and female is the same. Had we rejected the Null Hypothesis, there would have been a 6.7% chance of rejecting a true Null Hypothesis.
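The equal-variances t statistic in the table can be reproduced from the Group Statistics values alone. Below is a minimal Python sketch, not SPSS; the last digit differs slightly from the table because we start from the rounded means and standard deviations:

```python
import math

# Equal-variances independent-samples t statistic, computed from the
# Group Statistics summary values reported above (rounded inputs).
n1, mean1, sd1 = 100, 4.12, 0.891   # Male
n2, mean2, sd2 = 100, 4.32, 0.618   # Female

# Pooled variance, then the standard error of the difference in means
sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
se_diff = math.sqrt(sp2 * (1 / n1 + 1 / n2))
t_stat = (mean1 - mean2) / se_diff
df = n1 + n2 - 2

# SPSS reports .108, -1.845 and 198 from the unrounded data
print(round(se_diff, 3), round(t_stat, 3), df)
```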
Reliability Analysis
Reliability analysis is applied to check the reliability of the data, that is, whether the conclusions and the analysis performed on the data are reliable for understanding and forecasting. One way to ideally measure reliability is the test-retest method. However, establishing reliability through test-retest is practically very difficult.
Some of the commonly used techniques for assessing reliability include Cohen's Kappa Coefficient for categorical data and Cronbach's Alpha for internal reliability of the data set.
Data Source: C:\SPSSEVAL\Home Sales [By Neighborhood].sav
Variables: Here we are interested to check the reliability of a data set which includes five numerical variables i.e.
Appraised Land Value
Appraised Value of Improvements
Total Appraised Value
Sale Price
Ratio of Sale Price to Total Appraised Value
Note that the data contains one String variable labeled as Neighborhood; we deleted this variable because SPSS does not check the reliability, if the data contains any String or Blank variable.
SPSS Need: For reliability analysis through SPSS, one can use any variable of any nature except String and Blank variables.
Method:
Enter the data in the data editor and label the variables. Click on Analyze which will produce a drop down menu, choose Scale from that and click on Reliability Analysis; a dialogue box appears, in which all the input variables appear on the left-hand side of that box. To perform the reliability analysis, transfer the variables into the box labeled Items by clicking on the arrow between the two boxes. In this case, we have five numerical variables in the data set that should be transferred to the Items box.
Choose appropriate Model by clicking on that box, here we choose Alpha as model. Now click on Statistics button, a dialogue box appears. Tick the corresponding box which you want to analyze in the output. Now click Continue to return to the main dialogue box. Click on OK to run the analysis.
Pictorial Representation: Analyze → Scale → Reliability Analysis → Drag Items → Choose Model → Give Statistics → OK
OUTPUT
Case Processing Summary
Cases         N      %
Valid         2440   100.0
Excluded(a)   0      .0
Total         2440   100.0
a. Listwise deletion based on all variables in the procedure.
The above table shows the total number of cases that fall in the data set. We have 2440 observations with no missing or excluded cases.
Reliability Statistics
Cronbach's Alpha   N of Items
.576               5
The above table shows the test results for the reliability analysis. The value of Cronbach's Alpha is 0.576 and the number of items in the data set is 5. This value of Alpha is said to be Poor, and the conclusions drawn from this data are not reliable for understanding and forecasting.
Item-Total Statistics
                                  Scale Mean if   Scale Variance if   Corrected Item-     Cronbach's Alpha
                                  Item Deleted    Item Deleted        Total Correlation   if Item Deleted
Appraised Land Value              164151.7603     5533196618          .688                .480
Appraised Value of Improvements   140212.2148     4646009801          .505                .438
Total Appraised Value             132761.3423     5160815037          .314                .533
Sale Price                        106454.2587     1928523141          .565                .477
Ratio of Sale Price to
Total Appraised Value             181191.6111     6801537138          -.032               .615
The above table shows the statistics associated with each item. The last column of the table shows the change in the value of Alpha if the corresponding item is deleted from the data set. The values associated with the top four items are less than the current value of Alpha, which is 0.576; that means if one of these items is deleted, the value of Cronbach's Alpha becomes worse. But the value associated with the item labeled Ratio of Sale Price to Total Appraised Value is 0.615. This means that if this item is deleted from the analysis and the reliability of the entire data set is retested, the value of Cronbach's Alpha becomes 0.615. So, in order to improve the value of Alpha and make our data set more reliable, we delete the last item and retest the value of our Cronbach's Alpha.
Reliability Statistics
Cronbach's Alpha   N of Items
.615               4
Here we retest our data after deletion of one item and our new value of Alpha is 0.615; the total number of items in the data set is now 4. This value of Alpha is said to be Acceptable, and the conclusions drawn from this data are reliable for understanding and forecasting.
Item-Total Statistics
                                  Scale Mean if   Scale Variance if   Corrected Item-     Cronbach's Alpha
                                  Item Deleted    Item Deleted        Total Correlation   if Item Deleted
Appraised Land Value              164150.57       5533198039          .688                .540
Appraised Value of Improvements   140211.03       4646008210          .505                .493
Total Appraised Value             132760.16       5160813863          .314                .599
Sale Price                        106453.07       1928530335          .565                .536
This table shows that if we delete any other item from the data set and retest the reliability, our value of Alpha becomes Poor, because all the values associated with the remaining four items in the last column of the above table are less than the current value of our Cronbach's Alpha, i.e. 0.615. So we do not need to retest the reliability of the data set any further, which means the data is reliable at the current value of our Cronbach's Alpha.
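Cronbach's Alpha itself is straightforward to compute: it compares the sum of the item variances with the variance of the total score. Below is a minimal Python sketch, not SPSS; the `cronbach_alpha` helper and the tiny four-respondent data set are hypothetical:

```python
from statistics import pvariance

# Cronbach's Alpha: alpha = k/(k-1) * (1 - sum of item variances / variance of totals)
def cronbach_alpha(items):
    """items: a list of equal-length lists, one list of scores per scale item."""
    k = len(items)
    item_vars = sum(pvariance(item) for item in items)
    totals = [sum(scores) for scores in zip(*items)]   # total score per respondent
    return k / (k - 1) * (1 - item_vars / pvariance(totals))

# Hypothetical 5-point responses: three items from four respondents
items = [[4, 5, 3, 4],
         [4, 4, 3, 5],
         [5, 5, 2, 4]]
print(round(cronbach_alpha(items), 3))  # 0.818 for this toy data
```

Deleting an item and calling the function again on the remaining items is exactly the "Cronbach's Alpha if Item Deleted" computation described above.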
Correlation Analysis
Correlation refers to the degree of relation between two numerical variables. It is denoted by "r", which is known as the Correlation Coefficient.
Correlation Coefficient
The correlation coefficient gives the mathematical value for measuring the strength of the linear relation between two variables. Mathematically the value of "r" always lies between -1 and 1, with:
(a) +1 representing absolute positive linear relationship (as X increases, Y increases).
(b) 0 representing no linear relationship (X and Y have no pattern).
(c) -1 representing absolute inverse relationship (as X increases, Y decreases).
Bivariate Correlation
Bivariate correlation tests the strength of the relationship between two variables without giving any consideration to the interference some other variables might cause to the relationship between the two variables being tested. For example, while testing the correlation between the Current and Beginning salary of the employees, bivariate correlation will not consider the impact of some other variables like Educational Level and Previous Experience of the employees. In such cases, a bivariate analysis may show us a strong relationship between Current and Beginning salary; but in reality, this strong relationship could be the result of some other extraneous factors like Educational Level and Previous Experience etc.
Data Source:
C:\SPSSEVAL\Employee data
Hypothesis:
H0: There is no Correlation between Variables (r = 0)
HA: There is some Correlation between Variables (r ≠ 0)
Variables: Here we are interested to analyze three numerical variables i.e.
Current salary (Numerical)
Beginning salary (Numerical)
Educational Level (years) (Numerical)
Technically correlation analysis can be run with any kind of data, but the output will be of no use if a correlation is run on a categorical variable with more than two categories. For example, in a data set, if the respondents are categorized according to nationalities and religions, correlation between these variables is meaningless.
SPSS Need:
SPSS needs two or more numerical variables to perform Correlation Analysis.
Method:
Firstly the data is entered in the data editor and the variables are labeled as Current salary, Beginning salary, Educational Level, and Previous Experience. Click on Analyze which will produce a drop down menu, choose Correlate from that and click on Bivariate, a dialogue box appears, in which all the input variables appear in the left-hand side of that box. To perform the bivariate correlation, choose the variables for which the correlation is to be studied from the left-hand side box and move them to the right-hand side box labeled Variables. Once any two variables transferred to the variables box, the OK button becomes active.
In our case we will transfer four numerical variables, i.e. Current salary, Beginning salary, Educational Level, and Previous Experience, to the right-hand side box labeled Variables. There are some default selections at the bottom of the window that can be changed by clicking on the appropriate boxes. For our purpose, we will use the most commonly used Pearson's Coefficient.
Next, while choosing between the one-tailed and two-tailed test of significance, we have to see if we are making any directional prediction. The one-tailed test is appropriate if we are predicting a positive or negative relationship between the variables; the two-tailed test should be used if there is no prediction about the direction of the relationship between the variables to be tested. Finally, Flag Significant Correlations asks SPSS to print an asterisk next to each correlation that is significant at the 0.05 significance level and two asterisks next to each correlation that is significant at the 0.01 significance level, so that the output can be read easily. The default selections will serve the purpose for the problem at hand. We may choose Means and Standard Deviations from the Options button if we wish to compute these figures for the given data. After making appropriate selections, click on OK to run the analysis.
Pictorial Representation: Analyze → Correlate → Bivariate → Define Variables → Choose appropriate options → OK
OUTPUT
Correlations
                            Current Salary   Beginning Salary   Educational Level (years)
Current Salary
  Pearson Correlation       1                .880**             .661**
  Sig. (2-tailed)                            .000               .000
  N                         474              474                474
Beginning Salary
  Pearson Correlation       .880**           1                  .633**
  Sig. (2-tailed)           .000                                .000
  N                         474              474                474
Educational Level (years)
  Pearson Correlation       .661**           .633**             1
  Sig. (2-tailed)           .000             .000
  N                         474              474                474
**. Correlation is significant at the 0.01 level (2-tailed).
The above table gives the correlation for all pairs of variables, and each correlation is produced twice in the matrix. So here we get the following 3 correlations for the given data.
Current salary and Beginning salary
Current salary and Educational level
Beginning salary and Educational level
The value of the correlation coefficient is 1 in the cells where SPSS compares a variable with itself (Current salary and Current salary, and so on), since every variable is perfectly positively correlated with itself.
In each cell of the correlation matrix, we get Pearson's correlation coefficient, p-value for two-tailed test of significance and the sample size. From the output we can see that the correlation coefficient between Current salary and Beginning salary is 0.88 and the p-value for two-tailed test of significance is less than 0.05. From these figures we can conclude that there is a strong positive correlation between Current salary and beginning salary and that this correlation is significant at the significance level of 0.01.
Similarly, the correlation coefficient for Current salary and Educational level is 0.661. So there is a moderate positive correlation between these variables.
The correlation coefficient for Beginning salary and Educational level is 0.633 and its p-value is 0.000, so we can reject our null hypothesis and conclude that there is some correlation between these two variables.
Conclusion: At the 1% level of significance all variables are significantly correlated with each other. In this case our null hypothesis that there is no correlation between the variables is rejected for all pairs of variables. We can conclude that there is some correlation present between all variables in the given data.
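The statistic in each cell of the matrix can be computed by hand. Below is a minimal Python sketch of Pearson's r (illustrative only; the toy data is hypothetical, not the Employee data):

```python
import math

# Pearson's correlation coefficient r, the statistic in each cell above:
# r = covariance term / product of the square roots of the sums of squares
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical toy data: strongly but not perfectly related
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(round(pearson_r(x, y), 3))  # 0.8
```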
Partial Correlation
Partial correlation allows us to examine the correlation between two variables while controlling for the effects of one or more of the additional variables without throwing out any of the data.
In other words, it is the degree of relation between the dependent variable and one of the independent variables while controlling the effect of the other independent variables, because we know that, in a multiple regression model, one dependent variable depends on two or more independent variables.
Data Source:
C:\SPSSEVAL\Employee data
Hypothesis:
H0: There is no Correlation between Variables (r = 0)
HA: There is some Correlation between Variables (r ≠ 0)
Variables: Here we are interested to analyze two numerical variables, while controlling one additional variable.
Current salary
Beginning salary
Educational level (Control variable)
SPSS Need:
SPSS needs two or more numerical variables to perform partial Correlation.
Method:
Enter the data in the data editor and label the variables. Click on Analyze which will produce a drop down menu, choose Correlate from that and click on Partial; a dialogue box appears, in which all the input variables appear on the left-hand side of that box. To perform the partial correlation, transfer the variables whose correlation you want to know into the box labeled Variables, while controlling the effect of one or more additional variables by transferring them to the box labeled Controlling for.
In our case, we want to find the correlation between the Current salary and Beginning salary of the employees, so these variables should be transferred to the box labeled Variables, while controlling for the effect of the Educational level of the employees by transferring it to the box labeled Controlling for. Now click on OK to run the analysis.
Pictorial Representation
Analyze → Correlate → Partial → Drag Variables → Drag Controlling Variables → OK
OUTPUT
Correlations
Control Variable: Educational Level (years)
                            Current Salary   Beginning Salary
Current Salary
  Correlation               1.000            .795
  Significance (2-tailed)   .                .000
  df                        0                471
Beginning Salary
  Correlation               .795             1.000
  Significance (2-tailed)   .000             .
  df                        471              0
The above table shows the test results for the partial correlation between Current salary and beginning salary of the employees. The variable we are controlling for in the analysis is Educational level, and it is shown in the left-hand side of the table.
We can see that the correlation coefficient between Current salary and Beginning salary is 0.795, which is considerably smaller than the 0.88 in the bivariate case. This means that the two variables still have a positive correlation, but the value of the correlation coefficient decreases when we control for the Educational level of the employees, and the variables are no longer as strongly correlated with each other.
Conclusion: The test results are significant at 5% level of significance and the data provide sufficient evidence to conclude that there is some correlation present between the Current salary and Beginning salary of the employees, but it is considerably smaller in the case of partial correlation than in case of bivariate correlation.
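The partial correlation reported by SPSS can be checked against the standard first-order formula, using the bivariate coefficients from the earlier Correlations table. A small Python sketch (illustrative, not SPSS):

```python
import math

# First-order partial correlation from the three bivariate coefficients:
# r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))
r_xy = 0.880   # Current salary vs Beginning salary
r_xz = 0.661   # Current salary vs Educational level (the control variable z)
r_yz = 0.633   # Beginning salary vs Educational level

r_xy_z = (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
print(round(r_xy_z, 3))  # 0.795, matching the partial Correlations table
```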
Logistic Regression
If a categorical variable depends on numerical or categorical variables, then their dependency may be called logistic regression. It is used to predict a discrete outcome based on variables that may be discrete, continuous, or mixed. Thus when the dependent variable is categorical with two or more discrete outcomes, logistic regression is a commonly used technique. It has the following two types:
Binary logistic regression / Logit
Multinomial logistic regression
Coefficient of Logistic Regression
Logistic regression computes the log odds for a particular outcome. The odds of an outcome are given by the ratio of the probability of it happening and not happening, [P / (1 - P)], where P is the probability of the event. There are some mathematical problems in reporting these odds, so the natural logarithm of these odds is calculated. A positive value indicates that the odds are in favor of the event and the event is likely to occur, while a negative value indicates that the odds are against the event and the event is not likely to occur. The formula may be written as:
log odds = ln[P / (1 - P)]
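The odds and log odds described above can be illustrated with a few lines of Python (a hypothetical sketch, not SPSS output):

```python
import math

# Odds p/(1-p) and their natural logarithm (the log odds described above)
def log_odds(p):
    """Natural log of the odds for an event with probability p."""
    return math.log(p / (1 - p))

print(round(log_odds(0.8), 3))  # odds 4 to 1 in favour -> positive log odds
print(round(log_odds(0.2), 3))  # odds 1 to 4 against   -> negative log odds
print(log_odds(0.5))            # even odds -> log odds of 0.0
```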
Binary Logistic Regression
If a categorical variable has only two levels, e.g. Male and Female, Yes or No, Good and Bad etc., and it depends on different categorical or numerical independent variables, then their relation can be referred to as binary logistic regression. The expression for binary logistic regression may be given as
ln[P / (1 - P)] = b0 + b1X1 + b2X2 + ……….. + bkXk
Data Source: C:\SPSSEVAL\AML Survival
Hypothesis:
H0: Regression coefficients are zero
HA: Regression coefficients are not zero
Variables: Here we are interested to analyze three different variables i.e.
Status (Categorical)
Time (Numerical)
Chemotherapy (Categorical)
Here Status is our dependent variable depending on Time and Chemotherapy. As in this case our dependent variable is categorical having only two levels i.e. Censored and Relapsed, so we use binary logistic regression to analyze the dependency between the variables.
SPSS Need:
SPSS needs one dependent variable and it must be Categorical, while the independent variables can be categorical as well as numerical.
Method:
Firstly the data is entered in the data editor and the variables are labeled as Status, Time, and Chemotherapy. Click on Analyze which will produce a drop down menu, choose Regression from that and click on Binary Logistics, a dialogue box appears, in which all the input variables appear in the left-hand side of that box. To perform the Binary Logistics Regression, transfer the dependent variable in the box labeled Dependent and the independent variables in the box labeled Covariates.
In our case, Status is the only dependent variable and should be transferred to the box labeled Dependent. Time and Chemotherapy are independent variables and should be transferred to the box labeled Covariates.
Next we have to select the method for analysis in the box labeled Method. SPSS gives seven options, of which the Enter method is the most commonly used. For common purposes one does not need to use the Save and Options buttons; advanced users may experiment with these. The Save button can be used to save statistics like predicted values, residuals, and distances. The Options button can be used to specify the criteria for stepwise regression.
After making appropriate selections, click on OK to run the analysis.
Pictorial Representation: Analyze → Regression → Binary Logistic → Drag Dependent → Drag Covariates → OK
OUTPUT
Case Processing Summary
Unweighted Cases(a)                     N    Percent
Selected Cases   Included in Analysis   23   100.0
                 Missing Cases          0    .0
                 Total                  23   100.0
Unselected Cases                        0    .0
Total                                   23   100.0
a. If weight is in effect, see classification table for the total number of cases.
The above table gives the description of the cases selected for the analysis. We have a total of 23 cases included in the analysis, with no missing or unselected cases.
Dependent Variable Encoding
Original Value   Internal Value
Censored         0
Relapsed         1
The above table shows how the two outcomes or levels of Status, i.e. Censored and Relapsed, have been coded by SPSS.
Block 0: Beginning Block
Classification Table(a,b)
                               Predicted Status
Step 0   Observed Status       Censored   Relapsed   Percentage Correct
         Censored              0          5          .0
         Relapsed              0          18         100.0
         Overall Percentage                          78.3
a. Constant is included in the model.
b. The cut value is .500
The above table shows the observed or actual number of cases that fall in each category of the dependent variable. The last column, labeled Percentage Correct, shows that this baseline model can predict 0% of the censored patients' status and 100% of the relapsed patients' status. Overall, the model can predict 78.3% of the patients' status.
Block 1: Method = Enter
Omnibus Tests of Model Coefficients
Step 1   Chi-square   df   Sig.
Step     4.609        2    .100
Block    4.609        2    .100
Model    4.609        2    .100
The above table reports significance levels by the traditional chi-square method. It tests whether the model with the predictors is significantly different from the model with only a constant. The omnibus test may be interpreted as a test of the capability of all predictors in the model jointly to predict the response (dependent) variable. A finding of significance would correspond to the research conclusion that there is adequate fit of the data to the model, meaning that at least one of the predictors is significantly related to the response variable. In the illustration above, however, the significance level is 0.100, which is greater than 0.05, so the predictors taken jointly are not significant at the 5% level. The Enter method is used (all model terms are entered in one step), so there is no difference between Step, Block, and Model, but in a stepwise procedure one would see results for each step.
Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      19.476(a)           .182                   .280
a. Estimation terminated at iteration number 5 because parameter estimates changed by less than .001.
The above table gives the Cox & Snell R Square value, which gives an approximation of how much variance in the dependent variable can be explained by the hypothesized model. In this case Time and Chemotherapy can explain 18.2% of the variation in the patients' current Status.
Classification Table(a)
                               Predicted Status
Step 1   Observed Status       Censored   Relapsed   Percentage Correct
         Censored              1          4          20.0
         Relapsed              0          18         100.0
         Overall Percentage                          82.6
a. The cut value is .500
The above Classification table summarizes the results of our predictions about patient's Status based on Time and Chemotherapy. We can see that our model can correctly predict 20% status of censored patients and 100% status of the relapsed patients. Overall, our model predicts 82.6% status of the patients.
Variables in the Equation
Step 1(a)   B        S.E.    Wald    df   Sig.   Exp(B)
chemo       -1.498   1.262   1.409   1    .235   .224
time        -.024    .024    1.055   1    .304   .976
Constant    2.962    1.207   6.025   1    .014   19.332
a. Variable(s) entered on step 1: chemo, time.
The above table gives the Beta coefficients for the independent variables along with their significance. Negative beta coefficients for Time and Chemotherapy mean that with increasing chemotherapy and time of treatment, the chances of the patient having a relapsed status decrease. As with multiple linear regression models, we can construct an equation from the above regression constant and coefficients, here for the log odds of a relapsed status:
ln[P / (1 - P)] = 2.962 + (-1.498) (Chemotherapy) + (-0.024) (Time)
The last column, labeled Exp(B), takes a value of more than one if the beta coefficient is positive and less than one if it is negative. In our case, the beta coefficients for Chemotherapy and Time are negative, so both have values of less than one in the column labeled Exp(B). A value of 0.976 for Time indicates that for a 1 week increase in the treatment, the odds of a patient having a relapsed status are multiplied by a factor of 0.976, i.e. they decrease slightly. The regression coefficients can also be used to construct an equation for the probability of a relapsed status:
P = 1 / (1 + e^-(2.962 - 1.498 C - 0.024 T))
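The relationship between the B and Exp(B) columns, and the predicted probability implied by the fitted coefficients, can be checked with a short Python sketch (illustrative, not SPSS; coding Chemotherapy as 0/1 and measuring Time in weeks are our assumptions):

```python
import math

# Exp(B) is just e**B, and the fitted coefficients give a predicted
# relapse probability P = 1 / (1 + e**-(b0 + b1*chemo + b2*time)).
b_const, b_chemo, b_time = 2.962, -1.498, -0.024

def relapse_probability(chemo, time):
    """Predicted probability of Relapsed status (0/1 chemo coding is an assumption)."""
    z = b_const + b_chemo * chemo + b_time * time
    return 1 / (1 + math.exp(-z))

# Reproduce the Exp(B) column of the table (within rounding)
print(round(math.exp(b_chemo), 3))  # 0.224
print(round(math.exp(b_time), 3))   # 0.976
print(round(relapse_probability(chemo=1, time=10), 3))
```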
Non-Parametric Tests
Non-Parametric tests are used to test hypotheses regarding the population parameters of non-normal data with small sample sizes (less than 30). These tests are sometimes also referred to as "Distribution-Free tests".
Binomial Test
Binomial tests are used to test the hypothesis regarding the population proportion. It runs on a categorical variable having two levels only.
Data Source: C:\SPSSEVAL\Carpet
Hypothesis:
H0: P = 0.5
HA: P ≠ 0.5
Variables:
Here we are interested in analyzing a categorical variable, i.e. Housekeeping Seal. In our case a superstore owner claims that 50% of his customers got a housekeeping seal on the purchase of a product.
SPSS Need:
SPSS needs one categorical variable (two levels only).
Method:
First the data are entered in the data editor and the variable is labeled Housekeeping Seal. Click on Analyze, which will produce a drop-down menu; choose Non-Parametric Tests from it and click on Binomial. A dialogue box appears, with all the input variables listed on its left-hand side. To perform the Binomial test, transfer the test variable into the box labeled Test Variable List. In our case Housekeeping Seal is the test variable, so it should be transferred to the box labeled Test Variable List by clicking on the arrow between the two boxes. Now give the test value in the box below labeled Test Proportion; in our case the test value is 0.50.
After making appropriate selections, click on OK to run the analysis.
Pictorial Representation
Analyze → Non-Parametric Tests → Binomial → Drag test variable → Give test proportion → OK
OUTPUT
NPar Tests
Binomial Test (Good Housekeeping seal)

           Category    N    Observed Prop.   Test Prop.   Exact Sig. (2-tailed)
Group 1    Yes          8        .36             .50             .286
Group 2    No          14        .64
Total                  22       1.00
The above table gives the test results for the Binomial Non-parametric test.
The first column, labeled Category, gives the two categories (Yes or No) of the test variable, i.e. Good Housekeeping seal.
The second column, labeled N, gives the total number of cases analyzed and the number of cases falling in each category of the test variable. In this case we selected a sample of 22 persons, of which 8 said Yes, they got the housekeeping seal, and the remaining 14 said No.
The third column, labeled Observed Prop., gives the proportion of persons saying Yes or No: 36% of individuals said Yes, they got the housekeeping seal, while 64% said No.
The last column gives the p-value for the 2-tailed test, 0.286, which is greater than 0.05, so we fail to reject the null hypothesis and conclude that the claim of the superstore owner is correct: the proportion is 0.50.
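The exact two-tailed significance reported by SPSS can be reproduced by hand from the binomial distribution. A short Python sketch (the function name is our own; it assumes the doubled-smaller-tail convention for the two-sided p-value):

```python
from math import comb

def binomial_two_tailed_p(successes: int, n: int, p0: float = 0.5) -> float:
    """Exact two-tailed binomial test p-value for H0: P = p0.

    Doubles the smaller of the two tail probabilities, capped at 1,
    which for p0 = 0.5 matches SPSS's "Exact Sig. (2-tailed)".
    """
    lower = sum(comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(successes + 1))
    upper = sum(comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(successes, n + 1))
    return min(1.0, 2.0 * min(lower, upper))

# 8 "Yes" answers out of 22 customers, testing P = 0.5:
print(round(binomial_two_tailed_p(8, 22), 3))  # 0.286
```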
Runs Test
The Runs test is used to test the randomness of the data. This test is best run if the test variable is numerical. The word RUNS refers to sequences of values on the same side of the cut point; a new run begins each time the sign changes.
Data Source:
C:\SPSSEVAL\Carpet
Hypothesis:
H0: Data is random
HA: Data is not random
Variables:
Here we are interested in analyzing a numerical variable, i.e. Preference.
SPSS Need:
SPSS needs a numerical variable with a small sample size.
Method:
First the data are entered in the data editor and the variable is labeled Preference. Click on Analyze, which will produce a drop-down menu; choose Non-Parametric Tests from it and click on Runs. A dialogue box appears, with all the input variables listed on its left-hand side. To perform the Runs test, transfer the test variable into the box labeled Test Variable List. In our case Preference is the test variable, so it should be transferred to the box labeled Test Variable List by clicking on the arrow between the two boxes. Now, in our case the test variable, Preference, is numerical, so in the section labeled Cut Point we tick the box Median; but if the test variable is categorical, it is appropriate to use its Mean as the cut point by ticking the corresponding box.
After making appropriate selections, click on OK to run the analysis.
Pictorial Representation
Analyze → Non-Parametric Tests → Runs → Drag test variable → Tick box (Median) → OK
OUTPUT
Runs Test

                          Preference
Test Value(a)                  11.50
Cases < Test Value                11
Cases >= Test Value               11
Total Cases                       22
Number of Runs                    13
Z                               .218
Asymp. Sig. (2-tailed)          .827

a. Median
The above table gives the test results for the Runs test. The first row, labeled Test Value, gives the median of the data, 11.50. In this case, 11 of the 22 observations are less than the median, or in other words those values carry a negative sign, while the remaining 11 values carry a positive sign. The row labeled Number of Runs gives a value of 13; this means the sign changes 12 times, producing 13 runs in the data. The last row gives the p-value for the Runs test, 0.827 > 0.05, so we fail to reject the null hypothesis and conclude that the data are random.
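The Z of 0.218 and p-value of 0.827 above can be reproduced from the counts in the table. A Python sketch of the large-sample runs-test statistic, assuming the 0.5 continuity correction that SPSS applies for small samples (the function name is our own):

```python
import math

def runs_test_z(n1: int, n2: int, runs: int) -> tuple[float, float]:
    """Z statistic and two-tailed p-value for the runs test.

    n1, n2 : counts of cases below / at-or-above the cut point (the median here).
    Applies the 0.5 continuity correction before standardizing.
    """
    n = n1 + n2
    mean_runs = 2.0 * n1 * n2 / n + 1.0                                  # expected runs
    var_runs = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1))   # variance
    z = (abs(runs - mean_runs) - 0.5) / math.sqrt(var_runs)
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))         # 2 * (1 - Phi(z))
    return z, p

# 11 cases below and 11 at/above the median, 13 runs observed:
z, p = runs_test_z(11, 11, 13)
print(round(z, 3), round(p, 3))  # 0.218 0.827
```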
Representation of Runs:
One Sample K-S Test
The one-sample K-S test is used to test the goodness of fit of a specific distribution to the given data. Its test statistic is called the "Kolmogorov-Smirnov Z", and the test is commonly known as the "non-parametric chi-square".
Data Source:
C:\SPSSEVAL\Carpet
Hypothesis:
H0: Fit is Good (Data follows the fitted distribution)
HA: Fit is not Good (Data does not follow the fitted distribution)
Variables:
Here we are interested in analyzing a numerical variable, i.e. Price.
SPSS Need:
SPSS needs a numerical variable with a small sample size.
Method:
First the data are entered in the data editor and the variable is labeled Price. Click on Analyze, which will produce a drop-down menu; choose Non-Parametric Tests from it and click on 1-Sample K-S. A dialogue box appears, with all the input variables listed on its left-hand side. To perform the K-S test, transfer the test variable into the box labeled Test Variable List. In our case Price is the test variable, so it should be transferred to the box labeled Test Variable List by clicking on the arrow between the two boxes. Now tick a box in the section labeled Test Distribution at the bottom of the dialogue box. In our case the fitted distribution is Poisson, so we tick the corresponding box labeled Poisson.
After making appropriate selections, click on OK to run the analysis.
Pictorial Representation
Analyze → Non-Parametric Tests → 1-Sample K-S → Drag test variable → Tick box (Poisson) → OK
OUTPUT
One-Sample Kolmogorov-Smirnov Test

                                         Price
N                                           22
Poisson Parameter(a,b)   Mean           2.0000
Most Extreme             Absolute         .143
Differences              Positive         .143
                         Negative        -.135
Kolmogorov-Smirnov Z                      .670
Asymp. Sig. (2-tailed)                    .760

a. Test distribution is Poisson.
b. Calculated from data.
The above table gives the test results for the one-sample K-S test. We have taken 22 observations for the analysis. The mean of the Poisson distribution calculated from the data is 2.
The row labeled Absolute gives the largest absolute difference between the observed (empirical) cumulative distribution and the fitted Poisson cumulative distribution; it is 0.143.
The row labeled Positive gives the largest positive difference, i.e. the maximum amount by which the observed cumulative distribution exceeds the fitted one, and it is 0.143.
The row labeled Negative gives the corresponding largest negative difference, i.e. the maximum amount by which the observed cumulative distribution falls below the fitted one; the resulting value is -0.135.
The Kolmogorov-Smirnov Z value is 0.67, calculated from the largest absolute difference and the sample size as Z = √N × D = √22 × 0.143 ≈ 0.67.
The last row gives the p-value for the analysis, 0.76, which is greater than 0.05. So we fail to reject the null hypothesis and conclude that the fit is good: the data follow the Poisson distribution.
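Both the Z value and the asymptotic p-value above can be recovered from the table's numbers. A Python sketch, assuming the usual Z = √N × D statistic and the asymptotic Kolmogorov series for the two-sided p-value (small rounding differences arise because SPSS uses the unrounded D internally):

```python
import math

def ks_z_and_p(d_max: float, n: int) -> tuple[float, float]:
    """Kolmogorov-Smirnov Z and its asymptotic two-sided p-value.

    Z = sqrt(n) * D, where D is the largest absolute difference between
    the empirical and fitted cumulative distributions; the p-value is
    the Kolmogorov series 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 Z^2).
    """
    z = math.sqrt(n) * d_max
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * z * z)
                  for k in range(1, 101))  # series converges very fast
    return z, min(1.0, max(0.0, p))

# Largest absolute difference 0.143 with n = 22 (values from the table above):
z, p = ks_z_and_p(0.143, 22)
print(round(z, 2), round(p, 2))  # 0.67 0.76
```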