Statistical Applications through SPSS
S. Ali Raza Naqvi
Variables:
A quantity that changes its value from time to time, place to place, and person to person is called a variable. If probabilities are attached to the values of a variable, it is called a random variable.
For example
If we say x = 1, x = 7, or x = -6, then x is a variable; but if a variable appears in the following way, with a probability attached to each value, then it is known as a random variable.
x P(x)
1 0.2
2 0.3
3 0.1
4 0.4
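As a sketch (not part of the original notes), the table above can be treated as a discrete distribution in plain Python; the names below are illustrative:

```python
# Sketch (not from the original notes): the probability table above as a
# discrete random variable.
dist = {1: 0.2, 2: 0.3, 3: 0.1, 4: 0.4}

# A valid probability distribution must sum to 1.
total_probability = sum(dist.values())

# Expected value E(X) = sum of x * P(x) over all values of x.
expected_value = sum(x * p for x, p in dist.items())
```

Here the probabilities sum to 1 and the expected value works out to 2.7.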
Population:
The entire collection of objects or individuals under study is called a population. A population may be finite or infinite: if its elements are countable it is known as a finite population, and if its elements are uncountable it is called an infinite population.
For example:
Population of MBA students at IUGC (Finite Population)
Population of the University teachers in Pakistan (Finite Population)
Population of trees (Infinite Population)
Population of sea life (Infinite Population)
The population is also categorized in two ways.
1. Homogeneous population
2. Heterogeneous population
Homogeneous Population:
If all the population elements have the same properties, then the population is known as a homogeneous population.
For example: Population of shops, Population of houses, Population of boys, Population of rice in a box etc.
Heterogeneous Population:
If the population elements do not all have the same properties, then the population is known as a heterogeneous population.
For example: Population of MBA students (male and female), population of plants, etc.
Parameter:
A constant computed from the population, or a population characteristic, is known as a parameter.
For Example:
Population mean µ, population standard deviation σ, and the coefficients of skewness and kurtosis for the population.
Statistic:
A constant computed from the sample, or a sample characteristic, is known as a statistic.
For Example:
Sample mean x̄, sample standard deviation s, and the coefficients of skewness and kurtosis for the sample.
Estimator:
A sample statistic used to estimate a population parameter is known as an estimator.
For Example:
The sample mean is used to estimate the population mean, so the sample mean is also called an estimator of the population mean.
The sample variance is used to estimate the population variance, so the sample variance is also called an estimator of the population variance.
Hypothesis:
An assumption about a population parameter that is tested on the basis of sample information is called a hypothesis, and the procedure is called hypothesis testing.
These assumptions are established by generating two complementary statements, the null and alternative hypotheses, in such a manner that if one statement is found wrong, the other is automatically selected as correct.
Types of Hypothesis:
1) Null Hypothesis:
A statement, or the first thought about the parameter value, is called a null hypothesis. Statistically, we can say that a null hypothesis is a statement that contains an equality sign, such as:
H0: µ = µ0
H0: µ ≤ µ0
H0: µ ≥ µ0
As is clear from the above statements, there are two types of null hypothesis.
1- Simple null hypothesis
2- Composite null hypothesis
1-Simple Null Hypothesis:
If a null hypothesis is based on a single value (i.e., it contains only the equality sign), then it is called a simple null hypothesis.
For Example H0: µ = µ0
Phrases
Average rainfall in the United States of America during 1999 was 200 mm.
The average concentrations of two substances are the same.
The IQ levels of MBA and BBA students are the same.
IQ level is independent of education level.
2-Composite Null Hypothesis:
If a null hypothesis is based on an interval of the parameter value (i.e., it contains a less-than or greater-than sign together with the equality sign), then it is called a composite null hypothesis.
For Example H0: µ ≤ µ0
H0: µ ≥ µ0
Phrases
The mean height of BBA students is at most 70 inches.
The performance of PhD students is at most the same as that of MBA students.
Variability in a data set must be non-negative (greater than or equal to zero).
2) Alternative Hypothesis:
A statement automatically generated against the established null hypothesis is called an alternative hypothesis.
For Example:
Null Hypothesis     Alternative Hypothesis
H0: µ = µ0          H1: µ ≠ µ0 (or, one-sided, H1: µ > µ0 or H1: µ < µ0)
H0: µ ≤ µ0          H1: µ > µ0
H0: µ ≥ µ0          H1: µ < µ0
It is clear from the above stated alternatives that there are two different types of alternatives.
1- One tailed or One sided alternative hypothesis
2- Two tailed or two sided alternative hypothesis
1-One-tailed Alternative Hypothesis:
If an alternative contains either a greater-than (>) or a less-than (<) sign in its statement, then it is known as a one-tailed alternative hypothesis.
For Example: H1: µ > µ0 or H1: µ < µ0
Phrases
Average rainfall in Pakistan is higher than average rainfall in Jakarta.
Inzamam is a more consistent player than Shahid Afridi.
Waseem Akram is a better bowler than McGrath.
Gold prices are dependent on oil prices.
2-Two-tailed Alternative Hypothesis:
If an alternative contains only an unequal (≠) sign in its statement, then it is known as a two-tailed alternative hypothesis.
For Example: H1: µ ≠ µ0
Phrases
The concentrations of the two substances are not the same.
There is a significant difference between the wheat production of Sind and Punjab.
The consistency of the KSE and SSE is not the same.
In one-tailed alternatives, the total chance of type I error remains on only one side of the normal curve.
In two-tailed alternatives, the total chance of type I error is divided between the two sides of the normal curve.
Probabilities Associated with Decisions:

                 Ho is True                           Ho is False
Accept Ho        Correct Decision (1 - α)             False Decision: Type II Error (β)
Reject Ho        False Decision: Type I Error (α)     Correct Decision (1 - β)
It is clear from the above table that both errors cannot be minimized at the same time: an increase is observed in the type II error when the type I error is minimized.
P-Value:
It is the minimum value of alpha (α) at which a true null hypothesis can be rejected. As it is a value of α, it can be explained as the minimum probability of type I error associated with a hypothesis while it is being tested. Therefore, it is used in two ways: one in decision making, and the other to determine the probability of type I error associated with the test.
Decision Rule on the basis of p - value:
Reject Ho if p – value < 0.05
Accept Ho if p – value ≥ 0.05
For example:
If the p-value for a test appears as 0.01, it indicates that our null hypothesis is to be rejected, and that there is only a 1% chance of rejecting a true null hypothesis. This can further be explained as being 99% confident in the rejection of the null hypothesis; or we can say that we can reject this null hypothesis at α = 1%, i.e., at the 99% confidence level.
If the p-value for a test appears as 0.21, it indicates that our null hypothesis is to be accepted, since rejecting it would carry a 21% chance of rejecting a true null hypothesis. In other words, this null hypothesis could only be rejected at α = 21%.
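The decision rule above is mechanical enough to express in a few lines of Python (a sketch, not part of the original notes; the function name is illustrative):

```python
# Sketch (not from the original notes): the decision rule stated above,
# with alpha = 0.05 as the default threshold.
def decide(p_value, alpha=0.05):
    """Reject H0 if the p-value < alpha, otherwise accept H0."""
    return "reject H0" if p_value < alpha else "accept H0"

decision_one = decide(0.01)   # first example above: p = 0.01
decision_two = decide(0.21)   # second example above: p = 0.21
```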
T-test:-
A t-test is a statistical hypothesis test in which the test statistic has a Student's t distribution if the null hypothesis is true. It is applied when the population is assumed to be normally distributed but the sample sizes are small enough that the statistic on which inference is based is not normally distributed because it relies on an uncertain estimate of standard deviation rather than on a precisely known value.
Uses of T-test:-
Among the most frequently used t tests are:
A test of whether the mean of a normally distributed population has a value specified in a null hypothesis.
A test of the null hypothesis that the means of two normally distributed populations are equal. Given two data sets, each characterized by its mean, standard deviation, and number of data points, we can use some kind of t-test to determine whether the means are distinct, provided that the underlying distributions can be assumed to be normal. There are different versions of the t-test depending on whether the two samples are:
o Unpaired, independent of each other (e.g., individuals randomly assigned into two groups, measured after an intervention and compared with the other group), or
o Paired, so that each member of one sample has a unique relationship with a particular member of the other sample (e.g., the same people measured before and after an intervention).
Interpretation of the results:-
If the calculated p-value is below the threshold chosen for statistical significance (usually the 0.10, the 0.05, or 0.01 level), then the null hypothesis which usually states that the two groups do not differ is rejected in favor of an alternative hypothesis, which typically states that the groups do differ.
Another common use is a test of whether the slope of a regression line differs significantly from 0.
Statistical Analysis of the t-test:-
The formula for the t-test is a ratio. The top part of the ratio is just the difference between the two means or averages. The bottom part is a measure of the variability or dispersion of the scores. This formula is essentially another example of the signal-to-noise metaphor in research: the difference between the means is the signal that, in this case, we think our program or treatment introduced into the data; the bottom part of the formula is a measure of variability that is essentially noise that may make it harder to see the group
difference. The formula for the t-test shows how the numerator and denominator are related to the two distributions.
The top part of the formula is easy to compute -- just find the difference between the means. The bottom part is called the standard error of the difference. To compute it, we take the variance for each group and divide it by the number of people in that group; we add these two values and then take their square root. The t-value will be positive if the first mean is larger than the second and negative if it is smaller.
Once we compute the t-value, we have to look it up in a table of significance to test whether the ratio is large enough to say that the difference between the groups is unlikely to have been a chance finding. To test the significance, we need to set a risk level (called the alpha level). In most social research, the rule of thumb is to set the alpha level at .05. This means that five times out of a hundred we would find a statistically significant difference between the means even if there was none (i.e., by chance). We also need to determine the degrees of freedom (df) for the test: in the t-test, the degrees of freedom are the number of persons in both groups minus 2. Given the alpha level, the df, and the t-value, we can look the t-value up in a standard table of significance (available as an appendix in the back of most statistics texts) to determine whether it is large enough to be significant. If it is, we can conclude that the difference between the means of the two groups is significant (even given the variability).
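The signal-to-noise computation described above can be sketched in Python (a toy example with made-up groups, not part of the original notes):

```python
import math
from statistics import mean, variance

# Sketch (not from the original notes): the signal-to-noise ratio described
# above, computed for two small made-up groups.
group1 = [5, 6, 7, 8, 9]   # hypothetical "treatment" scores
group2 = [1, 2, 3, 4, 5]   # hypothetical "control" scores

signal = mean(group1) - mean(group2)                 # difference between means
noise = math.sqrt(variance(group1) / len(group1)     # standard error of the
                  + variance(group2) / len(group2))  # difference
t_value = signal / noise
df = len(group1) + len(group2) - 2                   # persons in both groups minus 2
```

For these numbers the means differ by 4, the standard error is 1, so t = 4 with 8 degrees of freedom.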
Calculations:-
a) Independent one-sample t-test
In testing the null hypothesis that the population mean is equal to a specified value μ0, one uses the following statistic:
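The statistic itself did not survive extraction; consistent with the description of s, n, and the degrees of freedom that follows, the standard one-sample t statistic is:

```latex
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}
```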
Where “s” is the sample standard deviation of the sample and “n” is the sample size. The degrees of freedom used in this test is “n – 1”.
b) Independent two-sample t-test:-
A) Equal sample sizes, equal variance
This test is only used when both:
the two sample sizes (that is, the n or number of participants of each group) are equal;
It can be assumed that the two distributions have the same variance.
Violations of these assumptions are discussed below.
The t statistic to test whether the means are different can be calculated as follows:
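The formula did not survive extraction; for equal group sizes n and a pooled (grand) standard deviation, the standard form consistent with the description that follows is:

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \cdot \sqrt{2/n}},
\qquad
s_p = \sqrt{\frac{s_1^2 + s_2^2}{2}}
```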
where s_p is the grand standard deviation (or pooled standard deviation), subscript 1 = group one, and subscript 2 = group two. The denominator of t is the standard error of the difference between the two means.
For significance testing, the degrees of freedom for this test are n1 + n2 − 2, where n1 = number of participants in group 1 and n2 = number of participants in group 2.
B) Unequal sample sizes, unequal variance
This test is used only when the two sample sizes are unequal and the variance is assumed to be different. See also Welch's t test. The t statistic to test whether the means are different can be calculated as follows:
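The statistic itself was lost in extraction; the standard Welch t statistic referred to here is:

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}
```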
where n1 is the number of participants in group one and n2 is the number of participants in group two.
In this case, variance is not a pooled variance. For use in significance testing, the distribution of the test statistic is approximated as being an ordinary Student's t distribution with the degrees of freedom calculated using
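The equation was lost in extraction; the standard Welch–Satterthwaite approximation for the degrees of freedom is:

```latex
\nu = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^{2}}
           {\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}
```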
This is called the Welch-Satterthwaite equation. Note that the true distribution of the test statistic actually depends (slightly) on the two unknown variances.
This test can be used as either a one-tailed or two-tailed test.
c) Dependent t-test for paired samples:-
This test is used when the samples are dependent; that is, when there is only one sample that has been tested twice (repeated measures) or when there are two samples that have been matched or "paired".
For this equation, the differences between all pairs must be calculated. The pairs are either one person's pre-test and post-test scores, or pairs of persons matched into meaningful groups (for instance, drawn from the same family or age group). The average (X̄D) and standard deviation (sD) of those differences are used in the equation. The constant μ0 is non-zero if you want to test whether the average of the differences is significantly different from μ0. The degrees of freedom used are N – 1.
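The equation referred to above was lost in extraction; the standard paired t statistic matching that description is:

```latex
t = \frac{\bar{X}_D - \mu_0}{s_D / \sqrt{N}}
```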
Example # 01
Analysis through SPSS:-
A) One-sample t-test:-
SPSS needs:-
1) The data should be in numerical form (i.e., a numerical variable).
2) A test value, which is the hypothetical value against which we are going to test.
To analyze the one-sample t-test, I have used the employees' salaries of an organization. For this purpose, I have selected a sample of 474 employees of the company.
The hypotheses are:
a) The null hypothesis states that the average salary of the employees is equal to 30,000.
H0: µ = 30,000
b) The alternative hypothesis states that the average salary of the employees is not equal to 30,000.
HA: µ ≠ 30,000
Method:
Enter the data in the data editor, with the variable labeled as employee's current salary. Now click on Analyze, which will produce a drop-down menu; choose Compare Means from it and click on One-Sample T Test. A dialogue box appears, in which all the input variables appear on the left-hand side. From this box we have to select the variable to be analyzed, which in our case is the current salaries of the employees. The variables can be selected for analysis by transferring them to the Test Variable(s) box. Next, change the value in the Test Value box, which originally appears as 0, to the one against which you are testing the sample mean. In this case, this value would be 30000. Now click on OK to run the analysis.
Pictorial Representation
Analyze → Compare Means → One-Sample T Test → Drag Test Variable (Scale) → Give Test Value → OK
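Outside SPSS, the same one-sample test statistic can be sketched in Python (a toy example with made-up salaries, not the 474-employee data set):

```python
import math
from statistics import mean, stdev

# Sketch (not from the original notes): a one-sample t statistic computed
# by hand on a small hypothetical salary sample, test value 30000.
salaries = [28000, 32000, 35000, 30000, 31000]  # hypothetical data
test_value = 30000

n = len(salaries)
standard_error = stdev(salaries) / math.sqrt(n)
t_value = (mean(salaries) - test_value) / standard_error
df = n - 1
```

The sign and size of t_value are then compared against the t distribution with df degrees of freedom, just as SPSS does internally.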
SPSS output:-
One-Sample Statistics

                 N     Mean         Std. Deviation   Std. Error Mean
Current Salary   474   $34,419.57   $17,075.661      $784.311
Interpretation:-
In the above table, N shows the total number of observations. The average salary of the employees is 34,419.57, the standard deviation of the data is 17,075.661, and the standard error of the mean is 784.311.
One-Sample Test
Test Value = 30000

                 t       df    Sig. (2-tailed)   Mean Difference   95% CI of the Difference
                                                                   Lower       Upper
Current Salary   5.635   473   .000              $4,419.568        $2,878.40   $5,960.73
Interpretation:-
From the above table we can observe that:
i) The t value is positive, which shows that the sample mean is greater than the hypothesized value.
ii) Degrees of freedom = (N − 1) = 473.
iii) The p-value is 0.000, which is less than 0.05.
iv) The difference between the sample mean and the hypothesized mean is 4,419.568.
v) The confidence interval has lower and upper limits of 2,878.40 and 5,960.73 respectively; the confidence interval does not contain zero.
Decision:-
On the basis of the following observations, I reject the null hypothesis and accept the alternative hypothesis. I am almost 100% sure of my decision.
i) The p-value is 0.000, which is less than 0.05.
ii) The confidence interval does not contain zero.
Comments:-
The average salary of employees is not equal to “30,000”.
Example # 02
B) Independent t-test:-
SPSS needs:-
1) Two variables are required: one should be numerical and the other categorical with two levels.
To analyze the independent t-test, I have used the employees' salaries of an organization. For this purpose, I have selected a sample of 474 employees of the company, containing both males and females. In my analysis, males are coded as "m" and females as "f".
The hypotheses are:
a) The null hypothesis states that the average salary of male employees is equal to the average salary of female employees.
H0: µm = µf
i.e., µm − µf = 0
b) The alternative hypothesis states that the average salary of male employees is not equal to the average salary of female employees.
HA: µm ≠ µf
i.e., µm − µf ≠ 0
Method:
Enter the data in the data editor, with the variables labeled as employee's current salary and gender respectively. Click on Analyze, which will produce a drop-down menu; choose Compare Means from it and click on Independent-Samples T Test. A dialogue box appears, in which all the input variables appear on the left-hand side. To perform the independent-samples t-test, transfer the dependent variable into the Test Variable(s) box and transfer the variable that identifies the groups into the Grouping Variable box. In this case, the current salary of the employees is the dependent variable to be analyzed and should be transferred into the Test Variable(s) box by clicking on the first arrow in the middle of the two boxes. Gender is the variable that will identify the groups of employees, and it should be transferred into the Grouping Variable box.
Once the grouping variable is transferred, the Define Groups button, which was earlier inactive, turns active. Click on it to define the two groups. In this case, group 1 represents the male employees ("m") and group 2 the female employees ("f"). Therefore, enter these codes against group 1 and group 2 and click Continue. Now click on OK to run the analysis.
Pictorial Representation
Analyze → Compare Means → Independent-Samples T Test → Drag Test & Grouping Variable → Define Groups → OK
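SPSS reports two t statistics, one assuming equal variances (pooled) and one not (Welch). Both can be sketched by hand in Python (a toy example with made-up groups, not the employee data):

```python
import math
from statistics import mean, variance

# Sketch (not from the original notes): pooled vs Welch t statistics for
# two made-up groups whose variances clearly differ.
group1 = [10, 20, 30, 40]       # hypothetical group with large spread
group2 = [21, 22, 23, 24, 25]   # hypothetical group with small spread

n1, n2 = len(group1), len(group2)
v1, v2 = variance(group1), variance(group2)
diff = mean(group1) - mean(group2)

# Pooled ("equal variances assumed"): one common variance estimate.
pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_pooled = diff / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Welch ("equal variances not assumed"): each group keeps its own variance.
t_welch = diff / math.sqrt(v1 / n1 + v2 / n2)
```

When the variances differ, as here, the two statistics disagree, which is why Levene's test is consulted first to decide which row of the SPSS output to read.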
SPSS output:-
Group Statistics

                 Gender   N     Mean         Std. Deviation   Std. Error Mean
Current Salary   Male     258   $41,441.78   $19,499.214      $1,213.968
                 Female   216   $26,031.92   $7,558.021       $514.258
Interpretation:-
Through the above table we can observe that:
i) The total number of males is 258 and of females is 216.
ii) The mean salary of male employees is 41,441.78 and of female employees is 26,031.92.
iii) The standard deviation of male employees' salaries is 19,499.214 and of female employees' salaries is 7,558.021.
iv) The standard error of the mean for male employees' salaries is 1,213.968 and for female employees' salaries is 514.258.
Independent Samples Test
Current Salary

Levene's Test for Equality of Variances: F = 119.669, Sig. = .000

t-test for Equality of Means:
                              t        df        Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       10.945   472       .000              $15,409.862       $1,407.906              $12,643.322    $18,176.401
Equal variances not assumed   11.688   344.262   .000              $15,409.862       $1,318.400              $12,816.728    $18,002.996
Interpretation:-
The above table has two parts, (a) Levene's test and (b) the t-test, through which we can observe that:
i) The F value is 119.669 with a significance value of 0.000, which is less than 0.05.
ii) On the basis of the p-value of Levene's test, we conclude that the variances of the two populations are not equal, so the "equal variances not assumed" row is used.
iii) The t value is positive, which shows that the mean salary of male employees is greater than the mean salary of female employees.
iv) Degrees of freedom = 344.262.
v) The p-value is 0.000, which is less than 0.05.
vi) The difference between the two population means is 15,409.862.
vii) The standard error of the difference between the two means is 1,318.400.
viii) The confidence interval has lower and upper limits of 12,816.728 and 18,002.996 respectively; the confidence interval does not contain zero.
Decision:-
On the basis of the following observations, I reject the null hypothesis and accept the alternative hypothesis. I am almost 100% sure of my decision.
i) The p-value is 0.000, which is less than 0.05.
ii) The confidence interval does not contain zero.
Comments:-
The average salaries of male & female employees are not equal.
Example # 03
C) Paired t-test:-
SPSS needs:-
1) Two numerical variables are required, which should be equal in length.
To analyze the paired t-test, I used the beginning and current salaries of the employees of an organization. For this purpose, I have selected a sample of 474 employees of the organization.
The hypotheses are:
a) The null hypothesis states that the average current salary of the employees is equal to their average beginning salary.
H0: µcurrent = µbeginning
i.e., µd = 0
b) The alternative hypothesis states that the average current salary of the employees is not equal to their average beginning salary.
HA: µcurrent ≠ µbeginning
i.e., µd ≠ 0
Method:
Enter the data in the data editor and the variables are labeled as employee's current and beginning salary respectively. Click on Analyze which will produce a drop down menu, choose Compare means from that and click on Paired-samples t test, a dialogue box appears, in which all the input variables appear in the left-hand side of that box. From this box we have to select variables, which are to be computed. The two variables computed in our case are Current and Beginning salaries. Select these together and they will immediately appear in the box at the bottom labeled current selection. They are simultaneously highlighted in the box in which they originally appeared. Once the variables are selected the arrow at the center becomes active. The variables can be transferred to the Paired-Variables box by clicking on this arrow. They will appear in the box as Current-Beginning. Now click on OK to run the analysis.
Pictorial Representation
Analyze → Compare Means → Paired-Samples T Test → Drag Paired Variables (Scale) → OK
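The paired test works on the differences between the two measurements, as described in the paired t-test section above. A Python sketch with made-up before/after scores (not the salary data):

```python
import math
from statistics import mean, stdev

# Sketch (not from the original notes): a paired t statistic on made-up
# before/after scores, computed from the pairwise differences.
before = [10, 12, 14, 16, 18]   # hypothetical pre-test scores
after = [12, 15, 14, 18, 21]    # hypothetical post-test scores

diffs = [a - b for a, b in zip(after, before)]  # one difference per pair
n = len(diffs)
t_value = mean(diffs) / (stdev(diffs) / math.sqrt(n))
df = n - 1
```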
SPSS output:-
Paired Samples Statistics

                            Mean         N     Std. Deviation   Std. Error Mean
Pair 1   Current Salary     $34,419.57   474   $17,075.661      $784.311
         Beginning Salary   $17,016.09   474   $7,870.638       $361.510
Interpretation:-
Through the above table we can observe that:
i) The mean values of the current and beginning salaries are 34,419.57 and 17,016.09 respectively.
ii) The total number in each group is 474.
iii) The standard deviations of the current and beginning salaries are 17,075.661 and 7,870.638 respectively.
iv) The standard errors of the mean of the current and beginning salaries are 784.311 and 361.510 respectively.
Paired Samples Correlations

                                      N     Correlation   Sig.
Pair 1   Current & Beginning Salary   474   .880          .000
Interpretation:-
Through the above table we can observe that:
i) The total number of pairs is 474.
ii) The correlation of 0.88 shows that the two variables are highly correlated, which indicates that employees who had a higher beginning salary also have a higher current salary.
iii) The p-value is 0.000, which is less than 0.05.
Paired Samples Test

                                       Mean          Std. Deviation   Std. Error Mean   95% CI Lower   95% CI Upper   t        df    Sig. (2-tailed)
Pair 1   Current - Beginning Salary    $17,403.481   $10,814.620      $496.732          $16,427.407    $18,379.555    35.036   473   .000
Interpretation:-
Through the above table we can observe that:
i) The mean of the paired differences is 17,403.481.
ii) The standard deviation of the differences is 10,814.620.
iii) The standard error of the mean of the differences is 496.732.
iv) The confidence interval has lower and upper limits of 16,427.407 and 18,379.555 respectively; the confidence interval does not contain zero.
v) The t value is 35.036.
vi) Degrees of freedom = (N − 1) = 473.
vii) The p-value is 0.000, which is less than 0.05.
Decision:-
On the basis of the following observations, I reject the null hypothesis and accept the alternative hypothesis. I am almost 100% sure of my decision.
i) The p-value is 0.000, which is less than 0.05.
ii) The confidence interval does not contain zero.
Comments:-
The mean difference between the two paired variables, i.e., the current and beginning salaries, is significant; the two means are not the same.
One-Way ANOVA
ANOVA is a commonly used statistical method for making simultaneous comparisons between two or more population means; it yields values that can be tested to determine whether a significant relation exists between the variables. Its simplest form is one-way ANOVA, which involves one dependent variable and a single factor (independent variable).
Data Source: C:\SPSSEVAL\Employee Data
Variables: Here we analyze two different variables by one-way ANOVA, i.e.
A) Current salary of the employees.
B) Employment category.
Hypothesis:
H0: µ1 = µ2 = µ3
HA: at least one mean is not equal.
SPSS Need: SPSS needs two types of variables for analyzing one-way ANOVA:
Numerical variable (Scale). Categorical variable (with more than two categories).
Method:
First of all enter the data in the data editor and the variables are labeled as employee's current salary and employment category respectively. Click on Analyze which will produce a drop down menu, choose Compare means from that and click on One-Way ANOVA, a dialogue box appears, in which all the input variables appear in the left-hand side of that box. To perform one-way ANOVA, transfer the dependent variable into the box labeled Dependent List and all factoring variable into the box labeled Factor. In our case Current salary is the dependent variable and should be transferred to the dependent list box by clicking on the first arrow in the middle of the two boxes. Employment Category is the factoring variable and should be transferred to the factor box by clicking on the second arrow and then click OK to run the analysis.
If the null hypothesis is rejected, ANOVA only tells us that the population means are not all equal. Multiple comparisons are used to assess which group means differ from which others, once the overall F-test shows that at least one difference exists. Many tests are listed under Post Hoc in SPSS; the LSD (Least Significant Difference) and Tukey tests are among the most commonly used.
Pictorial Representation
Analyze → Compare Means → One-Way ANOVA → Drag Dependent List & Factors → Post Hoc (Optional) → OK
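The Between Groups / Within Groups decomposition that the ANOVA output reports can be sketched by hand in Python (a toy example with made-up numbers, not the employee data):

```python
from statistics import mean

# Sketch (not from the original notes): the between/within sum-of-squares
# decomposition of one-way ANOVA for three made-up groups.
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # hypothetical data
all_values = [x for g in groups for x in g]
grand_mean = mean(all_values)

# Between-groups SS: variability of the group means (known reasons).
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
# Within-groups SS: variability around each group mean (random error).
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)
f_value = (ss_between / df_between) / (ss_within / df_within)
```

A large F means the between-groups signal dominates the within-groups noise, which is exactly the situation in the employee-salary output below.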
Output:
The ANOVA table gives the test results for the analysis of one-way ANOVA. The results are given in three rows. The first row, labeled Between Groups, gives the variability due to the different designations of the employees (known reasons). The second row, labeled Within Groups, gives the variability due to random error (unknown reasons), and the third row gives the total variability. In this case, the F-value is 434.481 and the corresponding p-value is less than 0.05. Therefore we can safely reject the null hypothesis and conclude that the average salary of the employees is not the same in all three categories.
Post Hoc Tests
Multiple Comparisons
Dependent Variable: Current Salary
LSD

(I) Employment Category   (J) Employment Category   Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower   95% CI Upper
Clerical                  Custodial                 -$3,100.349             $2,023.760   .126   -$7,077.06     $876.37
                          Manager                   -$36,139.258*           $1,228.352   .000   -$38,552.99    -$33,725.53
Custodial                 Clerical                  $3,100.349              $2,023.760   .126   -$876.37       $7,077.06
                          Manager                   -$33,038.909*           $2,244.409   .000   -$37,449.20    -$28,628.62
Manager                   Clerical                  $36,139.258*            $1,228.352   .000   $33,725.53     $38,552.99
                          Custodial                 $33,038.909*            $2,244.409   .000   $28,628.62     $37,449.20

*. The mean difference is significant at the .05 level.
ANOVA
Current Salary

                 Sum of Squares     df    Mean Square       F         Sig.
Between Groups   89438483925.943    2     44719241962.972   434.481   .000
Within Groups    48478011510.397    471   102925714.459
Total            137916495436.340   473
The post-hoc test presents the results of the comparisons between all possible pairs. Since we have three groups, a total of six pairs is possible, of which three are mirror images. The p-value for the Clerical - Manager and Custodial - Manager comparisons is shown as 0.000, whereas it is 0.126 for the Clerical - Custodial comparison. This means that the average current salaries of Clerical and Manager employees, as well as of Custodial and Manager employees, are significantly different, whereas the salaries of Clerical and Custodial employees are not significantly different.
Conclusion: Our null hypothesis is rejected, and we conclude that the three means are not all the same. To identify which mean differs from the others we used the LSD test, and we conclude that the mean salary of managers is significantly different from the other two means, whereas the other two means are not significantly different from each other.
Two-Way ANOVA
In two-way analysis, we have two independent variables (known factors) and we are interested in knowing their effect on the same dependent variable.
Data Source: C:\SPSSEVAL\Carpet
Variables: Here we analyze two categorical variables together with a numerical variable by two-way ANOVA, i.e.
A) Preference (Numerical)
B) Package design (Categorical)
C) Brand (Categorical)
Hypothesis:
For Brand:   H0: µi = µj for all i & j
             HA: µi ≠ µj for at least one pair (i, j)
For Package: H0′: µi = µj for all i & j
             HA′: µi ≠ µj for at least one pair (i, j)
SPSS Need: SPSS needs two types of variables for analyzing two-way ANOVA:
One numerical variable (scale).
Two categorical variables (with more than two levels).
Method:
First of all, enter the data in the data editor and label the variables Preference, Brand, and Package design. Click on Analyze, which will produce a drop-down menu; choose General Linear Model from it and click on Univariate. A dialogue box appears in which all the input variables are listed on the left-hand side. To perform two-way ANOVA, transfer the dependent variable (Preference) into the box labeled Dependent Variable and the factor variables (Brand & Package) into the box labeled Fixed Factor(s). After defining all variables, click on OK to run the analysis.
If the null hypothesis is rejected, multiple comparisons are used to assess which group means differ from which others, once the overall F-test shows that at least one difference exists. Many tests are listed under Post Hoc in SPSS; LSD (Least Significant Difference) and Tukey's test are among the most commonly used.
Pictorial Representation
Analyze → General Linear Model → Univariate → Drag Dependent Variable & Fixed Factors → Post Hoc → OK
Output:
Between-Subjects Factors

                         Value Label   N
Package design   1.00    A*            9
                 2.00    B*            6
                 3.00    C*            7
Brand name       1.00    K2R           7
                 2.00    Glory         7
                 3.00    Bissell       8
This table shows the value labels under each category and the frequency of each value label. We have a total of six value labels under package design and brand name.
Tests of Between-Subjects Effects
Dependent Variable: Preference

Source    Type III Sum of Squares    df   Mean Square   F        Sig.
package   537.231                     2   268.616       16.883   .000
brand     36.108                      2   18.054        1.135    .351
Error     206.833                    13   15.910
Total     3758.000                   22

a. R Squared = .763 (Adjusted R Squared = .617)
The above table gives the test results for the two-way ANOVA. The results are given in four rows.
The first row, labeled package, gives the variability due to the different package designs of the carpets, which may affect the customers' preferences (known reason).
The second row, labeled brand, gives the variability due to the different brand names (known reason).
The third row, labeled Error, gives the variability due to random error, which also affects the customers' preferences (unknown reasons).
The fourth row gives the total variability in the customers' preferences due to both known and unknown reasons.
In this case, the F-value for package design is 16.883, and the corresponding p-value is less than 0.05. Therefore we can safely reject the null hypothesis for package design and conclude that the average preference is not the same for all packages.
The F-value for brand name is 1.135, and the corresponding p-value is greater than 0.05. So we fail to reject the null hypothesis for brand and conclude that the average brand preferences are approximately the same.
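The decomposition behind this table can be sketched directly for a balanced layout. The sketch below uses hypothetical preference scores (not the carpet data, which is unbalanced and which SPSS handles with Type III sums of squares); only the two main effects are computed, so the error term here absorbs any interaction.

```python
import numpy as np
from scipy.stats import f as f_dist

# Hypothetical balanced layout: preference scores for 3 package designs
# (first axis) x 3 brands (second axis), 2 observations per cell.
# Illustrative values only -- NOT the carpet data.
y = np.array([
    [[22, 24], [21, 23], [20, 22]],   # package A
    [[12, 14], [11, 13], [12, 12]],   # package B
    [[13, 15], [14, 12], [13, 13]],   # package C
], dtype=float)

a, b, n = y.shape                      # factor levels and replicates
grand = y.mean()

# Main-effect sums of squares for a balanced design
ss_a = b * n * ((y.mean(axis=(1, 2)) - grand) ** 2).sum()   # package
ss_b = a * n * ((y.mean(axis=(0, 2)) - grand) ** 2).sum()   # brand
ss_total = ((y - grand) ** 2).sum()
ss_error = ss_total - ss_a - ss_b      # error absorbs any interaction here

df_a, df_b = a - 1, b - 1
df_error = a * b * n - 1 - df_a - df_b
ms_error = ss_error / df_error

f_a = (ss_a / df_a) / ms_error
f_b = (ss_b / df_b) / ms_error
p_a = f_dist.sf(f_a, df_a, df_error)
p_b = f_dist.sf(f_b, df_b, df_error)
print(f"package: F = {f_a:.3f}, p = {p_a:.4f}")
print(f"brand:   F = {f_b:.3f}, p = {p_b:.4f}")
```

With these illustrative numbers the pattern matches the interpretation above: the package effect is significant while the brand effect is not.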
Post Hoc Tests
Package design
As our null hypothesis for package design is rejected, multiple comparisons are used to assess which group mean differs from the others. The Multiple Comparisons table gives the results for each pair of value labels under the package design category.
The Post-Hoc test presents the results of the comparisons between all possible pairs. Since we have three groups, a total of six pairs are possible, of which three are mirror images, leaving three unique comparisons. The p-value for the A* – B* and A* – C* comparisons is shown as .000, whereas it is .322 for the B* – C* comparison. This means that the average preference differs significantly between package designs A* and B* as well as between A* and C*, whereas it does not differ significantly between B* and C*.
Conclusion: As our null hypothesis for package design is rejected, we conclude that the mean preferences for the package designs are not all the same. To identify which mean differs from the others we used the LSD test, and we conclude that the mean preference for A* is significantly different from the other two means, whereas the other two means do not differ significantly from each other.
But in the case of brand name, we fail to reject the null hypothesis and conclude that all mean brand preferences are the same, so there is no need for multiple comparisons in the case of brand.
Multiple Comparisons
Dependent Variable: Preference
LSD

(I) Package   (J) Package   Mean Difference   Std. Error   Sig.   95% CI         95% CI
design        design        (I-J)                                 Lower Bound    Upper Bound
A*            B*            11.5556*          2.10226      .000   7.0139         16.0972
A*            C*            9.2698*           2.01015      .000   4.9272         13.6125
B*            A*            -11.5556*         2.10226      .000   -16.0972       -7.0139
B*            C*            -2.2857           2.21914      .322   -7.0799        2.5085
C*            A*            -9.2698*          2.01015      .000   -13.6125       -4.9272
C*            B*            2.2857            2.21914      .322   -2.5085        7.0799

Based on observed means.
*. The mean difference is significant at the .05 level.

Chi-Square Test
The chi-square test is commonly used to test hypotheses regarding:
1. Goodness of fit
2. Association / independence of attributes
It is denoted by "χ2" and its degrees of freedom are "n-1", where
n = Number of categories
The chi-square distribution is positively skewed, so the test has a one-tailed critical region in the right tail of the curve, and the value of χ2 is always positive.
Chi-Square Goodness of Fit Test
The chi-square goodness of fit test is used when the distribution is non-normal and the sample size is small (less than 30). With SPSS's default of equal expected frequencies, the chi-square goodness of fit test determines whether the distribution follows a uniform distribution or not.
Data Source: C:\SPSSEVAL\Carpet
Variables: Here we are interested in analyzing a numerical variable, i.e.
Price (Numerical)
Hypothesis:
H0: Fit is good. (Data follow a uniform distribution / prices are uniform.)
HA: Fit is not good. (Data do not follow a uniform distribution / prices are not uniform.)
SPSS Need: SPSS needs one categorical or numerical variable for analyzing the chi-square goodness of fit test.
Graphical Representation:
[Histogram of Price: Price on the x-axis, Frequency on the y-axis. Mean = 2.00, Std. Dev. = 0.87287, N = 22]
Explanation of Graph
From the above graph we see that our numerical variable (price) is on the x-axis and its frequency is on the y-axis. The mean and standard deviation of the 22 observations are 2.00 and 0.87287 respectively.
The graph clearly shows that the selected numerical variable, price, does not follow a normal distribution, so we use the chi-square goodness of fit test to determine whether the sample under investigation has been drawn from a population that follows some specified distribution.
Method:
First of all, enter the data in the data editor and label the variable Price. Click on Analyze, which will produce a drop-down menu; choose Nonparametric Tests from it and click on Chi-Square. A dialogue box appears in which all the input variables are listed on the left-hand side. Select the variable you want to analyze; the arrow between the two boxes becomes active, and you can transfer the variable to the box labeled Test Variable List by clicking on the arrow. In this case our test variable is Price, and it should be transferred to the Test Variable List box. You can also click on the Options button if you want the descriptive statistics of the tested variable. Now click on OK to run the analysis.
Pictorial Representation
Analyze → Nonparametric Tests → Chi-Square → Define Test Variable List → OK
Output
Price

         Observed N   Expected N   Residual
$1.19    8            7.3          .7
$1.39    6            7.3          -1.3
$1.59    8            7.3          .7
Total    22
The first column of the above table shows the three categories of the price variable.
The column labeled Observed N gives the actual number of cases falling in different categories of test variable, which is directly obtained from the data given.
The column labeled Expected N gives the expected number of cases that should fall in each category of the test variable.
The column labeled Residual gives the difference between observed and expected frequencies of each category, and it is commonly known as Error.
Test Statistics

                Price
Chi-Square(a)   .364
df              2
Asymp. Sig.     .834

a. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 7.3.
The above table gives the test results for the chi-square goodness of fit test. In this case the chi-square value is 0.364 with 2 degrees of freedom. The p-value for the test is shown as 0.834, which is greater than 0.05, so we fail to reject our null hypothesis that the fit is good.
Conclusion: The test results are not statistically significant at the 5% level of significance, so the data do not provide evidence against our null hypothesis, and we conclude that our test variable (price) follows a uniform distribution.
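The same test can be reproduced from the observed frequencies in the output table above. A minimal scipy sketch: when no expected frequencies are supplied, `chisquare` assumes they are equal, which matches SPSS's uniform-fit setup.

```python
from scipy.stats import chisquare

# Observed counts of the three price points, taken from the output table
observed = [8, 6, 8]          # $1.19, $1.39, $1.59

# With no expected frequencies supplied, chisquare assumes equal expected
# counts (22 / 3 = 7.33 per category), i.e. a uniform distribution.
stat, p = chisquare(observed)
print(f"chi-square = {stat:.3f}, df = {len(observed) - 1}, p = {p:.3f}")
# -> chi-square = 0.364, df = 2, p = 0.834, matching the SPSS output
```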
Chi-Square Test for Independence
The chi-square test for independence is used to test the hypothesis that two categorical variables are independent of each other. A small chi-square statistic indicates that the null hypothesis is correct and that the two variables are independent of each other.
Data Source: C:\SPSSEVAL\Employee Data
Variables: Here we analyze two different categorical variables, i.e.
A) Gender of the employees (Categorical)
B) Designation of the employees (Categorical)
Hypothesis:
H0: Designation is independent of sex.
HA: Designation is not independent of sex.
SPSS Need: SPSS needs two categorical variables for analyzing the chi-square test for independence.
Method:
First of all, enter the data in the data editor and label the variables Gender and Designation. Click on Analyze, which will produce a drop-down menu; choose Descriptive Statistics from it and click on Crosstabs. A dialogue box appears in which all the input variables are listed on the left-hand side. Select the variable you want to form the rows of your contingency table and transfer it to the box labeled Row(s); transfer the other variable to the box labeled Column(s). In this case we transfer Gender to the box labeled Row(s) and Designation to the box labeled Column(s). Next, click on the Statistics button, which brings up a dialogue box. Here tick the first box, labeled Chi-square, and click Continue to return to the previous screen. Click on OK to run the analysis.
Pictorial Representation
Analyze → Descriptive Statistics → Crosstabs → Drag Row and Column Variables → Tick Chi-Square → OK
Output
Gender * Employment Category Crosstabulation
Count

                  Employment Category
          Clerical   Custodial   Manager   Total
Female    206        0           10        216
Male      157        27          74        258
Total     363        27          84        474
Cross tabulation is used to examine the variation in categorical data; it is a cross-measuring analysis. Above, we cross-examine the gender and designation of the employees.
We take the designation of the employees in the columns and the gender of the employees in the rows, and we have a total of 474 observations.
The results are given in two rows; the first row shows the number of female employees in each employment category.
The second row shows the number of male employees in each employment category.
Chi-Square Tests

                      Value     df   Asymp. Sig. (2-sided)
Pearson Chi-Square    79.277a   2    .000
N of Valid Cases      474

a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 12.30.
The above table gives the test results for the chi-square test for independence. The first row, labeled Pearson Chi-Square, shows that the value of χ2 is 79.277 with 2 degrees of freedom. The two-tailed p-value is shown as .000, which is less than 0.05, so we can reject our null hypothesis and conclude that designation is not independent of sex.
Conclusion: The test results are statistically significant at the 5% level of significance, and the data provide sufficient evidence to conclude that the designation of the employees is not independent of their sex; the p-value is effectively zero, so the evidence for rejecting the null hypothesis is very strong.
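The Pearson chi-square above can be reproduced from the crosstabulation counts with a short scipy sketch; the function also returns the expected counts, so the footnote's minimum expected count can be checked too.

```python
from scipy.stats import chi2_contingency

# Observed counts from the Gender * Employment Category crosstabulation
#             Clerical  Custodial  Manager
table = [[206,        0,       10],    # Female
         [157,       27,       74]]    # Male

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.3f}, df = {dof}, p = {p:.2e}")
print(f"minimum expected count = {expected.min():.2f}")   # 12.30, as in footnote a
```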
Second Approach
Consider a case in which the raw data are not available and only the table labeled Gender * Employment Category Crosstabulation in the above output is given. On the basis of that table you can easily obtain the same result as above by using the SPSS Weight Cases option. Below we briefly explain how to enter the data on the basis of the table and obtain the desired results.
Method
First of all, in the Variable View of SPSS define three variables and label them Gender, Employment Category, and Value. Now, in the Data View of SPSS, enter the data in a different manner. We see that the table contains two rows and three columns: in the rows we have two categories, Female and Male, and in the columns we have three categories, Clerical, Custodial, and Manager. Both female and male employees fall into the three employment categories.
So in the Data View we simply enter the row data (Gender), opposite it the column data (Employment Category), and the corresponding frequencies in the Value column. The resulting data view is shown in the picture below.
[Screenshot: Data View with one row per Gender × Employment Category combination and its frequency in the Value column]
After defining the data, click on Data, which will produce a drop-down menu, and choose Weight Cases from it. A dialogue box appears in which all the variables are listed on the left-hand side. Tick Weight cases by and move Value into the box labeled Frequency Variable by clicking on the arrow between the two boxes. Now click OK to return to the previous window.
The further process is the same as described above: define Gender in the rows and Employment Category in the columns, tick Chi-square under the Statistics button, and click OK to run the analysis. When the output appears, you will see that SPSS gives the same result as we found earlier from the raw data.
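The weight-cases layout described above can be mimicked with pandas as a sketch: enter one row per cell with its frequency, then rebuild the contingency table by summing the frequencies per cell.

```python
import pandas as pd

# One row per Gender x Employment Category cell, frequency in Value --
# the same layout as the SPSS weight-cases data entry described above.
data = pd.DataFrame({
    "Gender":   ["Female", "Female", "Female", "Male", "Male", "Male"],
    "Category": ["Clerical", "Custodial", "Manager"] * 2,
    "Value":    [206, 0, 10, 157, 27, 74],
})

# Rebuild the contingency table by summing the frequencies per cell
table = pd.crosstab(data["Gender"], data["Category"],
                    values=data["Value"], aggfunc="sum")
print(table)
```

The rebuilt table can then be passed to a chi-square test exactly as if the raw data had been available.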
Regression Analysis
Regression is the relationship between selected values of an independent variable and observed values of a dependent variable, from which the most probable value of the dependent variable can be predicted for any value of the independent variable.
The use of regression to make quantitative predictions of one variable from the values of another variable is called regression analysis. The following types of regression may be used by the researcher:
Linear regression
Multiple linear regression
Quadratic / curvilinear regression
Logistic / binary logistic regression
Multivariate logistic regression
Linear Regression
When one dependent variable depends on a single independent variable, the relationship is called linear regression, and its model is given by

y = a + bx

where y is the dependent variable, x is the independent variable, a is the regression constant, and b is the regression coefficient.
Regression Coefficient
The regression coefficient is a measure of how strongly the independent variable predicts the dependent variable. There are two types of regression coefficients:
Un-standardized coefficients
Standardized coefficients, commonly known as Beta
The un-standardized coefficients can be used in the equation as coefficients of the different independent variables, along with the constant term, to predict the value of the dependent variable.
The standardized coefficient, however, is measured in standard deviations. A Beta value of 2 associated with a particular independent variable indicates that a change of 1 standard deviation in that independent variable will result in a change of 2 standard deviations in the dependent variable.
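The relation between the two kinds of coefficients can be sketched numerically: for simple regression, Beta equals the unstandardized slope b rescaled by the ratio of the standard deviations. The data below are randomly generated for illustration only.

```python
import numpy as np

# Randomly generated (x, y) data -- illustrative only
rng = np.random.default_rng(0)
x = rng.normal(10, 2, size=200)
y = 3.0 + 1.5 * x + rng.normal(0, 1, size=200)

# Unstandardized slope b from the least-squares formula
b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

# Standardized coefficient (Beta): the slope after z-scoring both
# variables, i.e. b rescaled by the ratio of standard deviations.
beta = b * np.std(x, ddof=1) / np.std(y, ddof=1)
print(f"b = {b:.3f}, Beta = {beta:.3f}")
```

For simple regression, Beta coincides with the correlation coefficient between x and y.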
Data Source: C:\SPSSEVAL\Employee Data
Variables: Here we are interested in analyzing two numerical variables, i.e.
Current salary (Numerical)
Beginning salary (Numerical)
Hypothesis:
H0: The regression coefficient is zero.
HA: The regression coefficient is not zero.
SPSS Need: SPSS needs two numerical variables, and both should be scale variables.
Method:
The given data is entered in the data editor and the variables are labeled as current salary and beginning salary. Click on Analyze which will produce a drop down menu, choose Regression from that and click on Linear, a dialogue box appears, in which all the input variables appear in the left-hand side of that box. Transfer the dependent variable into the right-hand side box labeled Dependent. Transfer the independent variable into the box labeled Independent(s). In our case, current salary is a dependent variable and beginning salary is an independent variable. Next we have to select the method for analysis in the box labeled Method. SPSS gives five options here: Enter, Stepwise, Remove, Forward, and Backward. In the absence of a strong theoretical reason for using a particular method, Enter should be used. The box labeled Selection variable is used if we want to restrict the analysis to cases satisfying particular selection criteria. The box labeled Case labels is used for designating a variable to identify points on plots.
After making the appropriate selections, click on the Statistics button. This will produce a dialogue box labeled Linear Regression: Statistics. Tick the statistics you want in the output. The Estimates option gives the estimates of the regression coefficients. The Model fit option gives the fit indices for the overall model. Besides these, the R squared change option is used to get the incremental R-square value when the model changes. Other options are not commonly used. Click on the Continue button to return to the main dialogue box.
The Plots button in the main dialogue box may be used for producing histograms and normal probability plots of residual. The Save button can be used to save statistics like predicted values, residuals, and distances. The options button can be used to specify the criteria for stepwise regression.
Now click on OK in the main dialogue box to run the analysis.
Pictorial Representation
Analyze → Regression → Linear → Define DV & IV → Plots → Tick Histogram & Normal Probability Plot → OK
OUTPUT
Variables Entered/Removed(b)

Model   Variables Entered     Variables Removed   Method
1       Beginning Salary(a)   .                   Enter

a. All requested variables entered.
b. Dependent Variable: Current Salary
The above table tells us about the independent variable and the regression method used. Here we see that the independent variable, beginning salary, is entered for the analysis, as we selected the Enter method.
Model Summary(b)

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .880a   .775       .774                $8,115.356

a. Predictors: (Constant), Beginning Salary
b. Dependent Variable: Current Salary
This table gives the R-value, which represents the correlation between the observed and predicted values of the dependent variable. R-Square is called the coefficient of determination, and it indicates the adequacy of the model. Here the value of R-Square is 0.775, which means the independent variable in the model can explain 77.5% of the variance in the dependent variable. Adjusted R-Square corrects R-Square for the number of predictors in the model and gives a more accurate picture of model fitness.
ANOVA(b)

Model 1      Sum of Squares      df    Mean Square       F          Sig.
Regression   106831048750.13       1   106831048750.1    1622.118   .000a
Residual     31085446686.216     472   65858997.217
Total        137916495436.34     473

a. Predictors: (Constant), Beginning Salary
b. Dependent Variable: Current Salary
The above table gives the ANOVA results for the regression. The results are given in three rows. The first row, labeled Regression, gives the variability in the model due to known reasons. The second row, labeled Residual, gives the variability due to random error or unknown reasons. The F-value in this case is 1622.118 and the p-value is .000, which is less than 0.05, so we reject our null hypothesis and conclude that the regression is significant, i.e. beginning salary explains a significant part of the variation in the current salary of the employees.
Coefficients(a)

                    Unstandardized Coefficients    Standardized Coefficients
Model 1             B           Std. Error         Beta                        t        Sig.
(Constant)          1928.206    888.680                                        2.170    .031
Beginning Salary    1.909       .047               .880                        40.276   .000

a. Dependent Variable: Current Salary
The above table gives the regression constant and coefficient and their significance. The regression coefficient and constant can be used to construct an ordinary least squares (OLS) equation and also to test the hypothesis about the independent variable. Using the regression coefficient and the constant term given under the column labeled B, one can construct the OLS equation for predicting the current salary, i.e.

Current salary = 1928.206 + (1.909) (Beginning salary)

Now we test our hypothesis. We see that the p-value for the regression coefficient of beginning salary is .000, which is less than 0.05, so we can reject our null hypothesis and conclude that the regression coefficient is not zero.
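The OLS computation behind this table can be sketched directly from the least-squares formulas. The salary pairs below are hypothetical (not the Employee Data file), and the final line simply applies the fitted SPSS equation quoted above.

```python
import numpy as np

# Hypothetical beginning/current salary pairs -- illustrative only,
# NOT the SPSS Employee Data file.
begin   = np.array([13500, 16500, 12000, 21000, 27000, 15000], dtype=float)
current = np.array([27000, 32000, 25000, 41000, 54000, 30000], dtype=float)

# Ordinary least squares: b = Sxy / Sxx, a = ybar - b * xbar
xbar, ybar = begin.mean(), current.mean()
b = ((begin - xbar) * (current - ybar)).sum() / ((begin - xbar) ** 2).sum()
a = ybar - b * xbar
print(f"current salary = {a:.3f} + {b:.3f} * beginning salary")

# Applying the fitted SPSS equation from the Coefficients table:
predict = lambda bs: 1928.206 + 1.909 * bs
print(f"predicted current salary at $20,000 beginning: {predict(20000):,.2f}")
```

The hand-computed a and b agree with a library least-squares fit on the same data.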
Charts
Histogram
[Histogram of the regression standardized residuals. Dependent Variable: Current Salary. Mean = -3.17E-16, Std. Dev. = 0.999, N = 474]
The above histogram of standardized residuals shows the mean and standard deviation of the residuals in the model. The mean and standard deviation are approximately 0 and 1 respectively, which indicates that the residuals behave like standard normal errors and the model fits well.
Normal P –P Plot of Regression Standardized Residual
[Normal P-P plot: Observed Cum Prob on the x-axis, Expected Cum Prob on the y-axis. Dependent Variable: Current Salary]
The above normal probability plot of the regression standardized residuals shows that the plotted points lie close to the diagonal line, which supports the adequacy of the fitted model.
Scatter Plot
[Scatter plot: Regression studentized deleted (press) residual on the x-axis, Current Salary on the y-axis. Dependent Variable: Current Salary]
The above scatter plot also shows the adequacy of the fitted model: the points are scattered and do not follow any particular pattern, so we can say that the fitted model shows no sign of systematic error.
Multiple Regression (Hierarchical Method)
Multiple regression is the most commonly used technique to assess the relationship between one dependent variable and several independent variables. There are three major types of multiple regression, i.e.
Standard multiple regression
Hierarchical or sequential regression
Stepwise or statistical regression
Data Source: C:\SPSSEVAL\Employee Data
Variables: Here we are interested in analyzing four numerical variables, i.e.
Current salary (Numerical)
Beginning salary (Numerical)
Educational level (Numerical)
Months since hire (Numerical)
Hypothesis:
H0: The regression coefficients are zero.
HA: The regression coefficients are not zero.
SPSS Need: SPSS needs more than two numerical variables, all of which should be scale variables.
Method:
The method for analyzing multiple regression is the same as discussed earlier for linear regression. The only change in the case of multiple regression is that we have one dependent variable along with three independent variables. Here current salary is the dependent variable, whereas beginning salary, educational level, and months since hire are the independent variables.
So here we transfer current salary into the box labeled Dependent, and beginning salary, educational level, and months since hire into the box labeled Independent(s).
The further procedure and the use of advanced options for extra results were discussed earlier for linear regression. After making the appropriate selections, click on OK to run the analysis.
OUTPUT
Variables Entered/Removedb
Beginning Salarya . Enter
Educational Level (years)a . Enter
Months since Hirea . Enter
Model1
2
3
Variables EnteredVariablesRemoved Method
All requested variables entered.a.
Dependent Variable: Current Salaryb.
The above table shows that beginning salary was entered in model one, followed by educational level in model two, followed by months since hire in model three. Note that model one includes only beginning salary as an independent variable, model two includes beginning salary and educational level, and model three includes beginning salary, educational level, and months since hire. The Enter method is used to assess all three models.
Model Summary

                                                                      Change Statistics
Model   R       R Square   Adjusted    Std. Error of    R Square   F Change   df1   df2   Sig. F
                           R Square    the Estimate     Change                            Change
1       .880a   .775       .774        $8,115.356       .775       1622.118   1     472   .000
2       .890b   .792       .792        $7,796.524       .018       40.393     1     471   .000
3       .895c   .801       .800        $7,645.998       .008       19.728     1     470   .000

a. Predictors: (Constant), Beginning Salary
b. Predictors: (Constant), Beginning Salary, Educational Level (years)
c. Predictors: (Constant), Beginning Salary, Educational Level (years), Months since Hire
The above table shows the R-values along with change statistics for the three models in different rows. Under Change Statistics, the first column, labeled R Square Change, gives the change in the R-square value between the models. The last column, labeled Sig. F Change, tests whether there is a significant improvement in the model as we introduce additional independent variables; in other words, it tells us whether the inclusion of additional independent variables at each step helps to explain significant additional variance in the dependent variable. The R Square Change value in row three is 0.008, which means that the inclusion of months since hire, after beginning salary and educational level, helps to explain an additional 0.8% of the variance in the current salary of the employees. The p-values for all three models fall in the critical region, so we reject our null hypothesis and conclude that the regression coefficients are not zero.
Coefficients(a)

                               Unstandardized Coefficients   Standardized Coefficients
Model                          B           Std. Error        Beta                        t        Sig.
1   (Constant)                 1928.206    888.680                                       2.170    .031
    Beginning Salary           1.909       .047              .880                        40.276   .000
2   (Constant)                 -7808.714   1753.860                                      -4.452   .000
    Beginning Salary           1.673       .059              .771                        28.423   .000
    Educational Level (years)  1020.390    160.550           .172                        6.356    .000
3   (Constant)                 -19986.5    3236.616                                      -6.175   .000
    Beginning Salary           1.689       .058              .779                        29.209   .000
    Educational Level (years)  966.107     157.924           .163                        6.118    .000
    Months since Hire          155.701     35.055            .092                        4.442    .000

a. Dependent Variable: Current Salary
The above table gives the regression coefficients and related statistics for the three models separately in different rows. These regression coefficients and constants can be used to construct ordinary least squares (OLS) equations and also to test the hypotheses about the independent variables. Using the regression coefficients and the constant terms given under the column labeled B, one can construct the OLS equations for predicting the current salary of the employees for the three models, i.e.

MODEL 1: CS = 1928.206 + (1.909) (BS)
MODEL 2: CS = -7808.714 + (1.673) (BS) + (1020.390) (EL)
MODEL 3: CS = -19986.5 + (1.689) (BS) + (966.107) (EL) + (155.701) (MSH)

Now we test our hypothesis. We see that the p-values for the regression coefficients in all three models are less than 0.05, so we can reject our null hypothesis and conclude that the regression coefficients are not zero.
Conclusion: Using the hierarchical method for multiple regression, we conclude that the model adequacy increases as each independent variable is introduced, but the increase in adequacy from including educational level (1.8%) is greater than the increase from including months since hire (0.8%). Since the p-values lie in the critical region, we reject our null hypothesis and conclude that the regression coefficients for all three models are not zero.
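The hierarchical logic above (nested models, R-square change, and its F-test) can be sketched with numpy on synthetic data; the variable names and generating values below are illustrative, not the Employee Data file.

```python
import numpy as np
from scipy.stats import f as f_dist

def r_squared(X, y):
    """Fit OLS with an intercept and return R-square."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return 1 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()

# Synthetic data -- names and generating values are illustrative only
rng = np.random.default_rng(1)
n = 200
begin  = rng.normal(17000, 4000, n)
educ   = rng.integers(8, 21, n).astype(float)
months = rng.normal(80, 10, n)
current = 2000 + 1.9 * begin + 1000 * educ + 150 * months + rng.normal(0, 3000, n)

# Nested (hierarchical) models: R-square can only grow as predictors enter
r2_1 = r_squared(begin[:, None], current)
r2_2 = r_squared(np.column_stack([begin, educ]), current)
r2_3 = r_squared(np.column_stack([begin, educ, months]), current)

# Sig. F Change for adding one predictor (model 2 -> model 3)
df2 = n - 3 - 1
f_change = (r2_3 - r2_2) / ((1 - r2_3) / df2)
p_change = f_dist.sf(f_change, 1, df2)
print(f"R2: {r2_1:.3f} -> {r2_2:.3f} -> {r2_3:.3f}, Sig. F Change = {p_change:.4f}")
```

The F-change statistic here follows the same formula SPSS uses for the Sig. F Change column: the R-square increment divided by the residual proportion of the fuller model, each scaled by its degrees of freedom.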
Charts:
This model also produces three diagrams for the standardized residuals: a histogram, a normal probability plot, and a scatter plot. The charts and their interpretation are almost the same as discussed for linear regression, so we do not describe them again.
Curvilinear / Quadratic Regression
When the regression equation is nonlinear, i.e. quadratic or of higher order, the relationship is called curvilinear or quadratic regression. There may be more than one dependent variable depending on one independent variable.
Data Source: C:\SPSSEVAL\Employee Data
Variables: Here we are interested in analyzing three numerical variables, i.e.
Current salary (Numerical)
Beginning salary (Numerical)
Educational level (Numerical)
Hypothesis:
H0: The regression coefficient is zero.
HA: The regression coefficient is not zero.
SPSS Need: SPSS needs two numerical variables, and both should be scale variables.
Method:
The given data are entered in the data editor and the variables are labeled Current salary, Beginning salary, and Educational level. Click on Analyze, which will produce a drop-down menu; choose Regression from it and click on Curve Estimation. A dialogue box appears in which all the input variables are listed on the left-hand side. Transfer the dependent variables into the right-hand box labeled Dependent(s) and the independent variable into the box labeled Independent. In our case, current salary and beginning salary are the dependent variables and educational level is the independent variable.
Now choose an appropriate model you want by ticking its box appearing below the window labeled Curve Estimation. In this case we choose Quadratic model by ticking its corresponding box.
The Save button can be used to save statistics like predicted values, residuals, and predicted intervals. Now click on OK in the main dialogue box to run the analysis.
Pictorial Representation
Analyze → Regression → Curve Estimation → Define DVs and IV → Tick Quadratic → OK
OUTPUT
Model Description

Model Name                                           MOD_2
Dependent Variable    1                              Current Salary
                      2                              Beginning Salary
Equation              1                              Quadratic
Independent Variable                                 Educational Level (years)
Constant                                             Included
Variable Whose Values Label Observations in Plots    Unspecified
Tolerance for Entering Terms in Equations            .0001
The above table gives the description of the model. In this case we have two dependent variables i.e. Current salary and Beginning salary along with one independent variable i.e. Educational level (years).
Case Processing Summary
                        N
Total Cases             474
Excluded Cases(a)       0
Forecasted Cases        0
Newly Created Cases     0
a. Cases with a missing value in any variable are excluded from the analysis.
The above table shows the number of cases that fall in the selected model. In our case the total number of cases is 474, with no excluded or missing cases.
The Model Summary and Parameter Estimates table below gives the test results for the quadratic regression. The R value shows the correlation between the observed and expected values of the dependent variable. In this case the F-value is 337.246 with a significance level of 0.000, which is less than 0.05. This means that our value falls in the critical region, so we can reject our null hypothesis and conclude that the regression coefficients are not zero.
Scatter Plots
Model Summary and Parameter Estimates
Dependent Variable: Current Salary

            Model Summary                          Parameter Estimates
Equation    R Square   F         df1   df2   Sig.   Constant    b1         b2
Quadratic   .589       337.246   2     471   .000   85438.237   -12428.5   612.950

The independent variable is Educational Level (years).
[Scatter plot: Current Salary (in $) against Educational Level (years), showing the Observed values and the fitted Quadratic curve.]
[Scatter plot: Beginning Salary (in $) against Educational Level (years), showing the Observed values and the fitted Quadratic curve.]
The above charts for the dependent variables clearly show that the residual values are not scattered randomly but follow a particular pattern, which means that the fitted model is not good.
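Whatever the quality of fit, the fitted equation from the Parameter Estimates table can be evaluated directly. Below is an illustrative Python sketch, not SPSS output; the coefficients are copied from the table above and the `predicted_salary` helper is a hypothetical name of our own:

```python
# Evaluating the fitted quadratic from the Parameter Estimates table:
# y-hat = 85438.237 - 12428.5*x + 612.950*x**2,
# where x is Educational Level (years) and y-hat is predicted Current Salary.
b0, b1, b2 = 85438.237, -12428.5, 612.950

def predicted_salary(years):
    """Predicted Current Salary for a given Educational Level (hypothetical helper)."""
    return b0 + b1 * years + b2 * years ** 2

# With b2 > 0 the parabola opens upward; its minimum is at x = -b1 / (2 * b2).
turning_point = -b1 / (2 * b2)
print(round(turning_point, 2))         # about 10.14 years
print(round(predicted_salary(16), 2))  # predicted salary at 16 years of education
```

Because b2 is positive, predicted salary falls until roughly 10 years of education and then rises increasingly steeply, which is the curvature the Quadratic model captures.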
Likert Type Scaling
Likert type scaling is a method used to assign ranks to categorical data in order to make the categorical data meaningful when we have to apply a statistical test to it. Through this scaling approach the assigned ranks can be treated as numerical values.
Suppose we have to collect data about awareness, preference, usage, likeness and dislikeness, as well as agreement with a statement, which are returned in qualitative form, and we have to record the responses in a way that can be analyzed statistically. In this situation we use Likert type scaling.
Data Source:
RUN \\temp\temp\Ali Raza\Mateen.sav
Hypothesis:
H0: µMale = µFemale
HA: µMale ≠ µFemale
Variables:
Here we are interested to analyze two categorical variables i.e.
Gender (Categorical)
Preference of cellular service with respect to network coverage (Categorical but treated as Numerical)
Here we consider the preference of cellular service as a numerical variable and statistically test the hypothesis that the mean preference of male and female over cellular service with respect to network coverage is the same. The method we use to test the above hypothesis is the Independent samples t-test.
Method:
Enter the data in the data editor and the variables are labeled as Gender and preference. Click on Analyze which will produce a drop down menu, choose Compare means from that and click on independent samples t-test, a dialogue box appears, in which all the input variables appear in the left-hand side of that box. To perform the independent samples t-test, transfer the dependent variable into the test variable box and transfer the variable that identifies the groups into the grouping variable box. In this case, the Preference is the dependent variable to be analyzed and should be transferred into test variable box. Gender is the variable which will identify the groups and it should be transferred into the grouping variable box.
Once the grouping variable is transferred, the define groups button which was earlier inactive turns active. Click on it to define the two groups. In this case group1 represents Male and group2 represents female. Therefore put 1 in the box against group1 and 2 in the box against group2 and click continue. Now click on OK to run the analysis.
Pictorial Representation: Analyze → Compare Means → Independent-Samples T Test → Drag Test & Grouping Variable → Define Groups → OK
OUTPUT
Group Statistics
Wide network coverage motivates the individual to prefer a particular cellular service
Gender    N     Mean   Std. Deviation   Std. Error Mean
Male      100   4.12   .891             .089
Female    100   4.32   .618             .062
This table contains the descriptive statistics for both groups. We have taken 200 observations for the independent samples t-test, of which 100 belong to the male category and 100 to the female category. The column labeled Mean shows that the mean preference of cellular service with respect to network coverage for both groups is approximately 4. This means that both groups Agree that wide network coverage motivates an individual to prefer a particular cellular service.
The Independent Samples Test table below contains the test statistics for the independent samples t-test.
Levene's Test: The table contains two sets of results, the first assuming equal variances in the two groups and the second assuming unequal variances. Levene's test tells us which statistic to use when analyzing the equality of means. The p-value for Levene's test is 0.100, which is greater than 0.05. Therefore, the statistic associated with equal variances assumed should be used for the t-test for equality of means of two independent populations.
P-Value: shows that the value of our test statistic does not fall in the critical region, i.e. 0.067 > 0.05, so we cannot reject our Null Hypothesis, i.e. µMale = µFemale.
Independent Samples Test
Wide network coverage motivates the individual to prefer a particular cellular service

Levene's Test for Equality of Variances: F = 2.730, Sig. = .100

t-test for Equality of Means
                              t        df       Sig.         Mean         Std. Error   95% CI of the Difference
                                                (2-tailed)   Difference   Difference   Lower    Upper
Equal variances assumed       -1.845   198      .067         -.200        .108         -.414    .014
Equal variances not assumed   -1.845   176.31   .067         -.200        .108         -.414    .014
Conclusion: The test results are not statistically significant at the 95% confidence level, so the data do not provide sufficient evidence to reject the Null Hypothesis; we conclude that the mean preference of cellular service with respect to network coverage for male and female is the same. Had we rejected the Null Hypothesis, there would have been a 6.7% chance of rejecting a true Null Hypothesis.
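The equal-variances t statistic in the table can be reproduced from the Group Statistics values alone. Below is a minimal Python sketch, not SPSS; the last digit differs slightly from the table because we start from the rounded means and standard deviations:

```python
import math

# Equal-variances independent-samples t statistic, computed from the
# Group Statistics summary values reported above (rounded inputs).
n1, mean1, sd1 = 100, 4.12, 0.891   # Male
n2, mean2, sd2 = 100, 4.32, 0.618   # Female

# Pooled variance, then the standard error of the difference in means
sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
se_diff = math.sqrt(sp2 * (1 / n1 + 1 / n2))
t_stat = (mean1 - mean2) / se_diff
df = n1 + n2 - 2

# SPSS reports .108, -1.845 and 198 from the unrounded data
print(round(se_diff, 3), round(t_stat, 3), df)
```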
Reliability Analysis
Reliability analysis is applied to check the reliability of the data, that is, whether the conclusions and the analysis performed on the data are reliable for understanding and forecasting. One way to ideally measure reliability is the test-retest method. However, establishing reliability through test-retest is practically very difficult.
Some of the commonly used techniques for assessing reliability include Cohen's Kappa Coefficient for categorical data and Cronbach's Alpha for internal reliability of the data set.
Data Source: C:\SPSSEVAL\Home Sales [By Neighborhood].sav
Variables: Here we are interested to check the reliability of a data set which includes five numerical variables i.e.
Appraised Land Value
Appraised Value of Improvements
Total Appraised Value
Sale Price
Ratio of Sale Price to Total Appraised Value
Note that the data contains one String variable labeled as Neighborhood; we deleted this variable because SPSS does not check the reliability, if the data contains any String or Blank variable.
SPSS Need: For reliability analysis through SPSS, one can use any variable of any nature except String and Blank variables.
Method:
Enter the data in the data editor and label the variables. Click on Analyze which will produce a drop down menu, choose Scale from that and click on Reliability Analysis; a dialogue box appears, in which all the input variables appear on the left-hand side of that box. To perform the reliability analysis, transfer the variables into the box labeled Items by clicking on the arrow between the two boxes. In this case, we have five numerical variables in the data set that should be transferred to the Items box.
Choose appropriate Model by clicking on that box, here we choose Alpha as model. Now click on Statistics button, a dialogue box appears. Tick the corresponding box which you want to analyze in the output. Now click Continue to return to the main dialogue box. Click on OK to run the analysis.
Pictorial Representation: Analyze → Scale → Reliability Analysis → Drag Items → Choose Model → Give Statistics → OK
OUTPUT
Case Processing Summary
Cases         N      %
Valid         2440   100.0
Excluded(a)   0      .0
Total         2440   100.0
a. Listwise deletion based on all variables in the procedure.
The above table shows the total number of cases that fall in the data set. We have 2440 observations with no missing or excluded cases.
Reliability Statistics
Cronbach's Alpha   N of Items
.576               5
The above table shows the test results for the reliability analysis. The value of Cronbach's Alpha is 0.576 and the number of items in the data set is 5. This value of Alpha is said to be Poor, and the conclusions drawn from this data are not reliable for understanding and forecasting.
Item-Total Statistics
                                  Scale Mean if   Scale Variance if   Corrected Item-     Cronbach's Alpha
                                  Item Deleted    Item Deleted        Total Correlation   if Item Deleted
Appraised Land Value              164151.7603     5533196618          .688                .480
Appraised Value of Improvements   140212.2148     4646009801          .505                .438
Total Appraised Value             132761.3423     5160815037          .314                .533
Sale Price                        106454.2587     1928523141          .565                .477
Ratio of Sale Price to
Total Appraised Value             181191.6111     6801537138          -.032               .615
The above table shows the statistics associated with each item. The last column of the table shows the change in the value of Alpha if the corresponding item is deleted from the data set. The values associated with the top four items are less than the current value of Alpha, which is 0.576; that means if one of these items is deleted, the value of Cronbach's Alpha becomes worse. But the value associated with the item labeled Ratio of Sale Price to Total Appraised Value is 0.615. This means that if this item is deleted from the analysis and the reliability of the entire data set is retested, the value of Cronbach's Alpha becomes 0.615. So, in order to improve the value of Alpha and make our data set more reliable, we delete the last item and retest the value of our Cronbach's Alpha.
Reliability Statistics
Cronbach's Alpha   N of Items
.615               4
Here we retest our data after deletion of one item and our new value of Alpha is 0.615; the total number of items in the data set is now 4. This value of Alpha is said to be Acceptable, and the conclusions drawn from this data are reliable for understanding and forecasting.
Item-Total Statistics
                                  Scale Mean if   Scale Variance if   Corrected Item-     Cronbach's Alpha
                                  Item Deleted    Item Deleted        Total Correlation   if Item Deleted
Appraised Land Value              164150.57       5533198039          .688                .540
Appraised Value of Improvements   140211.03       4646008210          .505                .493
Total Appraised Value             132760.16       5160813863          .314                .599
Sale Price                        106453.07       1928530335          .565                .536
This table shows that if we delete any other item from the data set and retest the reliability, our value of Alpha becomes Poor, because all the values associated with the remaining four items in the last column of the above table are less than the current value of our Cronbach's Alpha, i.e. 0.615. So we do not need to retest the reliability of the data set any further, which means the data is reliable at the current value of our Cronbach's Alpha.
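Cronbach's Alpha itself is straightforward to compute: it compares the sum of the item variances with the variance of the total score. Below is a minimal Python sketch, not SPSS; the `cronbach_alpha` helper and the tiny four-respondent data set are hypothetical:

```python
from statistics import pvariance

# Cronbach's Alpha: alpha = k/(k-1) * (1 - sum of item variances / variance of totals)
def cronbach_alpha(items):
    """items: a list of equal-length lists, one list of scores per scale item."""
    k = len(items)
    item_vars = sum(pvariance(item) for item in items)
    totals = [sum(scores) for scores in zip(*items)]   # total score per respondent
    return k / (k - 1) * (1 - item_vars / pvariance(totals))

# Hypothetical 5-point responses: three items from four respondents
items = [[4, 5, 3, 4],
         [4, 4, 3, 5],
         [5, 5, 2, 4]]
print(round(cronbach_alpha(items), 3))  # 0.818 for this toy data
```

Deleting an item and calling the function again on the remaining items is exactly the "Cronbach's Alpha if Item Deleted" computation described above.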
Correlation Analysis
Correlation refers to the degree of relation between two numerical variables. It is denoted by "r", which is known as the Correlation Coefficient.
Correlation Coefficient
The correlation coefficient gives the mathematical value for measuring the strength of the linear relation between two variables. Mathematically the value of "r" always lies between -1 and 1, with:
(a) +1 representing absolute positive linear relationship (as X increases, Y increases).
(b) 0 representing no linear relationship (X and Y have no pattern).
(c) -1 representing absolute inverse relationship (as X increases, Y decreases).
Bivariate Correlation
Bivariate correlation tests the strength of the relationship between two variables without giving any consideration to the interference some other variables might cause to the relationship between the two variables being tested. For example, while testing the correlation between the Current and Beginning salary of the employees, bivariate correlation will not consider the impact of some other variables like Educational Level and Previous Experience of the employees. In such cases, a bivariate analysis may show us a strong relationship between Current and Beginning salary; but in reality, this strong relationship could be the result of some other extraneous factors like Educational Level and Previous Experience etc.
Data Source:
C:\SPSSEVAL\Employee data
Hypothesis:
H0: There is no Correlation between Variables (r = 0)
HA: There is some Correlation between Variables (r ≠ 0)
Variables: Here we are interested to analyze three numerical variables i.e.
Current salary (Numerical)
Beginning salary (Numerical)
Educational Level (years) (Numerical)
Technically correlation analysis can be run with any kind of data, but the output will be of no use if a correlation is run on a categorical variable with more than two categories. For example, in a data set, if the respondents are categorized according to nationalities and religions, correlation between these variables is meaningless.
SPSS Need:
SPSS needs two or more numerical variables to perform Correlation Analysis.
Method:
Firstly the data is entered in the data editor and the variables are labeled as Current salary, Beginning salary, Educational Level, and Previous Experience. Click on Analyze which will produce a drop down menu, choose Correlate from that and click on Bivariate, a dialogue box appears, in which all the input variables appear in the left-hand side of that box. To perform the bivariate correlation, choose the variables for which the correlation is to be studied from the left-hand side box and move them to the right-hand side box labeled Variables. Once any two variables transferred to the variables box, the OK button becomes active.
In our case we will transfer four numerical variables, i.e. Current salary, Beginning salary, Educational Level, and Previous Experience, to the right-hand side box labeled Variables. There are some default selections at the bottom of the window that can be changed by clicking on the appropriate boxes. For our purpose, we will use the most commonly used Pearson's Coefficient.
Next, while choosing between the one-tailed and two-tailed test of significance, we have to see if we are making any directional prediction. The one-tailed test is appropriate if we are predicting a positive or negative relationship between the variables; the two-tailed test should be used if there is no prediction about the direction of the relationship between the variables to be tested. Finally, Flag Significant Correlations asks SPSS to print an asterisk next to each correlation that is significant at the 0.05 significance level and two asterisks next to each correlation that is significant at the 0.01 significance level, so that the output can be read easily. The default selections will serve the purpose for the problem at hand. We may choose Means and Standard Deviations from the Options button if we wish to compute these figures for the given data. After making appropriate selections, click on OK to run the analysis.
Pictorial Representation: Analyze → Correlate → Bivariate → Define Variables → Choose appropriate options → OK
OUTPUT
Correlations
                            Current Salary   Beginning Salary   Educational Level (years)
Current Salary
  Pearson Correlation       1                .880**             .661**
  Sig. (2-tailed)                            .000               .000
  N                         474              474                474
Beginning Salary
  Pearson Correlation       .880**           1                  .633**
  Sig. (2-tailed)           .000                                .000
  N                         474              474                474
Educational Level (years)
  Pearson Correlation       .661**           .633**             1
  Sig. (2-tailed)           .000             .000
  N                         474              474                474
**. Correlation is significant at the 0.01 level (2-tailed).
The above table gives the correlation for all pairs of variables, and each correlation is produced twice in the matrix. So here we get the following 3 correlations for the given data.
Current salary and Beginning salary
Current salary and Educational level
Beginning salary and Educational level
The value of the correlation coefficient is 1 in the cells where SPSS compares a variable with itself (Current salary and Current salary, and so on), since every variable is perfectly positively correlated with itself.
In each cell of the correlation matrix, we get Pearson's correlation coefficient, p-value for two-tailed test of significance and the sample size. From the output we can see that the correlation coefficient between Current salary and Beginning salary is 0.88 and the p-value for two-tailed test of significance is less than 0.05. From these figures we can conclude that there is a strong positive correlation between Current salary and beginning salary and that this correlation is significant at the significance level of 0.01.
Similarly, the correlation coefficient for Current salary and Educational level is 0.661. So there is a moderate positive correlation between these variables.
The correlation coefficient for Beginning salary and Educational level is 0.633 and its p-value is 0.000, so we can reject our null hypothesis and conclude that there is some correlation between these two variables.
Conclusion: At the 1% level of significance all variables are significantly correlated with each other. In this case our null hypothesis that there is no correlation between the variables is rejected for all pairs of variables. We can conclude that there is some correlation present between all variables in the given data.
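The statistic in each cell of the matrix can be computed by hand. Below is a minimal Python sketch of Pearson's r (illustrative only; the toy data is hypothetical, not the Employee data):

```python
import math

# Pearson's correlation coefficient r, the statistic in each cell above:
# r = covariance term / product of the square roots of the sums of squares
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical toy data: strongly but not perfectly related
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(round(pearson_r(x, y), 3))  # 0.8
```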
Partial Correlation
Partial correlation allows us to examine the correlation between two variables while controlling for the effects of one or more of the additional variables without throwing out any of the data.
In other words, it is the degree of relation between the dependent variable and one of the independent variables while controlling the effect of the other independent variables, because we know that, in a multiple regression model, one dependent variable depends on two or more independent variables.
Data Source:
C:\SPSSEVAL\Employee data
Hypothesis:
H0: There is no Correlation between Variables (r = 0)
HA: There is some Correlation between Variables (r ≠ 0)
Variables: Here we are interested to analyze two numerical variables, while controlling one additional variable.
Current salary
Beginning salary
Educational level (Control variable)
SPSS Need:
SPSS needs two or more numerical variables to perform partial Correlation.
Method:
Enter the data in the data editor and label the variables. Click on Analyze which will produce a drop down menu, choose Correlate from that and click on Partial; a dialogue box appears, in which all the input variables appear on the left-hand side of that box. To perform the partial correlation, transfer the variables whose correlation you want to know into the box labeled Variables, while controlling the effect of one or more additional variables by transferring them to the box labeled Controlling for.
In our case, we want to find the correlation between the Current salary and Beginning salary of the employees, so these variables should be transferred to the box labeled Variables, while controlling for the effect of the Educational level of the employees by transferring it to the box labeled Controlling for. Now click on OK to run the analysis.
Pictorial Representation
Analyze → Correlate → Partial → Drag Variables → Drag Controlling Variables → OK
OUTPUT
Correlations
Control Variable: Educational Level (years)
                            Current Salary   Beginning Salary
Current Salary
  Correlation               1.000            .795
  Significance (2-tailed)   .                .000
  df                        0                471
Beginning Salary
  Correlation               .795             1.000
  Significance (2-tailed)   .000             .
  df                        471              0
The above table shows the test results for the partial correlation between Current salary and beginning salary of the employees. The variable we are controlling for in the analysis is Educational level, and it is shown in the left-hand side of the table.
We can see that the correlation coefficient between Current salary and Beginning salary is 0.795, which is considerably smaller than the 0.88 in the bivariate case. This means that the two variables still have a positive correlation, but the value of the correlation coefficient decreases when we control for the Educational level of the employees, and the variables are no longer as strongly correlated with each other.
Conclusion: The test results are significant at 5% level of significance and the data provide sufficient evidence to conclude that there is some correlation present between the Current salary and Beginning salary of the employees, but it is considerably smaller in the case of partial correlation than in case of bivariate correlation.
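The partial correlation reported by SPSS can be checked against the standard first-order formula, using the bivariate coefficients from the earlier Correlations table. A small Python sketch (illustrative, not SPSS):

```python
import math

# First-order partial correlation from the three bivariate coefficients:
# r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))
r_xy = 0.880   # Current salary vs Beginning salary
r_xz = 0.661   # Current salary vs Educational level (the control variable z)
r_yz = 0.633   # Beginning salary vs Educational level

r_xy_z = (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
print(round(r_xy_z, 3))  # 0.795, matching the partial Correlations table
```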
Logistic Regression
If a categorical variable depends on numerical or categorical variables, then their dependency may be called logistic regression. It is used to predict a discrete outcome based on variables that may be discrete, continuous, or mixed. Thus when the dependent variable is categorical with two or more discrete outcomes, logistic regression is a commonly used technique. It has the following two types:
Binary logistic regression / Logit
Multinomial logistic regression
Coefficient of Logistic Regression
Logistic regression computes the log odds for a particular outcome. The odds of an outcome are given by the ratio of the probability of it happening and not happening, [P / (1 - P)], where P is the probability of the event. There are some mathematical problems in reporting these odds, so the natural logarithm of these odds is calculated. A positive value indicates that the odds are in favor of the event and the event is likely to occur, while a negative value indicates that the odds are against the event and the event is not likely to occur. The formula may be written as:
log odds = ln[P / (1 - P)]
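The odds and log odds described above can be illustrated with a few lines of Python (a hypothetical sketch, not SPSS output):

```python
import math

# Odds p/(1-p) and their natural logarithm (the log odds described above)
def log_odds(p):
    """Natural log of the odds for an event with probability p."""
    return math.log(p / (1 - p))

print(round(log_odds(0.8), 3))  # odds 4 to 1 in favour -> positive log odds
print(round(log_odds(0.2), 3))  # odds 1 to 4 against   -> negative log odds
print(log_odds(0.5))            # even odds -> log odds of 0.0
```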
Binary Logistic Regression
If a categorical variable has only two levels, e.g. Male and Female, Yes or No, Good and Bad etc., and it depends on different categorical or numerical independent variables, then their relation can be referred to as binary logistic regression. The expression for binary logistic regression may be given as
ln[P / (1 - P)] = b0 + b1X1 + b2X2 + ……….. + bkXk
Data Source: C:\SPSSEVAL\AML Survival
Hypothesis:
H0: Regression coefficients are zero
HA: Regression coefficients are not zero
Variables: Here we are interested to analyze three different variables i.e.
Status (Categorical)
Time (Numerical)
Chemotherapy (Categorical)
Here Status is our dependent variable depending on Time and Chemotherapy. As in this case our dependent variable is categorical having only two levels i.e. Censored and Relapsed, so we use binary logistic regression to analyze the dependency between the variables.
SPSS Need:
SPSS needs one dependent variable and it must be Categorical, while the independent variables can be categorical as well as numerical.
Method:
Firstly the data is entered in the data editor and the variables are labeled as Status, Time, and Chemotherapy. Click on Analyze which will produce a drop down menu, choose Regression from that and click on Binary Logistics, a dialogue box appears, in which all the input variables appear in the left-hand side of that box. To perform the Binary Logistics Regression, transfer the dependent variable in the box labeled Dependent and the independent variables in the box labeled Covariates.
In our case, Status is the only dependent variable and should be transferred to the box labeled Dependent. Time and Chemotherapy are independent variables and should be transferred to the box labeled Covariates.
Next we have to select the method for analysis in the box labeled Method. SPSS gives seven options, of which the Enter method is the most commonly used. For common purposes one does not need to use the Save and Options buttons; advanced users may experiment with these. The Save button can be used to save statistics like predicted values, residuals, and distances. The Options button can be used to specify the criteria for stepwise regression.
After making appropriate selections, click on OK to run the analysis.
Pictorial Representation: Analyze → Regression → Binary Logistic → Drag Dependent → Drag Covariates → OK
OUTPUT
Case Processing Summary
Unweighted Cases(a)                     N    Percent
Selected Cases   Included in Analysis   23   100.0
                 Missing Cases          0    .0
                 Total                  23   100.0
Unselected Cases                        0    .0
Total                                   23   100.0
a. If weight is in effect, see classification table for the total number of cases.
The above table gives the description of the cases selected for the analysis. We have a total of 23 cases included in the analysis, with no missing or unselected cases.
Dependent Variable Encoding
Original Value   Internal Value
Censored         0
Relapsed         1
The above table shows how the two outcomes or levels of Status, i.e. Censored and Relapsed, have been coded by SPSS.
Block 0: Beginning Block
Classification Table(a,b)
                               Predicted Status
Step 0   Observed Status       Censored   Relapsed   Percentage Correct
         Censored              0          5          .0
         Relapsed              0          18         100.0
         Overall Percentage                          78.3
a. Constant is included in the model.
b. The cut value is .500
The above table shows the observed or actual number of cases that fall in each category of the dependent variable. The last column, labeled Percentage Correct, shows that this baseline model can predict 0% of the censored patients' status and 100% of the relapsed patients' status. Overall, the model can predict 78.3% of the patients' status.
Block 1: Method = Enter
Omnibus Tests of Model Coefficients
Step 1   Chi-square   df   Sig.
Step     4.609        2    .100
Block    4.609        2    .100
Model    4.609        2    .100
The above table reports significance levels by the traditional chi-square method. It tests whether the model with the predictors is significantly different from the model with only a constant. The omnibus test may be interpreted as a test of the capability of all predictors in the model jointly to predict the response (dependent) variable. A finding of significance would correspond to the research conclusion that there is adequate fit of the data to the model, meaning that at least one of the predictors is significantly related to the response variable. In the illustration above, however, the significance level is 0.100, which is greater than 0.05, so the predictors taken jointly are not significant at the 5% level. The Enter method is used (all model terms are entered in one step), so there is no difference between Step, Block, and Model, but in a stepwise procedure one would see results for each step.
Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      19.476(a)           .182                   .280
a. Estimation terminated at iteration number 5 because parameter estimates changed by less than .001.
The above table gives the Cox & Snell R Square value, which gives an approximation of how much variance in the dependent variable can be explained by the hypothesized model. In this case Time and Chemotherapy can explain 18.2% of the variation in the patients' current Status.
Classification Table(a)
                               Predicted Status
Step 1   Observed Status       Censored   Relapsed   Percentage Correct
         Censored              1          4          20.0
         Relapsed              0          18         100.0
         Overall Percentage                          82.6
a. The cut value is .500
The above Classification table summarizes the results of our predictions about patient's Status based on Time and Chemotherapy. We can see that our model can correctly predict 20% status of censored patients and 100% status of the relapsed patients. Overall, our model predicts 82.6% status of the patients.
Variables in the Equation
Step 1(a)   B        S.E.    Wald    df   Sig.   Exp(B)
chemo       -1.498   1.262   1.409   1    .235   .224
time        -.024    .024    1.055   1    .304   .976
Constant    2.962    1.207   6.025   1    .014   19.332
a. Variable(s) entered on step 1: chemo, time.
The above table gives the Beta coefficients for the independent variables along with their significance. Negative beta coefficients for Time and Chemotherapy mean that with increasing chemotherapy and time of treatment, the chances of the patient having a relapsed status decrease. As with multiple linear regression models, we can construct an equation from the above regression constant and coefficients, here for the log odds of a relapsed status:
ln[P / (1 - P)] = 2.962 + (-1.498) (Chemotherapy) + (-0.024) (Time)
The last column, labeled Exp(B), takes a value of more than one if the beta coefficient is positive and less than one if it is negative. In our case, the beta coefficients for Chemotherapy and Time are negative, so both have values of less than one in the column labeled Exp(B). A value of 0.976 for Time indicates that for a 1 week increase in the treatment, the odds of a patient having a relapsed status are multiplied by a factor of 0.976, i.e. they decrease slightly. The regression coefficients can also be used to construct an equation for the probability of a relapsed status:
P = 1 / (1 + e^-(2.962 - 1.498 C - 0.024 T))
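The relationship between the B and Exp(B) columns, and the predicted probability implied by the fitted coefficients, can be checked with a short Python sketch (illustrative, not SPSS; coding Chemotherapy as 0/1 and measuring Time in weeks are our assumptions):

```python
import math

# Exp(B) is just e**B, and the fitted coefficients give a predicted
# relapse probability P = 1 / (1 + e**-(b0 + b1*chemo + b2*time)).
b_const, b_chemo, b_time = 2.962, -1.498, -0.024

def relapse_probability(chemo, time):
    """Predicted probability of Relapsed status (0/1 chemo coding is an assumption)."""
    z = b_const + b_chemo * chemo + b_time * time
    return 1 / (1 + math.exp(-z))

# Reproduce the Exp(B) column of the table (within rounding)
print(round(math.exp(b_chemo), 3))  # 0.224
print(round(math.exp(b_time), 3))   # 0.976
print(round(relapse_probability(chemo=1, time=10), 3))
```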
Non-Parametric Tests
Non-Parametric tests are used to test hypotheses regarding the population parameters of non-normal data with small sample sizes (less than 30). These tests are sometimes also referred to as "Distribution-Free tests".
Binomial Test
Binomial tests are used to test the hypothesis regarding the population proportion. It runs on a categorical variable having two levels only.
Data Source: C:\SPSSEVAL\Carpet
Hypothesis:
H0: P = 0.5
HA: P ≠ 0.5
Variables:
Here we are interested in analyzing a categorical variable, i.e. Housekeeping Seal. In our case a superstore owner claims that 50% of his customers got a housekeeping seal on the purchase of a product.
SPSS Need:
SPSS needs one categorical variable (two levels only).
Method:
First the data are entered in the data editor and the variable is labeled Housekeeping Seal. Click on Analyze, which will produce a drop-down menu; choose Non-Parametric Tests from it and click on Binomial. A dialogue box appears, with all the input variables listed on its left-hand side. To perform the Binomial test, transfer the test variable into the box labeled Test Variable List. In our case Housekeeping Seal is the test variable, so it should be transferred to the box labeled Test Variable List by clicking on the arrow between the two boxes. Now give the test value in the box below labeled Test Proportion; in our case the test value is 0.50.
After making appropriate selections, click on OK to run the analysis.
Pictorial Representation
Analyze → Non-Parametric Tests → Binomial → Drag test variable → Give test proportion → OK
OUTPUT
NPar Tests
Binomial Test (Good Housekeeping seal)

           Category    N    Observed Prop.   Test Prop.   Exact Sig. (2-tailed)
Group 1    Yes          8        .36             .50             .286
Group 2    No          14        .64
Total                  22       1.00
The above table gives the test results for the Binomial Non-parametric test.
The first column, labeled Category, gives the two categories (Yes or No) of the test variable, i.e. Good Housekeeping seal.
The second column, labeled N, gives the total number of cases analyzed and the number of cases falling in each category of the test variable. In this case we selected a sample of 22 persons, of which 8 said Yes, they got the housekeeping seal, and the remaining 14 said No.
The third column, labeled Observed Prop., gives the proportion of persons saying Yes or No: 36% of individuals said Yes, they got the housekeeping seal, while 64% said No.
The last column gives the p-value for the 2-tailed test, 0.286, which is greater than 0.05, so we fail to reject the null hypothesis and conclude that the claim of the superstore owner is correct: the proportion is 0.50.
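The exact two-tailed significance reported by SPSS can be reproduced by hand from the binomial distribution. A short Python sketch (the function name is our own; it assumes the doubled-smaller-tail convention for the two-sided p-value):

```python
from math import comb

def binomial_two_tailed_p(successes: int, n: int, p0: float = 0.5) -> float:
    """Exact two-tailed binomial test p-value for H0: P = p0.

    Doubles the smaller of the two tail probabilities, capped at 1,
    which for p0 = 0.5 matches SPSS's "Exact Sig. (2-tailed)".
    """
    lower = sum(comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(successes + 1))
    upper = sum(comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(successes, n + 1))
    return min(1.0, 2.0 * min(lower, upper))

# 8 "Yes" answers out of 22 customers, testing P = 0.5:
print(round(binomial_two_tailed_p(8, 22), 3))  # 0.286
```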
Runs Test
The Runs test is used to test the randomness of the data. This test is best run if the test variable is numerical. The word RUNS refers to sequences of values on the same side of the cut point; a new run begins each time the sign changes.
Data Source:
C:\SPSSEVAL\Carpet
Hypothesis:
H0: Data is random
HA: Data is not random
Variables:
Here we are interested in analyzing a numerical variable, i.e. Preference.
SPSS Need:
SPSS needs a numerical variable with a small sample size.
Method:
First the data are entered in the data editor and the variable is labeled Preference. Click on Analyze, which will produce a drop-down menu; choose Non-Parametric Tests from it and click on Runs. A dialogue box appears, with all the input variables listed on its left-hand side. To perform the Runs test, transfer the test variable into the box labeled Test Variable List. In our case Preference is the test variable, so it should be transferred to the box labeled Test Variable List by clicking on the arrow between the two boxes. Now, in our case the test variable, Preference, is numerical, so in the section labeled Cut Point we tick the box Median; but if the test variable is categorical, it is appropriate to use its Mean as the cut point by ticking the corresponding box.
After making appropriate selections, click on OK to run the analysis.
Pictorial Representation
Analyze → Non-Parametric Tests → Runs → Drag test variable → Tick box (Median) → OK
OUTPUT
Runs Test

                          Preference
Test Value(a)                  11.50
Cases < Test Value                11
Cases >= Test Value               11
Total Cases                       22
Number of Runs                    13
Z                               .218
Asymp. Sig. (2-tailed)          .827

a. Median
The above table gives the test results for the Runs test. The first row, labeled Test Value, gives the median of the data, 11.50. In this case, 11 of the 22 observations are less than the median, or in other words those values carry a negative sign, while the remaining 11 values carry a positive sign. The row labeled Number of Runs gives a value of 13; this means the sign changes 12 times, producing 13 runs in the data. The last row gives the p-value for the Runs test, 0.827 > 0.05, so we fail to reject the null hypothesis and conclude that the data are random.
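The Z of 0.218 and p-value of 0.827 above can be reproduced from the counts in the table. A Python sketch of the large-sample runs-test statistic, assuming the 0.5 continuity correction that SPSS applies for small samples (the function name is our own):

```python
import math

def runs_test_z(n1: int, n2: int, runs: int) -> tuple[float, float]:
    """Z statistic and two-tailed p-value for the runs test.

    n1, n2 : counts of cases below / at-or-above the cut point (the median here).
    Applies the 0.5 continuity correction before standardizing.
    """
    n = n1 + n2
    mean_runs = 2.0 * n1 * n2 / n + 1.0                                  # expected runs
    var_runs = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1))   # variance
    z = (abs(runs - mean_runs) - 0.5) / math.sqrt(var_runs)
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))         # 2 * (1 - Phi(z))
    return z, p

# 11 cases below and 11 at/above the median, 13 runs observed:
z, p = runs_test_z(11, 11, 13)
print(round(z, 3), round(p, 3))  # 0.218 0.827
```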
Representation of Runs:
One Sample K-S Test
The one-sample K-S test is used to test the goodness of fit of a specific distribution to the given data. Its test statistic is called the "Kolmogorov-Smirnov Z", and the test is commonly known as the "non-parametric chi-square".
Data Source:
C:\SPSSEVAL\Carpet
Hypothesis:
H0: Fit is Good (Data follows the fitted distribution)
HA: Fit is not Good (Data does not follow the fitted distribution)
Variables:
Here we are interested in analyzing a numerical variable, i.e. Price.
SPSS Need:
SPSS needs a numerical variable with a small sample size.
Method:
First the data are entered in the data editor and the variable is labeled Price. Click on Analyze, which will produce a drop-down menu; choose Non-Parametric Tests from it and click on 1-Sample K-S. A dialogue box appears, with all the input variables listed on its left-hand side. To perform the K-S test, transfer the test variable into the box labeled Test Variable List. In our case Price is the test variable, so it should be transferred to the box labeled Test Variable List by clicking on the arrow between the two boxes. Now tick a box in the section labeled Test Distribution at the bottom of the dialogue box. In our case the fitted distribution is Poisson, so we tick the corresponding box labeled Poisson.
After making appropriate selections, click on OK to run the analysis.
Pictorial Representation
Analyze → Non-Parametric Tests → 1-Sample K-S → Drag test variable → Tick box (Poisson) → OK
OUTPUT
One-Sample Kolmogorov-Smirnov Test

                                         Price
N                                           22
Poisson Parameter(a,b)   Mean           2.0000
Most Extreme             Absolute         .143
Differences              Positive         .143
                         Negative        -.135
Kolmogorov-Smirnov Z                      .670
Asymp. Sig. (2-tailed)                    .760

a. Test distribution is Poisson.
b. Calculated from data.
The above table gives the test results for the one-sample K-S test. We have taken 22 observations for the analysis. The mean of the Poisson distribution calculated from the data is 2.
The row labeled Absolute gives the largest absolute difference between the observed (empirical) cumulative distribution and the fitted Poisson cumulative distribution; it is 0.143.
The row labeled Positive gives the largest positive difference, i.e. the maximum amount by which the observed cumulative distribution exceeds the fitted one, and it is 0.143.
The row labeled Negative gives the corresponding largest negative difference, i.e. the maximum amount by which the observed cumulative distribution falls below the fitted one; the resulting value is -0.135.
The Kolmogorov-Smirnov Z value is 0.67, calculated from the largest absolute difference and the sample size as Z = √N × D = √22 × 0.143 ≈ 0.67.
The last row gives the p-value for the analysis, 0.76, which is greater than 0.05. So we fail to reject the null hypothesis and conclude that the fit is good: the data follow the Poisson distribution.
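Both the Z value and the asymptotic p-value above can be recovered from the table's numbers. A Python sketch, assuming the usual Z = √N × D statistic and the asymptotic Kolmogorov series for the two-sided p-value (small rounding differences arise because SPSS uses the unrounded D internally):

```python
import math

def ks_z_and_p(d_max: float, n: int) -> tuple[float, float]:
    """Kolmogorov-Smirnov Z and its asymptotic two-sided p-value.

    Z = sqrt(n) * D, where D is the largest absolute difference between
    the empirical and fitted cumulative distributions; the p-value is
    the Kolmogorov series 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 Z^2).
    """
    z = math.sqrt(n) * d_max
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * z * z)
                  for k in range(1, 101))  # series converges very fast
    return z, min(1.0, max(0.0, p))

# Largest absolute difference 0.143 with n = 22 (values from the table above):
z, p = ks_z_and_p(0.143, 22)
print(round(z, 2), round(p, 2))  # 0.67 0.76
```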