workshop on spss: basic to intermediate level

SPSS: Basic to Intermediate

Hiram Ting & Ernest Cyril de Run

16-17 May 2015, Kuching

Organized by Sarawak Research Society

Acknowledgement

Gratitude to Prof Ernest Cyril de Run and

Prof Thurasamy Ramayah for providing useful information

during the preparation of the workshop slides.

Content

Installation of SPSS

Introduction to SPSS

Understanding of Analysis

Preliminary Decision

Data Entry

Data Cleaning

Frequency

Cross-tabulation

Normality Test

Reliability Test

Validity Test

Handling Qualitative Data

Test of Independence

Test for Goodness of Fit

Test of Difference

• T-test

• ANOVA

Test of Relationship

• Pearson Correlation

• Linear Regressions

• Multiple Regressions

Factor Analysis

Presentation of Findings

Syntax

Preparation

• Install SPSS.

• Download workshop materials folder.

• Open SPSS Workshop 16-17 May 2015 file in the folder.

• Open SPSS to check whether it works as a full version.

Preparation

Hands-on Exercise

• Install SPSS (set-up)

• Click ‘OK’ for every step.

• Enter your license code when prompted to activate the full version.


What is SPSS?

• Statistical Package for the Social Sciences (SPSS) is a widely used

program for statistical analysis in social sciences. It is used by

market researchers, health researchers, survey companies,

government, education researchers, marketing organizations, data

miners and others. It is regarded as a first-generation technique.


What is SPSS?

Statistics included in the software:

• Descriptive statistics: Cross tabulation, Frequencies, Descriptives,

Explore, Descriptive Ratio Statistics.

• Bivariate statistics: Means, t-test, ANOVA, Correlation.

• Prediction for numerical outcomes: Linear regression.

• Prediction for identifying groups: Factor analysis, cluster

analysis, Discriminant analysis.

• Non-parametric tests and others.

When is SPSS Useful

SPSS is useful for:

• Data entry

• Data cleaning

• Descriptive analysis and output

• Parametric and non-parametric test – tests of relationship and

difference

• Data division based on factors and groups

• Quantitative research with observed variables

• Qualitative research with coded themes


Before using SPSS, it is important to understand some of the

fundamental things in research and data analysis techniques.

• Types of data

• Levels of measurement

• Types of variable

• Key terms in research

• Types of analysis

• Missing values


Types of data

• Numeric

• String (Categorical)

Levels of measurement

• Nominal

• Ordinal

• Interval

• Ratio

• Continuous


Types of variable

• Independent

• Dependent

• Moderating

• Mediating

• Control

• Endogenous, exogenous



Key terms in research

• Theory

• Concept

• Construct

• Variable, item/indicator

• Model, framework

• Operational definition


• A theory is a set of systematically interrelated concepts, definitions, and

propositions that are advanced to explain and predict phenomena

(facts).

• A model is defined as a representation of a system that is

constructed to study some aspect of the system or the system as a

whole.

• A theory’s role is explanation, whereas a model’s role is

representation.

• While the theoretical framework is the theory on which the study is

based, the conceptual framework is the operationalization of that theory.

It is the researcher’s own position on the problem and gives

direction to the study.


• A concept is a generally accepted collection of meanings or

characteristics associated with certain events, objects, conditions,

situations and behavior.

• A construct is an image or abstract idea specifically invented for a

given research and/or theory building purpose.

• A variable can be defined as any aspect of a theory that can vary or

change as part of the interaction within the theory.

• An operational definition is a definition stated in terms of specific

criteria for testing or measurement. Their characteristics and how

they are to be observed must be specified.


Types of Analysis

• Parametric

Normal distribution is assumed

• Non-parametric

Distribution free

• Types of variables involved

Univariate, bivariate, multivariate


Handling blank responses/missing values

• Initial screening

If the whole page is missing, discard the questionnaire.

If the whole section is missing, discard the questionnaire.

If important responses are missing (e.g. key questions using single

item), discard the questionnaire.

If straight-lining or answering pattern is found, discard the

questionnaire.


Handling blank responses/missing values

• Data cleaning

If > 25% of the responses are missing, remove the observation.

Hair et al. (2014) advocate a stricter cut-off of 15%.

Otherwise, replace the blank responses with a value, using one of:

the midpoint of the scale;

the mean of those responding (the respondents);

Expectation Maximization (EM).
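The cleaning rules above can be sketched in a few lines of Python. This is only an illustration, not the SPSS procedure: it assumes blank responses are stored as None, uses the 25% cut-off from the slide, and fills the remaining blanks with the item mean of those responding. The data and function names are hypothetical.

```python
from statistics import mean

MISSING = None  # assumption: a blank response is stored as None


def missing_fraction(row):
    """Share of blank responses in one questionnaire."""
    return sum(value is MISSING for value in row) / len(row)


def clean(rows, max_missing=0.25):
    """Drop rows with more than max_missing blanks, then mean-impute the rest."""
    kept = [row for row in rows if missing_fraction(row) <= max_missing]
    if not kept:
        return []
    # Item means are computed from the respondents who answered that item.
    # (An item that nobody answered would need separate handling.)
    item_means = []
    for j in range(len(kept[0])):
        observed = [row[j] for row in kept if row[j] is not MISSING]
        item_means.append(mean(observed))
    return [
        [item_means[j] if value is MISSING else value for j, value in enumerate(row)]
        for row in kept
    ]


# Hypothetical responses on a 5-point scale, four items per respondent.
responses = [
    [4, 5, 3, 4],
    [2, None, 3, 3],
    [None, None, 5, 4],  # 50% missing, so this respondent is removed
    [5, 4, None, 5],
]
cleaned = clean(responses)
print(cleaned)
```

Changing max_missing to 0.15 applies the stricter Hair et al. (2014) guideline instead.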


Instrument design

• Levels of measurement

• Types of scale

• Single or multiple items

• Positive or negative worded statements

• Structured, semi-structured or unstructured

• Wordings (e.g. double negatives, double-barrelled, culture-specific

terms, long complex questions)


Distribution and collection of data

Sampling technique

Paper questionnaire, mail or online

Interview or self-administered questionnaire

Response rate: distributed, collected and usable copies


In any report, the first thing normally reported is the response

rate. A low response rate raises questions about the

representativeness of the sample. A related concern is non-response

bias: would the responses of those who have not responded be

different from those who responded?


Data analysis and interpretation

Confidence level

Significance level

One-tailed or two-tailed

Types of analytical method

Hypothesis development and testing


Addressing Errors

Random sampling errors

Systematic errors/non-sampling errors

• Administrative errors: sample selection, administrator, data

processing

• Respondent errors: non-response, response bias - deliberate

falsification, unconscious misinterpretation

Common method variance and social desirability


Pre-test

The purpose is to ensure the instrument is well designed, so that the

statements/questions are understood and responded to in the

manner for which they were developed.

Using pilot study.

Using debriefing or protocol method.

Issue with sample size.

Data Entry

An Overview

• SPSS Data Editor

• SPSS Viewer (Output)

• Variable View

Includes Name, Type, Width, Decimals, Label, Values, Missing,

Columns, Align, Measure

• Data View

Data Entry

Rules for naming of variables

• Variable names:

• Must be unique (i.e. each variable must have a different name)

• Must begin with a letter (not a number)

• Cannot include full stops, spaces or symbols (! , ? * “)

• Cannot include words used as commands by SPSS (all, ne, eq, to, le,

lt, by, or, gt, and, not, ge, with)

• Cannot exceed 64 characters.
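The naming rules above can be checked mechanically before typing names into Variable View. The sketch below is a hypothetical helper, not part of SPSS: it encodes only the rules listed on this slide, and it assumes that names are compared without regard to upper or lower case.

```python
# Reserved words as listed on the slide.
RESERVED = {"all", "ne", "eq", "to", "le", "lt", "by", "or", "gt", "and", "not", "ge", "with"}

# Characters the slide rules out: full stops, spaces and the listed symbols.
FORBIDDEN = set(" .!,?*\"“”")


def check_variable_name(name, existing=()):
    """Return a list of rule violations; an empty list means the name passes."""
    problems = []
    if name.lower() in {other.lower() for other in existing}:
        problems.append("not unique")
    if not name or not name[0].isalpha():
        problems.append("must begin with a letter")
    if any(ch in FORBIDDEN or ch.isspace() for ch in name):
        problems.append("contains a full stop, space or symbol")
    if name.lower() in RESERVED:
        problems.append("is a reserved word")
    if len(name) > 64:
        problems.append("exceeds 64 characters")
    return problems


for candidate in ["GEN", "1age", "by", "my var"]:
    print(candidate, check_variable_name(candidate, existing=["AGE"]))
```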

Data Entry

Hands-on Exercise

• Open SPSS.

• Open Questionnaire Sample.

• Begin with ‘Variable View’, fill up the first row with information

provided in Data Entry Exercise.

• Continue with the second and third rows.

• Continue with the fourth to sixth rows.

• Move to ‘Data View’, fill up the blanks with responses of five

respondents.

Data Cleaning

Hands-on Exercise

• Go to ‘Analyze’, click ‘Descriptive Statistics’ and ‘Frequencies’.

• Move every variable from left column to right column, click ‘OK’.

• Read the output and check.

• Addressing missing values using EM.

Useful Features

Hands-on Exercise

• Sort the data file

Go to ‘Data’, click ‘Sort Cases’, choose ‘Ascending’ or ‘Descending’

• Split the data file

Go to ‘Data’, click ‘Split File’ and ‘Compare groups’

• Select cases

Go to ‘Data’, click ‘Select Cases’, ‘If Condition is Satisfied’ and ‘If’. For

example, GEN = 1 to select only male respondents

Data Transformation

Reason for transformation

to improve interpretation and compatibility with other data sets

to enhance symmetry and stabilize spread

improve linear relationship between the variables (Standardized

score)

Data Transformation

Hands-on Exercise

• Recode

The purpose is to

redefine categories of

data.

Go to ‘Transform’, click

‘Recode into Different

Variables’.

Data Transformation

Hands-on Exercise

• Compute

The purpose is to

create a new variable.

Go to ‘Transform’, click

‘Compute Variable’.

Descriptive Analysis

• The purpose is to describe the distribution of the variable of interest.

• It includes Frequencies and Cross-tabulation for nominal or

categorical data, and Descriptives (Mean and Standard Deviation) for

continuous data.

Frequencies

• The purpose is to provide frequency counts. It is useful in presenting

respondents profile and categorical findings.

Hands-on Exercise

• Open Data Analysis Exercise

• Go to ‘Analyze’, click ‘Descriptive Statistics’ and ‘Frequencies’.

• Splitting the dataset is useful when presenting findings for each

category separately. Go to ‘Data’, click ‘Split File’.

Cross-tabulation

• The purpose is a joint frequency distribution of cases based on two or

more categorical variables.

• Chi-square will be explained in later slides.

Hands-on Exercise

• Go to ‘Analyze’, click ‘Descriptive Statistics’ and ‘Crosstabs’. Select

the variables for ‘Row’ and ‘Column’. In ‘Cells’, tick the ‘Percentages’ required.
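To see what Frequencies and Crosstabs actually count, the same tallies can be sketched outside SPSS. The data below are hypothetical; the coding (GEN: 1 = male, 2 = female; LEVEL: 1 = low, 2 = high) follows the coding used elsewhere in these slides.

```python
from collections import Counter

# Hypothetical coded responses for eight respondents.
gen = [1, 1, 2, 2, 2, 1, 2, 1]
level = [2, 1, 2, 2, 1, 1, 2, 2]


def frequencies(values):
    """Frequency count and percentage for each category."""
    counts = Counter(values)
    n = len(values)
    return {category: (count, 100.0 * count / n) for category, count in sorted(counts.items())}


def crosstab(rows, cols):
    """Joint frequency table: outer keys are row categories, inner keys column categories."""
    cells = Counter(zip(rows, cols))
    row_keys = sorted(set(rows))
    col_keys = sorted(set(cols))
    return {r: {c: cells[(r, c)] for c in col_keys} for r in row_keys}


print(frequencies(gen))
print(crosstab(gen, level))
```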

Descriptives

• The purpose is to provide statistical summary of descriptive findings.

• ‘Kurtosis’ and ‘Skewness’ are useful to assess data distribution.

Hands-on Exercise

• Go to ‘Analyze’, click ‘Descriptive Statistics’ and ‘Descriptives’.

• Click ‘Options’, check ‘Mean’ and ‘Std. Deviation’.

Normality Test

• Parametric tests assume the data are normally distributed.

• Assessing normality using Q-Q Plots.

• Hands-on Exercise: Go to ‘Analyze’, click ‘Descriptive Statistics’ and ‘Q-Q Plots’.

• Assessing normality using Explore.

• Hands-on Exercise: Go to ‘Analyze’, click ‘Descriptive Statistics’ and ‘Explore’.

• Assessing outliers using Scatterplot.

• Hands-on Exercise: Go to ‘Graphs’, and click ‘Legacy Dialogs’ and

‘Scatter/Dot’.

Normality Test

• Skewness value provides an indication of the symmetry of the

distribution. Kurtosis, on the other hand, provides information about

the ‘peakedness’ of the distribution.

• If the distribution is perfectly normal, you would obtain a skewness

and kurtosis value of 0 (rather an uncommon occurrence in the social

sciences).

• With reasonably large samples, skewness will not ‘make a

substantive difference in the analysis’ (Tabachnick & Fidell 2007, p.

80). Kurtosis can result in an underestimate of the variance, but this

risk is also reduced with a large sample (200+ cases: see Tabachnick

& Fidell 2007, p. 80).

Normality Test

General guideline

• From 5% Trimmed Mean, compare the original mean and the new

trimmed mean to assess whether extreme scores are having a strong

influence on the mean.

• Skewness and kurtosis values between -2 and +2 are

considered acceptable evidence of a normal univariate distribution

(George & Mallery, 2010).

• For samples of more than 100, use the Kolmogorov-Smirnov test; for samples

of fewer than 100, use the Shapiro-Wilk test. A non-significant result (Sig.

value of more than .05) indicates normality.

• Shape of histogram, Q-Q plots and boxplot.

• Outliers appear as little circles with a number attached. Outliers are

cases with scores that are quite different from the remainder of the

sample, either much higher or much lower.
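The skewness and kurtosis guideline can be illustrated with a short calculation. The sketch below uses the bias-adjusted sample formulas commonly implemented in statistical packages (kurtosis is reported as excess kurtosis, so a normal distribution gives 0). The scores are hypothetical, and the function names are not SPSS commands.

```python
from statistics import mean, stdev


def skewness(values):
    """Bias-adjusted sample skewness (0 for a symmetric distribution)."""
    n = len(values)
    m, s = mean(values), stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)


def excess_kurtosis(values):
    """Bias-adjusted sample excess kurtosis (0 for a normal distribution)."""
    n = len(values)
    m, s = mean(values), stdev(values)
    fourth = sum(((x - m) / s) ** 4 for x in values)
    return (n * (n + 1)) / ((n - 1) * (n - 2) * (n - 3)) * fourth - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))


def within_guideline(values, limit=2.0):
    """Apply the -2 to +2 guideline to both statistics."""
    return abs(skewness(values)) <= limit and abs(excess_kurtosis(values)) <= limit


scores = [1, 2, 3, 4, 5]  # hypothetical item scores
print(skewness(scores), excess_kurtosis(scores), within_guideline(scores))
```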

Goodness of Measures

Reliability and Validity


Reliability

• Typically, in any research we use a number of questions (sometimes

referred to as items) to measure a particular variable.

• Cronbach's Alpha is a measure of how well each individual item in a

scale correlates with the sum of the remaining items. It measures

consistency among individual items in a scale.

• Reliability refers to the degree of consistency, as Kerlinger (1986)

puts it; if a scale possesses a high reliability the scale is

homogeneous. According to Nunnally (1978) alpha values equal to or

greater than 0.70 are considered to be a sufficient condition. Thus, it

can be concluded that these measures possess sufficient reliability.


• Go to ‘Analyze’, click ‘Scale’ and ‘Reliability Analysis’.
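For readers who want to see what the Reliability Analysis output is summarizing, Cronbach's alpha can be computed from the item variances and the variance of the summed scale: alpha = k/(k-1) × (1 − sum of item variances / variance of total score). The sketch below assumes complete data (no missing values); the item scores are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: one list of scores per item, all in the same respondent order."""
    k = len(items)
    sum_item_variances = sum(variance(item) for item in items)
    totals = [sum(scores) for scores in zip(*items)]  # summed scale score per respondent
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))


# Hypothetical 3-item scale answered by five respondents.
item1 = [4, 5, 3, 4, 2]
item2 = [4, 4, 3, 5, 2]
item3 = [5, 5, 2, 4, 3]
alpha = cronbach_alpha([item1, item2, item3])
print(round(alpha, 3), "sufficient" if alpha >= 0.70 else "below 0.70")
```

When every item gives identical scores, the function returns exactly 1, the upper bound of the coefficient.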


Several types of validity

• Content/Face validity

• Convergent validity

• Discriminant validity

• Criterion-related validity


Content validity

• Content validity refers to the extent to which an instrument covers the

meanings included in the concept (Babbie, 1992). Researchers,

rather than by statistical testing, subjectively judge content validity

(Chow and Lui, 2001). The content validity of the proposed instrument

is at least sufficient because the instrument is carefully refined from a

proven instrument with an exhaustive literature review process (Chow

and Lui, 2001). This can also be tested during the pre-test by using

subjects who are qualified (academicians and practitioners) to rate

whether the content of each factor was well represented by the

measurement items (Saraph et al., 1989). As Nunnally (1967) put it

content validity depends on how well the researchers created

measurement items to cover the domain of the variable being

measured.


Convergent validity

• According to Campbell and Fiske (1959) convergent validity refers to

all items measuring a construct actually loading on a single construct.

• The criteria used by Igbaria et al., 1995 to identify and interpret

factors were: each item should load 0.50 or greater on one factor and

0.35 or lower on the other factor.

• These results confirm that each of these constructs is unidimensional

and factorially distinct and that all items used to measure a particular

construct loaded on a single factor.


Discriminant validity

• Discriminant validity refers to the extent to which measures of 2

different constructs are relatively distinct, that is, the absolute value

of their correlation is neither 0 nor 1 (Campbell and Fiske,

1959). Correlation analysis is used. If the factors are not perfectly

correlated, with correlation coefficients ranging between 0 and 1,

we can conclude that discriminant validity has been established.


Criterion-related validity

• Criterion-related validity refers to the extent to which the factors

measured are related to pre-specified criteria (Saraph et al., 1989).

This is also called nomological validity or external validity. We can

assess it by running a multiple regression analysis and looking at

the Multiple R value (correlation coefficient); the values we are looking

for are any values higher than 0.5.


• If data is collected using qualitative methods, such as interview and

focus group, coding process is required to quantify the themes before

using SPSS to perform any analysis.


Example: Beliefs about the use of Instagram

1. I enjoy using Instagram coz it is fun.

2. Taking and uploading pictures are what attracts me.

3. When I am bored, I play Instagram.

4. I am enthralled by its ease of use.

5. I find hashtag a useful function of Instagram.

6. It is easy to use, even my little brother is using it.

How many theme(s) can you identify?


• Chi-square test is used when you wish to explore the relationship

between 2 categorical variables with each having 2 or more

categories.

• It is also a statistical method assessing the goodness of fit between a

set of observed values and those expected hypothetically. It is used

when the parameter to be tested is proportion and there is no

assumption of normality.

• If the level of significance is set at 0.05, then p-value of less than 0.05

means rejection of null hypothesis.


Chi square test for independence

• Example: Is the proportion of male employees with high intention to

share information the same as the proportion of female with high

intention to share information?

For Gender, we have (1 = Male / 2 = Female), whereas for Level, we

have (1 = Low / 2 = High).

As such, we will have a (2 × 2) contingency table.


Chi square test for goodness of fit

• Example: A researcher would like to test the association between

cigarette smoking and lung cancer. After randomly selecting smokers,

it is found that 25 out of 65 heavy smokers are at high risk of

developing lung cancer while for light smokers the figure is 20 out of

124.

• Ho: There is no association

Ha: There is association

• Findings: χ² = 11.66; p-value = 0.0016

• Decision: Reject null hypothesis

• Conclusion: Smoking and lung cancer risk are associated
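The mechanics of the test in the smoking example can be sketched as follows. This is a minimal illustration of the uncorrected Pearson chi-square for a 2 × 2 table, with the p-value for 1 degree of freedom obtained from the complementary error function. The value it returns may differ somewhat from the figures reported on the slide, which does not state which variant or correction was used.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Heavy smokers: 25 of 65 at high risk; light smokers: 20 of 124 at high risk.
chi2, p = chi_square_2x2(25, 65 - 25, 20, 124 - 20)
print(chi2, p, "reject H0" if p < 0.05 else "do not reject H0")
```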


Hands-on Exercise

Test for Independence/Relatedness

• Go to ‘Analyze’, click ‘Descriptive Statistics’ and ‘Crosstabs’. In ‘Statistics’, tick ‘Chi-square’.

Test for Goodness of Fit

• Go to ‘Analyze’, click ‘Nonparametric Tests’, ‘Legacy Dialogs’ and ‘Chi-square’.

Test of Difference

• Parametric Techniques: t-test; paired t-test, one-way ANOVA, two-

way ANOVA

• Non-Parametric Techniques: Mann-Whitney/Wilcoxon rank sum test;

Wilcoxon signed rank sum test, Kruskal Wallis; Friedman test

Test of Difference

Independent Sample T-test

• Comparing two populations/groups using Mean

Paired Samples T-test

• Comparing the Mean of two related populations/groups

One-way ANOVA

• Comparing the Mean of more than two populations/groups

Hands-on Exercise

• Go to ‘Analyze’, click ‘Compare Means’.

Test of Difference

How to interpret the findings.

For the independent-samples t-test, if Levene’s test is significant (Sig. value is

less than 0.05), the variances of the two samples are

significantly different, so read the ‘Equal variances not assumed’ row.

For the paired-samples t-test, the two variables are significantly correlated if the

Sig. value in the correlations table is less than 0.05.

If the t-test is significant (Sig. 2-tailed value is less than 0.05), this

indicates the two samples are significantly different in the variable

under investigation.
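The t value that SPSS prints in the ‘Equal variances assumed’ row comes from a pooled-variance formula, sketched below. The two groups are hypothetical. The function returns only t and the degrees of freedom; the Sig. value would still be read from the SPSS output or a t table.

```python
import math
from statistics import mean, variance


def independent_t(group1, group2):
    """Pooled-variance (equal variances assumed) t statistic and degrees of freedom."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    pooled_variance = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    t = (mean(group1) - mean(group2)) / standard_error
    return t, df


# Hypothetical intention scores for two groups of respondents.
males = [1, 2, 3, 4, 5]
females = [2, 3, 4, 5, 6]
t, df = independent_t(males, females)
print(t, df)
```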

Test of Difference

• Example: A one-way between-group ANOVA was conducted to test

whether intention to share differed by level of education.

Test of Difference

How to interpret the findings.

• There was a statistically significant difference at the p < 0.05 level in

intention scores for the 4 educational levels [F(3,188) = 2.728,

p = 0.045]. Despite reaching statistical significance, the actual

difference in mean scores between the groups was quite small. The

effect size, calculated using eta squared, was 0.04. Post-hoc

comparison using Duncan’s range test indicated that the mean

scores for Masters (M = 3.51, SD = 0.92) and First degree (M = 3.82,

SD = 0.62) were statistically different from PhD (M = 4.40, SD = 0.46).

Those with Diploma education (M = 3.89, SD = 0.47) did not differ

statistically from the PhD group.
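The effect size quoted above can be recovered from the reported F value and its degrees of freedom, since eta squared = SS between / SS total = (F × df between) / (F × df between + df within). A small sketch, with a hypothetical function name:

```python
def eta_squared_from_f(f_value, df_between, df_within):
    """Eta squared for a one-way ANOVA, derived from F and its degrees of freedom."""
    return (f_value * df_between) / (f_value * df_between + df_within)


# Reported result: F(3, 188) = 2.728
eta_sq = eta_squared_from_f(2.728, 3, 188)
print(round(eta_sq, 2))
```

The result rounds to the 0.04 reported in the write-up.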


Correlation

• Correlation is used to denote association between two quantitative

variables, assuming that the association is linear. It provides

information about the strength and direction of relationship.

• Strength: 0.10-0.29 (small), 0.30-0.49 (medium), 0.50-1.00 (large)

Hands-on Exercise

• Go to ‘Analyze’, click ‘Correlate’ and ‘Bivariate’.

• Select ‘Pearson’ (for continuous data) and ‘One-tailed’.

• If the Sig. value is less than 0.05, then the two variables are

significantly correlated. Only then the strength and direction of

relationship are looked at.
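As an illustration of what the Bivariate procedure computes, the sketch below calculates Pearson's r from first principles and labels its strength with the cut-offs given above. The data are hypothetical, and the "negligible" label for values below 0.10 is an assumption added here, not part of the slide.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def strength(r):
    """Label the size of r using the slide's cut-offs (the sign gives the direction)."""
    size = abs(r)
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "negligible"  # assumption: label for values below 0.10


intention = [1, 2, 3, 4, 5]   # hypothetical scores
sharing = [2, 4, 6, 8, 10]    # hypothetical scores
r = pearson_r(intention, sharing)
print(r, strength(r))
```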


How to present the findings.

• There was a strong positive correlation between intention to share

and actual sharing [r=0.76, n=192, p<0.01] with high levels of

intention associated with high levels of actual sharing.


Regressions

• Simple linear regression is used when we would like to see the impact

of a single independent variable on a dependent variable.

• Multiple linear regression is used when we would like to see the

impact of more than one independent variable on a dependent

variable.

• Multiple regression analysis is a statistical technique that can be used

to analyze the relationship between a single dependent variable

(continuous) and several independent variables (continuous or even

nominal). In the case of nominal independent variables, dummy

variables are introduced.


• In standard multiple regression, all of the independent variables are entered into the regression equation at the same time

• Multiple R and R² measure the strength of the relationship between the set of independent variables and the dependent variable. An F test is used to determine if the relationship can be generalized to the population represented by the sample.

• A t-test is used to evaluate the individual relationship between each independent variable and the dependent variable.


Things to consider:

• Strong Theory (conceptual or theoretical)

• Measurement Error

The degree to which the variable is an accurate and consistent

measure of the concept being studied. If the error is high, then even

the best predictors may not be able to achieve sufficient predictive

accuracy.

• Specification error

Inclusion of irrelevant variables or the omission of relevant variables

from the set of independent variables.


Assumptions

• Normality

One of the basic assumptions is the normality which can be assessed

by plotting the histogram. If the histogram shows not much deviation

then we can assume the data follows a normal distribution.


• Normality of the error terms

The second assumption is that the

error term must be normally

distributed. This can be assessed by

looking at the normal P-P plot. The

idea is that the points should be as

close as possible to the diagonal

line. If they are then we can assume

that the error terms are normally

distributed.


• Linearity

The third assumption is the relationship between the independent

variables and the dependent variable must be linear. This is

assessed by looking at the partial plots. The idea is to see if we can

draw a straight line on the scatter plot that is generated.


• Constant Variance – Homoscedasticity

The fourth assumption is that the variance must be constant

(homoscedasticity) as opposed to not constant (heteroscedasticity).

Heteroscedasticity is generally observed when we see a

consistent pattern when we plot the studentized residual (SRESID)

against the standardized predicted value of Y (ZPRED).


• Multicollinearity

The fifth assumption concerns collinearity. This is a problem

when the independent variables are highly correlated with one

another, generally at r > 0.8 to 0.9, which is termed

multicollinearity. To assess this assumption we look at two

indicators. The first is the VIF and tolerance. A low tolerance

value of < 0.1 will result in a VIF value of > 10, as VIF is actually

1/Tolerance. If the VIF is more than 10 we can suspect there is a

problem of multicollinearity.
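The link between tolerance and VIF is simple arithmetic. Tolerance for a predictor is 1 − R², where R² comes from regressing that predictor on the other independent variables, and VIF is its reciprocal. A sketch with a hypothetical R² value and function name:

```python
def tolerance_and_vif(r_squared):
    """r_squared: R² from regressing one predictor on all the other predictors."""
    tolerance = 1.0 - r_squared
    vif = 1.0 / tolerance
    return tolerance, vif


# Hypothetical case: 90% of a predictor's variance is explained by the other predictors.
tolerance, vif = tolerance_and_vif(0.90)
print(tolerance, vif)
```

An R² of 0.90 sits right at the cut-off described above: tolerance of about 0.1 and VIF of about 10.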


• Multicollinearity

The second value that we should look at is the condition index. If

this value exceeds 30 we can also suspect the presence of

multicollinearity. When the value is more than 30 we should also

look across the variance proportions and see if we can spot any 2 or

more variables with a value of 0.9 and above, excluding the

constant. Only if there are 2 or more can we conclude that there is

multicollinearity.


• Independence of the error term - Autocorrelation

This is an assumption that is particularly a problem with time series

data and not for cross sectional data. We assume that each

predicted value is independent, which means that the predicted

value is not related to any other prediction; that is, they are not

sequenced by any variable such as time. This can be assessed by

looking at the Durbin Watson value. If the D-W value is between 1.5

– 2.5 then we can assume there is no problem.
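The Durbin-Watson value is the sum of squared differences between successive residuals divided by the sum of squared residuals. The sketch below shows the calculation and applies the 1.5 to 2.5 rule from the slide; the residuals are hypothetical and the function names are not SPSS commands.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in their observed order."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def no_autocorrelation_problem(dw, low=1.5, high=2.5):
    """Apply the slide's guideline that values between 1.5 and 2.5 are acceptable."""
    return low <= dw <= high


# Hypothetical residuals that alternate in sign (negative autocorrelation).
dw = durbin_watson([1, -1, 1, -1])
print(dw, no_autocorrelation_problem(dw))
```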


• Outliers

These are values that are so extreme that they

can influence the results of the regression. Usually the threshold is

set at ± 3 standard deviations. Although this is the default, some

researchers may set a threshold of ± 2.5 to get better predictive

power. Outliers can be easily identified by checking

whether the output includes a Casewise Diagnostics table.


Hands-on Exercise

• Go to ‘Analyze’, click

‘Regression’ and

‘Linear’

Descriptive Statistics

                                          Mean   Std. Deviation   N
HOW OFTEN R ATTENDS RELIGIOUS SERVICES    3.15   2.653            113
STRENGTH OF AFFILIATION                   2.12   1.084            113
HOW OFTEN DOES R PRAY                     2.90   1.575            113

The minimum ratio of valid cases to independent variables for multiple regression is 5 to 1. With 113 valid cases and 2 independent variables, the ratio for this analysis is 56.5 to 1, which satisfies the minimum requirement. Different authors tend to give different guidelines concerning the number of cases required for multiple regression. Stevens (1996, p. 72) recommends that ‘for social science research, about 15 participants per predictor are needed for a reliable equation’.


ANOVA (b)

Model 1       Sum of Squares   df    Mean Square   F        Sig.
Regression    374.757          2     187.379       49.824   .000 (a)
Residual      413.685          110   3.761
Total         788.442          112

a. Predictors: (Constant), HOW OFTEN DOES R PRAY, STRENGTH OF AFFILIATION

b. Dependent Variable: HOW OFTEN R ATTENDS RELIGIOUS SERVICES

The probability of the F statistic (49.824) for the overall regression relationship is <0.001, less than or equal to the level of significance of 0.05. We reject the null hypothesis that there is no relationship between the set of independent variables and the dependent variable (R² = 0). We support the research hypothesis that there is a statistically significant relationship between the set of independent variables and the dependent variable.


Model Summary

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .689 (a)   .475       .466                1.939

a. Predictors: (Constant), HOW OFTEN DOES R PRAY, STRENGTH OF AFFILIATION

Look in the Model Summary box and check the value given under the heading R Square. This tells you how much of the variance in the dependent variable is explained by the model. The rule of thumb for the strength of the correlation (R): a correlation less than or equal to 0.20 is characterized as very weak; greater than 0.20 and less than or equal to 0.40 is weak; greater than 0.40 and less than or equal to 0.60 is moderate; greater than 0.60 and less than or equal to 0.80 is strong; and greater than 0.80 is very strong.

You will notice an Adjusted R Square value in the output. When a small sample is involved, the R square value in the sample tends to be a rather optimistic overestimation of the true value in the population (see Tabachnick & Fidell 2007). The Adjusted R square statistic ‘corrects’ this value to provide a better estimate of the true population value.
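The figures in the ANOVA and Model Summary output are tied together by simple arithmetic, which can be checked with the sums of squares and degrees of freedom from the output above. The function name below is hypothetical; the formulas are the standard ones (Mean Square = SS / df, F = MS regression / MS residual, R² = SS regression / SS total, Adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1)).

```python
def regression_summary(ss_regression, ss_residual, df_regression, df_residual):
    """Recompute F, R Square and Adjusted R Square from an ANOVA table."""
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    f_value = ms_regression / ms_residual
    r_square = ss_regression / (ss_regression + ss_residual)
    n = df_regression + df_residual + 1  # number of valid cases
    adjusted = 1 - (1 - r_square) * (n - 1) / (n - df_regression - 1)
    return {"F": f_value, "R Square": r_square, "Adjusted R Square": adjusted, "N": n}


# Values taken from the ANOVA table in the output above.
summary = regression_summary(374.757, 413.685, 2, 110)
print({key: round(value, 3) for key, value in summary.items()})
```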


Coefficients (a)

                           Unstandardized Coefficients   Standardized Coefficients
Model 1                    B         Std. Error          Beta                        t        Sig.
(Constant)                 7.167     .442                                            16.206   .000
STRENGTH OF AFFILIATION    -1.138    .194                -.465                       -5.857   .000
HOW OFTEN DOES R PRAY      -.554     .134                -.329                       -4.145   .000

a. Dependent Variable: HOW OFTEN R ATTENDS RELIGIOUS SERVICES

For the independent variable strength of affiliation, the probability of the t statistic (-5.857) for the b coefficient is <0.001 which is less than or equal to the level of significance of 0.05. We reject the null hypothesis that the slope associated with strength of affiliation is equal to zero (b = 0) and conclude that there is a statistically significant relationship between strength of affiliation and frequency of attendance at religious services.


The beta coefficient associated with strength of affiliation is negative, indicating an inverse relationship in which higher numeric values for strength of affiliation are associated with lower numeric values for frequency of attendance at religious services. To compare the different variables it is important that you look at the standardised coefficients, not the unstandardised ones. ‘Standardised’ means that these values for each of the different variables have been converted to the same scale so that you can compare them. If you were interested in constructing a regression equation, you would use the unstandardised coefficient values listed as B.
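As a sketch of how the regression equation is built from the B column, the function below plugs in the unstandardised coefficients from the Coefficients output (constant 7.167, strength of affiliation -1.138, how often R prays -0.554). The function name and the input scores are hypothetical.

```python
# Unstandardised coefficients (B) taken from the Coefficients output.
CONSTANT = 7.167
B_AFFILIATION = -1.138
B_PRAY = -0.554


def predict_attendance(affiliation, pray):
    """Predicted attendance score = constant + B1 * affiliation + B2 * pray."""
    return CONSTANT + B_AFFILIATION * affiliation + B_PRAY * pray


# Hypothetical respondent coded 1 on both predictors.
print(predict_attendance(1, 1))
```

Each one-unit increase in either predictor lowers the predicted score by the size of its B value, which is the inverse relationship described above.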

Factor Analysis

• The purpose is to define the underlying structure in a data matrix;

analyze the structure of interrelationships among a large number of

variables by defining a set of common underlying dimensions called

factors.

• Factor analysis in SPSS is exploratory in function. The analysis is

driven by data, rather than theory.

• Sample size: preferably >100 cases or the ratio of 20:1

(case/variable).

Factor Analysis

• Important decisions include:

Correlation matrix: KMO and Bartlett’s test, Anti-image

Methods of extracting factors: Principal component

Latent root/eigenvalues criterion (>1)

A priori criterion on number to be extracted

Percentage of variance explained (>50%)

Rotation: Varimax, Promax

Loading significance (> 0.3 if 350 cases, > 0.5 if 120)

Factor Analysis

Hands-on Exercise

• Go to ‘Analyze’, click ‘Dimension Reduction’ (‘Data Reduction’ in older versions) and ‘Factor’.

Using Syntax

• Syntax is the command language of SPSS.

• If you need to repeat your analysis, you can save the command

language in a ‘Syntax’ file so that you can run an analysis at a later

date or to repeat various analyses.

• Whenever you run an analysis, you will notice that there is a Paste

button. When you click on the paste button, a syntax file will open with

the syntax for the analysis that you intended to do.

Hands-on Exercise

• Go to ‘File’, click ‘New/Open’ and ‘Syntax’.

Thank You

Next workshop

22 May : Advanced PLS-SEM

23-24 May : Advanced SPSS, Process

Join us at Sarawak Research Society on

Hiram Ting, PhD, Email: [email protected]

Ernest Cyril de Run, PhD, Email: [email protected]
