A GUIDE TO PASW (SPSS) STATISTICS 18
Ken Deal
McMaster University
A Guide to PASW (SPSS) Statistics 18©
Copyright 1996, 1997, 1999, 2000, 2002, 2005, 2007, 2009 by Ken Deal and marketPOWER research inc.
First published 1996. Revised Editions 1997, 1999, 2000, 2002, 2005, 2007 and 2009.
All rights reserved. No parts of this publication may be reproduced, stored in a retrieval system, or transmit-
ted, in any form or by any means, without prior permission in writing of Ken Deal.
PASW Statistics 18 (formerly SPSS) is a trademark of IBM.
ISBN-13:
ISBN-10:
1 2 3 4 5 6 7 8 9 10 CP 0 9 8 7
Printed and bound in Canada.
Editorial Director:
Publisher:
Senior Sponsoring Editor:
Marketing Manager:
Editorial Associate:
Production Coordinator:
Senior Supervising Editor:
Cover Design:
Cover Design Credit:
Printer:
Library and Archives Canada Cataloguing in Publication
Deal, Kenneth R., 1944-
A guide to PASW (SPSS) Statistics 18 / Ken Deal
ISBN-13:
ISBN-10:
1. PASW (SPSS) Statistics 18 for Windows (Computer file) 2. Social sciences — Statistical methods
— Computer programs. 3. Statistics — Computer programs. I. Title.
HA32.D42 2006
Dedicated with love to my wife, Barbara.
Contents
Preface
Beginning PASW Statistics 18
Analyzing a Real Data Set
Frequency Distributions and Charts
Cross-Tabulations
T-Tests for Differences Between Means
T-Tests for Differences Between Paired Answers
Analysis of Variance (ANOVA)
Linear Regression
Setting Up a Data Set
Entering Data into PASW
Manipulating the Data File
Non-Parametric Analysis
Summary
Appendix: Questionnaire
Preface
One of the primary functions within marketing research is the
analysis of survey data. The mass of data is so great in all but the
very simplest of surveys that summarizing the data with anything
other than a major statistical package is tremendously time-
consuming and error-prone. PASW (SPSS) Statistics® is one of the
leading computer packages for analyzing survey data and is the
principle package for analyzing data from surveys conducted in
the social sciences.
PASW (SPSS) Statistics® Student Version is a somewhat scaled-down
version of PASW (SPSS) Statistics®. The main limitation is that a
maximum of 50 variables and 1,500 cases can be analyzed. In
addition, several of the more powerful statistical procedures
from PASW (SPSS) Statistics® are not included. However, there is
certainly enough power within PASW (SPSS) Statistics® Student
Version for a great deal of statistical calculation and data
manipulation.
The purpose of this introduction to PASW (SPSS) Statistics® is to
provide the beginning analyst with the knowledge and
experience to use the basic capabilities of SPSS. If you have the
Student Version, the manual that is included with the software
explains how to install the program and how to begin using SPSS.
Most people find PASW (SPSS) Statistics® to be a very easy
program to use. It is almost self-explanatory. Experience has
shown that the basics can be learned in three hours or less by
most people who have had a recent course in statistics.
The best way to use this tutorial is to simply follow the directions
in the book with your actions on the computer. If you don't get
exactly the same screens as the ones found in this tutorial,
simply back up until you find where your computer actions
diverged from the text. In many cases, it's critical that you read
and follow each word or symbol. If versions other than PASW
(SPSS) Statistics® Version 18.0 are used, screens might look
somewhat different from those presented in this manual.
However, most commands and output are so similar that moving
between versions takes minimal adjustment. In fact, this manual
can be used with any version from 7.0 to 18.0.
Beginning with version 18, the application is referred to as PASW
Statistics 18.
The main data set used in this tutorial was obtained from a
survey conducted for the Hamilton Central Public Library,
Hamilton, Ontario, by Marketing Decision Research Inc. The
actual survey was substantially longer than the one provided in
this tutorial; it was shortened to keep the example within the
constraints of the SPSS Student Version. A questionnaire that
was shortened from the original to coincide with the data set is
provided in the Appendix. The Hamilton Central Public Library is
gratefully acknowledged for releasing this data to the author for
publication purposes, both educational and research-related.
Beginning PASW (SPSS) Statistics®
At this point, we'll turn to the data set extracted from a survey
conducted for the Hamilton Central Public Library and work our
way through a basic analysis of that data. Before we discuss the
library data, let's open SPSS and set the backdrop for the rest of
this book.
A few assumptions are in order:
1. You are now sitting at a computer. There is no good
alternative to hands-on learning of computer applications.
2. Windows® 2000, XP, Vista or 7 is installed on your computer.
3. PASW (SPSS) Statistics 18® Student Version or PASW (SPSS)
Statistics 18® is installed on your computer (the screen captures
in this book will look very similar for any version of SPSS from
Version 7.0 through 18.0).
4. You have had a course in basic statistics or data analysis, or
have developed a good working understanding of statistical data
analysis.
5. You understand the basics of using Windows® 2000, XP, Vista
or 7 on a modern personal computer.
Your first action is to get the data files, either from the CD at
the back of this manual or from the book's website, and copy the
files "Library SPSS" and "Arnold" to a new folder on your
computer's hard drive named "Marketing Research." We're
assuming that folder is "C:\Marketing Research."
Find PASW (SPSS) Statistics® in the Windows Start menu and
click on the SPSS icon to open the program. You should see the
window in Exhibit 1 on your computer. If you are using the
regular version of SPSS rather than the Student Version, almost
all of the screens will be identical. Those that are not identical
will have very minor differences that will not impede your
learning.
Exhibit 1
This window provides several options for your next step. As you
can see from the top button on the right side of Exhibit 1, SPSS
does come with a built-in tutorial, which you might decide to
follow. You could also decide to type in a set of data. We'll do
this later, so pass over this option for now. We'll ignore the two
"query" options and focus our attention on the last option,
"Open an existing file."
If you had opened the Library SPSS.sav data file before now, it
would show in the window at the upper left of Exhibit 1.
Whatever data file you click on will be opened by SPSS, as long as
it is a proper SPSS file. If you have never accessed the Library
SPSS.sav data file with SPSS before now, double-click the "More
Files ..." line and work through your folders and files until you
find the Library SPSS.sav file in the Marketing Research folder on
your C drive. It will look similar to the following window.
Exhibit 2
Another path that you can follow for opening an SPSS data set is
shown in Exhibit 3. You just need to pull down the File menu,
then slide open the Open menu and click on Data. Once again,
navigate to the C:\Marketing Research folder and click on the
Library SPSS file.
Now just double-click on the Library SPSS file and the data sheet
in Exhibit 4 will open. The data sheet, or spreadsheet, can be
considered "home base," that place where you typically begin an
SPSS session and often where you exit from SPSS when you have
finished your work. The application window is the total window
frame within which reside the Menu Bar, the Tool Bar, the Data
Editor, the Output Window, and several other helpful features of
SPSS.
The Menu Bar is at the top of the window in Exhibit 4. It reads
horizontally as File, Edit, View, Data, Transform, Analyze,
Applications, Graphs, Utilities, Add-ons, Window, and Help. The
pull-down menu under each Menu Bar heading contains many
features, some of which open their own submenus, either
downward or to the side. These features will be used regularly
throughout your data analysis.
Exhibit 3
The Tool Bar is the horizontal row of icons directly below the
Menu Bar. These Tool Bar icons act as shortcuts so that you
don't have to pull down the menus for several of the most
frequently used tasks. Some of the Tool Bar icons are
self-explanatory; others you'll need to learn. Whenever you want
to learn the function of a Tool Bar icon, just hold the cursor over
it and a brief description appears below the icon.
Exhibit 3 shows the File menu, the first item on the Menu Bar,
being opened. The File menu does exactly what its name suggests:
it opens files, saves output and syntax files, and handles several
other helpful tasks. Clicking a Tool Bar icon lets you perform
some of these tasks more easily than pulling down the respective
menu and selecting the specific action.
Exhibit 4
Value Labels: You should see “Last week” in the first row below
“Time_last”, as in Exhibit 4. If 2 appears instead, pull down the
View menu and click on Value Labels.
Exhibit 5
Begin exploring the flexibility of SPSS by pulling down each of the
menus on the Menu Bar and viewing their commands. The
Analyze menu is shown in Exhibit 6. While SPSS is capable of an
incredibly wide range of functions, the Analyze menu lists the
essential procedures that make SPSS valuable. These are the
statistical data analysis procedures that you can use to analyze
data sets and understand the nature and attitudes of survey
respondents.
Exhibit 6
The cursor in Exhibit 6 points to Frequencies which, along with
Descriptives and Crosstabs, will be the most frequently used
methods in this book and, probably, in most of the data analysis
that you will conduct when learning how to use SPSS.
Analyzing a Real Data Set
The Hamilton Central Public Library (HCPL) data file for the
survey conducted on customers in the library is labeled Library
SPSS.sav. Exhibit 4 shows the data for the first 10 respondents
and for the first 7 variables, RID through Spend. Now, look at the
questionnaire for the Hamilton Central Public Library Community
Opinion Study in the appendix.
Actually, several surveys and focus groups were executed during
the course of the Library study. The data used in this book was
obtained from library users who completed a questionnaire in
the library. Copies of the questionnaire were distributed to
library users on all five public floors of the library over the course
of one week. The appendix questionnaire is an abbreviated
version of the actual survey used in the study.
Since PASW (SPSS) Statistics 18® Student Version is limited to 50
variables, this library data set has been pruned to contain only
46 of the many variables that were in the original survey. Each
column in the Data Editor contains the values of the variable
listed in the heading. For example, "RID" is the variable that
simply numbers each of the cases (respondents) in the data set.
By using the vertical scroll bar on the right side of the Data Editor
window, you can see that the respondent identification (RID)
variable lists 1, 2, 3 ... to the total number in the data set, 779.
The second variable (second column) is "Time_last", which is
actually the first question in the questionnaire (see Appendix).
Respondent 1 answered "Last week" (code 2) to the question
"When was the last time that you were in this Library before
today?" Person 2 indicated "Week before last" (code 3), while
respondents 3 and 4 both answered "Last week" (code 2 again).
Reading across the first row in Exhibit 5, person 1 said the
following: that she (slide the scroll bar at the bottom to the right
until you see the variable "Sex" which indicates the female
gender for this respondent) visited the library last week, that she
visited this library twice during the past month, that she got to
the library via a personal vehicle, that she was at home just
before going to the library, that the library was the one special
reason for the trip, and she indicated that she will have spent
$40.00 in the downtown core during the library trip. That's an
interesting story and it becomes even more fascinating to read
through the answers of several respondents and then combine
the answers of all 779 individuals who were kind enough to
complete the survey questionnaire.
The Data Editor can show either the numerical values that
correspond to the respondents' answers or the answers in words.
Your Data Editor probably shows words (value labels) under the
variable "Time_last." If so, pull down the View menu and notice
that the Value Labels option is checked (Exhibit 5). Drag your
cursor down to Value Labels and click to uncheck this option.
Notice how the value labels have been replaced by numbers:
instead of "Last week" for person #1, the number 2 shows under
"Time_last." We'll come back to the Value Labels feature later.
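Switching Value Labels on and off changes only how a stored code is displayed; the numbers in the data file stay the same. The short Python sketch below illustrates that idea. The `TIME_LAST_LABELS` dictionary and the `display_value` helper are hypothetical, and the dictionary holds only the codes and labels quoted in this chapter.

```python
# Hypothetical value labels for Time_last, limited to the codes and labels
# quoted in this chapter (labels for codes 5 to 7 are not given, so they are omitted).
TIME_LAST_LABELS = {
    1: "Earlier this week",
    2: "Last week",
    3: "Week before last",
    4: "Three to four weeks ago",
    8: "One year ago or longer",
    9: "Never before today",
}


def display_value(code, labels, show_labels=True):
    """Return what a Data Editor cell would show for a stored numeric code.

    With value labels switched on, a labelled code is shown as its label;
    with them switched off, or when a code has no label, the number is shown.
    The stored code itself never changes.
    """
    if show_labels and code in labels:
        return labels[code]
    return str(code)


# Respondent 1 answered code 2 to the Time_last question.
print(display_value(2, TIME_LAST_LABELS))                     # Last week
print(display_value(2, TIME_LAST_LABELS, show_labels=False))  # 2
```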
The small icons to the left of each variable name indicate the
information content of the variable. The three circles clustered
together indicate nominal variables. The vertical bar chart
signifies that the variable was specified as having ordinal
properties, and the yellow ruler means that the variable was
specified as having scale or metric properties.
Exhibit 8
Frequency Distributions and Charts
Now we're ready to direct SPSS to analyze the data, i.e.,
summarize the information so that it's easier to get an overall
story of the 779 respondents. Exhibit 7 shows that we have
pulled down the Analyze menu, pulled out the Descriptive
Statistics menu, and that "Frequencies ..." is now highlighted.
Exhibit 7
This Descriptive Statistics section contains several analysis tools
that will allow you to conduct a tremendous amount of the basic
analysis that is so important for understanding data. You'll be
able to get frequency distributions along with basic statistics and
graphs, descriptive statistics, and crosstabulations with related
statistics, and you'll be able to explore the information contained
in the data by using the "Explore..." feature. For now,
Frequencies will be enough.
In the Frequencies window (Exhibit 8), highlight the variable(s)
you want to analyze and move them to the currently blank
"Variable(s):" window by clicking on the arrowhead. (You can
also move the variables by double-clicking on each one.) The
window should now look like Exhibit 9. You can allow SPSS to
analyze the data using preset "default" options or you can
choose your own special features by clicking on the "Statistics,"
"Charts," and "Format" buttons.
The default settings on the "Time_last" variable produce the
frequency distribution shown in the Output window of Exhibit 10.
This is a very useful output and we'll soon see how we can
improve its appearance and obtain more information.
Exhibit 8
The Statistics table shows that 769 customers provided valid
answers to question 1 and that the answers for 10 people were
missing (i.e., they did not answer or their answers were
unintelligible).
The frequency distribution appears next in Exhibit 10. Check
back to the questionnaire to better understand the meaning of
values in the frequency distribution for "Time_last."
These are times when the respondent last used the Central
Public Library. "Earlier this week" was reported by 215
customers, "Last week" by 249, and so on.
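Percent and Valid Percent in a frequency table differ only in their base: Percent divides by every respondent, while Valid Percent divides by those who gave an answer. The following Python sketch rebuilds those two columns from raw codes; the function name, the use of `None` for a blank answer, and the eight-answer sample are all assumptions made for illustration.

```python
from collections import Counter


def frequency_table(codes, labels):
    """Rebuild the Frequency, Percent and Valid Percent columns of a frequency table.

    codes:  one answer code per respondent; None marks a blank (system-missing) answer
    labels: dict mapping each code to its value label
    Percent is based on all respondents; Valid Percent only on those who answered.
    """
    n_total = len(codes)
    counts = Counter(code for code in codes if code is not None)
    n_valid = sum(counts.values())
    rows = []
    for code in sorted(counts):
        freq = counts[code]
        rows.append((
            labels.get(code, str(code)),
            freq,
            round(100 * freq / n_total, 1),   # Percent
            round(100 * freq / n_valid, 1),   # Valid Percent
        ))
    return rows, n_valid, n_total - n_valid


# Hypothetical sample of eight answers, one of them blank.
LABELS = {1: "Earlier this week", 2: "Last week",
          3: "Week before last", 4: "Three to four weeks ago"}
rows, n_valid, n_missing = frequency_table([2, 3, 2, 2, 1, None, 1, 4], LABELS)
for label, freq, pct, valid_pct in rows:
    print(f"{label:<24}{freq:>4}{pct:>7}{valid_pct:>7}")
print(f"Valid {n_valid}, Missing {n_missing}")
```

With the library figures, "Earlier this week" has a Percent of 215/779 = 27.6% and a Valid Percent of 215/769 = 28.0%.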
As the values increase, the period between the time of the
survey and the last time the library was visited lengthens. Notice,
however, that successive values do not represent equal lengths
of time. The difference between "Earlier this week" and "Last
week" is much less than the difference between "Week before
last" and "Three to four weeks ago."
Exhibit 9
This means that the "Time_last" variable has ordinal information
properties: the codes are ordered (i.e., "Week before last"
designates a longer time period than does "Last week"). Please
note that "Time_last" does not have interval properties, since the
time distances between the answers provided are not equal. Also,
"Time_last" does not have ratio properties, since the length of
time denoted by "Last week," or 2, is not twice the time denoted
by "Earlier this week," or 1, and the length of time represented by
"One year ago or longer," or 8, is not twice that indicated by
"Three to four weeks ago," or 4, and so on.
Exhibit 10
If a variable has ratio properties, one value can be divided by
another and the quotient makes sense. Also, there is a natural
zero point.
We've now discussed the top three levels of information content
of data: ordinal, interval, and ratio. The lowest level is nominal;
here a number or other symbol is used simply to represent a
name or category in the data set rather than a value having a
specific numerical attribute. The legitimate measures of central
tendency for these levels of information content are:

Information level    Measures of central tendency
nominal              mode
ordinal              mode, median
interval             mode, median, mean
ratio                mode, median, mean
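One way to respect this table when computing statistics yourself is to let the information level decide which measures are reported. The Python sketch below does that with the standard `statistics` module. The `ALLOWED` mapping and `central_tendency` helper are hypothetical, and using `median_low` (which always returns an observed code) for the median is a design choice for ordinal data, not necessarily the rule SPSS follows.

```python
import statistics

# Legitimate measures of central tendency for each information level.
ALLOWED = {
    "nominal": ("mode",),
    "ordinal": ("mode", "median"),
    "interval": ("mode", "median", "mean"),
    "ratio": ("mode", "median", "mean"),
}


def central_tendency(values, level):
    """Return only the measures of central tendency that are legitimate for `level`."""
    results = {}
    for measure in ALLOWED[level]:
        if measure == "mode":
            results["mode"] = statistics.mode(values)
        elif measure == "median":
            # median_low always returns one of the observed codes, which keeps the
            # answer meaningful for ordinal data; averaging the two middle codes
            # is another common convention.
            results["median"] = statistics.median_low(values)
        else:
            results["mean"] = statistics.mean(values)
    return results


# Hypothetical Time_last codes (ordinal): the mean is deliberately not reported.
print(central_tendency([1, 2, 2, 3, 4, 8, 2, 1], "ordinal"))  # {'mode': 2, 'median': 2}
```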
Before we re-enter the Frequencies menu to get more
information, let's work a little to understand how to assign labels
to the values of "Time_last." This is very easy to do in SPSS for
Windows by following these steps:
1) In the Data View window of the Data Editor (Exhibit 4), select
the tab labelled "Variable View" at the bottom left corner of the
window (see Exhibit 11), or double-click on the box that contains
the variable name "RID" (first column). The Variable View
window in Exhibit 12 will appear.
2) This Variable View window provides substantial flexibility for
configuring the data in the Data Editor. All of the variables in the
data set are listed in this window. Scroll down the window and
you will see the complete list of variables and their
characteristics.
3) Each column allows you to set or change some characteristic
of the variable. The Name and Label columns allow you to
establish or change the name and the descriptive label that you
want to assign to a variable. For "Time_last", you may change
the name to "Q1" or any other name that abides by the SPSS
naming rules. You can also change the Label for that variable.
Since the variable "RID" does not have a label, you may type a
suitable variable label into the Label box for the "RID" row; for
example, "respondent identification". When you do this, think of
how your output tables will look.
Exhibit 11
Exhibit 12
4) Explore the other boxes in the "Time_last" row by clicking in
the right part of each box. You can adjust the Type, Width,
number of Decimals, Values (Value Labels), Missing, the number
of Columns, Align, and the Measure features.
5) Click on the Values box for the "Time_last" variable, and then
click again on the gray segment in the rightmost part of that box.
You should see Exhibit 13. Each value that "Time_last" can take
has a description that has been typed into the Variable View
window. Click on the first label and you will see that the number
1 appears in the Value window and "This week" appears in the
Label section of the window. If you like, you can change the label
for 1 to another that you feel is more appropriate. Notice that
the last five variables, "live_where," "occupation," "work_where,"
"birth_year," and "language_first," do not have value labels. Try
typing in value labels for at least one of those variables.
Exhibit 13
6) You may change these labels as you see fit. You can switch
back and forth between the Variable View and the Data View
screens by clicking the tabs having those names in the lower left
corner of the screen.
7) "Missing Values" is a very important function found in the
Variable View window. Notice in Exhibit 10 that Missing System
has a frequency of 10. This means that 10 people did not
provide any answer to the "Time_last" question. In the Data
View window, scroll down and you will see that 10 respondents'
values for "Time_last" are not recorded.
Now, click in the far right of the Missing box for "Time_last" and
you will see Exhibit 14. Notice that there are "No missing values"
for "Time_last." From the Value Labels window, you can
determine that "Never before today" has the value 9. To make
this a missing value, click in the circle for "Discrete missing
values" and type 9 into the leftmost box. Click OK (see
Exhibit 15).
Exhibit 14
Now, let's produce a frequency distribution for this altered
"Time_last" variable. Pull down the Analyze menu, go to the
Descriptive Statistics menu and select Frequencies. Move
"Time_last" into the Frequencies window and click OK. You
should get the frequency distribution shown in Exhibit 16.
"Never before today," value 9, is now listed as Missing. Notice
that the Valid Percent column excludes the 13 people who said
"Never before today," as well as the 10 people who gave no
response to this question. Now reverse the process and include
9 as a valid response for "Time_last".
Exhibit 15
Exhibit 16
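Declaring 9 as a discrete missing value simply removes those 13 cases from the base of the Valid Percent column, alongside the 10 blank answers. The arithmetic can be sketched in Python with the counts given in this section; the helper names are hypothetical and this is not how SPSS works internally.

```python
def valid_base(n_total, n_system_missing, user_missing_counts):
    """Number of cases used as the base for Valid Percent.

    user_missing_counts maps each code declared as a "Discrete missing value"
    to how many respondents gave it; those cases drop out of the base,
    just like blank (system-missing) answers.
    """
    return n_total - n_system_missing - sum(user_missing_counts.values())


def valid_percent(count, base):
    """Valid Percent rounded to one decimal, as in the frequency table."""
    return round(100 * count / base, 1)


# Figures from this chapter: 779 respondents, 10 blank answers,
# 13 respondents coded 9 ("Never before today").
before = valid_base(779, 10, {})        # 769: code 9 still counted as valid
after = valid_base(779, 10, {9: 13})    # 756: code 9 declared missing
print(valid_percent(215, before), valid_percent(215, after))  # 28.0 28.4
```

Moving 13 cases out of the base raises every remaining Valid Percent slightly, which is why "Earlier this week" goes from 28.0% to 28.4% in this calculation.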
Back to our analysis path. Re-enter the Frequencies window
and select "Time_last" as the variable to analyze. Click on the
"Statistics..." button and Exhibit 17 will appear. SPSS presents a
wide range of statistics that can be calculated and displayed
under the frequency distribution. While some of these statistics
are relevant, others are not appropriate for some types of
variables. Remember that "Time_last" is an ordinal variable, so
the mode and median have been checked in the Statistics window
(Exhibit 17); the mean is not relevant here. Now our frequency
distribution will be more legible and provide more information
than it did in our first attempt. The frequency distribution with a
variable label and value labels is the same as in Exhibit 10, with
the addition of the mode and median in the Statistics box, as
shown in Exhibit 18.
Exhibit 17
You probably also noticed the Charts button in the Frequencies
window. Click the Charts button and the window in Exhibit 19 will
appear. There are several options available at this point; we have
chosen a bar chart displaying percentages by selecting the Bar
and Percentages options. You might want to experiment with
different options until you get the chart you like best.
Exhibit 18
Exhibit 19
Exhibit 20a shows the bar chart with percentages for each of the
bars, and the value labels displayed along the horizontal axis.
The output from SPSS can be saved and used in other programs,
such as word processors like MS Word or page layout programs
like PageMaker and MS Publisher.
Bar charts should normally be displayed with horizontal bars and
the value labels typed horizontally to the left of the bars. To
change the appearance of the graph, double-click on the graph
and the SPSS Chart Editor will appear. This gives substantial
flexibility in changing the graph to suit your taste. Work with the
editor until you get Exhibit 20b.
Exhibit 20a
Exhibit 20b
Cross-Tabulations
Now we're ready to move on to the next level of analysis. This
involves investigating the relationships between pairs of
variables. For example, library customers were asked to indicate
why they used the library: for leisure reading, for school, for
learning how to do things, for their jobs, or for personal
research. Customers were also asked to indicate their sex.
During the analysis of survey research data, dependent or
criterion variables, such as "why they used the library," are
investigated to see if they are related to other variables. It is
especially important to determine if the relationship is such that
particular patterns can be identified, and predictions advanced.
In this library survey, we will now investigate whether there is
any relationship between "why they used the library" and the
sex of the respondents. Pull down the Analyze menu, select
Descriptive Statistics and then "Crosstabs..." as shown in Exhibit 21.
The Crosstabs window will appear next. This window has a few
more options than the Frequencies window did, but it operates
in much the same way. For ease of reading and interpretation,
always place the dependent variable (if there is one) in the row
position, sometimes called the stub, and the independent
variable in the column position, referred to as the banner.
Now, select "Why_use_library" from the list of variables and
send this variable to the "Row(s):" box by clicking on the upper
arrow button. Send "sex" to the "Column(s):" box by selecting
"sex" and clicking on the "Column(s):" arrow button (see
Exhibit 22). If you click on the OK button, a crosstab will be
produced, but let's do a bit more before getting a table.
We'll specify some statistics to accompany and help interpret the
table by clicking on the "Statistics..." button. The window in
Exhibit 23 should now be visible on the computer screen.
Although you may ask for all of the statistics displayed in the
window, it's better to ask for only those statistics that are
relevant to the analysis. In this case, click the box for Chi-Square
and the box for Lambda. These statistics will be explained only
briefly here; you'll need to read about them in a marketing
research text or a statistics text to obtain a more detailed
explanation.
Exhibit 21
Exhibit 22
Exhibit 23
Click on the Continue button in the Crosstabs: Statistics window
and then click on the Cells button in the Crosstabs window. The
Crosstabs: Cell Display window, in Exhibit 24, will now be visible
on the screen. The Observed box under Counts will already be
checked. Now, check the Column box under Percentages and
click the Continue button.
Your last necessary action is to click the OK button in the
Crosstabs window and wait a few seconds. Part 1 of the
Crosstabs output is the table of "Why use library by Sex" in
Exhibit 25.
Exhibit 24
You should have no problems interpreting the table. Of the 348
females in the study, 136, or 39.1%, said they used the library for
leisure reading. Only 29.3% of the males reported this reason.
The mode for the men was personal research, at 31.9%, just
slightly larger than the 29.3% for leisure reading. Overall, leisure
reading was the most often cited reason for using the library
(33.7%).
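Column percentages divide each cell by its own column total, so women and men can be compared even though the two groups differ in size. The Python sketch below collapses the crosstab to two rows, leisure reading and all other reasons, because only those figures can be recovered from the text. The male counts are derived, assuming the table's 771 valid cases (the total used for the expected frequency later in this chapter): 771 - 348 = 423 men, of whom 260 - 136 = 124 cited leisure reading. The function name is hypothetical.

```python
def column_percentages(table):
    """Turn crosstab counts into column percentages.

    table maps each column category (here, sex) to a dict of
    row category -> count. Every cell is divided by its own column total,
    which is what checking the Column box under Percentages asks for.
    """
    result = {}
    for column, cells in table.items():
        column_total = sum(cells.values())
        result[column] = {row: round(100 * count / column_total, 1)
                          for row, count in cells.items()}
    return result


# Collapsed version of the library crosstab. Male figures are derived:
# 771 - 348 = 423 men, and 260 - 136 = 124 men citing leisure reading.
leisure_by_sex = {
    "Female": {"Leisure reading": 136, "All other reasons": 348 - 136},
    "Male": {"Leisure reading": 124, "All other reasons": 423 - 124},
}
print(column_percentages(leisure_by_sex))
```

The leisure-reading percentages this produces, 39.1% for women and 29.3% for men, match the figures quoted above.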
Notice in the crosstab that there seem to be some substantial
differences between the ways in which women and men use the
library. Women reported using the library more for leisure
reading and for school. The responses from men indicate that
personal research is their primary use for the library and that
this use is much higher than for women. The second most
frequently mentioned use by men was leisure reading (29.3%),
about 10 percentage points lower than for women (39.1%). Men
used the library for jobs and for learning how to do things more
often than did women.
Exhibit 25
Reporting differences between groups is a very important
function of a marketing research analysis. Sometimes these
explanations are more important than the actual statistical
analysis. However, this reporting of differences should
concentrate primarily on contrasts that are large enough to be
substantial for business reasons and to be significant statistically.
In marketing research, the primary focus is on highlighting
information about the business environment, competitive
landscape, and customer attitudes and behaviour that can help
managers make better marketing and business decisions.
Statistical significance matters when it supports information that
is substantial enough to assist with decisions. Sometimes,
differences between data values are statistically significant, but
those differences might be so small that they make little or no
difference to running the business.
Exhibit 26
When preparing tables for reports, the style of Exhibit 26 with
column percents is more readable for most audiences. The base
size, i.e., the total number of respondents on which the table is
built, should always be provided.
This discussion leads us to the statistical output for the
last crosstab. Remember that you selected to see the Chi-Square
test and the Lambda test results. The Pearson Chi-Square test
provides a value of 60.184, 4 degrees of freedom, and a
Significance of 0.000 (Exhibit 27). Unless you're familiar with this
test, you're probably not very excited by these findings. The
60.184 is a statistical calculation that measures the relative
differences between the observed frequencies in the table and
those frequencies that would have been expected to be found in
the survey data if the null hypothesis of statistical independence
between the two variables were true.
The expected frequency for a cell is its row total multiplied by its
column total, divided by the total number of respondents in the
table. For the first cell of the table, it is (260 x 348)/771, or
117.35. So, based on the null hypothesis, we would expect 117.35
women to say that they use the library primarily for leisure
reading. The survey found that 136 women actually said leisure
reading was their main reason. You could calculate these
expected frequencies for all the cells in the table and visually try
to figure out whether the two sets of numbers are very similar or
very different. (In fact, SPSS will provide a crosstab with expected
and observed values. To use this feature, you would check the
Expected box in Exhibit 24.) In some tables, the differences
between observed and expected counts might be obviously small
or obviously large. In most cases, however, it's very hard to tell
which is the case. This is why the Significance value is provided
in the Pearson row under Chi-Square.
Exhibit 27
Ho: Why use library and Gender are statistically independent.
Ha: Why use library and Gender are not statistically
independent, i.e., they are related.
Alpha or Type 1 risk = 0.05
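The expected-frequency rule is easy to check by hand or in a few lines of Python. This sketch uses only the totals quoted above; `expected_count` is a hypothetical helper name.

```python
def expected_count(row_total, column_total, grand_total):
    """Cell count expected if the row and column variables were independent."""
    return row_total * column_total / grand_total


# First cell of "Why use library by Sex": leisure reading x female.
expected = expected_count(row_total=260, column_total=348, grand_total=771)
observed = 136
print(f"expected {expected:.2f}, observed {observed}")  # expected 117.35, observed 136
```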
To tell if the observed and expected frequencies are close or not,
you just have to compare the Significance to the benchmark
value of 0.05 for the risk of a Type 1 error. In our example, the
Significance of 0.000 is much smaller than 0.05, so we conclude
that the actual observed survey frequencies are very different
from those frequencies that would be expected to occur if the
null hypothesis of independence between sex and primary use
of the library were true. Therefore, we should reject
independence between the two variables and conclude that
there is a dependency or relationship between sex and use of
the library.
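The decision rule above can be sketched in Python. `pearson_chi_square` sums (observed - expected) squared / expected over every cell, and `chi_square_p_value` uses the exact closed-form chi-square tail probability, which holds only for an even number of degrees of freedom; that covers this table, with its five reasons and two sexes (df = 4). Both helpers are illustrative, not SPSS's own routines; for general work, a library function such as SciPy's `scipy.stats.chi2_contingency` is the usual choice. Because the full table of counts is not reproduced here, the example starts from the reported statistic of 60.184.

```python
import math


def pearson_chi_square(observed):
    """Pearson chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (count - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df


def chi_square_p_value(stat, df):
    """Upper-tail probability P(X >= stat) for a chi-square variable.

    Uses the exact closed form exp(-x/2) * sum over k < df/2 of (x/2)**k / k!,
    which is valid only when df is even.
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive, even df")
    half = stat / 2
    term, total = 1.0, 0.0
    for k in range(df // 2):
        if k:
            term *= half / k   # builds (x/2)**k / k! from the previous term
        total += term
    return math.exp(-half) * total


ALPHA = 0.05  # Type 1 risk used in this chapter
# Pearson chi-square reported in Exhibit 27: 60.184 with 4 degrees of freedom.
p = chi_square_p_value(60.184, 4)
# p is far below ALPHA; rounded to three decimals it shows as 0.000.
print(f"p = {p:.2e}; reject independence: {p < ALPHA}")
```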
This finding tends to confirm our visual inspection of the table as
we discussed previously. When your visual inspection is
confirmed by the statistical test, you should feel good. Many
cases will not be this obvious.
So, we've concluded that a relationship exists between primary
use of the library and sex. The next question is whether this is
helpful for anything. Your first concern should always be
whether the data itself helps you to explain something important
to your client, even if the difference is not statistically significant.
(I know. This might sound like statistical heresy. However, your
#1 concern is to explain the phenomenon that is being studied in
a way that is of value to your business client. Statistical testing
exists to help you when your visual observation might not allow
you to make definitive conclusions about the data story. Don't
let the statistics lead you into making statements that might
appear to be silly to a business manager. They will want findings
that help them make good business decisions. These managers
are only rarely concerned about "statistically significant"
findings. That's reality!)
Strength of the Relationship
Next, how strong is the relationship between Why use the library
and sex? Please don't confuse this question with "Is the data of
any value?" The data might have value to your client even
though no statistically significant relationship exists. It had better
have some value ... they've paid for the research!
The strength of the relationship can be measured in several
ways; a basic measure is asymmetric Lambda with ‘Why use
library’ specified as the dependent variable. Lambda gives the
proportional (percentage) reduction in the error of predicting
‘Why use library’ when you use the information about sex,
compared to the error of predicting ‘Why use library’ without
using any information about an independent variable.
Exhibit 28 contains information helpful for deciding on the
strength of the relationship. Lambda with ‘Why use library’
dependent has a value of 0.022. This means that the error of
predicting the respondent's reason for using the library can be
reduced by 2.2% if information about the respondent's sex is
used. Doesn’t that seem like a really minor benefit?
If information about the independent variable ‘sex’ allows you to
always accurately predict the person's reason for using the
library, then the value of Lambda would be 1.0, its maximum. For
example, you would have perfect ability to predict, and Lambda
would be 1.0, if knowing that the respondent was a man told you
he used the library for personal research and knowing that the
respondent was a woman told you she used the library for leisure
reading. (When would this be true? Of course, only when all
males used the library for personal research and all females used
the library for leisure reading.)
If information about sex did not help you at all to predict
why a person used the library, then Lambda would have the
value 0.00. Lambda would be zero in this example if the use of
the library most often mentioned by women were, say, leisure
reading and men also used the library most often for leisure
reading. In this case, it really doesn't matter what the sex is;
you'd still guess that leisure reading is the main use of the library
for males and for females.
Exhibit 28
Knowing sex predicts the reason perfectly (Lambda = 1.0):
                     Female    Male
Leisure Reading         348       0
School                    0       0
How to do                 0       0
Job                       0       0
Personal research         0     423
Total                   348     423
Both sexes most often cite leisure reading (no gain in prediction):
                     Female    Male
Leisure Reading         348     423
School                    0       0
How to do                 0       0
Job                       0       0
Personal research         0       0
Total                   348     423
Notice that we observed that the most often cited reason for
using the library by all 771 respondents was leisure reading.
Women most often mentioned leisure reading, but men stated
personal research most frequently. Our guess changes
depending on whether the person is male or female. Whenever
this switch occurs in the guess of the value of the dependent
variable based on the value of the independent variable, the
value of Lambda must be greater than zero. The larger the value
of Lambda, the stronger the relationship is between the two
variables. Once again, the largest value of Lambda is 1.0, when
the prediction would be perfect.
In our example, the value of 0.022 indicates a very weak
relationship between sex and ‘Why use library’, using ‘Why use
library’ as the dependent variable. Notice that although men
most often cited personal research, this 31.9% is just slightly
higher than the 29.3% of men who said leisure reading, which
was the reason most often cited by women.
When these highest percentages in a column are very close in
value, expect Lambda to be small.
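The Lambda logic described in this section can be written out directly. The sketch below (plain Python, hypothetical function name) uses the standard Goodman-Kruskal formula with the rows as the dependent variable: (sum of the largest count in each column - largest row total) ÷ (N - largest row total). The library figures are reconstructed from the text and are approximate: 136 women cited leisure reading, about 135 men (31.9% of 423) cited personal research, and 260 respondents cited leisure reading overall, the row total used in the expected-frequency calculation.

```python
def asymmetric_lambda(table):
    """Goodman-Kruskal lambda with the ROW variable as the dependent variable.

    table[i][j] is the count for dependent category i (e.g., reason for using
    the library) and independent category j (e.g., sex).
    """
    n = sum(sum(row) for row in table)
    max_row_total = max(sum(row) for row in table)
    sum_col_max = sum(max(col) for col in zip(*table))
    if n == max_row_total:
        # Every case sits in one row: the formula becomes 0/0.
        raise ValueError("lambda is undefined when all cases share one category")
    return (sum_col_max - max_row_total) / (n - max_row_total)


# Sidebar case: every woman reads for leisure, every man does personal research.
perfect = [[348, 0], [0, 0], [0, 0], [0, 0], [0, 423]]
print(asymmetric_lambda(perfect))  # 1.0

# Hypothetical table: leisure reading is the top reason for BOTH sexes.
same_mode = [[200, 250], [148, 173]]
print(asymmetric_lambda(same_mode))  # 0.0

# Library survey, from the approximate figures described above.
print(round((136 + 135 - 260) / (771 - 260), 3))  # 0.022
```

Note that the second sidebar table, with every respondent in the leisure-reading row, is the degenerate 0/0 case, so the sketch raises an error for it; a less extreme table in which leisure reading is simply the most common reason for both sexes gives 0, as the text describes.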
We've looked at just two of many statistics that are available
from Crosstabs. Depending on the types of variables being
analyzed, you might decide to select other statistics to be
presented by SPSS. These are discussed in the Help menu, in
your marketing research text, and in statistics texts.
The Sparse Cells Rule
One more item before we move on. Notice the line in Exhibit 29
that says "a. 0 cells (.0%) have expected counts less than 5. The
minimum expected count is 23.02". This might seem innocuous,
but it's really quite important. This information is used to
indicate whether the Chi-Square analysis can be used, or
whether it must be scrapped.
In this example, the value of 23.02 is perfectly adequate, and no
further thought needs to be given to constraints on the
analysis. Without getting into a lot of details, this value, the
"minimum expected frequency," must be 1.0 or more or the Chi-
Square analysis is not valid. The first part of the line, "0 cells
(.0%) have expected count less than 5," is also very good
news. Look for that percentage to be 20% or less, or you could be
getting bad information from the Chi-Square test.
A rule-of-thumb for the Chi-Square test is that no more than
20% of the cells in the crosstab should have expected
frequencies less than 5 and none should be less than 1. Please
realize that this is a rule-of-thumb, not a hard-and-fast rule.
Exhibit 29
The Sparse Cells Rule
There is some flexibility for interpretation. One should be
cautious if between 20% and 25% of the cells have an expected
frequency of less than 5. If the data looks highly unusual, don't
rely on Chi-Square. If the data is fairly "normal" looking, then
consider using Chi-Square. If between 25% and 30% of the cells
have expected frequencies less than 5, be very hesitant to use
the Chi-Square test results. If 30% or more of the cells have
expected frequencies of less than 5 then do not use the Chi-
Square test.
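The rule of thumb above can be encoded as a small checker that takes a table of expected counts, such as those SPSS shows when the Expected box is checked. The function name and verdict wording are hypothetical, and how to treat a value sitting exactly on a boundary (20%, 25%, 30%) is a judgment call the text leaves open; this sketch counts exactly 20% as acceptable and 30% as unusable.

```python
def sparse_cells_check(expected):
    """Apply the sparse cells rule of thumb to a table of EXPECTED counts.

    Returns (share of cells below 5, minimum expected count, verdict).
    """
    cells = [count for row in expected for count in row]
    share_below_5 = sum(count < 5 for count in cells) / len(cells)
    minimum = min(cells)
    if minimum < 1.0 or share_below_5 >= 0.30:
        verdict = "do not use Chi-Square"
    elif share_below_5 > 0.25:
        verdict = "be very hesitant"
    elif share_below_5 > 0.20:
        verdict = "be cautious; check whether the data look unusual"
    else:
        verdict = "acceptable"
    return share_below_5, minimum, verdict


# Hypothetical expected counts (only the 23.02 minimum echoes the text).
print(sparse_cells_check([[23.02, 40.5], [61.3, 80.0]]))
# (0.0, 23.02, 'acceptable')
print(sparse_cells_check([[0.8, 12.0], [9.5, 30.0]])[2])
# do not use Chi-Square
```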
If the minimum expected frequency is less than 1.0 or more than
20% of the cells have expected frequencies less than 5,
investigate the crosstab to identify the reason for these numbers
being so low. Typically, there is at least one row or column that
has very small expected frequencies. If combining two
columns or two rows makes sense for those variables, this could
solve the problem with the sparse cells rule and open the door
for using the Chi-Square test. Remember that this rule applies to
expected frequencies, not to the observed frequencies. It would
be helpful for you to return to Exhibit 24, select expected
frequencies, and relate the expected frequencies in the resulting
crosstab to the sparse cells indicator under the Chi-Square table.
Do this for several crosstabs and relate the expected frequencies
to the sparse cells rule.
T-tests for Differences Between Means
The variables that were analyzed in the last section, use of the
library and sex, were both nominal in their information content.
Now, we'll investigate what can be done when the dependent
variable is interval or ratio. From the questionnaire, we see that
customers were asked whether they agree or disagree with the
statement that the Central Public Library "has too many rules." A
6-point scale was used that went from "Strongly disagree" to
"Strongly agree." Let's assume that this scale has interval
properties, although some might disagree. With this assumption,
we can calculate means, variances, and other parametric
statistics.
We will now ask the question "Is the mean level of agreement
for women the same as it is for men?" A t-test for independent
samples will be used for the analysis. To access this test, pull
down the Analyze menu, then slide out the Compare Means
menu, and select "Independent-Samples T Test" (see Exhibit 30).
The Independent-Samples T Test window pops up and is shown
in Exhibit 31. Now, select the variable "rules" as the Test Variable
and "sex" as the Grouping Variable. You'll see that "sex(? ?)"
appears in the Grouping Variable box. Press the "Define
Groups..." button and you'll be asked to specify the values that
identify Group 1 and Group 2. Insert 1 for Group 1 and 2 for
Group 2 (see Exhibit 32). Press the Continue button and then the
OK button.
The table in Exhibit 33 shows the descriptive statistics. The mean
for Females was 1.86 for "Too many rules in the Library," slightly
less than "somewhat disagree" on the questionnaire scale. For
Exhibit 30
males the mean was 2.09, slightly more positive than "somewhat
disagree." Although both females and males "somewhat
disagreed" with this statement, the males disagreed slightly less
strongly than the females that there were too many rules at
the library. Is this difference of 0.23 of a scale point meaningful
for the way in which the library management should serve
females compared to males? And is the 0.23-point difference
significant in a statistical sense?
Exhibit 31
Exhibit 32
Exhibit 33
As we had a hypothesis for the Chi-Square test, we should state
one for this test as well. An appropriate null hypothesis is that
the mean level of agreement with "The Central Public Library has
too many rules" is the same for men as for women. The
alternative hypothesis is that the means are different. Let's use a
5% risk of a Type I error once again.
The t-test gives us the information we need for deciding whether
to reject or not reject the null hypothesis. Before we test
whether the two means are the same or different, there's an
intermediate test that must be done. This involves testing
whether the variances of the two distributions, the variance of
the distribution for men and the variance for women, are equal
or different.
This test is conducted by using Levene's Test for Equality of
Variances. This, of course, involves another null hypothesis: that
the variances of the two distributions are the same. This is tested
using the F value and P value from Levene's test as shown in
Exhibit 34. The F value can be treated like we did the Chi-Square
value: it's a statistic that's presented by SPSS, but its number
alone does not provide much information to most of us.
The "Sig." value, sometimes called the P value, produced by
Levene's test tells us where to find the information regarding the
Exhibit 34
Please note that this
is just the left half of
the SPSS output for
the t-test.
Levene’s Test Hypotheses
Ho: The variance for the distribution
of ‘Rules’ for males is statistically the
same as the variance for females.
Ha: The two variances are statistically
different for males and females.
alpha or Type I risk = 0.05
independent samples t-test for the difference between two
means. If the P value is greater than 0.05 (5%), do not reject the
null hypothesis of equal variances; treat the two variances as
equal and read the row in the table that begins "Equal
variances assumed" in Exhibit 35. (In SPSS, Exhibits 34, 35, and
36 are one table. They were segmented here for ease of
presentation.)
If the P value is less than or equal to 0.05, use the "Equal
variances not assumed" row instead. In our example, Levene's
test provides F=0.387 and P=0.534. Because 0.534 is greater than
0.05, we can obtain the appropriate t-test information by reading
the "Equal variances assumed" row of the last block of the table
in Exhibit 35.
The "Equal variances assumed" row of the table provides a t-value
of -2.41 and a 2-tailed Significance of 0.016 in Exhibit 35. Since this
significance level is less than 0.05 (5% risk), we should reject the
null hypothesis and conclude that the means for men and
women are different. In fact, we can see that the mean for men
is significantly higher than the mean for women. If the 95%
"Confidence Interval of the Difference" shown in Exhibit 36
contained zero, we could not conclude that the means are
significantly different; consistent with the significance of 0.016,
it does not contain zero here.
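To show what sits behind the two rows of the SPSS table, here is a plain-Python sketch of the pooled ("Equal variances assumed") and Welch ("Equal variances not assumed") t statistics computed from group summaries, plus the row-selection rule described above. The function names and the summary figures in the example are hypothetical. The p-value helper is a large-sample normal approximation, not the exact t distribution SPSS uses; with samples the size of the library survey the two agree closely.

```python
import math


def choose_t_test_row(levene_p, alpha=0.05):
    """Pick the row of the SPSS t-test table from Levene's P value."""
    if levene_p > alpha:
        return "Equal variances assumed"
    return "Equal variances not assumed"


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t with a pooled variance; df = n1 + n2 - 2."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, n1 + n2 - 2


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch t (variances not assumed equal) with Welch-Satterthwaite df."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def approx_two_tailed_p(t):
    """Large-sample approximation: two-tailed p from the standard normal curve."""
    return math.erfc(abs(t) / math.sqrt(2))


print(choose_t_test_row(0.534))              # Equal variances assumed
print(round(approx_two_tailed_p(-2.41), 3))  # 0.016
t, df = pooled_t(2.0, 1.0, 10, 1.0, 1.0, 10)  # hypothetical summaries
print(round(t, 3), df)                       # 2.236 18
```

The sign of t follows Group 1 minus Group 2, which is why the library example's t is negative when the women's mean is lower.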
Exhibit 35
Now, how should we interpret this for the library? In non-
statistical language, we should say that men disagreed
significantly less than did women with the statement "The
Hamilton Central Public Library has too many rules." Men stated
that they "somewhat disagreed" with that statement while the
women's rating was significantly lower than the men's rating.
However, the substantive difference between 1.86 and 2.09 on a
6-point attitude scale seems trivial from a business perspective.
What could you do with this finding if you were the CEO of the
library? Probably not very much. You might simply conclude that
both male and female customers felt that there were not too
many rules in the library. Perhaps this finding would allow the
library to justifiably introduce new rules that it feels would
benefit customers and the library overall.
Exhibit 36
T-tests for Differences Between Paired Answers
Library customers were asked to state their answers to
"How important to you personally are your visits to this Library?"
and "How important to your career are your visits to this
Library?" These are questions 13 and 14 in the questionnaire in
the appendix. The answers were on 6-point importance scales
that went from "Extremely important" (6) to "Not at all
important" (1).
13. How important to you personally are your visits to this
library?
[ ]6 EXTREMELY IMPORTANT
[ ]5 VERY IMPORTANT
[ ]4 FAIRLY IMPORTANT
[ ]3 SOMEWHAT IMPORTANT
[ ]2 SLIGHTLY IMPORTANT
[ ]1 NOT IMPORTANT AT ALL
14. How important to your career are your visits to this library?
[ ]6 EXTREMELY IMPORTANT
[ ]5 VERY IMPORTANT
[ ]4 FAIRLY IMPORTANT
[ ]3 SOMEWHAT IMPORTANT
[ ]2 SLIGHTLY IMPORTANT
[ ]1 NOT IMPORTANT AT ALL
The average for "personal importance" was 4.71 and for "career
importance" was 3.63. It seems natural to ask if there is a
difference between these two means. However, imagine yourself
being asked these two questions in sequence. Do you think that
you might look back and forth between these two questions and
calibrate your answers between them? For example, if you said
that your library visits are “very important” to you personally, do
you think your answer to career importance might be referenced
to that answer in Q13? Most people would, either consciously or
unconsciously.
Because of this dependence between the answers to these two
importance questions, we will use a Paired-Samples T Test
rather than the Independent-Samples T Test of the previous
section. Remember that with the Independent-Samples T Test
we were actually comparing the means of two groups, males
versus females. In this current example, we simply want to find
out whether the mean of the differences between individuals’
ratings of the importance to them personally and to their careers
is different from zero.
As you did with the Independent-Samples T Test, pull down the
Analyze menu, and slide out the Compare Means menu (Exhibit
37). Now, select "Paired-Samples T Test" and Exhibit 38 appears.
Exhibit 37
Select the pair of variables, IMPPER and IMPCAR, and place them
in the Paired Variables box (Exhibit 38), then press the OK
button. Next, the statistical analysis appears in the output
window.
The Paired T Test provides the descriptive statistics for the two
variables in Exhibit 39: means, standard deviations, and standard
errors for each of the variables. The correlation between the two
variables is shown in Exhibit 40. The "Sig." next to the
"Correlation" indicates that these two variables are significantly
correlated, as one might reasonably think. In this case, the
Exhibit 38
Please note that you must fill each
line in the ‘Paired Variables:’ window
with pairs of variables.
If you put only one variable in a line,
the ‘OK’ button will be a dull gray.
This is a tip-off that you have not
completed a command.
Exhibit 39
Exhibit 40
correlation is 0.369 and the 2-tailed significance of 0.000
indicates that this correlation is significantly different from zero
(i.e., the importance to the respondents, personally, of visits to
the library and the importance to their careers are significantly
correlated).
Exhibits 41a and 41b provide the information directly relevant to
the t-test. The mean of the differences between each customer's
answers to the questions is given as 1.0671. The standard
deviation of 1.7128 and standard error of 0.063 are also printed.
In addition, the "95% Confidence Interval" is stated as 0.9439 to
1.1903 and does not include zero. The T value in Exhibit 41b is
17.005 and the "Sig. (2-tailed)" value is 0.000.
These statistics provide very strong evidence that people do
consider visits to the library to be significantly more important to
them personally than to their careers. This conclusion is reached
because the "Sig. (2-tailed)" value is smaller than 0.05 (5% risk).
The T value of 17.005 is very large. (You might remember that T
values larger than 1.96 or smaller than -1.96 for large samples
Exhibit 41a
Exhibit 41b
are judged to be significantly different from zero at a 5% level of
risk.)
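The paired test works on each respondent's difference score. The sketch below uses a hypothetical function and five invented respondents (not survey records) to show the steps: take each pair's difference, then divide the mean difference by its standard error, (standard deviation of the differences) / √n.

```python
import math
from statistics import mean, stdev


def paired_t(first, second):
    """Paired-samples t test of whether the mean difference (first - second) is zero."""
    if len(first) != len(second):
        raise ValueError("each respondent needs an answer to both questions")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = mean(diffs)
    std_error = stdev(diffs) / math.sqrt(n)  # stdev uses the n - 1 denominator
    return mean_diff, std_error, mean_diff / std_error, n - 1


# Hypothetical answers to Q13 (personal) and Q14 (career) from five respondents.
impper = [5, 6, 4, 5, 3]
impcar = [3, 4, 4, 3, 2]
mean_diff, se, t, df = paired_t(impper, impcar)
print(mean_diff, round(se, 3), round(t, 3), df)  # 1.4 0.4 3.5 4
```

With the library figures, 1.0671 / 0.063 is about 16.9; the 17.005 that SPSS reports comes from the unrounded standard error.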
Analysis of Variance (One-Way)
Think back to the problem addressed with Independent-Samples
T Tests, that of determining whether a statistically significant
difference exists between two means. If we extend that problem
to three or more means, then we need to use the One-Way
Analysis of Variance. As with the T Tests, pull down the Analyze
Exhibit 42
menu, slide out the Compare Means option, and choose "One-Way
ANOVA" as shown in Exhibit 42. The One-Way ANOVA
options window should now be showing (see Exhibit 43).
Move the variable "Overall satisfaction" into the Dependent List
window. Then move the variable "Motivation," which stands for
"Which of the following best describes how you happened to
come to the Central Library today?" to the Factor window. The
"Motivation" variable has the following three possible answers:
1. Library was the one special reason for this trip;
2. Library was one of several things I wanted to do on this
trip; and
3. Thought of Library after starting out to do something
else.
Your next job is to provide for some additional output that will
help you to interpret the ANOVA output. To do this, press the
"Post Hoc..." button. The window in Exhibit 44 will open,
providing a long list of options. Without providing an
Exhibit 43
explanation, please click on the box for "Tukey's-b," and then
click Continue. This will put you back in the One-Way ANOVA
window. Now press the Options button.
Exhibit 44
Exhibit 45
In the One-Way ANOVA: Options window (Exhibit 45), click on
the Descriptive box and the Means plot box, and then press the
Continue button. You're now back in the One-Way ANOVA
window shown in Exhibit 43. Click OK and let SPSS crank away
at its calculations.
In a few seconds, the ANOVA output that appears in Exhibits 46-49
below will be on your computer screen. The descriptives in
Exhibit 46 show the three means and their confidence intervals.
There's quite a bit of detail in the ANOVA output, too much to
explain in this tutorial. Please refer to your marketing research
or statistics text for more complete explanations. The most
important piece of information in Exhibit 47 is the "Sig." value of
0.023. This indicates whether the F value of 3.794 is significantly
larger than 1.0. It is the level of significance for the test of the
degree of
Exhibit 46
The null hypothesis is that the
three means are the same. Do
you think that the means of 5.47,
5.53 and 5.25 are the same or
are they different?
The F statistic is the ratio of the
two Mean Squares, which are
really variances.
If the null hypothesis is true, the
F ratio should be statistically
close to 1.0
Exhibit 47
difference among the mean satisfaction levels for the three
groups of customers who indicated how they happened to be in
the library on the day of the survey.
This F Sig. is less than 0.05 (5% risk) and will lead us to conclude
that there is a significant difference among the three means,
which are listed as 5.47, 5.53 and 5.25 in Exhibit 46. Also, notice
the 95% confidence intervals and how they overlap or don't. The
F Sig. indicates that the F ratio of 3.794 is significantly larger
than 1.0 and that at least one of the three means differs
significantly from at least one of the others.
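The F ratio described in the sidebar (between-groups mean square divided by within-groups mean square) can be computed from raw scores as follows. This is a hypothetical plain-Python helper run on invented data; the library's own responses are not reproduced here, and SPSS additionally converts F into the "Sig." value using the F distribution with the two degrees of freedom returned.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA on lists of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k, n = len(groups), len(all_scores)
    # Between groups: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within groups: spread of the scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, k - 1, n - k


# Hypothetical satisfaction scores for three motivation groups.
f, df_between, df_within = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(f, 3), df_between, df_within)  # 13.0 2 6
```

If all the group means were equal, the two mean squares would estimate the same variance and F would be close to 1.0, which is the null hypothesis in the sidebar.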
Exhibit 48 is the result of asking for the "Tukey's-b" test in the
Post-Hoc window. The key part of this output is the grouping of
means in homogeneous subsets. The mean for "Afterthought"
stands by itself in Subset 1 with a mean value of 5.25. The other
two means, for "One Reason" (mean of 5.47) and "One of
Several (reasons)" (mean of 5.53), are grouped in subset 2. While
this test is not definitive, it indicates that the mean overall
satisfaction with the library for those who visited the library as
an afterthought was significantly lower than was the overall
satisfaction for the other two groups, which are considered to
Exhibit 48
not be statistically different from each other. If the variances
differ among the groups, one of the post hoc tests listed under
"Equal Variances Not Assumed" should be used instead. (See
Exhibit 44.)
The "Means Plot" is shown in Exhibit 49. These graphs often
help in interpreting the ANOVA output by providing a visual
perspective on the analysis. No statistical information is provided
in these plots. However, you can probably see that the mean
satisfaction for those who said their visits were an "Afterthought"
appears much lower than the means for those who said that going
to the library was their "One reason" for going out or was
"One of Several (reasons)" for going out.
Exhibit 49a
Always be careful of tables and graphs. Notice the vertical scale
in Exhibit 49a; it is very tight. By double clicking on a graph, you
can alter almost all of its aspects. I rescaled the graph to produce
Exhibit 49b with the question scale of 1 to 6. It is still legitimate
to consider satisfaction when going to the library as an
afterthought to be statistically less than the satisfaction when
going to the library when it was your primary reason or one of
Notice that in Exhibit 49b, the vertical scale was changed and
the font size was increased on the axis labels. Try doing this
yourself; SPSS offers tremendous versatility in designing graphs.
Plus, when you get a format that you like, you can save it as
your template for other graphs.
Exhibit 49b
several reasons. However, that difference must be considered
relative to the scale on which respondents answered the
question. This finding makes sense, statistically. And, while the
difference is not large when considered on the scale, it should
make the library management think about motivating residents
to think of the library more often.
Linear Regression
Up to this point, we have always been investigating one variable,
relationships between two variables, or differences between two
or more groups. Now we will begin to consider multivariate
analysis (i.e., more than two variables and how they are related).
To do this, we'll use simple linear regression (still bivariate) and
multiple linear regression (multivariate analysis).
In our library example, we will strive to determine those aspects
of the library and the staff that have the greatest influences on
overall satisfaction with the library. The first part of the process
should be familiar to you. Pull down the Analyze menu then slide
out the Regression menu, as you see in Exhibit 50. Choose
"Linear..." for linear regression, and the Linear Regression
window opens (see Exhibit 51).
We will be working only with linear regression in this section.
However, look at all of the variations listed in the Regression
slide-out menu in Exhibit 50. Each of those methods can itself be
applied in several ways. In addition, many of the
other statistical methods listed in the Analyze menu are
variations of regression or related to regression.
In the Linear Regression window, select the overall satisfaction
variable and place it in the Dependent section. Now, select the
variables "Always noisy [Noisy]" to "Staff is easily approachable
[Approachable]" plus "Convenience of Hours [Hours]" from the
left window and slide them into the Independent(s) window. Pull
Regression is an extremely important and extensively useful
statistical technique. Your time would be well spent in
understanding regression methods, execution techniques, and
interpretation.
down the menu within “Method:” and select Stepwise. Although
there are many other options that can be selected in Linear
Regression, we've done the basics. Click the OK button and SPSS
will calculate and display quite a large number of statistics.
Since we selected Stepwise Regression, SPSS will present the
step-by-step results that are obtained by adding in the variable
that provides the most contribution during each step.
(Sometimes a previously entered variable is deleted on a step.)
Exhibit 50
Several statistics are basic to understanding regression. The
first are R, R-square, and Adjusted R-square. The Model Summary
in Exhibit 52 provides these statistics for each of the six
regressions of this analysis.
R is the correlation coefficient (in multiple regression, the
correlation between the dependent variable and the set of
independent variables). R-square is, of course, the square
of R and is called the coefficient of determination. R-square
indicates the proportion of variation in the dependent variable,
“Satisfaction”, that is explained by the set of independent
variables that are in the analysis at that stage. We'd like this to
be as large as possible (the maximum of R-square is 1.0). The
R-square of 0.235 might not seem all that impressive to you at
first. However, let's look at the whole picture. The Adjusted
R-square adjusts R-square downward to reflect the number of
independent variables relative to the sample size.
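The usual formula for Adjusted R-square, which penalizes R-square for the number of independent variables k relative to the number of cases n, is 1 - (1 - R²)(n - 1)/(n - k - 1). A minimal sketch with hypothetical inputs (the library regression's sample size is not stated in this section):

```python
def adjusted_r_square(r_square, n, k):
    """Adjusted R-square for n cases and k independent variables."""
    if n - k - 1 <= 0:
        raise ValueError("need more cases than independent variables + 1")
    return 1 - (1 - r_square) * (n - 1) / (n - k - 1)


print(round(adjusted_r_square(0.50, n=101, k=10), 4))   # 0.4444
# Adding a variable that barely raises R-square can lower Adjusted R-square.
print(round(adjusted_r_square(0.501, n=101, k=11), 4))  # 0.4393
```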
Exhibit 51
You'll see the ‘standard error of estimate’ statistic listed next.
That statistic by itself is often not very informative. However, it is
very helpful when comparing two or more regressions. We
should be led to consider more favorably those regressions from
a set that have smaller standard errors and larger R-squares. As
you can see in Exhibit 52, each regression shows improved
values of the Rs and the ‘Std. Error of the Estimate’. This is what
should happen, and typically does occur, in a stepwise
regression.
Exhibit 53 shows the ANOVA for each of the six steps in the
regression. The lower part of the table shows the variables that
are added to the regression equation on each step. Easy-to-use
arrangement was included first, and Always noisy ("The library is
always noisy") was added on the sixth step.
Exhibit 52
These letters represent the successive models, as SPSS adds
variables to the regression equation. The stepwise procedure
strives to improve the solution on each step.
The key information to get from the Analysis of Variance part of
the table is that the ‘Sig.’ = .000 for each of the regressions.
Since this value is less than 0.05 (5% risk), we can conclude that
these regressions do provide some potentially valuable
information. If ‘Sig.’ was greater than 0.05 and a 5% level of risk
was being used, the conclusion would be that the set of variables
was not helpful in explaining overall satisfaction. Even though
the F value gets smaller with each step, the ‘Sig.’ is still
displayed as .000 (that is, less than 0.0005), even in Model 6.
After successfully passing the test of whether the regression is
valuable to work with, we can proceed to the part of the table
that provides specific information on the variables being
analyzed. The variables in the regression equation are listed in
Exhibit 54 along with B, the regression coefficient; the Std. Error
of B; the standardized regression coefficients, Beta, a measure of
the relative impact of each variable on Overall satisfaction; T, the
number of standard errors that B lies away from a slope of zero;
and ‘Sig.’ of T. This last measure is compared to 0.05 and, if
smaller than or equal to 0.05, indicates that the corresponding
variable might have a significant effect on the dependent variable.
Notice that ‘Always noisy’ and ‘Staff is too busy to assist’ have
negative coefficients. This is because they were stated in a
negative fashion in the questionnaire (e.g., "The Library is always
noisy"). All six variables have Sig. t values less than 0.05. (This is
the work of the stepwise procedure, which allows into the analysis
only those variables that are significant.)
If you arrange the Beta values in order by absolute value, the
following relative impacts can be seen:
• Easy to use arrangement (Beta = 0.210; mean = 4.84)
• Staff appears knowledgeable (Beta = 0.186; mean = 5.24)
• Great contribution to Hamilton (Beta = 0.148; mean = 5.52)
• Convenience of hours (Beta = 0.114; mean = 4.96)
• Staff is too busy to assist (Beta = -0.112; mean = 2.34)
• Always noisy (Beta = -0.097; mean = 2.33)
The null hypothesis is that the slope, B, of the variables is zero.
The values of t measure how far each unstandardized regression
coefficient is from the null hypothesized value of zero.
Exhibit 54
The regression equation can now be written as:
Satisfaction = 3.57 + 0.115*(Easy to Use Arrangement) + 0.145*
(Staff Appears Knowledgeable) + 0.105*(Great Contribution to
Hamilton) + 0.056*(Convenience of Hours) - 0.052*(Staff is too
busy to assist) - 0.050*(Always noisy).
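The estimated equation and the Beta ranking above can be reproduced with a few dictionaries. The coefficients, Betas and means are copied from the text; the dictionary keys and the helper function are hypothetical labels. Plugging in each variable's mean is only a plausibility check: a least-squares equation passes through the means of the cases used to fit it, so the result should land near the overall satisfaction mean of 5.48 reported below, with rounded coefficients and any cases excluded for missing data likely explaining the small gap.

```python
INTERCEPT = 3.57

# Unstandardized coefficients (B) from the regression equation above.
B = {
    "arrangement": 0.115,    # Easy to use arrangement
    "knowledgeable": 0.145,  # Staff appears knowledgeable
    "contribution": 0.105,   # Great contribution to Hamilton
    "hours": 0.056,          # Convenience of hours
    "too_busy": -0.052,      # Staff is too busy to assist
    "noisy": -0.050,         # Always noisy
}

# Standardized coefficients (Beta) and variable means from the list above.
BETA = {"arrangement": 0.210, "knowledgeable": 0.186, "contribution": 0.148,
        "hours": 0.114, "too_busy": -0.112, "noisy": -0.097}
MEANS = {"arrangement": 4.84, "knowledgeable": 5.24, "contribution": 5.52,
         "hours": 4.96, "too_busy": 2.34, "noisy": 2.33}


def predict_satisfaction(ratings):
    """Predicted overall satisfaction for one set of ratings."""
    return INTERCEPT + sum(B[name] * ratings[name] for name in B)


# Relative impact: order the variables by the absolute value of Beta.
ranking = sorted(BETA, key=lambda name: abs(BETA[name]), reverse=True)
print(ranking)
print(round(predict_satisfaction(MEANS), 2))  # 5.51
```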
The major use of the regression results is usually not obtained by
directly substituting values of the independent variables into the
equation. Rather, by knowing the relative impact, we can better
understand those actions that can influence the satisfaction of
the library's customers. This impact of the individual variables is
typically based on the Beta values, which are considered to be
ordinal indicators.
The mean scores for the six factors indicate that the Library
received very credible, if not outstanding, scores on all
dimensions. The two lowest scores are "noisy" and "busy," with
"hours" next. If management were to identify areas where they
might be able to increase an already very high level of overall
satisfaction (mean= 5.48), noise in the Library and the
appearance of being too busy among staff might be areas to
investigate. However, keep in mind that with the already
extremely high satisfaction, and the favorable scores on noise and
staff assistance, there might not be too much direct payback
from this investigation. Keeping these scores high should result
in continuing high levels of customer satisfaction.
Caution. Ideally, the final set of independent variables in a
regression should be highly correlated with the dependent
variable, but not with each other. This condition should be
checked as a normal part of any multiple regression analysis.
The term multicollinearity is used to describe dependence
among the independent variables. A substantial amount of
collinearity among the independent variables can make the
individual coefficients unstable and misleading; using such
findings could lead to erroneous marketing decisions. Since this
topic requires more statistical background than is appropriate
for this tutorial, please look in your marketing research text or
in a statistics text.
This has been a very cursory treatment of regression. Once
again, the objective has been to provide a quick introduction to
SPSS, not a refresher course in statistics. Be aware that effective
use of regression relies on a more complete understanding of
the topic than has been presented here. Please consult texts in
statistics and marketing research for fuller discussions of this
important topic.
Setting Up the Data
Before analyzing data, it is important to understand how to set
up a data table in SPSS so that it can be analyzed and so that the
output is as usable and attractive as possible. The Library SPSS
data set was fully formatted and you had no work to do before
beginning the analysis of the data. However, it will be essential
for working with other data sets to read and understand this
section on how to set up and format a data set.
The typical result of a survey is a mix of numeric and character,
or string, data that may be arranged in many different ways. If
the survey was fielded through a commercial marketing research
field agency, the client can ask to have the data produced in
almost any convenient format. A very basic way of providing data
is in an Excel spreadsheet or in a tab- or comma-delimited text
file, either of which can be very easily imported into SPSS.
As an example, the Library data has been included in an Excel file
named Library Excel.xls. Part of that data in Excel format is
shown in Exhibit 55. Note that the first row contains the
variable names that you saw in the SPSS version of that data file
in Exhibit 4. Of course, those names had to be typed into that
first row of the Excel spreadsheet or into the SPSS file. For this
data, it really does not matter whether those names were typed
into Excel or SPSS.
An important feature of any statistical analysis program is that
data can be entered easily and that data provided originally in
either an Excel file or a delimited text file can be imported with
very little work.
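The convention that the first row holds the variable names is the same one other tools rely on. As a sketch, assuming a small made-up comma-delimited extract with hypothetical column names, Python's standard csv module reads the header row as variable names and everything after it as cases:

```python
import csv
import io

# A tiny, made-up comma-delimited extract; the column names are hypothetical.
SAMPLE = """id,visits,spend
1,4,12.50
2,2,0
3,6,25
"""


def read_delimited(text: str, delimiter: str = ",") -> tuple[list[str], list[dict[str, float]]]:
    """Parse delimited text whose first row holds the variable names.

    Returns the variable names and one dict per respondent, with every
    value converted to a number.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    rows = [{name: float(value) for name, value in row.items()} for row in reader]
    return list(reader.fieldnames or []), rows


names, rows = read_delimited(SAMPLE)
print(names)    # ['id', 'visits', 'spend']
print(rows[0])  # {'id': 1.0, 'visits': 4.0, 'spend': 12.5}
```

For a tab-delimited file, pass `delimiter="\t"` instead.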
The first step in importing the Library Excel.xls file into SPSS is
shown in Exhibit 56: pull down the File menu, slide over to Open,
and select Data. After clicking on Data, you will see a window
similar to Exhibit 57, although the folder in its central window
will probably differ from the Downloads folder shown in Exhibit
57. Pull down the ‘Files of type:’ list and select Excel (*.xls,
*.xlsx, *.xlsm). Then navigate to the folder in which you saved
the data files for this tutorial. When you arrive at that folder,
e.g., ‘C:\Marketing Research’, you should see a window similar
to Exhibit 58. Now, click on the Library Excel.xls file and open it;
you will then see Exhibit 59. Because the Library Excel.xls file
contains variable names in the first row, leave the box for reading
variable names from the first row checked in Exhibit 59. Just
click “OK” and the data set will open in SPSS, as seen in
Exhibit 60.
Exhibit 55
Exhibit 56
As you can see, the SPSS spreadsheet in Exhibit 60 looks very
similar to the data in Exhibit 55. However, you will have noticed
that the Value Labels that are shown in Exhibit 4 do not appear
in Exhibit 60, and they can’t be made to appear by pulling down
the View menu and clicking Value Labels. An Excel file carries no
SPSS value labels, so they must be defined in the Variable View.
What do you think the 999s signify in Exhibit 60? As you
probably guessed, those 999s indicate “missing values”.
Respondents 5, 8 and 20 did not answer the question that asked
for the number of times the library was visited during the past
month. Also, respondents 2 and 19 did not enter an amount of
money they expected to spend during their trips to the library.
Exhibit 57
Exhibit 58
Exhibit 59
Exhibit 60
Entering Data into PASW Statistics 18®
Up to now you've been working with a prepared data set. Your
next step is to go through a brief introduction to using the SPSS
Data Editor. Entering a data set into SPSS for Windows® is a
simple and logical process.
The first thing to do is pull down the File menu, slide over to
New, and click on Data, as shown in Exhibit 61. You will then see
a blank spreadsheet (Exhibit 62) that will act as our vehicle for
entering data.
Let's say that you have a very short four-question survey. The
variables in the data set are:
• ID - respondent identification number;
• Usage - the number of times out of the last 10 purchases that
the brand of interest was purchased;
• Intention - respondents' intentions to buy the brand of interest
when they next buy the product, measured on a 5-point
intention scale with 5 being Definitely Will Buy, 4 Probably
Will Buy, 3 Might or Might Not Buy, 2 Probably Will Not Buy,
and 1 Definitely Will Not Buy; and
• Sex - the sex of the respondent.
Exhibit 61
Exhibit 62
To begin this process, click on Variable View in the lower left
corner of the spreadsheet. Now, type "ID" under "Name" in the
first row and press the Enter key. Notice that SPSS
automatically enters information into the other cells in the ID
row of the spreadsheet (see Exhibit 63). While SPSS should be
commended for trying to be helpful, some of the information is
correct, some is not and some needs to be completed. The name
‘ID’ is correct and the Type is numeric. The Width is fine, but
there will be no decimals in the respondent identification
values, so change the ‘2’ in Decimals to ‘0’. Type ‘Respondent
identification number’ into the Label column. No value labels
(Values) are needed, and no Missing values need to be defined
because every respondent has an ID.
Exhibit 63
The number of Columns can stay at 8 and the alignment can
remain as specified. The Measure of this variable, ID, is nominal.
Change Scale to Nominal by clicking in the right area of the
cell and pulling down the menu to Nominal; an icon of three
circles should now show in that cell. The Role, Input, can be
changed to None by clicking in the right-hand area of the Role
cell, pulling down the menu and clicking on None. Your first row
should now look like Exhibit 64.
Move down to the second variable row and type "Usage," and so
on. Notice that SPSS automatically fills in the other dimensions
of each variable with default values. You should see Exhibit 65 on
your screen.
You should now change the parameters for each variable so that
they are appropriate for that variable. To make those changes,
click in the right part of each cell. You should keep Type, Missing,
Columns and Align as they appear in Exhibit 65.
Exhibit 64
Exhibit 65
Change Decimals to 0 (zero) and change the Labels to those in
Exhibit 66. Assign value labels (the Values column) by clicking in
the right part of a variable's Values cell and then typing in the
label that you want for each of the numerical values that variable
might take; adding Values for the Intention variable is shown in
Exhibit 67. You should also change the Measure to Nominal for
"ID," Scale for "Usage," Scale for "Intention," and Nominal for
"Sex."
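The choices made in the Variable View amount to a small piece of metadata for each variable. The sketch below models them as a Python dataclass purely for illustration; the class is hypothetical, the Label strings for Usage, Intention and Sex are placeholders (the actual labels appear only in Exhibit 66), and no value labels are given for Sex because the text does not say how it is coded.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """One row of the Variable View (a simplified, hypothetical model)."""

    name: str
    label: str
    measure: str  # "nominal", "ordinal" or "scale"
    decimals: int = 0
    value_labels: dict[int, str] = field(default_factory=dict)

    def describe(self, value: int) -> str:
        """Return the value label if one is defined, else the raw value."""
        return self.value_labels.get(value, str(value))


VARIABLES = [
    Variable("ID", "Respondent identification number", "nominal"),
    Variable("Usage", "Times brand bought in last 10 purchases", "scale"),  # placeholder label
    Variable(
        "Intention",
        "Intention to buy next",  # placeholder label
        "scale",
        value_labels={
            5: "Definitely Will Buy",
            4: "Probably Will Buy",
            3: "Might or Might Not Buy",
            2: "Probably Will Not Buy",
            1: "Definitely Will Not Buy",
        },
    ),
    Variable("Sex", "Sex of respondent", "nominal"),  # placeholder label, codes not given
]

print(VARIABLES[2].describe(4))  # Probably Will Buy
```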
The hard part is now finished. All you need to do next is to shift
to the Data View and type in the data. For respondent 1, if you
type in 1, 3, 3, 1, the first line of the Data View editor will look
like the table in Exhibit 68. Continue entering the data until your
Data View editor looks exactly like Exhibit 56. When you have
finished, pull down the File menu and Save the data using
whatever name you desire.
Exhibit 66
Exhibit 67
Exhibit 68
Manipulating the Data File
There will be occasions when you will need to change the data
that you originally entered in a file. We will illustrate two of the
most often used procedures: Select Cases and Recode. You are
strongly encouraged to investigate and experiment with the other
data functions; these are found under the Data, Transform and
Utilities menus.
Filtering the Data Set
Pull down the Data menu and notice Select Cases near the
bottom of the menu in Exhibit 69; this function is used when you
want to work temporarily with only part of the data set. Click on
"Select Cases..." and the Select Cases window appears, as in
Exhibit 70. Let's say that in our little practice data set we want
to analyze only those cases where the respondents had bought
our brand at least 7 out of the last 10 times. Click on the circle
next to "If condition is satisfied" and then click on the "If..."
button.
The "Select Cases: If" window pops up and you have the
opportunity to create an "if statement" that must be satisfied for
a case to remain active in your forthcoming analyses. You can
see in Exhibit 71 that we have selected the ‘Usage’ variable and
moved it to the central section. Click on the ">=" button and
then click ‘7’ on the keypad. You may also just type all of these
characters from the keyboard, i.e., Usage >= 7. While you are
still in the window of Exhibit 71, notice the tremendous
versatility of this option. While you may type complex
statements for selecting cases, you may also select functions
from the ‘Function Group:’ menu to help identify cases of
interest.
Press the Continue button and you'll return to the Select Cases
window. You now have the option to filter out those cases that
do not satisfy the "If" statement, or to delete those cases.
Choose "Filter" and then click OK.
Exhibit 69
Be very careful at this stage. Notice in Exhibit 72, which is the
Output block from Exhibit 70, that the ‘Filter out unselected
cases’ option is selected by default. There are two other options.
The second, ‘Copy selected cases to a new dataset’ is available if
you want to build a new dataset with only those respondents
who satisfy your selection criteria.
The last of the three options, ‘Delete unselected cases’ is
hazardous since cases will be eliminated permanently from your
dataset. There will be times when this is exactly what you want
to do. However, if you incorrectly click that button, you may be
very sorry.
Exhibit 70
After clicking the ‘OK’ button, you'll see the Data Editor, as
portrayed in Exhibit 73, with slashes through the row numbers of
the cases that did not meet the requirements of the ‘If’ statement.
Notice that a filter variable has been generated and placed in the
file.
If you analyze your data now, you will be working with only the
three people who had bought the brand at least 7 out of the last
10 times. The two unselected cases can be brought back into the
data analysis by pressing the Reset button in the Select Cases
window (Exhibit 70) and then clicking OK. If you wanted to
permanently delete the two respondents who had bought the
brand 6 or fewer times, you would instead choose the ‘Delete
unselected cases’ option near the bottom of the window in
Exhibit 70.
Exhibit 71
Exhibit 72
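The difference between filtering and deleting can be sketched in a few lines of Python. The Usage values below are hypothetical except for respondent 1, whose value of 3 is given in the text; the 1/0 flag mimics SPSS's filter variable, which marks selected cases with 1 and unselected cases with 0. The function names are illustrative only.

```python
from collections.abc import Callable

# Hypothetical Usage values; only respondent 1's value (3) comes from the text.
cases = [
    {"ID": 1, "Usage": 3},
    {"ID": 2, "Usage": 8},
    {"ID": 3, "Usage": 7},
    {"ID": 4, "Usage": 5},
    {"ID": 5, "Usage": 10},
]


def usage_at_least_7(case: dict) -> bool:
    """The condition typed into the 'Select Cases: If' window: Usage >= 7."""
    return case["Usage"] >= 7


def filter_cases(cases: list[dict], condition: Callable[[dict], bool]) -> list[dict]:
    """'Filter': keep every case, adding a 1/0 flag like SPSS's filter variable."""
    return [{**case, "filter": int(condition(case))} for case in cases]


def delete_unselected(cases: list[dict], condition: Callable[[dict], bool]) -> list[dict]:
    """'Delete unselected cases': only the selected cases survive."""
    return [case for case in cases if condition(case)]


flagged = filter_cases(cases, usage_at_least_7)
active = [case for case in flagged if case["filter"] == 1]
print(len(flagged), len(active))                        # 5 3
print(len(delete_unselected(cases, usage_at_least_7)))  # 3
```

Filtering keeps all five cases in the file, so the selection can be undone; deleting leaves only three.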
Recoding Values
The Recode function is at least as valuable as the Select Cases
function. There are many situations in which the analyst wants
to change the values of the original variable in some way.
Perhaps the initial coding was not done correctly. In many cases,
the initial coding of a variable can be changed to assist with a
more informative analysis.
For example, in our data entry case you might want to recode
the Intention to Buy Next variable into "positive intention" (i.e.,
probably will buy and definitely will buy), and "negative
intention," meaning "probably will not buy" and "definitely will
not buy." We'll illustrate this recoding now.
Exhibit 73
First, pull down the Transform menu, as in Exhibit 74, and then
click on the ‘Recode into Different Variables…’ option. Be careful:
if you select ‘Recode into Same Variables…’, the original values
will be replaced by the new recoding and, once you save the file,
lost forever.
Exhibit 74
Exhibit 75
The Recode into Different Variables window, in Exhibit 75, will
then open. Move the ‘Intention’ variable into the central
‘Numeric Variable’ box, type "Intention2" as the Output Variable
Name and type the label ‘Intention to Buy, Pos & Neg’ into the
Label box. Now press the ‘Old and New Values…’ button and the
Recode into Different Variables: Old and New Values window
will open (Exhibit 76).
In the Old Value area on the left, select the circle next to
‘Range:’ and enter 1 in the top box and 2 in the box below
"through." In the New Value box in the upper right, select the
Value circle and enter the value 1. Then click the Add button and
you'll see this recoding added to the ‘Old --> New’ window. Now,
enter 4 and 5 in the two ‘Range:’ boxes, and 2 in the New Value
box, so that "probably will buy" and "definitely will buy" form
the positive group. Press Add. Finally, select the ‘All other
values’ circle at the bottom left and the ‘System-missing’ circle
on the upper right-hand side of the window. Click Add and then
Continue. This will put you back in the Recode window, where
you should press the Change button and then click OK.
Exhibit 76
The results of your recoding are displayed in the Data Editor
matrix in Exhibit 78. Notice that the values of ‘Intention’ have
been transformed into ‘Intention2’ as you directed. The values of
the two cases that stated ‘might or might not’ (neither positive
nor negative) have been set to system-missing, indicated by the
dots. When you analyze this variable, there will be fewer valid
cases than with the original variable, ‘Intention’. Two respondents
will be categorized as having positive intentions to buy and one as
having negative intentions to buy. Now, you should go into the
Variable View and provide value labels for ‘Intention2’. If you
want to save this transformation, be sure to click on the diskette
icon for ‘Save’ in the toolbar.
Exhibit 77
Exhibit 78
Non-parametric Analysis
There are some situations where the available data are not
suited to parametric analysis. Parametric analysis, very
basically, refers to those statistical procedures that assume
particular probability distributions underlie the data. In some
cases, this assumption is obviously wrong or the analyst is
uncertain which distribution might be present. In these
situations, non-parametric procedures can be used to analyze
the data. We've already used two of these, the Chi-Square test
and Lambda. While there are many non-parametric tests that can
be very valuable, we will work with just two, the Friedman test
and another version of the chi-square test.
Friedman Two-way Analysis of Variance for Ranked Data
In this section, we'll look at one additional non-parametric test
that is often very helpful when analyzing survey data. Let's say
that you asked people to rank five Arnold Schwarzenegger
movies: Twins, True Lies, Eraser, Conan the Barbarian, and
Kindergarten Cop. Respondents were asked which of the five
they liked best (coded 1), second best (2), and so on. As an
example, we have the findings from 20 respondents. These are
provided in the file ARNOLD.SAV that accompanies this book.
Part of the data matrix is presented in Exhibit 79.
A legitimate question is whether any of these movies is
significantly better liked or worse liked than the others.
Remember back to ANOVA; doesn't this question sound very
much like what was asked when you performed an ANOVA on
parametric data? However, we can't use the parametric ANOVA
that we used before: ranks are ordinal rather than interval data,
and each respondent's five ranks always add up to 15, so they
are not independent of one another.
Fortunately, a test called Friedman's Two-way Analysis of
Variance by Ranks exists specifically to analyze this type of
data. Before beginning this test, make sure that the movie
variables are specified as ‘Scale’ in the Measure column of the
Variable View. Pull down the Analyze menu, slide over to the
Nonparametric Tests menu, and then choose ‘Related Samples’
(see Exhibit 80).
Exhibit 79
The ‘Nonparametric Tests: Two or More Related Samples’ window
opens, as shown in Exhibit 81. That window has three tabs, and
Exhibit 81 is open to the ‘Fields’ tab. Select the five movie titles
from the left window and then click on the arrow that will move
them into the ‘Test Fields’ box on the right.
Next, click on the ‘Settings’ tab and you will see Exhibit 82. There
are two ways to run the Friedman test: 1) just leave the
‘Automatically choose the test based on the data’ button selected
and click the ‘Run’ button; or 2) click on ‘Customize tests’
and then click on the box next to Friedman’s test in the Compare
Distributions box. It might be instructive to do both.
Exhibit 80
Exhibit 81
Let’s take the second route. When you choose ‘Customize tests’
and click on the Friedman’s box under ‘Compare Distributions’,
you’ll see that you can pull down the menu next to ‘Multiple
comparisons’. That menu is shown in Exhibit 83. If you click on
‘All pairwise’ or ‘Stepwise step-down’ in Exhibit 83, you will get
the overall analysis plus statistical comparisons among the
movies: ‘All pairwise’ tests every possible pair of movies, while
‘Stepwise step-down’ groups the movies into homogeneous
subsets. Either analysis will indicate which movies are ranked
differently from the others.
Exhibit 82
Exhibit 83
The overall analysis output in Exhibit 84 states the ‘Sig.’ to be
0.001 and the Decision is to reject the null hypothesis. In this
case, the null hypothesis is that all 5 of Arnold’s movies are
equally liked. Rejecting that null hypothesis means that at least
one of the movies is significantly better liked or worse liked than
at least one of the others; the overall test does not tell us which.
Now, double click on the box portrayed in Exhibit 84 and
additional levels of analysis are displayed, as in Exhibit 85.
Actually, the display in SPSS is too wide to display on one page,
so one plot on either side of the three shown was cropped from
the exhibit. You’ll now see in Exhibit 85 the Friedman test
statistic of 17.880 and the significance level. Also, you can see
that the graphs of the ranks are quite different.
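For readers who want to see where a statistic like 17.880 comes from, here is a hedged sketch of the textbook Friedman formula, Q = 12 ΣR_j² / (n k (k + 1)) - 3n(k + 1), where n respondents rank k items and R_j is the sum of the ranks given to item j. Q is compared with a chi-square distribution with k - 1 degrees of freedom. The sketch assumes complete rankings without ties (so no tie correction) and uses a closed-form tail probability that is valid only for even degrees of freedom, which covers this example (k = 5, df = 4). Because ARNOLD.SAV is not reproduced here, the first example is a made-up three-respondent dataset; the second computes the p-value for the statistic reported in the text.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """P(chi-square with df degrees of freedom > x); exact closed form for even df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form only covers positive, even df")
    half = x / 2
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(df // 2))


def friedman_statistic(rankings: list[list[int]]) -> tuple[float, int]:
    """Friedman statistic for complete rankings without ties.

    Each inner list holds one respondent's ranks (1 = liked best) for the same
    k items in the same order. Returns (statistic, degrees of freedom).
    """
    n, k = len(rankings), len(rankings[0])
    for row in rankings:
        if sorted(row) != list(range(1, k + 1)):
            raise ValueError(f"each row must rank the k items 1..{k} with no ties: {row}")
    rank_sums = [sum(row[j] for row in rankings) for j in range(k)]
    statistic = 12 * sum(r * r for r in rank_sums) / (n * k * (k + 1)) - 3 * n * (k + 1)
    return statistic, k - 1


# Made-up example: three respondents who all rank three items identically.
stat, df = friedman_statistic([[1, 2, 3], [1, 2, 3], [1, 2, 3]])
print(stat, df)  # 6.0 2

# p-value for the statistic reported in the text (k = 5 movies, so df = 4).
print(round(chi2_sf_even_df(17.880, 4), 4))  # 0.0013
```

The p-value of about 0.0013 is consistent with the ‘Sig.’ of 0.001 shown by SPSS.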
At the bottom of the window that contains Exhibit 85 is a menu
bar, shown in Exhibit 86. Not all of those options will be
available for all datasets. We will just investigate ‘Pairwise
Comparisons’ and ‘Homogeneous Subsets’.
Exhibit 84
If you select ‘All pairwise’, you will see a rather elaborate
statistical analysis that provides significance tests between all
pairs of the five movies, i.e., 10 tests in all (5 × 4 / 2), plus two
graphs. Exhibit 87 shows the pairwise tests covering all of the
movies. You’ll see that two of the pairs of movies produced
significant differences, i.e., tests that were significant at the 5%
level of risk. True Lies and Conan were judged to be significantly
different, as were True Lies and Kindergarten Cop.
Exhibit 85
Exhibit 86
Above the table in Exhibit 87 in the SPSS output is a graph that is
shown in Exhibit 88. If you place your cursor on any of the lines,
the adjusted significance of the difference between the two
movies represented at the end nodes will be displayed. The Adj.
Sig. = .002 in the exhibit indicates that Kindergarten Cop and
True Lies are significantly different.
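The ‘Adj. Sig.’ values can be approximated with a short sketch. Our assumption, not confirmed by this tutorial, is that SPSS compares each pair of mean ranks with the usual large-sample z test for Friedman rankings, dividing the difference in mean ranks by sqrt(k(k + 1) / (6n)), and then applies a Bonferroni adjustment: each two-sided p-value is multiplied by the number of pairs (10 here) and capped at 1. The function name is hypothetical and the mean ranks in the example are made up, so the output should be read as an illustration rather than a reproduction of Exhibit 87.

```python
import math
from itertools import combinations


def pairwise_mean_rank_tests(
    mean_ranks: dict[str, float], n: int
) -> list[tuple[str, str, float, float, float]]:
    """Compare every pair of items by the difference in their mean ranks.

    Returns (item_a, item_b, z, p, bonferroni_adjusted_p) for each pair.
    """
    k = len(mean_ranks)
    se = math.sqrt(k * (k + 1) / (6 * n))  # standard error of a difference in mean ranks
    pairs = list(combinations(mean_ranks, 2))
    results = []
    for a, b in pairs:
        z = (mean_ranks[a] - mean_ranks[b]) / se
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
        results.append((a, b, z, p, min(1.0, p * len(pairs))))
    return results


# Made-up mean ranks for five items (they sum to 15, as five ranks must), n = 20.
example = {"A": 2.0, "B": 3.0, "C": 3.0, "D": 3.0, "E": 4.0}
for a, b, z, p, adj in pairwise_mean_rank_tests(example, n=20):
    print(f"{a}-{b}: z = {z:+.2f}, Sig. = {p:.3f}, Adj. Sig. = {adj:.3f}")
```

With k = 5 and n = 20 the standard error is 0.5, so a one-point gap in mean ranks gives z = 2 and an unadjusted p of about 0.046, which becomes about 0.455 after multiplying by 10.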
Exhibit 87
You will now see a small button called ‘Layout’ in the menu bar
below the table in Exhibit 87. If you click that button, you’ll get
yet another graph (see Exhibit 89). That graph provides the
same information displayed in Exhibit 88 but is drawn
differently.
If you choose ‘Stepwise step-down’ in Exhibit 83 and then select
‘Homogeneous subsets’ in Exhibit 86, a test will be performed to
essentially tell you which groups of movies are judged to be
similar among themselves and, perhaps, different from another
subset of movies, or just one other movie. That output is shown
in Exhibit 90.
Exhibit 88
Exhibit 89
The mean ranks for each of the movies are shown in Exhibit 90.
True Lies ranks highest, with a mean rank of 1.80. The least
preferred movie (the highest mean rank) is Kindergarten Cop. The
movies are listed in two columns. The Subset 1 column indicates
that True Lies and Twins lie in that subset, and Subset 2 contains
Twins, again, plus Eraser, Conan and Kindergarten Cop. In the
pairwise analysis, we saw that True Lies and Twins were not
significantly different. We also saw that True Lies was
significantly different from Conan and from Kindergarten Cop. So,
most of the information from the pairwise analysis is reflected
here, although not all of its detail shows in the homogeneous
subsets table. Essentially, there are two subsets, with Twins
belonging to both. This suggests that True Lies is the most
highly preferred of the five and stands apart from every movie
except Twins, although the pairwise tests found it significantly
different only from Conan and Kindergarten Cop. It is also my
personal favorite from the Gubernator.
Exhibit 90
Chi-Square Test for Uniformity of a Frequency Distribution
The last test will be another nonparametric test, a Chi-Square
test that measures whether an observed frequency distribution
can be considered to be a uniform distribution (all values appear
with about the same level of frequency), or is significantly
different from a uniform distribution.
We'll return to the Central Public Library for an example. Let's
suppose that someone hypothesized before the survey that all
the floors of the library were used about equally by customers.
The frequency distribution appears in Exhibit 91 and the graph in
Exhibit 92.
The null hypothesis is that an equal percentage of the sample
stated that they used each of the five floors of the public library.
The null hypothesis always sets up an expectation for the data.
What would that be in this example? Of course, we should expect
one-fifth, or 20%, of the 776 respondents who answered this
question to name each of the five floors, i.e., 776 / 5 = 155.2
respondents per floor.
Exhibit 91
This Chi-Square test is obtained by pulling down the Analyze
menu, sliding over to the Nonparametric Tests menu, and
choosing ‘One sample test…’ (see Exhibit 93). Open the ‘Fields’
tab. Then click on ‘Use custom field assignments’, select ‘Floor
used most’ and send it, by itself, into the ‘Test fields’ box on the
right.
The simplest thing to do now is to just click the ‘Run’ button and
let SPSS do the calculations. If you want to have more control
over the process, click on the ‘Settings’ tab and then on
‘Customize tests’. Now, you can choose the type of
non-parametric test that you would like to perform on the data.
The most appropriate test for these data and this hypothesis is
the chi-square test that compares the observed probabilities to
the hypothesized probabilities. When you click on that option,
shown in Exhibit 95, you should then click on the Options button.
There are two options: 1) to test whether the classes contain
equal percentages of respondents; or 2) to test against some
other distribution of percentages, which can be entered into the
table provided in Exhibit 95.
Exhibit 92
Option 2 allows you to test the observed percentages against, for
example, a distribution that occurred the last time the study was
conducted. Or, perhaps someone hypothesized a distribution of
customers using the floors of the library based on their
observations. By entering those values in the table, you could
test whether those observations from the past are confirmed by
the data or not.
Exhibit 93
The output from this chi-square test is quite extensive, as was
the output from Friedman’s test. Whether you chose the
default values or manually specified the chi-square test of the
uniform distribution, the first information produced is the table
shown in Exhibit 96.
You can see from Exhibit 96 that SPSS automatically chose the
one-sample chi-square test (unless you had chosen it manually).
It also takes the variable label and sets up the null hypothesis in
the right-most pane. Notice that the ‘Sig.’ value is 0.000 (SPSS
shows 0.000 when the value is below 0.0005), which indicates
that if we were working with a 5% level of risk then we should
reject that null hypothesis. That means that if you felt that the
graph in Exhibit 92 did not look uniform, you’re correct,
statistically.
Exhibit 94
Exhibit 95
Exhibit 96
Next, double click on that box in Exhibit 96 and the information
shown in Exhibit 97 will pop up. The chi-square test statistic is
81.436. If you look in the chi-square table at the back of your
statistics or marketing research textbook, you’ll find a critical
value of 9.49 for a 5% risk of a Type I error and 4 degrees of
freedom (five floors minus one). That is the value that cuts off
the 5% of the chi-square distribution lying in its right tail. If the
calculated chi-square value is 9.49 or higher, then the null
hypothesis should be rejected. In this case, 81.436 is huge
compared to 9.49 and you should feel very comfortable in
rejecting the null hypothesis. Of course, the ‘Asymptotic Sig.’ of
0.000 also tells you that your level of risk would be much smaller
than 5%, or 0.05, and that you should reject the null hypothesis.
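The decision rule above can be sketched in Python. With five floors there are 4 degrees of freedom, an even number, so the chi-square tail probability has an exact closed form, exp(-x/2) multiplied by the sum of (x/2)^i / i! for i from 0 to df/2 - 1, and no statistics library is needed. The function names are hypothetical, and the floor counts in the last example are made up, not the figures in Exhibit 91.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """P(chi-square with df degrees of freedom > x); exact closed form for even df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form only covers positive, even df")
    half = x / 2
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(df // 2))


def chi_square_gof(observed: list[int], expected: list[float]) -> tuple[float, int]:
    """Pearson chi-square goodness-of-fit statistic and its degrees of freedom."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of categories")
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, len(observed) - 1


CRITICAL_5_PERCENT_DF4 = 9.49  # table value quoted in the text

# The statistic reported by SPSS in the text, for the five floors (df = 4):
print(81.436 >= CRITICAL_5_PERCENT_DF4)     # True: reject the null hypothesis
print(chi2_sf_even_df(81.436, 4) < 0.0005)  # True: SPSS would display Sig. = 0.000
print(round(chi2_sf_even_df(9.49, 4), 3))   # 0.05: the critical value cuts off 5%

# Made-up floor counts (not Exhibit 91) that sum to 776, tested against a uniform split.
observed = [100, 150, 160, 200, 166]
stat, df = chi_square_gof(observed, [776 / 5] * 5)
print(round(stat, 2), df, chi2_sf_even_df(stat, df))
```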
You can conclude that the floors are not used with the same
frequency; it appears that the fourth floor is used more often than
the others and that the first floor is used much less than the other
four.
Summary
This introduction to PASW (SPSS) Statistics 18® has covered quite
a bit of material. The intention was to give you a quick way to
become familiar with the capabilities of SPSS 18.0 for Windows®
and to give you some experience with a real data set. You also
set up a small data set of your own, which required many of the
basic steps involved in entering a survey data set of realistic
proportions.
The best way to gain knowledge of data analysis is to conduct a
detailed analysis on a real set of data. The data set for the
Hamilton Central Public Library is available to you and can serve
this purpose very well. Whether you're using PASW (SPSS)
Statistics 18® or PASW (SPSS) Statistics 18® Student Version,
SPSS provides enough functionality to allow you to conduct
professional-level analyses. I hope you benefit from these
exercises.
Appendix
Central Public Library
On-site Library User Survey
Questionnaire
(Shortened from original version)
It was necessary to reduce the number of variables in this questionnaire to fewer than 50 so
that it could be accommodated by the Student Version of PASW (SPSS) Statistics 18®. The
original version of this questionnaire was handed out to library customers during the hours of
operation for 7 consecutive days. Customers completed the questionnaires and deposited
them in collection boxes on each floor of the library. Several other surveys, besides this on-site
version, were executed for the library during this study. This study was conducted several
years ago and is used here as a sample data set to assist with the learning of SPSS. We’re
grateful to the Hamilton Central Public Library for releasing this data for educational purposes.
Introduction to Survey Data Analysis Using SPSS Windows & Student Version
Copyright 1999, 2000, 2003 by Dr. Ken Deal No reproduction of this material can be made, nor can any derivatives be prepared, without the express written consent of Dr. Ken Deal.
Marketing Decision Research Inc.
Hamilton Central Public Library Community Opinion Study
Have you filled in this form before? If so, thank you very much. We don't expect you to fill it in again.
The Hamilton Central Public Library is in the process of conducting a study to determine how it can better serve this community. This questionnaire is being used to obtain the opinions of you and others who use this Library. We'd be very grateful if you would carefully complete all of the questions and return the questionnaire to the attendant or to the Survey Return Box on the Information Desk for this floor. Your answers will be treated anonymously and confidentially.
1. When was the last time that you were in this Library before today? (Please check the box.)
[ ]1 EARLIER THIS WEEK
[ ]2 LAST WEEK
[ ]3 THE WEEK BEFORE LAST
[ ]4 THREE TO FOUR WEEKS AGO
[ ]5 BETWEEN ONE AND THREE MONTHS AGO
[ ]6 BETWEEN THREE AND SIX MONTHS AGO
[ ]7 BETWEEN SIX MONTHS AND ONE YEAR AGO
[ ]8 ONE YEAR AGO OR LONGER
[ ]9 NEVER BEEN IN THIS LIBRARY BEFORE TODAY
2. How many times have you visited this Library, including today's visit, during the past month?
___ ___ ___ NUMBER OF TIMES
3. What mode of transportation did you use to get to the Hamilton Central Public Library today?
[ ]1 WALKED
[ ]2 BUS
[ ]3 TAXI CAB
[ ]4 SPECIAL TRANSPORTATION FOR DISABLED, SUCH AS DARTS
[ ]5 PERSONAL VEHICLE
[ ]6 BUSINESS VEHICLE
[ ]7 RENTAL VEHICLE
[ ]8 BICYCLE
4. Where were you just before you came to this Library today?
[ ]1 AT HOME
[ ]2 AT WORK
[ ]3 AT SCHOOL
[ ]4 SHOPPING
[ ]5 LEISURE ACTIVITY
5. Which of the following best describes how you happened to come to the Central Library today?
[ ]1 LIBRARY WAS THE ONE SPECIAL REASON FOR THIS TRIP
[ ]2 LIBRARY WAS ONE OF SEVERAL THINGS I WANTED TO DO ON THIS TRIP
[ ]3 THOUGHT OF LIBRARY AFTER STARTING OUT TO DO SOMETHING ELSE
6. How much money will you have spent on all products, services or other items in the downtown core area during this trip to this Library today? (Please write in amount.)
$ ___ ___ ___ . ___ ___
7. On which floor of this Library have you spent the greatest amount of time today?
[ ]1 FIRST FLOOR
[ ]2 SECOND FLOOR
[ ]3 THIRD FLOOR
[ ]4 FOURTH FLOOR
[ ]5 FIFTH FLOOR
8. Did you use the floor indicated in Question 7 above more as a source of books or services or as a reading area?
[ ]1 MORE AS A SOURCE OF BOOKS, OTHER MATERIALS OR SERVICES
[ ]2 MORE AS A READING AREA
9. Please rate the range of selection of each of the following materials available from this Library. Check the box to the left of each row to indicate those that you used during 1990 and 1991. For each box checked, then circle the number on the scale shown below from very good selection to very poor selection.
Scale: 1 = VERY POOR SELECTION, 2 = POOR SELECTION, 3 = SLIGHTLY POOR SELECTION, 4 = SLIGHTLY GOOD SELECTION, 5 = GOOD SELECTION, 6 = VERY GOOD SELECTION
USED DURING 1990?
[ ] Fiction Books ........ 1 2 3 4 5 6
[ ] Business Books ....... 1 2 3 4 5 6
[ ] Video Cassettes ...... 1 2 3 4 5 6
[ ] Magazines ............ 1 2 3 4 5 6
[ ] Newspapers ........... 1 2 3 4 5 6
[ ] Books about Health ... 1 2 3 4 5 6
[ ] Music on Cassette .... 1 2 3 4 5 6
10. We'd like your opinions about several aspects of the Library. Please circle the number that indicates how strongly you agree or disagree with the phrases listed below.
Scale: 1 = STRONGLY DISAGREE, 2 = SOMEWHAT DISAGREE, 3 = SLIGHTLY DISAGREE, 4 = SLIGHTLY AGREE, 5 = SOMEWHAT AGREE, 6 = STRONGLY AGREE
…is always noisy ..................... 1 2 3 4 5 6
…has an easy-to-use arrangement ...... 1 2 3 4 5 6
…it's hard to find my way around ..... 1 2 3 4 5 6
…is too modern ....................... 1 2 3 4 5 6
…is a great contribution to Hamilton . 1 2 3 4 5 6
…has too many rules .................. 1 2 3 4 5 6
11. What are your opinions about the staff in this Central Library?
Scale: 1 = STRONGLY DISAGREE, 2 = SOMEWHAT DISAGREE, 3 = SLIGHTLY DISAGREE, 4 = SLIGHTLY AGREE, 5 = SOMEWHAT AGREE, 6 = STRONGLY AGREE
…is friendly ......................... 1 2 3 4 5 6
…is too busy to assist me properly ... 1 2 3 4 5 6
…appears to be knowledgeable ......... 1 2 3 4 5 6
…is easily approachable .............. 1 2 3 4 5 6
Copyright 2003 by Dr. Ken Deal
No reproduction of this material can be made, nor can any derivatives be prepared, without the express written consent of Dr. Ken Deal.
Introduction to Survey Data Analysis Using SPSS Windows & Student Version
12. How convenient or inconvenient to you are the hours of operation of the Library?
[ ]6 VERY CONVENIENT [ ]5 SOMEWHAT CONVENIENT [ ]4 SLIGHTLY CONVENIENT [ ]3 SLIGHTLY INCONVENIENT [ ]2 SOMEWHAT INCONVENIENT [ ]1 VERY INCONVENIENT
13. How important to you personally are your visits to this Library?
[ ]6 EXTREMELY IMPORTANT [ ]5 VERY IMPORTANT [ ]4 FAIRLY IMPORTANT [ ]3 SOMEWHAT IMPORTANT [ ]2 SLIGHTLY IMPORTANT [ ]1 NOT IMPORTANT AT ALL
14. How important to your career are your visits to this Library?
[ ]6 EXTREMELY IMPORTANT [ ]5 VERY IMPORTANT [ ]4 FAIRLY IMPORTANT [ ]3 SOMEWHAT IMPORTANT [ ]2 SLIGHTLY IMPORTANT [ ]1 NOT IMPORTANT AT ALL
15. Which of the following statements best describes how you usually use this Library?
[ ]1 I BORROW MATERIALS FROM THE LIBRARY [ ]2 I STUDY IN THE LIBRARY [ ]3 I BROWSE IN THE LIBRARY [ ]4 I USE LIBRARY MATERIALS IN THE LIBRARY
16. Which of the purposes below best describes why you use this Library?
[ ]1 FOR LEISURE OR RECREATIONAL READING [ ]2 FOR SCHOOL [ ]3 FOR LEARNING HOW TO DO THINGS [ ]4 FOR MY JOB [ ]5 FOR PERSONAL RESEARCH
17. How satisfied are you with this Library overall, considering its services, its collections, its physical facilities and its staff?
[ ]6 VERY SATISFIED [ ]5 SOMEWHAT SATISFIED [ ]4 SLIGHTLY SATISFIED [ ]3 SLIGHTLY DISSATISFIED [ ]2 SOMEWHAT DISSATISFIED [ ]1 VERY DISSATISFIED
18. Are you male or female?
[ ]1 FEMALE [ ]2 MALE
19. Please indicate the highest level of formal education which you've completed.
[ ]1 SOME GRADE SCHOOL [ ]2 GRADE SCHOOL [ ]3 SOME HIGH SCHOOL [ ]4 HIGH SCHOOL GRADUATE [ ]5 SOME COLLEGE [ ]6 GRADUATED COLLEGE [ ]7 SOME UNIVERSITY [ ]8 GRADUATED UNIVERSITY [ ]9 SOME GRADUATE SCHOOL [ ]10 MASTER'S DEGREE OR HIGHER
20. In which area do you live?
[ ]1 CENTRAL AREA OF HAMILTON [ ]2 NORTH END [ ]3 WEST END [ ]4 EAST END [ ]5 HAMILTON MOUNTAIN [ ]6 DUNDAS [ ]7 ANCASTER [ ]8 STONEY CREEK [ ]9 BURLINGTON [ ]10 OTHER
21. What is your occupation?
[ ]1 ADMINISTRATIVE/MANAGEMENT [ ]2 PROFESSIONAL/SEMI-PROFESSIONAL [ ]3 SALES [ ]4 CLERICAL [ ]5 FULL-TIME HOMEMAKER [ ]6 SKILLED LABOUR [ ]7 UNSKILLED LABOUR [ ]8 SERVICE WORKER [ ]9 FOREMAN/PLANT SUPERVISION [ ]10 FARMER [ ]11 SELF-EMPLOYED [ ]12 STUDENT [ ]13 RETIRED [ ]14 UNEMPLOYED [ ]15 DON'T WORK BECAUSE OF DISABILITY [ ]16 OTHER
22. In which area do you work?
[ ]1 JACKSON SQUARE/STELCO TOWER BLOCK [ ]2 CENTRAL AREA OF HAMILTON [ ]3 NORTH END [ ]4 WEST END [ ]5 EAST END [ ]6 HAMILTON MOUNTAIN [ ]7 DUNDAS [ ]8 ANCASTER [ ]9 STONEY CREEK [ ]10 BURLINGTON [ ]11 OTHER
23. In which year were you born? 19__ __
24. Which language did you first learn to speak and still understand? (Please write on the line below.)
___________________________________
Your answers to the questions above will help this Library serve you better in the future. Thank you very much for your help.
We might need to check back with some people to confirm their answers. So that we can do this, would you please print your name and phone number on the line below?
First Name _____________________ Last Name _________________________________ Phone Number ______________________________
Correspondence between question numbers and variable names:
Q.1 TIMLAST
Q.2 VISITMN
Q.3 MODE
Q.4 BEFORE
Q.5 HACTRPWHY
Q.6 SPENT
Q.7 FLOORMST
Q.8 FLOORUSE
Q.9 USFIC SFIC USBUS SBUS USVIDEO SVIDEO USMAG SMAG USNEWS SNEWS USHEALTH SHEALTH USMUSCAS SMUSCAS
Q.10 NOISY ARR WAY MODERN CONTRIB RULES
Q.11 FRND BUSY KNOW APPR
Q.12 HOURS
Q.13 IMPPER
Q.14 IMPCAR
Q.15 HWUS
Q.16 WYUS
Q.17 SATIS
Q.18 SEX
Q.19 EDUCATN
Q.20 LIVE
Q.21 OCCUP
Q.22 WORK
Q.23 BORN
Q.24 ETHNIC