
A Guide to PASW (SPSS) Statistics 18


Ken Deal

McMaster University



A Guide to PASW (SPSS) Statistics 18©

Copyright 1996, 1997, 1999, 2000, 2002, 2005, 2007, 2009 by Ken Deal and marketPOWER research inc.

First published 1996. Revised Editions 1997, 1999, 2000, 2002, 2005, 2007 and 2009.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without prior permission in writing of Ken Deal.

PASW Statistics 18 (formerly SPSS) is a trademark of IBM.

ISBN-13:

ISBN-10:

1 2 3 4 5 6 7 8 9 10 CP 0 9 8 7

Printed and bound in Canada.

Editorial Director:

Publisher:

Senior Sponsoring Editor:

Marketing Manager:

Editorial Associate:

Production Coordinator:

Senior Supervising Editor:

Cover Design:

Cover Design Credit:

Printer:

Library and Archives Canada Cataloguing in Publication

Deal, Kenneth R., 1944-

A guide to PASW (SPSS) Statistics 18 / Ken Deal

ISBN-13:

ISBN-10:

1. PASW (SPSS) Statistics 18 for Windows (Computer file) 2. Social sciences -- Statistical methods -- Computer programs. 3. Statistics -- Computer programs. I. Title.

HA32.D42 2006



Dedicated with love to my wife, Barbara.



Contents

Preface

Beginning PASW Statistics 18

Analyzing a Real Data Set

Frequency Distributions and Charts

Cross-Tabulations

T-Tests for Differences Between Means

T-Tests for Differences Between Paired Answers

Analysis of Variance (ANOVA)

Linear Regression

Setting Up a Data Set

Entering Data into PASW

Manipulating the Data File

Non-Parametric Analysis

Summary

Appendix: Questionnaire



Preface

One of the primary functions of marketing research is the analysis of survey data. In all but the very simplest surveys, the mass of data is so great that summarizing it with anything other than a major statistical package is tremendously time-consuming and error-prone. PASW (SPSS) Statistics® is one of the leading computer packages for analyzing survey data and is the principal package for analyzing data from surveys conducted in the social sciences.

PASW (SPSS) Statistics® Student Version is a somewhat scaled-down version of PASW (SPSS) Statistics®. The main limitation is that a maximum of 50 variables and 1,500 cases can be analyzed. In addition, several of the more powerful statistical procedures from PASW (SPSS) Statistics® are not included. However, there is certainly enough power within PASW (SPSS) Statistics® Student Version for a great deal of statistical calculation and data manipulation.

The purpose of this introduction to PASW (SPSS) Statistics® is to provide the beginning analyst with the knowledge and experience needed to use the basic capabilities of SPSS. If you have the Student Version, the manual included with the software explains how to install the program and how to begin using SPSS.

Most people find PASW (SPSS) Statistics® a very easy program to use; it is almost self-explanatory. Experience has shown that most people who have had a recent course in statistics can learn the basics in three hours or less.

The best way to use this tutorial is to follow the directions in the book step by step on your computer. If you don't get exactly the same screens as the ones in this tutorial, back up until you find where your actions diverged from the text. In many cases, it's critical that you read and follow each word or symbol. If you use a version other than PASW (SPSS) Statistics® Version 18.0, the screens might differ somewhat from those presented in this manual. However, most commands and output are so similar that adapting the Version 18.0 instructions takes minimal adjustment. In fact, this manual can be used with any version from 7.0 to 18.0. Beginning with version 18, the application is called PASW Statistics 18.

The main data set used in this tutorial was obtained from a survey conducted for the Hamilton Central Public Library, Hamilton, Ontario, by Marketing Decision Research Inc. The actual survey was substantially longer than the one provided in this tutorial; it was shortened to keep the example within the constraints of the SPSS Student Version. A questionnaire shortened from the original to match the data set is provided in the Appendix. The Hamilton Central Public Library is gratefully acknowledged for releasing this data to the author for publication purposes, both educational and research-related.



Beginning PASW (SPSS) Statistics®

At this point, we'll turn to the data set extracted from a survey conducted for the Hamilton Central Public Library and work our way through a basic analysis of that data. Before we discuss the library data, let's open SPSS and set the backdrop for the rest of this book.

A few assumptions are in order:

1. You are now sitting at a computer. There is no good alternative to hands-on learning of computer applications;

2. Windows® 2000, XP, Vista or 7 is installed on your computer;

3. PASW (SPSS) Statistics 18® Student Version or PASW (SPSS) Statistics 18® is installed on your computer (the screen captures in this book will look very similar for any version of SPSS from Version 7.0 through 18.0);

4. You have had a course in basic statistics or data analysis, or have developed a good working understanding of statistical data analysis; and

5. You understand the basics of using Windows® 2000, XP, Vista or 7 on a modern personal computer.

Your first action is to insert the CD from the back of this manual into your computer's CD drive (or go to the book's website) and copy the files "Library SPSS" and "Arnold" to a new folder on your computer's hard drive named "Marketing Research." We'll assume that the folder is "C:\Marketing Research."

Find PASW (SPSS) Statistics® in the Windows Start menu and click on it, or double-click the SPSS icon, and the program will open. You should see the window in Exhibit 1. If you are using the regular version of SPSS rather than the Student Version, almost all of the screens will be identical; those that are not will have very minor differences that will not impede your learning.

Exhibit 1



This window provides several options for your next step. As you can see from the top button on the right side of Exhibit 1, SPSS does come with a built-in tutorial, which you might decide to follow. You could also decide to type in a set of data. We'll do this later, so pass over this option for now. We'll ignore the two "query" options and focus our attention on the last option, "Open an existing file."

If you have opened the Library SPSS.sav data file before, it will appear in the list at the upper left of Exhibit 1. SPSS will open whatever data file you click on in that list, as long as it is a valid SPSS file. If you have never opened the Library SPSS.sav data file with SPSS before, double-click the "More Files ..." line and work through your folders and files until you find the Library SPSS.sav file in the Marketing Research folder on your C drive. The dialog will look similar to the window in Exhibit 2.

Exhibit 2



Another way to open an SPSS data set is shown in Exhibit 3: pull down the File menu, slide open the Open submenu, and click on Data. Once again, navigate to the C:\Marketing Research folder and click on the Library SPSS file.

Now double-click on the Library SPSS file and the data sheet in Exhibit 4 will open. The data sheet, or spreadsheet, can be considered "home base": the place where you typically begin an SPSS session and often where you exit from SPSS when you have finished your work. The application window is the overall window frame that contains the Menu Bar, the Tool Bar, the Data Editor, the Output Window, and several other helpful features of SPSS.

The Menu Bar is at the top of the window in Exhibit 4. Reading from left to right, it contains File, Edit, View, Data, Transform, Analyze, Applications, Graphs, Utilities, Add-ons, Window, and Help. The pull-down menu under each Menu Bar heading contains many features, some of which have their own pull-down (or pull-to-the-side) submenus. You will use these features regularly throughout your data analysis.

Exhibit 3

The Tool Bar is the horizontal row of icons directly below the Menu Bar. These Tool Bar icons act as shortcuts, so you don't have to pull down the menus for several of the most often used tasks. Some of the Tool Bar icons are self-explanatory; others you'll need to learn. Whenever you want to learn the function of a Tool Bar icon, just hold the cursor over it and a brief description appears below the icon.

Exhibit 3 shows the File menu being opened, and the first Tool Bar item does exactly that job: it opens files. The File menu also saves output files and syntax files and handles several other helpful tasks. Clicking a Tool Bar icon lets you perform some tasks more easily than pulling down the respective menu and selecting the specific action.

Exhibit 4

Value Labels. You should see "Last week" in the first row below "Time_last," as in Exhibit 4. If 2 appears instead, pull down the View menu and click on Value Labels.

Exhibit 5

Begin exploring the flexibility of SPSS by pulling down each of the menus on the Menu Bar and viewing their commands. The Analyze menu is shown in Exhibit 6. While SPSS is capable of an incredibly wide range of functions, the Analyze menu lists the essentials that make SPSS valuable: the statistical data analysis procedures that you can use to analyze data sets and understand the nature and attitudes of survey respondents.

Exhibit 6



The cursor in Exhibit 6 points to Frequencies, which, along with Descriptives and Crosstabs, will be the most frequently used procedures in this book and, probably, for most of the data analysis that you will conduct while learning how to use SPSS.

Analyzing a Real Data Set

The Hamilton Central Public Library (HCPL) data file for the survey of customers in the library is named Library SPSS.sav. Exhibit 4 shows the data for the first 10 respondents and the first 7 variables, RID through Spend. Now, look at the questionnaire for the Hamilton Central Public Library Community Opinion Study in the Appendix.

Actually, several surveys and focus groups were conducted during the course of the Library study. The data used in this book was obtained from library users who completed a questionnaire in the library. Copies of the questionnaire were distributed to library users on all five public floors of the library over the course of one week. The questionnaire in the Appendix is an abbreviated version of the actual survey used in the study.

Since PASW (SPSS) Statistics 18® Student Version is limited to 50 variables, this library data set has been pruned to contain only 46 of the many variables in the original survey. Each column in the Data Editor contains the values of the variable listed in its heading. For example, "RID" is the variable that simply numbers each of the cases (respondents) in the data set.

By using the vertical scroll bar on the right side of the Data Editor window, you can see that the respondent identification (RID) variable runs 1, 2, 3 ... up to the total number of respondents in the data set, 779.

The second variable (second column) is "Time_last," which is actually the first question in the questionnaire (see Appendix). Respondent 1 answered "Last week" (code 2) to the question "When was the last time that you were in this Library before today?" Person 2 indicated "Week before last" (code 3), while respondents 3 and 4 both answered "Last week" (code 2 again).

Reading across the first row in Exhibit 4, person 1 said that she (slide the scroll bar at the bottom to the right until you see the variable "Sex," which shows that this respondent is female) visited the library last week, visited this library twice during the past month, got to the library by personal vehicle, was at home just before going to the library, and made the trip for the library alone. She also indicated that she will have spent $40.00 in the downtown core during the library trip. That's an interesting story, and it becomes even more fascinating to read through the answers of several respondents and then combine the answers of all 779 individuals who were kind enough to complete the survey questionnaire.

The Data Editor can show either the numerical values that correspond to the respondents' answers or the answers in words. Your Data Editor probably shows words (value labels) under the variable "time_last_visited." If so, pull down the View menu and notice that the Value Labels option is checked (Exhibit 5). Drag your cursor down to Value Labels and click to uncheck this option. Notice how the value labels have been replaced by numbers: instead of "Last week" for person #1, the number 2 shows under "time_last_visited." We'll come back to the Value Labels feature later.
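Behind the scenes, a value label is simply a lookup from a stored code to the text shown on screen, which is why unchecking Value Labels changes only the display, not the data. The following Python sketch (an illustration, not SPSS syntax) mimics that toggle for respondents 1 to 4. The dictionary `TIME_LAST_LABELS` and the function `show` are hypothetical names, and only codes whose labels are quoted in this guide are included.

```python
# Hypothetical sketch: value labels as a code -> text lookup.
# Only codes whose labels are quoted in this guide are included.
TIME_LAST_LABELS = {
    2: "Last week",
    3: "Week before last",
    4: "Three to four weeks ago",
    8: "One year ago or longer",
    9: "Never before today",
}

def show(codes, labels, use_labels=True):
    """Return what a Data View column would display.

    With use_labels=True, codes are replaced by their labels (like having
    View > Value Labels checked); codes without a label are shown as numbers.
    A missing answer (None) is shown as a period, as SPSS does for
    system-missing numeric values.
    """
    shown = []
    for code in codes:
        if code is None:
            shown.append(".")
        elif use_labels and code in labels:
            shown.append(labels[code])
        else:
            shown.append(str(code))
    return shown

# Time_last codes for respondents 1-4, as described in the text.
first_four = [2, 3, 2, 2]
print(show(first_four, TIME_LAST_LABELS))                    # labels on
print(show(first_four, TIME_LAST_LABELS, use_labels=False))  # labels off
```

The stored codes never change; only the function's output does, which is the same reason you can switch Value Labels on and off freely in the Data Editor.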

The small icons to the left of each variable name indicate the information content of the variable. The three circles clustered together indicate nominal variables. The vertical bar chart signifies that the variable was specified as having ordinal properties, and the yellow ruler means that the variable was specified as having scale, or metric, properties.

Exhibit 8



Frequency Distributions and Charts

Now we're ready to direct SPSS to analyze the data, i.e., to summarize the information so that it's easier to get an overall story of the 779 respondents. Exhibit 7 shows the Analyze menu pulled down, the Descriptive Statistics menu pulled out, and "Frequencies ..." highlighted.

Exhibit 7



This Descriptive Statistics section contains several analysis tools that will allow you to conduct a tremendous amount of basic analysis that is very important for understanding data. You'll be able to get frequency distributions along with basic statistics and graphs, descriptive statistics, and crosstabulations with related statistics, and to explore the information contained in the data by using the "Explore..." feature. For now, Frequencies will be enough.

In the Frequencies window (Exhibit 8), highlight the variable(s) you want to analyze and move them to the currently blank "Variable(s):" box by clicking on the arrowhead. (You can also move the variables by double-clicking on each one.) The window should now look like Exhibit 9. You can allow SPSS to analyze the data using preset "default" options, or you can choose your own special features by clicking on the "Statistics," "Charts," and "Format" buttons.

Running Frequencies with the default settings on the "time_last_visited" variable produces the frequency distribution shown in the Output window in Exhibit 10. This output is very useful, and we'll soon see how to improve its appearance and obtain more information.

Exhibit 8

The Statistics table shows that 769 customers provided valid answers to question 1 and that the answers for 10 people were missing (i.e., they did not answer or their answers were unintelligible).

The frequency distribution appears next in Exhibit 10. Check back to the questionnaire to better understand the meaning of the values in the frequency distribution for "time_last_visited." These values indicate when the respondent last used the Central Public Library. "Earlier this week" was reported by 215 customers, "Last week" by 249, and so on.
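The columns of a Frequencies table follow simple arithmetic: Percent uses all cases as its base, Valid Percent uses only the cases with a valid answer, and Cumulative Percent accumulates the Valid Percent column. A minimal Python sketch, assuming a small hypothetical list of codes in which `None` stands for a system-missing answer (the function name `frequency_table` is illustrative only):

```python
from collections import Counter

def frequency_table(values):
    """Sketch of a Frequencies-style table for coded answers.

    values: list of codes; None marks a system-missing answer.
    Returns (rows, n_valid, n_missing), where each row is
    (code, frequency, percent, valid percent, cumulative percent).
    """
    total = len(values)
    valid = [v for v in values if v is not None]
    n_valid = len(valid)
    counts = Counter(valid)
    rows = []
    cumulative = 0.0
    for code in sorted(counts):
        freq = counts[code]
        percent = 100 * freq / total          # base: all cases
        valid_percent = 100 * freq / n_valid  # base: valid answers only
        cumulative += valid_percent           # running total of valid percent
        rows.append((code, freq, round(percent, 1),
                     round(valid_percent, 1), round(cumulative, 1)))
    return rows, n_valid, total - n_valid

# Ten hypothetical respondents; one did not answer.
answers = [1, 2, 2, 3, None, 2, 1, 9, 2, 3]
rows, n_valid, n_missing = frequency_table(answers)
print(f"Valid N = {n_valid}, Missing = {n_missing}")
for row in rows:
    print(row)
```

Applied to the library data, the same bookkeeping gives the 769 valid and 10 missing cases (out of 779) reported in the Statistics table.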

As the values increase, the period between the time of the survey and the respondent's previous library visit lengthens. Notice, however, that successive values do not correspond to equal lengths of time. The difference between "Earlier this week" and "Last week" is much less than the difference between "Week before last" and "Three to four weeks ago."

Exhibit 9

This means that the "Time_last" variable has ordinal information properties (i.e., "Week before last" designates a longer time ago than does "Last week"). Please note that "time_last_visited" does not have interval properties, since the time distances between the answer categories are not equal. Nor does "time_last_visited" have ratio properties: the length of time denoted by "Last week," or 2, is not twice the time denoted by "Earlier this week," or 1, and the length of time represented by "One year ago or longer," or 8, is not twice that indicated by "Three to four weeks ago," or 4, and so on.

Exhibit 10

If a variable has ratio properties, one value can be divided by another and the quotient makes sense. Also, there is a natural zero point.

We've now discussed the top three levels of information content of data: ordinal, interval, and ratio. The lowest level is nominal: a nominal value is typically a word or a number used to represent a name or category in the data set rather than a value having a specific numerical attribute. The legitimate measures of central tendency for these levels of information content are:

Information level    Measure(s) of central tendency
nominal              mode
ordinal              mode, median
interval             mode, median, mean
ratio                mode, median, mean
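The table above can be turned into a simple rule: compute only the measures that are legitimate for a variable's level. The Python sketch below uses the standard `statistics` module; the `ALLOWED` mapping and the `central_tendency` function are hypothetical. For ordinal codes it uses `median_low`, which always returns an actual category code; SPSS may report a different median when there is an even number of cases (for example, by averaging the two middle codes), so treat this as an illustration of the rule rather than a reproduction of SPSS output.

```python
import statistics

# Legitimate measures of central tendency for each level of information
# content, following the table above.
ALLOWED = {
    "nominal": ("mode",),
    "ordinal": ("mode", "median"),
    "interval": ("mode", "median", "mean"),
    "ratio": ("mode", "median", "mean"),
}

def central_tendency(values, level):
    """Return only the measures that are legitimate for `level`."""
    allowed = ALLOWED[level]          # KeyError for an unknown level
    result = {"mode": statistics.mode(values)}
    if "median" in allowed:
        if level == "ordinal":
            # Keep the median on an existing category code.
            result["median"] = statistics.median_low(values)
        else:
            result["median"] = statistics.median(values)
    if "mean" in allowed:
        result["mean"] = statistics.mean(values)
    return result

codes = [1, 2, 2, 3, 4, 8]   # hypothetical Time_last answers
print(central_tendency(codes, "ordinal"))   # mode and median only
print(central_tendency(codes, "nominal"))   # mode only
```

Asking for the mean of an ordinal variable is simply not offered, which mirrors the choice made later in the Frequencies Statistics window.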

Before we re-enter the Frequencies window to get more information, let's work a little to understand how to assign labels to the values of "Time_last." This is very easy to do in SPSS for Windows by following these steps:

1) In the Data View window (Exhibit 4), select the tab labelled "Variable View" at the bottom left corner of the window (see Exhibit 11), or double-click on the box that contains the variable name "RID" (first column). The Variable View window in Exhibit 12 will appear.



2) This Variable View window provides substantial flexibility for configuring the data in the Data Editor. All of the variables in the data set are listed in this window. Scroll down the window and you will see the complete list of variables and their characteristics.

3) Each column allows you to set or change some characteristic of the variable. The Name column lets you establish or change the name assigned to a variable, and the Label column lets you give it a longer, descriptive label. For "Time_Last," you may change the name to "Q1" or any other name you like that abides by the SPSS naming rules. You can also change the Label for that variable. Since the variable "RID" does not have a label, you may type a suitable variable label into the Label box for the "RID" row; for example, "respondent identification." When you do this, think of how your output tables will look.

Exhibit 11

Exhibit 12

4) Explore the other boxes in the "Time_Last" row by clicking in the right part of each box. You can adjust the Type, Width, number of Decimals, Values (Value Labels), Missing, the number of Columns, Align, and the Measure features.

5) Click on the Values box for the "Time_Last" variable, and then click again on the gray segment in the rightmost part of that box. You should see Exhibit 13. Each value that "Time_Last" can take has a description that has been typed into the Variable View window. Click on the first label and you will see that the number 1 appears in the Value box and "This week" appears in the Label box. If you like, you can change the label for 1 to another that you feel is more appropriate. Notice that the last five variables, "live_where," "occupation," "work_where," "birth_year," and "language_first," do not have value labels. Try typing in value labels for at least one of those variables.

Exhibit 13

6) You may change these labels as you see fit. You can switch back and forth between the Variable View and the Data View screens by clicking the tabs having those names in the lower left corner of the screen.

7) "Missing Values" is a very important function found in the Variable View window. Notice in Exhibit 10 that Missing System has a frequency of 10. This means that 10 people did not provide any answer to the "Time_Last" question. In the Data View window, scroll down and you will see that no value is recorded for "Time_Last" for 10 respondents (SPSS displays these system-missing values as a period).

Now, click in the far right of the Missing box for "Time_Last" and you will see Exhibit 14. Notice that "No missing values" is selected for "Time_Last." From the Value Labels window, you can determine that "Never before today" has the value 9. If you want to treat this answer as a missing value, click in the circle for "Discrete missing values" and type 9 into the leftmost box. Click OK (see Exhibit 15).

Exhibit 14



Now, let's produce a frequency distribution for this altered "Time_Last" variable. Pull down the Analyze menu, go to the Descriptive Statistics menu, and select Frequencies. Move "Time_Last" into the Frequencies window and click OK. You should get the frequency distribution shown in Exhibit 16. "Never before today," value 9, is now listed as Missing. Notice that the Valid Percent column no longer includes the 13 people who said "Never before today" or the 10 people who gave no response to this question; both groups are excluded from the base on which the valid percentages are calculated. Now reverse the process (select "No missing values" again) so that 9 is once more a valid response for "Time_Last."

Exhibit 15

Exhibit 16
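Declaring 9 as a discrete missing value changes only the base of the Valid Percent column. Using the counts quoted in this guide (779 cases, 10 with no answer, 13 who said "Never before today," and 249 who said "Last week"), a short Python calculation shows the effect on the "Last week" row. The variable and function names are hypothetical, and the results are computed from those counts rather than read from Exhibit 16, so compare them with your own output.

```python
# Counts for Time_Last quoted in this guide.
TOTAL_CASES = 779
SYSTEM_MISSING = 10       # no answer recorded
NEVER_BEFORE_TODAY = 13   # code 9
LAST_WEEK = 249           # code 2

def valid_percent(count, total, missing_counts):
    """Percent of `count` among the cases that are not treated as missing."""
    valid_base = total - sum(missing_counts)
    return 100 * count / valid_base

# The Percent column always uses all 779 cases as its base.
percent = 100 * LAST_WEEK / TOTAL_CASES
# Code 9 valid: base = 779 - 10 = 769
before = valid_percent(LAST_WEEK, TOTAL_CASES, [SYSTEM_MISSING])
# Code 9 declared missing: base = 779 - 10 - 13 = 756
after = valid_percent(LAST_WEEK, TOTAL_CASES, [SYSTEM_MISSING, NEVER_BEFORE_TODAY])

print(f"Percent:                  {percent:.1f}")
print(f"Valid Percent, 9 valid:   {before:.1f}")
print(f"Valid Percent, 9 missing: {after:.1f}")
```

The Percent figure (about 32.0) does not move, while Valid Percent rises from about 32.4 to about 32.9 because the base shrinks from 769 to 756.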

Back to our analysis path. Re-enter the Frequencies window and select "Time_Last" as the variable to analyze. Click on the "Statistics..." button and the window in Exhibit 17 will appear. SPSS presents a wide range of statistics that can be calculated and displayed with the frequency distribution. While some of these statistics are relevant, others are not appropriate for some types of variables. Remember that "Time_Last" is an ordinal variable, so the mode and median have been checked in the Statistics window (Exhibit 17); the mean is not appropriate here. Our frequency distribution will now be more legible and provide more information than our first attempt did. The frequency distribution, with a variable label and value labels, is the same as in Exhibit 10, with the addition of the mode and median in the Statistics box, as shown in Exhibit 18.

Exhibit 17

You probably also noticed the Charts button in the Frequencies window. Click the Charts button and the window in Exhibit 19 will appear; select the Bar charts and Percentages options. There are several options available at this point. We have chosen a bar chart displaying percentages. You might want to experiment with different options until you get the chart you like best.

Exhibit 18

Exhibit 19



Exhibit 20a shows the bar chart with percentages for each of the bars and the value labels displayed along the horizontal axis.

The output from SPSS can be saved and used in other programs, such as word processors like MS Word or page layout programs like PageMaker and MS Publisher.

Bar charts should normally be displayed with horizontal bars and with the value labels written horizontally to the left of the bars. To change the appearance of the graph, double-click on it and the SPSS Chart Editor will appear. The Chart Editor gives you substantial flexibility in changing the graph to suit your taste. Work with the editor until your chart looks like Exhibit 20b.

Exhibit 20a



Exhibit 20b



Cross-Tabulations

Now we're ready to move on to the next level of analysis. This involves investigating the relationships between pairs of variables. For example, library customers were asked to indicate why they used the library: for leisure reading, for school, for learning how to do things, for their jobs, or for personal research. Customers were also asked to indicate their sex.

During the analysis of survey research data, dependent or criterion variables, such as "why they used the library," are investigated to see if they are related to other variables. It is especially important to determine whether the relationship is such that particular patterns can be identified and predictions made.

In this library survey, we will now investigate whether there is any relationship between "why they used the library" and the sex of the respondents. Pull down the Analyze menu, select Descriptive Statistics, and then select "Crosstabs..." as shown in Exhibit 21.

The Crosstabs window will appear next. This window has a few more options than the Frequencies window did, but it operates in much the same way. For ease of reading and interpretation, always place the dependent variable (if there is one) in the row position, sometimes called the stub, and the independent variable in the column position, referred to as the banner.

Now, select "Why_use_library" from the list of variables and send it to the "Row(s):" box by clicking on the top arrow button. Send "sex" to the "Column(s):" box by selecting "sex" and clicking on the "Column(s):" arrow button (see Exhibit 22). If you clicked on the OK button now, a crosstab would be produced. But let's do a bit more before getting a table.

We'll specify some statistics to accompany the table and help interpret it by clicking on the "Statistics..." button. The window in Exhibit 23 should now be visible on the computer screen. Although you may ask for all of the statistics displayed in the window, it's better to ask for only those that are relevant to the analysis. In this case, click the box for Chi-Square and the box for Lambda. These statistics will be explained only briefly here; you'll need to read about them in a marketing research text or a statistics text for a more detailed explanation.

Exhibit 21

Exhibit 22

Exhibit 23

Click on the Continue button in the Crosstabs: Statistics window and then click on the Cells button in the Crosstabs window. The Crosstabs: Cell Display window, shown in Exhibit 24, will now be visible on the screen. The Observed box under Counts will already be checked. Now, check the Column box under Percentages and click the Continue button.

Your last necessary action is to click the OK button in the Crosstabs window and wait a few seconds. Part 1 of the Crosstabs output is the table of "Why use library by Sex" in Exhibit 25.

Exhibit 24

The table is straightforward to interpret. Of the 348 females in the study, 136, or 39.1%, said they used the library for leisure reading. Only 29.3% of the males reported this reason. The mode for the men was personal research, at 31.9%, just slightly larger than the 29.3% for leisure reading. Overall, leisure reading was the most often cited reason for using the library (33.7%).
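Column percentages divide each cell by its column (banner) total, which is why the dependent variable belongs in the rows. The Python sketch below builds a crosstab from (row, column) answer pairs and converts it to column percents; `column_percents` is a hypothetical name. To check it against the figures above, it collapses every reason other than leisure reading into a single hypothetical "Other" row. The male counts are derived rather than quoted: 771 - 348 = 423 males and 260 - 136 = 124 male leisure readers, using the totals that appear in the expected-frequency calculation later in this section.

```python
from collections import Counter

def column_percents(pairs):
    """Crosstab of (row, column) answer pairs, as percents of each column.

    The row value is the dependent variable (stub) and the column value
    the independent variable (banner).
    Returns {(row, column): percent of that column's total}.
    """
    pairs = list(pairs)                        # we iterate twice
    cells = Counter(pairs)                     # cell counts
    col_totals = Counter(col for _, col in pairs)
    return {(row, col): 100 * n / col_totals[col]
            for (row, col), n in cells.items()}

# Leisure reading vs. all other reasons, by sex (see lead-in for the counts).
pairs = ([("Leisure reading", "Female")] * 136 + [("Other", "Female")] * 212
         + [("Leisure reading", "Male")] * 124 + [("Other", "Male")] * 299)
pct = column_percents(pairs)
print(f"Women, leisure reading: {pct[('Leisure reading', 'Female')]:.1f}%")
print(f"Men, leisure reading:   {pct[('Leisure reading', 'Male')]:.1f}%")
print(f"All respondents:        {100 * 260 / 771:.1f}%")
```

The three printed figures match the 39.1%, 29.3% and 33.7% quoted from the table, which is a useful sanity check when you copy crosstab results into a report.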

Notice in the crosstab that there seem to be some substantial differences between the ways in which women and men use the library. Women reported using the library more for leisure reading and for school. The responses from men indicate that personal research is their primary use for the library, and that men cite it much more often than women do. The second most frequently mentioned use by men was leisure reading (29.3%), about 10 percentage points lower than for women (39.1%). Men used the library for their jobs and for learning how to do things more often than women did.

Exhibit 25

Reporting differences between groups is a very important function of a marketing research analysis. Sometimes these explanations are more important than the actual statistical analysis. However, this reporting should concentrate primarily on contrasts that are large enough to be substantial for business reasons and that are statistically significant.

In marketing research, the primary focus is on highlighting information about the business environment, the competitive landscape, and customer attitudes and behaviour that can help managers make better marketing and business decisions. Statistical significance matters most when it supports findings that are substantial enough to help with those decisions. Sometimes, differences between data values are statistically significant but so small that they make little or no difference to running the business.

Exhibit 26

When preparing tables for reports, the style of Exhibit 26, with column percents, is more readable for most audiences. The base size, i.e., the total number of respondents on which the table is built, should always be provided.



This discussion leads us to the statistical output for the last crosstab. Remember that you asked to see the Chi-Square test and the Lambda results. The Pearson Chi-Square test provides a value of 60.184, with 4 degrees of freedom and a Significance of 0.000 (Exhibit 27). (The 4 degrees of freedom come from (5 - 1) × (2 - 1): five reasons for using the library and two sexes.) Unless you're familiar with this test, you're probably not very excited by these findings. The value 60.184 is a statistic that measures the relative differences between the observed frequencies in the table and the frequencies that would be expected in the survey data if the null hypothesis of statistical independence between the two variables were true.

The expected frequency for the first cell of the table (women who use the library for leisure reading) is the row total times the column total, divided by the total number of respondents in the table: (260 × 348)/771, or 117.35. So, under the null hypothesis, we would expect 117.35 women to say that they use the library primarily for leisure reading. The survey found that 136 women actually said leisure reading was their main reason.
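The expected-count rule (row total × column total ÷ table total) is easy to check. The Python sketch below reproduces the 117.35 figure and also computes this cell's contribution, (observed - expected)² / expected, to the Pearson Chi-Square statistic. The 60.184 reported by SPSS is the sum of such contributions over all ten cells (five reasons × two sexes); it cannot be reproduced here because the full table is not listed in the text. The function names are hypothetical.

```python
def expected_count(row_total, col_total, grand_total):
    """Expected cell count if the two variables are independent."""
    return row_total * col_total / grand_total

def chi_square_contribution(observed, expected):
    """One cell's term in the Pearson Chi-Square sum."""
    return (observed - expected) ** 2 / expected

# First cell: leisure reading (row total 260) among women (column total 348)
expected = expected_count(260, 348, 771)
observed = 136
contribution = chi_square_contribution(observed, expected)
print(f"expected = {expected:.2f}, observed = {observed}")
print(f"contribution to Chi-Square = {contribution:.2f}")
```

This one cell contributes roughly 3 of the 60.184 total, so most of the Chi-Square value comes from the other nine cells.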

You could calculate these expected frequencies for all the cells in the table and visually try to figure out whether the two sets of numbers are very similar or very different. (In fact, SPSS will provide a crosstab with expected and observed values. To use this feature, we would have checked the Expected box in Exhibit 24.) In some tables, the observed and expected counts might be obviously close or obviously dissimilar. In most cases, however, it's very hard to tell by inspection which is the case. This is why the Significance value is provided in the Pearson row of the Chi-Square output.

Exhibit 27

Ho: Why use library and Gender are statistically independent.

Ha: Why use library and Gender are not statistically independent, i.e., they are related.

Alpha or Type 1 risk = 0.05

To tell whether the observed and expected frequencies are close, you just have to compare the Significance to the benchmark value of 0.05 for the risk of a Type 1 error. In our example, the Significance of 0.000 (SPSS rounds to three decimals, so this means less than 0.0005) is much smaller than 0.05, so we conclude that the observed survey frequencies are very different from the frequencies that would be expected if the null hypothesis of independence between sex and primary use of the library were true. Therefore, we should reject independence between the two variables and conclude that there is a dependency or relationship between sex and use of the library.
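The Significance value is the upper-tail probability of the chi-square distribution with the reported degrees of freedom. For an even number of degrees of freedom (4 here) that probability has a closed form, so a short standard-library Python sketch can show why SPSS prints 0.000: the true value is tiny but not zero. The function name is hypothetical; for odd degrees of freedom you would use a statistics library instead (for example, SciPy's `scipy.stats.chi2.sf`).

```python
import math

def chi_square_upper_tail(x, df):
    """P(X >= x) for a chi-square variable with an even number of df.

    Closed form for df = 2k: exp(-x/2) * sum_{i=0}^{k-1} (x/2)**i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive, even df")
    half = x / 2
    term = 1.0    # (x/2)**0 / 0!
    total = 0.0
    for i in range(df // 2):
        if i > 0:
            term *= half / i   # builds (x/2)**i / i! from the previous term
        total += term
    return math.exp(-half) * total

ALPHA = 0.05   # risk of a Type 1 error
p = chi_square_upper_tail(60.184, 4)
print(f"p = {p:.1e}, displayed to three decimals as {p:.3f}")
print("Reject independence" if p < ALPHA else "Do not reject independence")
```

The result is about 2.7 × 10^-12, far below both 0.0005 and the 0.05 benchmark. As a check on the formula, the familiar 5% critical value for 4 degrees of freedom, 9.488, gives a tail probability of almost exactly 0.05.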

This finding tends to confirm our visual inspection of the table, discussed previously. When the statistical test confirms your visual inspection, you should feel good. Many cases will not be this obvious.

So, we've concluded that a relationship exists between primary use of the library and sex. The next question is whether this is helpful for anything. Your first concern should always be whether the data itself helps you to explain something important to your client, even if the difference is not statistically significant. (I know. This might sound like statistical heresy. However, your #1 concern is to explain the phenomenon being studied in a way that is of value to your business client. Statistical testing exists to help you when your visual observation does not allow you to draw definitive conclusions about the data story. Don't let the statistics lead you into making statements that might appear silly to a business manager. Managers want findings that help them make good business decisions, and they are only rarely concerned about "statistically significant" findings. That's reality!)

Strength of the Relationship

Next, how strong is the relationship between ‘Why use library’ and sex? Please don't confuse this question with "Is the data of any value?" The data might have value to your client even though no statistically significant relationship exists. It had better have some value ... they've paid for the research!

The strength of the relationship can be measured in several ways; the measure used here is asymmetric Lambda with ‘Why use library’ specified as the dependent variable. Lambda gives the proportional (percentage) reduction in the error of predicting "Why do you use the Library?" when you use the information about sex, compared to the error of predicting ‘Why use library’ without any information about an independent variable.

Exhibit 28 contains information helpful for judging the strength of the relationship. Lambda with ‘Why use library’ dependent has a value of 0.022. This means that the error of predicting the respondent's reason for using the library can be reduced by 2.2% if information about the respondent's sex is used. Doesn't that seem like a really minor benefit?

If information about the independent variable ‘sex’ allowed you to always predict the person's reason for using the library accurately, then the value of Lambda would be 1.0, its maximum. For example, Lambda would be 1.0, and you could predict perfectly, if every male respondent used the library for personal research and every female respondent used it for leisure reading.

If information about sex did not help you at all to predict why a person used the library, then Lambda would have the value 0.00. In this example, Lambda would be zero if the most often mentioned use of the library by women was, say, leisure reading and the most often mentioned use by men was also leisure reading. In that case, it really doesn't matter what the respondent's sex is: you'd still guess that leisure reading is the main use of the library for males and for females.

Exhibit 28

Perfect prediction (Lambda = 1.0): every female uses the library for leisure reading and every male for personal research.

                      Female   Male
Leisure reading          348      0
School                     0      0
How to do                  0      0
Job                        0      0
Personal research          0    423
Total                    348    423

Knowing sex does not change the prediction: leisure reading is the most common use for both females and males.

                      Female   Male
Leisure reading          348    423
School                     0      0
How to do                  0      0
Job                        0      0
Personal research          0      0
Total                    348    423

Notice that the most often cited reason for using the library among all 771 respondents was leisure reading. Women most often mentioned leisure reading, but men stated personal research most frequently. So our guess changes depending on whether the person is male or female. Whenever this switch occurs in the guess of the dependent variable's value based on the value of the independent variable, Lambda must be greater than zero. The larger the value of Lambda, the stronger the relationship between the two variables. Once again, the largest value of Lambda is 1.0, when the prediction would be perfect.

In our example, the value of 0.022 indicates a very weak relationship between sex and ‘Why use library’, with ‘Why use library’ as the dependent variable. Notice that although a higher percentage of men used the library for personal research than for any other reason, this 31.9% is just slightly higher than the 29.3% of men who said leisure reading, which was the reason most often cited by women. When the two highest percentages in a column are very close in value, expect Lambda to be small.
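The "reduction in prediction error" idea behind Lambda can be written out in a few lines. This Python sketch uses the standard Goodman-Kruskal definition of asymmetric Lambda with the row variable as the dependent variable: errors without the independent variable come from always guessing the overall most common row, and errors with it come from guessing the most common row within each column. The first two tables are the illustrative tables of Exhibit 28; the third is hypothetical. Note that in the second Exhibit 28 table every respondent gives the same reason, so there are no errors to reduce; strictly, Lambda is undefined there, and this sketch returns NaN rather than 0.

```python
import math


def goodman_kruskal_lambda(table):
    """Asymmetric Lambda with the ROW variable as the dependent variable.

    table[i][j] is the count for dependent category i (e.g. reason for using
    the library) and independent category j (e.g. female, male).
    """
    n = sum(sum(row) for row in table)
    # Errors without the independent variable: always guess the overall modal row.
    errors_without = n - max(sum(row) for row in table)
    # Errors with it: within each column, guess that column's modal row.
    errors_with = n - sum(max(col) for col in zip(*table))
    if errors_without == 0:
        # Every case is in one category: nothing to predict, Lambda undefined.
        return math.nan
    return (errors_without - errors_with) / errors_without


# The two illustrative tables of Exhibit 28 (rows: Leisure reading, School,
# How to do, Job, Personal research; columns: Female, Male).
perfect = [[348, 0], [0, 0], [0, 0], [0, 0], [0, 423]]
all_leisure = [[348, 423], [0, 0], [0, 0], [0, 0], [0, 0]]
# Hypothetical table in which leisure reading is the modal use for both sexes.
same_mode = [[100, 90], [50, 80]]

print(goodman_kruskal_lambda(perfect))
print(goodman_kruskal_lambda(all_leisure))
print(goodman_kruskal_lambda(same_mode))
```

The `same_mode` table shows the zero case described in the text: both columns have the same most common answer, so knowing the column never changes the guess.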

We've looked at just two of many statistics that are available

from Crosstabs. Depending on the types of variables being

analyzed, you might decide to select other statistics to be

presented by SPSS. These are discussed in the Help menu, in

your marketing research text, and in statistics texts.

The Sparse Cells Rule

One more item before we move on. Notice the line in Exhibit 29 that says "a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 23.02." This might seem innocuous, but it's really quite important. This information indicates whether the Chi-Square analysis can be used or whether it must be scrapped.

In this example, the value of 23.02 is perfectly adequate, and no further thought needs to be paid to constraints on the analysis. Without getting into a lot of details, this value, the "minimum expected frequency," must be 1.0 or more or the Chi-Square analysis is not valid. The first part of the note, "0 cells (.0%) have expected count less than 5," is also very good news. Look for that percentage to be less than 20% or you could be getting bad information from the Chi-Square test.

A rule of thumb for the Chi-Square test is that no more than 20% of the cells in the crosstab should have expected frequencies less than 5 and none should be less than 1. Please realize that this is a rule of thumb, not a hard-and-fast rule.

Exhibit 29

The Sparse Cells Rule

There is some flexibility for interpretation. One should be

cautious if between 20% and 25% of the cells have an expected

frequency of less than 5. If the data looks highly unusual, don't

rely on Chi-Square. If the data is fairly "normal" looking, then

consider using Chi-Square. If between 25% and 30% of the cells

have expected frequencies less than 5, be very hesitant to use

the Chi-Square test results. If 30% or more of the cells have

expected frequencies of less than 5 then do not use the Chi-

Square test.

If the minimum expected frequency is less than 1.0 or more than 20% of the cells have expected frequencies less than 5, investigate the crosstab to identify why these numbers are so low. Typically, there is at least one row or column with very small expected frequencies. If combining two columns or two rows makes sense for those variables, this could solve the problem with the sparse cells rule and open the door to using the Chi-Square test. Remember that this rule applies to expected frequencies, not to observed frequencies. It would be helpful for you to return to Exhibit 24, select expected frequencies, and relate the expected frequencies in the resulting crosstab to the sparse cells indicator under the Chi-Square table. Do this for several crosstabs and relate the expected frequencies to the sparse cells rule.
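The sparse cells check can be reproduced from the expected counts. The Python sketch below computes the percentage of cells with expected counts under 5 and the minimum expected count, then applies one reading of the rule-of-thumb thresholds described above (the exact boundary handling is an assumption). It also shows the remedy of combining two rows. The 3 x 2 table is hypothetical, and `combine_rows` is a helper invented for this illustration.

```python
def expected_counts(table):
    """(row total * column total) / grand total for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def sparse_cells_check(table):
    """Return (% of cells with expected count < 5, minimum expected count, advice)."""
    cells = [e for row in expected_counts(table) for e in row]
    pct_below_5 = 100.0 * sum(e < 5 for e in cells) / len(cells)
    min_expected = min(cells)
    # One reading of the rule-of-thumb thresholds described in the text.
    if min_expected < 1.0 or pct_below_5 >= 30:
        advice = "do not use Chi-Square"
    elif pct_below_5 > 25:
        advice = "be very hesitant"
    elif pct_below_5 > 20:
        advice = "be cautious"
    else:
        advice = "OK"
    return pct_below_5, min_expected, advice


def combine_rows(table, i, j):
    """Merge rows i and j (e.g. two similar answer categories) into one row."""
    merged = [a + b for a, b in zip(table[i], table[j])]
    return [row for k, row in enumerate(table) if k not in (i, j)] + [merged]


# Hypothetical 3 x 2 crosstab whose last row is very small.
table = [[40, 35],
         [30, 45],
         [2, 3]]
print(sparse_cells_check(table))                      # 2 of 6 cells below 5
print(sparse_cells_check(combine_rows(table, 1, 2)))  # after merging rows
```

Merging only makes sense when the two categories belong together substantively; the code will happily merge any pair of rows.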

T-tests for Differences Between Means

The variables that were analyzed in the last section, use of the

library and sex, were both nominal in their information content.

Now, we'll investigate what can be done when the dependent

variable is interval or ratio. From the questionnaire, we see that

customers were asked whether they agree or disagree with the

statement that the Central Public Library "has too many rules." A

6-point scale was used that went from "Strongly disagree" to

"Strongly agree." Let's assume that this scale has interval

properties, although some might disagree. With this assumption,

we can calculate means, variances, and other parametric

statistics.

We will now ask the question "Is the mean level of agreement for women the same as it is for men?" A t-test for independent samples will be used for the analysis. To access this test, pull down the Analyze menu, slide out the Compare Means menu, and select "Independent-Samples T Test..." (see Exhibit 30). The Independent-Samples T Test window pops up, as shown in Exhibit 31. Now, select the variable "rules" as the Test Variable and "sex" as the Grouping Variable. You'll see that "sex(? ?)" appears in the Grouping Variable window. Press the "Define Groups..." button and you'll be asked to provide the specified values for the two groups: enter 1 for Group 1 and 2 for Group 2 (see Exhibit 32). Press the Continue button and then the OK button.

Exhibit 30

The table in Exhibit 33 shows the descriptive statistics. The mean for females was 1.86 for "Too many rules in the Library," slightly less than "somewhat disagree" on the questionnaire scale. For males the mean was 2.09, slightly more positive than "somewhat disagree." Although both females and males "somewhat disagreed" with this statement, the males disagreed a little less strongly than the females that there were too many rules at the library. Is this difference of 0.23 of a scale point meaningful for the way in which library management should serve females compared to males? And is the 0.23-point difference significant in a statistical sense?

Exhibit 31

Exhibit 32

Exhibit 33

As we had a hypothesis for the Chi-Square test, we should state

one for this test as well. An appropriate null hypothesis is that

the mean level of agreement with "The Central Public Library has

too many rules," is the same for men as for women. The

alternative hypothesis is that the means are different. Let's use a

5% risk of a Type I error once again.

The t-test gives us the information we need for deciding whether

to reject or not reject the null hypothesis. Before we test

whether the two means are the same or different, there's an

intermediate test that must be done. This involves testing

whether the variances of the two distributions, the variance of

the distribution for men and the variance for women, are equal

or different.

This test is conducted using Levene's Test for Equality of Variances. This, of course, involves another null hypothesis: that the variances of the two distributions are the same. This is tested using the F value and P value from Levene's test, as shown in Exhibit 34. The F value can be treated like we did the Chi-Square value: it's a statistic that's presented by SPSS, but its number alone does not provide much information to most of us.

The "Sig." value, sometimes called the P value, produced by Levene's test tells us where to find the information regarding the independent-samples t-test for the difference between two means. If the P value is greater than 0.05 (5%), conclude that the two variances are not different, i.e., do not reject the null hypothesis, and read the row in the table that begins "Equal variances assumed" in Exhibit 35. (In SPSS, Exhibits 34, 35, and 36 are one table. They were segmented here for ease of presentation.)

If the P value is less than or equal to 0.05, read the "Equal variances not assumed" row instead. In our example, Levene's test provides F = 0.387 and P = 0.534. Therefore, we can obtain the appropriate t-test information by reading the "Equal variances assumed" row of the last block of the table in Exhibit 35.

Exhibit 34

Please note that this is just the left half of the SPSS output for the t-test.

Levene’s Test Hypotheses

Ho: The variance for the distribution of ‘Rules’ for males is statistically the same as the variance for females.

Ha: The two variances are statistically different for males and females.

alpha or Type I risk = 0.05
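Levene's F is simply a one-way analysis of variance run on each answer's absolute distance from its own group's mean: if one group's answers are much more spread out, those distances are larger on average. The Python sketch below computes that statistic for hypothetical ratings (not the survey data). It assumes the mean-centered version of the test, which is the classic Levene statistic; it does not compute the "Sig." value, which comes from the F distribution with (number of groups - 1, N - number of groups) degrees of freedom and is reported by SPSS. (If you use `scipy.stats.levene` instead, note that its default is `center='median'`; pass `center='mean'` for the classic version.)

```python
from statistics import mean


def levene_f(*groups):
    """Levene's F statistic for equality of variances, computed as a one-way
    ANOVA on the absolute deviations of each value from its group mean."""
    deviations = []
    for g in groups:
        m = mean(g)
        deviations.append([abs(x - m) for x in g])
    k = len(deviations)
    n = sum(len(d) for d in deviations)
    group_means = [mean(d) for d in deviations]
    grand_mean = mean([v for d in deviations for v in d])
    ss_between = sum(len(d) * (gm - grand_mean) ** 2
                     for d, gm in zip(deviations, group_means))
    ss_within = sum((v - gm) ** 2
                    for d, gm in zip(deviations, group_means) for v in d)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical 1-6 ratings of "too many rules" (not the survey data).
female = [1, 2, 2, 1, 3, 2, 1, 2]
male = [2, 3, 1, 2, 3, 2, 2, 1]
print(f"Levene F = {levene_f(female, male):.3f}, df = (1, {len(female) + len(male) - 2})")
```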

The "Equal variances assumed" row of the table provides a t-value of -2.41 and a 2-tailed Significance of 0.016 in Exhibit 35. (The t-value is negative because SPSS subtracts the Group 2 mean, males, from the Group 1 mean, females.) Since this significance level is less than 0.05 (5% risk), we should reject the null hypothesis and conclude that the means for men and women are different. In fact, we can see that the mean for men is significantly higher than the mean for women. Consistent with this, the 95% "Confidence Interval of the Difference" shown in Exhibit 36 does not contain zero; if it did, we could not conclude that the means are significantly different.
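The two rows of the SPSS t-test table differ only in how the standard error of the mean difference is computed. The Python sketch below shows both: a pooled variance for "Equal variances assumed" and separate variances (Welch's version) for "Equal variances not assumed". The ratings are hypothetical, and the confidence interval uses the large-sample value of about 1.96 as a simplifying assumption; SPSS uses the t distribution, which gives a slightly wider interval in small samples.

```python
import math
from statistics import NormalDist, mean, variance


def independent_t(group1, group2):
    """Independent-samples t statistics, mean difference = group1 - group2."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = variance(group1), variance(group2)  # sample variances (n - 1)
    diff = mean(group1) - mean(group2)
    # "Equal variances assumed": pooled variance, df = n1 + n2 - 2.
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se_equal = math.sqrt(pooled * (1 / n1 + 1 / n2))
    # "Equal variances not assumed" (Welch): separate variances.
    se_unequal = math.sqrt(v1 / n1 + v2 / n2)
    return {"diff": diff,
            "t_equal": diff / se_equal, "se_equal": se_equal,
            "t_unequal": diff / se_unequal, "se_unequal": se_unequal}


def approx_ci95(diff, se):
    """Large-sample 95% interval using the normal value 1.96 (an approximation;
    SPSS uses the t distribution)."""
    z = NormalDist().inv_cdf(0.975)
    return diff - z * se, diff + z * se


female = [1, 2, 2, 1, 3, 2, 1, 2]   # hypothetical Group 1 ratings
male = [2, 3, 1, 2, 3, 2, 2, 1]     # hypothetical Group 2 ratings
res = independent_t(female, male)
low, high = approx_ci95(res["diff"], res["se_equal"])
print(f"t = {res['t_equal']:.3f}; approx. 95% CI = ({low:.3f}, {high:.3f})")
print("CI contains zero" if low <= 0 <= high else "CI excludes zero")
```

With equal group sizes the two standard errors coincide; they diverge when both the group sizes and the variances differ.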

Exhibit 35

Now, how should we interpret this for the library? In non-statistical language, we should say that men disagreed significantly less than women did with the statement "The Hamilton Central Public Library has too many rules." Men stated that they "somewhat disagreed" with that statement, while the women's rating was significantly lower than the men's rating. However, the substantive difference between 1.86 and 2.09 on a 6-point attitude scale seems trivial from a business perspective.

What could you do with this finding if you were the CEO of the library? Probably not very much. You might simply conclude that both male and female customers felt that there were not too many rules in the library. Perhaps this finding would allow the library to justifiably introduce new rules that it feels would benefit customers and the library overall.

Exhibit 36

T-tests for Differences Between Paired Answers

Library customers were asked to answer "How important to you personally are your visits to this Library?" and "How important to your career are your visits to this Library?" These are questions 13 and 14 in the questionnaire in the appendix. The answers were on 6-point importance scales that went from "Extremely important" (6) to "Not important at all" (1).

13. How important to you personally are your visits to this

library?

[ ]6 EXTREMELY IMPORTANT

[ ]5 VERY IMPORTANT

[ ]4 FAIRLY IMPORTANT

[ ]3 SOMEWHAT IMPORTANT

[ ]2 SLIGHTLY IMPORTANT

[ ]1 NOT IMPORTANT AT ALL

14. How important to your career are your visits to this library?

[ ]6 EXTREMELY IMPORTANT

[ ]5 VERY IMPORTANT

[ ]4 FAIRLY IMPORTANT

[ ]3 SOMEWHAT IMPORTANT

[ ]2 SLIGHTLY IMPORTANT

[ ]1 NOT IMPORTANT AT ALL

The average for "personal importance" was 4.71 and for "career importance" was 3.63. It seems natural to ask whether there is a difference between these two means. However, imagine yourself being asked these two questions in sequence. Do you think that you might look back and forth between them and calibrate your answers? For example, if you said that your library visits are “very important” to you personally, do you think your answer to career importance might be referenced to that answer in Q13? Most people would, either consciously or unconsciously.

Because of this dependence between the answers to these two importance questions, we will use a Paired-Samples T Test rather than the Independent-Samples T Test of the previous section. Remember that with the Independent-Samples T Test we were comparing the means of two separate groups, males versus females. In this example, each respondent answered both questions, so we simply want to find out whether the mean of the differences between individuals’ ratings of the importance to them personally and to their careers is different from zero.

As you did with the Independent-Samples T Test, pull down the Analyze menu and slide out the Compare Means menu (Exhibit 37). Now, select "Paired-Samples T Test..." and the window in Exhibit 38 appears.

Exhibit 37


Select the pair of variables, IMPPER and IMPCAR, and place them in the Paired Variables box (Exhibit 38), then press the OK button. The statistical analysis then appears in the output window.

The Paired T Test provides the descriptive statistics for the two variables in Exhibit 39: means, standard deviations, and standard errors for each of the variables. The correlation between the two variables is shown in Exhibit 40. The "Sig." next to the "Correlation" indicates whether these two variables are significantly correlated. In this case, the correlation is 0.369 and the 2-tailed significance of 0.000 indicates that this correlation is significantly different from zero, as one might reasonably expect (i.e., the importance of library visits to the respondents personally and the importance to their careers are significantly correlated).

Exhibit 38

Please note that you must fill each line in the ‘Paired Variables:’ window with pairs of variables. If you put only one variable in a line, the ‘OK’ button will be a dull gray. This is a tip-off that you have not completed a command.

Exhibit 39

Exhibit 40

Exhibits 41a and 41b provide the information directly relevant to

the t-test. The mean of the differences between each customer's

answers to the questions is given as 1.0671. The standard

deviation of 1.7128 and standard error of 0.063 are also printed.

In addition, the "95% Confidence Interval" is stated as 0.9439 to

1.1903 and does not include zero. The T value in Exhibit 41b is

17.005 and the "Sig. (2-tailed)" value is 0.000.

These statistics provide very strong evidence that people consider visits to the library to be significantly more important to them personally than to their careers. This conclusion is reached because the "Sig. (2-tailed)" value is smaller than 0.05 (5% risk). The T value of 17.005 is very large. (You might remember that, for large samples, T values larger than 1.96 or smaller than -1.96 are judged to be significantly different from zero at a 5% level of risk.)

Exhibit 41a

Exhibit 41b
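A paired test reduces each respondent's two answers to a single difference and then asks whether the mean difference is far from zero, measured in standard errors. The Python sketch below shows that calculation on five hypothetical respondents (not the survey data); the names `impper` and `impcar` simply mirror the variable names in the data set. As a rough check on the reported output, the mean difference divided by its standard error, 1.0671 / 0.063, gives about 16.9, which agrees with the reported T value of 17.005 once you allow for the rounding of the standard error.

```python
import math
from statistics import mean, stdev


def paired_t(x, y):
    """Paired-samples t: work with each respondent's difference x - y."""
    d = [a - b for a, b in zip(x, y)]
    mean_d = mean(d)
    sd_d = stdev(d)                    # sample standard deviation of differences
    se_d = sd_d / math.sqrt(len(d))    # standard error of the mean difference
    return mean_d, sd_d, se_d, mean_d / se_d


# Hypothetical answers to Q13 (IMPPER) and Q14 (IMPCAR) from five respondents.
impper = [5, 6, 4, 5, 6]
impcar = [3, 5, 3, 4, 4]
mean_d, sd_d, se_d, t = paired_t(impper, impcar)
print(f"mean diff = {mean_d:.3f}, SD = {sd_d:.3f}, SE = {se_d:.3f}, t = {t:.3f}")

# Check against the reported summary values: mean difference / standard error.
print(round(1.0671 / 0.063, 1))  # about 16.9; SPSS's 17.005 uses the unrounded SE
```

With only five hypothetical respondents the 1.96 cut-off from the text would not apply; it is a large-sample rule.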

Analysis of Variance (One-Way)

Think back to the problem addressed with Independent-Samples T Tests: determining whether a statistically significant difference exists between two means. If we extend that problem to three or more means, then we need to use One-Way Analysis of Variance. As with the T Tests, pull down the Analyze menu, slide out the Compare Means option, and choose "One-Way ANOVA..." as shown in Exhibit 42. The One-Way ANOVA window should now be showing (see Exhibit 43).

Exhibit 42

Move the variable "Overall satisfaction" into the Dependent List

window. Then move the variable "Motivation," which stands for

"Which of the following best describes how you happened to

come to the Central Library today?" to the Factor window. The

"Motivation" variable has the following three possible answers:

1. Library was the one special reason for this trip;

2. Library was one of several things I wanted to do on this

trip; and

3. Thought of Library after starting out to do something

else.

Exhibit 43

Your next job is to request some additional output that will help you to interpret the ANOVA results. To do this, press the "Post Hoc..." button. The window in Exhibit 44 will open, providing a long list of options. Without going into an explanation here, please click on the box for "Tukey's-b," and then click Continue. This will put you back in the One-Way ANOVA window. Now press the Options button.

Exhibit 44

Exhibit 45

In the One-Way ANOVA: Options window (Exhibit 45), click on the Descriptive box and the Means plot box, and then press the Continue button. You're now back in the One-Way ANOVA window shown in Exhibit 43. Click OK and let SPSS crank away at its calculations.

In a few seconds the ANOVA output that appears in Exhibits 46-49 will be on your computer screen. The descriptives in Exhibit 46 show the three means and their confidence intervals.

There's quite a bit of detail in the ANOVA output, too much to explain in this tutorial. Please refer to your marketing research or statistics text for more complete explanations. The most important piece of information in Exhibit 47 is the "Sig." value of 0.023. This indicates whether the F value of 3.794 is significantly larger than 1.0. It is the level of significance for the test of whether the mean satisfaction levels differ among the three groups of customers, grouped by how they happened to be in the library on the day of the survey.

Exhibit 46

The null hypothesis is that the three means are the same. Do you think that the means of 5.47, 5.53 and 5.25 are the same or are they different?

The F statistic is the ratio of the two Mean Squares, which are really variances.

If the null hypothesis is true, the F ratio should be statistically close to 1.0.

Exhibit 47

This F Sig. is less than 0.05 (5% risk), which leads us to conclude that there is a significant difference among the three means, listed as 5.47, 5.53 and 5.25 in Exhibit 46. Also, notice the 95% confidence intervals and whether or not they overlap. The F Sig. indicates that the F ratio of 3.794 is significantly larger than 1.0 and that at least one of the three means differs significantly from at least one of the others. The F test alone does not tell us which means differ; that is the job of the post hoc test.
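The F ratio described in Exhibit 47 is the between-groups mean square divided by the within-groups mean square. The Python sketch below computes it for hypothetical satisfaction ratings in the three "Motivation" groups (not the survey data). It returns only F and its degrees of freedom; the "Sig." value comes from the F distribution and is reported by SPSS.

```python
from statistics import mean


def one_way_anova(*groups):
    """Return (F, (df_between, df_within)) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    group_means = [mean(g) for g in groups]
    # Between-groups sum of squares: how far the group means sit from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-groups sum of squares: spread of answers around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, (k - 1, n - k)


# Hypothetical overall-satisfaction ratings for the three "Motivation" groups.
one_reason = [6, 5, 6, 5, 6]
one_of_several = [6, 6, 5, 6, 5]
afterthought = [5, 4, 5, 6, 5]
f, dfs = one_way_anova(one_reason, one_of_several, afterthought)
print(f"F = {f:.3f} with df = {dfs}")
```

When the group means are identical the between-groups mean square is zero, so F is zero; the further apart the means are relative to the spread within groups, the larger F becomes.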

Exhibit 48 is the result of asking for the "Tukey's-b" test in the Post Hoc window. The key part of this output is the grouping of means into homogeneous subsets. The mean for "Afterthought" stands by itself in Subset 1 with a mean value of 5.25. The other two means, for "One Reason" (mean of 5.47) and "One of Several (reasons)" (mean of 5.53), are grouped in Subset 2. While this test is not definitive, it indicates that the mean overall satisfaction with the library for those who visited as an afterthought was significantly lower than the overall satisfaction for the other two groups, which are considered not to be statistically different from each other. Tukey's-b assumes equal variances; if the variances differ among the groups, one of the other post hoc tests should be used (see the Equal Variances Not Assumed options in Exhibit 44).

Exhibit 48

The Means Plot is shown in Exhibit 49. These graphs often help in interpreting the ANOVA output by providing a visual perspective on the analysis. No statistical test information is provided in these plots. However, you can probably see that the mean satisfaction for those who said their visits were an "Afterthought" is much lower than the means for those who said that going to the library was their "One reason" for going out or was "One of Several (reasons)" for going out.

Exhibit 49a

Always be careful with tables and graphs. Notice the vertical scale in Exhibit 49a; it is very tight. By double-clicking on a graph, you can alter almost all of its aspects. I rescaled the graph to produce Exhibit 49b, using the question's scale of 1 to 6. It is still legitimate to consider satisfaction when going to the library as an afterthought to be statistically lower than satisfaction when the library was your primary reason or one of several reasons. However, that difference must be considered relative to the scale on which respondents answered the question. This finding makes sense statistically. And, while the difference is not large when considered on the scale, it should make the library management think about motivating residents to think of the library more often.

Notice that in Exhibit 49b, the vertical scale was changed and the font size was increased on the axis labels. Try doing this yourself; SPSS offers tremendous versatility in designing graphs. Plus, when you get a format that you like, you can save it as your template for other graphs.

Exhibit 49b

Linear Regression

Up to this point, we have always been investigating one variable,

relationships between two variables, or differences between two

or more groups. Now we will begin to consider multivariate

analysis (i.e., more than two variables and how they are related).

To do this, we'll use simple linear regression (still bivariate) and

multiple linear regression (multivariate analysis).

In our library example, we will strive to determine those aspects

of the library and the staff that have the greatest influences on

overall satisfaction with the library. The first part of the process

should be familiar to you. Pull down the Analyze menu then slide

out the Regression menu, as you see in Exhibit 50. Choose

"Linear..." for linear regression, and the Linear Regression

window opens (see Exhibit 51).

We will be working only with linear regression in this section. However, look at all of the variations listed in the Regression slide-out menu in Exhibit 50. Each of those methods offers several options of its own. In addition, many of the other statistical methods listed in the Analyze menu are variations of regression or are related to regression.

In the Linear Regression window, select the overall satisfaction variable and place it in the Dependent section. Now, select the variables "Always noisy [Noisy]" to "Staff is easily approachable [Approachable]" plus "Convenience of Hours [Hours]" from the left window and slide them into the Independent(s) window. Pull down the menu within “Method:” and select Stepwise. Although there are many other options that can be selected in Linear Regression, we've done the basics. Click the OK button and SPSS will calculate and display quite a large number of statistics.

Since we selected Stepwise Regression, SPSS will present the step-by-step results obtained by adding, on each step, the variable that makes the greatest contribution. (Sometimes a previously entered variable is removed on a step.)

Regression is an extremely important and extensively useful statistical technique. Your time would be well spent in understanding regression methods, execution techniques, and interpretation.

Exhibit 50


Several basic statistics are important to understand in regression. The first are R, R-square, and Adjusted R-square. The Model Summary in Exhibit 52 provides these statistics for each of the six regressions of this analysis.

R is the multiple correlation coefficient: the correlation between the observed values of the dependent variable and the values predicted by the regression. R-square is, of course, the square of R and is called the coefficient of determination. R-square indicates the proportion of variation in the dependent variable, “Satisfaction”, that is explained by the set of independent variables in the analysis at that stage. We'd like this to be as large as possible (the maximum of R-square is 1.0). The R-square of 0.235 might not seem all that impressive to you at first. However, let's look at the whole picture. The Adjusted R-square reduces R-square to account for the number of independent variables relative to the sample size.
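The usual formula for Adjusted R-square is 1 - (1 - R-square) x (n - 1) / (n - k - 1), where n is the number of cases and k the number of independent variables. The short Python sketch below applies it to the R-square of 0.235, assuming that value belongs to the six-variable model; the sample sizes are hypothetical, since they are not given here. The adjustment is large for small samples and shrinks as n grows.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R-square for n cases and k independent variables."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


# R-square = 0.235 with six independent variables; sample sizes are hypothetical.
for n in (50, 100, 700):
    print(n, round(adjusted_r_squared(0.235, n, 6), 3))
```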

Exhibit 51

You'll see the ‘standard error of estimate’ statistic listed next.

That statistic by itself is often not very informative. However, it is

very helpful when comparing two or more regressions. We

should be led to consider more favorably those regressions from

a set that have smaller standard errors and larger R-squares. As

you can see in Exhibit 52, each regression shows improved

values of the Rs and the ‘Std. Error of the Estimate’. This is what

should happen, and typically does occur, in a stepwise

regression.

Exhibit 53 shows the ANOVA for each of the six steps in the regression. The lower part of the table shows the variables that are added to the regression equation on each step: ‘Easy-to-use arrangement’ was included first, and ‘Always noisy’ (the library is always noisy) was added on the sixth step.

Exhibit 52

These letters represent the successive models, as SPSS adds variables to the regression equation. The stepwise procedure strives to improve the solution on each step.


The key information to get from the Analysis of Variance part of the table is that ‘Sig.’ = .000 for each of the regressions. Since this value is less than 0.05 (5% risk), we can conclude that these regressions do provide some potentially valuable information. If ‘Sig.’ were greater than 0.05 and a 5% level of risk was being used, the conclusion would be that the set of variables was not helpful in explaining overall satisfaction. Even though the F value gets smaller with each step, ‘Sig.’ is still .000 (i.e., less than 0.0005), even in Model 6.

After the regression passes this test of whether it is worth working with, we can proceed to the part of the table that provides specific information on the variables being analyzed. The variables in the regression equation are listed in Exhibit 54 along with: B, the regression coefficient; the Std. Error of B; Beta, the standardized regression coefficient, a measure of the relative impact of each variable on Overall satisfaction; T, which measures how many standard errors B lies from a slope of zero; and the ‘Sig.’ of T. This last measure is compared to 0.05; if it is smaller than or equal to 0.05, the corresponding variable might have a significant effect on the dependent variable.

Notice that ‘Always noisy’ and ‘Staff is too busy to assist’ have negative coefficients. This is because they were stated in a negative fashion in the questionnaire (e.g., "The Library is always noisy"). All six variables have Sig. t values less than 0.05. (This is the work of the Stepwise method, which allows into the analysis only those variables that are significant.)

If you arrange the Beta values in order by absolute value, the following relative impacts can be seen:

• Easy to use arrangement (Beta = 0.210; mean = 4.84)
• Staff appears knowledgeable (Beta = 0.186; mean = 5.24)
• Great contribution to Hamilton (Beta = 0.148; mean = 5.52)
• Convenience of hours (Beta = 0.114; mean = 4.96)
• Staff is too busy to assist (Beta = -0.112; mean = 2.34)
• Always noisy (Beta = -0.097; mean = 2.33)

The null hypothesis is that the slope, B, of the variables is zero. The values of t measure how far each unstandardized regression coefficient is from the null hypothesized value of zero.

Exhibit 54

The regression equation can now be written as:

Satisfaction = 3.57 + 0.115*(Easy to Use Arrangement) + 0.145*(Staff Appears Knowledgeable) + 0.105*(Great Contribution to Hamilton) + 0.056*(Convenience of Hours) - 0.052*(Staff is too busy to assist) - 0.050*(Always noisy).
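To see what the equation does, here is a small Python sketch that plugs ratings into it. The coefficients and mean ratings are copied from the equation and list above; the dictionary keys are hypothetical short names chosen for this illustration. Substituting the six mean ratings gives about 5.51, close to the reported mean overall satisfaction of 5.48; the small gap presumably comes from rounding in the published coefficients and means. The second calculation shows that each B is the change in predicted satisfaction for a one-point change in that rating, with the others held fixed.

```python
# Unstandardized coefficients (B) from the regression equation above.
# The dictionary keys are hypothetical short names for the six variables.
INTERCEPT = 3.57
COEFFICIENTS = {
    "easy_to_use_arrangement": 0.115,
    "staff_appears_knowledgeable": 0.145,
    "great_contribution_to_hamilton": 0.105,
    "convenience_of_hours": 0.056,
    "staff_too_busy_to_assist": -0.052,
    "always_noisy": -0.050,
}

# Mean ratings listed with the Beta values above.
MEAN_RATINGS = {
    "easy_to_use_arrangement": 4.84,
    "staff_appears_knowledgeable": 5.24,
    "great_contribution_to_hamilton": 5.52,
    "convenience_of_hours": 4.96,
    "staff_too_busy_to_assist": 2.34,
    "always_noisy": 2.33,
}


def predict_satisfaction(ratings):
    """Predicted overall satisfaction = intercept + sum of B * rating."""
    return INTERCEPT + sum(b * ratings[name] for name, b in COEFFICIENTS.items())


print(round(predict_satisfaction(MEAN_RATINGS), 2))
# Raising "easy to use arrangement" by one scale point adds B = 0.115.
better = dict(MEAN_RATINGS, easy_to_use_arrangement=5.84)
print(round(predict_satisfaction(better) - predict_satisfaction(MEAN_RATINGS), 3))
```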

The main value of the regression results usually does not come from directly substituting values of the independent variables into the equation. Rather, by knowing the relative impact of each variable, we can better understand the actions that can influence the satisfaction of the library's customers. The impact of the individual variables is typically judged from the Beta values, which are treated as ordinal indicators.

The mean scores for the six factors indicate that the Library received very credible, if not outstanding, scores on all dimensions. The two lowest means are for "noisy" and "busy." Because these two statements are negatively worded, low means are favorable; even so, once the wording is reversed (a mean of 2.33 on the 1-to-6 scale corresponds to about 4.67 in the positive direction), they remain the two weakest ratings, with "arrangement" next. If management were to identify areas where they might be able to increase an already very high level of overall satisfaction (mean = 5.48), noise in the Library and the appearance of staff being too busy might be areas to investigate. However, keep in mind that with the already extremely high satisfaction, and the favorable scores on noise and staff assistance, there might not be much direct payback from this investigation. Keeping these scores favorable should result in continuing high levels of customer satisfaction.

Caution. Ideally, the final set of independent variables in a regression should be correlated with the dependent variable but not highly correlated with each other. This should be checked as a normal part of any multiple regression analysis. The term multicollinearity is used to describe this condition of dependence among the independent variables. A substantial amount of collinearity among the independent variables can make the individual coefficients, and the ranking of impacts based on them, unreliable; using such findings could lead to erroneous marketing decisions. Since this topic requires more statistical background than is appropriate for this tutorial, please look in your marketing research text or in a statistics text.

This has been a very cursory treatment of regression. Once

again, the objective has been to provide a quick introduction to

SPSS, not a refresher course in statistics. Be aware that effective

use of regression relies on a more complete understanding of

the topic than has been presented here. Please consult texts in

statistics and marketing research for fuller discussions of this

important topic.

Setting Up the Data

Before analyzing data, it is important to understand how to set

up a data table in SPSS so that it can be analyzed and so that the

output is as usable and attractive as possible. The Library SPSS

data set was fully formatted and you had no work to do before

beginning the analysis of the data. However, it will be essential

for working with other data sets to read and understand this

section on how to set up and format a data set.

The typical result of a survey is numeric and character (string) data

that may be arranged in many different ways. If the survey was

fielded through a commercial marketing research field agency,

the client can ask to have the data produced in almost any

convenient format. A very basic way of providing data is in an

Excel spreadsheet or in a tab- or comma-delimited text file, either

of which can be very easily imported into SPSS.

As an example, the Library data has been included in an Excel file

named just Library Excel. Part of that data in the Excel format is

included in Exhibit 55. Note that the first row contains the

variable names that you saw in the SPSS version of that data file

in Exhibit 4. Of course, those names had to be typed into that

first row of the Excel spreadsheet or into the SPSS file. For this

data, it really does not matter whether those names were typed

into Excel or SPSS.

An important feature of any statistical analysis program is that

data can be entered easily and that data that was provided

originally in either an Excel file or in a delimited text file can be

imported with very little work.

The first step in importing the Library Excel.xls file into SPSS is shown in

Exhibit 56: pull down the File menu, slide over to Open and select Data. After

clicking on Data, you will see a window that looks similar to

Exhibit 57. You will probably see something different from the

Downloads folder that appears in the central window of Exhibit

57. Pull down the ‘File of type:’ window and select Excel (*.xls,

*.xlsx, *.xlsm). Then navigate to the folder in which you saved

the data files for this tutorial. When you arrive at that folder,

e.g., ‘C:\Marketing Research’, you should see a window similar

to Exhibit 58. Now, just click on the Library Excel.xls file. You will

then see Exhibit 59. The Library Excel.xls file contains variable

names in the first row, so leave that box checked in Exhibit 59.

Just click “OK” and the data set will open in SPSS, as seen in

Exhibit 60.

Exhibit 55

Exhibit 56
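The same idea applies when the data arrive as a tab- or comma-delimited text file: the first row supplies the variable names and every later row is one respondent. The Python sketch below (standard library only) mimics that behaviour. The three column names and their values are hypothetical and do not reflect the actual layout of the Library file.

```python
import csv
import io

# Hypothetical comma-delimited export; the first row holds the variable names.
RAW = """id,visits,spend
1,4,12.50
2,1,999
3,999,0
"""


def read_delimited(text, delimiter=","):
    """Return (variable_names, rows); each row maps a variable name to a float."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    rows = [{name: float(value) for name, value in record.items()} for record in reader]
    return reader.fieldnames, rows


names, rows = read_delimited(RAW)
print(names)              # the header row becomes the variable names
print(len(rows), "cases")
```

Passing `delimiter="\t"` reads a tab-delimited version of the same file.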

As you can see, the SPSS spreadsheet in Exhibit 60 looks very

similar to the data in Exhibit 55. However, you will have noticed

that the Value Labels that are shown in Exhibit 4 do not appear

in Exhibit 60, and they can’t be made to appear by pulling down

the View menu and clicking Value Labels, because the Excel file

contains no value labels. What do you think the 999s in Exhibit 60

signify? As you probably guessed, those 999s indicate “missing

values”. Respondents 5, 8 and 20 did not answer the question that

asked for the number of times the library was visited during the

past month. Also, respondents 2 and 19 did not enter an amount of

money they expected to spend during their trips to the library.

Until 999 is declared in the Missing column of the Variable View,

however, SPSS will treat it as a real value in any analysis.

Exhibit 57

Exhibit 58

Exhibit 59

Exhibit 60
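Leaving a code such as 999 undeclared can badly distort results, because it is averaged in like any genuine answer. The Python sketch below, using made-up visit counts, shows how a single 999 inflates a mean and how excluding declared missing codes restores it. The function name `valid_mean` is illustrative only and is not an SPSS feature.

```python
MISSING_CODES = {999}  # user-missing code used for unanswered questions


def valid_mean(values, missing_codes=MISSING_CODES):
    """Mean and valid N after excluding user-missing codes; (None, 0) if nothing is valid."""
    valid = [v for v in values if v not in missing_codes]
    if not valid:
        return None, 0
    return sum(valid) / len(valid), len(valid)


# Hypothetical visits in the past month; the fifth respondent skipped the question.
visits = [4, 2, 6, 3, 999, 5]

naive_mean = sum(visits) / len(visits)
mean, n_valid = valid_mean(visits)
print(f"999 treated as data: mean = {naive_mean:.2f}")
print(f"999 treated as missing: mean = {mean:.2f} (valid N = {n_valid})")
```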

Entering Data into PASW Statistics 18®

Up to now you've been working with a prepared data set. Your

next step is to go through a brief introduction to using the SPSS

Data Editor. Entering a data set into SPSS Windows® is a very

simple and logical process.

The first thing to do is pull down the File menu, slide over to New

and click Data, as shown in Exhibit 61. You will then see a fully

blank spreadsheet (Exhibit 62), which will act as your vehicle for

entering data.

Exhibit 61

Exhibit 62

Let's say that you have a very short four-question survey. The

variables in the data set are:

• ID - respondent identification number;

• Usage - the number of times out of the last 10 purchases that the brand of interest was purchased;

• Intention - respondents' intentions to buy the brand of interest when they next buy the product, measured on a 5-point intention scale with 5 being Definitely Will Buy, 4 Probably Will, 3 Might or Might Not, 2 Probably Will Not, and 1 Definitely Will Not Buy; and

• Sex - the sex of the respondent.

To begin this process, click on Variable View in the lower left

corner of the spreadsheet. Now, type in "ID" under "Name" in

the first row and press the Enter key. Notice that SPSS

automatically entered information into the cells in the ID row of

the spreadsheet. (See Exhibit 63.) While SPSS should be

commended for trying to be helpful, some of the information is

correct, some is not and some needs to be completed. Of course,

the name ‘ID’ is correct and the Type is numeric. The width is

fine, but there will be no decimals in the respondent

identification values. Change the ‘2’ decimals to ‘0’. Type

‘Respondent identification number’ into the Label column. There

will be no Values and none of the respondents will be missing.

Exhibit 63

The number of columns can stay at 8 and the alignment can

remain as specified. The Measure of this variable, ID, is nominal.

Change Scale to Nominal by clicking in the right area of the

rectangle and pulling down the menu to Nominal. Three circles

should now show in that cell. The Role of Input can be changed

to None by clicking in the right hand area of the Role cell, pulling

down the menu and clicking on None. Now, your first row should

look like Exhibit 64.

Move down to the second variable row and type "Usage," and so

on. Notice that SPSS automatically fills in the other dimensions

of each variable with default values. You should see Exhibit 65 on

your screen.

You should now change the parameters for each variable so that

they are appropriate for that variable. To make those changes,

click in the right part of each cell. You should keep Type, Missing,

Columns and Align as they appear in Exhibit 65.

Exhibit 64

Exhibit 65

Change Decimals to 0 (zero) and change the Variable Labels to

those in Exhibit 66. Assign Value Labels, i.e., Values, as you see in

Exhibit 67 by clicking in the right part of the Values cell for a

variable and then typing in the label that you want for each

of the numerical values that variable might take. Adding Values

for the Intention variable is shown in Exhibit 67. You should also

change the Measure to Nominal for "ID," Scale for "Usage," Scale

for "Intention," and Nominal for "Sex."
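Each row of the Variable View is essentially a small record describing one variable. The Python sketch below models a simplified version of those rows for the four-question survey; it illustrates the idea and is not how SPSS stores its data dictionary. The Intention value labels come from the scale described above. The variable labels for Usage, Intention and Sex are placeholder wording (the labels you type should follow Exhibit 66), and no value codes for Sex are assumed.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """A simplified Variable View row: name, label, measure, decimals and value labels."""
    name: str
    label: str
    measure: str                      # "nominal", "ordinal" or "scale"
    decimals: int = 0
    value_labels: dict = field(default_factory=dict)

    def describe(self, code):
        """Return the value label for a code, or the code itself when no label exists."""
        return self.value_labels.get(code, str(code))


INTENTION_LABELS = {
    5: "Definitely Will Buy",
    4: "Probably Will",
    3: "Might or Might Not",
    2: "Probably Will Not",
    1: "Definitely Will Not Buy",
}

dictionary = [
    Variable("ID", "Respondent identification number", "nominal"),
    Variable("Usage", "Purchases of the brand in last 10", "scale"),   # placeholder label
    Variable("Intention", "Intention to buy next", "scale",            # placeholder label
             value_labels=INTENTION_LABELS),
    Variable("Sex", "Sex of respondent", "nominal"),                   # placeholder label
]

intention = dictionary[2]
print(intention.describe(5))   # the text shown when Value Labels are switched on
```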

The hard part is now finished. All you need to do next is to shift

to the Data View and type in the data. For respondent 1, if you

type in 1, 3, 3, 1, the first line of the Data View editor will look

like the table in Exhibit 68. Continue entering the data until your

Data View editor looks exactly like Exhibit 56. When you have

finished, pull down the File menu and Save the data using

whatever name you desire.

Exhibit 66

Exhibit 67

Exhibit 68

Manipulating the Data File

There will be occasions when you will need to change the data

that you originally entered in a file. We will illustrate two of the

most often used procedures: Select Cases and Recode. You are

highly encouraged to investigate and play with the other Data

functions; these are found under the Data, Transform and

Utilities menus.

Filtering the Data Set

Pull down the Data menu and notice Select Cases near the

bottom of the menu in Exhibit 69; this is used when you want to

work temporarily with only part of the data set. To use this

function, pull down the Data menu and click on "Select Cases..."

The Select Cases window appears, as in Exhibit 70. Let's say that

in our little practice data set we want to analyze only those

cases where the respondents had bought our brand at least 7

out of the last 10 times. Click on the circle next to "If condition is

satisfied" and then click on the "If..." button.

The "Select Cases: If" window pops up and you have the

opportunity to create an "if statement" that must be satisfied for

a case to remain active in your forthcoming analyses. You can

see, in Exhibit 71, that we have selected the ‘Usage’ variable and

moved it to the central section. Click on the ">=" button and

then click ‘7’ in the keypad. You may also just type in all of these

characters from the keyboard. While you are still in the window

of Exhibit 71, notice the tremendous versatility of this option.

While you may type complex statements for selecting cases, you

may also select functions from the ‘Function Group:’ menu to

help identify cases of interest.

Press the Continue button and you'll return to the Select Cases

window. You now have the option to filter out those cases that

do not satisfy the "If" statement, or delete those cases. Choose

"Filter" and then click OK.

Exhibit 69

Be very careful at this stage. Notice in Exhibit 72, which is the

Output block from Exhibit 70, that the ‘Filter out unselected

cases’ option is selected by default. There are two other options.

The second, ‘Copy selected cases to a new dataset’ is available if

you want to build a new dataset with only those respondents

who satisfy your selection criteria.

The last of the three options, ‘Delete unselected cases’ is

hazardous since cases will be eliminated permanently from your

dataset. There will be times when this is exactly what you want

to do. However, if you incorrectly click that button, you may be

very sorry.

Exhibit 70

After clicking the ‘OK’ button, you'll see the Data Editor, as

portrayed in Exhibit 73, with slashes through those cases that did

not meet the requirements of the ‘If’ statement. Notice that a

filter variable has been generated and placed in the file.

If you analyze your data now, you will be working with only the

three people who had bought the brand at least 7 out of the last

10 times. The two unselected cases can be brought back into the

data analysis by pressing the Reset button in the Select Cases

window, shown in Exhibit 70, and then clicking OK. If you wanted to

permanently delete those two respondents who had bought the brand 6 or

fewer times, you would click on the ‘Delete unselected cases’ button near the

bottom of the window in Exhibit 70.

Exhibit 71

Exhibit 72
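The difference between filtering and deleting can be seen in a small Python sketch. Filtering keeps every case and adds a 0/1 flag, which plays the role of the filter variable SPSS adds to the file; deleting returns a shorter data set in which the unselected cases are gone. Apart from respondent 1's Usage of 3, the values are hypothetical, chosen so that three of the five respondents bought the brand at least 7 times. The flag name `filter_flag` is also made up for this sketch.

```python
# Hypothetical practice data; only respondent 1's Usage (3) comes from the text.
cases = [
    {"ID": 1, "Usage": 3},
    {"ID": 2, "Usage": 8},
    {"ID": 3, "Usage": 7},
    {"ID": 4, "Usage": 6},
    {"ID": 5, "Usage": 10},
]


def bought_at_least_7(case):
    """The 'If' condition: Usage >= 7."""
    return case["Usage"] >= 7


def filter_cases(data, condition):
    """Filter: keep every case but add a 1 (selected) / 0 (unselected) flag."""
    return [dict(case, filter_flag=int(condition(case))) for case in data]


def active(flagged):
    """The cases an analysis would use while the filter is on."""
    return [case for case in flagged if case["filter_flag"] == 1]


def delete_unselected(data, condition):
    """Delete: unselected cases are dropped and cannot be recovered from the result."""
    return [case for case in data if condition(case)]


flagged = filter_cases(cases, bought_at_least_7)
print(len(flagged), "cases in the file,", len(active(flagged)), "active")
print(len(delete_unselected(cases, bought_at_least_7)), "cases left after deleting")
```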

Recoding Values

The Recode function is at least as valuable as the Select Cases

function. There are many situations in which the analyst wants

to change the values of the original variable in some way.

Perhaps the initial coding was not done correctly. In many cases,

the initial coding of a variable can be changed to assist with a

more informative analysis.

For example, in our data entry case you might want to recode

the Intention to Buy Next variable into "positive intention" (i.e.,

probably will buy and definitely will buy), and "negative

intention," meaning "probably will not buy" and "definitely will

not buy." We'll illustrate this recoding now.

Exhibit 73

First, pull down the Transform menu, as in Exhibit 74, and then

click on the ‘Recode into Different Variables…’ option. Be careful;

if you select ‘Recode into Same Variables…’, the original variable

will be replaced by the new recoding and lost forever.

Exhibit 74

Exhibit 75

The Recode into Different Variables window, shown in Exhibit 75, will

then open. Move the ‘Intention’ variable into the central

‘Numeric Variable’ window, rename the Output Variable as

"Intention2" and type the label ‘Intention to Buy, Pos & Neg’ into

the Label box. Now press the ‘Old and New Values…’ button and

the Recode into Different Variables: Old and New Values window

will open (Exhibit 76).

In the left window, highlight the circle next to ‘Range:’ and enter

1 in the top box, and 2 in the box below "through." In the New

Value box in the upper right, highlight the Value circle and enter

the value 1 (negative intention). Then click the Add button and you'll see this

recoding added to the ‘Old --> New’ window. Now, enter 4 and 5

in the two ‘Range:’ boxes, and 2 (positive intention) in the New Value box. Press

Add. Finally, highlight the ‘All other values’ circle at the bottom

left and ‘System-missing’ in the upper right-hand

side of the window. Click Add and then Continue. This will put

you back in the Recode window where you should press the

Change button and then click OK.

Exhibit 76

The results of your recoding are displayed in the Data Editor

matrix in Exhibit 78. Notice that the values of ‘Intention’ have

been transformed to ‘Intention2’ as you directed. The values of

the two cases that stated ‘might or might not’ (neither positive

nor negative) have been set to ‘missing’ (indicated by the dots).

When you analyze this variable, there will be fewer valid cases

than with the original variable, ‘Intention’. Two respondents will

be categorized as having positive intentions to buy and one as

having negative intentions to buy. Now, you should go into the

Variable View and provide value labels for ‘Intention2’. If you

want to save this transformation, be sure to click on the diskette

icon for ‘Save’ in the Tool Bar.

Exhibit 77

Exhibit 78
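The recoding can be mirrored in a few lines of Python. Positive intention means codes 4 and 5 (probably and definitely will buy) and negative intention means codes 1 and 2. Each (low, high) range is checked in order, the first match supplies the new code, and any other value becomes missing; `None` stands in for SPSS's system-missing value. As with Recode into Different Variables, the original list is left unchanged. Apart from respondent 1's answer of 3, the five Intention answers are hypothetical, chosen to match the two positive, one negative and two ‘might or might not’ responses described above.

```python
def recode(values, rules, else_value=None):
    """Recode into a different variable: return a new list, leaving `values` unchanged.

    rules is a list of ((low, high), new_code) pairs checked in order;
    values matching no rule get else_value (None plays the role of system-missing).
    """
    recoded = []
    for value in values:
        for (low, high), new_code in rules:
            if low <= value <= high:
                recoded.append(new_code)
                break
        else:  # no rule matched this value
            recoded.append(else_value)
    return recoded


# Hypothetical Intention answers (5 = Definitely Will Buy ... 1 = Definitely Will Not Buy).
intention = [3, 5, 3, 1, 4]

# 1-2 -> 1 (negative intention), 4-5 -> 2 (positive intention), all other values -> missing.
intention2 = recode(intention, [((1, 2), 1), ((4, 5), 2)])
print(intention2)
```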

Non-parametric Analysis

There are some situations where data is available that is not

conducive to parametric analysis. Parametric analysis, very

basically, refers to those statistical procedures that assume

particular probability distributions underlie the data. In some

cases, this assumption is obviously wrong or the analyst is

uncertain which distribution might be present. In these

situations, non-parametric procedures exist for analyzing the

data. We've already used two of these, the Chi-Square test and

Lambda. While there are many non-parametric tests that can be

very valuable, we will work with just two, the Friedman test and

another version of the chi-square test.

Friedman Two-way Analysis of Variance for Ranked Data

In this section, we'll look at one additional non-parametric test

that is often very helpful when analyzing survey data. Let's say

that you asked people to rank five Arnold Schwarzenegger

movies: Twins, True Lies, Eraser, Conan the Barbarian, and

Kindergarten Cop. Respondents were asked which of the five

they liked best (coded 1), second best (2), and so on. As an

example, we have the findings from 20 respondents. These are

provided in the file ARNOLD.SAV that accompanies this book.

The partial data matrix is presented in Exhibit 79.

A legitimate question is whether any of these movies is

significantly better liked or worse liked than the others.

Remember back to ANOVA; doesn't this question sound very

much like what was asked when you performed an ANOVA on

parametric data? However, we can't use the parametric ANOVA

that we used before, because rankings are ordinal, not interval, data.

Fortunately, a test called Friedman's Two-way Analysis of

Variance by Ranks does exist to specifically analyze this type of

data. Before beginning this test, make sure that the movie

variables are specified as ‘scale’ in the measure column of the

Variable View. Pull down the Analyze menu, slide over the

Nonparametric Tests menu, and then choose ‘Related Samples’.

(See Exhibit 80.)

Exhibit 79

The window for ‘Nonparametric Tests: Two or More Related

Samples’ opens and is shown in Exhibit 81. That window has

three tabs and Exhibit 81 is open to the ‘Fields’ tab. Select the

five movie titles from the left window and then click on the

arrow that will move them into the right box for ‘Test Fields’.

Exhibit 80

Next, click on the ‘Settings’ tab and you will see Exhibit 82. There

are two ways to run the Friedman test: 1) just leave the

‘Automatically choose the test based on the data’ button clicked

and click the ‘Run’ button; or 2) click on ‘Customize tests’

and then click on the box next to Friedman’s in the Compare

Distributions box. It might be instructive to do both.

Exhibit 81

Let’s follow the ‘Compare Distributions’ route. When you take the

‘Compare Distributions’ alternative and click on the Friedman’s

box, you’ll see that you can pull down the menu next to

‘Multiple comparisons’. That menu is shown in Exhibit 83. If you

click on ‘All pairwise’ in Exhibit 83, you will get an overall analysis

plus statistical comparisons between the movies in all possible

pairs. That analysis will indicate which movies are differently

ranked compared to each of the other movies. Choosing ‘Stepwise

step-down’ instead groups the movies into homogeneous subsets.

Exhibit 82

Exhibit 83

The overall analysis output in Exhibit 84 states the ‘Sig.’ to be

0.001 and the Decision is to reject the null hypothesis. In this

case, the null hypothesis is that all 5 of Arnold’s movies are

equally liked. Rejecting that null hypothesis means that at least

one of the movies is significantly better liked or worse liked than

the others; there may also be significant differences among

several of the movies.

Now, double click on the box portrayed in Exhibit 84 and

additional levels of analysis are displayed, as in Exhibit 85.

Actually, the display in SPSS is too wide to fit on one page,

so one plot on either side of the three shown was cropped from

the exhibit. You’ll now see in Exhibit 85 the Friedman test

statistic of 17.880 and the significance level. Also, you can see

that the graphs of the ranks are quite different.
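For readers who want to see where a statistic like 17.880 comes from, the textbook Friedman statistic for complete rankings without ties is chi-square_F = 12 × (sum of R_j²) / (n k (k + 1)) − 3 n (k + 1), where n is the number of respondents, k the number of items ranked and R_j the sum of the ranks given to item j; it is compared to a chi-square distribution with k − 1 degrees of freedom. The Python sketch below implements that formula. The four rankings are made up (ARNOLD.SAV is not reproduced here), so the result will not match Exhibit 85. The last line does check that a statistic of 17.880 with 4 degrees of freedom has a significance of about 0.001, consistent with Exhibit 84. SPSS may apply a correction for tied ranks, which this sketch does not.

```python
import math


def friedman(rankings):
    """Friedman chi-square for complete rankings (rows = respondents, columns = items, no ties).

    Returns the statistic, the degrees of freedom and each item's mean rank.
    """
    n = len(rankings)
    k = len(rankings[0])
    rank_sums = [sum(row[j] for row in rankings) for j in range(k)]
    statistic = 12.0 * sum(r * r for r in rank_sums) / (n * k * (k + 1)) - 3.0 * n * (k + 1)
    mean_ranks = [r / n for r in rank_sums]
    return statistic, k - 1, mean_ranks


def chi2_upper_tail_even_df(x, df):
    """P(chi-square > x); this closed form is exact only for an even number of df."""
    if df % 2:
        raise ValueError("this closed form needs an even number of degrees of freedom")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


# Hypothetical rankings of five movies by four respondents (1 = liked best).
rankings = [
    [1, 2, 3, 4, 5],
    [2, 1, 3, 5, 4],
    [1, 3, 2, 4, 5],
    [1, 2, 4, 3, 5],
]

stat, df, mean_ranks = friedman(rankings)
print(f"Friedman statistic = {stat:.3f} with {df} df; mean ranks = {mean_ranks}")
print(f"Sig. = {chi2_upper_tail_even_df(stat, df):.4f}")
print(f"Sig. for 17.880 with 4 df = {chi2_upper_tail_even_df(17.880, 4):.4f}")
```

The mean ranks returned here correspond to the Mean Rank figures SPSS reports: each item's rank sum divided by the number of respondents.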

At the bottom of the window that contains Exhibit 85 is a menu

bar that is shown in Exhibit 86. Not all of those options will be

available for all datasets. We will just investigate ‘Pairwise

Comparisons’ and ‘Homogeneous Subsets’.

Exhibit 84

If you select ‘All pairwise’, you will see a rather elaborate

statistical test that provides significance tests between all pairs

of the five movies, i.e., 10 tests in all. Plus, you get two graphs.

Exhibit 87 shows the pairwise tests covering all of the movies.

You’ll see that the analysis of two pairs of movies produced

significant differences, i.e., tests that were significant at the 5%

level of risk. True Lies and Conan were judged to be significantly

different, as were True Lies and Kindergarten Cop.

Exhibit 85

Exhibit 86

Above the table in Exhibit 87 in the SPSS output is a graph that is

shown in Exhibit 88. If you place your cursor on any of the lines,

the adjusted significance of the difference between the two

movies represented at the end nodes will be displayed. The Adj.

Sig. = .002 in the exhibit indicates that Kindergarten Cop and

True Lies are significantly different.

Exhibit 87
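A widely used follow-up to a significant Friedman test (Dunn-type comparisons with a Bonferroni adjustment) converts the difference between two mean ranks into a z statistic, z = (mean rank_i − mean rank_j) / sqrt(k(k + 1) / (6n)). It then takes the two-sided normal p-value and multiplies it by the number of pairs (10 for five movies), capping the result at 1. We believe this is close to how the ‘Adj. Sig.’ column is produced, but treat the Python sketch below as an assumption, not as SPSS's documented algorithm. The mean ranks are hypothetical, so the movies are labelled A to E; only the 20 respondents comes from the text.

```python
import math
from itertools import combinations


def pairwise_mean_rank_tests(mean_ranks, n):
    """Compare every pair of items; return (item_a, item_b, z, p, Bonferroni-adjusted p)."""
    k = len(mean_ranks)
    std_error = math.sqrt(k * (k + 1) / (6.0 * n))
    pairs = list(combinations(sorted(mean_ranks), 2))
    results = []
    for a, b in pairs:
        z = (mean_ranks[a] - mean_ranks[b]) / std_error
        p = math.erfc(abs(z) / math.sqrt(2))     # two-sided p-value from the normal curve
        adjusted = min(1.0, p * len(pairs))      # Bonferroni: multiply by the number of tests
        results.append((a, b, z, p, adjusted))
    return results


# Hypothetical mean ranks for five movies ranked by 20 respondents (they must average 3).
mean_ranks = {"A": 1.8, "B": 2.5, "C": 3.0, "D": 3.6, "E": 4.1}

for a, b, z, p, adjusted in pairwise_mean_rank_tests(mean_ranks, n=20):
    flag = "significant" if adjusted < 0.05 else ""
    print(f"{a}-{b}: z = {z:6.2f}, Sig. = {p:.3f}, Adj. Sig. = {adjusted:.3f} {flag}")
```

The adjustment explains why a pair can look significant on its unadjusted ‘Sig.’ yet not on ‘Adj. Sig.’: ten tests at once give more chances of a false alarm.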

You will now see a small button called ‘Layout’ in the menu bar

below the table in Exhibit 87. If you click that button, you’ll get

yet another graph. (See Exhibit 89.) That graph provides the

same information displayed in Exhibit 88 but is just drawn

differently.

If you choose ‘Stepwise step-down’ in Exhibit 83 and then select

‘Homogeneous subsets’ in Exhibit 86, a test will be performed to

essentially tell you which groups of movies are judged to be

similar among themselves and, perhaps, different from another

subset of movies, or just one other movie. That output is shown

in Exhibit 90.

Exhibit 88

Exhibit 89

The Mean Ranks for each of the movies are shown in Exhibit 90.

True Lies ranks highest with a mean rank of 1.80. The lowest

rank (highest mean) is Kindergarten Cop. The movies are listed in

two columns. The Subset 1 column indicates that True Lies and

Twins lie in that subset and Subset 2 contains Twins, again, plus

Eraser, Conan and Kindergarten Cop. In the pairwise analysis, we

saw that True Lies and Twins were not significantly different. We

also saw that True Lies was significantly different from Conan

and from Kindergarten Cop. So, most of the information from

the pairwise analysis is shown here. However, the detail of the

pairwise analysis does not all show in the homogeneous subsets

table. Essentially, there are two subsets with Twins belonging to

both. This should lead us to conclude that True Lies is the most

highly preferred of the five, although it is not significantly

different from Twins, and it is my personal favorite from the

Gubernator.

Exhibit 90

Chi-Square Test for Uniformity of a Frequency Distribution

The last test will be another nonparametric test, a Chi-Square

test that measures whether an observed frequency distribution

can be considered to be a uniform distribution (all values appear

with about the same level of frequency), or is significantly

different from a uniform distribution.

We'll return to the Central Public Library for an example. Let's

suppose that someone hypothesized before the survey that all

the floors of the library were used about equally by customers.

The frequency distribution appears in Exhibit 91 and the graph in

Exhibit 92.

The null hypothesis is that an equal percent of the sample stated

that they used each of the five floors of the public library. The

null hypothesis always sets up an expectation for the data. What

would that be in this example? Of course, we should expect

one-fifth, or 20%, of the 776 respondents who answered this

question to name each of the five floors, i.e., about 155.2

respondents per floor.

Exhibit 91
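The statistic behind this test compares each observed count O with its expected count E under the null hypothesis: chi-square = sum of (O − E)² / E, with (number of categories − 1) degrees of freedom. The Python sketch below computes it either for equal expected shares (the uniform case) or for any other hypothesized distribution supplied as relative weights, which corresponds to the custom-distribution option discussed in this section. The floor counts are made up, chosen only so that they total 776; the real counts are in Exhibit 91.

```python
def chi_square_goodness_of_fit(observed, weights=None):
    """Chi-square statistic, expected counts and df for a one-sample frequency test.

    weights are the hypothesized relative frequencies (rescaled to sum to 1);
    leaving them out tests for a uniform distribution.
    """
    n = sum(observed)
    k = len(observed)
    if weights is None:
        weights = [1.0] * k              # equal shares: the uniform null hypothesis
    total = float(sum(weights))
    expected = [n * w / total for w in weights]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, expected, k - 1


# Made-up counts for floors 1 to 5 that total 776 respondents.
floor_counts = [80, 150, 160, 220, 166]

stat, expected, df = chi_square_goodness_of_fit(floor_counts)
print(f"expected per floor = {expected[0]:.1f}, chi-square = {stat:.3f}, df = {df}")
```

To test against a distribution from an earlier study instead, pass its percentages as `weights`, e.g. `chi_square_goodness_of_fit(floor_counts, weights=[10, 20, 20, 30, 20])` (hypothetical percentages).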

This Chi-Square test is obtained by pulling down the Analyze

menu, sliding over the Nonparametric Tests menu, and choosing

‘One sample test…’ (See Exhibit 93). Open the ‘Fields’ tab. Then

click on ‘Use custom field assignments’ and select ‘Floor used

most’ and send it into the right side box, ‘Test fields’, by itself.

Exhibit 92

The simplest thing to do now is to just click the ‘Run’ button and

let SPSS do the calculations. If you want to have more control

over the process, click on the ‘Settings’ tab and then on

‘Customize settings’. Now, you can choose the type of statistical

non-parametric test that you would like to perform on the data.

The most appropriate test for the data and the hypothesis is the

chi-square test to compare the observed probabilities to the

hypothesized probabilities. When you click on that option, shown in

Exhibit 95, you should then click on the Options button. There

are two options: 1) to test whether the classes contain equal

percentages of respondents; or 2) to test against some other

distribution of percentages, which can be entered into the table

provided in Exhibit 95.

Option 2 allows you to test the observed percents against, for

example, a distribution that occurred the last time the study was

conducted. Or, perhaps someone hypothesized a distribution of

customers using the floors of the library based on their

observations. By entering those values in the table, you could

test whether those observations from the past are confirmed by

the data or not.

Exhibit 93

The output from this chi-square test is quite extensive, as was

the output from the Friedman test. Whether you chose the

default values or manually specified the chi-square test of the

uniform distribution, the first information produced is the table

shown in Exhibit 96.

You can see from Exhibit 96 that SPSS automatically chose the

one sample chi-square test if you did not choose it manually. Also, it

takes the variable label and sets up the null hypothesis in the

right-most pane. Notice that the ‘Sig.’ value is 0.000, which

indicates that if we were working with a 5% level of risk then we

should reject that null hypothesis. That means that if you felt

that the graph in Exhibit 92 did not look uniform, you were correct,

statistically.

Exhibit 94

Exhibit 95

Exhibit 96

Next, double click on that box in Exhibit 96 and the information

shown in Exhibit 97 will pop up for you. The chi-square test

statistic is 81.436. If you look in the back of your statistics or

marketing research textbook in the chi-square table, you’ll find

the critical value of 9.49 for a 5% level of risk of a type I error and

4 degrees of freedom (five floors minus one). That’s the critical value for the chi-square

distribution that cuts off 5% of the tail to the right of the

distribution. If the calculated chi-square value is 9.49 or higher,

then the null hypothesis should be rejected. In this case, the

81.436 is huge compared to 9.49 and you should feel very

comfortable in rejecting the null hypothesis. Of course, the

‘Asymptotic Sig.’ of 0.000 tells you that you would have a level

of risk much smaller than 5%, or 0.05, and should reject the null

hypothesis.

You can conclude that the floors are not used with the same

frequency; it appears that the fourth floor is used more often than

the others and that the first floor is used much less than the other

four.
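Both the table lookup and the ‘Asymptotic Sig.’ can be reproduced for this case. With 4 degrees of freedom, the chi-square upper-tail probability has the exact closed form P(chi-square > x) = e^(−x/2) × (1 + x/2). The Python sketch below uses it to find the 5% critical value by bisection and to evaluate the significance of the 81.436 reported in Exhibit 97. The closed form holds only for 4 degrees of freedom; other cases need a general chi-square routine.

```python
import math


def chi2_upper_tail_4df(x):
    """P(chi-square with 4 df > x), using the exact closed form e^(-x/2) * (1 + x/2)."""
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)


def critical_value_4df(alpha=0.05, low=0.0, high=100.0):
    """Bisection for the x whose upper-tail probability equals alpha (the tail shrinks as x grows)."""
    for _ in range(100):
        mid = (low + high) / 2.0
        if chi2_upper_tail_4df(mid) > alpha:
            low = mid      # too much probability still lies to the right: move up
        else:
            high = mid
    return (low + high) / 2.0


critical = critical_value_4df()
p_value = chi2_upper_tail_4df(81.436)
print(f"5% critical value with 4 df = {critical:.2f}")
print(f"Sig. for 81.436 = {p_value:.1e} (displayed by SPSS as 0.000)")
print("reject the null hypothesis" if 81.436 >= critical else "do not reject")
```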

Summary

This introduction to PASW (SPSS) Statistics 18® has covered quite

a bit of material. The intention was to give you a quick way to

become familiar with the capabilities of SPSS 18.0 for Windows®

and to give you some experience with a real data set. Also, you

established a small data set on your own that required many of

the basic actions of entering a survey data set of realistic

proportions.

The best way to gain knowledge of data analysis is to conduct a

detailed analysis on a real set of data. The data set for the

Hamilton Central Public Library is available to you and can serve

this purpose very well. Whether you're using PASW (SPSS)

Statistics 18® or PASW (SPSS) Statistics 18® Student Version,

SPSS provides enough functionality to allow you to conduct

professional-level analyses. I hope you benefit from these

exercises.

Appendix

Central Public Library

On-site Library User Survey

Questionnaire

(Shortened from original version)

It was necessary to reduce the number of variables in this questionnaire to fewer than 50 so

that it could be accommodated by the Student Version of PASW (SPSS) Statistics 18®. The

original version of this questionnaire was handed out to library customers during the hours of

operation for 7 consecutive days. Customers completed the questionnaires and deposited

them in collection boxes on each floor of the library. Several other surveys, besides this on-site

version, were carried out for the library during this study. This study was conducted several

years ago and is used here as a sample data set to assist with the learning of SPSS. We’re

grateful to the Hamilton Central Public Library for releasing this data for educational purposes.

Introduction to Survey Data Analysis Using SPSS Windows & Student Version

Copyright 1999, 2000, 2003 by Dr. Ken Deal No reproduction of this material can be made, nor can any derivatives be prepared, without the express written consent of Dr. Ken Deal.

Marketing Decision Research Inc.

Hamilton Central Public Library Community Opinion Study

Have you filled in this form before? If so, thank you very much. We don't expect you to fill it in again.

The Hamilton Central Public Library is in the process of conducting a study to determine how it can better serve this community. This questionnaire is being used to obtain the opinions of you and others who use this Library. We'd be very grateful if you would carefully complete all of the questions and return the questionnaire to the attendant or to the Survey Return Box on the Information Desk for this floor. Your answers will be treated anonymously and confidentially.

1. When was the last time that you were in this Library before today? (Please check the box.)

[ ]1 EARLIER THIS WEEK [ ]2 LAST WEEK [ ]3 THE WEEK BEFORE LAST [ ]4 THREE TO FOUR WEEKS AGO [ ]5 BETWEEN ONE AND THREE MONTHS AGO [ ]6 BETWEEN THREE AND SIX MONTHS AGO [ ]7 BETWEEN SIX MONTHS AND ONE YEAR AGO [ ]8 ONE YEAR AGO OR LONGER [ ]9 NEVER BEEN IN THIS LIBRARY BEFORE TODAY

2. How many times have you visited this Library, including today's visit, during the past month?

___ ___ ___ NUMBER OF TIMES

3. What mode of transportation did you use to get to the Hamilton Central Public Library today?

[ ]1 WALKED [ ]2 BUS [ ]3 TAXI CAB [ ]4 SPECIAL TRANSPORTATION FOR DISABLED, SUCH AS DARTS [ ]5 PERSONAL VEHICLE [ ]6 BUSINESS VEHICLE [ ]7 RENTAL VEHICLE [ ]8 BICYCLE

4. Where were you just before you came to this Library today?

[ ]1 AT HOME [ ]2 AT WORK [ ]3 AT SCHOOL [ ]4 SHOPPING [ ]5 LEISURE ACTIVITY

5. Which of the following best describes how you happened to come to the Central Library today?

[ ]1 LIBRARY WAS THE ONE SPECIAL REASON FOR THIS TRIP [ ]2 LIBRARY WAS ONE OF SEVERAL THINGS I WANTED TO DO ON THIS TRIP [ ]3 THOUGHT OF LIBRARY AFTER STARTING OUT TO DO SOMETHING ELSE

6. How much money will you have spent on all products, services or other items in the downtown core area during this trip to this Library today? (Please write in amount.)

$ ___ ___ ___ . ___ ___

7. On which floor of this Library have you spent the greatest amount of time today?

[ ]1 FIRST FLOOR [ ]2 SECOND FLOOR [ ]3 THIRD FLOOR [ ]4 FOURTH FLOOR [ ]5 FIFTH FLOOR

8. Did you use the floor indicated in Question 7 above more as a source of books or services or as a reading area?

[ ]1 MORE AS A SOURCE OF BOOKS, OTHER MATERIALS OR SERVICES [ ]2 MORE AS A READING AREA

9. Please rate the range of selection of each of the following materials available from this Library. Check the box to the left of each row to indicate those that you used during 1990 and 1991. For each box checked, then circle the number on the scale shown below from very good selection to very poor selection.

Scale: 1 = VERY POOR SELECTION, 2 = POOR SELECTION, 3 = SLIGHTLY POOR SELECTION, 4 = SLIGHTLY GOOD SELECTION, 5 = GOOD SELECTION, 6 = VERY GOOD SELECTION

USED DURING 1990?

[ ] Fiction Books .............. 1 2 3 4 5 6

[ ] Business Books ............. 1 2 3 4 5 6

[ ] Video Cassettes ............ 1 2 3 4 5 6

[ ] Magazines .................. 1 2 3 4 5 6

[ ] Newspapers ................. 1 2 3 4 5 6

[ ] Books about Health ......... 1 2 3 4 5 6

[ ] Music on Cassette .......... 1 2 3 4 5 6

10. We'd like your opinions about several aspects of the Library. Please circle the number that indicates how strongly you agree or disagree with the phrases listed below.

Scale: 1 = STRONGLY DISAGREE, 2 = SOMEWHAT DISAGREE, 3 = SLIGHTLY DISAGREE, 4 = SLIGHTLY AGREE, 5 = SOMEWHAT AGREE, 6 = STRONGLY AGREE

…is always noisy .......................... 1 2 3 4 5 6

…has an easy-to-use arrangement ........... 1 2 3 4 5 6

…it's hard to find my way around .......... 1 2 3 4 5 6

…is too modern ............................ 1 2 3 4 5 6

…is a great contribution to Hamilton ...... 1 2 3 4 5 6

…has too many rules ....................... 1 2 3 4 5 6

11. What are your opinions about the staff in this Central Library?

Scale: 1 = STRONGLY DISAGREE, 2 = SOMEWHAT DISAGREE, 3 = SLIGHTLY DISAGREE, 4 = SLIGHTLY AGREE, 5 = SOMEWHAT AGREE, 6 = STRONGLY AGREE

…is friendly .............................. 1 2 3 4 5 6

…is too busy to assist me properly ........ 1 2 3 4 5 6

…appears to be knowledgeable .............. 1 2 3 4 5 6

…is easily approachable ................... 1 2 3 4 5 6

Copyright 2003 by Dr. Ken Deal

No reproduction of this material can be made, nor can any derivatives be prepared, without the express written consent of Dr. Ken Deal.

Introduction to Survey Data Analysis Using SPSS Windows & Student Version

12. How convenient or inconvenient to you are the hours of operation of the Library?

[ ]6 VERY CONVENIENT
[ ]5 SOMEWHAT CONVENIENT
[ ]4 SLIGHTLY CONVENIENT
[ ]3 SLIGHTLY INCONVENIENT
[ ]2 SOMEWHAT INCONVENIENT
[ ]1 VERY INCONVENIENT

13. How important to you personally are your visits to this Library?

[ ]6 EXTREMELY IMPORTANT
[ ]5 VERY IMPORTANT
[ ]4 FAIRLY IMPORTANT
[ ]3 SOMEWHAT IMPORTANT
[ ]2 SLIGHTLY IMPORTANT
[ ]1 NOT IMPORTANT AT ALL

14. How important to your career are your visits to this Library?

[ ]6 EXTREMELY IMPORTANT
[ ]5 VERY IMPORTANT
[ ]4 FAIRLY IMPORTANT
[ ]3 SOMEWHAT IMPORTANT
[ ]2 SLIGHTLY IMPORTANT
[ ]1 NOT IMPORTANT AT ALL

15. Which of the following statements best describes how you usually use this Library?

[ ]1 I BORROW MATERIALS FROM THE LIBRARY
[ ]2 I STUDY IN THE LIBRARY
[ ]3 I BROWSE IN THE LIBRARY
[ ]4 I USE LIBRARY MATERIALS IN THE LIBRARY

16. Which of the purposes below best describes why you use this Library?

[ ]1 FOR LEISURE OR RECREATIONAL READING
[ ]2 FOR SCHOOL
[ ]3 FOR LEARNING HOW TO DO THINGS
[ ]4 FOR MY JOB
[ ]5 FOR PERSONAL RESEARCH

17. How satisfied are you with this Library overall, considering its services, its collections, its physical facilities and its staff?

[ ]6 VERY SATISFIED
[ ]5 SOMEWHAT SATISFIED
[ ]4 SLIGHTLY SATISFIED
[ ]3 SLIGHTLY DISSATISFIED
[ ]2 SOMEWHAT DISSATISFIED
[ ]1 VERY DISSATISFIED

18. Are you male or female?

[ ]1 FEMALE
[ ]2 MALE

19. Please indicate the highest level of formal education which you've completed.

[ ]1 SOME GRADE SCHOOL
[ ]2 GRADE SCHOOL
[ ]3 SOME HIGH SCHOOL
[ ]4 HIGH SCHOOL GRADUATE
[ ]5 SOME COLLEGE
[ ]6 GRADUATED COLLEGE
[ ]7 SOME UNIVERSITY
[ ]8 GRADUATED UNIVERSITY
[ ]9 SOME GRADUATE SCHOOL
[ ]10 MASTER'S DEGREE OR HIGHER

20. In which area do you live?

[ ]1 CENTRAL AREA OF HAMILTON
[ ]2 NORTH END
[ ]3 WEST END
[ ]4 EAST END
[ ]5 HAMILTON MOUNTAIN
[ ]6 DUNDAS
[ ]7 ANCASTER
[ ]8 STONEY CREEK
[ ]9 BURLINGTON
[ ]10 OTHER

21. What is your occupation?

[ ]1 ADMINISTRATIVE/MANAGEMENT
[ ]2 PROFESSIONAL/SEMI-PROFESSIONAL
[ ]3 SALES
[ ]4 CLERICAL
[ ]5 FULL-TIME HOMEMAKER
[ ]6 SKILLED LABOUR
[ ]7 UNSKILLED LABOUR
[ ]8 SERVICE WORKER
[ ]9 FOREMAN/PLANT SUPERVISION
[ ]10 FARMER
[ ]11 SELF-EMPLOYED
[ ]12 STUDENT
[ ]13 RETIRED
[ ]14 UNEMPLOYED
[ ]15 DON'T WORK BECAUSE OF DISABILITY
[ ]16 OTHER

22. In which area do you work?

[ ]1 JACKSON SQUARE/STELCO TOWER BLOCK
[ ]2 CENTRAL AREA OF HAMILTON
[ ]3 NORTH END
[ ]4 WEST END
[ ]5 EAST END
[ ]6 HAMILTON MOUNTAIN
[ ]7 DUNDAS
[ ]8 ANCASTER
[ ]9 STONEY CREEK
[ ]10 BURLINGTON
[ ]11 OTHER

23. In which year were you born? 19__ __

24. Which language did you first learn to speak and still understand? (Please write on the line below.)

___________________________________

Your answers to the questions above will help this Library serve you better in the future. Thank you very much for your help.

We might need to check back with some people to confirm their answers. So that we can do this, would you please print your name and phone number on the line below?

First Name                  Last Name                           Phone Number
_____________________       _________________________________   ______________________________

Correspondence between question numbers and variable names:

Q.1     TIMLAST
Q.2     VISITMN
Q.3     MODE
Q.4     BEFORE
Q.5     HACTRPWHY
Q.6     SPENT
Q.7     FLOORMST
Q.8     FLOORUSE
Q.9     USFIC, SFIC, USBUS, SBUS, USVIDEO, SVIDEO, USMAG, SMAG, USNEWS, SNEWS, USHEALTH, SHEALTH, USMUSCAS, SMUSCAS
Q.10    NOISY, ARR, WAY, MODERN, CONTRIB, RULES
Q.11    FRND, BUSY, KNOW, APPR
Q.12    HOURS
Q.13    IMPPER
Q.14    IMPCAR
Q.15    HWUS
Q.16    WYUS
Q.17    SATIS
Q.18    SEX
Q.19    EDUCATN
Q.20    LIVE
Q.21    OCCUP
Q.22    WORK
Q.23    BORN
Q.24    ETHNIC
