Organizing Your Data for Statistical Analysis in SPSS

Edward A. Greenberg, PhD
ASU HEALTH SOLUTIONS DATA LAB
Revised January 4, 2013

SPSS Data Sets

• Rows are cases or observations
• Columns are variables (measurements)
• Up to 2^31 - 1 columns (2,147,483,647)
• No limit on the number of cases

Variable Types

• Numeric (40 character maximum length)
• Dates and times (various formats)
• Other variations of numeric (currency, comma, scientific notation, etc.)
• String (32,767 maximum length)

Variable Names

• Variable names must be unique.
• Variable names may be up to 64 characters in length.
• Names can contain letters, numbers, or special characters.
• Names must start with a letter or @, #, or $.
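
The naming rules above can be checked mechanically before data entry begins. The following is a minimal Python sketch, not part of SPSS: the function name check_variable_names is hypothetical, the set of allowed special characters (period, underscore, @, #, $) is an assumption because the slide does not list them, and treating names as case-insensitive when testing uniqueness is also an assumption.

```python
import re

# Assumed character set: letters, digits, period, underscore, @, #, $.
# The slide only says "special characters", so this set is an assumption.
NAME_PATTERN = re.compile(r"[A-Za-z@#$][A-Za-z0-9._@#$]*")
MAX_NAME_LENGTH = 64


def check_variable_names(names):
    """Return a dict mapping each problematic name to a list of reasons."""
    problems = {}
    seen = set()
    for name in names:
        reasons = []
        if len(name) > MAX_NAME_LENGTH:
            reasons.append("longer than 64 characters")
        if not NAME_PATTERN.fullmatch(name):
            reasons.append("bad first character or unsupported character")
        # Uniqueness is tested without regard to case (an assumption).
        key = name.lower()
        if key in seen:
            reasons.append("duplicate name")
        seen.add(key)
        if reasons:
            problems[name] = reasons
    return problems


if __name__ == "__main__":
    print(check_variable_names(["CASENO", "age", "AGE", "1stchoice", "@temp"]))
```

Running the check on the planned variable list while writing the codebook catches a bad name before any data are keyed in.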

Unit of Analysis

What constitutes a “case?”
• A person
• A household
• An organization
• An experimental trial

Level of Measurement

• Nominal
• Ordinal
• Interval (Scale)
• Ratio (Scale)

Labeling Data

• Variable names may be short and cryptic.

• Variable labels can be up to 255 characters.

• SPSS procedures display at least 40 characters of variable labels.

• Value labels can be up to 120 characters.

Order of Variables

• The order of variables in the SPSS data file normally should be the same as the order of items in the questionnaire.

• Use variable names that help you identify the scale or instrument to which they apply.

Case Numbers

• Each case in an SPSS file should include a case number.

• Often this will be the first variable in the file.

• The case number does not identify the subject but it links the data record to the subject’s questionnaire.

• Useful for correcting data entry errors

Create a Codebook

• When preparing to enter your data into SPSS, prepare a codebook for the data set.

• The codebook documents all of the items to be entered in the data set:
– Variable names and labels
– Variable types and formats
– Coded values for categorical items
– Missing values

Sample Codebook

VARIABLE NAME   TYPE & LENGTH   DESCRIPTION / VARIABLE LABEL / CODED VALUE / VALUE LABEL

CASENO          NUM 3           Case number
                                Case number

SEX             STR 1           6. I am:
                                M  Male
                                F  Female

AGE             NUM 2           7. My age is:
                                (Code actual age in years)

EDUC            NUM 1           8. What is the highest level of education that you have completed?
                                Education level
                                1  No formal education
                                2  Some grade school
                                3  Completed grade school
                                4  Some high school
                                5  Completed high school
                                6  Some college
                                7  Completed college
                                8  Some graduate work
                                9  A graduate degree
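
A codebook like this one can also be kept in machine-readable form so that entered records can be checked against it. The sketch below is one possible layout in Python, not an SPSS feature: the entries are transcribed from the sample codebook, while the dictionary structure and the validate_record helper are hypothetical.

```python
# Entries transcribed from the sample codebook; the dict layout and the
# validate_record helper are illustrative assumptions.
CODEBOOK = {
    "CASENO": {"type": "NUM", "length": 3, "label": "Case number"},
    "SEX": {"type": "STR", "length": 1, "values": {"M": "Male", "F": "Female"}},
    "AGE": {"type": "NUM", "length": 2},
    "EDUC": {
        "type": "NUM",
        "length": 1,
        "label": "Education level",
        "values": {
            1: "No formal education",
            2: "Some grade school",
            3: "Completed grade school",
            4: "Some high school",
            5: "Completed high school",
            6: "Some college",
            7: "Completed college",
            8: "Some graduate work",
            9: "A graduate degree",
        },
    },
}


def validate_record(record, codebook=CODEBOOK):
    """Return a list of messages for values that do not fit the codebook."""
    errors = []
    for name, spec in codebook.items():
        value = record.get(name)
        if value is None:
            continue  # blank entries are missing data, not entry errors
        if spec["type"] == "NUM" and not isinstance(value, (int, float)):
            errors.append(f"{name}: expected a number")
        elif spec["type"] == "STR" and not isinstance(value, str):
            errors.append(f"{name}: expected a string")
        elif len(str(value)) > spec["length"]:
            errors.append(f"{name}: longer than {spec['length']} characters")
        elif "values" in spec and value not in spec["values"]:
            errors.append(f"{name}: {value!r} is not a coded value")
    return errors


if __name__ == "__main__":
    print(validate_record({"CASENO": 12, "SEX": "X", "AGE": 34, "EDUC": 7}))
```

Because every message names the variable, a flagged record can be traced back to the paper questionnaire through its case number.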

Missing Data

Data may be missing for several reasons:
• Don’t know
• Refused to answer
• Not applicable
• Skipped a question
• Instrument problem
• Data entry omission

Missing Values

SPSS provides several ways of designating numeric data as “missing values.”
• A blank cell is treated as “system missing,” represented by a dot (“.”) in the SPSS Data Editor.
• Specific values can be declared as “user missing” values.

Missing Values

• Up to three “user missing” values can be declared for a variable.

• Or, a range of values plus one additional value can be declared to be missing.
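
The two ways of declaring user-missing values can be modeled as a small data structure. This is a Python sketch of the rule as stated on the slide, not SPSS code: the class name UserMissing is hypothetical, None stands in for a blank (system-missing) cell, and the codes 97, 98, and 99 in the usage lines are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class UserMissing:
    """Either up to three discrete values, or one range plus one value."""

    discrete: Tuple[float, ...] = ()
    value_range: Optional[Tuple[float, float]] = None

    def __post_init__(self):
        # A range leaves room for only one additional discrete value.
        limit = 1 if self.value_range is not None else 3
        if len(self.discrete) > limit:
            raise ValueError(f"at most {limit} discrete value(s) allowed here")
        if self.value_range is not None and self.value_range[0] > self.value_range[1]:
            raise ValueError("range low must not exceed range high")

    def is_missing(self, value):
        if value is None:  # None stands in for a blank, system-missing cell
            return True
        if value in self.discrete:
            return True
        if self.value_range is not None:
            low, high = self.value_range
            return low <= value <= high
        return False


if __name__ == "__main__":
    three_codes = UserMissing(discrete=(97, 98, 99))  # hypothetical codes
    range_plus_one = UserMissing(discrete=(-1,), value_range=(90, 99))
    print(three_codes.is_missing(98), range_plus_one.is_missing(40))
```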

Missing Values

In this example, the variable AGEWED has three labeled values that are to be treated as missing.

Missing Values

The three values are declared to be missing in the Missing Values dialog.

Missing Values

• Expressions handle missing values in different ways.

• The result of (var1+var2+var3)/3 is missing if any of the three variables is missing.

• The result of MEAN(var1, var2, var3) is missing only if all three of the variables are missing; otherwise it is the mean of the nonmissing values.
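
The difference between the two expressions is easy to see when the logic is spelled out. The Python sketch below imitates the behavior described on this slide rather than running SPSS; the function names are hypothetical and None stands in for a missing value.

```python
def arithmetic_mean(*values):
    """Imitate (var1+var2+var3)/3: any missing input gives a missing result."""
    if any(v is None for v in values):
        return None
    return sum(values) / len(values)


def mean_function(*values):
    """Imitate MEAN(...): average the valid values; missing only if none exist."""
    valid = [v for v in values if v is not None]
    if not valid:
        return None
    return sum(valid) / len(valid)


if __name__ == "__main__":
    print(arithmetic_mean(4, None, 8))  # None: one input is missing
    print(mean_function(4, None, 8))    # 6.0: mean of the two valid inputs
```

Note that the MEAN-style result for a case with one missing item is based on fewer items than the result for a complete case, which matters when scale scores are compared across cases.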

Missing Values in Procedures

The FREQUENCIES procedure excludes cases with missing values from computations.
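
To illustrate what excluding missing values from the computations means, the Python sketch below builds a small frequency table in which one percentage is based on all cases and another only on valid cases. It is an imitation under stated assumptions, not the FREQUENCIES procedure itself: the function name, the column names, and the code 9 used as user-missing in the usage line are all hypothetical.

```python
from collections import Counter


def frequency_table(values, user_missing=()):
    """Return (rows, n_missing); percentages for valid values only."""
    total = len(values)
    # None stands in for system-missing; user_missing lists declared codes.
    valid = [v for v in values if v is not None and v not in user_missing]
    counts = Counter(valid)
    rows = []
    for value in sorted(counts):
        n = counts[value]
        rows.append({
            "value": value,
            "count": n,
            "percent": 100 * n / total,             # base: all cases
            "valid_percent": 100 * n / len(valid),  # base: valid cases only
        })
    return rows, total - len(valid)


if __name__ == "__main__":
    rows, n_missing = frequency_table([1, 2, 2, 9, None, 1, 2, 9], user_missing=(9,))
    for row in rows:
        print(row)
    print("missing cases:", n_missing)
```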

Multiple Responses

• Multiple-response items are questions that can have more than one value for each case.

• Two ways of coding:
– For each response, a variable can have one of two values, e.g., 1=Yes and 2=No (“multiple-dichotomy” method)
– Create a series of variables for 1st choice, 2nd choice, etc. (“multiple categories” method)

MULT RESPONSE Procedure

• In the MULT RESPONSE procedure, multiple response variables are combined into groups.

• The MULT RESPONSE procedure counts responses in multiple response groups in frequency or cross tabular tables.

• Percentages based on the number of cases generally will total more than 100%, because each case can give more than one response.
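
The counting behind a multiple-dichotomy group can be sketched in a few lines. The Python example below is an illustration, not the MULT RESPONSE procedure: the function name and the variables tv, radio, and web are hypothetical, and basing the percent of cases on cases with at least one counted response is an assumption.

```python
def multiple_dichotomy_table(cases, variables, counted_value=1):
    """Tabulate a multiple-dichotomy group (1=Yes is the counted value)."""
    counts = {
        v: sum(1 for case in cases if case.get(v) == counted_value)
        for v in variables
    }
    n_responses = sum(counts.values())
    # Assumption: only cases with at least one counted response form the base.
    n_cases = sum(
        1 for case in cases
        if any(case.get(v) == counted_value for v in variables)
    )
    table = {}
    for v in variables:
        table[v] = {
            "count": counts[v],
            "pct_of_responses": 100 * counts[v] / n_responses if n_responses else 0.0,
            "pct_of_cases": 100 * counts[v] / n_cases if n_cases else 0.0,
        }
    return table


if __name__ == "__main__":
    # Hypothetical item: "Where do you get your news?" coded 1=Yes, 2=No.
    data = [
        {"tv": 1, "radio": 2, "web": 1},
        {"tv": 1, "radio": 1, "web": 1},
        {"tv": 2, "radio": 2, "web": 1},
        {"tv": 2, "radio": 2, "web": 2},
    ]
    for name, row in multiple_dichotomy_table(data, ["tv", "radio", "web"]).items():
        print(name, row)
```

In this sketch the percentages of responses add up to 100, while the percentages of cases add up to more than 100 whenever at least one case gives several responses.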

Repeated Measures

• Data that are recorded on more than one occasion for each subject

• Some procedures, such as GLM, require that all measurements for a case be on the same data record.

• Other procedures, such as the MIXED procedure, may expect one data record per occasion.

Repeated Measures

One data record per subject, one variable per occasion on which it is measured

Repeated Measures

One data record per occasion per subject

Repeated Measures

The good news is that SPSS allows you to easily restructure a data set:

• Restructure selected variables into cases

• Restructure selected cases into variables

• Transpose all data
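
The first two restructuring options correspond to converting between the two repeated-measures layouts described earlier. The Python sketch below shows the idea on plain dictionaries; it does not use SPSS, and the function names and the variables id, t1, t2, t3, occasion, and score are hypothetical.

```python
def variables_to_cases(records, id_var, measure_vars,
                       index_name="occasion", value_name="score"):
    """One record per subject -> one record per occasion per subject."""
    long_rows = []
    for record in records:
        for occasion, var in enumerate(measure_vars, start=1):
            long_rows.append({
                id_var: record[id_var],
                index_name: occasion,
                value_name: record[var],
            })
    return long_rows


def cases_to_variables(long_rows, id_var, measure_vars,
                       index_name="occasion", value_name="score"):
    """One record per occasion per subject -> one record per subject."""
    wide = {}
    for row in long_rows:
        record = wide.setdefault(row[id_var], {id_var: row[id_var]})
        # Occasion 1 fills the first measure variable, occasion 2 the second...
        record[measure_vars[row[index_name] - 1]] = row[value_name]
    return list(wide.values())


if __name__ == "__main__":
    wide_data = [
        {"id": 1, "t1": 10, "t2": 12, "t3": 15},
        {"id": 2, "t1": 8, "t2": 9, "t3": 11},
    ]
    long_data = variables_to_cases(wide_data, "id", ["t1", "t2", "t3"])
    print(long_data)
    print(cases_to_variables(long_data, "id", ["t1", "t2", "t3"]))
```

Converting to one record per occasion and back again returns the original records, which is a quick check that no measurements were lost in the restructuring.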