descriptive statistics in spss - denver, colorado · descriptive statistics in spss when first...

Descriptive Statistics in SPSS

When first looking at a dataset, it is wise to use descriptive statistics to get some idea of what your data look like. Here is a simple dataset showing three different variables: type of psychological problem someone was treated for, type of treatment approach used, and symptom level at the end of treatment. One of these variables (symptom) is on an interval-level scale; the other two are grouping variables (i.e., they are measured on a nominal-level scale).

The type of descriptive statistics you can calculate depends on the level of data that you have to work with. Here are the options:

• For simple description of nominal-level variables (groups), use frequencies
• For more complex description of nominal-level variables, use crosstabs
• For simple description of interval- or ratio-level variables (items measured on a scale), use the descriptives command
• For more complex description of interval- or ratio-level variables, use the explore command

Each of these options will be shown in more detail below.

One Important Note: Even though “problem” and “treatment” are nominal-level variables, they have to be coded as numbers (not as text) in order to use the following procedures. See the instructions on How to Recode Variables if you need to convert from a text variable to a numeric variable.
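
To make the idea of recoding concrete, here is a minimal Python sketch of what turning text labels into numeric codes amounts to. This is not SPSS syntax. The function name and the mapping are hypothetical, and the codes 1, 2, 3 are an assumption (they are consistent with the mean of 2.06 that SPSS reports for the diagnosis variable in this handout).

```python
# Hypothetical mapping from text labels to numeric codes (assumed to be 1, 2, 3).
DIAGNOSIS_CODES = {"Anxiety": 1, "Depression": 2, "Eating Disorder": 3}


def recode(labels, codes):
    """Replace each text label with its numeric code.

    Labels that are not in the mapping become None, i.e. a missing value.
    """
    return [codes.get(label.strip()) for label in labels]


example = ["Anxiety", "Depression", "Eating Disorder", "Anxiety"]
print(recode(example, DIAGNOSIS_CODES))  # [1, 2, 3, 1]
```

The text labels are not lost in SPSS: they can be attached to the numeric codes as value labels, which is why the output tables still show names rather than numbers.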

Here are the various choices. All of them are found in the “Analyze” menu in SPSS, under the sub-menu for “Descriptive Statistics”:

We will start by looking at frequencies for a nominal-level variable, like “problem” in this dataset. Open the “Frequencies” dialog box:

Select the variable that you are interested in from the left-hand list, and use the arrow button to move it into the right-hand list.

Use the “Charts” button to see a graphical output of the frequencies on your variable. In this case, I have selected a histogram. You can also use pie charts or bar charts. If you aren’t happy with the way your data are displayed (ascending vs. descending order, etc.), try some of the options found by clicking on the “Format” button.

Hit “OK” in the main dialog box to continue. The output for this variable looks like this:

Patient Diagnosis

                           Frequency   Percent   Valid Percent   Cumulative Percent
Valid   Anxiety                    5      31.3            31.3                 31.3
        Depression                 5      31.3            31.3                 62.5
        Eating Disorder            6      37.5            37.5                100.0
        Total                     16     100.0           100.0

This is the frequency table, which is the basic output from this procedure. It shows the number of people in each group, the percent of the total in each group, the percent in each group as a proportion of just the people with complete data (that’s the “valid percent”), and the cumulative percent reached as you add each new group to the previous ones (that’s the “cumulative percent”). Notice that the numeric values have been replaced with text labels; these were entered in the “values” column in the “variable view” of the data.
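
If it helps to see where each column comes from, here is a minimal Python sketch (not SPSS syntax) that rebuilds the four columns from the raw codes. The group sizes are taken from the table above; the 1/2/3 coding and the function name are assumptions made for illustration, and a missing value is represented as None.

```python
from collections import Counter

# Codes rebuilt from the frequency table: 5 Anxiety, 5 Depression, 6 Eating Disorder.
# The numeric codes 1, 2, 3 are assumed; only the group sizes come from the handout.
LABELS = {1: "Anxiety", 2: "Depression", 3: "Eating Disorder"}
problem = [1] * 5 + [2] * 5 + [3] * 6


def frequency_table(values):
    """Return one row per category:
    (value, frequency, percent, valid percent, cumulative percent).
    """
    valid = [v for v in values if v is not None]  # cases with complete data
    counts = Counter(valid)
    rows = []
    running = 0
    for value in sorted(counts):
        count = counts[value]
        running += count
        rows.append((
            value,
            count,
            100 * count / len(values),   # percent of all cases, missing included
            100 * count / len(valid),    # percent of valid cases only
            100 * running / len(valid),  # running total of the valid percent
        ))
    return rows


for row in frequency_table(problem):
    print(LABELS[row[0]], row[1:])
```

Because this dataset has no missing values, “percent” and “valid percent” are identical; they only differ once some cases are missing.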

Here’s the graphical output. This histogram shows the number of people in each category:

[Figure: Histogram of Patient Diagnosis (x-axis: Patient Diagnosis; y-axis: Frequency). Mean = 2.06, Std. Dev. = 0.854, N = 16]

For a more detailed breakdown of nominal-level data, use the crosstabs command in the “Analyze/Descriptive Statistics” sub-menu. This is appropriate if you want to consider two variables in combination. For example, what percentage of the patients with an anxiety disorder were treated with cognitive-behavioral therapy?

The “Crosstabs” dialog box lets you enter one variable (or more) as “rows” in a frequency table, and another variable as the “columns” in the same table.

Use the “Cells” button to get percentages for each row and column:

Hit “OK” in the main dialog box to continue. The output from this procedure looks like this:

Patient Diagnosis * Type of Therapy Crosstabulation

                                                Type of Therapy
Patient Diagnosis                                  CBT      IPT    Total
Anxiety          Count                               3        2        5
                 % within Patient Diagnosis      60.0%    40.0%   100.0%
                 % within Type of Therapy        37.5%    25.0%    31.3%
                 % of Total                      18.8%    12.5%    31.3%
Depression       Count                               2        3        5
                 % within Patient Diagnosis      40.0%    60.0%   100.0%
                 % within Type of Therapy        25.0%    37.5%    31.3%
                 % of Total                      12.5%    18.8%    31.3%
Eating Disorder  Count                               3        3        6
                 % within Patient Diagnosis      50.0%    50.0%   100.0%
                 % within Type of Therapy        37.5%    37.5%    37.5%
                 % of Total                      18.8%    18.8%    37.5%
Total            Count                               8        8       16
                 % within Patient Diagnosis      50.0%    50.0%   100.0%
                 % within Type of Therapy       100.0%   100.0%   100.0%
                 % of Total                      50.0%    50.0%   100.0%

The central area of this table shows you the total number of patients who have each individual combination of the various levels of the two variables.

The far-right column shows you the total for each diagnosis, as a percent of all patients. The bottom-most row slices the data the other way, showing you the total for each type of treatment, as a percent of all patients.

The interior cells of the table also show you each individual cell’s total as a percentage of the patients with that particular diagnosis, as a percentage of the patients who received that particular type of therapy, and as a percentage of the overall total.
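
The three percentages in each cell are easy to confuse, so here is a minimal Python sketch (not SPSS syntax) that computes them for a single cell. The counts are copied from the crosstabulation in this handout; the function name is hypothetical.

```python
# Cell counts from the crosstabulation (rows: diagnosis, columns: type of therapy).
counts = {
    "Anxiety": {"CBT": 3, "IPT": 2},
    "Depression": {"CBT": 2, "IPT": 3},
    "Eating Disorder": {"CBT": 3, "IPT": 3},
}


def cell_percentages(table, row, col):
    """Return (% within row, % within column, % of total) for one cell."""
    count = table[row][col]
    row_total = sum(table[row].values())
    col_total = sum(cells[col] for cells in table.values())
    grand_total = sum(sum(cells.values()) for cells in table.values())
    return (
        100 * count / row_total,
        100 * count / col_total,
        100 * count / grand_total,
    )


print(cell_percentages(counts, "Anxiety", "CBT"))  # (60.0, 37.5, 18.75)
```

The first number answers the question asked earlier: 60% of the patients with an anxiety disorder were treated with CBT. SPSS displays the third number rounded to 18.8%.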

For interval- or ratio-level variables (i.e., “scales”), use the descriptives sub-command on the “Analyze/Descriptive Statistics” menu:

This opens a dialog box where you can select the variable that you want descriptive statistics on. Make sure that what you are selecting is actually an interval- or ratio-level variable. Because all of the data are entered as numbers, SPSS will actually calculate descriptive statistics on any of these variables; but these results are only meaningful for interval- or ratio-level variables (in this case, just the “symptom” variable).

As usual, select the variable(s) that you are interested in from the left-hand column, and move them to the right-hand column. Use the “Options” button to select the specific descriptive statistics that you are interested in. Usually, good choices include the mean, standard deviation, maximum, and minimum. If you are concerned about the impact of outliers on your data, a measure of skewness may also be appropriate.

The output from this procedure looks like this:

Descriptive Statistics

                                       N   Minimum   Maximum      Mean   Std. Deviation
Symptom Severity (higher = worse)     16     19.00     33.00   24.9375          4.15482
Valid N (listwise)                    16

This shows you the specific results for each variable that you entered into the analysis. It is possible to get a table that gives you basic descriptive statistics for many variables simultaneously, just by moving them all at once from the left-hand to the right-hand list in the main dialog box.
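
As a cross-check on what these numbers mean, here is a minimal Python sketch (not SPSS syntax) that computes the same five statistics. The 16 scores are read off the stem-and-leaf plot that the explore command produces for this variable later in this handout; the function name is hypothetical.

```python
from statistics import mean, stdev

# The 16 symptom scores, read off the stem-and-leaf plot in this handout.
symptom = [19, 20, 21, 21, 22, 24, 24, 24, 25, 25, 25, 26, 27, 31, 32, 33]


def describe(values):
    """Basic descriptive statistics for an interval- or ratio-level variable."""
    return {
        "N": len(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Mean": mean(values),
        "Std. Deviation": stdev(values),  # sample SD, divides by N - 1
    }


for name, value in describe(symptom).items():
    print(name, value)
```

Note that `stdev` is the sample standard deviation (dividing by N - 1), which is the version that matches the SPSS value of 4.15482 here.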

Finally, to get more complex descriptive results for an interval- or ratio-level variable, use the explore command in the “Analyze/Descriptive Statistics” sub-menu.

The following dialog box will appear:

The variable that you are interested in goes into the “dependent” list. For now, we won’t use the other lists, but we will use the “plots” button to select additional options.

The “plots” dialog box lets you select options for graphical display of the data, including a stem-and-leaf plot like this:

 Frequency    Stem &  Leaf

     1.00        1 .  9
     7.00        2 .  0112444
     5.00        2 .  55567
     3.00        3 .  123

 Stem width:     10.00
 Each leaf:      1 case(s)
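
If you have not read a stem-and-leaf plot before, this minimal Python sketch (not SPSS syntax) shows how one is built: each score is split into a stem (the tens digit) and a leaf (the ones digit), and each stem gets two rows, one for leaves 0 to 4 and one for leaves 5 to 9. It assumes whole-number scores and a stem width of 10, as in the plot above; the function name is hypothetical.

```python
from collections import defaultdict

# The 16 symptom scores, as shown in the stem-and-leaf plot above.
symptom = [19, 20, 21, 21, 22, 24, 24, 24, 25, 25, 25, 26, 27, 31, 32, 33]


def stem_and_leaf(values, lines_per_stem=2):
    """Return rows of (frequency, stem, leaves) for whole-number scores.

    Assumes a stem width of 10. With lines_per_stem=2, each stem is split
    into a row for leaves 0-4 and a row for leaves 5-9.
    """
    rows = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        part = leaf * lines_per_stem // 10  # 0 for leaves 0-4, 1 for leaves 5-9
        rows[(stem, part)].append(leaf)
    return [
        (len(leaves), stem, "".join(str(leaf) for leaf in leaves))
        for (stem, part), leaves in sorted(rows.items())
    ]


for frequency, stem, leaves in stem_and_leaf(symptom):
    print(f"{frequency:5d}   {stem} . {leaves}")
```

Reading the rows back gives the raw scores: stem 2 with leaves 0112444 means 20, 21, 21, 22, 24, 24, 24.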

A boxplot like this:

[Figure: Boxplot of Symptom Severity (higher = worse); axis ticks from 18 to 33]

A histogram like this:

[Figure: Histogram of Symptom Severity (higher = worse); y-axis: Frequency. Mean = 24.9375, Std. Dev. = 4.15482, N = 16]

Or a plot to test whether the data are normally distributed, like this:

[Figure: Normal Q-Q Plot of Symptom Severity (higher = worse); x-axis: Observed Value; y-axis: Expected Normal]

(This result shows a slight deviation from normality, based on the difference between the actual data points and the theoretical line they should fit, but probably nothing to worry about. Some “tests of normality” are found earlier in the printout, and give you an actual test to determine how far off the data are from a normal distribution. If these tests are statistically significant [p < .05], you may need to do a data transformation to correct for non-normality before you run any statistical tests):

Tests of Normality

                                    Kolmogorov-Smirnov(a)   Shapiro-Wilk
                                    Statistic   df   Sig.   Statistic   df   Sig.
Symptom Severity (higher = worse)        .181   16   .166        .924   16   .196

a. Lilliefors Significance Correction
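
To see what the “Expected Normal” axis of a Q-Q plot represents, here is a rough Python sketch (not SPSS syntax) that pairs each observed score with the z-score it would have if the data were perfectly normal. It assumes Blom's plotting positions, (rank - 3/8) / (N + 1/4), which is one common convention; the formula SPSS actually applies, and the way it handles tied scores, may differ. The function name is hypothetical.

```python
from statistics import NormalDist

# The 16 symptom scores, read off the stem-and-leaf plot in this handout.
symptom = [19, 20, 21, 21, 22, 24, 24, 24, 25, 25, 25, 26, 27, 31, 32, 33]


def expected_normal(values):
    """Pair each sorted observation with its expected z-score under normality."""
    n = len(values)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    pairs = []
    for rank, observed in enumerate(sorted(values), start=1):
        # Blom's plotting position (an assumption); tied scores are not averaged.
        proportion = (rank - 0.375) / (n + 0.25)
        pairs.append((observed, standard_normal.inv_cdf(proportion)))
    return pairs


for observed, z in expected_normal(symptom):
    print(observed, round(z, 2))
```

If the data were normally distributed, plotting these pairs would give points that fall close to a straight line; systematic bends away from the line are what the Q-Q plot asks you to look for.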

The “explore” command also gives you descriptive statistics for each variable, with more extensive results (median, interquartile range, etc.) than the “descriptives” command:

Descriptives

Symptom Severity (higher = worse)
                                                 Statistic   Std. Error
Mean                                               24.9375      1.03870
95% Confidence Interval for Mean   Lower Bound     22.7236
                                   Upper Bound     27.1514
5% Trimmed Mean                                    24.8194
Median                                             24.5000
Variance                                            17.263
Std. Deviation                                     4.15482
Minimum                                              19.00
Maximum                                              33.00
Range                                                14.00
Interquartile Range                                   5.50
Skewness                                              .669         .564
Kurtosis                                             -.205        1.091
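
Several of these statistics can be reproduced by hand. Here is a minimal Python sketch (not SPSS syntax), using the 16 scores from the stem-and-leaf plot. It covers only part of the table (no confidence interval, trimmed mean, or kurtosis). Two choices are assumptions on my part rather than documented SPSS behavior: the default “exclusive” quartile method in Python's statistics module, and the adjusted Fisher-Pearson formula for skewness. Both happen to agree with the SPSS values for these data. The function name is hypothetical.

```python
from math import sqrt
from statistics import mean, median, quantiles, stdev, variance

# The 16 symptom scores, read off the stem-and-leaf plot in this handout.
symptom = [19, 20, 21, 21, 22, 24, 24, 24, 25, 25, 25, 26, 27, 31, 32, 33]


def explore(values):
    """A subset of the statistics reported by the explore command."""
    n = len(values)
    m = mean(values)
    sd = stdev(values)  # sample SD, divides by N - 1
    q1, _, q3 = quantiles(values, n=4)  # default "exclusive" method (assumed)
    # Adjusted Fisher-Pearson skewness (assumed formula).
    skewness = n / ((n - 1) * (n - 2)) * sum(((x - m) / sd) ** 3 for x in values)
    return {
        "Mean": m,
        "Std. Error of Mean": sd / sqrt(n),
        "Median": median(values),
        "Variance": variance(values),
        "Std. Deviation": sd,
        "Range": max(values) - min(values),
        "Interquartile Range": q3 - q1,
        "Skewness": skewness,
    }


for name, value in explore(symptom).items():
    print(name, round(value, 4))
```

The positive skewness reflects the three scores in the low 30s, which pull the mean (24.94) slightly above the median (24.5).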

One additional feature of the “explore” command is the ability to get separate descriptive results for different sub-groups within your dataset. The way to do this is by grouping your data based on some additional variable, which is treated as a “factor” for breaking down the analyses. Go back to the main dialog box to add one of your nominal-level variables as a “factor” (the grouping variable always has to be nominal-level):

After you add this factor, the results will show you two separate sets of statistics and plots: one for just those patients in the CBT group, and a second set for just those patients in the IPT group.
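
Conceptually, adding a factor just means splitting the scores by group before computing the statistics. Here is a minimal Python sketch (not SPSS syntax) of that idea. This handout does not show which patient received which therapy, so the values below are made up purely for illustration and are NOT the handout's data; the function name is hypothetical too.

```python
from collections import defaultdict
from statistics import mean, stdev


def describe_by_factor(scores, factor):
    """Compute N, mean, and sample SD separately for each level of a factor."""
    groups = defaultdict(list)
    for score, level in zip(scores, factor):
        groups[level].append(score)
    return {
        level: {"N": len(vals), "Mean": mean(vals), "Std. Deviation": stdev(vals)}
        for level, vals in groups.items()
    }


# Made-up toy values for illustration only (not the handout's dataset).
toy_scores = [10, 12, 14, 20, 22, 24]
toy_therapy = ["CBT", "CBT", "CBT", "IPT", "IPT", "IPT"]

for level, stats in describe_by_factor(toy_scores, toy_therapy).items():
    print(level, stats)
```

Each level of the factor gets its own row of results, which mirrors the separate CBT and IPT output blocks that SPSS produces.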
