chapter eleven a primer for descriptive statistics

Chapter Eleven

A Primer for Descriptive Statistics

Descriptive Statistics

• A variety of tools, conventions, and procedures for describing variables and relationships between variables

Measurement is the process of assigning numbers to phenomena according to a set of rules

Levels of Measurement

Nominal: involves no underlying continuum; the assignment of numeric values is arbitrary.

Examples: religious affiliation, gender, etc.

Levels of Measurement

Ordinal: implies an underlying continuum; values are ordered, but the intervals between them are not necessarily equal.

Examples: community size, Likert items, etc.

Levels of Measurement Cont.

Ratio: involves an underlying continuum; the numeric values assigned reflect equal intervals, and the zero point is aligned with a true zero.

Examples: weight, age in years, % minority

Data Distributions

• A listing of all the values for any one variable

• The most basic technique for presenting a large data set is to create a frequency distribution table

• A systematic listing of all the values on a variable, from lowest to highest, with the number of times (frequency) each value was observed
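A frequency distribution table can be sketched in Python with `collections.Counter`. The responses below are hypothetical values on a 1 to 5 item, used only to illustrate listing each value from lowest to highest together with its frequency.

```python
from collections import Counter

# Hypothetical responses on a 1-5 Likert item (illustrative data only)
responses = [3, 4, 2, 5, 4, 3, 4, 1, 3, 4]

def frequency_table(values):
    """Return (value, frequency) pairs ordered from lowest to highest value."""
    counts = Counter(values)
    return sorted(counts.items())

for value, freq in frequency_table(responses):
    print(f"{value}: {freq}")
```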

Normal Distribution

• A normal distribution roughly follows a bell-shaped curve

• Bimodal distribution (two peaks, e.g., combined male and female body weight)

• Platykurtic distribution (flat & wide, great deal of variability)

• Leptokurtic distribution (peaked, little variability)

Measures of Central Tendency

• A single numeric value that summarizes the data set in terms of its “average” value.

• E.g., a nurse researcher uses the value of 98.6 °F (37 °C) to describe average adult body temperature

Measures of Central Tendency

Mean: calculated by summing the values and dividing by the number of cases.

Median: calculated by ordering a set of values and then taking the middlemost value (when there are two middle values, calculate the mean of the two).

Mode: the most frequently occurring value.
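A minimal sketch of the three measures using Python's standard `statistics` module; the data values are hypothetical. With six values there are two middle values (3 and 5), so the median is their mean.

```python
import statistics

# Small illustrative data set (hypothetical values)
values = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(values)      # sum of values / number of cases
median = statistics.median(values)  # even count: mean of the two middle values
mode = statistics.mode(values)      # most frequently occurring value

print(mean, median, mode)
```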

Measures of Dispersion

Range: calculated by subtracting the lowest value from the highest value in a set of values.

Standard Deviation: a measure reflecting the average amount by which the values in a set deviate from their mean.

$$sd = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}$$

Dispersion Cont.

Variance: this measure is simply the standard deviation squared.

$$\text{Variance} = sd^2 = \frac{\sum (X - \bar{X})^2}{N - 1}$$
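The range, standard deviation, and variance can be computed as follows. This is a sketch on hypothetical data; the standard deviation is written out to mirror the formula (N - 1 in the denominator) and is then checked against `statistics.stdev` and `statistics.variance`, which use the same N - 1 denominator.

```python
import math
import statistics

values = [2, 3, 3, 5, 7, 10]  # hypothetical data

# Range: highest value minus lowest value
value_range = max(values) - min(values)

# Sample standard deviation, written out with N - 1 in the denominator
mean = statistics.mean(values)
sum_sq = sum((x - mean) ** 2 for x in values)
sd = math.sqrt(sum_sq / (len(values) - 1))

# Variance is the standard deviation squared (stdlib also divides by N - 1)
variance = statistics.variance(values)

print(value_range, round(sd, 2), variance)
```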

Standardizing Data

• To standardize data is to report them in a way that allows comparisons between units of different sizes

Standardizing Data

Proportions: a proportion represents the part of 1 that some element makes up. A so-called batting average is actually a proportion because it represents:

$$BA = \frac{\text{Number of Hits}}{\text{Number at Bats}}$$

Percentage: a proportion may be converted to a percentage by multiplying by 100.

If a player's batting “average” is .359, we could convert it to a percentage by multiplying by 100. In this case, the player gets a hit 35.9% of the time.

In short, a percentage represents how often something happens per 100 times.
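The proportion-to-percentage conversion is a one-line calculation. The hit and at-bat counts below are hypothetical, chosen only so that the proportion rounds to .359; the helper names are illustrative.

```python
def proportion(part, whole):
    """Part of 1 that `part` makes up out of `whole`."""
    return part / whole

def to_percentage(prop):
    """Convert a proportion to a percentage (how often per 100 times)."""
    return prop * 100

# Hypothetical season: 61 hits in 170 at bats (illustrative numbers only)
batting_average = proportion(61, 170)
print(round(batting_average, 3))                 # batting "average" as a proportion
print(round(to_percentage(batting_average), 1))  # the same figure as a percentage
```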

Percentage Change: a measure of how much something has changed over a given time period. Percentage change is:

$$\text{Percentage Change} = \frac{\text{Time 2} - \text{Time 1}}{\text{Time 1}} \times 100$$

Thus, if there were 25 nurses now compared to 17 five years earlier, the percentage change over the 5 year period would be:

$$\frac{25 - 17}{17} \times 100 = 47.1\%$$
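A small helper reproducing the nurse example; the function name `percentage_change` is illustrative, not from the text.

```python
def percentage_change(time1, time2):
    """Percentage change from time1 to time2, relative to the time1 value."""
    return (time2 - time1) / time1 * 100

# 17 nurses five years earlier, 25 nurses now
change = percentage_change(17, 25)
print(f"{change:.1f}%")
```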

Rates: represent the frequency of something for a standard-sized unit. Divorce rates, suicide rates, and crime rates are examples. So if we had 104 suicides in a population of 757,465, the suicide rate per 100,000 would be calculated as follows:

$$SR = \frac{104}{757{,}465} \times 100{,}000 = 13.73$$

That is, there are 13.73 suicides per 100,000 people.
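Rates follow the same divide-then-scale pattern with a chosen base. This sketch assumes a base of 100,000, as in the suicide-rate example; `rate_per` is a hypothetical helper name.

```python
def rate_per(events, population, base=100_000):
    """Frequency of events per `base` members of the population."""
    return events / population * base

suicide_rate = rate_per(104, 757_465)
print(f"{suicide_rate:.2f} per 100,000")
```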

Ratio: represents a comparison of one thing to another. So if the suicide rate is 200 per 100,000 in the U.S. and 57 per 100,000 in Canada, the U.S./Canadian suicide ratio is:

$$\frac{\text{U.S. Suicide Rate}}{\text{Canadian Suicide Rate}} = \frac{200}{57} = 3.51$$
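A ratio of two rates is a single division. The sketch below assumes both rates are expressed per the same base (per 100,000), which is what makes the comparison meaningful; `rate_ratio` is an illustrative name.

```python
def rate_ratio(rate_a, rate_b):
    """How many times larger rate_a is than rate_b (both on the same base)."""
    return rate_a / rate_b

us_rate, canada_rate = 200, 57  # both per 100,000
us_canada = rate_ratio(us_rate, canada_rate)
print(f"U.S./Canadian suicide ratio: {us_canada:.2f}")
```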

Normal Distribution

Much data in the social and physical world is “normally distributed.” If so, there will be a few low values, many more clustered toward the middle, and a few high values. Normal distributions have the following properties:

• a symmetrical, bell-shaped curve
• the mean, mode, and median will be similar (identical in a perfect normal curve)
• roughly 2/3 of cases (68.3%) fall within ± 1 standard deviation of the mean
• about 95% of cases (95.4%) fall within ± 2 standard deviations of the mean
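The 2/3 and 95% figures can be checked against the standard normal distribution. This sketch uses `math.erf`, since the proportion of a normal distribution within ±k standard deviations of the mean equals erf(k/√2).

```python
import math

def proportion_within(k):
    """Proportion of a normal distribution within ±k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, f"{proportion_within(k) * 100:.1f}%")
```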

Normal Distribution Cont.

Z Scores

A Z score represents the distance, in standard deviation units, of any value in a distribution from the mean of that distribution.

The Z score formula is as follows:

$$Z = \frac{X - \bar{X}}{sd}$$

Exercise:

Suppose: Income Mean = $72,000; SD = $18,000
Education Mean = 11 years; SD = 4 years

Subject     Income ($)    Education (years)
Case 1        80,000            14
Case 2        70,000            10
Case 3        91,000            19
Case 4        56,000             8

Calculation, Case 1:

$$Z_{\text{income}} = \frac{80{,}000 - 72{,}000}{18{,}000} = .44$$

$$Z_{\text{education}} = \frac{14 - 11}{4} = .75$$

SES score, Case 1 (the sum of the two Z scores) = .44 + .75 = 1.19
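The same calculation for all four cases, as a sketch. Following the slide, the SES score is taken to be the simple sum of the income and education Z scores; the code sums unrounded Z scores, so its results could differ from hand-rounded values in the second decimal place.

```python
# Means and standard deviations from the exercise
INCOME_MEAN, INCOME_SD = 72_000, 18_000
EDUC_MEAN, EDUC_SD = 11, 4

cases = {
    "Case 1": (80_000, 14),
    "Case 2": (70_000, 10),
    "Case 3": (91_000, 19),
    "Case 4": (56_000, 8),
}

def z_score(x, mean, sd):
    """Distance of x from the mean, in standard deviation units."""
    return (x - mean) / sd

ses = {}
for name, (income, educ) in cases.items():
    z_inc = z_score(income, INCOME_MEAN, INCOME_SD)
    z_edu = z_score(educ, EDUC_MEAN, EDUC_SD)
    ses[name] = z_inc + z_edu  # composite SES score: sum of the two Z scores
    print(f"{name}: Z(income)={z_inc:.2f}  Z(education)={z_edu:.2f}  SES={ses[name]:.2f}")
```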

Areas Under the Normal Curve

• draw a normal curve, including lines to represent the problem
• calculate the Z score(s) for the problem
• look up the value in Table 11.14
• solve the problem, recalling that .5 of cases fall above the mean and .5 below
• convert the proportion to a percentage, if needed

Exercise:

Suppose you wished to know what percentage of cases will fall above $100,000 in a sample whose mean is $65,000 and whose SD is $22,000.

Show p. 370 of text

$$Z = \frac{100{,}000 - 65{,}000}{22{,}000} = 1.59$$

Look up Z = 1.59 in Table 11.14 (p. 368): .4441

.5000 - .4441 = .0559 (proportion) x 100 = 5.6% (percentage)
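Instead of looking up Table 11.14, the area can be computed directly from the normal cumulative distribution function. This sketch uses `math.erf`; because it does not round Z to 1.59 first, the proportion (about .0558) differs slightly from the table value (.0559), but both round to 5.6%.

```python
import math

def normal_cdf(z):
    """Proportion of a standard normal distribution falling below z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mean, sd, cutoff = 65_000, 22_000, 100_000
z = (cutoff - mean) / sd
proportion_above = 1 - normal_cdf(z)  # area in the upper tail beyond the cutoff
print(f"Z = {z:.2f}; proportion above = {proportion_above:.4f}; {proportion_above * 100:.1f}%")
```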

Describing Relationships Between Variables

1. Crosstabular Analysis: used with a nominal dependent variable. We cross-classify the information to show the relationship between an independent and a dependent variable. A standard table looks like the following:

Table 11.11 Plans to Attend University by Size of Home Community
==========================================================================
                                Town up        Town over
University       Rural          to 5,000       5,000          TOTAL
Plans?               N      %       N      %       N      %       N      %
--------------------------------------------------------------------------
Plans               69   52.3      44   48.9     102   73.9     215   59.7
No Plans            63   47.7      46   51.1      36   26.1     145   40.3
                   ___  _____     ___  _____     ___  _____     ___  _____
TOTAL              132  100.0      90  100.0     138  100.0     360  100.0
--------------------------------------------------------------------------
If appropriate, test of significance values entered here.

Rules for Crosstabular Tables:

• in the table title, name the dependent variable first
• place the dependent variable on the vertical axis
• place the independent variable on the horizontal axis
• use clear variable labels
• run % figures toward the independent variable (each of its categories totals 100%)
• report % to one decimal place
• report statistical data below the table
• interpret by comparing % across the categories of the independent variable
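Running percentages toward the independent variable means each column (a category of the independent variable) sums to 100%. A sketch using the counts from Table 11.11; the dictionary layout and function name are illustrative choices.

```python
# Counts from Table 11.11: keys are categories of the independent variable
counts = {
    "Rural": {"Plans": 69, "No Plans": 63},
    "Town up to 5,000": {"Plans": 44, "No Plans": 46},
    "Town over 5,000": {"Plans": 102, "No Plans": 36},
}

def column_percentages(table):
    """Percentage each dependent-variable row makes up within each column."""
    result = {}
    for column, cells in table.items():
        total = sum(cells.values())
        result[column] = {row: round(n / total * 100, 1) for row, n in cells.items()}
    return result

pcts = column_percentages(counts)
for column, rows in pcts.items():
    print(column, rows)
```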

2. Comparing Means

• used when the dependent variable is ratio level
• compares the means across the categories of the independent variable
• both the t-test and ANOVA may be used

Presentation may be as follows:

Mean Heart Rate by Treatment Group

------------------------------------------------------
Treatment Group     Mean Heart Rate    Number of Cases
------------------------------------------------------
Touch Therapy                  74.6                 78
Routine Treatment              77.1                 77
COMBINED MEAN                  75.8                155
------------------------------------------------------
If appropriate, test of significance values entered here.
For example: F = 3.514, df = 1,153, p > .05
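The combined mean in the table is a weighted mean: each group mean counts in proportion to its number of cases. A minimal sketch using the table's figures:

```python
# (mean heart rate, number of cases) for each treatment group
groups = {"Touch Therapy": (74.6, 78), "Routine Treatment": (77.1, 77)}

def combined_mean(groups):
    """Mean of all cases pooled, weighting each group mean by its size."""
    total = sum(mean * n for mean, n in groups.values())
    cases = sum(n for _, n in groups.values())
    return total / cases

print(f"{combined_mean(groups):.1f}")
```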

t Test

• The t-test is used to determine whether the difference between the means of two groups is statistically significant
• it is especially associated with small samples (under 30)
• it applies when comparing 2 groups on a ratio-level dependent variable
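A sketch of the independent-samples t statistic, assuming equal group variances (the pooled, Student's version). The heart-rate readings are hypothetical. The code returns only t and its degrees of freedom (n1 + n2 - 2); significance would still be judged against a t table.

```python
import math
import statistics

# Hypothetical heart-rate readings for two small groups (illustrative only)
group_a = [72, 75, 78, 74, 76]
group_b = [80, 79, 83, 81, 77]

def pooled_t(a, b):
    """Independent-samples t statistic assuming equal variances; returns (t, df)."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)  # N - 1 denominators
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))  # standard error of the difference
    t = (statistics.mean(a) - statistics.mean(b)) / se
    return t, n1 + n2 - 2

t, df = pooled_t(group_a, group_b)
print(f"t = {t:.3f}, df = {df}")
```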

Analysis of Variance (ANOVA)

• ANOVA is used when the means of 3 or more groups are compared, or
• when the means for 2 or more groups are compared at 2 or more points in time in a single analysis (e.g., a pre-post experimental design)
• it computes a ratio (F) that compares 2 kinds of variability: between-groups variability and within-group variability
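A sketch of the F ratio for a one-way ANOVA, comparing between-groups with within-group variability; the three groups of scores are hypothetical. Only F and its degrees of freedom are computed, not a p value, and the sketch covers the first case listed (3 or more independent groups), not a repeated-measures pre-post design.

```python
import statistics

# Hypothetical scores for three independent groups (illustrative only)
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]

def one_way_f(groups):
    """One-way ANOVA: between-groups mean square / within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-groups: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group: how far each score sits from its own group mean
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, (k - 1, n - k)

f, df = one_way_f(groups)
print(f"F = {f:.2f}, df = {df[0]},{df[1]}")
```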

3. Correlation

• used with ratio-level variables
• interest lies in both the equation and the strength of the correlation
• Y = a + bX is the general equation
• r is the symbol used to report the strength of the correlation; it can vary from -1.0 to +1.0

Sample Data Set

X    Y
2    3
3    4
5    4
7    6
8    8

[Figure: scatterplot of the sample data set, Y (0 to 8) plotted against X (0 to 8)]

Regression Line

[Figure: regression line drawn through the scatterplot; the a value is read where the line crosses the Y axis, and the b value (slope) is read as h/b, with h and b marked on the figure]

Predicted Value

[Figure: the regression line used to read off a predicted Y value for a given X]
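The a and b values read from the graph can also be computed by least squares, along with r. This sketch assumes the flattened sample data set lists the pairs (2, 3), (3, 4), (5, 4), (7, 6) and (8, 8), and that the regression line is the ordinary least-squares line; X = 6 is an arbitrary value chosen to illustrate a predicted Y.

```python
import math

# Sample data set (pairs as read from the slide listing)
X = [2, 3, 5, 7, 8]
Y = [3, 4, 4, 6, 8]

def least_squares(x, y):
    """Return intercept a and slope b of the least-squares line Y = a + bX."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return a, b

def pearson_r(x, y):
    """Pearson correlation coefficient, ranging from -1.0 to +1.0."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

a, b = least_squares(X, Y)
r = pearson_r(X, Y)
predicted = a + b * 6  # predicted Y for X = 6
print(f"Y = {a:.2f} + {b:.2f}X, r = {r:.2f}, predicted Y at X = 6: {predicted:.2f}")
```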
