Dr. G. Johnson, www.ResearchDemystified.org
Data Analysis for Description
Research Methods for Public Administrators
Dr. Gail Johnson
Simple But Concrete
The Children’s Defense Fund reports on each day in America:
  Four children are killed by abuse or neglect
  Five children or teens commit suicide
  Eight children or teens are killed by firearms
  Seventy-five babies die before their 1st birthday
Source: http://www.childrensdefense.org/child-research-data-publications/each-day-in-america.html
Simple But Concrete
A million seconds = 11 ½ days
A billion seconds = 32 years
A trillion seconds = 32,000 years
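The conversions above can be checked with a few lines of arithmetic. This is only a sketch: the function names are illustrative, and the 365.25-day year is an assumption, not something stated on the slide.

```python
SECONDS_PER_DAY = 60 * 60 * 24
DAYS_PER_YEAR = 365.25  # assumption: average year length, including leap years


def seconds_to_days(seconds):
    """Convert a number of seconds to days."""
    return seconds / SECONDS_PER_DAY


def seconds_to_years(seconds):
    """Convert a number of seconds to years."""
    return seconds / (SECONDS_PER_DAY * DAYS_PER_YEAR)


million_in_days = seconds_to_days(1_000_000)
billion_in_years = seconds_to_years(1_000_000_000)
trillion_in_years = seconds_to_years(1_000_000_000_000)

print(f"A million seconds is about {million_in_days:.1f} days")
print(f"A billion seconds is about {billion_in_years:.0f} years")
print(f"A trillion seconds is about {trillion_in_years:,.0f} years")
```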
Simple But Concrete
A $700 billion bailout translates into a $2,333 IOU from every person in the U.S.
Or, using a different metric, it comes to $45 per week for each person in the U.S.
Going one step further, it comes out to $6 a day.
Framing: are you willing to pay $6 a day to have a functioning financial system?
Read more: http://www.time.com/time/business/article/0,8599,1870699,00.html#ixzz0aqek0mRZ
Going Too Far?
Six dollars a day is also 25 cents an hour, or less than half a penny a minute.
Framing: Would you be willing to pay less than half a penny a minute?
Key Point: Does the comparison point make a difference in what you would be willing to pay?
Read more: http://www.time.com/time/business/article/0,8599,1870699,00.html#ixzz0aqf9HSQ9
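The reframing arithmetic can be sketched as follows. It starts from the $2,333 per-person figure quoted in the deck; the variable names are illustrative, and the hourly and per-minute figures start from the rounded $6 a day, as the slide does.

```python
per_person = 2_333  # dollars per person, as quoted in the slides

per_week = per_person / 52
per_day = per_person / 365

# The slide rounds to $6 a day before going further, so do the same here.
rounded_per_day = 6
per_hour = rounded_per_day / 24
cents_per_minute = per_hour / 60 * 100

print(f"Per week:   ${per_week:.0f}")
print(f"Per day:    ${per_day:.0f}")
print(f"Per hour:   ${per_hour:.2f}")
print(f"Per minute: {cents_per_minute:.2f} cents")
```

The same dollar amount is being described each time; only the unit of comparison changes.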
Common Descriptive Analysis
Counts: how many
  Decennial census
Percents
  Women earned 77% of what men earned in 2006, up from 59% in 1970
Parts of a whole
  Percents (75%) and proportions (.75 or three-quarters)
Common Descriptive Analysis
But be mindful of “bigger pie” distortions when working with percents and proportions.
  If the pie grows much faster than the slice, the slice will appear relatively smaller as a percent even though it still grew.
  The best example is the budget deficit as a percent of GDP: if GDP grows much faster than the budget deficit, the deficit will appear smaller even though it has also grown.
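A small numeric sketch of the “bigger pie” effect. All figures below are hypothetical, chosen only to make the point; they are not actual deficit or GDP numbers.

```python
# Hypothetical figures in billions of dollars (not from the slides).
deficit_then, gdp_then = 100, 2_000
deficit_now, gdp_now = 150, 4_000

# Slice as a percent of the pie, at each point in time.
share_then = 100 * deficit_then / gdp_then
share_now = 100 * deficit_now / gdp_now

print(f"Deficit grew from ${deficit_then}B to ${deficit_now}B")
print(f"Deficit as a percent of GDP fell from {share_then}% to {share_now}%")
```

The slice grew by half, but the pie doubled, so the percent went down.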
Common Descriptive Analysis
Rates: number of occurrences, standardized to a common base
  Deaths of infants per 100,000 births
  Crop yields per acre
  Crime rates
Rates provide an apples-to-apples comparison between places of different sizes or populations.
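A minimal sketch of how a rate standardizes raw counts. The function name `rate_per` and the two places are hypothetical; the numbers are made up for illustration.

```python
def rate_per(occurrences, population, base=100_000):
    """Standardize a raw count to a rate per `base` people."""
    return occurrences * base / population


# Hypothetical crime counts and populations (not from the slides).
small_town = rate_per(occurrences=500, population=50_000)
big_city = rate_per(occurrences=2_000, population=400_000)

print(f"Small town: {small_town:.0f} crimes per 100,000 residents")
print(f"Big city:   {big_city:.0f} crimes per 100,000 residents")
```

The big city has four times as many crimes in raw counts, but half the rate once population is taken into account.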
Common Descriptive Analysis
Ratio: numbers presented in relationship to each other
  Student-to-teacher ratio: 15:1
  Divide the number of students by the number of teachers
  1,500 students and 45 teachers equals about a 33-to-1 student-to-teacher ratio (1,500 divided by 45)
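The division behind the ratio, as a quick sketch using the figures from the slide (variable names are illustrative):

```python
students, teachers = 1_500, 45

students_per_teacher = students / teachers

print(f"{students_per_teacher:.1f} students per teacher, "
      f"or about {round(students_per_teacher)}:1")
```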
Common Descriptive Analysis
Rates of change
  Percentage change from one time period to the other
  For example: The budget increased 23% from FY 2006 to FY 2007.
Three steps:
  1. Divide the newest data by the oldest data
  2. Subtract 1
  3. Multiply by 100 to get the percentage change
Common Descriptive Analysis
Rates of change: applied
What was the rate of change in the 1992 budget deficit as compared to 1980?
  1. Divide the 1992 budget deficit ($290 billion) by the 1980 budget deficit ($73.8 billion) = 3.93
  2. 3.93 - 1 = 2.93
  3. 2.93 x 100 = 293 percent
The budget deficit in current dollars (meaning not adjusted for inflation) increased 293 percent.
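The three steps can be written as a small function and checked against the deficit figures above. The function name `percent_change` is illustrative.

```python
def percent_change(oldest, newest):
    """Percentage change from the oldest value to the newest value."""
    ratio = newest / oldest   # Step 1: divide newest by oldest
    change = ratio - 1        # Step 2: subtract 1
    return change * 100       # Step 3: multiply by 100


deficit_1980 = 73.8   # billions of current dollars, from the slide
deficit_1992 = 290.0  # billions of current dollars, from the slide

increase = percent_change(deficit_1980, deficit_1992)
print(f"The deficit increased about {increase:.0f} percent")
```

A negative result would mean a decrease rather than an increase.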
Common Descriptive Analysis
Frequency Distributions
  Number and percents of a single variable
In The News: Women Now Are Majority of College Graduates
Interpretation?
How would you interpret these percentages in the comparative trend analysis?
Are you surprised by the changes over time?
Why or why not?
Frequency and Percent Distributions
Survey data: analyzed by distributions
How many men and women are in the program?
Distribution of Respondents by Gender:

Male               Female             Total
Number   Percent   Number   Percent   Number
100      33%       200      67%       300
Frequency and Percent Distributions
How many men and women are in the program?
Write-up:
Of the 300 people in this program, 67% are women and 33% are men.
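A frequency and percent distribution like this one can be tallied in a few lines. This is a sketch: the function name `percent_distribution` is illustrative, and the list of respondents is rebuilt from the counts shown in the deck.

```python
from collections import Counter


def percent_distribution(values):
    """Return {category: (count, rounded percent)} for one variable."""
    counts = Counter(values)
    total = sum(counts.values())
    return {
        category: (count, round(100 * count / total))
        for category, count in counts.items()
    }


# Rebuild the 300 survey responses from the counts in the slides.
respondents = ["Male"] * 100 + ["Female"] * 200
distribution = percent_distribution(respondents)

for category, (count, percent) in distribution.items():
    print(f"{category}: {count} ({percent}%)")
```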
Different Analysis Tools For Different Situations
Frequency/percent distributions make sense when working with nominal and ordinal data.
But frequency/percent distributions for interval/ratio data can result in a ridiculously long table that is impossible to interpret.
  If I ask 500 people how many years they have lived in an area, I can get a wide range of answers.
  For this type of data, I would instead look at means, medians, and modes to describe that variable.
Describing Distributions
Central tendency
  Means, medians, modes
  How similar are the characteristics?
  Example: use when we want to describe the similarity of the ages of a group of people.
Dispersion
  Range, standard deviation
  How dissimilar are the characteristics?
  Example: how much variation is there in the ages?
Measures of Central Tendency
The 3-Ms: Mode, Median, Mean.
  Mode: most frequent response
  Median: mid-point of the distribution
  Mean: arithmetic average
Basic Concepts Revisited
Levels of Measurement
  Nominal level data: names, categories
    E.g., gender, religion, state, country
  Ordinal level data: data with an order, going from low to high
    E.g., highest educational degree, income categories, agree-disagree scales
  Interval level data: numbers with equal intervals but no true zero point
    E.g., IQ scores, GRE scores
  Ratio level data: real numbers with a true zero point
    E.g., age, weight, income, temperature (on the Kelvin scale)
Which Measure of Central Tendency to Use?
Depends on the type of data you have:
  Nominal data: mode
  Ordinal data: mode and median
  Interval/ratio: mode, median and mean
For Interval Or Ratio Data: Which One To Use?
Concept of the normal distribution, also called the bell-shaped curve
  In a normal distribution, the mean, median and mode should be very similar
Use the mean if the distribution is normal
Use the median if the distribution is not normal
Normal Distribution: Bell-Shaped Curve
[Figure: bell-shaped curve with the mean marked. Source: http://en.wikipedia.org/wiki/Normal_distribution]
Office contributions
$10, $1, $.50, $.25, $.25
  The mean is $2.40 (add up and divide by 5)
  The median is $.50 (the mid-point of this distribution)
  The mode is $.25 (the most frequently reported contribution)
The best description of contributions is the median.
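The three measures for the office contributions can be reproduced with Python's standard `statistics` module. This is a sketch using the five amounts from the slide; the variable names are illustrative.

```python
import statistics

# The five office contributions from the slide, in dollars.
contributions = [10.00, 1.00, 0.50, 0.25, 0.25]

mean_gift = statistics.mean(contributions)      # add up and divide by 5
median_gift = statistics.median(contributions)  # middle value when sorted
mode_gift = statistics.mode(contributions)      # most frequent value

print(f"Mean:   ${mean_gift:.2f}")
print(f"Median: ${median_gift:.2f}")
print(f"Mode:   ${mode_gift:.2f}")
```

The single $10 gift pulls the mean well above what most people gave, which is why the median describes this group better.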
Salaries
Assume that you had 11 teachers. 10 teachers earned $21,000 per year and one earned $1,000,000.
What would be the best measure to describe this data?
Salaries
The average salary would be $110,000.
The median and mode are both $21,000.
The curve would be positively skewed, i.e., the mean is higher than the mode and median.
The median would do the best job of describing the center of the salaries.
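A quick check of the salary example, plus one way to see why the mean misleads here. This is a sketch; comparing the mean to the median is used only as a rough signal of skew, not a formal skewness statistic.

```python
import statistics

# Ten teachers at $21,000 and one at $1,000,000, as in the slide.
salaries = [21_000] * 10 + [1_000_000]

mean_salary = statistics.mean(salaries)
median_salary = statistics.median(salaries)

# Rough signal of positive (right) skew: the mean sits above the median.
positively_skewed = mean_salary > median_salary

# How many teachers actually earn less than the "average" salary?
below_mean = sum(1 for salary in salaries if salary < mean_salary)

print(f"Mean: ${mean_salary:,.0f}  Median: ${median_salary:,.0f}")
print(f"{below_mean} of {len(salaries)} teachers earn less than the mean")
```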
Skewed Data
1. negative skew: The mass of the distribution is concentrated on the right of the figure. It has relatively few low values. The distribution is said to be left-skewed.
2. positive skew: The mass of the distribution is concentrated on the left of the figure. It has relatively few high values. The distribution is said to be right-skewed. The $1 million salary pulls the average up.
Wikipedia: http://en.wikipedia.org/wiki/Skewness
Skewed Distributions: Negative and Positive
[Figure: negatively and positively skewed distributions. Source: http://en.wikipedia.org/wiki/File:Skewness_Statistics.svg]
Using Means With Survey Data?
Survey data is typically coded using numbers:
  Gender: Male is coded 1, Female is coded 2
  It is faster and less error-prone to code variables using numbers
But the computer will treat these codes as numbers and will compute a mean if asked.
  How would you interpret a mean for gender of 1.6? Or a mean for religion of 2.8?
Do Not Use Means With Nominal Data
Gender (and religion) are nominal variables and should only be reported in terms of distributions:
  Frequency distribution: 10 men and 12 women
  Percentage distribution: 45% men and 55% women
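One way to see why a mean of nominal codes carries no meaning: the codes are arbitrary, so swapping them changes the "mean" while the group itself stays the same. The sketch below uses the 10 men and 12 women from the slide; the second coding scheme is hypothetical.

```python
import statistics
from collections import Counter

genders = ["Male"] * 10 + ["Female"] * 12

coding_a = {"Male": 1, "Female": 2}  # the coding described in the slides
coding_b = {"Male": 2, "Female": 1}  # hypothetical alternative coding

mean_a = statistics.mean([coding_a[g] for g in genders])
mean_b = statistics.mean([coding_b[g] for g in genders])

# The distribution does not depend on which numbers were used as codes.
counts = Counter(genders)
percents = {g: round(100 * n / len(genders)) for g, n in counts.items()}

print(f"Mean under coding A: {mean_a:.2f}")
print(f"Mean under coding B: {mean_b:.2f}")
print(f"Counts: {dict(counts)}  Percents: {percents}")
```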
Using Means With Survey Data?
Scales (very satisfied <-> very dissatisfied) are ordinal scales.
But they are coded into the computer using numbers:
  5 for very satisfied <-> 1 for very dissatisfied
The computer will compute a mean if asked:
  The mean was 3.8 for job satisfaction.
  The mean satisfaction with faculty performance was 4.2 on a scale from 1-5.
  Grade-point averages are an example of means based on an ordinal scale (A-F, on a scale of 0-4).
Using Means With Ordinal Data?
There is disagreement in the field, partly based on academic discipline, about whether to use means with ordinal data.
Things like GPA or faculty ratings are often shown as means.
It is often helpful for researchers to look at the means initially when working with a lot of data: researchers are looking for unusually high or low means.
It is also true that sometimes it is easier to show the means than the percentage distribution for every variable.
Question                                                          2006   2007   2009   Percent reporting 4 or 5 (positive)
I know what is expected of me at work                             4.28   4.25   4.31   87%
I receive recognition for a job well done.                        3.34   3.43   3.47   54%
I have the tools and resources I need to do my job effectively.   3.76   3.75   3.80   70%

Washington Employee Survey
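Both summaries in a table like this, a mean and a percent reporting 4 or 5, can be computed from the same 1-to-5 responses. The sketch below uses ten hypothetical responses, not the actual Washington survey data.

```python
import statistics

# Hypothetical answers to one question on a 1-5 scale.
responses = [5, 4, 4, 3, 5, 2, 4, 5, 1, 4]

mean_score = statistics.mean(responses)

# Percent "positive": share of respondents answering 4 or 5.
positive = sum(1 for r in responses if r >= 4)
percent_positive = 100 * positive / len(responses)

print(f"Mean score: {mean_score:.2f}")
print(f"Percent reporting 4 or 5: {percent_positive:.0f}%")
```

The mean is compact for comparing many questions or years; the percent is usually easier for a general audience to read.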
Using Means With Ordinal Data?
But most people are more familiar with polling results, which report percent distributions.
  We tend to see something like 55% report supporting cap and trade legislation rather than a mean of 3.4 on a scale of 5 (for) to 1 (against).
The decision about whether means or percent distributions are used to report ordinal data should reflect audience preference and ease of audience understanding, not an ideological stance.
Measures of Dispersion
Used with interval and ratio data
Simple description: the range
  Reported salaries ranged from $21,000 to $1,000,000
  Ages in the group ranged from 18 to 32
Standard deviation
  Measures the dispersion in terms of the distance from the mean
  Small standard deviation: not much dispersion
  Large standard deviation: lots of dispersion
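A short sketch of both measures. The individual ages and the two comparison groups are hypothetical; only the 18-to-32 range comes from the slide. The population standard deviation (`pstdev`) is used here as an assumption; a sample would normally use `stdev` instead.

```python
import statistics

# Hypothetical ages, chosen so the range matches the slide (18 to 32).
ages = [18, 21, 24, 25, 27, 32]
age_range = (min(ages), max(ages))

# Two hypothetical groups with the same mean but different spread.
tight_group = [24, 25, 25, 26]
spread_group = [10, 20, 30, 40]

sd_tight = statistics.pstdev(tight_group)
sd_spread = statistics.pstdev(spread_group)

print(f"Ages ranged from {age_range[0]} to {age_range[1]}")
print(f"Tight group:  mean {statistics.mean(tight_group)}, SD {sd_tight:.2f}")
print(f"Spread group: mean {statistics.mean(spread_group)}, SD {sd_spread:.2f}")
```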
Standard Deviation
Normal distribution: bell-shaped curve
  About 68% of the values fall within 1 standard deviation of the mean
  About 95% of the values fall within 2 standard deviations of the mean
[Figure: normal distribution with the mean at the center and standard deviations marked on either side; 95% of the distribution is labeled.]
Applying the Standard Deviation
Average test score = 60.
The standard deviation is 10.
Therefore, assuming the scores are normally distributed, about 95% of the scores are between 40 and 80.
Calculation: 60 + 20 = 80; 60 - 20 = 40.
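The 40-to-80 band, and the share of scores it should contain, can be checked with `statistics.NormalDist` from the Python standard library (3.8+). This is a sketch that assumes the scores follow a normal distribution exactly.

```python
from statistics import NormalDist

mean_score, sd = 60, 10  # from the slide

# Two standard deviations on either side of the mean.
low, high = mean_score - 2 * sd, mean_score + 2 * sd

scores = NormalDist(mu=mean_score, sigma=sd)
share_within_2sd = scores.cdf(high) - scores.cdf(low)
share_within_1sd = scores.cdf(mean_score + sd) - scores.cdf(mean_score - sd)

print(f"Band: {low} to {high}")
print(f"Share within 2 SDs: {share_within_2sd:.1%}")
print(f"Share within 1 SD:  {share_within_1sd:.1%}")
```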
Standard Deviation with Means
The Standard Deviation is used with interval/ratio level data
Typically, standard deviations are presented with means so the reader can tell whether there is a lot or a little variation in the distribution.
Note: the standard deviation is sometimes used in other statistical calculations, such as z-scores and confidence intervals
Describing Two Variables Simultaneously
Cross-tabulations (cross tabs, contingency tables)
  Used when working with nominal and ordinal data
  Provides great detail
Describing Two Variables Simultaneously
Detail about the race and gender of the 233 people in the workplace:

Race     Men    Women
White    21%    31%
Black    15%    11%
Other    14%    6%
Describing Race and Gender
Write-up:
Of the 233 employees, the greatest proportion are white women (31%) followed by white men (21%). Fifteen percent of the employees are black men and 11% are black women, and 14% are men of other race identity and 6% are women of other race identity.
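A cross-tabulation is a count of every combination of two variables, usually shown as a percent of the total. The sketch below uses ten hypothetical employee records, since the slides give only percentages, not the underlying counts.

```python
from collections import Counter

# Hypothetical (race, gender) records, one per employee.
employees = (
    [("White", "Women")] * 3 + [("White", "Men")] * 2
    + [("Black", "Men")] * 2 + [("Black", "Women")] * 1
    + [("Other", "Men")] * 1 + [("Other", "Women")] * 1
)

# Count each race/gender combination, then convert to percent of total.
cell_counts = Counter(employees)
total = len(employees)
cell_percents = {cell: 100 * n / total for cell, n in cell_counts.items()}

print(f"{'Race':<8}{'Men':>6}{'Women':>8}")
for race in ["White", "Black", "Other"]:
    men = cell_percents[(race, "Men")]
    women = cell_percents[(race, "Women")]
    print(f"{race:<8}{men:>5.0f}%{women:>7.0f}%")
```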
Describing Two Variables Simultaneously
Comparison of Means
Used when one variable is nominal or ordinal, and the second variable is at the interval/ratio level of measurement.
Examples:
  Men in the MPA program have a GPA of 3.2 as compared to 3.0 for women.
  The mean overall citizen satisfaction score is 4.2 this year as compared to 3.5 last year.
  Mean salary for women was $35,000 as compared to $38,000 for men last year.
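A comparison of means groups the interval/ratio variable by the categories of the nominal variable and averages within each group. In the sketch below, the individual salaries are hypothetical; they were chosen so the group means match the $35,000 and $38,000 quoted above.

```python
import statistics
from collections import defaultdict

# Hypothetical (gender, salary) records.
records = [
    ("Women", 34_000), ("Women", 35_000), ("Women", 36_000),
    ("Men", 37_000), ("Men", 38_000), ("Men", 39_000),
]

# Group salaries by the nominal variable, then take the mean of each group.
salaries_by_group = defaultdict(list)
for group, salary in records:
    salaries_by_group[group].append(salary)

group_means = {
    group: statistics.mean(values)
    for group, values in salaries_by_group.items()
}

for group, mean_salary in group_means.items():
    print(f"Mean salary for {group.lower()}: ${mean_salary:,.0f}")
```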
Key Points
These simple descriptive analysis techniques can be effective:
  They illuminate, provide feedback, inform and might persuade.
  The math is generally straightforward.
  Descriptive data is generally easier for many people to understand than more complex statistics (stay tuned).
Complex statistics are not inherently better!
The Tough Question
If descriptive data is distorted, it tends to be in the way things are being counted and measured. The math is usually correct.
  Example: The federal debt is often presented just in terms of the debt held by the public, but the total debt includes money borrowed from other government funds.
  As a result, the debt looks smaller than it actually is.
The Tough Question
If descriptive data is distorted, it tends to be in the way things are being counted and measured. The math is usually correct.
  Example: Health insurance profits look different when calculated as a percent of corporate revenue than when calculated as a percent of all spending on health care.
  Profits will look smaller when presented as a percent of all health care spending, which is larger than just corporate insurance revenue.
The Tough Question
Always ask: what exactly is being measured and counted?
Consider whether there are other ways of counting and other ways of doing the analysis that might yield different results (or create different perceptions).
Do the choices reflect a political agenda?
Creative Commons
This PowerPoint is meant to be used and shared with attribution.
Please provide feedback.
If you make changes, please share freely and send me a copy of the changes: [email protected]
Visit www.creativecommons.org for more information.