central tendency - kevin dooley's ultimate web page!dooleykevin.com/psyc60.2.pdf · central...
Central Tendency
- A single summary score that best describes the central location of an entire distribution of scores.
- Measures of Central Tendency:
  - Mean: the sum of all scores divided by the number of scores.
  - Median: the value that divides the distribution in half when observations are ordered.
  - Mode: the most frequent score.
Central Tendency Example: Mode
- Hotel rates: 52, 76, 100, 136, 186, 196, 205, 150, 257, 264, 264, 280, 282, 283, 303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422, 472, 480, 643, 693, 732, 749, 750, 791, 891
- Mode: the most frequent observation.
- Mode(s) for hotel rates: 264, 317, 384
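
As a quick check of the modes above, here is a minimal Python sketch. It assumes the hotel rates exactly as listed in this example and uses the standard library's `statistics.multimode`, which returns every value tied for the highest frequency; the variable names are illustrative only.

```python
from statistics import multimode

# Hotel rates exactly as listed in the example above.
rates = [52, 76, 100, 136, 186, 196, 205, 150, 257, 264, 264, 280, 282,
         283, 303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422,
         472, 480, 643, 693, 732, 749, 750, 791, 891]

# multimode returns all values tied for most frequent, in order of first appearance.
modes = multimode(rates)
print(modes)  # [264, 317, 384]
```

Each of the three modes occurs twice; every other rate occurs once.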
Pros and Cons of the Mode
- Pros
  - Good for nominal data.
  - Good when there are two "typical" scores.
  - Easiest to compute and understand.
  - The score comes from the data set.
- Cons
  - Ignores most of the information in a distribution.
  - Small samples may not have a mode.
Central Tendency Example: Median
- Hotel rates: 52, 76, 100, 136, 186, 196, 205, 150, 257, 264, 264, 280, 282, 283, 303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422, 472, 480, 643, 693, 732, 749, 750, 791, 891
- The median is the middle value when observations are ordered.
- To find the middle, count in (N+1)/2 scores when observations are ordered lowest to highest.
- Median hotel rate:
  - (35+1)/2 = 18, so the median is the 18th score.
  - Median = 317
Finding the median with an even number of scores
- Scores: 2, 2, 3, 5, 6, 7, 7, 7, 8, 9
- With an even number of scores, the median is the average of the middle two observations when observations are ordered.
- Find the average of the N/2 and the (N/2)+1 score.
  - N/2 = 5th score, (N/2)+1 = 6th score
- Add the middle two observations and divide by two.
  - (6+7)/2 = 6.5
  - Median is 6.5
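
The two counting rules (odd and even N) can be sketched in Python as follows. The function name `median_by_position` is hypothetical, chosen for illustration; the function sorts the scores first, then applies the (N+1)/2 rule or averages the N/2 and (N/2)+1 scores.

```python
def median_by_position(scores):
    """Median via the counting rules: (N+1)/2 for odd N,
    average of the N/2 and (N/2)+1 scores for even N."""
    ordered = sorted(scores)          # observations must be ordered first
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    if n % 2 == 1:
        position = (n + 1) // 2       # 1-based position of the middle score
        return ordered[position - 1]  # convert to 0-based index
    lower = ordered[n // 2 - 1]       # the (N/2)th score
    upper = ordered[n // 2]           # the (N/2 + 1)th score
    return (lower + upper) / 2


even_scores = [2, 2, 3, 5, 6, 7, 7, 7, 8, 9]
print(median_by_position(even_scores))  # 6.5

# Hotel rates as listed in the median example (N = 35, so the 18th score).
rates = [52, 76, 100, 136, 186, 196, 205, 150, 257, 264, 264, 280, 282,
         283, 303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422,
         472, 480, 643, 693, 732, 749, 750, 791, 891]
print(median_by_position(rates))  # 317
```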
Pros and Cons of Median
- Pros
  - Not influenced by extreme scores or skewed distributions.
  - Good with ordinal data.
  - Easier to compute than the mean.
- Cons
  - May not exist in the data.
  - Doesn't take actual values into account.
Mean: the average of all the scores
- \( \text{mean} = \frac{\sum x}{N} \)
- e.g., for the scores 3 5 1 2 3 5 4 6 7 5: \( \text{mean} = \frac{3+5+1+2+3+5+4+6+7+5}{10} = \frac{41}{10} = 4.1 \)
- The most commonly used measure of central tendency.
- The balance point of a distribution (illustrated next slide).
Mean
- The mean is the balance point of a distribution.
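
One way to see the "balance point" idea: the deviations of the scores from the mean cancel out, so they sum to zero. The sketch below assumes the ten example scores from the mean definition (3 5 1 2 3 5 4 6 7 5); variable names are illustrative.

```python
# Example scores from the definition of the mean.
scores = [3, 5, 1, 2, 3, 5, 4, 6, 7, 5]

# Mean: the sum of all scores divided by the number of scores.
mean = sum(scores) / len(scores)
print(mean)  # 4.1

# Deviation of each score from the mean.
deviations = [x - mean for x in scores]

# Scores below the mean pull down exactly as much as scores above it pull up,
# so the deviations sum to zero (up to floating-point rounding).
balanced = abs(sum(deviations)) < 1e-9
print(balanced)  # True
```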
Mean
- Population: \( \mu = \frac{\Sigma X}{N} \)
- Sample: \( \bar{X} = \frac{\Sigma X}{n} \)
- \( \mu \): "mu", the population mean
- \( \bar{X} \): "X bar", the sample mean
- \( \Sigma X \): "sigma", the sum of X, add up all scores
- \( n \): the total number of scores in a sample
- \( N \): the total number of scores in a population
Central Tendency Example: Mean
- Hotel rates: 52, 76, 100, 136, 186, 196, 205, 150, 257, 264, 264, 280, 282, 283, 303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422, 472, 480, 643, 693, 732, 749, 750, 791, 891
- \( \bar{X} = \frac{\Sigma X}{n} = \frac{13005}{35} \approx 371.6 \)
- Mean hotel rate: $371.60
Pros and Cons of the Mean
- Pros
  - Mathematical center of a distribution.
  - The total distance to the scores above it equals the total distance to the scores below it.
  - Good for interval and ratio data.
  - Does not ignore any information.
  - Inferential statistics is based on mathematical properties of the mean.
- Cons
  - Influenced by extreme scores and skewed distributions.
  - May not exist in the data.
The effect of skew on average
- In a normal distribution, the mean, median, and mode are the same.
- In a skewed distribution, the mean is pulled toward the tail.
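
A small illustration of the skew effect, using made-up data (the two lists below are hypothetical and not from the slides). Replacing the highest score with an extreme value creates a long right tail: the mean moves toward it while the median and mode stay put.

```python
from statistics import mean, median, multimode

# Hypothetical scores: a symmetric set, and the same set with one extreme high score.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
skewed = [1, 2, 2, 3, 3, 3, 4, 4, 50]

for label, data in (("symmetric", symmetric), ("skewed", skewed)):
    print(label, "mean:", mean(data), "median:", median(data), "mode:", multimode(data))

# symmetric: mean, median, and mode are all 3
# skewed:    mean is 8 (pulled toward the tail), median and mode are still 3
```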
Which average?
- Each measure contains a different kind of information.
- For example, all three measures are useful for summarizing the distribution of American household incomes.
  - In 1998, the income common to the greatest number of households was $25,000.
  - Half the households earned less than $38,885.
  - The mean income was $50,600.
- Reporting only one measure of central tendency might be misleading and perhaps reflect a bias.
Describing Data with Tables & Graphs
Descriptive Statistics
- The goal of descriptive statistics is to summarize a collection of data in a clear and understandable way.
  - What is the pattern of scores over the range of possible values?
  - Where, on the scale of possible scores, is a point that best represents the set of scores?
  - Do the scores cluster about their central point or do they spread out around it?
What is the pattern of scores?
- Graphs often make it easier to see certain characteristics and trends in a set of data.
- Graphs for quantitative data (number/count):
  - Stem and Leaf Display
  - Histogram
  - Frequency Polygon
- Graphs for qualitative data (class/category):
  - Bar Chart
  - Pie Chart
Histogram
- Consists of a number of bars placed side by side.
  - The width of each bar indicates the interval size.
  - The height of each bar indicates the frequency of the interval.
  - There are no gaps between adjacent bars, reflecting the continuous nature of quantitative data.
Tables: Frequency Distributions
- (def) Organizes raw data or observations that have been collected by showing how often an observation occurs in each class.
- For quantitative data:
  - Ungrouped: list all possible scores that occur, and then indicate how often each score occurs.
    - (Maximizes information about individual scores.)
  - Grouped: combine all possible scores into classes and indicate how often each score occurs within a class.
    - (Easier to see patterns in data, but you lose information about individual scores.)
    - Do not use if you have more than 20 classes!
Organize these depression scores! (ranging from 1 to 10)
1 4 4 2 7 3 7 9 1 2 5 2 1 1 4 5
1 3 3 2 10 5 5 4 1 2 3 2 1 1 3 3
Ungrouped
1. Each observation should be included in only one class.
2. List all classes, even those with zero frequencies.
score    f
1        8
2        6
3        6
4        4
5        4
6        0
7        2
8        0
9        1
10       1
Total   32
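
The ungrouped table can be reproduced with a short Python sketch. It assumes the 32 depression scores listed on this slide and uses `collections.Counter`; the dictionary name `ungrouped` is illustrative. Note that classes 6 and 8 are listed even though their frequency is zero.

```python
from collections import Counter

# The 32 depression scores from the slide (possible range: 1 to 10).
scores = [1, 4, 4, 2, 7, 3, 7, 9, 1, 2, 5, 2, 1, 1, 4, 5,
          1, 3, 3, 2, 10, 5, 5, 4, 1, 2, 3, 2, 1, 1, 3, 3]

counts = Counter(scores)

# List every class from 1 to 10, even those with zero frequency.
# Counter returns 0 for scores that never occur.
ungrouped = {score: counts[score] for score in range(1, 11)}

print("score    f")
for score, f in ungrouped.items():
    print(f"{score:<8} {f}")
print(f"Total    {sum(ungrouped.values())}")
```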
Grouped
1. Find the lowest and highest scores.
2. Classes should have (roughly) equal intervals.
3. Each observation should be included in only one class.
4. List all classes, even those with zero frequencies.
score                        f
1-3 (no depression)         20
4-6 (moderate depression)    8
7-10 (high depression)       4
Total                       32
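
A minimal sketch of the grouping step, assuming the same 32 depression scores and the three classes used in the table (1-3, 4-6, 7-10). The class boundaries are inclusive and do not overlap, so each observation falls in exactly one class.

```python
# The 32 depression scores (possible range: 1 to 10).
scores = [1, 4, 4, 2, 7, 3, 7, 9, 1, 2, 5, 2, 1, 1, 4, 5,
          1, 3, 3, 2, 10, 5, 5, 4, 1, 2, 3, 2, 1, 1, 3, 3]

# Inclusive, non-overlapping class limits taken from the grouped table.
classes = [(1, 3), (4, 6), (7, 10)]

# Count how many scores fall within each class.
grouped = {
    f"{low}-{high}": sum(low <= s <= high for s in scores)
    for low, high in classes
}

print(grouped)  # {'1-3': 20, '4-6': 8, '7-10': 4}
```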
Tables: Relative Frequency Distributions
- (def) A frequency distribution that also includes the proportion of each group.
- Divide the frequency of each class by the total frequency (e.g., 20/32 = .625).
score                        f    Relative f
1-3 (no depression)         20    .625
4-6 (moderate depression)    8    .25
7-10 (high depression)       4    .125
Total                       32    1.00
Tables: Cumulative Frequency Distributions
- (def) A frequency distribution that also includes the total number of observations in each class and all lower-ranked classes.
- Cumulative f: add the frequency of each class to the sum of all classes ranked before it (e.g., 20 + 8 = 28).
- Cumulative %: divide the cumulative frequency by the total sample, i.e., percentile rank (e.g., 28/32 = .875).
score                        f    Relative f    Cumulative f    Cumulative %
1-3 (no depression)         20    .625          20              62.5%
4-6 (moderate depression)    8    .25           28              87.5%
7-10 (high depression)       4    .125          32              100.0%
Total                       32    1.00
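
The relative, cumulative, and cumulative-percent columns all follow from the grouped frequencies. The sketch below assumes the three class frequencies from the table (20, 8, 4) and uses `itertools.accumulate` for the running total; variable names are illustrative.

```python
from itertools import accumulate

# Grouped frequencies from the depression-score table.
labels = ["1-3", "4-6", "7-10"]
freqs = [20, 8, 4]
total = sum(freqs)  # 32

# Relative f: each class frequency divided by the total frequency.
relative = [f / total for f in freqs]

# Cumulative f: running sum of this class and all lower-ranked classes.
cumulative = list(accumulate(freqs))

# Cumulative %: cumulative frequency divided by the total, as a percentage.
cumulative_pct = [100 * c / total for c in cumulative]

for row in zip(labels, freqs, relative, cumulative, cumulative_pct):
    print(row)
# ('1-3', 20, 0.625, 20, 62.5)
# ('4-6', 8, 0.25, 28, 87.5)
# ('7-10', 4, 0.125, 32, 100.0)
```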
Graphs for quantitative data: Histogram
- Consists of bars placed side by side (to represent continuity of data); the width of each bar indicates a single interval, and the height indicates the frequency of the interval.
[Figure: histogram of depression scores; x-axis: Level of Depression (1 to 10); y-axis: Frequency (0 to 10)]
Outliers
- (def) A very extreme score.
  - Is it accurate?
  - Should you segregate it from your summary?
  - Will it enhance your understanding?
Graphs: Shape
Review! Assume all scores go from low to high
- Draw and describe the shape:
  - IQ scores for the general population
  - Attractiveness scores for a bunch of supermodels
  - 1st graders' math scores on a college-level exam
  - Annual income level for a community consisting of equal numbers of relatively poor and relatively wealthy people
Graphs for quantitative data: Frequency Polygons
- A line graph that emphasizes the continuity of continuous variables.
- Uses a single point rather than a bar.
[Figure: two panels showing the depression scores; x-axis: Level of Depression (1 to 10); y-axis: Frequency (0 to 10)]
Graphs for quantitative data: Stem & Leaf
- A display for sorting data on the basis of leading and trailing digits.
Raw data:
14 24 36 73 52 35 15 24 35 33 34 66 11 14 54 56 24 35 75 43 17 67 26 31 40 29 10 43 27 49 16 9 35 77 28 34 19 64 10 56
Stem | Leaf
0    | 9
1    | 001445679
2    | 4446789
3    | 134455556
4    | 0339
5    | 2466
6    | 467
7    | 357
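
A stem-and-leaf display can be built by splitting each value into its leading digit (stem) and trailing digit (leaf). The sketch below assumes the 40 raw values from this slide and non-negative values below 100; the function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

# Raw data from the slide.
raw = [14, 24, 36, 73, 52, 35, 15, 24, 35, 33, 34, 66, 11, 14, 54, 56, 24,
       35, 75, 43, 17, 67, 26, 31, 40, 29, 10, 43, 27, 49, 16, 9, 35, 77,
       28, 34, 19, 64, 10, 56]


def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted leaves (ones digits) as a string."""
    stems = defaultdict(list)
    for value in sorted(values):          # sorting first keeps leaves in order
        stem, leaf = divmod(value, 10)    # e.g. 36 -> stem 3, leaf 6
        stems[stem].append(leaf)
    # Include every stem between the smallest and largest, even if it has no leaves.
    return {
        stem: "".join(str(leaf) for leaf in stems.get(stem, []))
        for stem in range(min(stems), max(stems) + 1)
    }


display = stem_and_leaf(raw)
for stem, leaves in display.items():
    print(f"{stem} | {leaves}")
```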
Histogram: Las Vegas Hotel Rates
[Figure: histogram of hotel rates; x-axis: Rates, in intervals of 100 from 0-99 to 800-899; y-axis: Frequency (0 to 9)]
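
The bar heights of such a histogram are just grouped frequencies with an interval width of 100. The sketch below is an assumption-laden illustration: it uses the hotel rates as listed in the central tendency examples of this transcript and prints a text version of the bars; it is not a reproduction of the original figure.

```python
# Hotel rates as listed earlier in this transcript.
rates = [52, 76, 100, 136, 186, 196, 205, 150, 257, 264, 264, 280, 282,
         283, 303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422,
         472, 480, 643, 693, 732, 749, 750, 791, 891]

width = 100   # interval size (the width of each bar)
n_bins = 9    # intervals 0-99 through 800-899

# Integer division by the interval width gives the index of each rate's interval.
bin_counts = [0] * n_bins
for rate in rates:
    bin_counts[rate // width] += 1

# Text histogram: one row per interval, one '#' per observation.
for i, count in enumerate(bin_counts):
    low = i * width
    label = f"{low}-{low + width - 1}"
    print(f"{label:>8} | {'#' * count} ({count})")
```

Empty intervals (here 500-599) are still listed, which keeps the gap in the distribution visible.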
Shapes of Histograms
Frequency Polygons
- Uses a single point rather than a bar.
[Figure: frequency polygon; x-axis: Range, in intervals of 5 from 35-39 to 95-99; y-axis: Frequency (0 to 35)]
Bar Graph
Pie Graph
Misleading Graphs