Measures of Central Tendency
SOCY601, Alan Neustadtl, University of Maryland
A measure of central tendency is a single number used to represent the “center” of a group of data. Different variables may possess different numerical characteristics, so a different measure of central tendency may better summarize each variable. The basic measures are the:
- mode
- median
- mean
This class of measures can be calculated on grouped or ungrouped data. The difference is in how the data values are weighted.
The Mode
The mode is the:
- most frequently occurring value in a group of raw scores (ungrouped data)
- value of the group that contains the most cases (grouped data)
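A minimal Python sketch of the first definition, assuming the raw scores are held in a list. The helper name `modes` is hypothetical; it returns every tied value because a group of scores can have more than one mode.

```python
from collections import Counter

def modes(scores):
    """Return all of the most frequently occurring values in a list of raw scores, sorted."""
    if not scores:
        raise ValueError("mode of empty data")
    counts = Counter(scores)
    top = max(counts.values())
    # Keep every value that reaches the highest count, so ties are not hidden
    return sorted(value for value, count in counts.items() if count == top)

print(modes([3, 5, 5, 7, 8, 8]))  # [5, 8]
```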
The Mode—Example
Frequency Distribution of Race in the 2000 General Social Survey
Race     f
White    2,244
Black    404
Other    170
Total    2,817
The mode is White, the category with the largest frequency (2,244).
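For grouped or categorical data the mode is just the category with the largest frequency. A small sketch using the counts from the GSS table; storing them in a dictionary is an assumption made for illustration.

```python
# Frequencies of race from the 2000 GSS table (the Total row is left out)
race_counts = {"White": 2244, "Black": 404, "Other": 170}

# The modal category is the key whose frequency is largest
modal_category = max(race_counts, key=race_counts.get)
print(modal_category)  # White
```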
The Median
The median is defined as the middle value (case) of n values of X objects arranged in order of size.
For an odd number of cases, the median is the value of the $\frac{n+1}{2}$th case.
For an even number of cases, the median is halfway between the values of the $\frac{n}{2}$th and the $\left(\frac{n}{2}+1\right)$th cases.
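The odd/even rule above translates directly into code. This is a sketch with a hypothetical helper name, `median`; note that the 1-based case numbers in the rule become 0-based list indices.

```python
def median(values):
    """Median using the (n + 1)/2 rule for odd n and the n/2, n/2 + 1 rule for even n."""
    x = sorted(values)
    n = len(x)
    if n == 0:
        raise ValueError("median of empty data")
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)th case, which is index (n + 1)//2 - 1 when counting from 0
        return x[(n + 1) // 2 - 1]
    # Even n: halfway between the (n/2)th and (n/2 + 1)th cases
    return (x[n // 2 - 1] + x[n // 2]) / 2

print(median([3, 1, 2]))     # 2
print(median([4, 1, 3, 2]))  # 2.5
```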
The Median—Example
Rank   Name                         Interlocking Directorates
1      Equitable Life Insurance     75
2      Morgan Guaranty Trust        72
3      Chemical Bank of New York    70
4      First National Bank          69
5      Chase Manhattan Bank         66
With n = 5 (odd), the median is the (5 + 1)/2 = 3rd case: 70.
The Median—Example
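Rank   Name                         Interlocking Directorates
1      Equitable Life Insurance     75
2      Morgan Guaranty Trust        72
3      Chemical Bank of New York    70
4      First National Bank          69
5      Chase Manhattan Bank         66
6      New York Life                61
With n = 6 (even), the median is halfway between the 3rd case (70) and the 4th case (69): 69.5.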
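Python's standard `statistics.median` follows the same odd/even rule, so it reproduces both worked examples. The list names below are only for illustration.

```python
import statistics

# Interlocking directorates from the two example tables
five_firms = [75, 72, 70, 69, 66]
six_firms = five_firms + [61]  # adds New York Life

print(statistics.median(five_firms))  # 70
print(statistics.median(six_firms))   # 69.5
```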
The Median—Grouped Data
Stated Limits    True Limits      f    F     Number of Cases Less Than:
2,000 - 2,900    1,950 - 2,950    17   17    $2,950
3,000 - 3,900    2,950 - 3,950    26   43    $3,950
4,000 - 4,900    3,950 - 4,950    38   81    $4,950
5,000 - 5,900    4,950 - 5,950    51   132   $5,950
6,000 - 6,900    5,950 - 6,950    36   168   $6,950
7,000 - 7,900    6,950 - 7,950    21   189   $7,950
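Look for the interval containing the median, or the $\frac{n}{2}$th case:
$$\frac{n}{2} = \frac{189}{2} = 94.5$$
Since 81 cases fall below $4,950 and 132 fall below $5,950, the 94.5th case lies in the $4,950 - 5,950 interval.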
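Finding the interval that holds the median amounts to searching the cumulative frequencies for the first value that reaches n/2. A sketch using the standard `bisect` module; the two lists are transcribed from the table above.

```python
from bisect import bisect_left

# True lower limits and cumulative frequencies (F) from the grouped income table
lower_limits = [1950, 2950, 3950, 4950, 5950, 6950]
cum_freq = [17, 43, 81, 132, 168, 189]

n = cum_freq[-1]
target = n / 2  # location of the median
# The first interval whose cumulative frequency reaches the target holds the median
k = bisect_left(cum_freq, target)
print(target, lower_limits[k])  # 94.5 4950
```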
The Median—Grouped Data
There are 51 cases in this interval. We divide the $1,000-wide interval into 51 equal sub-intervals of about $19.61 each:
$$\frac{\$1{,}000}{51} \approx \$19.61$$
The Median—Grouped Data
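Then we simply count the sub-intervals up from the true lower limit of the interval ($4,950) until we come to the median. The number of sub-intervals to count is the location of the median, 94.5, minus the 81 cases that fall below the interval: 94.5 − 81 = 13.5. So the median is about $4,950 + 13.5 × ($1,000 / 51) ≈ $5,214.71.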
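The counting procedure can be checked with a few lines of arithmetic. This sketch hard-codes the values for the $4,950 - 5,950 interval from the table; the variable names are illustrative only.

```python
width = 1000   # each interval spans $1,000
f = 51         # cases in the interval containing the median
F_below = 81   # cases below that interval
lower = 4950   # true lower limit of that interval

sub_width = width / f           # size of one sub-interval, about $19.61
steps = 94.5 - F_below          # sub-intervals to count up to the median
median = lower + steps * sub_width
print(round(sub_width, 2), steps, round(median, 2))  # 19.61 13.5 5214.71
```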
The Median—Grouped Data
$$Md = l + \left(\frac{\frac{n}{2} - F}{f}\right) i$$
Where:
l = lower limit of the interval containing the median
F = cumulative frequency corresponding to the lower limit
f = number of cases in the interval containing the median
i = width of the interval containing the median
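The formula wraps the counting steps into one expression. A sketch with a hypothetical function name, `grouped_median`, whose parameters mirror l, F, f, i, and n above.

```python
def grouped_median(l, F, f, i, n):
    """Md = l + ((n/2 - F) / f) * i for grouped data."""
    return l + ((n / 2 - F) / f) * i

md = grouped_median(l=4950, F=81, f=51, i=1000, n=189)
print(round(md, 2))  # 5214.71
```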
The Mean
For populations:
$$\mu = \frac{X_1 + X_2 + \cdots + X_N}{N}$$
For samples:
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
The Mean
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \qquad \bar{X} = \frac{\sum X}{n} \qquad \bar{X} = \frac{1}{n}\sum X$$
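All three notations describe the same computation: add up the scores and divide by how many there are. A minimal sketch with a hypothetical helper name, `mean`; the population mean μ uses the same arithmetic over all N cases.

```python
def mean(xs):
    """X-bar = (sum of X_i) / n."""
    if not xs:
        raise ValueError("mean of empty data")
    return sum(xs) / len(xs)

print(mean([75, 72, 70, 69, 66]))  # 70.4
```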
The Mean as the Center of Gravity
Properties of the Mean
The mean has the algebraic property that the sum of the deviations of each score from the mean will always be zero. Symbolically:
$$\sum_{i=1}^{n} (X_i - \bar{X}) = 0$$
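A quick numerical check of this property on a small set of scores (the directorate counts from the median example). Floating-point rounding can leave a tiny residue, so the sketch compares against a tolerance instead of testing for exact zero.

```python
scores = [75, 72, 70, 69, 66]
x_bar = sum(scores) / len(scores)
deviations = [x - x_bar for x in scores]
total = sum(deviations)

# Should be zero up to floating-point rounding
print(abs(total) < 1e-9)  # True
```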
Properties of the Mean
The sum of the squared deviations of each score from the mean is less than the sum of the squared deviations from any other constant (number). Symbolically:
$$\sum_{i=1}^{n} (X_i - \bar{X})^2 = \text{minimum}$$
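The minimum property can be seen by computing the sum of squared deviations about several constants. The scores and the helper `sum_sq_dev` are made up for illustration; the mean of these scores is 5, and every other constant gives a larger total.

```python
def sum_sq_dev(xs, c):
    """Sum of squared deviations of xs about the constant c."""
    return sum((x - c) ** 2 for x in xs)

scores = [2, 4, 4, 5, 10]
x_bar = sum(scores) / len(scores)  # 5.0

for c in [3, 4, x_bar, 6, 7]:
    print(c, sum_sq_dev(scores, c))
```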
Proof that: $\sum (X_i - \bar{X}) = 0$
Given:
$$\sum_{i=1}^{n} (X_i - \bar{X})$$
By distribution, we can rewrite this expression as:
$$\sum X_i - \sum \bar{X}$$
The mean is a constant. The sum of a constant is equal to n times that constant. So, we can rewrite this expression as:
$$\sum X_i - n\bar{X}$$
We also know the basic definition of the mean, $\bar{X} = \frac{\sum X_i}{n}$, and can substitute it:
$$\sum X_i - n\left(\frac{\sum X_i}{n}\right)$$
The n's cancel:
$$\sum X_i - \sum X_i = 0$$
Sum of Squared Deviations About the Mean
The logic of this proof is that if we subtract any number other than the mean from each value of X, square that amount, and sum up these values for all values of X, we will get a number that is larger than if we had carried out the same procedure using the mean of X.
Let's call this other number “X-bar prime”, $\bar{X}'$, and start with the original expression:
$$\sum_{i=1}^{n} (X_i - \bar{X})^2$$
However, we will strip out the summation and exponentiation, and substitute “X-bar prime” for the mean:
$$(X_i - \bar{X}')$$
Sum of Squared Deviations About the Mean
Now we can add and simultaneously subtract the actual mean to and from this expression. This has no "net" effect on the expression. This is equal to:
$$(X_i - \bar{X}') = (X_i - \bar{X}) + (\bar{X} - \bar{X}')$$
Squaring both sides of the expression brings us a step closer to the original equation:
$$(X_i - \bar{X}')^2 = \left[(X_i - \bar{X}) + (\bar{X} - \bar{X}')\right]^2$$
Because $(a + b)^2 = a^2 + 2ab + b^2$, when expanded this is equal to:
Sum of Squared Deviations About the Mean
$$(X_i - \bar{X}')^2 = (X_i - \bar{X})^2 + 2(X_i - \bar{X})(\bar{X} - \bar{X}') + (\bar{X} - \bar{X}')^2$$
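Now, add the summation symbol back in on both sides of the expression and, with a little bit of algebraic manipulation, we get:
$$\begin{aligned}
\sum (X_i - \bar{X}')^2 &= \sum (X_i - \bar{X})^2 + \sum 2(X_i - \bar{X})(\bar{X} - \bar{X}') + \sum (\bar{X} - \bar{X}')^2 \\
&= \sum (X_i - \bar{X})^2 + 2(\bar{X} - \bar{X}')\sum (X_i - \bar{X}) + \sum (\bar{X} - \bar{X}')^2 \\
&= \sum (X_i - \bar{X})^2 + \sum (\bar{X} - \bar{X}')^2 \\
&= \sum (X_i - \bar{X})^2 + n(\bar{X} - \bar{X}')^2
\end{aligned}$$
The middle term drops out because $\sum (X_i - \bar{X}) = 0$, and the last step uses the fact that the sum of a constant is n times that constant. Since $n(\bar{X} - \bar{X}')^2 > 0$ whenever $\bar{X}' \ne \bar{X}$, the sum of squared deviations about any other number is larger than the sum of squared deviations about the mean.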
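The final identity, $\sum (X_i - \bar{X}')^2 = \sum (X_i - \bar{X})^2 + n(\bar{X} - \bar{X}')^2$, can be checked numerically. The scores and the choice of 3.5 for "X-bar prime" are arbitrary values picked for illustration.

```python
scores = [2, 4, 4, 5, 10]
n = len(scores)
x_bar = sum(scores) / n
x_bar_prime = 3.5  # any constant other than the mean

lhs = sum((x - x_bar_prime) ** 2 for x in scores)
rhs = sum((x - x_bar) ** 2 for x in scores) + n * (x_bar - x_bar_prime) ** 2
print(lhs, rhs)  # 47.25 47.25
```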
The Mean from Grouped Data
Ungrouped data:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$
Grouped data:
$$\bar{X} = \frac{\sum_{i=1}^{n} w_i X_i}{\sum_{i=1}^{n} w_i}$$
The only difference in these formulas is the weight, $w_i$. With ungrouped data, the weight is implicitly equal to one.
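A sketch of the grouped-data formula as a small Python function (the name `weighted_mean` is hypothetical). Setting every weight to one reduces it to the ordinary mean, which matches the remark above.

```python
def weighted_mean(values, weights):
    """X-bar = sum(w_i * X_i) / sum(w_i)."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

print(weighted_mean([1, 2, 3], [1, 1, 1]))  # 2.0, the ordinary mean
print(weighted_mean([1, 2, 3], [3, 1, 0]))  # 1.25
```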
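Type of PAC      Contributions    N       Weighted
Corporations     $37,100.78       1,875   $69,563,962.50
Labor            $112,757.12      371     $41,832,891.52
Non-Connected    $13,832.66       1,318   $18,231,445.88
T/M/H            $62,029.20       852     $52,848,878.40
Cooperative      $54,178.63       56      $3,034,003.28
W/O Stock        $27,272.09       149     $4,063,541.41
Sums             $307,170.48      4,621   $189,574,722.99
Averages         $51,195.08               $41,024.61
The unweighted average of the six Contributions figures is $307,170.48 / 6 = $51,195.08; weighting each figure by N gives $189,574,722.99 / 4,621 = $41,024.61.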
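The two averages in the PAC table can be reproduced in a few lines. This sketch reads the Contributions column as each PAC type's average contribution and N as the number of PACs of that type; that reading is an interpretation, since the slide does not define the columns.

```python
# (Contributions, N) for each PAC type, transcribed from the table
pacs = {
    "Corporations": (37100.78, 1875),
    "Labor": (112757.12, 371),
    "Non-Connected": (13832.66, 1318),
    "T/M/H": (62029.20, 852),
    "Cooperative": (54178.63, 56),
    "W/O Stock": (27272.09, 149),
}

avgs = [avg for avg, _ in pacs.values()]
counts = [n for _, n in pacs.values()]

unweighted = sum(avgs) / len(avgs)  # every type counts once
weighted = sum(a * n for a, n in zip(avgs, counts)) / sum(counts)  # each type counts N times
print(round(unweighted, 2), round(weighted, 2))  # 51195.08 41024.61
```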
The Mean from Grouped Data—Example
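Annual Income of American Men in 1975
Annual Income   Percent of Men   Midpoint   Weighted by Midpoint
0-4             36%              2          72
5-9             23%              7          161
10-14           20%              12         240
15-19           11%              17         187
20-24           5%               22         110
25-29           3%               27         81
30-34           2%               32         64
Total           100%                        915
$$\bar{X} = \frac{\sum_{i=1}^{n} w_i X_i}{\sum_{i=1}^{n} w_i}$$
Here each midpoint $X_i$ is weighted by the percent of men $w_i$ in its income category. The weights sum to 100, so:
mean = 915 / 100 = 9.15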
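The same weighted-mean arithmetic for the income table, with the midpoints as the values and the percentages as the weights (both transcribed from the table). The key point is that the weighted sum is divided by the sum of the weights, not by the number of categories or the sum of the midpoints.

```python
midpoints = [2, 7, 12, 17, 22, 27, 32]
percent_of_men = [36, 23, 20, 11, 5, 3, 2]  # the weights w_i

weighted_sum = sum(w * x for w, x in zip(percent_of_men, midpoints))
mean_income = weighted_sum / sum(percent_of_men)
print(weighted_sum, sum(percent_of_men), mean_income)  # 915 100 9.15
```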
Percentiles
A percentile is the outcome or score below which a given percentage of observations fall. For grouped data, the ith percentile is:
$$P_i = L_p + \left(\frac{p_i n - c_p}{f_p}\right) W_i$$
Where:
$P_i$ = the score of the ith percentile
$L_p$ = the true lower limit of the interval containing the ith percentile
$p_i$ = the ith percentile written as a proportion (e.g., 75th = 0.75)
n = the total number of observations
$c_p$ = the cumulative frequency up to but not including the interval containing $P_i$
$f_p$ = the frequency in the interval containing the ith percentile
$W_i$ = the width of the interval containing $P_i$; $W_i = U_p - L_p$, where $U_p$ is the true upper limit of that interval
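A sketch of the percentile formula for grouped data with equal-width intervals. The function name `grouped_percentile` and its argument layout are hypothetical; the interval that contains $P_i$ is found by searching the cumulative frequencies, and the data are the income table used for the median.

```python
from bisect import bisect_left

def grouped_percentile(p, lower_limits, width, cum_freq):
    """P_i = L_p + ((p_i * n - c_p) / f_p) * W_i, assuming all intervals share one width.

    p            -- the percentile as a proportion (0.75 for the 75th)
    lower_limits -- true lower limit of each interval, ascending
    width        -- the common interval width W_i
    cum_freq     -- cumulative frequency at the top of each interval
    """
    n = cum_freq[-1]
    target = p * n
    k = bisect_left(cum_freq, target)       # interval containing P_i
    c_p = cum_freq[k - 1] if k > 0 else 0   # cases below that interval
    f_p = cum_freq[k] - c_p                 # cases inside that interval
    return lower_limits[k] + ((target - c_p) / f_p) * width

lower = [1950, 2950, 3950, 4950, 5950, 6950]
F = [17, 43, 81, 132, 168, 189]
print(round(grouped_percentile(0.50, lower, 1000, F), 2))  # 5214.71
print(round(grouped_percentile(0.75, lower, 1000, F), 2))  # 6220.83
```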
Percentiles
$$P_{50} = \$4{,}950 + \left(\frac{(.5)(189) - 81}{51}\right)\$1{,}000 \approx \$5{,}214.71$$
Percentiles
$$P_{75} = \$5{,}950 + \left(\frac{(.75)(189) - 132}{36}\right)\$1{,}000 \approx \$6{,}220.83$$