
Measures of Central Tendency

SOCY601—Alan Neustadtl

A measure of central tendency is a single number used to represent the “center” of a group of data.

Different variables may possess different numerical characteristics. So different measures of central tendency may better summarize the variable. The basic measures are the:

- mode
- median
- mean

This class of measures can be calculated on grouped or ungrouped data. The difference is in how the data values are weighted.

The Mode

The mode is the:

- most frequently occurring value in a group of raw scores
- value of the group that contains the most cases in grouped data

The Mode—Example

Frequency Distribution of Race in the 2000 General Social Survey

Race         f
White    2,244
Black      404
Other      170
Total    2,817

The mode is White, the category that contains the most cases (2,244).
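As a rough illustration of the two definitions of the mode, the sketch below finds the modal category from the frequencies in the table above and the most frequent value in a small list of raw scores. The variable names and the raw scores are made up for illustration; only the race frequencies come from the slide.

```python
from collections import Counter

# Grouped data: frequencies copied from the race table above.
race_freq = {"White": 2244, "Black": 404, "Other": 170}

# The mode of grouped data is the category with the largest frequency.
mode_grouped = max(race_freq, key=race_freq.get)

# Raw scores: hypothetical values, used only to illustrate the idea.
raw_scores = [3, 5, 5, 2, 5, 3, 1]

# The mode of raw scores is the most frequently occurring value.
mode_raw = Counter(raw_scores).most_common(1)[0][0]

print(mode_grouped)
print(mode_raw)
```

Note that the mode needs only counts, so it works for nominal variables such as race, where a median or mean would not be meaningful.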

The Median

The median is defined as the middle value (case) of n values of X objects arranged in order of size.

For an odd number of cases, the middle case will be equal to the \( \frac{n+1}{2} \) case.

For an even number of cases, the middle case will be halfway between the \( \frac{n}{2} \) and the \( \frac{n}{2} + 1 \) case.

The Median—Example

Rank  Name                        Interlocking Directorates
1     Equitable Life Insurance    75
2     Morgan Guaranty Trust       72
3     Chemical Bank of New York   70
4     First National Bank         69
5     Chase Manhattan Bank        66

The Median—Example

Rank  Name                        Interlocking Directorates
1     Equitable Life Insurance    75
2     Morgan Guaranty Trust       72
3     Chemical Bank of New York   70
4     First National Bank         69
5     Chase Manhattan Bank        66
6     New York Life               61
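The odd and even rules can be checked against the two examples with a short sketch. The function name `median` and the list names are illustrative choices, not part of the slides; the values are the interlocking-directorate counts from the two tables.

```python
def median(values):
    """Middle value of the ordered cases (odd n), or the point
    halfway between the two middle cases (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the (n + 1) / 2-th case, which is index n // 2 when counting from 0.
        return ordered[mid]
    # Even n: halfway between the n/2-th and the (n/2 + 1)-th case.
    return (ordered[mid - 1] + ordered[mid]) / 2

five_firms = [75, 72, 70, 69, 66]   # first example, n = 5
six_firms = five_firms + [61]       # second example adds New York Life, n = 6

median_odd = median(five_firms)
median_even = median(six_firms)
print(median_odd, median_even)
```

With five firms the median is the third case; with six firms it falls halfway between the third and fourth cases.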

The Median—Grouped Data

Stated Limits    True Limits       f      F    Number of Cases Less Than:
2,000 - 2,900    1,950 - 2,950    17     17    $2,950
3,000 - 3,900    2,950 - 3,950    26     43    $3,950
4,000 - 4,900    3,950 - 4,950    38     81    $4,950
5,000 - 5,900    4,950 - 5,950    51    132    $5,950
6,000 - 6,900    5,950 - 6,950    36    168    $6,950
7,000 - 7,900    6,950 - 7,950    21    189    $7,950

Look for the interval containing the median, or the \( \frac{n}{2} \) case.

\[ \frac{n}{2} = \frac{189}{2} = 94.5 \]



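The median falls in the 4,950 - 5,950 interval, where the cumulative frequency rises from 81 to 132 and so passes 94.5. There are 51 cases in this interval. We divide the interval into 51 equal sub-intervals, each $19.61 wide.

\[ \frac{\$1{,}000}{51} = \$19.61 \]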


Then we simply count the sub-intervals from the lower class limit until we come to the median.

We could also get the number of sub-intervals to count by subtracting 81, the cumulative frequency below the interval, from 94.5, the location of the median: 94.5 - 81 = 13.5.


\[ md = l + \left( \frac{\frac{n}{2} - F}{f} \right) i \]

Where:

l = lower limit of the interval containing the median
F = cumulative frequency corresponding to the lower limit
f = number of cases in the interval containing the median
i = width of the interval containing the median
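The grouped-median formula can be tried on the income table with a minimal sketch. The function name `grouped_median` is an illustrative choice; the arguments use the slide's symbols, and the numbers are read from the 4,950 - 5,950 row of the table. The second calculation repeats the sub-interval counting argument to show that both routes agree.

```python
def grouped_median(l, F, f, i, n):
    """md = l + ((n / 2 - F) / f) * i, using the slide's symbols."""
    return l + ((n / 2 - F) / f) * i

# Values read from the interval that contains the 94.5th case.
md = grouped_median(l=4950, F=81, f=51, i=1000, n=189)

# Same result by counting sub-intervals of width 1000 / 51
# up from the lower limit: (94.5 - 81) sub-intervals.
sub_width = 1000 / 51
md_by_counting = 4950 + (94.5 - 81) * sub_width

print(round(md, 2), round(md_by_counting, 2))
```

The formula assumes the cases are spread evenly across the interval, which is the same assumption the sub-interval argument makes.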

The Mean

For Populations:

\[ \mu = \frac{X_1 + X_2 + \dots + X_N}{N} \]

For Samples:

\[ \bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n} \]

The Mean

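The same formula can be written in several equivalent ways:

\[ \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \]

\[ \bar{X} = \frac{\sum X}{n} \]

\[ \bar{X} = \frac{1}{n} \sum X \]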

The Mean as the Center of Gravity

Properties of the Mean

The mean has the algebraic property that the sum of the deviations of each score from the mean will always be zero. Symbolically:

\[ \sum_{i=1}^{n} (X_i - \bar{X}) = 0 \]
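A quick numerical check of this property, sketched with the five interlocking-directorate scores from the median example (the variable names are illustrative). Floating-point arithmetic can leave a tiny rounding residue, so the sum is compared with zero using a tolerance rather than exact equality.

```python
import math

scores = [75, 72, 70, 69, 66]      # values from the first median example
mean = sum(scores) / len(scores)   # sample mean, X-bar

# Deviation of each score from the mean.
deviations = [x - mean for x in scores]
total_deviation = sum(deviations)

print(mean)
print(math.isclose(total_deviation, 0.0, abs_tol=1e-9))
```

The positive deviations above the mean exactly offset the negative deviations below it, which is why the mean behaves like a balance point.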

Properties of the Mean

The sum of the squared deviations of each score from the mean is less than the sum of the squared deviations from any other constant (number). Symbolically:

\[ \sum_{i=1}^{n} (X_i - \bar{X})^2 = \text{minimum} \]
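This property can be illustrated numerically, though an illustration is not a proof. The sketch below uses the five scores from the median example and compares the sum of squared deviations about the mean with the same sum about a few other constants. The helper name `sum_sq_dev` and the choice of alternative constants are arbitrary.

```python
def sum_sq_dev(values, c):
    """Sum of squared deviations of each value from the constant c."""
    return sum((x - c) ** 2 for x in values)

scores = [75, 72, 70, 69, 66]
mean = sum(scores) / len(scores)

about_mean = sum_sq_dev(scores, mean)

# Arbitrary constants on either side of the mean, for comparison.
other_constants = [mean - 2, mean - 0.5, mean + 0.5, mean + 2]
about_others = [sum_sq_dev(scores, c) for c in other_constants]

print(about_mean)
print(all(s > about_mean for s in about_others))
```

Moving the constant away from the mean in either direction increases the sum, and moving it further increases the sum more.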

Proof that: \( \sum (X_i - \bar{X}) = 0 \)

Given:

\[ \sum (X_i - \bar{X}) \]

By distribution, we can rewrite this expression as:

\[ \sum X_i - \sum \bar{X} \]

The mean is a constant. The sum of a constant is equal to n times that constant. So, we can rewrite this expression as:

\[ \sum X_i - n\bar{X} \]

We also know the basic definition of the mean and can substitute it:

\[ \sum X_i - n \frac{\sum X_i}{n} \]

The n’s cancel:

\[ \sum X_i - \sum X_i = 0 \]

Sum of Squared Deviations About the Mean

The logic of this proof is that if we subtract any number other than the mean from each value of X, square that amount, and sum up these values for all values of X, we will get a number that is larger than if we had carried out the same procedure using the mean of X.

Let's call this other number “X-bar prime” and start with the original expression:

\[ \sum_{i=1}^{n} (X_i - \bar{X})^2 \]

However, we will strip out the summation and exponentiation, and substitute “X-bar prime” for the mean:

\[ (X_i - \bar{X}') \]

Now we can add and simultaneously subtract the actual mean to and from this expression. This has no "net" effect on the expression. This is equal to:

\[ (X_i - \bar{X}') = (X_i - \bar{X}) + (\bar{X} - \bar{X}') \]

Squaring both sides of the expression brings us a step closer to the original equation:

\[ (X_i - \bar{X}')^2 = \left[ (X_i - \bar{X}) + (\bar{X} - \bar{X}') \right]^2 \]

Because \( (a + b)^2 = a^2 + 2ab + b^2 \), when expanded this is equal to:

\[ (X_i - \bar{X}')^2 = (X_i - \bar{X})^2 + 2(X_i - \bar{X})(\bar{X} - \bar{X}') + (\bar{X} - \bar{X}')^2 \]

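Now, add the summation symbol back in on both sides of the expression and with a little bit of algebraic manipulation we get:

\[
\begin{aligned}
\sum (X_i - \bar{X}')^2 &= \sum (X_i - \bar{X})^2 + \sum 2(X_i - \bar{X})(\bar{X} - \bar{X}') + \sum (\bar{X} - \bar{X}')^2 \\
&= \sum (X_i - \bar{X})^2 + 2(\bar{X} - \bar{X}') \sum (X_i - \bar{X}) + \sum (\bar{X} - \bar{X}')^2 \\
&= \sum (X_i - \bar{X})^2 + \sum (\bar{X} - \bar{X}')^2 \\
&= \sum (X_i - \bar{X})^2 + n(\bar{X} - \bar{X}')^2
\end{aligned}
\]

The middle term drops out because \( \sum (X_i - \bar{X}) = 0 \). Since \( n(\bar{X} - \bar{X}')^2 \) is positive whenever \( \bar{X}' \neq \bar{X} \), the sum of squared deviations about any other number is larger than the sum of squared deviations about the mean.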

The Mean from Grouped Data

Ungrouped Data:

\[ \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \]

Grouped Data:

\[ \bar{X} = \frac{\sum_{i=1}^{n} w_i X_i}{\sum w_i} \]

The only difference in these formulas is the weight, \( w_i \). With ungrouped data, the weight is implicitly equal to one.

The Mean from Grouped Data—Example

Type of PAC      Contributions        N         Weighted
Corporations       $37,100.78     1,875     $69,563,962.50
Labor             $112,757.12       371     $41,832,891.52
Non-Connected      $13,832.66     1,318     $18,231,445.88
T/M/H              $62,029.20       852     $52,848,878.40
Cooperative        $54,178.63        56      $3,034,003.28
W/O Stock          $27,272.09       149      $4,063,541.41
Sums              $307,170.48     4,621    $189,574,722.99
Averages           $51,195.08                   $41,024.61
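The two averages in the PAC table can be reproduced with a short sketch of the weighted-mean formula. It assumes the Contributions column holds one contribution figure per PAC type and that N is the weight for that type; that is a reading of the table, not something the slide states. The names `pacs` and `weighted_mean` are illustrative.

```python
# (type of PAC, contributions, N) as listed in the PAC table.
pacs = [
    ("Corporations", 37100.78, 1875),
    ("Labor", 112757.12, 371),
    ("Non-Connected", 13832.66, 1318),
    ("T/M/H", 62029.20, 852),
    ("Cooperative", 54178.63, 56),
    ("W/O Stock", 27272.09, 149),
]

def weighted_mean(values, weights):
    """Sum of w_i * X_i divided by the sum of the weights w_i."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

values = [contribution for _, contribution, _ in pacs]
weights = [n for _, _, n in pacs]

# A weight of one for every row gives the ordinary (unweighted) mean.
unweighted = weighted_mean(values, [1] * len(values))
weighted = weighted_mean(values, weights)

print(round(unweighted, 2), round(weighted, 2))
```

The weighted figure is lower because the types with the largest N (Corporations and Non-Connected) have comparatively small contribution values.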


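Annual Income of American Men in 1975

Annual Income   Percent of Men   Midpoint   Weighted by Midpoint
0 - 4                36%             2              72
5 - 9                23%             7             161
10 - 14              20%            12             240
15 - 19              11%            17             187
20 - 24               5%            22             110
25 - 29               3%            27              81
30 - 34               2%            32              64
Total               100%           119             915

\[ \bar{X} = \frac{\sum_{i=1}^{n} w_i X_i}{\sum w_i} = \frac{915}{100} = 9.15 \]

The weights are the percentages of men, which sum to 100, so the divisor is 100 rather than the midpoint total of 119.

mean = 9.15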

Percentiles

A percentile is the outcome or score below which a given percentage of observations falls.

\[ P_i = L_p + \left( \frac{(p_i)(n) - c_p}{f_p} \right) W_i \]

Where:

P_i = the score of the ith percentile
L_p = the true lower limit of the interval containing the ith percentile
p_i = the ith percentile written as a proportion (e.g. 75th = 0.75)
n = the total number of observations
c_p = the cumulative frequency up to but not including the interval containing P_i
f_p = the frequency in the interval containing the ith percentile
W_i = the width of the interval containing P_i; W = U_p - L_p (U_p is the true upper limit of that interval)

Percentiles

Using the grouped frequency distribution from the median example (n = 189), the 50th percentile falls in the 4,950 - 5,950 interval:

\[ P_{50} = \$4{,}950 + \left( \frac{(.5)(189) - 81}{51} \right) 1{,}000 \approx \$5{,}214.71 \]

Percentiles

Using the same grouped frequency distribution (n = 189), the 75th percentile falls in the 5,950 - 6,950 interval:

\[ P_{75} = \$5{,}950 + \left( \frac{(.75)(189) - 132}{36} \right) 1{,}000 \approx \$6{,}220.83 \]
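The percentile formula can be sketched as a small function that first locates the interval containing the requested percentile and then interpolates within it. The function name `grouped_percentile` and the tuple layout are illustrative assumptions; the intervals are the true limits and frequencies from the grouped income distribution.

```python
def grouped_percentile(intervals, p):
    """Percentile from grouped data.

    intervals: list of (true_lower, true_upper, frequency), ascending.
    p: the percentile written as a proportion, e.g. 0.75 for the 75th.
    """
    n = sum(f for _, _, f in intervals)
    target = p * n          # location of the percentile, (p_i)(n)
    cum_below = 0           # c_p: cases below the current interval
    for lower, upper, f in intervals:
        if f > 0 and cum_below + f >= target:
            width = upper - lower   # W_i = U_p - L_p
            return lower + (target - cum_below) / f * width
        cum_below += f
    raise ValueError("no interval contains the requested percentile")

# True limits and frequencies (f) from the grouped income table.
income = [
    (1950, 2950, 17),
    (2950, 3950, 26),
    (3950, 4950, 38),
    (4950, 5950, 51),
    (5950, 6950, 36),
    (6950, 7950, 21),
]

p50 = grouped_percentile(income, 0.50)
p75 = grouped_percentile(income, 0.75)
print(round(p50, 2), round(p75, 2))
```

The 50th percentile computed this way is the grouped median, so the two procedures should always agree.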
