statistics- landmark summaries interpreting typical values and percentiles
TRANSCRIPT
8/3/2019 Statistics- Landmark Summaries Interpreting Typical Values and Percentiles
http://slidepdf.com/reader/full/statistics-landmark-summaries-interpreting-typical-values-and-percentiles 1/31
Slide
4-1
2/10/2012
Chapter 4
Landmark Summaries:
Interpreting Typical Values and
Percentiles
Slide 4-2
Average or Mean
Add the data, divide by n or N (the number of
elementary units)
Divides the total equally; it is the only summary that does so
A representative, central number (if the data set is
approximately normal)
Summation notation
- $\Sigma$ is the capital Greek letter sigma
Sample average: $\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i$
Population average: $\mu = \frac{X_1 + X_2 + \cdots + X_N}{N} = \frac{1}{N}\sum_{i=1}^{N} X_i$
Slide 4-3
Example: Number of Defects
Defects measured for each of 10 production lots
4, 1, 3, 7, 3, 0, 7, 14, 5, 9
[Histogram; x-axis: Defects per lot (0 to 20); y-axis: Frequency (lots)]
Average is 5.3 defects per lot (the ten values listed sum to 53; the figure label reads 5.1)
Fig 4.1.1
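A minimal sketch in Python of the add-and-divide rule, applied to the ten defect counts exactly as listed on this slide. The function name `average` is an illustrative choice, not something from the slides. With these ten values the total is 53, so the computed average is 5.3.

```python
# Defect counts for the 10 production lots, as listed on the slide
defects = [4, 1, 3, 7, 3, 0, 7, 14, 5, 9]


def average(values):
    """Add the data and divide by n (the number of elementary units)."""
    return sum(values) / len(values)


total = sum(defects)
mean = average(defects)
print("total defects:", total)
print("average per lot:", mean)

# The average divides the total equally: n copies of it add back up to the total.
print("n * average:", len(defects) * mean)
```

The last line illustrates why the average is the summary to use when totals must be preserved.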
Slide 4-4
Median
Also summarizes the data
The middle one
- Put data in order
- Pick the middle one (or average the middle two if n is even)
- Median(9, 4, 5) = Median(4, 5, 9) = 5
- Median(9, 4, 5, 7) = Median(4, 5, 7, 9) = (5+7)/2 = 6
Rank of the median is (1+n)/2
- If n=3, rank is (1+3)/2 = 2
- If n=4, rank is (1+4)/2 = 2.5 (so average the 2nd and 3rd)
- If n=262, rank is (1+262)/2 = 131.5 (so average the 131st and 132nd)
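The rank rule above can be sketched in Python as follows. This is one straightforward way to code it, assuming a non-empty list of numbers; the function name `median` is an illustrative choice.

```python
def median(values):
    """Middle value by rank (1 + n) / 2; averages the two middle values if n is even."""
    ordered = sorted(values)               # put data in order
    n = len(ordered)
    rank = (1 + n) / 2                     # 1-based rank of the median
    lower = ordered[int(rank) - 1]         # value at the rank with the decimal discarded
    upper = ordered[int(rank + 0.5) - 1]   # same value for a whole rank, next value for a .5 rank
    return (lower + upper) / 2


print(median([9, 4, 5]))      # rank 2: the middle one
print(median([9, 4, 5, 7]))   # rank 2.5: average of the 2nd and 3rd
```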
Slide 4-5
Median (continued)
A representative, central number
- If the data set has a center
Less sensitive to outliers than the average
For skewed data, represents the "typical case" better than the average does
- e.g., incomes
Average income for a country equally divides the total, which
may include some very high incomes
Median income chooses the middle person (half earn less, half
earn more), giving less influence to high incomes (if any)
Slide 4-6
Example: Spending
Customers plan to spend ($thousands)
3.8, 1.4, 0.3, 0.6, 2.8, 5.5, 0.9, 1.1
Rank ordered from smallest to largest
0.3, 0.6, 0.9, 1.1, 1.4, 2.8, 3.8, 5.5
Ranks: 1 2 3 4 5 6 7 8
Rank of median = (1+8)/2 = 4.5
Median is (1.1+1.4)/2 = 1.25
- Smaller than the average, 2.05
Due to slight skewness?
[Figure: the eight spending values displayed along an axis from 0 to 5, with Median and Average marked]
Slide 4-7
Example: The Crash of 1987
Dow-Jones Industrials, stock-price changes as
each stock began trading that fateful morning
Fairly normal
Mean and median are similar
Average = -8.2%
Median = -8.6%
[Fig 4.1.2: histogram; x-axis: Percent change at opening (-20% to 0%); y-axis: Frequency]
Slide 4-8
Example: Incomes
Personal income of 100 people
Average is higher than median due to skewness
Average = $38,710
Median = $27,216
[Fig 4.1.3: histogram; x-axis: Income ($0 to $200,000); y-axis: Frequency (0 to 50)]
Slide 4-9
Mode
Also summarizes the data
Most common data value
- Middle of tallest histogram bar
Problems:
- Depends on how you draw the histogram (bin width)
- Might be more than one mode (two tallest bars)
Good if most data values are "correct"
Good for nominal data (e.g., elections)
[Figure: histograms with the Mode labeled]
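A small Python sketch of finding the mode by counting, which works for nominal data as well as numbers. The ballot names are hypothetical, made up purely for illustration, and the helper name `modes` is an illustrative choice. It returns every value tied for the highest count, since there might be more than one mode.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest count (there may be more than one mode)."""
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


# Hypothetical election ballots (nominal data), invented for this example
ballots = ["Lee", "Cruz", "Lee", "Patel", "Lee", "Cruz"]
print(modes(ballots))

# The defect counts from the earlier example: 3 and 7 each occur twice
defects = [4, 1, 3, 7, 3, 0, 7, 14, 5, 9]
print(modes(defects))
```

For quantitative data this counts exact values only; the histogram-based mode described above also depends on the bin width chosen.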
Slide 4-10
Normal Distribution
Average, median, and mode are identical
- If the data come from a normal distribution
[Figure: normal distribution, where the average, median, and mode are identical]
Slide 4-11
Skewed Distribution
Average, median, and mode are different
- The few large (or small) values influence the mean
more than the median
- The highest point is not in the center
[Figure: skewed distribution with the Average, Median, and Mode marked at different positions]
Slide 4-12
Which summary to use?
Average
- Best for normal data
- Preserves totals
Median
- Good for skewed data or data with outliers, provided
you do not need to preserve or estimate total amounts
Mode
- Best for categories (nominal data)
- The mode is the only summary computable for nominal data!
Slide 4-13
Which Summary? (continued)
Average requires quantitative data (numbers)
Median works with quantitative or ordinal
Mode works with quantitative, ordinal, or nominal
          Quantitative   Ordinal   Nominal
Average   Yes            -         -
Median    Yes            Yes       -
Mode      Yes            Yes       Yes
Slide 4-14
Weighted Average
Ordinary average gives the same weight to all elementary units
Weighted average allows different weights
Weights must add up to 1
- If not, then divide each by their total
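Ordinary average: $\bar{X} = \frac{1}{n}X_1 + \frac{1}{n}X_2 + \cdots + \frac{1}{n}X_n$
Weighted average $= w_1 X_1 + w_2 X_2 + \cdots + w_n X_n$
where $w_1 + w_2 + \cdots + w_n = 1$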
Slide 4-15
Weighted Average (continued)
Average is per elementary unit
- The average of your course grades is your "average per course"
Weighted average is per unit of weight
- Your GPA (grade point average) is a weighted average,
using credit hours to define the weights. The weighted
average is your "average per credit hour"
Slide 4-16
Example: Portfolio Rate of Return
Portfolio expected return (an interest rate, indicating performance) is the weighted average
of the expected rates of return of assets in the
portfolio, weighted by dollars invested
Portfolio contains three stocks. One ($1,000
invested) is expected to return 20%. Another
($1,800 invested) is expected to return 15%. The third is $2,200
and 30%.
Total invested is 1,000+1,800+2,200 = $5,000
Slide 4-17
Example (continued)
Weights are
w1 = $1,000/$5,000 = 0.20
w2 = $1,800/$5,000 = 0.36
w3 = $2,200/$5,000 = 0.44
Weighted average is
0.20×(20%) + 0.36×(15%) + 0.44×(30%) = 4.0% + 5.4% + 13.2% = 22.6%
- The expected return for the portfolio
- Each stock is represented in proportion to $ invested
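The portfolio calculation can be sketched in Python as below. The function name `weighted_average` and the variable names are illustrative choices; the sketch assumes non-negative raw weights with a positive total, and it normalizes them by dividing each by their total so that they add up to 1.

```python
def weighted_average(values, raw_weights):
    """Weighted average; raw weights are divided by their total so they add up to 1."""
    total = sum(raw_weights)
    weights = [w / total for w in raw_weights]
    return sum(w * x for w, x in zip(weights, values))


invested = [1000, 1800, 2200]   # dollars invested in each stock
returns_pct = [20, 15, 30]      # expected rate of return for each stock, in percent

portfolio_return = weighted_average(returns_pct, invested)
print("portfolio expected return (%):", round(portfolio_return, 2))

# For contrast, the ordinary average gives each stock the same weight (1/3 here)
print("unweighted average (%):", round(sum(returns_pct) / len(returns_pct), 2))
```

The weighted result is the return per dollar invested, while the unweighted one is the return per stock.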
Slide 4-18
Percentiles
Landmark summaries in the same measurement units as the data
- e.g., dollars, people, miles per gallon, ...
Some familiar percentiles
- Smallest data value is 0th percentile
- Median is 50th percentile
- Largest data value is 100th percentile
- 90th percentile is larger than 90% of elementary units
Finding percentiles
- Difficult to see from histogram
- Easy using CDF (Cumulative Distribution Function)
Slide 4-19
Cumulative Distribution Function
Data axis horizontally (as in histogram)
Cumulative percent vertically
Equal vertical jump at each data value
0.3, 0.6, 0.9, 1.1, 1.4, 2.8, 3.8, 5.5
[Figure: CDF of the spending data; x-axis: Spending ($0 to $6, in thousands); y-axis: Cumulative Percent (0% to 100%)]
80th percentile is $3.80 (read across from 80% on the vertical axis)
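A minimal Python sketch of building this CDF and reading a percentile from it. The reading rule used here (take the smallest data value whose cumulative percent reaches the requested level) is an assumption chosen to match the 80th-percentile reading on this slide; other conventions interpolate between values. The function names are illustrative.

```python
def cdf_points(values):
    """Return (value, cumulative fraction) pairs with an equal jump of 1/n at each data value."""
    ordered = sorted(values)
    n = len(ordered)
    return [(x, (i + 1) / n) for i, x in enumerate(ordered)]


def percentile_from_cdf(values, percent):
    """Smallest data value whose cumulative percent reaches the requested level (assumed rule)."""
    for x, cumulative in cdf_points(values):
        if cumulative * 100 >= percent:
            return x
    raise ValueError("percent must be between 0 and 100")


spending = [3.8, 1.4, 0.3, 0.6, 2.8, 5.5, 0.9, 1.1]   # $thousands

for x, cumulative in cdf_points(spending):
    print(x, f"{cumulative:.1%}")

print("80th percentile:", percentile_from_cdf(spending, 80))
```

With eight values each jump is 12.5%, so the CDF passes 80% in the jump from 75% to 87.5% at the value 3.8.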
Slide 4-20
Five-Number Summary
Selected landmarks to represent entire data set
- Median = 50th percentile
- Quartiles
LQ = Lower Quartile = 25th percentile
- Rank = $\frac{1 + \mathrm{int}\left[(1+n)/2\right]}{2}$, where $(1+n)/2$ is the rank of the median and int discards the decimal, if any: int(10.5) = 10, int(35) = 35
UQ = Upper Quartile = 75th percentile
- Rank is n + 1 - [rank of lower quartile]
- Extremes
Smallest = 0th percentile
Largest = 100th percentile
Slide 4-21
Five-Number Summary (continued)
Provides information about
- Central summary
Median
- Range of the data
Largest - smallest
- "Middle half" of the data
From LQ to UQ
- Skewness
If median is not approximately halfway between quartiles
Slide 4-22
Box Plot
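Displays five-number summary
Less detail than histogram
- Easier to compare many groups
[Figure: box plot on an axis from 0 to 8, labeled Smallest, Lower Quartile, Median, Upper Quartile, and Largest; the box from Lower Quartile to Upper Quartile marks the middle half of the data]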
Slide 4-23
Example: Spending
Spending rank ordered from smallest to largest
0.3, 0.6, 0.9, 1.1, 1.4, 2.8, 3.8, 5.5
Ranks: 1 2 3 4 5 6 7 8
Rank of median = (1+8)/2 = 4.5
Rank of LQ = (1+4)/2 = 2.5, where 4 = int(4.5)
Rank of UQ = 8+1-2.5 = 6.5
LQ is (0.6+0.9)/2 = 0.75
UQ is (2.8+3.8)/2 = 3.3
Slide 4-24
Example: Spending (continued)
Five-number summary
0.3, 0.75, 1.25, 3.3, 5.5
Smallest, LQ, Median, UQ, Largest
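The rank rules for the median and quartiles can be sketched in Python as follows. The function names `value_at_rank` and `five_number_summary` are illustrative choices, and the sketch assumes a non-empty list of numbers. Note that statistical libraries often use other quartile conventions, so their results may differ slightly from this rule.

```python
def value_at_rank(ordered, rank):
    """Value at a 1-based rank; a rank ending in .5 averages the two neighboring values."""
    lower = ordered[int(rank) - 1]
    upper = ordered[int(rank + 0.5) - 1]
    return (lower + upper) / 2


def five_number_summary(values):
    """Smallest, LQ, median, UQ, largest, using the rank rules from the slides."""
    ordered = sorted(values)
    n = len(ordered)
    median_rank = (1 + n) / 2
    lq_rank = (1 + int(median_rank)) / 2   # int() discards the decimal, if any
    uq_rank = n + 1 - lq_rank
    return (
        ordered[0],
        value_at_rank(ordered, lq_rank),
        value_at_rank(ordered, median_rank),
        value_at_rank(ordered, uq_rank),
        ordered[-1],
    )


spending = [3.8, 1.4, 0.3, 0.6, 2.8, 5.5, 0.9, 1.1]   # $thousands
summary = five_number_summary(spending)
print([round(x, 2) for x in summary])
```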
Box plot
- Shows some skewness (lack of symmetry)
[Figure: box plot of Spending ($thousands) on an axis from 0 to 5]
Slide 4-25
Identifying Outliers
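Outliers are defined as observations, if any, either:
- More than UQ + 1.5 × (UQ - LQ), or
- Less than LQ - 1.5 × (UQ - LQ)
Outliers are far from the center of the distribution
- and may be interesting as special cases
[Figure: number line showing LQ and UQ separated by the distance UQ - LQ, with a margin of 1.5 × (UQ - LQ) on each side; lower outliers lie beyond the left margin and upper outliers beyond the right margin]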
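A short Python sketch of the outlier rule above, applied to the spending example with its quartiles LQ = 0.75 and UQ = 3.3. The function names are illustrative choices, and the quartiles are passed in rather than recomputed.

```python
def outlier_fences(lq, uq):
    """Return (lower limit, upper limit); observations beyond either limit are outliers."""
    spread = uq - lq
    return lq - 1.5 * spread, uq + 1.5 * spread


def find_outliers(values, lq, uq):
    """List the observations that fall outside the limits."""
    low, high = outlier_fences(lq, uq)
    return [x for x in values if x < low or x > high]


spending = [0.3, 0.6, 0.9, 1.1, 1.4, 2.8, 3.8, 5.5]   # $thousands
low, high = outlier_fences(0.75, 3.3)
print("limits:", round(low, 3), round(high, 3))
print("outliers:", find_outliers(spending, 0.75, 3.3))
```

Under this rule none of the eight spending values is an outlier, because the largest value, 5.5, is below the upper limit of 7.125.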
Slide 4-26
Example: Technology CEO Pay
CEO compensation in technology companies
- A detailed box plot identifies outliers
and the most extreme non-outliers,
giving more detail than the (ordinary) box plot
[Fig 4.2.3: Box Plot and Detailed Box Plot of CEO compensation, each on an axis from $0 to $10,000,000; companies labeled in the detailed box plot: IBM, AMD, Sun Microsystems, Apple Computer]
Slide 4-27
Example: CEO Compensation
Box plots to compare firms within industry groups
- Utilities group generally shows lower compensation
- Highest-paid are in Financial Services group
[Fig 4.2.3: box plots of CEO compensation ($0 to $30,000,000) for four industry groups: Energy, Financial, Technology, Utilities]
Slide 4-28
CEO Compensation (continued)
Detailed box plots (with outliers and most extreme non-outliers named)
[Fig 4.2.3: detailed box plots of CEO compensation ($0 to $30,000,000) for four industry groups: Energy, Financial, Technology, Utilities]
Companies named in the figure: IBM, AMD, Enron, Citigroup, Goldman Sachs, Bear Stearns, Merrill Lynch, Morgan Stanley Dean Witter, Lehman Brothers, Phillips Petroleum, Sun Microsystems, Duke Energy, GPU, Apple Computer, Baker Hughes, Berkshire Hathaway
Slide 4-29
Mining the Donations Database
More frequent donors (top) tend to give smaller current donation amounts (shift to left)
[Fig 4.2.4: size of current donation ($0 to $100), grouped by number of previous gifts in the past 2 years (1, 2, 3, 4+)]
Slide 4-30
Example: Business Failures
Per million people, by state
90th percentile is 432.4
50th percentile is 260.2
[Fig 4.2.9: CDF; x-axis: Failures (0 to 700); y-axis: Cumulative Percent (0% to 100%)]
Slide 4-31
Example: Business Failures
Compare histogram, box plot, and CDF
[Fig 4.2.10: three panels (Histogram, Box plot, CDF), each with x-axis Failures (0 to 500)]