STA 291 Spring 2010
Lecture 5
Dustin Lueker
Measures of Central Tendency
Mean - Arithmetic average
◦ Mean of a sample: x̄ (x-bar)
◦ Mean of a population: μ (mu)
Median - Midpoint of the observations when they are arranged in increasing order
Mode - Most frequent value
Notation: Subscripted variables
◦ n = number of units in the sample
◦ N = number of units in the population
◦ x = variable to be measured
◦ x_i = measurement of the ith unit
Symbols
◦ μ (mu): population mean
◦ σ (sigma): population standard deviation
◦ σ² (sigma-squared): population variance
◦ x_i (x-i): observation
◦ x̄ (x-bar): sample mean
◦ s: sample standard deviation
◦ s²: sample variance
◦ Σ: summation symbol
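Variance and Standard Deviation
Sample
◦ Variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
◦ Standard Deviation: $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$
Population
◦ Variance: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
◦ Standard Deviation: $\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$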
Variance Step By Step
1. Calculate the mean
2. For each observation, calculate the deviation
3. For each observation, calculate the squared deviation
4. Add up all the squared deviations
5. Divide the result by (n-1)
◦ Or N if you are finding the population variance
(To get the standard deviation, take the square root of the result)
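The steps above can be sketched in Python. This is a minimal illustration, not code from the lecture: the function name `variance_step_by_step` and the data values are assumptions chosen so the arithmetic is easy to follow.

```python
import math

def variance_step_by_step(data, population=False):
    """Follow the five steps; divide by N for a population, n - 1 for a sample."""
    n = len(data)
    mean = sum(data) / n                         # Step 1: the mean
    deviations = [x - mean for x in data]        # Step 2: deviation of each observation
    squared = [d ** 2 for d in deviations]       # Step 3: squared deviations
    total = sum(squared)                         # Step 4: sum of squared deviations
    divisor = n if population else n - 1         # Step 5: N (population) or n - 1 (sample)
    return total / divisor

# Hypothetical data, not from the lecture
data = [2, 4, 4, 4, 5, 5, 7, 9]

sample_variance = variance_step_by_step(data)
sample_sd = math.sqrt(sample_variance)           # standard deviation = square root

population_variance = variance_step_by_step(data, population=True)
population_sd = math.sqrt(population_variance)
```

For this hypothetical data the mean is 5 and the squared deviations add up to 32, so the population variance is 32/8 = 4 and the sample variance is 32/7.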
Empirical Rule
If the data is approximately symmetric and bell-shaped then
◦ About 68% of the observations are within one standard deviation from the mean
◦ About 95% of the observations are within two standard deviations from the mean
◦ About 99.7% of the observations are within three standard deviations from the mean
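As a rough sketch of how the rule is applied, the intervals can be computed from a mean and a standard deviation. The function name `empirical_rule_intervals` and the values mean = 100, sd = 15 are hypothetical, and the rule only applies when the data is approximately symmetric and bell-shaped.

```python
def empirical_rule_intervals(mean, sd):
    """Intervals expected to hold about 68%, 95%, and 99.7% of bell-shaped data."""
    coverage = {1: "about 68%", 2: "about 95%", 3: "about 99.7%"}
    # k standard deviations on either side of the mean
    return {label: (mean - k * sd, mean + k * sd) for k, label in coverage.items()}

# Hypothetical mean and standard deviation, not from the lecture
intervals = empirical_rule_intervals(mean=100, sd=15)
for label, (low, high) in intervals.items():
    print(f"{label} of the observations fall between {low} and {high}")
```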
Percentiles
The pth percentile (Xp) is a number such that p% of the observations take values below it, and (100-p)% take values above it
◦ 50th percentile = median
◦ 25th percentile = lower quartile
◦ 75th percentile = upper quartile
The index (position in the ordered data) of the pth percentile, Lp
◦ Lp = (n+1)p/100
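The index formula can be written as a one-line function. This is a small sketch; the name `percentile_index` and the sample size n = 7 are assumptions for illustration.

```python
def percentile_index(n, p):
    """Position of the pth percentile among n ordered observations: (n + 1) * p / 100."""
    return (n + 1) * p / 100

# With n = 7 observations (a hypothetical sample size)
print(percentile_index(7, 50))  # 4.0, the median is the 4th ordered value
print(percentile_index(7, 25))  # 2.0, the lower quartile is the 2nd ordered value
```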
Quartiles
25th percentile
◦ lower quartile
◦ Q1
◦ (approximately) median of the observations below the median
75th percentile
◦ upper quartile
◦ Q3
◦ (approximately) median of the observations above the median
Example
Find the 25th percentile of this data set
◦ {3, 7, 12, 13, 15, 19, 24}
Interpolation
Use when the index is not a whole number
Start at the closest index below the number found, then go the distance of the decimal part towards the next value
If the index is found to be 5.4, go to the 5th value, then add 0.4 of the difference between the 5th value and the 6th value
◦ In essence we are going to the 5.4th value
Example
Find the 40th percentile of the same data set
◦ {3, 7, 12, 13, 15, 19, 24}
Must use interpolation
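One way to work both percentile examples is the sketch below, which combines the index (n+1)p/100 with the interpolation rule from this lecture. The function name `percentile` is an assumption, and so is the handling of an index that falls before the first or after the last observation (the lecture does not cover that case, so the sketch simply returns the nearest end value).

```python
import math

def percentile(data, p):
    """pth percentile using index (n + 1) * p / 100 and linear interpolation."""
    values = sorted(data)
    n = len(values)
    index = (n + 1) * p / 100            # 1-based position in the ordered data
    lower = math.floor(index)            # closest whole index at or below the position
    fraction = index - lower             # decimal part of the position
    if lower < 1:                        # assumption: clamp to the smallest value
        return values[0]
    if lower >= n:                       # assumption: clamp to the largest value
        return values[-1]
    below = values[lower - 1]            # value at the lower index
    above = values[lower]                # next value up
    return below + fraction * (above - below)

data = [3, 7, 12, 13, 15, 19, 24]
p25 = percentile(data, 25)   # index 2.0: exactly the 2nd value
p40 = percentile(data, 40)   # index 3.2: 0.2 of the way from the 3rd to the 4th value
```

Under these assumptions the 25th percentile is the 2nd value, 7, and the 40th percentile is 12 + 0.2 × (13 − 12), which is about 12.2.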
Data Summary
Five Number Summary
◦ Minimum
◦ Lower Quartile
◦ Median
◦ Upper Quartile
◦ Maximum
Example
◦ minimum = 4
◦ Q1 = 256
◦ median = 530
◦ Q3 = 1105
◦ maximum = 320,000
What does this suggest about the shape of the distribution?
Interquartile Range (IQR)
The Interquartile Range (IQR) is the difference between upper and lower quartile
◦ IQR = Q3 - Q1
◦ IQR = Range of values that contains the middle 50% of the data
◦ IQR increases as variability increases
Murder Rate Data
◦ Q1 = 3.9
◦ Q3 = 10.3
◦ IQR =
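The IQR calculation is a single subtraction. The short sketch below applies it to the murder rate quartiles given above; the function name `interquartile_range` is an assumption, and the result is rounded only to hide floating-point noise.

```python
def interquartile_range(q1, q3):
    """IQR = Q3 - Q1, the width of the middle 50% of the data."""
    return q3 - q1

# Quartiles from the murder rate data
iqr = interquartile_range(q1=3.9, q3=10.3)
print(round(iqr, 1))  # 6.4
```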
Box Plot
Displays the five number summary (and more) graphically
Consists of
◦ A box that contains the central 50% of the distribution (from lower quartile to upper quartile)
◦ A line within the box that marks the median
◦ Whiskers that extend to the maximum and minimum values
This is assuming there are no outliers in the data set
Outliers
An observation is an outlier if it falls
◦ more than 1.5 IQR above the upper quartile
or
◦ more than 1.5 IQR below the lower quartile
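The 1.5 IQR rule can be sketched as two cutoffs (often called fences) and a filter. The function names `outlier_fences` and `find_outliers`, the quartiles Q1 = 10 and Q3 = 20, and the observations are all hypothetical values chosen for illustration.

```python
def outlier_fences(q1, q3):
    """Cutoffs 1.5 IQR below the lower quartile and 1.5 IQR above the upper quartile."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(data, q1, q3):
    """Observations that fall beyond either cutoff."""
    low, high = outlier_fences(q1, q3)
    return [x for x in data if x < low or x > high]

# Hypothetical quartiles and observations, not from the lecture
low, high = outlier_fences(q1=10, q3=20)                  # IQR = 10, so cutoffs are -5 and 35
outliers = find_outliers([-8, 3, 12, 18, 22, 40], q1=10, q3=20)
```

Here −8 and 40 lie beyond the cutoffs, so they are the only observations flagged.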
Box Plot
Whiskers only extend to the most extreme observations within 1.5 IQR beyond the quartiles
If an observation is an outlier, it is marked by an x, +, or some other identifier
Example
Values
◦ Min = 148
◦ Q1 = 158
◦ Median = Q2 = 162
◦ Q3 = 182
◦ Max = 204
Create a box plot
5 Number Summary/Box Plot
On right-skewed distributions, the minimum, Q1, and median will be “bunched up”, while Q3 and the maximum will be farther away.
For left-skewed distributions, the “mirror” is true: the maximum, Q3, and the median will be relatively close compared to the corresponding distances to Q1 and the minimum.
Symmetric distributions?
Mode
Value that occurs most frequently
◦ Does not need to be near the center of the distribution
Not really a measure of central tendency
◦ Can be used for all types of data (nominal, ordinal, interval)
Special Cases
◦ Data Set {2, 2, 4, 5, 5, 6, 10, 11} Mode =
◦ Data Set {2, 6, 7, 10, 13} Mode =
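The two special cases can be explored with a small counting sketch. The function name `modes` is an assumption, and so is the convention of reporting no mode when every value occurs equally often; some textbooks instead say that every value is a mode in that case.

```python
from collections import Counter

def modes(data):
    """All most-frequent values, or [] when every value occurs equally often."""
    counts = Counter(data)
    highest = max(counts.values())
    # Assumption: if no value is more frequent than the others, report no mode
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(value for value, c in counts.items() if c == highest)

print(modes([2, 2, 4, 5, 5, 6, 10, 11]))  # [2, 5]
print(modes([2, 6, 7, 10, 13]))           # []
```

In the first data set both 2 and 5 occur twice, so the mode is not unique; in the second no value repeats.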
Mean vs. Median vs. Mode
Mean
◦ Interval data with an approximately symmetric distribution
Median
◦ Interval or ordinal data
Mode
◦ All types of data
Mean vs. Median vs. Mode
Mean is sensitive to outliers
◦ Median and mode are not
◦ Why?
In general, the median is more appropriate for skewed data than the mean
◦ Why?
In some situations, the median may be too insensitive to changes in the data
The mode may not be unique
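A quick numerical sketch makes the sensitivity of the mean concrete. The data values are hypothetical; the only change between the two lists is that the largest observation is replaced by an extreme one.

```python
import statistics

# Hypothetical data, not from the lecture
data = [10, 12, 13, 15, 20]
with_outlier = [10, 12, 13, 15, 200]   # largest value replaced by an outlier

mean_before = statistics.mean(data)
mean_after = statistics.mean(with_outlier)

median_before = statistics.median(data)
median_after = statistics.median(with_outlier)

print("mean:", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

The mean moves from 14 to 50 because every observation enters the sum, while the median stays at 13 because it depends only on the middle position of the ordered data.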
Example
“How often do you read the newspaper?”

Response                 Frequency
every day                969
a few times a week       452
once a week              261
less than once a week    196
never                    76
TOTAL                    1954

• Identify the mode
• Identify the median response
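For ordinal data in a frequency table, the mode is the category with the largest count and the median is the category containing the middle observation(s). The sketch below uses the frequencies from the table; the function names `modal_category` and `median_category` are assumptions, and the categories must be listed in their natural order.

```python
# Frequencies from the newspaper table, in ordinal order
responses = [
    ("every day", 969),
    ("a few times a week", 452),
    ("once a week", 261),
    ("less than once a week", 196),
    ("never", 76),
]

def modal_category(table):
    """Category with the highest frequency."""
    return max(table, key=lambda row: row[1])[0]

def median_category(table):
    """Category holding the middle observation(s) of the ordered data."""
    total = sum(freq for _, freq in table)
    # 1-based middle positions: the same position twice when total is odd
    middle = [(total + 1) // 2, (total + 2) // 2]
    found = []
    for position in middle:
        cumulative = 0
        for category, freq in table:
            cumulative += freq
            if position <= cumulative:
                found.append(category)
                break
    # Return one category if both middle observations agree, otherwise both
    return found[0] if found[0] == found[1] else tuple(found)

mode_response = modal_category(responses)
median_response = median_category(responses)
```

With 1954 responses the middle observations are the 977th and 978th. The first category covers positions 1 to 969 and the second covers 970 to 1421, so both middle observations fall in "a few times a week", while the mode is "every day".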
Measures of Variation
Statistics that describe variability
◦ Two distributions may have the same mean and/or median but different variability
◦ Mean and Median only describe a typical value, but not the spread of the data
◦ Range
◦ Variance
◦ Standard Deviation
◦ Interquartile Range
All of these can be computed for the sample or population
Range
Difference between the largest and smallest observation
◦ Very much affected by outliers
◦ A misrecorded observation may lead to an outlier, and affect the range
The range does not always reveal different variation about the mean
Example
Sample 1
◦ Smallest Observation: 112
◦ Largest Observation: 797
◦ Range =
Sample 2
◦ Smallest Observation: 15033
◦ Largest Observation: 16125
◦ Range =
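The range for each sample is the largest observation minus the smallest. A minimal sketch using the values above follows; the function name `value_range` is an assumption.

```python
def value_range(smallest, largest):
    """Range = largest observation - smallest observation."""
    return largest - smallest

range_1 = value_range(112, 797)        # Sample 1
range_2 = value_range(15033, 16125)    # Sample 2
print(range_1, range_2)  # 685 1092
```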