BCOR 1020 Business Statistics
Lecture 4 – January 29, 2008
Overview
• Chapter 4 – Descriptive Statistics…
  – Numerical Description
  – Central Tendency
  – Dispersion
Chapter 4 – Numerical Description
Population (Size = N): Characterized by parameters, e.g., $\mu$ = pop. mean, $\sigma$ = pop. std. dev.
Sample (Size = n): Statistics are computed from the sample and estimate the parameters, e.g., $\bar{x}$ = sample mean, $s$ = sample std. dev.
Recall:
• Statistics are descriptive measures derived from a sample (n items).
• Parameters are descriptive measures derived from a population (N items).
Chapter 4 – Numerical Description
There are three key characteristics of numerical data:
Characteristic – Interpretation
• Central Tendency – Where are the data values concentrated? What seem to be typical or middle data values?
• Dispersion – How much variation is there in the data? How spread out are the data values? Are there unusual values?
• Shape – Are the data values distributed symmetrically? Skewed? Sharply peaked? Flat? Bimodal?
Chapter 4 – Numerical Description
Example: Vehicle Quality
• Consider the data set of vehicle defect rates from J. D. Power and Associates.
• Numerical statistics can be used to summarize this random sample of brands.
• Must allow for sampling error since the analysis is based on sampling.
• Defect rate = $\dfrac{\text{total no. defects}}{\text{no. inspected}} \times 100$
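The defect-rate formula can be sketched in a few lines of Python. The function name and the counts below are hypothetical and chosen only to show the arithmetic; they are not taken from the J. D. Power data.

```python
def defect_rate(total_defects: int, vehicles_inspected: int) -> float:
    """Return the number of defects per 100 vehicles inspected."""
    if vehicles_inspected <= 0:
        raise ValueError("vehicles_inspected must be positive")
    return total_defects / vehicles_inspected * 100


# Hypothetical counts, used only to illustrate the calculation.
rate = defect_rate(total_defects=250, vehicles_inspected=200)
print(rate)  # 125.0
```

A rate above 100 simply means more than one defect per vehicle on average.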
Chapter 4 – Numerical Description
• Number of defects per 100 vehicles, 2004 models.
Chapter 4 – Numerical Description
• Sorted data provides insight into central tendency and dispersion.
Chapter 4 – Numerical Description
Visual Displays:
• The dot plot offers a visual impression of the data.
• Histograms with 5 bins (suggested by Sturges’ Rule) and 10 bins are shown below.
• Both are symmetric with no extreme values and show a modal class toward the low end.
Chapter 4 – Numerical Description
• We can compute descriptive statistics using Excel and discuss measures of central tendency and dispersion…
  – Figures 4.4 and 4.5 in your text detail the Excel menus for computing descriptive statistics.
  – Figure 4.7 in your text details the MegaStat menus for computing descriptive statistics.
Chapter 4 – Numerical Description
MegaStat output…
Chapter 4 – Central Tendency
• Central tendency refers to the middle or typical values of a distribution.
• Central tendency can be assessed using a dot plot or histogram, or more precisely with numerical statistics.
• The text presents six measures of central tendency…
  – Mean
  – Median
  – Mode
  – Midrange
  – Geometric Mean (G)
  – Trimmed Mean
• The mean and median are the most frequently used, but we will discuss the merits of all six.
Chapter 4 – Central Tendency
Mean:
• A familiar measure of central tendency.
• In Excel, use function =AVERAGE(Data) where Data is an array of data values.
• Population formula: $\mu = \dfrac{\sum_{i=1}^{N} x_i}{N}$
• Sample formula: $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$
• For the sample of n = 37 car brands:
$\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n} = \dfrac{87 + 93 + 98 + \dots + 159 + 164 + 173}{37} = \dfrac{4639}{37} = 125.38$
Chapter 4 – Central Tendency
Characteristics of the Mean:
• Arithmetic mean is the most familiar average.
• Affected by every sample item.
• The balancing point or fulcrum for the data.
• Regardless of the shape of the distribution, the signed deviations of the data points from the mean always sum to zero:
$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$
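A short Python check of the sample mean and its balancing-point property. The six values are borrowed from the median example later in this lecture purely for illustration (they are not the J. D. Power data), and the helper name `mean` is our own.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)


# Illustrative sample only.
data = [11, 12, 15, 17, 21, 32]

x_bar = mean(data)
deviations = [x - x_bar for x in data]

print(x_bar)            # 18.0
print(sum(deviations))  # 0.0
```

With non-integer data, floating-point rounding can leave a tiny nonzero sum, so a tolerance check such as `math.isclose(sum(deviations), 0, abs_tol=1e-9)` is safer than testing for exact zero.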
Chapter 4 – Central Tendency
Median (M) – the 50th percentile or midpoint of the sorted sample data.
• Use Excel’s function =MEDIAN(Data) where Data is an array of data values.
• M separates the upper and lower half of the sorted observations.
  – If n is even, the median is the average of the middle two observations in the data array.
  – If n is odd, the median is the middle observation in the data array.
Chapter 4 – Central Tendency
Median:
• To compute the median by hand, sort the n observations in the data: $x_1, x_2, x_3, \dots, x_n$ where $x_1 \le x_2 \le x_3 \le \dots \le x_n$
• For even n, Median = $\dfrac{x_{n/2} + x_{(n/2)+1}}{2}$
• For odd n, Median = $x_{(n+1)/2}$
Chapter 4 – Central Tendency
Example:
• Consider the following n = 6 data values: 11 12 15 17 21 32
• What is the median?
• For even n, Median = $\dfrac{x_{n/2} + x_{(n/2)+1}}{2}$
• n/2 = 6/2 = 3 and n/2 + 1 = 6/2 + 1 = 4
• M = $(x_3 + x_4)/2$ = (15 + 17)/2 = 16
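The by-hand rule translates directly into code. This is a minimal sketch; the function name `median_by_hand` is our own, and the only subtlety is converting the 1-based positions used on the slides into Python's 0-based list indices.

```python
def median_by_hand(values):
    """Median via the sorted-position rule (1-based positions, as on the slides)."""
    x = sorted(values)
    n = len(x)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    if n % 2 == 0:
        # Even n: average x_(n/2) and x_(n/2 + 1); subtract 1 for 0-based lists.
        return (x[n // 2 - 1] + x[n // 2]) / 2
    # Odd n: the single middle value x_((n + 1)/2).
    return x[(n + 1) // 2 - 1]


print(median_by_hand([11, 12, 15, 17, 21, 32]))  # 16.0
print(median_by_hand([11, 12, 15, 17, 21]))      # 15
```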
Clickers
Consider the following n = 7 data values: 12 23 23 25 27 34 41
What is the median?
A = 24
B = 25
C = 26
D = 27
Chapter 4 – Central Tendency
Median:
• For the 37 vehicle quality ratings (odd n), the position of the median is (n+1)/2 = (37+1)/2 = 19.

• So, the median is $x_{19}$ = 121.
• When there are several duplicate data values, the median does not provide a clean “50-50” split in the data.
Chapter 4 – Central Tendency
Characteristics of the Median:
• The median is insensitive to extreme data values.
• For example, consider the following quiz scores for 3 students:
  – Tom’s scores: 20, 40, 70, 75, 80 (Mean = 57, Median = 70, Total = 285)
  – Jake’s scores: 60, 65, 70, 90, 95 (Mean = 76, Median = 70, Total = 380)
  – Mary’s scores: 50, 65, 70, 75, 90 (Mean = 70, Median = 70, Total = 350)
• What does the median for each student tell you?
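The quiz-score summary can be reproduced with Python's standard `statistics` module. This sketch only recomputes the numbers already shown on the slide; the dictionary layout is our own choice.

```python
from statistics import mean, median

# Quiz scores from the slide.
scores = {
    "Tom": [20, 40, 70, 75, 80],
    "Jake": [60, 65, 70, 90, 95],
    "Mary": [50, 65, 70, 75, 90],
}

summary = {
    name: {"mean": mean(s), "median": median(s), "total": sum(s)}
    for name, s in scores.items()
}

for name, stats in summary.items():
    print(name, stats)
```

All three students share a median of 70 even though their means and totals differ, which is the sense in which the median ignores how far the other scores sit from the middle.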
Chapter 4 – Central Tendency
Mode – The most frequently occurring data value.
• Similar to mean and median if data values occur often near the center of sorted data.
• May have multiple modes or no mode.
• Easy to define, not easy to calculate in large samples.
• Use Excel’s function =MODE(Array)
  – will return #N/A if there is no mode.
  – will return first mode found if multimodal.
• May be far from the middle of the distribution and not at all typical.
• Generally isn’t useful for continuous data since data values rarely repeat.
  – Best for attribute data or a discrete variable with a small range (e.g., Likert scale).
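A small Python sketch of finding the mode, including the multiple-mode and no-mode cases. The function `modes` is our own helper, not an Excel or library function; as an assumption, it treats data in which no value repeats as having no mode, mirroring the #N/A behavior described above, and unlike Excel it returns every tied mode.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest count, or [] if nothing repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        # No value occurs more than once: treat the data as having no mode.
        return []
    return sorted(v for v, c in counts.items() if c == top)


print(modes([12, 23, 23, 25, 27, 34, 41]))  # [23]
print(modes([1, 1, 2, 2, 3]))               # [1, 2]
print(modes([11, 12, 15, 17, 21, 32]))      # []
```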
Chapter 4 – Central Tendency
Mode:
• A bimodal distribution refers to the shape of the histogram rather than the mode of the raw data.
• Occurs when dissimilar populations are combined in one sample. For example,
Chapter 4 – Central Tendency
Skewness:
• Compare mean and median or look at histogram to determine degree of skewness.
Mean, Median & Skewness:
• If median > mean, skewed left.
• If median = mean, symmetric.
• If median < mean, skewed right.
Mean, Mode & Skewness:
• If mode > mean, skewed left.
• If mode = mean, symmetric.
• If mode < mean, skewed right.
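The mean-versus-median rule of thumb can be sketched as a small function. The name `skew_direction` is our own, and the quiz scores from the median slide serve as test data. This applies only the comparison rule above; it is not a formal skewness statistic.

```python
from statistics import mean, median


def skew_direction(values):
    """Apply the mean-vs-median rule of thumb to a data set."""
    m, med = mean(values), median(values)
    if med > m:
        return "skewed left"
    if med < m:
        return "skewed right"
    return "symmetric"


print(skew_direction([20, 40, 70, 75, 80]))  # skewed left  (mean 57 < median 70)
print(skew_direction([60, 65, 70, 90, 95]))  # skewed right (mean 76 > median 70)
print(skew_direction([50, 65, 70, 75, 90]))  # symmetric    (mean 70 = median 70)
```

With real data the mean and median are rarely exactly equal, so in practice one would judge whether the gap is small relative to the spread rather than test for strict equality.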
Chapter 4 – Central Tendency
Midrange – the point halfway between the lowest and highest values of X.
Midrange = $\dfrac{x_{\min} + x_{\max}}{2}$

• Easy to use but sensitive to extreme data values.
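A minimal Python sketch of the midrange, using the six illustrative values from the median example. The second call changes only the largest value (a made-up outlier) to show how strongly a single extreme observation moves the result.

```python
def midrange(values):
    """Point halfway between the smallest and largest observations."""
    return (min(values) + max(values)) / 2


print(midrange([11, 12, 15, 17, 21, 32]))   # 21.5
print(midrange([11, 12, 15, 17, 21, 320]))  # 165.5
```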
Clickers
Consider the J. D. Power quality data (n = 37):
What is the midrange?
A = 121
B = 122
C = 130
D = 173
Chapter 4 – Central Tendency
Trimmed Mean:
• To calculate the trimmed mean, first remove the highest and lowest k percent of the observations.
• To determine how many observations to trim, multiply k × n (with k written as a proportion):
  – Remove the (k × n) highest and the (k × n) lowest observations.
• Mitigates the effects of extreme values.
• May exclude relevant data values.
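A sketch of the trimmed mean in Python. Two assumptions are built in and are not stated on the slide: `k` is given as a proportion (0.05 for 5 percent), and when k × n is not a whole number it is truncated, so the trim never removes more than the requested share from either end.

```python
def trimmed_mean(values, k):
    """Mean after dropping a fraction k of the observations from each end.

    Assumption: k is a proportion and k * n is truncated to a whole number
    of observations per tail.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= k < 0.5:
        raise ValueError("k must satisfy 0 <= k < 0.5")
    x = sorted(values)
    trim = int(k * len(x))            # observations removed from each tail
    kept = x[trim:len(x) - trim]
    return sum(kept) / len(kept)


data = [12, 23, 23, 25, 27, 34, 41]   # the n = 7 values from the clicker question
print(trimmed_mean(data, 0.15))       # trims 1 value per tail -> 26.4
print(trimmed_mean(data, 0.0))        # no trimming: the ordinary mean
```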
Chapter 4 – Dispersion
• Variation is the “spread” of data points about the center of the distribution in a sample. The text considers the following measures of dispersion:
  – Range
  – Variance ($s^2$)
  – Standard Deviation ($s$)
  – Coefficient of Variation (CV)
  – Mean Absolute Deviation (MAD)
• The variance and standard deviation are the most frequently used, but we will briefly discuss the merits of all five.
Chapter 4 – Dispersion
Range – The difference between the largest and smallest observation.
• Easy to calculate, but sensitive to extreme data values.
Range = $x_{\max} - x_{\min}$
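In Python the range is a one-line calculation. The helper is named `data_range` (our own choice) to avoid shadowing Python's built-in `range`, and the second call uses a made-up outlier to show the sensitivity to extreme values.

```python
def data_range(values):
    """Difference between the largest and smallest observations."""
    return max(values) - min(values)


print(data_range([11, 12, 15, 17, 21, 32]))   # 21
print(data_range([11, 12, 15, 17, 21, 320]))  # 309
```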
Chapter 4 – Dispersion
Variance:
• The population variance ($\sigma^2$) is defined as the sum of squared deviations around the mean divided by the population size:
$\sigma^2 = \dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
• For the sample variance ($s^2$), we divide by n – 1 instead of n, otherwise $s^2$ would tend to underestimate the unknown population variance $\sigma^2$:
$s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
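The two variance formulas differ only in the divisor, which a short Python sketch makes visible. The function names are our own, and the data are the six illustrative values from the median example, not the J. D. Power sample.

```python
def population_variance(values):
    """Sum of squared deviations about the mean, divided by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """Sum of squared deviations about the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


data = [11, 12, 15, 17, 21, 32]   # mean 18, squared deviations sum to 300
print(population_variance(data))  # 50.0
print(sample_variance(data))      # 60.0
```

Dividing by n – 1 always gives the larger of the two values, and the gap shrinks as n grows.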
Chapter 4 – Dispersion
Standard Deviation – The square root of the variance.
• Explains how individual values in a data set vary from the mean.
• Units of measure are the same as X.
• Population standard deviation: $\sigma = \sqrt{\dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
• Sample standard deviation: $s = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
• For the 37 vehicle quality ratings …
Chapter 4 – Dispersion
$\bar{x} = \dfrac{1}{37}(87 + 93 + 98 + \dots + 173) = 125.38$
$s = \sqrt{\dfrac{1}{37 - 1}\left((87 - 125.38)^2 + (93 - 125.38)^2 + \dots + (173 - 125.38)^2\right)} = 22.89$
Chapter 4 – Dispersion
Calculating Standard Deviation:
• Excel’s built-in functions are…

Statistic            Excel population formula   Excel sample formula
Variance             =VARP(Array)               =VAR(Array)
Standard deviation   =STDEVP(Array)             =STDEV(Array)

• The standard deviation is nonnegative because deviations around the mean are squared.
• When every observation is exactly equal to the mean, the standard deviation is zero.
• Standard deviations can be large or small, depending on the units of measure.
• Compare standard deviations only for data sets measured in the same units and only if the means do not differ substantially.
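For readers working outside Excel, Python's standard `statistics` module has counterparts to the four Excel functions. The pairing in the comments is our own mapping based on which divisor each function uses, and the data are illustrative values rather than the J. D. Power sample.

```python
import statistics

data = [11, 12, 15, 17, 21, 32]  # illustrative values only

pop_var = statistics.pvariance(data)   # like =VARP(Array): divides by N
samp_var = statistics.variance(data)   # like =VAR(Array): divides by n - 1
pop_sd = statistics.pstdev(data)       # like =STDEVP(Array)
samp_sd = statistics.stdev(data)       # like =STDEV(Array)

print(pop_var, samp_var)                    # 50 60
print(round(pop_sd, 4), round(samp_sd, 4))  # 7.0711 7.746
```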
Chapter 4 – Dispersion
Coefficient of Variation – A unit-free measure of dispersion.
• Expressed as a percent of the mean:
$CV = 100 \times \dfrac{s}{\bar{x}}$
• Useful for comparing variables measured in different units or with different means.
• Only appropriate for nonnegative data. It is undefined if the mean is zero or negative.
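A Python sketch of the coefficient of variation that also illustrates why it is called unit-free. The function name and the height data are hypothetical; the same measurements are expressed in centimeters and in meters, and the CV comes out the same.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation expressed as a percent of the mean."""
    x_bar = mean(values)
    if x_bar <= 0:
        raise ValueError("CV is undefined when the mean is zero or negative")
    return 100 * stdev(values) / x_bar


# Hypothetical heights: the same quantity in two different units.
heights_cm = [150, 160, 170, 180, 190]
heights_m = [h / 100 for h in heights_cm]

print(round(coefficient_of_variation(heights_cm), 2))  # 9.3
print(round(coefficient_of_variation(heights_m), 2))   # 9.3
```

The standard deviations themselves differ by a factor of 100 between the two lists, which is why raw standard deviations should not be compared across units.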
Clickers
Recall from the J. D. Power quality data (n = 37):
$\bar{x} = \dfrac{1}{37}(87 + 93 + 98 + \dots + 173) = 125.38$
$s = \sqrt{\dfrac{1}{37 - 1}\left((87 - 125.38)^2 + (93 - 125.38)^2 + \dots + (173 - 125.38)^2\right)} = 22.89$
What is the Coefficient of Variation?
A = 5.48%
B = 18.26%
C = 22.89%
D = 125.38%
Chapter 4 – Dispersion
Mean Absolute Deviation (MAD) – reveals the average distance from an individual data point to the mean (center of the distribution).
• Uses absolute values of the deviations around the mean:
$MAD = \dfrac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$
• Excel’s function is =AVEDEV(Array).
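The MAD formula in Python, as a minimal sketch. The function name is our own, and the data are again the six illustrative values from the median example.

```python
def mean_absolute_deviation(values):
    """Average absolute distance of the observations from their mean."""
    n = len(values)
    if n == 0:
        raise ValueError("MAD of an empty data set is undefined")
    x_bar = sum(values) / n
    return sum(abs(x - x_bar) for x in values) / n


data = [11, 12, 15, 17, 21, 32]   # mean 18, absolute deviations sum to 34
print(round(mean_absolute_deviation(data), 4))  # 5.6667
```

Taking absolute values keeps the positive and negative deviations from cancelling, which they would otherwise do exactly.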
Chapter 4 – Dispersion
Central Tendency vs. Dispersion: Manufacturing
• Consider the histograms of hole diameters drilled in a steel plate during manufacturing.
• The desired distribution is outlined in red.
• Machine A: Desired mean (5 mm) but too much variation.
• Machine B: Acceptable variation but mean is less than 5 mm.