MBA7020_04.ppt / June 20, 2005 / Page 1 / Georgia State University - Confidential
MBA 7020
Business Analysis Foundations
Descriptive Statistics
June 20, 2005
Agenda
Confidence Interval
Descriptive Summary Measures
1. Measures of Central Location: Mean, Median, Mode
2. Measures of Variation: The Range, Variance and Standard Deviation
3. Measures of Association: Covariance and Correlation
Describing Data: Summary Measures
1. Measures of Central Location: Mean, Median, Mode
2. Measures of Variation: The Range, Variance and Standard Deviation
3. Measures of Association: Covariance and Correlation
Mean
1. It Is the Arithmetic Average of the Data Values:
   $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$, where $\bar{x}$ is the Sample Mean
2. The Most Common Measure of Central Tendency
3. Affected by Extreme Values (Outliers)
[Figure: two dot plots, one on an axis from 0 to 10 (Mean = 5) and one on an axis from 0 to 14 (Mean = 6)]
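A minimal Python sketch of the sample-mean formula, using the data set from the Sample Standard Deviation slide (10, 12, 14, 15, 17, 18, 18, 24). The outlier value 64 is a hypothetical substitution, chosen only to show how a single extreme value pulls the mean.

```python
def sample_mean(values):
    """Return the arithmetic average: the sum of the values divided by n."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


data = [10, 12, 14, 15, 17, 18, 18, 24]
print(sample_mean(data))  # 16.0

# Hypothetical outlier: replace the largest value (24) with 64.
with_outlier = data[:-1] + [64]
print(sample_mean(with_outlier))  # 21.0
```

One extreme value moves the mean from 16 to 21, which is the sensitivity the third bullet describes.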
Median
1. Important Measure of Central Tendency
2. In an ordered array, the median is the “middle” number.
   • If n is odd, the median is the middle number.
   • If n is even, the median is the average of the 2 middle numbers.
3. Not Affected by Extreme Values
[Figure: two dot plots, on axes from 0 to 10 and from 0 to 14; Median = 5 in both]
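The odd/even rule above can be sketched in Python as follows. The data set is the one from the Sample Standard Deviation slide; the value 64 and the three-value list are hypothetical inputs added only for illustration.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)          # step 1: put the data in an ordered array
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:                    # odd n: the single middle number
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the 2 middle numbers


data = [10, 12, 14, 15, 17, 18, 18, 24]
with_outlier = data[:-1] + [64]       # hypothetical extreme value replacing 24
print(median(data))                   # 16.0
print(median(with_outlier))           # 16.0 (unchanged by the extreme value)
print(median([9, 3, 7]))              # 7
```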
Mode
1. A Measure of Central Tendency
2. Value that Occurs Most Often
3. Not Affected by Extreme Values
4. There May Not Be a Mode
5. There May Be Several Modes
6. Used for Either Numerical or Categorical Data
[Figure: two dot plots; on an axis from 0 to 14, Mode = 9; on an axis from 0 to 6, No Mode]
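A small Python sketch covering the no-mode, one-mode, and several-mode cases. It rests on one stated assumption: when no value repeats, the data are treated as having no mode. The color list is hypothetical categorical data.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs most often, in first-seen order.

    Assumption for this sketch: if no value repeats (every count is 1),
    the data have no mode and an empty list is returned.
    """
    counts = Counter(values)          # works for numbers or categories
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []                     # no value repeats: "No Mode"
    return [value for value, count in counts.items() if count == top]


print(modes([10, 12, 14, 15, 17, 18, 18, 24]))          # [18]
print(modes([0, 1, 2, 3, 4, 5, 6]))                     # [] (no mode)
print(modes(["red", "blue", "red", "blue", "green"]))   # ['red', 'blue']
```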
Measures of Variability
Variation
• Range / Percentiles
• Variance / Standard Deviation (Population or Sample)
• Coefficient of Variation: $CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\%$
The Range
• Measure of Variation
• Difference Between Largest & Smallest Observations: $\text{Range} = x_{\text{Largest}} - x_{\text{Smallest}}$
• Ignores How Data Are Distributed:
[Figure: two differently distributed data sets on an axis from 7 to 12; Range = 12 - 7 = 5 for both]
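A minimal Python sketch of the range formula. Both data sets are hypothetical, chosen only to fit the 7-to-12 axis: one is spread evenly, the other is piled up at 12, yet both have the same range.

```python
def data_range(values):
    """Range = largest observation minus smallest observation."""
    if not values:
        raise ValueError("the range of an empty data set is undefined")
    return max(values) - min(values)


evenly_spread = [7, 8, 9, 10, 11, 12]    # hypothetical data
piled_at_top = [7, 12, 12, 12, 12, 12]   # hypothetical data
print(data_range(evenly_spread))  # 5
print(data_range(piled_at_top))   # 5, even though the values are distributed very differently
```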
Percentile Scores
1. Arrange the data in ascending order.
2. The middle number is the median.
3. The number halfway to the median (the median of the values below the median) is the first quartile, Q1.
4. The number halfway past the median (the median of the values above the median) is the third quartile, Q3.
5. A number with (no more than) 66% of the values less than it is the 66th percentile, and so forth.
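The quartile steps above can be sketched in Python as "take the median of each half." One assumption is stated in the code: when n is odd, the median itself is left out of both halves. Other textbooks and packages use slightly different conventions.

```python
def median(ordered):
    """Median of an already-sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, median, Q3) using the 'median of each half' rule.

    Assumption: when n is odd, the median itself is excluded from both halves.
    Requires at least two values.
    """
    ordered = sorted(values)            # step 1: ascending order
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")
    lower = ordered[: n // 2]           # values below the median
    upper = ordered[(n + 1) // 2 :]     # values above the median
    return median(lower), median(ordered), median(upper)


print(quartiles([10, 12, 14, 15, 17, 18, 18, 24]))  # (13.0, 16.0, 18.0)
print(quartiles([1, 2, 3, 4, 5, 6, 7]))             # (2, 4, 6)
```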
Box Plot
[Figure: box plot marking, from left to right, the Smallest value, Q1, the Median, Q3, and the Largest value]
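The five values a box plot draws can be computed with Python's standard `statistics.quantiles`. This sketch uses `method="inclusive"`, which is one of several quartile conventions, so its Q1 and Q3 may differ slightly from the hand rule on the Percentile Scores slide. The data set is the one from the Sample Standard Deviation slide.

```python
import statistics


def five_number_summary(values):
    """Return (Smallest, Q1, Median, Q3, Largest), the five values a box plot marks.

    Quartiles come from statistics.quantiles with method="inclusive"
    (linear interpolation between order statistics).
    """
    ordered = sorted(values)
    q1, med, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, med, q3, ordered[-1]


print(five_number_summary([10, 12, 14, 15, 17, 18, 18, 24]))
# (10, 13.5, 16.0, 18.0, 24)
```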
Variance
• Important Measure of Variation
• Shows Variation About the Mean:
• For the Population: $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
• For the Sample: $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
For the Population: use N in the denominator.
For the Sample: use n - 1 in the denominator.
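A from-scratch Python sketch of both variance formulas, differing only in the denominator. The data are from the Sample Standard Deviation slide, where the squared deviations from the mean of 16 sum to 130.

```python
def variance(values, sample=True):
    """Average squared deviation from the mean.

    sample=True  -> divide by n - 1 (sample variance, s^2)
    sample=False -> divide by N     (population variance, sigma^2)
    """
    n = len(values)
    if n == 0:
        raise ValueError("variance of an empty data set is undefined")
    if sample and n < 2:
        raise ValueError("sample variance needs at least two values")
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - 1 if sample else n)


data = [10, 12, 14, 15, 17, 18, 18, 24]
print(variance(data, sample=False))  # 16.25 (130 / 8)
print(variance(data))                # about 18.571 (130 / 7)
```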
Standard Deviation
• Most Important Measure of Variation
• Shows Variation About the Mean:
• For the Population: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$
• For the Sample: $s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
For the Population: use N in the denominator.
For the Sample: use n - 1 in the denominator.
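Python's standard `statistics` module provides both versions directly: `pstdev` divides by N and `stdev` divides by n - 1. This sketch checks that each is the square root of the matching variance, using the data from the Sample Standard Deviation slide.

```python
import math
import statistics

data = [10, 12, 14, 15, 17, 18, 18, 24]

sigma = statistics.pstdev(data)  # population formula: N in the denominator
s = statistics.stdev(data)       # sample formula: n - 1 in the denominator

# Each standard deviation is the square root of the matching variance.
assert math.isclose(sigma, math.sqrt(statistics.pvariance(data)))
assert math.isclose(s, math.sqrt(statistics.variance(data)))

print(round(sigma, 4))  # 4.0311
print(round(s, 4))      # 4.3095
```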
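Sample Standard Deviation
For the Sample: use n - 1 in the denominator: $s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
Data ($X_i$): 10 12 14 15 17 18 18 24
n = 8, Mean = 16
$s = \sqrt{\frac{(10-16)^2 + (12-16)^2 + (14-16)^2 + (15-16)^2 + (17-16)^2 + (18-16)^2 + (18-16)^2 + (24-16)^2}{8 - 1}} = \sqrt{\frac{130}{7}} = 4.3095$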
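The hand calculation can be reproduced step by step in Python; each intermediate quantity corresponds to one part of the formula, and all eight data values contribute a squared deviation.

```python
import math

data = [10, 12, 14, 15, 17, 18, 18, 24]
n = len(data)                               # 8
mean = sum(data) / n                        # 16.0

deviations = [x - mean for x in data]       # -6, -4, -2, -1, 1, 2, 2, 8 (as floats)
squares = [d ** 2 for d in deviations]      # 36, 16, 4, 1, 1, 4, 4, 64
sum_of_squares = sum(squares)               # 130.0 (all eight terms)

s = math.sqrt(sum_of_squares / (n - 1))     # sqrt(130 / 7)
print(n, mean, sum_of_squares, round(s, 4))  # 8 16.0 130.0 4.3095
```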
Comparing Standard Deviations
[Figure: three dot plots on an axis from 11 to 21]
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = .9258
Data C: Mean = 15.5, s = 4.57
Coefficient of Variation
• Measure of Relative Variation
• Always a %
• Shows Variation Relative to the Mean
• Used to Compare 2 or More Groups
• Formula (for Sample): $CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\%$
Comparing Coefficient of Variation
• Stock A: Average Price Last Year = $50, Standard Deviation = $5
• Stock B: Average Price Last Year = $100, Standard Deviation = $5
Coefficient of Variation: $CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\%$
Stock A: CV = (5 / 50) × 100% = 10%
Stock B: CV = (5 / 100) × 100% = 5%
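The stock comparison can be sketched in Python. Multiplying by 100 before dividing keeps these particular results exact in floating point.

```python
def coefficient_of_variation(std_dev, mean):
    """CV = (S / X-bar) * 100%: the standard deviation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return std_dev * 100 / mean


print(coefficient_of_variation(5, 50))   # Stock A: 10.0 (%)
print(coefficient_of_variation(5, 100))  # Stock B: 5.0 (%)
```

Both stocks have the same $5 standard deviation, but relative to its average price Stock B varies half as much.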
Shape
• Describes How Data Are Distributed
• Measures of Shape: Symmetric or Skewed
  Left-Skewed: Mean < Median < Mode
  Symmetric: Mean = Median = Mode
  Right-Skewed: Mode < Median < Mean
Confidence Interval
• Sample Mean ± Margin of Error (MOE)
• Called a Confidence Interval
• To Compute Margin of Error, One of Two Conditions Must Be True:
• The Distribution of the Population of Incomes Must Be Normal, or
• The Distribution of Sample Means Must Be Normal.
A Side-Trip Before Constructing Confidence Intervals
1. What is a Population Distribution?
2. What is a Distribution of the Sample Mean?
3. How Does Distribution of Sample Mean Differ From a Population Distribution?
4. What is the Central Limit Theorem?
Estimating the Population Mean Income of Lexus Owners
Assume Small Population of Lexus Owners’ Incomes (N = 200)
190 105 254 345 340 363 79 197 334 117
149 89 85 124 141 80 161 178 241 182
187 182 348 215 203 166 368 317 372 152
361 91 287 320 367 215 165 300 180 308
97 135 94 183 221 228 187 371 87 144
76 353 105 152 308 279 318 292 101 115
302 263 127 196 241 288 242 129 366 281
234 314 317 154 128 335 109 93 303 297
371 353 346 238 225 277 222 119 86 314
276 295 250 121 343 188 135 137 175 173
165 316 284 156 346 87 288 211 230 152
162 316 312 278 302 360 261 292 365 186
330 242 337 207 140 333 159 286 287 188
174 101 368 161 235 197 374 343 318 348
247 287 195 108 344 191 104 308 310 275
272 153 305 285 333 76 279 354 88 230
349 361 253 242 365 220 152 320 224 330
275 353 211 125 94 77 237 260 223 249
256 354 235 115 100 248 324 95 156 285
199 185 206 174 138 297 232 344 256 232
Distribution of N = 200 Incomes (Population Mean $\mu$)
[Figure: histogram of the 200 incomes on an axis from 75 to 325, with the population mean marked]
Constructing a Distribution of Samples of Size 5 from N = 200 Owners
Obs 1  Obs 2  Obs 3  Obs 4  Obs 5  Mean ($\bar{x}$)
  362     79    197    333    116     217.4
   80    160    177    241    182     168.0
  166    367    316    372    151     274.4
  214    165    300    180    307     233.2
  228    187    370     87    144     203.2
  278    317    292    100    114     220.2
  288    241    129    366    281     261.0
  335    109     92    303    296     227.0
  277    221    118     86    313     203.0
  188    135    136    175    172     161.2
   86    287    211    229    151     192.8
  359    260    291    365    185     292.0
  332    159    285    287    187     250.0
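Each Mean entry in the table is simply the average of the five observations in its row. This Python check recomputes the 13 rows shown.

```python
samples = [
    [362, 79, 197, 333, 116],
    [80, 160, 177, 241, 182],
    [166, 367, 316, 372, 151],
    [214, 165, 300, 180, 307],
    [228, 187, 370, 87, 144],
    [278, 317, 292, 100, 114],
    [288, 241, 129, 366, 281],
    [335, 109, 92, 303, 296],
    [277, 221, 118, 86, 313],
    [188, 135, 136, 175, 172],
    [86, 287, 211, 229, 151],
    [359, 260, 291, 365, 185],
    [332, 159, 285, 287, 187],
]

# Sample mean of each row of five observations.
means = [sum(sample) / len(sample) for sample in samples]
for sample, m in zip(samples, means):
    print(sample, round(m, 1))
```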
Distribution of Sample Mean Incomes (Column #7)
[Figure: histogram of the sample means $\bar{X}$ on an axis from 125 to 325, with bar heights 2, 6, 21, 8, 3 from left to right; the spread of the sample means is labeled Estimated Std. Error]
Distribution of Sample Means Near Normal!
Central Limit Theorem
• Even if the Distribution of the Population Is Not Normal, the Distribution of the Sample Mean Will Be Near Normal Provided You Select Samples of Size Five to Ten or Greater From the Population.
• For Sample Sizes of 30 or More, the Distribution of the Sample Mean Will Be (Approximately) Normal, with
– mean of sample means = population mean ($\mu$), and
– standard error = [population standard deviation] / [sqrt(n)] = $\sigma / \sqrt{n}$
• Thus Can Use Expression: $\bar{X} \pm \text{MOE}$
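A simulation sketch of the Central Limit Theorem claims, using the 200 incomes from the Lexus-owner population slide. Assumptions (not from the slides): samples of size 5 are drawn with replacement so that σ/√n applies exactly (drawing without replacement from N = 200 makes the spread about 1% smaller), 10,000 repetitions, and an arbitrary seed.

```python
import random
import statistics

# The N = 200 Lexus-owner incomes from the population slide (same units as the slide).
POPULATION = [
    190, 105, 254, 345, 340, 363, 79, 197, 334, 117,
    149, 89, 85, 124, 141, 80, 161, 178, 241, 182,
    187, 182, 348, 215, 203, 166, 368, 317, 372, 152,
    361, 91, 287, 320, 367, 215, 165, 300, 180, 308,
    97, 135, 94, 183, 221, 228, 187, 371, 87, 144,
    76, 353, 105, 152, 308, 279, 318, 292, 101, 115,
    302, 263, 127, 196, 241, 288, 242, 129, 366, 281,
    234, 314, 317, 154, 128, 335, 109, 93, 303, 297,
    371, 353, 346, 238, 225, 277, 222, 119, 86, 314,
    276, 295, 250, 121, 343, 188, 135, 137, 175, 173,
    165, 316, 284, 156, 346, 87, 288, 211, 230, 152,
    162, 316, 312, 278, 302, 360, 261, 292, 365, 186,
    330, 242, 337, 207, 140, 333, 159, 286, 287, 188,
    174, 101, 368, 161, 235, 197, 374, 343, 318, 348,
    247, 287, 195, 108, 344, 191, 104, 308, 310, 275,
    272, 153, 305, 285, 333, 76, 279, 354, 88, 230,
    349, 361, 253, 242, 365, 220, 152, 320, 224, 330,
    275, 353, 211, 125, 94, 77, 237, 260, 223, 249,
    256, 354, 235, 115, 100, 248, 324, 95, 156, 285,
    199, 185, 206, 174, 138, 297, 232, 344, 256, 232,
]


def simulate_sample_means(population, n, repetitions, seed):
    """Draw `repetitions` random samples of size n (with replacement)
    and return the mean of each sample."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(repetitions)]


n = 5
mu = statistics.fmean(POPULATION)        # population mean
sigma = statistics.pstdev(POPULATION)    # population standard deviation
means = simulate_sample_means(POPULATION, n, repetitions=10_000, seed=2005)
predicted_se = sigma / n ** 0.5          # CLT: standard error = sigma / sqrt(n)

print(f"population mean       {mu:.1f}")
print(f"mean of sample means  {statistics.fmean(means):.1f}")
print(f"predicted std. error  {predicted_se:.1f}")
print(f"observed std. error   {statistics.pstdev(means):.1f}")
```

The mean of the simulated sample means should land very close to μ, and their spread close to σ/√5, even though the income population itself is not bell-shaped.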
Why Does Central Limit Theorem Work?
As Sample Size Increases:
1. Most Sample Means will be Close to Population Mean,
2. Some Sample Means will be Either Relatively Far Above or Below Population Mean.
3. A Few Sample Means will be Either Very Far Above or Below Population Mean.
Impact of Side-Trip on MOE
1. Determine Confidence, or Reliability, Factor.
2. Distribution of Sample Mean Normal from Central Limit Theorem.
3. Use a “Normal-Like Table” to Obtain Confidence Factor.
4. Determine Spread in Sample Means (Without Taking Repeated Samples)
MOE = (Confidence Factor) × (Spread in $\bar{X}$s)
Drawing Conclusions about a Population Mean Using a Sample Mean
1. Select Simple Random Sample
2. Compute Sample Mean and Std. Dev. For n < 10, Is the Sample Bell-Shaped? For n > 10, the CLT Ensures the Distribution of $\bar{x}$ Is Normal
3. Compute $\bar{x} \pm \text{MOE}$
4. Draw Conclusion about Population Mean
Federal Aid Problem
• Suppose a census tract with 5,000 families is eligible for aid under program HR-247 if the average income of families of 4 is between $7,500 and $8,500 (those below $7,500 are eligible under a different program). A random sample of 12 families yields the data on the next page.
Federal Aid Study Calculations
Data: 7,300 7,700 8,100 8,400 7,800 8,300 8,500 7,600 7,400 7,800 8,300 8,600
$\bar{x} = \$7{,}983$
$s = \sqrt{\frac{(7300 - 7983)^2 + \cdots + (8600 - 7983)^2}{12 - 1}} = \$441$
Representative Sample
$\bar{x} \pm \text{MOE} = 7{,}983 \pm \text{MOE}$
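The two summary numbers for the 12 sampled families can be checked with Python's standard `statistics` module (`stdev` uses the n - 1 denominator, matching the slide).

```python
import statistics

incomes = [7300, 7700, 8100, 8400, 7800, 8300, 8500, 7600, 7400, 7800, 8300, 8600]

x_bar = statistics.fmean(incomes)   # sample mean, about 7,983.33
s = statistics.stdev(incomes)       # sample standard deviation, about 440.73

print(round(x_bar), round(s))  # 7983 441
```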
Estimated Standard Error
• Measures Variation Among the Sample Means If We Took Repeated Samples.
• But We Only Have One Sample! How Can We Compute Estimated Standard Error?
• Based on the Constructing Distribution of Sample Mean Slide, Will Estimated Standard Error Be Smaller or Larger Than Sample Standard Deviation (s)?
• Estimated Std. Error ______ than s.
• Estimated Standard Error Expression:
  $\text{Estimated Standard Error} = \frac{\text{sample standard deviation}}{\sqrt{\text{sample size}}} = \frac{s}{\sqrt{n}}$
  For Federal Aid Study: $\frac{441}{\sqrt{12}} \approx 127$
Confidence Factor for MOE:
Df = n - 1   2-Sided 90%   2-Sided 95%   2-Sided 99%
     8          1.860         2.306         3.355
    11          1.796         2.201         3.106
    14          1.761         2.145         2.977
    17          1.740         2.110         2.898
Can Use t-Table Provided Distribution of Sample Mean Is Normal
95% Confidence Interval
$\bar{X} \pm (\text{Confidence}) \times (\text{Est. Standard Error}) = 7{,}983 \pm 2.201\left(\frac{441}{\sqrt{12}}\right) = 7{,}983 \pm 2.201(127) = (\$7{,}703,\ \$8{,}263)$
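A Python sketch of the full interval calculation for the Federal Aid sample. The t factor 2.201 is hard-coded from the t-table slide (df = 11, 2-sided 95%) because Python's standard library has no t distribution.

```python
import math
import statistics

incomes = [7300, 7700, 8100, 8400, 7800, 8300, 8500, 7600, 7400, 7800, 8300, 8600]
n = len(incomes)
x_bar = statistics.fmean(incomes)
s = statistics.stdev(incomes)

std_error = s / math.sqrt(n)   # estimated standard error, about 127
t_factor = 2.201               # t-table: df = n - 1 = 11, 2-sided 95%
moe = t_factor * std_error     # margin of error, about 280

lower, upper = x_bar - moe, x_bar + moe
print(f"${lower:,.0f} to ${upper:,.0f}")  # $7,703 to $8,263
```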
Interpretation of Confidence Interval
• 95% Confident that the Interval $7,983 ± $280 Contains the Unknown Population (Not Sample) Mean Income.
• If We Selected 1,000 Samples of Size 12 and Constructed 1,000 Confidence Intervals, about 950 Would Contain the Unknown Population Mean and about 50 Would Not.
• So Is Tract Eligible for Aid???
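The "about 950 out of 1,000" interpretation can be checked with a simulation sketch. The normal population with mean 8,000 and standard deviation 450 is a hypothetical assumption for this demonstration, not a figure from the census tract; the t factor 2.201 is taken from the t-table slide.

```python
import math
import random
import statistics

# Hypothetical normal population: these parameters are assumptions for the demo.
TRUE_MEAN = 8000
TRUE_SD = 450
n = 12
t_factor = 2.201  # df = 11, 2-sided 95%, from the t-table slide

rng = random.Random(7020)
covered = 0
for _ in range(1000):
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n)]
    x_bar = statistics.fmean(sample)
    moe = t_factor * statistics.stdev(sample) / math.sqrt(n)
    if x_bar - moe <= TRUE_MEAN <= x_bar + moe:   # does this interval capture the mean?
        covered += 1

print(covered, "of 1000 intervals contain the population mean")
```

The count varies with the seed but should stay near 950; the confidence level describes the long-run behavior of the method, not any single interval.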
Sample Means versus Sample Proportions
Mean:
• Income/Loss
• Time to Complete Loan Papers
• Number of Fat Calories in Burger
• Breaking Strength of Cellular Phone Housing
Proportion of:
• Americans Who Believe that Japan is #1 Economic Power
• Circuit Boards with One or More Failed Solder Connections
• African-Americans Who Pass the CPA Exam
Means and Proportions Not the Same!!!!
Similarities and Differences Between Sample Means and Proportions
• Sample Means Computed from Data that Are Measured. Estimate Population Means.
• Sample Proportions Computed from Data that Are Counted. Estimate Population Proportions.
Drawing Conclusions about a Population Proportion From a Sample Proportion
1. Select Simple Random Sample
2. Compute Sample Proportion; Check for Normality - Table 7.8
3. $\bar{p} \pm \text{MOE} = \bar{p} \pm z\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}}$
4. Draw Conclusion About Population Proportion, p
Japan Business Survey
• n = 200 Californians
• Yes = 116
• No = 84
Is Japan the Foremost Economic Power Today?
$\bar{p} = \frac{116}{200} = .58$
90% Confidence Interval on p
$\text{MOE} = 1.645\sqrt{\frac{(.58)(.42)}{200}} = .057$
$\bar{p} \pm \text{MOE} = .58 \pm .057 = (.523, .637)$
90% Confident that Between 52.3% and 63.7% of Californians Believe Japan Is the Leading Economic Power.
Common Practice to Report Proportions as Percentages After Calculations Are Done.
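A Python sketch of the 90% interval for the Japan survey, assuming the normality check passes. It takes z from the standard library's `statistics.NormalDist` rather than the rounded table value 1.645; both give the same interval to three decimals. Percentages are formatted only at the end, after the calculations.

```python
import math
from statistics import NormalDist

yes, n = 116, 200
p_bar = yes / n                                # sample proportion, 0.58
z = NormalDist().inv_cdf(0.95)                 # 2-sided 90% -> about 1.645
moe = z * math.sqrt(p_bar * (1 - p_bar) / n)   # margin of error, about 0.057
lower, upper = p_bar - moe, p_bar + moe

print(f"{lower:.3f} to {upper:.3f}")           # 0.523 to 0.637
print(f"{lower:.1%} to {upper:.1%}")           # 52.3% to 63.7%
```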