chapter 01 - introduction and descriptive statistics
8/13/2019 Chapter 01 - Introduction and Descriptive Statistics
http://slidepdf.com/reader/full/chapter-01-introduction-and-descriptive-statistics 1/11
International University IU
Powered by statisticsforbusinessiuba.blogspot.com | Statistics for Business | Chapter 01: Introduction and Descriptive Statistics
STATISTICS FOR BUSINESS [IUBA]
CHAPTER 01
INTRODUCTION AND DESCRIPTIVE STATISTICS
1. SAMPLES AND POPULATIONS
A population consists of the set of all measurements in which the investigator is interested.
A sample is a subset of measurements selected from the population.
A random sample is a sample selected from the population at random, in such a way that every possible sample of n elements has an equal chance of being selected.
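As a minimal sketch of drawing a random sample, assuming the population is available as a Python list: the standard-library call `random.sample` selects n elements without replacement so that every subset of size n is equally likely. The helper name `draw_random_sample` and the population values below are hypothetical and only serve as an illustration.

```python
import random

def draw_random_sample(population, n, seed=None):
    """Return a simple random sample of n elements drawn without replacement."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(population, n)

# Hypothetical population of 10 measurements (illustration only)
population = [12, 15, 11, 19, 14, 13, 18, 16, 17, 10]
sample = draw_random_sample(population, n=4, seed=1)
print(sample)
```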
2. PERCENTILES AND QUARTILES
Percentiles: The Pth percentile of a group of numbers is that value below which lie P% (P percent) of the numbers in the group. The position of the Pth percentile is given by (n + 1)P/100, where n is the number of data points.
Quartiles: The percentage points that break down the data set into quarters: first quarter, second quarter, third quarter, and fourth quarter.
+ The 1st quartile (lower quartile) is the 25th percentile.
+ The median (2nd quartile) is the 50th percentile.
+ The 3rd quartile (upper quartile) is the 75th percentile.
+ Interquartile Range = 3rd Quartile – 1st Quartile
= Upper Quartile – Lower Quartile
= 75th Percentile – 25th Percentile
+ Range = Largest Observation – Smallest Observation
Example The following data are numbers of passengers on flights of Delta Air Lines between San
Francisco and Seattle over 33 days in April and early May.
128, 121, 134, 136, 136, 118, 123, 109, 120, 116, 125, 128, 121, 129, 130,
131, 127, 119, 114, 134, 110, 136, 134, 125, 128, 123, 128, 133, 132, 136,
134, 129, 132
Find the lower, middle, and upper quartiles of this data set. Also find the 10th, 15th, and
65th percentiles. What is the interquartile range?
(Hint: Use (n + 1)P/100.)
Solution First, order the data from smallest to largest:
109, 110, 114, 116, 118, 119, 120, 121, 121, 123,
123, 125, 125, 127, 128, 128, 128, 128, 129, 129,
130, 131, 132, 132, 133, 134, 134, 134, 134, 136,
136, 136, 136
n = 33
The lower quartile is the observation in position (33 + 1)(25)/100 = 8.5, which is 121.
The middle quartile (median) is the observation in position (33 + 1)(50)/100 = 17, which is 128.
The upper quartile is the observation in position (33 + 1)(75)/100 = 25.5, which is 133.5.
The 10th percentile is the observation in position (33 + 1)(10)/100 = 3.4, which is 114 + (116 − 114)(0.4) = 114.8.
The 15th percentile is the observation in position (33 + 1)(15)/100 = 5.1, which is 118 + (119 − 118)(0.1) = 118.1.
The 65th percentile is the observation in position (33 + 1)(65)/100 = 22.1, which is 131 + (132 − 131)(0.1) = 131.1.
The interquartile range is equal to
Third quartile – First quartile = 133.5 − 121 = 12.5
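The position rule used in this solution can be sketched in Python as follows. This is a minimal illustration, not a library routine: the function name `percentile` is hypothetical, and clamping positions that fall outside the data to the first or last observation is an assumption the text does not discuss.

```python
def percentile(data, p):
    """Pth percentile via the position rule (n + 1)P/100 with linear interpolation."""
    ordered = sorted(data)
    n = len(ordered)
    position = (n + 1) * p / 100       # 1-based position in the ordered data
    lower = int(position)              # whole part of the position
    fraction = position - lower        # fractional part, used to interpolate
    if lower < 1:                      # assumption: clamp to the smallest value
        return ordered[0]
    if lower >= n:                     # assumption: clamp to the largest value
        return ordered[-1]
    below, above = ordered[lower - 1], ordered[lower]
    return below + (above - below) * fraction

# Passenger counts from the example above
passengers = [128, 121, 134, 136, 136, 118, 123, 109, 120, 116, 125,
              128, 121, 129, 130, 131, 127, 119, 114, 134, 110, 136,
              134, 125, 128, 123, 128, 133, 132, 136, 134, 129, 132]

q1 = percentile(passengers, 25)
median = percentile(passengers, 50)
q3 = percentile(passengers, 75)
iqr = q3 - q1
print(q1, median, q3, iqr)
print([round(percentile(passengers, p), 1) for p in (10, 15, 65)])
```

Other software may use a different position convention, so quartiles computed elsewhere can differ slightly from the values obtained with this rule.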
3. MODE/MEAN/VARIANCE/STANDARD DEVIATION
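Mode
The mode of the data set is the value that occurs most frequently.
Mean
The mean of a set of observations is their average.
Sample: $\bar{x} = \sum_{i=1}^{n} x_i / n$
Population: $\mu = \sum_{i=1}^{N} x_i / N$
Variance
The variance of a set of observations is the average squared deviation of the data points from their mean.
Sample: $s^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 / (n - 1)$
Population: $\sigma^2 = \sum_{i=1}^{N} (x_i - \mu)^2 / N$
Standard Deviation
The standard deviation of a set of observations is the (positive) square root of the variance of the set.
Sample: $s = \sqrt{s^2} = \sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 / (n - 1)}$
Population: $\sigma = \sqrt{\sigma^2} = \sqrt{\sum_{i=1}^{N} (x_i - \mu)^2 / N}$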
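The definitions in this section can be written out directly in Python. The following is a sketch rather than a reference implementation: the function names and the `population` flag are hypothetical choices, and the list `values` is made-up illustration data, not taken from the text.

```python
from collections import Counter
from math import sqrt

def mode(data):
    """Return all values that occur most frequently (there may be several)."""
    counts = Counter(data)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]

def mean(data):
    """Sum of the observations divided by their number."""
    return sum(data) / len(data)

def variance(data, population=False):
    """Sum of squared deviations from the mean, divided by N or by n - 1."""
    m = mean(data)
    squared_deviations = sum((x - m) ** 2 for x in data)
    divisor = len(data) if population else len(data) - 1
    return squared_deviations / divisor

def std_dev(data, population=False):
    """Positive square root of the variance."""
    return sqrt(variance(data, population))

# Hypothetical data, for illustration only
values = [2, 4, 4, 4, 5, 5, 7, 9]
print(mode(values), mean(values))
print(variance(values, population=True), std_dev(values, population=True))
print(variance(values), std_dev(values))
```

The only difference between the sample and population versions is the divisor, n − 1 versus N, which is why a single flag is enough here.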
CALCULATOR INSTRUCTIONS FOR STATISTICS
Note: This page is only relevant for the CASIO scientific calculator FX-570ES.
Computing the Mean and Standard Deviation of a Sample / Population
(Chapter 01 | Introduction and Descriptive Statistics)
Step 01: Press MODE + 3: STAT
Step 02: Press 1: 1 – VAR
Step 03: Input the data
Step 04: Press SHIFT + 1 [STAT]
Step 05: Press 5: VAR
Step 06:
Press 2: x̄ to compute the sample mean or population mean
Press 3: xσn to compute the population standard deviation
Press 4: xσn−1 to compute the sample standard deviation
Example
Case of population
The future Euroyen is the price of the Japanese yen as traded in the European futures market. The following are 30-day Euroyen prices on an index from 0 to 100%:
99.24, 99.37, 98.33, 98.91, 98.51, 99.38, 99.71, 99.21, 98.63, 99.10.
Find the mean, standard deviation, and variance, viewed as a population.
(Hint: Use the calculator)
Case of sample
The daily expenditure on food by a traveler, in dollars in summer 2006, was as follows:
17.5, 17.6, 18.3, 17.9, 17.4, 16.9, 17.1, 17.1, 18.0, 17.2, 18.3, 17.8, 17.1, 18.3, 17.5, 17.4.
Find the mean, standard deviation, and variance.
(Hint: Use the calculator)
Solution
It is not necessary to order the data from smallest to largest in either case. Steps 01 to 05 are the same in both cases.
Step 01: Press MODE + 3: STAT
Step 02: Press 1: 1 – VAR
Step 03: Input the data
Step 04: Press SHIFT + 1 [STAT]
Step 05: Press 5: VAR
Step 06 (case of population):
Press 2: x̄ to compute the population mean. The result is 99.039.
Press 3: xσn to compute the population standard deviation. The result is 0.414.
Finally, to compute the population variance, square the standard deviation: σ² = (0.414)² ≈ 0.172 (squaring the unrounded value shown on the calculator).
Step 06 (case of sample):
Press 2: x̄ to compute the sample mean. The result is 17.588.
Press 4: xσn−1 to compute the sample standard deviation. The result is 0.466.
Finally, to compute the sample variance, square the standard deviation: s² = (0.466)² ≈ 0.217.
Conclusion
Case of population:
Population mean µ = 99.039
Population standard deviation σ = 0.414
Population variance σ² = 0.172
Case of sample:
Sample mean x̄ = 17.588
Sample standard deviation s = 0.466
Sample variance s² = 0.217
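For readers without the calculator, the same results can be checked with Python's standard `statistics` module. This is only a cross-check sketch; the variable names are illustrative. `pstdev` and `pvariance` divide by N (population), while `stdev` and `variance` divide by n − 1 (sample).

```python
import statistics

# Data from the two cases above
euroyen = [99.24, 99.37, 98.33, 98.91, 98.51, 99.38, 99.71, 99.21, 98.63, 99.10]
food = [17.5, 17.6, 18.3, 17.9, 17.4, 16.9, 17.1, 17.1,
        18.0, 17.2, 18.3, 17.8, 17.1, 18.3, 17.5, 17.4]

# Case of population: divide by N
pop_mean = statistics.mean(euroyen)
pop_sd = statistics.pstdev(euroyen)
pop_var = statistics.pvariance(euroyen)

# Case of sample: divide by n - 1
sample_mean = statistics.mean(food)
sample_sd = statistics.stdev(food)
sample_var = statistics.variance(food)

print(f"population: mean={pop_mean:.3f} sd={pop_sd:.3f} var={pop_var:.3f}")
print(f"sample:     mean={sample_mean:.3f} sd={sample_sd:.3f} var={sample_var:.3f}")
```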
4. CHEBYSHEV’S THEOREM AND THE EMPIRICAL RULE
Chebyshev’s Theorem
No condition: Chebyshev’s theorem applies to any data set.
1. At least three-quarters of the observations in a set will lie within 2 standard deviations of the mean.
2. At least eight-ninths of the observations in a set will lie within 3 standard deviations of the mean.
PROCEDURE OF CHEBYSHEV’S THEOREM
STEP 01: Determine the sample mean (x̄) and the sample standard deviation (s)
STEP 02: Choose the rule of Chebyshev’s theorem and determine the value of k
STEP 03: Calculate the interval x̄ ± ks
STEP 04: Determine the percentage of observations lying within the specified range x̄ ± ks (divide the number of observations lying within the specified range by the total number of observations in the data set)
STEP 05: Draw a conclusion
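The procedure above can be sketched in Python. The two rules listed are the k = 2 and k = 3 cases of the general bound 1 − 1/k², which is what `chebyshev_bound` returns. The function names are hypothetical, and the passenger data from the percentile example in section 2 is reused purely as an illustration.

```python
import statistics

def share_within(data, k):
    """Fraction of observations inside the interval x_bar ± k * s."""
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)             # sample standard deviation
    low, high = x_bar - k * s, x_bar + k * s
    inside = sum(1 for x in data if low <= x <= high)
    return inside / len(data)

def chebyshev_bound(k):
    """Minimum fraction guaranteed by Chebyshev's theorem: 1 - 1/k^2."""
    return 1 - 1 / k ** 2

# Passenger counts from the percentile example (section 2)
passengers = [128, 121, 134, 136, 136, 118, 123, 109, 120, 116, 125,
              128, 121, 129, 130, 131, 127, 119, 114, 134, 110, 136,
              134, 125, 128, 123, 128, 133, 132, 136, 134, 129, 132]

for k in (2, 3):
    observed = share_within(passengers, k)
    print(f"k={k}: observed {observed:.3f}, guaranteed at least {chebyshev_bound(k):.3f}")
```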
Empirical Rule
Condition: The empirical rule can apply if the distribution of the data is mound-shaped—
that is, if the histogram of the data is more or less symmetric with a single mode or high
point.
1. Approximately 68% of the observations will be within 1 standard deviation of the mean.
2. Approximately 95% of the observations will be within 2 standard deviations of the mean.
3. A vast majority of the observations (all, or almost all) will be within 3 standard
deviations of the mean.
PROCEDURE OF THE EMPIRICAL RULE
STEP 01: Draw the histogram of the data and check the condition that the distribution of the data is mound-shaped. If the distribution of the data is mound-shaped, follow the next five steps. If not, do nothing more.
STEP 02: Determine the sample mean (x̄) and the sample standard deviation (s)
STEP 03: Choose the rule of the Empirical Rule and determine the value of k
STEP 04: Calculate the interval x̄ ± ks
STEP 05: Determine the percentage of observations lying within the specified range x̄ ± ks (divide the number of observations lying within the specified range by the total number of observations in the data set)
STEP 06: Draw a conclusion.
Example Check the applicability of Chebyshev’s theorem and the empirical rule for the following
data set
12.5, 13, 14.8, 11, 16.7, 9, 8.3, 1.2, 3.9, 15.5, 16.2, 18, 11.6, 10, 9.5
Solution
Chebyshev’s Theorem:
We found that:
the sample mean x̄ ≈ 11.413
the sample standard deviation s ≈ 4.692
According to rule 1 of Chebyshev’s Theorem, the value of k = 2 and the interval x̄ ± ks = 11.413 ± 2 × 4.692 = [2.029, 20.797]. From the data set itself, we see that 14 of the 15 observations in the set, 14/15 ≈ 0.9333 = 93.33%, are within the specified range, so the rule that at least three-quarters will be within range is satisfied.
The Empirical Rule:
Since the distribution of the data is not mound-shaped, the empirical rule cannot apply.
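A rough way to inspect the shape of this data set and to recount the observations within 2 standard deviations is sketched below. The bin width of 4 starting at 0 and the helper name `bin_counts` are assumptions made for this illustration; deciding whether a histogram is mound-shaped remains a judgment call.

```python
import statistics

# Data set from the example above
data = [12.5, 13, 14.8, 11, 16.7, 9, 8.3, 1.2, 3.9, 15.5, 16.2, 18, 11.6, 10, 9.5]

def bin_counts(values, width, start=0.0, bins=5):
    """Count observations per bin [start + i*width, start + (i+1)*width)."""
    counts = [0] * bins
    for v in values:
        index = min(int((v - start) // width), bins - 1)  # last bin absorbs the top edge
        counts[index] += 1
    return counts

# Text histogram with assumed bins [0, 4), [4, 8), ..., [16, 20]
counts = bin_counts(data, width=4)
for i, count in enumerate(counts):
    print(f"[{4 * i:>2}, {4 * i + 4:>2}) {'*' * count}")

# Recount the observations inside x_bar ± 2s
x_bar = statistics.mean(data)
s = statistics.stdev(data)
within_2s = [x for x in data if x_bar - 2 * s <= x <= x_bar + 2 * s]
print(f"x_bar={x_bar:.3f} s={s:.3f} within 2s: {len(within_2s)} of {len(data)}")
```

With these bins, the two smallest observations sit apart from the rest and the lower bins are sparse, which is consistent with treating the distribution as not mound-shaped.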
5. BOX PLOT
Introduction
+ A box plot (also called a box-and-whisker plot) is another way of looking at a data set in an effort to determine its central tendency, spread, skewness, and the existence of outliers.
+ A box plot is a set of five summary measures of the distribution of the data:
1. The median of the data
2. The lower quartile
3. The upper quartile
4. The smallest observation
5. The largest observation
The elements of a box plot
- The median is marked as a vertical line across the box.
- The hinges of the box are the upper and lower quartiles (the rightmost and
leftmost sides of the box).
- The interquartile range (IQR) is the distance from the upper quartile to the lower quartile (the length of the box from hinge to hinge): IQR = Q_U − Q_L
- The upper inner fence is a point at a distance of 1.5(IQR) above the upper quartile, Q_U + 1.5(IQR); similarly, the lower inner fence is Q_L − 1.5(IQR).
- The outer fences are defined similarly but are at a distance of 3(IQR) above or below the appropriate hinge.
[Figure: The elements of a box plot]
Box plots are very useful for the following purposes.
1. To identify the location of a data set based on the median.
2. To identify the spread of the data based on the length of the box, hinge to hinge (the interquartile range), and the length of the whiskers (the range of the data without extreme observations: outliers or suspected outliers).
3. To identify possible skewness of the distribution of the data set. If the portion of the box to the
right of the median is longer than the portion to the left of the median, and/or the right whisker
is longer than the left whisker, the data are right-skewed. Similarly, a longer left side of the box
and/or left whisker implies a left-skewed data set. If the box and whiskers are symmetric, the
data are symmetrically distributed with no skewness.
4. To identify suspected outliers (observations beyond the inner fences but within the outer fences)
and outliers (points beyond the outer fences).
5. To compare two or more data sets. By drawing a box plot for each data set and displaying the box plots on the same scale, we can compare several data sets.
Example Construct a box plot for the following data set
5, 8, 6, 9, 17, 24, 10, 5, 6, 13, 5, 3, 6, 12, 11, 10, 9, 10, 14, 15
Solution Let’s order the data from smallest to largest
3, 5, 5, 5, 6, 6, 6, 8, 9, 9, 10, 10, 10, 11, 12, 13, 14, 15, 17, 24
n = 20
The median is the observation in position (20 + 1)(50)/100 = 10.5, which is 9.5.
The lower quartile is the observation in position (20 + 1)(25)/100 = 5.25, which is 6.
The upper quartile is the observation in position (20 + 1)(75)/100 = 15.75, which is 12.75.
The smallest observation is 3.
The largest observation is 24.
Table 1
              Smallest Observation   Lower Quartile   Median   Upper Quartile   Largest Observation
Position                             5.25             10.5     15.75
Observation   3                      6                9.5      12.75            24
IQR = Upper Quartile – Lower Quartile = 12.75 – 6 = 6.75
Lower Inner Fence = Q_L − 1.5(IQR) = 6 − 10.125 = −4.125
Upper Inner Fence = Q_U + 1.5(IQR) = 12.75 + 10.125 = 22.875
Lower Outer Fence = Q_L − 3(IQR) = 6 − 20.25 = −14.25
Upper Outer Fence = Q_U + 3(IQR) = 12.75 + 20.25 = 33
Table 2
          Lower Outer Fence   Lower Inner Fence   Median   Upper Inner Fence   Upper Outer Fence
Formula   Q_L − 3(IQR)        Q_L − 1.5(IQR)               Q_U + 1.5(IQR)      Q_U + 3(IQR)
Value     −14.25              −4.125              9.5      22.875              33
Box Plot
Conclusion:
Based on the box plot, we can see that the distribution of the data is relatively symmetric.
There is one suspected outlier, 24, which lies between the upper inner fence (22.875) and the upper outer fence (33).
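The numbers behind this box plot can be reproduced with a short Python sketch. It uses the chapter's position rule (n + 1)P/100 for the quartiles. The function names and dictionary keys are hypothetical, and the helper assumes the computed position falls strictly inside the ordered data, which holds for the quartiles of this data set.

```python
def position_percentile(ordered, p):
    """Pth percentile of already-sorted data via position (n + 1)P/100."""
    position = (len(ordered) + 1) * p / 100
    lower = int(position)
    fraction = position - lower
    # Assumes 1 <= lower < n, true for the 25th/50th/75th percentiles here
    below, above = ordered[lower - 1], ordered[lower]
    return below + (above - below) * fraction

def box_plot_summary(data):
    """Five-number summary, IQR, fences, and (suspected) outliers."""
    ordered = sorted(data)
    q_l = position_percentile(ordered, 25)
    median = position_percentile(ordered, 50)
    q_u = position_percentile(ordered, 75)
    iqr = q_u - q_l
    inner = (q_l - 1.5 * iqr, q_u + 1.5 * iqr)
    outer = (q_l - 3 * iqr, q_u + 3 * iqr)
    # Suspected outliers: beyond the inner fences but within the outer fences
    suspected = [x for x in ordered
                 if outer[0] <= x < inner[0] or inner[1] < x <= outer[1]]
    # Outliers: beyond the outer fences
    outliers = [x for x in ordered if x < outer[0] or x > outer[1]]
    return {"min": ordered[0], "q_l": q_l, "median": median, "q_u": q_u,
            "max": ordered[-1], "iqr": iqr, "inner_fences": inner,
            "outer_fences": outer, "suspected": suspected, "outliers": outliers}

# Data set from the example above
data = [5, 8, 6, 9, 17, 24, 10, 5, 6, 13, 5, 3, 6, 12, 11, 10, 9, 10, 14, 15]
summary = box_plot_summary(data)
for name, value in summary.items():
    print(f"{name}: {value}")
```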