chapter 01 - introduction and descriptive statistics

8/13/2019 Chapter 01 - Introduction and Descriptive Statistics


International University IU

Powered by statisticsforbusinessiuba.blogspot.com

Statistics for Business | Chapter 01: Introduction and Descriptive Statistics

STATISTICS FOR BUSINESS [IUBA]

CHAPTER 01

INTRODUCTION AND DESCRIPTIVE STATISTICS

1. SAMPLES AND POPULATIONS

A population consists of the set of all measurements in which the investigator is interested.

A sample is a subset of measurements selected from the population.

A random sample is a sample selected from the population at random, in such a way that every possible sample of n elements has an equal chance of being selected.
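The definition of a random sample can be illustrated with a short Python sketch. The population below (the integers 1 to 20) and the seed are hypothetical values chosen only for illustration; `random.sample` draws n distinct elements so that every subset of size n is equally likely.

```python
import random

# Hypothetical population: 20 measurements labelled 1..20 (illustrative only).
population = list(range(1, 21))

rng = random.Random(2019)   # fixed seed so the draw is reproducible
n = 5                       # sample size

# Draw n distinct elements without replacement; every subset of
# size n has the same chance of being selected.
sample = rng.sample(population, n)

print(sample)
```

Changing the seed gives a different, equally likely sample of the same size.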

2. PERCENTILES AND QUARTILES

Percentiles: The $P$th percentile of a group of numbers is that value below which lie $P$% ($P$ percent) of the numbers in the group. The position of the $P$th percentile is given by $(n + 1)P/100$, where $n$ is the number of data points.

Quartiles: The percentage points that break down the data set into quarters: first quarter, second quarter, third quarter, and fourth quarter.

+ The 1st quartile (lower quartile) is the 25th percentile.

+ The median is the 50th percentile.

+ The 3rd quartile (upper quartile) is the 75th percentile.






+ Interquartile Range = 3rd Quartile – 1st Quartile

= Upper Quartile – Lower Quartile

= 75th Percentile – 25th Percentile

+ Range = Largest Observation – Smallest Observation

Example The following data are numbers of passengers on flights of Delta Air Lines between San

Francisco and Seattle over 33 days in April and early May.

128, 121, 134, 136, 136, 118, 123, 109, 120, 116, 125, 128, 121, 129, 130,

131, 127, 119, 114, 134, 110, 136, 134, 125, 128, 123, 128, 133, 132, 136,

134, 129, 132

Find the lower, middle, and upper quartiles of this data set. Also find the 10th, 15th, and

65th percentiles. What is the interquartile range?

(Hint: Use $(n + 1)P/100$.)

Solution First, order the data from smallest to largest:

109, 110, 114, 116, 118, 119, 120, 121, 121, 123,

123, 125, 125, 127, 128, 128, 128, 128, 129, 129,

130, 131, 132, 132, 133, 134, 134, 134, 134, 136,

136, 136, 136

n = 33

The lower quartile is the observation in position (33 + 1)25/100 = 8.5, which is 121.

The middle quartile (median) is the observation in position (33 + 1)50/100 = 17, which is 128.

The upper quartile is the observation in position (33 + 1)75/100 = 25.5, which is 133.5.

The 10th percentile is the observation in position (33 + 1)10/100 = 3.4, which is 114 + (116 − 114)(0.4) = 114.8.

The 15th percentile is the observation in position (33 + 1)15/100 = 5.1, which is 118 + (119 − 118)(0.1) = 118.1.

The 65th percentile is the observation in position (33 + 1)65/100 = 22.1, which is 131 + (132 − 131)(0.1) = 131.1.

The interquartile range is equal to

Third quartile – First quartile = 133.5 − 121 = 12.5
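The position rule (n + 1)P/100, with linear interpolation between neighbouring observations, can be sketched in Python as follows. The function name `percentile` and its handling of positions that fall outside the data (it returns the smallest or largest observation) are assumptions made for this sketch; the data are the 33 passenger counts from the example.

```python
def percentile(data, p):
    """Value at position (n + 1) * p / 100 in the sorted data (linear interpolation)."""
    xs = sorted(data)
    n = len(xs)
    pos = (n + 1) * p / 100      # 1-based position, possibly fractional
    lower = int(pos)             # whole part of the position
    frac = pos - lower           # fractional part, used for interpolation
    if lower < 1:                # assumption: clamp positions before the first value
        return xs[0]
    if lower >= n:               # assumption: clamp positions at or after the last value
        return xs[-1]
    return xs[lower - 1] + frac * (xs[lower] - xs[lower - 1])

passengers = [128, 121, 134, 136, 136, 118, 123, 109, 120, 116, 125,
              128, 121, 129, 130, 131, 127, 119, 114, 134, 110, 136,
              134, 125, 128, 123, 128, 133, 132, 136, 134, 129, 132]

for p in (25, 50, 75, 10, 15, 65):
    print(p, round(percentile(passengers, p), 1))

iqr = percentile(passengers, 75) - percentile(passengers, 25)
print("IQR", iqr)
```

Note that statistical packages use several different percentile conventions, so library functions may return slightly different values from this rule.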






3. MODE/MEAN/VARIANCE/STANDARD DEVIATION

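Mode

The mode of the data set is the value that occurs most frequently.

Mean

The mean of a set of observations is their average.

Sample: $\bar{x} = \sum_{i=1}^{n} x_i / n$

Population: $\mu = \sum_{i=1}^{N} x_i / N$

Variance

The variance of a set of observations is the average squared deviation of the data points from their mean.

Sample: $s^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 / (n - 1)$

Population: $\sigma^2 = \sum_{i=1}^{N} (x_i - \mu)^2 / N$

Standard Deviation

The standard deviation of a set of observations is the (positive) square root of the variance of the set.

Sample: $s = \sqrt{s^2} = \sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 / (n - 1)}$

Population: $\sigma = \sqrt{\sigma^2} = \sqrt{\sum_{i=1}^{N} (x_i - \mu)^2 / N}$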
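The definitions in this section can be written out directly in Python. This is a minimal sketch: the function names and the five-value data set are hypothetical, chosen so that the arithmetic is easy to follow by hand. The only difference between the sample and population versions is the divisor (n − 1 versus N).

```python
from collections import Counter
from math import sqrt

def mode(xs):
    """Most frequent value (on a tie, the value seen first)."""
    return Counter(xs).most_common(1)[0][0]

def mean(xs):
    """Sum of the observations divided by their number."""
    return sum(xs) / len(xs)

def variance(xs, population=False):
    """Sum of squared deviations from the mean, divided by N or by n - 1."""
    m = mean(xs)
    squared_deviations = sum((x - m) ** 2 for x in xs)
    divisor = len(xs) if population else len(xs) - 1
    return squared_deviations / divisor

def std_dev(xs, population=False):
    """Positive square root of the variance."""
    return sqrt(variance(xs, population))

data = [2, 4, 4, 6, 9]   # hypothetical observations

print(mode(data), mean(data))
print(variance(data), std_dev(data))                                     # sample
print(variance(data, population=True), std_dev(data, population=True))   # population
```

For this data set the squared deviations sum to 28, so the sample variance is 28/4 = 7 and the population variance is 28/5 = 5.6.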






CALCULATOR INSTRUCTIONS FOR STATISTICS

Note: This page is only relevant for the CASIO scientific calculator FX-570ES

Computing Mean and Standard Deviation of Sample / Population.

(Chapter 01 | Introduction and Descriptive Statistics)

Step 01: Press MODE + 3: STAT

Step 02: Press 1: 1 – VAR

Step 03: Input the data

Step 04: Press SHIFT + 1 [STAT]

Step 05: Press 5: VAR

Step 06:

Press 2: x̄ to compute the sample mean or population mean

Press 3: xσn to compute the population standard deviation

Press 4: xσn−1 to compute the sample standard deviation






Example

Case of population: The future Euroyen is the price of the Japanese yen as traded in the European futures market. The following are 30-day Euroyen prices on an index from 0 to 100%:

99.24, 99.37, 98.33, 98.91, 98.51, 99.38, 99.71, 99.21, 98.63, 99.10.

Find the mean, standard deviation, and variance, viewed as a population. (Hint: Use the calculator.)

Case of sample: The daily expenditure on food by a traveler, in dollars, in summer 2006 was as follows:

17.5, 17.6, 18.3, 17.9, 17.4, 16.9, 17.1, 17.1, 18.0, 17.2, 18.3, 17.8, 17.1, 18.3, 17.5, 17.4.

Find the mean, standard deviation, and variance. (Hint: Use the calculator.)

Solution It is not necessary to order the data from smallest to largest in either case.

Steps 01 to 05 are the same in both cases:

Step 01: Press MODE + 3: STAT

Step 02: Press 1: 1 – VAR

Step 03: Input the data

Step 04: Press SHIFT + 1 [STAT]

Step 05: Press 5: VAR

Step 06 (case of population):

Press 2 to compute the population mean. The result is 99.039.

Press 3 to compute the population standard deviation. The result is 0.414.

Finally, to compute the population variance, square the standard deviation: $\sigma^2 = (0.414)^2 \approx 0.172$ (the third decimal reflects the calculator's unrounded value, 0.41423).

Step 06 (case of sample):

Press 2 to compute the sample mean. The result is 17.588.

Press 4 to compute the sample standard deviation. The result is 0.466.

Finally, to compute the sample variance, square the standard deviation: $s^2 = (0.466)^2 \approx 0.217$.

Conclusion

Case of population: Population mean $\mu$ = 99.039; population standard deviation $\sigma$ = 0.414; population variance $\sigma^2$ = 0.172.

Case of sample: Sample mean $\bar{x}$ = 17.588; sample standard deviation $s$ = 0.466; sample variance $s^2$ = 0.217.
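The calculator results for both cases can be cross-checked with Python's standard `statistics` module. This is only a verification sketch, not part of the calculator procedure; the variable names are illustrative. `pstdev` and `pvariance` divide by N (population), while `stdev` and `variance` divide by n − 1 (sample).

```python
import statistics

# Case of population: 30-day Euroyen prices.
euroyen = [99.24, 99.37, 98.33, 98.91, 98.51, 99.38, 99.71, 99.21, 98.63, 99.10]

# Case of sample: daily food expenditure in dollars.
food = [17.5, 17.6, 18.3, 17.9, 17.4, 16.9, 17.1, 17.1,
        18.0, 17.2, 18.3, 17.8, 17.1, 18.3, 17.5, 17.4]

pop_mean = statistics.mean(euroyen)
pop_sd = statistics.pstdev(euroyen)        # divides by N
pop_var = statistics.pvariance(euroyen)

sample_mean = statistics.mean(food)
sample_sd = statistics.stdev(food)         # divides by n - 1
sample_var = statistics.variance(food)

print(f"population: mean={pop_mean:.4f} sd={pop_sd:.4f} var={pop_var:.4f}")
print(f"sample:     mean={sample_mean:.4f} sd={sample_sd:.4f} var={sample_var:.4f}")
```

Printing four decimals makes it visible that the variances are the squares of the unrounded standard deviations.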






4. CHEBYSHEV’S THEOREM AND THE EMPIRICAL RULE

Chebyshev’s Theorem

Condition: none. Chebyshev’s theorem applies to any data set.

1. At least three-quarters of the observations in a set will lie within 2 standard deviations of the mean.

2. At least eight-ninths of the observations in a set will lie within 3 standard deviations of the mean.

PROCEDURE OF CHEBYSHEV’S THEOREM

STEP 01: Determine the sample mean ($\bar{x}$) and the sample standard deviation ($s$).

STEP 02: Choose the rule of Chebyshev’s theorem to apply and determine the value of $k$ ($k = 2$ for rule 1, $k = 3$ for rule 2).

STEP 03: Calculate the interval $\bar{x} \pm ks$.

STEP 04: Determine the percentage of observations lying in the specified range $\bar{x} \pm ks$ (divide the number of observations lying in the specified range by the total number of observations in the data set).

STEP 05: Draw a conclusion.
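The two rules above are the cases k = 2 and k = 3 of the general statement of Chebyshev's theorem. The general form is not written out in these notes and is added here as a standard result: for any k > 1, at least 1 − 1/k² of the observations lie within k standard deviations of the mean.

```latex
\[
\text{proportion of observations within } \bar{x} \pm ks \;\ge\; 1 - \frac{1}{k^2},
\qquad k > 1
\]
\[
k = 2:\quad 1 - \frac{1}{4} = \frac{3}{4},
\qquad
k = 3:\quad 1 - \frac{1}{9} = \frac{8}{9}
\]
```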






Empirical Rule

Condition: The empirical rule can apply if the distribution of the data is mound-shaped—

that is, if the histogram of the data is more or less symmetric with a single mode or high

point.

1. Approximately 68% of the observations will be within 1 standard deviation of the mean.

2. Approximately 95% of the observations will be within 2 standard deviations of the mean.

3. A vast majority of the observations (all, or almost all) will be within 3 standard

deviations of the mean.

PROCEDURE OF THE EMPIRICAL RULE

STEP 01: Draw the histogram of the data and check the condition that the distribution of the data is mound-shaped. If the distribution of the data is mound-shaped, follow the next five steps. If not, do nothing more.

STEP 02: Determine the sample mean ($\bar{x}$) and the sample standard deviation ($s$).

STEP 03: Choose the rule of the Empirical Rule to apply and determine the value of $k$ ($k$ = 1, 2, or 3).

STEP 04: Calculate the interval $\bar{x} \pm ks$.

STEP 05: Determine the percentage of observations lying in the specified range $\bar{x} \pm ks$ (divide the number of observations lying in the specified range by the total number of observations in the data set).

STEP 06: Draw a conclusion.






Example Check the applicability of Chebyshev’s theorem and the empirical rule for the following data set:

12.5, 13, 14.8, 11, 16.7, 9, 8.3, 1.2, 3.9, 15.5, 16.2, 18, 11.6, 10, 9.5

Solution

Chebyshev’s Theorem:

We found that:

the sample mean $\bar{x} = 11.413$

the sample standard deviation $s = 4.692$

According to rule 1 of Chebyshev’s theorem, the value of $k = 2$ and the interval $\bar{x} \pm ks = 11.413 \pm 2 \times 4.692 = [2.029, 20.797]$.

From the data set itself, we see that 14 of the 15 observations in the set ($14/15 \approx 0.933 = 93.3\%$) are within the specified range, so the rule that at least three-quarters will be within range is satisfied.

The Empirical Rule:

Since the distribution of the data is not mound-shaped, the empirical rule cannot apply.
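The Chebyshev check in this example can be reproduced with a short Python sketch. The helper name `share_within` is hypothetical; the bound 1 − 1/k² is the standard general form of Chebyshev's theorem, of which the two rules in this section (k = 2 and k = 3) are special cases.

```python
import statistics

data = [12.5, 13, 14.8, 11, 16.7, 9, 8.3, 1.2, 3.9, 15.5, 16.2, 18, 11.6, 10, 9.5]

x_bar = statistics.mean(data)   # sample mean
s = statistics.stdev(data)      # sample standard deviation (divides by n - 1)

def share_within(xs, center, spread, k):
    """Fraction of observations inside [center - k*spread, center + k*spread]."""
    low, high = center - k * spread, center + k * spread
    return sum(1 for x in xs if low <= x <= high) / len(xs)

for k in (2, 3):
    bound = 1 - 1 / k ** 2                  # Chebyshev's guaranteed minimum share
    share = share_within(data, x_bar, s, k)
    print(f"k={k}: interval=[{x_bar - k * s:.3f}, {x_bar + k * s:.3f}] "
          f"share={share:.3f} bound={bound:.3f} satisfied={share >= bound}")
```

For k = 2 the only observation outside the interval is 1.2, which gives the 14 of 15 reported in the solution.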






5. BOX PLOT

Introduction

+ A box plot (also called a box-and-whisker plot) is another way of looking at a data set in an effort to determine its central tendency, spread, skewness, and the existence of outliers.

+ A box plot is a set of five summary measures of the distribution of the data:

1. The median of the data

2. The lower quartile

3. The upper quartile

4. The smallest observation

5. The largest observation

The elements of a box plot

- The median is marked as a vertical line across the box.

- The hinges of the box are the upper and lower quartiles (the rightmost and

leftmost sides of the box).

- The interquartile range (IQR) is the distance from the upper quartile to the lower quartile (the length of the box from hinge to hinge): $IQR = Q_U - Q_L$.

- The upper inner fence is a point at a distance of $1.5(IQR)$ above the upper quartile, $Q_U + 1.5(IQR)$; similarly, the lower inner fence is $Q_L - 1.5(IQR)$.

- The outer fences are defined similarly but are at a distance of $3(IQR)$ above or below the appropriate hinge.






THE ELEMENTS OF A BOX PLOT

Box plots are very useful for the following purposes.

1. To identify the location of a data set based on the median.

2. To identify the spread of the data based on the length of the box, hinge to hinge (the interquartile range), and the length of the whiskers (the range of the data without extreme observations: outliers or suspected outliers).

3. To identify possible skewness of the distribution of the data set. If the portion of the box to the

right of the median is longer than the portion to the left of the median, and/or the right whisker

is longer than the left whisker, the data are right-skewed. Similarly, a longer left side of the box

and/or left whisker implies a left-skewed data set. If the box and whiskers are symmetric, the

data are symmetrically distributed with no skewness.

4. To identify suspected outliers (observations beyond the inner fences but within the outer fences)

and outliers (points beyond the outer fences).

5. To compare two or more data sets. By drawing a box plot for each data set and displaying the box plots on the same scale, we can compare several data sets.






Example Construct a box plot for the following data set

5, 8, 6, 9, 17, 24, 10, 5, 6, 13, 5, 3, 6, 12, 11, 10, 9, 10, 14, 15

Solution Let’s order the data from smallest to largest

3, 5, 5, 5, 6, 6, 6, 8, 9, 9, 10, 10, 10, 11, 12, 13, 14, 15, 17, 24

n = 20

The median is the observation in position (20 + 1)50/100 = 10.5, which is 9.5.

The lower quartile is the observation in position (20 + 1)25/100 = 5.25, which is 6.

The upper quartile is the observation in position (20 + 1)75/100 = 15.75, which is 12.75.

The smallest observation is 3.

The largest observation is 24.

Table 1

              Smallest       Lower                   Upper       Largest
              Observation    Quartile    Median      Quartile    Observation

Position                     5.25        10.5        15.75

Observation   3              6           9.5         12.75       24

IQR = Upper Quartile – Lower Quartile = 12.75 – 6 = 6.75

Lower Inner Fence = $Q_L - 1.5(IQR)$ = 6 − 10.125 = −4.125

Upper Inner Fence = $Q_U + 1.5(IQR)$ = 12.75 + 10.125 = 22.875

Lower Outer Fence = $Q_L - 3(IQR)$ = 6 − 20.25 = −14.25

Upper Outer Fence = $Q_U + 3(IQR)$ = 12.75 + 20.25 = 33

Table 2

           Lower Outer      Lower Inner                   Upper Inner        Upper Outer
           Fence            Fence              Median     Fence              Fence

Formula    $Q_L - 3(IQR)$   $Q_L - 1.5(IQR)$              $Q_U + 1.5(IQR)$   $Q_U + 3(IQR)$

Value      −14.25           −4.125             9.5        22.875             33

Box Plot

Conclusion:

Based on the box plot, we can see that the distribution of the data is relatively symmetric.

And there is one suspected outlier, 24.
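The numbers behind this box plot can be reproduced with a short Python sketch. It uses the position rule (n + 1)P/100 with linear interpolation, inner fences at 1.5(IQR) and outer fences at 3(IQR) from the hinges. The function and variable names are hypothetical, and returning the first or last observation for out-of-range positions is an assumption of this sketch.

```python
def value_at_percentile(xs_sorted, p):
    """Value at 1-based position (n + 1) * p / 100, interpolating linearly."""
    n = len(xs_sorted)
    pos = (n + 1) * p / 100
    lower = int(pos)
    frac = pos - lower
    if lower < 1:          # assumption: clamp to the smallest observation
        return xs_sorted[0]
    if lower >= n:         # assumption: clamp to the largest observation
        return xs_sorted[-1]
    return xs_sorted[lower - 1] + frac * (xs_sorted[lower] - xs_sorted[lower - 1])

data = sorted([5, 8, 6, 9, 17, 24, 10, 5, 6, 13, 5, 3, 6, 12, 11, 10, 9, 10, 14, 15])

q_l = value_at_percentile(data, 25)      # lower quartile (lower hinge)
median = value_at_percentile(data, 50)
q_u = value_at_percentile(data, 75)      # upper quartile (upper hinge)
iqr = q_u - q_l

inner = (q_l - 1.5 * iqr, q_u + 1.5 * iqr)   # inner fences
outer = (q_l - 3 * iqr, q_u + 3 * iqr)       # outer fences

# Suspected outliers: beyond an inner fence but within the outer fences.
suspected = [x for x in data
             if (outer[0] <= x < inner[0]) or (inner[1] < x <= outer[1])]
# Outliers: beyond an outer fence.
outliers = [x for x in data if x < outer[0] or x > outer[1]]

print("five-number summary:", data[0], q_l, median, q_u, data[-1])
print("IQR:", iqr, "inner fences:", inner, "outer fences:", outer)
print("suspected outliers:", suspected, "outliers:", outliers)
```

Plotting libraries often compute quartiles with a different convention, so a drawn box plot may show slightly different hinges from the values computed here.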
