lect01

Quantitative Methods for Decision Making

Lecture 1

Dr. Akhter

5th edition

Marking Scheme

Midterm 30%

Final Exam 40%

Quizzes 15% (mean of best five quizzes each of 15 points)

Assignments 15% (mean of best 7 assignments each of 15 points)

Book: Introductory Statistics, 9th Edition

ISBN-13: 978-0-321-69122-4

ISBN-10: 0-321-69122-9

Neil A. Weiss

Addison-Wesley

Topics

Gathering information and its Presentation

Measures of central tendency

Measures of Dispersion

Probability Concepts

Random & Non Random Variables

Some Special Distributions

The Normal distribution

Fitting of a distribution

Sampling distributions

Topics

Estimation Theory

Mathematical Models

Regression & Correlation

Decision Theory (p-value approach)

Decision based on risk

Experimental Designs

Case studies related to the CRD and RBD using some

industrial and financial data sets

Setting up ANOVA tables and Decision Making

Computer Support producing group research

Statistics

Statistics (as subject): the science of collecting and analyzing data for the purpose of drawing conclusions and making decisions. It provides data collection methods to reduce biases, and analysis methods to identify patterns and draw inferences from noisy data.

Statistics (facts and figures)

Aggregate of numerical facts: Statistics of scores,

statistics of marks, statistics of wages etc.

Statistic (constant): a characteristic of a sample

Important terms

Population: homogeneous, heterogeneous, finite, infinite, hypothetical, existent

Census: complete enumeration

Sampling frame (or frame): a complete list of all elements in our population

Sampling, Sample, Random Sample

Parameter: characteristic of a population

Statistic: characteristic of a sample

Statistical Methods

Statistical methods divide into two branches: Descriptive Statistics and Inferential Statistics.

• Descriptive statistics consists of methods for organizing,

displaying, and describing data by using tables, graphs, and

summary measures.

• Descriptive statistics is concerned with exploring, visualising, and

summarizing data but without fitting the data to any models.

• This kind of analysis is used to explore the data in the initial stages

of data analysis.

• Since no models are involved, it cannot be used to test hypotheses

or to make testable predictions.

• Nevertheless, it is a very important part of analysis that can reveal

many interesting features in the data.

Descriptive statistics

Inferential statistics

Involves the identification of a suitable model. The data is then fit to the model to obtain an optimal estimation of the model's parameters.

The model then undergoes validation by testing either predictions or hypotheses of the model.

Models based on a unique sample of data can be used to infer generalities about features of the whole population.

Using Statistics (Two Categories)

Inferential Statistics

• Predict and forecast values of population parameters

• Test hypotheses about values of population parameters

• Make decisions

Descriptive Statistics

• Collect

• Organize

• Summarize

• Display

• Analyze

Qualitative (categorical or nominal): color, gender, nationality

Quantitative (measurable or countable): temperatures, salaries, number of points scored on a 100-point exam

Types of Data - Two Types

Data

Collection of facts and figures

May be qualitative or quantitative

May be discrete or continuous

May be in ungrouped or grouped form

Data is either qualitative or quantitative; quantitative data is either discrete or continuous.

A population consists of the set of all

measurements in which the investigator

is interested.

A sample is a subset of the measurements

selected from the population.

A census is a complete enumeration of

every item in a population.

Samples and Populations

Sampling from the population is often

done randomly, such that every possible

sample of equal size (n) will have an

equal chance of being selected.

A sample selected in this way is called a

simple random sample or just a random

sample.

A random sample allows chance to

determine its elements.

Simple Random Sample
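A minimal sketch of drawing a simple random sample in Python, assuming the population can be listed in full (a sampling frame). The population of 20 labelled units and the helper name `simple_random_sample` are hypothetical, chosen only for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements so that every subset of size n is equally likely."""
    rng = random.Random(seed)          # a seed makes the draw reproducible
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical sampling frame: units labelled 1..20 (N = 20)
population = list(range(1, 21))
sample = simple_random_sample(population, 5, seed=1)
print(sample)
```

`random.sample` draws without replacement, so no unit can appear twice in the sample.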

Random Sampling

Stratified Sampling

Cluster Sampling

Systematic Sampling

Judgment Sampling

Quota Sampling

Sampling Techniques

Parameter A population constant

Statistic A sample constant

Parameter and Statistic

Population (N): $\mu, \sigma^2, \rho, \pi$

Sample (n): $\bar{x}, s^2, r, p$

Samples and Populations

Census of a population may be:

Impossible

Impractical

Too costly

Why Sample?

Subscript Notation

$X_i$: $X$ is the list name and $i$ is the subscript.

$X_{ij}$: double subscript, as in the array

$$\begin{matrix} X_{11} & X_{12} & X_{13} \\ X_{21} & X_{22} & X_{23} \\ X_{31} & X_{32} & X_{33} \end{matrix}$$

Summation Notations

$$\sum_{i=1}^{N} X_i$$

Here $\sum$ denotes summation, $i$ is the index, 1 is the start value, and $N$ is the stop value.

Sigma Notation

Suppose our list has just 5 numbers, and

they are 1,3,2,5,6.

$$\sum_{i=1}^{5} X_i^2 = 1^2 + 3^2 + 2^2 + 5^2 + 6^2 = 75$$

$$\left(\sum_{i=1}^{5} X_i\right)^2 = (1 + 3 + 2 + 5 + 6)^2 = 17^2 = 289$$
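The difference between the sum of squares and the square of the sum can be checked with a short Python sketch. The variable names are illustrative; the list is the five-number list from this slide.

```python
# The list from the slide: X_1..X_5
X = [1, 3, 2, 5, 6]

# Sum of squares: square each value first, then add
sum_of_squares = sum(x ** 2 for x in X)

# Square of the sum: add first, then square the total
square_of_sum = sum(X) ** 2

print(sum_of_squares)  # 75
print(square_of_sum)   # 289
```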

Properties of Sigma

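$$\sum_{i=1}^{N} a = Na$$

$$\sum_{i=1}^{N} aX_i = a\sum_{i=1}^{N} X_i$$

$$\sum_{i=1}^{N} (X_i + Y_i) = \sum_{i=1}^{N} X_i + \sum_{i=1}^{N} Y_i$$

$$\sum_{i=1}^{N} (X_i - Y_i) = \sum_{i=1}^{N} X_i - \sum_{i=1}^{N} Y_i$$

$$\sum_{i=x}^{y} a = (y - x + 1)a$$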

Properties of Sigma

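Show that

$$\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2$$

where $\bar{x}$ is the arithmetic mean of the data, which is

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \quad \text{or} \quad \sum_{i=1}^{n} x_i = n\bar{x}$$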
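Before proving the identity, it can help to check it numerically. The sketch below does this for the five numbers 1, 3, 2, 5, 6 used in the sigma-notation example. It uses exact fractions so that no rounding enters the comparison. This is a check on one data set, not a proof.

```python
from fractions import Fraction

x = [1, 3, 2, 5, 6]
n = len(x)
x_bar = Fraction(sum(x), n)                       # arithmetic mean, 17/5

# Left side: sum of squared deviations from the mean
lhs = sum((xi - x_bar) ** 2 for xi in x)

# Right side: sum of squares minus n times the squared mean
rhs = sum(xi ** 2 for xi in x) - n * x_bar ** 2

print(lhs, rhs)  # 86/5 86/5
```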

Sigma Notation


Commonly used Greek Letters

2

1

2 5N

j

i

X

Expand

Exercise

In a survey it was found that 64 families bought milk in the

following quantities (liters) in a particular month:

19 22 09 22 12 39 19 14 23 06 24 16 18

7 17 20 25 28 18 10 24 20 21 10 07 18

28 24 20 14 24 25 34 22 05 33 23 26 29

13 36 11 26 11 37 30 13 08 15 22 21 32

21 31 17 16 23 12 09 15 27 17 21 16

(a) Construct a frequency distribution using 5 intervals.

(b) Construct a histogram, frequency polygon, and frequency curve.

(c) Construct cumulative frequency (c.f.) distributions and draw ogives.

(d) Construct relative, cumulative relative, and percentage relative frequency distributions.
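One possible way to start part (a) is sketched below in Python. The choice of class limits is an assumption: five equal-width classes with integer limits, starting at the smallest observation. Other reasonable choices of starting point or width give different tables.

```python
import math
from itertools import accumulate

# Milk purchases (liters) of the 64 families, as listed in the exercise
milk = [
    19, 22, 9, 22, 12, 39, 19, 14, 23, 6, 24, 16, 18,
    7, 17, 20, 25, 28, 18, 10, 24, 20, 21, 10, 7, 18,
    28, 24, 20, 14, 24, 25, 34, 22, 5, 33, 23, 26, 29,
    13, 36, 11, 26, 11, 37, 30, 13, 8, 15, 22, 21, 32,
    21, 31, 17, 16, 23, 12, 9, 15, 27, 17, 21, 16,
]

k = 5                                             # number of intervals requested
low, high = min(milk), max(milk)
width = math.ceil((high - low + 1) / k)           # integer class width

# Inclusive class limits: (lower, upper)
classes = [(low + j * width, low + (j + 1) * width - 1) for j in range(k)]
freq = [sum(lo <= v <= hi for v in milk) for lo, hi in classes]
cum_freq = list(accumulate(freq))                 # cumulative frequencies
rel_freq = [f / len(milk) for f in freq]          # relative frequencies

for (lo, hi), f, c, r in zip(classes, freq, cum_freq, rel_freq):
    print(f"{lo:2d}-{hi:2d}  f={f:2d}  c.f.={c:2d}  rel={r:.3f}")
```

The same lists feed parts (b) to (d): `freq` for the histogram and polygon, `cum_freq` for the ogive, and `rel_freq` for the relative distributions.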

Grouped data, ungrouped data

Unweighted, weighted

Combined arithmetic mean

Assumed mean, trimmed mean

Arithmetic Mean: the central value

Ungrouped data (even, odd number of observations)

Grouped data

Graphical method of finding the median

Median: the middle observation in arranged data

Ungrouped data

Grouped data

Graphical method of finding the mode

Relationship between mean, median, and mode

Mode: the most frequent observation

Quartiles are the percentage points that break down

the ordered data set into quarters.

The first quartile is the 25th percentile. It is the point

below which lie 1/4 of the data.

The second quartile is the 50th percentile. It is the

point below which lie 1/2 of the data. This is also

called the median.

The third quartile is the 75th percentile. It is the

point below which lie 3/4 of the data.

Quartiles

The first quartile, Q1, (25th percentile) is

often called the lower quartile.

The second quartile, Q2, (50th

percentile) is often called median or the

middle quartile.

The third quartile, Q3, (75th percentile)

is often called the upper quartile.

The interquartile range is the difference

between the first and the third quartiles.

Quartiles and Interquartile Range

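Sales: 9, 6, 12, 10, 13, 15, 16, 14, 14, 16, 17, 16, 24, 21, 22, 18, 19, 18, 20, 17

Sorted Sales: 6, 9, 10, 12, 13, 14, 14, 15, 16, 16, 16, 17, 17, 18, 18, 19, 20, 21, 22, 24

The position of the Pth percentile in the sorted data is (n+1)P/100. The quartiles are the percentiles with P = 25 (first quartile), P = 50 (median), and P = 75 (third quartile).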

Example : Finding Quartiles
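A minimal sketch of the (n+1)P/100 rule in Python, applied to the sales data. The helper name `percentile` is hypothetical. The sketch assumes that a fractional position is handled by linear interpolation between the two neighbouring sorted values.

```python
def percentile(sorted_data, p):
    """Pth percentile at position (n+1)P/100, interpolating between neighbours."""
    n = len(sorted_data)
    pos = (n + 1) * p / 100          # 1-based position in the sorted data
    k = int(pos)                     # whole part of the position
    frac = pos - k                   # fractional part of the position
    if k < 1:                        # position falls before the first value
        return sorted_data[0]
    if k >= n:                       # position falls at or after the last value
        return sorted_data[-1]
    lower, upper = sorted_data[k - 1], sorted_data[k]
    return lower + frac * (upper - lower)

sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]
sorted_sales = sorted(sales)

q1 = percentile(sorted_sales, 25)      # position 5.25
median = percentile(sorted_sales, 50)  # position 10.5
q3 = percentile(sorted_sales, 75)      # position 15.75
print(q1, median, q3, q3 - q1)         # 13.25 16.0 18.75 5.5
```

The last printed value is the interquartile range, Q3 minus Q1.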

Measures of Variability

• Range

• Interquartile range

• Variance

• Standard Deviation

Measures of Central Tendency

• Median

• Mode

• Mean

Other summary measures:

• Skewness

• Kurtosis

Summary Measures: Population Parameters Sample Statistics

Median: middle value when sorted in order of magnitude (50th percentile)

Mode: most frequently occurring value

Mean: average

Measures of Central Tendency or Location

Sales Sorted Sales

9 6

6 9

12 10

10 12

13 13

15 14

16 14

14 15

14 16

16 16

17 16

16 17

24 17

21 18

22 18

18 19

19 20

18 21

20 22

17 24

Median (50th percentile)

Position: (20+1)(50)/100 = 10.5

Median = 16 + (0.5)(0) = 16, because the 10th and 11th sorted values are both 16

The median is the middle

value of data sorted in

order of magnitude. It is

the 50th percentile.

Example – Median (data from the previous example)

[Dot plot of the sales data, values 6 to 24; the tallest stack of dots is at 16]

Mode = 16

The mode is the most frequently occurring value. It

is the value with the highest frequency.

Example - Mode (Data is used from Example 1-2)

The mean of a set of observations is their average -

the sum of the observed values divided by the

number of observations.

Population Mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$

Sample Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

Arithmetic Mean or Average

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{317}{20} = 15.85$$

Sales: 9, 6, 12, 10, 13, 15, 16, 14, 14, 16, 17, 16, 24, 21, 22, 18, 19, 18, 20, 17

Total: 317

Example – Mean

[Dot plot of the sales data, values 6 to 24, with the median, mode, and mean marked]

Median and Mode = 16

Mean = 15.85

Example - Mode
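The three measures of central tendency for the sales data can be reproduced with Python's standard `statistics` module. This is a small illustrative sketch; the variable names are not from the lecture.

```python
import statistics

sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

mean = statistics.mean(sales)      # sum of values / number of values = 317 / 20
median = statistics.median(sales)  # average of the 10th and 11th sorted values
mode = statistics.mode(sales)      # most frequently occurring value

print(mean, median, mode)
```

The mean (15.85) lies slightly below the median and mode (both 16).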

Dividing data into groups or classes or intervals

Groups should be:

Mutually exclusive

• Not overlapping - every observation is assigned to only one group

Exhaustive

• Every observation is assigned to a group

Equal-width (if possible)

• First or last group may be open-ended

Group Data and the Histogram

Table with two columns listing:

Each and every group or class or interval of values

Associated frequency of each group

• Number of observations assigned to each group

• Sum of frequencies is number of observations

– N for population

– n for sample

Class midpoint is the middle value of a group or class or interval

Relative frequency is the proportion of total observations in each class

Sum of relative frequencies = 1

Frequency Distribution

Spending Class ($)      Frequency (number of customers)   Relative Frequency
x                       f(x)                              f(x)/n
0 to less than 100      30                                0.163
100 to less than 200    38                                0.207
200 to less than 300    50                                0.272
300 to less than 400    31                                0.168
400 to less than 500    22                                0.120
500 to less than 600    13                                0.070
Total                   184                               1.000

• Example of relative frequency: 30/184 = 0.163

• Sum of relative frequencies = 1

Example : Frequency Distribution

Spending Class ($)      Cumulative Frequency   Cumulative Relative Frequency
x                       F(x)                   F(x)/n
0 to less than 100      30                     0.163
100 to less than 200    68                     0.370
200 to less than 300    118                    0.641
300 to less than 400    149                    0.810
400 to less than 500    171                    0.929
500 to less than 600    184                    1.000

The cumulative frequency of each group is the sum of the

frequencies of that and all preceding groups.

Cumulative Frequency Distribution
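The cumulative columns follow mechanically from the class frequencies. A short Python sketch using the spending-class frequencies from the table (variable names are illustrative):

```python
from itertools import accumulate

# Class frequencies for the six spending classes ($0-100, ..., $500-600)
freq = [30, 38, 50, 31, 22, 13]
n = sum(freq)                                  # total number of customers

cum_freq = list(accumulate(freq))              # running total of frequencies
rel_freq = [f / n for f in freq]               # f(x)/n
cum_rel_freq = [c / n for c in cum_freq]       # F(x)/n

for f, c, r, cr in zip(freq, cum_freq, rel_freq, cum_rel_freq):
    print(f"{f:3d}  {c:3d}  {r:.3f}  {cr:.3f}")
```

The last relative frequency, 13/184, prints as 0.071. The table shows 0.070, presumably so that the rounded column totals exactly 1.000.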

A histogram is a chart made of bars of

different heights.

Widths and locations of bars correspond to

widths and locations of data groupings

Heights of bars correspond to frequencies or

relative frequencies of data groupings

Histogram

Frequency Histogram

Histogram Example

Relative Frequency Histogram

Histogram Example

Skewness – Measure of asymmetry of a frequency distribution

• Skewed to left

• Symmetric or unskewed

• Skewed to right

Kurtosis – Measure of flatness or peakedness of a frequency

distribution

• Platykurtic (relatively flat)

• Mesokurtic (normal)

• Leptokurtic (relatively peaked)

Skewness and Kurtosis

[Figures: Skewness, showing distributions that are skewed to left, symmetric, and skewed to right]

[Figures: Kurtosis, showing platykurtic (flat distribution), mesokurtic (not too flat and not too peaked), and leptokurtic (peaked distribution) curves]

Pie Charts

Categories represented as percentages of total

Bar Graphs

Heights of rectangles represent group frequencies

Frequency Polygons

Height of line represents frequency

Ogives

Height of line represents cumulative frequency

Time Plots

Represents values over time

Methods of Displaying Data

Pie Chart

Bar Chart

[Fig. 1-11 Airline Operating Expenses and Revenues: bar chart of average revenues and average expenses by airline (American, Continental, Delta, Northwest, Southwest, United, USAir)]

Frequency Polygon and Ogive

[Figures: relative frequency polygon and ogive, each with Sales (0 to 50) on the horizontal axis]

[Figure: Monthly Steel Production (Problem 1-46), plotted by month; vertical axis in millions of tons, 5.5 to 8.5]

Time Plot

Stem-and-Leaf Displays

Quick-and-dirty listing of all observations

Conveys some of the same information as a histogram

Box Plots

Median

Lower and upper quartiles

Maximum and minimum

Techniques to determine relationships and trends,

identify outliers and influential observations, and

quickly describe or summarize data sets.

1-9 Exploratory Data Analysis - EDA

1 | 122355567
2 | 0111222346777899
3 | 012457
4 | 11257
5 | 0236
6 | 02

Example: Stem-and-Leaf Display

Construct a stem-and-leaf display of the following data:

11, 12, 12, 13, 15, 15, 15, 16, 17, 20, 21, 21,

21, 22, 22, 22, 23, 24, 26, 27, 27, 27, 28, 29, 29,

30, 31, 32, 34, 35, 37, 41, 41, 42, 45, 47, 50, 52, 53, 56, 62
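A minimal sketch of building a stem-and-leaf display in Python for the data listed above. It assumes the tens digit is the stem and the units digit is the leaf. The names `stems` and `lines` are illustrative.

```python
from collections import defaultdict

data = [11, 12, 12, 13, 15, 15, 15, 16, 17, 20, 21, 21,
        21, 22, 22, 22, 23, 24, 26, 27, 27, 27, 28, 29, 29, 56,
        30, 31, 32, 34, 35, 37, 41, 41, 42, 45, 47, 50, 52, 53, 62]

# Group leaves (units digits) under their stems (tens digits)
stems = defaultdict(list)
for value in sorted(data):
    stems[value // 10].append(value % 10)

# One text line per stem, leaves in increasing order
lines = [f"{stem} | {''.join(str(leaf) for leaf in leaves)}"
         for stem, leaves in sorted(stems.items())]

print("\n".join(lines))
```

Sorting the data first means each row's leaves come out in order, so the display also shows the rough shape of the distribution, much like a histogram turned on its side.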

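• Box: drawn from Q1 to Q3, so its length is the interquartile range (IQR); the median is marked inside the box

• Inner fences: Q1-1.5(IQR) and Q3+1.5(IQR)

• Outer fences: Q1-3(IQR) and Q3+3(IQR)

• Whiskers (ends marked X): extend to the smallest data point not below the inner fence and to the largest data point not exceeding the inner fence

• Suspected outlier: marked *

• Outlier: marked o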

Elements of a Box Plot
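The fences can be computed directly from the quartiles. The sketch below uses the sales data from the earlier examples and `statistics.quantiles`, whose default "exclusive" method places the quartiles at positions based on n+1, matching the (n+1)P/100 rule. The `classify` helper is hypothetical. It assumes the common convention that points between the inner and outer fences are suspected outliers and points beyond the outer fences are outliers.

```python
import statistics

sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

# Quartiles via the default "exclusive" method (positions based on n + 1)
q1, q2, q3 = statistics.quantiles(sales, n=4)
iqr = q3 - q1

inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)   # inner fences
outer = (q1 - 3 * iqr, q3 + 3 * iqr)       # outer fences

def classify(value):
    """Label a value relative to the fences (assumed convention, see lead-in)."""
    if value < outer[0] or value > outer[1]:
        return "outlier"
    if value < inner[0] or value > inner[1]:
        return "suspected outlier"
    return "inside fences"

# Whiskers end at the most extreme data points still inside the inner fences
whisker_low = min(v for v in sales if v >= inner[0])
whisker_high = max(v for v in sales if v <= inner[1])

print(inner, outer, whisker_low, whisker_high)
```

For this data set every observation lies inside the inner fences, so the whiskers run to the minimum and maximum and no point is flagged.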

Box Plot

Example: Box and Whisker Plots

Order numbers

3, 5, 4, 2, 1, 6, 8, 11, 14, 13, 6, 9, 10, 7

• First, order your numbers from least to

greatest:

1, 2, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11, 13, 14

Median

• Then find the median (from the ordered list):

• Cross off one number from each side until you reach

the middle number (or numbers).

1, 2, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11, 13, 14

• If there are two numbers in the middle,

Add those 2 middle numbers together:

6 + 7 = 13

• Then divide by 2:

13 ÷ 2 = 6.5

• The median is 6.5.

Quartiles

• Then split the numbers on left and right sides

of the median:

1, 2, 3, 4, 5, 6, 6 │ 7, 8, 9, 10, 11, 13, 14

• Find the median for each half:

Left median = 4, right median = 10

• The left median is called the LOWER

QUARTILE.

• The right median is called the UPPER

QUARTILE.
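The split-and-take-medians procedure above can be sketched in Python. The function name `quartiles_by_halves` is hypothetical. For an odd number of observations the sketch assumes the overall median is left out of both halves, which the slides do not specify.

```python
import statistics

def quartiles_by_halves(values):
    """Lower quartile, median, and upper quartile via medians of the two halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower_half = data[:half]
    # For odd n, skip the middle value (an assumption, see lead-in)
    upper_half = data[half:] if n % 2 == 0 else data[half + 1:]
    return (statistics.median(lower_half),
            statistics.median(data),
            statistics.median(upper_half))

numbers = [3, 5, 4, 2, 1, 6, 8, 11, 14, 13, 6, 9, 10, 7]
lower_q, median, upper_q = quartiles_by_halves(numbers)
print(lower_q, median, upper_q)  # 4 6.5 10
```

This rule can give slightly different quartiles from the (n+1)P/100 position rule, so results should state which method was used.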

Drawing the plot

1, 2, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11, 13, 14

• Draw a number line from the smallest to the largest number without skipping any numbers (1 to 14).

• Put circles at the LOWER and UPPER Quartiles.

• Draw a box connecting the circles at the LOWER and UPPER Quartiles.

• Put a circle at the median (6.5).

• Draw a line connecting the median to the box.

• Put circles at the high and low points.

• Draw lines that connect the high and low points to the box.

Box and Whisker Plot

3, 5, 4, 2, 1, 6, 8, 11, 14, 13, 6, 9, 10, 7

Here is the completed Box and Whisker Plot!

[Figures: example charts, namely a box plot, histograms, frequency polygons and the ogive, two frequency polygons, a pie chart, a bar chart, a box plot comparing two data sets, and a time plot]

Testing Normality

Check the normality of the following data

3, 5, 4, 2, 1, 6, 8, 11, 14, 13, 6, 9, 10, 7

Table of normal scores

Questions?
