Statistical Methods


Page 1: Statistical methods


QUANTITATIVE METHODS I

Ashok K Mittal, Department of IME

IIT Kanpur

Page 2: Statistical methods

Statistics

Q1: I want to invest Rs 1000; in which company should I invest?

Growth, Returns, Risk

Page 3: Statistical methods

Statistics

• Q2: How do I know which company will give me

• High Return

• or

• Growth

• but Risk should be Low

Page 4: Statistical methods

Statistics

• Collect information from the past: Qualitative and Quantitative (Data)

• Analyze the information (Data) to provide patterns of past performance (Descriptive Statistics)

• Project these patterns to answer your questions (Inference)

Page 5: Statistical methods

Types of data

Primary: You do a survey to find out the percentage of people living below the poverty line in Allahabad

Secondary: You are interested in studying the performance of banks and for that you study the RBI published documents

Page 6: Statistical methods

Descriptive Statistics

Presentation of data:
• Non-frequency data
• Frequency data

Page 7: Statistical methods

Non-frequency data: Time series representation of data

[Chart: BSE(30) daily closing values from 3-Jan-94 to 31-Mar-94; x-axis: Date; y-axis: BSE(30) Close (0–5000)]

Page 8: Statistical methods

Non-frequency data: Spatial series representation of data

Fertiliser Consumption for a few Indian states for 1999-2000 (in tonnes)

[Chart: fertiliser consumption for Andhra Pradesh, Karnataka, Kerala, Tamil Nadu, Gujarat, Madhya Pradesh, Maharashtra, Rajasthan, Haryana, Punjab, Uttar Pradesh, Bihar, Orissa, West Bengal and Assam; x-axis: States; y-axis: Fertiliser consumption (in tonnes, 0–3,500,000)]

Page 9: Statistical methods

Frequency data: Tabular representation

India at a glance (% of GDP)

Year            1983   1993   2002   2003
Agriculture     36.6   31.0   22.7   22.2
Industry        25.8   26.3   26.6   26.6
Mfg             16.3   16.1   15.6   15.8
Services        37.6   42.8   50.7   51.2
Pvt Consump     71.8   37.4   65.0   64.9
GOI Consump     10.6   11.4   12.5   12.8
Import           8.1   10.0   15.6   16.0
Domes save      17.6   22.5   24.2   22.2
Interests paid   0.4    1.3    0.7   18.3

Note: 2003 refers to 2003-2004; data are preliminary. Gross domestic savings figures are taken directly from India's central statistical organization.

Page 10: Statistical methods

Frequency data: Line diagram representation

Fertiliser Consumption for a few Indian states for 1999-2000 (in tonnes)

[Chart: the same state-wise fertiliser-consumption data as above, drawn as a line diagram; x-axis: States; y-axis: Fertiliser consumption (in tonnes)]

Page 11: Statistical methods

Frequency data: Bar diagram (histogram) representation

World Population (projected mid 2004)

[Chart: world population for 1950, 1960, 1970, 1980, 1990 and 2000; x-axis: Year; y-axis: Population (0–7,000,000,000)]

Page 12: Statistical methods

Frequency data: Bar diagram (histogram) representation

Height and Weight of individuals

[Chart: height (in cms) and weight (in kgs) for Ram, Shyam, Rahim, Praveen, Saikat, Govind and Alan; x-axis: Individual; y-axis: Height/Weight (0–200)]

Page 13: Statistical methods

Frequency data: Pie diagram/chart representation

[Pie chart: median marks in JMET (2003) across Verbal, Quantitative, Analytical and Data Interpretation]

Page 14: Statistical methods

Frequency data: Box plot representation

The box plot is also called the box-and-whisker plot. A box plot is a set of five summary measures of the distribution of the data: the median, the lower quartile, the upper quartile, the smallest observation and the largest observation.

Page 15: Statistical methods

Frequency data: Box plot representation

[Diagram: a box running from LQ to UQ with the Median marked inside, and whiskers extending on either side]

Page 16: Statistical methods

Frequency data: Box plot representation

Here: UQ – LQ = Inter-quartile range (IQR)
X = smallest observation within a certain multiple (conventionally 1.5 × IQR) of LQ
Y = largest observation within a certain multiple (conventionally 1.5 × IQR) of UQ

Page 17: Statistical methods

Important note

A cumulative frequency diagram is called the ogive. If the 'less than' and 'more than' ogives are drawn on the same axes, the abscissa of their point of intersection represents the median of the data.

Page 18: Statistical methods

Example: Frequency

[Histogram: frequency (0–10) of observations in class intervals of width 50, from 500–550 up to 1250–1300; x-axis: Class Interval; y-axis: Frequency]

Page 19: Statistical methods

Analysis

Page 20: Statistical methods

Measurements of uncertainty

Concept of uncertainty and different measures of uncertainty, Probability as a measure of uncertainty, Description of qualitative as well as quantitative probability, Assessment of probability, Concepts of Decision trees, Random Variables, Distributions, Expectations, Probability plots, etc.

Page 21: Statistical methods

Definitions

Quantitative variable: It can be described by a number for which arithmetic operations such as averaging make sense.
Qualitative (or categorical) variable: It simply records a quality, e.g., good, bad, right, wrong, etc.

We know statistics deals with measurements, some qualitative and others quantitative. The measurements are the actual numerical values of a variable. Qualitative variables can also be described by numbers, although such a description is arbitrary, e.g., good = 1, bad = 0, right = 1, wrong = 0, etc.

Page 22: Statistical methods

Scales of measurement

Nominal scale: In this scale numbers are used simply as labels for groups or classes. If we are dealing with a data set which consists of colours blue, red, green and yellow, then we can designate blue = 3, red = 4, green = 5 and yellow = 6. We can state that the numbers stand for the category to which a data point belongs. It must be remembered that nothing is sacrosanct regarding the numbering against each category. This scale is used for qualitative data rather than quantitative data.

Page 23: Statistical methods

Scales of measurement

Ordinal Scale: In this scale of measurement, data elements may be ordered according to relative size or quality. For example, a customer or a buyer can rank a particular characteristic of a car as good, average or bad, and while doing so he/she can assign numeric values, e.g., good = 10, average = 5 and bad = 0.

Page 24: Statistical methods

Scales of measurement

Interval Scale: For the interval scale we specify intervals so as to note a particular characteristic which we are measuring, and we assign each item or data point to a particular interval depending on its value. Consider we are measuring the age of school-going students between classes 5 to 12 in the city of Kanpur. We may form intervals 10-12 years, 12-14 years, ....., 18-20 years. Now when we have one data point, i.e., the age of a student, we put that data under one particular interval; e.g., if the student's age is 11 years, we immediately put that under the interval 10-12 years.

Page 25: Statistical methods

Scales of measurement

Ratio Scale: If two measurements are in ratio scale, then we can take ratios of measurements. The ratio scale represents the reading for each recorded data in a way which enables us to take a ratio of the readings in order to depict it either pictorially or in figures. Examples of ratio scale are measurements of weight, height, area, length etc.

Page 26: Statistical methods

Definitions: different measures

1) Measures of central tendency:
Mean (Arithmetic mean (AM), Geometric mean (GM), Harmonic mean (HM))
Median
Mode

2) Measures of dispersion and shape:
Variance or Standard deviation
Skewness
Kurtosis

Page 27: Statistical methods

Definition: Mean

Given N observations x_1, x_2, ....., x_N, we define the following:

AM = \frac{1}{N}\left(x_1 + x_2 + \dots + x_N\right)

GM = \left(x_1 \cdot x_2 \cdots x_N\right)^{1/N}

HM = \left[\frac{1}{N}\left(\frac{1}{x_1} + \frac{1}{x_2} + \dots + \frac{1}{x_N}\right)\right]^{-1}

Page 28: Statistical methods

Definition: Median and Mode

Median(µe) : The median of a data set is the value below which lies half of the data points. To find the median we use F (µe) = 0.5.

Mode (µo): The mode of a data set is the value that occurs most frequently. Hence f(µo) ≥ f(x); ∀ x.

Page 29: Statistical methods

Definition: Variance, Standard deviation, Skewness, Kurtosis

Variance: V[X] = \sigma^2 = E\left[(X - E[X])^2\right]

Standard deviation (SD) = \sigma

Skewness: \gamma_1 = \sqrt{\beta_1} = \frac{\mu_3}{\mu_2^{3/2}} = \frac{\mu_3}{\sigma^3}

Kurtosis: \gamma_2 = \beta_2 - 3 = \frac{\mu_4}{\sigma^4} - 3

Page 30: Statistical methods

Example

Consider we have the following data points:

5, 7, 10, 7, 10, 11, 3, 5, 5

For these data points we have

µ = 7; µe = 7 (the middle value of the sorted data 3, 5, 5, 5, 7, 7, 10, 10, 11); µo = 5; σ2 = 6.89
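These values can be checked directly with Python's statistics module:

```python
import statistics

data = [5, 7, 10, 7, 10, 11, 3, 5, 5]
print(statistics.mean(data))                 # 7
print(statistics.median(data))               # 7, the middle of the sorted values
print(statistics.mode(data))                 # 5, the most frequent value
print(round(statistics.pvariance(data), 2))  # 6.89 (population variance, divisor n)
```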

Page 31: Statistical methods

Descriptive statistics

Suppose the data are available in the form of a frequency distribution. Assume there are k classes, the mid-points of the corresponding class intervals being x1, x2, ...., xk, while the corresponding frequencies are f1, f2, ....., fk, such that n = f1 + f2 + ..... + fk.

Then:

\mu = \frac{1}{n}\sum_{i=1}^{k} x_i f_i

\sigma^2 = \frac{1}{n}\sum_{i=1}^{k} \left(x_i - \bar{x}\right)^2 f_i
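A small sketch of these grouped-data formulas (the mid-points and frequencies below are made up for illustration):

```python
# mid-points x_i of the k class intervals and their frequencies f_i
mids  = [525, 575, 625, 675]
freqs = [2, 5, 8, 3]

n = sum(freqs)                                    # n = f1 + ... + fk
mu = sum(x * f for x, f in zip(mids, freqs)) / n  # grouped mean
var = sum((x - mu) ** 2 * f for x, f in zip(mids, freqs)) / n  # grouped variance
print(mu, var)
```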

Page 32: Statistical methods

Descriptive statistics

Consider m groups of observations with respective means µ1, µ2, ....., µm and standard deviations σ1, σ2, ....., σm. Let the group sizes be n1, n2, ....., nm such that n = n1 + n2 + ..... + nm.

Then:

\mu_{OVERALL} = \frac{1}{n}\sum_{i=1}^{m} n_i \mu_i

\sigma_{OVERALL} = \left[\frac{1}{n}\left(\sum_{i=1}^{m} n_i \sigma_i^2 + \sum_{i=1}^{m} n_i \left(\mu_i - \mu_{OVERALL}\right)^2\right)\right]^{1/2}

Page 33: Statistical methods

Probability

Page 34: Statistical methods

Random event

Random experiment: Is an experiment whose outcome cannot be predicted with certainty.

Sample space (Ω) : The set of all possible outcomes of a random experiment

Sample point (ωi): The elements of the sample space

Event (A): Is a subset of the sample space such that it is a collection of sample point(s).

Page 35: Statistical methods

Probability

Probability (P(A)): Of an event is defined as a quantitative measure of uncertainty of the occurrence of the event

Objective probability: Based on games of chance; it can be mathematically proved or verified. If the experiment is the same for two different persons, then the value of the objective probability remains the same. It is the limiting definition of relative frequency. Example: the probability of getting the number 5 when we roll a fair die.

Subjective probability: Based on personal judgment, intuition and subjective criteria. Its value will change from person to person. Example: one person sees the chance of India winning the test series against Australia as high, while another person sees it as low.

Page 36: Statistical methods

Random event

For a random experiment, we denote

P(ωi) = pi

Where: P(ωi) = pi = Probability of occurrence of the sample

point ωi

P(A) = Probability of occurrence of the event

P(A) = \sum_{\omega_i \in A} p_i

P(\Omega) = \sum_{\omega_i \in \Omega} p_i = 1

Page 37: Statistical methods

Example 1

Suppose there are two dice each with faces 1, 2,....., 6 and they are rolled simultaneously. This rolling of the two dice would constitute our random experimentThen we have:

Ω = (1,1), (1,2), (1,3), (1,4), (1,5), (1,6), (2,1),.….., (5,6), (6,1), (6,2), (6,3), (6,4), (6,5), (6,6).

ωi = (1,1), (1,2), ...., (6,5), (6,6)

We define the event A such that the outcomes for each die are equal in one simultaneous throw; then A = (1,1), (2,2), ....., (6,6)

P(ωi): p1 = p2 = ..... = p36 = 1/36; P(A) = p1 + p8 + p15 + p22 + p29 + p36 = 6/36

Page 38: Statistical methods

Example 2

Suppose a coin is tossed repeatedly till the first head is obtained.Then we have:

Ω = (H), (T,H), (T,T,H),……… ωi = (H), (T,H), (T,T,H),….. We define the event such that at most 3 tosses

are needed to obtain the first head, then A = (H), (T,H), (T,T,H)

P(ωi): p1 = ½, p2 = (½)2, p3 = (½)3, p4 = (½)4,..… P(A) = p1 + p2 + p3 = 7/8

Page 39: Statistical methods

Example 3

In a club there are 10 members, of whom 5 are Asians and the rest are Americans. A committee of 3 members has to be formed and these members are to be chosen randomly. Find the probability that there will be at least 1 Asian and at least 1 American in the committee. Total number of cases = 10C3 = 120, and the number of cases favouring the formation of the committee is 5C2*5C1 + 5C1*5C2 = 100

Hence P(A) = 100/120

Page 40: Statistical methods

Example 4

Suppose we continue with Example 2, which we have just discussed, and we define the event B that at least 5 tosses are needed to produce the first head

Ω = (H), (T,H), (T,T,H),………

ωi = (H), (T,H), (T,T,H),…..

P(ωi): p1 = ½, p2 = (½)2, p3 = (½)3, p4 = (½)4,..…

P(B) = p5 + p6 + p7 + ..... = 1 – (p1 + p2 + p3 + p4) = 1 – 15/16 = 1/16

Page 41: Statistical methods

Theorem in probability

For any events A, B ⊆ Ω:
0 ≤ P(A) ≤ 1
If A ⊂ B, then P(A) ≤ P(B)
P(A ∪ B) = P(A) + P(B) – P(A ∩ B)
P(A^C) = 1 – P(A)
P(Ω) = 1
P(φ) = 0

Page 42: Statistical methods

Definitions

Mutually exclusive: Consider n events A1, A2, ....., An. They are mutually exclusive if no two of them can occur together, i.e., P(Ai ∩ Aj) = 0 ∀ i ≠ j.

Mutually exhaustive: Consider n events A1, A2, ....., An. They are mutually exhaustive if at least one of them must occur, i.e., P(A1 ∪ A2 ∪ ..... ∪ An) = 1.

Page 43: Statistical methods

Example 5

Suppose a fair die with faces 1, 2, ....., 6 is rolled. Then Ω = {1, 2, 3, 4, 5, 6}. Let us define the events A1 = {1, 2}, A2 = {3, 4, 5, 6} and A3 = {3, 5}.

The events A2 and A3 are neither mutually exclusive nor exhaustive

A1 and A3 are mutually exclusive but not exhaustive

A1, A2 and A3 are not mutually exclusive but are exhaustive

A1 and A2 are mutually exclusive and exhaustive

Page 44: Statistical methods

Conditional probability

Let A and B be two events such that P(B) > 0. Then the conditional probability of A given B is

P(A|B) = \frac{P(A \cap B)}{P(B)}

Assume Ω = {1, 2, 3, 4, 5, 6}, A = {2}, B = {2, 4, 6}. Then A ∩ B = {2} and

P(A|B) = \frac{1/6}{3/6} = \frac{1}{3}, \qquad P(B|A) = \frac{1/6}{1/6} = 1

Page 45: Statistical methods

Bayes' Theorem

Let B1, B2, ....., Bn be mutually exclusive and exhaustive events such that P(Bi) > 0 for every i = 1, 2, ...., n, and let A be any event. Then we have

P(A) = \sum_{i=1}^{n} P(A|B_i)\,P(B_i)

P(B_j|A) = \frac{P(A|B_j)\,P(B_j)}{\sum_{i=1}^{n} P(A|B_i)\,P(B_i)}

Page 46: Statistical methods

Independence of events

Two events A and B are called independent if P(A∩B) = P(A)*P(B)

Page 47: Statistical methods

Distribution

Page 48: Statistical methods

Distribution

Depending on the outcomes of an experiment, a random variable (r.v.) is used to denote the outcome of the experiment. We usually denote the r.v. by X, Y or Z, and the corresponding probability distribution by f(x), f(y) or f(z).

Discrete: probability mass function (pmf)
Continuous: probability density function (pdf)

Page 49: Statistical methods

Discrete distribution

1) Uniform discrete distribution

2) Binomial distribution

3) Negative binomial distribution

4) Geometric distribution

5) Hypergeometric distribution

6) Poisson distribution

7) Log distribution

Page 50: Statistical methods

Bernoulli Trials

1) Each trial has two possible outcomes, say a success and a failure.

2) The trials are independent

3) The probability of success remains the same from one trial to another, and so does the probability of failure

Page 51: Statistical methods

Uniform discrete distribution [X ~ UD(a, b)]

f(x) = 1/n, x = a, a+k, a+2k, ....., b

a and b are the parameters, where a, b ∈ R

E[X] = a + \frac{k(n-1)}{2}

V[X] = \frac{k^2\left(n^2 - 1\right)}{12}

Example: Generating the random numbers 1, 2, 3, ..., 10. Hence X ~ UD(1, 10) where a = 1, k = 1, b = 10. Hence n = 10.

Page 52: Statistical methods

Uniform discrete distribution

[Plot: pmf of the uniform discrete distribution, f(x) against x = 1, 2, ..., 20]

Page 53: Statistical methods

Binomial distribution [X ~ B(p, n)]

f(x) = nCx p^x q^{n-x}, x = 0, 1, 2, ....., n

n and p are the parameters where p ∈ [0, 1] and n ∈ Z+

E[X] = np; V[X] = npq

Example: Consider you are checking the quality of the product coming out of the shop floor. A product can either pass (with probability p = 0.8) or fail (with probability q = 0.2), and for checking you take 50 such products (n = 50). Then if X is the random variable denoting the number of successes in these 50 inspections, we have

f(x) = 50Cx (0.8)^x (0.2)^{50-x}
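A sketch of this pmf and its moments using only the standard library:

```python
from math import comb

n, p = 50, 0.8
q = 1 - p

def binom_pmf(x):
    # P(X = x) = nCx * p^x * q^(n - x)
    return comb(n, x) * p**x * q**(n - x)

print(binom_pmf(40))     # probability of exactly 40 passes out of 50
print(n * p, n * p * q)  # E[X] = 40.0, V[X] = 8.0
```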

Page 54: Statistical methods

Binomial distribution

[Plot: binomial pmf f(x) against x = 0, 2, ..., 50]

Page 55: Statistical methods

Negative binomial distribution [X ~ NB(p, r)]

f(x) = (r+x−1)C(r−1) p^r q^x, x = 0, 1, 2, .....

p and r are the parameters, where p ∈ [0, 1] and r ∈ Z+

E[X] = rq/p; V[X] = rq/p²

Example: Consider the example above where you are still inspecting items from the production line. But now you are interested in finding the probability distribution of the number of failures preceding the 5th success of getting the right product. Then, we have, considering p=0.8, q=0.2

f(x) = (5+x−1)C(5−1) (0.8)^5 (0.2)^x

Page 56: Statistical methods

Negative binomial distribution

[Plot: negative binomial pmf f(x) against x = 0, 1, ..., 30]

Page 57: Statistical methods

Geometric distribution [X ~ G(p)]

f(x) = p q^x, x = 0, 1, 2, .....

p is the parameter, where p ∈ [0, 1]

E[X] = q/p (r = 1 in the negative binomial case)
V[X] = q/p² (r = 1 in the negative binomial case)

Example: Consider the example above, but now you are interested in the probability distribution of the number of failures preceding the 1st success of getting the right product. Then, considering p = 0.8, q = 0.2, we have

f(x) = (0.8)(0.2)^x

Page 58: Statistical methods

Geometric distribution

[Plot: geometric pmf f(x) against x = 0, 1, ..., 25]

Page 59: Statistical methods

Hypergeometric distribution [X ~ HG(N, n, p)]

f(x) = (NpCx)(NqC(n−x)) / (NCn), 0 ≤ x ≤ Np and 0 ≤ (n – x) ≤ Nq

N, n and p are the parameters

E[X] = np; V[X] = npq(N – n)/(N – 1)

Example: Consider the example above, but now you are interested in the probability distribution of the number of successes (getting the right product) when we choose n products for inspection out of the total population N, without replacement. If the population is 100 and we choose 10 of those, then the probability distribution of getting the right product, denoted by X, is given by

f(x) = (85Cx)(15C(10−x)) / (100C10)

Remember p (0.85) and q (0.15) are the proportions of good items and bad items respectively.

Page 60: Statistical methods

Hypergeometric distribution

[Plot: hypergeometric pmf f(x) against x = 1, 2, ..., 10]

Page 61: Statistical methods

Poisson distribution [X ~ P(λ)]

f(x) = e^{−λ} λ^x / x!, x = 0, 1, 2, .....

λ is the parameter, where λ > 0

E[X] = λ; V[X] = λ

Example: Consider the arrival of customers at the bank teller counter. If we are interested in the probability distribution of the number of customers arriving at the counter in specific intervals of time, and we know that the average number of customers arriving is 5, then we have

f(x) = e^{−5} 5^x / x!
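A sketch of the Poisson pmf for this example:

```python
from math import exp, factorial

lam = 5.0  # average number of customers per interval

def poisson_pmf(x):
    # P(X = x) = e^(-lam) * lam^x / x!
    return exp(-lam) * lam**x / factorial(x)

# e.g., probability of at most 3 arrivals in an interval
print(sum(poisson_pmf(x) for x in range(4)))  # ~0.265
```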

Page 62: Statistical methods

Poisson distribution

[Plot: Poisson pmf f(x) against x = 0, 1, ..., 20]

Page 63: Statistical methods

Log distribution [X ~ L(p)]

f(x) = −(loge p)^{−1} x^{−1} (1 – p)^x, x = 1, 2, 3, .....

p is the parameter, where p ∈ (0, 1)

E[X] = −(1 − p)/(p loge p)

V[X] = −(1 − p)[1 + (1 − p)/loge p]/(p² loge p)

Examples:
1) Emission of gases from engines against fuel type
2) Used to represent the distribution of the number of items of a product purchased by a buyer in a specified period of time

Page 64: Statistical methods

Log distribution

[Plot: log-distribution pmf f(x) against x = 1, 2, ..., 20]

Page 65: Statistical methods

Continuous distribution

1) Uniform distribution

2) Normal distribution

3) Exponential distribution

4) Chi-Square distribution

5) Gamma distribution

6) Beta distribution

7) Cauchy distribution

Page 66: Statistical methods

Continuous distribution

8) t-distribution

9) F-distribution

10) Log-normal distribution

11) Weibull distribution

12) Double exponential distribution

13) Pareto distribution

14) Logistic distribution

Page 67: Statistical methods

Uniform distribution [X ~ U(a, b)]

f(x) = 1/(b – a), a ≤ x ≤ b

a and b are the parameters, where a, b ∈ R and a < b

E[X] = (a + b)/2; V[X] = (b – a)²/12

Example: Choosing any number between 1 and 10, both inclusive, from the real line

Page 68: Statistical methods

Uniform distribution

[Plot: uniform pdf f(x) against x from 1 to 10]

Page 69: Statistical methods

Normal distribution [X ~ N(µ, σ²)]

f(x) = \frac{1}{\sigma_X \sqrt{2\pi}}\, e^{-\frac{(x-\mu_X)^2}{2\sigma_X^2}}, \quad -\infty < x < \infty

µX, σX² are the parameters, where µX ∈ R and σX² > 0

E[X] = µX

V[X] = σX²

Example: Consider the average age of a student between class VII and VIII selected at random from all the schools in the city of Kanpur

Page 70: Statistical methods

Normal distribution

[Plot: normal density f(x) against x from −10 to 10]

Page 71: Statistical methods

Log-normal distribution [X ~ LN(µ, σ²)]

f(x) = \frac{1}{x\,\sigma_X \sqrt{2\pi}}\, e^{-\frac{(\log_e x - \mu_X)^2}{2\sigma_X^2}}, \quad 0 < x < \infty

µX, σX² are the parameters, where µX ∈ R and σX² > 0

E[X] = exp(µX + σX²/2)

V[X] = exp(2µX + σX²)[exp(σX²) − 1]

Example: Stock price return distribution

Page 72: Statistical methods

Log-normal distribution

[Plot: log-normal density f(x) against x from 0.5 to 38]

Page 73: Statistical methods

Relationship between Poisson and Exponential distribution

If a process has intervals between successive events that are independent and identically distributed, and these intervals are exponentially distributed, then the number of events in a specified time interval follows a Poisson distribution.
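A quick simulation sketch of this relationship (the rate and seed are chosen arbitrarily): exponential gaps with rate λ should give, on average, λ events per unit time.

```python
import random

random.seed(1)
lam = 5.0
counts = []
for _ in range(10_000):
    t, n = 0.0, 0
    while True:
        t += random.expovariate(lam)  # exponential inter-arrival time
        if t >= 1.0:
            break
        n += 1
    counts.append(n)  # events in [0, 1): Poisson(lam) in the limit

print(sum(counts) / len(counts))  # close to lam = 5
```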

Page 74: Statistical methods

Normal distribution results

[Plot: normal density f(x) against x from −10 to 10, with points a and b marked; the shaded area between a and b is P(a ≤ X ≤ b)]

Page 75: Statistical methods

z    .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
0.0  0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1  0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2  0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3  0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4  0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5  0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6  0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7  0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8  0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9  0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0  0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1  0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2  0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3  0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4  0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5  0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6  0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7  0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8  0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9  0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0  0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1  0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2  0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3  0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4  0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5  0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6  0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7  0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8  0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9  0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0  0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990

[Plot: standard normal density f(z) for z from −5 to 5, with z = 1.56 marked]

Standard Normal Probabilities

Finding probabilities using the standard normal distribution: P(0 < Z < 1.56). Look in the row labeled 1.5 and the column labeled .06 to find P(0 ≤ Z ≤ 1.56) = .4406.

Page 76: Statistical methods

Standard Normal distribution

Z ~ N(0,1), given by the equation

f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}

The area within an interval (a, b) is given by

F(a \le Z \le b) = \int_a^b \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz

which is not integrable algebraically. The Taylor expansion of the above assists in speeding up the calculation:

F(Z \le z) = \frac{1}{2} + \frac{1}{\sqrt{2\pi}} \sum_{k=0}^{\infty} \frac{(-1)^k\, z^{2k+1}}{2^k\, k!\, (2k+1)}
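A direct sketch of this series in Python; 40 terms are ample for moderate z, and the result reproduces the table entry for z = 1.56:

```python
import math

def phi(z, terms=40):
    # F(Z <= z) = 1/2 + (1/sqrt(2*pi)) * sum_k (-1)^k z^(2k+1) / (2^k k! (2k+1))
    s = sum((-1)**k * z**(2*k + 1) / (2**k * math.factorial(k) * (2*k + 1))
            for k in range(terms))
    return 0.5 + s / math.sqrt(2 * math.pi)

print(round(phi(1.56) - 0.5, 4))  # P(0 < Z < 1.56) = 0.4406
```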

Page 77: Statistical methods

Cumulative distribution function (cdf) or the distribution function

We denote the distribution function by F(x)

Discrete case: F(x) = P(X \le x) = \sum_{x_i \le x} f(x_i)

Continuous case: F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt = \int_{-\infty}^{x} dF(t)

Page 78: Statistical methods

Properties of distribution function

1) F(x) is non-decreasing in x, i.e., if x1 ≤ x2, then F(x1) ≤ F(x2)

2) Lt F(x) = 0 as x → −∞
3) Lt F(x) = 1 as x → +∞
4) F(x) is right continuous

Page 79: Statistical methods

Standard normal distribution

Putting Z = (X − µX)/σX in the normal distribution, we have the standard normal distribution

f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}

where µZ = 0 and σZ = 1.

Remember:
• F(x) = P(X ≤ x) corresponds to F(z) = P(Z ≤ z)
• f(z) = φ(z)
• F(z) = P(Z \le z) = \int_{-\infty}^{z} f(z)\,dz = \Phi(z)

Page 80: Statistical methods

Standard normal distribution

[Plot: standard normal density f(z) against z from −10 to 10]

Page 81: Statistical methods

Exponential distribution [X ~ E(a, θ)]

f(x) = \frac{1}{\theta}\, e^{-(x-a)/\theta}, \quad a < x < \infty

a and θ are the parameters, where a ∈ R and θ > 0

E[X] = a + θ

V[X] = θ²

Example: The life distribution of the number of hours an electric bulb survives.

Page 82: Statistical methods

Exponential distribution

[Plot: exponential density f(x) against x from 0 to 20]

Page 83: Statistical methods

Normal CDF Plot

[Plot: normal cumulative distribution function F(x) against X, for X from 1.83 to 5.16]

Page 84: Statistical methods

Exponential CDF Plot

[Plot: exponential F(X) against X, for the same X values]

Page 85: Statistical methods

Arrival time problem # 1

In a factory shop floor, for a certain CNC machine (machine marked # 1) the numbers of jobs arriving per unit time are given below:

# of Arrivals   Frequency
0               2
1               4
2               4
3               1
4               2
5               1
6               4
7               6
8               4
9               1

Page 86: Statistical methods

Arrival time problem # 1

[Chart: frequency distribution of arrivals; x-axis: # of Arrivals (0–9); y-axis: Frequency (0–7)]

Page 87: Statistical methods

Arrival time problem # 1

[Chart: relative frequency of number of arrivals; x-axis: # of Arrivals (0–9); y-axis: Relative Frequency (0–0.25)]

Page 88: Statistical methods

Arrival time problem # 1

[Chart: cumulative relative frequency of number of arrivals; x-axis: # of Arrivals (0–10); y-axis: Cumulative Relative Frequency (0–1)]

Page 89: Statistical methods

Arrival time problem # 1

1) The probability of the number of arrivals of jobs being more than 7 (i.e., 8 or 9) is about 0.17.

2) The average number of arrival of jobs is 5.

3) The probability of number of arrivals of jobs being less than or equal to 4 is about 0.45.

Page 90: Statistical methods

Covariance

Page 91: Statistical methods

SAT problem

This data set [ ] includes eight variables:
1) STATE: Name of state
2) COST: Current expenditure per pupil (measured in thousands of dollars per average daily attendance in public elementary and secondary schools)
3) RATIO: Average pupil/teacher ratio in public elementary and secondary schools during Fall 1994
4) SALARY: Estimated average annual salary of teachers in public elementary and secondary schools during 1994-95 (in thousands of dollars)
5) PERCENT: Percentage of all eligible students taking the SAT in 1994-95
6) VERBAL: Average verbal SAT score in 1994-95
7) MATH: Average math SAT score in 1994-95
8) TOTAL: Average total score on the SAT in 1994-95

Page 92: Statistical methods

SAT problem

Histogram for Cost

[Chart: histogram of COST by state]

Page 93: Statistical methods

SAT problem

Histograms for Cost and Ratio

[Chart: COST and RATIO by state, Alabama through Wisconsin]

Page 94: Statistical methods

SAT problem

Histogram of Cost, Ratio and Salary

[Chart: COST, RATIO and SALARY by state, Alabama through Wisconsin]

Page 95: Statistical methods

SAT problem

Average value is given by

Variance is given by

Covariance is given by

E(X) = \frac{1}{n}\sum_{i=1}^{n} X_i

V(X) = \frac{1}{n}\sum_{i=1}^{n} \left(X_i - E(X)\right)^2

Cov(X, Y) = E\left[(X - E(X))(Y - E(Y))\right] = \rho_{X,Y}\sqrt{V(X)\,V(Y)}
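A sketch of these estimators (the xs/ys pairs below are illustrative, not the SAT data):

```python
xs = [5.9, 6.1, 4.8, 7.2, 5.5]
ys = [960, 940, 990, 900, 970]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
var_x = sum((x - mx) ** 2 for x in xs) / n
var_y = sum((y - my) ** 2 for y in ys) / n
cov_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
rho = cov_xy / (var_x * var_y) ** 0.5   # correlation coefficient
print(cov_xy, rho)
```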

Page 96: Statistical methods

SAT problem

COST RATIO SALARY PERCENT VERBAL MATH TOTAL

Mean 5.90526 16.858 34.82892 35.24 457.14 508.78 965.92

Median 5.7675 16.6 33.2875 28 448 497.5 945.5

Maximum 9.774 24.3 50.045 81 516 592 1107

Minimum 3.656 13.8 25.994 4 401 443 844

Standard Deviation(1) (n−1 divisor) 1.362807 2.266355 5.941265 26.76242 35.17595 40.20473 74.82056

Standard Deviation(2) (n divisor) 1.34911 2.243577 5.881552 26.49344 34.82241 39.80065 74.06857

Page 97: Statistical methods

SAT problem

COST RATIO SALARY PERCENT VERBAL MATH TOTAL

COST 1.820097 -1.12303 6.901753 21.18202 -19.2638 -18.7619 -38.0258

RATIO -1.12303 5.033636 -0.01512 -12.6639 4.98188 8.52076 13.50264

SALARY 6.901753 -0.01512 34.59266 96.10822 -97.6868 -93.9432 -191.63

PERCENT 21.18202 -12.6639 96.10822 701.9024 -824.094 -916.727 -1740.82

VERBAL -19.2638 4.98188 -97.6868 -824.094 1212.6 1344.731 2557.331

MATH -18.7619 8.52076 -93.9432 -916.727 1344.731 1584.092 2928.822

TOTAL -38.0258 13.50264 -191.63 -1740.82 2557.331 2928.822 5486.154

Page 98: Statistical methods

SAT problem

COST RATIO SALARY PERCENT VERBAL MATH TOTAL

COST 1 -0.37103 0.869802 0.592627 -0.41005 -0.34941 -0.38054

RATIO -0.37103 1 -0.00115 -0.21305 0.063767 0.095422 0.081254

SALARY 0.869802 -0.00115 1 0.61678 -0.47696 -0.40131 -0.43988

PERCENT 0.592627 -0.21305 0.61678 1 -0.89326 -0.86938 -0.88712

VERBAL -0.41005 0.063767 -0.47696 -0.89326 1 0.970256 0.991503

MATH -0.34941 0.095422 -0.40131 -0.86938 0.970256 1 0.993502

TOTAL -0.38054 0.081254 -0.43988 -0.88712 0.991503 0.993502 1

Page 99: Statistical methods

Inference

1) Point estimation

2) Interval estimation

3) Hypothesis testing

Page 100: Statistical methods

Sampling

• Population: N
 – Population distribution
 – Parameter (θ)

• Sample: n
 – Sampling distribution
 – Statistic (tn)

Page 101: Statistical methods

Types of sampling

• Probability Sampling
 – Simple Random Sampling
 – Stratified Random Sampling
 – Cluster Sampling
 – Multistage Sampling
 – Systematic Sampling

• Judgement Sampling
 – Quota Sampling
 – Purposive Sampling

Page 102: Statistical methods

Simple Random Sampling

A simple random sample of size n from a finite population of size N is a sample selected such that each possible sample of size n has the same probability of being selected. This would be akin to SRSWOR (simple random sampling without replacement).

Page 103: Statistical methods

Simple Random Sampling

A simple random sample of size n from an infinite population is a sample selected such that the following conditions hold:

Each element selected comes from the population.

Each element is selected independently.

This would be akin to SRSWR (simple random sampling with replacement).

Page 104: Statistical methods

Some Special Distribution Used

in Inference

Page 105: Statistical methods

Chi-square distribution

Suppose Z1, Z2, ....., Zn are n independent observations from N(0, 1); then

Z_1^2 + Z_2^2 + Z_3^2 + \dots + Z_n^2 \sim \chi_n^2

Page 106: Statistical methods

Chi-square distribution

[X ~ \chi_n^2]

f(x) = \frac{1}{2^{n/2}\,\Gamma(n/2)}\, x^{(n/2)-1}\, e^{-x/2}, \quad 0 \le x < \infty

n is the parameter (degree of freedom), where n ∈ Z+

E[X] = n; V[X] = 2n

Page 107: Statistical methods

Chi-square distribution

Page 108: Statistical methods

t-distribution

Suppose Z ~ N(0, 1), Y ~ χ²n and they are independent; then

\frac{Z}{\sqrt{Y/n}} \sim t_n

Page 109: Statistical methods

t-distribution [X ~ tn]

f(x) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\;\Gamma\left(\frac{n}{2}\right)} \left[1 + \frac{x^2}{n}\right]^{-(n+1)/2}

• n is the parameter, where n ∈ Z+
• E[X] = 0 (n > 1)
• V[X] = n/(n – 2), (n > 2)

Page 110: Statistical methods

t-distribution

Page 111: Statistical methods

F-distribution

Suppose X ~ χ²n, Y ~ χ²m and they are independent; then

\frac{X/n}{Y/m} \sim F_{n,m}

Page 112: Statistical methods

F-distribution [X ~ Fn,m]

f(x) = \frac{\Gamma\left(\frac{n+m}{2}\right)}{\Gamma\left(\frac{n}{2}\right)\Gamma\left(\frac{m}{2}\right)} \left(\frac{n}{m}\right)^{n/2} x^{(n/2)-1} \left(1 + \frac{n}{m}\,x\right)^{-(n+m)/2}, \quad 0 < x < \infty

n, m are the parameters (degrees of freedom), where n, m ∈ Z+

E[X] = m/(m − 2), (m > 2)
V[X] = 2m²(n + m − 2)/[n(m − 2)²(m − 4)], (m > 4)

Page 113: Statistical methods

F-distribution

Page 114: Statistical methods

Some results

If X1, X2, ....., Xn are n observations from X ~ N(µX, σX²) and

\bar{X}_n = \frac{X_1 + X_2 + \dots + X_n}{n}

then

\frac{\sqrt{n}\,\left(\bar{X}_n - \mu_X\right)}{\sigma_X} \sim N(0, 1)

Page 115: Statistical methods

Some results

If

S_{n,X}^2 = \frac{1}{n-1}\sum_{j=1}^{n} \left(X_j - \bar{X}_n\right)^2

then

\frac{(n-1)\,S_{n,X}^2}{\sigma_X^2} \sim \chi_{n-1}^2

If

S_{n,\mu}^2 = \frac{1}{n}\sum_{j=1}^{n} \left(X_j - \mu_X\right)^2

then

\frac{n\,S_{n,\mu}^2}{\sigma_X^2} \sim \chi_n^2

Also

\frac{\sqrt{n}\,\left(\bar{X}_n - \mu_X\right)}{S_{n,X}} \sim t_{n-1}

Page 116: Statistical methods

Some results

If X1, X2, ....., X_{mX} are mX observations from X ~ N(µX, σX²), Y1, Y2, ....., Y_{mY} are mY observations from Y ~ N(µY, σY²), and moreover these samples are independent, then

\frac{\dfrac{(m_X-1)\,S_X^2}{\sigma_X^2}\Big/(m_X-1)}{\dfrac{(m_Y-1)\,S_Y^2}{\sigma_Y^2}\Big/(m_Y-1)} = \frac{S_X^2/\sigma_X^2}{S_Y^2/\sigma_Y^2} \sim F_{m_X-1,\,m_Y-1}

Page 117: Statistical methods

Estimation

Page 118: Statistical methods

Estimators and their properties

Estimator: Any statistic (a random function of the sample) which is used to estimate the population parameter.

Unbiasedness: E_θ(t_n) = θ

Consistency: P[|t_n − θ| < ε] → 1 as n → ∞, for every ε > 0

Page 119: Statistical methods

Estimators (Discrete distribution)

1) X ~ UD(a, b): \hat{a} = \min(X_1, ..., X_n) and \hat{b} = \max(X_1, ..., X_n)

2) X ~ P(λ): \hat{\lambda} = \bar{X}_n

3) X ~ B(n, p): \hat{p} = \frac{\#\text{ favouring}}{n}

Page 120: Statistical methods

Estimators (Continuous distribution)

1) X ~ N(µ, σ²): \hat{\mu} = \bar{X}_n

2) X ~ N(µ, σ²), µ known: \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \mu)^2

3) X ~ N(µ, σ²), µ unknown: \hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(X_i - \bar{X}_n\right)^2

4) X ~ E(θ): \hat{\theta} = \bar{X}_n

Page 121: Statistical methods

Examples (Estimation)

Number of jobs arriving in a unit time for a CNC machine: consider we choose from a discrete distribution whose population distribution [X ~ UD(20, 35)] is not known. We select values from the distribution and the numbers sampled are 22, 34, 33, 21, 29, 29, 30. Then the best estimate for a is â = min = 21, and the best estimate for b is b̂ = max = 34.

Page 122: Statistical methods

Examples (Estimation)

You are testing the components coming out of the shop floor and find that 9 out of 30 components fail. Then the estimated value of p (proportion of bad items in the population) = 9/30.

Page 123: Statistical methods

Examples (Estimation)

At a particular teller machine in one of the bank branches it was found that the numbers of customers arriving in a unit time span were 4, 6, 7, 4, 3, 5, 6 and 5. Then for this Poisson process the estimated value of λ is 5.

Page 124: Statistical methods

Examples (Estimation)

Suppose it is known that the survival time of a particular type of bulb has the exponential distribution. You test 10 such bulbs and find their respective survival times as 150, 225, 275, 300, 95, 155, 325, 75, 20 and 400 hours respectively. Then the estimated value of θ = 202.
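In Python this estimate is one line; the sample mean of the survival times is the estimate of θ:

```python
times = [150, 225, 275, 300, 95, 155, 325, 75, 20, 400]
theta_hat = sum(times) / len(times)  # estimate of theta
print(theta_hat)  # 202.0
```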

Page 125: Statistical methods

Prediction

Page 126: Statistical methods

Multiple Linear Regression

Given k independent variables X1, X2, ....., Xk and one dependent variable Y, we predict the value of Y (i.e., Ŷ or y) using the values of the Xi's. We need n (n ≥ k+1) data points, and the multiple linear regression (MLR) equation is as follows:

Yj = β1 x1,j + β2 x2,j + ..... + βk xk,j + εj, ∀ j = 1, 2, ....., n

Page 127: Statistical methods

Multiple Linear Regression

Note

There is no randomness in measuring the Xi.

The relationship is linear and not non-linear. By non-linear we mean that at least one derivative of Y wrt the βi's is a function of at least one of the parameters. By parameters we mean the βi's.

Page 128: Statistical methods

Multiple Linear Regression

Assumptions for the MLR:
• Xi, Y are normally distributed
• Xi are all non-stochastic
• εj ~ N(0, σ²I)
• rank(X) = k, n ≥ k
• No dependence between the Xj's, i.e., the matrix X has full column rank
• E(εj εl) = 0 ∀ j ≠ l, j, l = 1, 2, ....., n
• Cov(Xi, εj) = 0 ∀ i, j

Page 129: Statistical methods

Multiple Linear Regression

Find β1, β2, ....., βk using the concept of minimizing the sum of squares of errors. This is also known as the least squares method or the method of ordinary least squares. The estimates found, β̂1, β̂2, ....., β̂k, are the estimates of β1, β2, ....., βk respectively.

Utilize these estimates to find the forecasted value of Y (i.e., Ŷ or y) and compare it with the actual values of Y obtained in future.

Page 130: Statistical methods

To check for normality of data

We need to check for the normality of the Xi's and Y (a Python sketch of steps 3-5 follows this list):
1) List the observation number in column # 1; call it i.
2) List the data in column # 2.
3) Sort the data from the smallest to the largest and place in column # 3.
4) For each of the n observations, calculate the corresponding tail area of the standard normal distribution (Z) as A = (i – 0.375)/(n + 0.25). Put the values in column # 4.
5) Use the NORMSINV(A) function in MS-EXCEL to produce a column of normal scores. Put these values in column # 5.
6) Make a copy of the sorted data (be sure to use paste special and paste only the values) in column # 6.
7) Make a scatter plot of the data in columns # 5 and # 6.
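A sketch of steps 3-5 in Python rather than MS-EXCEL (NormalDist().inv_cdf plays the role of NORMSINV; the data values are illustrative):

```python
from statistics import NormalDist

data = [150, 225, 275, 300, 95, 155, 325, 75, 20, 400]
n = len(data)
sorted_data = sorted(data)  # step 3
# steps 4-5: tail areas A = (i - 0.375)/(n + 0.25) and their normal scores
scores = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25))
          for i in range(1, n + 1)]
# step 7: plot scores against sorted_data; a roughly straight line
# suggests the data are approximately normal
for s, x in zip(scores, sorted_data):
    print(round(s, 2), x)
```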

Page 131: Statistical methods

To check for normality of data

[Scatter plot: sorted data (0–450) against normal scores (−2.5 to 3.0); an approximately straight line indicates normality]

Page 132: Statistical methods

Basic transformation for MLR

Some transformations to convert to MLR:

X → X^{(p)} = \frac{X^p - 1}{p}; as p → 0, X^{(p)} → \log_e X

If the variability in Y increases with increasing values of Y, then we use

\log_e Y = \beta_1 \log_e X_1 + \beta_2 \log_e X_2 + \dots + \beta_k \log_e X_k + \varepsilon

Page 133: Statistical methods

Simple linear regression

In the simple linear regression we have

Y_j = \alpha + \beta X_j + \varepsilon_j, \quad \forall j = 1, 2, ....., n

The question is how we find α and β, provided we have n observations constituting the sample. We minimize the sum of squares of the errors wrt α and β:

\Delta = \sum_{j=1}^{n} \left(Y_j - \hat{\alpha} - \hat{\beta} X_j\right)^2

Finally, the estimates satisfy:

E(Y) = \hat{\alpha} + \hat{\beta}\,E(X)

\operatorname{cov}(X, Y) + E(X)E(Y) = \hat{\alpha}\,E(X) + \hat{\beta}\left[V(X) + (E(X))^2\right]

Page 134: Statistical methods

Simple linear regression

After we have found the estimators of α and β, we use these values to predict/forecast the subsequent future values of Y; i.e., we find

\hat{y}_k = \hat{\alpha} + \hat{\beta} X_k

and compare them with the corresponding values of Yk, for k = n+1, n+2, .......
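A minimal sketch of these closed-form estimates (illustrative data assumed):

```python
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
# beta-hat = cov(X, Y) / V(X); alpha-hat = E(Y) - beta-hat * E(X)
beta = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
     / sum((x - mx) ** 2 for x in xs)
alpha = my - beta * mx

print(alpha, beta)
print(alpha + beta * 6.0)  # y-hat forecast at a future X = 6
```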

Page 135: Statistical methods

Simple linear regression

[Plot: Y, Ŷ (Y-hat) and error against X, for X from −6.69 to −3.26]

Page 136: Statistical methods

Non-linear regression

• y = (β + γX)/(1 + αX)
• y = α(X − β)^γ
• y = α − β loge(X + γ)
• y = α[1 – exp(−βX)]^γ

NOTE: For all these and other models we minimize the sum of squares and find the parameters α, β and γ.

Page 137: Statistical methods

Simple Moving Average

3MA = 3 month Moving Average; 5MA = 5 month Moving Average

Month Actual 3MA 5MA

Jan 266.00

Feb 145.90 198.33

Mar 183.10 149.43 178.92

Apr 119.30 160.90 159.42

May 180.30 156.03 176.60

Jun 168.50 193.53 184.88

Jul 231.80 208.27 199.58

Aug 224.50 216.37 188.10

Sep 192.80 180.07 221.70

Oct 122.90 217.40 212.52

Nov 336.50 215.10 206.48

Dec 185.90 238.90 197.82

Jan 194.30 176.57 215.26

Feb 149.50 184.63 202.62

Mar 210.10 210.97 203.72

Apr 273.30 224.93 222.26

May 191.40 250.57 237.56

Jun 287.00 234.80 256.26

Month Actual 3MA 5MA

Jul 226.00 272.20 259.58

Aug 303.60 273.17 305.62

Sep 289.90 338.37 301.12

Oct 421.60 325.33 324.38

Nov 264.50 342.80 331.60

Dec 342.30 315.50 361.70

Jan 339.7 374.13 340.56

Feb 440.4 365.33 375.52

Mar 315.9 398.53 387.32

Apr 439.3 385.50 406.86

May 401.3 426.00 433.88

Jun 437.4 471.40 452.22

Jul 575.5 473.50 500.76

Aug 407.6 555.03 515.56

Sep 682 521.63 544.34

Oct 475.3 579.53 558.62

Nov 581.3 567.83

Dec 646.9

Page 138: Statistical methods

Simple Moving Averages: Actual, 3MA, 5MA

[Chart: Simple Moving Averages (yellow is actual; blue is 3MA; red is 5MA); x-axis: Month; y-axis: Value (0–800)]

Page 139: Statistical methods

Centred Moving Average

4MA(1) = 4 month Moving Average, 4MA(2) = 4 month Moving Average
2X4MA = average of 4MA(1) and 4MA(2)

Mth Actual 4MA(1) 4MA(2) 2X4MA

Jan 266.00

Feb 145.90

Mar 183.10 178.58 157.15 167.86

Apr 119.30 157.15 162.80 159.98

May 180.30 162.80 174.98 168.89

Jun 168.50 174.98 201.28 188.13

Jul 231.80 201.28 204.40 202.84

Aug 224.50 204.40 193.00 198.70

Sep 192.80 193.00 219.18 206.09

Oct 122.90 219.18 209.53 214.35

Nov 336.50 209.53 209.90 209.71

Dec 185.90 209.90 216.55 213.23

Jan 194.30 216.55 184.95 200.75

Feb 149.50 184.95 206.80 195.88

Mar 210.10 206.80 206.08 206.44

Apr 273.30 206.08 240.45 223.26

May 191.40 240.45 244.43 242.44

Jun 287.00 244.43 252.00 248.21

Mth Actual 4MA(1) 4MA(2) 2X4MA

Jul 226.00 252.00 276.63 264.31

Aug 303.60 276.63 310.28 293.45

Sep 289.90 310.28 319.90 315.09

Oct 421.60 319.90 329.58 324.74

Nov 264.50 329.58 342.03 335.80

Dec 342.30 342.03 346.73 344.38

Jan 339.7 346.73 359.58 353.15

Feb 440.4 359.58 383.83 371.70

Mar 315.9 383.83 399.23 391.53

Apr 439.3 399.23 398.48 398.85

May 401.3 398.48 463.38 430.93

Jun 437.4 463.38 455.45 459.41

Jul 575.5 455.45 525.63 490.54

Aug 407.6 525.63 535.10 530.36

Sep 682 535.10 536.55 535.83

Oct 475.3 536.55 596.38 566.46

Nov 581.3 596.38

Dec 646.9

Page 140: Statistical methods

Centred Moving Average: Actual, 4MA(1), 4MA(2), 2X4MA

[Chart: Centred Moving Average (yellow is actual; red is 4MA(1); blue is 4MA(2); green is 2X4MA); x-axis: Month; y-axis: Values (0–800)]

Page 141: Statistical methods

Centred Moving Average

Note:

4MA(1)=(Y1+Y2+Y3+Y4)/4

4MA(2)=(Y2+Y3+Y4+Y5)/4

2X4MA=(Y1+2*Y2+2*Y3+2*Y4+Y5)/8

Similarly we can have 2X6MA, 2X8MA, 2X12MA etc.
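A sketch of the 2X4MA recursion above; applied to the first six actual values of the series, it reproduces the Mar and Apr entries of the table:

```python
def two_by_four_ma(y):
    # 2X4MA_t = (Y[t-2] + 2*Y[t-1] + 2*Y[t] + 2*Y[t+1] + Y[t+2]) / 8
    return [(y[t-2] + 2*y[t-1] + 2*y[t] + 2*y[t+1] + y[t+2]) / 8
            for t in range(2, len(y) - 2)]

series = [266.00, 145.90, 183.10, 119.30, 180.30, 168.50]
print([round(v, 2) for v in two_by_four_ma(series)])  # [167.86, 159.98]
```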

Page 142: Statistical methods

Centred Moving Average

Note:

3MA(1)=(Y1+Y2+Y3)/3

3MA(2)=(Y2+Y3+Y4)/3

3MA(3)=(Y3+Y4+Y5)/3

3X3MA=(Y1+2*Y2+3*Y3+2*Y4+Y5)/9

Page 143: Statistical methods

Weighted Moving Averages

In general a weighted k-point moving average can be written as

Note: The total of the weights is equal to 1

Weights are symmetric, i.e., aj = a-j

T_t = \sum_{j=-m}^{m} a_j\, Y_{t+j}

Page 144: Statistical methods

Weighted Moving Averages

Steps are:
1) 4MA(1) = (Y1+Y2+Y3+Y4)/4

2) 4MA(2)=(Y2+Y3+Y4+Y5)/4

3) 4MA(3)=(Y3+Y4+Y5+Y6)/4

4) 4MA(4)=(Y4+Y5+Y6+Y7)/4

5) 4X4MA=(Y1+2*Y2+3*Y3+4*Y4+3*Y5+2*Y6+Y7)/16

6) 5X4X4MA = a-2*4X4MA(1) + a-1*4X4MA(2) + a0*4X4MA(3) + a1*4X4MA(4) + a2*4X4MA(5)

where a-2 = -3/4, a-1 = 3/4, a0 = 1, a1 = 3/4, a2 = -3/4

Page 145: Statistical methods

Exponential Smoothing Methods

1) Single Exponential Smoothing (one parameter, adaptive parameter)

2) Holt′s linear method (suitable for trends)

3) Holt-Winter′s method (suitable for trends and seasonality)

Page 146: Statistical methods

Single Exponential Smoothing

The general equation is:

Ft+1 = Ft + α(Yt – Ft) = αYt + (1 − α)Ft

Note: Error term: Et = Yt - Ft

Forecast value: Ft

Actual value: Yt

Weight: α ∈ (0,1); α is chosen such that the sum of squares of errors is minimized
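A minimal sketch of this recursion, initialized with F1 = Y1 as in the table that follows; with α = 0.1 it reproduces the F(t,0.1) column:

```python
def ses(y, alpha):
    # F[t+1] = alpha*Y[t] + (1 - alpha)*F[t], with F[0] = Y[0]
    f = [y[0]]
    for t in range(len(y) - 1):
        f.append(alpha * y[t] + (1 - alpha) * f[t])
    return f

y = [200.0, 135.0, 195.0, 197.5, 310.0, 175.0]
print([round(v, 2) for v in ses(y, 0.1)])
# [200.0, 200.0, 193.5, 193.65, 194.04, 205.63]
# matches the F(t,0.1) column (table values rounded to one decimal)
```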

Page 147: Statistical methods

Single Exponential Smoothing

Month  Y(t)    F(t,0.1)  F(t,0.5)  F(t,0.9)
Jan    200.0   200.0     200.0     200.0
Feb    135.0   200.0     200.0     200.0
Mar    195.0   193.5     167.5     141.5
Apr    197.5   193.7     181.3     189.7
May    310.0   194.0     189.4     196.7
Jun    175.0   205.6     249.7     298.7
Jul    155.0   202.6     212.3     187.4
Aug    130.0   197.8     183.7     158.2
Sep    220.0   191.0     156.8     132.8
Oct    277.5   193.9     188.4     211.3
Nov    235.0   202.3     233.0     270.9
Dec    ------- 205.6     234.0     238.6

Page 148: Statistical methods

Single Exponential Smoothing

[Chart: Y(t), F(t,0.1), F(t,0.5) and F(t,0.9) by month, Jan–Dec; y-axis: Values (0–350)]

Page 149: Statistical methods

Extension of Exponential Smoothing

The general equation is:

Ft+1 = α1Yt + α2Ft + α3Ft-1

Note: Error term: Et = Yt – Ft

Forecast value: Ft

Actual value: Yt

Weights: αi ∈ (0,1) ∀ i = 1, 2 and 3; α1 + α2 + α3 = 1; the αi's are such that the sum of squares of errors is minimized

Page 150: Statistical methods

Extension of Exponential Smoothing

[Chart: Y(t) and F(t) by month over several years; y-axis: Values (0–450)]

Page 151: Statistical methods

Adaptive Exponential Smoothing

The general equation is:Ft+1 = αtYt + (1 - αt)Ft

Note: Error term: Et = Yt – Ft

Forecast value: Ft

Actual value: Yt

Smoothed Error: At = βEt + (1 - β)At-1

Absolute Smoothed Error: Mt = β|Et| + (1 - β)Mt-1

Weight: αt+1 = |At/Mt|; α and β are such that the sum of squares of errors is minimized

Page 152: Statistical methods

Adaptive Exponential Smoothing

Starting values:

F2 = Y1

α2 = β = 0.2

A1 = M1 = 0
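A sketch of this adaptive recursion with the stated starting values. It follows the update equations literally, applying the new αt+1 at the next step, so later values need not coincide exactly with the worked table that follows:

```python
def adaptive_es(y, beta=0.2):
    f = [None, y[0]]      # F2 = Y1; no forecast exists for the first month
    alpha, a, m = 0.2, 0.0, 0.0
    for t in range(1, len(y)):
        e = y[t] - f[t]                       # error E(t) = Y(t) - F(t)
        a = beta * e + (1 - beta) * a         # smoothed error A(t)
        m = beta * abs(e) + (1 - beta) * m    # absolute smoothed error M(t)
        f.append(alpha * y[t] + (1 - alpha) * f[t])  # F(t+1)
        alpha = abs(a / m) if m else 0.2      # weight for the next period
    return f

y = [200.0, 135.0, 195.0, 197.5, 310.0]
print(adaptive_es(y))  # F3 = 187.0, as in the table
```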

Page 153: Statistical methods

Adaptive Exponential Smoothing

Month  Y(t)    F(t)    E(t)    A(t)    M(t)    α      β
Jan    200.0                   0.0     0.0            0.2
Feb    135.0   200.0   -65.0   -13.0   13.0    0.2
Mar    195.0   187.0     8.0    -8.8   12.0    1.0
Apr    197.5   188.6     8.9    -5.3   11.4    0.7
May    310.0   190.4   119.6    19.7   33.0    0.5
Jun    175.0   214.3   -39.3     7.9   34.3    0.6
Jul    155.0   206.4   -51.4    -4.0   37.7    0.2
Aug    130.0   196.2   -66.2   -16.4   43.4    0.1
Sep    220.0   182.9    37.1    -5.7   42.1    0.4
Oct    277.5   190.3    87.2    12.9   51.1    0.1
Nov    235.0   207.8    27.2    15.7   46.4    0.3
Dec            213.2                           0.3

Page 154: Statistical methods

Adaptive Exponential Smoothing

[Chart: Y(t) and F(t) by month, Jan–Dec; y-axis: Values (0–350)]

Page 155: Statistical methods

Adaptive Exponential Smoothing

[Chart: A(t), M(t), E(t), F(t) and Y(t) by month, Jan–Dec; y-axis: Values (−100 to 350)]

Page 156: Statistical methods

Holt-Winter's Method

The general equations are:
1) Lt = α Yt/St−s + (1 − α)(Lt−1 + bt−1)
2) bt = β(Lt − Lt−1) + (1 − β) bt−1
3) St = γ Yt/Lt + (1 − γ) St−s for t > s
4) Ft+m = (Lt + bt m) St−s+m
5) Si = Yi/Ls, where Ls = (Y1 + ... + Ys)/s, for i ≤ s

Note:
Forecast value: Ft
Actual value: Yt
Level: Lt
Trend: bt
Seasonal component: St
Length of seasonality: s
α, β and γ are chosen such that the sum of squares of errors is minimized
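A minimal sketch of this multiplicative recursion, assuming at least one full season of data; the initial trend b = 0 is a simple choice, and forecasts here are limited to m ≤ s:

```python
def holt_winters(y, s, alpha, beta, gamma, horizon=1):
    L = sum(y[:s]) / s                   # Ls = (Y1 + ... + Ys)/s
    b = 0.0                              # simple initial trend
    S = [y[i] / L for i in range(s)]     # Si = Yi/Ls for i <= s
    for t in range(s, len(y)):
        L_prev = L
        L = alpha * y[t] / S[t - s] + (1 - alpha) * (L + b)   # level
        b = beta * (L - L_prev) + (1 - beta) * b              # trend
        S.append(gamma * y[t] / L + (1 - gamma) * S[t - s])   # seasonal
    # F(t+m) = (L + b*m) * S(t - s + m)
    T = len(y) - 1
    return [(L + b * m) * S[T - s + m] for m in range(1, horizon + 1)]

quarterly = [100, 120, 140, 110, 105, 126, 147, 115]  # illustrative, s = 4
print(holt_winters(quarterly, s=4, alpha=0.3, beta=0.1, gamma=0.2))
```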