© 2003 All Rights Reserved, Robi Polikar, Rowan University, Dept. of Electrical and Computer Engineering Lecture 8 Engineering Statistics Part II: Estimation EC 2 Polikar

Post on 20-Dec-2015


Lecture 8

Engineering Statistics
Part II: Estimation

EC2Polikar


Review of Basic Concepts of Statistics

Statistics is used to make generalized decisions about a population by analyzing only a small sample drawn from that population.

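Parameter vs. statistic

Important statistical quantities: mean, median, mode, standard deviation, variance

Population mean: $\mu = \frac{x_1 + x_2 + \cdots + x_M}{M} = \frac{1}{M}\sum_{i=1}^{M} x_i$

Sample mean: $\bar{x} = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{1}{N}\sum_{i=1}^{N} x_i$

Sample standard deviation: $s = \left[\frac{\sum_{i=1}^{N}(x_i-\bar{x})^2}{N}\right]^{1/2}$ (large sample), $s = \left[\frac{\sum_{i=1}^{N}(x_i-\bar{x})^2}{N-1}\right]^{1/2}$ (small sample)

Sample variance: $s^2 = \frac{\sum_{i=1}^{N}(x_i-\bar{x})^2}{N-1}$

Population variance: $\sigma^2 = \frac{\sum_{i=1}^{M}(x_i-\mu)^2}{M}$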
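The large-sample and small-sample forms of the standard deviation differ only in the divisor (N versus N − 1). A minimal sketch of that difference follows; the function names and the data values are hypothetical and chosen only for illustration.

```python
import math

def sample_mean(values):
    """Arithmetic mean: sum of the observations divided by their count."""
    return sum(values) / len(values)

def sample_std_dev(values, small_sample=True):
    """Sample standard deviation.

    small_sample=True divides by N - 1; small_sample=False divides by N,
    matching the two forms on the slide.
    """
    n = len(values)
    xbar = sample_mean(values)
    sum_sq = sum((x - xbar) ** 2 for x in values)
    divisor = n - 1 if small_sample else n
    return math.sqrt(sum_sq / divisor)

# Hypothetical measurements, not taken from the lecture.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_mean(data))                          # 5.0
print(sample_std_dev(data, small_sample=False))   # 2.0
print(round(sample_std_dev(data), 3))             # 2.138
```

For large N the two divisors give nearly identical results, which is why the distinction matters mainly for small samples.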


Statistical Distributions

Normal (Gaussian) Distribution Function

$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$

[Figure: distribution function (normalized) vs. distribution variable, x. The value of x at the peak is the mean, μ; the inflection point marks the standard deviation, σ. The areas within μ ± σ, μ ± 2σ, and μ ± 3σ are 68.2%, 95.4%, and 99.7% of the total.]


The Gaussian Curve

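Area from $\mu-\sigma \le x \le \mu+\sigma$: 68.2% of the total area ($x_1=\mu-\sigma$; $x_2=\mu+\sigma$)

Area from $\mu-2\sigma \le x \le \mu+2\sigma$: 95.4% of the total area ($x_1=\mu-2\sigma$; $x_2=\mu+2\sigma$)

Area from $\mu-3\sigma \le x \le \mu+3\sigma$: 99.7% of the total area ($x_1=\mu-3\sigma$; $x_2=\mu+3\sigma$)

Distribution function: $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$

Area under the curve: $A = \int_{x_1}^{x_2} f(x)\,dx$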

The analytical computation of the area under the Gaussian curve is difficult. Therefore, standardized tables generated for this particular purpose are used. The standardization assumes a mean of zero and variance of 1.
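Where a table is not at hand, the same areas can be evaluated numerically. The sketch below is one possible approach, assuming the standard identity that expresses the normal cumulative distribution through the error function; the function name `gaussian_area` is hypothetical.

```python
import math

def gaussian_area(x1, x2, mu=0.0, sigma=1.0):
    """Area under the normal curve between x1 and x2.

    Uses CDF(x) = 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2)))).
    """
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    return cdf(x2) - cdf(x1)

# Areas within one, two, and three standard deviations of the mean.
for k in (1, 2, 3):
    print(f"within {k} sigma: {gaussian_area(-k, k):.4f}")
# within 1 sigma: 0.6827
# within 2 sigma: 0.9545
# within 3 sigma: 0.9973
```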


Using Gaussian Tables

Normalization to use standard tables: $z = \frac{x-\mu}{\sigma}$

Area under the curve on each side of zero is 0.5. The curve is symmetric, so the total area is 1.

Example: if z = 0.82
Area under the curve for [0, 0.82]: 0.294
Total area for (−∞, 0.82] = 0.5 + 0.294 = 0.794
This value is the probability that z < 0.82.
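The table lookup above can be mimicked in a few lines. This is only a sketch, assuming Python 3.8 or later for `statistics.NormalDist`; the helper names `standardize` and `table_area` are hypothetical.

```python
from statistics import NormalDist

def standardize(x, mu, sigma):
    """Convert x to the standard normal variable z = (x - mu) / sigma."""
    return (x - mu) / sigma

def table_area(z):
    """Area under the standard normal curve between 0 and z,
    which is the quantity a one-sided Gaussian table lists."""
    return NormalDist().cdf(z) - 0.5

area = table_area(0.82)
print(round(area, 3))        # 0.294
print(round(0.5 + area, 3))  # 0.794, the probability that z < 0.82
```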


Example

The chip manufacturing company Lentil® produces its much anticipated chip Pantsium© XIX, running at 66.666 THz. The rival company DAM© manufactures its chip Craplon© 66++, also running at 66.666 THz. DAM claims that Lentil’s chip is flawed and cannot run any faster than 63 THz. Lentil, which manufactures 100,000 chips every day, decides to test its chips. They take a sample of 1% (1000 chips) and find that the mean speed of these chips is 65.980 THz, with a std. dev. of 1.2 THz. Assuming that the chip speed is normally distributed, is Lentil’s speed claim justifiable?

Assume that the claim is justifiable if 95% of the chips fall within the speed range of 65 to 67 THz.

Now assume that the claim is justifiable if 90% of the chips run faster than 65.0 THz.

First criterion:

$z = \frac{x-\mu}{\sigma} = \frac{65 - 65.980}{1.2} = -0.82$

$z = \frac{x-\mu}{\sigma} = \frac{67 - 65.980}{1.2} = 0.85$

The table areas are 0.294 for [−0.82, 0] and 0.302 for [0, +0.85]. The probability that a Lentil chip has a speed in the [65, 67] THz range is 0.294 + 0.302 = 0.596. Thus only 59.6% of the chips satisfy the criterion.

Second criterion:

$z = \frac{x-\mu}{\sigma} = \frac{65.0 - 65.980}{1.2} = -0.82$

The probability that a Lentil chip has a speed larger than 65 THz is 0.294 + 0.5 = 0.794. That means roughly 80% of the chips satisfy the criterion. In any case, however, Lentil does better than DAM’s claim of 63 THz. What % of Lentil chips run over 63 THz? (Ans. 99.3%)
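The three probabilities in this example can be cross-checked numerically. The sketch below assumes Python 3.8 or later for `statistics.NormalDist` and uses unrounded z values, so the results differ from the table-based figures in the third decimal place; the variable names are hypothetical.

```python
from statistics import NormalDist

# Chip speed model from the example: mean 65.980 THz, std. dev. 1.2 THz.
chips = NormalDist(mu=65.980, sigma=1.2)

p_between_65_67 = chips.cdf(67.0) - chips.cdf(65.0)  # first criterion
p_above_65 = 1.0 - chips.cdf(65.0)                   # second criterion
p_above_63 = 1.0 - chips.cdf(63.0)                   # DAM's claimed limit

print(f"P(65 <= x <= 67) = {p_between_65_67:.3f}")
print(f"P(x > 65)        = {p_above_65:.3f}")
print(f"P(x > 63)        = {p_above_63:.3f}")
```

Neither criterion is met: the first requires 0.95 and the second 0.90, and both computed probabilities fall short of those thresholds.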


Estimation Theory & Confidence Intervals

Point estimate vs. interval estimate
Bulb wattage: 60 W vs. 60 ± 5 W, i.e. 55 W ~ 65 W
Part length: 5.28 cm vs. 5.28 ± 0.03 cm, i.e. 5.25 ~ 5.31 cm
Flight time: 11 h vs. 11 h ± 15 min, i.e. 10 h 45 min ~ 11 h 15 min
Scientific polls: 59% will vote for XYZ (margin of error 4%)

How confident can we be about such interval estimates?
Are we 75% sure? 90% sure? … 95% sure? What does it mean to be 95% sure?

Confidence level: The percentage of confidence
Confidence interval: The interval in which we have a certain confidence that a value lies.


Confidence Intervals

Recall: For a normal distribution, a value lies within one, two, or three standard deviations of the mean 68.27%, 95.45%, and 99.73% of the time, respectively.

Example: Let’s assume that the average height at Rowan is 176 inches, with a standard deviation of 5 inches.
68.27% of Rowan students are 176 ± 5 in, i.e. 171 ~ 181 in
95.45% of Rowan students are 176 ± 2×5 in, i.e. 166 ~ 186 in
99.73% of Rowan students are 176 ± 3×5 in, i.e. 161 ~ 191 in

Thus, we are 95.45% sure that the height of a Rowan student is 166 ~ 186 in. Note that these numbers are true for variables that are normally distributed. In most practical scenarios, a statistic computed from a sample of size greater than 30 is approximately normally distributed.


How to Compute Confidence Intervals

If the statistic is the sample mean, then the confidence limits (end points of the interval) are given by

$\bar{x} \pm z_c \frac{\sigma}{\sqrt{N}}$   (1)

$\bar{x} \pm z_c \frac{\sigma}{\sqrt{N}} \sqrt{\frac{M-N}{M-1}}$   (2)

$\bar{x}$: sample mean
$z_c$: critical value obtained from normal distribution tables, based on the desired confidence
$\sigma$: population* std. dev.
$N$: sample size

Confidence Level (%)   99.73   99     98     96     95.45   95     90      80     68.27   50
Critical Value z_c     3.00    2.58   2.33   2.05   2.00    1.96   1.645   1.28   1.00    0.675

Use Eq. (2) for finite populations of size M, and use Eq. (1) for infinite (very large) populations.

* Since the population std. dev. is usually unknown, it is estimated by the sample std. dev.


How To…

Ex: 98% confidence means that the value we estimate must lie within the specified limits 98% of the time. Thus the area under the curve, taken over both sides of the mean, must be 0.98. Since the curve is symmetric, this is 0.49 on each side of the mean. The zc value corresponding to 0.49 is 2.33. For 93% confidence, zc = 1.81.
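The same lookup can be done without a table by inverting the standard normal cumulative distribution. This is a sketch, assuming Python 3.8 or later for `statistics.NormalDist.inv_cdf`; the function name `critical_value` is hypothetical.

```python
from statistics import NormalDist

def critical_value(confidence):
    """Critical value z_c for a two-sided confidence level given as a fraction.

    Half of the confidence area lies on each side of the mean, so z_c is the
    point where the cumulative probability equals 0.5 + confidence / 2.
    """
    return NormalDist().inv_cdf(0.5 + confidence / 2.0)

for level in (0.98, 0.95, 0.93, 0.90):
    print(f"{level:.0%}: z_c = {critical_value(level):.3f}")
```

The values agree with the tabulated ones up to the rounding used in the table.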


Example

Measurements of the diameters of a random sample of 200 ball bearings made by a certain machine have a mean of 0.824 in and a std. dev. of 0.042 in. What are the 95% and 99% confidence limits for the mean diameter of the ball bearings?

95% confidence limits: Half the area under the curve = 0.475, so zc = 1.96. Confidence limits are therefore $\bar{x} \pm z_c\,\sigma/\sqrt{N} = 0.824 \pm 1.96 \times 0.042/\sqrt{200} = 0.824 \pm 0.0058$ in.

99% confidence limits: Half the area under the curve = 0.495, so zc = 2.58. Confidence limits are therefore $\bar{x} \pm z_c\,\sigma/\sqrt{N} = 0.824 \pm 2.58 \times 0.042/\sqrt{200} = 0.824 \pm 0.0077$ in.

Note 1: We use the sample std. dev. as an estimate of the population std. dev.

Note 2: The 95% confidence interval, of width 0.0116 in, is narrower than the 99% confidence interval, of width 0.0154 in. This makes sense, because the interval that must contain the true value becomes wider as we demand a higher confidence.
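The ball bearing calculation can be reproduced with a small helper. This is a sketch under the same assumption as Note 1 (the sample std. dev. stands in for the population std. dev.); the function name and the optional `population_size` argument, which applies the finite-population factor of Eq. (2), are hypothetical.

```python
import math
from statistics import NormalDist

def mean_confidence_limits(xbar, s, n, confidence, population_size=None):
    """Confidence limits for the mean: xbar +/- z_c * s / sqrt(n).

    If population_size (M) is given, the half-width is multiplied by
    sqrt((M - n) / (M - 1)), the finite-population factor.
    """
    zc = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = zc * s / math.sqrt(n)
    if population_size is not None:
        half_width *= math.sqrt((population_size - n) / (population_size - 1))
    return xbar - half_width, xbar + half_width

for level in (0.95, 0.99):
    low, high = mean_confidence_limits(0.824, 0.042, 200, level)
    print(f"{level:.0%}: {low:.4f} ~ {high:.4f} in")
```

Because the critical value is computed rather than read from a rounded table, the half-widths match the hand calculation to about four decimal places.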


For Populations

If the statistic to be estimated is a proportion of “successes”, then the confidence limits for p, the proportion of successes (the probability of success), are

$P \pm z_c \sqrt{\frac{p(1-p)}{N}}$   for infinite (very large) populations

$P \pm z_c \sqrt{\frac{p(1-p)}{N}} \sqrt{\frac{M-N}{M-1}}$   for a finite population of size M (with sample size N > 30)

P is the sample probability of success, and p is the population probability of success. We will use the sample estimate P in place of the population value p in our calculations.


Example

In an exit poll, a news network asks 300 people (from a state of 9M) for whom they voted, and 55% say they voted for XYZ. Can the network declare candidate XYZ the winner with 95% confidence? For 95% confidence, the confidence interval is

$P \pm z_c \sqrt{\frac{P(1-P)}{N}} = 0.55 \pm 1.96 \sqrt{\frac{0.55 \times 0.45}{300}} = 0.55 \pm 0.056$, i.e. 0.494 ~ 0.606

This means that the network can, at best, be 95% confident that the actual vote the candidate received is between 0.494 and 0.606. In other words, if 55% of 300 people said they voted for XYZ, then we can be 95% sure that the actual vote the candidate received lies between 49.4% and 60.6%. Since at least 50% is required to win the election, and the interval extends below 50%, the network cannot claim XYZ as the winner.
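The exit poll interval can be checked with a short calculation. This is a sketch, using the sample proportion P in place of p as the lecture does; the function name is hypothetical, and the optional `population_size` argument applies the finite-population factor, which is negligible for a state of 9M.

```python
import math
from statistics import NormalDist

def proportion_confidence_limits(p_hat, n, confidence, population_size=None):
    """Confidence limits for a proportion: P +/- z_c * sqrt(P * (1 - P) / n)."""
    zc = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = zc * math.sqrt(p_hat * (1.0 - p_hat) / n)
    if population_size is not None:
        half_width *= math.sqrt((population_size - n) / (population_size - 1))
    return p_hat - half_width, p_hat + half_width

low, high = proportion_confidence_limits(0.55, 300, 0.95)
print(f"{low:.3f} ~ {high:.3f}")   # 0.494 ~ 0.606
print("can call the race:", low > 0.5)
```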

The natural question to ask is then: how many people do they need to ask so that they can claim XYZ’s success with 95% confidence? Assuming again that 55% of N people (N is now unknown) said they voted for XYZ, and considering that XYZ needs at least 50% of the votes, the lower confidence limit must stay above 0.5:

$P - 1.96 \sqrt{\frac{p(1-p)}{N}} > 0.5$

$0.55 - 1.96 \sqrt{\frac{0.55 \times 0.45}{N}} > 0.5 \;\Rightarrow\; N > 380$

Thus if 55% of 380 people say they voted for XYZ, then the confidence interval will be

$0.55 \pm 1.96 \sqrt{\frac{p(1-p)}{380}} = 0.55 \pm 1.96 \sqrt{\frac{0.55 \times 0.45}{380}} = 0.55 \pm 0.05$, i.e. 0.50 ~ 0.60
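Solving the inequality for N gives N > (z_c / (P − 0.5))² · P(1 − P). A minimal sketch of that rearrangement follows; the function name `min_sample_size` is hypothetical, and the sketch keeps the table value z_c = 1.96 used in the example.

```python
import math

def min_sample_size(p_hat, threshold, zc=1.96):
    """Smallest integer N for which p_hat - zc * sqrt(p_hat * (1 - p_hat) / N)
    is strictly greater than threshold. Requires p_hat > threshold."""
    margin = p_hat - threshold
    bound = (zc / margin) ** 2 * p_hat * (1.0 - p_hat)
    return math.floor(bound) + 1

n = min_sample_size(0.55, 0.5)
print(n)  # 381, the first integer above the bound of about 380.3
```

At N = 380 the lower limit sits almost exactly at 0.50, which is why the strict inequality pushes the minimum to the next integer.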