Virtual University of Pakistan Lecture No. 5 Statistics and Probability by Miss Saleha Naghmi Habibullah

Post on 17-Jan-2016

Virtual University of Pakistan

Lecture No. 5 Statistics and Probability

by

Miss Saleha Naghmi Habibullah

IN THE LAST LECTURE, YOU LEARNT:

•Frequency distribution of a continuous variable

•Relative frequency distribution and percentage frequency distribution

•Histogram

•Frequency Polygon

•Frequency Curve

Today’s lecture continues from the last one. We will begin with the various types of frequency curves that are encountered in practice.

Also, we will discuss the cumulative frequency distribution and cumulative frequency polygon for a continuous variable.


FREQUENCY POLYGON

A frequency polygon is obtained by plotting the class frequencies against the mid-points of the classes, and connecting the points so obtained by straight line segments.

In our example of the EPA mileage ratings, the classes were:

Class Boundaries    Mid-Point (X)    Frequency (f)
26.95 – 29.95       28.45             0
29.95 – 32.95       31.45             2
32.95 – 35.95       34.45             4
35.95 – 38.95       37.45            14
38.95 – 41.95       40.45             8
41.95 – 44.95       43.45             2
44.95 – 47.95       46.45             0

[Figure: frequency polygon of the EPA mileage ratings. X-axis: miles per gallon (class mid-points 28.45 to 46.45); Y-axis: number of cars (0 to 16).]
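The construction of the frequency polygon can be sketched in a few lines of Python. This is only an illustration: the variable names are ours, and the zero frequencies of the two end classes are assumed from the fact that the five middle classes already account for all 30 cars.

```python
# Class boundaries and class frequencies of the EPA mileage example.
# The two end classes are taken to have zero frequency (an assumption
# consistent with the total of 30 cars in the five middle classes).
boundaries = [26.95, 29.95, 32.95, 35.95, 38.95, 41.95, 44.95, 47.95]
frequencies = [0, 2, 4, 14, 8, 2, 0]

# Mid-point of each class = (lower boundary + upper boundary) / 2.
midpoints = [round((lo + hi) / 2, 2)
             for lo, hi in zip(boundaries, boundaries[1:])]

# The vertices of the frequency polygon are the (mid-point, frequency) pairs;
# joining consecutive vertices by straight line segments gives the polygon.
polygon_points = list(zip(midpoints, frequencies))

for x, f in polygon_points:
    print(f"{x:6.2f}  {f:2d}  {'*' * f}")
```

The row of asterisks is a rough text substitute for the plotted height of each vertex.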


Also, it was mentioned that, when the frequency polygon is smoothed, we obtain what may be called the FREQUENCY CURVE.

[Figure: frequency polygon of the EPA mileage ratings with the smoothed frequency curve drawn as a dotted line. X-axis: miles per gallon; Y-axis: number of cars (0 to 16).]

In the above figure, the dotted line represents the frequency curve. It should be noted that the frequency curve need not touch all the plotted points.

The purpose of the frequency curve is simply to display the overall pattern of the distribution.

Hence we draw the curve by the free-hand method, and it does not have to touch all the plotted points.

It should be realized that the frequency curve is actually a theoretical concept.

If the class interval of a histogram is made very small, and the number of classes is very large, the rectangles of the histogram will be narrow as shown below:

[Figure: histogram with a large number of very narrow rectangles.]

The smaller the class interval and the larger the number of classes, the narrower the rectangles will be. In this way, the histogram approaches a smooth curve as shown below:

[Figure: histogram with very narrow rectangles approaching a smooth frequency curve.]

VARIOUS TYPES OF FREQUENCY CURVES

•the symmetrical frequency curve

•the moderately skewed frequency curve

•the extremely skewed frequency curve

•the U-shaped frequency curve

THE SYMMETRIC CURVE

[Figure: symmetric frequency curve. Horizontal axis: X; vertical axis: f.]

If we place a vertical mirror in the centre of this graph, the left-hand side will be the mirror image of the right-hand side.

THE POSITIVELY SKEWED CURVE

[Figure: positively skewed frequency curve. Horizontal axis: X; vertical axis: f.]

The positively skewed frequency curve is the one for which the right tail is longer than the left tail.

THE NEGATIVELY SKEWED CURVE

[Figure: negatively skewed frequency curve. Horizontal axis: X; vertical axis: f.]

On the other hand, the negatively skewed frequency curve is the one for which the left tail is longer than the right tail.

THE EXTREMELY NEGATIVELY SKEWED (J-SHAPED) CURVE

[Figure: J-shaped frequency curve. Horizontal axis: X; vertical axis: f.]

This is the case when the maximum frequency occurs at the end of the frequency table.

For example, if we think of the death rates of adult males of various age groups, starting from age 20 and going up to age 79 years, we might obtain something like this:

Age Group    No. of deaths per thousand
20 – 29       2.1
30 – 39       4.3
40 – 49       5.7
50 – 59       8.9
60 – 69      12.4
70 – 79      16.7
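As a small illustration, the defining feature of such data (the largest value sits in the last class) can be checked directly. The dictionary below simply restates the table; the variable names are hypothetical.

```python
# Death rates per thousand for adult males, by age group (from the table).
death_rates = {
    "20-29": 2.1, "30-39": 4.3, "40-49": 5.7,
    "50-59": 8.9, "60-69": 12.4, "70-79": 16.7,
}
rates = list(death_rates.values())

# The maximum occurs in the last class of the table ...
peak_at_end = rates.index(max(rates)) == len(rates) - 1
# ... and the values rise steadily from one class to the next.
steadily_rising = all(a < b for a, b in zip(rates, rates[1:]))

print(peak_at_end, steadily_rising)
```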

This will result in a J-shaped distribution similar to the one shown above.

THE EXTREMELY POSITIVELY SKEWED (REVERSE J-SHAPED) CURVE

[Figure: reverse J-shaped frequency curve. Horizontal axis: X; vertical axis: f.]

Similarly, the extremely positively skewed distribution is known as the REVERSE J-shaped distribution.

Example

The following are the numbers of 6’s obtained in 50 rolls of 4 dice:

0 0 1 0 0 0 2 0 0 1
0 0 0 0 1 1 0 1 2 0
0 1 0 0 0 1 1 0 1 0
0 1 2 1 0 0 3 1 1 0
0 0 0 1 2 1 0 0 1 1

Construct a frequency distribution and line chart, and discuss the overall shape of the distribution.

Solution

Applying the tally method, we obtain the following frequency distribution:

Frequency distribution

No. of 6’s    Tally                                 Frequency
0             ||||| ||||| ||||| ||||| ||||| |||     28
1             ||||| ||||| ||||| ||                  17
2             ||||                                   4
3             |                                      1
Total                                               50
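The tally can also be done mechanically. The following is a minimal sketch using Python's standard `collections.Counter`; the digit string is the data exactly as listed in the example, and the names `rolls` and `tally` are ours.

```python
from collections import Counter

# Number of 6's in each roll of 4 dice, exactly as listed in the example.
rolls = "00100020010000110120010001101001210031100001210011"

# Count how many times each value of X occurs (the tally method).
tally = Counter(int(ch) for ch in rolls)
total = sum(tally.values())

print("X   f")
for x in sorted(tally):
    print(f"{x}  {tally[x]:2d}")
print(f"Total {total}")
```

Counting the listed values in this way gives 50 observations in all.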

Line Chart

[Figure: line chart of the number of 6’s. X-axis: X = 0, 1, 2, 3; Y-axis: f (0 to 30).]

This is an extremely positively skewed distribution, which may also be regarded as a reverse J-shaped distribution.

In this example, since X is a discrete variable, we should not actually draw a continuous curve in this diagram. We have done so here only to indicate the overall shape of the distribution.

Does the above frequency distribution indicate that the dice that were rolled were unfair?
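One way to approach this question, offered only as a hedged sketch, is to compare the observed frequencies with what fair dice would be expected to give. The comparison below assumes that each die shows a 6 with probability 1/6 independently of the others, so that the number of 6's in 4 dice follows a binomial model. That model is our assumption here and has not been developed in this lecture.

```python
from math import comb

n_dice = 4        # dice thrown in each roll
p_six = 1 / 6     # chance of a 6 on one fair die (assumption: fair dice)
n_rolls = 50      # number of rolls actually listed in the data

# Observed frequencies from the tally (no roll produced four 6's).
observed = {0: 28, 1: 17, 2: 4, 3: 1, 4: 0}

# Expected frequency of k sixes = n_rolls * C(4, k) * p^k * (1 - p)^(4 - k).
expected = {
    k: n_rolls * comb(n_dice, k) * p_six**k * (1 - p_six)**(n_dice - k)
    for k in range(n_dice + 1)
}

for k in range(n_dice + 1):
    print(f"X = {k}: observed {observed[k]:2d}, expected {expected[k]:5.2f}")
```

Under these assumptions the expected frequencies also fall away rapidly as X increases, so a reverse J-shape is what fair dice would tend to produce and is not, by itself, evidence of unfairness.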

THE U-SHAPED CURVE

[Figure: U-shaped frequency curve. Horizontal axis: X; vertical axis: f.]

A relatively LESS frequently encountered frequency distribution is the U-shaped distribution.

If we consider the example of the death rates not only for the adult population but for the population of ALL the age groups, we will obtain the U-shaped distribution.

Out of all these curves, the MOST frequently encountered frequency distribution is the moderately skewed frequency distribution. There are thousands of natural and social phenomena which yield the moderately skewed frequency distribution.


Another rather less frequently encountered distribution is the uniform distribution.


Example

Suppose that a fair die is rolled 120 times and the following frequency distribution is obtained:

Frequency distribution

No. of dots on the upper-most face (X)    Frequency (f)
1                                          19
2                                          22
3                                          20
4                                          21
5                                          19
6                                          19
Total                                     120

Line chart

[Figure: line chart of the die-rolling frequencies. X-axis: X = 1, 2, 3, 4, 5, 6; Y-axis: f (0 to 30).]

The point to be noted is that, since the die was absolutely fair, every side of the die had an equal chance of coming up on top.

As such, out of 120 tosses, we could have expected to obtain X = 1 twenty times, X = 2 twenty times, X = 3 twenty times, and so on.
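The expected frequency of 20 per face, and how far the observed frequencies stray from it, can be worked out as follows. This is a minimal sketch with variable names of our own choosing.

```python
# Observed frequencies for each face of the die (from the table).
observed = {1: 19, 2: 22, 3: 20, 4: 21, 5: 19, 6: 19}

n_rolls = sum(observed.values())        # total number of rolls
expected = n_rolls / len(observed)      # equal share for each of the 6 faces

# Deviation of each observed frequency from the expected frequency.
deviations = {face: f - expected for face, f in observed.items()}

print(f"Rolls: {n_rolls}, expected per face: {expected:.0f}")
for face, d in deviations.items():
    print(f"X = {face}: observed {observed[face]}, deviation {d:+.0f}")
```

No face departs from the expected frequency by more than 2, which is why the line chart looks nearly flat.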


Whenever we are dealing with “an equally likely” situation of the type described in this example, we encounter the uniform distribution.

Suppose that we walk into a school and collect data on the weights, heights, marks, shoulder-lengths, finger-lengths or any other such variable pertaining to the children of any one class. If we construct a frequency distribution of this data, and draw its histogram and its frequency curve, we will find that our data generate a moderately skewed distribution.

Until now, we have discussed the various possible shapes of the frequency distribution of a continuous variable.

Similar shapes are possible for the frequency distribution of a discrete variable.

VARIOUS TYPES OF DISCRETE FREQUENCY DISTRIBUTION

I. Positively Skewed Distribution

[Figure: line chart of a positively skewed discrete distribution. X-axis: X = 0, 1, 2, ..., 10.]

II. Negatively Skewed Distribution

[Figure: line chart of a negatively skewed discrete distribution. X-axis: X = 0, 1, 2, ..., 10.]

III. Symmetric Distribution

[Figure: line chart of a symmetric discrete distribution. X-axis: X = 0, 1, 2, ..., 10.]

Let us now consider another aspect of the frequency distribution, i.e. the CUMULATIVE FREQUENCY DISTRIBUTION. As in the case of the frequency distribution of a discrete variable, if we start adding the frequencies of our frequency table column-wise, we obtain the column of cumulative frequencies.

CUMULATIVE FREQUENCY DISTRIBUTION

Class Boundaries    Frequency    Cumulative Frequency
29.95 – 32.95        2            2
32.95 – 35.95        4            2+4 = 6
35.95 – 38.95       14            6+14 = 20
38.95 – 41.95        8            20+8 = 28
41.95 – 44.95        2            28+2 = 30
Total               30

In the above table, 2+4 gives 6, 6+14 gives 20, and so on. The question arises: “What is the purpose of making this column?” You will recall that, when we were discussing the frequency distribution of a discrete variable, any particular cumulative frequency meant that we were counting the number of observations starting from the very first value of X and going up to THAT particular value of X against which that particular cumulative frequency was falling.

In the case of the distribution of a continuous variable, each of these cumulative frequencies represents the total frequency of the distribution from the lower class boundary of the lowest class to the UPPER class boundary of THAT class whose cumulative frequency we are considering. In the above table, the total number of cars showing mileage less than 35.95 miles per gallon is 6, the total number of cars showing mileage less than 41.95 miles per gallon is 28, etc.
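The running total described above can be reproduced with `itertools.accumulate` from Python's standard library. This is a minimal sketch; the list names are ours.

```python
from itertools import accumulate

# Upper class boundaries and class frequencies of the EPA mileage example.
upper_boundaries = [32.95, 35.95, 38.95, 41.95, 44.95]
frequencies = [2, 4, 14, 8, 2]

# Each cumulative frequency is the sum of all frequencies up to that class.
cumulative = list(accumulate(frequencies))

for ub, cf in zip(upper_boundaries, cumulative):
    print(f"Cars with mileage less than {ub:.2f} mpg: {cf}")
```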


Such a cumulative frequency distribution is called a “less than” type of cumulative frequency distribution. The graph of a cumulative frequency distribution is called a CUMULATIVE FREQUENCY POLYGON or OGIVE. A “less than” type ogive is obtained by marking off the upper class boundaries of the various classes along the X-axis and the cumulative frequencies along the Y-axis, as shown below:

[Figure: axes for the “less than” ogive. X-axis: upper class boundaries; Y-axis: cf (0 to 30).]

Cumulative Frequency Polygon or OGIVE

The cumulative frequencies are plotted on the graph paper against the upper class boundaries, and the points so obtained are joined by means of straight line segments. Hence we obtain the cumulative frequency polygon shown below:

[Figure: cumulative frequency polygon (ogive). X-axis: upper class boundaries 29.95, 32.95, 35.95, 38.95, 41.95, 44.95; Y-axis: cumulative frequency (0 to 35).]

It should be noted that this graph is touching the X-axis on the left-hand side. This is achieved by ADDING a class having zero frequency at the beginning of our frequency distribution, as shown below:

CUMULATIVE FREQUENCY DISTRIBUTION

Class Boundaries    Frequency    Cumulative Frequency
26.95 – 29.95        0            0
29.95 – 32.95        2            0+2 = 2
32.95 – 35.95        4            2+4 = 6
35.95 – 38.95       14            6+14 = 20
38.95 – 41.95        8            20+8 = 28
41.95 – 44.95        2            28+2 = 30
Total               30

Since the frequency of the first class is zero, the cumulative frequency of the first class will also be zero, and so, automatically, the cumulative frequency polygon will touch the X-axis on the left-hand side.

If we want our cumulative frequency polygon to be closed on the right-hand side also, we can achieve this by connecting the last point on our graph paper with the X-axis by means of a vertical line, as shown below:

[Figure: OGIVE closed on the right-hand side by a vertical line. X-axis: upper class boundaries 29.95, 32.95, 35.95, 38.95, 41.95, 44.95; Y-axis: cumulative frequency (0 to 35).]
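The vertices of the ogive, including the added zero-frequency class and the optional vertical closing line, can be listed as below. This is an illustrative sketch only; the names `ogive` and `closed_ogive` are ours.

```python
from itertools import accumulate

# Class boundaries, including the added first class 26.95 - 29.95,
# whose frequency is zero.
boundaries = [26.95, 29.95, 32.95, 35.95, 38.95, 41.95, 44.95]
frequencies = [0, 2, 4, 14, 8, 2]

# A "less than" ogive plots cumulative frequency against the UPPER boundary.
upper = boundaries[1:]
cum_freq = list(accumulate(frequencies))
ogive = list(zip(upper, cum_freq))

# Closing the polygon on the right: drop a vertical line from the last
# point down to the X-axis, i.e. add the point (last upper boundary, 0).
closed_ogive = ogive + [(upper[-1], 0)]

for x, cf in closed_ogive:
    print(f"({x:.2f}, {cf})")
```

The first vertex, (29.95, 0), is what makes the polygon touch the X-axis on the left.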


Example

Let us consolidate these ideas with the help of the example of the ages of the managers of child-care centers that we discussed in the last lecture.

The following table contains the ages of 50 managers of child-care centers in five cities of a developed country:

Ages of a sample of managers of Urban child-care centers

42 26 32 34 57

30 58 37 50 30

53 40 30 47 49

50 40 32 31 40

52 28 23 35 25

30 36 32 26 50

55 30 58 64 52

49 33 43 46 32

61 31 30 40 60

74 37 29 43 54

Convert this data into a frequency distribution.

Frequency Distribution of Child-Care Managers’ Ages

Class Interval Frequency

20 – 29 6

30 – 39 18

40 – 49 11

50 – 59 11

60 – 69 3

70 – 79 1

Total 50

Construct the cumulative frequency distribution.

Cumulative Frequency

The cumulative frequency is the running total of the frequencies, from the first class through to the grand total.

The cumulative frequency for each class interval is the frequency for that class interval added to the preceding cumulative total.

Cumulative frequencies of child-care data

Class Interval    Frequency    Cumulative frequency
20 – 29            6            6
30 – 39           18           24
40 – 49           11           35
50 – 59           11           46
60 – 69            3           49
70 – 79            1           50
Total             50

Interpretation

24 of the 50 managers (i.e. 48% of the managers) are 39 years of age or less (i.e. less than 40 years old).

46 of the 50 managers (i.e. 92% of the managers) are 59 years of age or less (i.e. less than 60 years old), and so on.
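The whole example, from raw ages to cumulative percentages, can be traced in a short script. This is a sketch under the assumption that the class of each age is found from its tens digit (20 – 29, 30 – 39, and so on); the variable names are ours.

```python
from collections import Counter
from itertools import accumulate

# Ages of the 50 managers, row by row as given in the table.
ages = [
    42, 26, 32, 34, 57,  30, 58, 37, 50, 30,
    53, 40, 30, 47, 49,  50, 40, 32, 31, 40,
    52, 28, 23, 35, 25,  30, 36, 32, 26, 50,
    55, 30, 58, 64, 52,  49, 33, 43, 46, 32,
    61, 31, 30, 40, 60,  74, 37, 29, 43, 54,
]

# Tally each age into its ten-year class, keyed by the lower class limit.
counts = Counter((age // 10) * 10 for age in ages)
lower_limits = sorted(counts)
freqs = [counts[lo] for lo in lower_limits]

# Running totals, and the same totals as percentages of all managers.
cum = list(accumulate(freqs))
pct = [100 * c / len(ages) for c in cum]

for lo, f, c, p in zip(lower_limits, freqs, cum, pct):
    print(f"{lo} - {lo + 9}: f = {f:2d}, cf = {c:2d} ({p:.0f}%)")
```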

Cumulative frequency polygon or Ogive

[Figure: ogive of the child-care managers’ ages. X-axis: class boundaries 19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5; Y-axis: cumulative frequency (0 to 60).]

Real-life applications

The concept of cumulative frequency is used in many ways, including:

•Sales cumulated over a fiscal year.

•Sports scores during a contest (cumulated points).

•Years of service.

•Points earned in a course.

•Costs of doing business over a period of time.


EXAMPLE:

For a sample of 40 pizza products, the following data represent cost of a slice in dollars (SCost).

PRODUCT                                   SCost
Pizza Hut Hand Tossed                     1.51
Domino’s Deep Dish                        1.53
Pizza Hut Pan Pizza                       1.51
Domino’s Hand Tossed                      1.90
Little Caesars Pan! Pizza!                1.23
Boboli crust with Boboli sauce            1.00
Jack’s Super Cheese                       0.69
Pappalo’s Three Cheese                    0.75
Tombstone Original Extra Cheese           0.81
Master Choice Gourmet Four Cheese         0.90
Celeste Pizza For One                     0.92
Totino’s Party                            0.64
The New Weight Watchers Extra Cheese      1.54
Jeno’s Crisp’N Tasty                      0.72
Stouffer’s French Bread 2-Cheese          1.15
Ellio’s 9-slice                           0.52
Kroger                                    0.72
Healthy Choice French Bread               1.50
Lean Cuisine French Bread                 1.49
DiGiorno Rising Crust                     0.87
Tombstone Special Order                   0.81
Pappalo’s                                 0.73
Jack’s New More Cheese!                   0.64
Tombstone Original                        0.77
Red Baron Premium                         0.80
Tony’s Italian Style Pastry Crust         0.83
Red Baron Deep Dish Singles               1.13
Totino’s Party                            0.62
The New Weight Watchers                   1.52
Jeno’s Crisp’N Tasty                      0.71
Stouffer’s French Bread                   1.14
Celeste Pizza For One                     1.11
Tombstone For One French Bread            1.11
Healthy Choice French Bread               1.46
Lean Cuisine French Bread                 1.71
Little Caesars Pizza! Pizza!              1.28
Pizza Hut Stuffed Crust                   1.23
DiGiorno Rising Crust Four Cheese         0.90
Tombstone Special Order Four Cheese       0.85
Red Baron Premium 4-Cheese                0.80

Example taken from “Business Statistics – A First Course” by Mark L. Berenson & David M. Levine (International Edition), Prentice-Hall International, Inc., Copyright © 1998.

Source: “Pizza,” Copyright 1997 by Consumers Union of United States, Inc., Yonkers, N.Y. 10703.

In order to construct the frequency distribution of the above data, the first thing to note is that, in this example, all our data values are correct to two decimal places. As such, we should construct the class limits correct to TWO decimal places, and the class boundaries correct to three decimal places.

As in the last example, first of all, let us find the maximum and the minimum values in our data, and compute the RANGE.

Minimum value X0 = 0.52
Maximum value Xm = 1.90

Hence: Range = 1.90 - 0.52 = 1.38

Desired number of classes = 8

Class interval h = Range / Number of classes = 1.38 / 8 = 0.1725 ≈ 0.20

Lower limit of the first class = 0.51

Hence, our successive class limits come out to be:

Class Limits
0.51 – 0.70
0.71 – 0.90
0.91 – 1.10
1.11 – 1.30
1.31 – 1.50
1.51 – 1.70
1.71 – 1.90

Class Limits      Class Boundaries
0.51 – 0.70       0.505 – 0.705
0.71 – 0.90       0.705 – 0.905
0.91 – 1.10       0.905 – 1.105
1.11 – 1.30       1.105 – 1.305
1.31 – 1.50       1.305 – 1.505
1.51 – 1.70       1.505 – 1.705
1.71 – 1.90       1.705 – 1.905
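The steps from the range to the class boundaries can be sketched as follows. To avoid rounding trouble with decimals, this illustration works in whole cents, which is our own choice; the rounding of the class interval up to 0.20 and the starting limit of 0.51 are taken from the lecture.

```python
# All amounts are in cents so that the arithmetic is exact.
min_cents, max_cents = 52, 190           # X0 = 0.52, Xm = 1.90
range_cents = max_cents - min_cents      # 138 cents, i.e. 1.38
desired_classes = 8

raw_width = range_cents / desired_classes   # 17.25 cents, i.e. 0.1725
width = 20                                  # rounded up to a convenient 0.20
first_lower = 51                            # lower limit of the first class

# Build successive class limits until the maximum value is covered.
limits = []
lower = first_lower
while lower <= max_cents:
    limits.append((lower, lower + width - 1))
    lower += width

# Class boundaries lie half a unit (0.005) beyond the class limits.
print("Class Limits     Class Boundaries")
for lo, hi in limits:
    print(f"{lo / 100:.2f} - {hi / 100:.2f}      "
          f"{(lo - 0.5) / 100:.3f} - {(hi + 0.5) / 100:.3f}")
```

Note that, with the interval rounded up to 0.20, seven classes are enough to cover the data even though eight were initially desired.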

By tallying the data values in the appropriate classes, we will obtain a frequency distribution similar to the one that we obtained in the example of the EPA mileage ratings.

By constructing the histogram of this data-set, we will be able to decide whether our distribution is symmetric, positively skewed or negatively skewed.
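The tallying step can be carried out as sketched below. The costs are the 40 values listed in the table, converted to whole cents (our choice, to keep the class assignment exact); the class width of 0.20 and the first lower limit of 0.51 are those chosen above.

```python
# Cost of a slice (in dollars) for the 40 pizza products, in table order.
costs = [
    1.51, 1.53, 1.51, 1.90, 1.23, 1.00, 0.69, 0.75, 0.81, 0.90,
    0.92, 0.64, 1.54, 0.72, 1.15, 0.52, 0.72, 1.50, 1.49, 0.87,
    0.81, 0.73, 0.64, 0.77, 0.80, 0.83, 1.13, 0.62, 1.52, 0.71,
    1.14, 1.11, 1.11, 1.46, 1.71, 1.28, 1.23, 0.90, 0.85, 0.80,
]

width, first_lower, n_classes = 20, 51, 7   # in cents

# Assign every value to its class: class 0 is 0.51 - 0.70, class 1 is
# 0.71 - 0.90, and so on up to class 6, which is 1.71 - 1.90.
freq = [0] * n_classes
for c in costs:
    cents = round(c * 100)
    freq[(cents - first_lower) // width] += 1

for i, f in enumerate(freq):
    lo = first_lower + i * width
    hi = lo + width - 1
    print(f"{lo / 100:.2f} - {hi / 100:.2f}  {f:2d}  {'*' * f}")
```

The rows of asterisks give a rough sideways picture of the histogram from which the shape of the distribution can be judged.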

IN TODAY’S LECTURE, YOU LEARNT

•Frequency Distribution of a continuous variable

•Relative frequency distribution

•Percentage frequency distribution

•Histogram

•Frequency polygon

•Frequency curve


IN THE NEXT LECTURE, YOU WILL LEARN

•Stem and leaf plot

•Dot plot

•The Concept of Central Tendency