
Chapter 2: Methods of Presenting Data

Given below are the scores made in their final round by the 30 leading golfers in the Scottish Open golf championship:

62 65 63 65 70 68 65 67 67 69

70 70 70 67 68 68 66 69 74 67

68 69 69 69 71 71 72 69 72 68

Tally Chart

The tally chart is constructed

on a single ‘pass’ through the

data.

Five-Bar Gates

Counting the tallies is made easy by

using the ‘five-bar gates’. Here, for

each score a vertical stroke is entered

on the appropriate row, with a

diagonal stroke being used to

complete each group of five strokes.

Frequency

The frequency of an outcome is the tally count for that outcome. In the example above, the frequency of the outcome 65 was 3.

Frequency Distribution

A frequency distribution is the set of outcomes with their corresponding frequencies, which can be displayed in a frequency table.

Frequency Table

Final Round Score   62  63  64  65  66  67  68  69  70  71  72  73  74

Number of Golfers    1   1   0   3   1   4   5   6   4   2   2   0   1
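As a rough illustration, the tally chart and the frequency table above can be reproduced in a few lines of Python. This is only a sketch: the variable names are made up for the example, and the string `||||/` is an assumed plain-text stand-in for a five-bar gate.

```python
from collections import Counter

# Final-round scores of the 30 leading golfers (from the example above).
scores = [62, 65, 63, 65, 70, 68, 65, 67, 67, 69,
          70, 70, 70, 67, 68, 68, 66, 69, 74, 67,
          68, 69, 69, 69, 71, 71, 72, 69, 72, 68]

# One pass through the data gives the tally count of every outcome.
tally = Counter(scores)

# Include outcomes that never occurred (frequency 0), as the table does.
frequency_table = {s: tally[s] for s in range(min(scores), max(scores) + 1)}

for score, f in frequency_table.items():
    gates, rest = divmod(f, 5)  # complete groups of five, leftover strokes
    marks = " ".join(["||||/"] * gates + (["|" * rest] if rest else []))
    print(f"{score}  {marks:<12} {f}")
```

Counting with `Counter` mirrors the single pass of the tally chart; filling in the missing scores afterwards keeps the zero-frequency rows (64 and 73) visible.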

Disadvantage

Tally charts become uncomfortably long if the range of possible values is very large. For example, the individual scores below were gathered from a low-scoring Sunday league cricket match:

22 58 12 17 4

7 26 10 13 1

39 0 1 10 6

0 11 14 1 0

Stem-and-leaf Diagram

In a stem-and-leaf diagram, the stem represents the most significant digit (i.e. the ‘tens’) and the leaves are the less significant digits (the ‘units’).
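The idea can be sketched in Python using the cricket scores above. The function name `stem_and_leaf` and the printed layout are assumptions made for this illustration, not a standard library facility.

```python
from collections import defaultdict

# Individual scores from the cricket match above.
scores = [22, 58, 12, 17, 4, 7, 26, 10, 13, 1,
          39, 0, 1, 10, 6, 0, 11, 14, 1, 0]

def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted leaves (units digits)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    # Keep empty stems so that gaps in the data remain visible.
    return {s: stems.get(s, []) for s in range(min(stems), max(stems) + 1)}

diagram = stem_and_leaf(scores)
for stem, leaves in diagram.items():
    print(f"{stem} | {' '.join(map(str, leaves))}")
print("Key: 1 | 2 means 12")
```

Printing a key line with the diagram follows the advice to explain how stems and leaves should be read.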

Split Stem

The stem-and-leaf diagram

is sometimes presented with

split stems for finer details.

In Split Stems…

Here, the units between 0 and 4

(inclusive) are separated from the

units between 5 and 9 (inclusive). It is

now particularly easy to see that most

players scored less than 15 and that

the highest score of 58 was a long way

clear of the rest.
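A split-stem version can be sketched the same way. The labels "low" and "high" for the two halves of each stem are hypothetical names chosen for this example.

```python
# Individual scores from the cricket match above.
scores = [22, 58, 12, 17, 4, 7, 26, 10, 13, 1,
          39, 0, 1, 10, 6, 0, 11, 14, 1, 0]

def split_stem_and_leaf(values):
    """Each stem gets two rows: units 0-4 ("low") and units 5-9 ("high")."""
    rows = {}
    for stem in range(min(values) // 10, max(values) // 10 + 1):
        rows[(stem, "low")] = []
        rows[(stem, "high")] = []
    for v in sorted(values):
        half = "low" if v % 10 <= 4 else "high"
        rows[(v // 10, half)].append(v % 10)
    return rows

rows = split_stem_and_leaf(scores)
for (stem, half), leaves in rows.items():
    print(f"{stem} | {' '.join(map(str, leaves))}")

# Number of players who scored less than 15.
below_15 = sum(1 for v in scores if v < 15)
print(below_15, "of", len(scores))
```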

Note:

Stem-and-leaf diagrams can be used both

with discrete data and with continuous

data (treating the latter as though it were

discrete). They are much easier to

understand when the stem involves a power

of ten, but other units may be employed if

the stem would otherwise be too long or

too short. It is often wise to provide an

explanation (a Key) with the diagram.

Example 2: The internal phone numbers of a random

selection of individuals from a large organization

are given below:

3315 3301 2205 2865 2608 2886

2527 3144 2154 2645 3703 2610

2768 3699 2345 2160 2603 2054

2302 2997 3794 3053 3001 2247

3402 2744 3040 2459 3699 3008

3062 2887 2215 2213 3310 2508

2530 2987 3699 3298 2021 3323

2329 2845 2247 3196 3412 2021

Summarize these numbers using a stem-and-leaf diagram.

Let’s Practice: The masses (in g) of a random

sample of 20 sweets were as follows:

1.13 0.72 0.91 1.44 1.03

1.39 0.88 0.99 0.73 0.91

0.98 1.21 0.79 1.14 1.19

1.08 0.94 1.06 1.11 1.01

Summarize these results using a stem-and-leaf diagram.

Bar Charts

A bar chart or bar graph is a chart that

presents grouped data with

rectangular bars with lengths

proportional to the values that they

represent.

Bar charts were introduced by William Playfair in the 18th century.

Note:

Bar charts are easier to read if the

width of the bars is different from

the width of the gaps between the

bars.

It is not necessary to show the

origin of the graph.

Example: A car salesman is interested in the color preferences of his customers. For one type of car his records are as follows:

BLUE  WHITE  RED  OTHERS

  12     23   16      18

Represent these figures using a vertical bar chart.
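Before drawing the chart by hand or in a plotting package, the proportionality between bar length and value can be checked with a quick text sketch. The helper `text_bar_chart` is a hypothetical name, and the bars here run horizontally, one `#` per car, rather than vertically as the exercise asks.

```python
# Sales by color, from the salesman's records above.
sales = {"Blue": 12, "White": 23, "Red": 16, "Others": 18}

def text_bar_chart(data, symbol="#"):
    """Return one line per category; the bar length equals the value."""
    label_width = max(len(label) for label in data)
    return [f"{label:<{label_width}} | {symbol * value} {value}"
            for label, value in data.items()]

chart_lines = text_bar_chart(sales)
print("\n".join(chart_lines))
```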

[Figure: Bar Chart of Sales of Cars of Different Colors. Horizontal axis: Color Preference (Blue, White, Red, Others); vertical axis scaled from 0 to 25.]

Multiple Bar Charts

When data occur naturally in groups and the aim is to contrast the variations within different groups, a multiple bar chart may be used.

This consists of groups of two or more adjacent bars separated from the next group by a gap having, ideally, a different width to the bars themselves.

Example: The following data, taken from the

Monthly Bulletin of Statistics published by the

United Nations, show the 1970 and 1988

estimated populations (in millions) for five

countries.

       France  Mexico  Nigeria  Pakistan  UK

1970       50      51       57        56  55

1988       55      82      104       105  57

Illustrate the data using a multiple bar chart.

[Figure: Multiple bar graph. Populations of five countries in 1970 and in 1988 (figures are in millions). Horizontal bars for France, UK, Mexico, Nigeria and Pakistan on a scale from 0 to 120; legend: 1988, 1970.]

Example 2: The Registrar's Office of La Salle College Antipolo reports the following student population figures for the years 2011, 2012, 2013 and 2014:

Courses                  2011  2012  2013  2014

Accountancy               231   289   346   388

Business Administration   451   487   521   589

Education                  98   103   110   123

Communication Arts        245   230   280   210

Psychology                150   156   144   132

HRM                       230   256   304   410

Illustrate the data using a multiple bar chart.

[Figure: Multiple bar chart. Student Population (According to Course) in 2011, 2012, 2013 and 2014. Groups: Accountancy, Business Administration, Education, Communication Arts, Psychology, HRM; scale from 0 to 700; legend: 2011, 2012, 2013, 2014.]

Compound Bars for Proportion

In a compound bar chart the length of a complete bar

signifies 100% of the population.

The bar is subdivided into sections that show the relative

sizes of components of the populations.

By comparing the sizes of the subdivisions of two parallel

compound bars, differences can be seen between the

compositions of the separate populations.

Note: The populations need not be populations of living creatures.

Example: One consequence of the dramatic growth

in population of the ‘third world’ countries is that a

high proportion of the population of these countries

is young and there are few old people. The United

Nations publication World Population Prospects

gives the following figures for 1990 populations:

               France  Mexico  Nigeria  Pakistan    UK

% under 15       20.2    37.2     48.4      45.7  18.9

% 15 to 64       66.0    59.0     49.2      51.6  65.6

% 65 and over    13.8     3.8      2.4       2.7  15.5

Illustrate these figures in an appropriate diagram.

[Figure: Compound bars showing, for five countries (France, Mexico, Nigeria, Pakistan, UK), the proportions of the population in three age ranges. Scale from 0% to 100%; legend: % under 15, % 15 to 64, % 65 and over.]

Pie Charts

Pie charts are the circular equivalent of compound bar charts.

The areas of the portions of the pie are in

proportion to the quantities being presented.

Note: When drawn correctly the areas (and not the

radii) will be in proportion to the differing

population sizes.

Example 7: The European Community Forest Health

Report 1989 classifies trees by the extent of their

defoliation (i.e. by their loss of leaves). Trees that

are in good health have defoliation levels of

between 0% and 10%. The following data show the

proportions of conifers with various amounts of

defoliation in France and the UK.

Extent of Defoliation

         0%-10%  11%-25%  26%-60%  61%-100%

France    0.750    0.176    0.068     0.006

UK        0.385    0.303    0.250     0.089

Illustrate the data using pie charts.
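To draw a pie chart, each proportion is converted into a sector angle, and when two pies represent populations of different sizes the radius is scaled so that area, not radius, is proportional to the total. The sketch below shows both calculations; the function names are hypothetical, and the totals 400 and 100 are made-up numbers used only to illustrate the scaling.

```python
import math

def sector_angles(proportions):
    """Angle in degrees of each sector: its share of the total times 360."""
    total = sum(proportions)
    return [360 * p / total for p in proportions]

def radius_for_total(total, reference_total, reference_radius=1.0):
    """Radius that makes the AREA of the pie proportional to `total`."""
    return reference_radius * math.sqrt(total / reference_total)

# Proportions of French conifers in each defoliation class (table above).
france = [0.750, 0.176, 0.068, 0.006]
france_angles = sector_angles(france)
print([round(a, 2) for a in france_angles])

# Hypothetical totals: a population four times as large needs a radius
# twice as large, because area grows with the square of the radius.
print(radius_for_total(400, 100))
```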

[Figure: Pie Charts Comparing the Amounts of Defoliation of Conifers in France and in the UK in 1989. One pie for France and one for the UK; sectors: 0%-10%, 11%-25%, 26%-60%, 61%-100%.]

Guidelines in Formatting Graphs and Charts

1. Keep it simple and avoid flashy effects.

Present only essential information.

Avoid using gratuitous options in graphical software programs.



2. Title your graph or chart

clearly to convey the purpose.

The title provides the reader

with the overall message you

are conveying.



3. Specify the units of measurement on the x-axis and the y-axis (e.g. number of people, years).



4. Label each part of the chart or the graph.

Use a legend when there is too much information to label directly.

Use different colors or variations in patterns.

When is it best to use these charts and graphs?

Categorical data are grouped into

non-overlapping categories (such

as grade, race, and yes or no

responses). Bar graphs, line

graphs, and pie charts are useful

for displaying categorical data.

Lesson 2.2: Frequency Distributions and Their Graphic Presentation

Frequency Distribution

This is a tabular arrangement of data showing a tally of the number of times each score value (or interval of score values) occurs in a group of scores.

Frequency (f)

This is the number

of times the value

occurs in a sample.

A. Ungrouped Frequency

Distribution

In an ungrouped

frequency distribution,

each value of “x” in the

distribution represents

only one value.

Example 2.2.1: The number of people per housing unit in a certain area was tallied as follows:

3 0 6 7 2 4 3 6 2 6

6 3 7 3 4 7 1 7 6 9

1 9 6 0 8 2 7 3 6 2

2 9 6 3 8 3 6 1 4 6

6 3 7 7 1 4 4 6 8 2

3 6 2 6 3 8 0 6 2 6

Create a frequency distribution, and present the data in

graphical form.

Ungrouped Frequency Distribution

Number of People per Housing Unit    f        %

0                                    3    5.00%

1                                    4    6.67%

2                                    8   13.33%

3                                   10   16.67%

4                                    5    8.33%

6                                   16   26.67%

7                                    7   11.67%

8                                    4    6.67%

9                                    3    5.00%

                                  n=60     100%
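The ungrouped frequency distribution above can be checked with a short Python sketch. The variable names are illustrative only; the percentages are each frequency divided by n, times 100.

```python
from collections import Counter

# Number of people per housing unit (Example 2.2.1).
people_per_unit = [
    3, 0, 6, 7, 2, 4, 3, 6, 2, 6,
    6, 3, 7, 3, 4, 7, 1, 7, 6, 9,
    1, 9, 6, 0, 8, 2, 7, 3, 6, 2,
    2, 9, 6, 3, 8, 3, 6, 1, 4, 6,
    6, 3, 7, 7, 1, 4, 4, 6, 8, 2,
    3, 6, 2, 6, 3, 8, 0, 6, 2, 6,
]

n = len(people_per_unit)
frequency = Counter(people_per_unit)

# Relative frequency in percent for every value that occurs.
percent = {x: 100 * f / n for x, f in sorted(frequency.items())}

for x, pct in percent.items():
    print(f"{x}  {frequency[x]:>2}  {pct:5.2f}%")
print(f"n={n}")
```

No housing unit in the sample has exactly 5 people, which is why that value has no row in the table.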

[Figure: Bar Chart of Number of People per Housing Unit. Horizontal axis: Number of People per Housing Unit (0 to 9); vertical axis: frequency, scaled from 0 to 18.]

[Figure: Pie Chart of Number of People per Housing Unit. Sectors for 0, 1, 2, 3, 4, 6, 7, 8 and 9 people, each labeled with its rounded percentage.]

Let’s Practice: The number of cellular phones in the families of 50 students is presented below:

0 2 5 2 1 7 2 7 6 2

1 6 0 3 4 6 3 1 3 5

4 4 1 4 1 0 4 0 0 1

6 1 3 3 0 3 1 5 3 5

2 3 5 2 3 2 3 4 1 3

Construct an ungrouped frequency distribution and present the data in graphical form.

B. Grouped Frequency Distribution

In a grouped frequency distribution, each value of “x” in the distribution represents more than one value.

To illustrate, let us use the sample of 70

scores:

78 65 112 98 87 94 76 90 93 92

67 89 102 114 93 84 79 99 100 62

101 77 68 94 96 89 93 64 75 82

105 111 62 108 93 97 66 94 115 100

76 89 99 87 73 84 88 110 63 107

66 74 82 77 99 73 63 98 101 114

70 88 94 104 82 84 96 93 89 96

Steps in Constructing a Grouped Frequency Distribution

Step 1: Arrange the data in ascending or descending order.

62 70 79 88 93 98 104

62 73 82 89 93 98 105

63 73 82 89 94 99 107

63 74 82 89 94 99 108

64 75 84 89 94 99 110

65 76 84 90 94 100 111

66 76 84 92 96 100 112

66 77 87 93 96 101 114

67 77 87 93 96 101 114

68 78 88 93 97 102 115

Class

This is a grouping of

values by which data is

binned for computation

of a frequency

distribution.

Step 2:

Determine the

Range.

Range (R)

The area of variation

between upper and

lower limits on a

particular scale.

Range (R)

Range is equal to the highest

observed value minus the

lowest observed value.

Formula 2.1:

$R = \mathit{hov} - \mathit{lov}$

where hov is the highest observed value and lov is the lowest observed value.

Step 3:

State the desired

number of classes

or class intervals.

Class Interval (CI)

This is the range of values used in defining a class. Here, CI also denotes the number of class intervals, that is, the number of rows desired in the table. For uniformity, use the square root rule:

Formula 2.2:

$CI = \sqrt{n}$

where n is the number of samples.

Other "rules of thumb" in

Choosing Class Interval:

Sturges’ Rule

Rice Rule

Sturges’ Rule

Sturges’ rule is to set the number of intervals as close as possible to 1 + Log2(N), where Log2(N) is the base 2 log of the number of observations/samples. Equivalently, this is approximately:

1 + 3.3 Log10(N)

Rice Rule

This rule sets the number of intervals to twice the cube root of the number of observations.
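The three rules can be compared side by side with a small Python sketch. The function names are hypothetical, and each function returns the unrounded value, leaving the choice of a nearby whole number of classes to the reader.

```python
import math

def square_root_rule(n):
    """Number of classes suggested by the square root rule (Formula 2.2)."""
    return math.sqrt(n)

def sturges_rule(n):
    """Number of classes suggested by Sturges' rule: 1 + log2(n)."""
    return 1 + math.log2(n)

def rice_rule(n):
    """Number of classes suggested by the Rice rule: twice the cube root of n."""
    return 2 * n ** (1 / 3)

n = 70  # size of the sample of scores used in this lesson
for rule in (square_root_rule, sturges_rule, rice_rule):
    print(f"{rule.__name__}: {rule(n):.2f}")
```

For small samples the three rules tend to agree closely; they drift apart as n grows, since the square root increases faster than the cube root or the logarithm.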

Let’s Practice: Determine the class interval for each of the following sample sizes using the three methods.

1. n = 1,000

2. n = 150

3. n = 80

4. n = 5,000

5. n = 18,500

Step 4:

Determine the

class size or class

width (CW).

Class Width (CW)

This is the difference

between the lower class

limit and the next lower

class limit.

Nice-to-Know:

Class Width (CW) is also sometimes called “Bin Width”.

Your choice of bin width determines the number of class intervals.

This decision, along with the choice of starting point for the first interval, affects the shape of the histogram.

Reminder:

The class width is also the difference between the lower limits of two consecutive classes or the upper limits of two consecutive classes. It is not the difference between the upper and lower limits of the same class.

Reminder:

The class width should be an odd number. When the class limits are integers, this guarantees that the class midpoints are integers instead of decimals.

Formula 2.3:

$CW = \frac{R}{CI}$

Note:

There are two things

to be careful of here.

You must round up, not

off.
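Steps 2 to 4 can be traced for the sample of 70 scores with the following sketch. It assumes that the square root rule result is rounded to the nearest whole number, which the text does not specify; the class width is rounded up, as the note requires.

```python
import math

# The sample of 70 scores used in this lesson.
scores = [
    78, 65, 112, 98, 87, 94, 76, 90, 93, 92,
    67, 89, 102, 114, 93, 84, 79, 99, 100, 62,
    101, 77, 68, 94, 96, 89, 93, 64, 75, 82,
    105, 111, 62, 108, 93, 97, 66, 94, 115, 100,
    76, 89, 99, 87, 73, 84, 88, 110, 63, 107,
    66, 74, 82, 77, 99, 73, 63, 98, 101, 114,
    70, 88, 94, 104, 82, 84, 96, 93, 89, 96,
]

# Step 2: range = highest observed value - lowest observed value.
data_range = max(scores) - min(scores)

# Step 3: number of classes by the square root rule
# (rounding to the nearest integer is an assumption of this sketch).
n_classes = round(math.sqrt(len(scores)))

# Step 4: class width = range / number of classes, rounded UP, not off.
class_width = math.ceil(data_range / n_classes)

print(data_range, n_classes, class_width)
```

The resulting width is odd, so with integer class limits the class marks are whole numbers.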

Guidelines for Classes/Class Interval (CI)

1. There should be between 5 and 20 classes.

2. The class width should be an odd number.

3. The classes must be mutually exclusive.

4. The classes must be all inclusive or exhaustive.

5. The classes must be continuous.

6. The classes must be equal in width.

Step 5:

Set up the

frequency table by

using tally chart

first.

Class Limits (CL)

These are the smallest and the largest observations in each class. They are called the lower class limits and upper class limits.

Lower Class Limit

This is the

smallest value in

the class.

Upper Class Limit

This is the

largest value in

every class.

Modal Class

This is the class

with the highest

frequency.

Step 6:

Solve for the

Class Marks

(CM).

Class Mark (CM)

This is also called the midpoint. It is the numerical value that is exactly at the middle of each class. It stands in for the individual scores in the class interval to which they belong.

Formula 2.4:

$CM = \frac{\mathit{lcl} + \mathit{ucl}}{2}$

where:

lcl is the lower class limit

ucl is the upper class limit

Class Boundaries

These are also called ‘true limits’, ‘real limits’, or ‘exact limits’.

These are numbers that do not occur

in the sample data but are halfway

between the upper limit of the class

and the lower limit of the next class.

Lower Real Limit

The lower real limit is obtained by subtracting 0.5 from the lower class limit.

Upper Real Limit

The upper real limit

is obtained by adding

0.5 to the upper

class limit.
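The definitions of class boundaries, class width and class mark can be combined into one small helper. The function `class_summary` and its `half_unit` parameter are hypothetical names for this sketch; the default of 0.5 follows the rule stated above.

```python
def class_summary(lcl, ucl, half_unit=0.5):
    """Boundaries, width and class mark of a class with limits lcl-ucl."""
    lower_real = lcl - half_unit        # lower real limit
    upper_real = ucl + half_unit        # upper real limit
    return {
        "boundaries": (lower_real, upper_real),
        "width": upper_real - lower_real,   # difference of the real limits
        "mark": (lcl + ucl) / 2,            # Formula 2.4
    }

summary = class_summary(62, 68)
print(summary)
```

The width computed from the real limits agrees with the difference between the lower limits of consecutive classes, not with the difference between the upper and lower limits of the same class.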

Grouped Frequency Distribution or Frequency Distribution with Class Marks

Class Intervals (Classes/Class Limits)   Frequency (f)   Class Mark (CM)

62-68                                    10              65

69-75                                     5              72

76-82                                     9              79

83-89                                    11              86

90-96                                    14              93

97-103                                   11              100

104-110                                   5              107

111-117                                   5              114

                                         n=70

Relative Grouped Frequency Distribution

Class Intervals (Classes/Class Limits)   Frequency (f)   Class Mark (CM)   Relative Frequency (%F)

62-68                                    10              65                14.29

69-75                                     5              72                 7.14

76-82                                     9              79                12.86

83-89                                    11              86                15.71

90-96                                    14              93                20.00

97-103                                   11              100               15.71

104-110                                   5              107                7.14

111-117                                   5              114                7.14

                                         n=70                             100.00
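The grouped table can be rebuilt from the raw scores with a short sketch. The starting limit 62, width 7 and 8 classes are taken from the table; the tuple layout of `rows` is simply a convenience chosen for this example.

```python
# The sample of 70 scores used in this lesson.
scores = [
    78, 65, 112, 98, 87, 94, 76, 90, 93, 92,
    67, 89, 102, 114, 93, 84, 79, 99, 100, 62,
    101, 77, 68, 94, 96, 89, 93, 64, 75, 82,
    105, 111, 62, 108, 93, 97, 66, 94, 115, 100,
    76, 89, 99, 87, 73, 84, 88, 110, 63, 107,
    66, 74, 82, 77, 99, 73, 63, 98, 101, 114,
    70, 88, 94, 104, 82, 84, 96, 93, 89, 96,
]

lowest, width, n_classes = 62, 7, 8  # as in the table above

rows = []
for i in range(n_classes):
    lcl = lowest + i * width                 # lower class limit
    ucl = lcl + width - 1                    # upper class limit
    f = sum(lcl <= s <= ucl for s in scores)  # class frequency
    class_mark = (lcl + ucl) / 2
    relative = 100 * f / len(scores)         # relative frequency in percent
    rows.append((lcl, ucl, f, class_mark, relative))

for lcl, ucl, f, class_mark, relative in rows:
    print(f"{lcl}-{ucl:<4} {f:>3} {class_mark:>6.1f} {relative:>6.2f}")
```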

Step 7:

Make the

Graphs.

Cumulative Frequency

The cumulative frequency of a class interval is the number of cases within that interval plus all the cases in the intervals below it on the scale (“less than” cumulative frequency, F≤) or above it on the scale (“greater than” cumulative frequency, F≥).

Meaning and Interpretation of

Cumulative Frequency

In the example, the “less than” cumulative frequency 24 in the 3rd class interval, 76-82, means that 24 cases/scores/samples fall below the upper real limit 82.5.
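Both kinds of cumulative frequency follow from running totals of the class frequencies. A minimal sketch, using the frequencies of the grouped table and illustrative variable names:

```python
from itertools import accumulate

# Class frequencies for 62-68, 69-75, ..., 111-117.
frequencies = [10, 5, 9, 11, 14, 11, 5, 5]

# "Less than": running total from the lowest class upward.
less_than = list(accumulate(frequencies))

# "Greater than": running total from the highest class downward.
greater_than = list(accumulate(reversed(frequencies)))[::-1]

print(less_than)
print(greater_than)
```

The third entry of `less_than` is the 24 discussed above: the 10, 5 and 9 cases of the first three classes all fall below 82.5.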

Note:

Bar charts are not

appropriate for data with

grouped frequencies for

ranges of values.

Histogram

It is a diagram using rectangles to represent frequency.

It differs from the bar chart in that the rectangles may have different widths, but the key feature is that, for each rectangle:

AREA IS PROPORTIONAL TO CLASS FREQUENCY

Note:

When all the class widths are equal, histograms are easy to construct, since then not only is area ∝ frequency, but also height ∝ frequency.
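When class widths differ, the height of each rectangle has to be the frequency divided by the class width so that area stays proportional to frequency. The sketch below illustrates this; `bar_heights` is a hypothetical helper, and the `merged` classes are an invented regrouping of the first two classes of the lesson's table, used only to show unequal widths.

```python
def bar_heights(classes):
    """classes: list of (lower_boundary, upper_boundary, frequency).

    Returns frequency / width for each class, so that
    height * width (the area) equals the class frequency.
    """
    return [f / (upper - lower) for lower, upper, f in classes]

# Equal widths: heights are proportional to the frequencies themselves.
equal = [(61.5, 68.5, 10), (68.5, 75.5, 5), (75.5, 82.5, 9)]

# Hypothetical unequal widths: the first two classes merged into one.
merged = [(61.5, 75.5, 15), (75.5, 82.5, 9)]

equal_heights = bar_heights(equal)
merged_heights = bar_heights(merged)
print(equal_heights)
print(merged_heights)
```

In the merged case the wider class holds more scores (15 against 9) yet gets the lower rectangle, because its frequency is spread over twice the width.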

Note:

A histogram is a graphical

method for displaying the shape

of a distribution.

It is particularly useful when

there are a large number of

observations.

Note:

In a histogram, the class frequencies are represented by bars.

When the class widths are equal, the height of each bar corresponds to its class frequency.

Histograms can also be used when the scores are measured on a more continuous scale such as the length of time (in milliseconds) required to perform a task.

Shapes of Histogram

1. Unimodal

2. Skewed to the Right

3. Skewed to the Left

4. Bimodal

5. Multimodal

6. Symmetrical or Normal or Triangular

Symmetrical or Normal or Triangular


Both sides of the

distribution are

identical.

Symmetric, Unimodal

Skewed to the Right

Skewed

This means one tail is

stretched out longer than

the other. The direction of

the skewness is on the

side of the longer tail.

Skewed to the Left

Bimodal


This is a continuous probability

distribution with two different

modes. These appear as distinct

peaks.

The two most populous classes are

separated by one or more classes.

Multimodal

A multimodal distribution is a continuous probability distribution with two or more modes.

Did you notice?

The idea of the histogram is to give a visual impression of which values are likely to occur and which values are less likely to occur. The ‘chunky’ outline of a histogram is not ‘a thing of beauty’ and an alternative exists whenever the classes are all of equal width.

Frequency Polygon

Frequency polygons are a graphical device for understanding the shapes of distributions.

They serve the same purpose as histograms, but are especially helpful for comparing sets of data.

Frequency polygons are also a good choice for displaying cumulative frequency distributions.

How to construct?

For each class, locate the point with x-coordinate equal to the midpoint of the class and with y-coordinate corresponding to the class frequency.

Successive points are then joined to form the polygon.

In order to obtain a closed figure, extra classes with zero frequencies are added at either end of the frequency distribution.

Reminder:

As with the histogram it is area that is proportional to frequency. The area of a frequency polygon equals that of the corresponding histogram.

Since the frequency polygon is only used with classes of equal width, class frequencies provide a convenient scale for the y-axis.
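The construction steps and the area property described above can be sketched as follows, using the class marks and frequencies of the lesson's grouped table. The function names are hypothetical, and the area is computed with the trapezoid rule between successive points.

```python
def frequency_polygon_points(class_marks, frequencies):
    """Points of the polygon, closed with a zero-frequency class at each end."""
    width = class_marks[1] - class_marks[0]   # classes have equal width
    xs = [class_marks[0] - width] + list(class_marks) + [class_marks[-1] + width]
    ys = [0] + list(frequencies) + [0]
    return list(zip(xs, ys))

def polygon_area(points):
    """Area between the polygon and the x-axis (sum of trapezoids)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

class_marks = [65, 72, 79, 86, 93, 100, 107, 114]
frequencies = [10, 5, 9, 11, 14, 11, 5, 5]

points = frequency_polygon_points(class_marks, frequencies)
histogram_area = 7 * sum(frequencies)   # each rectangle: width 7 * frequency

print(points)
print(polygon_area(points), histogram_area)
```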

Classes      Class Width (CW)   Lower Class Limits   Upper Class Limits   Class Boundaries   Lower Real Limits   Upper Real Limits   Class Marks (CM)

28-35

35-40

68-75

5-12

43-47

77-84

120-128

114-120

195-203

60-70

34.5-40.5

16.5-21.5

95.5-99.5

64.25-70.25

74.7-80.7

Classes      Class Width (CW)   Lower Class Limits   Upper Class Limits   Class Boundaries   Lower Real Limits   Upper Real Limits   Class Marks (CM)

28-35         8                 28                   35                   27.5-35.5          27.5                35.5                31.5

35-40         6                 35                   40                   34.5-40.5          34.5                40.5                37.5

68-75         8                 68                   75                   67.5-75.5          67.5                75.5                71.5

5-12          8                 5                    12                   4.5-12.5           4.5                 12.5                8.5

43-47         5                 43                   47                   42.5-47.5          42.5                47.5                45

77-84         8                 77                   84                   76.5-84.5          76.5                84.5                80.5

120-128       9                 120                  128                  119.5-128.5        119.5               128.5               124

114-120       7                 114                  120                  113.5-120.5        113.5               120.5               117

195-203       9                 195                  203                  194.5-203.5        194.5               203.5               199

60-70         11                60                   70                   59.5-70.5          59.5                70.5                65

34.5-40.5     7                 34.5                 40.5                 34-41              34                  41                  37.5

16.5-21.5     6                 16.5                 21.5                 16-22              16                  22                  19

95.5-99.5     5                 95.5                 99.5                 95-100             95                  100                 97.5

64.25-70.25   7                 64.25                70.25                63.75-70.75        63.75               70.75               67.25

74.7-80.7     7                 74.7                 80.7                 74.2-81.2          74.2                81.2                77.7
