Chapter 2: Methods of Presenting Data
Given below are the scores made in their final round by the 30 leading golfers in the Scottish Open golf championship:
62 65 63 65 70 68 65 67 67 69
70 70 70 67 68 68 66 69 74 67
68 69 69 69 71 71 72 69 72 68
Tally Chart
The tally chart is constructed
on a single ‘pass’ through the
data.
Five-Bar Gates
Counting the tallies is made easy by
using the ‘five-bar gates’. Here, for
each score a vertical stroke is entered
on the appropriate row, with a
diagonal stroke being used to
complete each group of five strokes.
Frequency
Frequency of the outcome is
the tally count for each
outcome. In the above example,
the frequency of the outcome
65 was 3.
Frequency Distribution
Frequency Distribution is the
set of outcomes with their
corresponding frequencies,
which can be displayed in a
frequency table.
Frequency Table
Final Round Score 62 63 64 65 66 67 68 69 70 71 72 73 74
Number of Golfers 1 1 0 3 1 4 5 6 4 2 2 0 1
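The tally and the frequency table above can be reproduced with a short Python sketch. It uses the standard-library Counter as a stand-in for the hand tally; the variable names are illustrative and not part of the original material.

```python
from collections import Counter

# Final-round scores copied from the golf example above.
scores = [
    62, 65, 63, 65, 70, 68, 65, 67, 67, 69,
    70, 70, 70, 67, 68, 68, 66, 69, 74, 67,
    68, 69, 69, 69, 71, 71, 72, 69, 72, 68,
]

# One pass through the data, like the tally chart.
tally = Counter(scores)

# Include scores that never occurred (frequency 0) so the table has no gaps.
frequency_table = {
    score: tally.get(score, 0)
    for score in range(min(scores), max(scores) + 1)
}

for score, freq in frequency_table.items():
    print(f"{score}: {'|' * freq} ({freq})")
```

Each printed row shows a score, one stroke per golfer, and the frequency in brackets.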
Disadvantage
Tally charts become uncomfortably long
if the range of possible values is very
large. For example, the individual scores
below were gathered from a low-scoring
Sunday league cricket match:
22 58 12 17 4
7 26 10 13 1
39 0 1 10 6
0 11 14 1 0
Stem-and-leaf Diagram
In a stem-and-leaf diagram, the stem
represents the most significant digit
(i.e. the ‘tens’) and the leaves are the less
significant digits (the ‘units’).
Split Stem
The stem-and-leaf diagram
is sometimes presented with
split stems for finer details.
In Split Stems…
Here, the units between 0 and 4
(inclusive) are separated from the
units between 5 and 9 (inclusive). It is
now particularly easy to see that most
players scored less than 15 and that
the highest score of 58 was a long way
clear of the rest.
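As a rough sketch of how the cricket scores could be arranged into a stem-and-leaf diagram, with or without split stems, consider the Python below. The function name stem_and_leaf and its split parameter are hypothetical helpers introduced only for this illustration, and the sketch assumes non-negative whole-number scores with stems in tens.

```python
def stem_and_leaf(values, split=False):
    """Return the rows of a stem-and-leaf diagram as (stem, leaves) pairs.

    Stems are the tens digits and leaves are the units digits. With
    split=True each stem appears twice: first for leaves 0-4, then for 5-9.
    """
    rows = {}
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        half = 1 if (split and leaf >= 5) else 0
        rows.setdefault((stem, half), []).append(leaf)

    halves = (0, 1) if split else (0,)
    # Emit every stem, even empty ones, so gaps in the data stay visible.
    return [
        (stem, rows.get((stem, half), []))
        for stem in range(min(values) // 10, max(values) // 10 + 1)
        for half in halves
    ]


cricket = [22, 58, 12, 17, 4, 7, 26, 10, 13, 1,
           39, 0, 1, 10, 6, 0, 11, 14, 1, 0]

for stem, leaves in stem_and_leaf(cricket, split=True):
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
print("Key: 1 | 2 means 12")
```

Calling stem_and_leaf(cricket) without split=True gives the ordinary diagram with one row per stem.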
Note:
Stem-and-leaf diagrams can be used both
with discrete data and with continuous
data (treating the latter as though it were
discrete). They are much easier to
understand when the stem involves a power
of ten, but other units may be employed if
the stem would otherwise be too long or
too short. It is often wise to provide an
explanation (a Key) with the diagram.
Example 2: The internal phone numbers of a random
selection of individuals from a large organization
are given below:
3315 3301 2205 2865 2608 2886
2527 3144 2154 2645 3703 2610
2768 3699 2345 2160 2603 2054
2302 2997 3794 3053 3001 2247
3402 2744 3040 2459 3699 3008
3062 2887 2215 2213 3310 2508
2530 2987 3699 3298 2021 3323
2329 2845 2247 3196 3412 2021
Summarize these numbers using a stem-and-leaf diagram.
Let’s Practice: The masses (in g) of a random
sample of 20 sweets were as follows:
1.13 0.72 0.91 1.44 1.03
1.39 0.88 0.99 0.73 0.91
0.98 1.21 0.79 1.14 1.19
1.08 0.94 1.06 1.11 1.01
Summarize these results using a stem-and-leaf diagram.
Bar Charts
A bar chart or bar graph is a chart that
presents grouped data with
rectangular bars with lengths
proportional to the values that they
represent.
The bar chart was introduced by William Playfair in the 18th century.
Note:
Bar charts are easier to read if the
width of the bars is different from
the width of the gaps between the
bars.
It is not necessary to show the
origin of the graph.
Example: A car salesman is interested in
the color preferences of his customers.
For one type of car his records are as
follows:
BLUE WHITE RED OTHERS
12 23 16 18
Represent these figures using a vertical bar chart.
[Figure: Bar Chart of Sales of Cars of Different Colors. Horizontal axis: Color Preference (Blue, White, Red, Others); vertical axis scaled from 0 to 25.]
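For readers without charting software, the idea that bar length is proportional to the value can be sketched in plain text. The function text_bar_chart is a hypothetical helper, and the bars are drawn horizontally for convenience rather than vertically as the example requests.

```python
def text_bar_chart(data, symbol="#"):
    """Return one line per category; the bar length equals the value."""
    label_width = max(len(label) for label in data)
    return [
        f"{label:<{label_width}} | {symbol * value} {value}"
        for label, value in data.items()
    ]


# Sales figures copied from the car salesman example.
sales = {"Blue": 12, "White": 23, "Red": 16, "Others": 18}

for line in text_bar_chart(sales):
    print(line)
```

The space after each bar acts as the gap between bars, which keeps the categories visually separate.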
Multiple Bar Charts
When data occur naturally in groups and the aim is to contrast the variations within different groups, a multiple bar chart may be used.
This consists of groups of two or more adjacent bars separated from the next group by a gap having, ideally, a different width to the bars themselves.
Example: The following data, taken from the
Monthly Bulletin of Statistics published by the
United Nations, show the 1970 and 1988
estimated populations (in millions) for five
countries.
France Mexico Nigeria Pakistan UK
1970 50 51 57 56 55
1988 55 82 104 105 57
Illustrate the data using a multiple bar chart.
[Figure: Multiple bar chart. Populations of five countries (France, UK, Mexico, Nigeria, Pakistan) in 1970 and in 1988; figures are in millions. Horizontal bars on a scale from 0 to 120; legend: 1988, 1970.]
Example 2: The Registrar's Office of La Salle College Antipolo
reports the following figures for the student
population in the years 2011, 2012, 2013, and 2014:
Courses                  2011  2012  2013  2014
Accountancy               231   289   346   388
Business Administration   451   487   521   589
Education                  98   103   110   123
Communication Arts        245   230   280   210
Psychology                150   156   144   132
HRM                       230   256   304   410
Illustrate the data using a multiple bar chart
[Figure: Multiple bar chart. Student Population (According to Their Courses) in the Years 2011, 2012, 2013, and 2014. Categories: Accountancy, Business Administration, Education, Communication Arts, Psychology, HRM; vertical axis scaled from 0 to 700; legend: 2011, 2012, 2013, 2014.]
Compound Bars for Proportion
In a compound bar chart the length of a complete bar
signifies 100% of the population.
The bar is subdivided into sections that show the relative
sizes of components of the populations.
By comparing the sizes of the subdivisions of two parallel
compound bars, differences can be seen between the
compositions of the separate populations.
Note: The populations need not be populations of living
creatures.
Example: One consequence of the dramatic growth
in population of the ‘third world’ countries is that a
high proportion of the population of these countries
is young and there are few old people. The United
Nations publication World Population Prospects
gives the following figures for 1990 populations:
France Mexico Nigeria Pakistan UK
% under 15 20.2 37.2 48.4 45.7 18.9
% 15 to 64 66.0 59.0 49.2 51.6 65.6
% 65 and over 13.8 3.8 2.4 2.7 15.5
Illustrate these figures in an appropriate diagram.
[Figure: Compound bars showing, for five countries (France, Mexico, Nigeria, Pakistan, UK), the proportions of the population in three age ranges. Scale from 0% to 100%; legend: % under 15, % 15 to 64, % 65 and over.]
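A compound bar can also be sketched in plain text: every bar has the same total length (100%), and each section's length follows its percentage. In the sketch below the function compound_bar, the 50-character width, and the letters Y, W, O (under 15, 15 to 64, 65 and over) are all illustrative choices, not part of the original material.

```python
def compound_bar(percentages, symbols, width=50):
    """Render one compound bar of exactly `width` characters.

    Section ends are placed at the rounded cumulative percentage, so the
    sections always add up to the full bar.
    """
    bar = ""
    cumulative = 0.0
    previous_end = 0
    for pct, symbol in zip(percentages, symbols):
        cumulative += pct
        end = round(cumulative / 100 * width)
        bar += symbol * (end - previous_end)
        previous_end = end
    return bar


# Columns: % under 15, % 15 to 64, % 65 and over (1990 figures from the example).
age_structure = {
    "France": (20.2, 66.0, 13.8),
    "Mexico": (37.2, 59.0, 3.8),
    "Nigeria": (48.4, 49.2, 2.4),
    "Pakistan": (45.7, 51.6, 2.7),
    "UK": (18.9, 65.6, 15.5),
}

for country, pcts in age_structure.items():
    print(f"{country:<8} {compound_bar(pcts, 'YWO')}")
```

Reading down the printed bars, the longer Y sections for Mexico, Nigeria, and Pakistan stand out against France and the UK.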
Pie Charts
Pie charts are the circular equivalent of
compound bar charts.
The areas of the portions of the pie are in
proportion to the quantities being presented.
Note: When drawn correctly the areas (and not the
radii) will be in proportion to the differing
population sizes.
Example 7: The European Community Forest Health
Report 1989 classifies trees by the extent of their
defoliation (i.e. by their loss of leaves). Trees that
are in good health have defoliation levels of
between 0% and 10%. The following data show the
proportions of conifers with various amounts of
defoliation in France and the UK.
Extent of Defoliation
0%-10% 11%-25% 26%-60% 61%-100%
France 0.750 0.176 0.068 0.006
UK 0.385 0.303 0.250 0.089
Illustrate the data using pie charts.
[Figure: Pie Charts Comparing the Amounts of Defoliation of Conifers in France and in the UK in 1989. One pie per country; sectors: 0%-10%, 11%-25%, 26%-60%, 61%-100%.]
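Within a single pie, the area of a sector is proportional to its angle, so each sector's angle can be taken as 360 degrees times its proportion. The sketch below works this out for the France row; the function sector_angles is a hypothetical helper, and dividing by the total is an added safeguard for rows whose proportions do not sum to exactly 1.

```python
def sector_angles(proportions):
    """Return the angle in degrees of each pie sector."""
    total = sum(proportions.values())
    return {label: 360 * p / total for label, p in proportions.items()}


# Proportions of conifers by extent of defoliation (France row of the example).
france = {
    "0%-10%": 0.750,
    "11%-25%": 0.176,
    "26%-60%": 0.068,
    "61%-100%": 0.006,
}

for label, angle in sector_angles(france).items():
    print(f"{label:>9}: {angle:6.2f} degrees")
```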
Guidelines in Formatting
Graphs and Charts
1. Keep it simple and avoid
flashy effects.
Present only essential
information.
Avoid using gratuitous options
in graphical software programs.
2. Title your graph or chart
clearly to convey the purpose.
The title provides the reader
with the overall message you
are conveying.
3. Specify the units of
measurement on the x-axis and
the y-axis.
(e.g. number of people, years,
etc.)
4. Label each part of the chart
or the graph.
Use a legend when there is too much
information to label directly.
Use different colors or
variations in patterns
When is it best to use these
charts and graphs?
Categorical data are grouped into
non-overlapping categories (such
as grade, race, and yes or no
responses). Bar graphs, line
graphs, and pie charts are useful
for displaying categorical data.
Lesson 2.2: Frequency Distributions and Their Graphic Presentation
Frequency Distribution
This is a tabular
arrangement of data showing
a tallying of the number of
times each score value (or
interval of score values)
occurs in a group of scores.
Frequency (f)
This is the number
of times the value
occurs in a sample.
A. Ungrouped Frequency
Distribution
In an ungrouped
frequency distribution,
each value of “x” in the
distribution represents
only one value.
Example 2.2.1: The number of people per housing unit in a certain area was
tallied:
3 0 6 7 2 4 3 6 2 6
6 3 7 3 4 7 1 7 6 9
1 9 6 0 8 2 7 3 6 2
2 9 6 3 8 3 6 1 4 6
6 3 7 7 1 4 4 6 8 2
3 6 2 6 3 8 0 6 2 6
Create a frequency distribution, and present the data in
graphical form.
Ungrouped Frequency Distribution
Number of People per Housing Unit    f        %
0                                    3    5.00%
1                                    4    6.67%
2                                    8   13.33%
3                                   10   16.67%
4                                    5    8.33%
6                                   16   26.67%
7                                    7   11.67%
8                                    4    6.67%
9                                    3    5.00%
n=60 100%
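The table above can be checked with a few lines of Python. This is only a sketch: the percentages are computed as 100 × f / n, and the variable names are illustrative.

```python
from collections import Counter

# Number of people per housing unit, copied from Example 2.2.1.
people_per_unit = [
    3, 0, 6, 7, 2, 4, 3, 6, 2, 6,
    6, 3, 7, 3, 4, 7, 1, 7, 6, 9,
    1, 9, 6, 0, 8, 2, 7, 3, 6, 2,
    2, 9, 6, 3, 8, 3, 6, 1, 4, 6,
    6, 3, 7, 7, 1, 4, 4, 6, 8, 2,
    3, 6, 2, 6, 3, 8, 0, 6, 2, 6,
]

counts = Counter(people_per_unit)
n = len(people_per_unit)

# One row per observed value: (x, frequency, percentage of n).
distribution = [(x, f, 100 * f / n) for x, f in sorted(counts.items())]

print(f"{'x':>2} {'f':>3} {'%':>7}")
for x, f, pct in distribution:
    print(f"{x:>2} {f:>3} {pct:>6.2f}%")
print(f"n={n}")
```

No housing unit in the sample has exactly 5 people, which is why the value 5 has no row.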
[Figure: Bar Chart of Number of People per Housing Unit. Horizontal axis: Number of People per Housing Unit (0 to 9); vertical axis scaled from 0 to 18.]
[Figure: Pie Chart of Number of People per Housing Unit. Sectors for 0, 1, 2, 3, 4, 6, 7, 8, 9, labeled 5%, 6%, 13%, 17%, 8%, 27%, 12%, 7%, 5%.]
Let’s Practice: The number of cellular phones
in the families of 50 students is presented below:
0 2 5 2 1 7 2 7 6 2
1 6 0 3 4 6 3 1 3 5
4 4 1 4 1 0 4 0 0 1
6 1 3 3 0 3 1 5 3 5
2 3 5 2 3 2 3 4 1 3
Construct an ungrouped frequency distribution
and present the data in graphical form.
B. Grouped Frequency Distribution
In a grouped frequency
distribution, each value
of “x” in the distribution
represents more than one
value.
To illustrate, let us use the sample of 70
scores:
78 65 112 98 87 94 76 90 93 92
67 89 102 114 93 84 79 99 100 62
101 77 68 94 96 89 93 64 75 82
105 111 62 108 93 97 66 94 115 100
76 89 99 87 73 84 88 110 63 107
66 74 82 77 99 73 63 98 101 114
70 88 94 104 82 84 96 93 89 96
Steps in Constructing a Grouped
Frequency Distribution
Step 1: Arrange the
data in ascending or
descending order.
62 70 79 88 93 98 104
62 73 82 89 93 98 105
63 73 82 89 94 99 107
63 74 82 89 94 99 108
64 75 84 89 94 99 110
65 76 84 90 94 100 111
66 76 84 92 96 100 112
66 77 87 93 96 101 114
67 77 87 93 96 101 114
68 78 88 93 97 102 115
Class
This is a grouping of
values by which data is
binned for computation
of a frequency
distribution.
Step 2:
Determine the
Range.
Range (R)
The area of variation
between upper and
lower limits on a
particular scale.
Range (R)
Range is equal to the highest
observed value minus the
lowest observed value.
Formula 2.1:
$R = hov - lov$
Step 3:
State the desired
number of classes
or class intervals.
Class Interval (CI)
This is the range of values used in
defining a class. Moreover, it defines
the number of rows desired in the
table. For uniformity, use the square
root rule:
$CI = \sqrt{n}$ (Formula 2.2)
where $n$ is the number of samples
Other "rules of thumb" in
Choosing Class Interval:
Sturges’ Rule
Rice Rule
Sturges’ Rule
Sturges' rule is to set the number
of intervals as close as possible to
1 + Log2(N), where Log2(N) is the
base 2 log of the number of
observations/samples.
This is approximately equal to 1 + 3.3 Log10(N).
Rice Rule
This rule sets the number
of intervals to twice the
cube root of the number
of observations.
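The three rules of thumb can be compared side by side with a small sketch. The function names below are illustrative, and each function returns the unrounded value so that the rounding decision stays with the reader. The sample size n = 70 matches the 70 scores used in this lesson.

```python
import math


def square_root_rule(n):
    """Formula 2.2: number of classes is the square root of n."""
    return math.sqrt(n)


def sturges_rule(n):
    """Sturges' rule: 1 + log2(n)."""
    return 1 + math.log2(n)


def rice_rule(n):
    """Rice rule: twice the cube root of n."""
    return 2 * n ** (1 / 3)


n = 70
for name, rule in [
    ("Square root", square_root_rule),
    ("Sturges", sturges_rule),
    ("Rice", rice_rule),
]:
    print(f"{name:<12} n={n}: {rule(n):.2f}")
```

For n = 70 the three rules suggest roughly 7 to 8 classes.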
Let’s Practice: Determine the Class Interval
of the following samples using the three
methods.
1. n = 1,000
2. n = 150
3. n = 80
4. n = 5,000
5. n = 18,500
Step 4:
Determine the
class size or class
width (CW).
Class Width (CW)
This is the difference
between the lower class
limit and the next lower
class limit.
Nice-to-Know:
Class Width (CW) is also sometimes
called “Bin Width”.
Your choice of bin width determines
the number of class intervals.
This decision, along with the choice
of starting point for the first
interval, affects the shape of the
histogram.
Reminder:
The class width is also the
difference between the lower
limits of two consecutive classes
or the upper limits of two
consecutive classes. It is not the
difference between the upper and
lower limits of the same class.
Reminder:
The class width should be an
odd number. This will
guarantee that the class
midpoints are integers
instead of decimals.
Formula 2.3:
$CW = \dfrac{R}{CI}$
Note:
Be careful here.
You must round up, not
off: a quotient of 3.2, for example, becomes 4, not 3.
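Steps 2 to 4 can be traced for the sample of 70 scores with the sketch below. Rounding the square-root rule to the nearest whole number is an assumption made here, since the text does not say how to round CI; the class width is rounded up, as the note requires.

```python
import math

n = 70    # number of scores in the sample
hov = 115  # highest observed value
lov = 62   # lowest observed value

R = hov - lov               # Formula 2.1: range
CI = round(math.sqrt(n))    # Formula 2.2, rounded to the nearest integer (assumption)
CW = math.ceil(R / CI)      # Formula 2.3, always rounded up

print(f"R = {R}, CI = {CI}, CW = {CW}")

# Sanity check: CI classes of width CW must cover every score from lov to hov.
assert CI * CW >= R + 1
```

With 8 classes of width 7 starting at 62, the classes run from 62-68 up to 111-117.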
Guidelines for Classes/Class Interval (CI)
1. There should be between 5 and 20 classes.
2. The class width should be an odd number.
3. The classes must be mutually exclusive.
4. The classes must be all inclusive or
exhaustive.
5. The classes must be continuous.
6. The classes must be equal in width.
Step 5:
Set up the
frequency table by
using tally chart
first.
Class Limits (CL)
These are the smallest and the largest observations in each class. They are called the lower class limits and
upper class limits, respectively.
Lower Class Limit
This is the
smallest value in
the class.
Upper Class Limit
This is the
largest value in
every class.
Modal Class
This is the class
with the highest
frequency.
Step 6:
Solve for the
Class Marks
(CM).
Class Mark (CM)
This is also called the midpoint. It
is the numerical value that is
exactly at the middle of each
class. It stands in for the
individual scores in the class interval
to which they belong.
Formula 2.4:
$CM = \dfrac{lcl + ucl}{2}$
where:
lcl is the lower class limit
ucl is the upper class limit
Class Boundaries
These are also called ‘true limits’,
‘real limits’, or ‘exact limits’.
These are numbers that do not occur
in the sample data but are halfway
between the upper limit of the class
and the lower limit of the next class.
Lower Real Limit
The lower real limit
is obtained by
subtracting 0.5 from
the lower class limit.
Upper Real Limit
The upper real limit
is obtained by adding
0.5 to the upper
class limit.
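The definitions of class mark, real limits, and class width fit together as in the following sketch. The function describe_class and its half_unit parameter are hypothetical; the default of 0.5 follows the rule stated above.

```python
def describe_class(lcl, ucl, half_unit=0.5):
    """Return the class mark, real limits, and class width of one class."""
    lower_real = lcl - half_unit
    upper_real = ucl + half_unit
    return {
        "class_mark": (lcl + ucl) / 2,            # Formula 2.4
        "lower_real_limit": lower_real,
        "upper_real_limit": upper_real,
        "class_width": upper_real - lower_real,   # distance between real limits
    }


# First class of the 70-score example: lower limit 62, upper limit 68.
for key, value in describe_class(62, 68).items():
    print(f"{key}: {value}")
```

The width comes out as 7, the same as the difference between the lower limits of two consecutive classes (69 - 62).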
Grouped Frequency Distribution or
Frequency Distribution with Class Marks
Class Intervals (Classes/Class Limits)   Frequency (f)   Class Mark (CM)
62-68                                    10              65
69-75                                    5               72
76-82                                    9               79
83-89                                    11              86
90-96                                    14              93
97-103                                   11              100
104-110                                  5               107
111-117                                  5               114
n=70
Relative Grouped Frequency Distribution
Class Intervals (Classes/Class Limits)   Frequency (f)   Class Mark (CM)   Relative Frequency (%F)
62-68                                    10              65                14.29
69-75                                    5               72                7.14
76-82                                    9               79                12.86
83-89                                    11              86                15.71
90-96                                    14              93                20.00
97-103                                   11              100               15.71
104-110                                  5               107               7.14
111-117                                  5               114               7.14
n=70                                                                       100.00
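The grouped table can be rebuilt from the raw scores with the sketch below. It assumes whole-number scores, so that the upper limit of each class is the lower limit plus the class width minus 1; the variable names are illustrative.

```python
# The 70 scores from the example, in their original order.
scores = [
    78, 65, 112, 98, 87, 94, 76, 90, 93, 92,
    67, 89, 102, 114, 93, 84, 79, 99, 100, 62,
    101, 77, 68, 94, 96, 89, 93, 64, 75, 82,
    105, 111, 62, 108, 93, 97, 66, 94, 115, 100,
    76, 89, 99, 87, 73, 84, 88, 110, 63, 107,
    66, 74, 82, 77, 99, 73, 63, 98, 101, 114,
    70, 88, 94, 104, 82, 84, 96, 93, 89, 96,
]

lowest_limit, class_width, number_of_classes = 62, 7, 8
n = len(scores)

rows = []
for i in range(number_of_classes):
    lcl = lowest_limit + i * class_width
    ucl = lcl + class_width - 1          # integer scores assumed
    f = sum(lcl <= s <= ucl for s in scores)
    class_mark = (lcl + ucl) / 2
    relative = 100 * f / n
    rows.append((lcl, ucl, f, class_mark, relative))

for lcl, ucl, f, class_mark, relative in rows:
    print(f"{lcl:>3}-{ucl:<3} f={f:>2} CM={class_mark:>5.1f} %F={relative:>5.2f}")
print(f"n={n}")
```

The modal class is the row with the largest f, here 90-96.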
Step 7:
Make the
Graphs.
Cumulative Frequency
The cumulative frequency of a class
interval is the number of
cases within that interval plus
all the cases in the intervals below
it on the scale (“less than”, $F_{\le}$) or
above it on the scale (“greater than”, $F_{\ge}$).
Meaning and Interpretation of
Cumulative Frequency
In the example, the “less than”
cumulative frequency of 24 for the
3rd class interval, 76-82,
means that 24
cases/scores/samples fall below
the upper real limit 82.5.
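Both kinds of cumulative frequency can be obtained by running totals over the class frequencies, as in this sketch. It takes the frequencies from the grouped table of the 70 scores; the variable names are illustrative.

```python
from itertools import accumulate

classes = ["62-68", "69-75", "76-82", "83-89",
           "90-96", "97-103", "104-110", "111-117"]
frequencies = [10, 5, 9, 11, 14, 11, 5, 5]

# "Less than": running total from the lowest class upward.
less_than = list(accumulate(frequencies))

# "Greater than": running total from the highest class downward.
greater_than = list(accumulate(reversed(frequencies)))[::-1]

for cls, f, lt, gt in zip(classes, frequencies, less_than, greater_than):
    print(f"{cls:<8} f={f:>2}  F<={lt:>2}  F>={gt:>2}")
```

The third row reproduces the value 24 discussed above.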
Note:
Bar charts are not
appropriate for data with
grouped frequencies for
ranges of values.
Histogram
It is a diagram using rectangles to
represent frequency.
It differs from the bar chart in that the
rectangles may have different widths, but
the key feature is that, for each rectangle:
AREA IS PROPORTIONAL TO CLASS
FREQUENCY
Note:
When all the class widths are
equal, histograms are easy to
construct, since then not only
is area ∝ frequency, but also height ∝ frequency.
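When class widths differ, the height of each rectangle has to be the frequency divided by the class width so that area stays proportional to frequency. The sketch below illustrates this with the 70-score classes, except that the last two classes (104-110 and 111-117) are merged into one wider class; that merge is an assumption made purely for illustration.

```python
# Each entry: (lower real limit, upper real limit, frequency).
classes = [
    (61.5, 68.5, 10),
    (68.5, 75.5, 5),
    (75.5, 82.5, 9),
    (82.5, 89.5, 11),
    (89.5, 96.5, 14),
    (96.5, 103.5, 11),
    (103.5, 117.5, 10),  # 104-110 and 111-117 merged (illustrative assumption)
]


def bar_heights(classes):
    """Height = frequency / class width, so that area = frequency."""
    return [f / (upper - lower) for lower, upper, f in classes]


heights = bar_heights(classes)
for (lower, upper, f), height in zip(classes, heights):
    print(f"{lower:>5}-{upper:<5} width={upper - lower:>4} f={f:>2} height={height:.3f}")
```

The first and last classes both have frequency 10, but the last is twice as wide and is therefore drawn half as tall.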
Note:
A histogram is a graphical
method for displaying the shape
of a distribution.
It is particularly useful when
there are a large number of
observations.
Note:
In a histogram, the class frequencies
are represented by bars.
The height of each bar corresponds to
its class frequency.
Histograms can also be used when the
scores are measured on a more
continuous scale such as the length of
time (in milliseconds) required to
perform a task.
Shapes of Histogram
1. Unimodal
2. Skewed to the Right
3. Skewed to the Left
4. Bimodal
5. Multimodal
6. Symmetrical or Normal or Triangular
Symmetrical or Normal or Triangular
Both sides of the
distribution are
identical.
[Figure: Symmetric, Unimodal]
Skewed
This means one tail is
stretched out longer than
the other. The direction of
the skewness is on the
side of the longer tail.
[Figures: Skewed to the Right; Skewed to the Left]
Bimodal
This is a continuous probability
distribution with two different
modes. These appear as distinct
peaks.
The two most populous classes are
separated by one or more classes.
Multimodal
A multimodal distribution is a
continuous probability
distribution with two or
more modes.
Did you notice?
The idea of the histogram is to give
a visual impression of which values
are likely to occur and which values
are less likely to occur. The
‘chunky’ outline of a histogram is
not ‘a thing of beauty’ and an
alternative exists whenever the
classes are all of equal width.
Frequency Polygon
This is a graphical device for understanding
the shapes of distributions.
Frequency polygons serve the same purpose as histograms,
but are especially helpful for comparing
sets of data.
Frequency polygons are also a good choice
for displaying cumulative frequency
distributions.
How to construct?
For each class, locate the point with x-
coordinate equal to the midpoint of the
class and with y-coordinate corresponding
to the class frequency.
Successive points are then joined to form
the polygon.
In order to obtain a closed figure, extra
classes with zero frequencies are added
at either end of the frequency
distribution.
Reminder:
As with the histogram, it is area that is
proportional to frequency. The area of
a frequency polygon equals that of the
corresponding histogram.
Since the frequency polygon is only
used with classes of equal width, class
frequencies provide a convenient
scale for the y-axis.
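The construction can be sketched for the 70-score distribution as follows. The points are the class marks paired with the class frequencies, plus one zero-frequency class at each end. The area comparison at the end is a numerical check of the reminder above, using the trapezium rule between successive points; the variable names are illustrative.

```python
class_marks = [65, 72, 79, 86, 93, 100, 107, 114]
frequencies = [10, 5, 9, 11, 14, 11, 5, 5]
class_width = 7

# Add an empty class at each end so the polygon closes on the x-axis.
points = (
    [(class_marks[0] - class_width, 0)]
    + list(zip(class_marks, frequencies))
    + [(class_marks[-1] + class_width, 0)]
)

# Area under the polygon: sum of trapezia between successive points.
polygon_area = sum(
    (x2 - x1) * (y1 + y2) / 2
    for (x1, y1), (x2, y2) in zip(points, points[1:])
)

# Area of the histogram: each rectangle is class width times frequency.
histogram_area = class_width * sum(frequencies)

print(points)
print(polygon_area, histogram_area)
```

Joining the printed points in order with straight lines gives the frequency polygon.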
Classes      Class Width (CW)  Lower Class Limits  Upper Class Limits  Class Boundaries  Lower Real Limits  Upper Real Limits  Class Marks (CM)
28-35
35-40
68-75
5-12
43-47
77-84
120-128
114-120
195-203
60-70
34.5-40.5
16.5-21.5
95.5-99.5
64.25-70.25
74.7-80.7
Classes      Class Width (CW)  Lower Class Limits  Upper Class Limits  Class Boundaries  Lower Real Limits  Upper Real Limits  Class Marks (CM)
28-35        8                 28                  35                  27.5-35.5         27.5               35.5               31.5
35-40        6                 35                  40                  34.5-40.5         34.5               40.5               37.5
68-75        8                 68                  75                  67.5-75.5         67.5               75.5               71.5
5-12         8                 5                   12                  4.5-12.5          4.5                12.5               8.5
43-47        5                 43                  47                  42.5-47.5         42.5               47.5               45
77-84        8                 77                  84                  76.5-84.5         76.5               84.5               80.5
120-128      9                 120                 128                 119.5-128.5       119.5              128.5              124
114-120      7                 114                 120                 113.5-120.5       113.5              120.5              117
195-203      9                 195                 203                 194.5-203.5       194.5              203.5              199
60-70        11                60                  70                  59.5-70.5         59.5               70.5               65
34.5-40.5    7                 34.5                40.5                34-41             34                 41                 37.5
16.5-21.5    6                 16.5                21.5                16-22             16                 22                 19
95.5-99.5    5                 95.5                99.5                95-100            95                 100                97.5
64.25-70.25  7                 64.25               70.25               63.75-70.75       63.75              70.75              67.25
74.7-80.7    7                 74.7                80.7                74.2-81.2         74.2               81.2               77.7