PRESENTATION OF DATA
Tabular presentation
Data summarization
It is the organization of data in a way that makes them easy to understand.
It is the first step of data interpretation (analysis).
Consists of the following steps:
1) Data entering.
2) Ordered array.
3) Summarization
Data entering
Generally, computers are used for data entry.
Nowadays, many software packages are available for data entry, presentation, and analysis.
Examples of statistical software:
MS Excel.
Epi-Info.
SPSS.
Stata.
Ordered array
It is the first step in the process of data organization after data entering.
An ordered array is a listing of values from the smallest value to the largest value.
It enables one to determine quickly the largest and smallest measurements.
It also makes it possible to determine roughly the proportion of people lying below and above a certain value.
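As a minimal sketch of the ordered array, the lines below sort a small set of values, read off the smallest and largest, and estimate the proportion below a cutoff. The sample values and the function names are hypothetical, chosen only for illustration.

```python
def ordered_array(values):
    """Return the values listed from smallest to largest."""
    return sorted(values)


def proportion_below(values, cutoff):
    """Rough proportion of observations lying strictly below the cutoff."""
    ordered = ordered_array(values)
    return sum(1 for v in ordered if v < cutoff) / len(ordered)


sample = [7, 3, 10, 1, 8, 5]  # hypothetical measurements
ordered = ordered_array(sample)
print("smallest:", ordered[0], "largest:", ordered[-1])
print("proportion below 6:", proportion_below(sample, 6))
```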
FREQUENCY DISTRIBUTION TABLES
A frequency distribution determines the number of observations falling into each class.
In qualitative data, we count the number of observations in each category. These counts are called frequencies, and they are also presented as relative percentages of the total number.
In quantitative data, frequencies can be obtained by grouping the data into equal intervals and counting the number of observations in each interval.
GROUPING DATA
To group a set of observations, we select a set of contiguous, non-overlapping intervals, such that each value in the set of observations can be placed in one interval only, and no single observation is missed. Each such interval is called a
CLASS INTERVAL.
NUMBER OF CLASS INTERVALS
The number of class intervals should not be too few, because important information is lost, and not too many, because the needed summarization is lost.
When there is an a priori classification for that particular kind of observation, we can follow that classification.
But when there is no such classification, we can follow Sturges' Rule:
K = 1 + 3.322 log n (logarithm to base 10)
K = number of class intervals
n = number of observations in the set
The result should not be regarded as final; modification is possible.
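The rule can be checked with a few lines of Python. This is a sketch only: the function name is hypothetical, the logarithm is taken to base 10, and rounding to the nearest whole number is one reasonable reading of "modification is possible".

```python
import math


def sturges_k(n):
    """Suggested number of class intervals: 1 + 3.322 * log10(n)."""
    return 1 + 3.322 * math.log10(n)


# n = 45 and n = 57 are the sample sizes used in the worked examples
for n in (45, 57):
    k = sturges_k(n)
    print(f"n = {n}: k = {k:.2f}, rounded to {round(k)}")
```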
WIDTH OF CLASS INTERVAL
The width of the class intervals should be the same, if possible.
W = R / K
W = width of the class interval
R = range (largest value – smallest value)
K = number of class intervals
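A short sketch of the width calculation follows. Rounding the result up to a whole number is an assumption made here so that K intervals are sure to cover the whole range; the function name is hypothetical.

```python
import math


def class_width(smallest, largest, k):
    """W = R / K, where R = largest value - smallest value."""
    return (largest - smallest) / k


# Ranges taken from the two worked examples (1 to 17 hours, 12 to 79 ounces)
for smallest, largest, k in ((1, 17, 6), (12, 79, 7)):
    w = class_width(smallest, largest, k)
    print(f"R = {largest - smallest}, K = {k}: W = {w:.2f}, use {math.ceil(w)}")
```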
RELATIVE FREQUENCY DISTRIBUTION
It determines the proportion of observations in a particular class interval relative to the total number of observations in the set.
CUMULATIVE FREQUENCY DISTRIBUTION
This is calculated by adding the number of observations in each class interval to the cumulative number of observations in the class intervals above it, starting from the second class interval onward.
CUMULATIVE RELATIVE FREQUENCY DISTRIBUTION
This is calculated by adding the relative frequency in each class interval to the cumulative relative frequency of the class intervals above it, starting also from the second class interval onward.
CUMULATIVE DISTRIBUTION
Cumulative frequency and cumulative relative frequency distributions make it easy to obtain the frequency or relative frequency within two or more contiguous class intervals.
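The sketch below builds the relative, cumulative and cumulative relative columns from a list of class frequencies. It also shows how a difference of two cumulative frequencies gives the count within several contiguous class intervals. The function names are hypothetical; the six frequencies are those of the sleep example in this transcript.

```python
from itertools import accumulate


def cumulative_table(freqs):
    """Rows of (frequency, relative %, cumulative frequency, cumulative relative %)."""
    total = sum(freqs)
    cum = list(accumulate(freqs))
    return [(f, 100 * f / total, c, 100 * c / total) for f, c in zip(freqs, cum)]


def count_between(freqs, first, last):
    """Observations in class intervals first..last (1-based, inclusive),
    obtained as a difference of two cumulative frequencies."""
    cum = [0] + list(accumulate(freqs))
    return cum[last] - cum[first - 1]


freqs = [11, 10, 13, 7, 2, 2]  # class frequencies of the sleep example
rows = cumulative_table(freqs)
for f, rel, c, cum_rel in rows:
    print(f"{f:3d} {rel:6.1f} {c:3d} {cum_rel:6.1f}")
print("intervals 2 to 4:", count_between(freqs, 2, 4))
```

Percentages computed from the cumulative counts can differ in the last digit from those obtained by adding already rounded relative frequencies.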
The following are the numbers of hours that 45 patients slept following the administration of a certain hypnotic drug:
10   7   7   1   7
 2   3  10  12  12
 5   7   8   3   4
11   3   1   5   8
 5  13   7   1   3
 4   3  17  17  10
 3   4   4   4  11
 5   7   7   8   5
 8   8   1   8  13
Construct a table showing:
Frequency
Relative frequency
Cumulative frequency
Cumulative relative frequency distribution.
Number of class intervals:
K = 1 + 3.322 log n
  = 1 + 3.322 log 45
  = 1 + 3.322 × 1.653
  = 6.49
  ≈ 6
Width of class interval:
W = R / K = (17 – 1) / 6 = 2.7 ≈ 3
CLASS INTERVAL   FREQUENCY   RELATIVE       CUMULATIVE   CUM. REL.
(hour)                       FREQUENCY %    FREQUENCY    FREQUENCY %
1-3              11          24.4           11           24.4
4-6              10          22.2           21           46.6
7-9              13          28.9           34           75.5
10-12             7          15.6           41           91.1
13-15             2           4.4           43           95.5
16-18             2           4.4           45           99.9
Total            45          99.9
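The frequency column can be reproduced by grouping the raw values into six intervals of width 3 starting at 1. In this sketch the 45 values are as read from the flattened listing in this transcript, so the digit grouping is an assumption where the transcript runs values together. The function name is hypothetical.

```python
from collections import Counter

# Hours of sleep for the 45 patients, as read from the listing (assumed grouping)
hours = [10, 7, 7, 1, 7, 2, 3, 10, 12, 12, 5, 7, 8, 3, 4,
         11, 3, 1, 5, 8, 5, 13, 7, 1, 3, 4, 3, 17, 17, 10,
         3, 4, 4, 4, 11, 5, 7, 7, 8, 5, 8, 8, 1, 8, 13]


def group(values, start, width, k):
    """Count the values falling in each of k contiguous, non-overlapping
    intervals of the given width, the first one beginning at start."""
    counts = Counter((v - start) // width for v in values)
    return [counts.get(i, 0) for i in range(k)]


freq = group(hours, start=1, width=3, k=6)
for i, f in enumerate(freq):
    lower = 1 + 3 * i
    print(f"{lower}-{lower + 2}: {f}")
```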
The following are the weights (in ounces) of malignant tumours removed from the abdomens of 57 subjects:
No.  Wt    No.  Wt    No.  Wt    No.  Wt    No.  Wt    No.  Wt
 1   68    11   22    21   12    31   16    41   36    51   28
 2   63    12   23    22   32    32   24    42   19    52   25
 3   42    13   24    23   49    33   69    43   46    53   45
 4   27    14   25    24   38    34   47    44   30    54   12
 5   30    15   44    25   42    35   23    45   43    55   57
 6   36    16   65    26   27    36   22    46   49    56   51
 7   28    17   43    27   31    37   43    47   12    57   23
 8   32    18   25    28   50    38   27    48   42
 9   79    19   74    29   38    39   49    49   28
10   27    20   51    30   21    40   28    50   31
Construct a table showing :
Frequency
Relative frequency
Cumulative frequency
Cumulative relative frequency
Number of class intervals:
K = 1 + 3.322 log n
  = 1 + 3.322 log 57
  = 1 + 3.322 × 1.76
  = 6.8 ≈ 7
Width of class interval:
W = R / K = (79 – 12) / 7 = 67 / 7 = 9.6 ≈ 10
Class       Frequency   Cum. Freq   Rel. Freq   Cum. Rel.
interval                            %           Freq %
10-19        5           5           8.77         8.77
20-29       19          24          33.33        42.10
30-39       10          34          17.54        59.64
40-49       13          47          22.81        82.45
50-59        4          51           7.02        89.47
60-69        4          55           7.02        96.49
70-79        2          57           3.51       100.00
TOTAL       57                     100.00
Tabular presentation
Presentation of data in tables so as to organize the data into a compact, concise and readily comprehensible form.
They can display the characteristics of data more efficiently than the raw data.
Types
Simple table: includes one variable (quantitative or qualitative) and the corresponding frequency.
Cross tabulation (two-dimensional table): two variables are cross-classified.
Contingency table: demonstrates the relationship between two or more variables.
Graphical and Pictorial presentation of data
The use of diagrams or pictures to display distribution or characteristics of one or more sets of data in a compact and readily comprehensible form.
They can provide a better visual appreciation of characteristics of data than tabular presentation
Graphs
A graph is a pictorial display of quantitative data using a coordinate system, where X is the horizontal axis and Y is the vertical axis.
The X-axis usually carries the independent variable (the method of classification).
The Y-axis carries the dependent variable (frequency, relative frequency, or another indicator).
Stem-and-Leaf Plot
Summarizes quantitative data.
Each data point is broken down into a “stem” and a “leaf.”
First, “stems” are aligned in a column.
Then, “leaves” are attached to the stems.
Stem-and-leaf of Shoes   N = 139   Leaf Unit = 1.0
  12   0  223334444444
  63   0  555555555555566666666677777778888888888888999999999
 (33)  1  000000000000011112222233333333444
  43   1  555555556667777888
  25   2  0000000000023
  12   2  5557
   8   3  0023
   4   3
   4   4  00
   2   4
   2   5  0
   1   5
   1   6
   1   6
   1   7
   1   7  5
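A minimal sketch of how such a display can be built is given below. It uses one line per stem rather than the two lines per stem shown in the display, and it assumes non-negative values. The sample values and the function name are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values, leaf_unit=1):
    """Return {stem: sorted list of leaves}; the leaf is the last digit
    of each value expressed in leaf units (non-negative values assumed)."""
    stems = defaultdict(list)
    for v in sorted(values):
        scaled = int(v // leaf_unit)
        stems[scaled // 10].append(scaled % 10)
    return dict(stems)


data = [12, 15, 21, 8, 33, 27, 21, 9, 14]  # hypothetical values
plot = stem_and_leaf(data)
# First the stems are aligned in a column, then the leaves are attached
for stem in range(min(plot), max(plot) + 1):
    leaves = "".join(str(d) for d in plot.get(stem, []))
    print(f"{stem:>2} | {leaves}")
```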
Histogram
Graphical display of the frequency distribution of a quantitative variable.
The values of the quantitative variable (as class intervals) are placed on the X-axis (representing the width of the rectangles), and the corresponding frequency (or relative frequency) is placed on the Y-axis (representing the height of the rectangles).
The area of each rectangle is proportional to the frequency. When the class intervals are equal, the area is proportional to the height, so the frequencies in different class intervals can be compared directly by examining the relative heights of the bars.
It is important that the class intervals be equal; otherwise the areas, not the heights, should be compared.
Only one set of data can be shown in one histogram.
Frequency Polygon
Another form of graphical presentation of the frequency distribution of a quantitative variable.
It is similar to the histogram, but instead of using rectangles to present the data, the midpoints of the tops of the rectangles are plotted and connected by straight lines.
More than one set of data can be shown on the same graph, to facilitate direct comparison.
It provides information about the underlying characteristics of the data.
The area under the frequency polygon is equal to the area under the equivalent histogram.
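The equal-area statement can be checked numerically. The sketch below assumes equal class widths and a polygon closed down to the X-axis at the midpoints of the empty intervals on either side, which is the usual condition for the areas to match. The function names are hypothetical; the frequencies are the DBP counts from the exercise at the end of this transcript.

```python
def frequency_polygon(classes, freqs):
    """classes: list of (lower, upper) limits of equal-width class intervals.
    Returns the polygon points (closed to zero at both ends) and the width."""
    mids = [(lo + hi) / 2 for lo, hi in classes]
    width = mids[1] - mids[0]
    xs = [mids[0] - width] + mids + [mids[-1] + width]
    ys = [0] + list(freqs) + [0]
    return list(zip(xs, ys)), width


def trapezoid_area(points):
    """Area under straight lines joining consecutive points."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


classes = [(65, 69), (70, 74), (75, 79), (80, 84), (85, 89), (90, 94), (95, 99)]
freqs = [3, 5, 9, 18, 13, 9, 3]
points, width = frequency_polygon(classes, freqs)
hist_area = width * sum(freqs)       # rectangles of equal width
poly_area = trapezoid_area(points)   # area under the polygon
print(hist_area, poly_area)
```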
Scatter diagram
A pair of measurements is plotted as a single point on a graph.
The value of one variable of each pair is plotted on the X axis and the value of the other variable is plotted on the Y axis
The pattern made by the plotted points indicates the relationship between the two variables, which may be linear (if the points follow a straight line) or curvilinear (if the pattern does not follow a straight line).
A scatter diagram could suggest:
No relationship: one variable changes with no change in the other variable, or the pattern is haphazard.
Linear relationship: an increase in the first variable is associated with an increase (positive) or a decrease (negative) in the second variable, and the pattern follows a straight line.
Curvilinear (positive or negative) relationship: the pattern of increase or decrease does not follow a straight line.
[Figure: scatter diagram, "Correlation of two methods of cardiac output measurements". Both axes in l/min.]
Bar chart
Used to present discrete or qualitative data.
It consists of separated bars of equal width.
The method of classification of the variable is usually placed on the X-axis, and the Y-axis usually represents the corresponding frequency or relative frequency.
It can be used to present more than one set of data simultaneously, using different colors or shades; in this case a key should be used.
Comparison is made on the basis of the height of the bars (frequency); the width of the bars carries no information.
It is important that the vertical axis start at zero; otherwise the heights of the bars are not proportional to the frequencies.
[Figure: bar chart, "Estimated Direct and Indirect Costs of Cardiovascular Diseases and Stroke, United States: 2005". Y-axis: billions of dollars, 0 to 450. Bars: Heart Disease 254.8, Coronary Heart Disease 142.1, Stroke 56.8, Hypertensive Disease 59.7, Congestive Heart Failure 27.9, Total CVD* 393.5. Source: Heart Disease and Stroke Statistics – 2005 Update.]
[Figure: bar chart, "Leading Causes of Death for All Males and Females, United States: 2002". Y-axis: deaths in thousands, 0 to 500. Males: A 434, B 289, C 69, D 61, E 34. Females: A 494, B 269, D 64, F 42, E 39. Key: A Total CVD (Preliminary); B Cancer; C Accidents; D Chronic Lower Respiratory Diseases; E Diabetes Mellitus; F Alzheimer’s Disease. Source: CDC/NCHS.]
Fig 3: Distribution of unvaccinated children below one year by governorates
[Figure: bar chart. X-axis: governorates (Baghdad, Anbar, Babylon, Wassit, Basrah, Ninevah, Missan, Qadisiya, Diyala, Kerbala, Taamem, Muthana, Thi qar, Najaf, Salah Al Din, Suleimaniya, Erbil, Duhok, Total). Y-axis: % of unvaccinated children, 0% to 35%.]
Component bar chart
It is a type of chart based on proportions.
It uses bars that are shaded or colored to show the relative contribution of each component.
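The heights of the components within each bar are simply percentages of that bar's total. A minimal sketch, using the meningitis counts from the exercise at the end of this transcript; the function and variable names are hypothetical.

```python
def component_percentages(table):
    """table: {category: {component: count}}.
    Returns the percentage contribution of each component within its bar."""
    result = {}
    for category, parts in table.items():
        total = sum(parts.values())
        result[category] = {name: 100 * n / total for name, n in parts.items()}
    return result


cases = {
    "Viral": {"Male": 168, "Female": 84},
    "Bacterial": {"Male": 84, "Female": 42},
    "TB": {"Male": 21, "Female": 21},
}
pct = component_percentages(cases)
for agent, parts in pct.items():
    print(agent, {sex: round(p, 1) for sex, p in parts.items()})
```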
Fig 9: Reasons for non-vaccination among unvaccinated children, by governorates
[Figure: component bar chart. X-axis: governorates (Baghdad, Babylon, Basrah, Missan, Diyala, Taamem, Thi qar, Salah Al Din, Erbil, Total). Y-axis: 0% to 100%. Key: % of other causes; % of the child absent; % not visited by vaccination team.]
[Figure: component bar chart, "Distribution of Hypertension Subtype in the Untreated Hypertensive Population in NHANES III by Age". X-axis: age (y): <40, 40-49, 50-59, 60-69, 70-79, 80+. Y-axis: frequency of hypertension subtypes in all untreated hypertensives (%), 0 to 100. Key: ISH (SBP ≥140 mm Hg and DBP <90 mm Hg); SDH (SBP ≥140 mm Hg and DBP ≥90 mm Hg); IDH (SBP <140 mm Hg and DBP ≥90 mm Hg). Numbers at top of bars (17%, 16%, 16%, 20%, 20%, 11%) represent the overall percentage distribution of untreated hypertension by age. Franklin et al. Hypertension 2001;37:869-874.]
[Figure: two stacked horizontal bar charts, one for Men and one for Women. X-axis: annual mortality rate per 100 000 (men: 0 to 2000; women: 0 to 1000). One bar per study population, identified by codes such as POL-WAR, LTU-KAU, UNK-GLA, FIN-NKA and CHN-BEI. Each bar is divided into CHD, Stroke, Other CVD and Non CVD.]
[Figure: bar chart, "Distribution of coronary risk factors among patients with chronic metabolic syndrome". Y-axis: relative frequency (%), 0 to 100. Categories: Hypertension, Diabetes Mellitus, Family history of ischemic Heart Di..., Smoking habit, Dyslipidemia, Obesity. Bar values as extracted: 48.8, 27.5, 53.8, 66.3, 93.1, 17.5.]
Pictograms
It uses a series of small identifying symbols to present the data. Each symbol represents a fixed number of units.
Pie chart
It is a type of chart based on proportions.
It uses wedge-shaped portions of a circle to illustrate the relative contribution of each part to the total (division of the whole into segments).
To determine the angle of each wedge, we multiply the relative frequency of each division by 360 degrees.
Start at 12 o’clock.
It is preferable to arrange the segments in order of magnitude (starting with the largest) and proceed clockwise around the chart.
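The angle calculation can be sketched as follows. The counts are the TB cases from the exercise at the end of this transcript; the function name is hypothetical, and the labels use a plain hyphen in "-ve".

```python
def wedge_angles(counts):
    """Angle of each wedge = relative frequency x 360 degrees,
    listed largest first (drawn clockwise from 12 o'clock)."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(label, 360 * n / total) for label, n in ordered]


tb_cases = {"Smear +ve PTB": 360, "Smear -ve PTB": 240, "Extra PTB": 200}
angles = wedge_angles(tb_cases)
for label, angle in angles:
    print(f"{label}: {angle:.0f} degrees")
```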
[Figure: pie chart, "Percentage Breakdown of Deaths From Cardiovascular Diseases, United States: 2002 Preliminary". Segments: Coronary Heart Disease 53%, Stroke 18%, Congestive Heart Failure 6%, High Blood Pressure 5%, Diseases of the Arteries 4%, Rheumatic Fever/Rheumatic Heart Disease 0%, Congenital Cardiovascular Defects 0%, Other 13%. Source: CDC/NCHS.]
[Figure: "Most Myocardial Infarctions Are Caused by Low-Grade Stenoses". Pooled data from 4 studies: Ambrose et al, 1988; Little et al, 1988; Nobuyoshi et al, 1991; and Giroud et al, 1992. Adapted from Falk E et al, Circulation, 1995.]
Box Plot
Summarizes quantitative data.
Vertical (or horizontal) axis represents measurement scale.
Lines in box represent the 25th percentile (“first quartile”), the 50th percentile (“median”), and the 75th percentile (“third quartile”), respectively.
Box and whisker plot
[Figure: annotated box and whisker plot. Box: lower quartile, median, upper quartile. Whiskers: extend to the smallest and the largest non-outlying values. Outlying values: o marks an outlying value, * marks an extreme outlying value.]
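A hedged sketch of the numbers behind a box plot follows. The transcript does not say how outlying values are defined, so the 1.5 × IQR fences used here are an assumption (a common convention, not necessarily the one behind the slide). Quartiles come from `statistics.quantiles` (Python 3.8+) with its default method; other software may give slightly different quartiles. The data are hypothetical.

```python
import statistics


def box_plot_summary(values):
    """Quartiles, whisker ends and outlying values for a box plot.
    Fences at 1.5 x IQR beyond the quartiles are an assumed convention."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": min(inside),    # smallest non-outlying value
        "whisker_high": max(inside),   # largest non-outlying value
        "outliers": sorted(v for v in values
                           if v < low_fence or v > high_fence),
    }


data = [1, 2, 3, 4, 5, 6, 7, 8, 30]  # hypothetical measurements
summary = box_plot_summary(data)
print(summary)
```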
[Figure: box plot, "Amount of sleep in past 24 hours of Spring 1998 Stat 250 Students". Axis: hours of sleep, 0 to 10.]
Map charts
These are used to present the geographical distribution of one or more sets of data
[Figure: map charts for Men showing change in coronary event rate, change in MONICA CHD mortality, and change in case fatality. Key: significant increase, insignificant change, significant decrease.]
Suggestions for the design and use of tables, graphs, and charts
Choose the method most effective for the data and purpose.
Point out one idea at a time.
Limit the amount of data and include one kind of data in each presentation.
Use adequate, properly located titles and labels.
Mention the source, if it is not yours.
Take care and caution in proposing conclusions.
Exercise
The following are the DBP measurements (mmHg) of 60 individuals.
Make a suitable graphical or pictorial presentation
DBP (mmHg)   No.
65-69          3
70-74          5
75-79          9
80-84         18
85-89         13
90-94          9
95-99          3
Total         60
[Figure: "DBP (mmHg) of 60 men". X-axis: DBP class intervals 65-69 to 95-99 (axis labelled "years" in the original chart). Y-axis: No., 0 to 20.]
Exercise
The following are the proportions of the commonest ten cancers in Iraq, 1995
Make a suitable graphical or pictorial presentation
Primary site            % of total CA
Breast                  14.3
Bronchus & lung         11.2
Urinary Bladder          7.4
Non-Hodgkin Lymphoma     6.2
Larynx                   5.9
Leukemia                 5.2
Brain & other CNS        4.8
Skin                     4.3
Stomach                  3.6
Hodgkin Lymphoma         3.0
[Figure: bar chart, "Commonest 10 Ca in Iraq". X-axis: CA site. Y-axis: % of total CA, 0 to 16.]
Exercise
The following is the distribution of TB cases registered in City X.
Make a suitable graphical or pictorial presentation
Type of TB       No.
Smear +ve PTB    360
Smear –ve PTB    240
Extra PTB        200
Total            800
[Figure: "Types of TB". Key: Smear +ve PTB, Smear –ve PTB, Extra PTB.]
Exercise
The following is the distribution of meningitis cases , Ibn Al-Khateeb Hospital, 1999.
Make a suitable graphical or pictorial presentation
Agent        Male No.   Female No.   Total
Viral        168         84          252
Bacterial     84         42          126
TB            21         21           42
Total        273        147          420
[Figure: "Meningitis cases by type and sex". X-axis: type (Viral, Bacterial, TB). Y-axis: %, 0% to 100%. Two series.]