biostatistics:descriptive statistics
8/6/2019 Biostatistics:Descriptive Statistics
http://slidepdf.com/reader/full/biostatisticsdescriptive-statistics 1/146
Descriptive Statistics
Descriptive statistics is a set of techniques used to organize, summarize, categorize, classify, manipulate, and present a set of data in a concise way so that it can be interpreted easily.
Raw data are measurements or variables that have not been organized, summarized, or otherwise manipulated.
Objectives of data organization, summarization, and manipulation:
- To see the similarity and dissimilarity of objects.
- To see the important features of the collected data.
- To prepare data for summarization and analysis.
8/12/2010 1
Victory College, Faculty of Health Science, Department of
Public Health Officer, Biostatistics Lecture Note Prepared By Minlikalew D. (B.Sc.)
Descriptive statistics include:
- Frequency distributions.
- Tables.
- Graphs.
- Numerical summary measures:
  - Measures of central tendency.
  - Measures of variability.
Before organizing, summarizing, categorizing/classifying, presenting, and analyzing data, we need to understand:
- The concept of data.
- The concept of a variable.
- The concept of measurement and measurement scales.
Data
Data are facts or information that help in reasoning.
Data are a collection of observations on one or more variables.
Data are the raw material of statistics, i.e. information collected from the source.
There are different criteria for classifying data into different groups.
A. Based on the nature of the variable on which the data are collected:
I. Qualitative/categorical/non-numerical data: data collected on a qualitative variable and obtained by simply noting the possession of a certain attribute or characteristic.
Example:
- Breastfeeding status (exclusive, partial, and none).
- Whether the mother was employed (yes, no).
- Marital status (single, married, divorced, widowed).
Nominal data are categorical data where the order of the categories is arbitrary. A good example is race/ethnicity, coded 1=White, 2=Hispanic, 3=American Indian, 4=Black, 5=Other. Note that the order of the categories is arbitrary. Certain statistical concepts are meaningless for nominal data; for example, it would be silly to ask for the mean and standard deviation of race/ethnicity.
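To make the point about nominal codes concrete, the short Python sketch below uses an invented sample of race/ethnicity codes (both the data and the alternative numbering `recode` are hypothetical). Relabelling the categories changes the "mean code", while the frequencies and the most common category (the mode) stay the same.

```python
from collections import Counter

# Hypothetical race/ethnicity codes (1=White, 2=Hispanic, 3=American Indian,
# 4=Black, 5=Other); these observations are invented for illustration.
codes = [1, 1, 2, 4, 4, 4, 5, 3, 1, 4]

# The same categories under a different, equally arbitrary numbering.
recode = {1: 5, 2: 4, 3: 3, 4: 2, 5: 1}
recoded = [recode[c] for c in codes]

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

# The "mean code" depends only on which numbers were used as labels.
print(mean(codes), mean(recoded))

# The mode identifies the same category under either numbering.
mode_code = Counter(codes).most_common(1)[0][0]
mode_recoded = Counter(recoded).most_common(1)[0][0]
print(mode_code, recode[mode_code] == mode_recoded)
```

Frequencies, percentages, and the mode are the summaries that survive any relabelling, which is why they suit nominal data.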
Ordinal data are categorical data where there is a logical ordering to the categories. A good example is the Likert scale that you see on many surveys: 1=Strongly disagree; 2=Disagree; 3=Neutral; 4=Agree; 5=Strongly agree. While computing a median is easily justified for ordinal data, some statisticians have reservations about computing a mean for ordinal data.
II. Quantitative/numerical data: data collected on quantitative variables and obtained by counting or measurement.
Quantitative/numerical data consist of both continuous and discrete data types.
a. Continuous data: consist of both interval and ratio data.
Interval data are continuous data where differences are interpretable, but where there is no "natural" zero. A good example is temperature in degrees Fahrenheit. Ratios are meaningless for interval data. You cannot say, for example, that one day is twice as hot as another day.
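A quick check of why ratios fail on an interval scale: the sketch below converts Fahrenheit temperatures to Celsius with the standard formula C = (F - 32) × 5/9. The temperatures are chosen for illustration only. The ratio changes with the unit because the zero point is arbitrary, but equal differences stay equal.

```python
import math

def f_to_c(f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

hot_f, cool_f = 88.0, 44.0

# The ratio depends on where the arbitrary zero sits, so it changes with the unit.
ratio_f = hot_f / cool_f                  # 2.0 on the Fahrenheit scale
ratio_c = f_to_c(hot_f) / f_to_c(cool_f)  # about 4.67 on the Celsius scale
print(ratio_f, round(ratio_c, 2))

# Differences stay interpretable: equal Fahrenheit gaps give equal Celsius gaps.
gap_1 = f_to_c(67) - f_to_c(66)
gap_2 = f_to_c(77) - f_to_c(76)
print(math.isclose(gap_1, gap_2))
```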
Ratio data are continuous data where both differences and ratios are interpretable. Ratio data have a natural zero. A good example is birth weight in kg. The distinction between interval and ratio data is subtle, but fortunately it is often not important.
Certain specialized statistics, such as the geometric mean and the coefficient of variation, can only be applied to ratio data.
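As a hedged illustration of why the geometric mean and the coefficient of variation (CV) belong to ratio data, the sketch below uses invented birth weights and temperatures together with Python's standard `statistics` module (`geometric_mean` needs Python 3.8 or later). The CV is taken here as the sample standard deviation divided by the mean, times 100, which is one common definition. Changing kilograms to grams leaves the CV unchanged, whereas moving the arbitrary zero from °C to °F changes it.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation expressed as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Hypothetical birth weights in kg (invented for illustration).
weights_kg = [2.8, 3.1, 3.4, 2.5, 3.9, 3.3]
weights_g = [w * 1000 for w in weights_kg]

# The geometric mean needs strictly positive values on a ratio scale.
gm = statistics.geometric_mean(weights_kg)
print(round(gm, 3))

# Ratio data: changing the unit (kg -> g) does not change the CV.
print(coefficient_of_variation(weights_kg), coefficient_of_variation(weights_g))

# Interval data: moving the arbitrary zero (°C -> °F) changes the CV.
temps_c = [36.5, 37.0, 38.2, 39.1]
temps_f = [c * 9 / 5 + 32 for c in temps_c]
print(coefficient_of_variation(temps_c), coefficient_of_variation(temps_f))
```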
b. Discrete data: quantitative data collected on a discrete variable.
B. Based on the source from which the data are collected:
I. Primary data: data collected by the investigator himself or herself. Such data are original in character and are mostly generated by a census or sample survey conducted by individuals or research institutions.
II. Secondary data: data collected from secondary sources, for example journals, reports, government publications, and publications of professionals and research organizations.
Sources of data
There are different sources of data on health and health-related conditions. These are:
- Health surveys.
- Vital statistics.
- Health service records.
- Census.
Systems for collecting data
1. Regular system: registration of events as they become available.
2. Ad hoc system: a form of survey to collect information that is not available on a regular basis.
Data collection techniques/methods
There are different methods of data collection. To select the appropriate method, we need to consider the following points.
Selection of data collection methods is based on:
- The nature of the investigation, i.e. whether the study is qualitative or quantitative.
- The resources available and the relevance of the information.
- The acceptability and accuracy of the method.
- The research interest to focus on and cover.
- Familiarity with the procedure.
- The characteristics of the study population, which are among the influencing factors.
Based on the above selection points, the methods are:
For qualitative data:
1. Focus group discussion.
2. In-depth interview (unstructured/semi-structured).
3. Observation (participant/non-participant).
4. Case studies.
5. Rapid appraisal techniques.
6. Nominal group techniques.
7. Delphi techniques and life histories.
For quantitative data:
1. Face-to-face interview.
2. Self-administered questionnaire.
3. Postal or mail method and telephone interview.
4. Measuring height, length, weight, BMI, MUAC, chest circumference, head circumference, blood pressure, Hgb, Hct.
5. Using available information (record review), e.g. mortality reports, morbidity reports.
Decision-makers need information that is:
– Relevant,
– Timely,
– Accurate and
– Usable.
The following table compares different data collection techniques in terms of advantages and disadvantages.
Summary of each data collection technique (advantages and disadvantages)

Using available information
Advantages:
• Is inexpensive, because the data are already there.
• Permits examination of trends over the past.
Disadvantages:
• Data are not always easily accessible.
• Ethical issues concerning confidentiality may arise.
• Information may be imprecise or incomplete.
• Data collection may not be standardized.
Observing
Advantages:
• Gives more detailed and context-related information.
• Permits collection of information on facts not mentioned in the questionnaire.
Disadvantages:
• Ethical issues concerning confidentiality or privacy may arise.
• Observer bias may occur (the observer may only notice what interests him or her).
• The presence of the data collector can influence the situation observed.
• Thorough training of research assistants is required.

Interviewing
Advantages:
• Is suitable for use with illiterate respondents.
• Permits clarification of questions.
• Has a higher response rate than written questionnaires.
Disadvantages:
• The presence of the interviewer can influence responses.
• Reports of events may be less complete than information gained through observation.
Small-scale flexible interview
Advantages:
• Permits collection of in-depth information and exploration of spontaneous remarks by respondents.
Disadvantages:
• The interviewer may inadvertently influence the respondents.
• Open-ended data are difficult to analyze.

Large-scale fixed interview
Advantages:
• Is easy to analyze.
Disadvantages:
• Important information may be missed, because spontaneous remarks by respondents are usually not recorded or explored.
Administering written questionnaires
Advantages:
• Less expensive.
• Permits anonymity and may result in more honest responses.
• Does not require research assistants.
• Eliminates bias due to phrasing questions differently with different respondents.
Disadvantages:
• Cannot be used with illiterate respondents.
• There is often a low rate of response.
• Questions may be misunderstood.
Variable
A variable is a characteristic that takes different values in different persons, places, or things: any aspect of an individual or object that is measured (e.g., BP) or recorded (e.g., age, sex) and that can take different values. There may be one variable in a study or many, e.g. in a study of the treatment outcome of TB.
Variables can be broadly classified into:
A. Categorical (or qualitative).
B. Quantitative (or numerical).
A. Categorical (or Qualitative)
Variables that cannot be measured numerically but can be divided into different categories are called qualitative or categorical variables.
Such a variable cannot assume a numerical value but can be classified into non-numerical categories according to a set of rules.
The notion of magnitude is absent or implicit.
A variable that has only two categories is called binary or dichotomous, e.g. sex. A variable with more than two categories is called polytomous, e.g. occupational status.
It can be:
1. Nominal: variables with no inherent order or ranking sequence, e.g. numbers used as names (group 1, group 2...), gender, etc.
2. Ordinal: variables with an ordered series, e.g. "greatly dislike, moderately dislike, indifferent, moderately like, greatly like". Numbers assigned to such variables indicate rank/order only; the "distance" between the numbers has no meaning.
B. Quantitative (or numerical variables)
A variable that can assume a numerical value and is measured numerically. Quantitative data measure either how much or how many of something, i.e. a set of observations where any single observation is a number that represents an amount or a count.
A quantitative variable has the notion of magnitude. It can be:
1. Discrete
It can only have a limited number of discrete values (usually whole numbers).
It is characterized by gaps or interruptions in the values.
The values are not just labels, but actual measurable quantities.
Example:
The number of episodes of diarrhoea a child has had in a year. You can't have 12.5 episodes of diarrhoea.
The number of accidents.
The number of students in this class.
The number of cars.
Etc.
2. Continuous
It can have an infinite number of possible values in any given interval.
It does not possess gaps or interruptions.
Example:
Weight.
Income.
Age.
Time, etc.
3. Interval
Interval variables do not have a true zero, e.g. 88 degrees is not necessarily double the temperature of 44 degrees.
They are equally spaced, e.g. temperature: the difference between 66 and 67 degrees is taken to be the same as the difference between 76 and 77 degrees.
4. Ratio variables
Variables spaced at equal intervals with a true zero point, e.g. age.
5. Independent variable
A hypothesized cause of, or influence on, a dependent variable. This might be a variable that you control, like a treatment, or a variable not under your control, like an exposure.
6. Dependent variable
The variable that you believe might be influenced or modified by some treatment or exposure, or the variable you are trying to predict. Sometimes the dependent variable is called the outcome variable.
Whether a variable is dependent or independent depends on the context of the study. For example, a variable that is dependent in one study may be independent in another.
Measurement and Measurement Scale
Measurement: the assignment of numbers or names to objects or events according to a set of rules. Not all measurements are the same.
Measurement scale: the ways in which variables/numbers are defined and categorized. It refers to the degree of precision with which a characteristic is measured. Depending on the nature of the variable and the set of rules used to measure it, there are four scales of measurement.
Each scale of measurement has certain properties, which in turn determine which statistical analyses are appropriate.
1. Nominal scale
The simplest and lowest (weakest) level of measurement, in which the values fall into unordered categories or classes.
It uses names, labels, or symbols to assign each measurement; any numbers used have no numerical meaning.
It always measures qualitative data.
Characteristics to be fulfilled:
- The categories should be mutually exclusive.
- The categories should be exhaustive.
- The names or symbols can be interchanged without altering essential information.
Example: blood type, sex, race, marital status, eye color, type of tar, university attended, occupation, residence, etc.
2. Ordinal scale
It assigns each measurement to one of a limited number of categories that are ranked in order.
The differences among categories are not necessarily equal and often not even measurable. Although non-numerical, the categories can be considered to have a natural ordering.
It is the next higher level of measurement.
It is usually used for qualitative data.
It is subjective in nature.
Many health care variables are ordinal in nature.
Example: patient status, cancer stages, social class, pain level, dehydration status, Glasgow coma scale, etc.
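Because ordinal categories have an order but no fixed distances, the median is a natural summary. A minimal sketch, assuming a hypothetical four-level pain scale and invented responses: each category is replaced by its rank, the median rank is found, and that rank is mapped back to a category. `statistics.median_low` is used so the result is always an observed rank rather than the average of two ranks.

```python
import statistics

# Ordered categories of a hypothetical pain-level item.
levels = ["none", "mild", "moderate", "severe"]
rank = {name: i for i, name in enumerate(levels)}

# Hypothetical responses from seven patients (invented data).
responses = ["mild", "severe", "moderate", "mild", "none", "moderate", "moderate"]

# The median uses only the ordering of the categories, not distances between them.
median_rank = statistics.median_low(rank[r] for r in responses)
print(levels[median_rank])
```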
3. Interval scale
Measured on a continuum, and the differences between any two numbers on the scale are of known size. It assigns each measurement to one of an unlimited number of categories.
It has no true zero point: "0" is arbitrarily chosen and does not reflect the absence of temperature.
The distance between each value is equal and fixed, but the attribute is not equal. It is used for truly quantitative data.
Examples: body temperature in °F or °C, directions in degrees, time of day, IQ.
4. Ratio scale
Measurement begins at a true zero point, and the scale has equal spacing. It is the highest level of measurement.
It is used for purely quantitative data.
Examples: height, weight, BP, etc.
[Figure: Degree of precision in measuring, from lowest to highest: Nominal, Ordinal, Interval, Ratio]
Summary of each measurement scale
Nominal: People or objects with the same scale value are the same on some attribute. The values of the scale have no "numeric" meaning in the way that you usually think about numbers.
Ordinal: People or objects with a higher scale value have more of some attribute. The intervals between adjacent scale values are indeterminate. Scale assignment is by the property of "greater than," "equal to," or "less than."
Interval: Intervals between adjacent scale values are equal with respect to the attribute being measured, e.g. the difference between 8 and 9 is the same as the difference between 76 and 77.
Ratio: There is a rational zero point for the scale. Ratios are equivalent, e.g. the ratio of 2 to 1 is the same as the ratio of 8 to 4.
Methods of Data Organization and Presentation
In most cases, useful information is not immediately evident from a mass of unsorted data, which by itself does not convey information.
Data organization: condensing information in a way that shows patterns of variation clearly.
Precise methods of analysis can be decided upon only when the characteristics of the data are understood. For this primary objective, different techniques of data organization are used.
Objectives of data organization
- To see the similarity and dissimilarity of objects.
- To see the important features of the collected data.
- To prepare data for summarization and analysis.
The methods of organizing and presenting (describing) data differ depending on the type of data/variable being organized and presented, i.e. whether it is numerical or categorical.
1. Describing categorical variables. It includes:
A. Tables of frequency distributions
– Frequency
– Relative frequency
– Cumulative frequencies
B. Charts
– Bar charts
– Pie charts
Frequency Distributions
• Frequency: the number of times each observation (for individual data) or each class interval (for grouped data) occurs.
• Frequency distribution: an arrangement of data in a table that shows the possible values of the data with the corresponding frequency or class frequency. A simple and effective way of summarizing categorical data is to construct a frequency distribution table.
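Building a frequency distribution table amounts to counting how often each category occurs. A minimal Python sketch, assuming a hypothetical list of marital-status observations: `collections.Counter` does the counting, and `frequency_table` is an illustrative helper name, not part of any library.

```python
from collections import Counter

# Hypothetical raw observations of a categorical variable (invented data).
marital_status = ["married", "single", "married", "widowed", "married",
                  "divorced", "single", "married", "single", "married"]

def frequency_table(observations):
    """Return (category, frequency) pairs, most frequent first."""
    return Counter(observations).most_common()

# Print the table with a total row, as in a frequency distribution table.
for category, freq in frequency_table(marital_status):
    print(f"{category:<10}{freq:>3}")
print(f"{'Total':<10}{len(marital_status):>3}")
```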
Advantages:
- Data can be more easily appreciated.
- Quick comparisons can be drawn.
- Data can be arranged in the form of a table, or in one of a number of different graphical forms.
Types of frequency distribution
I. Simple frequency distribution: a table representing the frequency versus the observations.
In this table, the number of days of hospital stay represents the variable under consideration, the number of persons represents the frequency, and the whole distribution is called a simple frequency distribution.
Hospital stay (days) of 50 patients in a medical ward (hypothetical data)
Hospital stay, days (xi)    Frequency, number of patients (fi)
0    5
1    10
2    2
4    23
5    5
7    5
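The relative and cumulative frequencies listed earlier as parts of a frequency table can be added to the hospital-stay table with a few lines of Python. The sketch below takes the values and frequencies exactly as given in the table; relative frequency is f divided by the total n, and cumulative frequency is the running sum of f.

```python
# Values (days of stay) and frequencies taken from the hospital-stay table.
stay_days = [0, 1, 2, 4, 5, 7]
frequency = [5, 10, 2, 23, 5, 5]

n = sum(frequency)  # total number of patients
cumulative = 0
rows = []
for x, f in zip(stay_days, frequency):
    cumulative += f
    # Each row: value, frequency, relative frequency, cumulative frequency.
    rows.append((x, f, f / n, cumulative))

for x, f, rel, cum in rows:
    print(f"{x:>3} {f:>4} {rel:>6.2f} {cum:>4}")
```

The last cumulative frequency equals n, and the relative frequencies add up to 1, which is a handy check on a hand-built table.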
i. Array (Ordered Array)
It is a serial arrangement of numerical data in ascending or descending order.
It is the first step in organizing data.
It is appropriate when the number of observations is greater than 6 and less than 20.
It makes it possible to see quickly the smallest and largest measurements and the range of the observations.
It is the simplest method.
Example: Raw data: 5, 6, 4, 9, 11, 0, 3, 8.
When these data are put in an ordered array: 0, 3, 4, 5, 6, 8, 9, 11.
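The ordered array for this example, together with the smallest value, largest value, and range it reveals, can be checked with Python's built-in `sorted` (a sketch; the variable names are illustrative).

```python
# Raw data from the example above.
raw = [5, 6, 4, 9, 11, 0, 3, 8]

ordered = sorted(raw)                    # ascending ordered array
descending = sorted(raw, reverse=True)   # descending ordered array

# The ends of the ordered array give the extremes and the range directly.
smallest, largest = ordered[0], ordered[-1]
data_range = largest - smallest

print(ordered)
print(smallest, largest, data_range)
```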
ii. Categorical distribution
Non-numerical information can also be represented in a frequency distribution.
Example: future infant-feeding plans of HIV-positive mothers attending the ANC unit.
E.g. Qualitative variables
Mother's plan    Number of mothers
Exclusive breastfeeding    100
Replacement feeding    50
Mixed feeding    30
Nursery    50
Total    230
II. Grouped Frequency Distribution
It is a way of representing large sets of data in class intervals.
STEPS IN CONSTRUCTION OF A GROUPED FREQUENCY DISTRIBUTION
1. Choosing the classes (first put the data in an ordered array).
2. Sorting (or tallying) the data into these classes.
3. Counting the number of items in each class.
4. Displaying the results in the form of a chart or table.
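The four steps can be sketched in Python as follows. The ages are invented, and the class limits (width 10, starting at 15, integer limits such as 15-24) are an assumption made for illustration. Each value is tallied into exactly one class, so the classes are mutually exclusive and together cover all the data.

```python
# Hypothetical ages (years) of 20 patients; invented for illustration.
ages = [23, 35, 41, 19, 52, 38, 27, 45, 33, 29,
        61, 48, 36, 22, 57, 31, 44, 39, 26, 50]

# Step 1: put the data in an ordered array and choose the classes
# (assumed here: lower limits 15, 25, ..., 55 with width 10).
ordered = sorted(ages)
lower_limits = list(range(15, 65, 10))

# Steps 2-3: tally each value into its class and count the items per class.
counts = {low: 0 for low in lower_limits}
for age in ordered:
    for low in lower_limits:
        if low <= age <= low + 9:  # integer class limits low .. low + 9
            counts[low] += 1
            break

# Step 4: display the grouped frequency distribution as a table.
for low in lower_limits:
    print(f"{low}-{low + 9}: {counts[low]}")
```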
1. Choosing the classes
When data consisting of a large number of observations are divided into groups that have defined upper and lower limits, each group is called a class.
The size of the class is called the class interval.
Choosing a suitable classification involves:
a. Determining the appropriate number of classes/class intervals.
The number of classes/class intervals is determined by:
I. Non-statistical/convenience method: choose not fewer than 6 and not more than 20 classes; the average is 15. Fewer than 6 classes summarizes the data too much and causes loss of information, while more than 20 classes does not meet the objective of data organization. The exact number used in a given situation depends mainly on the number of measurements or observations to be grouped.
II. Statistical method: choose the number of classes using Sturges's formula:
$$K = 1 + 3.322\log_{10} n$$
where K = number of class intervals and n = number of observations.
Example: The sample size is 275. How many class intervals are needed?
K = 1 + 3.322(log 275)
K = 1 + 3.322(2.439) = 9.10 ≈ 9
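Sturges's formula can be checked with a few lines of Python. This is a minimal sketch. The function name `sturges_classes` is hypothetical, and rounding K to the nearest whole number is an assumption (some texts round up instead).

```python
import math


def sturges_classes(n):
    """Suggested number of classes K = 1 + 3.322 * log10(n), rounded to a whole number."""
    if n < 1:
        raise ValueError("n must be a positive number of observations")
    return round(1 + 3.322 * math.log10(n))


print(sturges_classes(275))  # 1 + 3.322 * 2.439 = 9.10 -> 9
```

For n = 275 it gives 9, which matches the hand calculation. For the 40 leisure-time observations in the next example it gives 6.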
Note:
Sturges's rule should not be regarded as final. It should be treated as a guide only, and the number of classes it specifies may be increased or decreased for convenient or clear presentation.
Classes should be mutually exclusive, i.e., they must not overlap.
We must make sure that the smallest and largest values fall within the classification and that no value can fall into a gap between classes.
b. Determining the class width.
The class width, denoted by W, is equal for each class:
$$W = \frac{R}{K} = \frac{X_{\max} - X_{\min}}{K}$$
where W = width of the class, R = range, Xmax = the largest value in the observations, Xmin = the smallest value in the observations, and K = the number of classes.
Example:
– Leisure time (hours) per week for 40 college students:
23 24 18 14 20 36 24 26 23 21 16 15 19 20 22 14 13 10 19 27 29 22 38 28 34 32 23 19 21 31 16 28 19 18 12 27 15 21 25 16
K = 1 + 3.322(log 40) = 1 + 3.322(1.602) = 6.32 ≈ 6
Maximum value = 38, Minimum value = 10
Width = (38 - 10)/6 = 4.67 ≈ 5
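The Python sketch below traces the four construction steps on the leisure-time data. It follows the worked numbers above (K = 6, W = 5). Three choices are assumptions made for this illustration, not rules stated in these notes: starting the first class at the smallest value (10), rounding W up, and using whole-number class limits.

```python
import math

# Leisure time (hours per week) of 40 college students, as listed above
hours = [23, 24, 18, 14, 20, 36, 24, 26, 23, 21, 16, 15, 19, 20, 22, 14, 13, 10, 19, 27,
         29, 22, 38, 28, 34, 32, 23, 19, 21, 31, 16, 28, 19, 18, 12, 27, 15, 21, 25, 16]

n = len(hours)                                 # 40 observations
k = round(1 + 3.322 * math.log10(n))           # Sturges: 6.32 -> 6 classes
w = math.ceil((max(hours) - min(hours)) / k)   # (38 - 10) / 6 = 4.67 -> 5

# Step 1: choose classes, starting at the smallest value: 10-14, 15-19, ..., 35-39
classes = [(min(hours) + i * w, min(hours) + (i + 1) * w - 1) for i in range(k)]
# The largest value must fall inside the last class (no value left out)
assert classes[-1][1] >= max(hours)

# Steps 2-3: sort each value into its class and count the items per class
freq = [sum(lo <= x <= hi for x in hours) for lo, hi in classes]

# Step 4: display the result as a table
for (lo, hi), f in zip(classes, freq):
    print(f"{lo:>2}-{hi:<2}  {f:>3}")
print(f"Total  {sum(freq):>3}")
```

It prints frequencies of 5, 11, 12, 7, 3 and 2 for the classes 10-14 through 35-39. These add up to n = 40.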
c. Determining the true limits (class boundaries).
Class limits: the smallest and largest values that can go into a class are regarded as its limits. They are the lower and upper class limits.
True limits (class boundaries) are determined mathematically so that the intervals of a continuous variable are continuous in both directions and no gap exists between classes. The true limits are what the tabulated limits would correspond to if one could measure exactly.
True limits (class boundaries) are used to close the gaps between adjacent class intervals.
They are obtained by subtracting 0.5 from the lower class limit and adding 0.5 to the upper class limit (for data recorded in whole units). This is a simple convention.
Each class therefore has a lower and an upper true limit.
d. Determining the class mark.
The class mark, denoted by Xc, is the midpoint of each class. The formula is:
$$X_c = \frac{LTL + UTL}{2}$$
where Xc = class mark, UTL = upper true limit, and LTL = lower true limit.
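A small Python sketch of true limits and class marks follows. The helper names are hypothetical. The classes (width 5, starting at 10) are assumed for illustration and match the K = 6 and W = 5 found for the leisure-time data. The `half_unit` parameter defaults to 0.5 for whole-number data.

```python
def true_limits(lower, upper, half_unit=0.5):
    """Return (LTL, UTL): subtract half a unit from the lower limit, add it to the upper."""
    return lower - half_unit, upper + half_unit


def class_mark(ltl, utl):
    """Class mark Xc = (LTL + UTL) / 2, the midpoint of the class."""
    return (ltl + utl) / 2


# Assumed classes of width 5 for whole-number data (e.g. hours: 10-14, 15-19, ...)
classes = [(10, 14), (15, 19), (20, 24), (25, 29), (30, 34), (35, 39)]
for lower, upper in classes:
    ltl, utl = true_limits(lower, upper)
    print(f"{lower}-{upper}: true limits {ltl}-{utl}, Xc = {class_mark(ltl, utl)}")
```

Each upper true limit equals the next lower true limit (14.5, 19.5, ...), so no gaps remain. The class marks come out as 12, 17, 22, 27, 32 and 37.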
2. Sorting (or tallying) the data into these classes.
Tally marks are small vertical bars used in a frequency table to represent the number of times a particular value has appeared in the collected data. If a value in a particular class has occurred four times, we put four tally marks (////). For the fifth occurrence we draw a cross tally mark through the four to make a block of five. For the sixth occurrence we put another tally mark after leaving a space. If we use only continuous tally bars, like (//////), counting may become confusing and lead to mistakes.
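The grouping-by-five rule can be written as a short Python function. This is a sketch with a hypothetical name. Plain text cannot draw a stroke across four bars, so a completed block is written here as `||||/`; that rendering is an assumption.

```python
def tally(count):
    """Tally marks for a count, grouped in blocks of five.

    A completed block (four bars crossed by a fifth stroke) is written
    here as '||||/' because plain text cannot draw the crossing stroke.
    """
    blocks, rest = divmod(count, 5)
    groups = ["||||/"] * blocks       # one group per completed block of five
    if rest:
        groups.append("|" * rest)     # remaining 1-4 marks, after a space
    return " ".join(groups)


print(tally(4))   # ||||
print(tally(12))  # ||||/ ||||/ ||
```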
3. Counting the number of items in each class.
Relative frequency is the frequency of each class interval (fi) divided by the total frequency (n). For grouped data,
$$n = \sum f_i$$
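The Python sketch below shows the relative frequency calculation. The frequencies 5, 11, 12, 7, 3 and 2 come from sorting the 40 leisure-time values into classes of width 5 starting at 10. They are used here only as an illustration, and the function name is hypothetical.

```python
def relative_frequencies(freq):
    """Relative frequency of each class: f_i / n, where n = sum of all f_i."""
    n = sum(freq)
    return [f / n for f in freq]


# Frequencies of the six leisure-time classes (10-14, 15-19, ..., 35-39)
freq = [5, 11, 12, 7, 3, 2]
rel = relative_frequencies(freq)
print([round(r, 3) for r in rel])
# The relative frequencies always add up to 1 (up to floating-point rounding)
```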
Cumulative frequencies are obtained when the frequencies of two or more classes are added up. They help to find the total number of items whose values are less than or greater than some value. They can be:
- Less than cumulative frequency distribution: the cumulation starts from the lowest value of the variable and proceeds to the highest. This is the most common one.
- More than cumulative frequency distribution: the cumulation proceeds from the highest value to the lowest.
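Both kinds of cumulative frequency can be computed with `itertools.accumulate` from Python's standard library. The sketch reuses the illustrative leisure-time class frequencies (5, 11, 12, 7, 3, 2), listed from the lowest class to the highest.

```python
from itertools import accumulate

freq = [5, 11, 12, 7, 3, 2]  # classes 10-14, ..., 35-39 (lowest class first)

# Less than: running total from the lowest class upward
less_than = list(accumulate(freq))
# More than: running total from the highest class downward, shown in class order
more_than = list(accumulate(reversed(freq)))[::-1]

print(less_than)  # [5, 16, 28, 35, 38, 40]
print(more_than)  # [40, 35, 24, 12, 5, 2]
```

The last less-than value and the first more-than value both equal n = 40.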
Cumulative relative frequency: it is computed by adding successive relative frequencies. It can also be calculated by dividing the cumulative frequency (fc) by the total frequency (n), i.e., frc = fc/n for each class.
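The two ways of obtaining cumulative relative frequency should agree. The Python sketch below checks this on the same illustrative leisure-time frequencies.

```python
from itertools import accumulate

freq = [5, 11, 12, 7, 3, 2]   # leisure-time class frequencies, lowest class first
n = sum(freq)                 # total frequency, 40

# Method 1: add successive relative frequencies
crf_from_rf = list(accumulate(f / n for f in freq))
# Method 2: divide each (less than) cumulative frequency by n, frc = fc / n
crf_from_fc = [fc / n for fc in accumulate(freq)]

print([round(x, 3) for x in crf_from_fc])  # [0.125, 0.4, 0.7, 0.875, 0.95, 1.0]
```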
Exercise: Construct a grouped frequency distribution for the following data: ages of patients (years) (n = 60) in a diabetic clinic in Addis Ababa, January 2000.
19, 82, 98, 78, 30, 26, 32, 66, 87, 81, 40, 48, 70, 61, 69, 58, 60, 53, 28, 54, 47, 40, 80, 56, 36, 53, 65, 28, 90, 95, 45, 32, 34, 36, 20, 62, 51, 20, 17, 26, 70, 81, 39, 63, 33, 66, 61, 77, 41, 55, 76, 70, 42, 67, 22, 75, 24, 50, 50, 44.
Based on the above data, construct a table that contains:
1. Class interval (class).
2. Class boundary.
3. Class mark.
4. Tally mark.
5. Frequency.
6. Relative frequency.
a. Less than relative frequency.
b. Greater than relative frequency.
7. Cumulative relative frequency.
a. Less than crf.
b. Greater than crf.
Statistical Tables
A statistical table is an orderly and systematic presentation of numerical data in rows and columns.
o Rows are horizontal arrangements of data, and a row heading is termed a stub.
o Columns are vertical arrangements of data, and a column heading is called a caption.
Both simple and grouped frequency distributions can be put in statistical tables.
Almost any quantitative information can be organized into a table. Tables are useful for demonstrating patterns, exceptions, differences, and other relationships.
In addition, tables usually serve as the basis for more visual displays of data, such as graphs and charts, in which some of the detail may be lost.
Parts of table
1. Table number:
– Serially numbered.
– Should be written in the center at the top.
2. Title:
– Should be written in the center at the top of the table, below the table number.
3. Caption:
– Refers to the name of the column heading.
– Is written at the center of the column.
4. Stub:
– Refers to the name of the row heading.
– Written at the extreme left.
5. Body of the table:
– The numerical data expressed in the table.
– When the body is empty, it is called a dummy table (table shell) and the variables are termed dummy variables.
6. Head note:
– A short statement about all or major parts of the table.
– Written below the title in brackets.
7. Foot note:
– Used if any clarification is needed about the parts of a table.
– Written at the bottom of the table.
– Indicates the source of data.
The following structure shows the placement of the various parts of a table.
Common Rules of Constructing Tables
Although there are no hard and fast rules to follow, the following general principles should be addressed in constructing tables.
1. It should be as simple as possible.
2. It should be self-explanatory. To create a self-explanatory table, follow the guidelines below:
I. The title should be clear and to the point.
II. The title should state what the table explains, and where and when the study was done.
III. Precede the title with a table number.
IV. Label each row and each column clearly and concisely, and include the units of measurement for the data. Limit the number of variables to three or fewer.
V. Totals should be shown either in the top row and the first column or in the last row and the last column. If you show percentages (%), also give their total (always 100).
VI. Explain any code, abbreviation, symbol, or exclusion in a footnote.
VII. Note the source of the data in a footnote if the data are not original.
VIII. Put the title at the top of the table.
IX. Numerical entries of zero should be written explicitly rather than indicated by a dash. Dashes are reserved for missing or unobserved data.
X. In cross-tabulated data (variables put as row and column headings), the dependent variable should be the column heading and the independent variable should be the row heading.
3. If the data show a qualitative variable, the observations are listed in alphabetical order or by their degree of importance.
4. If the data are time bound (classified by time of occurrence), they should be arranged in chronological order, from the earliest to the latest or vice versa.
5. If the data represent places, they may be placed in alphabetical order or in terms of geographic location.
Types of table
Based on the purpose for which the table is designed and the complexity of the relationship, a table can be either of the following:
A. Simple frequency table.
B. Cross tabulation.
A. Simple frequency table (one-way table):
• Is used when the individual observations involve only a single variable.
• The denominator for the percentages is the sum of all observed frequencies.
Example: Table X: Overall immunization status of children in Adami Tullu Woreda, Feb. 1999.
Immunization status      Number    Percent
Not immunized                75       35.7
Partially immunized          57       27.1
Fully immunized              78       37.1
Total                       210      100.0
(Percentages are rounded to one decimal place, so the entries add up to 99.9.)
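The percentages of a one-way table can be reproduced from the counts. This Python sketch uses the counts of Table X, and the function name `percentages` is hypothetical. The denominator is the sum of all observed frequencies, as stated above.

```python
def percentages(counts, decimals=1):
    """Percent of each category, using the sum of all observed frequencies as denominator."""
    total = sum(counts.values())
    return {category: round(100 * n / total, decimals) for category, n in counts.items()}


# Counts from Table X (immunization status, Adami Tullu Woreda, Feb. 1999)
counts = {"Not immunized": 75, "Partially immunized": 57, "Fully immunized": 78}
pct = percentages(counts)
for category, n in counts.items():
    print(f"{category:<20} {n:>6} {pct[category]:>8.1f}")
print(f"{'Total':<20} {sum(counts.values()):>6} {100.0:>8.1f}")
```

Each entry is rounded separately, so the three percentages (35.7, 27.1 and 37.1) add up to 99.9 rather than exactly 100.0.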
B. Cross tabulation:
Is used to obtain the frequency distribution of one variable by the subsets of another variable.
The choice of denominator for the percentages depends on which variable of interest is to be compared across the subsets of the other variable.
It can be of two types:
I. Two-way table.
II. Higher order table.
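A small sketch makes the choice of denominator concrete. The counts below are hypothetical and made up for illustration only; they are not the data of Table Y. The function names are also hypothetical. When the independent variable is in the rows (rule X above), row percentages compare the outcome across the subsets of the row variable. Column percentages answer a different question.

```python
def row_percentages(table):
    """Each cell as a percent of its row total (denominator = row subset)."""
    return {row: {col: 100 * n / sum(cells.values()) for col, n in cells.items()}
            for row, cells in table.items()}


def column_percentages(table):
    """Each cell as a percent of its column total (denominator = column subset)."""
    columns = list(next(iter(table.values())))
    totals = {col: sum(cells[col] for cells in table.values()) for col in columns}
    return {row: {col: 100 * cells[col] / totals[col] for col in columns}
            for row, cells in table.items()}


# Hypothetical 2 x 2 counts: group (rows) by outcome (columns)
table = {
    "Group A": {"Yes": 30, "No": 10},
    "Group B": {"Yes": 20, "No": 20},
}
print(row_percentages(table))     # Group A: 75% yes; Group B: 50% yes
print(column_percentages(table))  # of all "Yes" answers, 60% are in Group A
```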
Example: Table Y: TT immunization by marital status of women of childbearing age, Addis Ababa town, 2006.
I. Two-way table:
Shows two variables (characteristics) and is formed when either the caption or the stub is divided into two or more parts.
Source: Mikael A. et al. Tetanus toxoid immunization coverage among women of childbearing age in Assendabo town; Bulletin of JIHS, 1996, 7(1): 13-20.
II. Higher Order Table:
Used when it is desired to represent three or more characteristics (variables) in a single table.
Example: Table Z: Distribution of Health Professionals by Sex and Residence.
Diagrammatic representation of data
• An appropriately drawn graph allows readers to grasp the data presented quickly and as a whole.
• Well-designed graphs can be an incredibly powerful means of communicating a great deal of information using visual techniques.
• When graphs are poorly designed, they not only fail to convey the message effectively but also often mislead and confuse.
Importance of Diagrammatic Representation
• Attractiveness.
• They help in deriving the required information in less time and without any mental strain.
• They facilitate comparison.
• They reveal unsuspected events and lead to action.
• They aid memorization.
Limitations of diagrammatic presentation:
• They fail to show slight differences.
• They are not accurate; they provide approximate information.
• They are not suitable for all statistical data.
• They are not used when comparison is unnecessary or impossible.
General rules that are commonly accepted about the construction of graphs:
1. Self-explanatory and as simple as possible.
2. Titles are usually placed below the graph, and they should again answer the questions What? Where? When?
3. Legends or keys should be used to differentiate variables if more than one is shown.
4. The axis labels should be placed to read from the left side and from the bottom.
Common types of diagrammatic representations
1. Bar graph
It is the easiest and most adaptable general-purpose chart.
A bar graph is especially suitable for nominal and ordinal data.
The heights of the bars represent the frequency (actual number or percentage) for each category.
In a vertical bar diagram, the categories are represented on the baseline (x-axis) at regular intervals, and the corresponding frequencies or relative frequencies are represented on the y-axis (ordinate). In a horizontal bar diagram the axes are swapped.
Tips for constructing bar graph:
1. Whenever possible, it is better to construct a bar diagram on graph paper.
2. All bars drawn in any single study should be of the same width.
3. Leave equal spaces between the different bars.
4. All the bars should rest on the same line, called the base, on the x-axis.
5. Whenever possible, it is advisable to draw the bars in order of magnitude.
6. Label both axes clearly.
7. The scale should start from zero.
8. Divided bars can be used to show the component parts.
Types of bar graph
A. Simple bar chart:
– It is a one-dimensional diagram in which the bar represents the whole of the magnitude.
– The height or length of each bar indicates the size (frequency) of the figure represented.
Example:
Fig. X: Distribution of pediatric patients in a hospital ward by type of admitting diagnosis in Hospital X, Jan 2000.
B. Double bar graph:
Used to depict two variables.
Example:
Fig. Y: TT immunization status by marital status of women 15-49 years, Asendabo town, 1996.
C. Multiple bar chart:
– Represents the relationships among more than two variables.
– The component figures (bars) are shown as separate bars adjoining each other.
– The height of each bar represents the actual value of the component figure.
Example:
Fig. X': Prevalence of cough in schoolchildren by smoking history of children and their parents, Town A, Jan 2000.
D. Sub-divided (component) bar graph:
It is also called a segmented bar graph. If a given magnitude can be split into subdivisions, or if different quantities form the subdivisions of the totals, simple bars may be subdivided in the ratio of the various subdivisions to exhibit the relationship of the parts to the whole. The order in which the components are shown in one bar is followed in all bars used in the diagram.
Such bars are constructed when each total is built up from two or more component figures.
Sub-divided (component) bar graphs are of two types:
I. Actual Component Bar Diagram: the overall height of the bars and the individual component lengths represent actual figures.
Example:
Fig. Y': TT immunization status by marital status of women 15-49 years, Asendabo town, 1996.
II. Percentage Component Bar Diagram: the individual component lengths represent the percentage that each component forms of the overall total.
Note that a series of such bars will all have the same total height, i.e., 100 percent.
Example:
Fig. Z: TT immunization status by marital status of women 15-49 years, Asendabo town, 1996.
2. Pie chart
Useful for qualitative or quantitative discrete data. It shows the relative frequency of each category by dividing a circle into sectors whose areas are proportional to the frequencies.
Appropriate for variables with up to six categories, because the circle should not be divided into more than six sectors.
Methods of constructing a pie chart:
– Construct a frequency table.
– Change each frequency into a percentage (f/n × 100).
– Change each percentage into degrees, where degrees = (percentage/100) × 360.
– Draw a circle and divide it accordingly.
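The Python sketch below shows the three conversion steps. It uses the mothers' feeding plan counts from the qualitative-variable example earlier in these notes; there are four categories, within the six-sector limit. The function name is hypothetical.

```python
def pie_degrees(counts):
    """Sector angle of each category: (percentage / 100) * 360 degrees."""
    n = sum(counts.values())
    degrees = {}
    for category, f in counts.items():
        percentage = 100 * f / n                      # frequency -> percentage
        degrees[category] = percentage / 100 * 360    # percentage -> degrees
    return degrees


# Mothers' feeding plan from the qualitative-variable example (n = 230)
plans = {"Exclusive breastfeeding": 100, "Replacement feeding": 50,
         "Mixed feeding": 30, "Nursery": 50}
for plan, angle in pie_degrees(plans).items():
    print(f"{plan:<24} {angle:6.1f} degrees")
```

Exclusive breastfeeding gets about 156.5 degrees. Replacement feeding and nursery get about 78.3 degrees each, and mixed feeding about 47.0 degrees. Together they fill the full 360 degrees.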
Example:
Fig. X: Distribution of causes of death of females in England & Wales, 1999.
3. Histogram
It is a special kind of bar graph.
Useful for quantitative continuous data.
It is a frequency distribution with continuous class intervals that has been turned into a graph.
The area of each rectangle represents the frequency of the corresponding class interval.
To avoid crowding, you can label the axis with the class midpoints.
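The statement that area represents frequency can be made concrete: setting each bar's height to frequency divided by class width makes height × width equal the frequency. Here all classes have the same width, so the heights are simply proportional to the frequencies. Dividing by the width only matters when class widths differ. The Python sketch below uses the true limits and frequencies of the leisure-time example as an illustration. The function name is hypothetical.

```python
def histogram_heights(boundaries, freq):
    """Bar height = frequency / class width, so that bar area = frequency."""
    heights = []
    for (lower, upper), f in zip(boundaries, freq):
        heights.append(f / (upper - lower))
    return heights


# Class boundaries (true limits) and frequencies of the leisure-time example
boundaries = [(9.5, 14.5), (14.5, 19.5), (19.5, 24.5), (24.5, 29.5), (29.5, 34.5), (34.5, 39.5)]
freq = [5, 11, 12, 7, 3, 2]
heights = histogram_heights(boundaries, freq)
areas = [h * (upper - lower) for h, (lower, upper) in zip(heights, boundaries)]
print([round(h, 2) for h in heights])
```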
In addition to simplifying a complex data set, a histogram is important in depicting the shape (symmetric/skewed) and the location of the central tendency ("averages") of the frequency distribution of a continuous variable.
Example:
Fig. Z: Distribution of the RBC cholinesterase values (μmol/min/ml) obtained from 35 workers exposed to pesticides.
Source: Knapp RG, Miller MC III: Clinical Epidemiology and Biostatistics. The National Medical Series for Independent Study. Williams & Wilkins, Baltimore, Maryland, 1992.
4. Frequency polygon
A frequency polygon is obtained by connecting the midpoints of the tops of the adjacent rectangles (cells) of the histogram with line segments.
When the polygon is continued to the x-axis just outside the range of the values, the total area under the polygon equals the total area under the histogram.
It is not essential to draw a histogram in order to obtain a frequency polygon.
It can be drawn without erecting the rectangles of a histogram, as follows.
Methods of constructing a frequency polygon:
The scale should be marked in the numerical values of the midpoints of the intervals.
Erect an ordinate at the midpoint of each interval. The length (altitude) of each ordinate represents the frequency of the class at whose midpoint it is erected. Then join the tops of the ordinates and extend the connecting lines to the scale of sizes.
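The ordinate method translates directly into a list of vertices. This Python sketch has hypothetical function names and assumes equal class widths. It closes the polygon by adding one extra point with frequency 0, one class width beyond each end, which is one common way of continuing the polygon to the x-axis. The trapezoid-rule area check shows that the area under the polygon equals the area under the histogram (class width × total frequency). The leisure-time classes serve as an illustration.

```python
def frequency_polygon(classes, freq):
    """Vertices (class mark, frequency), closed to the x-axis one class width beyond each end.

    Assumes at least two classes of equal width.
    """
    marks = [(lower + upper) / 2 for lower, upper in classes]
    width = classes[1][0] - classes[0][0]
    xs = [marks[0] - width] + marks + [marks[-1] + width]
    ys = [0] + list(freq) + [0]
    return list(zip(xs, ys))


def area_under(points):
    """Trapezoid-rule area under a polygon given by (x, y) vertices."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


classes = [(10, 14), (15, 19), (20, 24), (25, 29), (30, 34), (35, 39)]
freq = [5, 11, 12, 7, 3, 2]
points = frequency_polygon(classes, freq)
print(points)
print(area_under(points), 5 * sum(freq))  # polygon area vs histogram area
```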
Example of a frequency polygon drawn from a histogram:
Fig. z: Frequency polygon for the ages of 2087 mothers with <5 children, Adami Tulu, 2003.
Example of a frequency polygon drawn without a histogram:
Fig. z': Frequency polygon for the ages of women at the time of marriage.
[Figure: Fig. z, histogram of mothers' ages (15-55 years) with its frequency polygon; Mean = 27.6, Std. Dev = 6.13, N = 2087. Fig. z', frequency polygon of the number of women by age at the time of marriage (ages 12-47).]
5. Ogive Curve (The Cumulative Frequency Polygon)
Sometimes it may be necessary to know the number of items whose values are more or less than a certain amount. To get this information, the frequency distribution must be changed from a 'simple' to a 'cumulative' distribution. An ogive curve turns a cumulative frequency distribution into a graph. Ogives are much more common than frequency polygons.
To construct an Ogive curve:
I) Compute the cumulative frequency of the distribution.
II) Prepare a graph with the cumulative frequency on the vertical axis and the true upper class limits (class boundaries) of the intervals scaled along the x-axis (horizontal axis). The true lower limit of the lowest class interval is also included in the x-axis scale. This is also the true upper limit of the next lower interval, which has a cumulative frequency of 0.
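The heart-rate data of Table X are not reproduced in these notes, so the sketch below builds ogive points for the leisure-time classes instead. This is an illustrative substitution. The function name is hypothetical, and `half_unit` = 0.5 assumes whole-number data.

```python
from itertools import accumulate


def ogive_points(classes, freq, half_unit=0.5):
    """(true upper limit, less-than cumulative frequency) points for an ogive.

    The first point is the true lower limit of the lowest class, with
    cumulative frequency 0.
    """
    start = classes[0][0] - half_unit
    upper_limits = [upper + half_unit for _, upper in classes]
    return [(start, 0)] + list(zip(upper_limits, accumulate(freq)))


classes = [(10, 14), (15, 19), (20, 24), (25, 29), (30, 34), (35, 39)]
freq = [5, 11, 12, 7, 3, 2]
for x, cf in ogive_points(classes, freq):
    print(x, cf)
```

The points run from (9.5, 0) to (39.5, 40). Plotting them and joining them with straight lines gives the less-than ogive.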
Example: Construct an ogive for the data below.
Table X: Heart rate of patients admitted to Hospital D, 2000.
Fig. D: Heart rate (beats/minute) of patients admitted to Hospital B, 2000.
Numerical Summary Measures
MCT (Measure of Central Tendency)
A frequency distribution gives a general picture of the distribution of a variable.
However, it cannot indicate the average value or the spread of the values. On the scale of values of a variable, there is a certain point at which the largest number of items tends to cluster.
Since this point is usually in the centre of the distribution, the tendency of statistical data to concentrate around a certain value is called "central tendency". The various methods of determining the point about which the observations tend to concentrate are called MCT (Measures of Central Tendency). The objective of calculating an MCT is to determine a single figure that may be used to represent the whole data set.
3. It should be as close to the maximum number of values as possible.
4. It should have a definite value.
5. It should not require complicated and tedious calculations.
6. It should be capable of further algebraic treatment.
7. It should be stable with regard to sampling.
The three most common measures of central tendency are:
– Mean, Median, and Mode.
Arithmetic Mean
The arithmetic mean is the measure of central location you are probably most familiar with.
It is the arithmetic average and is commonly called simply "mean" or "average".
In formulas, the arithmetic mean is usually represented as μ for the population mean and x̄, read as "x-bar", for the sample mean. It is the sum of all the observations divided by the total number of observations.
General formula
a) Ungrouped data
If x1, x2, ..., xn are n observed values, then
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
b) Grouped data
In calculating the mean from grouped data, we assume that all values falling into a particular class interval are located at the mid-point of the interval. It is calculated as follows:
$$\bar{x} = \frac{\sum_{i=1}^{k} m_i f_i}{\sum_{i=1}^{k} f_i}$$
where,
k = the number of class intervals.
mi = the mid-point of the ith class interval.
fi = the frequency of the ith class interval.
Properties of the Arithmetic Mean:
• For a given set of data there is one and only one arithmetic mean (uniqueness).
• It is easy to calculate and understand (simplicity).
• It is influenced by each and every value in a data set.
• It is greatly affected by extreme values (sensitivity).
So, the mean is an excellent measure of central tendency when the distribution is symmetric (normally or approximately normally distributed).
• The algebraic sum of the deviations of the given values from their arithmetic mean is always zero (centre of gravity).
• In the case of grouped data, if any class interval is open-ended, the arithmetic mean cannot be calculated exactly.
• It is not appropriate for either nominal or ordinal data.
• The sum of the squares of the deviations from the arithmetic mean is less than the sum of squared deviations computed from any other point.
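Two of these properties (centre of gravity and least squares) can be checked numerically. The sketch below is an illustration under stated assumptions: it uses the eight data values from the Range example later in this note, the helper names are hypothetical, and the "other points" compared against the mean are an arbitrary small set of offsets rather than every possible value.

```python
from statistics import fmean

def sum_of_deviations(data, centre):
    """Algebraic sum of the deviations (x - centre)."""
    return sum(x - centre for x in data)

def sum_of_squared_deviations(data, centre):
    """Sum of the squared deviations (x - centre)**2."""
    return sum((x - centre) ** 2 for x in data)

values = [5, 9, 12, 16, 23, 34, 37, 42]
mean = fmean(values)  # 178 / 8 = 22.25

# Centre of gravity: deviations from the mean cancel out.
print(sum_of_deviations(values, mean))  # 0.0

# Least squares: moving the centre away from the mean (by a few arbitrary
# offsets) always increases the sum of squared deviations.
ss_at_mean = sum_of_squared_deviations(values, mean)
offsets = [-5, -1, -0.1, 0.1, 1, 5]
print(all(sum_of_squared_deviations(values, mean + d) > ss_at_mean for d in offsets))  # True
```

The mean here happens to be exactly representable in binary (22.25), so the first sum is exactly zero; with other data, rounding error can leave a tiny non-zero value.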
Advantages:
1) It is based on all values given in the distribution.
2) It is the most easily understood.
3) It is the most amenable to algebraic treatment.
Disadvantages:
1) It is overly sensitive to extreme values.
2) When the distribution has open-ended classes, its computation would be based on assumptions, and therefore may not be valid.
3) Sometimes it may even look absurd, e.g. an average of 2.5 children per family, a value no family can actually have.
Example 1: The heart rates for n = 10 patients were as follows (beats per minute):
167, 120, 150, 125, 150, 140, 40, 136, 120, 150
What is the arithmetic mean for the heart rate of these patients?
Ans. Mean = 1298/10 = 129.8 beats per minute.
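The answer can be reproduced with a short Python sketch of the ungrouped formula. `arithmetic_mean` is a hypothetical helper name; Python's standard `statistics.mean` gives the same value. The last lines also illustrate the sensitivity to extreme values listed among the properties: dropping the single low reading of 40 beats per minute raises the mean by about 10.

```python
def arithmetic_mean(values):
    """Ungrouped mean: sum of all observations divided by their number."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)

heart_rates = [167, 120, 150, 125, 150, 140, 40, 136, 120, 150]  # beats/minute

print(sum(heart_rates))              # 1298
print(arithmetic_mean(heart_rates))  # 129.8

# Sensitivity to the extreme value 40: mean of the other nine readings.
without_40 = [x for x in heart_rates if x != 40]
print(round(arithmetic_mean(without_40), 2))  # 139.78
```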
Example 2: Compute the mean age of 169 subjects from the grouped data.
Ans. Mean = 5810.5/169 = 34.38 years.
Class interval   Mid-point (mi)   Frequency (fi)   mifi
10-19            14.5             4                58.0
20-29            24.5             66               1617.0
30-39            34.5             47               1621.5
40-49            44.5             36               1602.0
50-59            54.5             12               654.0
60-69            64.5             4                258.0
Total                             169              5810.5
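A minimal Python sketch of the grouped-data formula, applied to the table above. The function name `grouped_mean` is hypothetical; it relies on the same assumption as the formula, namely that every value in a class sits at the class mid-point.

```python
def grouped_mean(midpoints, freqs):
    """Mean of grouped data: sum(mi * fi) / sum(fi)."""
    if len(midpoints) != len(freqs):
        raise ValueError("need one frequency per class")
    total_f = sum(freqs)
    if total_f == 0:
        raise ValueError("total frequency must be positive")
    return sum(m * f for m, f in zip(midpoints, freqs)) / total_f

midpoints = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5]
freqs = [4, 66, 47, 36, 12, 4]

print(sum(m * f for m, f in zip(midpoints, freqs)))  # 5810.5
print(round(grouped_mean(midpoints, freqs), 2))      # 34.38
```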
Median
It is the middle value of a set of observations when the observations are listed in increasing or decreasing order.
a) Ungrouped data
The median is the value which divides the data set into two equal parts.
If the number of values is odd, the median is the middle value when all values are arranged in order of magnitude, with ½ of the observations larger than the median value and ½ smaller.
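A short Python sketch of this definition. The helper name `median` is hypothetical. For an even number of values it assumes the common convention of averaging the two middle values (the rule Python's `statistics.median` also uses); the 10 heart-rate readings of Example 1 in the mean section serve as an even-sized illustration.

```python
import statistics

def median(values):
    """Middle value of the ordered data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]                          # odd n: the single middle value
    return (ordered[middle - 1] + ordered[middle]) / 2  # even n: average the two middle values

heart_rates = [167, 120, 150, 125, 150, 140, 40, 136, 120, 150]
print(median(heart_rates))             # 138.0
print(statistics.median(heart_rates))  # 138.0
```

The median (138) sits above the mean (129.8) because, unlike the mean, it is not pulled down by the single low reading of 40.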
b) Grouped data
In calculating the median from grouped data, we assume that the values within a class interval are evenly distributed through the interval.
– The first step is to locate the class interval in which the median lies (the median class).
– Find n/2 and identify the first class interval whose cumulative frequency is greater than or equal to n/2.
– Then, use the following formula.
$$\tilde{x} = L_m + \frac{\frac{n}{2} - F_c}{f_m} \times W$$
where,
Lm = lower true class boundary of the interval containing the median.
Fc = cumulative frequency of the interval just before (above, in the table) the median class interval.
fm = frequency of the interval containing the median.
W = class interval width.
n = total number of observations.
Example: Compute the median age of 169 subjects from the grouped data.
Class interval   Mid-point (mi)   Frequency (fi)   Cum. freq.
10-19            14.5             4                4
20-29            24.5             66               70
30-39            34.5             47               117
40-49            44.5             36               153
50-59            54.5             12               165
60-69            64.5             4                169
Total                             169
Ans.
n/2 = 169/2 = 84.5
n/2 = 84.5 falls in the 3rd class interval (30-39), the first interval whose cumulative frequency (117) is at least 84.5.
Lower true class boundary = 29.5, upper true class boundary = 39.5
Frequency of the median class, fm = 47
Cumulative frequency of the preceding class, Fc = 70
(n/2 - Fc) = 84.5 - 70 = 14.5
Median = 29.5 + (14.5/47) × 10 = 32.59 ≈ 33 years
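The same calculation as a minimal Python sketch. `grouped_median` is a hypothetical helper; it assumes equal-width classes described by their lower true class boundaries and, like the formula, that values are spread evenly within each class.

```python
from itertools import accumulate

def grouped_median(lower_boundaries, width, freqs):
    """Median of grouped data: Lm + (n/2 - Fc) / fm * W."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("total frequency must be positive")
    half = n / 2
    cum_before = 0  # Fc: cumulative frequency of the classes before the current one
    for lower, f, cum in zip(lower_boundaries, freqs, accumulate(freqs)):
        if cum >= half:  # first class whose cumulative frequency reaches n/2
            return lower + (half - cum_before) / f * width
        cum_before = cum
    raise ValueError("n/2 not reached; check the frequencies")

lower_boundaries = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5]  # classes 10-19, ..., 60-69
freqs = [4, 66, 47, 36, 12, 4]
print(round(grouped_median(lower_boundaries, 10, freqs), 2))  # 32.59
```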
Properties of the median:
• There is only one median for a given set of data (uniqueness).
• The median is easy to calculate.
• The median is a positional average and hence it is insensitive to very large or very small values.
• The median can be calculated even in the case of open-ended intervals.
• It is determined mainly by the middle points and is less sensitive to the remaining data points (weakness).
• It is not a good representative of the data if the number of items is small.
• The median can be used as a summary measure for ordinal, discrete and continuous data; in general, however, it is not appropriate for nominal data.
Advantages
1) It is easily calculated and is not much disturbed by extreme values.
2) It is more typical of the series.
3) The median may be located even when the data are incomplete.
4) For skewed data, the median is nearer to reality and more representative than the mean.
Disadvantages
1. The median is not as well suited to algebraic treatment as the arithmetic, geometric and harmonic means.
2. It is not as generally familiar as the arithmetic mean.
Mode
• The mode is the most frequently occurring value among all the observations in a set of data.
• It is not influenced by extreme values.
• It is possible to have more than one mode or no mode.
• It is not a good summary of the majority of the data.
• The mode can be used as a summary measure for nominal, ordinal, discrete and continuous data; in general, however, it is more appropriate for nominal and ordinal data.
a) Ungrouped data
• The mode is the value which occurs most frequently in a set of values.
• If all the values are different, there is no mode; on the other hand, a set of values may have more than one mode.
Example 1:
• Data are: 1, 2, 3, 4, 4, 4, 4, 5, 5, 6
• The mode is 4 ("unimodal").
Example 2:
• Data are: 1, 2, 2, 2, 3, 4, 5, 5, 5, 6, 6, 8
• There are two modes: 2 & 5.
• This distribution is said to be "bimodal".
Example 3:
• Data are: 2.62, 2.75, 2.76, 2.86, 3.05, 3.12
• No mode, since all the values are different.
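These three examples can be reproduced with Python's `collections.Counter`. The helper `modes` is hypothetical; following the rule above, it reports no mode (an empty list) when every value occurs only once, and otherwise returns every value that shares the highest count.

```python
from collections import Counter

def modes(values):
    """All most frequent values; [] if every value occurs exactly once."""
    if not values:
        raise ValueError("the mode of an empty data set is undefined")
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # all values different: no mode
    return sorted(v for v, c in counts.items() if c == top)

print(modes([1, 2, 3, 4, 4, 4, 4, 5, 5, 6]))        # [4]
print(modes([1, 2, 2, 2, 3, 4, 5, 5, 5, 6, 6, 8]))  # [2, 5]
print(modes([2.62, 2.75, 2.76, 2.86, 3.05, 3.12]))  # []
```

Python's `statistics.multimode` is similar, but it returns every value when all values are distinct, so it does not follow the "no mode" rule used in this note.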
b) Grouped data
• To find the mode of grouped data, we usually refer to the modal class, where the modal class is the class interval with the highest frequency.
• If a single value for the mode of grouped data must be specified, it is taken as the mid-point of the modal class interval.
Alternatively, we can use the following formula:
$$\text{Mode} = L + \frac{d_1}{d_1 + d_2} \times C$$
where,
L = the lower true class boundary of the modal class
d1 = the difference between the frequency of the modal class and that of the preceding class
d2 = the difference between the frequency of the modal class and that of the succeeding class
C = the width of the modal class interval.
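A minimal Python sketch of this formula, applied to the age data of 169 subjects used in the mean and median examples. `grouped_mode` is a hypothetical helper; it assumes equal class widths, takes the first class if several share the highest frequency, treats a missing neighbouring class as having frequency 0, and falls back to the modal-class mid-point if d1 + d2 = 0.

```python
def grouped_mode(lower_boundaries, width, freqs):
    """Mode of grouped data: L + d1 / (d1 + d2) * C."""
    k = freqs.index(max(freqs))                         # modal class (first one if tied)
    f_prev = freqs[k - 1] if k > 0 else 0               # preceding class frequency
    f_next = freqs[k + 1] if k + 1 < len(freqs) else 0  # succeeding class frequency
    d1 = freqs[k] - f_prev
    d2 = freqs[k] - f_next
    if d1 + d2 == 0:
        return lower_boundaries[k] + width / 2          # fall back to the class mid-point
    return lower_boundaries[k] + d1 / (d1 + d2) * width

lower_boundaries = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5]  # classes 10-19, ..., 60-69
freqs = [4, 66, 47, 36, 12, 4]
print(round(grouped_mode(lower_boundaries, 10, freqs), 2))  # 27.15
```

Here the modal class is 20-29 (frequency 66), so d1 = 66 - 4 = 62 and d2 = 66 - 47 = 19. The simpler rule of taking the modal-class mid-point would give 24.5 instead.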
Properties of mode:
• The mode can be used as a summary measure for nominal, ordinal, discrete and continuous data; in general, however, it is more appropriate for nominal and ordinal data.
• It is not affected by extreme values.
• It can be calculated for distributions with open-ended classes.
• Often its value is not unique.
• The main drawback of the mode is that often it does not exist.
• It is an average of position.
Advantages
1. Since it is the most typical value, it is the most descriptive average.
2. Since the mode is usually an "actual value", it indicates the precise value of an important part of the series.
3. It is used for categorical data to describe the most frequent category.
4. It is not affected by extreme values.
5. It is easy to understand.
Disadvantages
1. Unless the number of items is fairly large and the distribution reveals a distinct central tendency, the mode has no significance.
2. It is not capable of further mathematical treatment.
3. In a small number of items, the mode may not exist.
4. Sometimes there may be more than one mode.
Exercise: The table below shows the protein intake of different families. Find the mean, median, and mode.
Protein intake per consumption unit per day (g)   Mid-point (xi)   No. of families (fi)   fixi    Cum. freq.
15-25                                             20               30                     600     30
25-35                                             30               40                     1200    70
35-45                                             40               100                    4000    170
45-55                                             50               110                    5500    280
55-65                                             60               80                     4800    360
65-75                                             70               30                     2100    390
75-85                                             80               10                     800     400
Total                                                              400                    19000
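To check your answers, the three grouped-data formulas from this section can be applied to the table in a few lines of Python. This is a sketch, not an official solution: it treats 15, 25, ..., 75 as the true lower class boundaries (the classes here are written as continuous intervals 15-25, 25-35, ...), and it picks the first class if several share the highest frequency.

```python
from itertools import accumulate

lower = [15, 25, 35, 45, 55, 65, 75]  # lower class boundaries (15-25, ..., 75-85)
width = 10
mid = [20, 30, 40, 50, 60, 70, 80]    # class mid-points xi
f = [30, 40, 100, 110, 80, 30, 10]    # number of families fi
n = sum(f)

# Mean: sum(fi * xi) / sum(fi)
mean = sum(x * fi for x, fi in zip(mid, f)) / n

# Median: first class whose cumulative frequency reaches n/2
cum = list(accumulate(f))
i = next(j for j, c in enumerate(cum) if c >= n / 2)
cum_before = cum[i - 1] if i > 0 else 0
median = lower[i] + (n / 2 - cum_before) / f[i] * width

# Mode: L + d1 / (d1 + d2) * C, using the class with the highest frequency
k = f.index(max(f))
d1 = f[k] - (f[k - 1] if k > 0 else 0)
d2 = f[k] - (f[k + 1] if k + 1 < len(f) else 0)
mode = lower[k] + d1 / (d1 + d2) * width

print(mean, round(median, 2), mode)
```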
Measures of Dispersion
Measures of central tendency alone are not enough to give a clear understanding of the distribution of the data.
We need to know something about the variability or spread of the values: whether they tend to be clustered close together or spread out over a broad range.
Measures of Dispersion: measures that quantify the variation or dispersion of a set of data from its central location.
Dispersion refers to the variety exhibited by the values of the data.
The amount may be small when the values are close together.
If all the values are the same, there is no dispersion.
Other synonymous terms for Measures of Dispersion:
– "Measure of Variation"
– "Measure of Spread"
– "Measures of Scatter"
Measures of dispersion include:
– Range
– Inter-quartile range
– Variance
– Standard deviation
– Coefficient of variation
1. Range (R)
• The difference between the largest and smallest observations in a sample.
• Range = Maximum value – Minimum value
Example:
– Data values: 5, 9, 12, 16, 23, 34, 37, 42
– Range = 42 - 5 = 37
• Being determined by only the two extreme observations, the use of the range is limited because it tells us nothing about how the data between the extremes are spread.
• Further, interpretation of the range depends on the number of observations: when the number of observations increases, the range tends to get larger.
2. Percentiles, Quartiles and Inter-quartile Range
• The quartiles are the three values which divide the ordered distribution into four parts such that there is an equal number of observations in each part.
– Q1 = the [(n+1)/4]th observation
– Q2 = the [2(n+1)/4]th observation
– Q3 = the [3(n+1)/4]th observation
• The inter-quartile range is the difference between the third and the first quartiles: IQR = Q3 - Q1
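The positional rule can be sketched in Python. The helper name `quantile_by_position` is hypothetical, and the handling of fractional positions (linear interpolation between the two neighbouring ordered values) is an assumption, though a common one; other software uses other quartile conventions, so small differences are possible. The data are the eight values from the Range example above.

```python
import statistics

def quantile_by_position(values, p):
    """Value at position p*(n+1) of the ordered data, interpolating linearly."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("no data")
    pos = p * (n + 1)            # 1-based position, e.g. (n+1)/4 for Q1
    if pos <= 1:
        return ordered[0]
    if pos >= n:
        return ordered[-1]
    below = int(pos)             # whole part of the position
    frac = pos - below           # fractional part used for interpolation
    lo, hi = ordered[below - 1], ordered[below]
    return lo + frac * (hi - lo)

data = [5, 9, 12, 16, 23, 34, 37, 42]
q1 = quantile_by_position(data, 0.25)
q2 = quantile_by_position(data, 0.50)
q3 = quantile_by_position(data, 0.75)
print(q1, q2, q3, q3 - q1)  # 9.75 19.5 36.25 26.5

# Cross-check with the standard library (default method uses (n+1) positions)
print(statistics.quantiles(data, n=4))
```

For n = 8, Q1 sits at position 9/4 = 2.25, a quarter of the way from the 2nd value (9) to the 3rd (12), which gives 9.75.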
• Although the inter-quartile range sometimes serves as a useful descriptive measure, it is mathematically intractable and can also vary considerably from sample to sample.
• Percentiles divide the ordered data into 100 parts with an equal number of observations in each part.
• It follows that the 25th percentile is the first quartile, the 50th percentile is the median and the 75th percentile is the third quartile.
3. Variance
• A good measure of dispersion should make use of all the data.
• Intuitively, a good measure could be derived by combining, in some way, the deviations of each observation from the mean.
• The variance achieves this by averaging the squares of the deviations from the mean.
• The sample variance of the set x1, x2, ..., xn of n observations with mean x̄ is
$$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
Note: The sum of the deviations from the mean is zero, so it is more useful to square the deviations, add them, and divide by n - 1 (to get the sample variance).
4. Standard Deviation
• Being the square of the deviations, the variance is limited as a descriptive statistic because it is not in the same units as the observations.
• By taking the square root of the variance, we obtain a measure of dispersion in the original units.
Example: We use the data set of 10 numbers (see Page 29):
19 21 20 20 34 22 24 27 27 27
– The range = 34 – 19 = 15
– The first quartile is 20 and the third quartile is 27
– The inter-quartile range = 27 – 20 = 7
– The variance is 21.88
– The SD = √21.88 = 4.68
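The range, variance and SD in this example can be reproduced with a short Python sketch. `sample_variance` is a hypothetical helper implementing the n - 1 formula above; Python's `statistics.variance` and `statistics.stdev` compute the same sample quantities and are printed for comparison.

```python
import math
import statistics

def sample_variance(values):
    """Sample variance: sum of squared deviations from the mean divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are needed")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

data = [19, 21, 20, 20, 34, 22, 24, 27, 27, 27]
var = sample_variance(data)
sd = math.sqrt(var)

print(max(data) - min(data))        # range: 15
print(round(var, 2), round(sd, 2))  # 21.88 4.68
print(round(statistics.variance(data), 2), round(statistics.stdev(data), 2))
```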
5. Coefficient of variation
When we want to compare the variability in two sets of data, the standard deviation, which measures absolute variation, may lead to false conclusions (for example, when the data sets are in different units or have very different means).
The coefficient of variation gives relative variation and is the best measure to use for comparing the variability in two such sets of data. Do not rely on the SD alone to compare variability between such groups.
$$CV = \frac{\text{standard deviation}}{\text{mean}}$$
(often multiplied by 100 and expressed as a percentage)
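A minimal Python sketch of the CV, expressed as a percentage. The helper name `coefficient_of_variation` is hypothetical, and the pairing of data sets is purely illustrative: it compares the heart-rate readings of Example 1 (beats per minute) with the 10-number data set of the standard deviation example, two sets whose means differ greatly.

```python
import statistics

def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean, times 100."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is 0")
    return statistics.stdev(values) / mean * 100

heart_rates = [167, 120, 150, 125, 150, 140, 40, 136, 120, 150]
ten_numbers = [19, 21, 20, 20, 34, 22, 24, 27, 27, 27]

for name, data in [("heart rates", heart_rates), ("10-number set", ten_numbers)]:
    print(name, round(statistics.stdev(data), 2), round(coefficient_of_variation(data), 1))
```

Here the heart-rate SD (about 35) is more than seven times that of the 10-number set (about 4.7), but relative to their means the two CVs (about 27% and 19%) are much closer.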
Thank you!!!
Enjoy it.