Biostatistics: Descriptive Statistics


Descriptive Statistics

Descriptive statistics is a set of techniques used to organize, summarize, categorize, classify, manipulate, and present a set of data in a concise way that makes it suitable for analysis.

Raw data are measurements or variables that have not been organized, summarized, or otherwise manipulated.

Objectives of data organization, summarization, and manipulation:

- To see the similarity and dissimilarity of objects.
- To see the important features of the collected data.
- To prepare data for summarization and analysis.

8/12/2010

Victory College, Faculty of Health Science, Department of Public Health Officer, Biostatistics Lecture Note Prepared By Minlikalew D. (B.Sc.)



Descriptive statistics include:

- Frequency distributions.
- Tables.
- Graphs.
- Numerical summary measures:
  - Measures of central tendency.
  - Measures of variability.


Before summarizing, organizing, categorizing or classifying, presenting, and analyzing data, we need to know:

- The concept of data.
- The concept of a variable.
- The concept of measurement and measurement scales.

Data

- Facts or information that support reasoning.
- A collection of observations on one or more variables.
- The raw material of statistics.
- Information collected from a source.

Data can be classified into groups according to different criteria.

A. Based on the nature of the variable on which the data are collected:

I. Qualitative/categorical/non-numerical data: data collected on a qualitative variable and obtained by simply recording the possession of a certain attribute or characteristic.

Example:

- Breastfeeding status (exclusive, partial, none).
- Whether the mother was employed (yes, no).
- Marital status (single, married, divorced, widowed).

Nominal data are categorical data where the order of the categories is arbitrary. A good example is race/ethnicity coded as 1=White, 2=Hispanic, 3=American Indian, 4=Black, 5=Other. Certain statistical concepts are meaningless for nominal data. For example, it would make no sense to ask for the mean and standard deviation of race/ethnicity.

Ordinal data are categorical data where there is a logical ordering to the categories. A good example is the Likert scale that you see on many surveys: 1=Strongly disagree; 2=Disagree; 3=Neutral; 4=Agree; 5=Strongly agree. While computing a median is easily justified for ordinal data, some statisticians have reservations about computing a mean for ordinal data.

II. Quantitative/numerical data: data collected on quantitative variables and obtained by counting or measuring.

Quantitative/numerical data consist of both continuous and discrete data types.

a. Continuous data: consist of both interval and ratio data.

Interval data are continuous data where differences are interpretable but where there is no "natural" zero. A good example is temperature in degrees Fahrenheit. Ratios are meaningless for interval data. You cannot say, for example, that one day is twice as hot as another day.

Ratio data are continuous data where both differences and ratios are interpretable. Ratio data have a natural zero. A good example is birth weight in kg.

The distinction between interval and ratio data is subtle but, fortunately, often not important. Certain specialized statistics, such as the geometric mean and the coefficient of variation, can only be applied to ratio data.

b. Discrete data: quantitative data collected on a discrete variable.
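The two ratio-only statistics named above, the geometric mean and the coefficient of variation, can be sketched as follows. The birth weights are hypothetical values chosen only for illustration; they do not come from the lecture note.

```python
import statistics

# Hypothetical birth weights in kg (illustrative values only).
birth_weights_kg = [2.8, 3.1, 3.4, 3.0, 3.6]

mean = statistics.mean(birth_weights_kg)
sd = statistics.stdev(birth_weights_kg)               # sample standard deviation
geometric_mean = statistics.geometric_mean(birth_weights_kg)
cv_percent = sd / mean * 100                          # coefficient of variation, in percent

print(f"mean = {mean:.2f} kg")
print(f"geometric mean = {geometric_mean:.2f} kg")
print(f"CV = {cv_percent:.1f}%")
```

Both statistics rely on a true zero: the geometric mean multiplies the values, and the coefficient of variation divides by the mean, so neither is meaningful on a scale whose zero is arbitrary.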

B. Based on the source from which the data are collected:

I. Primary data: data collected by the investigator himself or herself. Such data are original in character and are mostly generated by censuses or sample surveys conducted by individuals or research institutions.

II. Secondary data: data collected from secondary sources, for example journals, reports, government publications, and publications of professionals and research organizations.

Source of data

There are different sources of data on health and health-related conditions. These are:

- Health surveys.
- Vital statistics.
- Health service records.
- Census.

Systems for collecting data

1. Regular system: registration of events as they become available.
2. Ad hoc system: a form of survey to collect information that is not available on a regular basis.

Data collection techniques/methods

There are different methods of data collection. To select the appropriate method, we need to consider the following points.

Selection of a data collection method is based on:

- The nature of the investigation, i.e. whether the study is qualitative or quantitative.
- The resources available and the relevance of the information.
- The acceptability and accuracy of the method.
- The research interest to focus on and cover.
- Familiarity with the procedure.
- The characteristics of the study population, which are also among the influencing factors.

Based on the above selection points, the methods are:

For qualitative data:

1. Focus group discussion.
2. In-depth interview (unstructured/semi-structured).
3. Observation (participant/non-participant).
4. Case studies.
5. Rapid appraisal techniques.
6. Nominal group techniques.
7. Delphi techniques and life histories.

For quantitative data:

1. Face-to-face interview.
2. Self-administered interview.
3. Postal or mail method and telephone interview.
4. Measuring height, length, weight, BMI, MUAC, chest circumference, head circumference, blood pressure, Hgb, Hct.
5. Using available information (record review), e.g. mortality reports, morbidity reports.

Decision-makers need information that is:

– Relevant,
– Timely,
– Accurate and
– Usable.

The following table compares different data collection techniques in terms of their advantages and disadvantages.

Summary of each data collection technique

Technique: Using available information

Advantages:
• Is inexpensive, because the data are already there.
• Permits examination of trends over the past.

Disadvantages:
• Data are not always easily accessible.
• Ethical issues concerning confidentiality may arise.
• Information may be imprecise or incomplete.
• Data collection may not be standardized.

Technique: Observing

Advantages:
• Gives more detailed and context-related information.
• Permits collection of information on facts not mentioned in the questionnaire.

Disadvantages:
• Ethical issues concerning confidentiality or privacy may arise.
• Observer bias may occur (the observer may only notice what interests him or her).
• The presence of the data collector can influence the situation observed.
• Thorough training of research assistants is required.

Technique: Interviewing

Advantages:
• Is suitable for use with illiterates.
• Permits clarification of questions.
• Has a higher response rate than written questionnaires.

Disadvantages:
• The presence of the interviewer can influence responses.
• Reports of events may be less complete than information gained through observation.

Technique: Small-scale flexible interview

Advantages:
• Permits collection of in-depth information and exploration of spontaneous remarks by respondents.

Disadvantages:
• The interviewer may inadvertently influence the respondents.
• Open-ended data are difficult to analyze.

Technique: Large-scale fixed interview

Advantages:
• Is easy to analyze.

Disadvantages:
• Important information may be missed because spontaneous remarks by respondents are usually not recorded or explored.

Technique: Administering written questionnaires

Advantages:
• Less expensive.
• Permits anonymity and may result in more honest responses.
• Does not require research assistants.
• Eliminates bias due to phrasing questions differently with different respondents.

Disadvantages:
• Cannot be used with illiterate respondents.
• There is often a low rate of response.
• Questions may be misunderstood.

Variable

A variable is a characteristic that takes different values in different persons, places, or things. It is any aspect of an individual or object that is measured (e.g., BP) or recorded (e.g., age, sex) and can take any value. There may be one variable in a study or many, e.g., in a study of the treatment outcome of TB.

Variables can be broadly classified into:

A. Categorical (or qualitative) variables.
B. Quantitative (or numerical) variables.

A. Categorical (or Qualitative)

Variables that cannot be measured numerically but can be divided into different categories are called qualitative or categorical variables. Such a variable cannot assume a numerical value but can be classified into non-numerical categories according to a set of rules.

The notion of magnitude is absent or implicit.

A variable that has only two categories is called binary or dichotomous, e.g. sex. A variable with more than two categories is called polytomous, e.g. occupational status.

A categorical variable can be:

1. Nominal: variables with no inherent order or ranking sequence, e.g. numbers used as names (group 1, group 2, ...), gender, etc.

2. Ordinal: variables with an ordered series, e.g. "greatly dislike, moderately dislike, indifferent, moderately like, greatly like". Numbers assigned to such variables indicate rank/order only. The "distance" between the numbers has no meaning.

B. Quantitative (or numerical variables)

A variable that can assume numerical values and is measured numerically. Quantitative data measure either how much or how many of something, i.e. a set of observations where any single observation is a number that represents an amount or a count.

A quantitative variable has the notion of magnitude. It can be:

1. Discrete

- It can only take a limited number of discrete values (usually whole numbers).
- It is characterized by gaps or interruptions in the values.
- The values aren’t just labels, but are actual measurable quantities.

Example:

- The number of episodes of diarrhoea a child has had in a year. You can’t have 12.5 episodes of diarrhoea.
- The number of accidents.
- The number of students in this class.
- The number of cars.
- Etc.

2. Continuous

- It can take an infinite number of possible values in any given interval.
- It does not possess gaps or interruptions.

Example:

- Weight.
- Income.
- Age.
- Time.
- Etc.

3. Interval

- Do not have a true zero, e.g. 88 degrees is not necessarily double the temperature of 44 degrees.
- Equally spaced variables, e.g. temperature. The difference between a temperature of 66 degrees and 67 degrees is taken to be the same as the difference between 76 degrees and 77 degrees.

4. Ratio variables

Variables spaced at equal intervals with a true zero point, e.g. age.
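The 88-degree versus 44-degree example can be checked numerically. The sketch below assumes the temperatures are in degrees Fahrenheit (the note does not state the unit on this slide) and uses the standard conversion formulas; the function names are illustrative.

```python
def fahrenheit_to_celsius(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9


def fahrenheit_to_kelvin(f):
    """Convert degrees Fahrenheit to kelvin, a scale with a true zero."""
    return fahrenheit_to_celsius(f) + 273.15


hot_f, cool_f = 88.0, 44.0

# The ratio changes with the unit, so it carries no physical meaning on an interval scale.
ratio_f = hot_f / cool_f
ratio_c = fahrenheit_to_celsius(hot_f) / fahrenheit_to_celsius(cool_f)
ratio_k = fahrenheit_to_kelvin(hot_f) / fahrenheit_to_kelvin(cool_f)

# Differences, by contrast, convert consistently (a factor of 5/9 from F to C).
diff_f = hot_f - cool_f
diff_c = fahrenheit_to_celsius(hot_f) - fahrenheit_to_celsius(cool_f)

print(f"ratio in F: {ratio_f:.2f}, in C: {ratio_c:.2f}, in K: {ratio_k:.2f}")
print(f"difference in F: {diff_f:.1f}, in C: {diff_c:.1f}")
```

The ratio is 2 in Fahrenheit but a different number in Celsius, and only the kelvin ratio, taken on a scale with a true zero, can be read as "how many times hotter".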

5. Independent variable

A hypothesized cause of, or influence on, a dependent variable. This might be a variable that you control, like a treatment, or a variable not under your control, like an exposure.

6. Dependent variable

The variable that you believe might be influenced or modified by some treatment or exposure, or the variable you are trying to predict. The dependent variable is sometimes called the outcome variable.

The definition of dependent and independent variables depends on the context of the study. For example, a variable that is dependent in one study may be independent in another study.

Measurement and Measurement Scale

Measurement: the assignment of numbers or names to objects or events according to a set of rules. Not all measurements are the same.

Measurement scale: the way in which variables/numbers are defined and categorized. It describes the degree of precision with which a characteristic is measured. Depending on the nature of the variable and the set of rules used to measure it, there are four scales of measurement.

Each scale of measurement has certain properties, which in turn determine which statistical analyses are appropriate.

1. Nominal scale

- The simplest and lowest (weakest) level of measurement, in which the values fall into unordered categories or classes.
- Uses names, labels, or symbols to assign each measurement; any numbers used have NO meaning.
- Always measures qualitative data.

Characteristics to be fulfilled:

- The categories should be mutually exclusive.
- The categories should be exhaustive.
- The names or symbols can be interchanged without altering essential information.

Example: blood type, sex, race, marital status, eye color, type of tar, university attended, occupation, residence, etc.

2. Ordinal scale

- Assigns each measurement to one of a limited number of categories that are ranked in order.
- The differences among categories are not necessarily equal and often not even measurable.
- Although non-numerical, the categories can be considered to have a natural ordering.
- It is the next higher level of measurement.
- It is usually used for qualitative data.

- It is subjective in nature.
- Many health care variables are ordinal in nature.

Example: patient status, cancer stage, social class, pain level, dehydration status, Glasgow coma scale, etc.

3. Interval scale

- Measured on a continuum, and differences between any two numbers on the scale are of known size.
- It assigns each measurement to one of an unlimited number of categories.

- It has no true zero point. "0" is arbitrarily chosen and doesn’t reflect the absence of temperature.
- The distance between adjacent values is equal and fixed, but the values are not proportional to the amount of the attribute.
- It is used for truly quantitative data.

Examples: body temperature in °F or °C, directions in degrees, time of the day, IQ.

4. Ratio scale

- Measurement begins at a true zero point and the scale has equal intervals.
- It is the highest level of measurement.
- Used for purely quantitative data.

Examples: height, weight, BP, etc.

Degree of precision in measuring, from lowest to highest:

Nominal
Ordinal
Interval
Ratio

Summary of each measurement scale

Nominal:
- People or objects with the same scale value are the same on some attribute.
- The values of the scale have no 'numeric' meaning in the way that you usually think about numbers.

Ordinal:
- People or objects with a higher scale value have more of some attribute.
- The intervals between adjacent scale values are indeterminate.
- Scale assignment is by the property of "greater than," "equal to," or "less than."

Interval:
- Intervals between adjacent scale values are equal with respect to the attribute being measured.
- E.g., the difference between 8 and 9 is the same as the difference between 76 and 77.

Ratio:
- There is a rational zero point for the scale.
- Ratios are equivalent, e.g., the ratio of 2 to 1 is the same as the ratio of 8 to 4.

Methods of Data Organization and Presentation

In most cases, useful information is not immediately evident from a mass of unsorted data, which by itself does not impart information.

Data organization: condensing information in a way that shows patterns of variation clearly.

Precise methods of analysis can be decided upon only when the characteristics of the data are understood. For this primary objective, different techniques of data organization are used.

Objective of data organization

- To see the similarity and dissimilarity of objects.
- To see the important features of the collected data.
- To prepare data for summarization and analysis.

The methods of organizing and presenting (describing) data differ depending on whether the data/variable being organized and presented is numerical or categorical.

1. Describing categorical variables. This includes:

A. Tables of frequency distributions
– Frequency
– Relative frequency
– Cumulative frequencies

B. Charts
– Bar charts
– Pie charts

Frequency Distributions

• Frequency: the number of times each observation (for individual data) or each class interval (for grouped data) occurs.

• Frequency distribution: an arrangement of data in a table that shows the possible values of the data with the corresponding frequency or class frequency. A simple and effective way of summarizing categorical data is to construct a frequency distribution table.

Advantages:

- Data are more easily appreciated.
- Quick comparisons can be drawn.
- The data can be arranged in the form of a table or in one of a number of different graphical forms.

Types of frequency distribution

I. Simple frequency distribution: a table showing each observed value with its frequency.

Hospital stay (days) of 50 patients in a medical ward (hypothetical data)

Hospital stay in days (xi)    Frequency (fi), the number of patients
0                             5
1                             10
2                             2
4                             23
5                             5
7                             5

In this table, the number of days of hospital stay is the variable under consideration, the number of persons is the frequency, and the whole distribution is called a simple frequency distribution.



i. Array (ordered array)

- A serial arrangement of numerical data in ascending or descending order.
- It is the first step in organizing data.
- It is appropriate when the number of observations is greater than 6 and less than 20.
- It makes it possible to see quickly the smallest and the largest measurements and the range of the observations.
- It is the simplest method.

Example: Raw data: 5, 6, 4, 9, 11, 0, 3, 8.

When these data are put in an ordered array: 0, 3, 4, 5, 6, 8, 9, 11.
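As a small sketch of the same step in code, sorting the raw data gives the ordered array, from which the smallest value, the largest value, and the range can be read directly. The variable names are illustrative.

```python
raw = [5, 6, 4, 9, 11, 0, 3, 8]

ordered = sorted(raw)                 # ascending ordered array
smallest, largest = ordered[0], ordered[-1]
data_range = largest - smallest       # range = largest - smallest

print(ordered)
print(f"smallest = {smallest}, largest = {largest}, range = {data_range}")
```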

ii. Categorical distribution

Non-numerical information can also be represented in a frequency distribution.

Example (qualitative variable): HIV-positive mothers attending the ANC unit, by their future plan for infant feeding.

Mother's plan              Number of mothers
Exclusive breastfeeding    100
Replacement feeding        50
Mixed feeding              30
Nursery                    50
Total                      230



II. Grouped Frequency Distribution

A way of representing large sets of data in class intervals.

Steps in the construction of a grouped frequency distribution:

1. Choosing the classes (first put the data in an ordered array).
2. Sorting (or tallying) the data into these classes.
3. Counting the number of items in each class.
4. Displaying the results in the form of a chart or table.

1. Choosing the classes.

When data consisting of a large number of observations are divided into groups that have defined upper and lower limits, each group is called a class. The size of the class is called the class interval.

Choosing a suitable classification involves:

a. Determining the appropriate number of classes/class intervals.

The number of classes/class intervals is determined by:

I. Non-statistical (convenience) method: choose no fewer than 6 and no more than 20 classes. The average is 15. With fewer than 6 classes the data are over-summarized and information is lost; with more than 20 classes the objective of data organization is not met. The exact number used in a given situation depends mainly on the number of measurements or observations to be grouped.

II. Statistical method: choose the number of classes using Sturges's formula:

K = 1 + 3.322 (log n)

where K = number of class intervals, n = number of observations, and log is the base-10 logarithm.

Example: the sample size is 275. How many class intervals are needed?

K = 1 + 3.322 (log 275)

K = 1 + 3.322 (2.439) = 9.10 ≈ 9
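Sturges's formula is easy to check in code. The sketch below assumes the base-10 logarithm, which is what reproduces the worked figures in this note; the function name is illustrative.

```python
import math


def sturges_classes(n):
    """Unrounded number of classes K = 1 + 3.322 * log10(n)."""
    return 1 + 3.322 * math.log10(n)


for n in (275, 40):
    k = sturges_classes(n)
    print(f"n = {n}: K = {k:.2f}, rounded to {round(k)}")
```

The result is rounded to a whole number of classes, and, as the note says next, it should be treated as a guide rather than a fixed rule.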



Note:

- Sturges's rule should not be regarded as final, but should be considered as a guide only. The number of classes specified by the rule may be increased or decreased for convenient or clear presentation.
- Classes should be mutually exclusive and must not overlap.
- We must make sure that the smallest and largest values fall within the classification and that no value can fall into a gap between classes.

b. Determining the class width.

The class width, denoted by "W", is equal for each class:

W = R / K = (Xmax - Xmin) / K

where W = width of the class, R = range, Xmax = the largest value in the observations, Xmin = the lowest value in the observations, and K = the number of classes.



Example:

– Leisure time (hours) per week for 40 college students:

23 24 18 14 20 36 24 26 23 21 16 15 19 20 22 14 13 10 19 27 29 22 38 28 34 32 23 19 21 31 16 28 19 18 12 27 15 21 25 16

K = 1 + 3.322 (log 40) = 6.32 ≈ 6

Maximum value = 38, minimum value = 10

Width = (38 - 10)/6 = 4.67 ≈ 5
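The leisure-time example can be carried through to a full grouped frequency distribution. The sketch below makes two assumptions the note does not state: the first class starts at the minimum value, and the width is rounded up to the next whole number.

```python
import math

leisure_hours = [
    23, 24, 18, 14, 20, 36, 24, 26, 23, 21,
    16, 15, 19, 20, 22, 14, 13, 10, 19, 27,
    29, 22, 38, 28, 34, 32, 23, 19, 21, 31,
    16, 28, 19, 18, 12, 27, 15, 21, 25, 16,
]

n = len(leisure_hours)
k = round(1 + 3.322 * math.log10(n))            # number of classes (Sturges)
x_min, x_max = min(leisure_hours), max(leisure_hours)
width = math.ceil((x_max - x_min) / k)          # class width, rounded up

# Class limits for whole-number data: (lower limit, upper limit), both inclusive.
classes = [(x_min + i * width, x_min + (i + 1) * width - 1) for i in range(k)]

# Count how many observations fall inside each class.
frequencies = [sum(lo <= x <= hi for x in leisure_hours) for lo, hi in classes]

for (lo, hi), f in zip(classes, frequencies):
    print(f"{lo}-{hi}: {f}")
```

Starting the first class at the minimum is only one convention; a rounder starting point could be used as long as every observation falls into exactly one class.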

c. Determining true limits/class boundaries.

Class limits: the smallest and largest values that can go into any class are regarded as its limits; they are the lower and upper class limits.

True limits/class boundaries: limits determined mathematically so that the intervals of a continuous variable are continuous in both directions and no gap exists between classes. The true limits are what the tabulated limits would correspond to if one could measure exactly.

True limits/class boundaries are used to smooth the class intervals. They are obtained by subtracting 0.5 from the lower class limit and adding 0.5 to the upper class limit. This is a simple convention. A true limit can be lower or upper.

d. Determining the class mark.

The class mark, denoted by "Xc", is the midpoint of each class. The formula is:

Xc = (LTL + UTL) / 2

where Xc = class mark, UTL = upper true limit, and LTL = lower true limit.
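A minimal sketch of the two conventions, true limits and class mark, is given below. The class 10 to 14 is used purely as an illustration, and the function names are hypothetical; the 0.5 adjustment assumes the data are recorded to the nearest whole unit.

```python
def class_boundaries(lower_limit, upper_limit):
    """True limits: subtract 0.5 from the lower limit, add 0.5 to the upper limit."""
    return lower_limit - 0.5, upper_limit + 0.5


def class_mark(ltl, utl):
    """Class mark Xc = (LTL + UTL) / 2, the midpoint of the class."""
    return (ltl + utl) / 2


ltl, utl = class_boundaries(10, 14)   # illustrative class with limits 10 and 14
xc = class_mark(ltl, utl)

print(f"true limits: {ltl} to {utl}, class mark: {xc}")
```

Because the true limits are symmetric around the class, the class mark is the same whether it is computed from the class limits or from the true limits.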

2. Sorting (or tallying) the data into these classes.

Tally marks are small vertical bars used in a frequency table to represent the number of times a particular event has appeared in the collected data. If a value in a particular class has occurred four times, we put four tally marks (////); for the fifth occurrence we put a cross tally mark through them to make a block of five. When it occurs for the sixth time, we put another tally mark after leaving a space. If we used only continuous tally bars, like (//////), there might be confusion in counting, which could lead to mistakes.

3. Counting the number of items in each class.

Relative frequency is the frequency of each class interval (fi) divided by the total frequency (n). For grouped data, n = ∑ fi.

Cumulative frequency: obtained when the frequencies of two or more classes are added up. It helps to find the total number of items whose values are less than or greater than some value. It can be:

- Less than cumulative frequency distribution: the cumulation starts from the lowest value of the variable and proceeds to the highest. This is the most common one.
- More than cumulative frequency distribution: the cumulation runs from the highest to the lowest value.

Cumulative relative frequency: computed by adding successive relative frequencies. It can also be calculated by dividing the cumulative frequency (fc) by the total frequency (n), i.e. frc = fc/n for each class.
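These definitions can be sketched with the hypothetical hospital-stay frequencies from the simple frequency distribution earlier in this note (0, 1, 2, 4, 5 and 7 days with frequencies 5, 10, 2, 23, 5 and 5). The variable names are illustrative.

```python
from itertools import accumulate

stay_days = [0, 1, 2, 4, 5, 7]
frequencies = [5, 10, 2, 23, 5, 5]

n = sum(frequencies)                                   # total frequency, n = sum of fi
relative = [f / n for f in frequencies]                # fi / n

# "Less than" cumulation runs from the lowest value upward.
less_than_cum = list(accumulate(frequencies))
# "More than" cumulation runs from the highest value downward.
more_than_cum = list(accumulate(reversed(frequencies)))[::-1]

# Cumulative relative frequency: frc = fc / n for each value.
cum_relative = [fc / n for fc in less_than_cum]

for row in zip(stay_days, frequencies, relative, less_than_cum, more_than_cum, cum_relative):
    print("{:>2} {:>3} {:>5.2f} {:>3} {:>3} {:>5.2f}".format(*row))
```

The last "less than" cumulative frequency equals n, and the last cumulative relative frequency equals 1, which is a quick check that no observation was dropped.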

Exercise: Construct a grouped frequency distribution for the following data: age in years of patients (n=60) in a diabetic clinic in Addis Ababa, January 2000.

19, 82, 98, 78, 30, 26, 32, 66, 87, 81, 40, 48, 70, 61, 69, 58, 60, 53, 28, 54, 47, 40, 80, 56, 36, 53, 65, 28, 90, 95, 45, 32, 34, 36, 20, 62, 51, 20, 17, 26, 70, 81, 39, 63, 33, 66, 61, 77, 41, 55, 76, 70, 42, 67, 22, 75, 24, 50, 50, 44.

Based on the above data, construct a table that contains:

1. Class interval/class.
2. Class boundary.
3. Class mark.
4. Tally mark.
5. Frequency.
6. Relative frequency.
   a. Less than relative frequency.
   b. Greater than relative frequency.
7. Cumulative relative frequency.
   a. Less than crf.
   b. Greater than crf.

Statistical Tables

A statistical table is an orderly and systematic presentation of numerical data in rows and columns.

o Rows are horizontal arrangements of data, and the row heading is termed the stub.
o Columns are vertical arrangements of data, and the column heading is called the caption.

Both simple and grouped frequency distributions can be put in statistical tables.

Almost any quantitative information can be organized into a table. Tables are useful for demonstrating patterns, exceptions, differences, and other relationships.

In addition, tables usually serve as the basis for preparing more visual displays of data, such as graphs and charts, where some of the detail may be lost.

Parts of a table

1. Table number:
– Serially numbered.
– Should be written in the center at the top.

2. Title:
– Should be written in the center at the top of the table, below the table number.

3. Caption:
– Refers to the name of the column heading.
– Is written at the center of the column.

4. Stub:
– Refers to the name of the row heading.
– Written at the extreme left.

5. Body of the table:
– The numerical data expressed in the table.
– When the body is empty, it is called a dummy table (table shell) and the variables are termed dummy variables.

6. Head note:
– A short statement about all or major parts of the table.
– Written below the title in brackets.

7. Foot note:
– Gives any clarification needed about parts of the table.
– Written at the bottom of the table.
– Indicates the source of the data.

The following structure shows the placement of the various parts of a table.

Common Rules of Constructing Tables

Although there are no hard and fast rules to follow, the following general principles should be addressed in constructing tables.

1. It should be as simple as possible.

2. It should be self-explanatory. To create a table that is self-explanatory, follow the guidelines below:

I. The title should be clear and to the point.
II. The title should state when and where the data were collected and what the table describes.

III. Precede the title with a table number.
IV. Label each row and each column clearly and concisely, and include the units of measurement for the data. Limit the number of variables to three or fewer.
V. Totals should be shown either in the top row and the first column or in the last row and last column. If you show percents (%), also give their total (always 100).
VI. Explain any code, abbreviation, symbol, or exclusion in a footnote.

VII. Note the source of the data in a footnote if the data are not original.
VIII. Put the title at the top of the table.
IX. Numerical entries of zero should be written explicitly rather than indicated by a dash. Dashes are reserved for missing or unobserved data.
X. In cross-tabulated data (variables put as row and column headings), the dependent variable should be the column heading and the independent variable should be the row heading.

3. If the data show a qualitative variable, the observations are listed in alphabetical order or by their degree of importance.
4. If the data are time-bound (classified by time of occurrence), they should be arranged in chronological order, from the earliest to the latest or vice versa.
5. If the data represent places, they may be placed in alphabetical order or in terms of geographic location.

Types of table

Based on the purpose for which the table is designed and the complexity of the relationship, a table can be either:

A. A simple frequency table.
B. A cross tabulation.

A. Simple frequency table (one-way table):

• Is used when the individual observations involve only a single variable.
• The denominator for the percentages is the sum of all observed frequencies.

Example: Table X: Overall immunization status of children in Adami Tullu Woreda, Feb. 1999.

Immunization status      Number    Percent
Not immunized            75        35.7
Partially immunized      57        27.1
Fully immunized          78        37.2
Total                    210       100.0
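The percent column of a one-way table can be reproduced by dividing each count by the total of all observed frequencies. The sketch below uses the counts from Table X; the dictionary layout is just one convenient representation.

```python
counts = {
    "Not immunized": 75,
    "Partially immunized": 57,
    "Fully immunized": 78,
}

total = sum(counts.values())                      # denominator: sum of all frequencies
percents = {status: 100 * c / total for status, c in counts.items()}

for status, pct in percents.items():
    print(f"{status:<22}{counts[status]:>5}{pct:>8.1f}")
print(f"{'Total':<22}{total:>5}{sum(percents.values()):>8.1f}")
```

Computed directly, 78/210 is about 37.14%. The 37.2 printed in the table may have been adjusted so that the rounded column adds to exactly 100.0; that is a reading of the table, not something the note states.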

B. Cross tabulation:

Is used to obtain the frequency distribution of one variable by the subsets of another variable. The choice of denominator depends on which variable is to be compared across the subsets of the other variable.

It can be of two types:

I. Two-way table.
II. Higher-order table.
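A cross tabulation can be sketched by counting pairs of values. The records below are hypothetical and exist only to show the mechanics; they are not the data behind any table in this note. Row totals are used as the denominator, on the assumption that the row variable (marital status) is the independent variable whose subsets are being compared.

```python
from collections import Counter

# Hypothetical (marital status, immunized) records, for illustration only.
records = [
    ("married", "yes"), ("married", "no"), ("single", "yes"),
    ("married", "yes"), ("single", "no"), ("single", "no"),
]

cells = Counter(records)                                  # frequency of each (row, column) pair
row_totals = Counter(status for status, _ in records)     # denominator for row percentages

for status in sorted(row_totals):
    yes = cells[(status, "yes")]
    no = cells[(status, "no")]
    pct_yes = 100 * yes / row_totals[status]
    print(f"{status:<8} yes={yes} no={no} total={row_totals[status]} ({pct_yes:.1f}% immunized)")
```

Choosing column totals as the denominator instead would answer a different question, namely how each immunization group is split by marital status.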

I. Two-way table:

Shows two variables/characteristics and is formed when either the caption or the stub is divided into two or more parts.

Example: Table Y: TT immunization by marital status of women of childbearing age, Addis Ababa town, 2006.

Source: Mikael A. et al. Tetanus toxoid immunization coverage among women of childbearing age in Assendabo town; Bulletin of JIHS, 1996, 7(1): 13-20.

II. Higher-order table:

Used when it is desired to represent three or more characteristics/variables in a single table.

Example: Table Z: Distribution of health professionals by sex and residence.

Diagrammatic representation of data

• An appropriately drawn graph allows readers to rapidly obtain an overall grasp of the data presented.
• Well-designed graphs can be an incredibly powerful means of communicating a great deal of information using visual techniques.
• When graphs are poorly designed, they not only fail to convey the message effectively but also often mislead and confuse.

Importance of Diagrammatic Representation

- Attractiveness.
- They help in deriving the required information in less time and without any mental strain.
- They facilitate comparison.
- They show unsuspected events and lead to action.
- Memorization.

Limitations of diagrammatic presentation:

• They fail to show slight differences.
• They are not exact; they provide only approximate information.
• They are not suitable for all statistical data.
• They are not used when comparison is unnecessary or impossible.

General rules that are commonly accepted about the construction of graphs:

1. Graphs should be self-explanatory and as simple as possible.
2. Titles are usually placed below the graph and should answer the questions What? Where? When?
3. Legends or keys should be used to differentiate variables if more than one is shown.
4. The axis labels should be placed to read from the left side and from the bottom.

Common types of diagrammatic representations

1. Bar graph

- It is the easiest and most adaptable general-purpose chart.
- A bar graph is especially suitable for nominal and ordinal data.
- The heights of the bars represent the frequency (actual number or percentage) for each category.

8/12/2010




Cont…d



The categories are represented on the baseline (x-axis) at regular intervals, and the corresponding frequencies or relative frequencies are represented on the y-axis (ordinate) in a vertical bar diagram, and vice versa in a horizontal bar diagram.

8/12/2010




Cont…d

Tips for constructing bar graph:

1. Whenever possible, construct the bar diagram on graph paper.
2. All bars drawn in any single study should be of the same width.
3. Leave equal spaces between the bars.
4. All the bars should rest on the same line, called the base, on the x-axis.
5. Whenever possible, draw the bars in order of magnitude.

8/12/2010



Cont…d

6. Label both axes clearly.

7. The scale should start from zero.

8. Divided bars can be used to show the component parts.

8/12/2010




Cont…d



Types of bar graph

A. Simple bar chart:

– It is a one-dimensional diagram in which the bar represents the whole of the magnitude.
– The height or length of each bar indicates the size (frequency) of the figure represented.

Example:

Fig. X: Distribution of pediatric patients in a hospital ward by type of admitting diagnosis in Hospital X, Jan 2000.

8/12/2010




Cont…d

B. Double bar graph:

Used to depict two variables.

Example:

Fig. Y: TT Immunization status by marital status of women 15-49 years, Asendabo town, 1996.

8/12/2010




Cont…d

C. Multiple bar chart:

– Represents the relationships among more than two variables.
– The component figures (bars) are shown as separate bars adjoining each other.
– The height of each bar represents the actual value of the component figure.

Example:

Fig. X’: Prevalence of cough in school children by smoking history of children and their parents, Town A, Jan 2000.

8/12/2010




Cont…d

D. Sub-divided (component) bar graph:

It is also called a segmented bar graph. It is constructed when each total is built up from two or more component figures. If a given magnitude can be split into subdivisions, or if different quantities form the subdivisions of the totals, simple bars may be subdivided in the ratio of the various subdivisions to show the relationship of the parts to the whole. The order in which the components are shown in one bar is followed in all bars used in the diagram.

8/12/2010




Cont…d



Sub-divided (component) bar graphs are of two types:

I. Actual Component Bar Diagram:

The overall height of each bar and the individual component lengths represent actual figures.

Example:

Fig. Y’: TT Immunization status by marital status of women 15-49 years, Asendabo town, 1996.

8/12/2010




Cont…d



II. Percentage Component Bar Diagram:

The individual component lengths represent the percentage that each component forms of the overall total.

Note that a series of such bars will all have the same total height, i.e., 100 percent.

Example:

Fig. Z: TT Immunization status by marital status of women 15-49 years, Asendabo town, 1996.

8/12/2010




Cont…d

2. Pie chart

Useful for qualitative or quantitative discrete data.

Shows the relative frequency of each category by dividing a circle into sectors so that the areas of the sectors are proportional to the frequencies.

Appropriate for variables with at most six categories, because the circle should not be divided into more than six sectors.

8/12/2010




Cont…d

Methods of constructing a pie chart:

– Construct a frequency table.
– Change each frequency into a relative frequency (f/n), which may be expressed as a percentage.
– Change the relative frequency into degrees, where degrees = (f/n) × 360.
– Draw a circle and divide it accordingly.

Example:

Fig. X: Distribution of cause of death of females in England & Wales, 1999.
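The construction steps for a pie chart can be sketched in Python. This is only an illustration: the category names and counts are hypothetical (they are not taken from the figure), and `pie_angles` is a helper name invented for this sketch.

```python
def pie_angles(frequencies):
    """Map each category to (relative frequency, sector angle in degrees)."""
    n = sum(frequencies.values())
    # degrees = (f / n) * 360, written as f * 360 / n to limit rounding error
    return {cat: (f / n, f * 360 / n) for cat, f in frequencies.items()}

# Hypothetical frequency table (illustrative values only)
counts = {"Single": 20, "Married": 50, "Divorced": 10, "Widowed": 20}

angles = pie_angles(counts)
for cat, (rel, deg) in angles.items():
    print(f"{cat}: {rel:.0%} of the total -> {deg:.0f} degrees")
```

The sector angles always add up to 360 degrees, which is a quick check that the frequency table was converted correctly.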

8/12/2010


Cont…d

3. Histogram

Is a special kind of bar graph.

Useful for quantitative continuous data.

Is a frequency distribution with continuous class intervals that has been turned into a graph.

The area of each rectangle represents the frequency of the corresponding class interval.

To avoid crowding, you can use class midpoints.

8/12/2010




Cont…d

In addition to simplifying a complex data set, a histogram is important in depicting the shape (symmetric/skewed) and the location of central tendency (“averages”) of the frequency distribution of a continuous variable.

Example:

Fig. Z: Distribution of the RBC cholinesterase values (μmol/min/ml) obtained from 35 workers exposed to pesticides.

Source: Knapp RG, Miller MC III: Clinical Epidemiology and Biostatistics: The National Medical Series for Independent Study. Williams & Wilkins, 1992, Baltimore, Maryland.

8/12/2010




Cont…d

4. Frequency polygon

A frequency polygon is obtained by connecting the midpoints of the tops of the adjacent rectangles (cells) of the histogram with line segments.

When the polygon is continued to the x-axis just outside the range of the observed values, the total area under the polygon equals the total area under the histogram.

It is not essential to draw a histogram in order to obtain a frequency polygon.

8/12/2010




It can be drawn without erecting the rectangles of a histogram, as follows:

Methods of constructing a frequency polygon:

– Mark the horizontal scale with the numerical values of the midpoints of the class intervals.
– Erect an ordinate at the midpoint of each interval, with the length (altitude) of the ordinate representing the frequency of the class on whose midpoint it is erected.
– Join the tops of the ordinates and extend the connecting lines to the horizontal scale.

8/12/2010




Cont…d

Example of a frequency polygon drawn from a histogram.

Fig. z: Frequency polygon for the ages of 2087 mothers with <5 children, Adami Tulu, 2003.

Example of a frequency polygon drawn without a histogram.

Fig. z’: Frequency polygon for the ages of women at the time of marriage.

8/12/2010




[Figures: histogram with frequency polygon of mothers' age (variable N1AGEMOTH; Mean = 27.6, Std. Dev = 6.13, N = 2087); frequency polygon of age of women at the time of marriage (vertical axis: No. of women).]

Cont…d

5. Ogive Curve (The Cumulative Frequency Polygon)

Sometimes it is necessary to know the number of items whose values are more or less than a certain amount. To get this information, the frequency distribution must be changed from a ‘simple’ to a ‘cumulative’ distribution.

An ogive curve turns a cumulative frequency distribution into a graph.

Ogive curves are much more common than frequency polygons.

8/12/2010




Cont…d

To construct an Ogive curve:

I) Compute the cumulative frequency of the distribution.
II) Prepare a graph with the cumulative frequency on the vertical axis and the true upper class limits (class boundaries) of the intervals scaled along the horizontal axis (x-axis). The true lower limit of the lowest class interval is also included on the x-axis scale; it is the true upper limit of the next lower interval, which has a cumulative frequency of 0.
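As a rough illustration of the two steps, the sketch below computes the points of a "less than" ogive in Python. It borrows the grouped age data of 169 subjects used in the mean and median examples of these notes; the function name `ogive_points` and the way the true boundaries are derived from the stated class limits are assumptions made for this sketch.

```python
from itertools import accumulate

# Stated class limits and frequencies (age of 169 subjects, grouped data)
classes = [(10, 19), (20, 29), (30, 39), (40, 49), (50, 59), (60, 69)]
freqs = [4, 66, 47, 36, 12, 4]

def ogive_points(classes, freqs):
    """Return (true class boundary, cumulative frequency) pairs to plot."""
    gap = classes[1][0] - classes[0][1]            # gap between stated limits (1 here)
    lowest = classes[0][0] - gap / 2               # true lower limit of the lowest class
    uppers = [hi + gap / 2 for _, hi in classes]   # true upper limits
    cumulative = list(accumulate(freqs))           # step I: cumulative frequencies
    # step II: the curve starts at the lowest boundary with cumulative frequency 0
    return [(lowest, 0)] + list(zip(uppers, cumulative))

points = ogive_points(classes, freqs)
for boundary, cum_freq in points:
    print(boundary, cum_freq)
```

Plotting these pairs and joining them with straight lines gives the ogive; the last point always carries the total number of observations.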

8/12/2010


Cont…d

Example: Construct an ogive for the data below.

Table X: Heart rate of patients admitted to Hospital D, 2000.

Ogive (cumulative frequency curve)

Fig. D: Heart rate (beats/minute) of patients admitted to Hospital B, 2000.

8/12/2010




Cont…d

Numerical Summary Measures

MCT (Measure of Central Tendency)

A frequency distribution gives a general picture of the distribution of a variable.

However, it cannot indicate the average value or the spread of the values.

On the scale of values of a variable, there is a certain stage at which the largest number of items tend to cluster.

8/12/2010




Cont…d

Since this stage is usually in the centre of the distribution, the tendency of statistical data to concentrate at a certain value is called “central tendency”.

The various methods of determining the point about which the observations tend to concentrate are called MCT (Measures of Central Tendency).

The objective of calculating an MCT is to determine a single figure that may be used to represent the whole data set.

8/12/2010






Cont…d



3. It should be as close to the maximum number of values as possible.

4. It should have a definite value.

5. It should not be subjected to complicated and tedious calculations.

6. It should be capable of further algebraic treatment.

7. It should be stable with regard to sampling.

The three most common measures of central tendency are:

–Mean, Median, and Mode.

8/12/2010




Cont…d

Arithmetic Mean

The arithmetic mean is the measure of central location you are probably most familiar with.

It is the arithmetic average and is commonly called simply “mean” or “average.”

In formulas, the arithmetic mean is usually represented as μ for the population mean and $\bar{x}$, read as “x-bar”, for the sample mean.

It is the sum of all the observations divided by the total number of observations.

8/12/2010




Cont…d

General formula

a) Ungrouped mean

If $x_1, x_2, \ldots, x_n$ are $n$ observed values, then

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

b) Grouped data

In calculating the mean from grouped data, we assume that all values falling into a particular class interval are located at the mid-point of the interval. It is calculated as follows:

Cont…d

$$\bar{x} = \frac{\sum_{i=1}^{k} m_i f_i}{\sum_{i=1}^{k} f_i}$$

where,

k = the number of class intervals.
mi = the mid-point of the ith class interval.
fi = the frequency of the ith class interval.

Cont…d

Properties of the Arithmetic Mean:

• For a given set of data there is one and only one arithmetic mean (uniqueness).
• Easy to calculate and understand (simplicity).
• Influenced by each and every value in a data set.
• Greatly affected by extreme values (sensitivity).

So, the mean is an excellent measure of central tendency when the distribution is symmetric (normally or approximately normally distributed).

8/12/2010




Cont…d

• The algebraic sum of the deviations of the given values from their arithmetic mean is always zero (center of gravity).
• In the case of grouped data, if any class interval is open, the arithmetic mean cannot be calculated.
• It is not appropriate for either nominal or ordinal data.
• The sum of the squares of the deviations from the arithmetic mean is less than the sum computed from any other point.

8/12/2010





Cont…d

Advantages;

1) It is based on all values given in the distribution.
2) It is most easily understood.
3) It is most amenable to algebraic treatment.

Disadvantages;

1) Overly sensitive to extreme values.
2) When the distribution has open-end classes, its computation is based on an assumption, and therefore may not be valid.
3) Sometimes its value may even look ridiculous (surprising).

8/12/2010




Cont…d

Example 1: The heart rates for n=10 patients were as follows (beats per minute):

167, 120, 150, 125, 150, 140, 40, 136, 120, 150

What is the arithmetic mean for the heart rate of these patients?

Ans. Mean = 1298/10 = 129.8 beats per minute.
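The calculation for Example 1 can be checked with a few lines of Python. The sketch also recomputes the mean without the extreme value 40 to illustrate how sensitive the mean is to outliers; dropping that value is done here purely for illustration and is not part of the original example.

```python
heart_rates = [167, 120, 150, 125, 150, 140, 40, 136, 120, 150]

def arithmetic_mean(values):
    """Sum of all observations divided by the number of observations."""
    return sum(values) / len(values)

mean_all = arithmetic_mean(heart_rates)
print(mean_all)  # 129.8

# Illustration only: the same mean with the extreme value 40 left out
mean_without_40 = arithmetic_mean([x for x in heart_rates if x != 40])
print(round(mean_without_40, 1))  # 139.8
```

A single unusually low reading pulls the mean down by about 10 beats per minute.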

8/12/2010




Cont…d

Example 2: Compute the mean age of 169 subjects from the grouped data.

Ans. Mean = 5810.5/169 = 34.38 years.

8/12/2010




Class interval   Mid-point (mi)   Frequency (fi)   mifi
10-19            14.5             4                58.0
20-29            24.5             66               1617.0
30-39            34.5             47               1621.5
40-49            44.5             36               1602.0
50-59            54.5             12               654.0
60-69            64.5             4                258.0
Total            __               169              5810.5
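The grouped-mean calculation in the table can be reproduced with a short Python sketch. The function name `grouped_mean` is invented for this illustration; the class limits and frequencies are those of the table.

```python
# Class limits and frequencies from the table above
classes = [(10, 19), (20, 29), (30, 39), (40, 49), (50, 59), (60, 69)]
freqs = [4, 66, 47, 36, 12, 4]

def grouped_mean(classes, freqs):
    """Mean of grouped data: sum(m_i * f_i) / sum(f_i), m_i = class mid-point."""
    midpoints = [(lo + hi) / 2 for lo, hi in classes]
    weighted_sum = sum(m * f for m, f in zip(midpoints, freqs))
    return weighted_sum / sum(freqs)

mean_age = grouped_mean(classes, freqs)
print(round(mean_age, 2))  # 34.38
```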

Cont…d

Median

It is the middle value of a set of observations when the observations are listed in increasing or decreasing order.

a) Ungrouped data

The median is the value which divides the data set into two equal parts.

If the number of values is odd, the median is the middle value when all values are arranged in order of magnitude, with ½ of the observations being larger than the median value and ½ smaller.

If the number of values is even, the median is the mean of the two middle values.
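A minimal Python sketch of the ungrouped median follows. It uses the heart-rate data from the mean example in these notes; for an even number of values it takes the mean of the two middle values, which is the usual convention and is an assumption as far as this slide is concerned.

```python
def median(values):
    """Middle value of the ordered data (mean of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

heart_rates = [167, 120, 150, 125, 150, 140, 40, 136, 120, 150]

median_even = median(heart_rates)      # n = 10, even
median_odd = median(heart_rates[:9])   # first nine values, n = 9, odd
print(median_even)  # 138.0
print(median_odd)   # 136
```

Unlike the mean of these data (129.8), the median is hardly affected by the extreme value 40.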

8/12/2010






Cont…d



8/12/2010




Cont…d

b) Grouped data

In calculating the median from grouped data, we assume that the values within a class interval are evenly distributed through the interval.

– The first step is to locate the class interval in which the median lies.
– Find n/2 and identify the first class interval whose cumulative frequency is at least n/2.
– Then, use the following formula.

8/12/2010




Cont…d

$$\tilde{x} = L_m + \left(\frac{\frac{n}{2} - F_c}{f_m}\right) W$$

where,

Lm = lower true class boundary of the interval containing the median.
Fc = cumulative frequency of the interval just before the median class interval (the row just above it in the table).
fm = frequency of the interval containing the median.
W = class interval width.
n = total number of observations.

Cont…d

Example. Compute the median age of 169 subjects from the grouped data.

Class interval   Mid-point (mi)   Frequency (fi)   Cum. freq
10-19            14.5             4                4
20-29            24.5             66               70
30-39            34.5             47               117
40-49            44.5             36               153
50-59            54.5             12               165
60-69            64.5             4                169
Total                             169

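Cont…d

Ans.

n/2 = 169/2 = 84.5

n/2 = 84.5 falls in the 3rd class interval (30-39), the first class whose cumulative frequency (117) reaches 84.5.

True lower limit = 29.5, true upper limit = 39.5, so W = 10

Frequency of the class (fm) = 47

(n/2 – Fc) = 84.5 – 70 = 14.5

Median = 29.5 + (14.5/47) × 10 = 32.59 ≈ 33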
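The same steps can be written as a short Python sketch. The function name `grouped_median` and the choice to pass the true class boundaries as one list are assumptions made for this illustration.

```python
def grouped_median(boundaries, freqs):
    """Median of grouped data: L_m + ((n/2 - F_c) / f_m) * W.

    boundaries holds the true lower boundary of every class followed by
    the true upper boundary of the last class.
    """
    half = sum(freqs) / 2
    cum_before = 0                      # F_c: cumulative frequency before the class
    for i, f in enumerate(freqs):
        if cum_before + f >= half:      # first class whose cumulative frequency reaches n/2
            lower = boundaries[i]
            width = boundaries[i + 1] - boundaries[i]
            return lower + (half - cum_before) / f * width
        cum_before += f
    raise ValueError("frequency table is empty")

boundaries = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5, 69.5]
freqs = [4, 66, 47, 36, 12, 4]

median_age = grouped_median(boundaries, freqs)
print(round(median_age, 2))  # 32.59
```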

8/12/2010




Cont…d

Properties of the median;

• There is only one median for a given set of data (uniqueness).
• The median is easy to calculate.
• The median is a positional average and hence is insensitive to very large or very small values.
• The median can be calculated even in the case of open-end intervals.
• It is determined mainly by the middle points and is less sensitive to the remaining data points (weakness).

8/12/2010




Cont…d

• It is not a good representative of the data if the number of items is small.
• The median can be used as a summary measure for ordinal, discrete and continuous data; in general, however, it is not appropriate for nominal data.

Advantages

1) It is easily calculated and is not much disturbed by extreme values.
2) It is more typical of the series.

8/12/2010




Cont…d

3) The median may be located even when the data are incomplete.
4) For skewed data, the median is often closer to the typical value and more representative than the mean.

Disadvantages

1. The median is not as well suited to algebraic treatment as the arithmetic, geometric and harmonic means.
2. It is not as generally familiar as the arithmetic mean.

8/12/2010




Cont…d

Mode

• The mode is the most frequently occurring value among all the observations in a set of data.
• It is not influenced by extreme values.
• It is possible to have more than one mode or no mode.
• It is not a good summary of the majority of the data.
• The mode can be used as a summary measure for nominal, ordinal, discrete and continuous data; in general, however, it is more appropriate for nominal and ordinal data.

8/12/2010






Cont…d

a) Ungrouped data

• It is the value which occurs most frequently in a set of values.
• If all the values are different there is no mode; on the other hand, a set of values may have more than one mode.

8/12/2010



Example 1:



• Data are: 1, 2, 3, 4, 4, 4, 4, 5, 5, 6

• Mode is 4; the distribution is “unimodal”.

Example 2:

• Data are: 1, 2, 2, 2, 3, 4, 5, 5, 5, 6, 6, 8

• There are two modes – 2 & 5

• This distribution is said to be “bi-modal”

Example 3:

• Data are: 2.62, 2.75, 2.76, 2.86, 3.05, 3.12

• No mode, since all the values are different
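The three examples can be reproduced with a small Python sketch. The helper `modes` is written for this illustration; it returns an empty list when every value occurs only once, following the convention stated in these notes that such data have no mode.

```python
from collections import Counter

def modes(values):
    """Return the list of most frequent values; empty if all values differ."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []                      # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == top)

example_1 = modes([1, 2, 3, 4, 4, 4, 4, 5, 5, 6])
example_2 = modes([1, 2, 2, 2, 3, 4, 5, 5, 5, 6, 6, 8])
example_3 = modes([2.62, 2.75, 2.76, 2.86, 3.05, 3.12])

print(example_1)  # [4]      unimodal
print(example_2)  # [2, 5]   bimodal
print(example_3)  # []       no mode
```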

8/12/2010




b) Grouped data



• To find the mode of grouped data, we usually refer to the modal class, where the modal class is the class interval with the highest frequency.

• If a single value for the mode of grouped data must be specified, it is taken as the mid-point of the modal class interval.

8/12/2010







Cont…d

Alternatively, we can use this formula:

$$\text{Mode} = L + \left(\frac{d_1}{d_1 + d_2}\right) C$$

where;

L = the lower limit of the modal class
d1 = the difference between the frequencies of the modal class and the preceding class
d2 = the difference between the frequencies of the modal class and the succeeding class
C = the class interval width of the modal class.
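As an illustration of the formula, the sketch below applies it to the grouped age data of 169 subjects used earlier in these notes. Two choices here are assumptions rather than statements from the slide: L is taken as the true lower boundary of the modal class (19.5 rather than 20), and a missing neighbouring class is treated as having frequency 0. The function name `grouped_mode` is invented for this sketch.

```python
def grouped_mode(boundaries, freqs):
    """Mode of grouped data: L + d1 / (d1 + d2) * C for the modal class."""
    i = max(range(len(freqs)), key=freqs.__getitem__)        # index of the modal class
    prev_f = freqs[i - 1] if i > 0 else 0                    # assumption: 0 if no preceding class
    next_f = freqs[i + 1] if i < len(freqs) - 1 else 0       # assumption: 0 if no succeeding class
    d1 = freqs[i] - prev_f
    d2 = freqs[i] - next_f
    width = boundaries[i + 1] - boundaries[i]                # C
    return boundaries[i] + d1 / (d1 + d2) * width

boundaries = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5, 69.5]
freqs = [4, 66, 47, 36, 12, 4]

mode_age = grouped_mode(boundaries, freqs)
print(round(mode_age, 2))  # 27.15
```

Here the modal class is 20-29 (frequency 66), with d1 = 66 - 4 = 62 and d2 = 66 - 47 = 19, so the mode is 19.5 + (62/81) × 10 ≈ 27.15, compared with 24.5 if the mid-point of the modal class is used.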

8/12/2010




Cont…d

Properties of mode;

• The mode can be used as a summary measure for nominal, ordinal, discrete and continuous data; in general, however, it is more appropriate for nominal and ordinal data.
• It is not affected by extreme values.
• It can be calculated for distributions with open-end classes.
• Often its value is not unique.
• The main drawback of the mode is that often it does not exist.
• It is an average of position.

8/12/2010




Cont…d

Advantages

1. Since it is the most typical value, it is the most descriptive average.
2. Since the mode is usually an “actual value”, it indicates the precise value of an important part of the series.
3. Used for categorical data to describe the most frequent category.
4. Not affected by extreme values.
5. Easy to understand.

8/12/2010




Cont…d

Disadvantages

1. Unless the number of items is fairly large and the distribution reveals a distinct central tendency, the mode has no significance.
2. It is not capable of further mathematical treatment.
3. With a small number of items, the mode may not exist.
4. Sometimes there may be more than one mode.

8/12/2010




Cont…d

Exercise: A table showing the protein intake of different families. Find mean, median, and mode.

Protein intake/consumption unit/day (g)   Mid-point of class interval (xi)   Number of families (fi)   fixi    Cumulative frequency
15-25                                      20                                 30                        600     30
25-35                                      30                                 40                        1200    70
35-45                                      40                                 100                       4000    170
45-55                                      50                                 110                       5500    280
55-65                                      60                                 80                        4800    360
65-75                                      70                                 30                        2100    390
75-85                                      80                                 10                        800     400
Total                                                                         400                       19000

Cont…d

Measures of Dispersion

MCT are not enough to give a clear understanding about the distribution of the data.

We need to know something about the variability or spread of the values: whether they tend to be clustered close together, or spread out over a broad range.

Measures of Dispersion: Measures that quantify the variation or dispersion of a set of data from its central location.

8/12/2010




Cont…d

Dispersion refers to the variety exhibited by the values of the data.

The amount may be small when the values are close together.

If all the values are the same, there is no dispersion.

Other terms synonymous with Measures of Dispersion:

– “Measure of Variation”
– “Measure of Spread”
– “Measures of Scatter”

8/12/2010




Cont…d

Measures of dispersion include:



– Range

– Inter-quartile range

– Variance

– Standard deviation

– Coefficient of variation

8/12/2010




Cont…d

1. Range (R)

• The difference between the largest and smallest observations in a sample.
• Range = Maximum value – Minimum value

Example –

– Data values: 5, 9, 12, 16, 23, 34, 37, 42

– Range = 42-5 = 37

8/12/2010




Cont…d

• Being determined by only the two extreme observations, the range is of limited use because it tells us nothing about how the data between the extremes are spread.
• Further, interpretation of the range depends on the number of observations: when the number of observations increases, the range can get larger.

8/12/2010




Cont…d

2. Percentiles, Quartiles and Inter-quartile Range

• The quartiles are the values which divide the ordered distribution into four parts such that there is an equal number of observations in each part.

– Q1 = the [(n+1)/4]th ordered observation
– Q2 = the [2(n+1)/4]th ordered observation
– Q3 = the [3(n+1)/4]th ordered observation

• The inter-quartile range is the difference between the third and the first quartiles.

– IQR = Q3 – Q1
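The quartile positions can be sketched in Python as follows. The notes give only the position q(n+1)/4; interpolating linearly between neighbouring ordered values when that position is fractional is an assumption of this sketch (a common convention), and `quartile` is a helper name invented here. The two data sets are the ones used in the range and standard-deviation examples of these notes.

```python
def quartile(values, q):
    """q-th quartile (q = 1, 2, 3) at position q(n+1)/4 of the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    pos = q * (n + 1) / 4                 # 1-based position in the ordered data
    pos = min(max(pos, 1), n)             # keep the position inside the data
    lower = int(pos)                      # whole part of the position
    frac = pos - lower                    # fractional part
    upper = min(lower + 1, n)
    # assumption: linear interpolation between the two neighbouring values
    return ordered[lower - 1] + frac * (ordered[upper - 1] - ordered[lower - 1])

ten_values = [19, 21, 20, 20, 34, 22, 24, 27, 27, 27]
q1, q3 = quartile(ten_values, 1), quartile(ten_values, 3)
print(q1, q3, q3 - q1)  # 20.0 27.0 7.0

eight_values = [5, 9, 12, 16, 23, 34, 37, 42]
print(quartile(eight_values, 1), quartile(eight_values, 3))  # 9.75 36.25
```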

8/12/2010




Cont…d

• Although the inter-quartile range sometimes serves as a useful descriptive measure, it is mathematically intractable and can also vary considerably from sample to sample.
• Percentiles divide the ordered data into 100 parts with an equal number of observations in each part.
• It follows that the 25th percentile is the first quartile, the 50th percentile is the median and the 75th percentile is the third quartile.

8/12/2010




Cont…d

3. Variance

• A good measure of dispersion should make use of all the data.
• Intuitively, a good measure could be derived by combining, in some way, the deviations of each observation from the mean.
• The variance achieves this by averaging the squares of the deviations from the mean.

8/12/2010


Cont…d

• The sample variance of the set $x_1, x_2, \ldots, x_n$ of n observations with mean $\bar{x}$ is

$$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

Note: The sum of the deviations from the mean is zero; thus it is more useful to square the deviations, add them, and divide by n − 1 to get the variance.

Cont…d

4. Standard Deviation

• Being based on the squares of the deviations, the variance is limited as a descriptive statistic because it is not in the same units as the observations.
• By taking the square root of the variance, we obtain a measure of dispersion in the original units.

Example: We use the data set of 10 numbers (See Page 29):

19 21 20 20 34 22 24 27 27 27

– The range = 34 – 19 = 15
– The first quartile is 20 and the third quartile is 27
– The inter-quartile range = 27 – 20 = 7
– The variance is 21.88
– The SD = √21.88 = 4.68
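The range, variance and standard deviation of this example can be verified with a short Python sketch. It uses the sample variance with divisor n − 1, as in the formula given in these notes; the function name `sample_variance` is chosen for this illustration.

```python
import math

data = [19, 21, 20, 20, 34, 22, 24, 27, 27, 27]

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

value_range = max(data) - min(data)
variance = sample_variance(data)
sd = math.sqrt(variance)

print(value_range)          # 15
print(round(variance, 2))   # 21.88
print(round(sd, 2))         # 4.68
```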

8/12/2010




Cont…d

5. Coefficient of variation

When we wish to compare the variability in two sets of data, the standard deviation, which measures absolute variation, may lead to misleading results, for example when the two sets have very different means or different units.

The coefficient of variation gives the relative variation and is a better measure for comparing the variability of two such sets of data. Avoid using the SD alone to compare variability between groups whose means or units differ.

CV = standard deviation / mean (often multiplied by 100 and expressed as a percentage)
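A small Python sketch of the coefficient of variation follows. It compares two data sets that appear elsewhere in these notes (the ten numbers of the standard-deviation example and the ten heart rates of the mean example); pairing them is done here only to illustrate the comparison, and expressing the CV as a percentage is an assumption of this sketch.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, as a percentage."""
    return statistics.stdev(values) / statistics.mean(values) * 100

sample_a = [19, 21, 20, 20, 34, 22, 24, 27, 27, 27]                # SD example data
heart_rates = [167, 120, 150, 125, 150, 140, 40, 136, 120, 150]    # beats per minute

cv_a = coefficient_of_variation(sample_a)
cv_hr = coefficient_of_variation(heart_rates)

print(f"CV of sample A:    {cv_a:.1f}%")
print(f"CV of heart rates: {cv_hr:.1f}%")
```

Because the CV is unit-free, the two values can be compared directly even though the data sets are measured on different scales.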

8/12/2010




Thank You!!!

Enjoy it.
