biostatistics:descriptive statistics

Descriptive Statistics

Descriptive statistics are techniques used to organize, summarize, categorize, classify, manipulate, and present a set of data in a concise way to make it suitable for analysis. Raw data are measurements or variables that have not been organized, summarized, or otherwise manipulated.

Objectives of data organization, summarization, and manipulation:
- To see the similarity and dissimilarity of objects.
- To see the important features of the collected data.
- To prepare data for summarization and analysis.

Victory College, Faculty of Health Science, Department of Public Health Officer, Biostatistics Lecture Note. Prepared by Minlikalew D. (B.Sc.), 8/12/2010.

Upload: minlik-alew-dejenie

Post on 07-Apr-2018


Page 1: Biostatistics:Descriptive Statistics

8/6/2019 Biostatistics:Descriptive Statistics

http://slidepdf.com/reader/full/biostatisticsdescriptive-statistics 1/146



Descriptive statistics include:

- Frequency distributions.
- Tables.
- Graphs.
- Numerical summary measures:
  - Measures of central tendency.
  - Measures of variability.


Before summarizing, organizing, classifying, presenting, and analyzing data, we need to understand:
- The concept of data.
- The concept of a variable.
- The concept of measurement and measurement scales.


Data
- Are facts or information that support reasoning.
- Are a collection of observations on one or more variables.
- Are the raw material of statistics.
- Are information collected from a source.

There are different criteria for classifying data into different groups.


A. Based on the nature of the variable on which the data are collected:

I. Qualitative/categorical/non-numerical data: data collected on a qualitative variable and obtained by noting the possession of a certain attribute or characteristic.

Example:
- Breastfeeding status (exclusive, partial, none).
- Whether the mother was employed (yes, no).
- Marital status (single, married, divorced, widowed).

 


Nominal data are categorical data where the order of the categories is arbitrary. A good example is race/ethnicity coded as 1=White, 2=Hispanic, 3=American Indian, 4=Black, 5=Other. Certain statistical concepts are meaningless for nominal data. For example, it makes no sense to ask what the mean and standard deviation of race/ethnicity are.
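A minimal sketch of why the mean is meaningless for nominal codes, using the race/ethnicity coding above. The list of responses is hypothetical and invented only for illustration; the point is that counts and the mode survive a renumbering of the categories while the mean does not.

```python
from collections import Counter

# Coding taken from the text; the responses below are hypothetical.
labels = {1: "White", 2: "Hispanic", 3: "American Indian", 4: "Black", 5: "Other"}
codes = [1, 4, 2, 1, 5, 3, 1, 4, 2, 1]

# Frequency counts and the mode are meaningful summaries for nominal data.
counts = Counter(labels[c] for c in codes)
mode_label, mode_count = counts.most_common(1)[0]

# The mean of the codes depends entirely on the arbitrary numbering:
# reversing the numbering (1<->5, 2<->4) changes it, although the data are the same.
mean_code = sum(codes) / len(codes)
relabelled = [6 - c for c in codes]
mean_relabelled = sum(relabelled) / len(relabelled)

print(counts)
print("mode:", mode_label, mode_count)
print("mean of codes:", mean_code, "after renumbering:", mean_relabelled)
```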


Ordinal data are categorical data where there is a logical ordering to the categories. A good example is the Likert scale that you see on many surveys: 1=Strongly disagree; 2=Disagree; 3=Neutral; 4=Agree; 5=Strongly agree. While computation of a median is easily justified for ordinal data, some statisticians have reservations about computing a mean for ordinal data.

II. Quantitative/numerical data: data collected on quantitative variables and obtained by counting or measurement.
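As an illustration of the remark about ordinal data, the following sketch computes a median for Likert responses coded 1 to 5 as in the text. The responses are hypothetical, and `statistics.median_low` is used here (an illustrative choice, not something the lecture prescribes) because it always returns an observed value, which can be mapped back to a category.

```python
import statistics

# Likert coding from the text; the responses are hypothetical.
likert = {
    1: "Strongly disagree",
    2: "Disagree",
    3: "Neutral",
    4: "Agree",
    5: "Strongly agree",
}
responses = [4, 2, 5, 3, 4, 1, 4, 5, 2]

# The median only uses the ordering of the categories, not the distances between them.
median_code = statistics.median_low(responses)
median_label = likert[median_code]

print("median response:", median_code, "=", median_label)
```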


Quantitative/numerical data consist of both continuous and discrete data types.

a. Continuous data: consist of both interval and ratio data.

Interval data are continuous data where differences are interpretable, but where there is no "natural" zero. A good example is temperature in degrees Fahrenheit. Ratios are meaningless for interval data. You cannot say, for example, that one day is twice as hot as another day.
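The "twice as hot" point can be checked numerically. In this sketch the two temperatures are hypothetical; the ratio between them changes when the same temperatures are expressed in Celsius, while the difference only changes by a constant factor.

```python
def fahrenheit_to_celsius(f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

# Two hypothetical daily temperatures in degrees Fahrenheit.
day_a_f, day_b_f = 40.0, 80.0
day_a_c = fahrenheit_to_celsius(day_a_f)
day_b_c = fahrenheit_to_celsius(day_b_f)

# The ratio depends on the arbitrary zero of the scale, so it is not interpretable.
ratio_f = day_b_f / day_a_f
ratio_c = day_b_c / day_a_c

# Differences are interpretable: they are only rescaled by the factor 5/9.
diff_f = day_b_f - day_a_f
diff_c = day_b_c - day_a_c

print("ratio in F:", ratio_f, "ratio in C:", ratio_c)
print("difference in F:", diff_f, "difference in C:", diff_c)
```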


Ratio data are continuous data where both differences and ratios are interpretable. Ratio data have a natural zero. A good example is birth weight in kg.

The distinction between interval and ratio data is subtle, but fortunately it is often not important. Certain specialized statistics, such as the geometric mean and the coefficient of variation, can only be applied to ratio data.
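A short sketch of the two statistics mentioned above, computed for birth weights in kg. The weights are hypothetical, and the coefficient of variation is taken here as the sample standard deviation divided by the mean, expressed as a percentage (one common convention, assumed for illustration).

```python
import statistics

# Hypothetical birth weights in kg (ratio data: zero means no weight).
birth_weights_kg = [2.8, 3.1, 3.4, 3.0, 3.6, 2.9]

mean = statistics.mean(birth_weights_kg)
sd = statistics.stdev(birth_weights_kg)          # sample standard deviation
gm = statistics.geometric_mean(birth_weights_kg)  # requires Python 3.8+
cv_percent = 100 * sd / mean

# Changing the unit (kg to g) rescales mean and SD equally, so the CV is unchanged.
birth_weights_g = [w * 1000 for w in birth_weights_kg]
cv_percent_g = 100 * statistics.stdev(birth_weights_g) / statistics.mean(birth_weights_g)

print("mean:", round(mean, 3), "geometric mean:", round(gm, 3))
print("CV (%):", round(cv_percent, 2), "CV in grams (%):", round(cv_percent_g, 2))
```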

b. Discrete data: quantitative data collected on a discrete variable.


B. Based on the source from which the data are collected:

I. Primary data: data collected by the investigator himself or herself. Such data are original in character and are mostly generated by censuses or sample surveys conducted by individuals or research institutions.

II. Secondary data: data collected from secondary sources, for example journals, reports, government publications, and publications of professionals and research organizations.


Sources of data

There are different sources of data on health and health-related conditions. These are:
- Health surveys
- Vital statistics
- Health service records
- Census


Systems for collecting data

1. Regular system: registration of events as they become available.

2. Ad hoc system: a form of survey to collect information that is not available on a regular basis.

Data collection techniques/methods

There are different methods of data collection. To select the appropriate method, we need to consider the following points.


Selection of a data collection method is based on:
- The nature of the investigation, i.e. whether the study is qualitative or quantitative.
- The resources available and the relevance of the information.
- The acceptability and accuracy of the method.
- The research interest to be focused on and covered.
- Familiarity with the procedure.
- The characteristics of the study population, which are among the influencing factors.


Based on the above selection points, the methods are:

For qualitative data:
1. Focus group discussion.
2. In-depth interview (unstructured/semi-structured).
3. Observation (participant/non-participant).
4. Case studies.
5. Rapid appraisal techniques.
6. Nominal group techniques.
7. Delphi techniques and life histories.


For quantitative data:
1. Face-to-face interview.
2. Self-administered interview.
3. Postal or mail method and telephone interview.
4. Measuring height, length, weight, BMI, MUAC, chest circumference, head circumference, blood pressure, Hgb, Hct.
5. Using available information (record review), e.g. mortality reports, morbidity reports.


Decision-makers need information that is:
 – Relevant,
 – Timely,
 – Accurate, and
 – Usable.

The following table compares different data collection techniques in terms of their advantages and disadvantages.


Summary of each data collection technique

Technique: Using available information
Advantages:
• Is inexpensive, because data is already there.
• Permits examination of trends over the past.
Disadvantages:
• Data is not always easily accessible.
• Ethical issues concerning confidentiality may arise.
• Information may be imprecise or incomplete.
• Data collection may not be standardized.


Technique: Observing
Advantages:
• Gives more detailed and context-related information.
• Permits collection of information on facts not mentioned in the questionnaire.
Disadvantages:
• Ethical issues concerning confidentiality or privacy may arise.
• Observer bias may occur (the observer may only notice what interests him or her).
• The presence of the data collector can influence the situation observed.
• Thorough training of research assistants is required.

Technique: Interviewing
Advantages:
• Is suitable for use with illiterates.
• Permits clarification of questions.
• Has a higher response rate than written questionnaires.
Disadvantages:
• The presence of the interviewer can influence responses.
• Reports of events may be less complete than information gained through observations.


Technique: Small-scale flexible interview
Advantages:
• Permits collection of in-depth information and exploration of spontaneous remarks by respondents.
Disadvantages:
• The interviewer may inadvertently influence the respondents.
• Open-ended data is difficult to analyze.

Technique: Large-scale fixed interview
Advantages:
• Is easy to analyze.
Disadvantages:
• Important information may be missed because spontaneous remarks by respondents are usually not recorded or explored.


Technique: Administering written questionnaires
Advantages:
• Less expensive.
• Permits anonymity and may result in more honest responses.
• Does not require research assistants.
• Eliminates bias due to phrasing questions differently with different respondents.
Disadvantages:
• Cannot be used with illiterate respondents.
• There is often a low rate of response.
• Questions may be misunderstood.


Variable

A variable is a characteristic that takes different values in different persons, places, or things. It is any aspect of an individual or object that is measured (e.g., BP) or recorded (e.g., age, sex) and can take any value. There may be one variable in a study or many, e.g., a study of the treatment outcome of TB.

Variables can be broadly classified into:
A. Categorical (or qualitative) variables.
B. Quantitative (or numerical) variables.


A. Categorical (or qualitative) variables

Variables that cannot be measured numerically but can be divided into different categories are called qualitative or categorical variables.

A categorical variable cannot assume a numerical value but can be classified into non-numerical categories according to a set of rules.

The notion of magnitude is absent or implicit.


A variable that has only two categories is called binary or dichotomous, e.g. sex. A variable with more than two categories is called polytomous, e.g. occupational status.

A categorical variable can be:

1. Nominal: variables with no inherent order or ranking sequence, e.g. numbers used as names (group 1, group 2, ...), gender, etc.


2. Ordinal: variables with an ordered series, e.g. "greatly dislike, moderately dislike, indifferent, moderately like, greatly like". Numbers assigned to such variables indicate rank/order only. The "distance" between the numbers has no meaning.

B. Quantitative (or numerical) variables

A quantitative variable can assume a numerical value and is measured numerically. Quantitative data measure either how much or how many of something, i.e. a set of observations where any single observation is a number that represents an amount or a count.


A quantitative variable has the notion of magnitude. It can be:

1. Discrete
- It can take only a limited number of distinct values (usually whole numbers).
- It is characterized by gaps or interruptions in the values.
- The values aren’t just labels, but are actual measurable quantities.

 


Example:
- The number of episodes of diarrhoea a child has had in a year. You can’t have 12.5 episodes of diarrhoea.
- The number of accidents.
- The number of students in this class.
- The number of cars.
- Etc.


2. Continuous
- It can have an infinite number of possible values in any given interval.
- It does not have gaps or interruptions.

Example:
- Weight.
- Income.
- Age.
- Time.
- Etc.


3. Interval
- Do not have a true zero, e.g. 88 degrees is not necessarily double the temperature of 44 degrees.
- Equally spaced variables, e.g. temperature. The difference between a temperature of 66 degrees and 67 degrees is taken to be the same as the difference between 76 degrees and 77 degrees.

4. Ratio variables
- Variables spaced at equal intervals with a true zero point, e.g. age.

 


5. Independent variable

A hypothesized cause of, or influence on, a dependent variable. This might be a variable that you control, like a treatment, or a variable not under your control, like an exposure.

6. Dependent variable

The variable that you believe might be influenced or modified by some treatment or exposure, or the variable you are trying to predict. The dependent variable is sometimes called the outcome variable.


The definition of dependent and independent variables depends on the context of the study. For example, a variable that is dependent in one study may be independent in another study.


Measurement and Measurement Scale

Measurement: the assignment of numbers or names to objects or events according to a set of rules. Not all measurements are the same.

Measurement scale: the way in which variables/numbers are defined and categorized. It refers to the degree of precision with which a characteristic is measured. Depending on the nature of the variable and the set of rules used to measure it, there are four scales of measurement.


Each scale of measurement has certain properties, which in turn determine the appropriateness of certain statistical analyses.

1. Nominal scale
- The simplest and lowest (weakest) level of measurement, in which the values fall into unordered categories or classes.
- Uses names, labels, or symbols to assign each measurement; numbers have NO meaning.
- Always measures qualitative data.


Characteristics to be fulfilled:
- The categories should be mutually exclusive.
- The categories should be exhaustive.
- The names or symbols can be interchanged without altering essential information.

Example: blood type, sex, race, marital status, eye color, type of tar, university attended, occupation, residence, etc.

 


2. Ordinal scale
- Assigns each measurement to one of a limited number of categories that are ranked in terms of order.
- The differences among categories are not necessarily equal and often not even measurable.
- Although non-numerical, the categories can be considered to have a natural ordering.
- It is the next higher level of measurement.
- It is usually used for qualitative data.


- It is subjective in nature.
- Many health care variables are ordinal in nature.

Example: patient status, cancer stage, social class, pain level, dehydration status, Glasgow coma scale, etc.

3. Interval scale
- Measured on a continuum, and differences between any two numbers on the scale are of known size.
- It assigns each measurement to one of an unlimited number of categories.

 


- It has no true zero point. “0” is arbitrarily chosen and doesn’t reflect the absence of temperature.
- The distance between adjacent values is equal and fixed, but ratios of values are not meaningful.
- It is used for truly quantitative data.

Examples: body temperature in °F or °C, directions in degrees, time of day, IQ.


4. Ratio scale
- Measurement begins at a true zero point and the scale has equal intervals.
- It is the highest level of measurement.
- It has a true zero point.
- It is used for purely quantitative data.

Examples: height, weight, BP, etc.


[Figure: Degree of precision in measuring, by scale: Nominal, Ordinal, Interval, Ratio.]


Summary of each measurement scale

Nominal:
- People or objects with the same scale value are the same on some attribute.
- The values of the scale have no 'numeric' meaning in the way that you usually think about numbers.

Ordinal:
- People or objects with a higher scale value have more of some attribute.
- The intervals between adjacent scale values are indeterminate.
- Scale assignment is by the property of "greater than," "equal to," or "less than."

Interval:
- Intervals between adjacent scale values are equal with respect to the attribute being measured.
- E.g., the difference between 8 and 9 is the same as the difference between 76 and 77.

Ratio:
- There is a rational zero point for the scale.
- Ratios are equivalent, e.g., the ratio of 2 to 1 is the same as the ratio of 8 to 4.


Methods of Data Organization and Presentation

In most cases, useful information is not immediately evident from a mass of unsorted data, which by itself does not impart information.

Data organization: condensing the information in a way that shows patterns of variation clearly.

Precise methods of analysis can be decided upon only when the characteristics of the data are understood. For this purpose, different techniques of data organization are used.


Objectives of data organization
- To see the similarity and dissimilarity of objects.
- To see the important features of the collected data.
- To prepare data for summarization and analysis.

The methods of organizing and presenting (describing) data differ depending on whether the data/variable being organized and presented is numerical or categorical.


1. Describing categorical variables. This includes:

A. Tables of frequency distributions
 – Frequency
 – Relative frequency
 – Cumulative frequency

B. Charts
 – Bar charts
 – Pie charts


Frequency Distributions

• Frequency: the number of times each observation (for individual data) or each class interval (for grouped data) occurs.

• Frequency distribution: an arrangement of data in a table that shows the possible values of the data with the corresponding frequency or class frequency. A simple and effective way of summarizing categorical data is to construct a frequency distribution table.


Advantages:
- Allows the data to be more easily appreciated.
- Allows quick comparisons to be drawn.
- Allows the data to be arranged in the form of a table, or in one of a number of different graphical forms.

Types of frequency distribution

I. Simple frequency distribution: a table showing the frequency of each observation.


Hospital stay (days) of 50 patients in a medical ward (hypothetical data)

Hospital stay in days (xi)    Frequency (fi), the number of patients
0                             5
1                             10
2                             2
4                             23
5                             5
7                             5

In this table the number of days of hospital stay represents the variable under consideration, the number of persons represents the frequency, and the whole distribution is called a simple frequency distribution.
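As a rough illustration, a simple frequency distribution can be produced by counting how often each value occurs. The sketch below is not part of the original notes: it expands the hospital-stay table back into 50 individual observations (an assumed raw data set) and recounts them. The function name simple_frequency_distribution is made up for this example.

```python
from collections import Counter

# Frequencies taken from the hospital-stay table (days -> number of patients).
table = {0: 5, 1: 10, 2: 2, 4: 23, 5: 5, 7: 5}

# Assumed raw data: one entry per patient, rebuilt from the table.
raw_stays = [days for days, count in table.items() for _ in range(count)]


def simple_frequency_distribution(observations):
    """Return {value: frequency}, ordered by value."""
    counts = Counter(observations)
    return dict(sorted(counts.items()))


freq = simple_frequency_distribution(raw_stays)

print("Days  Frequency")
for days, f in freq.items():
    print(f"{days:>4}  {f:>9}")
print("Total", sum(freq.values()))
```

The total of the frequency column should equal the number of patients, which is a quick check that no observation was lost while counting.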


i. Array (Ordered Array)

• It is a serial arrangement of numerical data in ascending or descending order.
• It is the first step in organizing data.
• It is appropriate when the number of observations is greater than 6 and less than 20.
• It makes it possible to see quickly the smallest and the largest measurements and the range of the observations.
• It is the simplest method.


Example: Raw data: 5, 6, 4, 9, 11, 0, 3, 8.

When these data are put in an ordered array: 0, 3, 4, 5, 6, 8, 9, 11.
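The same example can be sketched in a few lines of code. This is only an illustration of the idea: sorting gives the ordered array, from which the smallest value, the largest value and the range can be read off directly. The variable names are chosen for this example.

```python
# Raw data from the example.
raw = [5, 6, 4, 9, 11, 0, 3, 8]

# Ordered arrays in ascending and descending order.
ascending = sorted(raw)
descending = sorted(raw, reverse=True)

# The ends of the ordered array give the extremes and the range.
smallest = ascending[0]
largest = ascending[-1]
data_range = largest - smallest

print("Ascending: ", ascending)
print("Descending:", descending)
print("Smallest:", smallest, "Largest:", largest, "Range:", data_range)
```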


ii. Categorical distribution

Non-numerical information can also be represented in a frequency distribution.

Example (qualitative variable): HIV-positive mothers attending an ANC unit, by their future plan for infant feeding.

Mother's plan               No. of mothers
Exclusive breastfeeding     100
Replacement feeding         50
Mixed feeding               30
Nursery                     50
Total                       230


II. Grouped Frequency Distribution

It is a way of representing large sets of data in class intervals.

STEPS IN CONSTRUCTION OF GROUPED FREQUENCY DISTRIBUTION

1. Choosing the classes (first put the data in an ordered array).
2. Sorting (or tallying) the data into these classes.
3. Counting the number of items in each class.
4. Displaying the results in the form of a chart or table.


1. Choosing the classes.

When data consisting of a large number of observations are divided into groups that have defined upper and lower limits, each group is called a class.

The size of the class is called the class interval.

Choosing a suitable classification involves:

a. Determining the appropriate number of classes/class intervals.


The number of classes/class intervals is determined by:

I. Non-statistical method/convenience method: choose not fewer than 6 and not more than 20 classes; the average is 15. Fewer than 6 classes summarizes the data too much and causes loss of information, while more than 20 classes does not meet the objective of data organization. The exact number used in a given situation depends mainly on the number of measurements or observations to be grouped.


II. Statistical method: choose the number of classes by using Sturges's formula,

K = 1 + 3.322 (log n)

where K = number of class intervals, n = number of observations, and log is the base-10 logarithm.

Example: The sample size is 275. How many class intervals are needed?

K = 1 + 3.322 (log 275)

K = 1 + 3.322 (2.439) = 9.10 ≈ 9
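Sturges's formula is easy to check numerically. The following is a minimal sketch, assuming the result is rounded to the nearest whole number (the notes round 9.10 to 9); the function name sturges_classes is made up for this illustration.

```python
import math


def sturges_classes(n):
    """Number of class intervals K = 1 + 3.322 * log10(n), rounded to the nearest integer."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return round(1 + 3.322 * math.log10(n))


# Worked example from the notes: 275 observations.
k_exact = 1 + 3.322 * math.log10(275)
print(f"K before rounding: {k_exact:.2f}")
print("K after rounding:", sturges_classes(275))
```

Because the rule is only a guide, the rounded value can still be adjusted up or down for a clearer presentation.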


Note:

• Sturges's rule should not be regarded as final, but should be considered as a guide only. The number of classes specified by the rule should be increased or decreased for convenient or clear presentation.

• Classes should be mutually exclusive; they must not overlap.

• We must make sure that the smallest and largest values fall within the classification and that none of the values can fall into gaps between classes.


b. Determining the class width.

The class width is denoted by "W" and is the same for each class:

W = R / K = (Xmax − Xmin) / K

where W = width of the class,
R = range,
Xmax = the largest value in the observations,
Xmin = the lowest value in the observations,
K = the number of classes.


Example:

– Leisure time (hours) per week for 40 college students:

23 24 18 14 20 36 24 26 23 21 16 15 19 20 22 14 13 10 19 27 29 22 38 28 34 32 23 19 21 31 16 28 19 18 12 27 15 21 25 16

K = 1 + 3.322 (log 40) = 6.32 ≈ 6

Maximum value = 38, Minimum value = 10

Width = (38 − 10)/6 = 4.67 ≈ 5
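To see where these numbers lead, the sketch below carries the leisure-time example through to a grouped frequency table. It rests on a few assumptions that the notes do not state: the first class starts at the minimum value, the data are whole numbers (so a class of width 5 starting at 10 has limits 10 to 14), and the width is rounded up so that the largest value is always covered. With other data this approach can produce one class more than K. The function name grouped_frequency is illustrative.

```python
import math

# Leisure time (hours) per week for 40 college students, from the example.
leisure_hours = [
    23, 24, 18, 14, 20, 36, 24, 26, 23, 21,
    16, 15, 19, 20, 22, 14, 13, 10, 19, 27,
    29, 22, 38, 28, 34, 32, 23, 19, 21, 31,
    16, 28, 19, 18, 12, 27, 15, 21, 25, 16,
]


def grouped_frequency(data):
    """Return (K, W, [(lower limit, upper limit, frequency), ...]) for integer data."""
    n = len(data)
    k = round(1 + 3.322 * math.log10(n))            # Sturges's rule
    width = math.ceil((max(data) - min(data)) / k)  # class width, rounded up
    table = []
    lower = min(data)                               # assumed starting point
    while lower <= max(data):
        upper = lower + width - 1                   # upper class limit for integer data
        frequency = sum(lower <= x <= upper for x in data)
        table.append((lower, upper, frequency))
        lower += width
    return k, width, table


k, width, table = grouped_frequency(leisure_hours)
print("K =", k, " W =", width)
for lower, upper, frequency in table:
    print(f"{lower}-{upper}: {frequency}")
print("Total:", sum(f for _, _, f in table))
```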


c. Determining true limits/class boundaries.

Class limits: the smallest and largest values that can go into a class are regarded as its limits; they are the lower and upper class limits.

True limits/class boundaries are limits determined mathematically so that the intervals of a continuous variable are continuous in both directions and no gap exists between classes. The true limits are what the tabulated limits would correspond to if one could measure exactly.


• True limits/class boundaries are used for smoothing the class intervals.

• They are obtained by subtracting 0.5 from the lower class limit and adding 0.5 to the upper class limit. This is a simple convention.

• A true limit can be lower or upper.

d. Determining the class mark.

The class mark is denoted by "Xc". It is the midpoint of each class. The formula is:

Xc = (LTL + UTL) / 2

where Xc = class mark,
UTL = Upper True Limit,
LTL = Lower True Limit.
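The two conventions above (true limits and class mark) can be sketched as follows. The class limits used here are hypothetical, and the unit parameter is an assumption added for this illustration: it generalizes the 0.5 adjustment to half the unit of measurement, which equals 0.5 when data are recorded to the nearest whole number.

```python
def true_limits(lower_limit, upper_limit, unit=1):
    """Return (LTL, UTL): subtract half a unit from the lower limit, add it to the upper."""
    half = unit / 2
    return lower_limit - half, upper_limit + half


def class_mark(ltl, utl):
    """Class mark Xc = (LTL + UTL) / 2."""
    return (ltl + utl) / 2


# Hypothetical class limits for data recorded in whole numbers.
classes = [(10, 14), (15, 19), (20, 24)]

for lower, upper in classes:
    ltl, utl = true_limits(lower, upper)
    print(f"{lower}-{upper}: boundaries {ltl}-{utl}, class mark {class_mark(ltl, utl)}")
```

Note that the upper true limit of one class equals the lower true limit of the next, so no gap remains between classes.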

2. Sorting (or tallying) the data into these classes.

Tally marks are small vertical bars used in a frequency table to represent the number of times a particular event has appeared in the collected data. If a value has occurred four times against a particular class, we put four tally marks (////); for the fifth occurrence we put a cross tally mark through the four to give a block of five. When it occurs for the sixth time we put another tally mark after leaving a space. If we use only continuous tally bars like (//////), there may be confusion in counting and it may lead to mistakes.

3. Counting the number of items in each class.

Relative frequency is the frequency of each class interval (fi) divided by the total frequency (n). For grouped data,

n = ∑ fi


Cumulative frequencies are obtained when the frequencies of two or more classes are added up. They help to find the total number of items whose values are less than or greater than some value. A cumulative frequency distribution can be:

– Less than cumulative frequency distribution: the cumulation starts from the lowest value of the variable and proceeds to the highest. This is the most common one.

– More than cumulative frequency distribution: the cumulation runs from the highest value to the lowest.


Cumulative relative frequency: It is computed by adding successive relative frequencies. It can also be calculated by dividing the cumulative frequency (fc) by the total frequency (n), i.e. frc = fc/n for each class.
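The definitions of relative, cumulative and cumulative relative frequency can be illustrated with the hospital-stay frequencies given earlier in these notes (5, 10, 2, 23, 5, 5 patients). The sketch below is only one way of doing the arithmetic; the variable names are chosen for this example.

```python
from itertools import accumulate

# Hospital stay (days) and number of patients, from the earlier table.
values = [0, 1, 2, 4, 5, 7]
freqs = [5, 10, 2, 23, 5, 5]

n = sum(freqs)                                   # total frequency, n = sum of fi

relative = [f / n for f in freqs]                # fi / n
less_than_cf = list(accumulate(freqs))           # cumulate from lowest to highest
more_than_cf = list(accumulate(reversed(freqs)))[::-1]  # cumulate from highest to lowest
less_than_crf = [cf / n for cf in less_than_cf]  # frc = fc / n

print("x   f   rf    <cf  >cf  <crf")
for x, f, rf, lcf, mcf, crf in zip(values, freqs, relative,
                                   less_than_cf, more_than_cf, less_than_crf):
    print(f"{x:<3} {f:<3} {rf:<5.2f} {lcf:<4} {mcf:<4} {crf:.2f}")
```

The last less-than cumulative frequency and the first more-than cumulative frequency both equal n, which is a useful check on the arithmetic.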


Exercise: Construct a grouped frequency distribution for the following data: age of patients (years) (n = 60) in a diabetic clinic in Addis Ababa, January 2000.

19, 82, 98, 78, 30, 26, 32, 66, 87, 81, 40, 48, 70, 61, 69, 58, 60, 53, 28, 54, 47, 40, 80, 56, 36, 53, 65, 28, 90, 95, 45, 32, 34, 36, 20, 62, 51, 20, 17, 26, 70, 81, 39, 63, 33, 66, 61, 77, 41, 55, 76, 70, 42, 67, 22, 75, 24, 50, 50, 44.

Based on the above data, construct a table that contains:

1. Class interval/class.
2. Class boundary.
3. Class mark.
4. Tally mark.
5. Frequency.
6. Relative frequency.
   a. Less than relative frequency.
   b. Greater than relative frequency.
7. Cumulative relative frequency.
   a. Less than crf.
   b. Greater than crf.


Statistical Tables

A statistical table is an orderly and systematic presentation of numerical data in rows and columns.

o Rows are horizontal arrangements of data, and a row heading is termed a stub.

o Columns are vertical arrangements of data, and a column heading is called a caption.

Both simple and grouped frequency distributions can be put in statistical tables.


• Almost any quantitative information can be organized into a table.

• Tables are useful for demonstrating patterns, exceptions, differences, and other relationships.

• In addition, tables usually serve as the basis for preparing more visual displays of data, such as graphs and charts, where some of the detail may be lost.


Parts of a table

1. Table number:

 – Tables are serially numbered.
 – The number should be written in the center at the top.

2. Title:

 – Should be written in the center at the top of the table, below the table number.

3. Caption:

 – Refers to the name of the column heading.
 – Is written at the center of the column.


4. Stub:

 – Refers to the name of the row heading.
 – Written at the extreme left.

5. Body of the table:

 – The numerical data expressed in the table.
 – When the body is empty, the table is called a dummy table (table shell) and the variables are termed dummy variables.

6. Head note:

 – A short statement about all or major parts of the table.
 – Written below the title in brackets.


7. Foot note:

 – Used if any clarification is needed about the parts of a table.
 – Written at the bottom of the table.
 – Indicates the source of the data.

The following structure shows the placement of the various parts of a table.


Common Rules of Constructing Tables

Although there are no hard and fast rules to follow, the following general principles should be addressed in constructing tables.

1. It should be as simple as possible.

2. It should be self-explanatory. To create a table that is self-explanatory, follow the guidelines below:

I. The title should be clear and to the point.

II. The title should answer when and where the study was done, and what the table explains.


III. Precede the title with a table number.

IV. Label each row and each column clearly and concisely, and include the units of measurement for the data. Limit the number of variables to three or fewer.

V. Totals should be shown either in the top row and the first column or in the last row and last column. If you show percents (%), also give their total (always 100).

VI. Explain any code, abbreviation, symbol, or exclusion in a footnote.


VII. Note the source of the data in a footnote if the data are not original.

VIII. Put the title at the top of the table.

IX. Numerical entries of zero should be written explicitly rather than indicated by a dash. Dashes are reserved for missing or unobserved data.

X. In cross-tabulated data (variables put as row and column headings), the dependent variable should be the column heading and the independent variable should be the row heading.


3. If the data show a qualitative variable, the observations are listed in alphabetical order or by their degree of importance.

4. If the data are time bound, i.e. classified by time of occurrence, they should be arranged in chronological order, from the earliest to the latest or vice versa.

5. If the data represent places, they may be placed in alphabetical order or in terms of geographic location.


Types of table

Based on the purpose for which the table is designed and the complexity of the relationship, a table can be either:

A. Simple frequency table.
B. Cross tabulation.


A. Simple frequency table (one-way table):

• Is used when the individual observations involve only a single variable.
• The denominator for the percentages is the sum of all observed frequencies.

Example: Table X: Overall immunization status of children in Adami Tullu Woreda, Feb. 1999.

Immunization status      Number    Percent
Not immunized            75        35.7
Partially immunized      57        27.1
Fully immunized          78        37.2
Total                    210       100.0
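The percent column of a one-way table can be recomputed from the counts, using the sum of all observed frequencies as the denominator. The sketch below does this for Table X. Rounded independently to one decimal place, the three percentages come out as 35.7, 27.1 and 37.1, which add up to 99.9; the 37.2 shown in the table may have been adjusted so that the column totals 100.0, but that is an assumption, not something the notes state.

```python
# Counts from Table X (immunization status of children).
counts = {
    "Not immunized": 75,
    "Partially immunized": 57,
    "Fully immunized": 78,
}

total = sum(counts.values())  # denominator: sum of all observed frequencies

# Percentage for each category, rounded to one decimal place.
percents = {status: round(100 * number / total, 1)
            for status, number in counts.items()}

for status, number in counts.items():
    print(f"{status:<22}{number:>5}{percents[status]:>8.1f}")
print(f"{'Total':<22}{total:>5}{sum(percents.values()):>8.1f}")
```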


B. Cross tabulation:

• Is used to obtain the frequency distribution of one variable by the subsets of another variable.
• The choice of denominator is based on the variable of interest to be compared over the subsets of the other variable.
• Can be of two types:

I. Two-way table.

II. Higher order table.

Example: Table Y: TT immunization by marital status of women of childbearing age, Addis Ababa town, 2006.


I. Two-way table:

Shows two variables/characteristics and is formed when either the caption or the stub is divided into two or more parts.

Source: Mikael A. et al. Tetanus Toxoid immunization coverage among women of childbearing age in Assendabo town; Bulletin of JIHS, 1996, 7(1): 13-20.
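A two-way table of this kind can be built by counting pairs of values, one value for the row variable and one for the column variable. The records below are made up purely for illustration (the counts of Table Y are not reproduced in these notes), and the function names are hypothetical. Row percentages use the row total as denominator, which corresponds to comparing the column variable across the subsets defined by the row variable.

```python
from collections import Counter

# Hypothetical records: (marital status, TT immunization status).
records = [
    ("Married", "Immunized"), ("Married", "Not immunized"),
    ("Single", "Immunized"), ("Married", "Immunized"),
    ("Single", "Not immunized"), ("Single", "Not immunized"),
]


def cross_tabulate(pairs):
    """Return {row value: {column value: count}} for (row, column) pairs."""
    cell_counts = Counter(pairs)
    rows = sorted({row for row, _ in pairs})
    cols = sorted({col for _, col in pairs})
    return {row: {col: cell_counts[(row, col)] for col in cols} for row in rows}


def row_percentages(table):
    """Convert each row of counts to percentages of that row's total."""
    result = {}
    for row, cells in table.items():
        row_total = sum(cells.values())
        result[row] = {col: round(100 * count / row_total, 1)
                       for col, count in cells.items()}
    return result


table = cross_tabulate(records)
row_pct = row_percentages(table)

for row, cells in table.items():
    print(row, cells, row_pct[row])
```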


II. Higher Order Table:

Used when it is desired to represent three or more characteristics/variables in a single table.

Example: Table Z: Distribution of Health Professionals by Sex and Residence.


Diagrammatic representation of data

• An appropriately drawn graph allows readers to obtain rapidly an overall grasp of the data presented.

• Well-designed graphs can be incredibly powerful means of communicating a great deal of information using visual techniques.

• When graphs are poorly designed, they not only fail to convey the message effectively, but often mislead and confuse.


Importance of Diagrammatic Representation

• Attractiveness.
• They help in deriving the required information in less time and without any mental strain.
• They facilitate comparison.
• They show unsuspected events and lead to action.
• They aid memorization.


Limitations of diagrammatic presentation:

• They fail to show slight differences.
• They are not accurate; they provide only approximate information.
• They are not suitable for all statistical data.
• They are not used when comparison is not necessary or is impossible.


General rules that are commonly accepted about construction of graphs:

1. Graphs should be self-explanatory and as simple as possible.
2. Titles are usually placed below the graph and should again answer the questions What? Where? When?
3. Legends or keys should be used to differentiate variables if more than one is shown.
4. The axis labels should be placed to read from the left side and from the bottom.


Common types of diagrammatic representations

1. Bar graph

• It is the easiest and most adaptable general-purpose chart.

• A bar graph is especially satisfactory for nominal and ordinal data.

• The heights of the bars represent the value of the frequency (actual number or percentage) for each category.


• The categories are represented on the baseline (x-axis) at regular intervals and the corresponding values (frequencies or relative frequencies) are represented on the y-axis (ordinate) in the case of a vertical bar diagram, and vice versa in the case of a horizontal bar diagram.


Tips for constructing bar graph:

1. Whenever possible, it is better to construct a bar diagram on graph paper.

2. All bars drawn in any single study should be of the same width.

3. Leave spaces between the bars, and make the spaces of equal width.

4. All the bars should rest on the same line, called the base, on the x-axis.

5. Whenever possible, it is advisable to draw bars in order of magnitude.


6. Label both axes clearly.

7. The scale should start from zero.

8. Divided bars can be used to show the component parts.


Types of bar graph

A. Simple bar chart:

 – It is a one-dimensional diagram in which the bar represents the whole of the magnitude.
 – The height or length of each bar indicates the size (frequency) of the figure represented.

Example:

Fig. X: Distribution of pediatric patients in a hospital ward by type of admitting diagnosis in Hospital X, Jan 2000.


B. Double bar graph:

Used to depict two variables.

Example:

Fig. Y: TT immunization status by marital status of women 15-49 years, Asendabo town, 1996.


C. Multiple bar chart:

 – Represents the relationships among more than two variables.
 – The component figures (bars) are shown as separate bars adjoining each other.
 – The height of each bar represents the actual value of the component figure.

Example:

Fig. X': Prevalence of cough in school children by smoking history of children and their parents, Town A, Jan 2000.


D. Sub-divided (component) bar graph:

• It is also called a segmented bar graph. If a given magnitude can be split up into subdivisions, or if there are different quantities forming the subdivisions of the totals, simple bars may be subdivided in the ratio of the various subdivisions to exhibit the relationship of the parts to the whole. The order in which the components are shown in one bar is followed in all bars used in the diagram.

• They are constructed when each total is built up from two or more component figures.


Sub-divided (component) bar graphs are of two types:

I. Actual Component Bar Diagram:

The overall height of each bar and the individual component lengths represent actual figures.

Example:

Fig. Y': TT immunization status by marital status of women 15-49 years, Asendabo town, 1996.


II. Percentage Component Bar Diagram:

The individual component lengths represent the percentage each component forms of the overall total.

Note that a series of such bars will all be the same total height, i.e., 100 percent.

Example:

Fig. Z: TT immunization status by marital status of women 15-49 years, Asendabo town, 1996.


2. Pie chart

• Useful for qualitative or quantitative discrete data.

• Shows the relative frequency of each category by dividing a circle into sectors so that the areas of the sectors are proportional to the frequencies.

• Appropriate for variables having at most six categories, because the circle should not be divided into more than six sectors.


Methods of constructing

pie-chart:

 – Construct a frequency table – Change the frequency in to

 percentage (f/n). – Change the percentage in

degrees.Where degree = percentage ×

360 – Draw a circle and divide it

accordingly

 Example: 

Fig. X: Distribution of Cause of death of females in England &Wales,1999.
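The construction steps for a pie chart can be sketched in Python. This is only an illustration: the category labels and counts below are hypothetical and are not the data behind Fig. X. Each sector angle is the relative frequency f/n multiplied by 360.

```python
# Sketch of the pie-chart construction steps, using made-up counts.
# The categories "A".."D" and their frequencies are hypothetical.
frequencies = {"A": 100, "B": 60, "C": 30, "D": 10}

n = sum(frequencies.values())  # total number of observations

# Relative frequency (f/n) and sector angle in degrees for each category.
sectors = {
    category: {"relative": f / n, "degrees": f / n * 360}
    for category, f in frequencies.items()
}

for category, s in sectors.items():
    print(f"{category}: {s['relative']:.0%} -> {s['degrees']:.0f} degrees")
```

The angles always add up to 360 degrees, which is a quick check that the frequency table was converted correctly.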

3. Histogram

Is a special kind of bar graph.

Useful for quantitative continuous data.

Is a frequency distribution with continuous class intervals that has been turned into a graph.

The area of each rectangle represents the frequency of the corresponding class interval.

To avoid crowding, you can use class midpoints.

In addition to simplifying a complex data set, a histogram is important in depicting the shape (symmetric/skewed) and the location of central tendency (“averages”) of the frequency distribution of a continuous variable.

Example:

Fig. Z: Distribution of the RBC cholinesterase values (μmol/min/ml) obtained from 35 workers exposed to pesticides.

Source: Knapp RG, Miller MC III: Clinical Epidemiology and Biostatistics. The National Medical Series for Independent Study. Williams & Wilkins, Baltimore, Maryland, 1992.

4. Frequency polygon

To draw it, connect the midpoints of the tops of the adjacent rectangles (cells) of the histogram with line segments; the result is a frequency polygon.

When the polygon is continued to the X-axis just outside the range of the values, the total area under the polygon will be equal to the total area under the histogram.

It is not essential to draw a histogram in order to obtain a frequency polygon.

It can be drawn without erecting the rectangles of a histogram, as follows:

Methods of constructing a frequency polygon:

The scale should be marked in the numerical values of the mid-points of the intervals.

Erect ordinates on the mid-points of the intervals, the length (altitude) of each ordinate representing the frequency of the class on whose mid-point it is erected; then join the tops of the ordinates and extend the connecting lines to the scale of sizes (the X-axis).
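The construction without a histogram can be sketched as a short Python example. The mid-points and frequencies are illustrative values only, and equal class widths are assumed; the sketch just lists the vertices that would be joined by line segments.

```python
# Sketch: vertices of a frequency polygon built directly from class
# mid-points, without drawing the histogram first.
# The mid-points and frequencies are illustrative values only.
midpoints = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5]
frequencies = [4, 66, 47, 36, 12, 4]
width = midpoints[1] - midpoints[0]  # assumes equal class widths

# Close the polygon on the X-axis: add a zero-frequency point one class
# width below the first mid-point and one class width above the last.
xs = [midpoints[0] - width] + midpoints + [midpoints[-1] + width]
ys = [0] + frequencies + [0]

polygon = list(zip(xs, ys))
print(polygon)
```

Passing `xs` and `ys` to any line-plotting routine would draw the polygon; the two added end points are what bring it down to the X-axis.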

Example of a frequency polygon drawn from a histogram.

Fig. z: Frequency polygon for the ages of 2087 mothers with <5 children, Adami Tulu, 2003.

[Figure: histogram with frequency polygon. Horizontal axis: N1AGEMOTH, 15.0 to 55.0; vertical axis: frequency, 0 to 700. Std. Dev = 6.13, Mean = 27.6, N = 2087.]

Example of a frequency polygon drawn without a histogram.

Fig. z’: Frequency polygon for the ages of women at the time of marriage.

[Figure: frequency polygon. Horizontal axis: age of women at the time of marriage, 12 to 47; vertical axis: number of women, 0 to 40.]

5. Ogive Curve (The Cumulative Frequency Polygon)

Sometimes it may be necessary to know the number of items whose values are more or less than a certain amount. To get this information, it is necessary to change the form of the frequency distribution from a ‘simple’ to a ‘cumulative’ distribution.

An ogive curve turns a cumulative frequency distribution into a graph.

Ogives are much more common than frequency polygons.

To construct an ogive curve:

I) Compute the cumulative frequency of the distribution.
II) Prepare a graph with the cumulative frequency on the vertical axis and the true upper class limits (class boundaries) of the intervals scaled along the X-axis (horizontal axis). The true lower limit of the lowest class interval is included in the X-axis scale; this is also the true upper limit of the next lower interval, which has a cumulative frequency of 0.
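The two steps can be sketched in Python as follows. The class boundaries and frequencies are hypothetical illustration values, and the sketch only produces the points of a "less than" ogive rather than drawing it.

```python
from itertools import accumulate

# Sketch: points of a "less than" ogive.
# Class boundaries and frequencies are hypothetical illustration values.
boundaries = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5, 69.5]  # true class limits
frequencies = [4, 66, 47, 36, 12, 4]

# Step I: cumulative frequencies.
cumulative = list(accumulate(frequencies))

# Step II: pair each true upper class limit with its cumulative frequency;
# the lowest true lower limit starts the curve at a cumulative frequency of 0.
ogive_points = list(zip(boundaries, [0] + cumulative))
print(ogive_points)
```

The last point always carries the total number of observations, so it doubles as a check on the cumulative column.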

Example: Construct an ogive for the data below.

Table X: Heart rate of patients admitted to Hospital D, 2000.

Ogive (cumulative frequency curve)

Fig. D: Heart rate (beats/minute) of patients admitted to Hospital B, 2000.

Numerical Summary Measures

MCT (Measures of Central Tendency)

A frequency distribution gives a general picture of the distribution of a variable, but it cannot indicate the average value or the spread of the values.

On the scale of values of a variable, there is a certain stage at which the largest number of items tend to cluster.

Since this stage is usually in the centre of the distribution, the tendency of statistical data to concentrate at a certain value is called “central tendency”.

The various methods of determining the point about which the observations tend to concentrate are called measures of central tendency (MCT).

The objective of calculating an MCT is to determine a single figure which may be used to represent the whole data set.

3. It should be as close to the maximum number of values as possible.
4. It should have a definite value.
5. It should not be subjected to complicated and tedious calculations.
6. It should be capable of further algebraic treatment.
7. It should be stable with regard to sampling.

The three most common measures of central tendency are:

– Mean, Median, and Mode.

Arithmetic Mean

The arithmetic mean is the measure of central location you are probably most familiar with.

It is the arithmetic average and is commonly called simply the “mean” or “average.”

In formulas, the arithmetic mean is usually represented as μ for the population mean and $\bar{x}$, read as “x-bar”, for the sample mean.

It is the sum of all the observations divided by the total number of observations.

General formula

a) Ungrouped data

If $x_1, x_2, \ldots, x_n$ are $n$ observed values, then

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}.$$

b) Grouped data

In calculating the mean from grouped data, we assume that all values falling into a particular class interval are located at the mid-point of the interval. It is calculated as follows:

$$\bar{x} = \frac{\sum_{i=1}^{k} m_i f_i}{\sum_{i=1}^{k} f_i}$$

where,
$k$ = the number of class intervals.
$m_i$ = the mid-point of the ith class interval.
$f_i$ = the frequency of the ith class interval.

Properties of the Arithmetic Mean:

• For a given set of data there is one and only one arithmetic mean (uniqueness).
• Easy to calculate and understand (simplicity).
• Influenced by each and every value in a data set.
• Greatly affected by extreme values (sensitivity).

So, the mean is an excellent measure of central tendency when the distribution is symmetric (normally or approximately normally distributed).

• The algebraic sum of the deviations of the given values from their arithmetic mean is always zero (centre of gravity).
• In the case of grouped data, if any class interval is open, the arithmetic mean cannot be calculated.
• It is not appropriate for either nominal or ordinal data.
• The sum of the squares of the deviations from the arithmetic mean is less than that computed from any other point.

Advantages:

1) It is based on all values given in the distribution.
2) It is most easily understood.
3) It is most amenable to algebraic treatment.

Disadvantages:

1) It is overly sensitive to extreme values.
2) When the distribution has open-end classes, its computation would be based on assumption, and therefore may not be valid.
3) Sometimes its value may even look ridiculous (surprising).

Example 1: The heart rates for n = 10 patients were as follows (beats per minute):

167, 120, 150, 125, 150, 140, 40, 136, 120, 150

What is the arithmetic mean for the heart rate of these patients?

Ans. Mean = 1298/10 = 129.8 beats per minute.
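As a quick check of Example 1, the ungrouped mean can be computed in a few lines of Python. The second calculation is an added illustration (not part of the original example) of how strongly the single extreme value 40 pulls the mean down.

```python
# Heart rates (beats per minute) of the 10 patients in Example 1.
heart_rates = [167, 120, 150, 125, 150, 140, 40, 136, 120, 150]

# Arithmetic mean: sum of all observations divided by their number.
mean = sum(heart_rates) / len(heart_rates)
print(mean)

# Illustration only: dropping the extreme value 40 shows how sensitive
# the mean is to a single unusual observation.
without_extreme = [x for x in heart_rates if x != 40]
mean_without_extreme = sum(without_extreme) / len(without_extreme)
print(round(mean_without_extreme, 1))
```

With all ten values the mean is 129.8 beats per minute; without the value 40 it rises to about 139.8.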

Example 2: Compute the mean age of 169 subjects from the grouped data.

Class interval   Mid-point (m_i)   Frequency (f_i)   m_i f_i
10-19            14.5              4                 58.0
20-29            24.5              66                1617.0
30-39            34.5              47                1621.5
40-49            44.5              36                1602.0
50-59            54.5              12                654.0
60-69            64.5              4                 258.0
Total            __                169               5810.5

Ans. Mean = 5810.5/169 = 34.38 years.
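The grouped-data calculation in Example 2 can be reproduced with a short Python sketch. The variable names are illustrative; the mid-points and frequencies are those of the table.

```python
# Grouped-data mean for the age table in Example 2.
midpoints = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5]   # m_i
frequencies = [4, 66, 47, 36, 12, 4]               # f_i

# Numerator: sum of m_i * f_i; denominator: sum of f_i.
total_mf = sum(m * f for m, f in zip(midpoints, frequencies))
n = sum(frequencies)

grouped_mean = total_mf / n
print(total_mf, n, round(grouped_mean, 2))
```

The sketch gives a total of 5810.5 over 169 subjects, i.e. a mean of about 34.38 years.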

Median

It is the middle value of a set of observations when the observations are listed in increasing or decreasing order.

a) Ungrouped data

The median is the value which divides the data set into two equal parts.

If the number of values is odd, the median will be the middle value when all values are arranged in order of magnitude, with ½ of the observations being larger than the median value and ½ smaller.
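A minimal Python sketch of the ungrouped median follows. The odd-sized list is hypothetical. For an even number of values, the sketch relies on the usual convention (assumed here, and the one used by Python's `statistics.median`) of averaging the two middle values.

```python
import statistics

# Odd number of values: the median is the single middle value.
odd_values = [5, 9, 12, 16, 23]  # hypothetical data

# Even number of values (heart rates from the mean example):
# statistics.median averages the two middle values of the sorted list.
heart_rates = [167, 120, 150, 125, 150, 140, 40, 136, 120, 150]

print(statistics.median(odd_values))   # middle value of the sorted list
print(statistics.median(heart_rates))  # mean of the 5th and 6th sorted values
```

For the heart rates, the sorted 5th and 6th values are 136 and 140, so the median is 138, noticeably higher than the mean of 129.8 because the median ignores how extreme the value 40 is.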

 


b) Grouped data

In calculating the median from grouped data, we assume that the values within a class interval are evenly distributed through the interval.

– The first step is to locate the class interval in which the median is located.
– Find n/2 and identify the class interval with the smallest cumulative frequency that contains n/2.
– Then, use the following formula:

$$\tilde{x} = L_m + \left(\frac{\frac{n}{2} - F_c}{f_m}\right) W$$

where,
$L_m$ = lower true class boundary of the interval containing the median.
$F_c$ = cumulative frequency of the interval just above (immediately preceding) the median class interval.
$f_m$ = frequency of the interval containing the median.
$W$ = class interval width.
$n$ = total number of observations.

Example. Compute the median age of 169 subjects from the grouped data.

Class interval   Mid-point (m_i)   Frequency (f_i)   Cum. freq
10-19            14.5              4                 4
20-29            24.5              66                70
30-39            34.5              47                117
40-49            44.5              36                153
50-59            54.5              12                165
60-69            64.5              4                 169
Total                              169

Ans.

n/2 = 169/2 = 84.5
n/2 = 84.5 falls in the 3rd class interval.
True lower limit = 29.5, true upper limit = 39.5
Frequency of the class = 47
(n/2 – F_c) = 84.5 – 70 = 14.5
Median = 29.5 + (14.5/47) × 10 = 32.59 ≈ 33
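The grouped-median calculation can be written as a small Python function. The function name `grouped_median` and its parameter names are hypothetical, introduced only for this sketch; the numbers are those of the worked example.

```python
def grouped_median(lower_boundary, n, cum_freq_before, class_freq, width):
    """Median = L_m + ((n/2 - F_c) / f_m) * W for grouped data."""
    return lower_boundary + (n / 2 - cum_freq_before) / class_freq * width

# Values from the worked example: median class 30-39 (true limits 29.5-39.5).
median_age = grouped_median(
    lower_boundary=29.5,   # L_m
    n=169,                 # total number of observations
    cum_freq_before=70,    # F_c, cumulative frequency before the median class
    class_freq=47,         # f_m
    width=10,              # W
)
print(round(median_age, 2))
```

The result is about 32.59, which rounds to 33 years.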

Properties of the median:

• There is only one median for a given set of data (uniqueness).
• The median is easy to calculate.
• The median is a positional average and hence is insensitive to very large or very small values.
• The median can be calculated even in the case of open-end intervals.
• It is determined mainly by the middle points and is less sensitive to the remaining data points (weakness).

• It is not a good representative of the data if the number of items is small.
• The median can be used as a summary measure for ordinal, discrete and continuous data; in general, however, it is not appropriate for nominal data.

Advantages

1) It is easily calculated and is not much disturbed by extreme values.
2) It is more typical of the series.

3) The median may be located even when the data are incomplete.
4) The median is nearer to reality and more representative than the mean.

Disadvantages

1. The median is not so well suited to algebraic treatment as the arithmetic, geometric and harmonic means.
2. It is not so generally familiar as the arithmetic mean.

Mode

• The mode is the most frequently occurring value among all the observations in a set of data.
• It is not influenced by extreme values.
• It is possible to have more than one mode or no mode.
• It is not a good summary of the majority of the data.
• The mode can be used as a summary measure for nominal, ordinal, discrete and continuous data; in general, however, it is more appropriate for nominal and ordinal data.

a) Ungrouped data

• It is the value which occurs most frequently in a set of values.
• If all the values are different there is no mode; on the other hand, a set of values may have more than one mode.

Example 1:

• Data are: 1, 2, 3, 4, 4, 4, 4, 5, 5, 6
• Mode is 4 (“unimodal”)

Example 2:

• Data are: 1, 2, 2, 2, 3, 4, 5, 5, 5, 6, 6, 8
• There are two modes: 2 and 5
• This distribution is said to be “bi-modal”

Example 3:

• Data are: 2.62, 2.75, 2.76, 2.86, 3.05, 3.12
• No mode, since all the values are different
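The three examples can be checked with a small Python sketch. The helper `modes` is a hypothetical name introduced here; it returns an empty list when every value occurs only once, matching the "no mode" convention used in these notes.

```python
from collections import Counter

def modes(values):
    """Return the list of modes; empty if every value occurs only once."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # all values differ, so there is no mode
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([1, 2, 3, 4, 4, 4, 4, 5, 5, 6]))          # Example 1
print(modes([1, 2, 2, 2, 3, 4, 5, 5, 5, 6, 6, 8]))    # Example 2
print(modes([2.62, 2.75, 2.76, 2.86, 3.05, 3.12]))    # Example 3
```

The three calls give one mode (4), two modes (2 and 5), and no mode, respectively.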

b) Grouped data

• To find the mode of grouped data, we usually refer to the modal class, where the modal class is the class interval with the highest frequency.
• If a single value for the mode of grouped data must be specified, it is taken as the mid-point of the modal class interval.

Also, we can use this formula:

$$\text{Mode} = L + \left(\frac{d_1}{d_1 + d_2}\right) C$$

where,
$L$ = the lower limit of the modal class.
$d_1$ = the difference between the frequencies of the modal class and the preceding class.
$d_2$ = the difference between the frequencies of the modal class and the succeeding class.
$C$ = the class interval (width) of the modal class.
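The formula can be sketched as a Python function. The name `grouped_mode`, its parameters, and the class frequencies used in the call are all hypothetical, chosen only to show the arithmetic.

```python
def grouped_mode(lower_limit, f_modal, f_prev, f_next, width):
    """Mode = L + d1 / (d1 + d2) * C for grouped data."""
    d1 = f_modal - f_prev   # modal class minus preceding class
    d2 = f_modal - f_next   # modal class minus succeeding class
    return lower_limit + d1 / (d1 + d2) * width

# Hypothetical modal class starting at 20 with frequency 50,
# preceded by a class of frequency 30 and followed by one of frequency 40.
mode_value = grouped_mode(lower_limit=20, f_modal=50, f_prev=30, f_next=40, width=10)
print(round(mode_value, 2))
```

Here d1 = 20 and d2 = 10, so the mode lies two thirds of the way into the modal class, at about 26.67. The larger drop on the left side pushes the mode towards the right-hand neighbour.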

Properties of mode:

• The mode can be used as a summary measure for nominal, ordinal, discrete and continuous data; in general, however, it is more appropriate for nominal and ordinal data.
• It is not affected by extreme values.
• It can be calculated for distributions with open-end classes.
• Often its value is not unique.
• The main drawback of the mode is that often it does not exist.
• It is an average of position.

Advantages

1. Since it is the most typical value, it is the most descriptive average.
2. Since the mode is usually an “actual value”, it indicates the precise value of an important part of the series.
3. Used for categorical data to describe the most frequent category.
4. Not affected by extreme values.
5. Easy to understand.

Disadvantages

1. Unless the number of items is fairly large and the distribution reveals a distinct central tendency, the mode has no significance.
2. It is not capable of mathematical treatment.
3. In a small number of items the mode may not exist.
4. Sometimes there may be more than one mode.

Exercise: A table showing the protein intake of different families. Find the mean, median, and mode.

Protein intake/consumption   Mid-point of class   Number of        f_i x_i   Cumulative
unit/day (g)                 interval (x_i)       families (f_i)             frequency
15-25                        20                   30               600       30
25-35                        30                   40               1200      70
35-45                        40                   100              4000      170
45-55                        50                   110              5500      280
55-65                        60                   80               4800      360
65-75                        70                   30               2100      390
75-85                        80                   10               800       400
Total                                             400              19000
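One way to check answers to this exercise is the Python sketch below. It applies the grouped-data formulas for the mean, median and mode to the table; the variable names are illustrative, and the mode step assumes the modal class is neither the first nor the last class.

```python
# Protein-intake exercise: lower class limits, mid-points, number of families.
lower_limits = [15, 25, 35, 45, 55, 65, 75]
midpoints = [20, 30, 40, 50, 60, 70, 80]
freqs = [30, 40, 100, 110, 80, 30, 10]
width = 10
n = sum(freqs)

# Mean: sum of f_i * x_i divided by n.
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n

# Median: find the first class whose cumulative frequency reaches n/2,
# then interpolate inside it.
cum = 0
for lower, f in zip(lower_limits, freqs):
    if cum + f >= n / 2:
        median = lower + (n / 2 - cum) / f * width
        break
    cum += f

# Mode: interpolate inside the class with the highest frequency
# (assumes that class has a neighbour on each side).
i = freqs.index(max(freqs))
d1 = freqs[i] - freqs[i - 1]
d2 = freqs[i] - freqs[i + 1]
mode = lower_limits[i] + d1 / (d1 + d2) * width

print(mean, round(median, 2), mode)
```

All three measures fall in the 45-55 class (mean 47.5, median about 47.73, mode 47.5), which suggests a roughly symmetric distribution.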

Measures of Dispersion

MCT are not enough to give a clear understanding of the distribution of the data.

We need to know something about the variability or spread of the values: whether they tend to be clustered close together, or spread out over a broad range.

Measures of Dispersion: Measures that quantify the variation or dispersion of a set of data from its central location.

Dispersion refers to the variety exhibited by the values of the data.

The amount may be small when the values are close together.

If all the values are the same, there is no dispersion.

Other terms synonymous with “Measures of Dispersion”:

– “Measures of Variation”
– “Measures of Spread”
– “Measures of Scatter”

Measures of dispersion include:

– Range
– Inter-quartile range
– Variance
– Standard deviation
– Coefficient of variation

1. Range (R)

• The difference between the largest and smallest observations in a sample.
• Range = Maximum value – Minimum value

Example:

– Data values: 5, 9, 12, 16, 23, 34, 37, 42
– Range = 42 – 5 = 37

• Being determined by only the two extreme observations, the range is of limited use because it tells us nothing about how the data between the extremes are spread.
• Further, interpretation of the range depends on the number of observations:
– when the number of observations increases, the range can get larger.

2. Percentiles, Quartiles and Inter-quartile Range

• The quartiles are the set of values which divide the distribution into four parts such that there is an equal number of observations in each part.
– Q1 = the [(n+1)/4]th ordered observation
– Q2 = the [2(n+1)/4]th ordered observation
– Q3 = the [3(n+1)/4]th ordered observation
• The inter-quartile range is the difference between the third and the first quartiles.
– IQR = Q3 – Q1
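The quartile positions can be turned into values with a short Python sketch. The function `quartile` is a hypothetical helper; when the position k(n+1)/4 is not a whole number it interpolates linearly between the two neighbouring ordered observations, which is one common convention and an assumption here, since the notes give only the position.

```python
def quartile(values, k):
    """k-th quartile (k = 1, 2, 3) at ordered position k(n+1)/4."""
    ordered = sorted(values)
    n = len(ordered)
    position = k * (n + 1) / 4      # 1-based position, may be fractional
    lower = int(position)           # whole part of the position
    fraction = position - lower
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]
    # Interpolate between the two neighbouring ordered observations.
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

data = [5, 9, 12, 16, 23, 34, 37, 42]   # data values from the range example
q1, q3 = quartile(data, 1), quartile(data, 3)
print(q1, q3, q3 - q1)
```

For these eight values the positions are 2.25 and 6.75, giving Q1 = 9.75, Q3 = 36.25 and an inter-quartile range of 26.5, compared with a range of 37.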

• Although the inter-quartile range sometimes serves as a useful descriptive measure, it is mathematically intractable and can also vary considerably from sample to sample.
• Percentiles divide the data into 100 parts with an equal number of observations in each part.
• It follows that the 25th percentile is the first quartile, the 50th percentile is the median and the 75th percentile is the third quartile.

3. Variance

• A good measure of dispersion should make use of all the data.
• Intuitively, a good measure could be derived by combining, in some way, the deviations of each observation from the mean.
• The variance achieves this by averaging the sum of the squares of the deviations from the mean.


Cont…d

• The sample variance of the set $x_1, x_2, \ldots, x_n$ of $n$ observations with mean $\bar{x}$ is

$$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

Note: The sum of the deviations from the mean is zero, thus it is more useful to square the deviations, add them, and then find the mean (to get the variance).
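The sample variance (sum of squared deviations from the mean, divided by n − 1) and the fact that the raw deviations sum to zero can be sketched as below. The four values are made up purely for illustration, and `sample_variance` is a hypothetical helper name.

```python
def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


values = [4, 8, 6, 2]                      # made-up illustrative data
mean = sum(values) / len(values)           # 5.0
deviations = [x - mean for x in values]    # [-1.0, 3.0, 1.0, -3.0]
total_deviation = sum(deviations)          # deviations cancel out to 0.0
s2 = sample_variance(values)               # (1 + 9 + 1 + 9) / 3

print(total_deviation, s2)
```

Because the positive and negative deviations cancel, their plain sum carries no information about spread; squaring them first removes the sign.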

Cont…d

4. Standard Deviation

• Being based on the squares of the deviations, the variance is limited as a descriptive statistic because it is not in the same units as the observations.

• By taking the square root of the variance, we obtain a measure of dispersion in the original units.

Example: We use the data set of 10 numbers (See Page 29):

19 21 20 20 34 22 24 27 27 27

 – The range = 34 – 19 = 15

 – The first quartile is 20 and the third quartile is 27

 – The inter-quartile range = 27 – 20 = 7

 – The variance is 21.88

 – The SD = √21.88 = 4.68
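The figures in this example can be reproduced with Python's standard library. This is a sketch: `statistics.variance` uses the n − 1 divisor, and the `"exclusive"` quantile method is assumed here to correspond to the (n+1)/4 position rule.

```python
import math
import statistics

data = [19, 21, 20, 20, 34, 22, 24, 27, 27, 27]

data_range = max(data) - min(data)                 # largest minus smallest value
q1, _, q3 = statistics.quantiles(data, n=4, method="exclusive")
iqr = q3 - q1                                      # inter-quartile range
variance = statistics.variance(data)               # sample variance (n - 1 divisor)
sd = math.sqrt(variance)                           # back in the original units

print(data_range, q1, q3, iqr)
print(round(variance, 2), round(sd, 2))            # 21.88 4.68
```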


Cont…d

5. Coefficient of variation

When we desire to compare the variability in two sets of data, the standard deviation, which measures absolute variation, may lead to misleading results.

The coefficient of variation gives the relative variation and is the best measure for comparing the variability in two sets of data. Never use the SD to compare variability between groups.

$$CV = \frac{\text{standard deviation}}{\text{mean}}$$
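A small sketch of why the coefficient of variation is preferred when comparing groups. Both data sets below are made up for illustration only, and `coefficient_of_variation` is a hypothetical helper name; the ratio is left as a proportion, although it is often multiplied by 100 and reported as a percentage.

```python
import statistics


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (relative variation)."""
    return statistics.stdev(values) / statistics.mean(values)


weights_kg = [60, 65, 70, 75, 80]        # made-up weights in kilograms
heights_cm = [150, 160, 170, 180, 190]   # made-up heights in centimetres

sd_weights = statistics.stdev(weights_kg)
sd_heights = statistics.stdev(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)
cv_heights = coefficient_of_variation(heights_cm)

print(round(sd_weights, 2), round(sd_heights, 2))   # absolute variation
print(round(cv_weights, 3), round(cv_heights, 3))   # relative variation
```

Here the heights have the larger standard deviation, yet the weights vary more relative to their mean, so the two measures rank the groups differently.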



Thank You!!!

Enjoy it.