uses of biostatistics in epidemiology (1)

Uses of Biostatistics in Epidemiology (1)

Amornrath Podhipak, Ph.D. Department of Epidemiology

Faculty of Public Health Mahidol University

2006

Why Statistics ??

Why Computers ??

Why Software ??

Medical doctors and public health personnel

A tool for calculation

Why do we need “statistics” in medicine and public health? (particularly, epidemiology??)

* Medicine is becoming increasingly quantitative in describing a condition.

Most malaria patients are infected with P. falciparum.

82.5% got P. falciparum.

Those patients look pale.

Haemoglobin level was 9.89 mg%, on average.

Epidemiology is concerned with describing the disease pattern in a group of people. Descriptive statistics give a clearer picture of what we want to describe.

* The answer to a research question needs to be more definite.

Is the new treatment better? How much better? In what aspect? Is there any evidence? Could it be a real difference?

Inferential statistics give an answer in the world of uncertainty.


4 scales of measurement

Qualitative variables

- Nominal scale (group classification only)

- Ordinal scale (classification with ordering / ranking)

Quantitative variables

- Interval scale (magnitude + constant distance between points)

- Ratio scale (magnitude + constant distance between points + true zero)

Before using statistics, we need some kinds of measurements, in order to get more detailed information.

Weight? 80 kg

Height? 160 cm

Handsome?

Intelligent?

Income? 100,000

Married?

BP? 140/90

HIV?

Nominal scale

Female / Male, coded as 1 / 2

Values have no meaning.

Ordinal scale

1 2 3

Equal distance between points does not reflect equal interval value.

Interval scale, e.g. degrees Celsius

0 10 20 30

The freezing point of water was chosen as zero degrees Celsius. It is not the true ZERO temperature (no heat).

Equal distance between points means equal interval value.

Ratio scale, e.g. weight

0 10 20 30

True ZERO (nothing here)

Equal distance between points means equal interval value.
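The practical difference between an interval scale and a ratio scale can be sketched with a few lines of arithmetic. This is only an illustration: the kelvin conversion is brought in here as an assumed example of a temperature scale with a true zero, and the variable names are made up.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert degrees Celsius to kelvin (a temperature scale with a true zero)."""
    return celsius + 273.15


# Interval scale (degrees Celsius): differences are meaningful, ratios are not.
cool_c, warm_c = 10.0, 20.0
difference_c = warm_c - cool_c    # 10 degrees: a valid statement
naive_ratio = warm_c / cool_c     # 2.0, but 20 degrees C is not "twice as hot"
kelvin_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

# Ratio scale (weight): zero means "nothing here", so ratios are meaningful.
light_kg, heavy_kg = 10.0, 20.0
weight_ratio = heavy_kg / light_kg    # 2.0: 20 kg really is twice 10 kg

print(f"Celsius difference: {difference_c} degrees")
print(f"Naive Celsius ratio: {naive_ratio}, ratio measured from true zero: {kelvin_ratio:.3f}")
print(f"Weight ratio: {weight_ratio}")
```

The ratio of the two temperatures changes once the zero point is moved, while the ratio of the two weights does not depend on any arbitrary zero.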

Questionnaire (TB and Passive smoking)

Family income ……………………. Baht/m

Passive Smoking ……...

X-ray [ ] +ve [ ] -ve

Record form

Variable (characteristic being measured)   Result of measurement                                      Type

Marital status                             single/married/divorced                                    nominal

gender                                     male/female                                                nominal

smoking                                    yes/no                                                     nominal

smoking                                    nonsmoker/light smoker/moderate smoker/heavy smoker        ordinal

smoking                                    number of cig/day                                          ratio

feeling of pain                            yes/no                                                     nominal

feeling of pain                            none/light/moderate/high                                   ordinal

feeling of pain                            mark on a line from 0 to 10                                ordinal

attitude toward selective abortion         strongly agree/agree/not sure/disagree/strongly disagree   ordinal

blood pressure                             mmHg                                                       ratio

temperature                                degrees Celsius                                            interval

weight                                     gram                                                       ratio

tumor stage                                I, II, III, IV                                             ordinal

Quantitative (numeric, metric) variables are classified as

- continuous: can take all values in an interval, e.g. weight, temperature, etc.

- discrete: can take only certain values (often integer values), e.g. parity, number of sex partners, etc.

Continuous data can be categorised into groups, for which one needs to define the “upper boundary” and “lower boundary” of a value (or a class).

120 121 122 123 124 125 126 127

boundaries: 120.5, 121.5, 122.5, 123.5, 124.5

120.1 120.2 120.3 120.4 120.5 120.6 120.7 120.8

boundaries: 120.15, 120.25, 120.35, 120.45, 120.55, …

120.11 120.12 120.13 120.14 120.15 120.16 120.17 120.18

boundaries: 120.115, 120.125, 120.135, 120.145, 120.155, …
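The pattern in these boundaries is that each recorded value stands for an interval extending half a recording unit on either side. A minimal sketch of that rule follows. The function name `class_boundaries` is hypothetical, and the value is passed as text so that the number of decimal places written down is kept.

```python
from decimal import Decimal


def class_boundaries(recorded_value: str) -> tuple[Decimal, Decimal]:
    """Return the (lower, upper) boundary of a value recorded to a given precision.

    The value is given as a string so that "120", "120.1" and "120.11"
    keep the number of decimal places they were recorded with.
    """
    value = Decimal(recorded_value)
    # One recording unit: 1 for "120", 0.1 for "120.1", 0.01 for "120.11".
    unit = Decimal(1).scaleb(value.as_tuple().exponent)
    half_unit = unit / 2
    return value - half_unit, value + half_unit


for recorded in ("120", "120.1", "120.11"):
    lower, upper = class_boundaries(recorded)
    print(f"{recorded}: lower boundary {lower}, upper boundary {upper}")
```

Decimal arithmetic is used instead of floats so that boundaries such as 120.115 are represented exactly.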

Descriptive statistics: a way to summarize a dataset (a group of measurements)

Example: height (cm) of 100 persons

140 140 140 136 141 123 125 134 125 129

123 161 142 155 129 130 139 129 134 130

140 132 138 142 155 125 136 129 136 153

151 141 138 125 123 134 135 135 135 130

155 130 134 146 135 139 134 142 139 149

147 155 158 135 141 136 136 147 139 132

134 140 141 153 142 127 147 142 146 127

151 140 151 140 141 147 139 134 140 149

132 140 141 142 165 153 146 134 151 151

134 141 138 130 141 132 140 138 127 129

What are the values that best describe the height of these 100 persons?

1) Rearrange the data:

123 123 124 125 125 125 125 127 127 127

129 129 129 129 129 130 130 130 130 130

132 132 132 132 134 134 134 134 134 134

134 134 134 135 135 135 135 135 136 136

136 136 136 138 138 138 138 139 139 139

139 139 140 140 140 140 140 140 140 140

140 140 141 141 141 141 141 141 141 141

142 142 142 142 142 142 146 146 146 147

147 147 147 149 149 151 151 151 151 151

153 153 153 155 155 155 155 158 161 165

Minimum = 123

Maximum = 165

Range = 42 (Max - Min)

Median = 139 (value in the middle)

Mode = 140 (most repeated value)
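The same five summary values can be obtained from the unsorted measurements with Python's standard `statistics` module. This is a minimal sketch: the list holds the 100 heights as first listed in the example, and the variable names are illustrative.

```python
import statistics

# The 100 heights (cm) from the example, in their original, unsorted order.
HEIGHTS_CM = [int(x) for x in """
140 140 140 136 141 123 125 134 125 129
123 161 142 155 129 130 139 129 134 130
140 132 138 142 155 125 136 129 136 153
151 141 138 125 123 134 135 135 135 130
155 130 134 146 135 139 134 142 139 149
147 155 158 135 141 136 136 147 139 132
134 140 141 153 142 127 147 142 146 127
151 140 151 140 141 147 139 134 140 149
132 140 141 142 165 153 146 134 151 151
134 141 138 130 141 132 140 138 127 129
""".split()]

ordered = sorted(HEIGHTS_CM)             # step 1: rearrange the data
minimum, maximum = ordered[0], ordered[-1]
value_range = maximum - minimum          # Max - Min
median = statistics.median(ordered)      # value in the middle
mode = statistics.mode(ordered)          # most repeated value

print(f"n = {len(ordered)}")
print(f"Minimum = {minimum}, Maximum = {maximum}, Range = {value_range}")
print(f"Median = {median}, Mode = {mode}")
```

With an even number of observations, `statistics.median` averages the two middle values, which here are both 139.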

2) Present in a table (Frequency distribution)

Class boundaries depend on the boundaries of the recorded values.

Class boundaries   Height (cm)   Mid point (X)   f = frequency

119.5-124.5        120-124       122              3

124.5-129.5        125-129       127             12

129.5-134.5        130-134       132             18

134.5-139.5        135-139       137             19

139.5-144.5        140-144       142             24

144.5-149.5        145-149       147              9

149.5-154.5        150-154       152              8

154.5-159.5        155-159       157              5

159.5-164.5        160-164       162              1

164.5-169.5        165-169       167              1

3) Present in a graph (Histogram)

[Histogram: Frequency (axis from 0 to 25) against Height (cm) (axis from 120 to 170)]
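Steps 2 and 3 can be sketched together: count the measurements into 5 cm classes, then draw one row of marks per class as a rough text histogram. This is an illustrative sketch only. The class width and starting limit follow the table above, the counts are taken directly from the 100 listed heights, and names such as `lower_limit` are hypothetical.

```python
from collections import Counter

# The 100 heights (cm) from the example, in their original, unsorted order.
HEIGHTS_CM = [int(x) for x in """
140 140 140 136 141 123 125 134 125 129
123 161 142 155 129 130 139 129 134 130
140 132 138 142 155 125 136 129 136 153
151 141 138 125 123 134 135 135 135 130
155 130 134 146 135 139 134 142 139 149
147 155 158 135 141 136 136 147 139 132
134 140 141 153 142 127 147 142 146 127
151 140 151 140 141 147 139 134 140 149
132 140 141 142 165 153 146 134 151 151
134 141 138 130 141 132 140 138 127 129
""".split()]

CLASS_WIDTH = 5          # each class covers 5 whole centimetres
FIRST_LOWER_LIMIT = 120  # lower limit of the first class


def lower_limit(height_cm: int) -> int:
    """Return the lower class limit of the class that contains height_cm."""
    steps = (height_cm - FIRST_LOWER_LIMIT) // CLASS_WIDTH
    return FIRST_LOWER_LIMIT + steps * CLASS_WIDTH


frequency = Counter(lower_limit(h) for h in HEIGHTS_CM)

rows = []
for lower in range(FIRST_LOWER_LIMIT, max(HEIGHTS_CM) + 1, CLASS_WIDTH):
    upper = lower + CLASS_WIDTH - 1
    midpoint = (lower + upper) // 2          # e.g. (120 + 124) / 2 = 122
    f = frequency[lower]
    rows.append((lower, upper, midpoint, f))
    # Class boundaries lie half a unit (0.5 cm) beyond the class limits.
    print(f"{lower - 0.5:.1f}-{upper + 0.5:.1f}  {lower}-{upper}  {midpoint}  {f:3d}  {'#' * f}")
```

Each `#` stands for one person, so the length of a row plays the role of the bar height in the histogram.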

Methods of data presentation

1. Table

2. Graph

- line graph

- bar chart

- pie chart

- scatter plot

- area graph

- error bar

- histogram

Another set of values for describing a dataset is the MEAN and the STANDARD DEVIATION.

The mean indicates the location. The standard deviation indicates (roughly) how scattered the data are.

Example: Dataset 1: Age of 6 children

4 4 4 4 4 4

Mean = 4.0 years, sd = 0 years (no variation)

Example: Dataset 2: Age of 6 children

2 2 4 4 6 6

Mean = 4.0 years, sd = 1.79 years (with variation)
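Both examples can be checked with the standard `statistics` module. This is a small sketch: `statistics.stdev` is the sample standard deviation (divisor n - 1), which is the version that reproduces the 1.79 years quoted for Dataset 2. The population version, `statistics.pstdev` (divisor n), gives about 1.63 instead.

```python
import statistics

dataset_1 = [4, 4, 4, 4, 4, 4]   # ages (years) of 6 children, no variation
dataset_2 = [2, 2, 4, 4, 6, 6]   # ages (years) of 6 children, with variation

for name, ages in (("Dataset 1", dataset_1), ("Dataset 2", dataset_2)):
    mean = statistics.mean(ages)
    sd = statistics.stdev(ages)   # sample standard deviation, divisor n - 1
    print(f"{name}: mean = {mean:.1f} years, sd = {sd:.2f} years")
```

Both datasets have the same mean, so the mean alone cannot tell them apart. The standard deviation is what shows that the second group is more spread out.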

Or, another example: The average body height of these children was 138.9 cm, with a standard deviation of 8.9 cm.

If we categorize the data into qualitative groups (tall/short), a proportion can then be calculated.

Descriptive statistics (proportion and/or percentage)

Most of the children were less than 150 cm tall.

85% of them had height less than 150 cm.
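A percentage of this kind is a cumulative frequency divided by the group size. The sketch below assumes the class frequencies obtained by counting the 100 example heights into 5 cm classes. The function name `percent_below` is hypothetical.

```python
# (upper class boundary in cm, class frequency), counted from the 100 example heights.
CLASS_FREQUENCIES = [
    (124.5, 3), (129.5, 12), (134.5, 18), (139.5, 19), (144.5, 24),
    (149.5, 9), (154.5, 8), (159.5, 5), (164.5, 1), (169.5, 1),
]


def percent_below(boundary_cm: float) -> float:
    """Percentage of persons in classes lying entirely below boundary_cm."""
    total = sum(f for _, f in CLASS_FREQUENCIES)
    below = sum(f for upper, f in CLASS_FREQUENCIES if upper <= boundary_cm)
    return 100 * below / total


# Heights are recorded in whole centimetres, so "below the 149.5 cm boundary"
# is the same as "less than 150 cm".
short_percent = percent_below(149.5)
print(f"{short_percent:.0f}% were less than 150 cm tall")
print(f"{100 - short_percent:.0f}% were 150 cm or taller")
```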

A final note on defining a variable and a measurement:

Important things to consider before making any measurement:

1. Do we measure the right thing?

Example: … and CVD

2. What is the tool that can actually measure what we want to measure?

Morphology (measure) indicators:

- % standard weight

- body mass index (wt/ht²)

- tricep skinfold thickness

- Wt for age

- Wt for height

- etc.

Food intake (ask)

Protein calorie intake (ask & calculate)

3. How valid is the instrument?

(… of questions, recall of subjects, certainty of reported amount of food, variability of ingredients, etc.)

Does the information obtained actually reflect fatty food intake?

4. How precise is the instrument?

Does the information precisely estimate the amount of fatty food intake for each individual?
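Of the morphology indicators listed under point 2, the body mass index is a simple formula (weight divided by height squared), so it can be sketched directly. The units are an assumption (kilograms and metres, the usual convention), and the function name is hypothetical. The sample values are the 80 kg and 160 cm used earlier as example answers.

```python
def body_mass_index(weight_kg: float, height_cm: float) -> float:
    """Body mass index = weight / height^2, with weight in kg and height in m."""
    height_m = height_cm / 100    # convert centimetres to metres
    return weight_kg / height_m ** 2


bmi = body_mass_index(weight_kg=80, height_cm=160)
print(f"BMI = {bmi:.2f} kg/m^2")  # about 31.25
```

Whether this number is a valid and precise measure of what the study wants to know is exactly the question raised in points 3 and 4.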

In summary:

Statistics (and epidemiology) deals with a group of persons (the bigger the group, the better the result), not one individual patient. We look for the characteristics which are most common in the group.

Descriptive statistics is used for explaining our sample (or findings), e.g. 80% of them had haemoglobin level less than 10 mg%. The average haemoglobin level was 9.5 mg% with a standard deviation of 1.5 mg%.

Inferential statistics (Infer to general population of interest)
