Uses of Biostatistics in Epidemiology (1)
Amornrath Podhipak, Ph.D.
Department of Epidemiology, Faculty of Public Health, Mahidol University
2006
Why statistics?
Why computers?
Why software?
Medical doctors and public health personnel
A tool for calculation
Why do we need “statistics” in medicine and public health (particularly epidemiology)?
* Medicine is becoming increasingly quantitative in describing a condition.
Most malaria patients are infected with P. falciparum. 82.5% got P. falciparum.
Those patients look pale.
Haemoglobin level was 9.89 mg%, on average.
Epidemiology is concerned with describing the disease pattern in a group of people. Descriptive statistics give a clearer picture of what we want to describe.
* The answer to a research question needs to be more definite.
Is the new treatment better? How much better? In what aspect? Is there any evidence? Could it be a real difference?
Inferential statistics give an answer in a world of uncertainty.
4 scales of measurement
Qualitative variables
- Nominal scale (group classification only)
- Ordinal scale (classification with ordering / ranking)
Quantitative variables
- Interval scale (magnitude + constant distance between points)
- Ratio scale (magnitude + constant distance between points + true zero)
Before using statistics, we need some kind of measurement in order to get more detailed information.
Weight? 80 kg
Height? 160 cm
Handsome?
Intelligent?
Income? 100,000
Married?
BP? 140/90
HIV?
Nominal scale
Female / Male, coded 1 / 2
The code values have no numeric meaning.
Ordinal scale
1   2   3
Equal distance between points does not reflect equal interval value.
Interval scale, e.g. degrees Celsius
0   10   20   30
The zero (freezing point) was defined as zero degrees Celsius; it is not the true ZERO temperature (no heat).
Equal distance between points means equal interval value.
Ratio scale, e.g. weight
0   10   20   30
True ZERO (nothing here).
Equal distance between points means equal interval value.
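The difference between the interval and ratio scales can be checked numerically. The sketch below is only an illustration: the Kelvin conversion is an assumption brought in to supply a temperature scale with a true zero, and the variable names are made up for this example. It shows that differences in degrees Celsius keep their meaning, while "twice as much" only makes sense on a scale with a true zero.

```python
# Interval vs ratio scale: which comparisons survive a change of unit?

def celsius_to_kelvin(celsius: float) -> float:
    """Convert to Kelvin, a temperature scale with a true zero (illustration only)."""
    return celsius + 273.15

low_c, high_c = 10.0, 20.0

# Differences are meaningful on an interval scale: 10 degrees either way.
diff_c = high_c - low_c
diff_k = celsius_to_kelvin(high_c) - celsius_to_kelvin(low_c)

# Ratios are not: 20 C is not "twice as hot" as 10 C.
ratio_c = high_c / low_c                                        # 2.0
ratio_k = celsius_to_kelvin(high_c) / celsius_to_kelvin(low_c)  # about 1.035

# Weight is a ratio scale: the ratio is the same in kg and in grams.
low_kg, high_kg = 10.0, 20.0
ratio_kg = high_kg / low_kg
ratio_g = (high_kg * 1000) / (low_kg * 1000)

print(f"temperature difference: {diff_c:.1f} (Celsius), {diff_k:.1f} (Kelvin)")
print(f"temperature ratio:      {ratio_c:.3f} (Celsius), {ratio_k:.3f} (Kelvin)")
print(f"weight ratio:           {ratio_kg:.3f} (kg), {ratio_g:.3f} (g)")
```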
Questionnaire (TB and passive smoking)
[Record form; legible fields: family income (Baht/m), passive smoking, X-ray result (+ve / -ve)]
Variable (characteristic being measured)   Result of measurement                                              Type
marital status                             single / married / divorced                                        nominal
gender                                     male / female                                                      nominal
smoking                                    yes / no                                                           nominal
smoking                                    nonsmoker / light smoker / moderate smoker / heavy smoker          ordinal
smoking                                    number of cig/day                                                  ratio
feeling of pain                            yes / no                                                           nominal
feeling of pain                            none / light / moderate / high                                     ordinal
feeling of pain                            0 --------- 10                                                     ordinal
attitude toward selective abortion         strongly agree / agree / not sure / disagree / strongly disagree   ordinal
blood pressure                             mmHg                                                               ratio
temperature                                degree Celsius                                                     interval
weight                                     gram                                                               ratio
tumor stage                                I, II, III, IV                                                     ordinal
Quantitative (numeric, metric) variables are classified as
- continuous: can take all values in an interval, e.g. weight, temperature, etc.
- discrete: can take only certain values (often integer values), e.g. parity, number of sex partners, etc.
Continuous data can be categorised into groups, for which one needs to define the “upper boundary” and “lower boundary” of a value (or a class).
120  121  122  123  124  125  126  127
boundaries: 120.5, 121.5, 122.5, 123.5, 124.5
120.1  120.2  120.3  120.4  120.5  120.6  120.7  120.8
boundaries: 120.15, 120.25, 120.35, 120.45, 120.55, …
120.11  120.12  120.13  120.14  120.15  120.16  120.17  120.18
boundaries: 120.115, 120.125, 120.135, 120.145, 120.155, …
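The pattern in these boundaries is that each recorded value stands for an interval of half a recording unit on either side. A minimal sketch of that rule follows, assuming values are passed as strings so that the number of recorded decimals is known. The function name `value_boundaries` is hypothetical, and `Decimal` is used only to avoid floating-point noise.

```python
from decimal import Decimal

def value_boundaries(recorded: str) -> tuple[Decimal, Decimal]:
    """Return (lower, upper) boundary of a value recorded to a given precision."""
    value = Decimal(recorded)
    # exponent is 0 for "120", -1 for "120.1", -2 for "120.11"
    unit = Decimal(1).scaleb(value.as_tuple().exponent)
    half_unit = unit / 2
    return value - half_unit, value + half_unit

for recorded in ("120", "120.1", "120.11"):
    lower, upper = value_boundaries(recorded)
    print(f"{recorded}: {lower} to {upper}")
```

The upper boundary of one value is the lower boundary of the next, so there are no gaps between adjacent values or classes.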
- Descriptive statistics: a way to summarize a dataset (a group of measurements)
Example: body height (cm) of 100 persons
140 140 140 136 141 123 125 134 125 129
123 161 142 155 129 130 139 129 134 130
140 132 138 142 155 125 136 129 136 153
151 141 138 125 123 134 135 135 135 130
155 130 134 146 135 139 134 142 139 149
147 155 158 135 141 136 136 147 139 132
134 140 141 153 142 127 147 142 146 127
151 140 151 140 141 147 139 134 140 149
132 140 141 142 165 153 146 134 151 151
134 141 138 130 141 132 140 138 127 129
What are the values that best describe the height of these 100 persons?
1) Rearrange the data:
123 123 124 125 125 125 125 127 127 127
129 129 129 129 129 130 130 130 130 130
132 132 132 132 134 134 134 134 134 134
134 134 134 135 135 135 135 135 136 136
136 136 136 138 138 138 138 139 139 139
139 139 140 140 140 140 140 140 140 140
140 140 141 141 141 141 141 141 141 141
142 142 142 142 142 142 146 146 146 147
147 147 147 149 149 151 151 151 151 151
153 153 153 155 155 155 155 158 161 165
Minimum = 123
Maximum = 165
Range (Max - Min) = 42
Median (value in the middle) = 139
Mode (most repeated value) = 140
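These five summary values can be reproduced with Python's standard `statistics` module. The sketch below is one possible way to do it, not the method used on the slide. The `counts` dictionary is simply a compact tally (value: number of occurrences) of the rearranged list in step 1.

```python
import statistics

# Tally of the rearranged list in step 1: height in cm -> number of persons.
counts = {
    123: 2, 124: 1, 125: 4, 127: 3, 129: 5, 130: 5, 132: 4, 134: 9,
    135: 5, 136: 5, 138: 4, 139: 5, 140: 10, 141: 8, 142: 6, 146: 3,
    147: 4, 149: 2, 151: 5, 153: 3, 155: 4, 158: 1, 161: 1, 165: 1,
}

# Expand the tally back into the sorted list of 100 heights.
heights = sorted(h for h, n in counts.items() for _ in range(n))

minimum = min(heights)
maximum = max(heights)
value_range = maximum - minimum
median = statistics.median(heights)   # mean of the 50th and 51st values
mode = statistics.mode(heights)       # most repeated value

print(f"n = {len(heights)}")
print(f"min = {minimum}, max = {maximum}, range = {value_range}")
print(f"median = {median}, mode = {mode}")
```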
2) Present in a table (Frequency distribution)
Class boundaries depend on the boundaries of these values.

Class boundaries   Height (cm)   Mid point (X)   f = frequency
119.5-124.5        120-124       122              3
124.5-129.5        125-129       127             12
129.5-134.5        130-134       132             18
134.5-139.5        135-139       137             19
139.5-144.5        140-144       142             24
144.5-149.5        145-149       147              9
149.5-154.5        150-154       152              8
154.5-159.5        155-159       157              5
159.5-164.5        160-164       162              1
164.5-169.5        165-169       167              1
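A frequency distribution like this can be tallied directly from the data. The following is a rough sketch assuming 5 cm classes starting at 120 cm, as in the table. `LOWEST` and `WIDTH` are names chosen for this example, and `counts` is a compact tally of the rearranged list from step 1. The row of asterisks gives a crude text version of the histogram.

```python
from collections import Counter

# Tally of the rearranged list in step 1: height in cm -> number of persons.
counts = {
    123: 2, 124: 1, 125: 4, 127: 3, 129: 5, 130: 5, 132: 4, 134: 9,
    135: 5, 136: 5, 138: 4, 139: 5, 140: 10, 141: 8, 142: 6, 146: 3,
    147: 4, 149: 2, 151: 5, 153: 3, 155: 4, 158: 1, 161: 1, 165: 1,
}

LOWEST = 120   # lower limit of the first class (assumed from the table)
WIDTH = 5      # class width in cm (assumed from the table)

# Assign every height to the class whose lower limit is `start`.
table = Counter()
for height, n in counts.items():
    start = LOWEST + ((height - LOWEST) // WIDTH) * WIDTH
    table[start] += n

print("Height    Boundaries     Mid point   f")
for start in sorted(table):
    end = start + WIDTH - 1
    mid_point = (start + end) // 2
    f = table[start]
    print(f"{start}-{end}   {start - 0.5}-{end + 0.5}    {mid_point}       {f:>3}  {'*' * f}")
```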
3) Present in a graph (Histogram)
[Histogram: x-axis Height (cm), 120 to 170; y-axis Frequency, 0 to 25]
Methods of data presentation
1. Table
2. Graph
- line graph
- bar chart
- pie chart
- scatter plot
- area graph
- error bar
- histogram
Another set of values for describing a dataset is the MEAN and STANDARD DEVIATION.
The mean indicates the location. The standard deviation indicates (roughly) the scatter of the data.
Example: Dataset 1: Age of 6 children
4 4 4 4 4 4
Mean = 4.0 years, sd = 0 years (no variation)
Example: Dataset 2: Age of 6 children
2 2 4 4 6 6
Mean = 4.0 years, sd = 1.79 years (with variation)
Or, another example: The average body height of these children was 138.9 cm, with a standard deviation of 8.9 cm.
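The two six-child datasets can be checked with the standard `statistics` module. This is a small sketch rather than the slide's own calculation. It uses `statistics.stdev`, the sample standard deviation with an n - 1 divisor, which is the version that reproduces the 1.79 quoted for Dataset 2.

```python
import statistics

dataset1 = [4, 4, 4, 4, 4, 4]   # ages in years, no variation
dataset2 = [2, 2, 4, 4, 6, 6]   # ages in years, same mean, with variation

for name, ages in (("Dataset 1", dataset1), ("Dataset 2", dataset2)):
    mean = statistics.mean(ages)
    sd = statistics.stdev(ages)   # sample sd: divides by n - 1
    print(f"{name}: mean = {mean:.1f} years, sd = {sd:.2f} years")
```

Both datasets have the same mean, so the mean alone cannot distinguish them; the standard deviation is what shows that the second group is more spread out.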
If we categorize the data into qualitative groups (tall/short), the proportion can then be calculated.
Descriptive statistics (proportion and/or percentage)
Most of the children were less than 150 cm tall.
85% of them had a height of less than 150 cm.
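A percentage of this kind is a cumulative frequency divided by the group size. The sketch below assumes the ten 5 cm height classes used for the frequency distribution, with class frequencies tallied from the rearranged list of 100 heights. The variable names are illustrative only.

```python
# Upper class boundaries (cm) and class frequencies tallied from the
# rearranged list of 100 heights (classes 120-124, 125-129, ..., 165-169).
upper_boundaries = [124.5, 129.5, 134.5, 139.5, 144.5,
                    149.5, 154.5, 159.5, 164.5, 169.5]
frequencies = [3, 12, 18, 19, 24, 9, 8, 5, 1, 1]

total = sum(frequencies)
cumulative_percent = {}
running = 0
for boundary, f in zip(upper_boundaries, frequencies):
    running += f
    cumulative_percent[boundary] = 100 * running / total

for boundary, percent in cumulative_percent.items():
    print(f"height below {boundary} cm: {percent:.0f}%")
```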
A final note on defining a variable and a measurement:
Important things to consider before making any measurement:
1. Do we measure the right thing?
   Fatty food and CVD
2. What is the tool that can actually measure what we want to measure?
   Morphology (measure), indicators: % standard weight, body mass index (wt/ht2), tricep skinfold thickness, Wt for age, Wt for height, etc.
   Food intake (ask)
   Protein calorie intake (ask & calculate)
3. How valid is the instrument?
   (… of questions, recall of subjects, certainty of reported amount of food, variability of ingredients, etc.)
   Does the information obtained actually reflect fatty food intake?
4. How precise is the instrument?
   Does the information precisely estimate the amount of fatty food intake for each individual?
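The body mass index named under item 2 (wt/ht2) is an example of an indicator calculated from two direct measurements. A minimal sketch follows. The units are an assumption, since the slide does not state them: weight in kilograms and height converted to metres, the usual convention. The function name `body_mass_index` is illustrative, and the inputs are the 80 kg and 160 cm example measurements from the earlier slide.

```python
def body_mass_index(weight_kg: float, height_cm: float) -> float:
    """Weight divided by height squared, with height converted to metres."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2

bmi = body_mass_index(80, 160)
print(f"BMI = {bmi:.2f} kg/m^2")   # prints "BMI = 31.25 kg/m^2"
```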
In summary:
Statistics (and epidemiology) deals with a group of persons (the bigger the group, the better the result), not one individual patient. We look for the characteristics which are most common in the group.
Descriptive statistics is used for explaining our sample (or findings),
e.g. 80% of them had a haemoglobin level of less than 10 mg%. The average haemoglobin level was 9.5 mg%, with a standard deviation of 1.5 mg%.
Inferential statistics is used to infer to the general population of interest.