health and disease in populations 2002 sources of variation (1) paul burton! jane hutton

Health and Disease in Populations 2002

Sources of variation (1)

Paul Burton, Jane Hutton

Informal lecture objectives

Objective 1: To enable the student to distinguish between observed data and the underlying tendencies which give rise to those data

Objective 2: To understand the concepts of sources of variation and randomness

Formal lecture objectives for Random Variation (1) and (2)

Objective 1: Distinguish between ‘observed’ epidemiological quantities (incidence, prevalence, incidence rate ratio etc.) and their ‘true’ or ‘underlying’ values.

Objective 2: Discuss how ‘observed’ epidemiological quantities depart from their ‘true’ values because of random variation.

Formal lecture objectives for Sources of Variation

Objective 3: Describe how ‘observed’ values help us towards a knowledge of the ‘true’ values by:
allowing us to test hypotheses about the true value (SoV 1)
allowing us to calculate a range within which the true value probably lies (SoV 2)

Drawing conclusions

Experiment: Flip a coin 10 times

Result: Observe 7 heads, 3 tails

Conclusions:
Data wrong (e.g. a miscount)
Artefact
Chance
The coin is biased towards heads

Tendency versus observation

Coins tend to produce equal numbers of heads and tails, but what we observe may depart from this by random variation.
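The gap between tendency and observation can be seen by simulation. The sketch below is an illustration only: the function name, the number of repeats (1000) and the seed are arbitrary choices, not part of the lecture. It repeats the 10-flip experiment many times with a fair coin and tallies how often each number of heads turns up.

```python
import random
from collections import Counter


def simulate_head_counts(n_experiments, flips_per_experiment=10, seed=2002):
    """Return the number of heads seen in each repeated fair-coin experiment."""
    rng = random.Random(seed)  # arbitrary fixed seed so the run is repeatable
    counts = []
    for _ in range(n_experiments):
        # Each flip is a head with probability 0.5 (the underlying tendency).
        heads = sum(rng.random() < 0.5 for _ in range(flips_per_experiment))
        counts.append(heads)
    return counts


counts = simulate_head_counts(1000)
tally = Counter(counts)
for heads in range(11):
    print(f"{heads:2d} heads: {tally[heads]:4d} experiments")
```

The tendency is 5 heads in 10, yet individual experiments spread around that value; results such as 7 heads and 3 tails are expected to appear in some of the runs.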

Random variation in health

On average, there are 4 cases of meningitis per month in Leicester; some months we observe 10, some months 0.

Smokers tend to be less healthy than non-smokers; but if we pick a few people at random, we might find that the smokers are healthier than the non-smokers.

Tendency versus observation

Epidemiologists, health planners etc. want to know about the underlying tendencies and patterns. However, as well as systematic variation, everything they observe is affected by random variation.

Underlying tendency and observed data

Underlying tendency: the proportion of red marbles in a bag of 1000 red and black ones
Observed data: the number of reds among ten picked at random from the bag

Underlying tendency: the forthcoming result of a UK general election
Observed data: the voting intentions of 1,000 UK voters picked at random

Underlying tendency: the total number of Leicester diabetic patients who have foot problems
Observed data: the number with foot problems in a random sample of 200 Leicester diabetics

If we know about the underlying tendency, we can predict what we may ‘reasonably’ expect to observe (probability theory).
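As a sketch of this forward calculation for the marble example, suppose (purely as an assumption for illustration; the slides give no figure) that 300 of the 1000 marbles are red. The probability of each possible number of reds among ten picked without replacement then follows from counting combinations (the hypergeometric distribution).

```python
from math import comb


def prob_reds(k, reds=300, total=1000, picked=10):
    """P(exactly k reds) when `picked` marbles are drawn without replacement.

    reds=300 is a hypothetical composition of the bag, chosen for illustration.
    """
    blacks = total - reds
    return comb(reds, k) * comb(blacks, picked - k) / comb(total, picked)


for k in range(11):
    print(f"{k:2d} reds: {prob_reds(k):.3f}")
```

With a known tendency, the table of probabilities shows which observations are ‘reasonable’ and which would be surprising.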

Neonatal Intensive Care (NIC) cots

True requirement (1992 figures) 1/1,000 live births per annum

Health authority has approximately 12,000 live births per annum

On average 12 NIC ‘cots’ will be required per year (this is the true tendency)

[Figure: distribution of the number of cots in use, marked at 18 cots (95%), 19 cots (29/30) and 21 cots (99%)]

Obstetric beds (NIC cots)

Often observe 8-16 cots being used

Need 19 or more on about 1 day per month

Need 21 or more on 1% of days

Hardly ever need more than 24 cots

Provide 19 cots: on average 12 are occupied = 63% occupancy
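The cot figures can be checked with a short calculation. The sketch below assumes that the number of cots needed on a given day follows a Poisson distribution with mean 12; the slides do not name the distribution, so this is an assumption, and the function names are illustrative.

```python
from math import exp, factorial

MEAN_COTS = 12  # the true tendency quoted on the slide


def poisson_pmf(k, mean=MEAN_COTS):
    """P(exactly k cots needed), assuming a Poisson distribution."""
    return exp(-mean) * mean**k / factorial(k)


def poisson_cdf(k, mean=MEAN_COTS):
    """P(k or fewer cots needed)."""
    return sum(poisson_pmf(i, mean) for i in range(k + 1))


def cots_to_cover(fraction, mean=MEAN_COTS):
    """Smallest number of cots that is enough on at least `fraction` of days."""
    k = 0
    while poisson_cdf(k, mean) < fraction:
        k += 1
    return k


for label, fraction in [("95%", 0.95), ("29/30", 29 / 30), ("99%", 0.99)]:
    print(f"{label:>5} of days covered by {cots_to_cover(fraction)} cots")

print("P(19 or more)  =", round(1 - poisson_cdf(18), 3))
print("P(21 or more)  =", round(1 - poisson_cdf(20), 3))
print("P(more than 24) =", round(1 - poisson_cdf(24), 4))
print("Occupancy with 19 cots =", round(MEAN_COTS / 19, 2))
```

Under this assumption the thresholds come out at 18, 19 and 21 cots, in line with the figures quoted on these slides.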

True tendency → observed distribution: easy

BUT how do we reverse the direction of inference?

Observed distribution → true tendency

Any questions?

Hypothesis testing


Objective 3: Describe how ‘observed’ values help us towards a knowledge of the ‘true’ values by:
allowing us to test hypotheses about the true value

Hypothesis testing

An hypothesis: a statement that an underlying tendency of scientific interest takes a particular quantitative value

The coin is fair (the probability of heads is 0.5)
The new drug is no better than the standard treatment (the ratio of survival rates = 1.0)
The true prevalence of tuberculosis in a given population is 2 in 10,000

Testing hypotheses

Are the observed data ‘consistent’ with the stated hypothesis? Informally? Formally?

Formally: calculate the probability of getting an observation as extreme as, or more extreme than, the one observed if the stated hypothesis was true.

If this probability is very small, then either something very unlikely has occurred; or the hypothesis is wrong

It is then reasonable to conclude that the data are incompatible with the hypothesis.

The probability is called a ‘p-value’

Hypothesis: this coin is fair

Observed data: 10 heads, 0 tails

P-value: 0.002 (1 in 500) (exactly 2 × 1/1,024)

Conclusion: Data inconsistent with hypothesis; strong evidence against the hypothesis

Prior beliefs relevant here:
10 heads, 0 tails (Is the coin biased?)
10 survivors, 0 deaths on new treatment X (Does X work if historically 50% died?)
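The exact p-value for 10 heads and 0 tails comes from the two most extreme outcomes, all heads and all tails, each of which has probability (1/2)^10 if the coin is fair. As a worked equation:

```latex
p = P(10\ \text{heads}) + P(0\ \text{heads})
  = 2 \times \left(\tfrac{1}{2}\right)^{10}
  = \frac{2}{1024} \approx 0.00195
```

Rounded to three decimal places this is 0.002, roughly 1 in 500.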

An arbitrary convention

P-value: p<=0.05
Data ‘inconsistent with hypothesis’
‘Substantive evidence against the hypothesis’
‘Reasonable to reject the hypothesis’
‘Statistically significant’

P-value: p>0.05
None of the above

The mean surface temperature of the earth has increased by only 1°C over the last 50 years (p=0.1): this does not prove that there is no global warming!

Hypothesis tests

The incidence of disease X in Warwickshire is significantly lower than in the rest of the UK (p=0.01)

The death rate from disease Y is significantly higher in Barnsley than in Leicester (p=0.05)

Patients on the new drug did not live significantly longer than those on the standard drug (p=0.4)

The ‘null hypothesis’

The hypothesis to be tested is often called the ‘null hypothesis’ (H0)
The ratio of death rates is 1.0
The prevalence in Warwickshire is the same as in Leicestershire

‘p<=0.05’: substantial evidence against the hypothesis being tested, not that it is definitely false

p>0.05: data not inconsistent with the hypothesis. Little or no evidence against the hypothesis being tested, not that it is definitely true

An experiment: flip a coin 10 times

Observed result: 7 heads, 3 tails

Question: Is the coin biased?

Heads  Probability
 0     0.001 *
 1     0.010 *
 2     0.044 *
 3     0.117 *
 4     0.205
 5     0.246
 6     0.205
 7     0.117 *
 8     0.044 *
 9     0.010 *
10     0.001 *

(* outcomes at least as extreme as the one observed)

p = 2×(0.001+0.010+0.044+0.117) = 0.344
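The probabilities for each number of heads, and the p-value for 7 heads, can be reproduced from the binomial distribution. This is a minimal sketch with illustrative function names; it treats ‘as extreme or more extreme’ as ‘at least as far from 5 heads as the observed count’, which suits the symmetric fair-coin hypothesis.

```python
from math import comb


def binomial_pmf(k, n=10, p=0.5):
    """P(exactly k heads in n flips) when each flip is a head with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_sided_p_value(observed_heads, n=10):
    """Total probability of outcomes at least as far from n/2 as the observed
    count, assuming a fair coin."""
    distance = abs(observed_heads - n / 2)
    return sum(
        binomial_pmf(k, n) for k in range(n + 1) if abs(k - n / 2) >= distance
    )


observed = 7
for k in range(11):
    # Star the outcomes that count towards the p-value.
    star = "*" if abs(k - 5) >= abs(observed - 5) else ""
    print(f"{k:2d}  {binomial_pmf(k):.3f} {star}")

print("p =", round(two_sided_p_value(observed), 3))
```

The same function applied to 10 heads gives 2/1,024, the p-value quoted earlier for the all-heads result.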

An experiment: flip a coin 10 times

Observed result: 7 heads, 3 tails

Data consistent with the coin being unbiased.
‘Weak evidence against the null hypothesis’
So: little evidence that the coin is biased
But: does not prove that the coin is unbiased

Problems

Rejecting H0 is not always much use.
p<0.05 is arbitrary; nothing special happens between p=0.049 and p=0.051
p=0.000001 and p=0.6 easy to interpret
False positive results
Statistical significance depends on sample size. Flip a coin 3 times: minimum p=0.25 (i.e. 2×1/8)
Statistically significant does not mean clinically important

Nevertheless, p values are used a lot
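The point about sample size in the list above can be made concrete. For a fair-coin hypothesis, the smallest two-sided p-value available from n flips arises when every flip lands the same way. The sketch below (function name illustrative) lists that minimum for small n.

```python
def minimum_two_sided_p(n_flips):
    """Smallest achievable two-sided p-value for a fair coin: all heads or all tails."""
    return 2 * 0.5**n_flips


for n in range(1, 11):
    p_min = minimum_two_sided_p(n)
    verdict = "can reach p<=0.05" if p_min <= 0.05 else "cannot reach p<=0.05"
    print(f"{n:2d} flips: minimum p = {p_min:.4f} ({verdict})")
```

With 3 flips the minimum is 0.25, as stated above, so no result from so small an experiment can be ‘statistically significant’ however biased the coin may be.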

A solution


Objective 3: Describe how ‘observed’ values help us towards a knowledge of the ‘true’ values by:
allowing us to test hypotheses about the true value
providing us with a range within which the underlying tendency probably lies

Any questions?
