
Observational Studies 6 (2020) 1-9 Submitted 1965; Published reprinted, 1/20

The environment and disease: association or causation?

Sir Austin Bradford Hill

Editor’s Note: Sir Austin Bradford Hill was Professor of Medical Statistics, University of London, United Kingdom. This article was originally published in the Proceedings of the Royal Society of Medicine, May 1965, 58, 295-300. This paper is reprinted with permission of the copyright holder, Sage Publications. New comments by the following researchers follow: Peter Armitage; Mike Baiocchi; Samantha Kleinberg; James O’Malley; Chris Phillips and Joel Greenhouse; Kenneth Rothman; Herb Smith; Tyler VanderWeele; Noel Weiss; and William Yeaton.

Among the objects of this newly founded Section of Occupational Medicine are: first, ‘to provide a means, not readily afforded elsewhere, whereby physicians and surgeons with a special knowledge of the relationship between sickness and injury and conditions of work may discuss their problems, not only with each other, but also with colleagues in other fields, by holding joint meetings with other Sections of the Society’; and second, ‘to make available information about the physical, chemical and psychological hazards of occupation, and in particular about those that are rare or not easily recognized’.

At this first meeting of the Section and before, with however laudable intentions, we set about instructing our colleagues in other fields, it will be proper to consider a problem fundamental to our own. How in the first place do we detect these relationships between sickness, injury and conditions of work? How do we determine what are physical, chemical and psychological hazards of occupation, and in particular those that are rare and not easily recognised?

There are, of course, instances in which we can reasonably answer these questions from the general body of medical knowledge. A particular, and perhaps extreme, physical environment cannot fail to be harmful; a particular chemical is known to be toxic to man and therefore suspect on the factory floor. Sometimes, alternatively, we may be able to consider what might a particular environment do to man, and then see whether such consequences are indeed to be found. But more often than not we have no such guidance, no such means of proceeding; more often than not we are dependent upon our observation and enumeration of defined events for which we then seek antecedents. In other words, we see that the event B is associated with the environmental feature A, that, to take a specific example, some form of respiratory illness is associated with a dust in the environment. In what circumstances can we pass from this observed association to a verdict of causation? Upon what basis should we proceed to do so?

I have no wish, nor the skill, to embark upon a philosophical discussion of the meaning of ‘causation’. The ‘cause’ of illness may be immediate and direct, it may be remote and indirect underlying the observed association. But with the aims of occupational, and almost synonymously preventive, medicine in mind, the decisive question is whether the frequency of the undesirable event B will be influenced by a change in the environmental feature A. How such a change exerts that influence may call for a great deal of research. However, before deducing ‘causation’ and taking action, we shall not invariably have to sit around awaiting the results of that research. The whole chain may have to be unravelled or a few links may suffice. It will depend upon circumstances.

©2020 Sage Publications.

Disregarding then any such problem in semantics we have this situation. Our observations reveal an association between two variables, perfectly clear-cut and beyond what we would care to attribute to the play of chance. What aspects of that association should we especially consider before deciding that the most likely interpretation of it is causation?

1. Strength: First upon my list, I would put the strength of the association. To take a very old example, by comparing the occupations of patients with scrotal cancer with the occupations of patients presenting with other diseases, Percival Pott could reach a correct conclusion because of the enormous increase of scrotal cancer in the chimney sweeps. ‘Even as late as the second decade of the twentieth century’, writes Richard Doll, ‘the mortality of chimney sweeps from scrotal cancer was some 200 times that of workers who were not specially exposed to tar or mineral oils and in the eighteenth century the relative difference is likely to have been much greater’ (Doll, 1964).

To take a more modern and more general example upon which I have now reflected for over 15 years, prospective inquiries into smoking have shown that the death rate from cancer of the lung in cigarette smokers is nine to 10 times the rate in non-smokers and the rate in heavy cigarette smokers is 20 to 30 times as great. On the other hand, the death rate from coronary thrombosis in smokers is no more than twice, possibly less, the death rate in non-smokers. Though there is good evidence to support causation it is surely much easier in this case to think of some features of life that may go hand-in-hand with smoking – features that might conceivably be the real underlying cause or, at the least, an important contributor, whether it be lack of exercise, nature of diet or other factors. But to explain the pronounced excess in cancer of the lung in any other environmental terms requires some feature of life so intimately linked with cigarette smoking and with the amount of smoking that such a feature should be easily detectable. If we cannot detect it or reasonably infer a specific one, then in such circumstances, I think we are reasonably entitled to reject the vague contention of the armchair critic ‘you can’t prove it, there may be such a feature’.

Certainly in this situation, I would reject the argument sometimes advanced that what matters is the absolute difference between the death rates of our various groups and not the ratio of one to other. That depends upon what we want to know. If we want to know how many extra deaths from cancer of the lung will take place through smoking (i.e. presuming causation), then obviously we must use the absolute differences between the death rates – 0.07 per 1000 per year in non-smoking doctors, 0.57 in those smoking 1–14 cigarettes daily, 1.39 for 15–24 cigarettes daily and 2.27 for 25 or more daily. But it does not follow here, or in more specifically occupational problems, that this best measure of the effect upon mortality is also the best measure in relation to aetiology. In this respect, the ratios of 8, 20 and 32 to 1 are far more informative. It does not, of course, follow that the differences revealed by ratios are of any practical importance. Maybe they are, maybe they are not; but that is another point altogether.

We may recall John Snow’s classic analysis of the opening weeks of the cholera epidemic of 1854 (Snow, 1855). The death rate that he recorded in the customers supplied with the grossly polluted water of the Southwark and Vauxhall Company was in truth quite low – 71 deaths in each 10,000 houses. What stands out vividly is the fact that the small rate is 14 times the figure of five deaths per 10,000 houses supplied with the sewage-free water of the rival Lambeth Company.

In thus putting emphasis upon the strength of an association, we must, nevertheless, look at the obverse of the coin. We must not be too ready to dismiss a cause-and-effect hypothesis merely on the grounds that the observed association appears to be slight. There are many occasions in medicine when this is in truth so. Relatively few persons harbouring the meningococcus fall sick of meningococcal meningitis. Relatively few persons occupationally exposed to rat’s urine contract Weil’s disease.
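The contrast between absolute differences and ratios drawn in this section can be checked directly from the figures quoted above. The following is only an illustrative sketch in Python, added editorially; the function and variable names are assumptions and not part of the lecture, and the only data used are the death rates and house counts given in the text.

```python
# Lung cancer death rates per 1000 per year among doctors, as quoted in the text.
rates = {
    "non-smokers": 0.07,
    "1-14 cigarettes daily": 0.57,
    "15-24 cigarettes daily": 1.39,
    "25 or more daily": 2.27,
}


def compare_to_baseline(rates, baseline_key):
    """Return {group: (absolute_difference, ratio)} relative to the baseline group."""
    base = rates[baseline_key]
    return {
        group: (round(rate - base, 2), round(rate / base, 1))
        for group, rate in rates.items()
        if group != baseline_key
    }


doctor_comparison = compare_to_baseline(rates, "non-smokers")

# Snow's figures: deaths per 10,000 houses for the two water companies.
snow_ratio = round(71 / 5, 1)

for group, (diff, ratio) in doctor_comparison.items():
    print(f"{group}: excess {diff} per 1000 per year, ratio {ratio} to 1")
print(f"Southwark and Vauxhall vs Lambeth: ratio {snow_ratio} to 1")
```

The ratios come out at roughly 8, 20 and 32 to 1, and about 14 to 1 for the cholera example, in line with the rounded figures in the text, while the absolute excesses (0.5, 1.32 and 2.2 per 1000 per year) answer the separate question of how many extra deaths occur.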

2. Consistency: Next on my list of features to be specially considered, I would place the consistency of the observed association. Has it been repeatedly observed by different persons, in different places, circumstances and times?

This requirement may be of special importance for those rare hazards singled out in the Section’s terms of reference. With many alert minds at work in industry today many an environmental association may be thrown up. Some of them on the customary tests of statistical significance will appear to be unlikely to be due to chance. Nevertheless, whether chance is the explanation or whether a true hazard has been revealed may sometimes be answered only by a repetition of the circumstances and the observations.

Returning to my more general example, the Advisory Committee to the Surgeon-General of the United States Public Health Service found the association of smoking with cancer of the lung in 29 retrospective and seven prospective inquiries (US Department of Health, Education, and Welfare, 1964). The lesson here is that broadly the same answer has been reached in quite a wide variety of situations and techniques. In other words, we can justifiably infer that the association is not due to some constant error or fallacy that permeates every inquiry. And we have indeed to be on our guard against that.

Take, for instance, an example given by Heady (Heady, 1958). Patients admitted to hospital for operation for peptic ulcer are questioned about recent domestic anxieties or crises that may have precipitated the acute illness. As controls, patients admitted for operation for a simple hernia are similarly quizzed. But, as Heady points out, the two groups may not be in pari materia. If your wife ran off with the lodger last week, you still have to take your perforated ulcer to hospital without delay. But with a hernia you might prefer to stay at home for a while – to mourn (or celebrate) the event. No number of exact repetitions would remove or necessarily reveal that fallacy.

We have, therefore, the somewhat paradoxical position that the different results of a different inquiry certainly cannot be held to refute the original evidence; yet the same results from precisely the same form of inquiry will not invariably greatly strengthen the original evidence. I would myself put a good deal of weight upon similar results reached in quite different ways, e.g. prospectively and retrospectively.

Once again looking at the obverse of the coin, there will be occasions when repetition is absent or impossible and yet we should not hesitate to draw conclusions. The experience of the nickel refiners of South Wales is an outstanding example. I quote from the Alfred Watson Memorial Lecture that I gave in 1962 to the Institute of Actuaries:

The population at risk, workers and pensioners, numbered about one thousand. During the ten years 1929 to 1938, sixteen of them had died from cancer of the lung, eleven of them had died from cancer of the nasal sinuses. At the age specific death rates of England and Wales at that time, one might have anticipated one death from cancer of the lung (to compare with the 16), and a fraction of a death from cancer of the nose (to compare with the 11). In all other bodily sites cancer had appeared on the death certificate 11 times and one would have expected it to do so 10-11 times. There had been 67 deaths from all other causes of mortality and over the ten years’ period 72 would have been expected at the national death rates. Finally division of the population at risk in relation to their jobs showed that the excess of cancer of the lung and nose had fallen wholly upon the workers employed in the chemical processes.

More recently my colleague, Dr Richard Doll, has brought this story a stage further. In the nine years 1948 to 1956 there had been, he found, 48 deaths from cancer of the lung and 13 deaths from cancer of the nose. He assessed the numbers expected at normal rates of mortality as, respectively, 10 and 0.1. In 1923, long before any special hazard had been recognized, certain changes in the refinery took place. No case of cancer of the nose has been observed in any man who first entered the works after that year, and in these men there has been no excess of cancer of the lung. In other words, the excess in both sites is uniquely a feature in men who entered the refinery in, roughly, the first 23 years of the present century.

No causal agent of these neoplasms has been identified. Until recently no animal experimentation had given any clue or any support to this wholly statistical evidence. Yet I wonder if any of us would hesitate to accept it as proof of a grave industrial hazard? (Hill, 1962)

In relation to my present discussion, I know of no parallel investigation. We have (or certainly had) to make up our minds on a unique event; and there is no difficulty in doing so.
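The comparison of observed with expected deaths in the nickel refiners can be expressed as a simple ratio. The sketch below is an editorial illustration in Python under stated assumptions: the names are hypothetical, and only those pairs of figures that the quoted lecture states as plain numbers are used (the expected nasal cancer deaths for 1929 to 1938 are given only as "a fraction" and are therefore left out).

```python
# (observed deaths, expected deaths at national rates), as quoted in the lecture.
nickel_refiners = {
    "lung cancer, 1929-38": (16, 1),
    "all other causes, 1929-38": (67, 72),
    "lung cancer, 1948-56": (48, 10),
}


def observed_to_expected(observed, expected):
    """Ratio of observed to expected deaths; a value near 1 means no excess."""
    return observed / expected


oe_ratios = {
    label: round(observed_to_expected(obs, exp), 2)
    for label, (obs, exp) in nickel_refiners.items()
}

for label, ratio in oe_ratios.items():
    print(f"{label}: observed/expected = {ratio}")
```

A ratio of 16 for lung cancer set against about 0.93 for all other causes makes the same point as the passage: the excess is large and confined to particular sites.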

3. Specificity: One reason, needless to say, is the specificity of the association, the third characteristic which invariably we must consider. If, as here, the association is limited to specific workers and to particular sites and types of disease and there is no association between the work and other modes of dying, then clearly that is a strong argument in favour of causation.


We must not, however, overemphasise the importance of the characteristic. Even in my present example there is a cause-and-effect relationship with two different sites of cancer – the lung and the nose. Milk as a carrier of infection and, in that sense, the cause of disease can produce such a disparate galaxy as scarlet fever, diphtheria, tuberculosis, undulant fever, sore throat, dysentery and typhoid fever. Before the discovery of the underlying factor, the bacterial origin of disease, harm would have been done by pushing too firmly the need for specificity as a necessary feature before convicting the dairy.

Coming to modern times, the prospective investigations of smoking and cancer of the lung have been criticised for not showing specificity – in other words, the death rate of smokers is higher than the death rate of non-smokers from many causes of death (though in fact the results of Doll and Hill (1964) do not show that). But here surely one must return to my first characteristic, the strength of the association. If other causes of death are raised 10, 20 or even 50% in smokers whereas cancer of the lung is raised 900–1000% we have specificity – a specificity in the magnitude of the association.

We must also keep in mind that diseases may have more than one cause. It has always been possible to acquire a cancer of the scrotum without sweeping chimneys or taking to mule-spinning in Lancashire. One-to-one relationships are not frequent. Indeed, I believe that multi-causation is generally more likely than single causation though possibly if we knew all the answers we might get back to a single factor.

In short, if specificity exists we may be able to draw conclusions without hesitation; if it is not apparent, we are not thereby necessarily left sitting irresolutely on the fence.

4. Temporality: My fourth characteristic is the temporal relationship of the association – which is the cart and which the horse? This is a question which might be particularly relevant with diseases of slow development. Does a particular diet lead to disease or do the early stages of the disease lead to those peculiar dietetic habits? Does a particular occupation or occupational environment promote infection by the tubercle bacillus or are the men and women who select that kind of work more liable to contract tuberculosis whatever the environment – or, indeed, have they already contracted it? This temporal problem may not arise often but it certainly needs to be remembered, particularly with selective factors at work in industry.

5. Biological gradient: Fifth, if the association is one which can reveal a biological gradient, or dose-response curve, then we should look most carefully for such evidence. For instance, the fact that the death rate from cancer of the lung rises linearly with the number of cigarettes smoked daily adds a very great deal to the simpler evidence that cigarette smokers have a higher death rate than non-smokers. That comparison would be weakened, though not necessarily destroyed, if it depended upon, say, a much heavier death rate in light smokers and a lower rate in heavier smokers. We should then need to envisage some much more complex relationship to satisfy the cause-and-effect hypothesis. The clear dose-response curve admits of a simple explanation and obviously puts the case in a clearer light.


The same would clearly be true of an alleged dust hazard in industry. The dustier the environment the greater the incidence of disease we would expect to see. Often the difficulty is to secure some satisfactory quantitative measure of the environment which will permit us to explore this dose-response. But we should invariably seek it.
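A biological gradient of this kind can be checked mechanically once exposure groups are ordered. The sketch below is an editorial illustration in Python, not the author's method: the helper name and group labels are assumptions, and the rates are the lung cancer death rates per 1000 per year quoted earlier under Strength.

```python
# Exposure groups in increasing order of daily cigarette consumption,
# with the death rates per 1000 per year quoted earlier in the text.
dose_groups = ["none", "1-14", "15-24", "25+"]
death_rates = [0.07, 0.57, 1.39, 2.27]


def has_monotone_gradient(values):
    """True if every value is strictly greater than the one before it."""
    return all(later > earlier for earlier, later in zip(values, values[1:]))


# Increase in rate from each exposure group to the next.
steps = [
    round(later - earlier, 2)
    for earlier, later in zip(death_rates, death_rates[1:])
]

print("gradient present:", has_monotone_gradient(death_rates))
for (low, high), step in zip(zip(dose_groups, dose_groups[1:]), steps):
    print(f"{low} -> {high}: rate rises by {step} per 1000 per year")
```

The check only confirms that the rate rises at every step. Judging whether the rise is linear would need a numerical exposure for each group, which the open-ended top category does not supply.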

6. Plausibility: It will be helpful if the causation we suspect is biologically plausible. But this is a feature I am convinced we cannot demand. What is biologically plausible depends upon the biological knowledge of the day.

To quote again from my Alfred Watson Memorial Lecture (Hill, 1962), there was

...no biological knowledge to support (or to refute) Pott’s observation in the 18th century of the excess of cancer in chimney sweeps. It was lack of biological knowledge in the 19th that led a prize essayist writing on the value and the fallacy of statistics to conclude, amongst other ‘absurd’ associations, that ‘it could be no more ridiculous for the stranger who passed the night in the steerage of an emigrant ship to ascribe the typhus, which he there contracted, to the vermin with which bodies of the sick might be infected’. And coming to nearer times, in the 20th century there was no biological knowledge to support the evidence against rubella.

In short, the association we observe may be one new to science or medicine and we must not dismiss it too light-heartedly as just too odd. As Sherlock Holmes advised Dr Watson, ‘when you have eliminated the impossible, whatever remains, however improbable, must be the truth’.

7. Coherence: On the other hand the cause-and-effect interpretation of our data should not seriously conflict with the generally known facts of the natural history and biology of the disease – in the expression of the Advisory Committee to the Surgeon-General it should have coherence.

Thus in the discussion of lung cancer, the Committee finds its association with cigarette smoking coherent with the temporal rise that has taken place in the two variables over the last generation and with the sex difference in mortality – features that might well apply in an occupational problem. The known urban/rural ratio of lung cancer mortality does not detract from coherence, nor the restriction of the effect to the lung.

Personally, I regard as greatly contributing to coherence the histopathological evidence from the bronchial epithelium of smokers and the isolation from cigarette smoke of factors carcinogenic for the skin of laboratory animals. Nevertheless, while such laboratory evidence can enormously strengthen the hypothesis and, indeed, may determine the actual causative agent, the lack of such evidence cannot nullify the epidemiological observations in man. Arsenic can undoubtedly cause cancer of the skin in man but it has never been possible to demonstrate such an effect on any other animal. In a wider field, John Snow’s epidemiological observations on the conveyance of cholera by the water from the Broad Street pump would have been put almost beyond dispute if Robert Koch had been then around to isolate the vibrio from the baby’s nappies, the well itself and the gentleman in delicate health from Brighton. Yet the fact that Koch’s work was to be awaited another 30 years did not really weaken the epidemiological case though it made it more difficult to establish against the criticisms of the day – both just and unjust.

8. Experiment: Occasionally, it is possible to appeal to experimental, or semi-experimental, evidence. For example, because of an observed association some preventive action is taken. Does it in fact prevent? The dust in the workshop is reduced, lubricating oils are changed, persons stop smoking cigarettes. Is the frequency of the associated events affected? Here the strongest support for the causation hypothesis may be revealed.

9. Analogy: In some circumstances, it would be fair to judge by analogy. With the effects of thalidomide and rubella before us, we would surely be ready to accept slighter but similar evidence with another drug or another viral disease in pregnancy.

Here, then, are nine different viewpoints from all of which we should study association before we cry causation. What I do not believe – and this has been suggested – is that we can usefully lay down some hard-and-fast rules of evidence that must be obeyed before we accept cause and effect. None of my nine viewpoints can bring indisputable evidence for or against the cause-and-effect hypothesis and none can be required as a sine qua non. What they can do, with greater or less strength, is to help us to make up our minds on the fundamental question – is there any other way of explaining the set of facts before us, is there any other answer equally, or more, likely than cause and effect?

Tests of Significance

No formal tests of significance can answer those questions. Such tests can, and should, remind us of the effects that the play of chance can create, and they will instruct us in the likely magnitude of those effects. Beyond that they contribute nothing to the ‘proof’ of our hypothesis.

Nearly 40 years ago, among the studies of occupational health that I made for the Industrial Health Research Board of the Medical Research Council was one that concerned the workers in the cotton-spinning mills of Lancashire (Hill, 1930). The question that I had to answer, by the use of the National Health Insurance records of that time, was this: Do the workers in the cardroom of the spinning mill, who tend the machines that clean the raw cotton, have a sickness experience in any way different from that of other operatives in the same mills who are relatively unexposed to the dust and fibre that were features of the cardroom? The answer was an unqualified ‘Yes’. From age 30 to age 60, the cardroom workers suffered over three times as much from respiratory causes of illness whereas from non-respiratory causes their experience was not different from that of the other workers. This pronounced difference with the respiratory causes was derived not from abnormally long periods of sickness but rather from an excessive number of repeated absences from work of the cardroom workers.

All this has rightly passed into the limbo of forgotten things. What interests me today is this: My results were set out for men and women separately and for half a dozen age groups in 36 tables. So there were plenty of sums. Yet I cannot find that anywhere I thought it necessary to use a test of significance. The evidence was so clear-cut, the differences between the groups were mainly so large, the contrast between respiratory and non-respiratory causes of illness so specific, that no formal tests could really contribute anything of value to the argument. So why use them?

Would we think or act that way today? I rather doubt it. Between the two world wars there was a strong case for emphasising to the clinician and other research workers the importance of not overlooking the effects of the play of chance upon their data. Perhaps too often generalities were based upon two men and a laboratory dog while the treatment of choice was deduced from a difference between two bedfuls of patients and might easily have no true meaning. It was therefore a useful corrective for statisticians to stress, and to teach the need for, tests of significance merely to serve as guides to caution before drawing a conclusion, before inflating the particular to the general.

I wonder whether the pendulum has not swung too far – not only with the attentive pupils but even with the statisticians themselves. To decline to draw conclusions without standard errors can surely be just as silly? Fortunately, I believe we have not yet gone so far as our friends in the USA where, I am told, some editors of journals will return an article because tests of significance have not been applied. Yet there are innumerable situations in which they are totally unnecessary – because the difference is grotesquely obvious, because it is negligible, or because, whether it be formally significant or not, it is too small to be of any practical importance. What is worse the glitter of the t table diverts attention from the inadequacies of the fare. Only a tithe, and an unknown tithe, of the factory personnel volunteer for some procedure or interview, 20% of patients treated in some particular way are lost to sight, 30% of a randomly-drawn sample are never contacted. The sample may, indeed, be akin to that of the man who, according to Swift, ‘had a mind to sell his house and carried a piece of brick in his pocket, which he showed as a pattern to encourage purchasers’. The writer, the editor and the reader are unmoved. The magic formulae are there.

Of course, I exaggerate. Yet too often I suspect we waste a deal of time, we grasp the shadow and lose the substance, we weaken our capacity to interpret data and to take reasonable decisions whatever the value of P. And far too often we deduce ‘no difference’ from ‘no significant difference’. Like fire, the χ² test is an excellent servant and a bad master.

The case for action

Finally, in passing from association to causation I believe in ‘real life’, we shall have to consider what flows from that decision. On scientific grounds, we should do no such thing. The evidence is there to be judged on its merits and the judgment (in that sense) should be utterly independent of what hangs upon it – or who hangs because of it. But in another and more practical sense we may surely ask what is involved in our decision. In occupational medicine our object is usually to take action. If this be operative cause and that be deleterious effect, then we shall wish to intervene to abolish or reduce death or disease.

While that is a commendable ambition it almost inevitably leads us to introduce differential standards before we convict. Thus, on relatively slight evidence, we might decide to restrict the use of a drug for early-morning sickness in pregnant women. If we are wrong in deducing causation from association no great harm will be done. The good lady and the pharmaceutical industry will doubtless survive.


On fair evidence, we might take action on what appears to be an occupational hazard,e.g. we might change from a probably carcinogenic oil to a non-carcinogenic oil in a limitedenvironment and without too much injustice if we are wrong. But we should need verystrong evidence before we made people burn a fuel in their homes that they do not likeor stop smoking the cigarettes and eating the fats and sugar that they do like. In askingfor very strong evidence I would, however, repeat emphatically that this does not implycrossing every ‘t’, and swords with every critic, before we act.

All scientific work is incomplete – whether it be observational or experimental. All scientific work is liable to be upset or modified by advancing knowledge. That does not confer upon us a freedom to ignore the knowledge we already have, or to postpone the action that it appears to demand at a given time. Who knows, asked Robert Browning, but the world may end tonight? True, but on available evidence most of us make ready to commute on the 8.30 next day.

References

Doll, R. (1964). Cancer. In: Witts L.J. (ed.) Medical Surveys and Clinical Trials, 2nd ed. London: Oxford University Press, p. 333.

Doll, R. and Hill, A.B. (1964). Mortality in relation to smoking: ten years’ observations of British doctors. British Medical Journal, 1, 1399-1410.

Heady, J.A. (1958). False figuring: statistical method in medicine. Med World Lond, 89, 305.

Hill, A.B. (1930). Sickness Amongst Operatives in Lancashire Spinning Mills. Industrial Health Research Board Report No. 59. London: HMSO.

Hill, A.B. (1962). The statistician in medicine. (Alfred Watson Memorial Lecture). J Inst Actuar, 88, 178–191.

Snow, J. (1855). On the Mode of Communication of Cholera. 2nd edn. London: John Churchill (reprinted 1936, New York).

US Department of Health, Education, and Welfare (1964). Smoking and Health. Public Health Service Publication No. 1103. Washington.

Observational Studies 6 (2020) 10 Submitted 7/19; Published 1/20

Commentary on Lecture by Sir Austin Bradford Hill

Peter Armitage

Wallingford, Oxfordshire, U.K.

This lecture is quoted widely as the source for Bradford Hill’s principles, or criteria, for favouring causality in epidemiological studies as distinct from mere association between environmental factor and disease. The criteria were also proposed in an edition of his textbook on medical statistics published later (Hill, 1977). A careful analysis of the arguments is provided by Rothman and Greenland (1998). The lecture was given a few years after Hill’s retirement, and perhaps indicates a return to the themes that would have permeated his thoughts in the pre-war period, when he was actively involved in occupational epidemiology, before his important advocacy of randomized trials which occupied much of his time in the 1950s. The abundance in this paper of examples of contentious issues in occupational medicine suggests that these had never been far from his mind. Is it also, perhaps, an indication that he thought of himself primarily as an epidemiologist rather than a statistician?

It is interesting that statistical analysis does not occur as the central theme of any of the nine criteria, but there is an important final section on significance tests, which he regards as being irrelevant to the main issue. This may have been an offshoot of his desire to discuss the problem as one of common sense, avoiding technical detail. One wonders, though, whether significance tests can be wholly ignored. If all the other criteria are satisfied the results of a test may be sufficiently ambiguous as to affect the conclusions and perhaps lead the investigators to extend the study before publicizing the results.

Bradford Hill was a persuasive and perceptive advocate, and his writings are a joy to read. I frequently read in the press of large studies which demonstrate that people who have particular habits suffer more from some disease, without any explanation of the thorny distinction between association and causation. Let us hope that this reissue of Bradford Hill’s classic lecture will achieve some of its author’s intentions.

References

Hill, A.B. (1977). A Short Textbook of Medical Statistics. London: Hodder and Stoughton.

Rothman, K.J. and Greenland, S. (1998). Hill’s Criteria for Causality. In Encyclopedia of Biostatistics (ed. P. Armitage and T. Colton), Vol. 3. London: Wiley.

©2020 Peter Armitage.

Observational Studies 6 (2020) 11-16 Submitted 12/19; Published 1/20

Following Bradford Hill

Mike Baiocchi

Department of Epidemiology and Population Health

Stanford University

Stanford, CA 94305, USA

Abstract

In 1965, Sir Austin Bradford Hill offered his thoughts on: “What aspects of [an] association should we especially consider before deciding that the most likely interpretation of it is causation?” He proposed nine means for reasoning about the association, which he named as: strength, consistency, specificity, temporality, biological gradient, plausibility, coherence, experiment, and analogy. In this paper, we look at what motivated Bradford Hill to propose we focus on these nine features. We contrast Bradford Hill’s approach with a currently fashionable framework for reasoning about statistical associations – the Common Task Framework. We then suggest why following Bradford Hill, 50+ years on, is still extraordinarily reasonable.

Keywords: Causality, Bradford Hill Criteria, Common Task Framework

1. Reading with context

It feels odd writing about a paper that is more than 50 years old, particularly inside of a discipline that is currently undergoing extraordinary growth and innovation. But the “Bradford Hill criteria” (Bradford Hill, 1965) occupy a particularly prominent peak for those of us interested in making decisions that hinge on causal claims. To some, Bradford Hill laid out friendly signposts that suggest safer paths to achieving solid inferences about causal connections. To more, the “Bradford Hill criteria” are only stood up in order to be knocked right back down; the nine “criteria” are introduced and logical holes are punched through until students are left with the impression that hemming in causality with rules is a fool’s errand. And yet, this paper persists. In fact, I was discussing this paper the other day with a colleague who told a fascinating story about when she served as an expert witness for a defense team in some legal case or another. The defense attorneys wanted her to work through the plausibility of each of the criteria. By report, it sounded like a deeply interesting (and lucrative) exercise in careful thinking. Her story got me curious so I dug into the legal world – and lo and behold – there are citations, and guides, and warnings about deploying the Bradford Hill criteria in one’s arguments before the court, as well as detailed guides on how to counter opposing counsel’s expert witness’s use of the criteria. These ideas seem to have sprouted legs and scurried out of our exclusive domain and into others, even while still kicking up heated exchanges in our own academic literature (Phillips and Goodman (2004), Hofler (2005), Phillips and Goodman (2006), Hofler (2006) – as you might be able to guess from the alternation of authors, that’s a fun exchange). So what’s going on here?

©2020 Mike Baiocchi.

If you encountered Bradford Hill’s ideas in a setting disconnected from the original manuscript then it may help unlock a bit more of his meaning by considering his audience. Bradford Hill first gave his remarks to the newly formed Section of Occupational Medicine. In context, these ideas were offered to medical practitioners charged with making complex decisions about health but with an eye toward bottom line economics, employment dynamics, and (to some degree) consumer tastes. His audience was not the usual data-analysts we think of, concerned with uncertainty intervals and p-values. Bradford Hill tells us this directly as he sets the table: “[Suppose] our observations reveal an association between two variables, perfectly clear-cut and beyond what we would care to attribute to the play of chance. What aspects of that association should we especially consider before deciding that the most likely interpretation of it is causation?” Using modern terminology, we might say Bradford Hill is quite a bit less interested in statistical inference and more interested in study design considerations. Though even that terminology does not get quite at the nub of his line of reasoning.

To get closer to the flow of his arguments, let’s jump over all the important bits in the middle and pull from his concluding section: The Case for Action. Again, letting the man speak for himself: “Finally, in passing from association to causation I believe in ‘real life’ we shall have to consider what flows from that decision... In occupational medicine our object is usually to take action... While that is a commendable ambition it almost inevitably leads us to introduce differential standards before we convict.” What follows is a discussion of balances – how does the strength of evidence enter into the decision to forbid/compel people to take actions? The size of the effect, levels of certainty, chains of consequences that may arise from our (in)actions – these all need to be weighed out. If he stopped his reasoning at that level of thought then this manuscript likely would have resolved into some kind of call for a better decision-theoretic framework. But that’s not where Bradford Hill took the argument, and this is why this paper is still fascinating all these years later. As far as I can discern, there are two additional tensions he is tracking. The first is the tension he highlights quite a bit, which is the need for action right now, which he contrasts with the academic’s slower, more careful building of evidence toward solid, scientific conclusions. More interesting (at least to me and my contemporary eyes) is his focus on convincing people.

2. Reasoning with context

There are several ways to think about what we – those of us who are interested in making empirically rigorous, positive change in the world – are doing. Perhaps we are mathematizing the scientific method, providing crisp, quantified boundaries on what can be known and how best to empirically know it. Maybe we merely clear away the rubbish others bring into the Ivory Tower. These are recognizable roles we play in academic settings. But Bradford Hill is reminding us of a more fundamental role: we do all this to convince people. We may believe that rigorous evidence will compel, but it won’t. Look at the absurdity of climate-change denial, or the rates of anti-vax. Change does not – exclusively – arise from rigorous empirical conclusions.

One way to understand the challenge of convincing people is to unpack why we tend to formalize and create decision rules. The first reason we formalize is to make discovery more productive. When a new causal discovery emerges it often feels a bit shocking and it’s natural to marvel at its departure from what has come before. For us, data-analysts, we tend to focus on the methods by which the discovery was achieved – which can be even more novel than the underlying causal discovery. But quickly, what used to seem novel and cutting-edge in science gets pulled into the core – what used to be “art” becomes codified and reproducible. This process is wildly productive, allowing many researchers to explore new directions that previously only the cleverest could. The second reason we tend to formalize requires some insight about how humans make decisions and assign responsibility: if we can formalize these kinds of causal-discovery methods, then we shift the burden of responsibility for declaring discovery outside of the individual (e.g., located in the idiosyncrasies of both the situation under investigation and the researcher making the claim) and into the general (e.g., rules that are recognizable across settings but also rules that have buy-in from the researchers or policy makers or stakeholders in our domain who will be impacted by our empirical investigations). Formalized decision rules that standardize discovery and regulate our claims on the strength of conclusions – in a manner that is much like laws – help communities set standards and make planning and settling disagreements easier and less arbitrary. In fact, looking at several of Bradford Hill’s “criteria” with modern eyes, you can see how his insights have been formalized with statistical methodologies (to pick a few): (i) “strength of effect” looks quite a bit like the thinking behind modern sensitivity analysis addressing unobserved confounders; (ii) “consistency” looks like meta-analysis, replicability, and transportability of effect; (iii) even the initially surprising appeal to “analogy” that he suggests has found some formalization in the work on “transfer learning” in machine learning. These additions to our formalized tool set are great.

But, again, formalization is not what Bradford Hill is interested in. In fact, he takes some wonderfully cheeky shots at formalized decision making. He suggests that using statistical processes allows decision-makers to obscure and shirk their critical responsibilities. The tension that keeps Bradford Hill’s argument fresh is the one that makes many of us excited about doing our jobs: figuring out how to bring new discoveries to the larger community. When a debate is vital and complex, when the stakes really matter, then how do we reach solid conclusions that will be strong enough to win over our colleagues and those impacted by our conclusions? If you’re in the position Bradford Hill was, talking to a room full of physicians interested in Occupational Medicine, then you’ll understand the tension between the kinds of formalized rules that rigorous statistical analysis provides and the kinds of arguments used to convince and debate in the larger (less technical) community. A concrete example: given our analysis, we believe we should order a popular agricultural product removed from the market. If we decide to act on this belief then a new rule will come into existence. There will be many “losers” in this new regime, and a number of them will need to change their behaviors. How do we explain this rule? How do we get buy-in for this rule? The less familiar the logic used to create the rule is to the people on the receiving end then the less likely they will engage in the required change in behavior.

For a moment, let’s pause this unpacking of Bradford Hill’s manuscript and move forward to approximately contemporary time. One of the dominant modes of reasoning in contemporary data analysis – the Common Task Framework (CTF) – serves as an extraordinary contrast to Bradford Hill’s ideas; it’s worth exploring the CTF to better understand Bradford Hill.

3. Reasoning without context

The productivity, and explosive improvements, in statistical prediction (“machine learning”) has rightly stood out in fields touched by data science. (If we’re being honest then to some degree it has also caused some feelings of anxiety inside those of us less inclined to flashy prediction, and more enamored of the slower accumulation of information in fields interested in causal inference.) If, like me, you are less familiar with the field of prediction then you’re likely even less familiar with the epistemological engine that has powered its growth. The Common Task Framework (Liberman (2015); Donoho (2017)) provides a fast, low-barriers-to-entry way for analysts to debate which algorithm performs “best” on a given data set. The CTF is an alternative way of assessing the suitability of an algorithm; it stands in contrast to the more traditional methods like mathematical theorems or simulations from a given data generating function. Even if the CTF name is unfamiliar you’ve likely heard of this dynamic; the Netflix Prize (Bennett and Lanning, 2007) was an excellent example of this framework. The key features of the CTF are (slightly modified from Donoho (2017)):

1. A publicly available training dataset involving, for each observation, a list of (possibly many) feature measurements, and an outcome for that observation.

2. A set of enrolled competitors (analysts) whose common task is to infer a prediction rule from the training data.

3. A scoring referee, to which competitors can submit their prediction rule. The referee runs the prediction rule against a testing dataset which is sequestered behind a Chinese wall. The referee objectively and automatically reports the score (prediction accuracy) achieved by the submitted rule.

All competitors share the common task of creating prediction rules which will receive a good score; hence the phrase common task framework. The performance metric provides an ordering that gives analysts permission to claim “this algorithm provides useful insights when used on this data set” – such claims are strongest when framed relative to other algorithms. This is where the “leaderboard” style of algorithmic development came from.
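
To make the three features above concrete, here is a minimal Python sketch of the referee-and-leaderboard dynamic. It is an illustration under stated assumptions, not a description of any real competition: the synthetic data generator `make_data`, the two toy competitors, and the use of plain accuracy as the score are all hypothetical choices made for this example.

```python
import random


def make_data(n, seed):
    """Hypothetical data: one feature x in [0, 1]; outcome is 1 when a noisy x exceeds 0.5."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random()
        y = 1 if x + rng.gauss(0, 0.1) > 0.5 else 0
        rows.append((x, y))
    return rows


# Feature 1: a publicly available training set.
train = make_data(200, seed=1)

# Feature 3: a testing set that only the referee can see.
_hidden_test = make_data(200, seed=2)


def referee(rule):
    """Score a submitted prediction rule by its accuracy on the sequestered test set."""
    correct = sum(1 for x, y in _hidden_test if rule(x) == y)
    return correct / len(_hidden_test)


# Feature 2: competitors infer a prediction rule from the training data.
def always_one(x):
    """A naive competitor that ignores the feature entirely."""
    return 1


def fit_threshold(data):
    """A competitor that picks the cut-off with the best training accuracy."""
    best_t, best_acc = 0.0, -1.0
    for i in range(101):
        t = i / 100
        acc = sum(1 for x, y in data if (1 if x > t else 0) == y) / len(data)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return lambda x: 1 if x > best_t else 0


submissions = {"always_one": always_one, "threshold": fit_threshold(train)}

# The leaderboard orders the submissions by the referee's score, best first.
leaderboard = sorted(
    ((name, referee(rule)) for name, rule in submissions.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
```

Nothing in the referee's score asks why a rule predicts well; it only orders the submissions on this one data set, which is the particular sense in which the leaderboard licenses a claim.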

Obviously, predictive models are not causal models. But it is not hard to find colleagues who have become a bit too enamored of the predictive power of this or that fantastical black box – believing a bit too much in its ability to accurately describe all possible dynamics of the data. For these folks, it is a small leap of faith to using a model like this to try to answer questions about what will happen if we intervene. (Dear reader, I assure you: it pains me too.) How? Perhaps they use something like predictive margins (see Graubard and Korn, 1999) – first, setting every observational unit’s exposure level to unexposed; second, setting every observational unit’s exposure level to exposed; and then contrasting the two hypothetical groups’ outcomes. Hidden behind almost all actions taken after consulting a black box is a confidence in its ability to faithfully describe all potential configurations of the data. But where did their confidence in the model come from?
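
The predictive-margins recipe described above can be sketched in a few lines of Python. This is a toy illustration only: the counts are invented for the example, the single binary covariate `z` is an assumption, and a simple cell-mean lookup stands in for whatever black box an analyst might actually use. It is not the survey-weighted estimator of Graubard and Korn (1999).

```python
from collections import defaultdict


def expand(z, a, n, events):
    """Build n records (covariate z, exposure a, outcome y), of which `events` have y = 1."""
    return [(z, a, 1)] * events + [(z, a, 0)] * (n - events)


# Invented counts: the exposure is more common when z = 1, and z also raises risk.
data = (
    expand(0, 0, 40, 4)     # z=0, unexposed: risk 0.10
    + expand(0, 1, 10, 2)   # z=0, exposed:   risk 0.20
    + expand(1, 0, 10, 3)   # z=1, unexposed: risk 0.30
    + expand(1, 1, 40, 16)  # z=1, exposed:   risk 0.40
)


def fit_cell_means(rows):
    """Stand-in 'model': the observed outcome mean within each (z, a) cell."""
    totals = defaultdict(lambda: [0, 0])
    for z, a, y in rows:
        totals[(z, a)][0] += y
        totals[(z, a)][1] += 1
    return {cell: events / n for cell, (events, n) in totals.items()}


def predictive_margin(rows, model, exposure):
    """Set every unit's exposure to `exposure`, predict, and average the predictions."""
    return sum(model[(z, exposure)] for z, _, _ in rows) / len(rows)


def crude_risk(rows, exposure):
    """Observed outcome mean among units whose actual exposure equals `exposure`."""
    outcomes = [y for _, a, y in rows if a == exposure]
    return sum(outcomes) / len(outcomes)


model = fit_cell_means(data)
margin_unexposed = predictive_margin(data, model, 0)
margin_exposed = predictive_margin(data, model, 1)
margin_difference = margin_exposed - margin_unexposed

crude_difference = crude_risk(data, 1) - crude_risk(data, 0)
```

With these invented counts the contrast of margins is about 0.10, while the crude difference in observed risks is 0.22. Reading the 0.10 as the effect of intervening still rests entirely on the fitted model describing configurations of the data that were never observed, which is exactly the confidence being questioned here.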

The CTF allows fantastically complex, “black box” algorithms to be developed and (in a particular sense) evaluated. Without the CTF, complex algorithms – so complicated that they cannot be described mathematically – would have much weaker evidence to be trusted and thus deployed. With the CTF, we can see the performance of any algorithm vis-a-vis any other algorithm on the same data set. The CTF has allowed algorithm developers to be extremely productive, principally by letting them avoid both slow-moving math and the kind of deeper engagement, typical of traditional causal inference analysts, with the nuanced issues that gave rise to the data. Algorithms that grow up in the CTF aren’t required to be accountable to slow-moving, tradition-bound, coherence-seeking people. In fact, there’s a very explicit line of argument inside some data communities that “human experts need to be removed from the decision-making process” – rather, the machines should do the learning because they are more likely to produce optimal results. The thinking goes: Humans are slow. Humans are hard to understand. Let’s remove humans from this process.

But here’s the thing: when we stand with Bradford Hill, humans are the point. We’re trying to convince humans to change. Rules that are (in a particular sense) “optimal” are not the same as rules that are useful for effecting change. In fact, it’s easy to imagine that rules generated by “black box” algorithms (i.e., literally inexplicable) are less likely to be complied with than rules reached through consensus building and through reasoning accessible by those who are being asked to have their lives shaped by the rules. Do not mistake what I’m saying as arguing against well-reasoned, formal, quantified rules that come out of statistical analyses. Our rigorous statistical methods are the strong bones of the beast, but they don’t provide the heart, mind, and muscles that animate and make these decisions human. We are better now that we have statistical procedures that formalize many of Bradford Hill’s criteria. But these new statistical procedures do not solve the principal issue Bradford Hill was engaging: how to convince and change.

4. Following Bradford Hill

I didn’t introduce the CTF to either bury or praise it, but rather because it is a perfect example of how one might reason about data in a way that is about as far removed as possible from the way Bradford Hill advocates we reason about data. The contrast here, I hope, helps illuminate the point that Bradford Hill was making. When I read this manuscript, I see someone making tough, impactful decisions in the presence of uncertainty. He is steeped in the particulars of the situation. While formalizing Bradford Hill’s criteria is useful, and will produce better decision-making, it is also beside the point. (And, in the most extreme, can lead to a type of blindness about the role of experts, stakeholders, and consequences for our analyses.) The criteria are paths of reasoning about causality that resonate and reassure. In the kinds of questions epidemiologists, health policy researchers, economists, criminologists... engage, the ultimate audience is a community of people whom our conclusions impact. Statistical reasoning can be like mathematics at times, but in answering these kinds of questions it is better to think of statistical reasoning as a form of rigorous, quantitative argumentation – meant to guide thought and shift beliefs.

I have a friend who keeps a copy of Bradford Hill’s criteria pinned to his office wall. He uses it to remind himself of the paths he might take. I like that. If you follow Bradford Hill then I think you’ll have an easier time reaching your audience.

Acknowledgments

I would like to thank Jordan Rodu for conversations about the CTF and the cheese plate.

References

Bennett, J. and Lanning, S. (2007). The Netflix prize. Proceedings of KDD Cup and Workshop, 2(3):35.

Bradford Hill, A. (1965). The environment and disease: association or causation? Proceedings of the Royal Society of Medicine, 58(2):295–300.

Donoho, D. (2017). 50 years of data science. Journal of Computational and Graphical Statistics, 2(26):745–66.

Graubard, B. and Korn, E. (1999). Predictive margins with survey data. Biometrics, 55(2):652–9.

Hofler, M. (2005). The Bradford Hill considerations on causality: a counterfactual perspective. Emerging Themes in Epidemiology, 2(1).

Hofler, M. (2006). Getting causal considerations back on the right track. Emerging Themes in Epidemiology, 3(1).

Liberman, M. (2015). Reproducible research and the common task method. Simmons Foundation Lecture.

Phillips, C. and Goodman, K. (2004). The missed lessons of Sir Austin Bradford Hill. Epidemiologic Perspectives & Innovations, 1(1).

Phillips, C. and Goodman, K. (2006). Causal criteria and counterfactuals; nothing more (or less) than scientific common sense. Emerging Themes in Epidemiology, 3(1).

Observational Studies 6 (2020) 17-19 Submitted 9/19; Published 1/20

On the use and abuse of Hill’s viewpoints on causality

Samantha Kleinberg

Computer Science Department

Stevens Institute of Technology

Hoboken, NJ, USA 07030

Here, then, are nine different viewpoints from all of which we should study association before we cry causation. What I do not believe – and this has been suggested – is that we can usefully lay down some hard-and-fast rules of evidence that must be obeyed before we accept cause and effect. None of my nine viewpoints can bring indisputable evidence for or against the cause-and-effect hypothesis and none can be required as a sine qua non. What they can do, with greater or less strength, is to help us to make up our minds on the fundamental question – is there any other way of explaining the set of facts before us, is there any other answer equally, or more, likely than cause and effect? (emphasis added, italics original) Hill (1965)

Not since Fisher¹ suggested p < 0.05 is often convenient has such a clear statement by a statistician been so misunderstood. Hill’s sensible advice has been transformed like Samsa in Kafka’s Metamorphosis into what his article warned against: a checklist. Google Scholar returns over 100,000 articles using the phrase “Bradford Hill Criteria”; the phrase has been growing in usage in books since the 1990s (see Figure 1),² and even the Wikipedia page on the topic is titled “Bradford Hill Criteria.”³ And yet Hill wrote that there are no “hard-and-fast rules” for causality.

This is not just a marketing problem. How we talk influences how we think (Boroditsky, 2011) and the mutation of considerations into criteria is in fact part of their misuse. Hill referred to the pieces of evidence we may wish to examine as “aspects of [an] association [to] consider before deciding that the most likely interpretation of it is causation” (p. 295) and “viewpoints from [which] to study association before we cry causation” (p. 299). Considerations may influence our decisions, such as whether to believe a causal relationship exists, but they are also things we may evaluate and ignore if they’re not relevant. Criteria, in contrast, are a benchmark against which we test something. In the case of causality, criteria provide a tantalizing yet misleading shortcut: check off these boxes and you can claim causality. Yet, there is no such checklist for causality and Hill’s considerations are neither necessary nor sufficient to establish a causal relationship.⁴

1. Fisher (1925) said about a p-value threshold of 0.05 that “it is convenient to take this point as a limit in judging whether a deviation is to be considered significant or not.” The message people heard appears to be “p < 0.05 or it didn’t happen.”

2. Variations of the phrase involving viewpoints and considerations are so rare that no Ngrams are found.

3. https://en.wikipedia.org/wiki/Bradford_Hill_criteria

4. See Kleinberg (2015) and Rothman et al. (2008, p. 26) for a few examples detailing just why this is.

©2020 Samantha Kleinberg.

Figure 1: Google Ngram results for usage of “Bradford Hill Criteria” in books.

Checklists can be a powerful tool in safety-critical domains where cognitive load is high and time is short (Gawande, 2010). However, the settings where Hill’s views are most useful are not that. They are cases where experiments are difficult or impossible and we must cobble together piecemeal evidence for causal claims. These are cases where we also must assess whether a consideration is relevant to the topic. For example, Hill along with Richard Doll identified a link between smoking and lung cancer at a time when little was known about the etiology of lung cancer (Doll and Hill, 1950). It is not possible to conduct randomized experiments to test the hypothesis that smoking is responsible for cancer, but it is of great public health significance to know what causes cancer so it can be prevented. From this experience Hill distilled his views on how we can gain such causal knowledge into his famous article.

Yet rather than providing a starting point, Hill’s viewpoints have been widely and repeatedly used as a standard of evidence, the same way the majority of researchers use a p-value cutoff of 0.05. The precise danger of conventions is that one need not justify them, whereas a p-value threshold of 0.04 or 0.06 would invite significant scrutiny.⁵ However, many other factors such as effect size are important to determining whether a result is actually important or not. When Hill’s views are treated as criteria, they similarly become a causal inference fig leaf. If these reasonable but still unvalidated pieces of evidence can be provided,⁶ then congratulations, you can claim causality.

So if I’m suggesting researchers quit the causal criteria cold turkey, what will replace them? It is perfectly fine to refer to Hill’s considerations as a starting point when thinking about what evidence one might gather when evaluating an association. The part that is not fine is making the leap from these pieces of evidence to a definitive claim of causality – and both failing to consider other types of evidence and forcing these considerations to fit scenarios where they do not apply. These are two critical areas for future research to explore. First, our methods and data have evolved in the years since Hill’s article, yet the considerations remain static. It is worth exploring whether there are other evidence types that may prove useful as well as updating how current methods might support the existing considerations (e.g. how do big data and simulations fit in?). Second, while Hill focused on epidemiology, the considerations have been used more broadly and it is important to examine how the needs for and standards of evidence vary across domains. By allowing them to evolve, Hill’s considerations will hopefully meet a better end than poor Samsa.

5. Editorial guidelines for the journal Cognition explicitly state that only effects with p < 0.05 can be described as statistically significant, stating that “for better or for worse, this is the current convention,” which it seems even journals are powerless to change.

6. This is all leaving aside the question of what it means to satisfy each criterion, which surely requires more nuance than present/absent.

Acknowledgments

Thanks to Dylan Small for providing an outlet for these viewpoints.

References

Boroditsky, L. (2011). How language shapes thought. Scientific American, 304(2):62–65.

Doll, R. and Hill, A. B. (1950). Smoking and carcinoma of the lung. British Medical Journal, 2(4682):739.

Fisher, R. A. (1925). Statistical Methods for Research Workers. Oliver and Boyd, Edinburgh.

Gawande, A. (2010). The Checklist Manifesto. Henry Holt and Company.

Hill, A. (1965). The environment and disease: association or causation? Proceedings of the Royal Society of Medicine, 58(2):295–300.

Kleinberg, S. (2015). Why: A Guide to Finding and Using Causes. O’Reilly Media.

Rothman, K. J., Greenland, S., Lash, T. L., et al. (2008). Modern Epidemiology, volume 3. Wolters Kluwer Health/Lippincott Williams & Wilkins, Philadelphia.

Observational Studies 6 (2020) 20-23 Submitted 10/19; Published 1/20

Plausibility and the Benefit of Theoretical Reasoning: Comment on Austin Bradford Hill (1965)

A. James O’Malley

Department of Biomedical Data Science and

The Dartmouth Institute of Health Policy and Clinical Practice

Geisel School of Medicine, Dartmouth University

Lebanon, NH 03756, USA

I greatly enjoyed reading the paper “The Environment and Disease: Association or Causation?” by Professor Austin Bradford Hill (ABH). The paper continues to be held in high regard and is massively cited (9,359 citations as of October 20, 2019). The paper is centered around nine aspects of statistical association that ABH suggests be considered before deciding if causation should be claimed. In some fields, these points have erroneously been viewed and taught as causal criteria (Phillips and Goodman 2004). Given that “Association or Causation” discernment is receiving increasing attention in the statistical literature, and that we’re in the midst of the emergence of data science and ever-growing volumes of observational data, reviewing ABH (1965) and assessing whether there are insights that can inform contemporary statistical analysis is timely and valuable.

ABH (1965) has been the subject of much review and discussion over many years (e.g., Phillips and Goodman 2004; Thygesen, Andersen and Andersen 2005; Fedak et al. 2015). Often reviews have systematically considered each of the nine points and offered discussion and critique about each. Because the appropriateness or not of these points has already received an extensive amount of attention, I have focused my comment on a specific topic that, while inherent to ABH (1965), has not been directly addressed. I focus on the role of theoretical models – often used in economics and sociology to generate causal stories and allied hypotheses prior to analyzing data or even designing a study – in relation to the task of distinguishing cause from association. Is the ability to construct and exploit theoretical models in order to devise ways of testing their legitimacy an under-appreciated skillset that ought to be part of statistical (and data science) practice and education? In the following, I first provide some general comments and then give an illustrative example.

The point in ABH (1965) that is most directly relevant to my comment is Plausibility, one of the least considered of the nine points. ABH states “It will be helpful if the causation we suspect is biologically plausible.” I agree with this statement and think that it can be extended to “It will be helpful if the causation we suspect is plausible in the scientific or other setting being analyzed.” In other words, does the causal story make sense given current knowledge? In order to answer this question, it is imperative that one understand the underlying science or subject area knowledge. If the causal story is plausible then the skill of being able to think theoretically or conceptually about it may allow tests of hypotheses and ways of identifying parameters in the theoretical model to be constructed that data can subsequently adjudicate.

©2020 A. James O’Malley.

Since 1965, the presence and importance of statistics in science and medicine have grown. However, in many settings, roles of statisticians have been restricted to data analysis as opposed to the art of linking the broader scientific perspective underlying research or other investigation to data. To prepare students for such data-focused careers, the budding statistician is typically supplied with practice problems to solve. This tailor-made approach bypasses a lot of the artfulness of being able to reason theoretically in the application-side of a problem to identify the key aspects of it that can be tested by data. Thus, even if their empirical work is performed with much creativity and skill, in terms of overall impact the role of a traditionally-trained statistician may not be as great as it could be.

The statistician is often a “Devil’s advocate”. While being a Devil’s advocate is certainly needed, if the ultimate goal is to get the best answer possible using current resources, the skill of being able to discern what should be tested in the first place is vital. I think of the construction of a theoretical model based on current knowledge as constructive causal thinking. This process can reveal which data will be most useful to acquire in order to strengthen a claim of causality and which identification strategies can be used. I advocate for the involvement of statisticians in the role of constructive causal thinking as well as that of Devil’s advocate.

In methodological statistics it is common to assume certain complications don’t exist in order to focus on the conditions for which a particular result applies. Yet, from a purist standpoint, observational data analysis is inherently doomed. By its nature, there is always an assumption which, if violated, leads to bias, as there is no way of ensuring that all assumptions, particularly those involving unobserved variables, hold. Therefore, it seems important to pair methodological work on a ready-made or stylized problem with the ability to identify the key problem to work on in the first place. This encompasses being able to justify why a question is important and why assumptions are plausible, recognizing and constructing sanity checks to inform the validity of a model, and so on. Working through these steps may help enable the methodological problem that has the biggest impact to be realized.

Technical approximation is another critical but often untaught skill. It is a means of assessing what is a reasonable “ball-park” answer to a numerical problem. The concept can be generalized to the ability to construct a reasonable ball-park theoretical model that can in turn be used to assess whether the stated question of interest is the right one and whether a proposed analysis answers the question of interest. In practice, the best research questions and empirical strategies can be ingenious for their ease of understanding. Exceedingly complicated solutions are seldom universally popular. Explanations involving elegant sanity checks based on a plausible theoretical model can be far more convincing than the output of uninformed but complex empirical analyses.

To conclude the first part of my comment, many social science fields (e.g., economics, sociology) place substantial value on theoretical models that develop hypotheses and allied identification strategies. I believe that the skill of being able to theoretically construct relationships between variables and testable constructs is underutilized in statistics. A general example of this skill is seen in the identification of an instrumental variable (IV); because IVs cannot be fully tested by the data, their construction is a skill that relies on broad and artful thinking.

To illustrate theoretical (non-empirical) reasoning and its benefits, consider the tie-directionality identification strategy developed in Christakis and Fowler (2007) to account for unmeasured common causes in the estimation of peer-effects (the effect of one individual on another). This example features the points in ABH (1965) concerning Plausibility, Analogy and Dose-response.

Christakis and Fowler (2007) hypothesized that a health behavior or characteristic (e.g., obesity or body-mass-index) of an individual is affected by the same trait in their peers. One form of bias that may afflict the estimator of a peer-effect is from unmeasured common causes – uncontrolled factors that impart similar effects on a group of individuals. Unmeasured common causes may give the appearance that the change in the outcome of one individual tracks that of other individuals, even if there is no actual peer effect.
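
The bias described in this paragraph can be illustrated with a small simulation. This is a hypothetical sketch and not part of Christakis and Fowler (2007): the function names, the Gaussian shock and noise model, and all parameter values are assumptions chosen only for illustration. Each dyad shares an unmeasured shock, neither member influences the other, and yet their outcome changes are correlated.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_common_cause(n_dyads=5000, shock_sd=1.0, noise_sd=1.0, seed=0):
    """Correlation of outcome changes within dyads when there is NO peer effect.

    Assumed (illustrative) model: each member's change equals a shock shared
    by the dyad plus independent individual noise.
    """
    rng = random.Random(seed)
    focal_changes, other_changes = [], []
    for _ in range(n_dyads):
        shock = rng.gauss(0.0, shock_sd)  # unmeasured common cause
        focal_changes.append(shock + rng.gauss(0.0, noise_sd))
        other_changes.append(shock + rng.gauss(0.0, noise_sd))
    return pearson(focal_changes, other_changes)
```

Under this assumed model the population correlation is shock variance divided by total variance, which is 0.5 for the default settings, so a naive analysis would see outcomes that appear to track each other even though the peer effect is zero by construction. Setting `shock_sd=0.0` removes the common cause and the correlation falls to about zero.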

If there exist dyads for which there should be no inter-individual influence, then such dyads can be used as a control group whose estimated peer effect captures any unmeasured common causes (Christakis and Fowler 2007). Under the homogeneity assumption that all types of dyads are equally affected by exposure to an unmeasured common-cause, differencing a peer-effect against that for the control dyad group is able to recover the true peer effect. This is an example of argument by Analogy. Christakis and Fowler (2007) also theorized that if peer effects exist they should increase with the strength of relationship between individuals. This led them to theorize that in the case of mutable relationships such as friendships, the peer association will be greatest in dyads for which the friendship is mutual, second-greatest along network ties when friendship is only from the focal actor to the other actor, third-greatest along network ties when friendship is only from the other actor to the focal actor, and least in dyads with no relationship (null dyads). (The “focal actor” is the individual providing the dependent variable.) A requirement that peer-effect estimates follow this theoretically-implied ordering of effect-sizes is a more rigorous test of the existence of peer-effects than a single comparison of the estimated peer-effects for non-null and null dyads as the former encapsulates all hypotheses simultaneously.
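
The two ideas in this paragraph, differencing against the control dyads and checking the theoretically implied ordering, can be sketched in a few lines. The numbers below are made-up placeholders, not estimates from Christakis and Fowler (2007), and the names `DYAD_ORDER`, `difference_against_control` and `ordering_holds` are hypothetical. The sketch also ignores sampling uncertainty, which any real analysis would have to account for.

```python
# Dyad types from strongest to weakest hypothesized tie (labels are assumptions).
DYAD_ORDER = ["mutual", "focal_names_other", "other_names_focal", "null"]


def difference_against_control(estimates, control="null"):
    """Subtract the control-dyad estimate from every other estimate.

    Under the homogeneity assumption, the control estimate reflects only the
    shared bias from unmeasured common causes, so each difference is the
    bias-corrected peer effect for that dyad type.
    """
    bias = estimates[control]
    return {k: v - bias for k, v in estimates.items() if k != control}


def ordering_holds(estimates, order=DYAD_ORDER):
    """True if the estimates are non-increasing along the hypothesized order."""
    values = [estimates[k] for k in order]
    return all(a >= b for a, b in zip(values, values[1:]))


# Illustrative placeholder estimates (NOT real data).
example = {
    "mutual": 0.9,
    "focal_names_other": 0.5,
    "other_names_focal": 0.2,
    "null": 0.1,
}
corrected = difference_against_control(example)
print(corrected, ordering_holds(example))
```

Checking the whole ordering at once, rather than a single non-null versus null contrast, mirrors the argument in the text that the ordering requirement is the more demanding test.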

Although not all sources of unmeasured confounding are guarded against, which is impossible in an observational study, the theoretical derivation and implied empirical strategy for testing the peer-effects hypothesis is appealing for featuring Plausibility, Analogy and Dose-response. The underlying theoretical model makes sense, is easily understood and can be constructively debated, which is what has transpired. For example, in a subsequent paper, Shalizi and Thomas (2011) introduced a form of confounding, known as latent homophily, which is not accounted for by the Christakis and Fowler approach. They argued that individuals may form relationships because they have similar characteristics that continue to affect their outcomes post tie-formation and thus that latent homophily is indistinguishable from the true peer-effects in the Christakis and Fowler study. They embellished their theory with a compelling example of how latent homophily could occur. By articulating the theoretical model or causal mechanism clearly, both the original paper and subsequent critiques written about it have triggered valuable discussion and insights resulting in further advances in understanding. Plausibility assessment and theoretical reasoning are important skills!

Another important point implied by ABH (1965) is that one should not hold back from publishing results in order to wait for every contingency to be accounted for. It is better to advance knowledge with results once available as long as reports and publications are accompanied by clear descriptions of the theoretical model, assumptions and identification strategy. The final section of ABH (1965), “The Case for Action,” argues that evidence is there to be judged on its merits and that the judgment should be independent of what hangs on it. This is consistent with the notion that while a great idea isn’t foolproof, it might still be worthwhile to pursue. This progressive approach allied with full acknowledgement and explanation of assumptions and an adept sensitivity analysis that evaluates the impact of plausible violations of the assumptions (another invocation of the wisdom of ABH (1965)) supports fully-informed and timely decision making.

While the technical correctness of the points made in ABH (1965) has been extensively debated, I believe that ABH (1965) remains a valuable paper to read and that statisticians can benefit from doing so, including by embracing the importance of plausibility and related points. Social scientists have practiced the art of developing and applying theoretical models for many years. I conjecture that being able to evaluate theoretical models for their plausibility and apply them to construct identification strategies will help to increase the prominence of statisticians in science, medicine and policy-formation (and other areas of society).

References

Christakis, N.A., Fowler, J.H. (2007). The spread of obesity in a large social network over 32 years. New England Journal of Medicine, 357:370–379.

Fedak, K.M., Bernal, A., Capshaw, Z.A., Gross, S. (2015). Applying the Bradford Hill criteria in the 21st century: how data integration has changed causal inference in molecular epidemiology. Emerging Themes in Epidemiology, 12:14.

Hill, A.B. (1965). The environment and disease: association or causation? Proceedings of the Royal Society of Medicine, 58:295–300.

Phillips, C.V. and Goodman, K.J. (2004). The missed lessons of Sir Austin Bradford Hill. Epidemiological Perspectives and Innovations, 1(1):3.

Shalizi, C.R., Thomas, A.C. (2011). Homophily and contagion are generically confounded in observational social network studies. Sociological Methods and Research, 40(2):211–239.

Thygesen, L.C., Andersen, G.S., Andersen, H. (2005). A philosophical analysis of the Hill criteria. Journal of Epidemiology and Community Health, 59(6):512–516.

Observational Studies 6 (2020) 24-29 Submitted 8/19; Published 1/20

Comment on “The Environment and Disease: Association or Causation?”

Christopher J. Phillips [email protected]

Department of History

Joel Greenhouse [email protected]

Department of Statistics

Carnegie Mellon University

Pittsburgh, PA 15289

A. B. Hill’s 1965 discussion of the relationship between statistical association and causality has become so well known among epidemiologists that it is easy to treat his list of relevant “aspects” as timeless and context-less, as a set of logical postulates. In this comment, we instead want to place Hill’s article in its historical context, both that of the author, and even more so that of epidemiology’s understanding of statistical association and causation.

The most obvious, but often neglected, context is that the text was first presented as the Presidential Address to the Royal Society of Medicine’s Section of Occupational Medicine. Presidential addresses, particularly when given by senior colleagues, are opportunities for reflection beyond a typical research article. There’s nothing necessary about this reflectivity, however, and the following year’s presidential address by R.S.F. Schilling was instead an impassioned account of the dangers of trawler fishing (Schilling, 1966). This was also not the first such address Hill himself had given – his May 1954 address to the Epidemiology and Preventive Medicine section eschewed broad generalization in favor of a more traditional account of the expected versus observed cases of interwar polio in England and Wales, broken down by county and locale (Hill, 1954).

The important contextual difference in October 1964 was that the Occupational Medicine section had just been formed, and Terence Cawthorne’s opening address noted that the section was intended not just for physicians and surgeons but also scientists from a range of disciplines (Cawthorne, 1965). As subsequent speakers at the first meeting made clear, the traditional focus on industrial medicine was still present, but the renaming as “occupational medicine” was a move indicating its broader audience, from sports physicians to education workers. In this context, Hill set out to address a question that by its very nature was interdisciplinary and of great interest to this larger group: what is the relationship between an environmental agent and disease, and when is it appropriate to identify such a relationship as causal?

This was a pressing issue for the new section, Hill wrote, because the characterization of the relationship between occupational conditions and sickness is “fundamental” and yet problematic. When is a respiratory illness among workers, he asks, simply associated with dust in the environment, and when is it caused by it? As Hill well knew, the question of causation cannot be answered solely by any one field. Instead, by drawing on physiology, statistics, labor relations, public health, and pulmonology, Hill sought to portray the question as essentially interdisciplinary and therefore of great interest to the new section.

©2020 Christopher Phillips and Joel Greenhouse.

Hill himself had long been concerned with causal questions. Though he modestly doesn’t cite his own work, he had written a long report on the prevalence and origins of respiratory illnesses among Lancashire’s cotton cardroom operators in the 1920s for the Medical Research Council. The cleaning of raw cotton prior to spinning was known to cast off dust and fibers, and in 1927 the UK’s home secretary established an investigation into “whether, and if so to what extent, dust in cardrooms in the cotton industry is a cause of ill-health or disease among cardroom operatives” (Hill, 1930). It was known already that there were health effects from the cleaning of the carding machines themselves, and the owners of the mills contended that the mechanical methods installed to remove dust had remedied the problem, while operatives contended that cleaning cotton carried health risks distinct from the cleaning of the machines. Hill was tasked with finding out what kind of ill effects, if any, there were from the cleaning of cotton, whether these were distinct from other respiratory diseases known to occur at different stages of the process, and what if anything could be done about it. Hill’s choice of this same example in 1964 is a clear indication that the question of “environment and disease” was one that he had been contemplating for a long time.

Hill’s role within industrial health efforts of the 1920s and 1930s put him in contact with colleagues who themselves saw an essential role for statistics in making causal claims. Unlike then-contemporary biomedicine’s focus on bacteria and other microscopic agents of infection, industrial and environmental health were areas in which “causal factors” and “causal relationships” were known to be multifactored, complicated, and often hidden. Work in these areas using statistical rates to make claims about causality, from the relationship of housing and health to that of infant mortality, goes back at least to William Farr and Florence Nightingale. Later, Udny Yule had used statistical methods, specifically a regression equation, to try to pinpoint causes of pauperism in England at the turn of the century.¹

Hill’s work at the Medical Research Council along these lines was initially overseen by Major Greenwood, an influential statistician whose own training, combining statistics (he studied under Karl Pearson) with physiology and preventive medicine, exemplified the growing importance of data for measuring associations between health and environmental agents (Higgs, 2000). Indeed, there’s a good argument that Hill inherited the mantle of statistics in medicine from this earlier generation. Their approaches – emphasizing careful study design and data collection techniques, relying on relatively conservative uses of statistics, avoiding formal inference tests and elaborate models – would later characterize Hill’s own approach throughout his career.

With remarkable clarity in his 1964 address, Hill lays out the question of interest:

Our observations reveal an association between two variables, perfectly clear-cut and beyond what we would care to attribute to the play of chance. What aspects of that association should we especially consider before deciding that the most likely interpretation of it is causation? (p. 295)

¹ Yule (1899) was discussed thoughtfully alongside other examples in Freedman (1999).

Contrary to Rothman and Greenland’s claim (1998) that Hill’s criteria were essentially “an expansion of criteria offered previously in the landmark Surgeon General’s report on Smoking and Health,” Hill’s approach for distinguishing causal from non-causal associations was developed based on his longstanding experience in epidemiologic field studies. As an illustration we consider his influential paper with Richard Doll, “Smoking and Carcinoma of the Lung” (1950). This was an early case-control study, in 20 hospitals in the London region, of patients presenting with cancer of the lung, stomach and large bowel. The patients with carcinoma of the stomach and large bowel served as one comparison group and another comparison group was non-cancer general hospital patients, “chosen so as to be of the same sex and age as the lung-carcinoma patients.” In the Discussion section, Doll and Hill synthesized the results, in language that would both implicitly and explicitly feature the relevant aspects of association, specificity, biological gradient, consistency, coherence, and plausibility:

• “...the comparison of the smoking habits of patients in different groups...revealed no association between smoking and cancer of the other sites (mainly stomach and large bowel). The association therefore seems to be specific to carcinoma of the lung.”

• “The effect of smoking varies, as would be expected, with the amount smoked.”

• “How do these results fit in with other known facts about smoking and carcinoma of the lung? Both the consumption of tobacco and the number of deaths attributed to cancer of the lung are known to have increased, and to have increased largely, in many countries this century.”

• “As to the nature of the carcinogen we have no evidence. The only carcinogenic substance which has been found in tobacco smoke is arsenic, but the evidence that arsenic can produce carcinoma of the lung is suggestive rather than conclusive. Should arsenic prove to be the carcinogen, the possibility arises that it is not the tobacco itself which is dangerous. Insecticides containing arsenic have been used for the protection of the growing crop since the end of the last century and might conceivably be the source of the responsible factor.”

Clearly, Doll and Hill are systematically and logically assessing the body of evidence from their study and the existing epidemiologic literature to make a case for (or against) a causal association.

In the last quotation, Doll and Hill consider an alternative explanation for the observed association between tobacco smoke and carcinoma of the lung: arsenic in tobacco. Curiously, in his 1964 address Hill does not explicitly include elimination of alternative explanations as a relevant aspect. However, Hill clearly considered the elimination of alternative explanations central to the establishment of a causal association, as he indicated in his Watson Memorial Lecture (1962), itself repeatedly cited in the 1964 address. In the section entitled “The Assessment of Evidence” he writes:

We are continuously brought back to the fundamental question - what alternative explanation will fit a set of observations, what other differences between our contrasted groups could equally, or better, account for the observed incidences. That is the crux of the matter - and no χ² test or other application of the Greek alphabet will answer it. It demands an experience, and acumen, in what to collect, how to seek in the data for essentials, how to interpret. (p. 189)

This focus on alternative explanations predated Hill’s work with Doll on smoking and health. Indeed, his experience with the Industrial Health Research Board had already established this pattern in the 1920s. When looking at whether artificial humidification produced diseases among workers in the cotton industry’s sheds, for example, Hill showed how simply by collecting the right data one could carefully rule out possible alternative explanations for the sickness of workers (Hill, 1927). He’d include towns with only one kind of shed, to ensure sicker workers were not selecting the humid sheds; he’d also include towns with both humid and dry sheds to ensure there was not a bias at that level of selection; he’d track the sickness of every weaver, even those who left before the end of the study, to ensure that the departure of the sickest workers was not itself an explanation of the data; he consulted over 20,000 records to be sure that any measured differences were unlikely to be explained by chance. Ruling out alternative explanations didn’t involve formal tests for Hill so much as it did careful data collection and logical thinking.

Counterfactual history is a dangerous game to play but it is likely uncontroversial to say that even if Hill hadn’t published his list of criteria, some set of criteria for moving from statistical association to causation would have become standard by the 1980s. We might instead be talking about the U.S. Surgeon General’s “Criteria of the Epidemiologic Method” since their publication in 1964’s Smoking and Health featured a similar emphasis on consistency, strength, specificity, temporal relationship, and coherence. Throughout the 1950s, a number of eminent epidemiologists and statisticians had tried to specify how exactly statistical methods might be useful for making causal claims. In 1959, Jerome Cornfield and colleagues published a long review article systematically evaluating the question of causation in smoking and health. The conclusion – that smoking plays a “causal role” in the acquisition of lung cancer – was based on the existing data’s strength, specificity, temporal relationship, consistency, coherence, and a systematic attempt to eliminate alternative explanations (Greenhouse 2009). These were, of course, exactly the criteria later used in the Surgeon General’s report. Two years earlier, Abraham M. Lilienfeld’s “Epidemiological Methods and Inferences in Studies of Noninfectious Diseases” (1957) also used smoking as the model for thinking about how to elucidate possible etiological factors, settling on a pragmatic approach to analyzing statistical associations which emphasized, e.g., that when the exposure to a factor is diminished, so should the incidence. And two years before that, E. Cuyler Hammond’s chapter on “Cause and Effect” in Ernest Wynder’s The Biologic Effects of Tobacco made it plain that “The eventual aim of epidemiologic research is to discover means by which conditions may be altered in such a way as to lower disease incidence and mortality rates or at least to limit their rise. Thus, it becomes a search for causative factors.” Moreover, for Hammond, causative factors were readily ascertained statistically because they typically acted “quantitively,” namely they “increas[ed] the probability that a specific event will occur” (Hammond, 1955: 173-174). Though with different emphases and subtleties, there was a consistent approach from many epidemiologists and biostatisticians to think systematically and rigorously about how to make causal claims in the years after World War II. Hill’s lecture, while more expansive in its list of criteria than anything that came before, did little to change the overall trajectory of how practitioners were using statistical measures of association.

As a coda, it is worthwhile to remember that Hill’s lecture wasn’t hailed as a masterpiece, or even as particularly important when it was first published. Though now widely cited, it was hardly cited at all through the 1970s (only about twice per year on average in the fifteen years after publication). In fact, Mervyn Susser’s 1973 textbook Causal Thinking in the Health Sciences has its “criteria of judgment” explicitly modeled around the Surgeon General’s report. As Susser was to admit years later, he had not even read Hill’s article when preparing his textbook, surely a sign if there is any that its importance wasn’t immediately apparent (Susser, 1973, 1991). We think it plausible, in fact, that the association of causal criteria with Hill’s speech was ultimately more of an honorary gesture towards Hill’s own importance in the 1980s and 1990s. By that point, there was little doubt of his lasting role in the development of clinical trials and observational studies, in the use of statistics by governmental agencies, and, indeed, in the ways we can make rigorous causal claims.

References

Cawthorne, T. (1965). Opening Address. Proceedings of the Royal Society of Medicine, 58(5):289-94.

Cornfield, J. et al. (1959). Smoking and Lung Cancer: Recent Evidence and a Discussion of Some Questions. Journal of the National Cancer Institute, 22(1):173-203.

Doll, R. and Hill, A.B. (1950). Smoking and Carcinoma of the Lung. British Medical Journal, 2(4682):739-748.

Freedman, D. (1999). From Association to Causation: Some Remarks on the History of Statistics. Statistical Science, 14(3):243-258.

Greenhouse, J.B. (2009). Commentary: Cornfield, Epidemiology, and Causality. International Journal of Epidemiology, 38(5):1199-1201.

Hammond, E.C. (1955). Cause and Effect. The Biologic Effects of Tobacco. E.L. Wynder, ed. Little, Brown, Boston. 171-196.

Higgs, E. (2000). Medical Statistics, Patronage, and the State: The Development of the MRC Statistical Unit, 1911-1948. Medical History, 44:323-340.

Hill, A.B. (1927). Artificial Humidification in the Cotton Weaving Industry. Medical Research Council, Industrial Fatigue Research Board, Report No. 48. London: His Majesty’s Stationery Office.

Hill, A.B. (1930). Sickness Among Operatives in Lancashire Cotton Spinning Mills. Medical Research Council, Industrial Health Research Board, Report No. 59. London: His Majesty’s Stationery Office.

Hill, A.B. (1954). Poliomyelitis in England and Wales Between the Wars. Proceedings of the Royal Society of Medicine, 47(9):795-805.

Hill, A.B. (1962). Alfred Watson Memorial Lecture: The Statistician in Medicine. Journal of the Institute of Actuaries, 88(2):178-191.

Lilienfeld, A.M. (1957). Epidemiological Methods and Inferences in Studies of Noninfectious Diseases. Public Health Reports, 72(1):51-60.

Rothman, K.J. and Greenland, S. (1998). Causation and Causal Inference. In Modern Epidemiology, 2nd edition. Lippincott Williams & Wilkins, Philadelphia.

Schilling, R.S. (1966). Trawler Fishing: An Extreme Occupation. Proceedings of the Royal Society of Medicine, 59(5):405-410.

Susser, M. (1973). Causal Thinking in the Health Sciences. Oxford University Press, New York.

Susser, M. (1991). What is a Cause and How Do We Know One? A Grammar for Pragmatic Epidemiology. American Journal of Epidemiology, 133(7):635-648.

Yule, G.U. (1899). An Investigation into the Causes of Changes in Pauperism in England, Chiefly During the Last Two Intercensal Decades. Journal of the Royal Statistical Society, 62(2):249-295.

Observational Studies 6 (2020) 30-32 Submitted 7/19; Published 1/20

The Wrong Message from the Wrong Talk

Kenneth J. Rothman

Boston University and

Research Triangle Institute

3040 East Cornwallis Road

Research Triangle Park, NC USA 27709

Hill’s 1965 address (Hill, 1965) on causal inference to the newly formed section on Occupational Medicine of the Royal Society is considered hallowed scripture, not merely revered by researchers, but even used as a template for adjudication of causal questions by courts of law. It certainly was a provocative and thoughtful after-dinner talk, but it does not merit the status of scripture, especially not for the reasons it is so revered. The message for which it is remembered was not the most important advice in the talk, and the talk itself was less important than another lecture, now largely forgotten, that Hill gave some years earlier.

The 1965 talk, which has been cited over 9000 times, gained prominence because it seems to offer a checklist of criteria for causal inference. Scientists, like anyone else, are inclined to follow a recipe if one exists. Apparently, few are bothered by how well the recipe works, or even whether it works. Consequently, numerous papers have attempted to infer causation through the application of the “Hill criteria,” on topics ranging from the zika virus and microcephaly (Awadh et al., 2017) to neuropsychiatry (van Reekum et al., 2001) to climate change (Science in Society Archive). In one recent review of dietary sodium and cardiovascular risk, the authors divided the paper into sections that corresponded to each of Hill’s nine criteria (Cogswell et al., 2016). In other papers, the authors employed scoring systems to quantify how well the criteria are met and from that derived an overall number which was intended to be a measure of causality (Mente et al., 2009). In one such approach, a discriminant analysis was used to generate weights for each of the nine criteria, which were used to derive an overall score that was interpreted as a probability indicating whether the association is causal (Swaen and Amelsvoort, 2009).

Philosophers of science consider these to be futile efforts, because they agree that there cannot be a logical basis for a checklist approach to scientific inference. It comes as no surprise, then, that critics have steadily questioned the legitimacy of inferential checklists, as well as the origin of Hill’s list, and the applicability of specific points that Hill included in the list (Weed, 1988; Haack, 2014; Ward, 2009; Rothman and Greenland, 2005; Morabia, 1991; Philips and Goodman, 2004; Blackburn and Labarthe, 2012; Thygesen et al., 2005; Weiss, 2002). But we can hardly criticize Hill for originating the checklist, because he asked to be counted among the critics of a checklist approach to inference. He studiously avoided the word “criteria” in his talk, and he cautioned against using his “viewpoints” as if they were criteria, claiming that none of them was a sine qua non for causal inference.

©2020 Kenneth Rothman.

In 2004, Philips and Goodman elegantly laid out the thesis that Hill’s 1965 paper is esteemed for the wrong reason (Philips and Goodman, 2004): “We will say only that Hill’s list seems to have been a useful contribution to a young science that surely needed systematic thinking, but it long since should have been relegated to part of the historical foundation, as an early rough cut. Yet it is still being recited by many as something like natural law. Appealing in our teaching and epistemology to the untested “criteria” of a great luminary from the past is reminiscent of the “scientific” methods of the Dark Ages.” In their essay, they suggested two alternative messages that they deemed to be the “missed lessons” of Hill’s talk. One of these was Hill’s warning about reliance on statistical significance testing for inference: “(W)e waste a deal of time, we grasp the shadow and lose the substance, we weaken our capacity to interpret data and to take reasonable decisions whatever the value of P. And far too often we deduce ‘no difference’ from ‘no significant difference’” (Hill, 1965). Hill continued, “I wonder whether the pendulum has not swung too far - not only with the attentive pupils but even with the statisticians themselves.” This issue, which has been mostly submerged for decades, has finally surfaced to spawn serious debate and reflection (Wasserstein and Lazar, 2016; Amrhein et al., 2019). Reliance on significance testing for inference has led to countless misinterpretations of data, and has exacerbated the problem of replication of results. If that message had been the legacy of Hill’s talk instead of the checklist, untold errors might have been avoided.

A dozen years before the 1965 lecture for the Royal Society, Hill gave the Cutter Lecture at Harvard. Some background is relevant: in 1951, Prof. Hugh Sinclair of Oxford University was the Cutter Lecturer (Sinclair, 1951). Sinclair’s theme was to exalt the experimental method, while attacking nonexperimental science, at least regarding nutritional research: “The use of the experimental method has brilliant discoveries to its credit, whereas the method of observation has achieved little...The observer must await the occurrence of the natural succession of events he wishes to study, and he is very apt to be misled by the fallacy of post hoc ergo propter hoc or by the existence of a correlation without causality” (Sinclair, 1951). When Hill delivered his Cutter Lecture two years later, he pointedly titled it “Observation and Experiment” (Hill, 1953). He made the case for nonexperimental epidemiology, starting with the example of Snow’s work on cholera, moving on to studies of rubella, and then to his own work on smoking and lung cancer. He made it clear that he was not criticizing experimentation; indeed he preferred when possible to get experimental evidence, a point reiterated in his 1965 Royal Society talk. But he emphasized that “...my preference in preventive medicine for the experimental approach...does not lead me to repudiate or even, I hope, to underrate the claims of accurate and designed observations” (Hill, 1953).

Hill’s lecture was clearly a response to Sinclair’s attack on nonexperimental science. He repeated Sinclair’s phrases: “The observer may well have to be more patient than the experimenter – awaiting the occurrence of the natural succession of events he desires to study; he may well have to be more imaginative – sensing the correlations that lie below the surface of his observations; and he may well have to be more logical and less dogmatic – avoiding as the evil eye the fallacy of post hoc ergo propter hoc, the mistaking of correlation for causation” (Hill, 1953).

Hill’s inspirational rebuttal to Sinclair, along with his work on smoking and lung cancer, helped to fortify the philosophic foundation for nonexperimental epidemiologic research. He asked whether we can draw inferences at all without experimentation, and he answered affirmatively. In comparison, the issue that became the focus of his 1965 talk was misconstrued and far less compelling.

References

Amrhein V, Greenland S, McShane B (2019). Scientists rise up against statistical significance. Nature, 567: 305-307. doi: 10.1038/d41586-019-00857-9.

Awadh A, Chughtai AA, Dyda A, Sheikh M, Heslop DJ, MacIntyre CR. (2017). Does zika virus cause microcephaly – applying the Bradford Hill viewpoints. PLOS Currents Outbreaks, Feb 22.

Blackburn H, Labarthe D (2012). Stories from the evolution of guidelines for causal inference in epidemiologic associations: 1953–1965. Am J Epidemiol, 176: 1071–1077.

Cogswell ME, Mugavero K, Bowman BA, Frieden TR (2016). Dietary sodium and cardiovascular disease risk – measurement matters. N Engl J Med, 375: 580-586.

Haack S (2014). Evidence matters; science, proof, and truth in the law. Cambridge University Press, pp 258-263.

Hill, AB (1953). Observation and experiment. N Engl J Med, 248: 995-1001.

Hill, AB (1965). The environment and disease: association or causation? Proc R Soc Med., 58:295–300.

Mente A, de Koning L, Shannon HS, Anand SS. (2009). A systematic review of the evidence supporting a causal link between dietary factors and coronary heart disease. Arch Intern Med., 169(7):659–669.

Morabia A (1991). On the origin of Hill’s criteria. Epidemiology, 2: 367-369.

Philips CV, Goodman KJ (2004). The missed lessons of Sir Austin Bradford Hill. Epidemiologic Perspectives & Innovations, 1:3. doi:10.1186/1742-5573-1-3

Rothman KJ and Greenland S (2005). Causation and causal inference in epidemiology. Am J Public Health, 95: S144–S150. doi:10.2105/AJPH.2004.059204

Science in Society Archive: http://www.i-sis.org.uk/TheBradfordHillCriteria.php

Sinclair HM (1951). Nutritional surveys of population groups. N Engl J Med, 145, 39-47.

Swaen G, Amelsvoort L (2009). A weight of evidence approach to causal inference. J Clin Epidemiol, 62: 270-277.

Thygesen LC, Andersen GS, Andersen H. (2005). A philosophical analysis of the Hill criteria. J Epidemiol Community Health, 59: 512–516. doi: 10.1136/jech.2004.027524

van Reekum R, Streiner DL, Conn DK (2001). Applying Bradford Hill’s criteria for causation to neuropsychiatry: challenges and opportunities. J Neuropsychiatry Clin Neurosci., 13:318-325.

Ward AC (2009). The role of causal criteria in causal inferences: Bradford Hill’s “aspects of association.” Epidemiologic Perspectives & Innovations, 6:2. doi:10.1186/1742-5573-6-2

Wasserstein RL, Lazar NA (2016). The ASA’s Statement on p-Values: Context, Process, and Purpose. The American Statistician, 70: 129-133, doi:10.1080/00031305.2016.1154108

Weed DL (1988). Causal criteria and Popperian refutation. In Causal Inference, Rothman KJ, ed., Epidemiology Resources, Inc.

Weiss NS. (2002). Can the “specificity” of an association be rehabilitated as a basis for supporting a causal hypothesis? Epidemiology, 13: 6–8.

Observational Studies 6 (2020) 33-46 Submitted 11/19; Published 1/20

Causation in Action: Some Remarks Attendant to Re-reading Hill (1965)

Herbert L. Smith

Department of Sociology and Population Studies Center

University of Pennsylvania

Philadelphia, PA, USA 19104

What a lovely paper! I use a number of the topics and points raised by Hill (1965) to reconsider aspects of the notion of causation in research in the social sciences and especially in sociology, the field with which I am most familiar. One could do more. The paper’s intellectual concision and economy of expression are laudable. Each time I read it, I generate a new set of marginal comments.

I first read the paper many years ago. But why? The topical ambit is restricted: association or causation in epidemiology – occupational medicine in particular. It is very English in that it features chimney sweeps, although there is nothing Mary Poppins-y about them: These poor men were dying from scrotal cancer at a rate that was extraordinarily high relative to such deaths among other workers. This implicated as causes of their cancers the tars and oils characteristic of their trade (Hill 1965, p. 295).

I am a sociologist, not an epidemiologist; have not studied scrotal cancer or any of the other diseases and physical conditions discussed by Hill (1965); and until late in life had never been to England. I would only have known of the paper via Holland (1986, pp. 956-957), where it figures among the canonical disciplinary treatments of causation related to Rubin’s (1974) model for causal inference. At that time I did not pick up much from it, because I was reading it with a whiggish cast of mind, as if it were evident that the diffuse treatments of causation in the past were noble-but-incomplete efforts en route to the precision of the present. Now I wonder, especially where the social sciences are concerned.

The mental discipline imposed by the potential outcomes framework (Rubin 2005) is very powerful. When I was first exposed to it (Holland 1986; Rosenbaum 1984), it was as though the scales fell from my eyes. I used this framework to first think (Smith 1990), then re-think (Smith 2013) all manner of studies in sociology, demography, criminology, and social epidemiology. Developments in causal thinking in the social sciences have been tremendous (e.g., Morgan 2013). But as I read Hill (1965) in retrospect, I think I see some threads of my own re-thinking of the situation, which is an admix of professional, scientific, and intellectual critique.

In brief, and without nuance: We have harnessed ourselves to a “game” in which the objective is to make a world of interconnected, purposive actors bound in historical time and changing social structures look something like a randomized experiment. This feeds into a reductionist, individualist view of social science – and of the world we live in. Causes become embodied in the subjects on whom we make measurements and do causal calculations. Researchers claim priority for the importance of causal analysis because of its importance for action (in polite terms, “policy”), even if the action and active agents are at great remove from the assignment mechanisms (random or as-if-random) that constitute manipulation (action) in the experimental model. We find ourselves at the wrong level of analysis, justifying our claims about how the world works on the basis of a precision that is specious. It’s not that we are stupid. It’s the sociology of situation. The generalized scientific development of causal analysis for observational studies (because that’s what most empirical social science is) feeds into a status hierarchy not just of ideas, but of individuals within the profession.

©2020 Herbert Smith.

One or more versions of many of these points are elaborated in a more genteel and considered manner in Smith (2013). I stab further at a few here, drawing on topics suggested by Hill (1965), although I do not mean to implicate him anachronistically, which would be particularly unfair to someone who valued coherence (p. 298).

Those Lurking Confounds

Hill’s (1965) first criterion with respect to causation is strength of an association. In addition to giving some trenchant examples of just how much damage certain environmental conditions induce in the humans who are exposed to them (pp. 295-296), he makes the important statistical point that for a strong observed association to be explained by some concomitant or antecedent factor, that factor must be very strongly associated in turn with the variable perhaps being mistaken for a cause. In particular:

...to explain the pronounced excess in cancer of the lung in any other environmental terms requires some feature of life so intimately linked with cigarette smoking and with the amount of smoking that such a feature should be easily detectable. (p. 296)

In sociology, it was once common practice to teach this kind of thinking with reference to cross-classified survey data (Rosenberg 1968). A large zero-order association was one with a large percentage difference, i.e., the percentage with some outcome characteristic under a potential treatment condition less the percentage with that same outcome characteristic conditional on an alternative (control). Before one spent a lot of time re-tabulating the data conditional on one (or two) control variables, it behooved one to create zero-order tables checking the association between a control variable and the original independent (or treatment) variable, and between the control variable and the dependent variable. If these zero-order percentage differences were not at least as large as the original effect (association) observed, then finding that that original association could be explained – in the sense of a substantial reduction in the association treatment/control (independent/dependent variable) conditional on re-categorization by the control variable – was just not going to happen.
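The arithmetic behind this rule of thumb can be made concrete. The sketch below is an illustration only, not a procedure taken from Rosenberg (1968) or Hill (1965). It assumes a single binary control variable Z and an outcome Y that is unrelated to the treatment X within each level of Z; the function names and the numbers in the usage note are hypothetical. Under those assumptions the X-Y percentage difference equals the product of the X-Z and Z-Y percentage differences, so neither of the latter can be smaller in magnitude than the association it is supposed to explain.

```python
def xy_difference_direct(p_y_z1, p_y_z0, p_z_x1, p_z_x0):
    """P(Y=1 | X=1) - P(Y=1 | X=0) when Y depends on X only through binary Z.

    p_y_z1, p_y_z0: outcome proportions among Z=1 and Z=0.
    p_z_x1, p_z_x0: proportions with Z=1 among X=1 and X=0.
    """
    def p_y_given_x(p_z):
        # Average the Z-specific outcome proportions over the distribution of Z.
        return p_y_z1 * p_z + p_y_z0 * (1.0 - p_z)

    return p_y_given_x(p_z_x1) - p_y_given_x(p_z_x0)


def xy_difference_product(p_y_z1, p_y_z0, p_z_x1, p_z_x0):
    """The same quantity, written as the product of two zero-order differences."""
    d_zy = p_y_z1 - p_y_z0  # zero-order difference: control variable vs. outcome
    d_xz = p_z_x1 - p_z_x0  # zero-order difference: treatment vs. control variable
    return d_xz * d_zy


def largest_explainable_difference(d_xz):
    """Largest |X-Y difference| that Z alone could produce; reached only if |d_zy| = 1."""
    return abs(d_xz)
```

With hypothetical inputs 0.6, 0.2, 0.7 and 0.3, both zero-order differences are 40 percentage points, yet the spurious X-Y difference they generate is only 16 points. A control variable whose association with the treatment is 20 points could never account for an observed 30-point difference, whatever its association with the outcome.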

This must have been the kind of knowledge people didn’t really want to have, because the rise of high-speed computing and regression models with many variables led to a situation in which the vagueness of high-dimensional space gave license to all manner of seminar-room speculation and criticism of interpretations of strong observed associations as reflecting causal processes. Hope could spring, if not eternal, then at least more provisional than would be warranted given a tighter focus on elementary facts. With controlling and partialling taking place through the calculation and inversion of ever-expanding variance-covariance matrices, it was easy to forget just how big an association a possible confounding or lurking variable needed to have to be the real explanation of an observed association. I thus sympathize with Hill’s (1965) plaint:

If we cannot detect it or reasonably infer a specific [confounding factor], then in such circumstances I think we are reasonably entitled to reject the vague contention of the armchair critic ‘you can’t prove it, there may be such a feature’. (p. 296)

Also, things improved: The statistical methods and vocabulary for addressing hidden bias in observational studies (Rosenbaum 1991) were a bracing antidote to “vague contention,” since the establishment of bounds on estimated treatment effects entails a statement of just how strong selection into a treatment – how far a departure from random assignment, how “intimately linked” the confound and treatment – would need to be to gainsay the treatment effect as estimated conditional on observables and other aspects of study design.
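One simple version of such a bound can be sketched for matched pairs analyzed with a sign test. The choice of test is an illustrative assumption, not something specified in the text. Here `gamma` stands for the assumed maximum odds ratio by which two matched units may differ in their chance of receiving treatment, and the function names are hypothetical. With `gamma` equal to 1 the calculation reduces to the ordinary one-sided sign test; larger values give the worst-case p-value under that much hidden bias.

```python
from math import comb


def binomial_upper_tail(n, k, p):
    """P(T >= k) for T ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1.0 - p) ** (n - j) for j in range(k, n + 1))


def worst_case_sign_test_p(n_pairs, n_treated_higher, gamma):
    """Largest one-sided sign-test p-value compatible with hidden bias of at most gamma.

    n_pairs counts only the pairs whose two outcomes differ.
    """
    if gamma < 1.0:
        raise ValueError("gamma must be at least 1")
    # Worst case: in every pair the treated unit scores higher with this probability.
    p_plus = gamma / (1.0 + gamma)
    return binomial_upper_tail(n_pairs, n_treated_higher, p_plus)


def gamma_needed_to_overturn(n_pairs, n_treated_higher, alpha=0.05, step=0.1, max_steps=1000):
    """Smallest gamma on a grid at which the worst-case p-value exceeds alpha.

    Returns None if no gamma on the grid overturns the finding.
    """
    for i in range(max_steps + 1):
        gamma = 1.0 + i * step
        if worst_case_sign_test_p(n_pairs, n_treated_higher, gamma) > alpha:
            return gamma
    return None
```

In a hypothetical study of ten pairs in which the treated unit always has the higher outcome, the ordinary sign test gives p = 1/1024, and the worst-case p-value passes 0.05 only once `gamma` is close to 3. The critic must then name a hidden factor that roughly triples the odds of treatment, which is a far more specific claim than "there may be such a feature."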

These positive developments have been taken up by sociologists (e.g., DiPrete and Gangl 2004), but I would still like to quibble with the current state of scientific and scholarly affairs. Here I piggyback on Hill’s (1965) perspicacious allusion to the idea that a strong confound “should be easily detectable” (p. 296). Our overriding concern with hidden bias in causal inference suggests to me (a) a level of social science so immature as to not yet have recognized the most powerful features of the explanatory environment; and/or (b) the human tendency to imagine that the forces that we cannot see are far larger than those that are in front of our eyes.

I suspect that there are non-sociologists and sociologists alike who would plump for the former characterization. I am not among them. Later I shall comment on some aspects of sociological explanation that run counter to the individual-level reductionism intrinsic to statistical and econometric causal analysis. Here I offer an example of the hoary sociological approach to causal analysis, in this case an investigation of whether a theoretically anticipated association is suppressed by observable, plausible confounds.

Davis (1982) presents findings “. . . [that] cast considerable doubt on the ‘class culture’ notion that occupational strata have vast and diffuse effects on the texture of our lives” (pp. 580-581). There is no evidence that an effect of occupation cum class culture on various dispositions is being suppressed by associations of occupation with race, age, and/or sex. Pace “‘middle-class values,’ ‘the culture of poverty,’ ‘hard hat mentality,’ ‘working class authoritarianism’” (p. 582) and so on:

...[T]he association between race and [occupational] stratum, net of [e]ducation, is not all that strong... Since test variables must have stronger associations with X and Y than the associations they explain, race is not a promising [suppressor variable]. Sex, on the other hand, does have a healthy association with [o]ccupational stratum, net of [e]ducation..., but at the other end of the line, [s]ex is not, in general, a strong correlate of...attitude... Finally, we consider [a]ge. Very young workers do have lower prestige jobs... [and] there is a decent [association] for [a]ge and [o]ccupation net of [e]ducation, and numerous studies show younger people to be more ‘liberal.’ I suspect further analyses introducing [a]ge would allow more occupational effects to peep through. Nevertheless, [the partial association of age with occupation] is not a whopper vis-a-vis [other associations]. Consequently, age would have to show extraordinary effects on the [outcome] variables for it to suppress any but small effects of occupational stratum. Thus, I doubt that [a]ge has strong enough effects to change the broad picture. (pp. 582-583)

Davis (1982) concludes in a manner that is in line with Hill’s (1965) thinking on confounding variables and causation:

Similar mental exercises with other obvious test variables did not lead to any more promising ones... In sum, my conclusion is that occupational stratum simply does not have the diffuse and strong effects on our nonvocational attitudes and opinions that Sociologists have generally assumed. (p. 583; emphasis mine)

As for our penchant to credit the power of unseen forces for the acts of men: In the Western canon, we can date the imputation of reflexive action – of reasoning, of means-ends orientations consequent to thought and to choice – to the 5th century of the Athenian era (Romilly [1984] 1994). To be sure, prior to then people acted – Homer’s heroes were nothing if not men of action – but this preceded the idea that a cause could and should attach itself to some purposive action, as in action after reflection. Before this, stuff just happens, as in the deterministic, inevitable cycles of revenge detailed by Herodotus (Romilly [1984] 1994, p. 177) or in consequence to exogenous shocks: storms, waves, and other accidental misfortunes (p. 34). But the most interesting causal attributions in the pre-psychological era were the forces that substituted for cognition in the minds of men. It was all about the gods, who hovered everywhere. They were primordial in Homer, for whom “action goes faster than reflection, brushing it away. So that, as necessary, a god can take it on himself to intervene and decide” (Romilly [1984] 1994, p. 34; my translation). In consequence

divine causality often comes to remove much of the importance of human causation. . . . And above all, even when a man seems to be making a decision alone, one never knows if there isn’t a god leading him along. Eschylus shows constantly this divine causation that doubles for, and puts the lie to, the free play of human motives. (pp. 36 and 65 [my translation])

Hill’s (1965) skepticism regarding alternative explanations for the much higher rates of lung cancer among cigarette smokers than among non-smokers was restricted to “any other environmental terms” (p. 296; emphasis mine). But what of non-environmental factors? Nobody would now claim that the gods of the ancients still exist, or that they meddle in the affairs of men. Fisher (1958, p. 163), the leading proponent for the view that the smoking-cancer relationship was spurious and not causal, instead emphasized genotype as the hidden variable (“common cause”) behind the association between smoking and lung cancer. With a solicitude that would have been touching were it not so murderous, Fisher (1958) fulminated against telling people that something they were doing could be making them deadly ill, especially if their reasons for smoking were reasonable and, in the event, beyond their control. As with the gods, back in the day.

I’m kidding. Sort of. Genes exist. We can now observe them. They are correlated with all manner of things, many in the sociological realm (Bearman 2008). Let’s set aside for the moment the fraught place of genetics in sociology and other social sciences – the difficulty in conversation between those for whom the science of the situation is so appealing as to make its desirability self-evident and those who smell yet another non-human-agency rationalization for social inequality. Are these the massive, powerful lurking variables, hitherto unobservable but now eminently so, that constitute the hidden bias in the estimation of causal effects in the social world? To date, the answer would seem to be “no,” at least if we think about causation, especially in observational studies, with reference to a “fall from grace” relative to the experimental model. By this I mean genes that are non-randomly assigning folks to one social position or another while simultaneously determining some desirable social achievement, thereby creating the illusion that the social position is in some sense causing the social achievement. When really it is just the genes, doing their thing.

Instead, “genetic expression can only reveal itself through social structural change” (Bearman 2008, p. v). This remark derives from a collection of studies that find, inter alia, that all manner of gene-behavior associations are altered if not neutralized as social environments vary (e.g. school atmospherics [Guo, Tong, and Cai 2008], family life [Martin 2008], networks of social support [Pescosolido et al. 2008], and national educational policies [Penner 2008]). The lurking variables, from a causative perspective, are less the gods and alleles – they are what they are – than all the social circumstances staring us in the face. The reference to social structure, as opposed to the socially meaningful attributes of individuals, may seem murky at this point. I address it further below. An overarching point is that our penchant for reductionism tends to steer us away from causation operating above the level of the individual (Smith 2013, pp. 65-69).

Causes with Many Effects

To help in adducing causation in observational studies, we can elaborate our theories to specify unaffected units, essentially equivalent treatments, and unaffected responses (Rosenbaum 1984, pp. 42-44). We gain confidence that something is a cause of something else if we find its effects where we anticipate them and not where we do not. A now common feature of econometrics, a placebo or falsification test (e.g., Rothstein 2010), is typically concerned with specifying unaffected responses.

Specificity was the third of Hill’s (1965) criteria for establishing causation, although he was quick to add that “[w]e must not, however, overemphasize the importance of the characteristic” (p. 297; see, also, Holland 1986, pp. 956-957). Small wonder. The relationship between cigarette smoking and lung cancer is very strong (p. 296), but “the prospective investigations of smoking and cancer of the lung have been criticized for not showing specificity – in other words the death rate of smokers is higher than the death rate of non-smokers from many causes of death...” (p. 297). In the event, smoking is just flat-out bad for human health, even if its signature effect is on lung cancer. Hill (1965, p. 296) was quick to point out that “it does not follow...that [the] best measure of the effect upon mortality is also the best measure in relation to ætiology...” The case against smoking was best made with reference to differential mortality with respect to lung cancer, although the population prevalence of cardiac disease means that the weaker causal effects of smoking on cardiac disease are nonetheless associated with more excess deaths (e.g., Fenelon and Preston 2012). As Hill (1965, p. 296) noted,

It does not, of course, follow that the differences revealed by ratios are of any practical importance. Maybe they are, maybe they are not; but that is another point altogether.
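The distinction between a ratio and its practical importance is a matter of simple arithmetic. In the sketch below every figure is hypothetical, chosen only to show the pattern, and none is drawn from Hill (1965) or from Fenelon and Preston (2012). A rare cause of death with a large rate ratio is compared with a common cause of death with a modest one; the function name is likewise an assumption made for illustration.

```python
def excess_deaths_per_100k(baseline_rate_per_100k, rate_ratio):
    """Excess deaths per 100,000 exposed people, relative to the unexposed.

    baseline_rate_per_100k: death rate among the unexposed.
    rate_ratio: death rate among the exposed divided by the baseline rate.
    """
    return baseline_rate_per_100k * (rate_ratio - 1.0)


# Hypothetical rare disease: strong ratio, low baseline rate.
rare_excess = excess_deaths_per_100k(baseline_rate_per_100k=10.0, rate_ratio=10.0)

# Hypothetical common disease: weak ratio, high baseline rate.
common_excess = excess_deaths_per_100k(baseline_rate_per_100k=300.0, rate_ratio=1.5)

print(rare_excess, common_excess)  # 90.0 150.0
```

In this made-up example the tenfold ratio is the better evidence of aetiology, but the ratio of 1.5 accounts for more excess deaths because it applies to a much larger baseline.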

Analogies have their limits. Cigarettes and lung cancer may be an extreme causal relation. But the generalized noxiousness of smoking is not. In the social sciences there are several factors that are associated with all manner of outcomes. Education is the canonical example, and fortunately the effects of education are in general positive. Davis (1982) was combing the General Social Survey for predictors of morale (social life feelings), social attachment, political opinion, values and tastes, and stances on social issues. Again and again, education popped up as a strong predictor (if not a cause), even after netting out education’s effect on occupational attainment, the indicator of the class cultures that Davis (1982) found wanting in explanatory power. Cutler and Lleras-Muney (2010) do a similar survey of the association between education and various health behaviors. Education is again an ever-present factor. It is hard to think of a domain of social life where it is not. When there is a factor whose effects are so ubiquitous, the precision of any one effect is of decreasing interest, in the sense that building an argument for action (or not) on the precise results in one domain may be a bit blind with respect to the sociology of the situation. Studies seeking to estimate the causal effect of education on this or that abound, and of course the estimated causal effects do not always look like the zero-order associations, or even the partial associations net of standard sets of observable antecedent variables. I don’t mean to be a scientific Luddite: I am not arguing that we break our tools for determining what effects are and are not “causal.” I do want to suggest that such results be taken in the larger context, where the larger context includes the mass of effects on all manner of outcomes. One counter-argument gets back to the gods and the genes: That once we uncover the hidden factors that are determining both education and everything else we do, we’ll feel silly for having imagined that any of this was within our control – in the provision of education, for example.

Against this, I would argue that many social causative factors – especially those with many effects – are operating at a level above the inter-individual variability that tends to dominate our sense of what is causal and what is not. Education is in one sense embedded in the social structure; but changes in education – especially the stock of education – are also changing the social structure. Mass education and fertility is a good example. Education has long been associated with declines in fertility. But how exactly? Caldwell’s (1982) theory of fertility and fertility decline brings into focus the social structural character of the factors first supporting high fertility, then precipitating its decline (Smith 1989, pp. 172-173):

...Patriarchy is a social institution subsuming large numbers of women within families, families within kin groups, and kin groups within communes or villages. The subordination of youth to their elders and of women to men is not a feature of particular households, families, or kin groups, but of the larger social structure. Individual variation (deviance) is of little account when arrayed against the larger forces militating for conformity to essential behaviors, including fertility (Caldwell, 1982:172). When change comes, it comes not through the collective exercise of individual choice, but through the collapse of a larger system that had heretofore constrained all choices of behavior open to individuals. Theories of modernization that concentrate on personality change are criticized, due to their implication that “individuals could always have lived different ways of life by opting to do so, whether or not the needed economic and social institutions for the new way of life had yet come into existence” (Caldwell, 1982:280). A primary source of fertility change is mass education. Mass education creates both educated (expensive, ungrateful, questioning) children and educated wives. Educated wives reduce the net wealth flow from wife to husband (and mother-in-law and father-in-law), strengthen the bonds between husband and wife (undermining the traditional family structure and its morality), and seek to avoid repeated pregnancies and periods with infants. Education is easily measured at the individual level, and its incorporation into micro models of fertility and fertility-related behaviors is on occasion justified with reference to Caldwell’s (1982) emphasis on education as a source of fertility decline. But Caldwell unambiguously points to the macro properties of education: “[T]he education of only half the community does not have the same effect on that half of the population, nor half the effect on the whole population” (1982:329). When there remain many in a community who have not attended school, strong forces maintaining the traditional family morality still abound. “[T]he evidence suggests that the most potent force for change is the breadth of education (the proportion of the community receiving some schooling) rather than the depth (the average duration of schooling among those who have attended school).”

Causation and Action

If you are reading about John Snow and cholera, you are probably not studying cholera. More likely, you are being instructed regarding what causal inference in good observational science looks like (e.g., Fisher 1958, pp. 156-157; Freedman 1991, pp. 294-299). Ever willing to ape my betters, I have adduced Snow’s admirable studies of cholera in support of a point that strikes me as basic but nevertheless goes against the grain of contemporary research habits: If you have a nice estimate of an effect based on random assignment or credible as-if-by-random assignment (Snow’s case [Smith 2013, p. 50]), do you really need to understand the intervening process to have established causation? I think not (Smith 2013, pp. 60-63), and am buoyed in seeing that Hill (1965, p. 298) had also used Snow as an example in getting to the same point (albeit half a century earlier):

Before deducing ‘causation’ and taking action we shall not invariably have to sit around awaiting the results of that research. The whole chain may have to be unraveled or a few links may suffice. It will depend upon circumstances. (p. 295)

Moreover, the social circumstances linking social conditions to specific outcomes, including health outcomes, can be intransigent, in a way that is not well captured by their elaboration in terms of individual-level intervening health behaviors and biological characteristics. “[F]undamental causes can defy efforts to eliminate their effects when attempts to do so focus solely on the mechanisms that happen to link them to a disease in a particular situation” (Link and Phelan 1995, p. 81). This is reflected in the enduring individual-level association between socioeconomic status, including education, and health. How does this happen?

...[T]he association between a fundamental cause can be preserved through changes either in the mechanisms or in the outcomes...[S]ome causes...“basic causes,” have enduring effects on a dependent variable because, when the effect of one mechanism declines, the effect of another emerges or becomes more prominent. (Link and Phelan 1995, p. 87; my emphasis)

Why does this happen? Social structures are about nothing if not the differential allocation of scarce resources, both for their intrinsic value and for the maintenance of status hierarchies that will allow for differential allocation in the future, as new outcomes of interest arise. In the case of social status and health,

...[T]he essential feature of fundamental social causes...is that they involve access to resources that can be used to avoid risks or to minimize the consequences of disease once it occurs...[R]esources...include money, knowledge, power, prestige, and the kinds of interpersonal resources embodied in the concepts of social support and social networks. Variables like SES, social networks, and stigmatization are used...to directly assess these resources and are therefore especially obvious as potential fundamental causes. However, other variables...such as race/ethnicity and gender...are so closely tied to resources like money, power, prestige, and/or social connectedness that they should be considered as potential fundamental causes of disease as well (Link and Phelan 1995, p. 87).

Because the social structures that we create have an internal logic that transcends the temporal correlations that they create with so-called intervening variables, action oriented toward those intervening variables alone may not have the anticipated outcomes.

Knowing more is better, and elaborating the process leading from some factor to an outcome of interest is a hallmark of science. But it is not necessarily a hallmark of causation, especially inasmuch as causation is understood as what would or will happen if one were actually to do something (e.g. Hill 1965, p. 300). The distinction has been made by Holland (2008, p. 99):

One of the problems of communication between social scientists and policy makers is related to the distinction I make between assessing effects and describing mechanisms. Understanding some aspect of a causal mechanism often advances science (i.e., theory), whereas the needs of public policy often require an answer that assesses the effects of an intervention, rather than reasons or speculations as to how these effects come about. If class size reduction results in better student learning, a policy maker might argue that it does not matter if this effect is due to more time for individualized instruction, fewer classroom disruptions, or something else. On the other hand, the mechanism might matter to the policy maker if other reform policies besides class size reduction are of interest. Knowledge of the causal mechanism could indicate that other policies would be supportive or possibly contraindicated when classes are small. My view is that both positions need to be clearly delineated and not confused with each other.

This confusion of purpose stares at us on the first pages of one of the best texts conveying the modern armamentarium for causal analysis within economics:

A causal relationship is useful for making predictions about the consequences of changing circumstances or policies; it tells us what would happen in alternative (or “counterfactual”) worlds. For example, as part of a research agenda investigating human productivity capacity – what labor economists call human capital – we have both investigated the causal effect of schooling on wages. . . . The causal effect of schooling on wages is the increment to wages an individual would receive if he or she got more schooling. A range of studies suggest the causal effect of a college degree is about 40 percent higher wages on average, quite a payoff. The causal effect of schooling on wages is useful for predicting the earnings consequences of, say, changing the costs of attending college, or strengthening compulsory attendance laws. (Angrist and Pischke 2009, pp. 3-4)

I am not contending that estimates of “the causal effect of schooling on wages” are useless “for predicting the earnings consequences of, say, changing the costs of attending college.” They are definitely useful, and it is possible to integrate formally the two perspectives. Todd and Wolpin (2006) is an admirable example, based on some serious theory (Wolpin 2013). On the ground, I think that we are far closer to the confusion that Holland (2008, p. 99) describes. In seminar rooms, researchers present and debate the fine points of soi-disant causal analysis. With their authority duly established by dint of having wrangled some micro-level observational data into the as-if-by-random-assignment computational frame, they soon weigh in on actions, which are virtually always policy prescriptions for altering choice sets, not the micro factors involved in the preceding causal estimation demonstration.

The focus on establishing “causation” as a form of legitimation (Holland 2008, p. 101) without keeping the action orientation in mind can lead us to strange places. For example: In the United States, lists of eligible voters are publicly available, so that one can observe who actually voted in a given election (if not for whom they voted, individually). In 2004 a team of political scientists obtained files of the electoral rolls in Illinois – more than seven million names – including information on their addresses, telephone numbers, and demographic characteristics including their sex and age, along with their histories of electoral participation (Arceneaux, Gerber, and Green 2010). They eliminated very large households and those without telephone numbers (talk about a bygone era!). This led to a file of 2.7 million households with at least one eligible voter (and fewer than five). They then took a random sample of 16,000 potential electors (only one per household) and tried to reach them by telephone to encourage them to vote in an upcoming election. Only 41% of the potential voters sampled could be reached by phone, and the researchers were concerned that the sort of folks who still pick up a telephone no matter who is calling are also the most likely to vote following a get-out-the-vote call.

They thus sought to estimate a treatment-on-treated effect, “the causal effect of a phone call among those who are reachable” (Arceneaux et al. 2010, p. 260). They were well aware that a control group – even one carefully selected from among the possible voters who were not called but who might be matched on observables (age, voting history, etc.) to those who were reached and hence had been encouraged to vote – would be mixing up the polite folks who feel obliged to pick up the phone when it rings with those who don’t trust unsolicited phone calls, are rarely at home (remember: a bygone era!), and/or who didn’t specify a home phone number. Therefore, to evaluate the effects of a get-out-the-vote phone call on those who did in fact receive one (they picked up the phone, they got the message), Arceneaux et al. (2010) did an analysis via two-stage least-squares, where the random selection for possible contact served as the instrumental variable in a regression of subsequent vote (went to the polls or did not) on whether contact was actually made (yes or no). It turns out that (a) the probability of going to vote increased by 2% as a result of receiving (implying answering) an encouragement-to-vote phone call; and as suspected, (b) the people who tended to answer such a phone call are also the kind of people who go out to vote, whether they are called or not.

Arceneaux et al. (2010) were interested in demonstrating the insufficiency of a different potential method – matching on observables, as described above – for estimating their preferred parameter, the effect of the treatment on the treated. Agreed: Matching doubtless does not control for an important unobservable, the tendency to pick up the telephone when it rings. In comparison, the two-stage estimator gives an unbiased estimate of the effect of calling among those who pick up the phone. In that sense it is a preferable causal estimator. But suppose you are a party worker or other campaign operative who is interested in finding more votes: The effect of a phone call on those who answer the phone is not your primary interest. You want to know what is going to happen when you inundate the registered voting population with telephone calls that will, in the main, go unanswered. The effect touted and precisely measured by Arceneaux et al. (2010) does not accord well with the intervention that one could imagine making. In which case I think the tendency to put causal estimation first and action second is problematic, at least in the action-orientation sense of causation: “What would happen if...?”
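
The gap between these quantities can be made concrete with a small sketch. The Python below is an illustration only, not the analysis of Arceneaux et al. (2010): the turnout rates, the 50% assignment share, and the function names (`cells`, `wmean`, `estimates`) are assumptions chosen for arithmetic convenience, and only the 41% contact rate and the 2-point effect echo figures quoted above. With a single randomized binary instrument and no covariates, two-stage least squares reduces to the Wald ratio (the intention-to-treat difference divided by the difference in contact rates), which is what the sketch computes alongside a naive contacted-versus-uncontacted comparison.

```python
# Hypothetical population shares and turnout rates (assumptions, not study data).
P_ANSWER = 0.41       # share who pick up the phone (the 41% reached, per the text)
P_ASSIGNED = 0.50     # share randomly assigned to a call attempt (assumed)
BASE_ANSWERER = 0.60  # turnout among answerers when not contacted (assumed)
BASE_OTHER = 0.40     # turnout among non-answerers (assumed)
EFFECT = 0.02         # effect of a completed call on turnout (assumed 2 points)


def cells():
    """Yield (weight, assigned, contacted, turnout) for each population cell."""
    for answers, p_a in ((True, P_ANSWER), (False, 1 - P_ANSWER)):
        for assigned, p_z in ((True, P_ASSIGNED), (False, 1 - P_ASSIGNED)):
            # Contact happens only if the person was assigned AND picks up.
            contacted = assigned and answers
            base = BASE_ANSWERER if answers else BASE_OTHER
            turnout = base + (EFFECT if contacted else 0.0)
            yield p_a * p_z, assigned, contacted, turnout


def wmean(pairs):
    """Weighted mean of (weight, value) pairs."""
    pairs = list(pairs)
    total = sum(w for w, _ in pairs)
    return sum(w * v for w, v in pairs) / total


def estimates():
    c = list(cells())
    # Turnout and contact rates by random assignment (the instrument).
    y1 = wmean((w, y) for w, z, d, y in c if z)
    y0 = wmean((w, y) for w, z, d, y in c if not z)
    d1 = wmean((w, float(d)) for w, z, d, y in c if z)
    d0 = wmean((w, float(d)) for w, z, d, y in c if not z)
    itt = y1 - y0               # effect of *attempting* calls on everyone
    iv = itt / (d1 - d0)        # Wald ratio: effect among those reached
    # Naive comparison: contacted versus everyone who was not contacted.
    naive = (wmean((w, y) for w, z, d, y in c if d)
             - wmean((w, y) for w, z, d, y in c if not d))
    return {"naive": naive, "itt": itt, "iv": iv}


if __name__ == "__main__":
    for name, value in estimates().items():
        print(f"{name}: {value:.4f}")
```

Under these assumed numbers the naive comparison is roughly 17 points, the instrumental-variable estimate recovers the 2-point effect among those reached, and the intention-to-treat effect, the quantity closest to what a campaign flooding the rolls with calls would see, is under 1 point (0.41 × 0.02).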

Social Structure as a Cause

A social structure is a relational system that exists not just in function of the individuals present in a society and their characteristics (psychological, genetic, social and otherwise), but of a set of social roles that tend to persist (or change) independent, to a great degree, of the characteristics of the incumbents of roles. It includes power relations, norms, and habits of mind that weigh on opportunities and behavior.

This can sound imprecise and unscientific, certainly relative to measures of observables on individuals, such as alleles, levels of education, criminal records, etc. – maybe even people’s sex and race – that figure in most statistics-based treatments of cause and effect. For example,

At the class certification hearing in federal district court, Plaintiffs’ sociological expert witness testified regarding his “social framework analysis” of Wal-Mart’s “culture” and personnel practices, and concluded that the company was “vulnerable” to gender discrimination. The reasoning here was from the general – that of Wal-Mart’s “strong corporate culture” – to the specific – that Wal-Mart discriminated against its women employees as a consequence... (Dawid, Faigman, and Fienberg 2014, p. 380).

The many terms placed inside quotation marks speak volumes: Dawid et al. (2014) were not fans of this form of analysis (nor was the court).

I cannot speak for the legal issues in general, nor about this case in particular, but I would not be so quick to dismiss the validity of the concepts – that is, their reality as depictions of essential aspects (yes, in the causative sense) of the organization in which these women were working. Imagine we were to describe the sort of gender system within which we are living, I as the writer, you as the readers. Unless the readership of Observational Studies is substantially different from what I imagine it to be, the rules of engagement, the expectations, the opportunities, the behaviors, and the relationships (in every sense of the word), not just between men and women, but between women and other women, men and other men – not to mention the creation of gender identities not bound to these binary criteria – well, I suspect that they would preclude if not sanction placing in our papers a homily of the form:

If your wife ran off with the lodger last week you still have to take your perforated ulcer to hospital without delay. But with a hernia you might prefer to stay home for a while – to mourn (or celebrate) the event. (Hill 1965, p. 296)

By which I do not mean to impugn Sir Austin Bradford Hill. He, like most of us, was a man of his times with respect to many things, in addition to being farsighted intellectually in the ways we celebrate in revisiting his work.

I do, however, mean to illustrate what something along the lines of a “strong [XXXX] culture” might look like, what it might feel like, and how it might be perpetuated, even (especially) by those who mean no harm. The modern statistical and philosophical literatures on causation have long been stuck on how best to capture the “effects” of race and sex (or gender) – what sociologists call the ascriptive aspects of individuals. (Education and income, in contrast, are statuses that one putatively achieves.) I would have thought that scholars would have moved past the idea that racial characteristics ascribed to individuals were in any sense a cause of what might or might not be happening to them (Smith 2003, p. 465), but I would have been wrong (cf. Marcellesi 2013). Yet even reframing the problem so that the subjects who are causing things are the units across whom race inputs are being measured (e.g. Pager 2003; Greiner and Rubin 2011) does not get at the social structure that is in a real sense causing the situation:

[A]fter a society becomes racialized, racialization develops a life of its own. Although it interacts with class and gender structurations in the social system, it becomes an organizing principle of social relations in itself...Race, as most analysts suggest, is a social construct, but that construct, like class and gender, has independent effects in social life. After racial stratification is established, race becomes an independent criterion for vertical hierarchy in society. Therefore different races experience positions of subordination and superordination in society and develop different interests. (Bonilla-Silva 1997, p. 475)

The idea of racism without racists (Bonilla-Silva 2013) undercuts the continual focus on race as a causal property of individuals, because what one is dealing with are the pervasive, interrelated effects of a social structure with any number of taken-for-granted roles and social relations. Perhaps in another social world, the set of experiences would be very different:

If race is not a causal variable, how do we analyze issues of social discrimination in causal terms, if at all? We certainly do think of racial discrimination in causal terms because many of us think racial discrimination is something that could be changed, reduced, or in some way altered. There are those who dream of a day when racial discrimination is a thing of the past and long forgotten. What is it that has to change? Certainly not the color of people’s skin or some other physical characteristic. Clearly discrimination is a social phenomenon, one that is learned; it is taught and fostered by a social system in which it plays a complex part. When we envision a world without racial discrimination we thus envision it as a whole social system that must be different in a variety of ways from what we see before us. One almost has to envision a parallel world, so to speak, in which things are so different that what we recognize in our own world as racial discrimination does not exist in this other parallel world... (Holland 2008, p. 102)

Coda

I may be straying into dogmatism, which would be unfortunate, because this is precisely the point: Our interest in causation could use far less dogmatism, far less sense that there are some principles that are intrinsically more important than others in understanding one aspect of how the world works: i.e., what might we expect to happen if we did something or other? I read Hill (1965) as a reminder of what that sensibility might look like.

References

Angrist, Joshua D., and Jorn-Steffen Pischke (2009). Mostly Harmless Econometrics. Princeton: Princeton University Press.

Arceneaux, Kevin, Alan Gerber, and Donald P. Green (2010). “A Cautionary Note on the Use of Matching to Estimate Causal Effects: An Empirical Example Comparing Matching Estimates to an Experimental Benchmark.” Sociological Methods & Research, 39(2):256–282. https://doi.org/10.1177%2F0049124110378098

Bearman, Peter (2008). “Introduction: Exploring Genetics and Social Structure.” American Journal of Sociology, 114(S1):v-x. https://doi.org/10.1086/596596

Bonilla-Silva, Eduardo (1997). “Rethinking Racism: Toward a Structural Interpretation.” American Sociological Review, 62(3):465-480. https://www.jstor.org/stable/2657316

Bonilla-Silva, Eduardo (2013). Racism without Racists: Color-Blind Racism and the Persistence of Racial Inequality in America (4th ed.). Lanham, MD: Rowman and Littlefield.

Caldwell, John C. (1982). Theory of Fertility Decline. New York: Academic Press.

Cutler, David M., and Adriana Lleras-Muney (2010). “Understanding Differences in Health Behaviors by Education.” Journal of Health Economics, 29(1):1-28. https://doi.org/10.1016/j.jhealeco.2009.10.003

Davis, James A. (1982). “Achievement Variables and Class Cultures: Family, Schooling, Job, and Forty-Nine Dependent Variables in the Cumulative GSS.” American Sociological Review, 47(5):569-586. https://www.jstor.org/stable/2095159

Dawid, Philip A., David L. Faigman, and Stephen E. Fienberg (2014). “Fitting Science into Legal Contexts: Assessing Effects of Causes or Causes of Effects?” Sociological Methods & Research, 43(3):359-390. https://doi.org/10.1177%2F0049124113515188

DiPrete, Thomas A., and Markus Gangl (2004). “Assessing Bias in the Estimation of Causal Effects: Rosenbaum Bounds on Matching Estimators and Instrumental Variables Estimation with Imperfect Instruments.” Sociological Methodology, 34:271-310. https://doi.org/10.1111%2Fj.0081-1750.2004.00154.x

Fenelon, Andrew, and Samuel H. Preston (2012). “Estimating Smoking-Attributable Mortality in the United States.” Demography, 49(3):797-818. https://doi.org/10.1007/s13524-012-0108-x

Fisher, Sir Ronald (1958). “Cigarettes, Cancer, and Statistics.” The Centennial Review of Arts & Science, 2:151–166. www.jstor.org/stable/23737529

Freedman, David A. (1991). “Statistical Models and Shoe Leather.” Sociological Methodology, 21:291-313. http://www.jstor.org/stable/270939

Greiner, James, and Donald Rubin (2011). “Causal Effects of Perceived Immutable Characteristics.” Review of Economics and Statistics, 93(3):775–85. https://doi.org/10.1162/REST_a_00110

Guo, Guang, Yuying Tong, and Tianji Cai (2008). “Gene by Social Context Interactions for Number of Sexual Partners among White Male Youths: Genetics-Informed Sociology.” American Journal of Sociology, 114(S1):S36-S66. https://doi.org/10.1086/592207

Hill, Sir Austin Bradford (1965). “The Environment and Disease: Association or Causation?” Proceedings of the Royal Society of Medicine, 58(5):295-300. https://journals.sagepub.com/doi/pdf/10.1177/003591576505800503

Holland, Paul W. (1986). “Statistics and Causal Inference.” Journal of the American Statistical Association, 81(396):945-960. http://www.jstor.org/stable/2289064

Holland, Paul W. (2008). “Causation and Race.” pp. 93-109 in White Logic, White Methods: Racism and Methodology, edited by Tukufu Zuberi and Eduardo Bonilla-Silva. Lanham, MD: Rowman & Littlefield.

Link, Bruce G., and Jo Phelan (1995). “Social Conditions As Fundamental Causes of Disease.” Journal of Health and Social Behavior, 35(Extra Issue):80-94. http://www.jstor.org/stable/2626958

Marcellesi, Alexandre (2013). “Is Race a Cause?” Philosophy of Science, 80(5):650-659. https://www.jstor.org/stable/10.1086/673721

Martin, Molly A. (2008). “The Intergenerational Correlation in Weight: How Genetic Resemblance Reveals the Social Role of Families.” American Journal of Sociology, 114(S1):S67-S105. https://doi.org/10.1086/592203

Morgan, Stephen L. (ed.) (2013). Handbook of Causal Analysis for Social Research. Dordrecht: Springer.

Pager, Devah (2003). “The Mark of a Criminal Record.” American Journal of Sociology, 108(5):937-975. https://doi.org/10.1086/374403

Penner, Andrew M. (2008). “Gender Differences in Extreme Mathematical Achievement: An International Perspective on Biological and Social Factors.” American Journal of Sociology, 114(S1):S138-S170. https://doi.org/10.1086/589252

Pescosolido, Bernice A., Brea L. Perry, J. Scott Long, Jack K. Martin, John I. Nurnberger, Jr., and Victor Hesselbrock (2008). “Under the Influence of Genetics: How Transdisciplinarity Leads Us to Rethink Social Pathways to Illness.” American Journal of Sociology, 114(S1):S171-S201. https://doi.org/10.1086/592209

Romilly, Jacqueline (de). [1984] 1994. “Patience mon coeur !” L’essor de la psychologie dans la litterature grecque classique. Paris: Pocket.

Rosenbaum, Paul R. (1984). “From Association to Causation in Observational Studies: The Role of Tests of Strongly Ignorable Treatment Assignment.” Journal of the American Statistical Association, 79(385):41-48. https://www.jstor.org/stable/2288332

Rosenbaum, Paul R. (1991). “Discussing Hidden Bias in Observational Studies.” Annals of Internal Medicine, 115(11):901-905. https://doi.org/10.7326/0003-4819-115-11-901

Rosenberg, Morris (1968). The Logic of Survey Analysis. New York: Basic Books.

Rothstein, Jesse (2010). “Teacher Quality in Educational Production: Tracking, Decay, and Student Achievement.” The Quarterly Journal of Economics, 125(1):175–214. https://doi.org/10.1162/qjec.2010.125.1.175

Rubin, Donald B. (1974). “Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies.” Journal of Educational Psychology, 66(5):688-701. http://dx.doi.org/10.1037/h0037350

Rubin, Donald B. (2005). “Causal Inference Using Potential Outcomes: Design, Modeling, Decisions.” Journal of the American Statistical Association, 100(469):322-331. https://doi.org/10.1198/016214504000001880

Smith, Herbert L. (1989). “Integrating Theory and Research on the Institutional Determinants of Fertility.” Demography, 26(2):171-184. http://www.jstor.org/page/info/about/policies/terms.jsp

Smith, Herbert L. (1990). “Specification Issues in Experimental and Nonexperimental Social Research.” Sociological Methodology, 20:59-91. https://www.jstor.org/stable/271082

Smith, Herbert L. (2003). “Some Thoughts on Causation as It Relates to Demography and Population Studies.” Population and Development Review, 29(3):459-469. http://www.jstor.org/stable/3115284

Smith, Herbert L. (2013). “Research Design: Toward a Realistic Role for Causal Analysis.” Chapter 4 (pp. 45-73) in Handbook of Causal Analysis for Social Research, edited by Stephen L. Morgan. Dordrecht: Springer.

Todd, Petra E., and Kenneth I. Wolpin (2006). “Assessing the Impact of a School Subsidy Program in Mexico: Using a Social Experiment to Validate a Dynamic Behavioral Model of Child Schooling and Fertility.” American Economic Review, 96(5):1384-1417. https://www.jstor.org/stable/30034980

Wolpin, Kenneth I. (2013). The Limits of Inference without Theory. Cambridge, MA: MIT Press.

Observational Studies 6 (2020) 47-54 Submitted 9/19; Published 1/20

Hill’s Causal Considerations and the Potential Outcomes Framework

Tyler J. VanderWeele

Department of Epidemiology and Department of Biostatistics

Harvard University

Boston, MA, USA 02115

Hill’s considerations for assessing causality (Hill, 1965) are arguably still important today, as indeed they were at the time of writing. The formalization of causal inference has, however, advanced considerably since Hill’s paper. The potential outcomes or quantitative counterfactual framework (Rubin, 1974; Rosenbaum and Rubin, 1983a; Robins, 1986; Imbens and Rubin, 2015; Hernan and Robins, 2019) has clarified and made more precise our reasoning about causation and the underlying assumptions for much of causal inference. It has also allowed for dramatic expansion of the types of causal questions that can be addressed (Angrist et al., 1995; Robins et al., 2000; Pearl, 2009; Hudgens and Halloran, 2008; Imai et al., 2010; Tchetgen Tchetgen and VanderWeele, 2012; VanderWeele, 2015). And it has moreover, I will argue below, altered how evidence concerning causation is often assessed in practice.

Within the potential outcomes framework, in its most basic form, we can let Y(a) denote the potential outcome or counterfactual outcome that would have been observed for an individual if the exposure A had, possibly contrary to fact, been set to level a. We let Y(1) − Y(0) denote the causal effect for an individual, i.e., the difference in potential outcomes for that individual. We say that the covariates C suffice to control for confounding (or that “exchangeability holds conditional on C” or “ignorability conditional on C”) if the counterfactuals Y(a) are independent of A conditional on C, i.e., if within strata of C, the group that actually had exposure status A = a is representative of what would have occurred had the entire group with C = c been given exposure A = a. We say that the assumption of consistency holds if whenever the actual exposure A equals level a, then Y(a) = Y. Under these assumptions of exchangeability and consistency we have that the average causal effect conditional on C = c is given by E[Y(1)|c] − E[Y(0)|c] = E[Y(1)|A = 1, c] − E[Y(0)|A = 0, c] = E[Y|A = 1, c] − E[Y|A = 0, c], where the first equality holds by exchangeability and the second by consistency. This final expression is just the association between exposure A and outcome Y on a difference scale. Under the assumptions of exchangeability and consistency, this association is equal to the conditional causal effect; under these assumptions, association implies causation.
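
The identification argument above can be checked numerically. What follows is a minimal sketch in Python, assuming a hypothetical population with one binary covariate C; all probabilities, the seed, and the function names (`simulate`, `association`, `causal_effect`) are invented for illustration. Exposure depends on C but not on the potential outcomes within levels of C (exchangeability), and the observed outcome is the potential outcome under the exposure actually received (consistency).

```python
import random


def simulate(n=200_000, seed=1):
    """Return rows (c, a, y, y0, y1) for a hypothetical population."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        c = 1 if rng.random() < 0.5 else 0
        # Potential outcomes depend on C (assumed probabilities).
        y0 = 1 if rng.random() < (0.30 if c else 0.10) else 0
        y1 = 1 if rng.random() < (0.40 if c else 0.15) else 0
        # Exposure depends on C only, so Y(a) is independent of A given C.
        a = 1 if rng.random() < (0.8 if c else 0.2) else 0
        # Consistency: the observed Y equals Y(a) for the exposure received.
        y = y1 if a else y0
        data.append((c, a, y, y0, y1))
    return data


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def association(data, c=None):
    """E[Y | A=1] - E[Y | A=0], optionally within the stratum C = c."""
    rows = [r for r in data if c is None or r[0] == c]
    exposed = mean(r[2] for r in rows if r[1] == 1)
    unexposed = mean(r[2] for r in rows if r[1] == 0)
    return exposed - unexposed


def causal_effect(data, c=None):
    """Mean of Y(1) - Y(0); computable only because this is a simulation."""
    rows = [r for r in data if c is None or r[0] == c]
    return mean(r[4] - r[3] for r in rows)


if __name__ == "__main__":
    data = simulate()
    print("crude:", association(data), "vs effect:", causal_effect(data))
    for c in (0, 1):
        print(f"C={c}:", association(data, c), "vs effect:", causal_effect(data, c))
```

In this simulated population the crude association (about 0.21 in expectation) overstates the average causal effect (0.075 in expectation), while the association within each stratum of C should approximate the stratum-specific effect (0.05 and 0.10), up to sampling error.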

The potential outcomes framework thus effectively employs deductive logic in its conclusion about causation. In contrast, as will be discussed further below, Hill’s considerations, or viewpoints, approach causation inductively. In this commentary, I will offer some discussion on how the potential outcomes framework has contributed to, or clarified, each of Hill’s considerations, how it contrasts with those considerations by using deductive rather than inductive inference, and how it has thus also altered approaches to causal reasoning in practice.

©2020 Tyler VanderWeele.

1. Strength

Hill noted that, other things being equal, the greater the strength of association between exposure and outcome, the more plausible that the relationship is causal. The underlying reasoning is that a stronger association is more difficult to entirely attribute to explanations other than causality. This intuitive notion has been formalized through various sensitivity analysis approaches (Rosenbaum and Rubin, 1983b; Imbens, 2003; Hsu and Small, 2013). Indeed, sensitivity analysis to assess sensitivity or robustness to unmeasured confounding has been around since before Hill’s paper (Cornfield et al., 1959). However, these approaches to sensitivity analysis have expanded dramatically since the introduction of the potential outcomes framework to observational studies, and have been extended to address other possible biases such as measurement error and selection bias (Greenland, 2005; Rothman et al., 2008; Lash et al., 2009). For some of these techniques, the relevant sensitivity analysis parameters that are required to explain away an association can be expressed as simple monotonic functions of the magnitude of the association between the exposure and the outcome (VanderWeele and Ding, 2017; Smith and VanderWeele, 2019; VanderWeele and Li, 2019). Such techniques reinforce the contention that strength of association is a critical consideration in assessing evidence for causation.
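
One such function is the E-value of VanderWeele and Ding (2017). As a hedged illustration (the function name `e_value` is a hypothetical choice, and the formula is stated as it is commonly given for that measure), for an observed risk ratio RR ≥ 1 the E-value is RR + sqrt(RR × (RR − 1)): the minimum strength of association, on the risk ratio scale, that an unmeasured confounder would need to have with both exposure and outcome to fully explain away the observed association. Risk ratios below 1 are inverted first.

```python
import math


def e_value(rr):
    """E-value for an observed risk ratio (point estimate only).

    Sketch of the formula RR + sqrt(RR * (RR - 1)) for RR >= 1;
    protective risk ratios (RR < 1) are inverted before applying it.
    """
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1 / rr
    return rr + math.sqrt(rr * (rr - 1))


if __name__ == "__main__":
    for rr in (1.0, 1.5, 2.0, 5.0):
        print(rr, round(e_value(rr), 2))
```

A risk ratio of 2 gives an E-value of about 3.41, and the value rises monotonically with RR, which is the sense in which stronger associations are harder to explain away.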

2. Consistency

Hill devoted considerable discussion to consistency, to arriving at the same conclusion repeatedly by different persons, in different places, circumstances, and times, and using different designs. Meta-analysis and systematic review are of course often useful in accumulating evidence over different persons, places, times and circumstances, and much of the statistical development of meta-analysis proceeded independent of the potential outcomes framework. More recently, however, methods have been developed that employ the potential outcomes framework to combine meta-analytic approaches assessing consistency over studies, with the possibility of unmeasured confounding in those studies (Mathur and VanderWeele, 2019). Such approaches effectively tackle both the first and the second of Hill’s considerations simultaneously. In a different context, recent work on evidence factors provides another quantitative approach to assess robustness of evidence from different sources of data (Rosenbaum, 2010, 2017). Perhaps an even more profound contribution from the potential outcomes framework to Hill’s consideration of consistency arises from his point about consistency of conclusions across different designs. Hill explicitly references prospective and retrospective studies; however he also makes mention, in his discussion of nickel refiners, of what might be referred to today as an interrupted time series design. Indeed, the number of designs that employ assumptions other than “control for confounding by covariates” to assess causation has grown dramatically, often motivated by, grounded in, or made precise through the potential outcomes framework. These include instrumental variable approaches including Mendelian randomization, regression discontinuity designs, difference-in-difference methods, and interrupted time series designs (Angrist et al., 1996; Didelez and Sheehan, 2007; Angrist and Pischke, 2009; Morgan and Winship, 2015). Each of these employs different assumptions, and consistency of evidence across these different designs will render more plausible the conclusion of causality. There has been recent discussion of this as a process of “triangulation” (Lawlor et al., 2016). However, it should also be noted that each of these designs also often identifies a different causal effect, for different subpopulations, and thus discrepant results across these different study designs do not necessarily imply a contradiction, or that one of them is wrong, and thus, in the words of Hill, “different results of a different inquiry certainly cannot be held to refute the original evidence.” Nevertheless, the potential outcomes framework has arguably dramatically expanded the different ways to look for consistency of evidence, and has also clarified when discrepant results are or are not problematic.

3. Specificity

Hill indicated that if the association with an exposure is specific to a particular outcome (or is considerably stronger for that outcome) then this may render a causal conclusion more plausible. Recent work on negative controls has formalized this notion within a potential outcomes framework (Tchetgen Tchetgen, 2013) and such more formal quantitative approaches to leveraging specificity, while perhaps not yet widely used, could potentially contribute a great deal to our causal reasoning. However, as with consistency, so also with specificity, Hill notes, “if it is not apparent, we are not thereby necessarily left sitting irresolutely on the fence.”

4. Temporality

Causes of course must precede effects. However, Hill’s discussion of the temporality consideration in fact concerns ruling out the possibility of reverse causation, i.e., the possibility that the supposed outcome might in fact precede and subsequently alter the variable taken as the exposure. This temporality or “reverse-causation” consideration is mirrored in calls within the potential outcomes literature to use longitudinal data to assess evidence for causation in observational studies and to control for baseline outcome to rule out that possibility of reverse causation (VanderWeele et al., 2016). Such control will often effectively be needed to ensure the “no-confounding” or “ignorability” assumption within the potential outcomes approach does indeed hold.

5. Biological Gradient

Hill argues that in some cases a causal relationship may be more plausible if there is a monotonic dose-response relationship. Some discussion of when such dose-response relationships may render evidence for causality more plausible has been formalized using potential outcomes (Rosenbaum, 2003). However, here too it must be acknowledged that the absence of a monotonic dose-response relationship is not necessarily evidence against causality. Causal relationships can plausibly be U-shaped, for example.

6. Plausibility

Hill writes “It will be helpful if the causation we suspect is biologically plausible.” However, Hill does not place a great deal of weight on this consideration in his discussion because it is entirely possible that the relevant biological knowledge may be lacking. Certainly one consideration with regard to biological plausibility concerns knowledge of the mechanisms. The potential outcomes framework has supplied a set of concepts and methods, and has clarified the underlying assumptions, to quantitatively assess the relative importance of various proposed mechanisms (Robins and Greenland, 1992; Pearl, 2009; Imai et al., 2010; VanderWeele, 2015), or, said another way, to assess the extent to which each potential mechanism might mediate the effect. Unfortunately, the assumptions required by these approaches are in fact generally considerably stronger than those required to establish causation between the exposure and the outcome itself. These strong assumptions are needed for quantification. However, from a more qualitative perspective the logic of mediation may still be of help for contributing biological plausibility: if we have evidence that the exposure alters particular mechanisms, and also separate evidence that the mechanisms alter the outcome, then this renders more plausible that the exposure itself alters the outcome (VanderWeele, 2015).

7. Coherence

Hill places some emphasis on the coherence with existing knowledge, that “the cause-and-effect interpretation of our data should not seriously conflict with the generally known facts.” This is of course an important principle. It is perhaps one that has not as often been explicitly and quantitatively exploited within the counterfactual framework. However, consideration of “triangulation” may come into play here as well (Lawlor et al., 2016).

8. Experiment

Of course experimental evidence from a trial randomizing the exposure can contribute substantially to the case for causality. The potential outcomes framework helps make clear the logic, but more profoundly, the theory and methods that have developed from the potential outcomes framework have been useful in evaluating the strength of evidence from randomized trials when something goes wrong, e.g., when there is non-compliance, or attrition, or censoring due to death (Robins, 1986; Angrist et al., 1996; Scharfstein et al., 1999; Frangakis and Rubin, 2002).

9. Analogy

Hill’s consideration of analogy (e.g. that when we have evidence for a causal effect of one exposure of some class on an outcome of a particular class, it may render causality more plausible between other members of the same exposure class and outcome class) does not have a direct counterpart within the potential outcomes framework of which I am aware. There are perhaps some indirect resonances with certain questions concerning surrogacy. For example, when it is known from randomized trials that a particular drug has beneficial effects on a health outcome, then when is it reasonable to draw conclusions using a randomized trial for the effect of a different drug, but in the same drug class, if data is only available on a surrogate, rather than the primary outcome of interest? Such questions have been formalized using counterfactual approaches (Frangakis and Rubin, 2002; Gilbert and Hudgens, 2008; Joffe and Greene, 2009; VanderWeele, 2013), but there is almost always some reasoning by analogy in such cases when these approaches are employed in practice. Such reasoning by analogy can of course go wrong, in general, and of course with surrogacy as well, and there have been a number of disastrous failures concerning such reasoning by analogy with surrogates (Moore, 1995; Fleming and DeMets, 1996). Caution must be exercised.

Inductive vs. Deductive Reasoning

Hill’s conclusion is that these “are nine different viewpoints from all of which we should study association before we cry causation.” He believes that “None of my nine viewpoints can bring indisputable evidence for or against the cause-and-effect hypothesis and none can be required as a sine qua non.” Hill’s considerations for assessing evidence for causation essentially rely on inductive reasoning. Hill states that the “fundamental question” is whether there is “any other way of explaining the set of facts before us, is there any other answer equally, or more likely than cause and effect?” and he believes the nine viewpoints are helpful in this type of inductive reasoning. As I have argued above, I believe these viewpoints are indeed still helpful from an inductive standpoint.

The potential outcomes framework, in contrast, follows a deductive form of logic to reach the conclusion of causation. Under the potential outcomes framework, given a certain set of assumptions (e.g. sometimes stated as “exchangeability, consistency, and positivity” (Hernan and Robins, 2019) or as “stable unit treatment value assumption and ignorable treatment assignment” (Imbens and Rubin, 2015)), any actual association implies causation. While the potential outcomes framework in principle employs deductive logic to reach the conclusion of causation, in practice, there are still inductive elements. We are never sure that the premises hold. In fact, we often are sure that they do not exactly hold and must use sensitivity analysis to assess whether our conclusions are robust to violations. We also often cannot definitively conclude that an association is present; rather, we use statistical inference to assess the evidence for an association. We are still left only with evidence, and must evaluate whether, and to what extent, that evidence justifies a conclusion about causation. There is thus an inductive element to the application of the potential outcomes approach in practice. Nevertheless, the deductive form of logic underlying the potential outcomes approach itself simplifies the inductive process. Because we understand the logic relating premises to conclusions, we can carefully evaluate each premise. Rather than having to rule out all possible alternative explanations to a causal relation that either we, or someone else, may come up with, we instead can focus more narrowly on each premise that is used by the potential outcomes framework to deduce causation and can carefully evaluate these premises. We can often do so quantitatively by means of sensitivity analysis (Rosenbaum and Rubin, 1983b; Imbens, 2003; Hsu and Small, 2013). This can sometimes lend considerable confidence to the conclusions that are drawn, and can also indicate when caution may be needed. However, when such sensitivity analyses for unmeasured confounding, and measurement error, and selection bias (Greenland, 2005; Rothman et al., 2008; Lash et al., 2009) do indicate substantial robustness, the confidence with which we can draw causal conclusions may often be considerably greater than if we had had, in a more purely inductive approach, to consider every other non-causal alternative explanation for an association.
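
The deductive step described above can be sketched as a short derivation. The notation is an assumption introduced here for illustration and does not appear in the commentary: A is a binary exposure, Y the observed outcome, L a set of measured covariates, and Y(a) the potential outcome under exposure level a. Under conditional exchangeability, positivity, and consistency, the mean potential outcome is a function of observed quantities only:

```latex
\begin{align*}
E[Y(a)] &= \sum_{l} E[Y(a) \mid L = l]\, P(L = l)
          && \text{(law of total expectation)} \\
        &= \sum_{l} E[Y(a) \mid A = a, L = l]\, P(L = l)
          && \text{(exchangeability: } Y(a) \perp\!\!\!\perp A \mid L\text{; positivity)} \\
        &= \sum_{l} E[Y \mid A = a, L = l]\, P(L = l)
          && \text{(consistency: } Y = Y(a) \text{ when } A = a\text{)}
\end{align*}
```

If these premises hold, a nonzero covariate-standardized difference between the a = 1 and a = 0 expressions equals the average causal effect E[Y(1)] - E[Y(0)]; the inductive work then lies in judging whether each premise is credible.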


It is my view that it is this move of only needing to evaluate the premises embedded in the potential outcomes framework’s logic to draw causal conclusions that has altered, for many, the form of reasoning and the types of evidence considered when assessing causal claims in practice. The focus has shifted from Hill’s considerations to the premises characteristic of potential outcomes logic. It has simplified our reasoning. This has been, in my view, an advance, and has resulted in real changes in practice with regard to how causal evidence is assessed. Nevertheless, with simplified reasoning can come also the danger of complacency. Because the deductive logic is relatively straightforward, we may be tempted to no longer even carefully evaluate the premises. Hill’s considerations can still be helpful today in countering that complacency, by looking at the association through a different, more inductive, lens and thinking creatively about what alternative explanations might be. However, if we return to the potential outcomes framework itself, then I think the central tool in confronting complacency over the assumptions is sensitivity analysis. As described above, sensitivity analysis is the principal means by which the premises fundamental to the potential outcomes approach can be evaluated. As I have written elsewhere (VanderWeele and Ding, 2017; VanderWeele et al., 2019), I believe these should be routine in analysis of observational data and should always take place when such data is being used to argue for a conclusion of causality. In an attempt to help facilitate this, I have elsewhere proposed a set of extremely simple easy-to-use sensitivity techniques that do not require sophisticated software resources or a more technical background in statistics and that are applicable to unmeasured confounding, selection bias, and measurement error (VanderWeele and Ding, 2017; Smith and VanderWeele, 2019; VanderWeele and Li, 2019); they can effectively be implemented by hand, and are relatively easy to interpret. There are many other good sensitivity analysis techniques available, but these sometimes require considerably more effort to implement. My hope is that these simple techniques leave empirical researchers without any excuse for not carrying out what is at least a very simple sensitivity analysis. Such sensitivity analysis, in conjunction with the simple deductive logic of the potential outcomes framework, can, at least sometimes, establish very strong evidence indeed for a causal conclusion.
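
To illustrate how little computation one such technique involves, here is a minimal Python sketch of the E-value for a risk ratio. It assumes the formula E = RR + sqrt(RR × (RR − 1)) from VanderWeele and Ding (2017), with the reciprocal taken first when RR < 1. The function names `e_value` and `e_value_ci` and the numbers in the demo are hypothetical, chosen only for illustration, and are not taken from any published software.

```python
import math


def e_value(rr: float) -> float:
    """E-value for an observed risk ratio.

    Assumed formula: RR + sqrt(RR * (RR - 1)) for RR >= 1.
    """
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    # Protective associations (RR < 1) are handled by taking the reciprocal first.
    if rr < 1:
        rr = 1.0 / rr
    return rr + math.sqrt(rr * (rr - 1.0))


def e_value_ci(rr: float, lower: float, upper: float) -> float:
    """E-value for the confidence limit closest to the null value of 1."""
    # If the interval already includes 1, no confounding is needed to reach the null.
    if lower <= 1.0 <= upper:
        return 1.0
    limit = lower if rr > 1 else upper
    return e_value(limit)


if __name__ == "__main__":
    # Hypothetical estimate and confidence interval, for illustration only.
    rr, lower, upper = 2.0, 1.5, 2.7
    print(f"E-value (estimate): {e_value(rr):.2f}")
    print(f"E-value (CI limit): {e_value_ci(rr, lower, upper):.2f}")
```

For an observed risk ratio of 2, the E-value is 2 + sqrt(2), roughly 3.4. On the interpretation given in that paper, an unmeasured confounder would need to be associated with both the exposure and the outcome by risk ratios of about 3.4 each, conditional on the measured covariates, to fully explain away the association; weaker confounding could not do so.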

References

Angrist, J. D., Imbens, G. W., and Rubin, D. B. (1996). Identification of causal effects using instrumental variables (with discussion). Journal of the American Statistical Association, 91:444–472.

Angrist, J. D., and Pischke, J.-S. (2009). Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton University Press.

Cornfield, J., Haenszel, W., Hammond, E. C., Lilienfeld, A. M., Shimkin, M. B., and Wynder, E. L. (1959). Smoking and lung cancer: Recent evidence and a discussion of some questions. Journal of the National Cancer Institute, 22:173–203.

Didelez, V., and Sheehan, N. (2007). Mendelian randomization as an instrumental variable approach to causal inference. Statistical Methods in Medical Research, 16(4), 309-330.

Fleming, T. R. and DeMets, D. L. (1996). Surrogate end points in clinical trials: Are we being misled? Annals of Internal Medicine, 125, 606–613.


Frangakis, C. E., and Rubin, D. B. (2002). Principal stratification in causal inference. Biometrics, 58(1), 21-29.

Gilbert, P. B., and Hudgens, M. G. (2008). Evaluating candidate principal surrogate endpoints. Biometrics, 64(4), 1146-1154.

Greenland, S. (2005). Multiple-bias modeling for analysis of observational data (with discussion). Journal of the Royal Statistical Society Series A, 168:267–308.

Hernan, M. A., and Robins, J. M. (2019). Causal Inference. Boca Raton: Chapman & Hall/CRC, forthcoming.

Hill, A. B. (1965). The environment and disease: association or causation? Proceedings of the Royal Society of Medicine, 58:295-300.

Hsu, J. Y., and Small, D. S. (2013). Calibrating sensitivity analyses to observed covariates in observational studies. Biometrics, 69(4), 803-811.

Hudgens, M. G., and Halloran, M. E. (2008). Toward causal inference with interference. Journal of the American Statistical Association, 103(482), 832-842.

Imai, K., Keele, L., and Tingley, D. (2010). A general approach to causal mediation analysis. Psychological Methods, 15(4):309–334.

Imbens, G., and Rubin, D. B. (2015). Causal Inference in Statistics, Social, and Biomedical Sciences: An Introduction. New York: Cambridge University Press.

Joffe, M. M., and Greene, T. (2009). Related causal frameworks for surrogate outcomes. Biometrics, 65(2), 530-538.

Lash, T. L., Fox, M. P., and Fink, A. K. (2009). Applying Quantitative Bias Analysis to Epidemiologic Data. New York: Springer.

Lawlor, D. A., Tilling, K., and Davey Smith, G. (2016). Triangulation in aetiological epidemiology. International Journal of Epidemiology, 45(6), 1866-1886.

Mathur, M. B., and VanderWeele, T. J. (2019). Sensitivity analysis for unmeasured confounding in meta-analyses. Journal of the American Statistical Association, in press, https://doi.org/10.1080/01621459.2018.1529598

Moore, T. (1995). Deadly Medicine: Why Tens of Thousands of Patients Died in America’s Worst Drug Disaster. New York: Simon and Schuster.

Morgan, S. L., and Winship, C. (2015). Counterfactuals and Causal Inference. Cambridge, UK: Cambridge University Press, 2nd Edition.

Pearl, J. (2009). Causality: Models, Reasoning, and Inference. 2nd Edition. Cambridge: Cambridge University Press.

Robins, J. M. (1986). A new approach to causal inference in mortality studies with sustained exposure period – Application to control of the healthy worker survivor effect. Mathematical Modelling, 7:1393–1512.

Robins, J. M., and Greenland, S. (1992). Identifiability and exchangeability for direct and indirect effects. Epidemiology, 143-155.

Robins, J. M., Hernan, M. A., and Brumback, B. (2000). Marginal structural models and causal inference in epidemiology. Epidemiology, 11:550-560.

Rosenbaum, P. R. (2003). Does a dose–response relationship reduce sensitivity to hidden bias? Biostatistics, 4(1):1-10.

Rosenbaum, P. R. (2010). Evidence factors in observational studies. Biometrika, 97(2), 333-345.


Rosenbaum, P. R. (2017). The general structure of evidence factors in observational studies. Statistical Science, 32(4), 514-530.

Rosenbaum, P. R., and Rubin, D. B. (1983a). The central role of the propensity score in observational studies for causal effects. Biometrika, 70:41-55.

Rosenbaum, P. R., and Rubin, D. B. (1983b). Assessing sensitivity to an unobserved binary covariate in an observational study with binary outcome. Journal of the Royal Statistical Society: Series B (Methodological), 45(2), 212-218.

Rothman, K. J., Greenland, S., and Lash, T. L. (2008). Modern Epidemiology. 3rd Edition. Lippincott.

Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66:688–701.

Scharfstein, D. O., Rotnitzky, A., and Robins, J. M. (1999). Adjusting for nonignorable drop-out using semiparametric nonresponse models. Journal of the American Statistical Association, 94(448), 1096-1120.

Smith, L. and VanderWeele, T. J. (2019). Bounding bias due to selection. Epidemiology, 30(4), 509-516.

Tchetgen Tchetgen, E. (2013). The control outcome calibration approach for causal inference with unobserved confounding. American Journal of Epidemiology, 179(5), 633-640.

Tchetgen Tchetgen, E. J. and VanderWeele, T. J. (2012). On causal inference in the presence of interference. Statistical Methods in Medical Research – Special Issue on Causal Inference, 21:55–75.

VanderWeele, T. J. (2013). Surrogate measures and consistent surrogates (with Discussion). Biometrics, 69(3), 561-581.

VanderWeele, T. J. (2015). Explanation in Causal Inference: Methods for Mediation and Interaction. New York: Oxford University Press.

VanderWeele, T. J., Jackson, J. W., and Li, S. (2016). Causal inference and longitudinal data: a case study of religion and mental health. Social Psychiatry and Psychiatric Epidemiology, 51:1457-1466.

VanderWeele, T. J. and Ding, P. (2017). Sensitivity analysis in observational research: introducing the E-value. Annals of Internal Medicine, 167:268-274.

VanderWeele, T. J. and Li, Y. (2019). Simple sensitivity analysis for differential measurement error. American Journal of Epidemiology, in press. https://doi.org/10.1093/aje/kwz133

VanderWeele, T. J., Mathur, M. B., and Chen, Y. (2019). Outcome-wide longitudinal designs for causal inference: A new template for empirical studies. Statistical Science, in press.


Observational Studies 6 (2020) 55-56 Submitted 2/19; Published 1/20

Standing on the shoulders of Austin Bradford Hill: The refinement of “specificity” as a consideration in causal inference

Noel S. Weiss

Department of Epidemiology

University of Washington

Seattle, WA, USA 98195

I began my study of epidemiology in 1968, just a few years after Bradford Hill (and his American counterparts) outlined sets of guidelines to assist in inferring the causes of disease. At that time both Bradford Hill and the Americans had as a focus of their thinking the example of cigarette smoking and lung cancer. Both invoked the notion of the specificity of the association: for example, the Americans argued (U.S. Public Health Service, 1964) that “on an intuitive basis...when a given characteristic is found to be associated with one, or at most a few diseases, then the evidence for a causal relationship is more convincing.” However, they also acknowledged that “it would not be surprising to find that the diverse substances in tobacco smoke could produce more than a single disease”, and Bradford Hill (Hill, 1965) said that “we must not overemphasize the importance of the characteristic [i.e. specificity].” So, while the notion of specificity as a consideration when seeking to infer cause and effect was mentioned by Bradford Hill in his 1965 article, that mention came with qualifications.

After completing my academic training in epidemiology, I sought a career in academia, and was fortunate to find a position at the University of Washington where I could immediately (and for the next 45 years!) indulge my passion for the teaching of the principles and methods of epidemiology. For a number of years I would discuss the guidelines for causal inference in class, mentioning specificity, but always downplaying its importance. I would say something like, “the fact that cigarette smoking predisposes to emphysema, bladder cancer, coronary heart disease, etc. should not in any way detract from the inference that cigarette smoking predisposes to the development of lung cancer”.

Then, around the year 2000, I was reading a document prepared by one of our doctoral students in which she was addressing a possible causal influence of the presence of bacterial vaginosis in the acquisition of HIV infection. I remember gently chastising her for taking into account the notion of specificity, reminding her that this often was not a useful guideline. No more than two days following this conversation I received an article from my colleague, Victoria Holt, on a topic of mutual interest, the potential etiologic role of ovarian endometriosis in the development of ovarian cancer. That article reported a several-fold increase in the incidence of ovarian cancer associated with the presence of ovarian endometriosis, but no corresponding increased risk associated with endometriosis that affected only pelvic structures other than the ovary. I wrote a message to Victoria thanking her for the paper, and added that because of the strength and specificity of the association, I suspected that a causal link was present.

©2020 Noel Weiss.

Yes, I used the word “specificity”! Startled that I had done so, I began to search my memory for other instances in which I had invoked the notion without actually using the term. They were not hard to find. Here are two examples:

• In 1992, I had joined Joe Selby and others at Northern California Kaiser Permanente in publishing the results of a case-control study of fatal colorectal cancer in relation to a history of screening sigmoidoscopy (Selby et al., 1992). We reported that a considerably lower proportion of fatal cases of rectal and distal colon cancer than controls had previously undergone screening, but that no similar association was present for fatal cases whose tumor had arisen beyond the reach of the sigmoidoscope. The specificity of this association, as well as its large magnitude, led to the widespread acceptance of the results as indicative of genuine secondary prevention.

• In my teaching on clinical epidemiology, for some years I had featured a study using Medicaid data (Ray et al., 1987), which examined the occurrence of hip fracture among persons using psychoactive drugs. An increased risk of 50 to 80% was seen among current users of antipsychotic, tricyclic antidepressant, and long half-life hypnotic or anxiolytic agents. However, the question of confounding loomed - were people who took these drugs inherently prone to falls and/or fractures? One piece of evidence in support of a causal relation came from the specificity of the association - there was no increased risk of fracture associated with the use of short half-life hypnotic or anxiolytic agents, just as would have been predicted based on the relatively brief impact these agents might have on motor function.

Soon thereafter, I wrote an article (Weiss, 2001) to describe my new appreciation of the concept of specificity as a guide to causal inference, and went on to modify my classroom teaching accordingly. And yes, I offered an apology to the graduate student! On each of the several times we have encountered one another in the nearly 20 years since that time, we have shared a good laugh recalling this episode.

References

U.S. Public Health Service (1964). Smoking and health. Report of the Advisory Committee to the Surgeon General of the Public Health Service. PHS Pub. No. 1103.

Hill, A.B. (1965). The environment and disease: association or causation. Proc R Soc Med, 58:295-300.

Selby JV, Friedman GD, Quesenberry CP, et al. (1992). A case-control study of screening sigmoidoscopy and colorectal cancer mortality. N Engl J Med, 326:653-7.

Ray WA, Griffin MR, Schaffner W, et al. (1987). Psychotropic drug use and the risk of hip fracture. N Engl J Med, 316:363-9.

Weiss NS (2001). Can the “specificity” of an association be rehabilitated as a basis for supporting a causal hypothesis? Epidemiology, 13:6-8.


Observational Studies 6 (2020) 57-65 Submitted 10/19; Published 1/20

Hill and Campbell: Separate or symbiotic paradigms?

William H. Yeaton

Measurement and Statistics program

Florida State University

Tallahassee, FL 32306, U.S.A.

1. Brief Introduction to Hill

At its heart, A.B. Hill’s 1965 Presidential address includes a set of nine “viewpoints” that, if judged to be present, increase the chance that a causal claim is correct. When taken in this light, as Hill cautions, the nine viewpoints are neither necessary nor sufficient for cause. Instead, they might be thought of as attendant features whose presence will enhance causal inference.

What I do not believe – and this has been suggested – is that we can usefully lay down some hard-and-fast rules of evidence that must be observed before we accept cause and effect. None of my nine viewpoints can bring indisputable evidence for or against the cause-and-effect hypothesis and none can be required as a sine qua non. What they can do, with greater or less strength, is to help us to make up our minds on the fundamental question – is there any other way of explaining the set of facts before us, is there any other answer equally, or more, likely than cause and effect? (Hill, 1965, p. 299)

2. Introduction to the Campbellian paradigm: Two symbols and the truth¹

A second system of inquiry, elaborated by Donald Campbell and his colleagues (Shadish, Cook and Campbell, 2002), also informs causal inference but does so from what is essentially an opposing perspective. It asks if one or more of a lengthy list of internal validity threats are present, thereby decreasing the probability of a correct causal claim. To be precise, Shadish, Cook, and Campbell list nine internal validity threats (p. 55), though one of these, “ambiguous temporal precedence,” is the same as “temporality” in the Hillian paradigm. In addition to these threats to causal inference, four “interaction with selection” threats (selection by history, by maturation, by instrumentation, and by regression) are elaborated, along with four threats to construct validity (compensatory equalization, compensatory rivalry, resentful demoralization, and treatment diffusion) which were previously listed as internal validity threats in Cook and Campbell (1979).

Campbell’s primary mechanism for ruling out internal validity threats is an individual research design. While this research design-internal validity threats nexus is rich and detailed, Campbell and his colleagues’ discussion of different design options and their role in eliminating validity threats is beyond the scope of this paper. Here, the central point is that the Campbellian approach looks for negative study features and aims to rule them out, while the Hillian paradigm looks for positive study features and tries to rule them in. For Hill, the researcher’s task is to demonstrate that favorable conditions exist; for Campbell, the task is to demonstrate that unfavorable conditions do not exist.

¹ “Two symbols” refers to Xs and Os, from which many research designs are created.

©2020 William Yeaton.

3. Combine Hill and Campbell paradigms?

It is difficult to argue that either paradigm’s approach to truth is intrinsically better than the other’s. Certainly, researchers’ tasks are different, and each approach ultimately requires judgment as neither process provides a “for sure” determination of cause. Classic Campbell verbiage demands that we “render less plausible” a particular validity threat. A Campbellian might ask if selection bias is likely absent in a particular observational study, while a Hillian might ponder whether sufficient specificity is present to infer cause. Journal reviewers might reasonably disagree about the absence of selection bias or the presence of specificity.

I maintain that these alternative approaches, foundationally based upon very different logical and disciplinary orientations, can advance science more than either alone. Below, I provide examples that show how each of Hill’s viewpoints can be amplified using the Campbellian paradigm. The “more is better” argument is not new; as Feyerabend is reported to have said long ago, “Let a thousand methods bloom” (e.g., Heit, 2016). Restricting different methodological principles to particular research orientations (isolation) leads to fewer insights. A hybrid orientation (symbiosis) would spark cross-disciplinary collaboration via methodological pluralism, allowing the Hill and Campbell approaches to be simultaneously implemented and enabling each Hill principle to be empirically embellished using elements of the Campbellian approach.

Unfortunately, mutually beneficial symbiosis is not the usual way of science (Campbell, 2005). There are many precedents in which researchers have subscribed to apparently adversarial approaches and acted as rivals. Friction between Bayesians and frequentists is longstanding. Those who utilize the causal diagrams of Judea Pearl (2018) and those who follow the potential outcomes, statistical modelling approach of Don Rubin (1974) are less likely to read the same studies or to conduct research in the same way.

I assert that few social scientists are familiar with Hill’s paradigm. Consistent with this claim is the fact that neither the prominent, Campbellian-based textbooks (Campbell and Stanley, 1966; Cook and Campbell, 1979; Shadish, Cook, and Campbell, 2002) nor a recently published quasi-experimental design text by a Campbellian disciple (Reichardt, 2019) cite Hill’s Presidential address. In a similar vein, Hill would have been unfamiliar with the Campbellian paradigm. Near the time of Hill’s 1965 Presidential address, few of Campbell’s design-based papers had been published (Campbell, 1957; Campbell and Stanley, 1966). And while Campbell-based methods have increasingly diffused into the medical and health areas, their spread is far from complete.

Consistent with methodological pluralism, I claim that the Campbellian approach, particularly those dimensions that consider the quality of different research designs or implement multiple design elements, can directly inform Hill guidelines. The juxtaposition of Campbell’s design thinking as a means to rule out threats to internal validity can serve as a powerful adjunct to the nine guidelines that Hill aims to rule in. In addition, to my knowledge, no one has directly addressed the Hillian approach from the Campbellian perspective using research designs as the bridging process. In short, the explicit methods that Campbell uses to rule out internal validity threats provide an empirical basis for gauging the presence of the Hill viewpoints.

4. Using both single and multiple design orientations to embellish Hill’s viewpoints

To illustrate how the Campbellian approach might inform Hill’s viewpoints, I provide examples that follow either the usual practice of incorporating a single research design or a more novel approach that utilizes multiple research designs (Yeaton, 2019). I argue that this context of single and multiple designs creates a better, value-added basis to evaluate Hill’s viewpoints.

While this multiple design strategy is structurally similar to an approach named within-study comparison (WSC), its purpose is fundamentally different. In most WSCs (e.g., Wong and Steiner, 2018), an RCT estimate is used to validate an estimate from a quasi-experiment. In contrast, the primary aim of the multiple design approach is to produce corroborative evidence in which estimates from different designs are comparable in direction and magnitude. Thus, multiple designs within the same study are used to produce results intended to agree closely. With similar design quality, one result from one design does not hold more inferential weight than a second result from a second design.

More formally, this design-in-design approach reflects the principle of “critical multiplism” which Reichardt (2019) defines as “Using multiple methods with different strengths and weaknesses so that, to the extent the results of multiple methods agree, the credibility of results is increased.” Here, “multiple methods” can be thought of as multiple designs. If estimates from different designs within the same study are comparable, then causal inference is enhanced.

I searched for a case in health or medicine in which a multiple-design approach had been used to yield this kind of corroborative evidence, but my efforts were unsuccessful. A recent high quality meta-analytic review (Chaplin et al., 2018) reported 15 WSCs in which RCT results were used to validate regression discontinuity (RD) results. In a personal communication with one of the authors (Morris, September 5, 2019), I asked whether the search procedures these authors had used to identify WSCs might also have identified examples in which more than one design had been used as corroborating evidence. Unfortunately, the Chaplin et al. search strategy did not yield a relevant RCT-RD example in the health or medicine field that had been used in this corroborating way.

5. Examples that relate Campbell’s approach to Hill’s viewpoints

Two fictional but realistic examples apply the Campbellian approach to amplify the nine knowledge probes from Hill. The first example incorporates the regression discontinuity design to assess a drug intervention for hypertension. Imagine a drug intervention aimed at reducing hypertension. Recent rules establish a cut-point for clinically significant systolic blood pressure (SBP) of 120 mm of mercury (e.g., Saklayen and Deshpande, 2016). Certainly, family history and other risk factors enter into a clinical decision to prescribe hypertension medication, but, for the sake of presentation brevity, those factors will be ignored. For the RD design, with initial SBP on the x-axis and subsequent SBP on the y-axis, one looks for a discontinuous SBP drop at the cut-point in the regression line for patients above 120 who received the drug.

The second example embeds an RCT within the RD design (e.g., Chaplin et al., 2018). Though also hypothetical, the example’s logic is borrowed from an application of the design-in-design approach used in the social sciences (Yeaton and Moss, 2020). For medical applications, Kaizar (2015) has extolled the virtues of combining RCT and observational studies but does so by incorporating both kinds of information within the same statistical model. In the current study, the combination design is used to corroborate effect estimates.

In the recent past, the line between normal and elevated systolic blood pressure has vacillated between 120 and 140, and best placement of the distinguishing cut-point still bears research scrutiny (e.g., Flint et al., 2019). Thus, one might imagine an RCT specifically aimed at persons in the range of uncertainty between 120 and 140. We would expect an RCT-based reduction to be comparable to the size of the discontinuous drop for the RD (keeping in mind that the RCT measures average treatment effect and the RD measures average treatment effect at the cut-point).
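The comparison between the RD drop and the RCT reduction can be made concrete with a small simulation. What follows is only a sketch under stated assumptions, not an analysis from any real study: the outcome model, the noise level, the sample sizes, the bandwidth, and the effect of -8 mm Hg are all invented for illustration, and the function names (`simulate_rd`, `rd_estimate`, and so on) are hypothetical. The RD and the RCT are simulated as separate samples, and the drug effect is assumed constant, so the two estimands coincide by construction.

```python
import random
import statistics

CUTOFF = 120.0       # SBP cut-point used in the text's example
TRUE_EFFECT = -8.0   # assumed drug effect in mm Hg (illustrative, not from the source)


def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def follow_up_sbp(baseline, treated, rng):
    """Hypothetical outcome model: linear in baseline SBP plus noise."""
    effect = TRUE_EFFECT if treated else 0.0
    return 20.0 + 0.8 * baseline + effect + rng.gauss(0.0, 4.0)


def simulate_rd(n, rng):
    """Sharp RD: every patient with baseline SBP >= CUTOFF gets the drug."""
    rows = []
    for _ in range(n):
        baseline = rng.uniform(100.0, 160.0)
        treated = baseline >= CUTOFF
        rows.append((baseline, follow_up_sbp(baseline, treated, rng), treated))
    return rows


def simulate_rct(n, rng):
    """RCT confined to the 120-140 band, with coin-flip assignment."""
    rows = []
    for _ in range(n):
        baseline = rng.uniform(120.0, 140.0)
        treated = rng.random() < 0.5
        rows.append((baseline, follow_up_sbp(baseline, treated, rng), treated))
    return rows


def rd_estimate(rows, bandwidth=15.0):
    """Gap between the two fitted lines at the cut-point (local linear fit)."""
    below = [(x, y) for x, y, t in rows if not t and x >= CUTOFF - bandwidth]
    above = [(x, y) for x, y, t in rows if t and x <= CUTOFF + bandwidth]
    a0, b0 = fit_line(*zip(*below))
    a1, b1 = fit_line(*zip(*above))
    return (a1 + b1 * CUTOFF) - (a0 + b0 * CUTOFF)


def rct_estimate(rows):
    """Difference in mean follow-up SBP, treated minus control."""
    treated = [y for _, y, t in rows if t]
    control = [y for _, y, t in rows if not t]
    return statistics.fmean(treated) - statistics.fmean(control)


if __name__ == "__main__":
    rng = random.Random(2020)
    rd = rd_estimate(simulate_rd(6000, rng))
    rct = rct_estimate(simulate_rct(2000, rng))
    print(f"RD effect at the cut-point: {rd:.1f} mm Hg")
    print(f"RCT average effect, 120-140 band: {rct:.1f} mm Hg")
```

If the drug effect varied with baseline SBP, the two numbers would be expected to differ, which is the caveat noted in the parenthesis above.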

To better articulate the value-added of combining the Campbell and Hill paradigms, I first note and briefly describe each of Hill’s nine viewpoints. Then, in the context of the two fictional examples introduced above, I elaborate upon additional knowledge gained by adding data from the Campbellian perspective.

1. Strength (size of effect)

For Hill, a strong environment-disease relationship is one that produces large effects; it is easier to discern causal relationships in the context of large rather than small effects. To best assess the comparability of effects from two different designs, we would compare the size of the RD discontinuity to the distance between the RCT’s treatment and control regression lines at the cut-point. If the average treatment effect in the RCT and the average treatment effect at the cut-point in the RD were similar, that corroboration would provide a more robust strength estimate, one at the cut-point and one near the cut-point.

2. Consistency (similar results across different study units)

Hill seeks findings that are similar across “different persons, places, circumstances, times.” The RD assignment variable runs the gamut of baseline SBPs. The RCT is limited to SBPs for persons in the 120-140 range. If the RCT were based on a random sample of participants in the region near the cut, further consistency of study findings would be established.

3. Specificity (asks if the relationship is one-to-one: does a given disease have one kind of outcome and is that disease created by a single cause)

Hill notes that one-to-one, independent-dependent variable relationships are infrequent. To illustrate, he asks whether cancers related to smoking will increase but cancers and deaths due to other causes (e.g., brain cancer) will not. To probe specificity in the Campbellian approach, both RD and RCT can utilize the so-called “non-equivalent dependent variable” design (e.g., Herman and Walsh, 2011). The idea: the hypertensive drug’s benefit will be reflected by a difference in SBP between the treatment and the control group in both designs, but a second disease outcome, say ulcer prevalence, will not show such a difference. The multiple design approach allows replication of a no-difference finding in diseases unlikely to change from hypertension medication.

4. Temporality (causes precede effects)

RD and RCT designs are both prospective. The cause necessarily precedes the effect.

5. Biological gradient (dose-response)

In the absence of confounders that covary with dose, if increased doses lead to increased effects, cause is buttressed. In the RD design, one could track different dose levels in different patient subsets to determine if there were different degrees of SBP decrease. In the RCT, one could randomly assign patients different doses.

In an educational version of the multiple design approach, a random sample of college students who had been placed on academic probation with semester grades below C (2.0) but at or above D (1.0) were first chosen (Yeaton and Moss, 2020). Next, students in this 1.0-2.0 region were randomly allocated to receive either a strongly worded warning letter sent by certified mail (stronger dose) or an email warning letter containing less strong language (weaker dose). In addition, RD results in the higher dose, probation intervention were compared to results for students who scored at or above 2.0. Thus, the dose-response data from the RCT were used in combination with the higher dose treatment versus no-treatment results of the RD design.

6. Experiment (random allocation)

A small sample RCT was used in the design-in-design approach. This hybrid design approach with experimental and quasi-experimental evidence has coincident advantages; RD had larger sample size (enhanced statistical power) and enrolled patients with a wider range of pretest scores.

Each of the remaining three Hillian viewpoints addresses the credibility of the independent-dependent variable relationship. In the Campbellian paradigm, these three viewpoints relate to causal attribution but are not direct threats to internal validity. Rather they address construct validity, the representations and labels given to the cause and effect constructs and the causal mechanism itself. Empirical study of these three viewpoints using the Campbellian paradigm would allow researchers to address the nature of the presumed cause. Investigation of these three Hillian viewpoints requires elaborated Campbellian designs (e.g., additional design elements such as comparative groups or supplementary data that establish causal links) to inform each viewpoint.

7. Plausibility (asks if the relationship between the independent and dependent variable “makes sense,” from the standpoint of contemporary knowledge)

8. Coherence (asks if different levels of evidence “tell a similar story” regarding cause and effect. As Hill notes, that evidence may be more expansive; women begin to suffer the same lung cancer fate as men when women begin to smoke more, or the pattern of harm is similar when the biology of smoking is examined in rats and man).

9. Analogy (asks if another drug or disease would have similar biological effects)

To address the plausibility of benefit of a particular hypertensive drug, each design might include measures of the biological markers known to be associated with that hypertensive drug. If the predicted causal links occur within such a mechanism study, then the drug’s link with benefit is enhanced. To address coherence, inference from the RD and RCT designs might be buttressed by using multiple study conditions (e.g., different drug treatment groups in both the RCT and the RD, perhaps those with long and short histories of hypertension). To address analogy, different hypertensive drugs of the same class, say a different beta blocker, might be used, in both designs, to better probe the “logic” of the disease.
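The dose-response probe described under the biological gradient viewpoint, in which an RCT randomly assigns patients to different doses, can be illustrated numerically. The sketch below is hypothetical throughout: the three dose arms, the assumed effect of -0.4 mm Hg per mg, the noise level, and the function names are invented for illustration and do not come from any study cited here. It reports the mean SBP change in each arm, checks whether the means fall as dose rises, and fits a least-squares slope of change on dose.

```python
import random
import statistics

DOSES_MG = (0, 10, 20)    # hypothetical dose arms
EFFECT_PER_MG = -0.4      # assumed SBP change per mg (illustrative only)


def simulate_dose_trial(n, rng):
    """Randomly assign each patient a dose; the outcome is the change in SBP."""
    rows = []
    for _ in range(n):
        dose = rng.choice(DOSES_MG)  # random allocation to a dose arm
        change = EFFECT_PER_MG * dose + rng.gauss(0.0, 5.0)
        rows.append((dose, change))
    return rows


def mean_change_by_dose(rows):
    """Mean SBP change for each dose arm, ordered from lowest to highest dose."""
    by_dose = {}
    for dose, change in rows:
        by_dose.setdefault(dose, []).append(change)
    return {dose: statistics.fmean(by_dose[dose]) for dose in sorted(by_dose)}


def is_monotone_decreasing(means):
    """True when each higher dose shows a larger SBP drop than the one before."""
    values = list(means.values())
    return all(later < earlier for earlier, later in zip(values, values[1:]))


def dose_slope(rows):
    """Least-squares slope of SBP change on dose, in mm Hg per mg."""
    doses = [d for d, _ in rows]
    changes = [c for _, c in rows]
    md, mc = statistics.fmean(doses), statistics.fmean(changes)
    sxy = sum((d - md) * (c - mc) for d, c in rows)
    sxx = sum((d - md) ** 2 for d in doses)
    return sxy / sxx


if __name__ == "__main__":
    rng = random.Random(1965)
    trial = simulate_dose_trial(1500, rng)
    means = mean_change_by_dose(trial)
    print({dose: round(m, 1) for dose, m in means.items()})
    print("monotone gradient:", is_monotone_decreasing(means))
    print(f"slope: {dose_slope(trial):.2f} mm Hg per mg")
```

Because dose is randomized in this sketch, confounders cannot covary with dose; in the RD version, where dose levels are tracked in patient subsets, that condition would have to be argued rather than assumed.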

6. Cochran’s causal crossword in the context of Hill and Campbell

Rosenbaum (2015) reflected upon Cochran’s idea of a “causal crossword” to better articulate an intricately woven argument showing how cigarette smoking might be causally related to lung cancer. Below, Rosenbaum’s words appear in italics but are interrupted to allow relevant Hillian and Campbellian commentary. Each element of Rosenbaum’s argument reflects a Hillian viewpoint. And each Hillian viewpoint can be directly or indirectly related to design thinking via Campbell.

A controlled randomized experiment shows that deliberate exposure to the toxin causes this same cancer in laboratory animals.

Experiment (with rats, not humans)

Consider the same ideas in a biological context. A high level of exposure to a toxin, such as cigarette smoke, is associated with a particular disease, say a particular cancer, in a human population, where experimentation with toxins is not ethical.

Consistency (with humans) in an observational design (smokers versus non-smokers)

A DNA-adduct is a chemical derived from the toxin that is covalently bound to DNA, perhaps disrupting or distorting DNA transcription. Exposure to the toxin is observed to be associated with DNA-adducts in lymphocytes in humans.

Plausibility (addresses mediator/moderator issues by providing data on biological mechanism)

A further controlled experiment shows that the toxin causes these DNA adducts in cell cultures.

Coherence (via experiment, using evidence at the cellular level)

A case-control study finds cases of this cancer have high levels of these DNA-adducts, whereas non-cases (so-called “controls”) have low levels.

Temporality (observational data with humans, but the data are retrospective)

A pathology study finds these DNA-adducts in tumors of the particular cancer under study, but not in most tumors from other types of cancer.

Specificity (idea is reflected in the non-equivalent dependent variable design)

Certain genes are involved in repairing DNA, for instance, in removing DNA adducts; see Goode et al. (2002). In human populations, a rare genetic variant has a reduced ability to repair DNA, in particular a reduced ability to remove adducts, and people with this variant exhibit a higher frequency of this cancer, even without high levels of exposure to the toxin.

Analogy (idea of an observational study for human cancer rates with and without this variant)

Each of these entries in the larger puzzle is quite tentative as an indicator that the toxin causes cancer in humans, and some of the entries do not directly intersect; e.g., the rare genetic variant is not directly linked to the toxin. Yet, the filled in puzzle with its many intersections may be quite convincing.

This summary statement nicely reflects the Hillian perspective; only the interwoven tapestry argument becomes definitive. Note that Rosenbaum does not include the viewpoints strength or dose-response. However, these two viewpoints could easily be incorporated using relevant design elements via Campbell (by adding one pack-a-day and three pack-a-day smokers to an observational study and by reporting the large expected difference in lung cancer rates for these two smoking conditions).

7. What is missing from Hill

The Hillian paradigm is incomplete. The Campbellian perspective considers dimensions not directly discussed by Hill that are critical to correct causal inference. For example, differential attrition, treatment compliance, and diffusion from the treatment to control group are not explicitly addressed in Hill’s viewpoints. The construct validity of cause threats mentioned above (compensatory equalization, compensatory rivalry, and resentful demoralization) directly relate to one of the Stable Unit Treatment Value Assumption requirements of Rubin’s (1974) potential outcomes approach: that administering treatment to intervention group members does not change the likelihood that control group participants receive treatment. In addition, subsequent to Hill’s 1965 address, design advancements such as instrumental variable analysis (Angrist, Imbens, and Rubin, 1996) have been applied to provide high quality outcome estimates in the face of incomplete treatment compliance.
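To show how an instrumental variable estimate copes with incomplete compliance, the sketch below simulates a trial in which random assignment serves as the instrument and computes the simple Wald ratio: the intention-to-treat effect divided by the difference in drug uptake between arms. This is the most basic estimator in the instrumental variable family, used here for illustration rather than as the specific procedure of Angrist, Imbens, and Rubin (1996). All numbers are assumptions: a 70% compliance rate, a constant effect of -8 mm Hg among those who take the drug, no access to the drug in the control arm, and compliers who start with higher SBP. The function names are hypothetical.

```python
import random
import statistics

TRUE_EFFECT = -8.0   # assumed effect of actually taking the drug, in mm Hg
COMPLIANCE = 0.7     # assumed share of patients who take the drug if assigned


def simulate_trial(n, rng):
    """Rows of (assigned, took_drug, follow_up_sbp) with one-sided noncompliance."""
    rows = []
    for _ in range(n):
        assigned = rng.random() < 0.5
        complier = rng.random() < COMPLIANCE
        took_drug = assigned and complier  # controls cannot obtain the drug
        # Compliers are assumed to start with higher SBP, so comparing
        # patients by drug actually taken would be confounded.
        baseline = 130.0 + (6.0 if complier else 0.0) + rng.gauss(0.0, 5.0)
        sbp = baseline + (TRUE_EFFECT if took_drug else 0.0)
        rows.append((assigned, took_drug, sbp))
    return rows


def arm_mean(rows, assigned, column):
    """Mean of one column (1 = uptake, 2 = SBP) within an assignment arm."""
    return statistics.fmean([float(r[column]) for r in rows if r[0] == assigned])


def intention_to_treat(rows):
    """Difference in mean SBP by assignment, ignoring actual uptake."""
    return arm_mean(rows, True, 2) - arm_mean(rows, False, 2)


def wald_estimate(rows):
    """Intention-to-treat effect scaled by the between-arm gap in uptake."""
    uptake_gap = arm_mean(rows, True, 1) - arm_mean(rows, False, 1)
    return intention_to_treat(rows) / uptake_gap


if __name__ == "__main__":
    rng = random.Random(1996)
    trial = simulate_trial(4000, rng)
    print(f"intention-to-treat: {intention_to_treat(trial):.1f} mm Hg")
    print(f"Wald (IV) estimate: {wald_estimate(trial):.1f} mm Hg")
```

Under these assumptions the intention-to-treat difference is diluted toward zero by the patients who never take the drug, while the ratio recovers the effect among compliers.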

8. Value-added considerations via Campbell

Five of Hill’s viewpoints are directly informed by Campbellian, design-based approaches, while three Hill viewpoints can be informed using empirical evidence of a surrogate nature. (Recall, one Hillian viewpoint, temporality, is redundant with a Campbellian internal validity threat.) Multiple research designs used in the same context can better inform the Hillian paradigm, since one design informs fewer viewpoints. For example, an estimate of effect from a single RD design would not inform consistency. However, one might use a single, prospective design (e.g., RD only) to address temporality, specificity, and dose-response, then add a second design, say an RCT that incorporates a design-in-design approach, to enhance both strength and consistency (to judge both size and similarity of effect sizes). As an extra bonus, with evidence from multiple designs within the same study, consistent with Hill’s values, the need to rely upon statistical significance is further reduced.

By itself, Hill’s paradigm provides a rich, principle-driven perspective that contributes important theoretical perspectives. By itself, Campbell’s approach provides design-contingent, empirical evidence allowing one to investigate the pattern of multiple study findings and to elaborate Hill’s nine queries. Together, the empirically- and conceptually-based probes of causal truth contribute complementary rule-in and rule-out arguments with which to reduce uncertainty and to enhance inference. Though sometimes non-intersecting, the resulting Hill-Campbell methodological combinations allow researchers to form a more complete causal mosaic.

References

Angrist, J., Imbens, G. and Rubin, D. (1996). Identification of causal effects using instrumental variables. Journal of the American Statistical Association, 91, 444–472.

Campbell, D.T. (1957). Factors relevant to validity of experiments in field settings. Psychological Bulletin, 54, 297-312.

Campbell, D.T. (2005). Ethnocentrism of disciplines and the fish-scale model of omniscience. In S.J. Derry, C.D. Schunn, and M.A. Gernsbacher (Eds.), Interdisciplinary Collaboration: An Emerging Cognitive Science, pp. 3-21. Lawrence Erlbaum, Mahwah, NJ.

Campbell, D.T. and Stanley, J.C. (1966). Experimental and quasi-experimental designs for research. Rand McNally, Chicago.

Chaplin, D.D., Cook, T.D., Zurovac, J., Coopersmith, J.S., Finucane, M.M., Vollmer, L.N., and Morris, R.E. (2018). The internal and external validity of the regression discontinuity design: A meta-analysis of 15 within-study comparisons. Journal of Policy Analysis and Management, 37, 403-429, https://doi.org/10.1002/pam.22051

Cook, T.D. and Campbell, D.T. (1979). Quasi-experimentation: Design and analysis for field settings. Rand McNally, Chicago.

Flint, A.C. et al. (2019). Effect of systolic and diastolic blood pressure on cardiovascular outcomes. New England Journal of Medicine, 381, 243-251.

Heit, H. (2016). Reasons for relativism: Feyerabend on the ‘rise of rationalism’ in ancient Greece. Studies in History and Philosophy of Science, 57, 70-78.

Herman, P.M. and Walsh, M.E. (2011). Hospital admissions for acute myocardial infarction, angina, stroke, and asthma after implementation of Arizona’s comprehensive statewide smoking ban. American Journal of Public Health, 101, 491-496.

Hill, A. B. (1965). The environment and disease: Association or causation? Proceedings of the Royal Society of Medicine, 58, 295-300.

Kaizar, E.E. (2015). Incorporating both randomized and observational data into a single analysis. Annual Review of Statistics and its Application, 2, 49-72.

Pearl, J. (2018). The book of why: The new science of cause and effect. Basic Books, New York.

Reichardt, C.S. (2019). Quasi-experimentation: A guide to design and analysis. Guilford, New York.

Rosenbaum, P.R. (2015). Cochran’s causal crossword. Observational Studies, 1, 205-211.

Rubin, D. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66, 688–701.

Saklayen, M.G. and Deshpande, N.V. (2016). Timeline of history of hypertension treatment. Frontiers in Cardiovascular Medicine, 3, 1-14.

Shadish, W.R., Cook, T.D., and Campbell, D.T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Houghton Mifflin, Boston.

Wong, V.C. and Steiner, P.M. (2018). Designs of empirical evaluations of nonexperimental methods in field settings. Evaluation Review, 42, 176-213.

Yeaton, W.H. and Moss, B.G. (2020). A multiple-design, experimental strategy: Academic probation warning letter’s impact on student achievement. Journal of Experimental Education, 88, 123-144.

Yeaton, W. H. (2019). Experimental and quasi-experimental designs. In P. Brough (Ed.), Advanced Research Methods for Applied Psychology, pp. 107-123. Routledge, New York.