Causal Effects in Integrative Genomics

Carlo Berzuini and Brian Tom

MRC Biostatistics Unit, Cambridge

Collaboration with Alison Goodall and Chris Jones, Department of Cardiovascular Sciences, University of Leicester

Cambridge, December 2006


Aim

• estimating the (causal) effect of endophenotypes on a disease condition of interest

•Endophenotype = inheritable, measurable characteristic along the pathway from genes to a disease condition of interest

•e.g. effect of platelet aggregation on risk of thrombosis

in order to:

•inform medical interventions, help discover new drug targets, predict adverse clinical events, improve power to detect genetic effects, ...


ENDOPHENOTYPE                   DISEASE

Presence of prion proteins      vCJD
Tubulointerstitial fibrosis     Renal
High platelet reactivity        Thrombosis
Fibrinogen level                CHD
...                             ...


Outline

•Causation vs statistical association

•Reverse causation, confounding, …..

•Causality and Probabilistic Graphical models

•A formal graphical method to assess estimability of causal effects in the context of functional genomics experiments

•Illustrative examples


Difficulty: association IS NOT causation

In fact:

A and B may be associated, yet an intervention on A may have no effect on B, and vice versa.

We want to establish whether the associations we observe under a given observational regime, i.e., in our data, allow us to infer causal relationships.


Example 1

We observe a positive association between infections in early life and asthma. Someone interprets this to indicate that the former “cause” the latter.

[Diagram: early infections → asthma]

However, there is also evidence that asthma may itself cause an increased risk of infections, and that asthmatics are likely to carry an inheritable defective response to rhinovirus, making them more vulnerable to rhinovirus infection (Pekkanen, 2004).

[Diagram: early infections ? asthma (direction uncertain), with the rhinovirus response genotype possibly influencing both]


Example 2

Some authors claim evidence that moderate alcohol consumption protects from heart disease:

[Diagram: moderate alcohol (vs no alcohol) → protects from → heart disease]

But confounders might be operating:

[Diagram: positive attitude towards life events → moderate alcohol; positive attitude towards life events → heart disease]


Example 2 (continued)

There is statistical evidence that people with an ADH3 mutation, provided they are not heavy drinkers, are at lower risk of heart disease. In these people clearance of alcohol is slower, resulting in a higher exposure to alcohol. Hence the association between the ADH3 mutation and lower incidence of heart disease is taken as evidence that an increased exposure to alcohol is cardio-protective.

[Diagram: ADH3 mutation → slow alcohol clearance → higher exposure to alcohol → heart disease, with confounders acting on alcohol exposure and heart disease]


A FORMAL GRAPHICAL APPROACH TO STATISTICAL CAUSALITY


Probabilistic directed graphical models

They express the conditional independence relationships among the domain variables in the form of a directed acyclic graph (DAG).

[Figure: DAG over X (genotype), Z (intermediate phenotype), C (measured covariates) and Y (clinical outcome)]


[Figure: DAG over X, C, Z, Y]

$$P(C, X, Z, Y) = P(C)\,P(X)\,P(Z \mid X)\,P(Y \mid X, Z, C)$$

Implied conditional independence properties of the joint distribution over the graph can be read off the graph by the moralization criterion.


Moralization criterion (Lauritzen et al., 1990)

Suppose we want to ascertain whether

$$X \perp\!\!\!\perp Y \mid Z$$

First we remove any node which is neither in $(X, Y, Z)$ nor an ancestor of a node in this set. Then we add a line between any two nodes with a common child, if they are not already connected. Finally, we remove arrowheads.


[Figure: the graph over X, C, Z, Y shown twice: unmoralized and moralized]


[Figure: graph over X, C, Z, Y]

To check

$$X \perp\!\!\!\perp Y \mid Z$$

look, in the moralized graph, for a path between X and Y that does not intersect Z. If there is no such path, the above relationship is true.
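The three steps of the criterion (ancestral set, moralization, separation check) can be sketched in Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the example DAG (X → Z, and X, Z, C → Y) is an assumption made for the sake of the example.

```python
from collections import deque

def ancestral_set(parents, nodes):
    """Return `nodes` together with all of their ancestors in the DAG."""
    seen, stack = set(), list(nodes)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return seen

def moral_graph(parents, keep):
    """Undirected moral graph of the DAG restricted to the ancestral set `keep`."""
    adj = {v: set() for v in keep}
    for child in keep:
        pa = list(parents[child])
        for p in pa:                      # drop arrowheads: parent - child
            adj[p].add(child)
            adj[child].add(p)
        for i, p in enumerate(pa):        # "marry" parents of a common child
            for q in pa[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)
    return adj

def cond_independent(parents, xs, ys, zs):
    """True if the moralization criterion asserts xs independent of ys given zs."""
    xs, ys, zs = set(xs), set(ys), set(zs)
    adj = moral_graph(parents, ancestral_set(parents, xs | ys | zs))
    queue = deque(xs - zs)
    seen = set(queue)
    while queue:                          # search for a path that avoids zs
        node = queue.popleft()
        if node in ys:
            return False
        for nb in adj[node] - zs - seen:
            seen.add(nb)
            queue.append(nb)
    return True

# Assumed example DAG, given as node -> list of parents.
dag = {"C": [], "X": [], "Z": ["X"], "Y": ["X", "Z", "C"]}

print(cond_independent(dag, {"X"}, {"Y"}, {"Z"}))  # direct edge X -> Y remains
print(cond_independent(dag, {"X"}, {"C"}, set()))  # Y is not an ancestor, so it is removed
print(cond_independent(dag, {"X"}, {"C"}, {"Y"}))  # X and C are married via Y
```

Note how conditioning on the common child Y connects X and C, which is exactly what the "add a line between parents" step encodes.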


P(Y | X=x) is in general different from what Pearl denotes as P(Y | do(X=x)), that is, the distribution of Y following an intervention that sets X to take a specific value x.

Traditional DAGs are not sufficiently expressive to reason about causes and effects of causes.


Augment the DAG by adding intervention nodes

The value of the intervention node, $F_Z$, indicates what type of intervention is performed on Z.

[Figure: DAG over X, C, Z, Y with intervention node $F_Z$ attached to Z]

$F_Z = \emptyset$: no intervention: Z is allowed to arise naturally
$F_Z = z$: Z is set to take value z.


[Figure: DAG over X, C, Z, Y with the intervention node/indicator set to $F_Z = z$]


INTERVENTION DISTRIBUTION

• P(Y|Fx=x) denotes the distribution of Y when we intervene by setting X to take value x.

• Causal effect of X on Y measured by an appropriate contrast between P(Y|Fx=x) and P(Y|Fx=x*), where x* is a chosen reference or baseline value


Average Causal Effect

Let x* represent a baseline value of X. Then the Average Causal Effect on Y due to setting X to be equal to x, is defined by

ACE(X, Y) = E(Y | Fx=x) - E(Y | Fx=x*)

ACE is straightforwardly estimated in a controlled experimental setting where we have the power to fix the value of X and then observe the resulting Y.
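To make the contrast between the experimental and the observational regime concrete, here is a small sketch that enumerates a toy binary model exactly. Everything in it is an assumption for illustration: the unobserved common cause U, the probability tables, and the helper names. In this toy model X has no effect on Y, so the ACE is zero even though X and Y are associated when X arises naturally.

```python
from fractions import Fraction

# Hypothetical model: U influences both X and Y; Y does not depend on X.
P_U = {0: Fraction(1, 2), 1: Fraction(1, 2)}
P_X1_GIVEN_U = {0: Fraction(1, 5), 1: Fraction(4, 5)}   # used only when F_X is idle
P_Y1_GIVEN_U = {0: Fraction(1, 10), 1: Fraction(9, 10)}

IDLE = None  # F_X idle: X is allowed to arise naturally

def p_x(x, u, regime):
    """P(X=x | U=u, F_X=regime)."""
    if regime is IDLE:
        p1 = P_X1_GIVEN_U[u]
        return p1 if x == 1 else 1 - p1
    return Fraction(1) if x == regime else Fraction(0)  # X is set to `regime`

def expect_y(regime, given_x=None):
    """E(Y | F_X=regime), optionally also conditioning on X=given_x."""
    num = den = Fraction(0)
    for u in (0, 1):
        for x in (0, 1):
            if given_x is not None and x != given_x:
                continue
            for y in (0, 1):
                p_y = P_Y1_GIVEN_U[u] if y == 1 else 1 - P_Y1_GIVEN_U[u]
                p = P_U[u] * p_x(x, u, regime) * p_y
                den += p
                num += y * p
    return num / den

ace = expect_y(regime=1) - expect_y(regime=0)                  # baseline x* = 0
assoc = expect_y(IDLE, given_x=1) - expect_y(IDLE, given_x=0)  # observational contrast

print(ace)    # 0
print(assoc)  # 12/25
```

The observational contrast is far from zero purely because of U, which is the situation the rest of the talk is concerned with.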

Given a specific, not necessarily experimental, set of data, how do we determine whether we can estimate a specific ACE?


Estimability of a causal effect from observational data: the back-door criterion (Pearl, 1999)

Let C, T and Y be sets of domain variables, and let $F_T$ denote an intervention indicator for T. Suppose we are able to assert:

$$C \perp\!\!\!\perp F_T$$

$$Y \perp\!\!\!\perp F_T \mid (C, T)$$

Then we have, for interventions on T:

$$P(y \mid F_T = t) = \sum_c P(y \mid T = t, c, F_T = \emptyset)\, P(c \mid F_T = \emptyset)$$
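As a sketch of how the back-door formula is used in practice, the snippet below applies it to a table of observational counts for binary C, T and Y. The counts are hypothetical and chosen only to make the arithmetic easy to follow; the function names are likewise illustrative. Both terms on the right-hand side are estimated from data collected under the idle regime.

```python
from fractions import Fraction

# Hypothetical observational counts, keyed by (c, t, y).
counts = {
    (0, 0, 1): 30,  (0, 0, 0): 270,
    (0, 1, 1): 20,  (0, 1, 0): 80,
    (1, 0, 1): 50,  (1, 0, 0): 50,
    (1, 1, 1): 180, (1, 1, 0): 120,
}

def total(pred):
    """Number of subjects whose (c, t, y) satisfies the predicate."""
    return sum(n for key, n in counts.items() if pred(*key))

def p_y1_do(t):
    """Back-door estimate of P(Y=1 | F_T=t) = sum_c P(Y=1 | t, c) P(c)."""
    n_all = total(lambda c, t_, y: True)
    result = Fraction(0)
    for c_val in (0, 1):
        p_c = Fraction(total(lambda c, t_, y: c == c_val), n_all)
        p_y = Fraction(
            total(lambda c, t_, y: c == c_val and t_ == t and y == 1),
            total(lambda c, t_, y: c == c_val and t_ == t),
        )
        result += p_y * p_c
    return result

def p_y1_given(t):
    """Unadjusted P(Y=1 | T=t), for comparison."""
    return Fraction(
        total(lambda c, t_, y: t_ == t and y == 1),
        total(lambda c, t_, y: t_ == t),
    )

ace = p_y1_do(1) - p_y1_do(0)          # adjusted contrast
naive = p_y1_given(1) - p_y1_given(0)  # unadjusted contrast

print(ace)    # 1/10
print(naive)  # 3/10
```

With these counts the unadjusted contrast is three times the adjusted one, because C is associated with both T and Y.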


[Figure: DAG over X, C, T, Y with intervention node $F_T$]

Back-door conditions for ACE(T,Y) satisfied:

$(X, C) \perp\!\!\!\perp F_T$: satisfied
$Y \perp\!\!\!\perp F_T \mid (X, C, T)$: satisfied


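[Figure: DAG over X, T, Y with intervention node $F_T$ and an unobserved node U]

Back-door conditions for ACE(T,Y):

$(X, C) \perp\!\!\!\perp F_T$: satisfied
$Y \perp\!\!\!\perp F_T \mid (X, T)$: not satisfied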


A typical observational scenario

[Figure: DAG with X (genotype), Z (intermediate phenotype), Y (clinical outcome), U (unobserved) and intervention node $F_Z$]

We cannot estimate the causal effect of Z on Y, because a back-door condition is violated.


However, we can measure the association between X and Y. If it is significant, and X influences Y only through Z, it implies that Z causally influences Y.

The causal effect of Z on Y cannot be measured, except under strict parametric assumptions (instrumental variable method).

[Figure: DAG with X (genotype), Z (intermediate phenotype), Y (clinical outcome), U (unobserved) and intervention node $F_Z$]
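One common form of the instrumental variable method is the ratio (Wald) estimator, sketched below on simulated data. The sketch rests on assumptions that are not in the slides: a linear model, a genotype X that is independent of U and affects Y only through Z, and arbitrary coefficient values. It is an illustration of the idea, not the analysis used in the study.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)

def simulate(n, beta, seed=0):
    """Simulate (X, Z, Y) from an assumed linear model with confounder U."""
    rng = random.Random(seed)
    xs, zs, ys = [], [], []
    for _ in range(n):
        x = rng.randint(0, 1) + rng.randint(0, 1)   # genotype coded 0/1/2
        u = rng.gauss(0, 1)                         # unobserved confounder
        z = x + u + rng.gauss(0, 1)                 # intermediate phenotype
        y = beta * z + u + rng.gauss(0, 1)          # clinical outcome
        xs.append(x)
        zs.append(z)
        ys.append(y)
    return xs, zs, ys

BETA = 0.5                                # assumed true effect of Z on Y
x, z, y = simulate(20000, BETA)

naive = cov(z, y) / cov(z, z)             # regression of Y on Z, biased by U
iv = cov(x, y) / cov(x, z)                # ratio estimator using X as instrument

print(f"naive slope: {naive:.2f}, IV estimate: {iv:.2f}, truth: {BETA}")
```

The regression of Y on Z absorbs the effect of U, whereas the ratio estimator recovers a value close to the assumed coefficient because X is unrelated to U by construction.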


A more realistic scenario

[Figure: DAG with X (genotype), H (causal variant), Z (intermediate phenotype), Y (clinical outcome), U (unobserved confounders), W (unobserved population stratification) and intervention node $F_Z$]


“No unobserved confounders between intermediate phenotype and clinical outcome”

[Figure: DAG with X (genotype), Z (intermediate phenotype), Y (clinical outcome), U (unobserved), W (unobserved) and intervention node $F_Z$]


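[Figure: DAG with X (genotype), Z (intermediate phenotype), Y (clinical outcome), U (unobserved), W (unobserved) and intervention node $F_Z$]

Causal effect of Z upon Y estimated via:

$$P(Y \mid F_Z = z) = \sum_{x'} P(Y \mid x', z, F_Z = \emptyset)\, P(x')$$

Effect of X upon Y estimated via:

[equation garbled in the source]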


“No unobserved confounders between intermediate phenotype and clinical outcome”

This condition can be approximated by choosing an intermediate phenotype (Z) that:

• is relevant to disease
• is inheritable
• is pathway-specific (typically involving in vitro experiments)
• has an in vitro response that accurately reflects the in vivo response
• is reproducible (rank preserving)


Several genes:

[Figure: DAG with genotypes X1, ..., XK, Z (intermediate phenotype), Y (clinical outcome), U (unobserved), W (unobserved) and intervention node $F_Z$]


Illustrative study: the role of platelets in the genesis of occlusive thrombosis


The medical problem: carotid endarterectomy surgery removes the plaque that narrows neck arteries and increases the risk of stroke. This surgery is associated with a risk of stroke after the procedure, due to tiny clots called microemboli that break off the surface of the cleaned artery.

Scientific question: why do some patients appear to be at higher risk of forming these blood clots, and of dying as a consequence?

Causal hypothesis: some patients might be at high risk because their blood platelets, the cells which initiate clotting, are highly sensitive to a chemical called collagen. Such a hypersensitivity is likely to be a genetically inheritable trait.


Study sample

Collaboration with: Prof Ross Naylor, Prof Alison Goodall, Mr Paul Hayes, Mr David Payne and Mr Chris Jones, Department of Cardiovascular Sciences at the University of Leicester.

260 carotid endarterectomy patients

Each patient characterized by:
• number of post-operative emboli (detected via transcranial Doppler)
• multilocus unphased genotypes at seven candidate genes that code for platelet membrane receptors involved in the clotting response
• in vitro measurements of platelet reactivity (described next) under collagen stimulation


[Figure: a platelet, with a collagen molecule and its receptors]


Collagen binding activates a signalling cascade, leading to an increased concentration of calcium.

[Figure: platelet with collagen molecule and its receptors; calcium concentration]


An increase in calcium concentration pushes p-selectin proteins outside the membrane: the platelet becomes “sticky”.

[Figure: platelet with collagen molecule and its receptors; calcium concentration; p-selectin]


Because the platelet is now sticky, a lot of fibrinogen molecules adhere to it. This favours platelet aggregation.

[Figure: platelet with collagen molecule and its receptors; calcium concentration; fibrinogen molecules]


Measuring platelet reactivity

Stimulate platelets by in vitro exposure to different doses of collagen

Measure fibrinogen binding

•highly inheritable, highly pathway specific, highly reproducible
•in vitro response accurately reflects in vivo response


[Figure: hyperreactive subjects]


Incorporating genotype information

Each individual was typed at several SNPs in the coding region of each of the known types of receptor involved in collagen binding. In total, we considered 7 multilocus genotypes corresponding to 7 unlinked genes.

In the future we shall include genotypes, and expression levels, for more than 100 genes known to belong to the relevant pathways and/or with highly differential expression between groups of individuals with extreme platelet response.


[Figure: DAG with genotypes X1, ..., X7, Z (hyperreactive? YES/NO), Y (number of emboli > 25, YES/NO), U (unobserved) and W (unobserved)]


Preliminary results of our analysis highlight the role of non-synonymous mutations in the GPVI collagen receptor:

[Figure: odds-ratio of hyperreactivity vs GPVI mutation; ACE(hyperreactive, no. emboli)]


References

Judea Pearl: Causality. Cambridge University Press, 2000.

Phil Dawid: Causal inference without counterfactuals (with Discussion). Journal of the American Statistical Association, 95, 407-48, 2000.

Jones et al.: Mapping the platelet profile for functional genomic studies. Submitted to Circulation, 2006.

Didelez, V. and Sheehan, N.: Mendelian randomisation and instrumental variables: what can and what can’t be done. Research Report, Department of Health Sciences, University of Leicester, 2006.