Causal Effects in Integrative Genomics
Carlo Berzuini and Brian Tom
MRC Biostatistics Unit, Cambridge
Collaboration with Alison Goodall and Chris Jones, Department of Cardiovascular Sciences, University of Leicester
Cambridge, December 2006
Aim
• Estimating the (causal) effect of endophenotypes on a disease condition of interest
•Endophenotype = inheritable, measurable characteristic along the pathway from genes to a disease condition of interest
•e.g. effect of platelet aggregation on risk of thrombosis
in order to:
•inform medical interventions, help discover new drug targets, predict adverse clinical events, improve power to detect genetic effects, …
ENDOPHENOTYPE                  DISEASE
Presence of prion proteins     vCJD
Tubulointerstitial fibrosis    Renal
High platelet reactivity       Thrombosis
Fibrinogen level               CHD
……..                           ……..
Outline
•Causation vs statistical association
•Reverse causation, confounding, …..
•Causality and Probabilistic Graphical models
•A formal graphical method to assess estimability of causal effects in the context of functional genomics experiments
•Illustrative examples
Difficulty: association IS NOT causation
In fact:
A and B may be associated, yet an intervention on A may have no effect on B, and vice versa.
We want to establish whether the associations we observe under a given observational regime, i.e., in our data, allow us to infer causal relationships.
Example 1
We observe a positive association between infections in early life and asthma. Someone interprets this to indicate that the former “cause” the latter.
[Diagram: early infections → asthma]
However, there is also evidence that asthma may itself cause an increased risk of infections, and that asthmatics are likely to carry an inheritable defective response to rhinovirus, making them more vulnerable to rhinovirus infection (Pekkanen, 2004).
[Diagram: early infections ? asthma; rhinovirus response genotype]
Example 2
Some authors claim evidence that moderate alcohol consumption protects from heart disease:
[Diagram: moderate alcohol (vs no alcohol) protects from heart disease]
But confounders might be operating:
[Diagram: moderate alcohol, heart disease and confounders, e.g. positive attitude towards life events]
Example 2 (continued)
There is statistical evidence that people with an ADH3 mutation, provided they are not heavy drinkers, are at lower risk of heart disease. In these people clearance of alcohol is slower, resulting in a higher exposure to alcohol. Hence the association between ADH3 mutation and lower incidence of heart disease suggests that an increased exposure to alcohol is cardio-protective.
[Diagram: ADH3 mutation, slow alcohol clearance, higher exposure to alcohol, moderate alcohol, confounders, heart disease]
A FORMAL GRAPHICAL APPROACH TO STATISTICAL CAUSALITY
[Diagram: DAG with nodes X (genotype), Z (intermediate phenotype), C (measured covariates) and Y (clinical outcome)]
Probabilistic directed graphical models
Express the conditional independence relationships among the domain variables in the form of a directed acyclic graph (DAG).
[Diagram: DAG with nodes X, Z, C and Y]
$$P(C, X, Z, Y) = P(C)\, P(X)\, P(Z \mid X)\, P(Y \mid X, Z, C)$$
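As a minimal sketch of this factorization, the Python below builds the joint distribution of four binary variables from the four factors P(C), P(X), P(Z | X) and P(Y | X, Z, C), and checks that it sums to one. All probability values and the helper names (`bern`, `joint`) are made up for illustration; none of them come from the study.

```python
from itertools import product

# Hypothetical conditional probability tables for binary variables
# (illustrative numbers only, not estimates from any data set).
p_c = {0: 0.7, 1: 0.3}              # P(C = c)
p_x = {0: 0.6, 1: 0.4}              # P(X = x)
p_z1_given_x = {0: 0.2, 1: 0.7}     # P(Z = 1 | X = x)


def p_y1_given(x, z, c):
    """P(Y = 1 | X = x, Z = z, C = c), an assumed additive form."""
    return 0.05 + 0.10 * x + 0.30 * z + 0.20 * c


def bern(p_one, value):
    """Probability that a binary variable with P(1) = p_one equals `value`."""
    return p_one if value == 1 else 1.0 - p_one


def joint(c, x, z, y):
    """P(C, X, Z, Y) = P(C) P(X) P(Z | X) P(Y | X, Z, C)."""
    return (p_c[c] * p_x[x]
            * bern(p_z1_given_x[x], z)
            * bern(p_y1_given(x, z, c), y))


# The joint distribution must sum to one over all 16 configurations.
total = sum(joint(c, x, z, y) for c, x, z, y in product((0, 1), repeat=4))

# Any marginal follows by summation, e.g. P(Z = 1).
p_z1 = sum(joint(c, x, 1, y) for c, x, y in product((0, 1), repeat=3))
```

With these assumed tables, `p_z1` equals 0.6 × 0.2 + 0.4 × 0.7, because Z depends on the other variables only through its parent X.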
Implied conditional independence properties of the joint distribution over the graph can be read off the graph by the moralization criterion.
Moralization criterion (Lauritzen et al., 1990)
Suppose we want to ascertain whether
$$X \perp\!\!\!\perp Y \mid Z$$
First we remove any node which is neither in $(X, Y, Z)$ nor an ancestor of a node in this set. Then we add a line between any two nodes with a common child, if they are not already connected. Finally, we remove arrowheads.
[Diagrams: the DAG over X, Z, C and Y, unmoralized and moralized]
To check
$$X \perp\!\!\!\perp Y \mid Z$$
look, in the moralized graph, for a path between X and Y that does not intersect Z. If there is no such path, the above relationship is true.
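The three steps of the moralization criterion can be sketched in a few lines of Python. This is an illustrative implementation, not code from the study: the function names `ancestors` and `independent` and the dictionary encoding of the DAG (node mapped to its list of parents) are assumptions. The example graph is the one implied by the factorization P(C) P(X) P(Z | X) P(Y | X, Z, C).

```python
from itertools import combinations


def ancestors(parents, nodes):
    """Return `nodes` together with all of their ancestors in the DAG."""
    seen, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents.get(v, ()))
    return seen


def independent(parents, a, b, given):
    """Check a _||_ b | given by the moralization criterion.

    `parents` maps each node to the list of its parents.
    """
    # Step 1: keep only a, b, the conditioning set, and their ancestors.
    keep = ancestors(parents, {a, b} | set(given))
    # Steps 2 and 3: connect ("marry") parents sharing a child, drop directions.
    adj = {v: set() for v in keep}
    for child in keep:
        pa = list(parents.get(child, ()))
        for p in pa:
            adj[child].add(p)
            adj[p].add(child)
        for p, q in combinations(pa, 2):
            adj[p].add(q)
            adj[q].add(p)
    # Final check: is there a path from a to b that avoids `given`?
    blocked = set(given)
    seen, stack = {a}, [a]
    while stack:
        v = stack.pop()
        if v == b:
            return False          # an unblocked path exists
        for w in adj[v] - blocked - seen:
            seen.add(w)
            stack.append(w)
    return True                   # every path is intercepted by `given`


# DAG with edges X -> Z, X -> Y, Z -> Y, C -> Y.
dag = {"C": [], "X": [], "Z": ["X"], "Y": ["X", "Z", "C"]}

x_indep_c = independent(dag, "X", "C", [])              # True
x_indep_c_given_y = independent(dag, "X", "C", ["Y"])   # False: Y is a common child
x_indep_y_given_z = independent(dag, "X", "Y", ["Z"])   # False: direct edge X -> Y
```

In this graph X and C are marginally independent, but conditioning on their common child Y makes them dependent, which is exactly what the "add a line between nodes with a common child" step captures.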
P(Y | X=x) is in general different from what Pearl denotes as P(Y | do(X=x)), that is, the distribution of Y following an intervention that sets X to take a specific value x.
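A small numerical sketch can show how far apart the two quantities may be. The Python below assumes a hypothetical binary model in which a variable U influences both X and Y; the variable U and all probabilities are invented for illustration and are not part of the slides' model.

```python
# Hypothetical model: U -> X, U -> Y, X -> Y, all variables binary.
p_u = {0: 0.5, 1: 0.5}             # P(U = u)
p_x1_given_u = {0: 0.2, 1: 0.8}    # P(X = 1 | U = u)


def p_y1_given(x, u):
    """P(Y = 1 | X = x, U = u), an assumed additive form."""
    return 0.1 + 0.2 * x + 0.5 * u


# "Seeing": P(Y=1 | X=1) = sum_u P(Y=1 | X=1, u) P(u | X=1).
p_x1 = sum(p_u[u] * p_x1_given_u[u] for u in (0, 1))
p_y1_see = sum(p_y1_given(1, u) * p_u[u] * p_x1_given_u[u]
               for u in (0, 1)) / p_x1

# "Doing": P(Y=1 | do(X=1)) = sum_u P(Y=1 | X=1, u) P(u),
# because setting X by intervention leaves the distribution of U untouched.
p_y1_do = sum(p_y1_given(1, u) * p_u[u] for u in (0, 1))
```

Under these assumed numbers the observational probability is 0.70 while the interventional one is 0.55: individuals observed with X = 1 are more likely to have U = 1, which inflates the association.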
Traditional DAGs are not sufficiently expressive to reason about causes and effects of causes
Augment the DAG by adding intervention nodes
The value of the intervention node, Fz , indicates
what type of intervention is performed on Z.
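[Diagram: DAG over X, Z, C and Y, with intervention node Fz on Z]
$$F_Z = \begin{cases} \emptyset & \text{no intervention: } Z \text{ is allowed to arise naturally} \\ z & Z \text{ is set to take value } z \end{cases}$$
[Diagram: the same DAG with Fz = z; Fz is labelled “intervention node/indicator”]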
INTERVENTION DISTRIBUTION
• P(Y|Fx=x) denotes the distribution of Y
when we intervene by setting X to take value x.
• Causal effect of X on Y measured by an appropriate contrast between P(Y|Fx=x) and P(Y|Fx=x*), where x* is a chosen
reference or baseline value
Average Causal Effect
Let x* represent a baseline value of X. Then the Average Causal Effect on Y due to setting X to be equal to x, is defined by
ACE(X, Y) = E(Y | Fx=x) - E(Y | Fx=x*)
ACE is straightforwardly estimated in a controlled experimental setting where we have the power to fix the value of X and then observe the resulting Y.
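The experimental case can be sketched as a simulated two-arm trial in which the experimenter fixes X and records a binary Y. The outcome model, the sample size and the function name `run_trial` are assumptions chosen for illustration; under this assumed model the true ACE is 0.3.

```python
import random


def run_trial(n_per_arm, x, x_star, outcome_prob, seed=0):
    """Estimate ACE = E(Y | Fx = x) - E(Y | Fx = x*) from a simulated trial.

    `outcome_prob(value)` is a hypothetical model giving P(Y = 1) when X is
    set to `value`; each arm contains `n_per_arm` independent units.
    """
    rng = random.Random(seed)

    def mean_outcome(value):
        # X is fixed by the experimenter, so Y is drawn under Fx = value.
        hits = sum(rng.random() < outcome_prob(value) for _ in range(n_per_arm))
        return hits / n_per_arm

    return mean_outcome(x) - mean_outcome(x_star)


# Assumed outcome model: P(Y = 1) = 0.2 at baseline x* = 0 and 0.5 at x = 1.
ace_hat = run_trial(20_000, x=1, x_star=0,
                    outcome_prob=lambda value: 0.2 + 0.3 * value)
```

Because X is set rather than observed, the simple difference in arm means is an unbiased estimate of the ACE, with no adjustment needed.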
Given that we have a specific, not necessarily experimental, set of data, how do we determine whether we can estimate a specific ACE?
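Estimability of a causal effect from observational data: the back-door criterion (Pearl, 1999)
Let $Y$, $T$ and $C$ be sets of domain variables, and let $F_T$ denote an intervention indicator for $T$, with $F_T = \emptyset$ denoting no intervention. Suppose we are able to assert:
$$C \perp\!\!\!\perp F_T$$
$$Y \perp\!\!\!\perp F_T \mid (C, T)$$
Then we have, for interventions on $T$:
$$P(y \mid F_T = t) = \sum_c P(y \mid F_T = \emptyset, T = t, c)\, P(c \mid F_T = \emptyset)$$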
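The adjustment formula can be sketched on simulated observational data. The Python below assumes a hypothetical model with a single binary covariate C that influences both T and Y, so that both back-door conditions hold; all probabilities, the sample size and the helper names are invented for illustration. Under this assumed model the true ACE of T on Y is 0.2.

```python
import random
from collections import Counter

# Simulate observational data (no intervention) from an assumed model:
# C -> T, C -> Y, T -> Y, all binary.
rng = random.Random(1)
counts = Counter()
for _ in range(200_000):
    c = int(rng.random() < 0.4)
    t = int(rng.random() < 0.2 + 0.6 * c)
    y = int(rng.random() < 0.1 + 0.2 * t + 0.4 * c)
    counts[(c, t, y)] += 1
n = sum(counts.values())


def p_y1_given(t, c):
    """Empirical P(Y = 1 | T = t, C = c) in the observational data."""
    return counts[(c, t, 1)] / (counts[(c, t, 0)] + counts[(c, t, 1)])


def p_c(c):
    """Empirical P(C = c) in the observational data."""
    return sum(k for (ci, _, _), k in counts.items() if ci == c) / n


def p_y1_do(t):
    """Back-door adjustment: sum_c P(Y = 1 | T = t, c) P(c)."""
    return sum(p_y1_given(t, c) * p_c(c) for c in (0, 1))


def p_y1_see(t):
    """Unadjusted P(Y = 1 | T = t), ignoring C."""
    num = sum(counts[(c, t, 1)] for c in (0, 1))
    den = sum(counts[(c, t, y)] for c in (0, 1) for y in (0, 1))
    return num / den


ace_adjusted = p_y1_do(1) - p_y1_do(0)   # close to the true ACE of 0.2
ace_naive = p_y1_see(1) - p_y1_see(0)    # inflated by confounding through C
```

The adjusted contrast recovers the assumed causal effect, while the unadjusted contrast is roughly twice as large because C raises both the chance of T = 1 and the chance of Y = 1.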
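Back-door conditions for ACE(T, Y) satisfied:
[Diagram: DAG with nodes X, C, T, Y and intervention node F_T]
$(X, C) \perp\!\!\!\perp F_T$ : satisfied
$Y \perp\!\!\!\perp F_T \mid (X, C, T)$ : satisfied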
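Back-door conditions for ACE(T, Y):
[Diagram: DAG with nodes X, T, Y, U (unobserved) and intervention node F_T]
$(X, C) \perp\!\!\!\perp F_T$ : satisfied
$Y \perp\!\!\!\perp F_T \mid (X, T)$ : not satisfied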
A typical observational scenario
[Diagram: X (genotype), Z (intermediate phenotype), Y (clinical outcome), U (unobserved), intervention node Fz]
We cannot estimate the causal effect of Z on Y, because a back-door condition is violated.
However, we can measure the association between X and Y. If significant, it implies that Z causally influences Y.
The causal effect of Z on Y cannot be measured, except under strict parametric assumptions (instrumental variable method).
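One common form of the instrumental variable method is the ratio (Wald) estimator, sketched below on simulated data. The sketch assumes a linear model with no interactions, a genotype X that is independent of the unobserved U, and no effect of X on Y other than through Z; these assumptions, the coefficients and the helper `cov` are illustrative and are not taken from the slides. The assumed true effect of Z on Y is 0.5.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b)
               for ai, bi in zip(a, b)) / (len(a) - 1)


rng = random.Random(2)
beta = 0.5                      # assumed causal effect of Z on Y
xs, zs, ys = [], [], []
for _ in range(50_000):
    x = int(rng.random() < 0.3)               # genotype (instrument)
    u = rng.gauss(0, 1)                       # unobserved confounder
    z = 1.0 * x + u + rng.gauss(0, 1)         # intermediate phenotype
    y = beta * z + u + rng.gauss(0, 1)        # clinical outcome
    xs.append(x)
    zs.append(z)
    ys.append(y)

# Regressing Y on Z is biased, because U drives both Z and Y.
beta_naive = cov(zs, ys) / cov(zs, zs)

# Ratio (Wald) estimator: cov(X, Y) / cov(X, Z).
beta_iv = cov(xs, ys) / cov(xs, zs)
```

In this simulation the naive slope is close to 0.95, while the ratio estimator is close to the assumed value of 0.5. The correction depends entirely on the stated assumptions; if X affected Y through another route, the ratio would be biased as well.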
A more realistic scenario
[Diagram: X (genotype), Z (intermediate phenotype), Y (clinical outcome), U (unobserved confounders), W (unobserved population stratification), causal variant H, intervention node Fz]
“No unobserved confounders between intermediate phenotype and clinical outcome”
[Diagram: X (genotype), Z (intermediate phenotype), Y (clinical outcome), U and W (unobserved), intervention node Fz]
Causal effect of Z upon Y estimated via:
$$P(Y \mid F_Z = z) = \sum_{x'} P(Y \mid x', z, F_Z = \emptyset)\, P(x')$$
Effect of X upon Y estimated via:
[Equation lost in extraction]
“No unobserved confounders between intermediate phenotype and clinical outcome”
This condition can be approximated by choosing an intermediate phenotype (Z) that:
• is relevant to disease
• is inheritable
• is pathway-specific (typically involving in vitro experiments)
• in vitro response should accurately reflect in vivo response
• is reproducible (rank preserving)
Several genes:
[Diagram: X1, …, XK (genotypes), Z (intermediate phenotype), Y (clinical outcome), U and W (unobserved), intervention node Fz; observed values z and y]
Illustrative study: the role of platelets in the genesis of occlusive thrombosis
The medical problem: carotid endarterectomy surgery removes the plaque that narrows neck arteries and increases the risk of stroke. This surgery is associated with a risk of stroke after the procedure, due to tiny clots called microemboli that break off the surface of the cleaned artery.
Scientific question: why do some patients appear to be at higher risk of forming these blood clots, and of dying as a consequence?
Causal hypothesis: some patients might be at high risk because their blood platelets, the cells which initiate clotting, are highly sensitive to a chemical called collagen. Such a hypersensitivity is likely to be a genetically inheritable trait.
Study sample
Collaboration with: Prof Ross Naylor, Prof Alison Goodall, Mr Paul Hayes, Mr David Payne and Mr Chris Jones, Department of Cardiovascular Sciences at the University of Leicester.
260 carotid endarterectomy patients
Each patient characterized by:
• number of post-operative emboli (detected via transcranial Doppler)
• multilocus unphased genotypes at seven candidate genes that code for platelet membrane receptors involved in the clotting response
• in vitro measurements of platelet reactivity (described next) under collagen stimulation
[Diagram: platelet; collagen molecule and its receptors]
Collagen binding activates a signalling cascade, leading to an increased concentration of calcium.
[Diagram: calcium concentration; collagen molecule and its receptors]
An increase in calcium concentration pushes p-selectin proteins outside the membrane: the platelet becomes “sticky”.
[Diagram: calcium concentration; p-selectin; collagen molecule and its receptors]
Because the platelet is now sticky, many fibrinogen molecules adhere to it. This favours platelet aggregation.
[Diagram: calcium concentration; collagen molecule and its receptors; fibrinogen molecules]
Measuring platelet reactivity
Stimulate platelet by in vitro exposure to different doses of collagen
Measure fibrinogen binding
•highly inheritable, highly pathway specific, highly reproducible
•in vitro response accurately reflects in vivo response
[Plot annotation: hyperreactive]
Incorporating genotype information
Each individual typed at several SNPs in the coding region of each of the known types of receptor involved in collagen binding. In total, we considered 7 multilocus genotypes corresponding to 7 unlinked genes.
In the future we shall include genotypes, and expression levels, for more than 100 genes known to belong to the relevant pathways and/or with highly differential expression between groups of individuals with extreme platelet response.
[Diagram: X1, …, X7 (genotypes), Z (hyperreactive? YES/NO), Y (number of emboli >25, YES/NO), U and W (unobserved); observed values z and y]
Preliminary results of our analysis highlight the role of non-synonymous mutations in the GPVI collagen receptor:
[Figure labels: odds-ratio of hyperreactivity vs GPVI mutation; ACE(hyperreactive, no. emboli)]
References
Judea Pearl: Causality. Cambridge University Press, 2000.
Phil Dawid: Causal inference without counterfactuals (with Discussion). Journal of the American Statistical Association, 95, 407-48, 2000.
Jones et al.: Mapping the platelet profile for functional genomic studies. Submitted to Circulation, 2006.
Didelez, V. and Sheehan, N.: Mendelian randomisation and instrumental variables: what can and what can’t be done. Research Report, Department of Health Sciences, University of Leicester, 2006.