
Fast, Accurate Causal Search Algorithms from the Center for Causal Discovery (CCD)

The CCD Algorithms Group

University of Pittsburgh
Carnegie Mellon University
Pittsburgh Supercomputing Center
Yale University

BD2K All Hands Meeting 11/29/2016


Causal Discovery in Biomedicine

Science is centrally concerned with the discovery of causal relationships in nature

• Understanding
• Control

Examples:
• Determine the genes and cell signaling pathways that cause breast cancer
• Discover the clinical effects of a new drug
• Uncover the mechanisms of pathogenicity of a recently mutated virus that is spreading rapidly in the population


Why Establish a Center for Causal Discovery Now?

Algorithmic Advances
+
Availability of Big Biomedical Data


Algorithmic Advances

• In the past 25 years, there has been tremendous progress in the development of computational methods for representing and discovering causal networks from a combination of observational data, experimental data, and knowledge.

• These methods are generally applicable to biomedical data.


Availability of Big Biomedical Data

• The variety, richness, and quantity of biomedical data have been increasing very rapidly.

• The appropriate analysis of these data has great potential to advance biomedical science.

http://aldousvoice.files.wordpress.com/2014/06/database.jpg


Primary Goals of the CCD

• Goal 1. Develop and implement state-of-the-art methods for discovering causal knowledge from biomedical big data using causal graphical models
– Make some of the best existing causal discovery methods available as free, open-source software
– Develop new methods and make them available

• Goal 2. Investigate three biomedical projects (cancer, lung disease, brain functional connectivity) to evaluate methods and drive their further development

• Goal 3. Disseminate causal discovery software and knowledge widely to biomedical researchers and data scientists


Typical Causal Analysis Workflow

Prior Knowledge and Data → Causal Analysis → Causal Network → Causal Hypothesis Generation by Biomedical Scientists → Experiments → new Data, which feeds the next round of Causal Analysis


Basic Components Needed to Learn Causal Networks from Data

• Model representation
• Model evaluation
• Model search


Model Representation

• Causal Bayesian network (CBN)
– Directed acyclic graph
– Nodes represent variables
– Arcs represent causal influence
– Specify P(X | parents(X)) for each X

This figure is adapted from: Sachs K, et al. Causal protein-signaling networks derived from multiparameter single-cell data. Science 308 (2005) 523-529.
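A CBN combines a DAG with one conditional distribution P(X | parents(X)) per node. The joint distribution therefore factorizes as P(X1, ..., Xn) = P(X1 | parents(X1)) × ... × P(Xn | parents(Xn)). The sketch below shows that representation for a tiny hypothetical network over binary variables. The node names A, B, C and all probabilities are made-up illustration values, not taken from the Sachs et al. network. It checks acyclicity with a topological sort and evaluates the probability of a full assignment as the product of the local terms.

```python
from itertools import product

# Hypothetical 3-node CBN over binary variables: A -> B, A -> C, B -> C.
parents = {"A": [], "B": ["A"], "C": ["A", "B"]}

# P(node = 1 | parent values); illustrative numbers only.
p_one = {
    "A": {(): 0.3},
    "B": {(0,): 0.2, (1,): 0.7},
    "C": {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9},
}


def topological_order(parents):
    """Return nodes with every parent before its children; raise ValueError on a directed cycle."""
    order, done, visiting = [], set(), set()

    def visit(node):
        if node in done:
            return
        if node in visiting:
            raise ValueError("graph has a directed cycle")
        visiting.add(node)
        for pa in parents[node]:
            visit(pa)
        visiting.remove(node)
        done.add(node)
        order.append(node)

    for node in parents:
        visit(node)
    return order


def joint_probability(assignment):
    """P(assignment) as the product of P(X | parents(X)); also validates acyclicity."""
    prob = 1.0
    for node in topological_order(parents):
        key = tuple(assignment[pa] for pa in parents[node])
        p1 = p_one[node][key]
        prob *= p1 if assignment[node] == 1 else 1.0 - p1
    return prob


if __name__ == "__main__":
    print(topological_order(parents))  # ['A', 'B', 'C']
    print(joint_probability({"A": 1, "B": 1, "C": 0}))  # 0.3 * 0.7 * 0.1
    total = sum(joint_probability(dict(zip("ABC", v))) for v in product((0, 1), repeat=3))
    print(round(total, 10))  # 1.0: the local terms define a proper joint distribution
```

Storing only each node's parents and local table is what keeps the representation compact. A full joint table over n binary variables needs 2^n − 1 numbers.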


Model Representation with CBNs


Model Representation Issues


Model Evaluation

• Constraint based (e.g., tests of conditional independence)

• Score based (e.g., Bayesian scores)
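For continuous data with linear Gaussian relationships (the setting used in the evaluations later in this talk), a common constraint-based test of "X independent of Y given S" is Fisher's z test on the partial correlation. The sketch below is a minimal, standard-library-only illustration under that assumption. The function names and the simulated chain X → Y → Z are hypothetical, and this is not the CCD's implementation. Partial correlations use the standard recursive formula, which is fine for small conditioning sets but makes 3^|S| calls.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(data, i, j, cond):
    """Partial correlation of columns i and j given the columns in cond (recursive formula)."""
    if not cond:
        return pearson(data[i], data[j])
    k, rest = cond[-1], cond[:-1]
    r_ij = partial_corr(data, i, j, rest)
    r_ik = partial_corr(data, i, k, rest)
    r_jk = partial_corr(data, j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))


def fisher_z_pvalue(data, i, j, cond):
    """Two-sided p-value for H0: column i independent of column j given cond (Gaussian assumption)."""
    n = len(data[i])
    r = partial_corr(data, i, j, list(cond))
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - len(cond) - 3)
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


if __name__ == "__main__":
    # Hypothetical chain X -> Y -> Z with linear Gaussian noise.
    rng = random.Random(0)
    x = [rng.gauss(0, 1) for _ in range(2000)]
    y = [a + rng.gauss(0, 1) for a in x]
    z = [b + rng.gauss(0, 1) for b in y]
    data = {"X": x, "Y": y, "Z": z}
    print(fisher_z_pvalue(data, "X", "Z", []))     # tiny: X and Z are dependent
    print(fisher_z_pvalue(data, "X", "Z", ["Y"]))  # usually large: X independent of Z given Y
```

A constraint-based search runs many such tests and removes an edge whenever some conditioning set makes the pair look independent at the chosen significance level.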


What is the Big Data Problem on which the CCD is Primarily Focused?

The Number of Causal Model Structures as a Function of the Number of Measured Variables*

Number of variables (nodes)    Number of causal model structures
1                              1
2                              3
3                              25
4                              543
5                              29,281
6                              3,781,503
7                              1.1 x 10^9
8                              7.8 x 10^11
9                              1.2 x 10^15
10                             4.2 x 10^18

* Assumes there are no latent variables and no directed cycles.
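The counts in the table are the numbers of labeled DAGs on n nodes. Robinson's recurrence computes them: a(n) = Σ_{k=1..n} (−1)^(k+1) C(n, k) 2^(k(n−k)) a(n−k), with a(0) = 1. The short sketch below reproduces the table and makes the super-exponential growth concrete. The function name is our own.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def num_dags(n: int) -> int:
    """Number of labeled DAGs on n nodes, via Robinson's recurrence."""
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
        for k in range(1, n + 1)
    )


if __name__ == "__main__":
    for n in range(1, 11):
        print(f"{n:>2}  {num_dags(n):,}")
```

Exhaustively scoring every structure is hopeless beyond a handful of variables, which is why heuristic search (greedy or constraint-based) is needed.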


Our Main Big Data Problem

Analyze biomedical datasets containing a large number of variables in order to generate plausible hypotheses of the causal relationships that hold among those variables


An Example Algorithm for Causal Discovery with Many Variables: FGES

• GES (Greedy Equivalence Search): a popular CBN learning algorithm that uses greedy search and Bayesian scoring*

• We developed a fast version of GES, called FGES (Fast Greedy Equivalence Search)
– Optimized the single-processor version of GES
– Parallelized GES

* Chickering DM. Optimal structure identification with greedy search. Journal of Machine Learning Research 3 (2002) 507-554.
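GES and FGES search over Markov equivalence classes. They run a forward (edge-insertion) phase and then a backward (edge-deletion) phase, scoring each candidate with a decomposable score. The sketch below illustrates only the underlying idea of greedy, score-based search, using a simpler, swapped-in technique. It is a plain hill-climber over DAGs with a BIC score for discrete data, not GES and not the CCD's FGES code. The function names and the toy dataset are hypothetical.

```python
import math
from collections import Counter


def local_bic(data, node, parents):
    """BIC score of `node` given a parent list, for discrete data stored as a list of dict rows."""
    n = len(data)
    arity = {v: len({row[v] for row in data}) for v in [node, *parents]}
    config_counts = Counter(tuple(row[p] for p in parents) for row in data)
    joint_counts = Counter((tuple(row[p] for p in parents), row[node]) for row in data)
    # Maximized log-likelihood of the node's conditional distribution.
    loglik = sum(c * math.log(c / config_counts[cfg]) for (cfg, _), c in joint_counts.items())
    n_params = math.prod(arity[p] for p in parents) * (arity[node] - 1)
    return loglik - 0.5 * math.log(n) * n_params


def creates_cycle(parents, src, dst):
    """True if adding src -> dst would close a directed cycle, i.e. dst is already an ancestor of src."""
    stack, seen = [src], set()
    while stack:
        v = stack.pop()
        if v == dst:
            return True
        if v not in seen:
            seen.add(v)
            stack.extend(parents[v])
    return False


def greedy_search(data, variables):
    """Hill-climb over DAGs: add edges while BIC improves, then delete edges while BIC improves."""
    parents = {v: [] for v in variables}
    score = {v: local_bic(data, v, []) for v in variables}

    def climb(candidate_moves):
        # Repeatedly apply the single move (new parent list for one node) with the largest gain.
        while True:
            best = None
            for dst, new_parents in candidate_moves():
                gain = local_bic(data, dst, new_parents) - score[dst]
                if gain > 1e-9 and (best is None or gain > best[0]):
                    best = (gain, dst, new_parents)
            if best is None:
                return
            gain, dst, new_parents = best
            parents[dst] = new_parents
            score[dst] += gain

    def insertions():
        return [(b, parents[b] + [a]) for a in variables for b in variables
                if a != b and a not in parents[b] and not creates_cycle(parents, a, b)]

    def deletions():
        return [(b, [p for p in parents[b] if p != a]) for b in variables for a in parents[b]]

    climb(insertions)  # forward phase
    climb(deletions)   # backward phase
    return parents


def toy_data():
    """Hypothetical data: X -> Y with P(Y = X) = 0.9, Z independent of X and Y (exact counts)."""
    rows = []
    for x in (0, 1):
        for z in (0, 1):
            rows += [{"X": x, "Y": x, "Z": z}] * 225
            rows += [{"X": x, "Y": 1 - x, "Z": z}] * 25
    return rows


if __name__ == "__main__":
    print(greedy_search(toy_data(), ["X", "Y", "Z"]))
```

BIC is score-equivalent, so X → Y and Y → X receive the same score here and the orientation this sketch reports is arbitrary. Searching over equivalence classes, as GES does, is one way to avoid committing to such arbitrary orientations.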


Evaluation of FGES

• Generated 10 random CBNs
– 30,000 nodes and 60,000 edges
– Continuous variables with linear relationships and Gaussian noise
• Sampled each CBN to generate 1,000 cases
• Provided those cases to FGES and measured its ability to learn the data-generating CBN

Average directed arc precision    Average directed arc recall    # Processors    Average learning time
99%                               84%                            128             2.3 minutes

For more information:
• http://arxiv.org/ftp/arxiv/papers/1507/1507.07749.pdf

• Ramsey J, Glymour C. A Million Variables and More: The Fast Greedy Search (FGS) Algorithm for Learning High Dimensional Graphical Causal Models (to appear).
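Directed-arc precision is the fraction of learned arcs that appear, with the same orientation, in the data-generating graph. Directed-arc recall is the fraction of true arcs that the learned graph recovers with the correct orientation. Below is a minimal sketch of that computation over edge sets, using made-up example graphs. It does not claim to match the exact conventions of the CCD evaluations, for example how undirected edges in a learned equivalence class are counted.

```python
def directed_arc_precision_recall(true_arcs, learned_arcs):
    """Precision and recall of directed arcs; each arc is a (cause, effect) tuple."""
    true_arcs, learned_arcs = set(true_arcs), set(learned_arcs)
    correct = len(true_arcs & learned_arcs)  # same endpoints and same direction
    # Empty denominators are reported as 0.0 here (a choice, not a standard).
    precision = correct / len(learned_arcs) if learned_arcs else 0.0
    recall = correct / len(true_arcs) if true_arcs else 0.0
    return precision, recall


if __name__ == "__main__":
    truth = {("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")}
    learned = {("A", "B"), ("C", "B"), ("C", "D")}  # one arc reversed, one arc missed
    p, r = directed_arc_precision_recall(truth, learned)
    print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.50
```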


Another Example of an Algorithm for Causal Discovery with Many Variables: GFCI

• FGES assumes there are no latent confounders, that is, there are no latent variables that cause two or more measured variables

• Biomedical data often contain latent confounders

• GFCI (Greedy Fast Causal Inference)* allows for the possibility of latent confounders

* Ogarrio JM, Spirtes P, Ramsey J (2016). A hybrid causal search algorithm for latent variable models. JMLR Workshop and Conference Proceedings, 52, 368-379.
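To see why latent confounders matter, consider a hypothetical linear Gaussian model in which an unmeasured variable L causes both X and Y. X and Y have no causal connection to each other. The simulation below shows that X and Y are clearly correlated in the measured data, while the correlation vanishes once L is conditioned on. The coefficients, sample size, and function names are illustrative assumptions. If L is not measured, no conditioning set removes the dependence. A method that assumes no latent confounders, as FGES does, would then have to connect X and Y directly. Methods that allow for latent confounders, such as GFCI, are not forced into that interpretation.

```python
import math
import random


def corr(a, b):
    """Sample Pearson correlation."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))


def partial_corr_given(a, b, c):
    """Partial correlation of a and b given a single variable c."""
    r_ab, r_ac, r_bc = corr(a, b), corr(a, c), corr(b, c)
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


def simulate(n=5000, seed=0):
    """Hypothetical model: latent L -> X and L -> Y, unit-variance Gaussian noise, no X-Y link."""
    rng = random.Random(seed)
    latent = [rng.gauss(0, 1) for _ in range(n)]
    x = [h + rng.gauss(0, 1) for h in latent]
    y = [h + rng.gauss(0, 1) for h in latent]
    return latent, x, y


if __name__ == "__main__":
    latent, x, y = simulate()
    print(f"corr(X, Y)     = {corr(x, y):.2f}")                         # near the true value 0.5
    print(f"corr(X, Y | L) = {partial_corr_given(x, y, latent):.2f}")  # near 0
```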


Evaluation of GFCI

• Generated more than 100 random CBNs
– 1,000 nodes and 2,000 edges
– Continuous variables with linear Gaussian relationships
• Sampled each CBN to generate 2,000 cases
• Provided cases to GFCI and measured its performance

% Latent nodes    Average directed arc precision    Average directed arc recall    # Processors    Average learning time
5%                92%                               93%                            1               15 seconds

For more information: Ogarrio JM, Spirtes P, Ramsey J (2016). A hybrid causal search algorithm for latent variable models. JMLR Workshop and Conference Proceedings, 52, 368-379.


Ongoing Algorithm Work Includes …

• Modeling non-linear relationships

• Modeling causal feedback

• Handling a mixture of continuous and discrete variables

• Outputting uncertainty in edge relationships

• Learning the causal relationships among latent variables


Summary

• Causal discovery is central to biomedical science

• The variety, richness, and quantity of biomedical data are increasing rapidly

• The CCD is providing software now for analyzing big biomedical data to discover causal relationships

• Causal discovery algorithms with additional capabilities will soon be available as well


Acknowledgements

• Thanks to the members of the Algorithms Group of the Center for Causal Discovery for their contributions to the activities described in this talk.

• The Center for Causal Discovery is supported by grant U54HG008540 awarded by the National Human Genome Research Institute through funds provided by the trans-NIH Big Data to Knowledge (BD2K) initiative (www.bd2k.nih.gov). The content of this presentation is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.


Thank you

[email protected]

CCD software is available at: www.ccd.pitt.edu


Extra Slides


Association Versus Causation

• Association
– Represents statistical relationships
– Predicts outcomes from passive observations
– Example uses: classification and regression

• Causation
– Represents mechanisms
– Predicts outcomes of active intervention
– Example uses: decision making and planning


Example

• Association: Smoking – lung cancer – coughing
– Both smoking and coughing predict lung cancer

• Causation: Smoking → lung cancer → coughing
– Smoking influences lung cancer
– Coughing does not influence lung cancer
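The difference between the two views can be made concrete with a hypothetical CBN Smoking → Lung cancer → Coughing. All probabilities below are invented for illustration. Observationally, coughing raises the probability of lung cancer because it is evidence of it. Intervening to set coughing, written do(Coughing = 1), replaces Coughing's mechanism and cuts the arc into it, so it leaves the probability of lung cancer unchanged. The sketch computes both quantities exactly by enumerating the factorized joint distribution. The function and constant names are our own.

```python
from itertools import product

# Hypothetical CBN: Smoking -> Cancer -> Cough (binary variables, illustrative numbers).
P_SMOKING = 0.3
P_CANCER_GIVEN_SMOKING = {0: 0.01, 1: 0.10}
P_COUGH_GIVEN_CANCER = {0: 0.20, 1: 0.80}


def joint(s, c, k, do_cough=None):
    """P(Smoking=s, Cancer=c, Cough=k); do_cough replaces Cough's mechanism with a fixed value."""
    p = P_SMOKING if s else 1 - P_SMOKING
    p *= P_CANCER_GIVEN_SMOKING[s] if c else 1 - P_CANCER_GIVEN_SMOKING[s]
    if do_cough is None:
        p *= P_COUGH_GIVEN_CANCER[c] if k else 1 - P_COUGH_GIVEN_CANCER[c]
    else:
        p *= 1.0 if k == do_cough else 0.0  # intervention: Cough no longer depends on Cancer
    return p


def p_cancer_given_cough(do=False):
    """P(Cancer=1 | Cough=1) if do is False, else P(Cancer=1 | do(Cough=1))."""
    states = list(product((0, 1), repeat=3))
    intervention = 1 if do else None
    num = sum(joint(s, c, k, intervention) for s, c, k in states if c == 1 and k == 1)
    den = sum(joint(s, c, k, intervention) for s, c, k in states if k == 1)
    return num / den


if __name__ == "__main__":
    print(f"P(cancer | cough)     = {p_cancer_given_cough():.3f}")         # 0.133
    print(f"P(cancer | do(cough)) = {p_cancer_given_cough(do=True):.3f}")  # 0.037, same as P(cancer)
```

An associational model answers the first question well. Only the causal model answers the second, which is the one that matters for decisions such as whether treating a cough would prevent cancer.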


Recent Examples of the Use of Graphical Causal Discovery Methods

Manelis, A., Almeida, J. R. C., Stiffler, R., Lockovich, J. C., Aslam, H. A., & Phillips, M. L. (2016). Anticipation-related brain connectivity in bipolar and unipolar depression: A graph theory approach. Brain, 139, 2554-2566.

Dobryakova, E., Costa, S. L., Wylie, G. R., DeLuca, J., & Genova, H. M. (2016). Altered effective connectivity during a processing speed task in individuals with multiple sclerosis. Journal of the International Neuropsychological Society: JINS, 22(2), 216-224.

Otsuka, J. (2016). Discovering phenotypic causal structure from nonexperimental data. Journal of Evolutionary Biology, 29(6), 1268-1277.

Attur, M., Statnikov, A., Samuels, J., Li, Z., Alekseyenko, A. V., Greenberg, J. D., et al. (2015). Plasma levels of interleukin-1 receptor antagonist (IL1Ra) predict radiographic progression of symptomatic knee osteoarthritis. Osteoarthritis and Cartilage, 23(11), 1915-1924.