Causal Modeling of FMRI Data

Joe Ramsey, with enormous help from Clark Glymour, as well as Russell Poldrack, Stephen and Catherine Hanson, Stephen Smith, Erich Kummerfeld, Ruben Sanchez, and others.

Upload: rudolf-gibbs, posted 14-Dec-2015

TRANSCRIPT

Page 1:

Causal Modeling of FMRI Data

Page 2:

Goals

1. From imaging data, to extract as much information as we can, as accurately as we can, about which brain regions influence which others in the course of psychological tasks.

2. To generalize over tasks.

3. To specialize over groups of people.

Page 3:

What Are the Brain Variables?

In current studies, brain variables range from 20,000+ voxels down to as few as 3 ROIs.

ROI = region of interest

Question: How sensitive are causal inferences to brain variable selection?

Page 4:

How are ROIs constructed (FSL)?

• Define an experimental variable (box function).
• Use a generalized linear model to determine which voxels "light up" in correlation with the experimental variable.
• Add a group-level step if voxels lighting up for the group are desired.
• Cluster the resulting voxels into connected clusters.
  – Small clusters are eliminated.
  – Remaining clusters become the ROIs.
  – Symmetry constraints may be imposed.
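The thresholding-and-clustering steps can be sketched as follows (a toy sketch: the GLM step is omitted, and the function name, threshold, and cluster-size cutoff are illustrative, not FSL's):

```python
import numpy as np
from scipy import ndimage

def rois_from_zmap(zmap, z_thresh=3.1, min_cluster=20):
    """Cluster suprathreshold voxels of a z-statistic map into ROIs (toy sketch)."""
    mask = zmap > z_thresh                    # voxels that "light up"
    labels, n = ndimage.label(mask)           # connected clusters
    rois = []
    for k in range(1, n + 1):
        cluster = labels == k
        if cluster.sum() >= min_cluster:      # eliminate small clusters
            rois.append(cluster)              # remaining clusters become ROIs
    return rois

# Toy z-map: one large blob (kept) and one isolated voxel (dropped)
zmap = np.zeros((20, 20, 20))
zmap[2:8, 2:8, 2:8] = 5.0
zmap[15, 15, 15] = 5.0
print(len(rois_from_zmap(zmap)))              # 1
```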

Page 5:

Simple point about latents…

• Note that if variables are all picked out this way and the model is entirely cyclic, there can be no latent variables! (Figure: the input I is correlated with the latent L.)

• Probably not all cycles, though…

Page 6:

Search Complexity: How Big is the Set of Possible Explanations?

(Figure: the possible edge configurations between a pair of variables X and Y.)

For N variables, with several possible configurations for each of the N(N-1)/2 pairs, the number of possible graphical models grows exponentially in N².
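To get a sense of the scale (assuming, purely for illustration, k possible states per variable pair), the count explodes quickly:

```python
# If each of the N(N-1)/2 variable pairs can independently be in one of
# k states (k = 4 here: no edge, ->, <-, <->; illustrative only), then:
def num_graphs(n, k=4):
    pairs = n * (n - 1) // 2
    return k ** pairs

for n in (3, 5, 10):
    print(n, num_graphs(n))   # 64, 1048576, then a 28-digit number for n = 10
```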

Page 7:

Statistical Complexity

• Graphical models are untestable unless parameterized into statistical models.
• Incomplete models of associations are likely to fail tests.
• Multiple testing problems.
• Multiple subjects / missing ROIs.
• No fast scoring method for mixed ancestral graphs that model feedback and latent common causes.
• Weak time-lag information.

Page 8:

Measurement Complexity

• Sampling rate is slower than causal interaction speed.
• Indirect measurement creates spurious associations of measured variables.

(Figure: a neural chain N1 → N2 → N3, each measured indirectly as X1, X2, X3; a regression of X3 on X1 and X2 yields a spurious X1 association.)
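A toy simulation of the second point (all coefficients and noise levels here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 50_000

# Neural chain N1 -> N2 -> N3; the X's are noisy, indirect measurements
n1 = rng.normal(size=T)
n2 = 0.8 * n1 + rng.normal(size=T)
n3 = 0.8 * n2 + rng.normal(size=T)
x1 = n1 + 0.5 * rng.normal(size=T)
x2 = n2 + 0.5 * rng.normal(size=T)
x3 = n3 + 0.5 * rng.normal(size=T)

# Regress X3 on X1 and X2. Although N1 affects N3 only through N2,
# measurement noise in X2 leaves a spurious direct X1 coefficient.
A = np.column_stack([x1, x2])
beta, *_ = np.linalg.lstsq(A - A.mean(0), x3 - x3.mean(), rcond=None)
print(beta)   # the X1 coefficient is clearly nonzero
```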

Page 9:

Specification Strategies

1. Guess a model and test it.
2. Search the model space or some restriction of it.
   a. Search for the full parameterized structure.
   b. Search for graphical structure alone.
   c. Search for graphical features (e.g., adjacencies).

Page 10:

What Evidence of What Works, and Not?

• Theory.
  – Limiting correctness of algorithms (PC, FCI, GES, LiNGAM, etc., under usually incorrect assumptions for fMRI).
• Prior Knowledge
  – Do automated search results conform with established relationships?
• Animal Experiments (limited)
• Simulation Studies

Page 11:

Brief Review: Smith’s Simulation Study

• 5 to 50 variables
• 28 simulation conditions, 50 subjects/condition.
• 38 search methods
• Search 1 subject at a time.

Page 12:

Methods tested by Smith

• DCM, SEM excluded; no search. (Not completely true.)
• Full correlation in various frequency bands
• Partial correlation
• Lasso (ICOV)
• Mutual Information, Partial MI
• Granger Causality
• Coherence
• Generalized Synchronization
• Patel's Conditional Dependence Measures
  – P(x|y) vs P(y|x)
• Bayes Net Methods
  – CCD, CPC, FCI, PC, GES
• LiNGAM

Page 13:

Smith’s Results

• Adjacencies:
  – Partial correlation methods (GLASSO) and several "Bayes Net" methods from CMU get ~90% correct in most simulations.
• Edge Directions:
  – Smith: "None of the methods is very accurate, with Patel's τ performing best at estimating directionality, reaching nearly 65% d-accuracy, all other methods being close to chance." (p. 883)
  – Most of the adjacencies for Patel's τ are false.

Page 14:

Simulation conditions (see handout)…

Page 15:

SIMULATION 2 (10 variables, 11 edges)

Page 16:

Simulation 4 (50 variables, 61 edges)

Page 17:

Simulation 7: 250 minutes, 5 variables

Page 18:

Simulation 8: Shared Inputs

Page 19:

Simulation 14: 5-Cycle

Page 20:

Simulation 15: Stronger Connections

Page 21:

Simulation 16: More Connections

Page 22:

Simulation 22: Nonstationary Connection Strengths

Page 23:

Simulation 24: One Strong External Input

Pages 24–25: (figures)

Page 26:

Take Away Conclusion?

• Nothing works!
• Methods that get adjacencies (90%) cannot get directions of influence.
• Methods that get directions (60%–70%) for normal session lengths cannot tell true adjacencies from false adjacencies.
• Even with unrealistically long sessions (4 hours), the best method gets 90% accuracy for directions but finds very few adjacencies.

Page 27:

Idea…

• If we could:
  – Increase sample size (effectively) by using data from multiple subjects
  – Focus on a method with strong adjacencies
  – Combine this with a method with strong orientations
• We may be able to do better (Ramsey, Hanson and Glymour, NeuroImage).
  – This is the strategy of the PC-LiNGAM algorithm of Hoyer and several of us, though there are other ways to pursue the same strategy.

Page 28:

Reminder: If noises are non-Gaussian, we can learn more than a pattern.

(1) Linear models, covariance data → Pattern/CPDAG

(2) Linear models, non-Gaussian noises (LiNG) → directed graph

Page 29:

Are noises for FMRI models non-Gaussian?

• Yes. This is controversial but shouldn't be.
  – For the word/pseudoword data of Xue and Poldrack (Task 3), kurtosis ranges up to 39.3 for residuals.
• There is a view in the literature that noises are distributed (empirically) as Gamma, say with shape 19 and scale 20.
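A quick check of this kind of claim (a sketch using SciPy; the Gamma parameters are the slide's example):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
gauss = rng.normal(size=5000)
gamma = rng.gamma(shape=19.0, scale=20.0, size=5000)  # the slide's example

# Anderson-Darling A^2 against the normal family: larger = less Gaussian
a2_gauss = stats.anderson(gauss, dist='norm').statistic
a2_gamma = stats.anderson(gamma, dist='norm').statistic
print(a2_gauss, a2_gamma)   # the Gamma sample scores far higher
```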

Page 30:

Are connection functions linear for FMRI data?

• You tell me:

• I’ve not done a thorough survey of studies.

Page 31:

Coefficients?

• One expects them to be positive.
  – Empirically, in linear models of fMRI data, there are very few negative coefficients (1 in 200, say).
  – They're only slightly negative if so.
  – This is consistent with negative coefficients occurring due to small-sample regression estimation errors.
• For the most part, they need to be less than 1.
  – Brain activations are cyclic and evolve over time.
  – Empirically, in linear models of fMRI, most coefficients are less than 1. To the extent that they're greater than 1, one suspects nonlinearity.

Page 32:

The IMaGES algorithm

• Adaptation for multiple subjects of GES, a Bayes net method tested by Smith et al.
• Iterative model construction using Bayesian scores separately on each subject at each step; the edge with the best average score is added.
• Tolerates ROIs missing in various subjects.
• Seeks feed-forward structure only.
• Finds adjacencies between variables with latent common causes.
• Forces sparsity by a penalized BIC score to avoid triangulated variables (see Measurement Complexity).
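The per-subject scoring idea can be sketched as a single greedy step (a toy linear-Gaussian stand-in, not the IMaGES implementation; the penalty constant and data layout are arbitrary):

```python
import numpy as np

def bic_gain(data, child, parents, new_parent):
    """BIC-style gain from adding new_parent -> child (linear-Gaussian toy)."""
    n = data.shape[0]
    def rss(preds):
        X = np.column_stack([data[:, list(preds)], np.ones(n)])
        beta, *_ = np.linalg.lstsq(X, data[:, child], rcond=None)
        r = data[:, child] - X @ beta
        return float(r @ r)
    gain = 0.5 * n * np.log(rss(parents) / rss(parents + [new_parent]))
    return gain - 2.0 * np.log(n)   # arbitrary penalty for the extra edge

def best_edge(datasets):
    """One greedy step: the edge whose gain, averaged over subjects, is largest."""
    n_vars = datasets[0].shape[1]
    best, best_score = None, 0.0
    for child in range(n_vars):
        for par in range(n_vars):
            if par != child:
                avg = np.mean([bic_gain(d, child, [], par) for d in datasets])
                if avg > best_score:
                    best, best_score = (par, child), avg
    return best

def subject(seed):
    rng = np.random.default_rng(seed)
    x0 = rng.normal(size=500)
    x1 = 0.9 * x0 + 0.3 * rng.normal(size=500)   # X0 -> X1 in every subject
    return np.column_stack([x0, x1])

print(best_edge([subject(s) for s in range(3)]))  # one of (0, 1) or (1, 0)
```

As with GES, a covariance-based score like this one can identify the adjacency but not, by itself, the direction; that is what the orientation step below is for.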

Page 33:

IMaGES/LOFS

• Smith (2011): "Future work might look to optimize the use of higher-order statistics specifically for the scenario of estimating directionality from fMRI data."
• LiNGAM orients edges by non-Normality of higher moments of the distributions of adjacent variables.
• LOFS uses the IMaGES adjacencies and the LiNGAM idea for directing edges (with a different score for non-Normality, and without independent components).
• Unlike IMaGES, LOFS can find cycles.
• LOFS (from our paper) is R1 and/or R2…

Page 34:

Procedure R1(S)

You don't have to read these; I'll describe them!

• G <- empty graph over the variables of S
• For each variable V:
  – Find the combination C of adj(V, S) that maximizes NG(eV | C).
  – For each W in C:
    • Add W -> V to G
• Return G
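A minimal sketch of R1, assuming linear residuals and the Anderson-Darling statistic as NG (the `adjacency` input would come from an adjacency search such as IMaGES; all names here are illustrative):

```python
import itertools
import numpy as np
from scipy import stats

def ng(x):
    """Non-Gaussianity score: Anderson-Darling A^2 of the standardized sample."""
    z = (x - x.mean()) / x.std()
    return stats.anderson(z, dist='norm').statistic

def residual(data, v, cond):
    """Residual of variable v after linear regression on the variables in cond."""
    if not cond:
        return data[:, v]
    X = np.column_stack([data[:, list(cond)], np.ones(len(data))])
    beta, *_ = np.linalg.lstsq(X, data[:, v], rcond=None)
    return data[:, v] - X @ beta

def r1(data, adjacency):
    """For each V, keep the subset C of adj(V) that maximizes NG(e_V | C)."""
    edges = set()
    for v, adj in adjacency.items():
        best_c, best_score = (), -np.inf
        for k in range(len(adj) + 1):
            for c in itertools.combinations(adj, k):
                score = ng(residual(data, v, c))
                if score > best_score:
                    best_c, best_score = c, score
        for w in best_c:
            edges.add((w, v))   # W -> V
    return edges

# Toy data: X0 -> X1 with uniform (hence non-Gaussian) noises
rng = np.random.default_rng(5)
x0 = rng.uniform(-1, 1, size=4000)
x1 = 0.8 * x0 + 0.6 * rng.uniform(-1, 1, size=4000)
data = np.column_stack([x0, x1])
print(r1(data, {0: [1], 1: [0]}))   # recovers the X0 -> X1 orientation
```

The intuition: regressing the effect on its true cause leaves the pure (maximally non-Gaussian) noise term, while regressing the cause on the effect leaves a mixture that is closer to Gaussian.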

Page 35:

Procedure R2(S)

• G <- empty graph over the variables of S
• For each pair of variables X, Y:
  – Scores <- empty
  – For each combination of adjacents C for X and Y:
    • If NG(eX|Y) < NG(X) & NG(eY|X) > NG(Y):
      – score <- NG(X) + NG(eY|X)
      – Add <X->Y, score> to Scores
    • If NG(eX|Y) > NG(X) & NG(eY|X) < NG(Y):
      – score <- NG(eX|Y) + NG(Y)
      – Add <X<-Y, score> to Scores
  – If Scores is empty:
    • Add X--Y to G.
  – Else:
    • Add to G the edge in Scores with the highest score.
• Return G
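A pairwise sketch of the R2 conditions (illustrative; the real procedure ranges over combinations of adjacents, not just the pair itself):

```python
import numpy as np
from scipy import stats

def ng(x):
    """Anderson-Darling A^2 of the standardized sample."""
    z = (x - x.mean()) / x.std()
    return stats.anderson(z, dist='norm').statistic

def resid(target, regressor):
    """Residual of target after simple linear regression on regressor."""
    b = np.cov(regressor, target)[0, 1] / np.var(regressor)
    return target - b * regressor

def r2_pair(x, y):
    """Orient one adjacent pair X, Y by the R2 conditions."""
    ngx, ngy = ng(x), ng(y)
    ng_ex_y = ng(resid(x, y))            # NG(eX|Y)
    ng_ey_x = ng(resid(y, x))            # NG(eY|X)
    scores = {}
    if ng_ex_y < ngx and ng_ey_x > ngy:  # evidence for X -> Y
        scores['X->Y'] = ngx + ng_ey_x
    if ng_ex_y > ngx and ng_ey_x < ngy:  # evidence for Y -> X
        scores['X<-Y'] = ng_ex_y + ngy
    return max(scores, key=scores.get) if scores else 'X--Y'

rng = np.random.default_rng(4)
x = rng.uniform(-1, 1, size=5000)                  # non-Gaussian cause
y = 0.8 * x + 0.6 * rng.uniform(-1, 1, size=5000)  # true model: X -> Y
print(r2_pair(x, y))
```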

Page 36:

Non-Gaussianity Scores

• Log cosh – used in ICA
• Exp = -e^(-X²/2) – used in ICA
• Kurtosis – ICA (one of the first tried, not great)
• Mean absolute – PC LiNGAM
• E(e^X) – cumulant arithmetic
  = e^(κ₁(X) + (1/2!)κ₂(X) + (1/3!)κ₃(X) + …)
• Anderson-Darling A² – LOFS
  – Empirical Distribution Function (EDF) score with heavy weighting on the tails.
  – We're using this one!
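For concreteness, these scores can be computed on a standardized sample like so (a sketch; each statistic is read as a distance from Gaussianity):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.gamma(shape=2.0, scale=1.0, size=10_000)   # a skewed, non-Gaussian sample
z = (x - x.mean()) / x.std()                       # standardize first

logcosh = np.mean(np.log(np.cosh(z)))              # log cosh (ICA contrast)
expsc = np.mean(-np.exp(-z ** 2 / 2))              # "exp" contrast (ICA)
kurt = stats.kurtosis(z)                           # excess kurtosis
meanabs = np.mean(np.abs(z))                       # mean absolute value
a2 = stats.anderson(z, dist='norm').statistic      # Anderson-Darling A^2

print(logcosh, expsc, kurt, meanabs, a2)
```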

Page 37:

Mixing Residuals

• We are assuming that residuals for ROIs from different subjects are drawn from the same population, so that they can be mixed.
• Sometimes we center residuals from different subjects before mixing, sometimes not.
• For the Smith study it doesn't matter; the data is already centered!

Page 38:

Precision and Recall

• Precision = true positives / all positives
  – What fraction of the guys you found were correct?
• Recall = true positives / all true guys
  – What fraction of the correct guys did you find?
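On edge sets, the two measures are one-liners:

```python
def precision_recall(found, true):
    """Precision and recall of a set of found edges against the true edges."""
    tp = len(found & true)
    return tp / len(found), tp / len(true)

true = {(0, 1), (1, 2), (2, 3)}
found = {(0, 1), (1, 2), (3, 1)}   # two right, one wrong, one missed
print(precision_recall(found, true))   # (2/3, 2/3)
```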

Pages 39–43: (results figures)

Page 44:

LiNG, KPC

• It has been suggested that LiNG (the cyclic version of LiNGAM) be tried on this type of data.
  – LiNG typically does not come back with an answer, or if it does, it does not find a stable graph.
  – Like LiNGAM, it cannot be scaled up to multiple subjects.
• KPC?
  – I've applied KPC to the first Smith simulation (5 variables, single subject).
  – Results have been much less accurate than IMaGES/LOFS.
  – On larger simulations it doesn't come back.
  – Cannot be scaled up to multiple subjects (?).

Page 45:

Some Further Problems

• Discovering nearly canceling 2-cycles is hard (but we will try anyway…)
• Identifying latent variables for acyclic models
• Reliability of search may be worse with event designs than with block designs
• Subjects that differ in causal structure will yield poor results for multi-subject methods.

Page 46:

• S. M. Smith, K. L. Miller, G. Salimi-Khorshidi, M. Webster, C. F. Beckmann, T. E. Nichols, J. D. Ramsey, M. W. Woolrich (2011), Network modelling methods for fMRI, NeuroImage.

• J. D. Ramsey, S. J. Hanson, C. Hanson, Y. O. Halchenko, R. A. Poldrack, and C. Glymour (2010), Six problems for causal inference from fMRI, NeuroImage.

• J. D. Ramsey, S. J. Hanson, C. Glymour, Multi-subject search correctly identifies causal connections and most causal directions in the DCM models of the Smith et al. simulation study, NeuroImage.

• G. Xue, R. Poldrack (2007), The neural substrates of visual perceptual learning of words: implications for the visual word form area hypothesis, J. Cogn. Neurosci.

• Thanks to the James S. McDonnell Foundation.

Thanks!