PsPM preliminaries: retrodictive validity & why do we need this?

Dominik R Bach
Wellcome Centre for Human Neuroimaging & Max Planck UCL Centre for Computational Psychiatry and Ageing, University College London

Clinical Research Priority Program "Synapse & Trauma" & Department of Psychiatry, Psychotherapy, and Psychosomatics, University of Zurich

06.04.2020

[email protected]

@bachlab_cog


Threat learning as preclinical model

Post-traumatic stress disorder

Specific phobias


Measuring fear learning

10 different conditioned responses in the human literature:
• ANS measures: skin conductance, pupil dilation, bradycardia, respiration amplitude
• Motor behaviour: modulation of startle eye blink, gaze patterns, limb withdrawal
• Cognitive measures: reaction times in detection tasks, modulation of instrumental behaviour (PIT)
• Meta-cognition: reported contingency

• Lesion studies: (macroscopically) different neural circuits for learning [1]
• Computational studies: possibly different learning algorithms/quantities [1]
• Methodological studies: different signal-to-noise ratio [2-4]

Measure                 d [4]
SCR, peak scoring       0.44
SCR, model-based        0.75
HPR, model-based        0.97
RAR, model-based        0.61
PSR, model-based        0.82
SEBR, peak scoring      1.00
SEBR, model-based       1.17

[1] Ojala & Bach (pre-print), [2] Bach & Friston (2013) Psychophysiology, [3] Bach et al. (2018) Psychophysiology, [4] Bach & Melinscak (2020) Beh Res Ther


Data pre-processing choices

Data pre-processing not standardised:
• > 15 ways of indexing "fear extinction" [1]
• > 20 ways of excluding "non-learners" [2]
• > 10 ways of excluding outlier reaction times [3]

Small choices dramatically affect conclusions:
• Multiverse analysis: 210 plausible alternatives to one data processing pipeline; 6%-50% of all options lead to the reported significant outcomes [4]
• Crowdsourcing data analysis: "Are soccer referees more likely to give red cards to dark-skin-toned players than to light-skin-toned players?" 29 teams, 20 significant results, estimated odds ratio: 0.89-2.23 [5]

Flexible data analysis massively increases false positives:
• Simulations: common data processing and analysis practices ("follow the data") lead to 80% probability of a "trend-level" result, and 60% probability of a "significant" result [3]
• Evolutionary modelling: problematic data analysis practice is naturally selected (through progeny and selection for high output rates) despite incentives to not "cheat" [6]

Pre-registration may not solve the problem:
• Regulates false-positive rate but conclusions are still arbitrary

[1] Lonsdorf et al. (2019), [2] Lonsdorf et al. (2020), [3] Simmons et al. (2011), [4] Steegen et al. (2016), [5] Silberzahn et al. (2018), [6] Smaldino & McElreath (2016)
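The claim that flexible analysis inflates false positives can be illustrated with a small simulation. This is a minimal sketch and not the simulation from [3]: it assumes that "flexibility" means running several independent analyses on null data and reporting whichever one reaches significance, and it uses a z-test with known unit variance. The function names and all parameter values are hypothetical.

```python
import math
import random


def z_test_p(sample):
    """Two-sided p-value of a one-sample z-test against mean 0 (unit variance assumed)."""
    z = sum(sample) / math.sqrt(len(sample))
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(n_outcomes, n_subjects=20, n_sims=2000, alpha=0.05, seed=1):
    """Share of null simulations in which at least one of n_outcomes tests is significant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Every analysis sees pure noise, so any significant result is a false positive.
        p_values = [
            z_test_p([rng.gauss(0.0, 1.0) for _ in range(n_subjects)])
            for _ in range(n_outcomes)
        ]
        if min(p_values) < alpha:
            hits += 1
    return hits / n_sims


fixed = false_positive_rate(n_outcomes=1)     # one pre-specified analysis
flexible = false_positive_rate(n_outcomes=5)  # report whichever of 5 analyses "works"
print(f"fixed: {fixed:.3f}, flexible: {flexible:.3f}")
```

With one pre-specified analysis the rate stays near the nominal alpha. Picking the best of several analyses pushes it well above that, even though no single analysis is wrong in itself.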


Topics

• Evaluating measurement methods: latent variables, classic psychometrics, retrodictive validity
• Calibrating measurement methods: optimised measurement, experimental design, power analysis
• Measurement models in psychophysiology: heuristic and formal models
• Psychophysiological modelling: general concepts & formalism, development, application


Forward and inverse perspective

Forward perspective: does aversive memory influence SCR?
[Diagram: CS+/CS-, US, Memory] SCR difference between CS+/CS-?

Inverse perspective: does my procedure establish aversive memory (measured by SCR)?
[Diagram: CS+/CS-, US, Memory] Memory difference between CS+/CS-?


Latent variables and true scores

[Diagram labels: latent attribute, observable(s)]


Measurement model

[Diagram: true score t, observable x, estimate y := t̂ = f(x)]

Classical true score theory: x = t + ϵ; y = x

Heuristic models: CS memory := SCRpeak – SCRtrough

Formal measurement models:
• item-response theory (Embretson & Reise, 2013)
• expected utility models in behavioural economics (Camerer, 1995)
• drift-diffusion models in decision psychology (Forstmann, Ratcliff, & Wagenmakers, 2016)
• psychophysiological models (Bach, Castegnetti, et al., 2018; Bach & Friston, 2013)
• associative learning models (Mathys, Daunizeau, Friston, & Stephan, 2011)

Generic formalism: structural equation models (Bollen, 1989; Muthén, 2002)



Evaluation of a measurement model

t y


Construct validity: the nomological net

[Diagram labels: t, y, latent attribute 1, latent attribute 2, more stable attribute, observable; links labelled "concurrent validity", "predictive validity" and "?"]

Problems:
1. Relations are not quantitatively defined
2. No theory how to interpret small changes in several of these relationships

Cronbach & Meehl (1955), Campbell & Fiske (1959), van der Maas et al. (2011), Eid et al. (2016)


Reliability

Reliability assesses precision, not accuracy

Example: IQ := length(index finger). This can be measured very reproducibly, yet it says nothing about IQ.

Cronbach & Meehl (1955); Brandmaier et al. (2018)
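The finger-length example can be made concrete with simulated data. This is a minimal sketch under assumed numbers (population means, spreads and the size of the ruler error are all hypothetical): finger length is generated independently of IQ, measured twice with a small error, and then correlated with itself (reliability) and with the true IQ (validity).

```python
import random


def pearson(a, b):
    """Pearson correlation of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


rng = random.Random(0)
n_people = 500
true_iq = [rng.gauss(100, 15) for _ in range(n_people)]
finger_mm = [rng.gauss(75, 5) for _ in range(n_people)]  # unrelated to IQ by construction

# Two repeated measurements of the same finger with a small ruler error.
measure_1 = [f + rng.gauss(0, 0.5) for f in finger_mm]
measure_2 = [f + rng.gauss(0, 0.5) for f in finger_mm]

reliability = pearson(measure_1, measure_2)  # agreement between repeated measurements
validity = pearson(true_iq, measure_1)       # agreement with the attribute of interest
print(f"reliability: {reliability:.2f}, validity: {validity:.2f}")
```

The repeated measurements agree almost perfectly while their correlation with the true score stays near zero, which is the sense in which reliability reflects precision and not accuracy.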


Retrodictive validity

[Diagram labels: t, y, more stable attribute, latent attribute 1, latent attribute 2, observable; links labelled "concurrent validity", "predictive validity" and "?"]


ρ_{t,y} := Cor(t, y)

Experimental manipulation: intended values e

ρ_{e,y} := Cor(e, y)


Evaluation of a measurement model

ρ_{t,y} := Cor(t, y)

• Accuracy: for different t, high correlation between t and averaged y
• Precision: for fixed t, high correlation between t and individual values of y

Under variation of t, Cor(t, y) measures joint accuracy and precision.


ρ_{t,y} := Cor(t, y); ρ_{e,y} := Cor(e, y)

Experimental aberration ω (between intended score e and true score t); measurement error ϵ (between true score t and estimated score y)

[Figure: scatter plots of true score t against intended score e, and of estimated score y against true score t]





Bach, Melinscak, Fleming, Voelkle (pre-print)

$\rho_{e,y}$ says something about $\rho_{t,y}$, the correlation between true and estimated scores. We will see that this depends on $\rho_{\omega,\epsilon} = \mathrm{Cor}(\omega_{ij}, \epsilon_{ij})$, the correlation between the experimental aberration and the total measurement error.

Lemma.

(1) For two vectors of estimated scores, $y$ and $y'$, if $\rho_{e,y} > \rho'_{e,y}$ and $\rho_{\omega,\epsilon} = \rho'_{\omega,\epsilon} = 0$, then $\rho_{t,y} > \rho'_{t,y}$.

(2) Let the Frobenius norm $\|(e_{ij})\| = 1$ and $\sum e_{ij} = 0$. If $\rho_{\omega,\epsilon} = 0$, then $\rho_{t,y} = \sqrt{1 + \|\omega\|^2}\,\rho_{e,y}$.

(3) If $\rho_{e,y} > \rho'_{e,y}$ and $\rho_{\omega,\epsilon} \neq 0$ and/or $\rho'_{\omega,\epsilon} \neq 0$, then at least one of the following statements is true:

(a) $\rho_{t,y} > \rho'_{t,y}$ and $\rho_{\omega,\epsilon} > -b$;

(b) $\rho_{\omega,\epsilon} < \rho'_{\omega,\epsilon}$ and $\rho_{\omega,\epsilon} > -b$;

(c) $\rho_{\omega,\epsilon} \le -b$.

[$b$ stands for the same bound in (a) to (c), a ratio formed from the norms of $\epsilon$ and $\omega$; its exact expression is illegible in the source.]

A geometrical proof is given in the appendix.

In the following, we explain this Lemma and give an intuition about how it can be used. In general it is reasonable to assume $\rho_{\omega,\epsilon} = 0$, i.e. that the correlation between the experimental aberration and measurement errors is zero. In this case, increasing $\rho_{e,y}$ also increases $\rho_{t,y}$. This is a standard case and will apply in most circumstances. Otherwise, if $\rho_{\omega,\epsilon}$ is positive, or the measurement error is large compared to the experimental aberration,
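Parts (1) and (2) of the Lemma can be checked numerically. This is a minimal sketch under assumptions that the excerpt does not spell out: an additive model t = e + ω and y = t + ϵ, with ω and ϵ constructed to be exactly uncorrelated with e and with each other (by Gram-Schmidt orthogonalisation). All helper names and the chosen vector norms are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(v):
    return math.sqrt(dot(v, v))


def centre(v):
    mean = sum(v) / len(v)
    return [x - mean for x in v]


def orthogonal_to(v, basis, length):
    """Centre v, remove its projection on each basis vector, rescale to the given norm."""
    v = centre(v)
    for b in basis:
        coef = dot(v, b) / dot(b, b)
        v = [x - coef * y for x, y in zip(v, b)]
    scale = length / norm(v)
    return [x * scale for x in v]


def cor(a, b):
    a, b = centre(a), centre(b)
    return dot(a, b) / (norm(a) * norm(b))


rng = random.Random(42)
n = 50


def draw():
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


e = orthogonal_to(draw(), [], 1.0)              # intended scores: sum 0, norm 1
omega = orthogonal_to(draw(), [e], 0.5)         # aberration, uncorrelated with e
eps_a = orthogonal_to(draw(), [e, omega], 1.0)  # method A: small measurement error
eps_b = orthogonal_to(draw(), [e, omega], 2.0)  # method B: large measurement error

t = [a + b for a, b in zip(e, omega)]
y_a = [a + b for a, b in zip(t, eps_a)]
y_b = [a + b for a, b in zip(t, eps_b)]

# Part (2): Cor(t, y) equals sqrt(1 + ||omega||^2) * Cor(e, y) when Cor(omega, eps) = 0.
predicted = math.sqrt(1 + norm(omega) ** 2) * cor(e, y_a)
print(cor(t, y_a), predicted)

# Part (1): the method with higher Cor(e, y) also has higher Cor(t, y).
print(cor(e, y_a) > cor(e, y_b), cor(t, y_a) > cor(t, y_b))
```

Because t is unobservable in a real experiment, only Cor(e, y) can be computed there. The simulation has access to t, which is what lets the relation be verified directly.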







Topics


Calibration experiment

[Diagram: CS+ / US / Memory; CS- / No US / Memory]

• Calibration experiment with intended values of dependent psychological variable.
• Derive estimate y of true score t from observable (e.g. heuristic processing or measurement model).
• Correlate intended values e (from design) with estimated values y (from observable). Better method yields higher retrodictive validity.
• Unless aberration and error are correlated, higher retrodictive validity means higher correlation with true score, and thus jointly higher accuracy and precision.

Remark: if two measures have exactly the same retrodictive validity, the one with higher precision will have higher reliability.
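The calibration logic above can be sketched on simulated data. The example assumes intended values e = 1 for CS+ and e = 0 for CS-, an additive aberration and additive measurement error, and two hypothetical scoring methods A and B that differ only in the size of their error. None of the numbers come from the lecture.

```python
import random


def pearson(a, b):
    """Pearson correlation of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


rng = random.Random(7)
n_per_condition = 200

# Intended values from the design: CS+ trials coded 1, CS- trials coded 0.
e = [1.0] * n_per_condition + [0.0] * n_per_condition
# True scores deviate from the intended values (experimental aberration).
t = [value + rng.gauss(0, 0.3) for value in e]
# Two hypothetical methods estimate t with different amounts of measurement error.
y_a = [value + rng.gauss(0, 0.5) for value in t]
y_b = [value + rng.gauss(0, 1.5) for value in t]

retro_a, retro_b = pearson(e, y_a), pearson(e, y_b)  # computable in a real experiment
truth_a, truth_b = pearson(t, y_a), pearson(t, y_b)  # only available in simulation
print(f"Cor(e, y): A = {retro_a:.2f}, B = {retro_b:.2f}")
print(f"Cor(t, y): A = {truth_a:.2f}, B = {truth_b:.2f}")
```

The method that ranks higher on Cor(e, y) also ranks higher on Cor(t, y) here, because aberration and error were generated independently.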


Methods evaluation

[Diagram: CS+ / US / Memory; CS- / No US / Memory]

Calibration experiment with intended values of dependent psychological variable

Derive estimate y of true score t from observable (e.g. heuristic processing or measurement model).

• Compare heuristic methods
• Compare measurement models
• Compare observables (as long as they measure the same thing)
• Machine-learning approach to measurement models (does not generalise - yet)




Calibration data and iteration



Power analysis

• Intervention effect size: aberration in paradigm, measurement error, variability of the intervention

• Maximum effect size when intervention variability is zero

• Best-case power analysis: sample size often much higher than what is standard in the field

Bach, Tzovara, Vunder (2017) Molecular Psychiatry
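A rough sketch of how these error sources feed into sample size. This is not the procedure of Bach, Tzovara, Vunder (2017): it assumes that aberration, measurement error and intervention variability add as independent variances, and it uses the textbook normal approximation for a two-sided, two-group comparison. The function names and example numbers are hypothetical.

```python
import math
from statistics import NormalDist


def effect_size(raw_effect, sd_aberration, sd_error, sd_intervention=0.0):
    """Standardised effect size under an assumed additive, independent noise model."""
    total_sd = math.sqrt(sd_aberration ** 2 + sd_error ** 2 + sd_intervention ** 2)
    return raw_effect / total_sd


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate group size for a two-sided two-sample test (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)


# Best case: the intervention works identically in everyone (zero variability).
d_best = effect_size(raw_effect=1.0, sd_aberration=0.5, sd_error=1.0)
# More realistic: the intervention effect itself varies between participants.
d_variable = effect_size(raw_effect=1.0, sd_aberration=0.5, sd_error=1.0, sd_intervention=1.0)

n_best = n_per_group(d_best)
n_variable = n_per_group(d_variable)
print(f"best case: d = {d_best:.2f}, n per group = {n_best}")
print(f"variable intervention: d = {d_variable:.2f}, n per group = {n_variable}")
```

The best-case figure is a lower bound on the required sample size: any variability in the intervention shrinks the standardised effect and raises it.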


Keep error constant - minimise aberration

• Evaluate experimental designs: how well can the psychological variable be measured?
• Compare lab standards: how well can the psychological variable be measured in my lab?

ρ_{t,y} := Cor(t, y); ρ_{e,y} := Cor(e, y)

Melinscak & Bach (2020) PLoS Computational Biology







Topics


Measurement from continuous observables

Forward perspective: does aversive memory influence SCR?
• Time-bin wise analysis

Inverse perspective: does my procedure establish aversive memory (measured by SCR)?
• Condense data time series into one estimate


Heuristic methods

CS memory := SCRpeak – SCRtrough

Heuristic analysis: selection of data features based on informal model.

Problems: (1) information loss (2) usually not evaluated
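One possible reading of the peak-minus-trough rule, as a minimal sketch. Scoring conventions differ between labs, so the window definition and the rule "largest rise from a trough to a later peak" are assumptions made for illustration, and `peak_minus_trough` is a hypothetical helper, not a PsPM function.

```python
def peak_minus_trough(signal, sampling_rate, window_start, window_end):
    """Largest rise from a trough to a later peak within the window (0.0 if no rise).

    signal: samples of the skin conductance trace
    sampling_rate: samples per second
    window_start, window_end: response window in seconds relative to the first sample
    """
    first = int(round(window_start * sampling_rate))
    last = int(round(window_end * sampling_rate))
    segment = signal[first:last + 1]
    if not segment:
        raise ValueError("response window contains no samples")

    best_rise = 0.0
    trough = segment[0]
    for value in segment[1:]:
        trough = min(trough, value)                 # lowest point seen so far
        best_rise = max(best_rise, value - trough)  # rise from that trough to this sample
    return best_rise


# Toy trace sampled at 1 Hz: dips to 0.1, rises to 0.9, then recovers.
trace = [0.2, 0.1, 0.1, 0.4, 0.9, 0.7, 0.5]
score = peak_minus_trough(trace, sampling_rate=1, window_start=0, window_end=6)
print(score)
```

Everything in the trace apart from two samples is discarded, which is the information loss named above.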


Psychophysiological modelling

[Diagram label: Memory]

PsPM: estimates the most likely (ML) psychological variable, given the entire data time series and a standard response model.







Topics


Basic formalism

Psychological variable → (neural model) → Neural activity → (peripheral LTI model) → Physiological signal

Neural model examples:
• Instantaneous impulse with constant latency
• Short Gaussian impulse

Bach & Friston (2013), Bach et al. (2018)
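The chain above can be sketched as a forward model. The psychological variable sets the amplitude of an instantaneous neural impulse, and a linear time-invariant (LTI) system turns the impulse train into a signal by convolution. The response function used here, t·exp(-t/τ), is a made-up placeholder and not PsPM's canonical response function. All names and parameter values are hypothetical.

```python
import math


def response_function(n_samples, sampling_rate, time_to_peak=3.0):
    """Placeholder impulse response h(t) = t * exp(-t / time_to_peak), peak scaled to 1."""
    h = [(k / sampling_rate) * math.exp(-(k / sampling_rate) / time_to_peak)
         for k in range(n_samples)]
    peak = max(h)
    return [value / peak for value in h]


def neural_input(n_samples, sampling_rate, onsets, amplitudes, latency=0.0):
    """Neural model: one instantaneous impulse per event, at a constant latency."""
    u = [0.0] * n_samples
    for onset, amplitude in zip(onsets, amplitudes):
        index = int(round((onset + latency) * sampling_rate))
        if 0 <= index < n_samples:
            u[index] += amplitude
    return u


def convolve(u, h):
    """Peripheral LTI model: causal discrete convolution, truncated to len(u)."""
    y = [0.0] * len(u)
    for i, u_i in enumerate(u):
        if u_i == 0.0:
            continue
        for j, h_j in enumerate(h):
            if i + j >= len(u):
                break
            y[i + j] += u_i * h_j
    return y


sampling_rate = 10                                     # Hz
n_samples = 60 * sampling_rate                         # 60 s of data
h = response_function(20 * sampling_rate, sampling_rate)
u = neural_input(n_samples, sampling_rate, onsets=[5.0, 30.0], amplitudes=[1.0, 0.5])
signal = convolve(u, h)
print(max(signal[:300]), max(signal[300:]))
```

Because the system is linear, the peak of each evoked response scales with the impulse amplitude. Model inversion relies on this to recover the amplitudes from the data.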


Hybrid approaches

Psychological variable → Neural activity → Physiological signal

Ledalab, cvxEDA:
• model-based estimation of neural activity
• heuristic method to relate to psychological variable

Evaluation for evoked SCR:
• Ledalab not systematically better than peak-scoring
• PsPM decisively better than Ledalab or peak-scoring

Alexander et al. (2005), Benedek & Kaernbach (2010a, b), Greco et al. (2016), Bach (2014), Green et al. (2014)


PsPM development: summary

Establish suitable forward model (PsPM)
• Which psychological variable impacts on peripheral measure?
• Formalise forward model in mathematical form

Develop inversion algorithm
• Estimates most likely value of the psychological variable, given data & PsPM
• Usually GLM, sometimes non-linear inversion using Variational Bayes

Evaluate and optimise PsPM & inversion
• Empirically determine retrodictive validity
• Optimise method to yield empirically minimal variance estimator of

[Diagram: CS+ / US / Memory; CS- / No US / Memory]

Bach & Friston (2013), Bach et al. (2018)
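The GLM inversion step can be sketched in a few lines. The example builds one regressor per condition by placing a response function at each event onset, simulates a noisy signal with known amplitudes, and recovers them by ordinary least squares. The response shape, onsets, amplitudes and noise level are all hypothetical, and PsPM's actual GLM differs (it uses canonical response functions and filtering, among other things).

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def regressor(onsets, n_samples, sampling_rate, time_to_peak=3.0, kernel_seconds=20.0):
    """Sum of placeholder responses t * exp(-t / time_to_peak), one per event onset."""
    n_kernel = int(kernel_seconds * sampling_rate)
    h = [(k / sampling_rate) * math.exp(-(k / sampling_rate) / time_to_peak)
         for k in range(n_kernel)]
    peak = max(h)
    x = [0.0] * n_samples
    for onset in onsets:
        start = int(round(onset * sampling_rate))
        for j, h_j in enumerate(h):
            if start + j >= n_samples:
                break
            x[start + j] += h_j / peak
    return x


def fit_two_regressors(x1, x2, y):
    """Ordinary least squares for y ≈ b1*x1 + b2*x2 via the 2x2 normal equations."""
    a11, a12, a22 = dot(x1, x1), dot(x1, x2), dot(x2, x2)
    c1, c2 = dot(x1, y), dot(x2, y)
    det = a11 * a22 - a12 ** 2
    return (a22 * c1 - a12 * c2) / det, (a11 * c2 - a12 * c1) / det


sampling_rate = 10
n_samples = 120 * sampling_rate
x_plus = regressor([10.0, 50.0, 90.0], n_samples, sampling_rate)    # CS+ trials
x_minus = regressor([30.0, 70.0, 110.0], n_samples, sampling_rate)  # CS- trials

# Simulated data: known amplitudes (1.0 for CS+, 0.4 for CS-) plus Gaussian noise.
rng = random.Random(11)
data = [1.0 * p + 0.4 * m + rng.gauss(0, 0.05) for p, m in zip(x_plus, x_minus)]

b_plus, b_minus = fit_two_regressors(x_plus, x_minus, data)
print(f"CS+ amplitude: {b_plus:.2f}, CS- amplitude: {b_minus:.2f}")
```

The estimate uses every sample of the time series, in contrast to a heuristic score that keeps only a few data features.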


Hierarchical summary statistics approach

Estimate parameters for each subject, per condition or per trial, then test parameters at the group level
• Conceptually similar to standard ("operational") analysis for SCR, RT, ...
• Same approach used in many fMRI packages (e.g. SPM)
• Statistics are done on the estimated psychological variable
• Noise in the original data is discarded and not used for statistical tests

Hierarchical parameter estimation
• It would in principle be possible to estimate parameters on the group level and test against explained variance in the data (as in some fMRI packages)
• However, there are conceptual and statistical problems associated with this approach: e.g. higher model complexity required, degrees of freedom reduced due to autocorrelations
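A minimal sketch of the summary statistics approach. First-level estimates are simulated here (in practice they would come from the per-subject model inversion), and the group level is a one-sample t-test on the CS+ minus CS- difference. The effect sizes, spreads and number of subjects are hypothetical.

```python
import math
import random


def one_sample_t(values):
    """t statistic and degrees of freedom for testing the mean of values against 0."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean / math.sqrt(variance / n), n - 1


rng = random.Random(3)
n_subjects = 20

# First level: one estimate per subject and condition (simulated stand-ins).
differences = []
for _ in range(n_subjects):
    cs_plus = 1.0 + rng.gauss(0, 0.3)
    cs_minus = 0.4 + rng.gauss(0, 0.3)
    differences.append(cs_plus - cs_minus)

# Second level: only the per-subject estimates enter the group statistic.
t_value, df = one_sample_t(differences)
print(f"t({df}) = {t_value:.2f}")
```

The residual noise of each subject's time series never enters the group test. Only the between-subject spread of the estimates does.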


Application examples

Attentional variables <- pupil responses
• de Gee et al., 2017; de Gee, Knapen, & Donner, 2014

Fear memory <- SCR, SEBR, PSR, HPR
• Bach, Weiskopf, & Dolan, 2011; Bulganin, Bach, & Wittmann, 2014; Tzovara, Korn, & Bach, 2018; Bach, Tzovara, & Vunder, 2018; Staib & Bach, 2018; Staib et al., 2019; Xia et al., 2019; Bach et al., 2019

Arousal during decision making <- SCR
• Alvarez et al., 2015; Bach, 2015a; de Berker et al., 2016; Nicolle, Fleming, Bach, Driver, & Dolan, 2011; Talmi, Dayan, Kiebel, Frith, & Dolan, 2009

Arousal during perception <- SCR
• Bach, Seifritz, & Dolan, 2015; Hayes et al., 2013; Koban, Kusko, & Wager, 2018; Koban & Wager, 2016; Sulzer et al., 2013

Arousal during rest <- SCR
• Fan et al., 2012

Bach et al. (2018) Psychophysiology


Application tips

Which psychological variables can be inferred?
• Specific vs. unspecific responses
• Experimental design
• Convergent vs. divergent measures
• A priori definition of contrasts to test

Experimental requirements
• Trial order & timing (design optimisation)
• Number of participants (power analysis)

How is the model structured?
• By-trial vs. by-condition
• Condition estimates interpretable, or only contrasts?
• Meaning of different parameters of the neural model


PsPM pipeline

Flow: Analogue data recording → Digitisation → Recorded file → Import → PsPM file (data time-series, marker time stamps) → Preprocessing → Model inversion (GLM, non-linear models) → 1st (participant) level model files → 2nd-level t-test (2nd (group) level model file), or export parameters to SPSS, R, ... for a group-level model (t-test, ANOVA, LME, ...)

Preprocessing:
• Trim unnecessary data
• Detect missing fixation and exclude/(correct) pupil size
• Heart beat detection & interpolation
• Respiration cycle detection & interpolation
• Startle eyeblink EMG filtering and rectification

Notes:
• Recording: if possible, only anti-aliasing filter; high sampling rate if no anti-aliasing filter
• All necessary filters applied on-the-fly during model inversion
• Each step usually generates a new file with a prefix (SPM-style)


Summary

Psychological variable → (neural model) → Neural activity → Physiological signal

The "best possible" approximation to the true psychological variable.

Lecture 2: 09.04.2020
Lecture 3: 16.04.2020
Lecture 4: 23.04.2020
Lecture 5: 30.04.2020
Lecture 6: 07.05.2020
Lecture 7: 14.05.2020


Thank you!

Project team: Giuseppe Castegnetti, Samuel Gerster, Saurabh Khemka, Christoph Korn, Filip Melinščak, Karita Ojala, Philipp Paulus, Matthias Staib, Athina Tzovara, Yanfang Xia

Programmers: Laure Ciernik, Gabriel Gräni, Tobias Moser, Eshref Özdemir, Ivan Rojkov, Linus Rüttimann

Project collaborators: Jean Daunizeau, Ray Dolan, Mikael Elam, Guillaume Flandin, Steve Fleming, Karl Friston, Barbara Namer, Manuel Voelkle

Funders

