PsPM preliminaries: retrodictive validity & why do we need this?
Dominik R Bach
Wellcome Centre for Human Neuroimaging & Max Planck UCL Centre for Computational Psychiatry and Ageing, University College London
Clinical Research Priority Program "Synapse & Trauma" & Department of Psychiatry, Psychotherapy, and Psychosomatics, University of Zurich
06.04.2020
@bachlab_cog
Threat learning as preclinical model
Post-traumatic stress disorder
Specific phobias
10 different conditioned responses in the human literature
ANS measures: skin conductance, pupil dilation, bradycardia, respiration amplitude
Motor behaviour: modulation of startle eye blink, gaze patterns, limb withdrawal
Cognitive measures: reaction times in detection tasks, modulation of instrumental behaviour (PIT)
Meta-cognition: reported contingency
Lesion studies: (macroscopically) different neural circuits for learning [1]
Computational studies: possibly different learning algorithms/quantities [1]
Methodological studies: different signal-to-noise ratio [2-4]
Measuring fear learning
Measure               d [4]
SCR peak scoring      0.44
SCR model-based       0.75
HPR model-based       0.97
RAR model-based       0.61
PSR model-based       0.82
SEBR peak scoring     1.00
SEBR model-based      1.17
[1] Ojala & Bach (pre-print), [2] Bach & Friston (2013) Psychophysiology, [3] Bach et al. (2018) Psychophysiology, [4] Bach & Melinscak (2020) Beh Res Ther
Data pre-processing not standardised:
• > 15 ways of indexing "fear extinction" [1]
• > 20 ways of excluding "non-learners" [2]
• > 10 ways of excluding outlier reaction times [3]
Small choices dramatically affect conclusions:
• Multiverse analysis: 210 plausible alternatives to one data processing pipeline; 6%-50% of all options lead to the reported significant outcomes [4]
• Crowdsourcing data analysis: "Are soccer referees more likely to give red cards to dark-skin-toned players than to light-skin-toned players?" 29 teams, 20 significant results, estimated odds ratio: 0.89-2.23 [5]
Flexible data analysis massively increases false positives:
• Simulations: common data processing and analysis practices ("follow the data") lead to 80% probability of a "trend-level" result, and 60% probability of a "significant" result [3]
• Evolutionary modelling: problematic data analysis practice is naturally selected (through progeny and selection for high output rates) despite incentives to not "cheat" [6]
Pre-registration may not solve the problem:
• Regulates false-positive rate but conclusions are still arbitrary
Data pre-processing choices
[1] Lonsdorf et al. (2019), [2] Lonsdorf et al. (2020), [3] Simmons et al. (2011), [4] Steegen et al. (2016) [5] Silberzahn et al. (2018), [6] Smaldino & McElreath (2016)
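The claim that flexible analysis inflates false positives can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions that are not taken from the cited studies: four independent outcome measures, pure-noise data with known unit variance, and a z-test. It does not attempt to reproduce the 80% and 60% figures above; the function names and parameter values are hypothetical.

```python
import math
import random


def z_test_significant(sample, critical=1.96):
    """Two-sided z-test of 'mean = 0' for unit-variance data (alpha of about .05)."""
    z = sum(sample) / math.sqrt(len(sample))
    return abs(z) > critical


def false_positive_rates(n_sims=5000, n_obs=20, n_measures=4, seed=0):
    """Return (rate for one fixed measure, rate when any measure may be reported)."""
    rng = random.Random(seed)
    fixed = flexible = 0
    for _ in range(n_sims):
        # All measures are pure noise, so every 'significant' result is a false positive.
        hits = [
            z_test_significant([rng.gauss(0.0, 1.0) for _ in range(n_obs)])
            for _ in range(n_measures)
        ]
        fixed += hits[0]        # analyst committed to the first measure in advance
        flexible += any(hits)   # analyst reports whichever measure 'worked'
    return fixed / n_sims, flexible / n_sims


fixed_rate, flexible_rate = false_positive_rates()
print(f"pre-specified measure: {fixed_rate:.3f}")
print(f"best of four measures: {flexible_rate:.3f}")
```

With independent measures the flexible rate should approach 1 - 0.95^4, i.e. roughly 19%, compared with the nominal 5% for a pre-specified measure.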
Evaluating measurement methods: Latent variables, classic psychometrics, retrodictive validity
Calibrating measurement methods: Optimised measurement, experimental design, power analysis
Measurement models in psychophysiology: Heuristic and formal models
Psychophysiological modelling: General concepts & formalism, development, application
Topics
CS+/CS- US Memory SCR difference between CS+/CS-?
CS+/CS- US Memory Memory difference between CS+/CS-?
Forward perspective: does aversive memory influence SCR?
Inverse perspective: does my procedure establish aversive memory (measured by SCR)?
Forward and inverse perspective
Latent variables and true scores
Latent attribute Observable
Measurement model
Latent attribute Observable
Latent attribute t, observable x, estimate y := t̂ = f(x)
Classical true score theory: x = t + ϵ; y = x
Heuristic models: CS memory := SCRpeak – SCRtrough
Formal measurement models:
• item-response theory (Embretson & Reise, 2013)
• expected utility models in behavioural economics (Camerer, 1995)
• drift-diffusion models in decision psychology (Forstmann, Ratcliff, & Wagenmakers, 2016)
• psychophysiological models (Bach, Castegnetti, et al., 2018; Bach & Friston, 2013)
• associative learning models (Mathys, Daunizeau, Friston, & Stephan, 2011)
Generic formalism: structural equation models (Bollen, 1989; Muthén, 2002)
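The heuristic model CS memory := SCRpeak – SCRtrough can be written as a small function f(x) that maps a skin conductance time series to one estimate. The sketch below is illustrative only: the function name, the 1-4 s response window and the toy data are assumptions, not values from the slides or from any particular scoring standard.

```python
def peak_minus_trough(scr, sampling_rate_hz, onset_s, window_s=(1.0, 4.0)):
    """Heuristic estimate y = f(x): SCR peak minus the preceding trough.

    The response window (1-4 s after CS onset) is an illustrative assumption.
    """
    start = int(round((onset_s + window_s[0]) * sampling_rate_hz))
    stop = int(round((onset_s + window_s[1]) * sampling_rate_hz))
    segment = scr[start:stop]
    if not segment:
        raise ValueError("response window lies outside the recording")
    # Locate the peak, then take the lowest value that occurs before it.
    peak_index = max(range(len(segment)), key=segment.__getitem__)
    trough = min(segment[: peak_index + 1])
    return segment[peak_index] - trough


# Toy skin conductance trace sampled at 2 Hz, CS onset at 0 s (made-up numbers).
toy_scr = [1.0, 1.0, 0.9, 0.8, 1.0, 1.5, 2.0, 1.6, 1.2, 1.0]
cs_memory_estimate = peak_minus_trough(toy_scr, sampling_rate_hz=2, onset_s=0.0)
print(round(cs_memory_estimate, 3))
```

Everything outside the window, and the shape of the response inside it, is discarded, which is the information loss that motivates formal measurement models.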
Construct validity: the nomological net
t
y
Latent attribute 1
Latent attribute 2
More stable attribute Observable
„Concurrent validity“
„Predictive validity“
?
Problems:
1. Relations are not quantitatively defined
2. No theory how to interpret small changes in several of these relationships.
Cronbach & Meehl (1955), Campbell & Fiske (1959), van der Maas et al. (2011), Eid et al. (2016)
Reliability
Reliability assesses precision, not accuracy
Example: IQ := length(index finger)
Cronbach & Meehl (1955); Brandmaier et al. (2018)
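The finger-length example can be simulated to show how a measure can be highly reliable and still invalid. This is a sketch with made-up distributions (IQ with mean 100 and SD 15, finger length with mean 7.5 cm and SD 0.5 cm, ruler error of 0.05 cm); none of these numbers come from the slides.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


rng = random.Random(0)
n_people = 2000
true_iq = [rng.gauss(100.0, 15.0) for _ in range(n_people)]   # latent attribute t
finger_cm = [rng.gauss(7.5, 0.5) for _ in range(n_people)]    # unrelated to t by construction

# "IQ := length(index finger)", measured on two occasions with small ruler error.
test = [f + rng.gauss(0.0, 0.05) for f in finger_cm]
retest = [f + rng.gauss(0.0, 0.05) for f in finger_cm]

reliability = pearson(test, retest)   # test-retest correlation: precision
validity = pearson(true_iq, test)     # correlation with the true score: accuracy
print(f"reliability = {reliability:.2f}, correlation with true IQ = {validity:.2f}")
```

The test-retest correlation is close to 1 while the correlation with the true score is close to 0, so reliability alone cannot establish that a method measures the intended attribute.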
Retrodictive validity
t
y
More stable attribute
Latent attribute 1
Latent attribute 2
Observable
„Concurrent validity“
„Predictive validity“
?
Retrodictive validity
ρt,y := Cor(t, y)
t
y
Experimental manipulation: intended values e
ρe,y := Cor(e, y)
Accuracy: for different t, high correlation between t and averaged y
Precision: for fixed t, high correlation between t and individual values of y
Under variation of t, Cor(t, y) measures joint accuracy and precision.
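A small simulation can make the point that Cor(t, y) drops when either accuracy or precision is poor. The three estimators below are hypothetical constructions for illustration: one tracks t with little noise, one tracks t on average but is noisy, and one is precise but saturates so that its average does not follow t. Noise levels and the saturation point are arbitrary assumptions.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


rng = random.Random(0)
# True scores t take the values 0..4, with 400 observations per value.
true_scores = [float(level) for level in range(5) for _ in range(400)]

accurate_precise = [t + rng.gauss(0.0, 0.2) for t in true_scores]
accurate_imprecise = [t + rng.gauss(0.0, 2.0) for t in true_scores]          # noisy
precise_inaccurate = [min(t, 1.0) + rng.gauss(0.0, 0.2) for t in true_scores]  # saturates at t = 1

rho_good = pearson(true_scores, accurate_precise)
rho_imprecise = pearson(true_scores, accurate_imprecise)
rho_inaccurate = pearson(true_scores, precise_inaccurate)
print(f"{rho_good:.2f} {rho_imprecise:.2f} {rho_inaccurate:.2f}")
```

Only the estimator that is both accurate and precise reaches a correlation near 1; the single number Cor(t, y) therefore penalises both kinds of failure.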
Evaluation of a measurement model
t y
ρt,y := Cor(t, y)
Retrodictive validity
ρt,y := Cor(t, y)
t
y
ρe,y := Cor(e, y)
e
Experimental aberration ω
Measurement error ϵ
[Figure: scatter plots of true score t against intended score e, estimated score y against true score t, and estimated score y against intended score e]
Bach, Melinscak, Fleming, Voelkle (pre-print)
ρe,y says something about ρt,y, the correlation between true and estimated scores. We will see that this depends on ρω,ϵ = Cor(ωij, ϵij), the correlation between the experimental aberration and the total measurement error.

Lemma.
(1) For two vectors of estimated scores, y and ỹ, if ρe,y > ρ̃e,y and ρω,ϵ = ρ̃ω,ϵ = 0, then ρt,y > ρ̃t,y.
(2) Let the Frobenius norm ‖(eij)‖ = 1 and ∑ eij = 0. If ρω,ϵ = 0, then ρt,y = √(1 + ‖ω‖²) ρe,y.
(3) If ρe,y > ρ̃e,y and ρω,ϵ ≠ 0 and/or ρ̃ω,ϵ ≠ 0, then at least one of the following statements is true:
(a) ρt,y > ρ̃t,y and ρω,ϵ > −b;
(b) ρω,ϵ < ρ̃ω,ϵ and ρω,ϵ > −b;
(c) ρω,ϵ ≤ −b.
Here b stands for the bound that appears in all three statements, a ratio formed from the norms of the measurement error ϵ and the experimental aberration ω; its exact form is given in the cited pre-print.

A geometrical proof is given in the appendix.

In the following, we explain this Lemma and give an intuition about how it can be used. In general it is reasonable to assume ρω,ϵ = 0, i.e. that the correlation between the experimental aberration and measurement errors is zero. In this case, increasing ρe,y also increases ρt,y. This is a standard case and will apply in most circumstances. Otherwise, if ρω,ϵ is positive, or the measurement error is large compared to the experimental aberration,
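Part (2) of the Lemma can be checked numerically. The sketch below reads part (2) as ρt,y = √(1 + ‖ω‖²) · ρe,y and assumes an additive structure t = e + ω and y = t + ϵ, with ω and ϵ constructed to be exactly uncorrelated with each other and with e. The additive structure and the extra orthogonality to e are assumptions of this sketch, not statements taken from the excerpt; all helper names are hypothetical.

```python
import math
import random


def center(v):
    """Subtract the mean so that the vector sums to zero."""
    m = sum(v) / len(v)
    return [x - m for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(v):
    return math.sqrt(dot(v, v))


def remove_component(v, basis):
    """Subtract from v its projection onto each (mutually orthogonal) basis vector."""
    for b in basis:
        coef = dot(v, b) / dot(b, b)
        v = [x - coef * y for x, y in zip(v, b)]
    return v


def cor(a, b):
    a, b = center(a), center(b)
    return dot(a, b) / (norm(a) * norm(b))


rng = random.Random(1)
n = 200

# Intended scores e: alternating CS-/CS+ style values, centred and scaled to unit norm.
e = center([float(i % 2) for i in range(n)])
e_norm = norm(e)
e = [x / e_norm for x in e]

# Aberration and error: random, centred, and made exactly orthogonal (assumption).
omega = remove_component(center([rng.gauss(0.0, 0.05) for _ in range(n)]), [e])
eps = remove_component(center([rng.gauss(0.0, 0.10) for _ in range(n)]), [e, omega])

t = [a + b for a, b in zip(e, omega)]   # true scores (assumed additive)
y = [a + b for a, b in zip(t, eps)]     # estimated scores (assumed additive)

lhs = cor(t, y)
rhs = math.sqrt(1.0 + norm(omega) ** 2) * cor(e, y)
print(f"rho_ty = {lhs:.6f}, sqrt(1 + |omega|^2) * rho_ey = {rhs:.6f}")
```

Under these assumptions the two printed values agree up to floating-point error, which illustrates why a higher retrodictive validity implies a higher correlation with the true scores when aberration and error are uncorrelated.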
Evaluating measurement methods: Latent variables, classic psychometrics, retrodictive validity
Calibrating measurement methods: Optimised measurement, experimental design, power analysis
Measurement models in psychophysiology: Heuristic and formal models
Psychophysiological modelling: General concepts & formalism, development, application
Topics
Calibration experiment
CS+ US Memory
CS- No US Memory
Memory
Memory
Calibration experiment with intended values of dependent psychological variable
Derive estimate y of true score t from observable (e.g. heuristic processing or measurement model).
Correlate intended values e (from design) with estimated values y (from observable). Better method yields higher retrodictive validity.
Unless aberration and error are correlated, higher retrodictive validity means higher correlation with true score, and thus jointly higher accuracy and precision.
Remark: if two measures have exactly the same retrodictive validity, the one with higher precision will have higher reliability.
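The calibration logic can be sketched as a comparison of two hypothetical estimation methods on the same simulated experiment. The additive structure (t = e + aberration, y = t + error), the coding of CS- as 0 and CS+ as 1, and all noise levels are assumptions chosen for illustration. In a real calibration experiment only e and y are available; t is included here only because it is known in a simulation.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


rng = random.Random(0)
n_trials = 2000
intended = [float(i % 2) for i in range(n_trials)]         # e: 0 = CS-, 1 = CS+
true = [e + rng.gauss(0.0, 0.3) for e in intended]          # t = e + experimental aberration
method_a = [t + rng.gauss(0.0, 0.5) for t in true]          # y_a = t + small measurement error
method_b = [t + rng.gauss(0.0, 1.5) for t in true]          # y_b = t + large measurement error

# Retrodictive validity: computable from the design and the estimates alone.
validity_a = pearson(intended, method_a)
validity_b = pearson(intended, method_b)

# Correlation with the true score: only available because this is a simulation.
truth_a = pearson(true, method_a)
truth_b = pearson(true, method_b)

print(f"method A: rho_ey = {validity_a:.2f}, rho_ty = {truth_a:.2f}")
print(f"method B: rho_ey = {validity_b:.2f}, rho_ty = {truth_b:.2f}")
```

Because aberration and error are generated independently here, the method with the higher retrodictive validity is also the one that correlates better with the true scores.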
Methods evaluation
CS+ US Memory
CS- No US Memory
Memory
Memory
Derive estimate y of true score t from observable (e.g. heuristic processing or measurement model).
• Compare heuristic methods
• Compare measurement models
• Compare observables (as long as they measure the same thing)
• Machine-learning approach to measurement models (does not generalise - yet)
Calibration experiment with intended values of dependent psychological variable
Power analysis
• Intervention effect size: aberration in paradigm, measurement error, variability of the intervention
• Maximum effect size when intervention variability is zero
• Best-case power analysis: sample size often much higher than what is standard in the field
Bach, Tzovara, Vunder (2017) Molecular Psychiatry
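A rough sample-size calculation shows how strongly the required number of participants depends on the effect size. The sketch uses the normal approximation n ≈ ((z_(1-α/2) + z_power) / d)² for a two-sided one-sample or paired test, which slightly underestimates the exact t-test result. The d values are borrowed from the table of measures earlier in the talk purely as example inputs; treating them as the relevant effect size for a given study is an assumption, and an intervention effect would typically be smaller than the CS+/CS- difference itself.

```python
import math
from statistics import NormalDist


def required_n(effect_size_d, alpha=0.05, power=0.8):
    """Approximate sample size for a two-sided one-sample or paired test.

    Normal approximation; an exact t-test calculation gives slightly larger n.
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / effect_size_d) ** 2)


# Example inputs only (effect sizes as listed in the table of measures).
example_effect_sizes = {
    "SCR peak scoring": 0.44,
    "SCR model-based": 0.75,
    "SEBR model-based": 1.17,
}
for label, d in example_effect_sizes.items():
    print(f"{label}: d = {d:.2f}, n of about {required_n(d)}")
```

With these inputs the approximation gives about 41 participants for d = 0.44 and about 6 for d = 1.17, so a method with higher retrodictive validity can reduce the required sample size several-fold.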
Keep error constant - minimise aberration
• Evaluate experimental designs: how well can the psychological variable be measured
• Compare lab standards: how well can psychological variable be measured in my lab
Melinscak & Bach (2020) Plos Computational Biology
Evaluating measurement methods: Latent variables, classic psychometrics, retrodictive validity
Calibrating measurement methods: Optimised measurement, experimental design, power analysis
Measurement models in psychophysiology: Heuristic and formal models
Psychophysiological modelling: General concepts & formalism, development, application
Topics
CS+/CS- US Memory SCR difference between CS+/CS-?
CS+/CS- US Memory Memory difference between CS+/CS-?
Forward perspective: does aversive memory influence SCR?
Time-bin wise analysis
Inverse perspective: does my procedure establish aversive memory (measured by SCR)?
Condense data time series into one estimate
Measurement from continuous observables
CS memory := SCRpeak – SCRtrough
Heuristic analysis: selection of data features based on informal model.
Problems: (1) information loss (2) usually not evaluated
CS+/CS- US Memory Memory difference between CS+/CS-?
Heuristic methods
CS+/CS- US Memory Memory difference between CS+/CS-?
Memory
PsPM: estimates the most likely (ML) psychological variable, given the entire data time series and a standard response model.
Psychophysiological modelling
Evaluating measurement methods: Latent variables, classic psychometrics, retrodictive validity
Calibrating measurement methods: Optimised measurement, experimental design, power analysis
Measurement models in psychophysiology: Heuristic and formal models
Psychophysiological modelling: General concepts & formalism, development, application
Topics
Psychological variable
Neural activity
Physiological signal
Neural model
Peripheral LTI model
Examples:
• Instantaneous impulse with constant latency
• Short Gaussian impulse
Basic formalism
Bach & Friston (2013), Bach et al. (2018)
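The chain from psychological variable to neural activity to physiological signal can be sketched as a forward model: an instantaneous neural impulse at each event, scaled by the psychological variable, is passed through a linear time-invariant (LTI) peripheral system, i.e. convolved with a response function. The response function used here is a hypothetical gamma-like bump chosen for illustration; it is not the canonical response function of PsPM, and all names and parameter values are assumptions.

```python
import math


def response_function(duration_s, sampling_rate_hz, time_constant_s=2.0):
    """Hypothetical impulse response (gamma-like bump), scaled to a peak of 1."""
    n = int(duration_s * sampling_rate_hz)
    times = [i / sampling_rate_hz for i in range(n)]
    raw = [t * math.exp(-t / time_constant_s) for t in times]
    peak = max(raw)
    return [v / peak for v in raw]


def forward_model(onsets_s, amplitudes, n_samples, sampling_rate_hz, rf):
    """Predicted signal: impulses at the onsets, scaled by amplitude, convolved with rf."""
    neural = [0.0] * n_samples
    for onset, amplitude in zip(onsets_s, amplitudes):
        neural[int(round(onset * sampling_rate_hz))] += amplitude
    signal = [0.0] * n_samples
    for i, drive in enumerate(neural):
        if drive == 0.0:
            continue
        # LTI system: every impulse produces the same response shape, scaled and shifted.
        for k, h in enumerate(rf):
            if i + k < n_samples:
                signal[i + k] += drive * h
    return signal


fs = 10                                   # samples per second (assumed)
rf = response_function(10.0, fs)          # 10 s long response
predicted = forward_model([2.0, 12.0], [1.0, 0.5], 300, fs, rf)
print(max(predicted[:120]), max(predicted[120:]))
```

The two responses have the same shape and differ only in amplitude and timing, which is what allows the amplitude to be treated as the quantity to estimate when the model is inverted.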
Psychological variable Neural activity Physiological signal
Peripheral LTI model
Ledalab, cvxEDA:
• model-based estimation of neural activity
• heuristic method to relate to psychological variable
Evaluation for evoked SCR
• Ledalab not systematically better than peak-scoring
• PsPM decisively better than Ledalab or peak-scoring
Hybrid approaches
Alexander et al. (2005), Benedek & Kaernbach (2010ab), Greco et al. (2016), Bach (2014), Green et al. (2014)
Establish suitable forward model (PsPM)
• Which psychological variable impacts on peripheral measure?
• Formalise forward model in mathematical form
Develop inversion algorithm
• Estimates most likely value of the psychological variable, given data & PsPM
• Usually GLM, sometimes non-linear inversion using Variational Bayes
Evaluate and optimise PsPM & inversion
• Empirically determine retrodictive validity
• Optimise method to yield empirically minimal variance estimator of psychological variable
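The GLM flavour of model inversion can be sketched with two condition regressors and ordinary least squares. This is a toy illustration, not the PsPM implementation: the response function is a hypothetical gamma-like bump, the design has only two conditions with non-overlapping responses, there is no baseline term or filtering, and the simulated amplitudes (1.0 for CS+, 0.3 for CS-) are made up.

```python
import math
import random


def bump(n, fs, tau=2.0):
    """Hypothetical response function (gamma-like), scaled to a peak of 1."""
    raw = [(i / fs) * math.exp(-(i / fs) / tau) for i in range(n)]
    peak = max(raw)
    return [v / peak for v in raw]


def regressor(onsets_s, n_samples, fs, rf):
    """Design-matrix column: unit impulses at the onsets, convolved with rf."""
    x = [0.0] * n_samples
    for onset in onsets_s:
        start = int(round(onset * fs))
        for k, h in enumerate(rf):
            if start + k < n_samples:
                x[start + k] += h
    return x


def fit_two_regressors(y, x1, x2):
    """Ordinary least squares for y = b1*x1 + b2*x2 + noise via the 2x2 normal equations."""
    a11 = sum(v * v for v in x1)
    a22 = sum(v * v for v in x2)
    a12 = sum(u * v for u, v in zip(x1, x2))
    c1 = sum(u * v for u, v in zip(x1, y))
    c2 = sum(u * v for u, v in zip(x2, y))
    det = a11 * a22 - a12 * a12
    return (c1 * a22 - c2 * a12) / det, (c2 * a11 - c1 * a12) / det


rng = random.Random(0)
fs, n_samples = 10, 1200                       # 120 s at 10 Hz (assumed)
rf = bump(100, fs)                             # 10 s response
x_plus = regressor([5, 35, 65, 95], n_samples, fs, rf)     # CS+ trials
x_minus = regressor([20, 50, 80, 110], n_samples, fs, rf)  # CS- trials

# Simulated recording: known amplitudes plus Gaussian noise.
data = [1.0 * p + 0.3 * m + rng.gauss(0.0, 0.05) for p, m in zip(x_plus, x_minus)]

b_plus, b_minus = fit_two_regressors(data, x_plus, x_minus)
print(f"estimated CS+ amplitude {b_plus:.2f}, CS- amplitude {b_minus:.2f}")
```

The estimates use the whole time series rather than a single peak and trough, and the CS+/CS- difference of the two amplitudes is the per-participant quantity that would be carried to the group level.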
CS+/CS- US Memory SCR difference between CS+/CS-?
CS+/CS- US Memory Memory difference between CS+/CS-?
CS+ US Memory
CS- No US Memory
Memory
Memory
PsPM development: summary
Bach & Friston (2013), Bach et al. (2018)
Estimate parameters for each subject, per condition or per trial, then test parameters at the group level
• Conceptually similar to standard ("operational") analysis for SCR, RT, ...
• Same approach used in many fMRI packages (e.g. SPM)
• Statistics are done on the estimated psychological variable
• Noise in the original data is discarded and not used for statistical tests
Hierarchical parameter estimation
• It would in principle be possible to estimate parameters on the group level and test against explained variance in the data (as in some fMRI packages)
• However, there are conceptual and statistical problems associated with this approach: e.g. higher model complexity required, degrees of freedom reduced due to autocorrelations
Hierarchical summary statistics approach
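The summary statistics approach can be sketched in a few lines: one estimate per participant and condition is computed at the first level, and the group-level test operates only on those estimates. The per-participant values below are invented for illustration, and the one-sample t-test on CS+ minus CS- differences is just one possible group-level model.

```python
import math
import statistics


def one_sample_t(values):
    """t statistic and degrees of freedom for the null hypothesis 'mean = 0'."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)          # sample SD (n - 1 in the denominator)
    return mean / (sd / math.sqrt(n)), n - 1


# First level: one estimated amplitude per participant and condition (made-up numbers).
cs_plus = [0.9, 1.2, 0.7, 1.1, 0.8, 1.0]
cs_minus = [0.4, 0.5, 0.6, 0.3, 0.5, 0.4]

# Second level: test the within-participant difference across the group.
differences = [p - m for p, m in zip(cs_plus, cs_minus)]
t_value, df = one_sample_t(differences)
print(f"t({df}) = {t_value:.2f}")
```

The residual noise of each participant's time series never enters the group test; only the between-participant variability of the estimates does.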
Attentional variables <- pupil responses
• de Gee et al., 2017; de Gee, Knapen, & Donner, 2014
Fear memory <- SCR, SEBR, PSR, HPR
• Bach, Weiskopf, & Dolan, 2011; Bulganin, Bach, & Wittmann, 2014; Tzovara, Korn, & Bach, 2018; Bach, Tzovara, & Vunder, 2018; Staib & Bach, 2018; Staib et al., 2019; Xia et al., 2019; Bach et al., 2019
Arousal during decision making <- SCR
• Alvarez, et al., 2015; Bach, 2015a; de Berker, et al., 2016; Nicolle, Fleming, Bach, Driver & Dolan, 2011; Talmi, Dayan, Kiebel, Frith, & Dolan, 2009
Arousal during perception <- SCR
• Bach, Seifritz, & Dolan, 2015; Hayes, et al., 2013; Koban, Kusko, & Wager, 2018; Koban & Wager, 2016; Sulzer, et al., 2013
Arousal during rest <- SCR
• Fan et al., 2012
Bach et al. (2018) Psychophysiology
Application examples
Which psychological variables can be inferred?
• Specific vs. unspecific responses
• Experimental design
• Convergent vs. divergent measures
• A priori definition of contrasts to test
Experimental requirements
• Trial order & timing (design optimisation)
• Number of participants (power analysis)
How is the model structured?
• By-trial vs. by-condition
• Condition estimates interpretable, or only contrasts?
• Meaning of different parameters of the neural model
CS+/CS- US Memory Memory difference between CS+/CS-?
Application tips
Analogue data recording -> Digitisation -> Recorded file
• If possible, only anti-aliasing filter
• High sampling rate if no anti-aliasing filter
Import -> PsPM file: data time-series (marker time stamps)
Preprocessing:
• Trim unnecessary data
• Detect missing fixation and exclude/(correct) pupil size
• Heart beat detection & interpolation
• Respiration cycle detection & interpolation
• Startle eyeblink EMG filtering and rectification
• Each step usually generates a new file with a prefix (SPM-style)
Model inversion: GLM, non-linear models -> 1st (participant) level model files
• All necessary filters applied on-the-fly during model inversion
2nd-level t-test -> 2nd (group) level model file
Export parameters to SPSS, R, ... -> Group-level model (t-test, ANOVA, LME, ...)
PsPM pipeline
Psychological variable Neural activity Physiological signal
Neural model
Peripheral LTI model
CS+/CS- US Memory Memory difference between CS+/CS-?
The "best possible" approximation to the true psychological variable.
Summary
Lecture 2: 09.04.2020
Lecture 3: 16.04.2020
Lecture 4: 23.04.2020
Lecture 5: 30.04.2020
Lecture 6: 07.05.2020
Lecture 7: 14.05.2020
Thank you!
Project team: Giuseppe Castegnetti, Samuel Gerster, Saurabh Khemka, Christoph Korn, Filip Melinščak, Karita Ojala, Philipp Paulus, Matthias Staib, Athina Tzovara, Yanfang Xia
Programmers: Laure Ciernik, Gabriel Gräni, Tobias Moser, Eshref Özdemir, Ivan Rojkov, Linus Rüttimann
Project collaborators: Jean Daunizeau, Ray Dolan, Mikael Elam, Guillaume Flandin, Steve Fleming, Karl Friston, Barbara Namer, Manuel Voelkle
Funders