
Weighted Representational Component Models I

Jörn Diedrichsen

Motor Control Group, Institute of Cognitive Neuroscience

University College London

Weighting of different features

[Diagram: a simple model with features f1, f2, ..., fk versus a weighted component model with weighted features w1 f1, w2 f2, ..., wk fk]

Examples:
•  Different aspects of objects: color, shape
•  Different aspects of movements: force, sequence, timing
•  Different layers of a computational vision model

Overview

Representational component modelling

•  Features or groups of features (components) can be differently weighted
•  Component weights can be estimated from the data
•  Inferences can be made directly on component weights, or model fit can be assessed using cross-validation

Overview

•  Covariances and Distances
•  Features and representational components
•  Factorial models (MANOVA)
•  Linear representational models
•  Nonlinear representational models
•  Summary

Pattern distance: $d_{ij} = G_{ii} + G_{jj} - G_{ij} - G_{ji}$

Pattern covariance: $G = UU^T$

Covariances and Distances

[Diagram: data matrix U with K conditions x P voxels; each row is an activity pattern $u_i$]

Distances (LDC):
$d_{i,j} = (u_i - u_j)(u_i - u_j)^T = u_i u_i^T + u_j u_j^T - 2\,u_i u_j^T$

Inner product of patterns:
$G_{i,j} = u_i u_j^T$

Covariances and Distances

Pattern covariance matrix (inner-product matrix): $G = UU^T$ (contains baseline information)

Mahalanobis-distance matrix: $D_{i,j} = G_{i,i} + G_{j,j} - G_{i,j} - G_{j,i}$ (contains no baseline information)

The two are related via the centering matrix $H = I_K - \frac{1}{K} 1_K 1_K^T$:
$G = -\tfrac{1}{2} H D H^T$

Assuming the baseline is the mean of all patterns, the column and row means of G are zero.
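A minimal numpy sketch of these relationships (the data matrix U here is random example data, not from the talk): it computes the inner-product matrix G, derives the distance matrix D from it, and recovers the centered G from D via the centering matrix H.

import numpy as np

K, P = 5, 100                            # K conditions, P voxels (example sizes)
rng = np.random.default_rng(0)
U = rng.standard_normal((K, P))          # condition-by-voxel activity patterns

G = U @ U.T                              # inner-product (covariance) matrix
g = np.diag(G)
D = g[:, None] + g[None, :] - 2 * G      # D_ij = G_ii + G_jj - G_ij - G_ji

H = np.eye(K) - np.ones((K, K)) / K      # centering matrix H = I_K - (1/K) 1 1^T
G_centered = -0.5 * H @ D @ H.T          # G with the mean pattern as baseline

print(np.allclose(G_centered.sum(axis=0), 0))   # row/column means are ~ zero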

Overview

•  Covariances and Distances
•  Features and representational components
•  Factorial models (MANOVA)
•  Linear representational models
•  Nonlinear representational models
•  Summary

Weighting of different features

[Diagram: a simple model with features f1, f2, ..., fk versus a weighted component model with weighted features w1 f1, w2 f2, ..., wk fk]

Examples:
•  Different aspects of objects: color, shape
•  Different aspects of movements: force, sequence, timing
•  Different layers of a computational vision model

Features and representational components

[Diagram: condition patterns in a voxel space spanned by Feature 1 and Feature 2, each composed of weighted feature patterns such as $f_1 w_1$]

The pattern for each condition is caused by different features, each associated with a feature pattern:
$U = \sum_{h=1}^{H} f_h w_h$
where U (K conditions x P voxels) holds the condition patterns, $f_h$ is a feature vector over conditions, and $w_h$ the corresponding feature pattern over voxels.

Features and representational components

The pattern for each condition is caused by different features, each associated with a feature pattern:
$U = \sum_{h=1}^{H} f_h w_h$

The component weight is the variance or power of the feature pattern: $\omega_h = w_h w_h^T$.

Assuming independence of the feature patterns ($w_i w_j^T = 0$ for $i \neq j$), the covariance matrix decomposes into weighted components:
$G = UU^T = \left(\sum_{h=1}^{H} f_h w_h\right)\left(\sum_{h=1}^{H} w_h^T f_h^T\right) = \sum_{h=1}^{H} f_h w_h w_h^T f_h^T = \sum_{h=1}^{H} \omega_h \underbrace{f_h f_h^T}_{G_h}$

Covariance matrix: $G = \sum_{h=1}^{H} \omega_h G_h$    Distance matrix: $D = \sum_{h=1}^{H} \omega_h D_h$

with component weights $\omega_h$ and component matrices $G_h$ (respectively $D_h$).

Features and representational components

Most often we do not weight single features, but groups of features: representational components.

$U = \sum_{h=1}^{H} F_h W_h$
where $F_h$ (K conditions x Q features) contains the feature vectors of component h and $W_h$ (Q features x P voxels) the corresponding feature patterns.

The component weight is the variance or power of the feature patterns, allowing for covariance among the feature patterns within a component: $W_h W_h^T = V_h \omega_h$. Assuming independence of the feature patterns across components ($W_i W_j^T = 0$ for $i \neq j$):

$G = UU^T = \left(\sum_{h=1}^{H} F_h W_h\right)\left(\sum_{h=1}^{H} W_h^T F_h^T\right) = \sum_{h=1}^{H} F_h W_h W_h^T F_h^T = \sum_{h=1}^{H} \omega_h \underbrace{F_h V_h F_h^T}_{G_h}$

Covariance matrix: $G = \sum_{h=1}^{H} \omega_h G_h$    Distance matrix: $D = \sum_{h=1}^{H} \omega_h D_h$
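A small numpy sketch of how one component's feature matrix could be turned into its covariance and distance component (the helper name and the example feature coding are mine, for illustration only):

import numpy as np

def component_matrices(F, V=None):
    """Covariance component G_h = F V F^T and the matching distance component."""
    K, Q = F.shape
    if V is None:
        V = np.eye(Q)                      # assume uncorrelated feature patterns
    G = F @ V @ F.T
    g = np.diag(G)
    D = g[:, None] + g[None, :] - 2 * G    # D_ij = G_ii + G_jj - 2 G_ij
    return G, D

# Example: a component coding which of two gestures was performed (K = 4 conditions)
F = np.array([[1., 0.],
              [1., 0.],
              [0., 1.],
              [0., 1.]])
G_h, D_h = component_matrices(F)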

Features and representational components

A component is specified by its feature vectors $F_h$ and the feature correlation $V_h$; these determine the covariance component $G_h = F_h V_h F_h^T$ and the distance component $D_{ij} = G_{ii} + G_{jj} - 2 G_{ij}$.

[Figure: several different feature sets (different $F_h$ and $V_h$), which are not unique, all yielding the same, unique $G_h$ and $D_h$]

Features and representational components: Estimation

How do we estimate the component weights?

$D = \sum_{h=1}^{H} \omega_h D_h$

Vectorise the measured distance matrix D into d, and build the component matrix X from the vectorised component distance matrices, $X = [d_1\ d_2\ \dots]$. Then estimate the weights by ordinary least-squares:

$\hat{\omega} = (X^T X)^{-1} X^T d = X^{+} d$

[Figure: measured distances (D) plotted against predicted distances ($D_h$)]
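In code, this estimation step might look like the following numpy sketch (names are mine; D_measured and the component matrices would come from the previous steps):

import numpy as np

def estimate_weights_ols(D_measured, D_components):
    """OLS estimate of component weights from a measured distance matrix
    and a list of predicted component distance matrices."""
    K = D_measured.shape[0]
    iu = np.triu_indices(K, k=1)                     # unique pairwise distances
    d = D_measured[iu]                               # vectorised measured distances
    X = np.column_stack([Dh[iu] for Dh in D_components])
    omega, *_ = np.linalg.lstsq(X, d, rcond=None)    # (X^T X)^-1 X^T d
    return omega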

Features and representational components

•  Features are variables encoded in neuronal elements
•  Groups of features with similar encoding strength form a representational component
•  Features of different components are assumed to be mutually independent
•  Many feature sets can lead to the same representational component
•  Models are uniquely specified via their component matrices (representational similarity trick)
•  Component weights estimate variance (or power) of representations

Overview

•  Covariances and Distances
•  Features and representational components
•  Factorial models (MANOVA)
•  Linear representational models
•  Nonlinear representational models
•  Summary

Integrated vs. independent encoding

•  So far, we have assumed that different components are encoded independently in the brain
•  This does not mean that they are encoded in different regions / voxels: only that their patterns are unrelated to each other
•  BUT: can we test this?

Integrated vs. independent encoding: factorial models

•  Are two groups of features (variables) encoded independently or dependently?
•  Vary the two factors in a fully crossed design:
   – Condition (see / do) x Action (3 gestures)
   – Rhythm x Spatial sequence
   – Reach directions (3) x Grasps (3) ...
•  Where is factor A encoded, and where is B encoded?
•  Are A and B encoded in an integrated or independent fashion?
•  Is factor B consistently encoded across levels of factor A ("cross-decoding")?

Factorial representational models (MANOVA)

[Example design: Factor A (see / do) x Factor B (grasp, pinch, grip)]

Raw data Y → first-level GLM → regression coefficients B → pattern estimates U (searchlight, noise normalisation, crossvalidation) → dissimilarity matrix D.

The component design matrix X contains the predicted distances for the Factor A, Factor B, and Interaction components (built from the corresponding features and representational components). The component weights are then estimated by regression:

$\hat{\omega} = (X^T X)^{-1} X^T \mathrm{vec}(D)$
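One way such a factorial component design matrix could be assembled, as a hedged numpy sketch (the exact feature coding used in the talk is not shown; here main effects use level indicators and the interaction uses cell indicators, without orthogonalisation):

import numpy as np

def indicator(labels):
    """One-hot feature matrix from a list of condition labels."""
    labels = np.asarray(labels)
    values = np.unique(labels)
    return (labels[:, None] == values[None, :]).astype(float)

def dist_component(F):
    G = F @ F.T
    g = np.diag(G)
    return g[:, None] + g[None, :] - 2 * G

# 6 conditions: Factor A (see / do) crossed with Factor B (grasp, pinch, grip)
A = ['see'] * 3 + ['do'] * 3
B = ['grasp', 'pinch', 'grip'] * 2

D_A = dist_component(indicator(A))                        # main effect of A
D_B = dist_component(indicator(B))                        # main effect of B
D_I = dist_component(indicator([a + '_' + b for a, b in zip(A, B)]))  # interaction

iu = np.triu_indices(6, k=1)
X = np.column_stack([Dc[iu] for Dc in (D_A, D_B, D_I)])   # component design matrix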

Factorial representational models (MANOVA)

[Design as above: Factor A (see / do) x Factor B (grasp, pinch, grip); components for Factor A, Factor B, and the interaction]

$\hat{\omega} = (X^T X)^{-1} X^T \mathrm{vec}(D)$

[Figure: the rows of $(X^T X)^{-1} X^T$ assign contrast weights of -1/6 and +1/6 to the measured distances]

Estimating a main effect in this way pools distances across levels of the other factor, i.e. it tests pattern consistency, the analogue of cross-"decoding" (Allefeld et al., 2013).

Factorial representational models (MANOVA)

[Design as above; example estimates of the component weights $\omega$: $\omega_A = 0.005$, $\omega_B = 0$, $\omega_I = 0.01$ for Factor A, Factor B, and the interaction]

Factorial representational models (MANOVA)

•  Factorial models can reveal main encoding effects and interactions
•  Component weight estimates are unbiased and can be directly tested in group analysis
•  Main effects are assessed by pattern consistency across levels of the other variable (replaces cross-classification)
•  Mathematically identical to the approach suggested by Allefeld et al. (2013)

Factorial MANOVA designs (example)

Kornysheva et al. (2014). eLife.

Factorial MANOVA designs (example)

Kornysheva et al. (2014). eLife.

[Figure: 4x4 simulated activity patterns]

Factorial MANOVA designs (example)

Kornysheva et al. (2014). eLife.

[Figure: overall encoding, contralateral and ipsilateral hemispheres (lateral and medial views)]

Factorial MANOVA designs (example)

Kornysheva et al. (2014). eLife.

[Figure: contralateral and ipsilateral hemispheres (lateral and medial views)]

Factorial representational models (MANOVA)

Linearity assumption: patterns for different components overlap linearly

•  if the relationship between neural activity and BOLD is approximately linear, AND
•  if they engage independent neuronal subpopulations, or if they combine linearly to determine firing rate

Experimental conditions should be similar in overall activity.
Note: mean-value subtraction in the analysis does not fix this!

Key insights I

D1. Pattern covariance matrices and squared Euclidean distance matrices capture the same information, but the former retain the baseline.
D2. A representational component (RC) is a group of representational features.
D3. A representation can be modelled as a weighted combination of RCs (one weight per RC).
D4. Weighted combinations of RCs correspond to weighted combinations of representational distance matrices.
D5. Component weights can be estimated using regression and tested directly (against zero) in group analyses.

Weighted Representational Component Models II

Jörn Diedrichsen

Motor Control Group, Institute of Cognitive Neuroscience

University College London

Overview

•  Covariances and Distances
•  Features and representational components
•  Factorial models (MANOVA)
•  Linear representational models
•  Nonlinear representational models
•  Summary

Linear representational models

[Figure: a sequence consisting of chunks]

At what level are sequences represented?

Yokoi et al. (in prep)

Linear representational models

[Figure: for each representational component, the corresponding features, covariance matrix, and distance matrix]

Yokoi et al. (in prep)

Linear representational models

[Figure: a sequence consisting of chunks]

Yokoi et al. (in prep)

Linear representational models

The use of simple regression (OLS) assumes that the distances:
•  are independent
•  have equal variance
•  are ~ normally distributed
(i.e. that they are i.i.d.)

If this is violated we have an unbiased estimator, but not the best linear unbiased estimator (BLUE).

[Figure: noise terms $\eta_1, \eta_2, \eta_3, \eta_4$ and their sum]

Linear representational models

[Figure-only slides]

Linear representational models: variance of distances

Differences between patterns are measured with noise:
$\hat{\delta}_{ij} = \delta_{ij} + \epsilon, \qquad \delta_{ij} = u_i - u_j$

A crossvalidated squared distance is the inner product of the difference estimates from two partitions m and n:
$d = \hat{\delta}^{(m)} \hat{\delta}^{(n)T} = (\delta + \epsilon^{(m)})(\delta + \epsilon^{(n)})^T = \delta\delta^T + \epsilon^{(m)}\delta^T + \delta\epsilon^{(n)T} + \epsilon^{(m)}\epsilon^{(n)T}$

Squared distances are thus a sum of inner products: signal with signal, signal with noise, and noise with noise.

In the expected value, the inner products containing noise drop out:
$E(d) = \delta\delta^T$

For the variance, we obtain a part that depends on the distances and a part that only depends on the noise:
$\mathrm{var}(d) = \underbrace{\mathrm{var}(\delta\delta^T)}_{0} + \sigma_\epsilon^2\,\delta\delta^T + \sigma_\epsilon^2\,\delta\delta^T + \sigma_\epsilon^2\sigma_\epsilon^2 P$
(distance dependent + constant)

The covariance of the distances under exhaustive crossvalidation over R partitions:
$\mathrm{var}(d) = \frac{4}{R}\,\Delta \circ \Sigma + \frac{2P}{R(R-1)}\,\Sigma \circ \Sigma$

where $\circ$ denotes element-by-element multiplication, $\Delta$ the true inner products $\delta\delta^T$, and $\Sigma$ the within-run covariances of $\hat{\delta}$.
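The covariance formula above translates directly into code; a minimal numpy sketch (the function name and argument layout are mine):

import numpy as np

def cov_of_distances(Delta, Sigma, R, P):
    """Covariance of crossvalidated squared distances.

    Delta : (n_dist, n_dist) true (or model-predicted) inner products delta delta^T
    Sigma : (n_dist, n_dist) within-run covariance of the difference estimates
    R, P  : number of partitions and number of voxels
    """
    return (4.0 / R) * Delta * Sigma + (2.0 * P / (R * (R - 1))) * Sigma * Sigma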

Linear representational models

Taking the co-variance of the distances into account, we should do better than OLS:
$\mathrm{var}(d) = \frac{4}{R}\,\Delta \circ \Sigma + \frac{2P}{R(R-1)}\,\Sigma \circ \Sigma$
(distance dependent + constant)

Linear representational models: IRLS Estimation

1. Start with an initial guess of $\omega$.
2. Predict the distances from the model: $\hat{d} = X\omega$.
3. Calculate the variance-covariance matrix of d: $V = \frac{4}{R}\,\hat{\Delta} \circ \Sigma + \frac{2P}{R(R-1)}\,\Sigma \circ \Sigma$.
4. Use V in the estimation: $\hat{\omega} = (X^T V^{-1} X)^{-1} X^T V^{-1} d$.
Repeat steps 2 to 4 until convergence.
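A schematic numpy version of this loop, assuming a user-supplied function for step 3 (for instance built from the cov_of_distances sketch above, with Delta predicted from the current weights):

import numpy as np

def irls(X, d, variance_of_d, n_iter=100, tol=1e-8):
    """Iteratively reweighted least squares for component weights.

    X             : (n_dist, H) vectorised component distance matrices
    d             : (n_dist,)   vectorised crossvalidated distance estimates
    variance_of_d : function mapping predicted distances d_hat to the
                    (n_dist, n_dist) covariance matrix V of d
    """
    omega = np.linalg.lstsq(X, d, rcond=None)[0]       # 1. initial guess (OLS)
    for _ in range(n_iter):
        d_hat = X @ omega                              # 2. predict distances
        V = variance_of_d(d_hat)                       # 3. covariance of d
        Vinv_X = np.linalg.solve(V, X)                 #    V^-1 X
        omega_new = np.linalg.solve(X.T @ Vinv_X, Vinv_X.T @ d)   # 4. GLS update
        if np.max(np.abs(omega_new - omega)) < tol:    # repeat 2-4 until convergence
            return omega_new
        omega = omega_new
    return omega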

Linear representational models

[Figure: OLS vs. IRLS estimates, and the improvement in SD, for the Sequence, Large chunk, Chunk, and 2-Finger components]

OLS is unbiased, but suboptimal. IRLS can do better, but by how much depends on the model structure: for factorial designs it does not matter.

Linear representational models: Estimation

How do we best estimate the component weights?

Ordinary least-squares (OLS) / iteratively reweighted least-squares (IRLS):
•  Unbiased estimates
•  Can become negative
•  Allows direct testing of parameters

Non-negative least-squares / maximum likelihood:
•  Positive estimates
•  Biased
•  Model testing by crossvalidation (fit ω on training data, evaluate on test data); tight link to encoding models

(Diedrichsen et al., 2011; Khaligh-Razavi & Kriegeskorte, 2014)
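For comparison, a non-negative least-squares fit of the same design could use scipy (a sketch; X and d as in the earlier OLS example):

from scipy.optimize import nnls

def estimate_weights_nnls(X, d):
    """Non-negative least-squares estimate of the component weights (omega >= 0)."""
    omega, residual_norm = nnls(X, d)
    return omega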

Overview

•  Covariances and Distances
•  Features and representational components
•  Factorial models (MANOVA)
•  Linear representational models
•  Nonlinear representational models
•  Summary

Non-linear representational models

Models in which the component matrices are non-linear functions of parameters.

[Figure: Gaussian tuning curves over targets with different tuning widths (σ = 1, σ = 5, σ = 15), and the resulting features and predicted distances between targets]

Non-linear representational models

•  Sometimes good linear approximations can be found (example: AR-estimation in first-level SPMs)
•  Otherwise, estimate the nonlinear parameters to optimize the log-likelihood:

$\log p(d \mid \theta) \propto -\tfrac{1}{2}\,(d - \hat{d})^T V(\hat{d})^{-1} (d - \hat{d})$
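A hedged sketch of that optimisation for a single non-linear parameter (a hypothetical tuning width sigma); predict_distances and cov_of_d stand in for model-specific functions that are not defined in the talk:

import numpy as np
from scipy.optimize import minimize_scalar

def neg_log_likelihood(sigma, d, predict_distances, cov_of_d):
    """0.5 * (d - d_hat)^T V(d_hat)^-1 (d - d_hat), the negative of the slide's expression."""
    d_hat = predict_distances(sigma)       # model-predicted distances for this sigma
    V = cov_of_d(d_hat)                    # covariance of the distance estimates
    r = d - d_hat
    return 0.5 * r @ np.linalg.solve(V, r)

# Usage, given measured distances d and the two model-specific functions:
# fit = minimize_scalar(neg_log_likelihood, bounds=(0.5, 20.0), method='bounded',
#                       args=(d, predict_distances, cov_of_d))
# sigma_hat = fit.x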

Overview

•  Covariances and Distances
•  Features and representational components
•  Factorial models (MANOVA)
•  Linear representational models
•  Nonlinear representational models
•  Summary

The whole process

Raw data Y → first-level GLM → regression coefficients B → pattern estimates U (searchlight, prewhitening) → distance / covariance estimation → covariance matrix G and RDM (LDC) D → representational model fit → component weights ω → group analysis → group map.

Group analysis

•  Model coefficients ω can be directly tested (unbiased)
•  Sometimes it is more sensible to use ssqrt(ω) instead of ω (SD vs. variance)
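The signed square root mentioned above can be written as follows (a common convention, not code from the talk):

import numpy as np

def ssqrt(omega):
    """Signed square root: maps variance-scale weights to an SD-like scale."""
    return np.sign(omega) * np.sqrt(np.abs(omega))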

Representational component models

•  Representational component models assume
   •  independence of the data across partitions (so that a zero distance is meaningful)
   •  independence of feature patterns across components
   •  linear overlap of patterns (within a small range of variations)

•  Representational component models do NOT assume
   •  normality of the data
   •  independence of distance estimates
   •  a linear relationship between psychological variables and BOLD

Representational component models

Representational component model vs. rank-based RSA:

•  Intercept: not included in the fitting and needs to be explicitly modeled (component model) vs. implicitly removed and not contributing to model comparison (rank-based RSA)
•  Predictions: on a ratio scale vs. on an ordinal scale
•  Linearity: linearity assumption (narrow range), but the non-linearity can be modeled vs. non-linearity of distances removed
•  Model structure: flexible factorial and combined models vs. single models

Representational component models: Key insights II

E1. Large (squared Euclidean) distances are estimated with larger variability than smaller distances.
E2. Distance estimates are statistically dependent in a way that is determined by the true distance structure.
E3. Component weights can be estimated using iteratively reweighted least squares (IRLS), which yields better estimates than ordinary least squares (OLS) in some cases.

The end

Thanks!