
BCS547

Neural Decoding

Nature of the problem

In response to a stimulus with unknown orientation θ, you observe a pattern of activity A. What can you say about θ given A?

Bayesian approach: recover P(θ|A) (the posterior distribution)

Estimation theory: come up with a single value estimate θ̂ from A

Population Code

[Figure: tuning curves of the population and a single-trial pattern of activity A, plotted against preferred retinal location]

Estimation theory

[Figure: on each trial, the encoder maps the stimulus into a pattern of activity (plotted against preferred retinal location) and the decoder returns an estimate: trial 1 → A1 → estimate 1, trial 2 → A2 → estimate 2, …, trial 200 → A200 → estimate 200]

Estimation theory

θ̂ is a random variable. To determine the quality of this estimate we can compute its mean, $E[\hat{\theta}|\theta]$, and its variance, $\sigma^2_{\hat{\theta}|\theta}$.

If $E[\hat{\theta}|\theta] = \theta$, the estimate is said to be unbiased.

If $\sigma^2_{\hat{\theta}|\theta}$ is as small as possible, the estimate is said to be efficient.

Estimation theory

• A common measure of decoding performance is the mean square error between the estimate and the true value

• This error can be decomposed as:

$$\mathrm{MSE} = E\big[(\hat{\theta}-\theta)^2\,\big|\,\theta\big] = \sigma^2_{\hat{\theta}|\theta} + \big(E[\hat{\theta}|\theta]-\theta\big)^2 = \text{variance} + \text{bias}^2$$
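A quick numerical check of this decomposition (a minimal sketch; the true value, the noise level, and the deliberately biased "shrinkage" decoder are illustrative assumptions, not anything from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 30.0                      # true stimulus value (hypothetical units)
n_trials = 200_000

# A deliberately biased decoder: shrink noisy observations toward 0.
obs = theta + rng.normal(0.0, 5.0, size=n_trials)   # noisy single-trial "estimates"
theta_hat = 0.9 * obs                                # shrinkage introduces a bias

mse      = np.mean((theta_hat - theta) ** 2)
variance = np.var(theta_hat)
bias_sq  = (np.mean(theta_hat) - theta) ** 2

print(f"MSE              = {mse:.3f}")
print(f"variance + bias^2 = {variance + bias_sq:.3f}")   # matches MSE up to sampling noise
```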

Efficient Estimators

The smallest achievable variance for an unbiased estimator is known as the Cramér-Rao bound, $\sigma^2_{CR}$.

An efficient estimator is such that $\sigma^2_{\hat{\theta}|\theta} = \sigma^2_{CR}$.

In general: $\sigma^2_{\hat{\theta}|\theta} \ge \sigma^2_{CR}$.

Fisher Information

Fisher information is defined as:

$$I(\theta) = E\left[-\frac{\partial^2 \ln P(\mathbf{A}|\theta)}{\partial\theta^2}\right]$$

and it is equal to:

$$\sigma^2_{CR} = \frac{1}{I(\theta)}$$

where P(A|θ) is the distribution of the neuronal noise.

Fisher Information

For independent Poisson noise,

$$P(\mathbf{A}|\theta) = \prod_i \frac{e^{-f_i(\theta)}\, f_i(\theta)^{a_i}}{a_i!}$$

so

$$\ln P(\mathbf{A}|\theta) = \sum_i \big[-f_i(\theta) + a_i \ln f_i(\theta) - \ln a_i!\big]$$

Differentiating twice with respect to θ:

$$\frac{\partial^2 \ln P(\mathbf{A}|\theta)}{\partial\theta^2} = \sum_i \left[-f_i''(\theta) + a_i\,\frac{f_i''(\theta) f_i(\theta) - f_i'(\theta)^2}{f_i(\theta)^2}\right]$$

Taking the expectation, with $E[a_i] = f_i(\theta)$:

$$I(\theta) = E\left[-\frac{\partial^2 \ln P(\mathbf{A}|\theta)}{\partial\theta^2}\right] = \sum_i \frac{f_i'(\theta)^2}{f_i(\theta)}$$

Fisher Information

• For one neuron with Poisson noise:

$$I(\theta) = \frac{f'(\theta)^2}{f(\theta)}$$

• For n independent neurons:

$$I(\theta) = \sum_i \frac{f_i'(\theta)^2}{f_i(\theta)}$$

The more neurons, the better! Small variance is good! Large slope is good!
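A minimal sketch of this formula for a hypothetical population, assuming bell-shaped (Gaussian) tuning curves tiling the stimulus range; the gains, widths, and preferred values are made up for illustration:

```python
import numpy as np

def fisher_info_poisson(theta, centers, width=20.0, gain=50.0):
    """I(theta) = sum_i f_i'(theta)^2 / f_i(theta) for independent Poisson neurons with
    bell-shaped tuning curves f_i (gain and width are illustrative assumptions)."""
    f  = gain * np.exp(-0.5 * ((theta - centers) / width) ** 2)    # tuning curves at theta
    fp = -(theta - centers) / width ** 2 * f                        # their slopes at theta
    # Note: a single neuron's contribution fp**2 / f peaks on the flank of its tuning
    # curve, where the slope is largest.
    return np.sum(fp ** 2 / f)

centers = np.linspace(-100, 100, 21)          # preferred values tiling the range
I = fisher_info_poisson(0.0, centers)
print(f"Fisher information at theta=0: {I:.3f}")
print(f"Cramer-Rao bound on the std:   {1 / np.sqrt(I):.3f}")

# More neurons -> more information (roughly doubles when the density doubles).
print(f"with twice as many neurons:    {fisher_info_poisson(0.0, np.linspace(-100, 100, 42)):.3f}")
```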

Fisher Information and Tuning Curves

• Fisher information is maximum where the slope is maximum

• This is consistent with adaptation experiments

Fisher Information

• In 1D, Fisher information decreases with the width of the tuning curves

• In 2D, Fisher information does not depend on the width of the tuning curves

• In 3D and above, Fisher information increases with the width of the tuning curves

• ATTENTION: this is true for independent Gaussian noise (see the sketch below)
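The sketch below checks this numerically under the stated assumption of independent Gaussian noise with fixed variance, so that $I(\theta) \propto \sum_i (\partial f_i/\partial\theta)^2$; the Gaussian tuning curves, the grid of preferred stimuli, and the parameter values are illustrative assumptions:

```python
import numpy as np
from itertools import product

def fisher_info(width, dim, spacing=1.0, sigma_noise=1.0):
    """Fisher information about the first stimulus coordinate at the origin,
    I = sum_i (df_i/dx1)^2 / sigma^2, for Gaussian tuning curves on a dense grid.
    Grid spacing, noise variance, and tuning amplitude are illustrative assumptions."""
    axis = np.arange(-15, 15 + spacing, spacing)
    centers = np.array(list(product(axis, repeat=dim)))     # preferred stimuli
    d2 = np.sum(centers ** 2, axis=1)                        # |x - c|^2 at x = 0
    f = np.exp(-0.5 * d2 / width ** 2)
    dfdx1 = (centers[:, 0] / width ** 2) * f                 # d f_i / d x_1 at x = 0
    return np.sum(dfdx1 ** 2) / sigma_noise ** 2

for dim in (1, 2, 3):
    narrow, wide = fisher_info(2.0, dim), fisher_info(4.0, dim)
    print(f"{dim}D: I(width=2) = {narrow:8.3f}   I(width=4) = {wide:8.3f}")
# Expected pattern: wider tuning hurts in 1D, is neutral in 2D, helps in 3D.
```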

Ideal observer

The discrimination threshold of an ideal observer, δθ, is proportional to $\sigma_{CR}$, the standard deviation given by the Cramér-Rao bound ($\delta\theta \propto 1/\sqrt{I(\theta)}$).

In other words, an efficient estimator is an ideal observer.

• An ideal observer is an observer that can recover all the Fisher information in the activity (easy link between Fisher information and behavioral performance)

• If all distributions are Gaussian, Fisher information is the same as Shannon information.

Estimation theory

Examples of decoders

Voting Methods

Optimal Linear Estimator

$$\hat{x} = \sum_i w_i a_i$$

Voting Methods

Optimal Linear Estimator

$$\hat{x} = \sum_i w_i a_i = \mathbf{W}^T\mathbf{A}, \qquad \mathbf{W} = C_{AA}^{-1}\, C_{AX}$$
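A minimal sketch of the optimal linear estimator, fitting the weights by least squares on simulated (activity, stimulus) pairs, which is the sample version of $\mathbf{W}=C_{AA}^{-1}C_{AX}$; the tuning curves and Poisson noise used to generate the data are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
centers = np.linspace(-100, 100, 41)

def responses(x):
    """Noisy population responses to stimulus value(s) x
    (Gaussian tuning + Poisson noise are illustrative assumptions)."""
    f = 20 * np.exp(-0.5 * ((np.atleast_1d(x)[:, None] - centers[None, :]) / 25) ** 2)
    return rng.poisson(f).astype(float)

# Training set of (activity, stimulus) pairs.
x_train = rng.uniform(-80, 80, size=5000)
A_train = responses(x_train)

# Optimal linear estimator x_hat = sum_i w_i a_i (+ intercept); the least-squares fit
# is the sample version of W = C_AA^{-1} C_AX.
A1 = np.column_stack([A_train, np.ones(x_train.size)])
w, *_ = np.linalg.lstsq(A1, x_train, rcond=None)

# Decode a held-out stimulus.
a_test = responses(30.0)
x_hat = a_test @ w[:-1] + w[-1]
print(f"true x = 30.0, OLE estimate = {x_hat[0]:.1f}")
```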

Voting Methods

Optimal Linear Estimator

$$\hat{x} = \sum_i w_i a_i = \mathbf{W}^T\mathbf{A}, \qquad \mathbf{W} = C_{AA}^{-1}\, C_{AX}$$

Center of Mass

$$\hat{x} = \frac{\sum_i a_i x_i}{\sum_i a_i}$$

Center of Mass/Population Vector

• The center of mass is optimal (unbiased and efficient) iff the tuning curves are Gaussian with a zero baseline and uniformly distributed, and the noise follows a Poisson distribution

• In general, the center of mass has a large bias and a large variance (see the sketch below)
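A small sketch illustrating the bias: with a nonzero baseline (an assumption chosen to violate the optimality conditions above), the center of mass is pulled toward the middle of the preferred-value range:

```python
import numpy as np

rng = np.random.default_rng(2)
x_pref = np.linspace(-100, 100, 41)        # preferred values of the neurons
x_true = 40.0

# Gaussian tuning with a nonzero baseline of 2 spikes (illustrative parameters).
f = 2.0 + 20 * np.exp(-0.5 * ((x_true - x_pref) / 25) ** 2)
a = rng.poisson(f, size=(10_000, x_pref.size)).astype(float)   # many trials

x_com = (a @ x_pref) / a.sum(axis=1)       # center of mass, trial by trial
print(f"true x = {x_true}, mean estimate = {x_com.mean():.1f} "
      "(pulled toward the middle of the range by the baseline)")
```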

Voting Methods

Optimal Linear Estimator

$$\hat{x} = \sum_i w_i a_i = \mathbf{W}^T\mathbf{A}, \qquad \mathbf{W} = C_{AA}^{-1}\, C_{AX}$$

Center of Mass

$$\hat{x} = \frac{\sum_i a_i x_i}{\sum_i a_i}$$

Population Vector

$$\hat{\mathbf{P}} = \sum_i a_i \mathbf{P}_i, \qquad \hat{\theta} = \mathrm{angle}(\hat{\mathbf{P}})$$

Population Vector

[Figure: each neuron's preferred-direction vector Pi, scaled by its activity ai, sums to the population vector P]
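A minimal sketch of the population vector for a direction variable: sum each neuron's preferred-direction unit vector weighted by its activity, then take the angle. The cosine tuning, baseline, and Poisson noise are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
theta_pref = np.linspace(0, 2 * np.pi, 60, endpoint=False)    # preferred directions
theta_true = 1.2                                               # radians

# Cosine tuning with a baseline (illustrative parameters), Poisson spike counts.
f = 10 + 8 * np.cos(theta_true - theta_pref)
a = rng.poisson(f).astype(float)

# Population vector: activity-weighted sum of preferred-direction unit vectors.
P = np.array([np.sum(a * np.cos(theta_pref)), np.sum(a * np.sin(theta_pref))])
theta_hat = np.arctan2(P[1], P[0])
print(f"true direction = {theta_true:.3f} rad, population vector = {theta_hat:.3f} rad")
```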

Population Vector

The population vector is itself a linear estimator whose weights are fixed at the preferred directions: $\hat{\mathbf{P}} = \sum_i a_i \mathbf{p}_i = \mathbf{P}^T\mathbf{A}$. Is $\mathbf{P}$ equal to the optimal weight matrix $\mathbf{W} = C_{AA}^{-1}\, C_{AP}$? In general, no.

Typically, the population vector is not the optimal linear estimator.

Population Vector

• The population vector is optimal iff the tuning curves are cosine and uniformly distributed, and the noise follows a normal distribution with fixed variance

• In most cases, the population vector is biased and has a large variance

• The variance of the population vector estimate does not reflect Fisher information

Population Vector

[Figure: the variance of the population vector estimate compared with the Cramér-Rao bound, 1/(Fisher information)]

Population vector should NEVER be used to estimate information content!!!!

Population Vector

Maximum Likelihood

Maximum Likelihood

The maximum likelihood estimate θ̂ is the value of θ that maximizes the likelihood P(A|θ). Therefore, we seek θ̂ such that:

$$\hat{\theta} = \arg\max_\theta P(\mathbf{A}|\theta) = \arg\max_\theta \log P(\mathbf{A}|\theta)$$

Maximum Likelihood

If the noise is Gaussian and independent,

$$P(\mathbf{A}|\theta) \propto \prod_i \exp\!\left(-\frac{(a_i - f_i(\theta))^2}{2\sigma^2}\right)$$

Therefore

$$\log P(\mathbf{A}|\theta) = -\sum_i \frac{(a_i - f_i(\theta))^2}{2\sigma^2} + \mathrm{const}$$

and the estimate is given by:

$$\hat{\theta} = \arg\min_\theta \sum_i \frac{(a_i - f_i(\theta))^2}{2\sigma^2}$$

Distance measure: template matching
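A minimal sketch of maximum likelihood as template matching under independent Gaussian noise: scan a grid of candidate θ values and keep the one whose template f(θ) is closest, in Euclidean distance, to the observed activity. The tuning curves, noise level, and grid are assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
theta_pref = np.linspace(-100, 100, 41)
sigma = 3.0

def tuning(theta):
    """Template f(theta): the mean population response (Gaussian tuning assumed)."""
    return 20 * np.exp(-0.5 * ((theta - theta_pref) / 25) ** 2)

theta_true = 12.0
a = tuning(theta_true) + rng.normal(0, sigma, size=theta_pref.size)   # Gaussian noise

# ML = template matching: minimize the squared distance over a grid of candidates.
grid = np.linspace(-100, 100, 2001)
dist = [np.sum((a - tuning(t)) ** 2) for t in grid]
theta_ml = grid[np.argmin(dist)]
print(f"true theta = {theta_true}, ML estimate = {theta_ml:.2f}")
```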

Maximum Likelihood

Gradient descent for ML

• To minimize this distance (the negative log likelihood) L(θ) with respect to θ, one can use a gradient descent technique in which θ is updated according to:

$$\theta_{t+1} = \theta_t - \varepsilon \left.\frac{\partial L}{\partial\theta}\right|_{\theta_t}$$
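A sketch of this update applied to the squared-error objective defined above, using a finite-difference gradient; the step size, iteration count, and starting point are assumptions:

```python
import numpy as np

# Continues the Gaussian-noise sketch above: L(theta) = sum_i (a_i - f_i(theta))^2.
def loss(theta, a, tuning):
    return np.sum((a - tuning(theta)) ** 2)

def ml_by_gradient_descent(a, tuning, theta0=0.0, eps=0.01, n_steps=1000, h=1e-4):
    """theta_{t+1} = theta_t - eps * dL/dtheta, with a finite-difference gradient.
    Step size, iteration count, and starting point are illustrative assumptions."""
    theta = theta0
    for _ in range(n_steps):
        grad = (loss(theta + h, a, tuning) - loss(theta - h, a, tuning)) / (2 * h)
        theta -= eps * grad
    return theta

# Example (using `a` and `tuning` from the sketch above):
# theta_hat = ml_by_gradient_descent(a, tuning)
```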

Gaussian noise with variance proportional to the mean

If the noise is Gaussian with variance proportional to the mean, the distance being minimized changes to:

$$\hat{\theta} = \arg\min_\theta \sum_i \frac{(a_i - f_i(\theta))^2}{2 f_i(\theta)}$$

Data points with small variance are weighted more heavily.

Poisson noise

If the noise is Poisson, then

$$P(\mathbf{A}|\theta) = \prod_i \frac{e^{-f_i(\theta)}\, f_i(\theta)^{a_i}}{a_i!}$$

and:

$$\log P(\mathbf{A}|\theta) = \sum_i \big[a_i \log f_i(\theta) - f_i(\theta)\big] + \mathrm{const}$$

ML and template matching

Maximum likelihood is a template matching procedure, BUT the metric used is not always the Euclidean distance: it depends on the noise distribution.
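A sketch contrasting the two metrics on the same Poisson-noise data: the Poisson log likelihood $\sum_i [a_i \log f_i(\theta) - f_i(\theta)]$ versus plain Euclidean template matching. Tuning curves, baseline, and grid are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
theta_pref = np.linspace(-100, 100, 41)

def tuning(theta):
    # Small baseline keeps f_i > 0 so the log is defined (illustrative assumption).
    return 20 * np.exp(-0.5 * ((theta - theta_pref) / 25) ** 2) + 0.5

theta_true = -35.0
a = rng.poisson(tuning(theta_true)).astype(float)

grid = np.linspace(-100, 100, 2001)
# Poisson log likelihood (up to the a_i! term, which does not depend on theta).
loglik = [np.sum(a * np.log(tuning(t)) - tuning(t)) for t in grid]
theta_ml = grid[np.argmax(loglik)]

# Euclidean template matching for comparison (the Gaussian-noise metric).
theta_tm = grid[np.argmin([np.sum((a - tuning(t)) ** 2) for t in grid])]
print(f"true = {theta_true}, Poisson-ML = {theta_ml:.2f}, Euclidean match = {theta_tm:.2f}")
```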

Bayesian approach

We want to recover P(θ|A). Using Bayes' theorem, we have:

$$P(\theta|\mathbf{A}) = \frac{P(\mathbf{A}|\theta)\,P(\theta)}{P(\mathbf{A})}$$

where P(θ|A) is the posterior distribution over θ, P(A|θ) is the likelihood of θ, P(θ) is the prior distribution over θ, and P(A) is the prior distribution over A (a normalizing constant).

Bayesian approach

What is the likelihood of θ, P(A|θ)? It is the distribution of the noise… It is the same distribution we used for maximum likelihood.

Bayesian approach

• The prior P(θ) corresponds to any knowledge we may have about θ before we get to see any activity.

• Ex: Zhang et al.

Bayesian approach

Once we have P(θ|A), we can proceed in two different ways. We can keep this distribution for Bayesian inferences (as we would do in a Bayesian network), or we can make a decision about θ. For instance, we can estimate θ as the value that maximizes P(θ|A). This is known as the maximum a posteriori (MAP) estimate. For a flat prior, ML and MAP are equivalent.
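A minimal sketch of the Bayesian approach on a grid: multiply the (Poisson) likelihood by a prior, normalize to get P(θ|A), and read off the MAP estimate. The Gaussian prior and all tuning parameters are assumptions; with a flat prior the MAP reduces to ML:

```python
import numpy as np

rng = np.random.default_rng(6)
theta_pref = np.linspace(-100, 100, 41)
tuning = lambda t: 20 * np.exp(-0.5 * ((t - theta_pref) / 25) ** 2) + 0.5

theta_true = 20.0
a = rng.poisson(tuning(theta_true)).astype(float)

grid = np.linspace(-100, 100, 1001)
log_lik = np.array([np.sum(a * np.log(tuning(t)) - tuning(t)) for t in grid])

# A hypothetical Gaussian prior centered at 0; set log_prior to 0 for a flat prior (= ML).
log_prior = -0.5 * (grid / 40.0) ** 2

log_post = log_lik + log_prior
post = np.exp(log_post - log_post.max())
post /= post.sum()                                 # discrete posterior P(theta | A) on the grid

theta_map = grid[np.argmax(post)]
post_mean = np.sum(grid * post)
post_std = np.sqrt(np.sum((grid - post_mean) ** 2 * post))
print(f"true = {theta_true}, MAP = {theta_map:.2f}, "
      f"posterior mean = {post_mean:.2f}, posterior std = {post_std:.2f}")
```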

Using the prior: Zhang et al

• For a time varying variable, one can use the distribution over the previous estimate as a prior for the next one.

$$P(X_{t+1}\,|\,\mathbf{A}_{t+1},\mathbf{A}_t) = \frac{P(\mathbf{A}_{t+1}\,|\,X_{t+1})\,P(X_{t+1}\,|\,\mathbf{A}_t)}{P(\mathbf{A}_{t+1}\,|\,\mathbf{A}_t)}$$

where $P(X_{t+1}|\mathbf{A}_t)$ is the prior (obtained from the previous estimate) and the denominator $P(\mathbf{A}_{t+1}|\mathbf{A}_t)$ is nasty but independent of $X_{t+1}$.
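A sketch of this recursive scheme on a grid, where the posterior from step t (pushed through an assumed random-walk transition model) becomes the prior at step t+1; the tuning curves, drift model, and parameters are illustrative assumptions, not Zhang et al.'s actual model:

```python
import numpy as np

rng = np.random.default_rng(7)
x_pref = np.linspace(-100, 100, 41)
grid = np.linspace(-100, 100, 201)

def rates(x):
    """Mean Poisson rates of the population for stimulus value(s) x (illustrative tuning)."""
    return 10 * np.exp(-0.5 * ((np.atleast_1d(x)[None, :] - x_pref[:, None]) / 25) ** 2) + 0.5

# Transition model P(X_{t+1} | X_t): a Gaussian random walk (illustrative assumption).
drift_sd = 4.0
trans = np.exp(-0.5 * ((grid[:, None] - grid[None, :]) / drift_sd) ** 2)
trans /= trans.sum(axis=0, keepdims=True)          # column j: P(x_new | x_old = grid[j])

posterior = np.full(grid.size, 1.0 / grid.size)    # flat belief before any data
x_true = -20.0
for t in range(40):
    x_true = float(np.clip(x_true + rng.normal(0, drift_sd), -90, 90))  # variable drifts
    a = rng.poisson(rates(x_true)[:, 0])                                # observed activity A_{t+1}

    prior = trans @ posterior                      # P(X_{t+1} | A_1..A_t): previous posterior, pushed forward
    log_lik = a @ np.log(rates(grid)) - rates(grid).sum(axis=0)         # Poisson log P(A_{t+1} | X_{t+1})
    posterior = prior * np.exp(log_lik - log_lik.max())
    posterior /= posterior.sum()                   # the nasty denominator is independent of X_{t+1}

print(f"final x = {x_true:.1f}, posterior mean = {np.sum(grid * posterior):.1f}")
```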

Bayesian approach

Limitations: the Bayesian approach and ML require a lot of data…

Alternative: estimate P(θ|A) directly using a nonlinear estimator.

Bayesian approach: logistic regression

Example: Decoding finger movements in M1. On each trial, we observe 100 cells and we want to know which one of the 5 fingers is being moved.

[Diagram: 100 input units (the activities A of the 100 cells) connected to 5 output categories, one per finger; each output gives a probability such as P(F5|A), between 0 and 1]
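A minimal sketch of such a decoder: multinomial logistic regression (a linear layer plus softmax) trained by gradient descent to output P(finger | A) from 100 simulated cells. The data-generation model, learning rate, and iteration count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(8)
n_cells, n_fingers, n_trials = 100, 5, 2000

# Simulated training data (illustrative): each finger drives the cells through a
# different random "tuning" vector, and spike counts are Poisson.
tuning = rng.normal(0, 1, size=(n_fingers, n_cells))
labels = rng.integers(0, n_fingers, size=n_trials)
A = rng.poisson(np.exp(1.0 + 0.5 * tuning[labels])).astype(float)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Multinomial logistic regression: P(finger | A) = softmax(A W + b),
# trained by full-batch gradient descent on the cross-entropy.
W = np.zeros((n_cells, n_fingers))
b = np.zeros(n_fingers)
Y = np.eye(n_fingers)[labels]                      # one-hot targets
for _ in range(1000):
    P = softmax(A @ W + b)                         # P(finger | A) for every trial
    W -= 0.01 * (A.T @ (P - Y)) / n_trials
    b -= 0.01 * (P - Y).mean(axis=0)

acc = np.mean(np.argmax(A @ W + b, axis=1) == labels)
print(f"training accuracy: {acc:.2f}")
```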

Bayesian approach: multinomial distributions

Example: Decoding finger movements in M1. Each finger can take 3 mutually exclusive states: no movement, flexion, extension.

[Diagram: the activity of the N M1 neurons feeds, through weights W and a softmax, one output group per effector (Digit 1–5 and the wrist), each giving the probability of no movement, of flexion, and of extension]

Decoding time varying signals

[Figure: a time-varying stimulus s(t) and the spike train ρ(t) it evokes]

$$\hat{s}(t) = \sum_i k(t - t_i)$$

(the estimate is a sum of a kernel k placed at the spike times $t_i$)

Decoding time varying signals

$$\hat{s}(t - \tau_0) = \sum_i k(t - t_i)$$

The optimal kernel k is obtained from the correlation functions of the spike train and the stimulus (the Q terms), via a Fourier-domain (Wiener filter) expression involving $\exp(-i\omega\tau)$.

Decoding time varying signals

$$\hat{s}(t) = \sum_{i=1}^{n} k(t - t_i) = \int k(\tau)\,\rho(t-\tau)\, d\tau$$

i.e., the estimate is the convolution of the kernel k with the spike train ρ(t).
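A minimal sketch of this reconstruction: place a kernel at every spike time and sum. The stimulus, the encoding model (Poisson spikes whose rate depends on s(t)), and the Gaussian kernel are illustrative assumptions; the slides' optimal kernel is not computed here:

```python
import numpy as np

rng = np.random.default_rng(9)
dt = 0.001                                         # 1 ms time step
t = np.arange(0, 10, dt)

# A slow, smooth stimulus s(t) (illustrative).
s = np.convolve(rng.normal(0, 1, t.size), np.ones(500) / 500, mode="same")
s = (s - s.mean()) / s.std()

# Encoding: inhomogeneous Poisson spikes whose rate increases with s(t) (assumption).
rate = 20 * np.exp(s)                              # spikes/s
spikes = rng.random(t.size) < rate * dt            # spike train rho(t)

# Decoding: s_hat(t) = sum_i k(t - t_i), here with a hypothetical Gaussian kernel.
tau = np.arange(-0.5, 0.5, dt)
k = np.exp(-0.5 * (tau / 0.1) ** 2)
s_hat = np.convolve(spikes.astype(float), k, mode="same")
s_hat = (s_hat - s_hat.mean()) / s_hat.std()       # rescale for comparison

print("correlation between s(t) and the reconstruction:",
      f"{np.corrcoef(s, s_hat)[0, 1]:.2f}")
```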

Decoding time varying signals

[Figure: the time-varying stimulus s(t) and the spike train ρ(t)]
