www.sciencemag.org/cgi/content/full/338/6103/135/DC1
Supplementary Materials for
Network Resets in Medial Prefrontal Cortex Mark the Onset of Behavioral
Uncertainty
Mattias P. Karlsson, Dougal G. R. Tervo, Alla Y. Karpova*
*To whom correspondence should be addressed. E-mail: [email protected]
Published 5 October 2012, Science 338, 135 (2012)
DOI: 10.1126/science.1226518
This PDF file includes: Materials and Methods
Figs. S1 to S10
References
Materials and methods
Subjects
12 male Long Evans rats (400-550g) were used to characterize behavior during the
described task. Of these, four animals (500-550g at implantation) were implanted with
microdrive arrays for electrophysiological recordings. Animals were kept at 85% of the body
weight they had before food restriction and were maintained on a 12 h light/12 h dark schedule.
Experiments were conducted according to National Institutes of Health guidelines for animal
research and were approved by the Institutional Animal Care and Use Committee at HHMI’s
Janelia Farm Research Campus.
Behavioral task design
The idea behind the task design was to create a setting in which the animal would
abruptly abandon old beliefs during identifiable periods of behavioral uncertainty, with the added
feature that these moments would be decoupled from abrupt changes in sensory input or
behavioral output. In such a setting, an abrupt change in neural activity is most likely to be a
neural correlate of the change in the internal state of belief. We took advantage of the
expectation that in a stochastic and unstable environment, evidence accumulates gradually but,
once there is sufficient information to indicate that the environment has indeed changed, old
beliefs are abandoned (reset) abruptly and exploration is initiated. Thus, abrupt and substantial
changes in neural activity, such as network resets, that occur at moments when neither the
statistics of the environment nor the behavioral output changes abruptly are likely to be neural
correlates of changes in the belief state. Our behavioral paradigm exploits this situation
particularly well for the following reasons:
1) Because of the stochastic nature of the action-outcome association, animals sample both
sides for a wide range of outcome probabilities. Thus, neural activity can be compared between
situations that are identical with respect to action (motor output) and sensory input, but different
in belief state.
2) While the reward probabilities may change abruptly, no single outcome is sufficient (unlike
in the case of deterministic rewards) to inform the animal of that change in the environment.
Instead, detection of that change requires gradual evidence accumulation and will thus be
delayed with respect to when the probability is switched. This separates in time the point of the
environmental change from the point when the animal becomes aware of it. Importantly, around
that latter point the local outcome history is statistically constant for trials to the same side.
3) The sequential presentation of the two behavioral options introduces additional stochasticity,
because the manifestation of a decision to re-sample the previously non-preferred option has to
wait, for a variable number of trials, until that option is presented by the computer. Therefore, a
sudden decision to abandon old beliefs will not always coincide with a change in behavioral
output associated with the decision to explore, a change that will result in consecutive trials to
one side being suddenly separated by more trials to the other side. Thus, if reset-like dynamics
are observed in the network activity even in the absence of a coincidental change in local history,
such dynamics can be most parsimoniously attributed to a change in the animal’s internal state.
While this design makes it somewhat harder to pinpoint, at the behavioral level, the exact
moment when the decision to explore has been made, it allows us to dissociate abrupt transitions
in network activity due to changes in the internal state of belief from those due to changes in
behavioral output and sensory input.
Behavioral apparatus
All behavior was confined to a box with 23 cm high plastic walls and stainless steel
floors (Island Motion Corp). The floor of the box was 25 cm by 34 cm, and the levers and nose
ports were all arranged on one of the short walls. All lights, nose ports, levers, and reward
deliveries were controlled and monitored with a custom-programmed microcontroller, which in
turn communicated via USB to a PC running a control program based on MATLAB (The
Mathworks, Inc.). Nose port entries were detected with an infrared beam-break detector (IR LED
and photodiode pair). The central initiation port contained one white LED that indicated the
option to initiate a new trial. Upon each trial initiation, the left and right levers were
pneumatically extended from the wall (both were retracted after one of the two levers was
pressed), and simultaneously one of two sounds was presented by two speakers (located on the
two 34 cm walls) with equal volume. The sounds were frequency modulated (1% modulation at
6.67 Hz) around a single base frequency. 6.5 or 14 kHz base frequency indicated, respectively,
that the left or the right lever was correct. The trial identity was random, with equal probability
for each tone being presented. The animals were required to stay in the initiation port for at least
250 ms in order to initiate a trial and for the tone to be played. The animal was then required to
exit the port, and upon exiting could not initiate a new trial for 500 ms. All behavior was video
recorded at 30 frames/sec using an infrared-sensitive camera. Except after incorrect lever
presses, the only visible light source inside the box was from the LED in the initiation port.
During error trials, a white LED array in the ceiling of the box was lit, and no trials could be
initiated during a 30 second timeout period. Liquid rewards (0.1 ml drops of 10% sucrose mixed
with black cherry Kool-Aid) were delivered from the reward ports 0.5 seconds after port entry
with a motorized syringe pump (Harvard Apparatus PHD 2000).
Behavioral training
Food-restricted animals were trained to perform the task with minimal ‘shaping’: from
the first moment of training, animals were exposed to the full task with four exceptions: 1)
animals only needed to press the correct lever once before reward became available, 2) reward
probabilities were kept at or above 0.5 for both sides, 3) the timeout period for pressing the
wrong lever was kept short (0.5 seconds) and 4) the time that was allowed to pass between
initiating the trial and pressing the lever, as well as between pressing the lever and collecting the
reward was 300 sec. At first, animals performed the correct sequence of actions rather
infrequently and by chance. However, after approximately 1-2 weeks of training, most animals
learned the entire task structure, including the presence of the option to reject trials and of
unsignalled changes in reward contingencies. After each animal successfully completed two trial
blocks in one session, the number of required lever presses was gradually increased to 5, the
timeout period was increased to 30 seconds, and reward probabilities became randomly drawn
from a set spanning low and high values (0.15, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7 and 0.8) at block
transitions. Reward probabilities associated with the two sides were selected independently. This
design feature made it difficult for the rat to infer the identity of the new and more profitable
option simply from the change in reward probability of the preferred option, thus prompting
exploratory bouts. Animals were considered proficient on the task when they preferentially
rejected the less profitable side and would dynamically update this rejection policy when reward
probabilities changed. Of 13 attempts to train animals with the described method, 12 became
proficient within one month of training. To prevent animals from predicting when reward
probability changes would occur, the number of trials within each block was drawn randomly
from a Gaussian distribution (mean 250 with standard deviation of 25). Other than the changed
reward probabilities, no cues were presented to indicate block transitions.
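The block schedule described above can be sketched as follows. This is only an illustration of the stated sampling rules, not the MATLAB control program used in the experiments. The function name `draw_block`, the rounding of block length to whole trials, and the fixed random seed are assumptions made for the example.

```python
import random

# Reward probabilities listed in the text for the full task.
REWARD_PROBS = [0.15, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]

def draw_block(rng):
    """Return (n_trials, p_left, p_right) for one block (hypothetical helper)."""
    # Block length: Gaussian with mean 250 and SD 25, rounded to whole trials
    # (the rounding and the floor of 1 trial are assumptions).
    n_trials = max(1, round(rng.gauss(250, 25)))
    # The two sides are drawn independently, so they may coincide.
    p_left = rng.choice(REWARD_PROBS)
    p_right = rng.choice(REWARD_PROBS)
    return n_trials, p_left, p_right

rng = random.Random(0)  # fixed seed only to make the example reproducible
schedule = [draw_block(rng) for _ in range(5)]
```

Because the two probabilities are independent draws, a drop in reward on the preferred side does not by itself reveal whether the other side has become better.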
Rejection-rate analysis around behavioral transitions
Large shifts in choice preference were identified using change point analysis on the
stream of accepted trials (18) . The identity of accepted trials was represented as 1s and 0s (for
left and right trials, respectively). The cumulative sum of the difference between each value and
the average session value was calculated along the entire behavioral time series. The presence of
a change point was inferred if the maximum deflection from zero exceeded that for 99% of
200 bootstraps (random reordering of data points). The location of the change point was taken as
the data point with the highest absolute value of the cumulative sum. Following the detection of
a first change point, the acceptance data stream was split in two (before and after the change
point) and the process was repeated for each segment. This iterative process continued until no
further change point was detected. The change points identified in the stream of accepted trials
were then mapped back onto the full data set. Only those change points that were separated by at
least 70 trials were kept for the rejection rate analysis. The rejection rate was then computed in 5-
trial bins around the detected change points and normalized to the average rejection rate for that
session.
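A minimal sketch of the iterative change-point procedure is given below, assuming the accepted-trial stream is a list of 1s and 0s. It follows the cumulative-sum and bootstrap steps described above, but the function names, the minimum segment length `min_len`, and the convention that a change point is reported as the first trial after the peak are assumptions. The 70-trial separation filter and the rejection-rate binning are not included.

```python
import random

def cusum(values):
    """Cumulative sum of deviations from the segment mean."""
    mean = sum(values) / len(values)
    total, out = 0.0, []
    for v in values:
        total += v - mean
        out.append(total)
    return out

def find_change_points(values, rng, n_boot=200, pct=0.99, offset=0, min_len=10):
    """Recursively locate change points; returns indices into the full series."""
    if len(values) < min_len:  # min_len is an assumption, not from the text
        return []
    observed = cusum(values)
    peak = max(abs(c) for c in observed)
    # Bootstrap: reorder the data and count how often the shuffled peak is smaller.
    smaller = 0
    for _ in range(n_boot):
        shuffled = values[:]
        rng.shuffle(shuffled)
        if max(abs(c) for c in cusum(shuffled)) < peak:
            smaller += 1
    if smaller < pct * n_boot:
        return []
    # Location: the data point with the largest absolute cumulative sum.
    idx = max(range(len(values)), key=lambda i: abs(observed[i]))
    split = idx + 1  # first trial after the peak
    left = find_change_points(values[:split], rng, n_boot, pct, offset, min_len)
    right = find_change_points(values[split:], rng, n_boot, pct, offset + split, min_len)
    return left + [offset + split] + right
```

For example, `find_change_points([1] * 60 + [0] * 60, random.Random(1))` reports a single change point at index 60, and each half is then tested again and found to contain no further change.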
Reaction-time analysis around behavioral transitions
Reaction time for acceptance trials was defined as the time between engaging the
initiation port and pressing of the lever. Reaction time was computed for all accepted trials in 10-
trial bins around the detected behavioral change points (see above) and normalized to the average
acceptance trial reaction time for that session.
Implant preparation and surgery
For neural recordings, a microdrive array containing 20 independently movable tetrodes
(29) was chronically implanted on the head of the animal. Each tetrode was constructed by
twisting and fusing together four insulated 13 μm wires (stablohm 800A, California Fine Wire).
Each tetrode tip was gold-plated to reduce the impedance, yielding values of 200-300 kΩ at 1
kHz. Within the implant, the tetrodes converged to a circular bundle (1.9 mm diameter), angled
20° with respect to vertical (pointing towards the midline). This angle allowed the implant to be
positioned laterally relative to the mPFC to avoid puncturing the midline sulcus.
For surgery, trained animals were initially anaesthetized with 5% isoflurane gas (2.5
L/min) and 0.03 mg/kg buprenorphine. After 10-15 minutes, isoflurane was reduced to 0.5-1.0%
and the flow rate to 0.5 L/min. A local anesthetic (Bupivacaine) was injected under the skin 10
minutes before making an incision. The microdrive array was implanted such that the tetrode
bundle was centered 3.0 mm anterior and 1.8 mm lateral to bregma (right hemisphere). Small
stainless steel bone screws and dental cement were used to secure the implant to the skull. One of
the screws was connected to a wire leading to system ground. Before the animal woke up, all
tetrodes were advanced into the brain.
Tetrode positioning
Over a period of two weeks following the surgery, the tetrodes were gradually advanced
to a depth of 2 mm along the 20° trajectory, moving approximately 160 μm/day. During this
time, animals were acclimatized to performing the task while being tethered to the recording
system. When performance on the task regained pre-surgery levels (in terms of motivation and
dynamic rejection behavior), recording sessions began. After each recording session, any
tetrodes that did not appear to have any isolatable units were advanced by 80 μm. No adjustment
was made within 12 hours prior to each recording session. Once a tetrode had been moved a total
of 2.5 mm from the surface, which is the approximate border between anterior cingulate and
prelimbic cortices (fig. S2), it was no longer advanced.
Recording sessions and data preprocessing
Each recording session lasted 1.5 to 3 hours, depending on the animal’s motivation to
perform. Animals were not forced to perform the task and occasionally took breaks (generally
around 5 minutes, but sometimes up to 30 minutes). Combining neural recordings with behavior
imposes limitations on the length and number of recording sessions, prompting us to simplify the
task structure to emphasize exploratory bouts in the following way. We limited the reward
structure to reversals of high/low probabilities, but used animals that had previously been trained
on more complex probability pairs. Reversals of high/low probabilities were used to make
exploratory bouts stand out. Keeping the high/low probabilities at 0.5 and 0.25 respectively was
usually sufficient for this purpose, but higher contrast (0.6/0.2 or 0.7/0.3) was occasionally
required. At the end of each block, both probabilities were switched, such that the previously
unfavorable side was now favorable. Despite the simplification of the block structure, animals
continued to go through a transient period of exploration once they detected a change in reward
contingencies, likely due to their prior training. Although animals quickly transitioned to a stable
strategy of preferential acceptance of one trial type, presumably due to the relatively large
difference in reward probabilities associated with selection of the two sides, exploratory bouts
were well-defined.
Data were collected with an Nspike data acquisition system (L. Frank, UC San Francisco,
and J. MacArthur, Harvard Instrumentation Design Laboratory). Signals were first amplified on
the animals using a small unity-gain preamplifier array. Then, signals were carried to the Nspike
system via a bundle of 80 fine wires (Cooner Wire) to be digitized and processed. An infrared
diode array with a large and a small group of diodes was attached to the animal's preamplifier
array and the animal's position in the environment was reconstructed using semi-automated
analysis of a digital video recording of the experiment with custom-written software. Spike data
were sampled at 30 kHz and band-pass filtered between 600 Hz and 6 kHz (2-pole
Bessel), and all above-threshold events were saved to disk. Local field potential data from all
tetrodes were sampled continuously at 1.5 kHz, digitally filtered between 0.5 and 400 Hz, and
saved to disk.
After neural data were collected, individual units on each tetrode were identified by
manually classifying spikes using polygons in two-dimensional views of waveform parameters
(Matclust, M.K.). For each channel of a tetrode, the parameters used were peak waveform
amplitude and the waveform’s projection onto the first two principal components computed
across the session. We also used autocorrelation analysis to exclude units with non-physiological
single-unit spike trains. Only units where the entire cluster was visible throughout the recording
session were included. Thus, a unit was not isolated for further analysis if any part of the cluster
vanished into the noise or was cut off by the recording threshold. The quality of each cell’s
isolation was assessed using standard measures, Lratio and isolation distance (30).
Cell selection
Only cells that were active during performance of the task were analyzed. This
assessment was performed independently for each of the three analysis intervals within the trial.
For each trial, we computed the firing rate of each cell (number of spikes/1 sec) during the 1-
second interval centered on each of three behavioral analysis points (analysis points 1, 2, and 3;
Fig. 2A caption and see below). As a form of low-pass filtering for activity dynamics, we applied
a slight smoother across trials (Gaussian smoother with 1 trial standard deviation) in order to
reduce false detections of ensemble fluctuations due to rate-measurement variability. Any cell
with mean firing rate below 1 Hz in the 1-second time interval (calculated across the entire
session) was excluded from the pool of analyzed cells for that interval. No effort was made to
distinguish excitatory and inhibitory neurons. Four to nineteen neurons active during the
performance of the task were recorded simultaneously (fig. S3), revealing a variety of response
profiles (see Fig. 2A for examples). The total number of neurons included in the study was 320.
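The across-trial smoothing and the 1 Hz activity criterion can be illustrated with the short sketch below. It assumes one firing-rate value per trial for a given cell and analysis point. Truncating the kernel at 4 standard deviations and renormalizing the weights at the session edges are assumptions, since the text does not specify edge handling.

```python
import math

def gaussian_smooth(values, sd=1.0):
    """Smooth a per-trial series with a Gaussian kernel (SD given in trials)."""
    half = int(math.ceil(4 * sd))  # kernel truncated at 4 SD (an assumption)
    out = []
    for i in range(len(values)):
        num = den = 0.0
        for j in range(max(0, i - half), min(len(values), i + half + 1)):
            w = math.exp(-0.5 * ((j - i) / sd) ** 2)
            num += w * values[j]
            den += w
        out.append(num / den)  # renormalize so edges are not biased toward zero
    return out

def is_active(rates_hz, min_rate=1.0):
    """A cell is kept when its session-mean rate is not below 1 Hz."""
    return sum(rates_hz) / len(rates_hz) >= min_rate
```

With a 1-trial standard deviation, a single-trial spike in rate is spread mainly over its two neighbors, which reduces the chance that rate-measurement noise is mistaken for an ensemble fluctuation.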
Visualizing network transitions
For visualization purposes only (Figs. 2A and 3A), time was aligned to the three analysis
points within the trial and time between these alignment points was stretched or compressed to
allow visualization of activity across the entire trial while still maintaining alignment. Spikes
were binned in 200 ms bins (in stretched/compressed time relative to the average trial timing
between analysis points) and firing rates were computed by dividing the number of spikes by the
actual time that went into the bin (time occupancy normalized rate).
Quantitative detection of network transitions
To detect such transitions, we characterized local changes in the firing of individual
neurons and determined when these changes were unexpectedly large across the recorded
ensemble. We minimized the possibility that observed changes in neural activity simply reflect
changes in local behavior by performing all analyses separately on left- and right-bound
acceptance trials. Furthermore, for each trial, the analysis of activity changes was limited to
points (analysis points 1, 2, and 3; Fig. 2A caption) when the animal’s spatial trajectories were
highly stereotyped (fig. S5). These three points represented 1s windows around: 1) the initiation
of a new trial, 2) the lever press, and 3) the reward checking.
The slope in the normalized firing rate for each cell was calculated for each analysis point
using a ten-trial sliding window (Fig. 2B). For each position of the sliding window, the slope of
the firing rate change (for each of the three analysis points) was calculated using linear
regression analysis (MATLAB’s ‘regress’). The total change in firing rate over the 9 time steps
within the ten-trial window was estimated by multiplying the slope by 9. This value was then
normalized by the mean firing rate of the cell at the corresponding analysis point (norm change =
total rate change/mean rate). A steep slope (rising or falling) indicated that the neuron’s firing
rate changed rapidly. In the example in Figure 2, the firing rate of cell 5 decreased by close to its
mean firing rate around trial 58 at all three analysis time points (Fig. 2B, right panel). The median
absolute firing rate change across the population was used to gauge how widespread a change in activity
was in the network (Fig. 2C, top panel). With each step of the sliding detection window, we
calculated the probability of observing an equal or larger population-wide fluctuation in firing
rate by chance alignment of uncorrelated fluctuations in single-cell activity. For significance
testing, we performed the same calculation 100 times with scrambled trial order to determine the
probability of observing a population-wide fluctuation in firing rate within the particular analysis
window with equal or greater median change due to chance alignment of ongoing single-cell
variability in activity. Independently for each of the three analysis points, we tallied up the
number of ten-trial windows across the 100 scrambled sessions that had a median change value
equal to or greater than that observed in the non-scrambled data, and divided it by the total
number of windows in the 100 scrambled sessions (100 × the number of ten-trial windows in the
session). For example, a session with 510 trials would provide 500 ten-trial comparison windows; after 100
session scrambles, this would provide 50,000 comparison windows. Under the assumption that
the firing rates across the three analysis points were independent, the three probabilities were
then multiplied together to estimate the joint probability of change. To estimate the expected
frequency of the observed network transition for one session, the computed probability was
multiplied by the total number of trial windows in the session. If the expected frequency was
below 0.05/session, we used a peak finder algorithm (PeakFinder for MATLAB, N. Yoder) to
center the detection window on the point with the lowest expected frequency. This window was
later used when assessing the behavioral correlates of the network transition. Our metric revealed
clear moments when the observed population-wide changes in activity were unexpectedly large
(Fig. 2C).
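The detection steps for a single analysis point can be sketched as follows. The sketch computes the normalized change (slope × 9 / mean rate), the median absolute change across cells, and the chance probability from trial-scrambled sessions. It is not the original MATLAB code. The function names are hypothetical, and scrambling each cell's trial order independently is an assumption about how the scrambling was done.

```python
import random
from statistics import median

def slope(y):
    """Least-squares slope of y against trial index 0..len(y)-1."""
    n = len(y)
    x_mean = (n - 1) / 2
    y_mean = sum(y) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(y))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den

def norm_change(rates, start, win=10):
    """Fitted rate change over the window, divided by the cell's session-mean rate."""
    mean_rate = sum(rates) / len(rates)
    return slope(rates[start:start + win]) * (win - 1) / mean_rate

def population_change(cells, start, win=10):
    """Median absolute normalized change across cells for one window."""
    return median(abs(norm_change(r, start, win)) for r in cells)

def chance_probability(cells, start, n_scramble=100, win=10, rng=None):
    """Fraction of scrambled-session windows with an equal or larger median change."""
    rng = rng or random.Random(0)
    observed = population_change(cells, start, win)
    n_trials = len(cells[0])
    hits = total = 0
    for _ in range(n_scramble):
        # Assumption: each cell is scrambled independently, which removes
        # any alignment of fluctuations across cells.
        scrambled = [rng.sample(r, len(r)) for r in cells]
        for s in range(n_trials - win + 1):
            total += 1
            if population_change(scrambled, s, win) >= observed:
                hits += 1
    return hits / total
```

Under the independence assumption stated above, the three per-analysis-point probabilities would be multiplied together and then multiplied by the number of windows in the session to give the expected frequency per session.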
Determining behavioral correlates of network transitions
We focused our analysis on three behavioral variables: side preference (left vs. right
acceptance trials), reward (including both types of trials), and trial rejection. Because these
parameters were binary, we could obtain an instantaneous estimate of the underlying behavioral
parameters by smoothing with a Gaussian (standard deviation of 3 trials, including all trial
types). The change of each behavioral parameter within a 10-trial window was calculated using
linear regression (MATLAB’s ‘regress’) to fit a straight line through the 10 behavioral values
(from one trial type) in the window.
We used a generalized linear model (31) with a binomial response distribution (‘glmfit’ in
MATLAB) to compare the predictive power of reward dynamics to that of rejection dynamics.
We excluded a randomly selected 10% of the behavioral windows during model training for
subsequent testing of the models. During model testing, we used the behavioral measures in
those 10% of windows as model predictors of whether or not a network transition would occur
(‘glmval’ in MATLAB). The model’s predictions were compared to actual detection outcomes.
This procedure was repeated ten times (using a new random test set for each repeat) to build up a
test set equal in size to the original dataset (10-fold cross validation (32)). A receiver operating
characteristic (ROC) curve was calculated for each of the three behavioral predictors using
MATLAB’s ‘perfcurve’, with false and true positive rate on the X- and Y-axis, respectively. The
area under the ROC curves was used to compare performance of the two behavioral predictors.
To estimate the variance of the measure, the entire 10-fold cross validation procedure was
repeated 100 times for each behavioral predictor.
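Two pieces of this evaluation, the partition into ten test folds and the area under the ROC curve, can be sketched as below. The model fit itself (‘glmfit’ and ‘glmval’ in the text) is deliberately left out. Here `scores` stands for the held-out model predictions and `labels` for whether a network transition was detected (1) or not (0). Both names, and the use of the rank-sum identity in place of MATLAB’s ‘perfcurve’, are choices made for this illustration.

```python
import random

def ten_fold_indices(n, rng, k=10):
    """Shuffle 0..n-1 and deal the indices into k disjoint test folds."""
    order = list(range(n))
    rng.shuffle(order)
    return [order[i::k] for i in range(k)]

def roc_auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            # A positive scored above a negative counts 1, a tie counts 0.5.
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))
```

Each window appears in exactly one test fold, so pooling the ten sets of held-out predictions yields a test set the same size as the original dataset.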
Calculation of network participation
For each detected network transition, we calculated the proportion of active cells that
exceeded in the detection window a normalized change of either 0.75 or 1.0 of their mean firing
rate (calculated across the session). Any cell that exceeded this threshold for at least one of the
three analysis points was considered a participant in the network transition. The total number of
active cells was defined as the number of cells that were active in at least one of the three
analysis points (at least 1 Hz mean rate across the session).
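The participation measure reduces to a simple proportion, sketched below. The input is assumed to hold, for each active cell, its three normalized changes (one per analysis point) in the detection window. Taking the absolute value, so that increases and decreases both count, is an assumption consistent with the rising-or-falling slopes described earlier.

```python
def participation(changes_per_cell, threshold=0.75):
    """Fraction of active cells whose normalized change exceeds the threshold
    at one or more of the three analysis points."""
    participants = sum(
        1 for changes in changes_per_cell
        if any(abs(c) > threshold for c in changes)  # abs() is an assumption
    )
    return participants / len(changes_per_cell)

# Four hypothetical cells, three analysis points each.
example = [(0.1, 0.2, 0.9), (0.1, 0.1, 0.1), (-1.2, 0.0, 0.0), (0.5, 0.5, 0.5)]
frac_075 = participation(example, 0.75)
frac_100 = participation(example, 1.0)
```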
Analysis of network transition abruptness
We characterized how abrupt the activity change was by examining the network
dynamics across the 10 trials surrounding each network transition by representing each trial as a
point in a multidimensional space where each dimension corresponded to the firing rate of an
individual cell. If the majority of the cells abruptly change their firing between the same two
trials, the 10 points would fall into two easily separable clusters with the network suddenly
jumping from one cluster to another within a single trial. If, on the other hand, individual cells
modify their activity more gradually or suddenly but on different trials, the 10 points would form
a more spread-out distribution in this space. One way to test this is to force the 10 points into two
clusters (k-means classifier algorithm) without regard for the temporal order of the points and
measure the distance from each point to the centroid of each cluster. As one moves along the trial
axis, the distance to the first cluster would abruptly increase in the first case but grow more
gradually in the second. We used a multi-dimensional k-means classification algorithm
(MATLAB’s ‘kmeans’ with the number of clusters set to 2) to assign each trial in the 10-trial
detection window to one of two states. The rate values for each cell were taken from the three 1-
second analysis points, normalized by that cell’s mean firing rate in each of those intervals.
Because each cell contributed 3 measurement points per trial, the dimensionality of the clustering
space was 3 × N, where N is the number of active isolated cells. For each trial, the relative ensemble distance
between the two cluster centroids was defined as d1/(d1+d2), where d1 and d2 are the Euclidean
distances from that trial’s point to the centroids of cluster 1 and cluster 2, respectively. For the majority of network
transitions, a state transition occurred that was centered in the 10-trial window, but occasionally
the state transitions were significantly off-center in the detection window. To preserve
confidence in the timing of the network transitions, the 87 detected transitions included only
those where the k-means state switch occurred between trials 4-5, 5-6, or 6-7 in the detection
window. This excluded 10 transitions from further analysis.
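The clustering step and the relative distance measure can be sketched as follows. This is a plain two-cluster k-means written for illustration, not MATLAB’s ‘kmeans’. Seeding the centroids with the first and last trial of the window is an assumption, and the sketch does not guard against the degenerate case in which all points coincide.

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]

def two_means(points, n_iter=50):
    """Plain 2-cluster k-means; returns (labels, centroids)."""
    # Assumption: seed the centroids with the first and last trial in the window.
    cents = [list(points[0]), list(points[-1])]
    labels = None
    for _ in range(n_iter):
        new = [0 if dist(p, cents[0]) <= dist(p, cents[1]) else 1 for p in points]
        if new == labels:  # assignments stable, so the algorithm has converged
            break
        labels = new
        for k in (0, 1):
            members = [p for p, l in zip(points, labels) if l == k]
            if members:
                cents[k] = centroid(members)
    return labels, cents

def relative_distance(points, cents):
    """d1 / (d1 + d2) for each trial, with d1 and d2 the distances to the two centroids."""
    out = []
    for p in points:
        d1, d2 = dist(p, cents[0]), dist(p, cents[1])
        out.append(d1 / (d1 + d2))
    return out
```

For an abrupt transition, the relative distance stays near 0 for the early trials and jumps to near 1 between two consecutive trials. A gradual or staggered change produces intermediate values instead.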
Local field potential analysis
Local field potential was analyzed using time frequency analysis. For each 1-second
analysis point within a trial, the average power at frequencies ranging from 5 Hz to 140 Hz was
calculated, using 5 Hz steps. The average power was then compiled for three frequency ranges:
theta (5-10 Hz), low gamma (25-55 Hz), and high gamma (65-140 Hz). The 55-65 Hz frequency
band was ignored to avoid potential contamination from 60 Hz electrical noise. For each of the
three frequency ranges, the average power was normalized as a percentile for the session, using
data from the same analysis point across all same-side trials in the session. Mean percentiles for
each trial in the 10-trial detection window (3 analysis points, and 3 frequency ranges) were then
calculated across all 87 network transitions. Significance of any change in the LFP power on an
individual trial was established by comparing the values in that trial to the values for the other 9
trials (Wilcoxon rank sum test with the final p value corrected for the number of comparisons).
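The band averaging and percentile normalization can be illustrated as below, assuming power has already been estimated at each 5 Hz step and stored in a dictionary keyed by frequency. Treating the band edges as inclusive and using the midrank for ties are assumptions. The time-frequency decomposition and the rank-sum test are not shown.

```python
from bisect import bisect_left, bisect_right

# Frequency ranges from the text; edges are treated as inclusive (an assumption).
BANDS = {"theta": (5, 10), "low_gamma": (25, 55), "high_gamma": (65, 140)}

def band_power(power_by_freq, band):
    """Mean power over the 5 Hz-step frequencies that fall inside a band."""
    lo, hi = BANDS[band]
    vals = [p for f, p in power_by_freq.items() if lo <= f <= hi]
    return sum(vals) / len(vals)

def percentile_rank(value, session_values):
    """Percentile of value within the session's values (midrank for ties)."""
    ordered = sorted(session_values)
    below = bisect_left(ordered, value)
    ties = bisect_right(ordered, value) - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)
```

With 5 Hz steps, the only sampled frequency that falls strictly between the low and high gamma bands is 60 Hz, which is the one excluded to avoid line noise.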
Analysis of variability in activity around network transitions
We measured relative trial-to-trial variability in network activity in ten-trial windows
centered on trials before and after each network transition. For the initial characterization of trial-
to-trial variability in the activity of each individual cell, we represented this activity on each trial
as a point in a three dimensional space, where each dimension represented mean firing rate of the
cell around one of the three analysis points within the trial. As a measure of trial-to-trial
variability in the activity of that cell within a particular ten-trial window of interest, we took the
mean Euclidean distance between all pair-wise combinations of the associated ten points in this
three dimensional space. This individual-cell measure of trial-to-trial variability in activity within
the ten-trial window of interest was then compared (as a percentile) to the same measure across
all the other ten-trial windows in the corresponding session. Doing this analysis separately for
each cell ensured proper normalization for individual firing rates. The mean percentile for all
cells that participated in the network transition (based on a 0.75 activity change threshold) was
subsequently computed at each window location. Finally, this network transition variability
measure was averaged at each window location across all transitions. Large deviations from the
50th percentile (when the lower SEM range of the plot was above the 50th percentile range) were
taken to indicate a significant change in trial-to-trial variability of network activity.
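The single-cell variability measure can be sketched as follows, assuming each trial is a 3-tuple of that cell's mean rates at the three analysis points. The function names are hypothetical, and defining the percentile as the share of other windows with strictly lower variability is an assumption. Averaging across participating cells and across transitions is not shown.

```python
import math
from itertools import combinations

def mean_pairwise_distance(points):
    """Mean Euclidean distance over all pairs of trials in a window."""
    pairs = list(combinations(points, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

def variability_percentile(trials, start, win=10):
    """Percentile of one window's variability relative to all other windows."""
    target = mean_pairwise_distance(trials[start:start + win])
    others = [
        mean_pairwise_distance(trials[s:s + win])
        for s in range(len(trials) - win + 1) if s != start
    ]
    below = sum(1 for v in others if v < target)
    return 100.0 * below / len(others)
```

Because the comparison is made within each cell, a high-rate cell and a low-rate cell contribute on the same percentile scale.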
Analysis of ensemble states
To compare ensemble states before and after these transient periods of network plasticity, we
computed a trial-to-trial similarity matrix, using the Euclidean distance in the multidimensional
space representing the state of the network as our measure of state similarity. To compare the
new stable ensemble state to the state before the network transition, we performed a 2-group
comparison of specific values within the similarity matrix. Because the matrix is symmetrical,
only values above the diagonal in the matrix were used to avoid duplication. In the first group,
we included all within-state distances (using trials -20 to -1 for state 1 and trials 16 to 35 for state
2; where trial 0 was when the network transition was detected). In group 2 we included all
between-state distances (similar to the blue dashed box in Fig. 4C, except only 20 trials were
included in each group). 15 trials separated the two states. This was based on the average time
needed for the cell activity to stabilize (Fig. 4B). For each transition, we used a Wilcoxon rank
sum test to examine whether the first group of distances was significantly smaller than the
second group. If so, this was interpreted as the average distance between the two groups being
greater than the average within-state distance, and therefore that the new state was significantly
different than the previous state. 73 network transitions were analyzed, as the remaining events
either occurred too early or too late in the session to perform the calculation.
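The grouping of distances into within-state and between-state sets can be sketched as below. `trials` is assumed to be a list of per-trial population vectors and `t0` the index of the trial at which the transition was detected. Both names are hypothetical. The rank-sum comparison of the two groups is not included.

```python
import math

def state_distances(trials, t0, pre=(-20, -1), post=(16, 35)):
    """Split pairwise distances into within-state and between-state groups."""
    state1 = [trials[t0 + k] for k in range(pre[0], pre[1] + 1)]
    state2 = [trials[t0 + k] for k in range(post[0], post[1] + 1)]
    within = []
    for group in (state1, state2):
        for i in range(len(group)):
            for j in range(i + 1, len(group)):  # upper triangle only
                within.append(math.dist(group[i], group[j]))
    # Every state-1 trial paired with every state-2 trial.
    between = [math.dist(a, b) for a in state1 for b in state2]
    return within, between
```

With 20 trials per state this gives 380 within-state distances (190 per state) and 400 between-state distances. It also shows why a transition needs at least 20 trials before it and 35 after it to be analyzed.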
Lesions and histology
Recording sites were assessed by creating small electrolytic lesions with the tetrodes
before the animal was euthanized. 10 µA of current was applied for 14 seconds to each tetrode
before they were retracted from the brain. Then, animals were euthanized, and brains were fixed
with 4% paraformaldehyde, sectioned (50 µm coronal sections), and stained with cresyl violet.
[Figure S1, panels A and B (plots not reproduced). Axes: rejection rate (0 to 1) and reward probability (0 to 1) versus trial. Legend: left side, right side; left trials, right trials; accepted trial, rejected trial. Annotation: low contrast.]
Fig. S1. Additional examples of rejection behavior. (A) Rejection behavior for blocks of different reward contrast. Top panel shows the reward probabilities for the right and left trial types. Bottom panel shows the smoothed rejection rates for each trial type (Gaussian smoother with σ=3 trials). Note lower rejection rate for both trial types during low reward contrast period (grey background). (B) Example of a change in choice preference at a block transition without a transient decline in trial rejection. Note an abrupt switch in side preference at trial 885.
[Figure S2, panels A and B (images not reproduced). Labels: AC, PL, M2; 1 cm; Bregma 2.5 mm.]
Fig. S2. Recording details. (A) Top view of the microdrive array, showing the 20 shuttles surrounding a connector array. Each shuttle drives a single tetrode into the brain. (B) Coronal section (cresyl violet stain) showing where the ends of two tetrodes were at the end of the experiment (for animal 1). Lesions were made with current injection to make the tetrode locations visible in the brain slices. At the end of each experiment, most tetrodes were located near the border of the anterior cingulate (AC) and prelimbic (PL) cortices.
[Figure S3 (histogram not reproduced). X axis: number of cells active during task performance (0 to 20). Y axis: number of sessions.]
Fig. S3. Number of cells recorded per session. Only cells that had an average firing rate of 1.0 Hz or more in at least one of the six analysis points (3 for right-bound trials and 3 for left-bound trials) were counted. All other cells were included neither here nor in any further analysis.
[Fig. S4 graphic. (A) Example event 1 (from Fig. 2A): Spike feature 1 vs. Spike feature 2, Before and After rows, for Tet 2, Cell 2; Tet 4, Cell 2; Tet 13, Cell 2; Tet 16, Cell 1; Tet 2, Cell 1. (B) Example event 2 (from Fig. 3A): same axes, for Tet 11, Cell 1; Tet 13, Cell 1; Tet 16, Cell 1; Tet 16, Cell 3; Tet 17, Cell 2; Tet 19, Cell 1; Tet 19, Cell 2. (C) Count vs. Isolation distance; Z = 0.3125, N.S. (D) Count vs. L ratio; Z = 0.1702, N.S. Legend: First 5 trials in window, Last 5 trials in window.]
Fig. S4. Cell isolation quality during events. (A-B) Spike clusters for the cells shown in the two event examples in the main text (A for Fig. 2A and B for Fig. 3A). Each panel shows two spike features (which, in combination, provided the best visual separation of spikes from the highlighted cell from other spikes) during the first 5 trials (top row) and the last 5 trials (bottom row) in each of the two event windows. Each column shows a different cell (in the same order as presented in the main text). Note that no decrease in isolation quality is apparent during the events in either case. (C-D) Isolation quality for all 87 events. Shown are histograms of the number of cells across two different quality measures ((C) isolation distance; (D) L ratio). Blue lines indicate the first 5 trials in the event windows, and green lines indicate the last 5 trials. No significant differences were found with either measure (isolation distance, Wilcoxon rank sum, Z = 0.3125, N.S.; L ratio, Wilcoxon rank sum, Z = 0.1702, N.S.).
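The Z values reported for the Wilcoxon rank sum comparisons can be understood through the standard normal approximation, sketched below. This is a textbook version with average ranks for ties, a tie-corrected variance, and no continuity correction; whether the authors' software used exactly these conventions is not stated in the source. The function names and the toy isolation distances are assumptions.

```python
import math


def average_ranks(values):
    """Return 1-based ranks (ties get their average rank) and tie-group sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def rank_sum_z(x, y):
    """Z statistic of the Wilcoxon rank sum test (normal approximation)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks, tie_sizes = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                  # rank sum of the first sample
    mean_w = n1 * (n + 1) / 2.0          # expected rank sum under the null
    tie_term = sum(t ** 3 - t for t in tie_sizes) / (n * (n - 1))
    var_w = n1 * n2 / 12.0 * ((n + 1) - tie_term)
    return (w - mean_w) / math.sqrt(var_w)


# Made-up isolation distances for the same cells early and late in a window.
first_five = [22.1, 35.4, 18.9, 40.2, 27.5]
last_five = [21.7, 36.0, 19.4, 39.8, 28.1]
z = rank_sum_z(first_five, last_five)
```

A Z close to zero, as in panels C and D, indicates that the two distributions of quality scores are not distinguishable by rank.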
[Fig. S5 graphic. (A, C) X position (pixels) vs. Y position (pixels) for Analysis point 1, Analysis point 2, and Analysis point 3; inset axis: Time. (B, D) Distance from mean trajectory (z-score); legend: Accepted L-trial, Accepted R-trial. (E) x-axis: Average distance from mean trajectory (z-score); y-axis: Number of events; legend: First 5 trials in window, Last 5 trials in window; Z = 0.638, N.S.]
Fig. S5. Spatial trajectories during abrupt network transitions. (A) Spatial trajectories of the animal (measured by tracking diodes mounted on the microdrive) during the example event in Fig. 2A. The three panels (blue, green, and red) plot the trajectories during each of the 1-second analysis intervals in the trial. The wall containing levers and nose ports was located along the top of each plot. The grey lines show all trajectories throughout the session, and the colored lines show the trajectories for the 10-trial window around the event. Note that because the diodes were positioned approximately 4 cm above the animal’s head, head tilts also contributed to the measured trajectories; for example, in analysis point 2, the animal tended to swing its head backwards as it moved from the lever to the reward port. Inset for analysis point 3: y position (same range as main plot) over time. (B) Each trial’s spatial trajectory compared to the mean trajectory across the session. For each trial, we tracked the animal’s X and Y location and compiled these locations for the three analysis time points. Data were collected at 30 frames/sec, with 3 seconds analyzed per trial (X and Y data), totaling 180 data points per trial (30 frames/sec × 3 sec × 2 coordinates). We computed a z-score transformation for each 180-point trajectory. The yellow bar indicates the 10-trial window where the event was detected. Note that none of the trials within this window had an absolute z-score above 2. (C-D) Same as (A-B) for the example event in Fig. 3A. (E) The number of events vs. the average absolute trajectory z-score in either the first 5 trials (blue) or the last 5 trials (green) of the 10-trial detection window. Note that there was no significant difference between these groups, suggesting that the detected transitions did not occur because of abrupt changes in spatial trajectories.
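The caption does not spell out how the z-score for each 180-point trajectory was computed. The sketch below shows one plausible reading, stated as an assumption: each trial's Euclidean distance from the session-mean trajectory is z-scored across trials. The function name and the synthetic trajectories are illustrative only.

```python
import math


def trajectory_z_scores(trajectories):
    """Z-score each trial's distance from the session-mean trajectory.

    trajectories: list of trials, each a list of 180 numbers
    (30 frames/sec x 3 sec x 2 coordinates, X and Y concatenated).
    Assumption: distance is Euclidean, and z-scoring uses the sample
    standard deviation across trials.
    """
    n_trials = len(trajectories)
    n_points = len(trajectories[0])
    mean_traj = [
        sum(trial[k] for trial in trajectories) / n_trials
        for k in range(n_points)
    ]
    distances = [
        math.sqrt(sum((trial[k] - mean_traj[k]) ** 2 for k in range(n_points)))
        for trial in trajectories
    ]
    mu = sum(distances) / n_trials
    sd = math.sqrt(sum((d - mu) ** 2 for d in distances) / (n_trials - 1))
    return [(d - mu) / sd for d in distances]


# Synthetic session: 20 similar trials, with trial 7 displaced by 5 pixels.
trajectories = [
    [float(k % 7) + 0.01 * ((i * k) % 5) for k in range(180)]
    for i in range(20)
]
trajectories[7] = [v + 5.0 for v in trajectories[7]]
z_scores = trajectory_z_scores(trajectories)
```

Under this reading, a trial with an unusual path stands out as a large positive z-score, which is the kind of outlier the caption reports as absent from the event windows.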
[Fig. S6 graphic. x-axis: Expected frequency (log10); y-axis: Trial time interval (sec). Annotations: "Stable network dynamics during large time gaps"; "Significant fluctuations with no large time gaps".]
Fig. S6. Network transitions and time gaps between trials. The x-axis represents probability (per session) of the observed network dynamics for each step of the 10-trial sliding window. The vertical dashed black line indicates the probability above which the observed change was considered statistically surprising (compared to the permutation tests). The time gap from the beginning of the 4th trial in the window to the end of the 6th trial is plotted on the y-axis. Note that there are many points with significant network transitions but short trial gaps (green dashed box) and vice versa (red dashed box), suggesting that the abrupt network dynamics were not the result of a slow drift in the neural representation occurring between trials.
[Fig. S7 graphic. Four panels: Animal 1 (15 events, 7 sessions); Animal 2 (27 events, 7 sessions); Animal 3 (18 events, 6 sessions); Animal 4 (27 events, 9 sessions). x-axis: Trial in event window (1 to 10); y-axis: Normalized rejection rate.]
Fig. S7. Rejection rate during network transitions for individual animals. We performed the same analysis as in Fig. 3D, separately for each of the four animals. Each animal showed the same decreasing trend in rejection rate during detected events (plotted is the mean rejection rate across events ± S.E.M.).
[Fig. S8 graphic. x-axis: Trial in event window (1 to 10); y-axis: Normalized rejection rate.]
Fig. S8. Rejection rate during network transitions with lower expected frequency. We performed the same analysis as in Fig. 3D for the subpopulation of network transitions that had an expected frequency of 0.01 or less in the bootstrap analysis.
[Fig. S9 graphic. x-axis: Trial in event window; y-axis: Normalized reward rate; annotation: N.S.]
Fig. S9. Changes in reward frequency and network transitions. Note a slight, but not statistically significant, increase in reward rate occurring around the detected events (Wilcoxon rank sum, Z=0.885, N.S.). The small magnitude of this trend was expected, given that the contrast between the two outcome probabilities was generally too low to generate large and abrupt jumps in outcome trends, and it suggests that reward coding, by itself, cannot explain the occurrence of the abrupt network dynamics. To test this idea directly, however, we used a cross-validation method to compare the predictive power of the two behavioral variables (change in reward rate vs. change in rejection rate, both calculated using a straight-line fit over the 10 trials). We divided all behavior into 10-trial non-overlapping segments, and used a generalized linear model to predict the occurrence of a network event based solely on one of the two behavioral variables (see Materials and methods). We found that changes in rejection rate were far better behavioral predictors of the events than changes in reward rate (Wilcoxon rank sum, comparing area under receiver operating characteristic curves, Z=12.16, p < 10^-33).
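Two ingredients of this comparison can be sketched in a few lines: the straight-line fit that turns a 10-trial segment into a single "change" value, and the area under the ROC curve used to score a predictor. This is a simplified illustration, not the authors' cross-validated procedure: no generalized linear model is fitted, and the predictor itself is ranked directly (for a single-predictor logistic model the fitted probability is a monotonic function of the predictor, so the ranking is the same up to the sign of the coefficient). Function names and the toy segments are assumptions.

```python
def line_slope(values):
    """Least-squares slope of values against trial index 0..n-1."""
    n = len(values)
    x_mean = (n - 1) / 2.0
    y_mean = sum(values) / n
    sxy = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    return sxy / sxx


def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, lab in zip(scores, labels) if lab]
    neg = [s for s, lab in zip(scores, labels) if not lab]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy 10-trial segments: 1 = rejected trial, 0 = accepted trial (made up).
segments = [
    [1, 1, 1, 1, 0, 1, 0, 0, 0, 0],  # declining rejection, event
    [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],  # no clear trend, no event
    [1, 1, 1, 0, 1, 0, 0, 0, 0, 0],  # declining rejection, event
    [0, 0, 1, 0, 0, 1, 0, 0, 1, 0],  # no clear trend, no event
]
has_event = [1, 0, 1, 0]

# A decline in rejection rate is scored as a positive predictor value.
scores = [-line_slope(seg) for seg in segments]
auc = roc_auc(scores, has_event)
```

In the analysis described in the caption, one such AUC would be obtained per predictor and cross-validation fold, and the two sets of AUC values compared with a rank sum test.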
[Fig. S10 graphic. x-axis: Number of ‘other side’ accepted trials between event trials 4 and 7; y-axis: Number of detected network events.]
Fig. S10. Network transitions and trial gaps. Histogram of network transitions with different numbers of accepted trials to the ‘other side’ between trials 4 and 7 of the detection window. Note that separation by a large number of accepted trials to the other side was not necessary for the network transition to happen. Note particularly that for 25 of the 87 events, no ‘other side’ accepted trials separated the two trials flanking the transition, i.e. the transition occurred during purely consecutive exposure to the same side. This group of network transitions represented abrupt dynamics that were as widespread across the network as the rest of the events (using a 0.75 threshold, see Materials and methods, consecutive participation = 0.732; non-consecutive participation = 0.719).
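The quantity on the x-axis of this histogram can be illustrated with a short counting sketch. The trial record layout (dictionaries with "side" and "accepted" keys), the function name, and the use of session indices to locate the two flanking trials are all assumptions; the source does not describe the data structures used.

```python
def count_other_side_accepted(trials, first_idx, last_idx):
    """Count accepted trials to the other side strictly between two trials.

    trials: session-ordered list of dicts with keys "side" ("L" or "R")
    and "accepted" (bool). first_idx and last_idx are the session indices
    of the two same-side trials of interest (here, event trials 4 and 7).
    """
    side = trials[first_idx]["side"]
    return sum(
        1
        for trial in trials[first_idx + 1:last_idx]
        if trial["accepted"] and trial["side"] != side
    )


# Made-up session fragment.
trials = [
    {"side": "L", "accepted": True},
    {"side": "R", "accepted": True},   # other side, accepted: counted
    {"side": "R", "accepted": False},  # other side, rejected: not counted
    {"side": "L", "accepted": True},
    {"side": "L", "accepted": True},
]
separated = count_other_side_accepted(trials, 0, 3)
consecutive = count_other_side_accepted(trials, 3, 4)
```

A count of zero corresponds to the "purely consecutive exposure to the same side" case highlighted in the caption.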
References
1. E. C. Tolman, Cognitive maps in rats and men. Psychol. Rev. 55, 189 (1948).
doi:10.1037/h0061626
2. W. Edwards, Behavioral decision theory. Annu. Rev. Psychol. 12, 473 (1961).
doi:10.1146/annurev.ps.12.020161.002353
3. K. Körding, Decision theory: What “should” the nervous system do? Science 318, 606 (2007).
doi:10.1126/science.1142998
4. J. M. Pearce, M. E. Bouton, Theories of associative learning in animals. Annu. Rev. Psychol.
52, 111 (2001). doi:10.1146/annurev.psych.52.1.111
5. T. E. Behrens, M. W. Woolrich, M. E. Walton, M. F. Rushworth, Learning the value of
information in an uncertain world. Nat. Neurosci. 10, 1214 (2007). doi:10.1038/nn1954
6. A. J. Yu, P. Dayan, Uncertainty, neuromodulation, and attention. Neuron 46, 681 (2005).
doi:10.1016/j.neuron.2005.04.026
7. M. R. Nassar, R. C. Wilson, B. Heasly, J. I. Gold, An approximately Bayesian delta-rule
model explains the dynamics of belief updating in a changing environment. J. Neurosci.
30, 12366 (2010). doi:10.1523/JNEUROSCI.0822-10.2010
8. J. M. Pearson, S. R. Heilbronner, D. L. Barack, B. Y. Hayden, M. L. Platt, Posterior cingulate
cortex: Adapting behavior to a changing world. Trends Cogn. Sci. 15, 143 (2011).
doi:10.1016/j.tics.2011.02.002
9. S. Fusi, W. F. Asaad, E. K. Miller, X. J. Wang, A neural circuit model of flexible sensorimotor
mapping: Learning and forgetting on multiple timescales. Neuron 54, 319 (2007).
doi:10.1016/j.neuron.2007.03.017
10. W. F. Asaad, G. Rainer, E. K. Miller, Neural activity in the primate prefrontal cortex during
associative learning. Neuron 21, 1399 (1998). doi:10.1016/S0896-6273(00)80658-3
11. A. Pasupathy, E. K. Miller, Different time courses of learning-related activity in the
prefrontal cortex and striatum. Nature 433, 873 (2005). doi:10.1038/nature03287
12. S. A. Huettel, A. W. Song, G. McCarthy, Decisions under uncertainty: Probabilistic context
influences activation of prefrontal and parietal cortices. J. Neurosci. 25, 3304 (2005).
doi:10.1523/JNEUROSCI.5070-04.2005
13. J. H. Sul, H. Kim, N. Huh, D. Lee, M. W. Jung, Distinct roles of rodent orbitofrontal and
medial prefrontal cortex in decision making. Neuron 66, 449 (2010).
doi:10.1016/j.neuron.2010.03.033
14. H. D. Critchley, C. J. Mathias, R. J. Dolan, Neural activity in the human brain relating to
uncertainty and arousal during anticipation. Neuron 29, 537 (2001).
doi:10.1016/S0896-6273(01)00225-2
15. B. Y. Hayden, J. M. Pearson, M. L. Platt, Neuronal basis of sequential foraging decisions in a
patchy environment. Nat. Neurosci. 14, 933 (2011). doi:10.1038/nn.2856
16. C. B. Holroyd, M. G. Coles, Dorsal anterior cingulate cortex integrates reinforcement history
to guide voluntary behavior. Cortex 44, 548 (2008). doi:10.1016/j.cortex.2007.08.013
17. R. Quilodran, M. Rothé, E. Procyk, Behavioral shifts and action valuation in the anterior
cingulate cortex. Neuron 57, 314 (2008). doi:10.1016/j.neuron.2007.11.031
18. D. Durstewitz, N. M. Vittoz, S. B. Floresco, J. K. Seamans, Abrupt transitions between
prefrontal neural ensemble states accompany behavioral transitions during rule learning.
Neuron 66, 438 (2010). doi:10.1016/j.neuron.2010.03.029
19. S. W. Kennerley, M. E. Walton, T. E. Behrens, M. J. Buckley, M. F. Rushworth, Optimal
decision making and the anterior cingulate cortex. Nat. Neurosci. 9, 940 (2006).
doi:10.1038/nn1724
20. S. W. Kennerley, J. D. Wallis, Evaluating choices by single neurons in the frontal lobe:
Outcome value encoded across multiple decision variables. Eur. J. Neurosci. 29, 2061
(2009). doi:10.1111/j.1460-9568.2009.06743.x
21. B. Y. Hayden, M. L. Platt, Neurons in anterior cingulate cortex multiplex information about
reward and action. J. Neurosci. 30, 3339 (2010). doi:10.1523/JNEUROSCI.4874-09.2010
22. N. Kolling, T. E. Behrens, R. B. Mars, M. F. Rushworth, Neural mechanisms of foraging.
Science 336, 95 (2012). doi:10.1126/science.1216930
23. B. Y. Hayden, J. M. Pearson, M. L. Platt, Fictive reward signals in the anterior cingulate
cortex. Science 324, 948 (2009). doi:10.1126/science.1168488
24. D. R. Euston, B. L. McNaughton, Apparent encoding of sequential context in rat medial
prefrontal cortex is accounted for by behavioral variability. J. Neurosci. 26, 13143
(2006). doi:10.1523/JNEUROSCI.3803-06.2006
25. M. Rigotti, D. Ben Dayan Rubin, X. J. Wang, S. Fusi, Internal representation of task rules by
recurrent dynamics: The importance of the diversity of neural responses. Front Comput
Neurosci 4, 24 (2010). doi:10.3389/fncom.2010.00024
26. P. Dayan, A. J. Yu, Phasic norepinephrine: A neural interrupt signal for unexpected events.
Network 17, 335 (2006). doi:10.1080/09548980601004024
27. S. Bouret, S. J. Sara, Network reset: A simplified overarching theory of locus coeruleus
noradrenaline function. Trends Neurosci. 28, 574 (2005). doi:10.1016/j.tins.2005.09.002
28. G. Aston-Jones, J. D. Cohen, An integrative theory of locus coeruleus-norepinephrine
function: Adaptive gain and optimal performance. Annu. Rev. Neurosci. 28, 403 (2005).
doi:10.1146/annurev.neuro.28.061604.135709
29. J. O’Keefe, M. L. Recce, Phase relationship between hippocampal place units and the EEG
theta rhythm. Hippocampus 3, 317 (1993). doi:10.1002/hipo.450030307
30. N. Schmitzer-Torbert, J. Jackson, D. Henze, K. Harris, A. D. Redish, Quantitative measures
of cluster quality for use in extracellular recordings. Neuroscience 131, 1 (2005).
doi:10.1016/j.neuroscience.2004.09.066
31. J. A. Nelder, R. W. M. Wedderburn, Generalized linear models. J. R. Stat. Soc. Ser. A 135,
370 (1972). doi:10.2307/2344614