eric zarahn, gregory d. weston, johnny liang, pietro...

13
100:2537-2548, 2008. First published Jul 2, 2008; doi:10.1152/jn.90529.2008 J Neurophysiol Krakauer Eric Zarahn, Gregory D. Weston, Johnny Liang, Pietro Mazzoni and John W. You might find this additional information useful... 22 articles, 10 of which you can access free at: This article cites http://jn.physiology.org/cgi/content/full/100/5/2537#BIBL including high-resolution figures, can be found at: Updated information and services http://jn.physiology.org/cgi/content/full/100/5/2537 can be found at: Journal of Neurophysiology about Additional material and information http://www.the-aps.org/publications/jn This information is current as of December 8, 2008 . http://www.the-aps.org/. American Physiological Society. ISSN: 0022-3077, ESSN: 1522-1598. Visit our website at (monthly) by the American Physiological Society, 9650 Rockville Pike, Bethesda MD 20814-3991. Copyright © 2005 by the publishes original articles on the function of the nervous system. It is published 12 times a year Journal of Neurophysiology on December 8, 2008 jn.physiology.org Downloaded from

Upload: dinhnguyet

Post on 27-Apr-2018

216 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Eric Zarahn, Gregory D. Weston, Johnny Liang, Pietro ...blam-lab.org/publications/pdf/Papers/2537.pdf · Eric Zarahn, Gregory D. Weston, Johnny Liang, ... Gregory D. Weston, Johnny

100:2537-2548, 2008. First published Jul 2, 2008;  doi:10.1152/jn.90529.2008 J NeurophysiolKrakauer Eric Zarahn, Gregory D. Weston, Johnny Liang, Pietro Mazzoni and John W.

You might find this additional information useful...

22 articles, 10 of which you can access free at: This article cites http://jn.physiology.org/cgi/content/full/100/5/2537#BIBL

including high-resolution figures, can be found at: Updated information and services http://jn.physiology.org/cgi/content/full/100/5/2537

can be found at: Journal of Neurophysiologyabout Additional material and information http://www.the-aps.org/publications/jn

This information is current as of December 8, 2008 .  

http://www.the-aps.org/.American Physiological Society. ISSN: 0022-3077, ESSN: 1522-1598. Visit our website at (monthly) by the American Physiological Society, 9650 Rockville Pike, Bethesda MD 20814-3991. Copyright © 2005 by the

publishes original articles on the function of the nervous system. It is published 12 times a yearJournal of Neurophysiology

on Decem

ber 8, 2008 jn.physiology.org

Dow

nloaded from

Page 2: Eric Zarahn, Gregory D. Weston, Johnny Liang, Pietro ...blam-lab.org/publications/pdf/Papers/2537.pdf · Eric Zarahn, Gregory D. Weston, Johnny Liang, ... Gregory D. Weston, Johnny

Explaining Savings for Visuomotor Adaptation: Linear Time-InvariantState-Space Models Are Not Sufficient

Eric Zarahn, Gregory D. Weston, Johnny Liang, Pietro Mazzoni, and John W. KrakauerMotor Performance Laboratory, The Neurological Institute, New York, New York

Submitted 5 May 2008; accepted in final form 27 June 2008

Zarahn E, Weston GD, Liang J, Mazzoni P, Krakauer JW.Explaining savings for visuomotor adaptation: linear time-invariantstate-space models are not sufficient. J Neurophysiol 100: 2537–2548,2008. First published July 2, 2008; doi:10.1152/jn.90529.2008. Ad-aptation of the motor system to sensorimotor perturbations is a type oflearning relevant for tool use and coping with an ever-changing body.Memory for motor adaptation can take the form of savings: anincrease in the apparent rate constant of readaptation compared withthat of initial adaptation. The assessment of savings is simplified if thesensory errors a subject experiences at the beginning of initial adap-tation and the beginning of readaptation are the same. This can beaccomplished by introducing either 1) a sufficiently small number ofcounterperturbation trials (counterperturbation paradigm [CP]) or 2) asufficiently large number of zero-perturbation trials (washout para-digm [WO]) between initial adaptation and readaptation. A two-rate,linear time-invariant state-space model (SSMLTI,2) was recentlyshown to theoretically produce savings for CP. However, we reasonedfrom superposition that this model would be unable to explain savingsfor WO. Using the same task (planar reaching) and type of perturba-tion (visuomotor rotation), we found comparable savings for both CPand WO paradigms. Although SSMLTI,2 explained some degree ofsavings for CP it failed completely for WO. We conclude that forvisuomotor rotation, savings in general is not simply a consequence ofLTI dynamics. Instead savings for visuomotor rotation involves meta-learning, which we show can be modeled as changes in systemparameters across the phases of an adaptation experiment.

I N T R O D U C T I O N

Perturbations to either environment or physical plant, as wellas direct experimental manipulations of sensory feedback, caninduce sensory error: a discrepancy between observed andpredicted sensory feedback. Motor adaptation refers to whensensorimotor mappings change to reduce sensory error oversuccessive movements. Adaptation can be modeled with state-space models (Cheng and Sabes 2006) that have sensory error(or perturbation) as input, sensorimotor mappings as hiddenvariables (states), and adaptation responses as output. Adapta-tion may reflect how the CNS establishes and maintains sen-sorimotor mappings throughout normal life (Kording et al.2007). Memory for a newly learned mapping/state can take atleast two forms: aftereffects [persistence of the adapted stateinto readaptation (Yamamoto et al. 2006)] and savings [a fasterrate of readaptation compared with that of initial adaptation(Kojima et al. 2004; Krakauer et al. 2005)]. To assess savingsindependently of aftereffects, starting states for initial adapta-tion and readaptation should be equated. Thus far only twostudies that meet this requirement have shown savings—one of

saccadic adaptation (Kojima et al. 2004) and our previousstudy of rotation adaptation for reaching movements (Krakaueret al. 2005). Aftereffects were eliminated in the saccade studyby inserting counterperturbation trials between initial adapta-tion and readaptation (counterperturbation paradigm [CP]).Aftereffects were eliminated in the rotation study by insertinginstead sufficient zero-perturbation trials to washout memoryof the initial adaptation phase (washout paradigm [WO]).

Motivated by findings from the saccadic adaptation study(Kojima et al. 2004), Smith and colleagues (2006) demon-strated via simulation that a linear time-invariant (LTI) state-space model (Cheng and Sabes 2006; Donchin et al. 2003;Thoroughman and Shadmehr 2000) with two states (slow andfast) produces savings in CP. It is helpful to understand thatthis model (which we will refer to as SSMLTI,2), as well as anyLTI SSM used to model adaptation, can be mathematicallyrepresented in various ways. For example, Smith and col-leagues (2006) chose to express output in terms of net senso-rimotor mapping. However, output can be equivalently ex-pressed in terms of sensory error. Likewise, input can beexpressed either in terms of sensory error or perturbation. Thepoint here is that, regardless of the specific form, an LTIsystem obeys superposition: the output to a sum of inputsequals the sum of the outputs to the individual inputs. There-fore no matter whether adaptation is expressed in terms ofsensorimotor mapping or sensory error, savings produced bySSMLTI,2 in CP results not from a change in system parametersbetween initial adaptation and readaptation, but rather fromsuperposition of the adaptation responses to the perturbationscorresponding separately to the 1) initial adaptation, 2) coun-terperturbation, and 3) readaptation phases of CP (Fig. 1).

Given that LTI systems obey superposition, we reasonedthat the SSMLTI,2 could not explain savings in a WO paradigm:Assuming a stable LTI SSM, application of some fixed, non-zero perturbation causes the state (i.e., the sensorimotor map)to change trial by trial such that sensory error is reduced. Iffrom some trial onward, the perturbation is set to zero (i.e.,washout trials are applied), the sensorimotor map must ap-proach in the limit the same value it had prior to the initialadaptation. Thus with a sufficient number of washout trials thesensorimotor map will be arbitrarily close to this initial, naivevalue (corresponding to elimination of aftereffects). Thereforerespecting superposition, the larger the number of washouttrials, the closer the net adaptation response during the read-aptation phase of a WO paradigm will be to the adaptationresponse (time-shifted, of course) to the initial adaptation. Thus

Address for reprint requests and other correspondence: E. Zarahn, MotorPerformance Laboratory, The Neurological Institute, New York, NY 10032(E-mail: [email protected]).

The costs of publication of this article were defrayed in part by the paymentof page charges. The article must therefore be hereby marked “advertisement”in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

J Neurophysiol 100: 2537–2548, 2008.First published July 2, 2008; doi:10.1152/jn.90529.2008.

25370022-3077/08 $8.00 Copyright © 2008 The American Physiological Societywww.jn.org

on Decem

ber 8, 2008 jn.physiology.org

Dow

nloaded from

Page 3: Eric Zarahn, Gregory D. Weston, Johnny Liang, Pietro ...blam-lab.org/publications/pdf/Papers/2537.pdf · Eric Zarahn, Gregory D. Weston, Johnny Liang, ... Gregory D. Weston, Johnny

in the limit there would be no savings. The same point wasmade by Smith and colleagues (2006) in that they showed viasimulation for SSMLTI,2 that as the number of washout trialsinserted between the counterperturbation and readaptationphases in CP increased (i.e., as CP was converted into WO),the amount of savings tended to zero. Figure 2 illustrates theadaptation responses of the SSMLTI,2 in a WO paradigm whenthe number of washout trials was sufficient to effectively“isolate” the adaptation response during the readaptation phasefrom that of the initial adaptation phase, leading to a lack ofappreciable savings.

Here we measured savings as a change in rate constantsrather than as a change in rates, to further avoid contaminationby aftereffects. When modeling rate constant savings, there isan important distinction to be made (as implied earlier) be-tween apparent rate constants (the empirical rate constantevident during a particular phase of an adaptation experiment)and system rate constants (model parameters that determine theinput–output relationship of the system). Our goal for modelingrate constant savings, then, was to determine whether changes inapparent rate constants were best explained with a system whoseparameters do (varying-parameter SSM: SSMVP) or do not(SSMLTI) change with experience. A system displaying sav-ings as a result of a change in parameters would correspond tometalearning. We first demonstrated rate constant savings for

both CP and WO using the same task (planar reaching move-ments) and the same type of perturbation (visuomotor rota-tion). Then, in each paradigm, for the one- and two-rateSSMLTI and SSMVP, we 1) assessed the ability of each SSM toexplain rate constant savings and 2) quantified model parsi-mony with the information-theoretic measure Akaike Informa-tion Criterion (AIC) (Akaike 1974; Bozdogan 1987; Burnhamand Anderson 2002) to ensure that any potential superiority ofSSMVP over SSMLTI with regard to explanation of savings wasnot offset by overparameterization.

M E T H O D S

Subjects

A total of 14 right-handed subjects volunteered for the study. Allwere naive to the purpose of the study, signed an institutionallyapproved consent form, and were paid to participate. They wererandomly assigned to either the CP [n � 6; mean age (SD) � 25.8(6.6) yr; 3 M] or the WO [n � 8; mean age (SD) � 22.1 (0.6) yr; 6M] experiments.

General experimental protocol

Subjects sat and moved a hand cursor by making planar reachingmovements with the shoulder and elbow over a horizontal surfacepositioned at shoulder level. A center start position and a single target

FIG. 1. Illustration of superposition for two-rate, lineartime-invariant state-space model (SSMLTI,2) [aslow � 0.992,bslow � 0.02, afast � 0.59, bfast � 0.21, taken from the empiricalestimates reported by Smith and colleagues (2006)] in thecounterperturbation paradigm (CP). See Eq. 2 in the main textfor the state-space model. The left column shows the inputperturbation functions (abscissa is movement number n),whereas the right column shows both the outputs [equivalentlyin terms of directional error (red) and net sensorimotor map(black)] and the state variables [slow (blue) and fast (green)].The rows correspond to a decomposition of the net input [(fromtop): initial adaptation stimulus, counterperturbation stimulus,readaptation stimulus, summed inputs]. As a consequence ofsuperposition, the shaded plot in the bottom right corner isequal to both 1) the sum of the outputs to the separate inputs(sum down right column) and 2) the output to the summedinputs (transform from left to right in the bottom row). In the CPparadigm, superposition leads to obvious savings (i.e., a fasterapparent rate of adaptation during the readaptation phase com-pared with the initial adaptation phase). Perturbation functionfor this CP paradigm: 0° for 1 � n � 10, 30° for 11 � n � 100,�30° for 101 � n � 103, 30° for 104 � n � 150.

2538 ZARAHN, WESTON, LIANG, MAZZONI, AND KRAKAUER

J Neurophysiol • VOL 100 • NOVEMBER 2008 • www.jn.org

on Decem

ber 8, 2008 jn.physiology.org

Dow

nloaded from

Page 4: Eric Zarahn, Gregory D. Weston, Johnny Liang, Pietro ...blam-lab.org/publications/pdf/Papers/2537.pdf · Eric Zarahn, Gregory D. Weston, Johnny Liang, ... Gregory D. Weston, Johnny

(45° clockwise from the 12 o’clock position, diameter 2 cm, 6 cmfrom start position) was projected onto a computer screen positionedabove the arm. This same, single target position was used throughoutthe entire experiment in both CP and WO paradigms (i.e., targetposition was not varied across trials, subjects, or paradigms). Amirror, positioned halfway between the computer screen and the tablesurface, reflected the computer display, producing a virtual image ofthe screen cursor and the target in the horizontal plane of the finger tip.Hand positions calibrated to the position of the finger tip weremonitored using a Flock of Birds (Ascension Technology, Burlington,VT) magnetic movement recording system at a frequency of 120 Hz.Anterior–posterior translation of the shoulder was prevented with arigid frame around the trunk. The wrist, hand, and fingers wereimmobilized with a splint and the forearm was supported on anair-sled system. An opaque shield prevented subjects from seeing theirarms and hands at all times.

Visuomotor rotation paradigms

There were two experimental paradigms, which involved the inser-tion of either counterperturbation (CP) or washout (WO) trials be-tween initial adaptation and readaptation, respectively. A single targetposition was used in both paradigms (see General experimentalprotocol). The sign convention used for rotation was that counter-clockwise rotation corresponded to positive angles. In both paradigmssubjects were first familiarized with 40 baseline trials (0° rotation). InCP they then performed 80 trials of a �30° rotation (initial adapta-

tion) followed by 8 trials of a �30° rotation (counterrotation) fol-lowed by another 80 trials of the �30° rotation (readaptation). In WO,after the baseline trials subjects performed 80 initial adaptation trialswith a �45° rotation followed by 40 trials of 0° rotation (washout)followed by 80 readaptation trials with �45°. The reason we chose arotation magnitude of 45° for the WO paradigm was to increase thesignal-to-noise ratio of the adaptation data relative to our previousstudy, which used 30° (Hinder et al. 2007; Kojima et al. 2004;Krakauer et al. 2005). We would have chosen a 45° rotation magni-tude for CP as well, but chose not to because a �45° rotation wouldhave meant an initial error of �90° in the deadaptation phase; errorsof this size may provoke cognitive strategies (Imamizu et al. 1995),which we sought to avoid. We empirically address this potentialconfound of different rotation magnitudes early in the RESULTS section.

Measurement of savings

In both CP and WO paradigms, apparent first-order rate constantswere estimated (via nonlinear least squares) per subject separately forinitial adaptation and readaptation from the first 30 values of direc-tional error e[n] according to

e�n� � aecn � b (1)

where n is movement number, c is the rate constant (in units ofmovements�1), and a and b are additional free parameters (both inunits of degrees). Src was defined as cinitial adaptation minus creadaptation.

FIG. 2. Illustration of superposition for SSMLTI,2 in thewashout paradigm (WO). Formatting and system constants arethe same as in Fig. 1. The rows correspond to a decompositionof the net input [(from top) initial adaptation stimulus, washoutphase, readaptation stimulus, summed inputs]. Because a suf-ficient number of washout trials was inserted between the initialadaptation and readaptation stimuli to bring the state vectorclose to its naive value of (in this case) 0, the apparent rates ofadaptation during the readaptation and initial adaptation phasesare not nearly as distinguishable as in CP (Fig. 1). However, thedirectional error output (and thus savings) in response to thesummed perturbations can be predicted simply from the super-position of the directional errors caused by the individualperturbations (this superposition being the red curve in theshaded plot), without additional concern for the values of theslow and fast components of the state vector (blue and greencurves, respectively, in shaded plot). Perturbation function forthis WO paradigm: 0° for 1 � n � 10, 30° for 11 � n � 100,0° for 101 � n � 200, 30° for 201 � n � 280.

2539COMPARING MODELS OF SAVINGS

J Neurophysiol • VOL 100 • NOVEMBER 2008 • www.jn.org

on Decem

ber 8, 2008 jn.physiology.org

Dow

nloaded from

Page 5: Eric Zarahn, Gregory D. Weston, Johnny Liang, Pietro ...blam-lab.org/publications/pdf/Papers/2537.pdf · Eric Zarahn, Gregory D. Weston, Johnny Liang, ... Gregory D. Weston, Johnny

Rate constant savings would correspond to Src �0. This method isreasonable to the extent that the early phase of adaptation can be wellapproximated by first-order behavior, even though the best model forthe net data might be of higher order (Smith et al. 2006).

To assess the degree to which the various SSMs captured Src, Eq.1 was also fit to the various SSM fits. That is, the SSM fits (seeState-space modeling) were simply treated as e[n] and fit by Eq. 1,again, per subject and separately for the initial adaptation and read-aptation phases. The basic logic is that if a given SSM fits the datawell, the fit of Eq. 1 to the fit of the SSM should capture Src well. Wenote that our variance estimator for Src comes from the variation inestimated Src across subjects (not the “residual” variation about eachfit) and, in such a case, statistical inference will be valid even when“fitting to fits.”

State-space modeling

SSMs describe the entire e[n] movement series for a given exper-iment, and (unlike Eq. 1) are not fit separately to the initial adaptationand readaptation phases. The parameters of each type of SSM ofinterest (described in the following text) were estimated separately ineach subject. Linear, discrete-time SSMs for modeling motor learningdata have been discussed by Cheng and Sabes (2006). The SSMs weuse correspond to their Eq. 3.9 and we use their notation (except thatwe use lowercase boldface for vectors, uppercase boldface for matri-ces, and italics for scalars). Perturbation (visuomotor rotation angle indegrees) r[n] and the output (reach direction at peak velocity relativeto the target direction in degrees) y[n] were scalars (e[n] � r[n] �y[n]). The state vector on movement n, x[n] (�{xslow[n] xfast[n]}T)represents the components of the sensorimotor transformation, i.e., theangular discrepancy between the target direction and movementdirection, on trial n. Therefore xslow and xfast are also in units ofdegrees. The state update equation is

x�n � 1� � Ax�n� � b�r�n� � y�n� � bx� � ��n�

� Ax�n� � b�e�n� � bx)���n� (2)

where

A � � aslow 00 afast

�is the matrix of dimensionless retention rates, b � [bslow bfast]

T is thevector of dimensionless learning rates, bx is sensorimotor bias (de-grees), and the state noise vector � N(0, �state

2 I). The equation forthe reach direction on trial n � 1 is

y�n � 1� � cTx�n � 1� � ��n � 1� (3)

with the output noise � N(0, �output2 ), and c � [1 0]T for one-rate

models or [1 1]T for two-rate models. For one-rate models afast andbfast were constrained to be zero. Equation 2 can easily be reparam-eterized to have r be the input instead of e, and in figures we will useperturbation as the input (because it is under direct experimental control).Likewise, we will discuss the output in terms of both y and e.

The initial state was set equal to its steady-state value under zeroperturbation from �, which is

x�1� � ����afast � bfast � 1�bslowbx � bfastbslowbx�

�aslow � bslow � 1��aslow � bslow � 1� � bslowbfast

���aslow � bslow � 1�bfastbx � bfastbslowbx�

�aslow � bslow � 1��afast � bfast � 1� � bslowbfast

�The variance of x[1] was �initial

2 I.The SSM described by Eqs. 2 and 3 with fixed [aslow bslow afast bfast]

is LTI (and so referred to as SSMLTI). We also used a version of theabove-cited SSM in which [aslow bslow afast bfast] were allowed to

take on different values for the experimental phases of initial adap-tation, counterperturbation for CP (or washout for WO), and readap-tation. We refer to these SSMs as “varying-parameter” SSMs (abbre-viated as SSMVP). SSMVP are non-LTI (or, more precisely, notnecessarily LTI) because they need not satisfy superposition. The ideabehind using SSMVP is that the experience of a perturbation during anearly experimental phase (e.g., initial adaptation) might change thesystem parameters during a later phase (e.g., readaptation): Considera particular system initially (i.e., in the absence of prior nonzero input)displaying LTI behavior. Let r1[n] be a perturbation function takingon a particular nonzero value for 0 � n � N and a value of zeroeverywhere else, and let the response of the system to r1[n] be y1[n].Let r2[n] � r1[n � L] with L � N, and let the response of the systemto r2[n] be y2[n]. Let r3[n] � r1[n] � r2[n], and let the response of thesystem to r3[n] be y3[n]. If the occurrence of the perturbation reflectedin r1[n] changes the parameters of the system, then y3[n] will not equaly1[n] � y2[n]. This is because y2[n] � y1[n � N], whereas theresponse to r2[n] having been preceded by r1[n] (i.e., to r2[n]as a component of r3[n]) will not equal y1[n � N] as [aslow

bslow afast bfast] will have been changed as a consequence of r1[n].Thus representing the system transform as T(r), we would haveT(r1[n] � r2[n]) � T(r3[n]) � y3[n] � y1[n] � y1[n � N] � y1[n] �y2[n] � T(r1[n]) � T(r2[n]), and thus T(r) would be a non-LTI system.

The reason we use the abbreviation SSMVP for these varying-parameter SSMs as opposed to simply SSMnon-LTI is that SSMVP isone very particular type of non-LTI SSM among many. We choseSSMVP from among the immense class of non-LTI models becausealthough it can manifest experience dependence, it is LTI withinphase, which is congruent with our impressions of directional errordata from previous visuomotor rotation paradigms (Krakauer et al.2005).

We considered one- and two-rate versions of the SSMLTI and theSSMVP. Numbers of free parameters (k) per SSM were as follows:k � 6 for SSMLTI,1; k � 8 for SSMLTI,2; k � 10 for SSMVP,1; and k �16 for SSMVP,2. An explicit form for the likelihood f(e � p) of thedirectional error e � r � y � {r[Ninitial] � y[Ninitial], r[Ninitial � 1] �y[Ninitial � 1], . . . , r[Nmax] � y[Nmax]}T) for each of the four SSMswas derived from Eqs. 2 and 3. For SSMLTI, p � [aslowbslow afast bfast

bx �initial2 �state

2 �output2 ]T; p was similar for SSMVP except that the

values of [aslow bslow afast bfast] were allowed to be different duringthe three phases of both CP and WO. Ninitial � 31 and Nmax � 160;for CP, Ninitial � 31 and Nmax � 190 for WO (see next paragraph forthe reason that all 240 movements were not used). These values werechosen to allow fitting from the last 10 zero perturbation trials beforethe initial adaptation up until 30 trials into the readaptation. The formof loge [ f(y � p)] corresponding to Eqs. 2 and 3 was derived such that,unlike expectation-maximization (Cheng and Sabes 2006; Shumwayand Stoffer 1982), the states were not explicitly represented; thereby,maximum likelihood estimates (MLEs; Shao 2003) p of p wereobtained from e of each subject for each of the four SSMs byminimizing �loge [ f(y � p)] with respect to p using the MATLAB7.4a (The MathWorks, Natick MA) routine fmincon via the method ofLevenberg–Marquardt. Fits were also obtained to across-subject av-erages of e, but these fits were used for display only, not for modelselection (see AIC). The following linear constraints were used toreduce the occasion of nonconvergence: 0 � aslow, afast � 1.1; 0 �bslow, bfast � 0.8; �30 � bx � 30; 1 � �initial

2 � 200; 0.1 � �state2 �

200; 1 � �output2 � 200; aslow � bslow � 0.001; afast � bfast � 0.001.

All fits were initialized with the values pinitial � [0.99 0.05 0.4 0.2 010 10 10]T.

In expectation-maximization, one iteratively maximizes (with re-spect to p) the expectation of loge [ f(x, y � p)] (with respect to xconditioned on y and p); for SSMLTI, this expectation has a compu-tationally simple form (Cheng and Sabes 2006; Shumway and Stoffer1982). However, we did not use the expectation-maximization methodof obtaining the MLE of p because we were not sure how toimplement this method with SSMVP. Instead, for both SSMLTI and

2540 ZARAHN, WESTON, LIANG, MAZZONI, AND KRAKAUER

J Neurophysiol • VOL 100 • NOVEMBER 2008 • www.jn.org

on Decem

ber 8, 2008 jn.physiology.org

Dow

nloaded from

Page 6: Eric Zarahn, Gregory D. Weston, Johnny Liang, Pietro ...blam-lab.org/publications/pdf/Papers/2537.pdf · Eric Zarahn, Gregory D. Weston, Johnny Liang, ... Gregory D. Weston, Johnny

SSMVP, we maximized the more complicated loge [ f(y � p)]; thecomputational expense of determining loge [ f(y � p)] was the onlyreason the entire 240 movement data sets were not used to obtainMLEs.

Obtaining MLEs of e (both for plotting and for assessment ofexplanation of savings) involved substituting the expression for y[n]from Eq. 3 into Eq. 2, eliminating all random terms, and substitutingp for p

x�n � 1� � Ax�n� � b�r�n� � cTx�n� � bx) (4)

e�n � 1� � r�n � 1� � cTx�n � 1� (5)

This fit e is purely a function of p and the deterministic r, and so (byvirtue of the r we used in either paradigm) is not “bumpy”: erepresents our best estimate of the expected value of e[n] given p andr. This can be contrasted with Kalman filtering (not used here), whichyields the best estimate of the expected value of e[n] given p, r, ande[n � 1] (Shumway and Stoffer 1982), which would tend to be“bumpy.” Although this latter type of fit better follows the data, it isnot limited to the estimated deterministic response of the system,which is all that is of interest here.

AIC

Because the four different SSMs described earlier have differentnumbers of free parameters (k), model parsimony becomes an issue.This is because simply increasing k will improve apparent model fit(i.e., increase loge [ f(y � p)]), even if the extra parameters are irrelevantto the true process generating the data. Thus the SSM with moreparameters might conceivably appear to explain more savings thananother, but in a manner not “worth” the extra parameters because addingextra parameters tends to reduce the stability of fits over repeatedmeasurements (Stone 1977). The AIC provides a way to rank a set ofcandidate models in terms of how well they fit the data (Akaike 1974;Bozdogan 1987; Burnham and Anderson 2002) while accounting for theeffect of varying k. The AIC for the ith candidate model is

AIC i � �2loge �fi�y � pi�� � 2ki (6)

and it is in units of information (Burnham and Anderson 2002). Let ui

equal the expectation (with respect to the data sample) of the Kull-bach–Leibler mean information for discrimination between candidatemodel i and the true data generating process; ui is the risk function(i.e., the function to be minimized over i) for information-theoreticmodel selection and is related to total prediction error (Akaike 1974;Bozdogan 1987; Burnham and Anderson 2002). AICi � AICj is anapproximately unbiased estimator of ui � uj (Akaike 1974; Bozdogan1987; Burnham and Anderson 2002). That is, the AIC difference be-tween two candidate models is an approximately unbiased estimator oftheir difference in information-theoretic model selection risk. However,the AIC is known to demonstrate a bias with respect to model selectionrisk toward selection of more overparameterized models (Hurvich andTsai 1991), which will temper our inferences accordingly.

Since the difference in AIC between two models is only anestimator of their difference in risk, to control the false-positive ratewhen comparing the risk between models using the AIC wouldrequire a statistical test. Here, the null hypothesis of zero differencebetween pairs of SSMs in the population average risk was assessed viapaired t-test using a two-tailed � � 0.05. Because AIC differences donot have normal distributions, the use of t-test is not formally correct;however, parametric tests tend to be robust to violations of normality(Kirk 1982). Nevertheless, the Shapiro–Wilk W test (Shapiro andWilk 1968) was used to assess the assumption of normality on theAIC differences for each of the six SSM pairings in both CP and WOparadigms; a threshold of � � 0.05 for each W test was used as acriterion for deciding whether the violation of the normality assump-tion was acceptable.

We remark that although performing parametric statistical tests onindependent and identically distributed samples of AICs is not a verycommon procedure, and was frowned on by Burnham and Anderson(2002), we cannot see any fundamental problem or contradiction indoing so. Indeed, we see it as a strength, given the likely variabilitybetween subjects in the relative risk between models, variability thatis explicitly accounted for in statistical testing.

Using the AIC for model selection is different from statisticalhypothesis testing of additional SSM parameters (one possible alter-native to the AIC for model selection), which controls type I error rate(�) for a null hypothesis. Arguments against using hypothesis testingto perform model selection include the arbitrary selection of �, themultiple-comparison problem, the dependence of results on the orderof entering variables in stepwise regression, and the philosophicalissue of whether any null hypothesis can ever be true (Akaike 1974;Bozdogan 1987; Burnham and Anderson 2002).

R E S U L T S

Adaptation curves

The directional errors for both CP and WO manifested theessential qualitative behavior expected from previous studiesof adaptation to visuomotor rotation (Krakauer et al. 1999,2000; Wigmore et al. 2002). CP directional error data averagedacross subjects (n � 6) is shown in Fig. 3A (data from arandomly selected single subject is shown in SupplementalFig. S1A).1 With respect to the CP data averaged acrosssubjects, directional error on the first trial during the initialadaptation phase was on average �33°, which is very close tothe value of the perturbation value (�30°) after taking intoaccount a small sensorimotor bias, which led to a �4° offsetduring baseline trials. As expected, directional error decreasedthroughout the course of initial adaptation, approaching anasymptotic level of adaptation of approximately �6°. The firsttrial of counterrotation (�30°) had a directional error of �54°,as expected from the asymptotic level of directional errorduring initial adaptation, which then increased to �23° on theeighth movement with counterrotated feedback. The first trialof readaptation had a directional error of �36° (i.e., within 3°of the first trial of initial adaptation, which indicates thataftereffects were to a good approximation eliminated). Byvisual inspection, the apparent rate constant of readaptationwas substantially more negative (i.e., smaller decay time con-stant) than that of initial adaptation.

Figure 3B shows the directional error data averaged acrosssubjects (n � 8) for WO (data from a randomly selected singlesubject are shown in Supplemental Fig. S1B). This paradigmyielded qualitatively very similar results to those of CP, ac-counting for the larger magnitude of the perturbation (�45° forWO vs. �30° for CP). Also, as expected, the directional errorduring the first trial of washout was approximately �36°,which is less than the magnitude of the 45° perturbation duringinitial adaptation (whereas in contrast, in CP the directionalerror during the first trial of counterrotation was �54°, whichis substantially larger in magnitude than the correspondingvalue of the perturbation during initial adaptation). As in CP,aftereffects were successfully eliminated. Also as in CP, visualinspection suggests that the apparent rate constant of readap-tation was substantially more negative than that of initialadaptation.

1 The online version of this article contains supplemental data.

2541COMPARING MODELS OF SAVINGS

J Neurophysiol • VOL 100 • NOVEMBER 2008 • www.jn.org

on Decem

ber 8, 2008 jn.physiology.org

Dow

nloaded from

Page 7: Eric Zarahn, Gregory D. Weston, Johnny Liang, Pietro ...blam-lab.org/publications/pdf/Papers/2537.pdf · Eric Zarahn, Gregory D. Weston, Johnny Liang, ... Gregory D. Weston, Johnny

A potential confound in comparing and contrasting rateconstant savings, Src, in the CP and WO paradigms was thatthey were associated with different rotation (perturbation)amplitudes during initial adaptation (�30° for CP and �45°for WO; see Visuomotor rotation paradigms, in METHODS),which could possibly be associated with different rate con-stants of adaptation. To assess this possibility, we comparedthe rate constants of initial adaptation between CP and WO. Asingle exponential (Eq. 1) was fit to the first 30 movements ofthe initial adaptation data from each subject in each paradigmto obtain an estimate of the apparent adaptation rate constant.The exponential rate constants (in units of movements�1)during the initial adaptation phase were (mean � SD) �0.16 �0.14 for CP and �0.17 � 0.16 for WO, which were notsignificantly different [t(12) � �0.13, two-tailed P � 0.90].This shows that, on average, rate constants of initial adaptationwere very similar between CP and WO (which is what wouldbe expected under a LTI system), despite the different magni-tudes of visuomotor rotation. For completeness, we note thatthis implies that their rates of initial adaptation (in units ofdegrees �movements�1) were different (greater for WO).

Besides the requirement for the LTI system to have a singlerate constant regardless of perturbation amplitude, the outputamplitude must also be strictly proportional to perturbation

amplitude. This was supported as the average ratio of estimatedoutput amplitude (i.e., a in Eq. 1) to perturbation amplitudewas very similar [t(12) � 0.08, two-tailed P � 0.94] for CP(0.922 � 0.011) and WO (0.916 � 0.029). This findingprovides further support for an LTI system being a reasonableapproximation to initial adaptation. It also implicitly providesevidence against adaptation having appreciable saturation orsupralinear (two other types of nonlinear SSMs distinct fromSSMVP) characteristics in this perturbation range because thisratio would have been different between CP and WO if it did.

Initial adaptation: one-rate or two-rate?

In the immediately preceding text, we approximated the first30 trials of initial adaptation as a single exponential to estimateapparent adaptation rate constants. However, this does notmean that the adaptation curves do not have multirate behavior.Therefore a preliminary question we sought to address iswhether initial adaptation to a visuomotor rotation is a better fitby SSMLTI,1 or SSMLTI,2 (SSMVP values were not relevanthere because initial adaptation constitutes only one phase).Data reported for adaptation to a viscous-curl force fieldsuggested a multirate (e.g., two-rate) system during initialadaptation (Smith et al. 2006). Determining whether there are

FIG. 3. The perturbation r[n] (thick gray line), across-sub-ject averaged directional error e[n] (open diamonds) and SSMMLE fits (black line: SSMLTI,1, green line: SSMLTI,2, blue line:SSMVP,1, red line: SSMVP,2) for the (A) CP and (B) WOparadigms.

2542 ZARAHN, WESTON, LIANG, MAZZONI, AND KRAKAUER

J Neurophysiol • VOL 100 • NOVEMBER 2008 • www.jn.org

on Decem

ber 8, 2008 jn.physiology.org

Dow

nloaded from

Page 8: Eric Zarahn, Gregory D. Weston, Johnny Liang, Pietro ...blam-lab.org/publications/pdf/Papers/2537.pdf · Eric Zarahn, Gregory D. Weston, Johnny Liang, ... Gregory D. Weston, Johnny

similarly two rates present during initial adaptation to a visuo-motor rotation will be important for interpreting how thevarious SSM models fit CP and WO directional error data intheir entirety.

Initial adaptation data were fit better by a SSMLTI,1 than by aSSMLTI,2, significantly in CP [for AIC1-rate LTI minus AIC2-rate LTI:t(5) � �37.9, two-tailed P 0.001] and only as a trend forWO [for AIC1-rate LTI minus AIC2-rate LTI: t(5) � �2.10,two-tailed P � 0.07]. (The extremely high t-value for theformer comparison was mainly due to very small variabilityacross subjects in [AIC1-rate LTI minus AIC2-rate LTI] for initialadaptation in CP; the average values of [AIC1-rate LTI minusAIC2-rate LTI] for initial adaptation were �3.70 and �1.58 forCP and WO, respectively.) These results show that, ostensiblyunlike adaptation to viscous-curl force fields (Smith et al.2006), initial adaptation to either a �30° (in CP) or �45° (inWO) visuomotor rotation is better fit by a SSMLTI,1 than aSSMLTI,2. As a technical aside, we emphasize that these SSMfits were to the initial adaptation phase only, in contrast to theSSMVP fits to all three phases we report later (see SSM fits)that, despite allowing for different learning and retention ratesin each of the three paradigm phases, have a single sensori-motor bias parameter.

Savings

The presence of Src in both CP and WO paradigms wasconfirmed by comparing the apparent rate constants from theinitial adaptation and readaptation phases. This method ofmeasuring savings did not explicitly involve assuming anySSM but simply relied on the qualitative impression that atleast the first few trials of motor adaptation to a constantperturbation is reasonably well modeled by Eq. 1 (Caithnesset al. 2004; Krakauer et al. 2005; Mazzoni and Krakauer 2006),

although this would correspond to the response of SSMLTI,1 toa constant perturbation. The rate constant c was significantlymore negative (corresponding to faster learning, i.e., Src) forthe readaptation than for the initial adaptation phase in both CP[t(5) � 3.05, one-tailed P � 0.014] and WO [t(7) � 3.73,one-tailed P � 0.004]. Furthermore, the magnitude of Src wasnot significantly different between CP and WO [cinitial adaptationminus creadaptation (mean � SD) � 0.48 � 0.39 for CP and0.47 � 0.36 for WO; t(12) � �0.056, two-tailed P � 0.96].Also, we keep in mind the finding that apparent rate constantsof initial adaptation were very similar between CP and WO(see Adaptation curves). Thus the two adaptation paradigmsCP and WO did not appreciably differ from one another eitherin terms of Src (which was robust in both) or rate constants ofinitial adaptation (despite different perturbation magnitudes).

SSM fits

MLE fits of the four SSMs to e[n] (simultaneously to allthree phases of either the CP or WO paradigms; see State-spacemodeling in METHODS) were computed separately in each sub-ject. The across-subject averages of the parameter estimates areprovided in Table 1. To collectively illustrate the character ofthe fits, MLEs of the SSMs were also determined for theacross-subject averaged time courses for both CP (Fig. 3A) andWO (Fig. 3B). Although quantitative comparisons betweenSSMs based on the fits obtained per subject are providedsubsequently, a qualitative impression of the across-subjectaveraged fits is provided here: SSMLTI,1 did a poor job ofexplaining savings in both paradigms, yielding initial adapta-tion that was too fast and readaptation that was too slow; thisSSM also poorly fit the sensorimotor bias. SSMLTI,2 did areasonable job in CP but did a very poor job in the WOparadigm in which it manifested the same problem as

TABLE 1. Across-subject averages of the maximum likelihood estimates of SSM parameters

CP WO

SSMLTI,1 SSMLTI,2 SSMVP,1 SSMVP,2 SSMLTI,1 SSMLTI,2 SSMVP,1 SSMVP,2

Phase 1aslow 0.924 (0.043) 0.991 (0.012) 0.971 (0.028) 0.986 (0.016) 0.960 (0.031) 0.983 (0.025) 0.979 (0.016) 0.986 (0.016)bslow 0.137 (0.057) 0.062 (0.016) 0.081 (0.039) 0.112 (0.057) 0.245 (0.099) 0.159 (0.046) 0.149 (0.078) 0.116 (0.048)afast N/A 0.629 (0.102) N/A 0.369 (0.350) N/A 0.519 (0.318) N/A 0.492 (0.324)bfast N/A 0.158 (0.051) N/A 0.003 (0.005) N/A 0.193 (0.104) N/A 0.077 (0.076)

Phase 2aslow N/A N/A 0.779 (0.150) 0.946 (0.140) N/A N/A 0.846 (0.102) 0.891 (0.099)bslow N/A N/A 0.093 (0.089) 0.064 (0.058) N/A N/A 0.242 (0.141) 0.144 (0.127)afast N/A N/A N/A 0.573 (0.263) N/A N/A N/A 0.480 (0.284)bfast N/A N/A N/A 0.120 (0.079) N/A N/A N/A 0.230 (0.181)

Phase 3aslow N/A N/A 0.834 (0.077) 0.792 (0.388) N/A N/A 0.955 (0.038) 0.975 (0.034)bslow N/A N/A 0.318 (0.187) 0.170 (0.123) N/A N/A 0.375 (0.137) 0.330 (0.170)afast N/A N/A N/A 0.400 (0.347) N/A N/A N/A 0.548 (0.365)bfast N/A N/A N/A 0.175 (0.256) N/A N/A N/A 0.088 (0.114)

Sensorimotor biasbx 2.707 (5.330) �3.405 (2.233) 3.189 (43.382) �4.101 (1.758) �3.065 (10.986) �4.071 (8.599) 2.562 (28.544) �3.098 (13.398)

Hyperparameters�initial

2 36.107 (34.135) 1* (0.00) 48.071 (43.382) 1* (0.00) 1* (0.00) 1* (0.00) 12.209 (28.544) 5.737 (13.398)�output

2 13.358 (4.537) 12.791 (3.604) 17.417 (4.702) 14.925 (5.970) 13.680 (10.986) 7.482 (8.599) 17.699 (12.473) 16.469 (12.408)�state

2 6.378 (3.550) 2.530 (1.643) 1.255 (2.270) 1.167 (2.479) 9.911 (5.504) 8.195 (7.624) 3.539 (2.953) 2.123 (1.644)

Values are means, with SDs in parentheses. Due to our parameterization of the SSMs, the initial state and state noise variance estimates for two-rate modelsneed to be multiplied by 2 to be comparable to the corresponding variance estimates for one-rate models. That the average and SD of phase 1 aslow are the samein CP and WO to three decimal places for SSMVP,2 is not a typographical error. An asterisk denotes that all subjects in that sample had estimates equal to a boundplaced in that parameter. N/A, parameter not applicable for that SSM.

2543COMPARING MODELS OF SAVINGS

J Neurophysiol • VOL 100 • NOVEMBER 2008 • www.jn.org

on Decem

ber 8, 2008 jn.physiology.org

Dow

nloaded from

Page 9: Eric Zarahn, Gregory D. Weston, Johnny Liang, Pietro ...blam-lab.org/publications/pdf/Papers/2537.pdf · Eric Zarahn, Gregory D. Weston, Johnny Liang, ... Gregory D. Weston, Johnny

SSMLTI,1: initial adaptation that was too fast and readaptationthat was too slow. Although the SSMVP,1 did a reasonable jobfitting WO, it did a much less reasonable one in CP in which itclearly misfit the baseline offset. We speculate this misfittingof the baseline offset was due to a competition between thesensorimotor bias term, on the one hand, and the learning andretention rates during initial adaptation, on the other, both ofwhich determine the offset from zero directional error inbaseline trials. Similar misfitting of the baseline offset by thisSSM was also apparent in five of six of the individual subjectfits (data not shown). SSMVP,2 fit the data well overall in bothparadigms.

Table 2 shows the percentage of Src explained by the variousSSMs in both paradigms. The SSMLTI,1 was unable to explainSrc in either paradigm. The SSMLTI,2 was able to explain anontrivial amount Src in CP but, as expected, negligibly ex-plained Src in WO. Both of the SSMVP explained substantialamounts of Src in both CP and WO, although the SSMVP,1overestimated Src in CP.

It is important to consider the number of SSM parameters inaddition to how well an SSM seems to fit the data overall.Model selection risk corresponds to net prediction error, whichincludes both systematic and random sources of misfitting(Akaike 1974). All other things being equal, increasing thenumber of parameters increases model selection risk. The AICis an approximately unbiased estimator of model selection riskin that the AIC not only penalizes systematic misfitting by themodel [through a dependence on loge (likelihood)] but alsocorrects in a theoretically reasoned way (Akaike 1974) for thenumber of model parameters. The across-subject average AICs(not the AICs corresponding to fits to the across-subject aver-age data) of the various candidate SSMs to CP (trials 31–160)and WO (trials 31–190) are shown in Fig. 4, A and B, respec-tively. Plotted together with the AICs are the �2 loge (likeli-hood) values (like AIC, the smaller the better). The �2 loge(likelihood) values (which do not penalize the number ofparameters) decrease necessarily as the number of nested SSMmodel parameters k increases (Burnham and Anderson 2002).In contrast, the AIC, which equals �2 loge (likelihood) � 2k,does not have to decrease as k increases (e.g., Fig. 4B).

For descriptive purposes, for each SSM pairing the propor-tion of subjects with AIC differences in a given direction areprovided in Tables 3 and 4 for CP and WO, respectively. Tocompare the population average model selection risk betweenSSMs, paired t-tests on the AIC values from different SSMpairings were performed (two-tailed � � 0.05). The Shapiro–Wilk W test was used to assess the t-test assumption ofnormality on the AIC differences for each of the six SSMpairings in both CP and WO paradigms; none of the W valuesfor any of the pairings in either paradigm was significant at� � 0.05 (12 W tests: median P value � 0.49, P value range �0.09–0.93), indicating insufficient evidence to reject the nullhypothesis of normality for any of the pairings. We therefore

proceeded with the use of t-test. The AIC of the SSMLTI,2 wassignificantly better than that of the SSMLTI,1 in both CP [t(5) �4.24, two-tailed P � 0.008] and WO [t(7) � 2.58, two-tailedP � 0.036]. The AIC of the SSMVP,2 was better (although notsignificantly so at two-tailed � � 0.05) than the SSMLTI,2 inboth the CP [t(5) � 1.76, two-tailed P � 0.138] and WO[t(7) � 1.44, two-tailed P � 0.192]. That the AIC was better(albeit not significantly) for SSMVP,2 than for SSMLTI,2 in CP,is a pivotal finding. This is because if the SSMLTI,2 was a goodapproximation to the true data generating process in CP, its modelselection risk in CP would be better than that of SSMVP,2 (becauseSSMVP,2 has more parameters than SSMLTI,2). However, we keepin mind the bias of AIC toward overfitting with respect to modelselection risk (Hurvich and Tsai 1991).

The AICs of SSMVP,1 and SSMVP,2 were not substantially differ-ent in CP [AIC1-rate varying-parameter minus AIC2-rate varying-parameter:t(5) � 0.34, two-tailed P � 0.749], although the former trendedtoward being favored over the latter in WO [AIC1-rate varying-parameterminus AIC2-rate varying-parameter: t(7) � �2.29, two-tailed P �0.056]. Because these results did not indicate that SSMVP,2 wasproviding a consistently (i.e., in both CP and WO) better fitthan SSMVP,1, we were curious to see how multirate behaviorwas manifested, if at all, in any phase of either paradigm. Tothis end, we plotted the fast and slow state (Eq. 2) estimatesfrom SSMVP,2 fit to the across-subject averaged e[n], for bothCP and WO. For CP, it appears that the fast state is substantialonly during the counterperturbation phase (Fig. 5). Analo-gously, for WO the fast state is most apparent during thewashout phase. This absence of a salient fast system duringinitial adaptation is expected based on our finding that theinitial adaptation phase is best fit by a SSMLTI,1 (see Initialadaptation: one-rate or two-rate?). Figure 5 furthermore sug-gests essentially one-rate behavior for both CP and WO duringreadaptation, which indicates that Src relies almost entirely ona change in the parameters of the dominant, slow state. It maybe that for adaptation to visuomotor rotation, fast states emergesubstantially only when the net state is returning to 0°, asoccurs during counterperturbation and washout.

D I S C U S S I O N

To provide a more pure assessment of savings than that ofmost previous studies we eliminated aftereffects via either CPor WO, and we also used Src rather than rate savings. Compa-rable Src was observed for CP and WO. SSMLTI,2 was able toexplain on average 65% of Src in CP but only 1.5% of Src inWO. In terms of SSMLTI,2, this is because the fast and slowstates, when subjected to enough washout trials, will both(aside from any sensorimotor bias) get arbitrarily close inexpectation to zero (and this is true of any arbitrary order ofLTI model, not just second-order). Since SSMLTI parametersare fixed, the bringing of its state variables close to initialconditions will make its net output response during a second

TABLE 2. Percentage of savings explained by state-space model

SSMLTI,1 SSMLTI,2 SSMVP,1 SSMVP,2

CP 3.6 � 10�5 � 2.9 � 10�4 (8.5 � 10�5) 64.8 � 57.0 (46.8) 156.6 � 81.3 (152.2) 109.3 � 81.3 (113.2)WO �9.1 � 10�5 � 3.2 � 10�4 (0.0) 1.5 � 2.1 (0.18) 87.5 � 29.1 (87.8) 86.1 � 22.8 (82.6)

Values are means � SD; median values are in parentheses.

2544 ZARAHN, WESTON, LIANG, MAZZONI, AND KRAKAUER

J Neurophysiol • VOL 100 • NOVEMBER 2008 • www.jn.org

on Decem

ber 8, 2008 jn.physiology.org

Dow

nloaded from

Page 10: Eric Zarahn, Gregory D. Weston, Johnny Liang, Pietro ...blam-lab.org/publications/pdf/Papers/2537.pdf · Eric Zarahn, Gregory D. Weston, Johnny Liang, ... Gregory D. Weston, Johnny

rotational perturbation look much like that during initial adap-tation, and increasingly so with increasing numbers of washouttrials. Given the empirical rate of initial adaptation, the 40washout trials used in WO were sufficient to well approximatecomplete washout, making it impossible for a SSMLTI toexplain Src in WO. In contrast, both SSMVP,1 and SSMVP,2(which can be non-LTI in the perturbation inputs) explained�85% of Src in both CP and WO. Furthermore, as measured bythe AIC both SSMVP,1 and SSMVP,2 fit the overall adaptationmovement series data better than either SSMLTI in both CP andWO (although population-level inference was not significant).Together, these empirical findings lead to the following con-clusions for adaptation to visuomotor rotation: 1) Src can occureven with complete elimination of aftereffects via washoutbetween initial adaptation and readaptation, confirming a pre-vious report (Krakauer et al. 2005); 2) this savings cannot bereasonably explained by a SSMLTI,2 (and from theory, not byany SSMLTI); and 3) Src seen with counterperturbation as wellas Src seen with washout are both more parsimoniously ex-plained as the consequence of experience-dependent changesin learning and retention parameters (as a result of initialadaptation) rather than as a property of a multirate LTI system.

Src observed here in WO for single target adaptation isconsistent with the savings described in a our previous workfor multitarget rotation adaptation (Krakauer et al. 2005).Hinder and colleagues reported absence of savings in a WOparadigm (Hinder et al. 2007) but savings was determined fromthe fits of a power function to directional error (e[n] � c1n2

c),which is a poor approximation because empirical adaptationcurves tend not to asymptote at zero, whereas power functionsmust (a nonzero asymptote can be accommodated by, say, aSSMLTI,1 by having a retention rate 1 and/or a sensorimotorbias). Perhaps this is why no savings was detected for WO eventhough visual inspection of their raw adaptation data (Fig. 2bfrom that report) suggests otherwise. Thus at the present timeit would seem that savings can indeed occur even after washouteliminates aftereffects. This does not preclude the possibilitythat prolonged washout might eliminate savings, which isanother potential explanation for the lack of savings reportedby Hinder and colleagues (2007) and in the study by Kojima

and colleagues (2004), where savings was not seen aftercounterperturbation followed by washout trials.

Recently, in a perturbation/counterperturbation/error-clampadaptation paradigm using viscous-curl force fields, it wasdemonstrated that SSMLTI,2 could explain spontaneous recov-ery, a transient excursion of motor output during movementsunder error clamp in the same direction as that seen duringadaptation to the initial perturbation (Smith et al. 2006). Theearly component of spontaneous recovery in SSMLTI,2 is at-tributable to the rapid decay of the fast state, whereas thesluggish decay back to baseline is attributable to the slowdecay (i.e., better retention) of the slow system. Spontaneousrecovery has also been observed with saccades (Kojima et al.2004). Smith and colleagues also showed via simulation thatSSMLTI,2 produces Src for CP. A heuristic explanation ofsavings for the SSMLTI,2 in CP (with the convention that theinitial perturbation is positive) is that the movements withcounterrotated feedback eventually (when the net statereaches zero) bring the fast state to take on a substantialnegative value (with the slow state having a value that isequal in magnitude but positive). That the fast system has avalue substantially away from zero on the first readaptationtrial (unlike the first initial adaptation trial) in conjunctionwith the fact that the fast system retention parameter issmaller than that of the slow system, allow the fast systemto respond with a more rapid correction to the perturbationduring readaptation than to the perturbation during initialadaptation (Smith et al. 2006). However, perhaps a moreclear way to understand the Src produced by SSMLTI,2 in CPis that it is simply the result of the superposition of theseparate adaptation responses to the initial adaptation, coun-terperturbation, and readaptation perturbations (criticallywith the latter necessarily being identical to that for initialadaptation). Thinking in terms of superposition, we can alsounderstand why SSMLTI,2 cannot produce savings in WOwithout worrying explicitly about the slow and fast systems:the net output seen during readaptation will be a superpo-sition of the separate adaptation responses to the initialadaptation, washout, and readaptation perturbations; thesystem response to the initial adaptation, however, has

TABLE 3. Number of subjects showing AIC differencesin a particular direction in CP

SSMLTI,1 SSMLTI,2 SSMVP,1 SSMVP,2

SSMLTI,1 N/A 6/6 6/6 5/6SSMLTI,2 — N/A 4/6 5/6SSMVP,1 — — N/A 3/6SSMVP,2 — — — N/A

The value of each cell is the proportion of subjects whose AIC is lower (i.e.,better) for the SSM of the column relative to the SSM of that row.

TABLE 4. Number of subjects showing AIC differencesin a particular direction in WO

SSMLTI,1 SSMLTI,2 SSMVP,1 SSMVP,2

SSMLTI,1 N/A 7/8 6/8 6/8SSMLTI,2 — N/A 6/8 6/8SSMVP,1 — — N/A 2/8SSMVP,2 — — — N/A

The value of each cell is the proportion of subjects whose AIC is lower (i.e.,better) for the SSM of the column relative to the SSM of that row.

FIG. 4. The across-subject averaged Akaike InformationCriterion (AIC, black) and �2 loge (likelihood) (gray) values(to within an additive constant) of the candidate models for the(A) CP and (B) WO paradigms. Both measures assess model fit,but only the AIC penalizes for the number of parameters. Foreach measure, it is only the differences between models (withinparadigm) that are meaningful. The units of both the AIC and�2 loge (likelihood) are information, in the information-theo-retic sense.

2545COMPARING MODELS OF SAVINGS

J Neurophysiol • VOL 100 • NOVEMBER 2008 • www.jn.org

on Decem

ber 8, 2008 jn.physiology.org

Dow

nloaded from

Page 11: Eric Zarahn, Gregory D. Weston, Johnny Liang, Pietro ...blam-lab.org/publications/pdf/Papers/2537.pdf · Eric Zarahn, Gregory D. Weston, Johnny Liang, ... Gregory D. Weston, Johnny

decayed essentially to nil during the washout phase (andwill continue to decay), the response to the washout trialsthemselves is nil, and the response attributable to the read-aptation stimulus must simply be a shifted version of theresponse to the initial adaptation stimulus.

Given that SSMLTI,2 is able to explain spontaneous recoveryin a viscous-curl force field (Smith et al. 2006), we need to askwhy SSMLTI,2 did not do a better job of explaining savings inCP compared with SSMVP,2. A sufficient answer lies in ourfinding that for the initial adaptation phase, the fit of SSMLTI,1was superior to that of a SSMLTI,2 in both CP and WO, whichmeans that there was no appreciable two-rate behavior duringinitial adaptation in our data. Thus SSMLTI,2 could not satis-factorily explain savings even in CP because (under SSMLTI,2)two-rate behavior is not something that can suddenly emerge atreadaptation. Rather, it must be evident even at initial adapta-tion. In contrast to the lack of two-rate behavior during initialadaptation in our visuomotor rotation data, the viscous-curlforce-field data reported by Smith and colleagues (Fig. 3d fromthat report) manifest clear two-rate behavior during initialadaptation (Smith et al. 2006). A possible explanation for this

discrepancy is the difference in the nature of perturbations usedin the two experiments. Viscous-curl force-field perturbationshave a proprioceptive component, whereas visuomotor rota-tions do not. Malfait and Ostry (2004) showed that salientviscous-curl force-field perturbations led to interlimb transferof adaptation in extrinsic coordinates, whereas more gradualperturbations led only to intralimb transfer in joint-centeredcoordinates. They suggested that the salient perturbation en-gaged a cognitive/explicit mechanism distinct from the implicitmechanisms thought to underlie adaptation in joint-centeredcoordinates. Similarly, it has recently been shown that there isa form of response to sudden force-field perturbations thatappears to be categorical rather than scaled to the size of theerror (Fine and Thoroughman 2006). In contrast, we haveshown that explicit strategies do not contribute to rotationlearning (Mazzoni and Krakauer 2006). Perhaps then the two-rate behavior evident in force-field adaptation is due to anexplicit component absent in rotation learning. Therefore at thecurrent time we must restrict our finding that SSMLTI,2 is worseat explaining savings in CP than SSMVP,2 to visuomotorrotation.

FIG. 5. The fast (red line), slow (green line), and net �fast � slow (black line) state variables estimated from the fit ofSSMVP,2 to the across-subject averaged directional error e[n]are plotted for the (A) CP and (B) WO paradigms. The pertur-bation r[n] (thick gray line), i.e., the deterministic input to thesystem, is also plotted.

2546 ZARAHN, WESTON, LIANG, MAZZONI, AND KRAKAUER

J Neurophysiol • VOL 100 • NOVEMBER 2008 • www.jn.org

on Decem

ber 8, 2008 jn.physiology.org

Dow

nloaded from

Page 12: Eric Zarahn, Gregory D. Weston, Johnny Liang, Pietro ...blam-lab.org/publications/pdf/Papers/2537.pdf · Eric Zarahn, Gregory D. Weston, Johnny Liang, ... Gregory D. Weston, Johnny

The AIC was used here to assess parsimony of the candidateSSMs in the contexts of the CP and WO paradigms. The AICdoes not measure the ability of the models to explain savingsper se (which is why we also directly assessed savings). Rather,the AIC estimates the overall closeness, in terms of Kullbach–Leibler mean information for discrimination, of the estimated,candidate model to the true but unknown data-generatingprocess (Akaike 1974; Burnham and Anderson 2002); thiscloseness does not solely depend on the ability to explainsavings. All other things being equal, however, a model thatexplains more savings than another should be closer in terms ofKullbach–Leibler information to the truth and therefore have alower AIC. As it turned out, direct measurements of theamount of savings explained by the various SSMs were grosslyconsonant with the AICs, with the SSMVP explaining moresavings as well as having better AICs (albeit, not significantly)than those of the SSMLTI. The use of the AIC complementedthe direct measurement of savings because the former takesinto account increasing instability in the estimated fit associ-ated with increasing parameter number, whereas the latter doesnot. Thus that the SSMVP did not have significantly worsemodel selection risk (and even trended toward being better)than the SSMLTI suggests that the SSMVP made up for theirlarger number of parameters by the extent to which theyreduced model bias (i.e., fit systematic aspects of the data). Incontrast, if we had included a highly overparameterized can-didate model, it might have explained savings the best of all,but its AIC would likely have been the worst of all. However,a caveat of model selection with AIC is that it is known to bebiased toward selecting overparameterized models when theratio of the number of observations to the number of parame-ters is low (Hurvich and Tsai 1991); we do not know the sizeof this bias for the problem we investigated here.

The appeal of SSMLTI,2 in the context of motor adaptation isthat with a small number of static parameters it is nonethelesscapable of producing a fairly rich array of behavioral phenom-ena (Smith et al. 2006). That savings in WO cannot be ex-plained by such a model is in a sense unfortunate. However, amore subtle yet perhaps more interesting aspect of the currentfindings is that they underscore the fact that the ability of amodel to theoretically produce a certain phenomenon (in thiscase, Src in CP) does not imply that the model can actuallyexplain the empirical phenomenon. The paradox is resolved byrealizing that it is not just the choice of model type, but also thevalues of the model parameters that determine the input–output behavior of a system. So, although SSMLTI,2 can pro-duce Src in CP, it does so appreciably only when its parametervalues are such that two-rate behavior is sufficiently salient. Itso happened that, empirically, two-rate behavior during initialadaptation was weak, which made the SSMLTI,2 fit to the Srceffects in CP mediocre. The situation in WO was differentbecause there it can be understood from superposition that it isimpossible for SSMLTI,2 to substantially explain Src.

Kording and colleagues (2007) described an elegant modelof sensorimotor adaptation based on Bayesian estimation(“Bayesian learner”). In this model, the brain implementsKalman filtering to obtain a posteriori estimates of sensorimo-tor perturbation states (“disturbance states” in Kording et al.)associated with different timescales, which it then uses tocorrect sensorimotor maps (i.e., adapt). This Bayesian learnermodel can be roughly understood as a generalization of the

SSMLTI,2 of Smith and colleagues (2006) that represents per-turbations over a wide range of timescales instead of only two.As the Kalman gain approaches steady state, Kalman filteringapproaches an SSMLTI. Thus a Bayesian learner that assumesfixed, known state and measurement error covariance matriceswill be LTI, and thus obey superposition, at steady state.Therefore like SSMLTI,2, such a Bayesian learner would fail toexplain the savings we observed in WO.

In contrast to SSMLTI (Kording et al. 2007; Smith et al.2006), the SSMVP values investigated here were allowed tochange their parameters in an experience-dependent mannerand in this way were able to generate the non-LTI behaviorrequired to explain Src in WO. Said in another way, thisexperience dependence of SSMVP parameters enabled meta-learning (i.e., learning to learn; Abraham and Bear 1996).Similarly, the Bayesian learner could also manifest metalearn-ing by allowing its assumed state and/or measurement errorcovariance matrices to vary (Kording et al. 2007); such a Bayesianlearner would also be SSMVP. It would be interesting to determinewhether such a varying-parameter Bayesian learner—which esti-mates rather than assumes state and measurement noise param-eters—can explain the savings we observed in WO moreparsimoniously than the SSMVP investigated here.

A C K N O W L E D G M E N T S

We thank Dr. Stefano Fusi for a critical reading of the manuscript. We thankDr. Robert Sainburg for sharing custom computer software.

G R A N T S

This work was supported by a Gatsby Initiative in Brain Circuitry grant toE. Zarahn and National Institute of Neurological Disorders and Stroke GrantR01-NS-052804 to J. W. Krakauer.

R E F E R E N C E S

Abraham WC, Bear MF. Metaplasticity: the plasticity of synaptic plasticity.Trends Neurosci 19: 126–130, 1996.

Akaike H. New look at statistical-model identification. IEEE Trans AutomControl 19: 716–723, 1974.

Bozdogan H. Model selection and Akaike Information Criterion (AIC): thegeneral-theory and its analytical extensions. Psychometrika 52: 345–370, 1987.

Burnham KP, Anderson DR. Model Selection and Multimodel Inference: APractical Information-Theoretic Approach. New York: Springer-Verlag,2002, p. 488.

Caithness G, Osu R, Bays P, Chase H, Klassen J, Kawato M, Wolpert DM,Flanagan JR. Failure to consolidate the consolidation theory of learning forsensorimotor adaptation tasks. J Neurosci 24: 8662–8671, 2004.

Cheng S, Sabes PN. Modeling sensorimotor learning with linear dynamicalsystems. Neural Comput 18: 760–793, 2006.

Fine MS, Thoroughman KA. Motor adaptation to single force pulses:sensitive to direction but insensitive to within-movement pulse placementand magnitude. J Neurophysiol 96: 710–720, 2006.

Hinder MR, Walk L, Woolley DG, Riek S, Carson RG. The interferenceeffects of non-rotated versus counter-rotated trials in visuomotor adaptation.Exp Brain Res 180: 629–640, 2007.

Hurvich CM, Tsai CL. Bias of the corrected AIC criterion for underfittedregression and time-series models. Biometrika 78: 499–509, 1991.

Imamizu H, Uno Y, Kawato M. Internal representations of the motorapparatus: implications from generalization in visuomotor learning. J ExpPsychol Hum Percept Perform 21: 1174–1198, 1995.

Kirk R. Experimental Design: Procedures for the Behavioral Sciences.Belmont, CA: Brooks/Cole Publishing, 1982.

Kojima Y, Iwamoto Y, Yoshida K. Memory of learning facilitates saccadicadaptation in the monkey. J Neurosci 24: 7531–7539, 2004.

Kording KP, Tenenbaum JB, Shadmehr R. The dynamics of memory as aconsequence of optimal adaptation to a changing body. Nat Neurosci 10:779–786, 2007.

2547COMPARING MODELS OF SAVINGS

J Neurophysiol • VOL 100 • NOVEMBER 2008 • www.jn.org

on Decem

ber 8, 2008 jn.physiology.org

Dow

nloaded from

Page 13: Eric Zarahn, Gregory D. Weston, Johnny Liang, Pietro ...blam-lab.org/publications/pdf/Papers/2537.pdf · Eric Zarahn, Gregory D. Weston, Johnny Liang, ... Gregory D. Weston, Johnny

Krakauer JW, Ghez C, Ghilardi MF. Adaptation to visuomotor transforma-tions: consolidation, interference, and forgetting. J Neurosci 25: 473–478,2005.

Krakauer JW, Ghilardi MF, Ghez C. Independent learning of internalmodels for kinematic and dynamic control of reaching. Nat Neurosci 2:1026–1031, 1999.

Krakauer JW, Pine ZM, Ghilardi MF, Ghez C. Learning of visuomotortransformations for vectorial planning of reaching trajectories. J Neurosci20: 8916–8924, 2000.

Malfait N, Ostry DJ. Is interlimb transfer of force-field adaptation a cognitiveresponse to the sudden introduction of load? J Neurosci 24: 8084–8089, 2004.

Mazzoni P, Krakauer JW. An implicit plan overrides an explicit strategyduring visuomotor adaptation. J Neurosci 26: 3642–3645, 2006.

Shao J. Mathematical Statistics. New York: Springer-Verlag, 2003.

Shapiro SS, Wilk MB. Approximations for null distribution of W statistic.Technometrics 10: 861–866, 1968.

Shumway RH, Stoffer DS. An approach to time series smoothing andforecasting using the EM algorithm. J Time Series Anal 3: 253–264, 1982.

Smith MA, Ghazizadeh A, Shadmehr R. Interacting adaptive processes withdifferent timescales underlie short-term motor learning. PLoS Biol 4: e179,2006.

Stone M. An asymptotic equivalence of choice of model by cross-validationand Akaike’s criterion. J R Stat Soc B Stat Method 39: 44–47, 1977.

Wigmore V, Tong C, Flanagan JR. Visuomotor rotations of varying size anddirection compete for a single internal model in motor working memory. JExp Psychol Hum Percept Perform 28: 447–457, 2002.

Yamamoto K, Hoffman DS, Strick PL. Rapid and long-lasting plasticity ofinput–output mapping. J Neurophysiol 96: 2797–2801, 2006.

2548 ZARAHN, WESTON, LIANG, MAZZONI, AND KRAKAUER

J Neurophysiol • VOL 100 • NOVEMBER 2008 • www.jn.org

on Decem

ber 8, 2008 jn.physiology.org

Dow

nloaded from