3 in single-molecule force spectroscopy experimentsvparot/files/pubs/conference/bps/... · this...

Kalman Filter Estimates of the Contour Length of an Unfolding Protein in Single-Molecule Force Spectroscopy Experiments

Vicente I. Fernandez¹, Pallav Kosuri², Vicente Parot⁴ and Julio M. Fernández³

¹Department of Mechanical Engineering, Massachusetts Institute of Technology, Boston 02139; ²Department of Biochemistry and ³Department of Biological Sciences, Columbia University, New York 10027; ⁴Pontificia Universidad Católica de Chile, Santiago

Abstract

Force spectroscopy measurements of single molecules using AFM have enabled the study of a range of molecular properties not accessible with bulk methods. These properties must typically be inferred by manually fitting models to selected portions of the measured data. Because manual intervention in the fitting process easily introduces bias into the analysis, there is a need for more sophisticated methods capable of interpreting data in an unbiased and repeatable “hands-off” manner. Here we apply an extended Kalman filter to the estimation of the protein contour length (Lc) during mechanical unfolding, based on force and extension data from an AFM experiment. The filter provides an online, fully automated estimate of Lc based on a system model, the experimental measurements, and the noise statistics. The system model comprises a physical model of the cantilever and a nonlinear worm-like chain (WLC) approximation of the extended protein. When the WLC model is fitted manually to force-extension data from ubiquitin proteins, the estimated change in contour length during unfolding (ΔLc) is normally distributed with a mean of 23.3 nm and a variance of 10.2 nm². Applying the Kalman filter to the same protein yields ΔLc with a mean of 24.8 nm and a variance of 2.0 nm². Because this variance limits the resolution with which the number of amino acids released by unfolding can be estimated, the roughly fivefold reduction in variance represents a substantial improvement over the conventional method.
We thereby demonstrate that the Kalman filter provides a powerful, unbiased approach to interpreting force spectroscopy data, capable of increasing resolution beyond the traditional experimental limit. Owing to its flexibility, the approach can be extended to monitor other state variables of molecular systems observed by various forms of force spectroscopy, including optical and magnetic tweezers.

Figure 1. Mechanical stretching of a polyprotein using single-molecule atomic force microscopy. (A) (i) A single polyprotein molecule is held between the cantilever tip and the coverslip, whose position can be controlled with high precision using a piezoelectric positioner (piezo). (ii) Moving the coverslip away from the tip exerts a stretching force on the polyprotein, which in turn bends the cantilever. The bending of the cantilever changes the position of the laser beam on the split photodiode (PD), registering the pulling force. The applied force can be determined from the spring constant of the cantilever and the degree of cantilever bending. At a high enough pulling force, a protein domain unfolds. (iii) The unfolded domain can now readily extend, relaxing the cantilever. (iv) The piezo continues to move, stretching the polyprotein to a new high-force peak, and the sequence repeats until the whole polyprotein has unfolded. This process results in a force-extension curve with a characteristic sawtooth pattern. (B) A typical sawtooth pattern obtained by stretching an I27 polyprotein. The labels i–iv correspond to the sequence of events shown in A.

Figure 2. Schematic depiction of the extended Kalman filter (EKF) implementation for single-protein force spectroscopy with an AFM. The EKF estimates the current contour length of the protein based on the force measurements (F_t) up to the present time. It is an extension of the Kalman filter to systems with nonlinear dynamics.
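As a concrete illustration of what "an extension of the Kalman filter to systems with nonlinear dynamics" involves, the following is a minimal Python/NumPy sketch of one generic EKF predict/update cycle in its textbook form (see [1]); it is not the authors' implementation. The function names `numerical_jacobian` and `ekf_step` are hypothetical, and the Jacobians are taken by forward finite differences purely for brevity, since the poster does not say how its Jacobians were obtained. The toy usage estimates a single constant contour length from WLC forces at known extensions and ignores the cantilever dynamics, so it shows the filter mechanics rather than the full system model. The values k_B T = 4.11 pN·nm, true Lc = 50 nm, the initial guess and the extension range are assumptions; the 10 pN noise and 0.4 nm persistence length mirror values quoted in this poster.

```python
import numpy as np


def numerical_jacobian(func, x, eps=1e-6):
    """Forward-difference Jacobian of func at x (rows: outputs, columns: inputs)."""
    x = np.asarray(x, dtype=float)
    fx = np.atleast_1d(func(x))
    jac = np.empty((fx.size, x.size))
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        jac[:, i] = (np.atleast_1d(func(x + dx)) - fx) / eps
    return jac


def ekf_step(x, P, z, f, h, Q, R):
    """One EKF cycle for x_{n+1} = f(x_n) + w_n and z_n = h(x_n) + v_n."""
    # Predict: push the estimate through the nonlinear model and the
    # covariance through the model's local linearization F.
    F = numerical_jacobian(f, x)
    x_pred = np.atleast_1d(f(x)).astype(float)
    P_pred = F @ P @ F.T + Q
    # Update: weight the innovation (measured minus predicted) by the gain K.
    H = numerical_jacobian(h, x_pred)
    innovation = np.atleast_1d(z) - np.atleast_1d(h(x_pred))
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ innovation
    P_new = (np.eye(x_new.size) - K @ H) @ P_pred
    return x_new, P_new


# Toy use (illustrative values): a constant Lc observed through WLC forces at
# known extensions, with no cantilever dynamics.
KT_OVER_P = 4.11 / 0.4          # k_B*T / p in pN for k_B*T = 4.11 pN*nm, p = 0.4 nm


def wlc_force(x, lc):
    r = min(x / lc, 0.99)       # guard against the divergence at x = Lc
    return KT_OVER_P * (0.25 / (1.0 - r) ** 2 - 0.25 + r)


rng = np.random.default_rng(1)
true_lc = 50.0
est, cov = np.array([60.0]), np.array([[100.0]])         # deliberately poor guess
for x_n in np.linspace(5.0, 44.0, 300):
    z = wlc_force(x_n, true_lc) + rng.normal(0.0, 10.0)   # 10 pN noise
    est, cov = ekf_step(est, cov, z,
                        f=lambda s: s,                    # Lc modeled as constant
                        h=lambda s, x=x_n: wlc_force(x, s[0]),
                        Q=np.array([[1e-6]]), R=np.array([[100.0]]))
print(f"estimated Lc = {est[0]:.2f} nm (true {true_lc} nm)")
```

With f and h both linear, `ekf_step` reduces to the ordinary Kalman filter update, which is a convenient sanity check on the implementation.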
The Kalman filter is an optimal estimation algorithm for a known linear system with Gaussian noise.

(B) In general, the extended Kalman filter (EKF) tracks an estimate of the state vector q_n. Here the state vector is composed of the contour length of the protein and the position of the cantilever, which are the hidden variables that fully determine the system. The algorithm uses a model of the system to predict the measurement at timestep n from the input at n−1 and the estimate at n−1. The error between the predicted value and the actual measured value at time n is then used to update the estimate with a gain K that is determined by the EKF algorithm. To determine K optimally at each timestep, the algorithm also tracks an estimate of the covariance matrix.

As discussed in Figure 3, the protein is modeled using the worm-like chain (WLC) model of polymer elasticity, given by equation (1). The WLC model provides the tension force in the protein (F_p) given the extension of the protein (y − u, where u is the piezo position) and the contour length (Lc). The cantilever model, in turn, is specified by the transfer function between the input protein tension and the output cantilever force (2), with the coefficients given in equations (3) and (4). As mentioned in Figure 3, the result is a filter that depends on the previous two timesteps. Equation (5) shows the state vector used in this implementation. It is composed of the cantilever deflection (y) and the contour length of the protein being stretched (Lc); both the current and the previous values of y and Lc are included, as required by the cantilever model. The full system model used by the EKF algorithm is described by equations (6) and (7). Equation (6) describes the progression of the state vector. The cantilever deflection and the contour length are each assumed to have an independent white Gaussian process noise source (w_{i,n}). Apart from this noise, the contour length is modeled as constant.
Although a constant contour length is clearly incorrect globally, it accurately models the periods between unfolding events. The cantilever deflection is updated by the combination of the cantilever dynamics and WLC models. The nonlinearity of the WLC model is the reason an extended Kalman filter must be used instead of a regular Kalman filter. Equation (7) is the measurement equation, linking the state variables to the experimentally measured quantity; the measurement includes an additional source of Gaussian noise (v_n). These equations fully define the problem for the application of the EKF algorithm. The algorithm itself is standard and can be found in texts such as [1].

The EKF uses only measurements that have already been made to estimate the state variables. As a result, it can be run concurrently with the experiment itself. In the present analysis, the estimation was performed after the experiment on a batch of traces. The causal estimation results are shown in Figure 5 and show a substantial improvement over the commonly used hand-fitting methods.

In the equations below, k is the cantilever spring constant, k_B T the thermal energy, and p the persistence length of the protein.

$$F_p = W(y-u,\,Lc) = \frac{k_B T}{p}\left[\frac{1}{4}\left(1-\frac{y-u}{Lc}\right)^{-2}-\frac{1}{4}+\frac{y-u}{Lc}\right] \tag{1}$$

$$\frac{F(z)}{F_p(z)} = \frac{b_1 z^{-1} + b_2 z^{-2}}{1 + a_1 z^{-1} + a_2 z^{-2}} \tag{2}$$

$$b = \begin{bmatrix} b_1 & b_2 \end{bmatrix} = \begin{bmatrix} 0.334 & 0.192 \end{bmatrix} \tag{3}$$

$$a = \begin{bmatrix} a_1 & a_2 \end{bmatrix} = \begin{bmatrix} -0.669 & 0.196 \end{bmatrix} \tag{4}$$

$$q_n = \begin{bmatrix} y_n & y_{n-1} & Lc_n & Lc_{n-1} \end{bmatrix}^T \tag{5}$$

$$q_{n+1} = \begin{bmatrix} y_{n+1} \\ y_n \\ Lc_{n+1} \\ Lc_n \end{bmatrix} = \begin{bmatrix} \displaystyle\sum_{j=0}^{1}\left(\frac{b_{j+1}}{k}\,W(y_{n-j}-u_{n-j},\,Lc_{n-j}) - a_{j+1}\,y_{n-j}\right) \\ y_n \\ Lc_n \\ Lc_n \end{bmatrix} + \begin{bmatrix} w_{1,n} \\ 0 \\ w_{2,n} \\ 0 \end{bmatrix} \tag{6}$$

$$F_n = \begin{bmatrix} k & 0 & 0 & 0 \end{bmatrix} q_n + v_n \tag{7}$$

Figure 5.
Results of EKF implementation on experimental data. Experiments with ubiquitin polyproteins at an extension rate of 400 nm/s produced 190 sawtooth traces for analysis. (A) A sample trace from the data set. The final peak results from the dissociation of the protein from the cantilever tip; such detachment peaks were excluded from the analysis of Lc step sizes. (B) The resulting estimate of the contour length of the protein, corresponding to the data in part (A). A persistence length of 0.4 nm, the value commonly used for ubiquitin, was chosen for the protein model. The inset enlarges one of the steps in Lc, showing that the convergence behavior does not match any of those in Figure 4. The overshoot that occurs immediately after the step implies an error in the protein model and confirms the expected failure of the WLC model at low forces. (C) The distribution and statistics of the estimated changes in contour length during unfolding, compared with earlier results fitted by hand with the WLC model as in Figure 3B. The data for the hand-fitted steps are from [2]. The spread of the EKF estimates is substantially narrower, with a standard deviation less than half that of the hand-fitted distribution. Because the EKF step-size histogram shows a noticeable skew, the plotted fit is not a Gaussian distribution but a generalized extreme value distribution, with shape ξ = −0.15, scale σ = 1.3 nm, and location μ = 24.6 nm. Though preliminary, this skew may be linked to the underlying protein mechanics, in which large contour lengths are preferentially chosen among the available configurations.

Figure 3. Models describing the experimental system for the extended Kalman filter implementation. The system model is divided into two sequential parts: a linear model of the cantilever dynamics driven by the output of a nonlinear protein model.
(A) The cantilever model consists of a second-order linear time-invariant (LTI) system fitted to the noise spectrum of a free cantilever. This assumes that the noise is white and acts solely on the tip of the cantilever. It is important to distinguish the heavily damped cantilever spectrum near a surface (blue line) from the spectrum far from a surface (black line). (B) The protein forces are modeled by the worm-like chain (WLC) model of protein elasticity. This model is known to fit the protein force-extension curve well only at high forces. Previously, data were fitted with the WLC model as shown, with the unfolding step size and persistence length chosen manually to obtain the best fit over the entire trace. For this example, ΔLc = 23.1 nm and p = 0.37 nm.

Figure 4. Kalman filter estimation applied to simulated data. (A) Simulated data generated directly from the cantilever and WLC models with a persistence length of 0.4 nm and a predetermined, stepwise constant contour length (Lc). White Gaussian measurement noise with a standard deviation of 10 pN has been added to the simulation. (B) EKF estimates of Lc based on various persistence lengths, compared with the true Lc (dotted line). The convergence behavior depends strongly on the value of the persistence length. With the true persistence length, the estimate quickly converges to the true Lc. If the persistence length is wrong, convergence is slow, and the slope of the estimate indicates the sign of the error.
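The simulated-data test of Figure 4 can be sketched in Python/NumPy as follows. The sketch generates a force trace from the WLC and second-order cantilever models with a predetermined stepwise Lc, adds 10 pN white Gaussian noise, and runs a causal EKF on the state [y_n, y_{n−1}, Lc_n, Lc_{n−1}]. It is an illustration under stated assumptions, not the authors' code, and all function names are hypothetical. Assumed values: k_B T = 4.11 pN·nm, a spring constant of 40 pN/nm, 10 kHz sampling of a 400 nm/s ramp, the process-noise levels, the initial guess Lc = 30 nm, and the coefficient order b1 = 0.334, b2 = 0.192, a1 = −0.669, a2 = 0.196 (our reading of the extracted equations (3) and (4)). The sketch also uses its own sign convention, with y the deflection toward the surface and d the piezo retraction so that the extension is d − y, in place of the (y − u) written on the poster, and the Lc steps are imposed at fixed samples rather than triggered by force.

```python
import numpy as np

# From the poster: p = 0.4 nm, 10 pN noise, filter coefficients (our reading).
# Everything else is a hypothetical choice for this sketch.
KT = 4.11             # k_B*T in pN*nm at ~298 K (assumed)
P_NM = 0.4            # persistence length used to generate the data, nm
K_SPRING = 40.0       # cantilever spring constant, pN/nm (hypothetical)
B = (0.334, 0.192)    # b1, b2 of the cantilever transfer function
A = (-0.669, 0.196)   # a1, a2 of the cantilever transfer function
R_MAX = 0.95          # above this x/Lc the WLC force is continued linearly


def wlc_force(x, lc, p=P_NM):
    """WLC interpolation W(x, Lc) in pN for a molecule extended by x nm."""
    r = max(x / lc, 0.0)                    # a slack molecule carries no tension
    if r <= R_MAX:
        return KT / p * (0.25 / (1.0 - r) ** 2 - 0.25 + r)
    # Numerical guard: a linear continuation keeps the Jacobian nonzero if
    # the Lc estimate ever drops below the current extension.
    f_edge = KT / p * (0.25 / (1.0 - R_MAX) ** 2 - 0.25 + R_MAX)
    slope = KT / p * (0.5 / (1.0 - R_MAX) ** 3 + 1.0)   # dW/dr at R_MAX
    return f_edge + slope * (r - R_MAX)


def transition(q, d_now, d_prev, p=P_NM):
    """Noise-free update of q = [y_n, y_{n-1}, Lc_n, Lc_{n-1}].

    Sketch convention: y is the deflection toward the surface and d the
    piezo retraction (nm), so the molecule's extension is d - y.
    """
    y_now, y_prev, lc_now, lc_prev = q
    y_next = (B[0] * wlc_force(d_now - y_now, lc_now, p) / K_SPRING
              + B[1] * wlc_force(d_prev - y_prev, lc_prev, p) / K_SPRING
              - A[0] * y_now - A[1] * y_prev)
    return np.array([y_next, y_now, lc_now, lc_now])   # Lc held constant


def simulate(lc_true, d, noise_pn=10.0, seed=0):
    """Noisy force trace F_n = k*y_n + v_n for a prescribed Lc_n sequence."""
    rng = np.random.default_rng(seed)
    q = np.array([0.0, 0.0, lc_true[0], lc_true[0]])
    forces = np.empty(len(d))
    for n in range(len(d)):
        forces[n] = K_SPRING * q[0] + rng.normal(0.0, noise_pn)
        q = transition(q, d[n], d[n - 1] if n else d[0])
        q[2] = lc_true[min(n + 1, len(d) - 1)]       # impose the next Lc
    return forces


def ekf_contour_length(forces, d, p=P_NM, noise_pn=10.0,
                       sd_y=0.01, sd_lc=0.3, lc0=30.0):
    """Causal EKF estimate of Lc_n using only forces measured up to n."""
    q = np.array([0.0, 0.0, lc0, lc0])
    P = np.diag([1.0, 1.0, 100.0, 100.0])
    Q = np.diag([sd_y ** 2, 0.0, sd_lc ** 2, 0.0])   # process noise on y and Lc
    H = np.array([[K_SPRING, 0.0, 0.0, 0.0]])        # F_n = [k 0 0 0] q_n + v_n
    estimates = np.empty(len(forces))
    for n, f_meas in enumerate(forces):
        # Update: the measurement is linear in q, so H is exact.
        S = H @ P @ H.T + noise_pn ** 2
        gain = P @ H.T / S
        q = q + gain.ravel() * (f_meas - K_SPRING * q[0])
        P = (np.eye(4) - gain @ H) @ P
        estimates[n] = q[2]
        # Predict: linearize the nonlinear transition by forward differences.
        d_prev = d[n - 1] if n else d[0]
        q_pred = transition(q, d[n], d_prev, p)
        J = np.empty((4, 4))
        for i in range(4):
            dq = np.zeros(4)
            dq[i] = 1e-6
            J[:, i] = (transition(q + dq, d[n], d_prev, p) - q_pred) / 1e-6
        q = q_pred
        P = J @ P @ J.T + Q
        P = 0.5 * (P + P.T)                          # keep P symmetric
    return estimates


# Figure 4-style run: 400 nm/s ramp sampled at 10 kHz (0.04 nm per sample)
# with Lc stepping 40 -> 64 -> 88 nm at predetermined samples.
idx = np.arange(2000)
d = 0.04 * idx
lc_true = np.where(idx < 950, 40.0, np.where(idx < 1475, 64.0, 88.0))
forces = simulate(lc_true, d)
lc_est = ekf_contour_length(forces, d)
print(f"mean Lc estimate over the last 100 samples: {lc_est[-100:].mean():.1f} nm")
```

Calling `ekf_contour_length(forces, d, p=0.3)` or `p=0.5` on the same trace is one way to explore the persistence-length sensitivity described in panel (B); the sketch makes no claim to reproduce the poster's curves quantitatively.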
Conclusions
• Implementing a Kalman filter enhances the estimate of the stepwise increases in contour length of unfolding proteins by providing more consistent and accurate results.
• This approach provides an unbiased “hands-off” way of extracting information on molecular properties in real time.
• The persistence length may be chosen before the experiment, or it can be chosen automatically in post-analysis, making the procedure fully independent of the experimenter.
• The online operation of the Kalman filter opens the door to numerous applications, such as improved feedback systems.
• The Kalman filter should be used more commonly to provide unbiased estimates of hidden state variables in single-molecule experiments.

References
[1] Mohinder S. Grewal and Angus P. Andrews, Kalman Filtering: Theory and Practice Using MATLAB, 2nd edition, Wiley-Interscience, 2001.
[2] Mariano Carrion-Vazquez et al., The mechanical stability of ubiquitin is linkage dependent, Nat Struct Biol, 2003.

Upload: others

Post on 25-Jun-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 3 in Single-Molecule Force Spectroscopy Experimentsvparot/files/pubs/conference/bps/... · This approach provides an unbiased “hands off” way o f extracting information in real

Ka

lma

n F

ilte

r E

sti

ma

tes

of

the

Co

nto

ur

Le

ng

th o

f a

n U

nfo

ldin

g P

rote

in

in S

ing

le-M

ole

cu

le F

orc

e S

pe

ctr

os

co

py

Ex

pe

rim

en

tsV

ice

nte

I.

Fe

rna

nd

ez

1,

Pa

lla

v K

os

uri

2,

Vic

en

te P

aro

t4a

nd

Ju

lio

M.

Fe

rná

nd

ez

3

1D

epart

ment

of M

echan

ica

l E

ng

ineeri

ng,

Massach

usett

s I

nstitu

te o

fT

echnolo

gy,

Bosto

n 0

213

92D

ept

of B

iochem

istr

y a

nd 3

Dep

art

ment of B

iolo

gic

al S

cie

nces,

Co

lum

bia

Un

ivers

ity,

New

York

1002

7,

4P

ontificia

Un

ivers

idad C

ató

lica

de C

hile

, S

antiag

o

Ab

str

act

Fo

rce s

pectr

osco

py m

easu

rem

en

ts o

f sin

gle

mo

lecu

les u

sin

g A

FM

have e

nab

led

th

e s

tud

y o

f a

ran

ge o

f m

ole

cu

lar

pro

pert

ies n

ot

accessib

le w

ith

b

ulk

m

eth

od

s.

Th

ese p

rop

ert

ies o

f in

tere

st

mu

st

typ

icall

y b

e i

nfe

rred

by m

an

uall

y f

itti

ng

mo

dels

to

sele

cte

d p

ort

ion

s o

f m

easu

red

data

. A

s

man

ual

inte

rven

tio

n i

n t

he f

itti

ng

pro

cess e

asil

y i

ntr

od

uces a

bia

s i

n t

he a

naly

sis

, th

ere

is a

need

fo

r m

ore

so

ph

isti

cate

d

an

aly

sis

m

eth

od

s

cap

ab

le

of

inte

rpre

tin

g

data

in

an

u

nb

iased

an

d

rep

eata

ble

“h

an

ds-o

ff”

man

ner.

Here

we a

pp

ly a

n exte

nd

ed

K

alm

an

fi

lter

to th

e e

sti

mati

on

of

pro

tein

co

nto

ur

len

gth

(L

c)

du

rin

g m

ech

an

ical

un

fold

ing

, b

ased

on

fo

rce a

nd

exte

nsio

n d

ata

fro

m

an

AF

M e

xp

eri

men

t. T

his

filte

r p

rovid

es a

n o

nlin

e a

nd

fu

lly a

uto

mate

d e

sti

mate

of

Lc

based

on

a

syste

m

mo

del,

the

exp

eri

men

tal

measu

rem

en

ts,

an

d

no

ise

sta

tisti

cs.

Th

e

syste

m

mo

del

co

mp

rises a

ph

ysic

al

mo

del

of

the c

an

tile

ver

an

d a

no

nlin

ear

WL

Cap

pro

xim

ati

on

of

the e

xte

nd

ed

p

rote

in.

Wh

en

man

uall

y f

itti

ng

th

e W

LC

mo

del

to f

orc

e-e

xte

nsio

n d

ata

fro

m u

biq

uit

in p

rote

ins,

the

esti

mate

of

the c

han

ge i

n c

on

tou

r le

ng

th d

uri

ng

un

fold

ing

is d

istr

ibu

ted

no

rmall

y w

ith

mean

23.3

n

m a

nd

vari

an

ce 1

0.2

nm

2.

Testi

ng

th

e K

alm

an

filte

r o

n t

he s

am

e p

rote

in y

ield

s ∆ ∆∆∆

Lc

wit

h a

24.8

nm

m

ean

an

d 2

.0 n

m2

vari

an

ce.

As t

he v

ari

an

ce l

imit

s r

eso

luti

on

in

esti

mati

ng

th

e n

um

ber

of

am

ino

acid

s r

ele

ased

by u

nfo

ldin

g,

it i

s c

lear

that

the K

alm

an

filte

r p

resen

ts a

su

bsta

nti

al

imp

rovem

en

t o

ver

the c

on

ven

tio

nal

meth

od

. W

e t

here

by d

em

on

str

ate

th

at

the K

alm

an

filte

r p

rovid

es a

po

werf

ul

un

bia

sed

ap

pro

ach

to

in

terp

reti

ng

fo

rce sp

ectr

osco

py d

ata

, cap

ab

le o

f in

cre

asin

g re

so

luti

on

b

eyo

nd

th

e t

rad

itio

nal exp

eri

men

tal li

mit

. D

ue t

o t

he f

lexib

ilit

y o

f th

is a

pp

roach

, it

can

be e

xte

nd

ed

to

mo

nit

ori

ng

oth

er

sta

te v

ari

ab

les o

f m

ole

cu

lar

sys

tem

s o

bserv

ed

b

y vari

ou

s f

orm

s o

f fo

rce

sp

ectr

osco

py,

inclu

din

g o

pti

cal an

d m

ag

neti

c t

weezers

.

Fig

ure

1M

ech

an

ical str

etc

hin

g o

f a p

oly

pro

tein

usin

g s

ing

le-m

ole

cu

le a

tom

ic f

orc

e m

icro

sco

py.

(A)

(i)

A s

ingle

poly

pro

tein

mole

cule

is h

eld

betw

een t

he c

antile

ver

tip

and t

he c

overs

lip,

whose p

ositio

n c

an b

e

contr

olle

d w

ith h

igh p

recis

ion u

sin

g a

pie

zoele

ctr

ic p

ositio

ner

(pie

zo).

(ii)

Movin

g t

he c

overs

lipaw

ay f

rom

the

tip exert

s a str

etc

hin

g fo

rce on th

e poly

pro

tein

, w

hic

h in

tu

rn bends th

e cantile

ver.

T

he bendin

g of

the

cantile

ver

changes t

he p

ositio

n o

f th

e laser

beam

on t

he s

plit

photo

dio

de (

PD

), r

egis

tering t

he p

ulli

ng f

orc

e.

The a

pplie

d f

orc

e c

an b

e d

ete

rmin

ed f

rom

the s

pring c

onsta

nt

of

the c

antile

ver

and t

he d

egre

e o

f cantile

ver

bendin

g.

At

this

hig

h pulli

ng fo

rce,

a pro

tein

dom

ain

unfo

lds.

(iii)

The unfo

lded dom

ain

can now

re

adily

exte

nd,

rela

xin

g t

he c

antile

ver.

(iv

)T

he p

iezo c

ontinues t

o m

ove,

str

etc

hin

g t

he p

oly

pro

tein

to a

new

hig

h

forc

e p

eak,

repeating t

he s

equence u

ntil

the w

hole

poly

pro

tein

has u

nfo

lded.

This

pro

cess r

esults i

n a

forc

e

exte

nsio

n c

urv

e w

ith a

chara

cte

ristic s

aw

tooth

pattern

shape.

(B)

A t

ypic

al

saw

tooth

pattern

curv

e o

bta

ined

by s

tretc

hin

g a

n I27 p

oly

pro

tein

. T

he labels

i–iv

repre

sent th

e s

equence o

f events

show

n in A

.

Fig

ure

2

Sch

em

ati

c

dep

icti

on

o

f th

e

Exte

nd

ed

K

alm

an

F

ilte

r (E

KF

) im

ple

men

tati

on

fo

r sin

gle

p

rote

in

forc

e

sp

ectr

osco

py

wit

h

an

A

FM

.T

he

EK

F

estim

ate

s t

he c

urr

ent

conto

ur

length

of

the

pro

tein

based u

pon t

he f

orc

e m

easure

ments

(F

t) u

p t

o t

he p

resent

tim

e.

It

is a

n e

xte

nsio

n

of

the

Kalm

an

filter

for

syste

ms

with

nonlin

ear

dynam

ics.

The K

alm

an f

ilter

is a

n

optim

al

estim

ation a

lgorith

m g

iven a

know

n

linear

syste

m w

ith G

aussia

n n

ois

e.

B

In g

ene

ral, t

he

exte

nded

Ka

lman

filt

er

(EK

F)

tra

cks a

n e

stim

ate

of

the

sta

te

ve

cto

r q

nT

he

sta

te v

ecto

r is

co

mpo

sed

of

the

con

tou

r le

ngth

of

the p

rote

in

an

d t

he

po

sitio

n o

f th

e c

an

tile

ve

r, w

hic

h a

re t

he

hid

den

va

riab

les t

ha

t fu

lly

de

term

ine

the

syste

m.

The

alg

orith

m u

se

s a

mode

l of

the

syste

m t

o p

red

ict

the

me

asu

rem

en

t a

t tim

este

pn

ba

se

d o

n t

he

inpu

t a

t n

-1an

d t

he

estim

ate

a

t n

-1.

The

err

or

be

twee

n t

he

pre

dic

ted

va

lue

and

th

e t

rue

me

asu

red

va

lue

at

tim

e

nis

th

en

u

sed

to

upda

te

the

e

stim

ate

w

ith

a

gain

K

tha

t is

de

term

ined

by t

he

EK

F a

lgo

rith

m.

In o

rde

r to

op

tim

ally

de

term

ine

Kfo

r

ea

ch

tim

este

p,

an

estim

ate

of

the

co

va

rian

ce

ma

trix

is a

lso

tra

cked

by t

he

alg

orith

m.

Dis

cu

ssed

in

Fig

ure

3,

the

pro

tein

is m

od

ele

d u

sin

g t

he

Worm

-Lik

e C

ha

in

(WLC

) m

ode

l of

po

lym

er

ela

sticity,

giv

en

by e

qua

tion

(1

).

The

WLC

mode

l p

rovid

es t

he

ten

sio

n f

orc

e i

n t

he

pro

tein

(F

p)

giv

en

th

e e

xte

nsio

n o

f th

e

pro

tein

(y-u

) a

nd

the

con

tou

r le

ngth

(L

c).

T

he

can

tile

ve

r m

od

el

in t

urn

is

spe

cifie

d b

y t

he

tra

nsfe

r fu

nction

be

twe

en

the

inpu

t p

rote

in t

en

sio

n a

nd

the

ou

tpu

t can

tile

ve

r fo

rce (

2).

The

co

rre

spond

ing c

oeff

icie

nts

a

re g

ive

n

in

equa

tion

s

(3)

and

(4

).

As

men

tioned

in

Fig

ure

3,

the

re

su

lt is a

filt

er

tha

t de

pend

s o

n the

pre

vio

us t

wo

tim

este

ps.

Equa

tio

n

(5)

sh

ow

s

the

sta

te

ve

cto

r u

sed

in

this

im

ple

men

tation

. It

is

co

mpo

sed

of

the

ca

ntile

ve

r de

fle

ction

(y)

an

d t

he

co

nto

ur

len

gth

of

the

pro

tein

be

ing s

tre

tche

d (

Lc).

T

he

cu

rren

t an

d p

revio

us v

alu

es o

f y a

nd

Lc

are

bo

th i

nclu

de

d i

n t

he

sta

te v

ecto

r, a

s r

equ

ired

by t

he

can

tile

ve

r m

ode

l. T

he

fu

ll syste

m m

ode

l u

se

d b

y t

he

EK

F a

lgo

rith

m i

s d

escrib

ed

by

equa

tion

s (

6)

and

(7

).

Equa

tion

(6

)de

scribe

s t

he p

rogre

ssio

n o

f th

e s

tate

ve

cto

r.

Both

th

e c

an

tile

ve

r d

eflection

and

the

con

tou

r le

ngth

are

assu

med

to

ha

ve

indepe

nden

t w

hite

Gau

ssia

n p

roce

ss n

ois

e (

wi,n)

so

urc

es.

Apa

rt f

rom

th

e n

ois

e,

the

co

nto

ur

len

gth

is m

ode

led

as c

on

sta

nt.

Althou

gh

th

is i

s c

lea

rly i

nco

rre

ct

glo

ba

lly,

it m

ode

ls t

he

pe

riod

be

twe

en

un

fold

ing e

ven

ts a

ccu

rate

ly.

Th

e c

an

tile

ve

r d

efle

ction

is u

pda

ted

by t

he

co

mb

ina

tion

of

can

tile

ve

r d

yn

am

ics a

nd

WLC

mode

ls.

The

non

linea

rity

in

th

e W

LC

mode

l is

the

re

ason

tha

t an

exte

nded

Ka

lman

filt

er

mu

st

be

use

d a

s o

ppo

se

d t

o a

re

gu

lar

Ka

lman

filt

er.

E

qua

tio

n (

7)

is t

he

mea

su

rem

en

t e

qua

tion

, lin

kin

g t

he

sta

te v

ariab

les t

o t

he

exp

erim

en

tally

me

asu

red

qu

an

tity

. I

n t

he

mea

sure

men

t, t

he

re is a

n a

dd

itio

na

l sou

rce

of

Gaussia

n n

ois

e (

vn).

T

he

se

equa

tio

ns f

ully

define

the

pro

ble

m f

or

the

app

lica

tion

of

the

EK

F a

lgo

rith

m.

The

alg

orith

m its

elf is s

tand

ard

and

can

be

fo

und

in

te

xts

su

ch

as [

RE

F 1

].

The

EK

F a

lgo

rith

m o

nly

utiliz

es t

he

mea

su

rem

en

ts th

at ha

ve

alrea

dy b

een

made

in

ord

er

to c

rea

te its

estim

ate

of

the

sta

te v

ariab

les.

As a

re

su

lt,

it

can

be

run

con

cu

rren

tly w

ith

th

e e

xp

erim

en

t itse

lf.

In

the

cu

rren

t ana

lysis

, th

e e

stim

ation

wa

s d

one

aft

er

the

exp

erim

en

t on

a b

atc

h o

f tr

ace

s.

The

cau

sal e

stim

ation

re

su

lts a

re s

ho

wn

in

Fig

ure

5.

The

se

re

sults s

ho

w a

sub

sta

ntial im

pro

vem

en

t o

ve

r th

e c

om

monly

use

d h

an

d-f

ittin

g m

eth

od

s.

(1)

(2)

(3)

(4)

(5)

(6)

2

2

1

1

2

2

1

1

1)

(

)(

−−

−−

++

+=

za

za

zb

zb

zF

zF

p

+

=

=

∑ =

−+

−−

+

+

+

+

nn

nnn

j

jn

j

jn

jn

jn

j

n

nn

n

nww

Lc

Lcy

ya

Lc

uy

Wb

k

Lc

Lcyy

q,

2

,1

1

0

11

1

1

1

00

10

00

01

1

[]T

nn

nn

nL

cL

cy

yq

11

−−

=

[]

nn

nv

qk

F+

⋅=

00

0

+−

−−

=

=

Lcu

y

Lcu

y

pTk

Lcu

yW

FB

p41

141

2

[]

0

.19

20

.33

4

=b

[]

0.1

96

0

.66

9-

=a

(7)

(1)

(2)

(3)

(4)

(5)

(6)

2

2

1

1

2

2

1

1

1)

(

)(

−−

−−

++

+=

za

za

zb

zb

zF

zF

p

+

=

=

∑ =

−+

−−

+

+

+

+

nn

nnn

j

jn

j

jn

jn

jn

j

n

nn

n

nww

Lc

Lcy

ya

Lc

uy

Wb

k

Lc

Lcyy

q,

2

,1

1

0

11

1

1

1

00

10

00

01

1

[]T

nn

nn

nL

cL

cy

yq

11

−−

=

[]

nn

nv

qk

F+

⋅=

00

0

+−

−−

=

=

Lcu

y

Lcu

y

pTk

Lcu

yW

FB

p41

141

2

[]

0

.19

20

.33

4

=b

[]

0.1

96

0

.66

9-

=a

(7)

Fig

ure

5.

Resu

lts

of

EK

F

imp

lem

en

tati

on

o

n

exp

eri

-m

en

tal

data

.E

xperim

ents

w

ith

ubiq

uitin

poly

pro

tein

s

at

an

exte

nsio

n

rate

of

400

nm

/s

pro

duced 1

90 s

aw

tooth

tra

ces f

or

analy

sis

.

(A)

A

sam

ple

tr

ace

from

the d

ata

set.

The f

inal

peak

is a re

sult of

the dis

socia

tion of

the p

rote

in f

rom

the c

antile

ver

tip.

These p

eaks w

ere

exclu

ded f

rom

th

e a

naly

sis

in L

cste

p s

izes.

(B)

The

resultin

g

estim

ate

of

the

conto

ur

length

of

the

pro

tein

,

corr

espondin

g to

th

e data

in

part

(A

). A

pers

iste

nce le

ngth

of

0.4

nm

w

as

chosen f

or

the p

rote

in m

odel, w

hic

h i

s t

he c

om

monly

used v

alu

e f

or

ubiq

uitin

.

The inset

enla

rges o

ne o

f th

e s

teps in L

c,

show

ing t

hat

the c

onverg

ence b

ehavio

r does

not

matc

h

any

of

those

from

F

igure

3.

T

he

overs

hoot

that

occurs

im

media

tely

after

the s

tep i

mplie

s a

n e

rror

in t

he p

rote

in m

odel

and c

onfirm

s t

he

expecte

d fa

ilure

of

the W

LC

m

odel

at

low

fo

rces.

(C

) T

he dis

trib

ution and

sta

tistics o

f th

e e

stim

ate

d c

hanges in c

onto

ur

length

s d

uring u

nfo

ldin

g c

om

pare

d

with e

arlie

r re

sults f

it b

y h

and w

ith t

he W

LC

model, a

s i

n f

igure

2.

The d

ata

for

the

hand-f

itte

d

ste

ps

is

from

[R

EF

2].

T

he

spre

ad

of

EK

F

estim

ate

s

is

substa

ntially

narr

ow

er,

with a

sta

ndard

devia

tion l

ess t

han h

alf

that

of

the h

and-

fitted d

istr

ibution.

A n

oticeable

skew

is o

bserv

ed in t

he E

KF

ste

p s

ize h

isto

gra

m.

There

fore

the f

it p

lotted i

s n

ot

a

Gaussia

n

dis

trib

ution,

but

a

genera

lized

extr

em

e

valu

e

dis

trib

ution.

T

he p

ara

mete

rs o

f th

e

dis

trib

ution

are

: ξ

=

-0.1

5

σ=

1.3

and µ

= 2

4.6

. ξ,

σ,

and µ

are

th

e

shape,

scale

, and

location p

ara

mete

rs r

espectively

.

Though p

relim

inary

, th

is m

ay b

e

linked

to

the

underlyin

g

pro

tein

m

echanic

s in w

hic

h larg

e c

onto

ur

length

s a

re p

refe

rentially

chosen

am

ong a

vaila

ble

configura

tions.

Fig

ure

3M

od

els

descri

bin

g t

he e

xp

eri

men

tal

syste

m f

or

the e

xte

nd

ed

Kalm

an

fi

lter

imp

lem

en

tati

on

.T

he s

yste

m m

odel

is d

ivid

ed in

to t

wo sequential

part

s:

A

lin

ear

model

of

the c

antile

ver

dynam

ics d

riven b

y t

he o

utp

ut

of

a n

onlin

ear

pro

tein

m

odel.

(A)

The c

antile

ver

model

consis

ts o

f a s

econd o

rder

LT

I syste

m f

it t

o t

he

nois

e s

pectr

um

of

a f

ree c

antile

ver.

This

assum

es t

he n

ois

e i

s w

hite a

nd a

cts

sole

ly

on t

he t

ip o

f th

e c

antile

ver.

It

is im

port

ant

to d

istinguis

h t

he

heavily

dam

ped c

antile

ver

spectr

um

near

a s

urf

ace (

blu

e lin

e)

from

the s

pectr

um

far

from

asurf

ace (

bla

ck l

ine).

(B

)T

he p

rote

in f

orc

es a

re m

odele

d b

y t

he W

orm

-Lik

e C

hain

(W

LC

) m

odel

of

pro

tein

ela

sticity.

This

model

is k

now

n t

o f

it t

he p

rote

in f

orc

e-e

xte

nsio

n c

urv

e w

ell

for

hig

h

forc

es

only

. P

revio

usly

, data

w

as

fit

with th

e W

LC

m

odel

as show

n,

where

th

e

unfo

ldin

g s

tep s

ize a

nd p

ers

iste

nce l

ength

are

chosen m

anually

to o

bta

in t

he b

est

fit

over

the e

ntire

tra

ce. F

or

this

exam

ple

, ∆

Lc

= 2

3.1

nm

, and p

= 0

.37 n

m.

Fig

ure

4K

alm

an

filte

r esti

mati

on

ap

plied

to

sim

ula

ted

d

ata

.(A

)S

imula

ted

data

genera

ted

directly

from

th

e

cantile

ver

and W

LC

models

with a

pers

iste

nce length

of

0.4

nm

and a

pre

dete

rmin

ed s

tepw

ise c

onsta

nt

conto

ur

length

(L

c).

W

hite G

aussia

n m

easure

ment

nois

e w

ith a

sta

ndard

devia

tion o

f 10 p

Nhas b

een a

dded t

o t

he s

imula

tion.

(B

)E

KF

estim

ate

s

of

the

Lc

based

on

various

pers

iste

nce

length

s,

again

st

the t

rue L

c(d

otted l

ine).

The c

onverg

ence

behavio

r is

str

ongly

dependent

on

the

valu

e

of

the

pers

iste

nce l

ength

. F

or

the t

rue v

alu

e o

f th

e p

ers

iste

nce

length

, th

e e

stim

ate

quic

kly

converg

es t

o t

he t

rue

Lc.

If

the

pers

iste

nce l

ength

is o

ff,

slo

w c

onverg

ence r

esults,

with a

slo

pe that in

dic

ate

s the s

ign o

f th

e e

rror.

Co

nclu

sio

ns

•Im

ple

men

tin

g

a

Kalm

an

fi

lter

en

han

ces

the

esti

mate

o

f th

e

ste

pw

ise i

ncre

ases i

n c

on

tou

r le

ng

th o

f u

nfo

ldin

g p

rote

ins b

y

pro

vid

ing

mo

re c

on

sis

ten

t an

d a

ccu

rate

resu

lts

•T

his

ap

pro

ach

p

rovid

es

an

u

nb

iased

“h

an

ds

off

”w

ay

of

extr

acti

ng

in

form

ati

on

in

real ti

me o

n m

ole

cu

lar

pro

pert

ies

•T

he

ch

oic

e

of

pers

iste

nce

len

gth

m

ay

be

mad

e

befo

re

the

exp

eri

men

t, o

r it

can

be a

uto

mati

call

y c

ho

sen

in

po

st-

an

aly

sis

, m

akin

g t

he p

roced

ure

fu

lly i

nd

ep

en

den

t o

f th

e e

xp

eri

men

ter

•T

he o

n-l

ine o

pera

tio

n o

f th

e K

alm

an

fi

lter

op

en

s th

e d

oo

r to

n

um

ero

us a

pp

licati

on

s s

uch

as im

pro

ved

feed

-back s

ys

tem

s•

Th

e K

alm

an

filte

r sh

ou

ld b

e u

tilized

mo

re c

om

mo

nly

in

pro

vid

ing

u

nb

iased

esti

mati

on

s

of

hid

den

sta

te

vari

ab

les

in

sin

gle

m

ole

cu

le e

xp

eri

men

ts

Refe

ren

ces:

[1]

Mo

hin

der

S. G

rew

al an

d A

ng

us P

. A

nd

rew

s, K

alm

an

Filte

rin

g –

Th

eo

ry a

nd

Pra

cti

se u

sin

g M

AT

LA

B, 2n

d e

dit

ion

,

Wiley-I

nte

rscie

nce, 2001

[2]

Mari

an

o C

arr

ion

-Vazq

uez e

t al., T

he m

ech

an

ical sta

bilit

y o

f u

biq

uit

in is lin

kag

e d

ep

en

den

t, N

at

Str

uct

Bio

l, 2

007