TRANSCRIPT (2015.12.17)

GPU Computing of the Hybrid Monte Carlo Method for the Bayesian Estimation of the Realized Stochastic Volatility Model (An Application of HPC in Finance)

TETSUYA TAKAISHI
HIROSHIMA UNIVERSITY OF ECONOMICS
Outline of presentation
1. Introduction
2. HMC in Lattice QCD
3. Modeling financial time series
4. Realized volatility
5. Realized stochastic volatility
6. Bayesian inference
7. Hybrid Monte Carlo
8. Optimal acceptance
9. Leapfrog integrator
10. Higher order integrator
11. Minimum Norm integrator
12. Simulation study
13. Empirical study
14. GPU computing
15. Summary
Introduction: What is Hybrid Monte Carlo?

Hybrid Monte Carlo (HMC) is one of the Markov chain Monte Carlo (MCMC) methods.
It was first invented for lattice Quantum Chromodynamics (QCD) simulations by Duane, Kennedy, Pendleton and Roweth (1987).
HMC is sometimes also called Hamiltonian Monte Carlo.

A special feature of HMC is that it is a global algorithm:
all variables (x_1, x_2, ..., x_N) can be updated at once.
Usual MCMC methods are local algorithms that update the variables sequentially, x_1 -> x_2 -> ... -> x_N.
This global feature is needed for lattice QCD simulations, where N ~ O(10^6).
Introduction: Why do we use HMC?

The merits of HMC:
1. The global update may de-correlate Monte Carlo samples of many variables.
   Ex.: a stochastic volatility model includes a large number of volatility variables that must all be updated in the MCMC.
2. HMC code can be easily parallelized, so it can be accelerated by parallel computing such as GPU computing.

The purpose of this study:
- Parameter estimation of the realized stochastic volatility model (RSVM) by HMC simulations
- Accelerating the HMC simulations of the RSVM by GPU computing
- Tuning by an improved integrator
HMC in Lattice QCD: Why is the HMC needed for Lattice QCD?

QCD is the theory of the strong interaction.
A proton consists of three quarks (u, u, d); the strong force is carried by gluons.

Lattice QCD is formulated to investigate the non-perturbative aspects of QCD.
Space-time is discretized on a 4-dimensional lattice:
gluon fields are defined on the links, fermion fields on the sites.

The lattice QCD action consists of gluon and fermion actions:

  S = S_gluon(U) + S_fermion(psibar, psi, U)

Gluon action: S_gluon(U) ~ Tr U over closed loops of link variables,
with U_{n,mu} = exp(i a g A_mu(n)) in SU(3).

Fermion action: S_fermion(psibar, psi, U) = psibar D(U) psi,
where the fermion matrix D(U) is built from the link variables U and the
gamma matrices gamma_mu (Wilson form, schematically):

  D(U)_{nm} = delta_{nm} - K sum_mu [ (r - gamma_mu) U_{n,mu} delta_{n+mu,m}
                                    + (r + gamma_mu) U^dagger_{m,mu} delta_{n-mu,m} ]

On an L^4 lattice with L ~ 30-100, D is a huge matrix of size ~10^6 x 10^6.

Partition function:

  Z = Int [dU][dpsibar][dpsi] exp( -S_gluon(U) - S_fermion(psibar, psi, U) )

Integrating out the fermion fields by the Grassmann integral:

  Z = Int [dU] det D(U) exp( -S_gluon(U) )

Expectation value of a physical observable O:

  <O> = (1/Z) Int [dU] O(U) det D(U) exp( -S_gluon(U) )

This is done by the Markov chain Monte Carlo method.
If we do this by the Metropolis method, the gluon fields are updated one by
one, U_i -> U_i', with acceptance probability

  p = min( 1, det D(U_i') exp(-S_gluon(U_i')) / [ det D(U_i) exp(-S_gluon(U_i)) ] )

For each local update we need to calculate the determinant of D,
so in total we need ~10^6 determinant calculations of a ~10^6 x 10^6 matrix.
This is far too time consuming: an efficient global algorithm is needed.
Modeling financial time series
Data: Panasonic Co., 2006.07.03-2009.12.30

Daily return: R_t = ln p_t - ln p_{t-1}

The return is assumed to be R_t = sigma_t epsilon_t, with epsilon_t ~ N(0,1).
We assume volatility dynamics given by

  sigma_t^2 = f(sigma_{t-1}^2, sigma_{t-2}^2, ..., R_{t-1}, R_{t-2}, ...)

Stylized facts:
- no significant autocorrelation in returns
- long autocorrelation time in absolute returns
- volatility clustering

Various volatility models have been proposed.

GARCH-type models (deterministic volatility): GARCH, EGARCH, GJR-GARCH, etc.
GARCH model (Engle (1982), Bollerslev (1986)):

  sigma_t^2 = omega + alpha R_{t-1}^2 + beta sigma_{t-1}^2

with parameters (omega, alpha, beta).
The model parameters can be determined by maximum likelihood estimation.
Parameter estimation: easy.
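The GARCH(1,1) recursion above fits in a few lines. A minimal sketch; the parameter values and sample returns below are illustrative, not estimates from the Panasonic data.

```python
def garch11_volatility(returns, omega, alpha, beta, sigma2_0):
    """Deterministic recursion: sigma2_t = omega + alpha*R_{t-1}^2 + beta*sigma2_{t-1}."""
    sigma2 = [sigma2_0]
    for r in returns[:-1]:
        sigma2.append(omega + alpha * r**2 + beta * sigma2[-1])
    return sigma2

# illustrative returns and parameters (made up for this sketch)
rets = [0.01, -0.02, 0.015, -0.005]
sig2 = garch11_volatility(rets, omega=1e-6, alpha=0.1, beta=0.85, sigma2_0=1e-4)
```

Given the returns, the whole volatility path is determined, which is why maximum likelihood estimation is straightforward for this model class.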
Stochastic volatility model

Basic stochastic volatility model:

  ln sigma_t^2 = mu + phi ( ln sigma_{t-1}^2 - mu ) + eta_t,   eta_t ~ N(0, sigma_eta^2)

Parameter estimation: difficult.
The model parameters are determined by Bayesian inference,
and the Bayesian inference is performed by Markov chain Monte Carlo methods.
In this study we focus on the realized stochastic volatility model (RSVM),
which combines the daily returns with the realized volatility;
the Hybrid Monte Carlo method is used for its parameter estimation.
Realized Volatility (Andersen, Bollerslev (1998))

Let us assume that the logarithmic price process follows a stochastic diffusion:

  d ln p(t) = sigma(t) dW(t)

W(t): standard Brownian motion, sigma(t): spot volatility.

Integrated volatility (IV) for a period T:

  sigma_T^2(t) = Int_t^{t+T} sigma^2(s) ds

Realized volatility is defined by a sum of squared, finely sampled returns,
calculated using high-frequency data:

  RV_t = sum_{i=1}^{N} r_{t+ik}^2,   r_i = ln p(ik) - ln p((i-1)k)

k: sampling period, N = T/k.
In the limit k -> 0, RV_t converges to the integrated volatility sigma_T^2(t).

Daily return (from daily prices p_t): R_t = ln p_t - ln p_{t-1}
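The RV definition above can be sketched directly; the intraday price sequence below is made up for illustration, it is not the 1-min Panasonic data.

```python
import math

def realized_volatility(intraday_prices):
    """RV = sum of squared intraday log returns over one day."""
    logp = [math.log(p) for p in intraday_prices]
    return sum((logp[i] - logp[i - 1]) ** 2 for i in range(1, len(logp)))

# illustrative intraday prices for one trading day (not real data)
prices = [100.0, 100.5, 99.8, 100.2, 100.1]
rv = realized_volatility(prices)
```

In practice one such RV value is computed per trading day, giving the series RV_t fed into the model below.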
Non-trading hours issue

Let us consider daily volatility.
Usually stock exchange markets are not open for the whole day.
Domestic stocks trade at the Tokyo Stock Exchange in two sessions:
morning session 09:00-11:00, afternoon session 12:30-15:00,
with breaks in between (overnight and over lunch).
How should we deal with the intraday returns during the breaks?
Here RV is computed without including returns in the breaks.
Microstructure noise

Observed prices are contaminated by microstructure noise
(price discreteness, bid-ask spreads, etc.):

  ln P(t) = ln P*(t) + nu(t),   nu(t) ~ WN(0, omega^2)

(P*: true price, nu: noise). Observed returns are then also contaminated:

  r(t) = r*(t) + eta(t),   eta(t) = nu(t) - nu(t-k)

In the presence of noise, RV is calculated as

  RV = sum_{i=1}^{N} r_i^2
     = sum_{i=1}^{N} (r_i*)^2 + 2 sum_{i=1}^{N} r_i* eta_i + sum_{i=1}^{N} eta_i^2

The noise terms bias RV upward, by ~ 2 N omega^2 on average (Zhou (1996)).
Bias correction

Hansen and Lunde (2005) introduced an adjustment factor c:

  c = sum_{t=1}^{T} (R_t - Rbar)^2 / sum_{t=1}^{T} RV_t^0

T: number of trading days, RV_t^0: original realized volatility,
Rbar: average daily return.
The corrected realized volatility is

  RV_t = c RV_t^0

i.e. RV is corrected so that the average of RV matches the variance of the daily returns.
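The Hansen-Lunde adjustment can be written straight from the formula above; the returns and raw RVs below are made up for illustration.

```python
def hl_adjustment(daily_returns, rv0):
    """c = sum_t (R_t - Rbar)^2 / sum_t RV0_t; adjusted RV_t = c * RV0_t."""
    T = len(daily_returns)
    rbar = sum(daily_returns) / T
    c = sum((r - rbar) ** 2 for r in daily_returns) / sum(rv0)
    return c, [c * v for v in rv0]

# illustrative daily returns and raw realized volatilities
rets = [0.01, -0.02, 0.005, 0.012]
rv0 = [0.8e-4, 1.5e-4, 0.5e-4, 1.0e-4]
c, rv_adj = hl_adjustment(rets, rv0)
```

By construction the sum (hence the average) of the adjusted RVs equals the sum of squared demeaned returns, which is exactly the matching condition stated above.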
Realized Stochastic Volatility model (Takahashi, Omori, Watanabe (2009))

Idea: combine the daily return and the realized volatility,
i.e. use both pieces of information given by the financial markets.

  (1)  y_{1,t} = exp(h_t / 2) epsilon_t,        epsilon_t ~ N(0, 1)
  (2)  y_{2,t} = xi + h_t + u_t,                u_t ~ N(0, sigma_u^2)
  (3)  h_{t+1} = mu + phi (h_t - mu) + eta_t,   eta_t ~ N(0, sigma_eta^2)

where y_{1,t} = R_t is the daily return, y_{2,t} = ln RV_t, and
h_t = ln sigma_t^2 is the latent volatility variable;
equation (3) gives the dynamics of RV.

Model parameters: theta = (mu, phi, sigma_eta^2, sigma_u^2, xi).

xi represents the bias of RV: with RV_t = c RV_t^0,

  y_{2,t} = ln RV_t^0 = ln RV_t - ln c

so xi plays the role of -ln c, the (log) HL factor.
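The three RSV equations above are easy to simulate forward. A sketch under the assumption (from the simulation-study slide later in the talk) that the third and fourth input parameters are the variances sigma_eta^2 and sigma_u^2.

```python
import math, random

def simulate_rsv(T, mu, phi, var_eta, var_u, xi, seed=0):
    """Generate (y1, y2) = (daily returns, ln RV) from the RSV equations (1)-(3)."""
    rng = random.Random(seed)
    # start h from its stationary distribution N(mu, var_eta / (1 - phi^2))
    h = mu + rng.gauss(0.0, math.sqrt(var_eta / (1.0 - phi ** 2)))
    y1, y2 = [], []
    for _ in range(T):
        y1.append(math.exp(h / 2.0) * rng.gauss(0.0, 1.0))             # (1) return
        y2.append(xi + h + rng.gauss(0.0, math.sqrt(var_u)))           # (2) ln RV
        h = mu + phi * (h - mu) + rng.gauss(0.0, math.sqrt(var_eta))   # (3) h dynamics
    return y1, y2

y1, y2 = simulate_rsv(4000, mu=-1.0, phi=0.93, var_eta=0.3, var_u=0.1, xi=0.2)
```

Data generated this way is the kind of artificial series used in the simulation study to check that the MCMC recovers the input parameters.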
Likelihood function of the Realized Stochastic Volatility model

  L(theta) = Int f(Y_1, Y_2 | h, theta) dh_1 ... dh_T

with Y_1 = (y_{1,1}, ..., y_{1,T}),  Y_2 = (y_{2,1}, ..., y_{2,T}), and

  f(Y_1, Y_2 | h, theta) =
      prod_{t=1}^{T} (2 pi e^{h_t})^{-1/2} exp( - y_{1,t}^2 / (2 e^{h_t}) )
    x prod_{t=1}^{T} (2 pi sigma_u^2)^{-1/2} exp( - (y_{2,t} - xi - h_t)^2 / (2 sigma_u^2) )
    x (2 pi sigma_eta^2 / (1 - phi^2))^{-1/2} exp( - (h_1 - mu)^2 (1 - phi^2) / (2 sigma_eta^2) )
    x prod_{t=1}^{T-1} (2 pi sigma_eta^2)^{-1/2} exp( - (h_{t+1} - mu - phi (h_t - mu))^2 / (2 sigma_eta^2) )
Bayesian inference

Bayes' theorem:

  pi(theta | y) = L(y | theta) pi(theta) / f(y),   f(y) = Int L(y | theta) pi(theta) dtheta

pi(theta | y): posterior distribution (the probability distribution of theta)
L(y | theta): likelihood function
pi(theta): prior distribution

The parameters are estimated as expectation values over the posterior:

  E[theta_i] = (1/Z) Int theta_i pi(theta | y) dtheta

and these are estimated by Markov chain Monte Carlo.

For the realized stochastic volatility model,
theta = (mu, phi, sigma_eta^2, sigma_u^2, xi), and the expectation values are

  E[theta_i] = (1/Z) Int theta_i exp( ln f(h, theta) ) dtheta dh_1 ... dh_T

so we need T updates of the volatility variables at each MCMC step.

An advantage of the hybrid Monte Carlo algorithm:
all T volatility variables can be updated simultaneously,
which de-correlates the Monte Carlo samples of the volatility variables.
Hybrid Monte Carlo

Basic idea: HMC = molecular dynamics simulation + Metropolis test.
(A local update scheme is not effective here; for a local multi-move sampler
see Watanabe, Omori (2004).)

Introduce momenta p = (p_1, ..., p_T) conjugate to h = (h_1, ..., h_T)
and define the Hamiltonian (kinetic + potential):

  H(p, h) = (1/2) sum_i p_i^2 - ln f(h, theta) = T + V

For the RSV model the parameters theta = (mu, phi, sigma_eta^2, sigma_u^2, xi)
enter through f. The partition function becomes

  Z' = Int exp( -H(p, h) ) dp dh
     = Int exp( -(1/2) sum_i p_i^2 + ln f(h, theta) ) dp dh
     ~ Int f(h, theta) dh

(Z': normalization constant); the Gaussian momenta factor out and do not change
the target distribution.
Molecular dynamics: candidates for h are chosen by solving Hamilton's
equations of motion

  dh_i/dt = dH(p, h)/dp_i,    dp_i/dt = - dH(p, h)/dh_i

Along the exact flow dH/dt = 0: the Hamiltonian is conserved.

In practice we solve the equations of motion numerically with the
(2nd order) leapfrog integrator. One elementary step of size Delta t:

  (1)  h_i(Delta t / 2) = h_i(0) + p_i(0) (Delta t / 2)
  (2)  p_i(Delta t)     = p_i(0) - (dH/dh_i) Delta t
  (3)  h_i(Delta t)     = h_i(Delta t / 2) + p_i(Delta t) (Delta t / 2)

Repeat this elementary step k times to go from (h, p) to (h', p').

Because of the discretization errors the Hamiltonian is NOT conserved:

  Delta H = H(p', h') - H(p, h) != 0,   Delta H ~ O(Delta t^2)

Metropolis accept/reject test:

  P = min( 1, exp( -(H(p', h') - H(p, h)) ) ) = min( 1, exp(-Delta H) )

The acceptance therefore depends on the step size Delta t.
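The whole HMC update (momentum refresh, leapfrog trajectory, Metropolis test) fits in a short sketch. The target here is a toy Gaussian potential standing in for -ln f(h, theta), not the actual RSV posterior; the step size and trajectory length are arbitrary choices.

```python
import math, random

def hmc_update(h, V, grad_V, dt, n_steps, rng):
    """One HMC step: draw momenta, integrate with leapfrog, accept/reject."""
    p = [rng.gauss(0.0, 1.0) for _ in h]
    H_old = 0.5 * sum(x * x for x in p) + V(h)
    hn, pn = list(h), list(p)
    hn = [x + y * dt / 2.0 for x, y in zip(hn, pn)]      # first half-step in h
    for step in range(n_steps):
        g = grad_V(hn)
        pn = [y - gi * dt for y, gi in zip(pn, g)]       # full step in p
        frac = dt if step < n_steps - 1 else dt / 2.0    # last h step is a half-step
        hn = [x + y * frac for x, y in zip(hn, pn)]
    H_new = 0.5 * sum(x * x for x in pn) + V(hn)
    dH = H_new - H_old
    if dH <= 0.0 or rng.random() < math.exp(-dH):        # Metropolis test
        return hn, dH, True
    return h, dH, False

V = lambda h: 0.5 * sum(x * x for x in h)   # toy potential, stands in for -ln f
grad_V = lambda h: list(h)
rng = random.Random(1)
h, dH, accepted = hmc_update([0.5, -0.3], V, grad_V, dt=0.05, n_steps=20, rng=rng)
```

Note that all components of h move together in each step, which is the "global update" feature emphasized above.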
Optimal acceptance
• What is the optimal acceptance for HMC?

small step-size: acceptance large, but cost large (many steps per trajectory)
large step-size: cost small, but acceptance small

• We define the efficiency function as

  E(Delta t) = Acc x Delta t

This function has the property that E(Delta t) -> 0 both as Delta t -> 0
and as Delta t -> infinity, so the optimal acceptance and step-size are
given by maximizing the efficiency function.

The acceptance is evaluated as (Gupta et al., PLB 242 (1990) 437)

  Acc = erfc( (1/2) <Delta H>^{1/2} )

For an n-th order integrator <Delta H>^{1/2} = C^{1/2} V^{1/2} Delta t^n
(V: the number of variables), so at small energy difference

  Acc ~ exp( - C^{1/2} V^{1/2} Delta t^n )

(constants absorbed into C), and the efficiency function is rewritten as

  E(Delta t) = Delta t exp( - C^{1/2} V^{1/2} Delta t^n )

Maximizing the efficiency function, we obtain the optimal step-size

  Delta t_opt = ( n C^{1/2} V^{1/2} )^{-1/n}
Optimal acceptance

Inserting the optimal step-size Delta t_opt = (n C^{1/2} V^{1/2})^{-1/n}
into the acceptance, we obtain the optimal acceptance as

  Acc_opt = exp( -1/n )

This depends only on the order n of the integrator; thus the optimal
acceptance does not depend on the model or the volume:

  2nd order: Acc_opt = 0.61
  4th order: Acc_opt = 0.78
  6th order: Acc_opt = 0.85
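The n-dependence of the optimal acceptance, and the maximization that produces it, are easy to check numerically (C^{1/2} V^{1/2} set to 1 for the grid search, which only rescales Delta t_opt, not Acc_opt):

```python
import math

def optimal_acceptance(n):
    """Acc_opt = exp(-1/n) for an n-th order integrator."""
    return math.exp(-1.0 / n)

table = {n: round(optimal_acceptance(n), 2) for n in (2, 4, 6)}

# numerical check for n = 2: maximize E(dt) = dt * exp(-a * dt**n) on a grid
a, n = 1.0, 2
dts = [i * 1e-4 for i in range(1, 30000)]
E = [dt * math.exp(-a * dt ** n) for dt in dts]
dt_opt = dts[E.index(max(E))]
acc_at_opt = math.exp(-a * dt_opt ** n)   # should be close to exp(-1/2)
```

The grid maximum reproduces Acc = exp(-1/2) ~ 0.61 for the 2nd order case, independently of the prefactor a.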
Leapfrog integrator
• Hamilton's equations of motion can be written with the Poisson bracket:

  df/dt = {f, H},   {f, H} = sum_i ( df/dq_i dH/dp_i - df/dp_i dH/dq_i )

• Define the linear operator L(H) by

  L(H) f = {f, H}

The formal solution is given by

  f(t + Delta t) = exp( Delta t L(H) ) f(t)

This cannot be directly implemented, so decompose the operator:

  L(H) = L(T) + L(V),   T = (1/2) p^2 (kinetic),   V = S(q) (potential)

T and V do not commute, so exp( Delta t (L(T) + L(V)) ) must be approximated.

• 2nd order leapfrog integrator:

  G_2nd(Delta t) = e^{(Delta t / 2) L(T)} e^{Delta t L(V)} e^{(Delta t / 2) L(T)}
                 = exp( Delta t L(H) ) + O(Delta t^3)

which in coordinates is exactly the elementary step

  x(Delta t / 2) = x(0) + p(0) (Delta t / 2)
  p(Delta t)     = p(0) - (dH/dx) Delta t
  x(Delta t)     = x(Delta t / 2) + p(Delta t) (Delta t / 2)

with O(Delta t^3) errors per step.

Leapfrog integrator: a symplectic integrator
• The 2nd order leapfrog integrator has the following properties:

  time-reversible:  G_2nd(Delta t) G_2nd(-Delta t) = 1
  area-preserving:  (x, p) -> (x', p') with dx' dp' = dx dp,
                    i.e. Jacobian det( d(x', p') / d(x, p) ) = 1

These conditions are needed for HMC to maintain detailed balance.
Higher order integrator
• Higher order integrators can be found by decomposing

  exp( Delta t (T + V) ) = prod_i exp( c_i Delta t T ) exp( d_i Delta t V ) + O(Delta t^n)

• Find the coefficients c_i and d_i such that the leading error is O(Delta t^n).
Higher order integrators can also be constructed from lower order ones.
Consider the 2nd order integrator

  G_2nd(Delta t) = e^{(Delta t / 2) T} e^{Delta t V} e^{(Delta t / 2) T}

The 4th order integrator is given by a product of three 2nd order integrators:

  G_4th(Delta t) = G_2nd(a_1 Delta t) G_2nd(a_2 Delta t) G_2nd(a_1 Delta t)

where

  a_1 = 1 / (2 - 2^{1/3}),   a_2 = 1 - 2 a_1 = - 2^{1/3} / (2 - 2^{1/3})

This integrator is obviously time-reversible and area-preserving:

  G_4th(Delta t) G_4th(-Delta t) = 1

Recursive construction scheme
• Further higher order integrators are constructed from lower order
integrators recursively:

  G_{n+2}(Delta t) = G_n(b_1 Delta t) G_n(b_2 Delta t) G_n(b_1 Delta t)

  b_1 = 1 / (2 - 2^{1/(n+1)}),   b_2 = 1 - 2 b_1 = - 2^{1/(n+1)} / (2 - 2^{1/(n+1)})

• Cost of higher order integrators:
each 2nd order integrator has one force calculation, so the cost of the
n-th order integrator is the number of 2nd order integrators it contains:

  order of integrator:          2nd   4th   6th   ...   n-th
  # of 2nd order integrators:    1     3     9    ...   3^{n/2 - 1}

There is no higher order integrator with positive coefficients only (Suzuki).
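The recursion coefficients above satisfy two conditions: consistency, 2 b1 + b2 = 1, and cancellation of the leading error, 2 b1^{n+1} + b2^{n+1} = 0. A quick numerical check, including the 3x cost growth per recursion:

```python
def recursion_coeffs(n):
    """b1, b2 for G_{n+2}(dt) = G_n(b1*dt) G_n(b2*dt) G_n(b1*dt)."""
    b1 = 1.0 / (2.0 - 2.0 ** (1.0 / (n + 1)))
    return b1, 1.0 - 2.0 * b1

b1, b2 = recursion_coeffs(2)   # 4th order from 2nd order: these are a1, a2 above

cost = {2: 1}                  # one force calculation per 2nd order step
for n in (2, 4):               # each recursion triples the 2nd order count
    cost[n + 2] = 3 * cost[n]
```

Note b2 < 0: the middle sub-step integrates backward in time, which is why no all-positive-coefficient scheme exists beyond 2nd order.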
Minimum Norm Integrator

Higher order integrators can be used, but their efficiency depends on the
model. Omelyan et al. (2003) found a 2nd order minimum norm (MN) integrator
that is more efficient than the leapfrog integrator.

Leapfrog integrator:

  exp( Delta t L(H) ) = e^{(Delta t / 2) T} e^{Delta t V} e^{(Delta t / 2) T} + O(Delta t^3)

MN integrator:

  exp( Delta t L(H) ) = e^{lambda Delta t T} e^{(Delta t / 2) V} e^{(1 - 2 lambda) Delta t T}
                        e^{(Delta t / 2) V} e^{lambda Delta t T} + O(Delta t^3)_MN

for lambda = 0.193183327, chosen to minimize the norm of the O(Delta t^3)
error term: O(Delta t^3)_MN << O(Delta t^3)_Leapfrog,
while Cost(MN) = 2 x Cost(LF) (two force calculations per step).
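The smaller error coefficient of the MN scheme is visible on a toy system; the harmonic oscillator below stands in for the real potential (an illustration, not the RSV Hamiltonian).

```python
LAMBDA = 0.193183327   # minimum-norm coefficient of Omelyan et al.

def leapfrog_step(h, p, dt, grad):
    h += 0.5 * dt * p            # e^{(dt/2) T}
    p -= dt * grad(h)            # e^{dt V}
    h += 0.5 * dt * p            # e^{(dt/2) T}
    return h, p

def mn_step(h, p, dt, grad):
    h += LAMBDA * dt * p                 # e^{lambda dt T}
    p -= 0.5 * dt * grad(h)              # e^{(dt/2) V}
    h += (1.0 - 2.0 * LAMBDA) * dt * p   # e^{(1 - 2 lambda) dt T}
    p -= 0.5 * dt * grad(h)              # e^{(dt/2) V}
    h += LAMBDA * dt * p                 # e^{lambda dt T}
    return h, p

def max_energy_error(step, dt, n=200):
    grad = lambda x: x                   # harmonic potential V = x^2/2, H0 = 0.5
    h, p, worst = 1.0, 0.0, 0.0
    for _ in range(n):
        h, p = step(h, p, dt, grad)
        worst = max(worst, abs(0.5 * p * p + 0.5 * h * h - 0.5))
    return worst

err_lf = max_energy_error(leapfrog_step, 0.2)
err_mn = max_energy_error(mn_step, 0.2)
```

At equal step size the MN energy violation is much smaller than the leapfrog one; the fair comparison in the talk accounts for the MN integrator's 2x cost per step.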
Simulation Study

Artificial time series of 4000 data points generated with
(phi, mu, sigma_eta^2, sigma_u^2, xi) = (0.93, -1.0, 0.3, 0.1, 0.2).
Simulations were done on the supercomputer system (SGI ICE X) at ISM.

Tuning: measuring Delta H = H(p', h') - H(p, h) and the efficiency function
P_acc(Delta t) x Delta t, the efficiency of the MN integrator is about
5 times that of the leapfrog integrator, while its computational cost is
about twice: Cost(MN) ~ 2 x Cost(Leapfrog).
In total, the MN integrator is about 2.5 times more effective than the
leapfrog integrator.

HMC vs Metropolis (# of volatility variables = 2000):
autocorrelation time ~ 200 for Metropolis vs ~ 18 for HMC.

Results (statistical errors in parentheses; the last row gives the
integrated autocorrelation time 2 tau_int = 1 + 2 sum_t ACF(t),
summed up to a cutoff lag):

              phi      mu      sigma_eta^2  sigma_u^2  xi       h_10
  Input       0.93     -1.0    0.3          0.1        0.2
  average     0.926    -0.97   0.31         0.097      0.203    -0.77
  S.D.        0.007    0.10    0.03         0.006      0.010    0.23
  2 tau_int   8.8(20)  96(40)  1130(480)    64(18)     38(10)   21(6)
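The integrated autocorrelation time reported above can be estimated as follows (a sketch with a simple fixed-lag cutoff; real analyses use a windowing rule). The i.i.d. and AR(1) chains are synthetic stand-ins for well-mixing and poorly-mixing MCMC output.

```python
import random

def integrated_act(x, max_lag=50):
    """2*tau_int = 1 + 2 * sum_{t=1}^{max_lag} ACF(t)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    def acf(t):
        return sum((x[i] - mean) * (x[i + t] - mean)
                   for i in range(n - t)) / ((n - t) * var)
    return 1.0 + 2.0 * sum(acf(t) for t in range(1, max_lag + 1))

rng = random.Random(0)
iid = [rng.gauss(0.0, 1.0) for _ in range(5000)]     # uncorrelated samples
ar = [0.0]
for _ in range(4999):
    ar.append(0.9 * ar[-1] + rng.gauss(0.0, 1.0))    # strongly correlated chain

tau_iid = integrated_act(iid)   # ~ 1
tau_ar = integrated_act(ar)     # >> 1
```

A large 2 tau_int means proportionally fewer effectively independent samples, which is why the HMC vs Metropolis comparison (18 vs ~200) matters.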
Empirical Study
Data: Panasonic Co., 2006.07.03-2009.12.30
Input data: daily returns + RV (1-min sampling)

Results (statistical errors in parentheses; last row: 2 tau_int;
the two variance columns are sigma_u^2 and sigma_eta^2):

              phi      mu      xi        sigma_u^2  sigma_eta^2  h_10
  average     0.958    -7.63   -0.101    0.0440     0.0176       -7.21
  S.D.        0.013    0.14    0.049     0.0036     0.0031       0.12
  2 tau_int   39(6)    24(6)   178(42)   65(11)     112(19)      37(11)

Since y_{2,t} = ln RV_t^0 = ln RV_t - ln c, the estimated xi can be compared
with -ln c obtained from the HL factor

  c = sum_{t=1}^{T} (R_t - Rbar)^2 / sum_{t=1}^{T} RV_t^0
GPU computing

The Hybrid Monte Carlo algorithm can be parallelized:
typically the number of volatility variables is in the thousands,
and the volatility variables can be updated in parallel.
This speeds up the Hybrid Monte Carlo algorithm.

Two ways of coding a GPU program on an NVIDIA graphics card:
- CUDA Fortran
- OpenACC (directive-based coding)

Host (CPU + main memory) and device (GPU + device memory) communicate by
data transfer; it is crucial to minimize this overhead.

Coding environment:
  CPU: Intel i7-4770, 3.4 GHz
  GPU: GeForce GTX 760, CUDA cores: 1152
  Compiler: PGI Fortran (PGI 14.6), CUDA 6.0
Device code for the leapfrog: each kernel updates all i = 1, ..., T
volatility variables in parallel.

  Kernel 1:  h_i(Delta t / 2) = h_i(0) + p_i(0) (Delta t / 2),  i = 1, ..., T
  Kernel 2:  p_i(Delta t)     = p_i(0) - (dH/dh_i) Delta t,     i = 1, ..., T
  Kernel 3 (= Kernel 1): h_i(Delta t) = h_i(Delta t / 2) + p_i(Delta t) (Delta t / 2)

CUDA Fortran: both host and device codes are needed.

Kernel 1:

  attributes(global) subroutine steph(nd, d_h, d_p, dt)
    implicit none
    integer, value :: nd
    real(4), value :: dt
    real(4), device :: d_h(nd), d_p(nd)
    integer :: j
    ! ----- integrate h -----
    j = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (j < nd + 1) d_h(j) = d_h(j) + d_p(j) * dt * 0.5
    return
  end

Kernel 2:

  attributes(global) subroutine stepp(nd, d_h, d_p, d_dy, d_grv, &
                                      dt, phi, xmu, vareta, varu, xe)
    implicit none
    integer :: j
    integer, value :: nd
    real(4), value :: dt, phi, xmu, vareta, varu, xe
    real(4), device :: d_h(nd), d_p(nd), d_dy(nd), d_grv(nd)
    real(4) :: xf

    j = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (j < nd + 1) then
      if (j == 1) then
        ! ----- t = 1 --- assume f(h(1)|para) = full -----
        xf = 1.0 - d_dy(1)**2 * exp(-d_h(1))
        xf = xf + 2 * (d_h(1) - (phi * d_h(2) + xmu * (1 - phi))) / vareta
        xf = xf * 0.5
        xf = xf + (d_h(1) + xe - d_grv(1)) / varu
        d_p(1) = d_p(1) - xf * dt
      endif
      if (j > 1 .and. j < nd) then
        xf = (1.0 - d_dy(j)**2 * exp(-d_h(j)) &
              + 2 * ((1. + phi**2) * d_h(j) - (d_h(j+1) + d_h(j-1)) * phi &
              - xmu * (1 - phi)**2) / vareta) * 0.5
        xf = xf + (d_h(j) + xe - d_grv(j)) / varu
        d_p(j) = d_p(j) - xf * dt
      endif
      ! ----- t = nd -----
      if (j == nd) then
        xf = (1.0 - d_dy(nd)**2 * exp(-d_h(nd)) &
              + 2 * (d_h(nd) - xmu * (1 - phi) - phi * d_h(nd-1)) / vareta) * 0.5
        xf = xf + (d_h(nd) + xe - d_grv(nd)) / varu
        d_p(nd) = d_p(nd) - xf * dt
      endif
    endif
    return
  end

Host code:

  integer, parameter :: n_thre = 512
  ngrid = (nd - 1) / n_thre + 1
  ! copy data from host to device
  d_h = h
  d_p = p
  d_grv = grv
  d_dy = dy
  do i = 1, 1000
    call steph<<<ngrid, n_thre>>>(nd, d_h, d_p, dt)               ! Kernel 1
    call stepp<<<ngrid, n_thre>>>(nd, d_h, d_p, d_dy, d_grv, &
                                  dt, phi, xmu, vareta, varu, xe) ! Kernel 2
    call steph<<<ngrid, n_thre>>>(nd, d_h, d_p, dt)               ! Kernel 3 = Kernel 1
  enddo
OpenACC: coding is done by inserting directives into the host code.

  !$acc data copy(h,p)       ! data directive: keep h and p on the device

  !$acc kernels              ! Kernel 1:  h_i <- h_i + p_i * dt/2,  i = 1, ..., T
  do j = 1, n
     ...
  enddo
  !$acc end kernels

  ! Kernel 2 (p_i <- p_i - dH/dh_i * dt, i = 1, ..., T) and
  ! Kernel 3 (= Kernel 1) are parallelized with the same kernels directives.

  !$acc end data
[Figure: average time of one leapfrog step and of 200 leapfrog steps
versus the number of volatility variables (data size B x 512)]

Fitting the time per leapfrog step to f(T) = a + c*T:

                   a          c
  CPU            -1.85e-4   1.30e-5
  GPU (OpenACC)   9.41e-3   3.36e-7

Gain = TIME(CPU) / TIME(GPU): about 37 times faster on the GPU at large
data sizes (the ratio of the slopes is 1.30e-5 / 3.36e-7 ~ 39).
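From the fitted constants above one can also read off where the GPU starts to win: it has a large per-call overhead a (kernel launches and data transfer) but a much smaller slope c. Using the fit values from the table:

```python
a_cpu, c_cpu = -1.85e-4, 1.30e-5
a_gpu, c_gpu = 9.41e-3, 3.36e-7

# break-even number of volatility variables: a_cpu + c_cpu*T = a_gpu + c_gpu*T
T_break = (a_gpu - a_cpu) / (c_cpu - c_gpu)

# asymptotic speed-up at large T: the ratio of the slopes
gain_inf = c_cpu / c_gpu
```

So the GPU only pays off beyond roughly T ~ 750 variables, and for the thousands of volatility variables in this study the per-step speed-up approaches the slope ratio, consistent with the ~37x quoted above.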
Summary

- Bayesian inference of the realized stochastic volatility model has been performed by the Hybrid Monte Carlo method.
- We found that the MN integrator is more effective than the conventional leapfrog integrator.
- The Hybrid Monte Carlo method can de-correlate the volatility variables fast enough.
- The parameter xi of the realized stochastic volatility model explains the bias similarly to the HL factor, although slight deviations are observed at high sampling frequency.
- GPU computing can speed up the Hybrid Monte Carlo algorithm.
- In practical applications the HMC may be even more efficient for models with a very large number of volatility variables, such as the multivariate realized stochastic volatility model.