gpu computing of the hybrid monte carlo method for the … · 2015. 12. 17. · first invented for...

GPU Computing of the Hybrid Monte Carlo Method for the Bayesian Estimation of the Realized Stochastic Volatility Model (An Application of HPC in Finance)

TETSUYA TAKAISHI

HIROSHIMA UNIVERSITY OF ECONOMICS

Outline of presentation

1 Introduction

2 HMC in Lattice QCD

3 Modeling financial time series

4 Realized volatility

5 Realized stochastic volatility

6 Bayesian inference

7 Hybrid Monte Carlo

8 Optimal acceptance

9 Leapfrog integrator

10 Higher order integrator

11 Minimum Norm integrator

12 Simulation study

13 Empirical study

14 GPU computing

15 Summary

Introduction: What is Hybrid Monte Carlo?

Hybrid Monte Carlo (HMC) is a Markov chain Monte Carlo (MCMC) method.

It was first invented for lattice quantum chromodynamics (QCD) simulations by Kennedy et al.

A special feature of HMC : global algorithm

Sometimes HMC is also called Hamiltonian Monte Carlo

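All variables can be updated at once:

$(x_1, x_2, \dots, x_N) \rightarrow (x_1', x_2', \dots, x_N')$

Usual MCMC methods are local algorithms that update variables sequentially:

$x_1 \rightarrow x_1',\ x_2 \rightarrow x_2',\ \dots,\ x_N \rightarrow x_N'$

The global feature is needed for lattice QCD simulations, where $N \sim O(10^6)$.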

Introduction: Why do we use HMC?

The merits of HMC

1. The global update may de-correlate Monte Carlo samples for many variables.

Example: a stochastic volatility model includes many volatility variables that must be updated in MCMC.

2. HMC code can be easily parallelized.

It can be accelerated by parallel computing such as GPU computing.

The purpose of this study

Parameter estimation of the realized stochastic volatility model (RSVM) by the HMC simulations

Accelerating HMC simulations of the RSVM by GPU computing

Tuning by the improved integrator

HMC in Lattice QCD: Why is HMC needed for lattice QCD?

QCD is the theory of the strong interaction.

A proton consists of three quarks.

The strong force is carried by gluons.

Lattice QCD is formulated to investigate the non-perturbative aspect of QCD.

Space-time is discretized on a 4-dimensional lattice.

Gluon fields are defined on links

Fermion fields are defined on sites

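The lattice QCD action consists of gluon and fermion actions:

$S = S_{\mathrm{gluon}}(U) + S_{\mathrm{fermion}}(\bar\psi, \psi, U)$

Gluon action: $S_{\mathrm{gluon}}(U) \sim \sum \mathrm{Tr}\, U_{\square}$, with link variables $U_{n,\mu} = \exp(i a g A_\mu(n)) \in SU(3)$

Fermion action: $S_{\mathrm{fermion}}(\bar\psi, \psi, U) = \sum_{n,m} \bar\psi_n D_{nm}(U) \psi_m$

Fermion matrix ($\gamma_\mu$: gamma matrix):

$D_{nm}(U) = \delta_{nm} - K \sum_\mu \left[ (r - \gamma_\mu) U_{n,\mu} \delta_{n+\mu,m} + (r + \gamma_\mu) U^\dagger_{m,\mu} \delta_{n-\mu,m} \right]$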

$L^4 \sim 10^6$ for $L \sim 30$ to $100$

Size of $D$: $10^6 \times 10^6$

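Partition function (with $S_g \equiv S_{\mathrm{gluon}}$):

$Z = \int [dU][d\bar\psi][d\psi] \exp\left(-S_g(U) - S_{\mathrm{fermion}}(\bar\psi, \psi, U)\right)$

Integrating out the fermion fields by the Grassmann integral:

$Z = \int [dU] \det(D) \exp(-S_g(U))$

Expectation value of a physical observable $O$:

$\langle O \rangle = \frac{1}{Z} \int [dU]\, O(U) \det(D) \exp(-S_g(U))$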

This is done by the Markov Chain Monte Carlo method

$\min\left(1, \frac{\det D(U')\, e^{-S_g(U')}}{\det D(U)\, e^{-S_g(U)}}\right)$

If we do this by the Metropolis method, gluon fields are updated one by one.

For each update, we need to calculate a determinant of D.

$D$ is a huge matrix ($10^6 \times 10^6$).

In total, $10^6$ determinant calculations and Metropolis tests are needed: time consuming.

Efficient global algorithm is needed

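Modeling financial time series

Data: Panasonic Co. 2006.07.03-2009.12.30

Daily return: $R_t = \ln p_t - \ln p_{t-1}$

Return is assumed to be $R_t = \sigma_t \epsilon_t$, $\quad \epsilon_t \sim N(0,1)$

We assume volatility dynamics given by $\sigma_t = f(\sigma_{t-1}, \sigma_{t-2}, \dots, R_{t-1}, R_{t-2}, \dots)$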

Various volatility models are proposed

Stylized facts:

no significant autocorrelation in returns

long autocorrelation time in absolute returns

volatility clustering

Volatility model

GARCH-type model

Deterministic model

Model parameters can be determined by the maximum likelihood estimation

Parameter estimation: easy

parameters

GARCH, EGARCH, GJR-GARCH, etc.

GARCH model

Engle (1982), Bollerslev (1986)

Stochastic volatility model

Basic stochastic volatility model

Parameter estimation: difficult

$\ln(\sigma_t^2) = \mu + \phi\left(\ln(\sigma_{t-1}^2) - \mu\right) + \eta_t$, $\quad \eta_t \sim N(0, \sigma_\eta^2)$

Model parameters are determined by the Bayesian inference.

The Bayesian inference is performed by Markov Chain Monte Carlo methods

Hybrid Monte Carlo method is used for parameter estimations of the RSVM.

In this study we focus on the realized stochastic volatility model (RSVM).

RSVM= realized volatility + daily return

stochastic

Realized Volatility (Andersen, Bollerslev (1998))

Let us assume that the logarithmic price process follows a stochastic diffusion as

$d \ln p(t) = \sigma(t)\, dW(t)$

$W(t)$: standard Brownian motion, $\quad \sigma(t)$: spot volatility

Integrated volatility (IV) for period $T$: $\sigma_T^2(t) = \int_t^{t+T} \sigma^2(s)\, ds$

Realized volatility is defined by a sum of squared finely sampled returns, i.e. returns calculated using high-frequency data:

$RV_t = \sum_{i=1}^{N} r_{t+ik}^2$, $\quad r_i = \ln p(i) - \ln p(i-k)$

$k$: sampling period, $\quad N = T/k$

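[Figure: daily price series illustrating the daily return $R_t = \ln p_t - \ln p_{t-1}$, the intraday returns $r_{t+ik}$, and the realized volatility $RV_t = \sum_i r_{t+ik}^2$ as an estimate of $\sigma_T^2(t)$]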
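A minimal sketch of the realized volatility as a sum of squared log-returns, assuming the input is a list of prices already sampled at a fixed period. The function name and the price values are hypothetical and only illustrate the definition.

```python
import math

def realized_volatility(prices):
    """Sum of squared log-returns between consecutive sampled prices."""
    returns = [math.log(b) - math.log(a) for a, b in zip(prices, prices[1:])]
    return sum(r * r for r in returns)

# Hypothetical intraday prices sampled at a fixed period (not real data).
intraday = [100.0, 100.5, 100.2, 100.8, 100.4]
rv = realized_volatility(intraday)
```

A finer sampling period gives more terms in the sum, which is also why microstructure noise matters more at high sampling frequency.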

Non-trading hours issue

Let us consider daily volatility.

Usually stock exchange markets are not open for a whole day.

Domestic stock trade at the Tokyo Stock Exchange:

break (until 09:00), morning session (09:00 to 11:00), break (11:00 to 12:30), afternoon session (12:30 to 15:00), break (from 15:00)

How to deal with the intraday returns during the breaks?

RV without including returns in the breaks

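Microstructure noise

Observed prices are contaminated by microstructure noise (price discreteness, bid-ask spreads, etc.):

$\ln P^*(t) = \ln P(t) + \zeta(t)$, $\quad \zeta(t) \sim WN(0, \sigma_\zeta^2)$

Observed returns are also contaminated by noise:

$r^*(t) = r(t) + e(t)$ (true return + noise), $\quad e(t) = \zeta(t) - \zeta(t - \Delta t)$

In the presence of noise, RV is calculated as follows (Zhou (1996)):

$RV_t = \sum_i r_i^{*2} = \sum_i r_i^2 + 2 \sum_i r_i e_i + \sum_i e_i^2$

The last two sums are the noise terms.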

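Bias correction

Hansen and Lunde (2005) introduced an adjustment factor $c$ that corrects RV so that the average of RV matches the variance of the daily returns:

$RV_t \rightarrow c\,RV_t$, $\quad c = \frac{\sum_{t=1}^{T} (R_t - \bar{R})^2}{\sum_{t=1}^{T} RV_t}$

Numerator: variance of daily returns (times $T$). Denominator: average of the original realized volatilities (times $T$).

$T$: trading days, $\quad c$: adjustment factor, $\quad RV_t$: original realized volatility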
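A small sketch of the adjustment factor, assuming it is the ratio of the summed squared demeaned daily returns to the summed realized volatilities, so that the average corrected RV equals the variance of the daily returns. The function name and the numbers are hypothetical.

```python
def hl_adjustment_factor(daily_returns, realized_vols):
    """Ratio that rescales RV so its average matches the return variance."""
    mean_r = sum(daily_returns) / len(daily_returns)
    numerator = sum((r - mean_r) ** 2 for r in daily_returns)
    denominator = sum(realized_vols)
    return numerator / denominator

# Hypothetical daily returns and realized volatilities (not real data).
returns = [0.01, -0.02, 0.015, -0.005]
rvs = [1.0e-4, 2.0e-4, 1.5e-4, 0.5e-4]

c = hl_adjustment_factor(returns, rvs)
adjusted = [c * rv for rv in rvs]
```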

Realized Stochastic volatility model

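$y_{1,t} = \sigma_t \epsilon_t$, $\quad \epsilon_t \sim N(0,1)$

$y_{2,t} = \xi + h_t + u_t$, $\quad u_t \sim N(0, \sigma_u^2)$

$h_{t+1} = \mu + \phi(h_t - \mu) + \eta_t$, $\quad \eta_t \sim N(0, \sigma_\eta^2)$

$y_{1,t}$: daily return, $\quad y_{2,t} = \ln RV_t$, $\quad h_t = \ln \sigma_t^2$

$\theta = (\phi, \mu, \xi, \sigma_\eta^2, \sigma_u^2)$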

Takahashi et al.(2009)

Model parameters

Volatility variable

Dynamics of RV

Idea : daily return + realized volatility

Information given from financial markets

Bias to RV: with the HL factor $c$, $RV_t \rightarrow c\,RV_t$, so that $y_{2,t} = \ln c\,RV_t = \ln c + \ln RV_t$
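The model equations can be illustrated by generating artificial data. This is only a sketch under stated assumptions: the daily return is taken as exp(h_t/2) times a standard normal draw (using h_t as the log of the squared volatility), the first h is drawn from the stationary distribution of the AR(1) process, and the function name, seed and series length are hypothetical choices.

```python
import math
import random

def simulate_rsv(T, phi, mu, xi, var_eta, var_u, seed=0):
    """Generate daily returns y1, log realized volatilities y2 and latent h."""
    rng = random.Random(seed)
    # Assumption: h_1 follows the stationary distribution of the AR(1) process.
    h = rng.gauss(mu, math.sqrt(var_eta / (1.0 - phi ** 2)))
    y1, y2, hs = [], [], []
    for _ in range(T):
        hs.append(h)
        y1.append(math.exp(h / 2.0) * rng.gauss(0.0, 1.0))     # daily return
        y2.append(xi + h + rng.gauss(0.0, math.sqrt(var_u)))   # ln RV_t
        h = mu + phi * (h - mu) + rng.gauss(0.0, math.sqrt(var_eta))
    return y1, y2, hs

# Parameter values (phi, mu, xi, sigma_eta^2, sigma_u^2) as in the simulation study.
y1, y2, hs = simulate_rsv(4000, 0.93, -1.0, 0.3, 0.1, 0.2)
```

The gap between the averages of y2 and h is an estimate of the bias parameter xi.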

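Likelihood function of the realized stochastic volatility model

$L(\theta) = \int f(Y_1, Y_2 \mid h, \theta)\, dh_1 \cdots dh_T$

$f(Y_1, Y_2 \mid h, \theta) = \prod_{t=1}^{T} (2\pi e^{h_t})^{-1/2} \exp\left(-\frac{y_{1,t}^2}{2 e^{h_t}}\right) \times \prod_{t=1}^{T} (2\pi\sigma_u^2)^{-1/2} \exp\left(-\frac{(y_{2,t} - \xi - h_t)^2}{2\sigma_u^2}\right) \times \left(\frac{2\pi\sigma_\eta^2}{1 - \phi^2}\right)^{-1/2} \exp\left(-\frac{(h_1 - \mu)^2}{2\sigma_\eta^2/(1 - \phi^2)}\right) \times \prod_{t=2}^{T} (2\pi\sigma_\eta^2)^{-1/2} \exp\left(-\frac{(h_t - \mu - \phi(h_{t-1} - \mu))^2}{2\sigma_\eta^2}\right)$

$Y_1 = (y_{1,1}, y_{1,2}, \dots, y_{1,T})$, $\quad Y_2 = (y_{2,1}, y_{2,2}, \dots, y_{2,T})$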

Bayesian inference

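Bayes theorem

$\pi(\theta \mid y) = \frac{L(y \mid \theta)\, \pi(\theta)}{f(y)} \propto L(y \mid \theta)\, \pi(\theta)$, $\quad f(y) = \int L(y \mid \theta)\, \pi(\theta)\, d\theta$

$\pi(\theta \mid y)$: posterior distribution (probability distribution of $\theta$)

$L(y \mid \theta)$: likelihood function

$\pi(\theta)$: prior distribution

Parameters are estimated as expectation values:

$E[\theta] = \int \theta\, \pi(\theta \mid y)\, d\theta$

Estimate by Markov Chain Monte Carlo

For the RSV model, $\theta = (\phi, \mu, \xi, \sigma_\eta^2, \sigma_u^2)$ and

$E[\theta] = \int \theta f(\theta, h)\, d\theta\, dh_1 \cdots dh_T = \int \theta \exp(\ln f(\theta, h))\, d\theta\, dh_1 \cdots dh_T$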

Expectation values of parameters for the realized stochastic volatility model

$\theta = (\phi, \mu, \xi, \sigma_\eta^2, \sigma_u^2)$

$E[\theta] = \int \theta f(\theta, h)\, d\theta\, dh_1 \cdots dh_T = \int \theta \exp(\ln f(\theta, h))\, d\theta\, dh_1 \cdots dh_T$

We need T updates of volatility variables at each MCMC step.

An advantage of the hybrid Monte Carlo algorithm:

All T volatility variables can be updated simultaneously.

De-correlate Monte Carlo samples of volatility variables

Hybrid Monte Carlo

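Introduce momenta $p = (p_1, p_2, \dots, p_T)$ conjugate to $h = (h_1, h_2, \dots, h_T)$

Define the Hamiltonian (kinetic + potential):

$H(p, h) = \frac{1}{2} \sum_{i=1}^{T} p_i^2 - \ln f(\theta, h)$

For the RSV model, $\theta = (\phi, \mu, \xi, \sigma_\eta^2, \sigma_u^2)$

$E[\theta] = \frac{1}{Z} \int \theta \exp\left(-\frac{1}{2} \sum_i p_i^2 + \ln f(\theta, h)\right) d\theta\, dh_1 \cdots dh_T\, dp = \frac{1}{Z} \int \theta \exp(-H(p, h))\, d\theta\, dh_1 \cdots dh_T\, dp$

$Z$: normalization constant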

A local update scheme is not effective

Basic idea: HMC = molecular dynamics simulation + Metropolis test

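Multi-move sampler, Watanabe, Omori (2004)

$h(t + \Delta t/2) = h(t) + p(t)\, \Delta t/2$

$p(t + \Delta t) = p(t) - \frac{\partial H}{\partial h}\, \Delta t$

$h(t + \Delta t) = h(t + \Delta t/2) + p(t + \Delta t)\, \Delta t/2$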

Choose candidates of h by solving Hamilton’s equations of motion

Hamiltonian is conserved

Solve Hamilton’s equations of motion numerically

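(2nd order) Leapfrog integrator

Molecular dynamics simulation for $h = (h_1, h_2, \dots, h_T)$; elementary step with step size $\Delta t$:

$h(t + \Delta t/2) = h(t) + p(t)\, \Delta t/2$

$p(t + \Delta t) = p(t) - \frac{\partial H}{\partial h}\, \Delta t$

$h(t + \Delta t) = h(t + \Delta t/2) + p(t + \Delta t)\, \Delta t/2$

Repeat this elementary step a number of times: $H(p, h) \rightarrow H(p', h')$

The Hamiltonian is not conserved due to discretization errors:

$\Delta H = H(p', h') - H(p, h) \neq 0$, $\quad \Delta H \sim O(\Delta t^2)$

Metropolis accept/reject test:

$P = \min\left(1, \frac{\exp(-H(p', h'))}{\exp(-H(p, h))}\right) = \min(1, \exp(-\Delta H))$

Acceptance depends on the step size $\Delta t$.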
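The two ingredients (molecular dynamics with the leapfrog integrator, then a Metropolis test on the energy difference) can be sketched as follows. The target here is a toy standard Gaussian rather than the RSV posterior, and the function names, step size, trajectory length and seed are all hypothetical choices made for illustration.

```python
import math
import random

def hmc_step(h, potential, grad_potential, dt, n_steps, rng):
    """One HMC update: fresh momenta, leapfrog trajectory, Metropolis test."""
    p = [rng.gauss(0.0, 1.0) for _ in h]
    h_new = list(h)
    H_old = 0.5 * sum(pi * pi for pi in p) + potential(h_new)
    for _ in range(n_steps):
        h_new = [hi + 0.5 * dt * pi for hi, pi in zip(h_new, p)]       # half step in h
        p = [pi - dt * gi for pi, gi in zip(p, grad_potential(h_new))]  # full step in p
        h_new = [hi + 0.5 * dt * pi for hi, pi in zip(h_new, p)]       # half step in h
    H_new = 0.5 * sum(pi * pi for pi in p) + potential(h_new)
    dH = H_new - H_old
    if dH <= 0.0 or rng.random() < math.exp(-dH):
        return h_new, True
    return list(h), False

# Toy target (assumption): independent standard Gaussians, potential = sum(h^2)/2.
potential = lambda h: 0.5 * sum(x * x for x in h)
grad_potential = lambda h: list(h)

rng = random.Random(1)
h = [0.0] * 10
accepted, second_moment, n_iter = 0, 0.0, 2000
for _ in range(n_iter):
    h, ok = hmc_step(h, potential, grad_potential, 0.2, 10, rng)
    accepted += ok
    second_moment += sum(x * x for x in h) / len(h)
acc_rate = accepted / n_iter
second_moment /= n_iter
```

All ten variables move together in every update, which is the global feature the slides emphasize.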

Optimal acceptance

•What is the optimal acceptance for HMC?

step-size small: acceptance large, cost large

step-size large: acceptance small, cost small

•We define the efficiency function as

$E(\Delta t) = \mathrm{Acc} \times \Delta t$

This function has the property:

$\Delta t \rightarrow 0$: $E(\Delta t) \rightarrow 0$

$\Delta t \rightarrow \infty$: $E(\Delta t) \rightarrow 0$

Optimal acceptance and step-size are given by maximizing the efficiency function.

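Optimal acceptance

Acceptance is evaluated as (Gupta et al., PLB242 (1990) 437)

$\mathrm{Acc} = \mathrm{erfc}\left(\left(\frac{1}{8} \langle \Delta H^2 \rangle\right)^{1/2}\right)$

At small energy difference,

$\mathrm{Acc} \approx \exp\left(-\frac{\langle \Delta H^2 \rangle^{1/2}}{(2\pi)^{1/2}}\right)$

For the n-th order integrator ($V$: volume),

$\langle \Delta H^2 \rangle^{1/2} = C_n V^{1/2} \Delta t^n$

The efficiency function is rewritten as

$E(\Delta t) = \Delta t \exp\left(-\frac{C_n V^{1/2} \Delta t^n}{(2\pi)^{1/2}}\right)$

Maximizing the efficiency function, we obtain the optimal step-size

$\Delta t_{opt} = \left(\frac{(2\pi)^{1/2}}{n\, C_n V^{1/2}}\right)^{1/n}$

Inserting the optimal step-size into the acceptance, we obtain the optimal acceptance as

$\mathrm{Acc}_{opt} = \exp(-1/n)$

2nd: 0.61, $\quad$ 4th: 0.78, $\quad$ 6th: 0.85

This depends only on the order of the integrator.

Thus the optimal acceptance does not depend on the model and volume.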
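The quoted values can be checked numerically. The sketch below assumes the small-energy-difference form Acc = exp(-k dt^n), where k lumps all model- and volume-dependent constants into one hypothetical number, and maximizes dt * Acc on a grid; the acceptance at the maximum should not depend on k.

```python
import math

def acceptance(dt, order, k):
    """Assumed small-dH form of the acceptance; k is a hypothetical constant."""
    return math.exp(-k * dt ** order)

def acceptance_at_best_step(order, k, dt_max=5.0, n_grid=50000):
    """Grid search for the step size that maximizes dt * Acc(dt)."""
    best_dt, best_eff = 0.0, 0.0
    for i in range(1, n_grid + 1):
        dt = dt_max * i / n_grid
        eff = dt * acceptance(dt, order, k)
        if eff > best_eff:
            best_dt, best_eff = dt, eff
    return acceptance(best_dt, order, k)

for n in (2, 4, 6):
    # closed form exp(-1/n) next to the grid-search result for two values of k
    print(n, round(math.exp(-1.0 / n), 2),
          round(acceptance_at_best_step(n, 1.0), 2),
          round(acceptance_at_best_step(n, 7.5), 2))
```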

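Leapfrog integrator

•Hamilton's equations of motion (Poisson bracket):

$\frac{df}{dt} = \{f, H\} = \sum_i \left( \frac{\partial f}{\partial q_i} \frac{\partial H}{\partial p_i} - \frac{\partial f}{\partial p_i} \frac{\partial H}{\partial q_i} \right)$

•Define the linear operator $L(H) f \equiv \{f, H\}$

The formal solution is given by

$f(t + \Delta t) = \exp(\Delta t\, L(H))\, f(t)$

This cannot be directly implemented.

•Decompose the operator as $L(H) = L(\frac{1}{2} p^2) + L(S(q)) \equiv T + V$, with $T \equiv L(\frac{1}{2} p^2)$ and $V \equiv L(S(q))$

$T$ and $V$ do not commute.

•2nd order leapfrog integrator

$f(t + \Delta t) = \exp(\Delta t (T + V))\, f(t) = e^{\Delta t T/2} e^{\Delta t V} e^{\Delta t T/2} f(t) + O(\Delta t^3)$

$G_{2nd}(\Delta t) = e^{\Delta t T/2} e^{\Delta t V} e^{\Delta t T/2}$, with $O(\Delta t^3)$ errors

In terms of the coordinates $x$ (the $q$ above) and momenta $p$:

$x(t + \Delta t/2) = x(t) + p(t)\, \Delta t/2$

$p(t + \Delta t) = p(t) - \frac{\partial H}{\partial x}\, \Delta t$

$x(t + \Delta t) = x(t + \Delta t/2) + p(t + \Delta t)\, \Delta t/2$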

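Leapfrog integrator

•The 2nd order leapfrog integrator $G_{2nd}(\Delta t) = e^{\Delta t T/2} e^{\Delta t V} e^{\Delta t T/2}$ has the following properties:

time-reversible: $G_{2nd}(\Delta t)\, G_{2nd}(-\Delta t) = 1$

area-preserving: for $(x, p) \rightarrow (x', p')$, $dx\, dp = dx'\, dp'$, i.e. the Jacobian satisfies $\det \frac{\partial(x', p')}{\partial(x, p)} = 1$

Symplectic integrator

These conditions are needed for HMC to maintain the detailed balance.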
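Both properties can be checked numerically on a small example. The sketch assumes a hypothetical one-dimensional action S(x) = x^2/2 + x^4/4; time reversibility is tested by flipping the momentum and integrating again, and area preservation by a central finite-difference estimate of the Jacobian determinant.

```python
def force(x):
    """-dS/dx for the hypothetical action S(x) = x^2/2 + x^4/4."""
    return -x - x ** 3

def leapfrog(x, p, dt, n_steps):
    for _ in range(n_steps):
        x += 0.5 * dt * p      # half step in x
        p += dt * force(x)     # full step in p
        x += 0.5 * dt * p      # half step in x
    return x, p

# Time reversibility: integrate, flip the momentum, integrate again.
x0, p0 = 0.7, -0.3
x1, p1 = leapfrog(x0, p0, 0.1, 20)
xb, pb = leapfrog(x1, -p1, 0.1, 20)   # (xb, -pb) should reproduce (x0, p0)

def jacobian_det(x, p, dt, n_steps, eps=1e-6):
    """Central finite-difference estimate of det d(x', p') / d(x, p)."""
    xa, pa = leapfrog(x + eps, p, dt, n_steps)
    xc, pc = leapfrog(x - eps, p, dt, n_steps)
    xd, pd = leapfrog(x, p + eps, dt, n_steps)
    xe, pe = leapfrog(x, p - eps, dt, n_steps)
    dxdx, dpdx = (xa - xc) / (2 * eps), (pa - pc) / (2 * eps)
    dxdp, dpdp = (xd - xe) / (2 * eps), (pd - pe) / (2 * eps)
    return dxdx * dpdp - dxdp * dpdx

det = jacobian_det(x0, p0, 0.1, 20)
```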

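Higher order integrator

•A higher order integrator can be found by decomposing

$\exp(\Delta t (T + V)) = \prod_i \exp(c_i \Delta t\, T) \exp(d_i \Delta t\, V) + O(\Delta t^n)$

•Find the coefficients $c_i$ and $d_i$ such that the leading error is $O(\Delta t^n)$

Higher order integrators can be constructed from lower order integrators.

Consider the 2nd order integrator

$G_{2nd}(\Delta t) = e^{\Delta t T/2} e^{\Delta t V} e^{\Delta t T/2}$

The 4th order integrator is given by a product of three 2nd order integrators:

$G_{4th}(\Delta t) = G_{2nd}(a_1 \Delta t)\, G_{2nd}(a_2 \Delta t)\, G_{2nd}(a_1 \Delta t)$

where $a_1 = \frac{1}{2 - 2^{1/3}}$, $\quad a_2 = -\frac{2^{1/3}}{2 - 2^{1/3}}$

This integrator is obviously time-reversible and area-preserving:

$G_{4th}(\Delta t)\, G_{4th}(-\Delta t) = 1$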

Higher order integrator

•Further higher order integrators are constructed from lower order integrators recursively (recursive construction scheme):

$G_{2n+2}(\Delta t) = G_{2n}(b_1 \Delta t)\, G_{2n}(b_2 \Delta t)\, G_{2n}(b_1 \Delta t)$

$b_1 = \frac{1}{2 - 2^{1/(2n+1)}}$, $\quad b_2 = -\frac{2^{1/(2n+1)}}{2 - 2^{1/(2n+1)}}$

•Cost of higher order integrators

Each 2nd order integrator has one force calculation.

Cost of the $n$th order integrator $\propto$ # of 2nd order integrators $= 3^{n/2 - 1}$

No higher order integrator with positive coefficients only (Suzuki)
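The recursive scheme can be unrolled into the list of step fractions applied to the 2nd order building blocks. This sketch assumes the standard triplet coefficients, 1/(2 - s) and -s/(2 - s) with s = 2^(1/(m+1)) when lifting order m to m + 2; the function names are hypothetical.

```python
def triplet_coefficients(order):
    """Coefficients (b1, b2) that lift an integrator of even `order` to order + 2."""
    s = 2.0 ** (1.0 / (order + 1))
    return 1.0 / (2.0 - s), -s / (2.0 - s)

def substep_fractions(order):
    """Step fractions of the 2nd order integrators making up the scheme."""
    if order == 2:
        return [1.0]
    b1, b2 = triplet_coefficients(order - 2)
    inner = substep_fractions(order - 2)
    return ([b1 * f for f in inner]
            + [b2 * f for f in inner]
            + [b1 * f for f in inner])

fourth = substep_fractions(4)   # [a1, a2, a1]
sixth = substep_fractions(6)    # nine 2nd order integrators
```

The middle fraction is negative, in line with the remark that no higher order integrator exists with positive coefficients only, and the number of building blocks triples with each increase of the order by two.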

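Minimum Norm Integrator

Higher order integrators can be used. Efficiency depends on the model we use.

Omelyan et al. (2003) found the 2nd order minimum norm (MN) integrator that is more efficient than the leapfrog integrator.

Leapfrog integrator:

$\exp(\Delta t (T + V)) = \exp(\Delta t\, T/2) \exp(\Delta t\, V) \exp(\Delta t\, T/2) + O(\Delta t^3)$

MN integrator:

$\exp(\Delta t (T + V)) = \exp(\lambda \Delta t\, T) \exp(\Delta t\, V/2) \exp((1 - 2\lambda) \Delta t\, T) \exp(\Delta t\, V/2) \exp(\lambda \Delta t\, T) + O(\Delta t^3)$ for $\lambda = 0.193183327$

Here $H(p, h) = \frac{1}{2} \sum_i p_i^2 - \ln f(\theta, h)$.

$O_{MN}(\Delta t^3) < O_{Leapfrog}(\Delta t^3)$

Cost(MN) = 2 x Cost(LF)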
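A rough comparison at equal cost, using a toy harmonic oscillator (an assumption made for illustration, not the RSV model): the MN integrator takes half as many steps with twice the step size, so both runs use the same number of force evaluations. Function names and step sizes are hypothetical.

```python
LAMBDA = 0.193183327  # value quoted for the MN integrator

def leapfrog_step(x, p, dt, force):
    x += 0.5 * dt * p
    p += dt * force(x)
    x += 0.5 * dt * p
    return x, p

def mn_step(x, p, dt, force):
    x += LAMBDA * dt * p
    p += 0.5 * dt * force(x)
    x += (1.0 - 2.0 * LAMBDA) * dt * p
    p += 0.5 * dt * force(x)
    x += LAMBDA * dt * p
    return x, p

def max_energy_error(step, dt, n_steps):
    """Largest |H - H0| along a trajectory of H = (x^2 + p^2)/2."""
    force = lambda x: -x
    energy = lambda x, p: 0.5 * (x * x + p * p)
    x, p = 1.0, 0.0
    e0 = energy(x, p)
    worst = 0.0
    for _ in range(n_steps):
        x, p = step(x, p, dt, force)
        worst = max(worst, abs(energy(x, p) - e0))
    return worst

err_lf = max_energy_error(leapfrog_step, 0.1, 200)  # 200 force evaluations
err_mn = max_energy_error(mn_step, 0.2, 100)        # also 200 force evaluations
```

How large the gain is depends on the model, as noted above, so the toy result should not be read as the gain for the RSV model.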

Simulation Study

Artificial time series data (4000 data points) generated with $(\phi, \mu, \xi, \sigma_\eta^2, \sigma_u^2) = (0.93, -1.0, 0.3, 0.1, 0.2)$

Simulations were done on the supercomputer system (SGI ICE X) at ISM

Simulation Study: leapfrog vs MN integrator

[Figures: energy difference $\Delta H = H(p', h') - H(p, h)$ and efficiency function $P_{acc}(\Delta t) \times \Delta t$ for the two integrators]

Efficiency function: 5 times

Computational cost: Cost(MN) ~ 2 Cost(Leapfrog)

In total, the MN integrator is about 2.5 times more effective than the leapfrog integrator.

HMC vs Metropolis (# of volatility variables = 2000)

Autocorrelation time ~ 200

Autocorrelation time ~ 18

Results

            $\phi$     $\mu$     $\xi$        $\sigma_\eta^2$   $\sigma_u^2$

Input       0.93       -1.0      0.3          0.1               0.2

average     0.926      -0.97     0.31         0.097             0.203          -0.77

S.D.        0.007      0.10      0.03         0.006             0.010          0.23

$2\tau$     8.8(20)    96(40)    1130(480)    64(18)            38(10)         21(6)

tACFshort

Empirical Study

Data: Panasonic Co. 2006.07.03-2009.12.30

Input data: Daily returns + RV(1min)

            $\phi$     $\mu$     $\xi$        $\sigma_\eta^2$   $\sigma_u^2$

average     0.958      -7.63     -0.101       0.0440            0.0176         -7.21

S.D.        0.013      0.14      0.049        0.0036            0.0031         0.12

$2\tau$     39(6)      24(6)     178(42)      65(11)            112(19)        37(11)

$y_{2,t} = \xi + h_t + u_t$

$y_{2,t} = \ln c\,RV_t = \ln c + \ln RV_t$

c: HL factor

Results

Input data: Daily returns + RV(1min)

HL factor

GPU computing

The Hybrid Monte Carlo algorithm can be parallelized.

Typically the number of volatility variables is in the thousands.

Volatility variables can be updated in parallel

Speed up the Hybrid Monte Carlo algorithm

CUDA Fortran: coding GPU programs for NVIDIA graphics cards

OpenACC: directive-based coding

Host: CPU with main memory

Device: GPU with device memory

Data are transferred between main memory and device memory. It is crucial to minimize this overhead.

Coding environment

CPU: Intel i7-4770 3.4GHz

GPU: GeForce GTX 760 (CUDA cores: 1152)

Compiler: PGI Fortran (PGI 14.6)

CUDA 6.0

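Device code for leapfrog (CUDA Fortran)

Kernel 1: $h_i(t + \Delta t/2) = h_i(t) + p_i(t)\, \Delta t/2$, $\quad i = 1, \dots, T$

Kernel 2: $p_i(t + \Delta t) = p_i(t) - \frac{\partial H}{\partial h_i}\, \Delta t$, $\quad i = 1, \dots, T$

Kernel 3: $h_i(t + \Delta t) = h_i(t + \Delta t/2) + p_i(t + \Delta t)\, \Delta t/2$, $\quad i = 1, \dots, T$

Kernel 3 = Kernel 1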

      attributes(global) subroutine steph(nd,d_h,d_p,dt)
      implicit none
      integer, value :: nd
      real(4), value :: dt
      real(4), device :: d_h(nd), d_p(nd)
      integer :: j
c----- integrate h ------------------------------------
      j = (blockIdx%x - 1) * blockDim%x + threadIdx%x
      if (j < nd + 1) d_h(j) = d_h(j) + d_p(j)*dt*0.5
      return
      end

Kernel 1

Host and Device codes are needed

Kernel 2

      attributes(global) subroutine stepp(nd,d_h,d_p,d_dy,d_grv,
     &     dt,phi,xmu,vareta,varu,xe)
      implicit none
      integer :: j
      integer, value :: nd
      real(4), value :: dt,phi,xmu,vareta,varu,xe
      real(4), device :: d_h(nd),d_p(nd),d_dy(nd),d_grv(nd)
      real(4) :: xf

c---- one thread per volatility variable
      j = (blockIdx%x - 1) * blockDim%x + threadIdx%x
      if (j < nd + 1) then

      if (j == 1) then
c------ t=1 --- assume f(h(1)|para)= full
c       write(*,*)"p10,h1= ",p(1),h(1),gm(1),gm(2)
        xf = 1.0 - d_dy(1)**2*exp(-d_h(1))
        xf = xf + 2*(d_h(1)-(phi*d_h(2)+xmu*(1-phi)))/vareta
        xf = xf * 0.5
        xf = xf + (d_h(1)+xe-d_grv(1))/varu
        d_p(1) = d_p(1) - xf*dt
c       write(*,*)"p1,xf,yt,dgh= ",p(1),xf,yt,dgh
c----------------------------------------
      endif

      if (j > 1 .and. j < nd) then
c       do j=2, nd-1
        xf = (1.0 - d_dy(j)**2*exp(-d_h(j))
     &     + 2*((1.+phi**2)*d_h(j)-(d_h(j+1)+d_h(j-1))*phi
     &     - xmu*(1-phi)**2)/vareta)*0.5
        xf = xf + (d_h(j)+xe-d_grv(j))/varu
        d_p(j) = d_p(j) - xf*dt
c       enddo
      endif

c---- t=nd ------
      if (j == nd) then
        xf = (1.0 - d_dy(nd)**2*exp(-d_h(nd))
     &     + 2*(d_h(nd)-xmu*(1-phi) - phi*d_h(nd-1))/vareta)*0.5
        xf = xf + (d_h(nd)+xe-d_grv(nd))/varu
        d_p(nd) = d_p(nd) - xf*dt
      endif

c---- closes "if (j < nd + 1)"
      endif
      return
      end
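The quantity xf in Kernel 2 can be cross-checked against a finite-difference derivative of the potential. The sketch below is a plain Python transcription with 0-based indices, not GPU code. It assumes the potential is -ln f up to h-independent constants, with a stationary distribution for the first volatility variable; the function names and test data are hypothetical.

```python
import math

def potential(h, dy, grv, phi, mu, var_eta, var_u, xi):
    """-ln f(Y1, Y2 | h, theta), dropping terms that do not depend on h."""
    T = len(h)
    s = 0.0
    for t in range(T):
        s += 0.5 * (h[t] + dy[t] ** 2 * math.exp(-h[t]))
        s += (grv[t] - xi - h[t]) ** 2 / (2.0 * var_u)
    s += (1.0 - phi ** 2) * (h[0] - mu) ** 2 / (2.0 * var_eta)
    for t in range(1, T):
        s += (h[t] - mu - phi * (h[t - 1] - mu)) ** 2 / (2.0 * var_eta)
    return s

def grad_potential(h, dy, grv, phi, mu, var_eta, var_u, xi):
    """Derivative of the potential with respect to each h_j, as in Kernel 2."""
    T = len(h)
    xf = [0.0] * T
    for j in range(T):
        g = 0.5 * (1.0 - dy[j] ** 2 * math.exp(-h[j]))
        if j == 0:
            g += (h[0] - (phi * h[1] + mu * (1.0 - phi))) / var_eta
        elif j == T - 1:
            g += (h[j] - mu * (1.0 - phi) - phi * h[j - 1]) / var_eta
        else:
            g += ((1.0 + phi ** 2) * h[j] - (h[j + 1] + h[j - 1]) * phi
                  - mu * (1.0 - phi) ** 2) / var_eta
        g += (h[j] + xi - grv[j]) / var_u
        xf[j] = g
    return xf

def finite_difference(h, j, eps, *args):
    hp, hm = list(h), list(h)
    hp[j] += eps
    hm[j] -= eps
    return (potential(hp, *args) - potential(hm, *args)) / (2.0 * eps)

# Hypothetical test data: daily returns dy and log realized volatilities grv.
h = [-1.0, -0.8, -1.2, -0.9, -1.1]
dy = [0.5, -0.3, 0.8, 0.1, -0.6]
grv = [-0.7, -0.5, -1.0, -0.6, -0.9]
args = (dy, grv, 0.93, -1.0, 0.1, 0.2, 0.3)  # phi, mu, var_eta, var_u, xi
analytic = grad_potential(h, *args)
numeric = [finite_difference(h, j, 1e-6, *args) for j in range(len(h))]
```

Each entry depends only on the neighbouring h values, which is why one GPU thread per volatility variable works.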

Host code

      integer, parameter :: n_thre = 512
      ngrid = (nd-1)/n_thre + 1

c---- copy data from host to device
      d_h = h
      d_p = p
      d_grv = grv
      d_dy = dy

      do i=1,1000
c---- Kernel 1
      call steph<<<ngrid,n_thre>>>(nd,d_h,d_p,dt)
c---- Kernel 2
      call stepp<<<ngrid,n_thre>>>(nd,d_h,d_p,d_dy,d_grv,
     &     dt,phi,xmu,vareta,varu,xe)
c---- Kernel 3
      call steph<<<ngrid,n_thre>>>(nd,d_h,d_p,dt)
      enddo

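OpenACC

Coding is done by inserting directives.

!$acc data copy(h,p)          (data directive)

!$acc kernels

do j=1, n                     (Kernel 1: $h_j(t + \Delta t/2) = h_j(t) + p_j(t)\, \Delta t/2$)

...

!$acc end kernels

...                           (Kernel 2, and Kernel 3: $h_j(t + \Delta t) = h_j(t + \Delta t/2) + p_j(t + \Delta t)\, \Delta t/2$)

!$acc end data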

[Figure: average time of one leapfrog step / average time of 200 leapfrog steps versus # of volatility variables; data size B x 512]

Fitting to f(T) = a + c*T

                a           c

CPU             -1.85e-4    1.30e-5

GPU(OpenACC)    9.41e-3     3.36e-7

Gain=TIME(CPU)/TIME(GPU)

37736.3

37 times faster on GPU
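The fitted lines show how the gain grows with the number of volatility variables. This sketch assumes the two numbers in each row of the fit are the intercept a and the slope c, in that order; the function names are hypothetical. The linear fit for the CPU is only meaningful where it gives a positive time.

```python
# Fit coefficients of f(T) = a + c*T (assumed order: intercept a, slope c).
A_CPU, C_CPU = -1.85e-4, 1.30e-5
A_GPU, C_GPU = 9.41e-3, 3.36e-7

def fitted_time(a, c, T):
    return a + c * T

def gain(T):
    """TIME(CPU) / TIME(GPU) according to the two fitted lines."""
    return fitted_time(A_CPU, C_CPU, T) / fitted_time(A_GPU, C_GPU, T)

for T in (10**3, 10**4, 10**5, 10**6):
    print(T, gain(T))
```

For small T the fixed GPU overhead (the intercept) dominates and the gain is modest; for large T the gain approaches the ratio of the slopes.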

Summary

Bayesian inference of realized stochastic volatility model has been performed by the Hybrid Monte Carlo method.

We found that the MN integrator is more effective than the conventional leapfrog integrator.

The Hybrid Monte Carlo method can de-correlate volatility variables fast enough.

The parameter ξ of the realized volatility model explains the bias similar to the HL factor although slight deviations are observed at high sampling frequency.

GPU computing can speed up the Hybrid Monte Carlo algorithm.

In practical applications, HMC may be more efficient for a model with a large number of volatility variables, such as the multivariate realized stochastic volatility model.
