
Multilevel and multi-index Monte Carlo for portfolio Value-at-Risk

Candidate Number 828995

University of Oxford

A thesis submitted for the degree of

MSc in Mathematical Finance

Trinity 2016

Contents

List of notations

1 Introduction

2 Value-at-Risk and Monte Carlo simulation
  2.1 Stochastic model
  2.2 Value-at-Risk in this model
  2.3 Monte Carlo approach for computing Value-at-Risk
  2.4 Portfolio structure and computational cost

3 Multilevel Monte Carlo for portfolio VaR
  3.1 The MLMC approach
  3.2 Complexity Theorem
  3.3 The MLMC algorithm
  3.4 Parameters of accuracy
  3.5 Multilevel parameter strategies - fixed portfolio
    3.5.1 Analytical pricing
    3.5.2 MC pricing without bias
    3.5.3 MC pricing with bias
  3.6 Multilevel parameter strategies - position sampling
    3.6.1 Position sampling with analytical pricing
    3.6.2 Position sampling with Monte Carlo pricing

4 Multi-index Monte Carlo for portfolio VaR
  4.1 The MIMC approach
  4.2 Assumptions
  4.3 Optimal index sets
  4.4 Complexity theorem
  4.5 The MIMC algorithm

5 Numerical Example
  5.1 Model portfolio assumptions
  5.2 MLMC - Convergence rates
    5.2.1 Analytical pricing
    5.2.2 MC pricing without bias
    5.2.3 MC pricing with bias
    5.2.4 Position sampling with analytical pricing
    5.2.5 Position sampling with Monte Carlo pricing
  5.3 MLMC - Computational cost
    5.3.1 Analytical pricing
    5.3.2 MC pricing without bias
    5.3.3 MC pricing with bias
    5.3.4 Position sampling with analytical pricing
    5.3.5 Position sampling with Monte Carlo pricing
  5.4 MIMC - Convergence rates
    5.4.1 MC pricing without bias
    5.4.2 MC pricing with bias
    5.4.3 Position sampling with analytical pricing
    5.4.4 Position sampling with Monte Carlo pricing
  5.5 MIMC - Computational cost
    5.5.1 MC pricing without bias
    5.5.2 MC pricing with bias
    5.5.3 Position sampling with analytical pricing
    5.5.4 Position sampling with Monte Carlo pricing

6 Conclusion

A Empirical results

B Excursion: Direct quantile MLMC

Acronyms

References

List of notations

θ: Market parameter scenario at time horizon.

Xθ: Portfolio value at time horizon under scenario θ.

ℙ: Probability distribution of θ.

X̂θ: Result of the pricing algorithm for the portfolio value under scenario θ, an approximation to Xθ.

ℚ: Probability distribution of X̂θ(ω).

X: Initial portfolio value.

L: Portfolio loss between now and the time horizon.

Q: Quantile of the distribution of the portfolio value at time horizon that is required in order to calculate the portfolio VaR.

S: Number of market scenario samples θs used in the quantile estimation Q̂.

Q̂: Estimator for Q using the empirical inverse distribution function of a number of Monte Carlo samples of X̂θ.

N: Number of Monte Carlo paths generated in the pricing routine to calculate X̂θp.

H: Number of time steps used in each Monte Carlo path generated to calculate X̂θp.

P: Number of positions in the portfolio.

Xp: Initial value of the p-th position.

Xθp: Value of the p-th position at time horizon under scenario θ.

Cp: Expected computational cost of performing the pricing algorithm for the p-th position.

L: Number of levels used either in the MLMC or the MIMC algorithm.

ε: Allowed root mean square error in the VaR calculation.

Yl: Dual level estimator for level l in the MLMC approach, average of Gl samples of ∆Ql.

G: Number of outer samples used to estimate the expectations in the MLMC and MIMC approach.

Y: MLMC or MIMC estimator for Q.

P: Number of positions evaluated in the quantile estimation Q̂.

∆Ql: Backward difference operator applied to the quantile estimator for level l in the MLMC approach.

∆Ql: Mixed difference operator applied to the quantile estimator at index l in the MIMC approach.

Yl: Mixed difference estimator for index l in the MIMC approach, average of Gl samples of ∆Ql.

Chapter 1

Introduction

The computation of the Value-at-Risk (VaR) of a large derivative portfolio is still one of the big methodological and computational challenges in finance. While there are various relatively efficient approximations, the need to capture risk more adequately and the corresponding regulatory pressure are driving investment banks to adopt the more challenging full valuation approaches. Among these, the most comprehensive and at the same time most computationally challenging approach is the full Monte Carlo approach of calculating portfolio VaR.

The majority of structured products and many exotic derivatives require Monte Carlo (MC) simulation for their valuation. In the presence of such derivatives, full Monte Carlo VaR calculation becomes a problem of nested simulation: for each sample in the outer simulation, a nested simulation with its own number of samples has to be performed. Nested simulation problems are well known for their computational demand and are therefore usually avoided in financial practice. Examples of such problems include the pricing of American or Bermudan derivatives, the pricing of variable annuity products, the calculation of credit valuation adjustments (see Albanese et al. [2011] or Albanese et al. [2013]), and the calculation of required risk capital under Solvency II. Instead, so-called least squares Monte Carlo (LSMC) methodologies are often used, which replace the inner Monte Carlo simulation with least squares regressions. See e.g. Longstaff and Schwartz [2001] for the case of American option pricing, Bacinello et al. [2011] for the pricing of variable annuities, or Bauer et al. [2010] for Solvency II risk capital.

As for portfolio VaR calculation, we are not aware of LSMC approaches to avoid the nested simulation. However, there are approaches that aim at solving the full Monte Carlo VaR problem without avoiding the nested simulation but instead by maximizing the computational efficiency otherwise. A recent example of such an approach was published by Gordy and Juneja [2010], who analyze the optimal allocation of resources between the inner and the outer simulation layers. Another example is the work by Broadie et al. [2011], although they focus on estimating the probability of a large loss instead of VaR. Such methodologies can often be combined with traditional techniques of variance reduction that are available for regular Monte Carlo problems, such as importance sampling, see for instance Glasserman et al. [2000]. These other techniques could potentially also be employed in combination with the approaches introduced in this work.

A different approach to handling challenging Monte Carlo problems, called multilevel Monte Carlo (MLMC), was introduced by Giles [2008a]. An extensive overview of this approach is given in Giles [2015]. Since its introduction it has been applied to a variety of simulation problems, ranging from the initial examples of financial stochastic differential equations (SDEs) and stochastic partial differential equations (SPDEs) over fluid dynamics equations to biological systems (e.g. Mishra et al. [2012], Lester et al. [2015]). Roughly speaking, it can be applied whenever the computational cost (and, inversely, the accuracy) per Monte Carlo sample can be controlled through one or more parameters. Nested simulation is such a case because the number of samples in the inner Monte Carlo simulation can be seen as a parameter controlling the cost of one sample in the outer simulation. Consequently, some initial applications of multilevel Monte Carlo to nested simulation problems have been published over recent years. Two examples are the initial work by Haji-Ali [2012] and more recently the work by Bujok et al. [2015].

Using multilevel Monte Carlo nested simulation for full Monte Carlo VaR is a new application that we will discuss in this work. To date we are not aware of any published examples of this approach in the literature. For nested simulation, the key addition to previous MLMC applications is an antithetic “trick” introduced in Haji-Ali [2012] and described in detail in Giles [2015] that allows the efficient comparison of different accuracy levels of the inner simulation along given paths of the outer simulation; this comparison lies at the heart of any MLMC approach. The number of MC samples in the inner pricing simulation is the first of four sample accuracy parameters that we will consider throughout this work. The other three will be motivated in the following paragraphs.

A particularity of the full Monte Carlo VaR problem when compared to existing MLMC applications is that one is not concerned with the calculation of an expectation but rather with the calculation of a quantile. However, the main result of Giles [2008a], which justifies the use of MLMC to improve computational efficiency when compared to traditional Monte Carlo techniques, is stated for expectations. We thus embed the VaR calculation problem inside that of calculating an expectation, which in a way adds another layer of simulation. The outer samples of the full Monte Carlo VaR calculation become part of a single sample of the expectation problem. Both the number of VaR samples and the number of inner pricing samples become parameters controlling the cost of one sample in the expectation layer. The number of VaR samples is therefore the second of our four accuracy parameters. Using this approach we will build on the results by Gordy and Juneja [2010], who found the optimal efficiency when using traditional Monte Carlo techniques. We will analyze whether the MLMC approach can be used to further reduce the computational cost under the same model assumptions.


Furthermore, when Monte Carlo methodologies are required to price derivatives, it is usually because the pricing requires the simulation of a time-discretized stochastic differential equation. In this case, the number of time steps used in the discretization controls how large the so-called discretization bias becomes. Thus, the number of time steps is again a parameter controlling the sample cost and accuracy of the derivatives’ MC pricing. Incidentally, this was the initial example for the use of MLMC that was discussed in Giles [2008a]. It is natural to also consider this as a parameter of accuracy in the full Monte Carlo VaR. This increases the number of relevant accuracy parameters to three. This parameter was not considered in Gordy and Juneja [2010], who assumed an unbiased pricing algorithm, and we will extend their model by considering the impact of this additional parameter.

Finally, there is a fourth natural parameter controlling the computational cost, namely the number of different derivatives in the portfolio. Instead of calculating the whole portfolio value within each VaR scenario, one can sample a subset of all derivative positions in the portfolio as an approximation, which naturally reduces the computational effort per scenario. To our knowledge this is a new approach that has not been discussed in the literature before.

When considering all parameters at once, the sample accuracy level is a four-dimensional vector. A drawback of the MLMC approach is that the different levels that are compared only trace a one-dimensional path across this four-dimensional parameter space. In certain cases this can be suboptimal because it does not use all available degrees of freedom within the optimization of resource allocation. This is why the MLMC approach has recently been generalized by Haji-Ali et al. [2015] to a new approach introducing vector-indexed levels, which is called multi-index Monte Carlo (MIMC). We will also discuss this approach and analyze how it compares to the MLMC approach for the problem of full Monte Carlo VaR calculation.

The remainder of the work is structured as follows: In Chapter 2 we give some background on portfolio VaR, define the stochastic model, and formulate our basic notation for portfolio structure and valuation functions. Afterwards we introduce the MLMC approach in Chapter 3 and elaborate on how the parameters of accuracy influence the VaR estimation. In Section 3.5 we construct the actual MLMC VaR estimator taking into account the various possible combinations of parameters, first without the sampling of subportfolios and then in Section 3.6 with position sampling. In these sections we also discuss optimal strategies for choosing the accuracy parameters along the different levels and the corresponding convergence properties. The analogous steps under the MIMC framework are then performed in Chapter 4. In Chapter 5 we construct a hypothetical sample portfolio and perform numerical experiments using all the different estimators defined in the previous chapters. We list our findings about estimator convergence rates and the resulting computational efforts of the MLMC and MIMC algorithms. Finally, Chapter 6 concludes.


Chapter 2

Value-at-Risk and Monte Carlo simulation

2.1 Stochastic model

We start by defining the stochastic model that we will consider in the remainder of this work.

Let θ : Ω → Θ be a random vector describing a set of parameters and let X : Θ → ℝ be a real-valued function depending on this parameter vector. We will write Xθ := X(θ) for θ ∈ Θ. Let Ω be equipped with the probability measure ℙ.

In this work Xθ will model a portfolio value under the market parameters θ, where θ represents a market scenario at a certain future time horizon. So, without going into detail, when considering X, what the reader should have in mind is something like the usual risk-neutral pricing formula

$$X^{\theta(\omega)} = \mathbb{E}_{\mathbb{Q}^*}\left[ X^{\theta(\omega)} \,\middle|\, \theta(\omega) \right]$$

where ℚ∗ is an equivalent martingale measure, the Xθ inside the expectation represents the value of all future cashflows and assets in the portfolio discounted to the time horizon, and the evolution of all future market parameters depends only on the starting point θ.

We will assume that this expression is not straightforward to compute because it represents the valuation of a whole portfolio of possibly complex financial products. More specifically, we will assume that there are parts of the portfolio that cannot be valued analytically or through finite difference schemes and for which only Monte Carlo valuation is viable. Thus, even for a fixed scenario θ(ω), the numerical valuation result, denoted by X̂θ(ω), will be a random variable. We will denote its domain by Ω2 and the underlying probability measure by ℚ¹:

$$\hat{X}^{\theta(\omega)} := \hat{X}(\theta(\omega), \cdot) : \Omega_2 \to \mathbb{R}$$

¹ Note that ℚ is the real world distribution of the Monte Carlo estimator, which technically is not necessarily the same as the martingale measure ℚ∗ but an approximation.


This setup is illustrated by the following diagram:

Ω --θ--> Θ --X--> ℝ

Ω2 --X̂--> ℝ

Note that when we write X̂θ we mean the random variable on the product space Ω × Ω2. When we are talking about the random variable on Ω2 alone, i.e. when we keep the scenario fixed, we will indicate this by writing θ(ω) as above.

When both the outer random variable θ and the inner random variable X̂ are being simulated with Monte Carlo methods, this is called nested simulation. There are numerous applications of nested simulation that could be described by this general stochastic model, often where only the expectation of Xθ is of interest. Well-known examples are the valuation of Bermudan options, where X is the present value at a given call date and θ describes the market parameters at that date, or the calculation of credit valuation adjustments, where X is a risk-neutral portfolio value and θ describes the default status of counterparties. Contrary to those other applications, in this work we are not interested in the expectation $\mathbb{E}_{\mathbb{P}\otimes\mathbb{Q}}[\hat{X}^\theta]$ but in quantiles of the distribution (or law) of Xθ, ℒ(Xθ). The reason is that these quantiles are needed in Value-at-Risk calculations as described in the following section.

2.2 Value-at-Risk in this model

We are given a portfolio that has a known starting value of X (not to be confused with the random variable Xθ above) and we are tasked with calculating the Value-at-Risk for a given time horizon. For information about this risk measure we refer to the books Jorion [2007], McNeil et al. [2015], and Korn et al. [2010]. Here we will only reiterate the essentials to establish notation. Suffice it to say that VaR is the most widely used risk measure in practice and that its calculation on portfolio level for certain types of risk is mandated by various regulatory entities.

Value-at-Risk is always calculated with respect to a given time horizon, e.g. one day or one year from now. We shall model the random portfolio value at that time horizon by Xθ, where the random vector θ models all the external information that is relevant for valuing the portfolio at the time horizon (including the time horizon itself, market parameters, cashflows between now and the time horizon, etc.). We will disregard discounting of Xθ to today or at least not consider it explicitly when it comes to modeling Xθ. We will also not explicitly model any other P&L between now and the time horizon (like new trades or coupon payments) apart from the changes in position values. Thus the portfolio loss L until the time horizon in terms of fair value is given by L := X − Xθ.

For a given confidence level p ∈ (0, 1), e.g. p = 99%, the Value-at-Risk is defined as the (smallest) loss that will not be exceeded with probability higher than (1 − p). Formally:

$$\mathrm{VaR}(p) := \inf\{\, l \in \mathbb{R} : \mathbb{P}(L > l) \le 1 - p \,\} = X - \sup\{\, v \in \mathbb{R} : \mathbb{P}(X^\theta < v) \le 1 - p \,\}$$

For any random variable Z we will denote its distribution function by $F_Z$. From now on we assume that the distribution function $F_{X^\theta}$ of Xθ is differentiable and strictly increasing. This means that the distribution of Xθ has no point masses and connected support, i.e. no in-between areas of zero probability. Apart from the obvious complication that a real world portfolio value will always be measured in discrete monetary values, these assumptions are not very restrictive.

By considering the generalized inverse distribution function of Xθ, given by

$$F^{-1}_{X^\theta}(p) := \inf\{\, x : F_{X^\theta}(x) \ge p \,\},$$

the above definition can be written as:

$$\mathrm{VaR}(p) = X - F^{-1}_{X^\theta}(1 - p)$$

Throughout this work we will use the following notation:

$$q := 1 - p$$

$$Q := F^{-1}_{X^\theta}(q)$$

which leaves us with the core definition:

$$\mathrm{VaR}(p) = X - Q \tag{2.2.1}$$

Q will be the quantity whose estimation we set as our goal. There are two reasons stemming from practical experience why we leave X out of the estimation:

1. One is that we assume it to be known a priori: In practice, X corresponds to the usual end-of-day valuation that is done independently of risk calculation, e.g. for profit and loss calculation, accounting, or regulatory reporting.

2. The second reason is that typically there are a number of different valuation libraries being used, some of which are proprietary “black boxes” that can only be used to calculate position values. If we wanted to do the VaR calculation in one step, by estimating the quantile of the loss L = X − Xθ directly, we would need valuation libraries that can simulate losses rather than values. Custom libraries might be tweaked to provide these simulations, but even then it would mean significant effort.

However, we do note that, from a mathematical standpoint, there might be arguments why simulating the loss directly has benefits. For instance, if X and Xθ are large numbers but relatively close, their difference L will be of smaller order, which can also imply a smaller estimation error. This decision only affects our numerical results but has little impact on the theory that is described in this work. All the described approaches could equally well be applied to the simulation of L.
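To make definition (2.2.1) concrete, consider a purely illustrative special case that is not assumed anywhere in this work: suppose Xθ were normally distributed with mean μ and standard deviation σ, and let Φ denote the standard normal distribution function. The symbols μ, σ and Φ are introduced only for this example.

```latex
% Illustrative assumption: X^\theta \sim N(\mu, \sigma^2)
\begin{align*}
Q &= F^{-1}_{X^\theta}(q) = \mu + \sigma\,\Phi^{-1}(q), \qquad q = 1 - p, \\
\mathrm{VaR}(p) &= X - Q = X - \mu - \sigma\,\Phi^{-1}(1 - p) = X - \mu + \sigma\,\Phi^{-1}(p).
\end{align*}
% Example: X = \mu = 100, \sigma = 5, p = 0.99, \Phi^{-1}(0.99) \approx 2.326
% gives Q \approx 88.37 and VaR(0.99) \approx 11.63.
```

In this special case the quantile Q is available in closed form; the remainder of this work is concerned with the situation where it has to be estimated by simulation.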

2.3 Monte Carlo approach for computing Value-at-Risk

There are multiple approaches in practice to computing or approximating the portfolio VaR. Again we refer to Jorion [2007], McNeil et al. [2015], and Korn et al. [2010] for further information on the prevalent methodologies. Among the most relevant are the following, roughly ordered by increasing computational complexity or decreasing number of simplifying assumptions:

• The delta-normal approach, also called variance-covariance approach.

• The historical simulation approach.

• The full Monte Carlo approach.

The delta-normal or variance-covariance approach, as described for instance in J.P.Morgan [1996], gained popularity early. It assumes that the losses of all positions in the portfolio depend linearly on a set of risk factors and that all the risk factors follow a joint normal distribution. Under these assumptions the vector of position values also has a joint normal distribution whose loss quantile can be computed analytically. There are extensions to this approach, e.g. the delta-gamma approach, which also considers second-order dependencies on the risk factors but still makes the assumption of joint normality of the risk factors.
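The analytical quantile of the delta-normal approach can be sketched in a few lines. The following is a minimal illustration, assuming zero-mean risk factor changes over the horizon and a portfolio loss that is linear in them; the function name, the sensitivities and the covariance matrix are hypothetical and chosen only for the example.

```python
import math
from statistics import NormalDist


def delta_normal_var(deltas, cov, p):
    """Delta-normal VaR at confidence level p.

    Assumes (for illustration) that the loss is L = -sum_i deltas[i] * dR_i,
    where the risk factor changes dR are jointly normal with zero mean and
    covariance matrix `cov`. Then L ~ N(0, deltas' cov deltas).
    """
    n = len(deltas)
    # Portfolio variance as the quadratic form deltas' * cov * deltas
    variance = sum(deltas[i] * cov[i][j] * deltas[j]
                   for i in range(n) for j in range(n))
    # The p-quantile of a centred normal loss is z_p * standard deviation
    return NormalDist().inv_cdf(p) * math.sqrt(variance)


# Hypothetical two-factor example
deltas = [1000.0, -500.0]
cov = [[0.0004, 0.0001],
       [0.0001, 0.0009]]
var_99 = delta_normal_var(deltas, cov, 0.99)
```

Only one valuation (to obtain the sensitivities) is needed here, which is the reason for the low computational cost of this approach compared to the full valuation approaches below.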

In contrast to the simpler approaches described above, the historical simulation approach uses so-called full valuation, where the portfolio losses are actually computed under a large number of given market scenarios. Comparing that to the delta(-gamma) approach, where only a single valuation (including greeks) is done, shows the dramatic increase in computational complexity. The difficulty with full valuation lies in defining the market scenarios which should be evaluated. If the scenarios are too optimistic, VaR will be underestimated; if they are too conservative, VaR will be overestimated. Historical simulation deals with this by evaluating a set of recently observed market scenarios. The assumption is that until the time horizon, the market will behave as it did in the past.

Finally, the full Monte Carlo approach also employs full valuation, and the market scenarios are generated using Monte Carlo simulation. To this end, a stochastic model for the market parameters has to be assumed to simulate their evolution until the time horizon.

There are various mixtures and variations of these approaches as well as various competing measures of risk, but most of the approaches used in practice can be categorized under one of the three described above. As for their relative popularity in the banking industry, it seems² that over recent years the delta-normal approach has been slowly fading and the historical simulation approach is currently most prevalent. Use of the full Monte Carlo approach is still mostly restricted to tier one banks.

In this work we will only consider the full Monte Carlo approach. Using our notation as described above, it can be described as follows:

1. Generate a number S of samples θs (s ∈ {1, . . . , S}) for the market scenarios from the assumed market parameter model.

2. For each θs, re-evaluate the portfolio under this scenario, obtaining an estimated value X̂θs (which is not necessarily identical to Xθs because of possible pricing errors).

3. Construct the empirical inverse distribution function $F^{-1}_{\hat{X}^\theta}$ from these samples and evaluate $F^{-1}_{\hat{X}^\theta}(q)$. This gives an estimator for Q which we denote by Q̂. In view of the multilevel approaches described later we will refer to this as the single-level estimator.

4. Compute

$$\mathrm{VaR}(p) \approx X - \hat{Q} \tag{2.3.1}$$
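The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the scenario model (a single standard normal factor) and the pricer (a linear function without pricing error) are hypothetical stand-ins, and the quantile is taken from the order statistics at position ⌈qS⌉.

```python
import math
import random


def single_level_quantile(values, q):
    """Empirical inverse distribution function: order statistic at ceil(q*S)."""
    ordered = sorted(values)
    # round() guards against floating point noise such as 200.00000000000017
    k = math.ceil(round(q * len(ordered), 9))
    k = min(max(k, 1), len(ordered))      # keep the 1-based position in range
    return ordered[k - 1]


def full_mc_var(x0, sample_scenario, price_portfolio, S, p, rng):
    """Steps 1-4: sample scenarios, re-value, estimate Q, return x0 - Q_hat."""
    scenarios = [sample_scenario(rng) for _ in range(S)]            # step 1
    values = [price_portfolio(theta, rng) for theta in scenarios]   # step 2
    q_hat = single_level_quantile(values, 1.0 - p)                  # step 3
    return x0 - q_hat                                               # step 4


def toy_scenario(rng):
    """Hypothetical market model: one standard normal risk factor."""
    return rng.gauss(0.0, 1.0)


def toy_pricer(theta, rng):
    """Hypothetical pricer without pricing error: value is linear in theta."""
    return 100.0 + 5.0 * theta


var_99 = full_mc_var(100.0, toy_scenario, toy_pricer,
                     S=20_000, p=0.99, rng=random.Random(42))
```

In this toy setting the portfolio value is normal with mean 100 and standard deviation 5, so the estimate should lie near 5 · Φ⁻¹(0.99) ≈ 11.63. A pricer that itself uses Monte Carlo would consume `rng` inside `price_portfolio`, which is where the nested simulation discussed below arises.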

The simplest way of obtaining an empirical inverse distribution function is by creating the order statistics of the samples and picking the value at position ⌈qS⌉ as described in Korn et al. [2010]. There, we can also find a reference to a central limit theorem for quantiles by Glynn [1996] which we can make use of if (!) X̂θ follows the same distribution as Xθ:

$$\sqrt{S}\left( F^{-1}_{S,\hat{X}^\theta}(q) - F^{-1}_{X^\theta}(q) \right) \xrightarrow{\;L\;} Z \sim N\!\left(0,\; \frac{q(1-q)}{f_{X^\theta}\!\left(F^{-1}_{X^\theta}(q)\right)^{2}}\right) \tag{2.3.2}$$

where $f_{X^\theta}$ is the density function of Xθ. This theorem tells us in particular that $\hat{Q} \xrightarrow{L} Q$, where the L denotes convergence in distribution, so long as $\hat{X}^\theta \xrightarrow{L} X^\theta$. This will be a key assumption in Chapter 3 and Chapter 4.

Usually we will not know what the precise variance of this estimator is, but we can observe the convergence rate of 1/S in the variance.
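The asymptotic variance in the central limit theorem can be evaluated directly when the density is known. The sketch below does this for a hypothetical normally distributed portfolio value (an assumption made only for illustration) and shows the 1/S rate: quadrupling S halves the asymptotic standard deviation.

```python
import math
from statistics import NormalDist


def quantile_estimator_std(q, S, density_at_quantile):
    """Asymptotic standard deviation sqrt(q(1-q)/S) / f(F^{-1}(q))."""
    return math.sqrt(q * (1.0 - q) / S) / density_at_quantile


# Hypothetical distribution of the portfolio value: N(100, 5^2)
dist = NormalDist(100.0, 5.0)
q = 0.01
f_at_Q = dist.pdf(dist.inv_cdf(q))          # density at the true quantile Q

std_10k = quantile_estimator_std(q, 10_000, f_at_Q)
std_40k = quantile_estimator_std(q, 40_000, f_at_Q)
ratio = std_10k / std_40k                    # 2, since the variance scales as 1/S
```

The small density in the tail is what makes extreme quantiles expensive to estimate: the denominator shrinks as q moves towards 0.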

Note that this particular quantile estimator is not unbiased and that the bias scales with the standard deviation of the distribution, see Okolewski and Rychlik [2001]. There are a number of alternative quantile estimators that are all based on the order statistics of the samples $(\theta_s)_{s=1}^{S}$, see e.g. Hyndman and Fan [1996], but none of them is perfect. For our empirical analysis in Chapter 5 we will use linear interpolation between samples, which is the default for many statistics libraries and well suited when the approximated distribution is continuous.

² Judging from internal surveys at d-fine GmbH, a risk management consultancy.
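The difference between the order statistic rule and linear interpolation can be sketched as follows. The interpolation rule shown is the one that places the quantile at fractional position (S − 1)q of the sorted samples, which is type 7 in the classification of Hyndman and Fan and the default in, for example, NumPy and R; whether this exact variant was used for the empirical analysis is an assumption of this sketch.

```python
import math


def quantile_order_statistic(samples, q):
    """Order statistic at 1-based position ceil(q*S)."""
    xs = sorted(samples)
    k = max(math.ceil(round(q * len(xs), 9)), 1)
    return xs[k - 1]


def quantile_linear(samples, q):
    """Linear interpolation between neighbouring order statistics."""
    xs = sorted(samples)
    h = (len(xs) - 1) * q                 # fractional 0-based position
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


samples = [10.0, 20.0, 30.0, 40.0, 50.0]
q_os = quantile_order_statistic(samples, 0.3)   # second smallest sample
q_lin = quantile_linear(samples, 0.3)           # between second and third sample
```

With only five samples the two rules differ visibly (20 versus 22); for the large S used in VaR estimation the difference is typically small compared to the statistical error.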

So far, this full Monte Carlo approach is often avoided in practice, especially when Monte Carlo simulations are also needed in each re-evaluation of the portfolio, as we are assuming in this work. Under these circumstances, the full Monte Carlo approach gives rise to nested simulation: for each of the S “outer” samples and for each “Monte Carlo position” in the portfolio, the straightforward approach is to generate a large number N of “inner” samples of the position payoff, such that for each such position a total of S · N payoff evaluations are necessary. In addition, when a position needs Monte Carlo methods for evaluation, one almost always also needs to simulate a time discretization of some stochastic differential equation for the underlying price process(es). If such a time discretization includes the generation of H random time steps, a single position will require the sampling of a total of S · N · H random process evolution steps, and that is for a single-underlying derivative.
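The S · N · H count can be made tangible with a small nested simulation for a single position. The sketch below is illustrative only: the position (a European call), the driftless geometric Brownian motion model with zero interest rates, and all parameter values are assumptions of this example and not the model portfolio of this work.

```python
import math
import random


def nested_position_values(S, N, H, rng, s0=100.0, strike=100.0,
                           sigma=0.2, horizon=10 / 250, maturity=1.0):
    """Nested simulation for one position; returns (values, step_count).

    `maturity` is the remaining time to expiry measured from the time horizon.
    """
    steps = 0
    values = []
    dt = maturity / H
    for _ in range(S):                              # outer: market scenarios
        # Scenario: underlying price at the time horizon (exact GBM step)
        z = rng.gauss(0.0, 1.0)
        s_theta = s0 * math.exp(-0.5 * sigma ** 2 * horizon
                                + sigma * math.sqrt(horizon) * z)
        payoff_sum = 0.0
        for _ in range(N):                          # inner: pricing paths
            s = s_theta
            for _ in range(H):                      # Euler-Maruyama time steps
                s += sigma * s * math.sqrt(dt) * rng.gauss(0.0, 1.0)
                steps += 1
            payoff_sum += max(s - strike, 0.0)
        values.append(payoff_sum / N)               # estimated position value
    return values, steps


values, steps = nested_position_values(S=5, N=10, H=4, rng=random.Random(0))
```

Even this tiny run performs 5 · 10 · 4 = 200 process steps; with realistic values of S, N and H the count quickly reaches many billions per position.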

On the other hand, there is regulatory pressure on banks and other financial institutions to calculate VaR with more sophisticated models in order to avoid the conservative safety margins in place for less sophisticated models. These penalties, together with increased capabilities resulting from improvements in computational power and methodology, establish an incentive towards the use of more precise approaches. This reasoning motivates us to investigate how the relatively precise full Monte Carlo approach can be made more computationally efficient.

To put things into perspective, however, one should note that there are also other obstacles in practice that often prevent the use of more sophisticated methodologies for computing VaR. For instance, many banks have dedicated systems for the calculation of VaR that are separated from the trading system where the portfolio is usually valued and where greeks are computed for intraday risk management. In such a typical setup, the risk system incorporates the “scenario engine” and computes the VaR from the scenario valuations. Often this risk system cannot evaluate all products and depends on input from the trading system in the form of valuations or greeks. The full Monte Carlo approach, especially when dynamic scenario sampling schemes are involved, makes the separation of scenario generation and evaluation very inefficient because of the number of such interactions. A similar problem in practice that is relevant for all approaches that require full valuation under given scenarios is that the definition of risk factors and scenarios can be hard to align between risk system and “valuation system”. Such practical limitations can, however, always be overcome given time and should not prevent us from studying possible improvements.


2.4 Portfolio structure and computational cost

So far we have treated the portfolio as a black box of which we only observe the total value X or Xθ. In truth, however, a typical financial portfolio is very heterogeneous and its structure plays an important role in the calculation of the portfolio VaR. As stated earlier, we assume that some of the positions in the portfolio need Monte Carlo methods to be valued. However, in a typical investment bank portfolio only a subset of positions will be in complex products while the majority will be in simple products for which relatively “cheap” valuation methods are available.

For the remainder of this work we will focus on the “expensive” Monte Carlo positions. To simplify notation we will pretend that the portfolio consists only of such “expensive” positions. In practice, the effort of evaluating analytical or even finite difference solvers is dominated by the effort of evaluating Monte Carlo solvers, so modeling the Monte Carlo positions only is a reasonable approximation in terms of computational cost. Further research could be done on how to distribute computational effort between positions depending on their cost and their impact on the overall result. See Gordy and Juneja [2010] for an initial discussion.

For the sake of notation we will also assume that each position is in a different financial derivative and thus all positions have individual pricing functions. Due to the way that a bank’s portfolio is usually structured, a single derivative will have multiple positions in various subportfolios. In such a case, valuation results can be cached and reused between positions. We will not model this explicitly and instead pretend that positions with identical valuation functions have already been grouped into one position.

In the following we will consider a portfolio that consists of P positions. The current value of the portfolio is the sum of all of the current position values:

$$X = \sum_{p=1}^{P} X_p$$

Analogously the portfolio value under scenario θ is:

$$X^\theta = \sum_{p=1}^{P} X^\theta_p$$

For the remainder of this work we will make the assumption that the number of paths and the number of time steps used in the pricing algorithm are kept constant across all positions.

We will denote by Cp the computational cost of calculating the (approximate) value of position p, $\hat{X}^{\theta(\omega)}_p$, for a given realization ω ∈ Ω. We assume that the calculation of $\hat{X}^{\theta(\omega)}_p$ requires Monte Carlo simulation of an SDE for one underlying price. The accuracy of the estimator $\hat{X}^\theta_p$ for the true value $X^\theta_p$ depends on the number N of Monte Carlo samples that are generated and the number H of time steps that are simulated. Further, we assume that the cost is proportional to these factors:

$$C_p \equiv cNH, \tag{2.4.1}$$

i.e. we disregard any overhead for payoff evaluation etc.
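As a rough worked consequence of (2.4.1), one can write down the cost of the single-level estimator of Section 2.3. This assumes, as stated above, that N and H are the same for all positions, and additionally that each of the S scenarios re-values all P positions and that the cost of generating the scenarios themselves is negligible. The symbol C_total is introduced only for this illustration.

```latex
% Illustrative total cost of the single-level estimator
\[
C_{\mathrm{total}} \;=\; S \sum_{p=1}^{P} C_p \;=\; c\, S\, P\, N\, H .
\]
% Example: S = 10^4, P = 10^2, N = 10^3, H = 10^2 gives
% C_total = 10^{11} c, i.e. 10^{11} elementary process steps.
```

Each of the four factors S, P, N and H is therefore a lever on the cost of one VaR estimate, which is the observation the multilevel and multi-index approaches of the following chapters build on.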


Chapter 3

Multilevel Monte Carlo for portfolio VaR

3.1 The MLMC approach

Multilevel Monte Carlo is a technique for reducing the computational effort in certain Monte Carlo estimators. It was first introduced in Giles [2008a] for the Monte Carlo simulation of an SDE through an Euler-Maruyama scheme. However, the underlying principle can be applied in many different Monte Carlo applications, broadly speaking whenever there is a way of controlling the computational effort per sample.

We will first sketch how the MLMC method works in principle. Assume for the beginning that we have a fixed number of levels $L \in \mathbb{N}^+$. For each level $l \in \{0, \dots, L\}$ we generate a number of samples. The idea now is to choose a large number of samples in the initial level $l = 0$ but to reduce the cost per sample at this level. This can be achieved in various ways, depending on the problem at hand, see Section 3.4. Since there is no free lunch, reducing the cost per sample can only be achieved by decreasing the accuracy of the samples, typically in terms of bias. (Obviously, if methods are available to reduce the cost while maintaining the same accuracy, these should be employed first.) This initial estimate will thus have a relatively large error introduced by the bias of the samples while having relatively low variance due to the high number of samples.

To refine the estimate, within the next level $l$ we use a lower number of samples to estimate the error that was introduced by the inaccurate samples in the previous level. This is done by computing the difference between the previous coarse approximation and a new finer approximation using the same new set of samples. This process can then be repeated for each further level, each time estimating the reduction in bias due to finer and finer approximations. The exact number of samples per level can be chosen in order to optimize the overall computational cost given a maximum level of error that should be achieved. The total number $L$ of refinement steps is also chosen depending on the allowed level of error.


In the next section, we will define the relevant measures of error and cite the main theorem which establishes the improvements in efficiency that can be gained compared to traditional Monte Carlo methods.

3.2 Complexity Theorem

We will be concerned with the convergence of an estimator for $Q$, say $\hat{Q}$, in terms of mean square error (MSE) or root mean square error (RMSE). Ignoring the difference between the real world measure $\mathbb{P}$ and the distribution of our pseudo random Monte Carlo samples $\theta$, the latter is simply the $L^2(\mathbb{P} \otimes \mathbb{Q})$ distance between $\hat{Q}$ and $Q$ and the former its square.

$$\mathrm{MSE} = \bigl\|\hat{Q} - Q\bigr\|^2_{L^2} = \mathbb{E}\bigl[(\hat{Q} - Q)^2\bigr], \qquad \mathrm{RMSE} = \sqrt{\mathrm{MSE}}$$

An important initial observation is that the mean square error can be decomposed into the variance of the estimator and its squared bias:

$$\mathbb{E}\bigl[(\hat{Q} - Q)^2\bigr] = \mathbb{V}[\hat{Q}] + \bigl(\mathbb{E}[\hat{Q}] - Q\bigr)^2 \tag{3.2.1}$$
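The decomposition in Equation 3.2.1 can be checked numerically. The following is a minimal Python sketch using a deliberately biased toy estimator; the function names, the shift of 0.1 and the sample sizes are illustrative assumptions and are not taken from the thesis.

```python
import random
import statistics

def biased_mean_estimator(rng, n, shift):
    """Toy estimator of Q = 0: mean of n standard normals plus a fixed shift (the bias)."""
    return statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n)) + shift

def mse_decomposition(estimates, true_value):
    """Return (mse, variance, squared_bias) computed from realisations of an estimator."""
    mean_est = statistics.fmean(estimates)
    mse = statistics.fmean((e - true_value) ** 2 for e in estimates)
    # Population variance (divide by the number of realisations), so the identity is exact.
    variance = statistics.fmean((e - mean_est) ** 2 for e in estimates)
    squared_bias = (mean_est - true_value) ** 2
    return mse, variance, squared_bias

rng = random.Random(0)
estimates = [biased_mean_estimator(rng, n=50, shift=0.1) for _ in range(2000)]
mse, variance, squared_bias = mse_decomposition(estimates, true_value=0.0)
print(mse, variance + squared_bias)  # the two numbers agree up to rounding
```

In this toy setting the variance term is close to 1/50 and the squared bias close to 0.01, so neither term can be neglected when targeting a given MSE.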

With classical Monte Carlo, to achieve an RMSE of $\varepsilon$, i.e. an MSE of $\varepsilon^2$, one would independently choose an approximation for $Q$ which has a sample bias of $\varepsilon$ or less and then average over enough samples to reduce the variance to below $\varepsilon^2$. By the central limit theorem, the necessary number of samples is of the order $1/\varepsilon^2$. Thus, if the average cost of generating one such sample is of the order of $c_\varepsilon$, then the overall cost is of the order $c_\varepsilon/\varepsilon^2$.

The following theorem, quoted from Giles [2015], determines which reduction of the overall cost can be achieved by the multilevel method and under which conditions. This version of the theorem is a slight generalization of the original version given in Giles [2008a].

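Theorem 1 (Complexity Theorem - MLMC).
Let $Z$ denote a random variable, and let $Z_l$ denote the corresponding level $l$ numerical approximation.

If there exist independent estimators $Y_l$ based on $G_l$ Monte Carlo samples, each with expected cost $C_l$ and variance $V_l$, and positive constants $\alpha, \beta, \gamma, c_1, c_2, c_3$ such that $\alpha \ge \tfrac{1}{2}\min(\beta, \gamma)$ and

i) $\bigl|\mathbb{E}[Z_l - Z]\bigr| \le c_1 2^{-\alpha l}$

ii) $\mathbb{E}[Y_l] = \begin{cases} \mathbb{E}[Z_0], & l = 0, \\ \mathbb{E}[Z_l - Z_{l-1}], & l > 0 \end{cases}$

iii) $V_l \le c_2 2^{-\beta l}$

iv) $C_l \le c_3 2^{\gamma l}$,

then there exists a positive constant $c_4$ such that for any $\varepsilon < e^{-1}$ there are values $L$ and $G_l$ for which the multilevel estimator

$$Y = \sum_{l=0}^{L} Y_l$$

has mean-square-error with bound

$$\mathrm{MSE} \equiv \mathbb{E}\bigl[(Y - \mathbb{E}[Z])^2\bigr] < \varepsilon^2$$

with a computational complexity $C$ with bound

$$\mathbb{E}[C] \le \begin{cases} c_4 \varepsilon^{-2}, & \beta > \gamma, \\ c_4 \varepsilon^{-2} (\log \varepsilon)^2, & \beta = \gamma, \\ c_4 \varepsilon^{-2-(\gamma-\beta)/\alpha}, & \beta < \gamma. \end{cases}$$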

The theorem shows that in the best case, where $\beta > \gamma$, the $\varepsilon^{-2}$ order of computational complexity can be recovered that one also achieves in a classical Monte Carlo problem with “constant” cost $c_\varepsilon \equiv c$ of simulating each sample. The additional effort of having one or more convergence parameters that drive the computational cost per sample can be avoided with efficient usage of resources.
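To make the three regimes of the cost bound concrete, the following minimal Python sketch evaluates the right-hand side of the bound for given rates. The function name `mlmc_cost_bound` and the default `c4 = 1.0` are assumptions made for illustration only; the constant $c_4$ is not known in practice, so only the order in $\varepsilon$ is meaningful.

```python
import math

def mlmc_cost_bound(eps, alpha, beta, gamma, c4=1.0):
    """Evaluate the cost bound of Theorem 1 for a target RMSE eps.

    c4 is the unknown constant of the theorem; 1.0 is a placeholder.
    """
    if alpha < 0.5 * min(beta, gamma):
        raise ValueError("Theorem 1 requires alpha >= min(beta, gamma) / 2")
    if beta > gamma:
        return c4 * eps ** -2
    if beta == gamma:
        return c4 * eps ** -2 * math.log(eps) ** 2
    return c4 * eps ** (-2 - (gamma - beta) / alpha)

# Variance decays faster than cost grows: plain eps^-2.
print(mlmc_cost_bound(0.01, alpha=1.0, beta=2.0, gamma=1.0))
# Cost grows faster than variance decays: eps^-(2 + (gamma - beta) / alpha) = eps^-3.
print(mlmc_cost_bound(0.01, alpha=1.0, beta=1.0, gamma=2.0))
```

The second call illustrates how a gap of one between $\gamma$ and $\beta$ with $\alpha = 1$ already raises the order of cost from $\varepsilon^{-2}$ to $\varepsilon^{-3}$.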

We will construct the estimators $Y_l$ in Section 3.5 and Section 3.6. Before that, there are a few things to observe.

One important thing to note is that the theorem is stated for the case where we want to estimate the expectation of a random variable $Z$. However, in our situation of VaR estimation, we want to find the quantile of a distribution, not its mean. Recall that $Q \in \mathbb{R}$ is a quantile of the distribution $\mathcal{L}(X^\theta)$ and we are approximating it by calculating the empirical quantile $\hat{Q}$ of the distribution $\mathcal{L}(\hat{X}^\theta)$. To apply Theorem 1 we need to treat the quantile as a random variable, i.e. we let the random variable $Z := Q$ almost surely. Then we construct our estimators $Z_l := \hat{Q}_l$ that approximate $Z \equiv Q$ which means that they converge in distribution to the Dirac measure at point $Q$ as $l \to \infty$ (a direct consequence of Equation 2.3.2). Only now can we use the approach described in the theorem to estimate the expectation $\mathbb{E}[Z]$ which of course equals $Q$.

The point being that now one simulation of $\hat{Q}_l$ is part of a single sample in the sense of Theorem 1 out of $G_l$ samples over which we will later average. Note that evaluating $\hat{Q}_l$ already consists of simulating $S_l$ nested samples of $\theta$ and further nested $N_l$ paths in the pricing routine. To accommodate the number $G_l$ we have to actually simulate $G_l \cdot S_l$ scenarios $\theta$ which we then group into batches of $S_l$ scenarios for the calculation of $\hat{Q}_l$. So $\hat{Q}_l$ is the empirical quantile of one such group of $S_l$ samples but it is only part of one single sample in the sense of the complexity theorem. $G_l$ represents the number of groups of scenarios. Figure 3.1 gives a visual demonstration of the process. We will detail this further in Section 3.5.

Figure 3.1: Histogram of a simulation of four groups of scenarios (different colors) and the corresponding four quantile estimator values (dashed vertical lines). The expected value of the quantile estimator is estimated by averaging over these four values. The solid lines indicate the true distribution and the true quantile.
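The grouping of scenarios into batches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the portfolio value under a scenario is replaced by a standard normal draw, the empirical quantile is taken as an order statistic (the thesis' exact convention is defined with Equation 2.3.2 and may differ), and all function names are hypothetical.

```python
import math
import random
import statistics

def empirical_quantile(samples, q):
    """Empirical q-quantile taken as the ceil(q * S)-th order statistic (an assumed convention)."""
    ordered = sorted(samples)
    index = max(0, math.ceil(q * len(ordered)) - 1)
    return ordered[index]

def grouped_quantile_estimate(draw_value, q, n_groups, group_size, rng):
    """Average the empirical quantiles of n_groups independent groups of group_size scenarios."""
    group_quantiles = []
    for _ in range(n_groups):
        values = [draw_value(rng) for _ in range(group_size)]
        group_quantiles.append(empirical_quantile(values, q))
    return statistics.fmean(group_quantiles), group_quantiles

# Toy stand-in for the portfolio value under a scenario: a standard normal draw.
rng = random.Random(1)
estimate, per_group = grouped_quantile_estimate(
    lambda r: r.gauss(0.0, 1.0), q=0.05, n_groups=4, group_size=2000, rng=rng)
print(per_group, estimate)
```

Here `n_groups` plays the role of $G_l$ and `group_size` the role of $S_l$; each entry of `per_group` corresponds to one dashed line in Figure 3.1.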

In summary we introduced a number $G_l$ of scenario groups and repeat the exercise of calculating empirical scenario quantiles for each group, only to later average over all these quantiles. Intuitively one might rather use the additional number of scenarios directly in the single quantile estimation to increase its accuracy. As the extreme example, if we assumed no uncertainty in the portfolio valuation, i.e. if $\hat{X}^\theta = X^\theta$, our VaR problem would boil down to a simple Monte Carlo quantile estimation using $S$ samples. In that case, we would probably not come to the idea of even using the multilevel approach since there is no parametric sample accuracy or cost. The regular MC approach would already result in the same $\varepsilon^{-2}$ complexity that MLMC reaches in the ideal case. On the other hand, we could do the same trick of creating $G_l$ groups to artificially turn $S$ into such an accuracy parameter and to turn the problem into one of mean estimation in order to apply the theorem nonetheless. This extreme example goes to show how we need to create artificial complexity to go from quantile estimation to mean estimation. Considering this additional degree of freedom that we introduced solely for this purpose, one is drawn to speculate if it should not be possible to formulate an altered version of Theorem 1 that replaces the target expectation with a quantile. One could directly consider the number of scenarios as the number of outer samples in the sense of the theorem and as a canonical dual level estimator one could simply use the difference between two quantile estimators obtained from differently accurate approximations of the inner random variable. Since Equation 2.3.2 provides the key ingredient, a central limit theorem for quantiles, this estimator will converge with similar order as an average. Hence we see no reason why almost the same theoretical results should not hold for quantiles as well, although using this different approach might certainly change the values of $\alpha$, $\beta$, and $\gamma$ and turn out less efficient in the end. This would be an area for future research. A draft of how this theorem could be formulated is given as an excursion in Appendix B. For comparability with existing research we will only use the conventional expectation-based approach in this work.

3.3 The MLMC algorithm

Theorem 1 is not constructive in the sense that it does not provide us with the values for $L$ and $G_l$. If all the single level estimators’ expectations and variances were known, these could be determined right away, but that is certainly not the case in most applications. Hence some slightly heuristic approximations are made.

The optimal final level $L$ can be obtained by iteratively testing for convergence in the remaining bias after computing $Y_0, \dots, Y_L$ and increasing $L$ by 1 if convergence is not reached. As a convergence condition, Giles [2008a] proposes to use

$$\max\Bigl\{\frac{|Y_{L-1}|}{2},\ |Y_L|\Bigr\} < \frac{1}{\sqrt{2}}\,\varepsilon \tag{3.3.1}$$

which tries to ensure a squared discretization bias (right hand term of Equation 3.2.1) below $\varepsilon^2/2$.

Within the notation of Theorem 1, the overall variance and expected computational cost $C$ are

$$V := \mathbb{V}[Y] = \sum_{l=0}^{L} \frac{V_l}{G_l}, \qquad C = \sum_{l=0}^{L} G_l C_l$$

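Treating $G_l$ as continuous and minimizing the expected cost for a fixed target variance of $V = \varepsilon^2/2$ using the Lagrange method gives

$$\begin{aligned}
\mathcal{L}(G_0, \dots, G_L, \lambda) &= \sum_{l=0}^{L} C_l G_l - \lambda\Bigl(V - \sum_{l=0}^{L} \frac{V_l}{G_l}\Bigr) \\
\frac{\partial \mathcal{L}}{\partial G_l} &= C_l - \lambda \frac{V_l}{G_l^2} \overset{!}{=} 0
\quad\Leftrightarrow\quad G_l = \sqrt{\lambda \frac{V_l}{C_l}} \\
\frac{\partial \mathcal{L}}{\partial \lambda} &= \sum_{l=0}^{L} \frac{V_l}{G_l} - V \overset{!}{=} 0
\quad\Leftrightarrow\quad \lambda = \Bigl(\frac{\sum_{k=0}^{L} \sqrt{V_k C_k}}{V}\Bigr)^2 \\
&\phantom{=}\quad\Leftrightarrow\quad G_l = \frac{1}{V} \sqrt{\frac{V_l}{C_l}} \Bigl(\sum_{k=0}^{L} \sqrt{V_k C_k}\Bigr)
\end{aligned}$$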

Plugging in the variance constraint $V = \varepsilon^2/2$ and taking the next larger integer value gives us the optimal

$$G_l = \Bigl\lceil 2 \varepsilon^{-2} \sqrt{\frac{V_l}{C_l}} \sum_{k=0}^{L} \sqrt{V_k C_k} \Bigr\rceil \tag{3.3.2}$$

Together with the convergence condition Equation 3.3.1 this bounds the two terms of Equation 3.2.1 to give an overall MSE of $\varepsilon^2$.
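Equation 3.3.2 translates directly into code. The following minimal Python sketch uses made-up per-level variances and costs; the function name and the numbers are illustrative assumptions, and in practice $V_l$ has to be estimated from samples.

```python
import math

def optimal_group_counts(variances, costs, eps):
    """Equation 3.3.2: G_l = ceil(2 eps^-2 sqrt(V_l / C_l) * sum_k sqrt(V_k C_k))."""
    total = sum(math.sqrt(v * c) for v, c in zip(variances, costs))
    return [math.ceil(2.0 * eps ** -2 * math.sqrt(v / c) * total)
            for v, c in zip(variances, costs)]

# Hypothetical level variances (beta = 2) and costs (gamma = 1).
variances = [1.0, 0.25, 0.0625]
costs = [1.0, 2.0, 4.0]
eps = 0.1
counts = optimal_group_counts(variances, costs, eps)
achieved_variance = sum(v / g for v, g in zip(variances, counts))
print(counts, achieved_variance, eps ** 2 / 2)
```

Because the counts are rounded up, the achieved variance stays at or just below the target $\varepsilon^2/2$, and most samples are spent on the cheap coarse level.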

Since the value of $V_l$ is not known a priori, we need to spend some additional effort to estimate it. This can be accomplished most efficiently by calculating an initial (heuristic) number $G$ of samples for each level to have an initial variance estimate before determining $G_l$ and calculating the (possibly) remaining samples. The variance estimates can then be updated using the full set of $G_l$ samples. Thus we only waste sample calculations on levels where the optimal number $G_l$ is less than $G$.

In total, the MLMC algorithm can be summarized as follows:

1. Set L = 0.

2. Generate G initial samples for level L which give a sensible estimate for VL.

3. Set $G^{\mathrm{old}}_L = G$.

4. Based on the estimates for $V_0, \dots, V_L$, calculate $G^{\mathrm{new}}_0, \dots, G^{\mathrm{new}}_L$ using equation (3.3.2).

5. For $l = 0, \dots, L$, if $G^{\mathrm{new}}_l > G^{\mathrm{old}}_l$, generate $G^{\mathrm{new}}_l - G^{\mathrm{old}}_l$ new samples of level $l$, update the $V_l$ estimate and set $G^{\mathrm{old}}_l = G^{\mathrm{new}}_l$.

6. Check for convergence using equation (3.3.1).

7. If not converged, increase L by one and repeat from step 2.

8. Otherwise calculate Y from all samples generated so far.
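The steps above can be sketched as a small generic driver in Python. This is a hedged illustration, not the implementation used for the numerical results: `sample_level`, `cost_level`, `g_init` and `max_level` are hypothetical names, the toy sampler at the end is invented for demonstration, and the convergence test is only applied from $L = 2$ onwards (an assumption, so that $Y_{L-1}$ is a correction term rather than the coarse estimate $Y_0$ itself).

```python
import math
import random
import statistics

def mlmc(sample_level, cost_level, eps, g_init=100, max_level=10, rng=None):
    """Adaptive MLMC driver following steps 1-8.

    sample_level(l, rng) returns one realisation of the level-l term
    (coarse sample for l = 0, fine-minus-coarse difference for l > 0);
    cost_level(l) returns the cost C_l of one such realisation.
    """
    rng = rng or random.Random()
    samples = []                      # samples[l] holds all level-l realisations so far
    level = 0                         # step 1
    while True:
        # steps 2-3: initial batch of g_init samples on the newest level
        samples.append([sample_level(level, rng) for _ in range(g_init)])
        # step 4: target counts from the current variance estimates (Equation 3.3.2)
        variances = [statistics.variance(s) for s in samples]
        costs = [cost_level(l) for l in range(level + 1)]
        total = sum(math.sqrt(v * c) for v, c in zip(variances, costs))
        targets = [math.ceil(2.0 * eps ** -2 * math.sqrt(v / c) * total)
                   for v, c in zip(variances, costs)]
        # step 5: top up the levels that have too few samples
        for l, target in enumerate(targets):
            missing = target - len(samples[l])
            if missing > 0:
                samples[l].extend(sample_level(l, rng) for _ in range(missing))
        level_means = [statistics.fmean(s) for s in samples]
        # step 6: convergence test (Equation 3.3.1), applied from L = 2 onwards
        if level >= 2:
            remaining = max(abs(level_means[-2]) / 2.0, abs(level_means[-1]))
            if remaining < eps / math.sqrt(2.0):
                break
        if level == max_level:        # safety cap, not part of the algorithm above
            break
        level += 1                    # step 7
    return sum(level_means), [len(s) for s in samples]   # step 8

def toy_sample(l, rng):
    """Toy problem: Z_l = 1 + 2^-l (1 + x), so E[Z] = 1, alpha = 1 and beta = 2."""
    x = rng.gauss(0.0, 1.0)
    fine = 1.0 + 2.0 ** -l * (1.0 + x)
    if l == 0:
        return fine
    coarse = 1.0 + 2.0 ** -(l - 1) * (1.0 + x)   # same x: fine and coarse are coupled
    return fine - coarse

estimate, counts = mlmc(toy_sample, lambda l: 2.0 ** l, eps=0.05, rng=random.Random(42))
print(estimate, counts)
```

The essential design point is that the fine and coarse values inside `toy_sample` share the same random input; without this coupling the variance of the level differences would not decay and the multilevel approach would bring no benefit.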

The value of $C_l$ will be determined by which parameters of accuracy we want to vary. The next section gives an overview of the possible ways of controlling the cost per sample, the constructions of the estimators, and the corresponding bias and variance properties.

3.4 Parameters of accuracy

In the situation of portfolio VaR calculation through the full Monte Carlo method, there are multiple ways of reducing the computational cost per sample. We will consider the following “dimensions” along which accuracy and cost per sample can be controlled:

• $S \in \mathbb{N}^+$

Number of scenario samples per group, i.e. how many simulations of $\theta$ are used for the quantile estimation within each of the $G$ or $G_l$ groups of samples. This is the parameter $S$ of Equation 2.3.2 that directly controls the variance of the quantile estimator.

• $N \in \mathbb{N}^+$

Number of MC samples used in the pricing routine. Note that, so as not to complicate things further, we assume this to be constant over all positions. Further optimization might be achieved by varying $N_p$ between positions, depending on the different variances of the positions’ pricing routines. Increasing the number of samples within the pricing of an individual position will decrease the variance of the pricing error $|\hat{X}^{\theta(\omega)}_p - X^{\theta(\omega)}_p|$ by the order $1/N$, which affects both the expectation as well as the variance of the estimator $\hat{Q}$ in a way that is not analytically known in general.

• $H \in \mathbb{N}^+$

Number of time steps used in the pricing routine. Again we assume this to be constant over all positions. This controls the bias $\bigl|\mathbb{E}[\hat{X}^\theta_p] - \mathbb{E}[X^\theta_p]\bigr|$ in the numerical pricing that vanishes as $H \to \infty$. One typically has convergence of the form $1/H^\alpha$ with $\alpha > 0$. For instance with a basic Euler-Maruyama scheme and a smooth European payoff one typically achieves $\alpha = 1$, see e.g. Higham [2001] or Kloeden et al. [2012] for more details. This bias in the underlying random variables translates mostly to a bias in the estimator $\hat{Q}$, although it could also have a very small impact on its variance.


• $P \in \{1, \dots, \mathcal{P}\}$

Number of positions to evaluate, where $\mathcal{P}$ is the total number of positions in the portfolio. Only evaluating a subportfolio $\sum_{p=1}^{P} X_{\pi_p}$, where $\pi$ is a permutation of $\{1, \dots, \mathcal{P}\}$, instead of the full portfolio $\sum_{p=1}^{\mathcal{P}} X_p$ changes the distribution $\mathcal{L}(X^{\theta(\omega)})$ in almost arbitrary ways. We will always consider randomly sampled positions and rescale the chosen subportfolio value by $\mathcal{P}/P$ to guarantee that the expectation stays the same. However, for the estimation of arbitrary quantiles the full distribution of the generated samples is relevant. This is a more challenging requirement than generating samples that just need to have the right expectation as is often needed for other Monte Carlo applications. Hence this parameter is probably the one whose behavior is the most difficult to predict.

We will continue the analysis for certain combinations of these dimensions below and provide more details there.

3.5 Multilevel parameter strategies - fixed portfolio

To keep notation manageable, we will first construct the estimators only for the parameters $S$, $N$, and $H$. The estimators including $P$ will be defined in Section 3.6.

For fixed $\theta(\omega)$, we will write $\hat{X}^{\theta(\omega)}(N,H)$ for a sample (an approximation to $X^{\theta(\omega)}$) that is generated using $N$ Monte Carlo paths and $H$ time steps in the pricing routine. Let $\hat{Q}(S,N,H) := \hat{Q}_{\hat{X}^\theta(N,H)}(S)$ denote our single-level estimator for $Q$ obtained from $S$ samples of $\hat{X}^\theta(N,H)$. Note that this estimator will only converge in distribution to the correct quantile of the true distribution as all three parameters are increased:

$$\hat{Q}(S,N,H) \xrightarrow{\mathcal{L}} Q \quad \text{for } S \to \infty,\ N \to \infty,\ H \to \infty.$$

This can be seen right away from Equation 2.3.2 in conjunction with the assumption $\hat{X}^\theta(N,H) \xrightarrow{\mathcal{L}} X^\theta$. If only the sample size $S$ is increased, the estimator will converge to the correct quantile but of the distribution of the estimator $\hat{X}^\theta(N,H)$:

$$\hat{Q}(S,N,H) \xrightarrow{\mathcal{L}} F^{-1}_{\hat{X}^\theta(N,H)}(q) \quad \text{for } S \to \infty.$$

As described in Section 3.2, to fit our quantile estimator into the setting of Theorem 1 we need to add an additional layer of “outer” samples over which we take the average. This is because the theorem is stated for the case where an expectation, namely $\mathbb{E}[\hat{Q}]$, is to be estimated. The estimator $\hat{Q}(S,N,H)$ is then only a single sample in the sense of Theorem 1 and $S$ becomes one of the parameters controlling the accuracy of the sample. To estimate the expectation, we will generate a number $G$ of independent groups of samples, each with $S$ i.i.d. scenario samples, and for each such group we will calculate $\hat{Q}(S,N,H)$.


Given a set of parameters $(S_l, N_l, H_l)$ on level $l$, we will simply write

$$\hat{Q}^g_l := \hat{Q}^g(S_l, N_l, H_l), \quad g \in \{1, \dots, G\}.$$

Furthermore, for $l > 0$ we need an estimator $Y_l$ for the difference in error that results from two different levels of accuracy $(S_{l-1}, N_{l-1}, H_{l-1})$ and $(S_l, N_l, H_l)$. Here it is important that these estimators are calculated $\omega$-wise, i.e. along a fixed outer sample $g \in \{1, \dots, G\}$. This means that for each of the $G_l$ sets of $S_l$ samples each, we want an estimator for

$$\mathbb{E}\bigl[\hat{Q}^g_l - \hat{Q}^g_{l-1}\bigr], \quad g \in \{1, \dots, G_l\}$$

that uses the same set of $S_l$ samples.

Here we employ the antithetic approach as described e.g. in Giles [2015], whereby we split the $S_l$ samples into $S_l/S_{l-1}$ subgroups of size $S_{l-1}$ to be able to calculate $\hat{Q}^g_{l-1}$ and then average over these subgroups when calculating the difference. From now on we assume that $S_l/S_{l-1}$ is a constant and call it $M$. Then, as an estimator for the above expectation we use

$$\frac{1}{G_l} \sum_{g=1}^{G_l} \Bigl[\hat{Q}^g_l - \frac{1}{M} \sum_{m=1}^{M} \hat{Q}^{g,m}_{l-1}\Bigr], \quad l > 0 \tag{3.5.1}$$

where $\hat{Q}^{g,m}_{l-1}$ is now the quantile estimator computed from the $m$-th subgroup of the $g$-th group of samples.¹
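A single term of the sum in Equation 3.5.1 can be sketched as follows in Python. The sketch assumes that the fine and coarse portfolio values are already available for the same scenarios, and it uses an order-statistic convention for the empirical quantile; both the helper names and that convention are assumptions made for illustration.

```python
import math
import statistics

def empirical_quantile(samples, q):
    """Empirical q-quantile taken as the ceil(q * S)-th order statistic (an assumed convention)."""
    ordered = sorted(samples)
    index = max(0, math.ceil(q * len(ordered)) - 1)
    return ordered[index]

def antithetic_correction(fine_values, coarse_values, q, m):
    """One realisation of Q_l^g - (1/M) * sum_m Q_{l-1}^{g,m} for a single group g.

    fine_values[i] and coarse_values[i] are the level-l and level-(l-1)
    approximations of the portfolio value for the SAME scenario i.
    """
    s_fine = len(fine_values)
    if s_fine % m != 0 or len(coarse_values) != s_fine:
        raise ValueError("need S_l = M * S_{l-1} values at both accuracies")
    s_coarse = s_fine // m
    fine_q = empirical_quantile(fine_values, q)
    # Split the same scenarios into M subgroups of size S_{l-1} for the coarse quantiles.
    coarse_qs = [empirical_quantile(coarse_values[k * s_coarse:(k + 1) * s_coarse], q)
                 for k in range(m)]
    return fine_q - statistics.fmean(coarse_qs)

values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
shifted = [v + 1.0 for v in values]          # coarse values with an artificial bias of +1
print(antithetic_correction(values, shifted, q=0.5, m=2))
```

When pricing is exact, `fine_values` and `coarse_values` coincide and the correction only captures the effect of the smaller group size $S_{l-1}$.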

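Using this notation, we can define the final multilevel estimator $Y$ for $Q$:

$$Y := \sum_{l=0}^{L} Y_l \tag{3.5.2}$$

where

$$Y_l := \begin{cases} \dfrac{1}{G_0} \displaystyle\sum_{g=1}^{G_0} \hat{Q}^g_0, & \text{if } l = 0, \\[2ex] \dfrac{1}{G_l} \displaystyle\sum_{g=1}^{G_l} \Bigl[\hat{Q}^g_l - \dfrac{1}{M} \sum_{m=1}^{M} \hat{Q}^{g,m}_{l-1}\Bigr], & \text{if } l > 0. \end{cases} \tag{3.5.3}$$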

Under the assumption of Equation 2.4.1 the expected cost per sample on level $l$ is then approximately

$$C_l := \begin{cases} c S_l N_l H_l, & \text{if } l = 0, \\ c S_l \left(N_l H_l + N_{l-1} H_{l-1}\right), & \text{if } l > 0. \end{cases}$$

¹ From a theoretical standpoint it would also be possible and sensible to use this antithetic approach for the $N_l$ samples of the Monte Carlo pricing routine. This would however increase the implementation effort even further and seems almost impossible in practice considering the reasons we discussed in Chapter 2. For these reasons we do not pursue it in our numerical example.

In contrast to the plain single-level Monte Carlo VaR in Equation 2.3.1 we obtain the multilevel Monte Carlo VaR:

$$\widehat{\mathrm{VaR}} = X - Y. \tag{3.5.4}$$

We would like to introduce one more shorthand notation for the definition of $Y_l$ which will be useful later when comparing MLMC to MIMC (Multi-index Monte Carlo) in Chapter 4:

$$Y_l \equiv \frac{1}{G_l} \sum_{g=1}^{G_l} \Delta \hat{Q}^g_l, \quad l \ge 0$$

where $\Delta$ is the (antithetic) backward difference operator

$$\Delta \hat{Q}_l := \hat{Q}_l - \frac{1}{M} \sum_{m=1}^{M} \hat{Q}^m_{l-1} \tag{3.5.5}$$

with $\hat{Q}_{-1} := 0$.

Regarding the assumptions of Theorem 1, clearly $Y$ satisfies condition ii) because of the linearity of expectation.

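As for condition iii) one can make the initial observation that for $l > 0$

$$\begin{aligned}
V_l &= \mathbb{V}\Bigl[\hat{Q}_l - \frac{1}{M} \sum_{m=1}^{M} \hat{Q}^m_{l-1}\Bigr] \\
&= \mathbb{V}[\hat{Q}_l] + \frac{1}{M} \mathbb{V}[\hat{Q}_{l-1}] - 2\,\mathrm{Cov}[\hat{Q}_l, \hat{Q}_{l-1}] \\
&= \mathbb{V}[\hat{Q}_l] + \frac{1}{M} \mathbb{V}[\hat{Q}_{l-1}] - 2\,\mathrm{Corr}[\hat{Q}_l, \hat{Q}_{l-1}] \sqrt{\mathbb{V}[\hat{Q}_l]\,\mathbb{V}[\hat{Q}_{l-1}]}
\end{aligned} \tag{3.5.6}$$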

so the decrease in variance comes not only from the decrease in the single level variance $\mathbb{V}[\hat{Q}_l]$ but also from the correlation between the different levels’ estimators, hence this being called an antithetic approach.

Conditions i), iii), and iv) concern the parameters $\alpha$, $\beta$, and $\gamma$, which depend on how we choose $S_l$, $N_l$, $H_l$, and $P_l$. We will pursue different “parameter strategies” for these as described in the following subsections. We will always let the individual parameters grow along powers of 2 and we will denote the different strategies according to the chosen exponents, i.e.


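“S1”: $S_l = S_0 2^l$, $N_l \equiv N_0$, $H_l \equiv H_0$, $P_l \equiv P_0$,

“S2N1”: $S_l = S_0 2^{2l}$, $N_l = N_0 2^l$, $H_l \equiv H_0$, $P_l \equiv P_0$,

“S2N1H1”: $S_l = S_0 2^{2l}$, $N_l = N_0 2^l$, $H_l = H_0 2^l$, $P_l \equiv P_0$,

“S1P1”: $S_l = S_0 2^l$, $N_l \equiv N_0$, $H_l \equiv H_0$, $P_l = P_0 2^l$,

. . .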
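The strategy naming can be made concrete with a short Python sketch that computes the parameters on a level and the resulting cost per sample for the fixed-portfolio case (parameters $S$, $N$, $H$). The dictionary-based representation, the function names and the base values are assumptions chosen for illustration.

```python
def level_parameters(level, base, exponents):
    """Parameters on a level for a strategy such as 'S2N1': value = base * 2**(exponent * level)."""
    return {name: base[name] * 2 ** (exponents.get(name, 0) * level) for name in base}

def cost_per_sample(level, base, exponents, c=1.0):
    """Expected cost C_l per sample for the parameters S, N, H (fixed portfolio)."""
    p = level_parameters(level, base, exponents)
    inner = p["N"] * p["H"]
    if level > 0:
        # The coarse quantiles on level l reuse the S_l scenarios at accuracy (N_{l-1}, H_{l-1}).
        prev = level_parameters(level - 1, base, exponents)
        inner += prev["N"] * prev["H"]
    return c * p["S"] * inner

base = {"S": 100, "N": 10, "H": 1}        # hypothetical S_0, N_0, H_0
s2n1 = {"S": 2, "N": 1}                   # strategy "S2N1"; H is kept constant
print(level_parameters(3, base, s2n1))
print(cost_per_sample(4, base, s2n1) / cost_per_sample(3, base, s2n1))
```

For “S2N1” the cost ratio between consecutive levels is $2^3$, i.e. the exponents of the strategy add up to the cost rate $\gamma = 3$.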

3.5.1 Analytical pricing

In this subsection, as a canonical example, we temporarily drop the assumption that Monte Carlo methods are required in the position pricing and instead assume that the pricing is done in constant time, e.g. through an analytical formula. Thus, in this simple case we have $\hat{X}^{\theta(\omega)} = X^{\theta(\omega)}$. The only parameter that we vary to increase sample accuracy is $S$. Since we have no pricing error, the central limit theorem for quantiles (Equation 2.3.2) implies that we should expect to retrieve the $1/S$ variance decay in $\hat{Q}_l$. Equation 3.5.6 tells us that if the correlations stay more or less constant, which we can observe empirically, we should expect $\beta = 1$. Since the sample cost $C_l$ is now proportional to $S$ we have $\gamma = 1$. In our numerical example in Chapter 5 we can reproduce $\beta$ slightly larger than 1 which implies a cost of order $\varepsilon^{-2}$ according to Theorem 1 which we can also confirm numerically. However, in this single parameter case regular Monte Carlo would also achieve $\varepsilon^{-2}$ complexity due to Equation 2.3.2.

3.5.2 MC pricing without bias

In this subsection we assume that there is no time-discretization bias in the Monte Carlo pricing algorithm, so $\mathbb{E}_{\mathbb{Q}}[\hat{X}^{\theta(\omega)}] = X^{\theta(\omega)}$. The accuracy of $\hat{Q}$ is determined by the number $S$ of samples used for each quantile estimation and the number $N$ of MC samples in the pricing algorithm. This situation appears for instance for a complex European payoff when the underlying price process model is integrable, i.e. the underlying price at expiry can be simulated exactly without the need for time discretization.

Under these assumptions we arrive at the same setup for $\hat{Q}$ as was used in Gordy and Juneja [2010], because these assumptions imply that the pricing error will have zero mean and for large $N$ the pricing error will also be approximately normally distributed according to the central limit theorem.

Gordy and Juneja [2010] show that under given mild assumptions on the joint distribution of the portfolio value $X^\theta$ and the pricing errors $\hat{X}^{\theta(\omega)} - X^{\theta(\omega)}$, the bias and variance of the single level quantile estimator $\hat{Q}$ can be described as follows:

$$\mathbb{E}[\hat{Q}] - Q = \frac{c_q}{N f(Q)} + o_N(1/N) + O_S(1/S) + o_N(1)\,O_S(1/S)$$

$$\mathbb{V}[\hat{Q}] = \frac{q(1-q)}{(S+2) f(Q)^2} + O_S(1/S^2) + o_N(1)\,O_S(1/S)$$

where $f$ is the density of $X^\theta$ and $c_q$ is a constant for whose definition we refer to Gordy and Juneja [2010].

From this and from Equation 3.2.1, Gordy and Juneja [2010] deduce that to minimize the MSE of $\hat{Q}$ for a given cost $C$, the optimal choices of $S$ and $N$ are:

$$S^* = \Bigl(\frac{q(1-q)}{2 c^2 c_q^2}\Bigr)^{1/3} C^{2/3} + o_C(C^{2/3})$$

$$N^* = \Bigl(\frac{2 c_q^2}{q(1-q)\,c}\Bigr)^{1/3} C^{1/3} + o_C(C^{1/3})$$

where c is the cost per path as defined in Equation 2.4.1.

Plugging these choices of $S^*$ and $N^*$ into the formulas for bias and variance above and setting squared bias and variance both to order $\varepsilon^2$, which is required to achieve an RMSE of $\varepsilon$, shows that with the conventional single level Monte Carlo estimator the expected computational cost $C$ is of order $O(\varepsilon^{-3})$.
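The order $\varepsilon^{-3}$ can be made plausible with a short calculation that keeps only the leading terms of the bias and variance expansions above and takes the total cost as $C = cSN$ (assumed here, i.e. Equation 2.4.1 summed over $S$ scenarios with $H$ fixed). This is a heuristic sketch under these simplifications, not the full argument of Gordy and Juneja [2010]:

$$\begin{aligned}
\bigl|\mathbb{E}[\hat{Q}] - Q\bigr| \approx \frac{|c_q|}{N f(Q)} \sim \varepsilon \quad&\Rightarrow\quad N \sim \varepsilon^{-1}, \\
\mathbb{V}[\hat{Q}] \approx \frac{q(1-q)}{S f(Q)^2} \sim \varepsilon^2 \quad&\Rightarrow\quad S \sim \varepsilon^{-2}, \\
C = c\,S\,N \quad&\Rightarrow\quad C \sim \varepsilon^{-3}.
\end{aligned}$$

This agrees with $S^* \propto C^{2/3}$ and $N^* \propto C^{1/3}$: inserting $C \sim \varepsilon^{-3}$ gives $S^* \sim \varepsilon^{-2}$ and $N^* \sim \varepsilon^{-1}$.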

For MLMC the forms of $S^*$ and $N^*$ indicate that we should choose the strategy “S2N1”, i.e. $N_l = N_0 2^l$ and $S_l = S_0 2^{2l}$. However, it is not clear that this choice will also be optimal in the multilevel setup, because of the covariance term in $\mathbb{V}[Y_l]$ (Equation 3.5.6).

Our numerical results in Chapter 5 do indeed suggest that “S2N1” is the optimal strategy in this setup. This strategy has $\gamma = 3$ and we obtain $\beta$ somewhat below 3 which according to Theorem 1 results in a cost of order slightly above $\varepsilon^{-2}$. However, our numerical results can only confirm this order for relatively large error levels. As we decrease the allowed RMSE, the computational cost grows larger than predicted and reaches orders $> \varepsilon^{-3}$ asymptotically. Considering that MLMC will typically also have some computational overhead in comparison to other MC methods, see for instance the examples in Korn et al. [2010], we cannot ultimately proclaim to achieve better results than with the regular MC approach as performed by Gordy and Juneja [2010]. Of course these results only apply to the hypothetical example portfolio we analyzed and might come out different for real world portfolios.

There is another step that has to be done heuristically now that more than one parameter is controlling the sample accuracy, namely to choose the initial proportion between the parameters, i.e. $S_0/N_0$. In our numerical experiments we observed radically different behavior when this proportion was not chosen adequately. In practice some fine tuning will have to be done initially to get this right because obviously it depends on the given circumstances, specifically the variance of the pricing errors in relation to the variance resulting from the different scenarios.

One special thing to note about this case is that in the absence of pricing bias, for $q < 1/2$ the $q$ quantile of $\hat{X}^{\theta(\omega)}$ will under most circumstances be lower than that of $X^{\theta(\omega)}$ because both random variables have the same mean while the former has higher variance (because of the stochastic independence between the pricing error and the position value their variances simply add up). Assuming that this inequality will not be broken by any bias introduced by the quantile estimator itself, $\hat{Q}$ will have a negative bias that vanishes (neglecting again the bias of the quantile estimator) as $N \to \infty$. The levels $l > 0$ of the multilevel estimator incrementally reduce this bias, although it will still remain with the same sign. Since, following these heuristic arguments, the sign of the bias is known, Richardson extrapolation heavily suggests itself to increase the convergence rate. This would also be an area for future research.

3.5.3 MC pricing with bias

We will now relax one of the assumptions of the previous Subsection 3.5.2 and thus also of the setup in Gordy and Juneja [2010]. In practice, depending on the type of product under investigation, Monte Carlo methods are often necessary in conjunction with the simulation of an SDE through time discretization. In these cases, the approximative payoff evaluation will introduce an unknown bias into the pricing. The bias is controlled by the number $H$ of time steps used in the discretization and will vanish as $H \to \infty$. To fulfill the basic requirement of $\hat{Q}_l \xrightarrow{\mathcal{L}} Q$, the chosen strategy needs to fulfill $H_l \to \infty$ as $l \to \infty$. Looking at the assumptions of Theorem 1, since $H$ will not have a large impact on the variance of the estimator $\hat{Q}$, $H$ should be chosen such that the strategy barely fulfills the requirement of $\alpha \ge \tfrac{1}{2}\min(\beta, \gamma)$.

Since the form of this bias depends on the actual portfolio, the choice of strategy cannot be described in full generality. We performed numerical experiments using a geometrically decaying bias and found surprisingly unstable convergence of the estimators. Strategy S2N1H2 turned out to be required to fulfill the $\alpha$ requirement even though it did not provide good results in terms of $\beta$ compared to the high $\gamma$ value of 5. Consequently we achieved an order of cost of slightly below $\varepsilon^{-4}$. This is a little less expensive than what one might expect to achieve with conventional Monte Carlo in the model setup that we analyzed, although the practical difference might not be significant.

That the $\gamma$ value scales with the number of parameters is an expected problem with the MLMC approach when it comes to high-dimensional parameter spaces. MIMC could in theory provide better results in such cases.


3.6 Multilevel parameter strategies - position sampling

Only evaluating a subset of all $\mathcal{P}$ positions is an easy way of reducing the computational cost. In fact, Gordy and Juneja [2010] already investigated how varying numbers of samples per position can be chosen to increase efficiency and obtained the result that as $\mathcal{P}$ grows large, fewer and fewer samples should be used per position. They did however bound $N_p \ge 1$. The next step would be to allow $N_p = 0$, i.e. not valuing certain positions at all. We will accommodate this by choosing a random subset $\{\pi_1, \dots, \pi_P\}$ of all positions $\{1, \dots, \mathcal{P}\}$ and rescaling the position values by $\mathcal{P}/P$ to maintain the expected portfolio value.

It should be noted that to use the evaluation of such a subportfolio on level $l$ as a kind of control variate in the next level $l+1$ requires a good correlation between the two chosen subportfolios and eventually the whole portfolio.

Apart from random choice, the chain of subportfolios could also be constructed according to specific criteria. Typically, for purposes of hedging and risk limit control, an investment bank’s positions are grouped into subportfolios based on their relevant risk factors (e.g. dominant risk type, underlying, currency). Simply choosing such existing subportfolios would disregard sensitivities to other risk factors and hence not have a good correlation w.r.t. those risk factors.

A better approach would be to construct synthetic subportfolios that combine products with many different relevant risk factors. There might be significant variance reduction to be achieved through a custom tailored choice of subportfolios that are known to be well correlated but such a custom choice could also turn out detrimental because it might introduce a bias.

The rescaling of the chosen subportfolios could also be done in various other ways, e.g. by notional, by fair value, or by exposure (delta). We do not pursue these ideas for the sake of simplicity.

As noted in Section 3.4, the behavior of the estimators $Y_l$ as $P$ varies is very difficult to predict because it is highly dependent on the structure of the portfolio, i.e. how the individual positions react to the scenario $\theta$.

One difficulty lies in the fact that the whole portfolio of an investment bank will typically be almost neutral w.r.t. small changes in the market parameters because of hedging. Although these hedges will not be perfect in all the scenarios $\theta$, they will certainly have a large mitigating effect in the overall VaR. If one happens to choose a subportfolio that is not adequately hedged, the correlation between subportfolio and full portfolio might be relatively small.

A way to deal with this hedging effect could be to use an analytical delta- or delta-gamma-VaR approximation as an initial control variate for the Monte Carlo simulation. Then, one can evaluate the correction term on a subportfolio.

Another difficulty lies in the high dimensionality of the market parameter space. In a typical investment bank portfolio, only very few positions will depend on a given arbitrary parameter (although there certainly are few parameters on which many positions depend, like the domestic risk free discount rate or the bank’s own credit spread). That means that the correlation between two randomly chosen positions will often be very small.

The only general mitigation for these small correlations is to choose subportfolios that are large enough to behave similarly to the whole portfolio, which limits the range of different possible values for $P_l$. This is not a problem per se, although it is certainly beyond the scope of our theoretical analysis here.

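The multilevel estimator then has the following general form:

$$Y_l := \begin{cases} \dfrac{1}{G_0} \displaystyle\sum_{g=1}^{G_0} \hat{Q}^g(S_0, N_0, H_0, P_0), & \text{if } l = 0, \\[2ex] \dfrac{1}{G_l} \displaystyle\sum_{g=1}^{G_l} \Bigl[\hat{Q}^g(S_l, N_l, H_l, P_l) - \dfrac{1}{M} \sum_{m=1}^{M} \hat{Q}^{g,m}(S_{l-1}, N_{l-1}, H_{l-1}, P_{l-1})\Bigr], & \text{if } l > 0. \end{cases}$$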

where $\hat{Q}^g(S_l, N_l, H_l, P_l)$ is now the $q$ quantile estimator of the random variables (dropping the other indices)

$$\hat{X}^\theta(P_l) := \frac{\mathcal{P}}{P_l} \sum_{p=1}^{P_l} \hat{X}^\theta_{\pi_p},$$

where $\mathcal{P}/P_l$ is the scaling factor and $\pi_p \in \{1, \dots, \mathcal{P}\}$ represents the $p$-th uniformly random non-repeating choice of position on level $l$. That $\pi_p$ does not depend on $l$ is by choice, as it guarantees consistency in the bias reduction along the levels $l$. I.e. after choosing a certain subportfolio with position indices $\Pi_0 \subset \{1, \dots, \mathcal{P}\}$ on level 0, we would rather estimate the resulting bias by comparing $\Pi_0$ with $\Pi_1 \supset \Pi_0$ than by comparing unrelated subportfolios $\Pi_1$ and $\Pi_{0,2}$. In short we impose the constraint that we fix the chain of subportfolios $\Pi_0 \subset \Pi_1 \subset \dots \subset \{1, \dots, \mathcal{P}\}$ a priori.
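Fixing the chain of subportfolios a priori can be sketched in Python as taking prefixes of a single random permutation. The function names and the example sizes are hypothetical and only serve to illustrate the nesting and the rescaling factor.

```python
import random

def nested_subportfolios(n_positions, sizes, rng):
    """Fix the chain Pi_0 ⊂ Pi_1 ⊂ ... a priori from one random permutation of the positions."""
    if any(a > b for a, b in zip(sizes, sizes[1:])) or sizes[-1] > n_positions:
        raise ValueError("sizes must be non-decreasing and at most n_positions")
    permutation = list(range(n_positions))
    rng.shuffle(permutation)
    # Each subportfolio is a prefix of the same permutation, so the chain is nested.
    return [permutation[:size] for size in sizes]

def rescaled_value(position_values, subportfolio):
    """Subportfolio value rescaled by (total number of positions) / (chosen positions)."""
    scale = len(position_values) / len(subportfolio)
    return scale * sum(position_values[p] for p in subportfolio)

position_values = [float(i) for i in range(16)]     # toy position values under one scenario
chain = nested_subportfolios(16, [2, 4, 8, 16], random.Random(3))
print([rescaled_value(position_values, pi) for pi in chain])
```

On the last level the subportfolio equals the full portfolio, so the rescaled value coincides with the plain sum of all position values.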

A straightforward extension of this approach would be to also employ the antithetic approach for parameter $P$. That is, to calculate the coarse approximation by taking the average of the values calculated from $\Pi_{l-1}$ on the one hand and from $\Pi_l \setminus \Pi_{l-1}$ on the other hand. We did not include this approach in our numerical example purely for reasons of implementation.

The expected cost per sample on level $l$ now includes the number of positions, hence

$$C_l := \begin{cases} c S_l N_l H_l P_l, & \text{if } l = 0, \\ c S_l \left(N_l H_l P_l + N_{l-1} H_{l-1} P_{l-1}\right), & \text{if } l > 0. \end{cases}$$


3.6.1 Position sampling with analytical pricing

As a first step we consider a portfolio where all positions can be priced analytically without bias and without Monte Carlo variance. The two relevant parameters are $S$ and $P$.

The effect of $S$ is similar to previous sections while the impact of $P$ cannot be described without making very specific assumptions about the portfolio. In Chapter 5 we analyzed a numerical example portfolio where we assumed that the dependency of $X^\theta_p$ on a scalar risk factor $\theta$ is normally distributed with zero mean, which models a portfolio that is on average perfectly hedged against changes in $\theta$ and where most positions depend little on $\theta$ while few react strongly to changes in $\theta$. With these assumptions we observed S1P3 to be the optimal strategy among the ones we tested and the $\alpha$, $\beta$, and $\gamma$ values actually implied the optimal $\varepsilon^{-2}$ case of Theorem 1. We empirically observed that order of cost as well, which is an improvement over regular Monte Carlo where the cost would be of order $\varepsilon^{-2}/P$.

3.6.2 Position sampling with Monte Carlo pricing

If we consider the general case with Monte Carlo pricing that includes time discretization, we arrive at a total of four parameters: S, N, H, and P. Therefore, the “cheapest” strategy of increasing all parameters equally already implies γ = 4. Not unexpectedly, in our numerical experiments we found no strategy that achieves β > γ. This does not mean that the MLMC approach fares worse than regular Monte Carlo, whose cost scales more strictly with the number of parameters. As an additional problem, due to the high dimensionality, it becomes increasingly difficult to identify the optimal strategy. Both of these problems could in theory be alleviated by the MIMC approach.

Our empirical results for the progression of computational cost in this case were inconclusive because the high computational complexity prevented us from running comprehensive tests.


Chapter 4

Multi-index Monte Carlo for portfolio VaR

4.1 The MIMC approach

Multi-index Monte Carlo (MIMC) is a generalization of MLMC that was introduced by Haji-Ali et al. [2015]. If there is more than one parameter driving the accuracy and cost per sample, especially if the parameters are not interchangeable, MIMC can give very different results from MLMC. If there was only one parameter, the two approaches would be identical. Again we will first sketch the general approach before detailing when and how the results of the approaches differ.

Again we are considering an estimator $Y$ for the portfolio value at the time horizon as in Equation 3.5.4. To apply the MIMC treatment we generalize the “telescoping” sum of MLMC (Equation 3.5.2) in the following way: Instead of taking a “one-dimensional” sum of $L$ summands $Y_l$ we now take the sum over vector-indexed estimators $Y_{\mathbf{l}}$:

$$
Y := \sum_{\mathbf{l} \in I(L)} Y_{\mathbf{l}} \tag{4.1.1}
$$

where the dimension $d$ of the vector $\mathbf{l}$ corresponds to the number of parameters that govern the accuracy of the estimator. Here, $I(L) \subset \mathbb{N}^d$ is the index set whose construction we will detail later. Note that in our case with the parameters of accuracy $(S, N, H, P)$ and correspondingly the indices $\mathbf{l} \equiv (l_S, l_N, l_H, l_P)$ we have $d = 4$.

This will allow us to “fine tune” which of the parameters need to be increased, whereas in the MLMC approach they were increased along static proportions (in log space). Obviously if there was only one parameter, as e.g. in the original MLMC demonstrations in Giles [2008a] and Giles [2008b], $\mathbf{l}$ would be one-dimensional and one would retrieve the MLMC approach. Hence MIMC only differs if there are multiple parameters, and one can suspect that the potential improvements increase with the number of parameters. In particular, $d = 4$ functionally different parameters is a relatively high number compared to the ones mentioned in the literature; for instance, the numerical example in Haji-Ali et al. [2015] has $d = 3$ parameters which only differ numerically.

Abusing notation we will from now on write

$$
Q_{\mathbf{l}} \equiv Q(S_l, N_l, H_l, P_l),
$$

which should be read as

$$
Q(S_{l_S}, N_{l_N}, H_{l_H}, P_{l_P}),
$$

so the indices $l$ need not be identical for the four parameters even though we use the same symbol.

We will now define $Y_{\mathbf{l}}$. To this end we extend the backward difference operator from Equation 3.5.5 to one-dimensional backward difference operators along each individual parameter. For N, H, and P this is straightforward, while for S we again use the antithetic trick of averaging over the sample subsets:

$$
\begin{aligned}
\Delta_S Q(S_l, N_l, H_l, P_l) &:= Q(S_l, N_l, H_l, P_l) - \frac{1}{M} \sum_{m=1}^{M} Q^m(S_{l-1}, N_l, H_l, P_l) \\
\Delta_N Q(S_l, N_l, H_l, P_l) &:= Q(S_l, N_l, H_l, P_l) - Q(S_l, N_{l-1}, H_l, P_l) \\
\Delta_H Q(S_l, N_l, H_l, P_l) &:= Q(S_l, N_l, H_l, P_l) - Q(S_l, N_l, H_{l-1}, P_l) \\
\Delta_P Q(S_l, N_l, H_l, P_l) &:= Q(S_l, N_l, H_l, P_l) - Q(S_l, N_l, H_l, P_{l-1})
\end{aligned}
$$

where we set the subtrahend on the right to 0 whenever $l = 0$ for any of the four parameters.
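The antithetic difference along S can be illustrated with a short sketch. It assumes that the quantile estimator is the plain empirical quantile as computed by `numpy.quantile` (the estimator defined earlier in the thesis may differ) and that the $S_l$ fine samples are split into $M$ equally sized disjoint subsets; the function name is hypothetical.

```python
import numpy as np


def antithetic_delta_s(samples, q, M=2):
    """Fine q-quantile minus the average q-quantile of M disjoint sample subsets."""
    samples = np.asarray(samples, dtype=float)
    if samples.size % M != 0:
        raise ValueError("number of samples must be divisible by M")
    fine = np.quantile(samples, q)            # estimator on all S_l samples
    subsets = samples.reshape(M, -1)          # M disjoint subsets of S_{l-1} samples each
    coarse = np.quantile(subsets, q, axis=1).mean()
    return fine - coarse
```

Because the coarse estimators reuse the fine samples, the difference is typically much smaller than either quantile estimate on its own, which is what drives the variance decay along S.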

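From the one-dimensional backward difference operators, Haji-Ali et al. [2015] recursively define the first-order mixed difference operator

$$
\Delta Q_{\mathbf{l}} := \Big( \bigotimes_{i \in \{S,N,H,P\}} \Delta_i \Big) Q_{\mathbf{l}} := \Delta_S \Big( \Big( \bigotimes_{i \in \{N,H,P\}} \Delta_i \Big) Q_{\mathbf{l}} \Big) := \dots
$$

which we observe to equal

$$
= \sum_{\mathbf{i} \in \{-1,0\}^d} (-1)^{\|\mathbf{i}\|_1} Q_{\mathbf{l}+\mathbf{i}}, \tag{4.1.2}
$$

a representation that is useful for numerical implementation; $\|\cdot\|_1$ being the 1-norm, i.e. the sum of absolute values.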
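The sum representation in Equation 4.1.2 translates almost directly into code. The following sketch assumes a deterministic callable `Q` on integer index tuples and ignores the antithetic averaging along S; `toy_Q` is a made-up product-form function used only to check that the mixed differences telescope over a full tensor index set.

```python
from itertools import product


def mixed_difference(Q, l):
    """Sum over i in {-1, 0}^d of (-1)^{||i||_1} * Q(l + i), with Q := 0 below level 0."""
    total = 0.0
    for shift in product((-1, 0), repeat=len(l)):
        index = tuple(li + si for li, si in zip(l, shift))
        if min(index) < 0:
            continue  # the subtrahend is set to 0 whenever an index would become negative
        sign = -1.0 if sum(abs(s) for s in shift) % 2 else 1.0
        total += sign * Q(index)
    return total


def toy_Q(index):
    """Hypothetical approximation that converges to 1 as all indices grow."""
    value = 1.0
    for li in index:
        value *= 1.0 - 2.0 ** -(li + 1)
    return value


m = (3, 2)
telescoped = sum(
    mixed_difference(toy_Q, l) for l in product(range(m[0] + 1), range(m[1] + 1))
)
```

Summing the mixed differences over the hyperrectangle up to `m` recovers `toy_Q(m)`, which is the telescoping property the estimator relies on.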

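For example, if we ignore the P parameter for a moment and let $\mathbf{l} = (l_S, l_N, l_H)$ then

$$
\begin{aligned}
\Delta Q_{\mathbf{l}} &= \Delta_S(\Delta_N(\Delta_H Q(S_l, N_l, H_l))) \\
&= \Delta_S(\Delta_N(Q(S_l, N_l, H_l) - Q(S_l, N_l, H_{l-1}))) \\
&= \Delta_S\Big( [Q(S_l, N_l, H_l) - Q(S_l, N_l, H_{l-1})] - [Q(S_l, N_{l-1}, H_l) - Q(S_l, N_{l-1}, H_{l-1})] \Big) \\
&= \Big( [Q(S_l, N_l, H_l) - Q(S_l, N_l, H_{l-1})] - [Q(S_l, N_{l-1}, H_l) - Q(S_l, N_{l-1}, H_{l-1})] \Big) \\
&\quad - \frac{1}{M} \sum_{m=1}^{M} \Big( [Q^m(S_{l-1}, N_l, H_l) - Q^m(S_{l-1}, N_l, H_{l-1})] - [Q^m(S_{l-1}, N_{l-1}, H_l) - Q^m(S_{l-1}, N_{l-1}, H_{l-1})] \Big)
\end{aligned}
$$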

The number of different parameter tuples for which the estimator $Q$ needs to be calculated is always $2^d$, e.g. in this example $2^3 = 8$. If P is included, $\Delta Q_{\mathbf{l}}$ includes $2^4 = 16$ different estimators $Q$. All the while one has to keep in mind that all these $2^d$ quantile estimators need to be evaluated from the same set of $S_l$ samples, which are split into $M$ disjoint subsets for the evaluation of $Q(S_{l-1}, \cdot)$.

$Y_{\mathbf{l}}$ is then again obtained by averaging over a number $G_{\mathbf{l}}$ of independent groups of samples:

$$
Y_{\mathbf{l}} := \frac{1}{G_{\mathbf{l}}} \sum_{g=1}^{G_{\mathbf{l}}} \Delta Q^g_{\mathbf{l}}
$$

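The expected cost per sample for index $\mathbf{l}$ is correspondingly higher than for MLMC, namely

$$
C_{\mathbf{l}} := \sum_{\mathbf{i} \in \{-1,0\}^d} \zeta_{\mathbf{l}+\mathbf{i}}
$$

where

$$
\zeta_{\mathbf{l}} := \begin{cases}
0, & \text{if } \|\mathbf{l}\|_{\min} < 0, \\
c\, S_l N_l H_l P_l, & \text{else.}
\end{cases}
$$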

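Following the same argumentation as for the multilevel approach (see Section 3.3), the optimal number of samples per index $\mathbf{l}$ can be found to be:

$$
G_{\mathbf{l}} = 2 \varepsilon^{-2} \sqrt{\frac{V_{\mathbf{l}}}{C_{\mathbf{l}}}} \sum_{\mathbf{l}' \in I(L)} \sqrt{V_{\mathbf{l}'} C_{\mathbf{l}'}} \tag{4.1.3}
$$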
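The cost per sample and the resulting group counts can be computed along the following lines. This sketch makes two assumptions that are not part of the formulas above: every parameter doubles per index step (for example $S_l = S_0 2^l$), and $G_{\mathbf{l}}$ is rounded up to an integer. The function names and the dictionary-based interface are hypothetical.

```python
import math
from itertools import product


def mimc_sample_cost(l, n0, c=1.0):
    """C_l as the sum of zeta_{l+i} over i in {-1, 0}^d; n0 holds the level-0 parameter values."""
    total = 0.0
    for shift in product((-1, 0), repeat=len(l)):
        index = [li + si for li, si in zip(l, shift)]
        if min(index) < 0:
            continue  # zeta is 0 if any component is negative
        zeta = c
        for base, lj in zip(n0, index):
            zeta *= base * 2 ** lj  # assumed doubling of each parameter per step
        total += zeta
    return total


def optimal_group_counts(variances, costs, eps):
    """G_l = 2 eps^-2 sqrt(V_l / C_l) * sum_k sqrt(V_k C_k), rounded up (Equation 4.1.3)."""
    norm = sum(math.sqrt(variances[k] * costs[k]) for k in variances)
    return {
        l: math.ceil(2.0 / eps**2 * math.sqrt(variances[l] / costs[l]) * norm)
        for l in variances
    }
```

Indices with a large variance relative to their cost receive the most samples, exactly as in the MLMC case.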


4.2 Assumptions

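Analogously to Theorem 1 for MLMC, certain assumptions on the estimators $\Delta Q_{\mathbf{l}}$ need to be made. We restate the assumptions made by Haji-Ali et al. [2015] in the terminology used above, writing $\mathbf{l} = (l_S, l_N, l_H, l_P) \equiv (l_1, \dots, l_d)$:

i) $E[Q_{\mathbf{l}}] \to E[Q]$ as $\|\mathbf{l}\|_{\min} \to \infty$

ii) $\left| E[\Delta Q_{\mathbf{l}}] \right| \le c_1 \prod_{i=1}^{d} 2^{-\alpha_i l_i}$

iii) $V[\Delta Q_{\mathbf{l}}] \le c_2 \prod_{i=1}^{d} 2^{-\beta_i l_i}$

iv) $C_{\mathbf{l}} \le c_3 \prod_{i=1}^{d} 2^{\gamma_i l_i}$,

where $c_1, c_2, c_3$ are positive constants and $C_{\mathbf{l}}$ is the expected computational cost of calculating one sample of $\Delta Q_{\mathbf{l}}$.

Furthermore we assume $\beta_i \le 2\alpha_i$ and $\gamma_i > 0$ for all $i \in \{1, \dots, d\}$.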

Comparing these assumptions to those in Theorem 1, Haji-Ali et al. [2015] note that MIMC requires mixed regularity of a given order while MLMC only requires ordinary regularity of the same order. For example, while MLMC only requires that the sample variance decays approximately by a factor of $2^{-\beta}$ as one goes one step further along the chosen parameter strategy, MIMC requires the variance to decay by a factor of $2^{-\beta_i}$ for each possible direction $i$. This could be viewed as the distinction between a directional derivative and the total derivative, only in a discrete version.

On the other hand, the condition $\beta_i \le 2\alpha_i$ of MIMC can be much weaker than $\alpha \ge \tfrac{1}{2}\min\{\beta, \gamma\}$ of MLMC because the MLMC γ will typically scale with the number of dimensions $d$.

4.3 Optimal index sets

So far, we did not discuss how to choose $I(L) \subset \mathbb{N}^d$. The canonical approach might be to choose the hyperrectangle bounded by the origin and a given maximum accuracy $\mathbf{m}(L)$, i.e. $I(L) = \{\mathbf{l} \in \mathbb{N}^d : l_i \le m_i(L) \;\forall i \in \{1, \dots, d\}\}$. This way, as can be seen from the definition of the mixed difference operator in Equation 4.1.2, the sum in Equation 4.1.1 telescopes under expectation, thus guaranteeing $E[Y] = E[Q(\mathbf{m}(L))]$. This is what Haji-Ali et al. [2015] call a full tensor index set, for which they prove a complexity theorem that we do not cite here because it is only relevant when the convergence rates satisfy certain constraints that will only be met by edge cases.

However, Haji-Ali et al. [2015] show that the minimal workload to reach a given accuracy will under most practically relevant circumstances be obtained by choosing $I(L)$ to be the $d$-simplex spanned by the origin and certain multiples of the unit vectors. They arrive at this result by first showing that it is optimal to add indices $\mathbf{l}$ to $I(L)$ based on the ratio of their error contribution $E[\Delta Q_{\mathbf{l}}]$ over their work contribution $C_{\mathbf{l}} G_{\mathbf{l}}$, where $G_{\mathbf{l}}$ is the optimal number of samples according to Equation 4.1.3. Plugging in Equation 4.1.3 and the bounds of assumptions ii) - iv) given in Section 4.2 leads to the following index sets:

$$
I(L) := \Big\{ \mathbf{l} \in \mathbb{N}^d : \sum_{i=1}^{d} \delta_i l_i \le L/d \Big\}
\quad \text{where} \quad
\delta_i = \frac{\alpha_i + \frac{\gamma_i - \beta_i}{2}}{\sum_{j=1}^{d} \left( \alpha_j + \frac{\gamma_j - \beta_j}{2} \right)}. \tag{4.3.1}
$$

Note that $\alpha_i + \frac{\gamma_i - \beta_i}{2} > 0$ by our assumptions in Section 4.2. This means that the index set grows in dimension $i$ inversely proportional to $\alpha_i + \frac{\gamma_i - \beta_i}{2}$, i.e. faster for high rates of variance decay, low rates of complexity increase and low rates of bias decay. The latter is interesting to note: Apart from the edge cases where the difference operator has no subtrahend, a high expectation in $\left| E[\Delta Q_{\mathbf{l}}] \right|$ implies a stronger correction of the remaining bias from previous levels. This is an effect that is not directly considered in the MLMC approach; there, the bias is only considered within the stopping criterion. The MLMC parameter strategy is chosen solely on the values of β and γ, as long as α fulfills the prerequisite $\alpha \ge \tfrac{1}{2}\min\{\beta, \gamma\}$.

This is showcased in Figure 4.1 for the case of two parameters S and N, for which we used the hypothetical values of $\alpha_S = 0.9$, $\beta_S = 1.1$, $\alpha_N = 0.7$, and $\beta_N = 1$, which resemble those that we obtain in Chapter 5. The optimal MLMC parameter strategy in terms of variance decay favors increasing S over increasing N. The optimal index set $I(L)$ actually grows faster in the N direction because of the lower bias decay.
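The index set of Equation 4.3.1 can be enumerated with a few lines of code. The sketch below uses the hypothetical rates quoted for Figure 4.1 and additionally assumes $\gamma_S = \gamma_N = 1$; the function name and the small floating-point tolerance are choices made for this example.

```python
import math
from itertools import product


def optimal_index_set(L, alpha, beta, gamma):
    """Enumerate {l in N^d : sum_i delta_i * l_i <= L / d} with delta_i as in Equation 4.3.1."""
    d = len(alpha)
    weights = [a + (g - b) / 2.0 for a, b, g in zip(alpha, beta, gamma)]
    delta = [w / sum(weights) for w in weights]
    budget = L / d
    # Each component is bounded by budget / delta_i; a negative budget gives an empty set.
    ranges = [range(int(math.floor(budget / dl)) + 1) for dl in delta]
    indices = {
        l for l in product(*ranges)
        if sum(dl * li for dl, li in zip(delta, l)) <= budget + 1e-12
    }
    return indices, delta


# Order of the components: (S, N)
indices, delta = optimal_index_set(4, alpha=(0.9, 0.7), beta=(1.1, 1.0), gamma=(1.0, 1.0))
max_S = max(l[0] for l in indices)
max_N = max(l[1] for l in indices)
```

For these rates the weight of the N direction is smaller than that of the S direction, so the set extends further along N than along S.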

Figure 4.1: Comparison of an optimal MLMC parameter strategy and the optimal MIMC index set for the same problem. Each arrow represents one difference between different single level estimators that is performed during the multilevel or multi-index estimator evaluation. Differently styled arrows represent the different levels L. The size of the MIMC index set was doubled for better visibility.

4.4 Complexity theorem

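We will cite the main result of Haji-Ali et al. [2015] in the terminology used before, similar to the formulation found in Giles [2015].

Theorem 2 (Complexity Theorem - MIMC - Optimal Index Sets). Let $Q$ denote a random variable, and let $Q_{\mathbf{l}}$ denote the corresponding level $\mathbf{l}$ numerical approximation. Further let the assumptions i) - iv) of Section 4.2 on $Q_{\mathbf{l}}$ and $\Delta Q_{\mathbf{l}}$ be fulfilled.

If there exist independent estimators $Y_{\mathbf{l}}$ based on $G_{\mathbf{l}}$ Monte Carlo samples such that $E[Y_{\mathbf{l}}] = E[\Delta Q_{\mathbf{l}}]$, then there exists a positive constant $c_4$ such that for any $\varepsilon < e^{-1}$ there is an index set $I$ and integers $G_{\mathbf{l}}$ for which the multilevel estimator

$$
Y = \sum_{\mathbf{l} \in I(L)} Y_{\mathbf{l}}
$$

has a mean square error with bound

$$
\mathrm{MSE} \equiv E\left[ (Y - E[Q])^2 \right] < \varepsilon^2
$$

and an expected computational cost with bound

$$
E[C] \le \begin{cases}
c_4\, \varepsilon^{-2}, & \eta < 0, \\
c_4\, \varepsilon^{-2} |\log \varepsilon|^{e_1}, & \eta = 0, \\
c_4\, \varepsilon^{-2-\eta} |\log \varepsilon|^{e_2}, & \eta > 0,
\end{cases}
$$

where

$$
\eta = \max_i \frac{\gamma_i - \beta_i}{\alpha_i}.
$$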


For the values of $e_1$ and $e_2$, Haji-Ali et al. [2015] distinguish between four cases A) to D) depending on the dimensionality $d$ and the values of α, β, and γ. The case relevant to us is case B), when $\beta_i < 2\alpha_i$ for all $i \in \{1, \dots, d\}$. In this case,

$$
e_1 = 2 d_2, \qquad e_2 = (d_2 - 1)(2 + \eta)
$$

where

$$
d_2 = \#\left\{ i \in \{1, \dots, d\} : \frac{\gamma_i - \beta_i}{\alpha_i} = \eta \right\}
$$

For the other cases we refer to Haji-Ali et al. [2015].
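The exponents of Theorem 2 for case B) can be evaluated with a small helper. This is only a sketch: the function name, the tolerance used to detect $\eta = 0$ and ties in the maximum, and the returned tuple layout are assumptions, and the other cases A), C) and D) are not covered.

```python
def mimc_cost_order(alpha, beta, gamma, tol=1e-9):
    """Return (eta, d2, eps_exponent, log_exponent) for case B) of Theorem 2.

    The predicted cost is of order eps^(-eps_exponent) * |log eps|^log_exponent.
    """
    if any(b > 2.0 * a + tol for a, b in zip(alpha, beta)):
        raise ValueError("the assumption beta_i <= 2 * alpha_i is violated")
    ratios = [(g - b) / a for a, b, g in zip(alpha, beta, gamma)]
    eta = max(ratios)
    d2 = sum(1 for r in ratios if abs(r - eta) <= tol)  # directions attaining the maximum
    if eta < -tol:
        return eta, d2, 2.0, 0.0
    if eta <= tol:
        return eta, d2, 2.0, 2.0 * d2                   # e1 = 2 * d2
    return eta, d2, 2.0 + eta, (d2 - 1) * (2.0 + eta)   # e2 = (d2 - 1)(2 + eta)
```

For instance, with rates $(\alpha_S, \alpha_N) = (0.9, 0.5)$, $(\beta_S, \beta_N) = (1.1, 1.0)$ and $\gamma_S = \gamma_N = 1$ the helper returns $\eta = 0$, $d_2 = 1$ and a log exponent of 2, i.e. a cost of order $\varepsilon^{-2}|\log\varepsilon|^2$.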

We can observe right away that the condition $E[Y_{\mathbf{l}}] = E[\Delta Q_{\mathbf{l}}]$ is fulfilled also by our antithetic estimator due to the linearity of expectation. The remaining assumptions need to be checked for each practical problem in particular. In our empirical example in Chapter 5 we analyze the convergence behavior of the mixed difference estimator and the corresponding values for the α and β vectors. As in Section 3.5 and Section 3.6 we vary between different constellations of the available parameters. The detailed results are given in Chapter 5 but in summary we found:

1. In the one-dimensional case (only S) we reproduce the MLMC results as expected.

2. In the case with S and N, where the MLMC approach seemed slightly more costly than regular MC, the MIMC approach achieved slightly better results. (η = 0.)

3. In the case with S, N, and H, we observed an improvement over MLMC, which could be somewhat expected due to the number of parameters. (η > 0.)

4. In the S and P case, where MLMC already achieved the optimum $\varepsilon^{-2}$ cost, MIMC did not perform any better. (η < 0.)

5. Finally in the full four parameter case, while we did not have reliable MLMC results to compare to, with MIMC we observed on average an order of cost below $\varepsilon^{-3}$, which seems very good, although the results are not fully conclusive. (η > 0.)

4.5 The MIMC algorithm

As in the MLMC case, Theorem 2 leaves some practical issues to be filled in heuristically. Again, we fix a minimum number $G$ of samples per index $\mathbf{l}$ to obtain initial variance estimates. As convergence criterion Haji-Ali et al. [2015] propose to approximate the remaining bias by

$$
\mathrm{Bias}(I(L)) := \Bigg| \sum_{\mathbf{l} \in \partial I(L)} \frac{1}{G_{\mathbf{l}}} \sum_{g=1}^{G_{\mathbf{l}}} \Delta Q^g_{\mathbf{l}} \Bigg| \tag{4.5.1}
$$

where $\partial I(L)$ is the upper boundary of $I(L)$.¹ For the index sets $I(L)$ we use Equation 4.3.1, which gives us a sequence $(\emptyset =: I(-1) \subset I(0) \subset I(1) \subset \dots)$. The algorithm then goes as follows:

1. Start with L = 0.

2. For each $\mathbf{l} \in I(L) \setminus I(L-1)$, generate $G$ samples of $\Delta Q_{\mathbf{l}}$.

3. Calculate the sample variance of $\Delta Q_{\mathbf{l}}$ for each $\mathbf{l} \in I(L)$.

4. Calculate $G_{\mathbf{l}}$ for each $\mathbf{l} \in I(L)$ according to Equation 4.1.3. Note that Equation 4.1.3 includes the variances of all $\mathbf{l}$, hence previously calculated $G_{\mathbf{l}}$ also need to be updated.

5. Calculate additional samples for $\Delta Q_{\mathbf{l}}$ to match the number $G_{\mathbf{l}}$ of required samples for each $\mathbf{l} \in I(L)$.

6. Estimate the remaining bias according to Equation 4.5.1 and check for convergence, i.e. whether $\mathrm{Bias}(I(L)) < \tfrac{1}{\sqrt{2}}\varepsilon$. Note that for convergence in terms of MSE (which we pursue) the boundary is different from the one given in Haji-Ali et al. [2015], who are constructing a confidence interval around the result value.

7. If the convergence criterion has not been met, increase $L$ by one and repeat from 2.

8. Otherwise calculate Y from all samples generated so far.

¹ Haji-Ali et al. [2015] write the “outer boundary” but we want to emphasize that the lower boundaries along the axes S = 0, N = 0, H = 0, or P = 0 are excluded.
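The eight steps above can be sketched as a short driver loop. This is a toy illustration under several assumptions rather than the implementation used for the experiments: the sampler `toy_sample_delta` draws $\Delta Q_{\mathbf{l}}$ from a normal law with mean and variance $2^{-(l_1 + l_2)}$ (so that the limit is $E[Q] = 4$), the upper boundary is taken to be all indices with at least one forward neighbour outside the set, sample counts are rounded up, and a hard cap `max_L` is added as a safeguard. All function names are hypothetical.

```python
import math
from itertools import product

import numpy as np


def index_set(budget, delta):
    """{l in N^d : sum_i delta_i * l_i <= budget}; empty for a negative budget."""
    if budget < 0:
        return set()
    ranges = [range(int(budget / dl) + 1) for dl in delta]
    return {l for l in product(*ranges)
            if sum(dl * li for dl, li in zip(delta, l)) <= budget + 1e-12}


def upper_boundary(indices):
    """Indices with at least one forward neighbour outside the set (assumed definition)."""
    d = len(next(iter(indices)))
    return {l for l in indices
            if any(tuple(l[j] + (j == i) for j in range(d)) not in indices
                   for i in range(d))}


def mimc(sample_delta, cost, delta, eps, g_min=20, max_L=12, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    d = len(delta)
    samples = {}  # index -> 1-D array of Delta Q samples
    for L in range(max_L + 1):                                   # steps 1 and 7
        current = index_set(L / d, delta)
        for l in current - set(samples):                         # step 2
            samples[l] = sample_delta(l, g_min, rng)
        var = {l: samples[l].var(ddof=1) for l in current}       # step 3
        norm = sum(math.sqrt(var[l] * cost(l)) for l in current)
        for l in current:                                        # steps 4 and 5
            target = math.ceil(2.0 / eps**2 * math.sqrt(var[l] / cost(l)) * norm)
            missing = target - samples[l].size
            if missing > 0:
                samples[l] = np.concatenate([samples[l], sample_delta(l, missing, rng)])
        bias = abs(sum(samples[l].mean() for l in upper_boundary(current)))  # step 6
        if bias < eps / math.sqrt(2.0):
            break
    estimate = sum(samples[l].mean() for l in current)           # step 8
    return estimate, L, bias


def toy_sample_delta(l, n, rng):
    scale = 2.0 ** -sum(l)  # toy mean and variance of Delta Q_l
    return rng.normal(loc=scale, scale=math.sqrt(scale), size=n)


def toy_cost(l):
    return 2.0 ** sum(l)    # toy cost per sample


estimate, final_L, bias = mimc(toy_sample_delta, toy_cost, delta=(0.5, 0.5),
                               eps=0.1, rng=np.random.default_rng(1))
```

In a real application `sample_delta` would evaluate the mixed difference of quantile estimators on a fresh group of scenarios, and `cost` would return $C_{\mathbf{l}}$.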


Chapter 5

Numerical Example

5.1 Model portfolio assumptions

In this chapter we will apply the concepts defined above to a very simple example portfolio. As has been pointed out before, the actual portfolio structure and the properties of the pricing functions $X_p : \Theta \mapsto \mathbb{R}$ play a critical role in the convergence rates of the estimators. As such, the results obtained for the example portfolio cannot be generalized to every other portfolio. Rather they should be considered exemplary values that demonstrate how the different approaches can be applied in practice. A possible exception is the Monte Carlo error of the pricing algorithm, which can be modeled quite adequately in a very simple manner, and the corresponding parameters N and H.

We will again assume a portfolio of $\mathcal{P} = 2^{12}$ positions which all require Monte Carlo simulation of an SDE for pricing. We will assume that for each position $p \in \{1, \dots, \mathcal{P}\}$ the result of the pricing routine obeys the following law:

$$
\hat{X}^\theta_p \sim N\left( X^\theta_p \left( 1 + \frac{b}{(H_p)^\alpha} \right), \frac{X^\theta_p \sigma^2}{N_p} \right)
$$

This is a natural assumption because the normal law with variance of the order $1/N$ is approximated by any Monte Carlo estimator due to the central limit theorem. We are simplifying by assuming that bias and variance scale along constants $b$ and $\sigma^2$ regardless of $p$. We set these parameters to $b = 0.1$ and $\sigma = 0.01$. The choice of these parameters affects our results insofar as it determines how $S_0$ and $N_0$ need to be balanced in the MLMC approach so that one error does not dominate the other. For the bias we additionally assume α = 1, which is realistic for most practical problems, see e.g. Higham [2001] or Kloeden et al. [2012].

A more critical assumption is that of the form of $X_p$ itself. For the sake of simplicity we assume that $\Theta = \mathbb{R}$, so that θ is a single scalar risk factor. In real world applications θ would usually consist of hundreds of risk factors, each of which might itself be a high-dimensional object like a point-wise representation of a volatility surface or a yield curve. By modeling just one scalar risk factor one might risk overestimating the dependency on θ among the different position values $X^\theta_p$. To make up for this, we will assume that the dependency on θ is normally distributed with mean zero. Thus many positions will react very little to θ and the number of positions that react positively is similar to the number that react negatively, which would be expected in a well hedged portfolio. Specifically, we will assume that

$$
X^\theta_p = X_p (1 + \rho_p \theta),
$$

i.e. that the position values depend linearly on θ and that

$$
\rho_p \sim N(0, 1). \tag{5.1.1}
$$

We also experimented with $E[\rho_p] \neq 0$, which simulates a portfolio that is less well hedged, and found improved convergence rates. We are not following this avenue any further since it would be a very strong assumption. In practice one might very well benefit from the fact that position values depend on the market parameters in a more systematic way.

For the initial value we simply assume it is distributed equally among positions and normed to $X = 100$:

$$
X_p = \frac{X}{\mathcal{P}} \equiv \frac{100}{\mathcal{P}}
$$

To simulate the risk factor θ we used a normal distribution with a standard deviation of $\sigma_\theta = 0.2$:

$$
\theta \sim N(1, \sigma_\theta^2) \equiv N(1, 0.04).
$$
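The model portfolio with analytical pricing can be simulated in a few lines. The sketch below is an illustration only: it leaves out the pricing noise, and it takes the loss to be the initial value minus the value at the horizon and reads VaR(0.9) as the 0.9-quantile of that loss, which is an assumed convention (the thesis defines Value-at-Risk in Section 2.2). The function names are hypothetical.

```python
import numpy as np


def simulate_portfolio_values(n_scenarios, rho, rng, x_total=100.0, sigma_theta=0.2):
    """Portfolio value sum_p X_p (1 + rho_p * theta) for theta ~ N(1, sigma_theta^2)."""
    rho = np.asarray(rho, dtype=float)
    x_p = x_total / rho.size                         # equal initial position values
    theta = rng.normal(loc=1.0, scale=sigma_theta, size=n_scenarios)
    return (x_p * (1.0 + np.outer(theta, rho))).sum(axis=1)


def loss_quantile(values, q, x_total=100.0):
    """Empirical q-quantile of the loss x_total - value (assumed VaR convention)."""
    return np.quantile(x_total - values, q)


rng = np.random.default_rng(42)
rho = rng.normal(size=2**12)                         # rho_p ~ N(0, 1), Equation 5.1.1
values = simulate_portfolio_values(1000, rho, rng)
var_estimate = loss_quantile(values, 0.9)
```

For a single position with $\rho_1 = -0.5$ the simulated portfolio value has mean 50, i.e. half of the initial value.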

In the following we will apply the approaches from Chapter 3 and Chapter 4 to this problem. Each time the goal is to estimate VaR(0.9). We report the convergence rates of the multilevel estimators (the scalars α and β for MLMC or the vectors $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ for MIMC, respectively), followed by the progression of the total computational cost $C$ as an order of the allowed RMSE ε, first for the MLMC approach and then for MIMC. The corresponding figures can be found in Appendix A.

5.2 MLMC - Convergence rates


Here we assume b = 0 and σ = 0. Since we are not varying the parameter P yet, we also set $\mathcal{P} = 1$ and $\rho_1 = -0.5$, so the average loss of the portfolio until the time horizon is half of its initial value. Since the only parameter is S and $S_l = S_0 2^l$ we have γ = 1.

Our empirical experiments (see Figure A.1) suggest α = 1, β = 1.3, which roughly coincides with what we might expect from Equation 2.3.2.
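Rates such as these are typically read off as slopes in a log-2 plot of the absolute mean and the variance of the level differences against the level. A minimal sketch of such a fit is given below; the least-squares fit via `numpy.polyfit` and the synthetic input data are assumptions made for illustration and do not reproduce the thesis experiments.

```python
import numpy as np


def fit_decay_rate(levels, values):
    """Least-squares estimate of r in |values| ~ const * 2^(-r * level)."""
    slope, _intercept = np.polyfit(
        np.asarray(levels, dtype=float), np.log2(np.abs(values)), 1
    )
    return -slope


levels = np.arange(1, 8)
means = 0.5 * 2.0 ** (-1.0 * levels)        # synthetic |E[Delta Q_l]| with alpha = 1
variances = 0.1 * 2.0 ** (-1.3 * levels)    # synthetic V[Delta Q_l] with beta = 1.3
alpha_hat = fit_decay_rate(levels, means)
beta_hat = fit_decay_rate(levels, variances)
```

With noisy empirical estimates the fitted slopes depend on the range of levels included, so they should be treated as indicative values.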



Again we set b = 0, $\mathcal{P} = 1$, and ρ = −0.5 but now we assume σ = 0.01.

When varying the parameters S and N individually, we get α = 0.9 and β = 1.1 for S (Figure A.2a) and we get α = 0.7 and β = 1.0 for N (Figure A.2b). Each time γ = 1 in these cases. These are only indicative figures because we need to vary all available parameters to have $Q_L \to Q$, which is a prerequisite for any approach.

1. Using the parameter strategy S1N1, i.e. setting $S_l = S_0 2^l$ and $N_l = N_0 2^l$, we get α = 0.9 and β = 1.3 (Figure A.3a). This is only a slight improvement in convergence speed over the individual parameter strategies while now γ = 2.

2. Interestingly we get about the same convergence when using S1N2, i.e. setting $S_l = S_0 2^l$ and $N_l = N_0 2^{2l}$: α = 0.9 and β = 1.3 (Figure A.3b) while γ = 3 now.

3. The most promising strategy, which coincides with the findings of Gordy and Juneja [2010], is S2N1. Here we observe α = 2.2 and β = 2.6 (Figure A.3c). Still this means β < γ since γ = 3.


We still hold P = 1 and ρ = −0.5 fixed but now we set b = 0.1.

First we again observe the mean and variance decay along the individual parameters:

1. For S1 we get α = 0.9 and β = 1.2 (Figure A.4a).

2. For N1 we get α = 0.3 and β = 1.0 (Figure A.4b).

3. Finally for the new direction H1 we get α = 0.9 and β = 0.0 (Figure A.4c). Obviously we do not expect H to have a significant impact on the estimator variance.

For the actual converging strategies where all three parameters are increased, we have now kept the ratio between S and N fixed to that of the optimal strategy S2N1 of Subsection 5.2.2 and tried different proportions of H:

1. For S2N1H1 we get α = 0.9 and β = 2.7 (Figure A.5a) while γ = 4.

2. For S4N2H1 we get on average α ≈ 1 and β ≈ 5 (Figure A.5b) while γ = 7. Due to memory limitations we had to reduce the number of levels and we observed varying convergence rates.

3. For S2N1H2 we get α = 2.0 and β = 2.6 (Figure A.5c) while γ = 5.

In these higher dimensional problems one can observe the problem with the MLMC approach that the values of γ become relatively large and it seemingly becomes less and less likely to find strategies with β > γ. In addition, only strategy S2N1H2, which has the worst variance convergence, fulfills the requirement $\alpha \ge \tfrac{1}{2}\min\{\beta, \gamma\}$.


Now we choose $\mathcal{P} = 2^{12}$ and we let $\rho_p$ be random as described in Equation 5.1.1. We begin by assuming analytical pricing algorithms, so b = 0 and σ = 0. The two available parameters are now S and P.


1. For S1 we get on average α = 1.0 and β = 1.3 (Figure A.6a).

2. For P1 we get on average α = 0.6 and β = 1.1 (Figure A.6b).

We then try different converging parameter strategies along S and P:

1. For S2P1 we get on average α = 0.6 and β = 3.2 (Figure A.6c) while γ = 3.

2. For S1P1 we get on average α = 0.6 and β = 2.1 (Figure A.7a) while γ = 2.

3. For S1P2 we get on average α = 1.2 and β = 3.1 (Figure A.7b) while γ = 3.

4. For S1P3 we get on average α = 2.5 and β = 5.3 (Figure A.7c) while γ = 4.

In this random portfolio setup the α and β values depend significantly on the ρ vector that was chosen (Equation 5.1.1), or more precisely on the relation between the ρ vector of the full portfolio and that of the chosen subportfolio. We only report average values of a few experimental runs here, so they need to be taken with a grain of salt. In each run, the ρ vector and the subportfolios are only chosen once. That being said, the values are very promising, especially for strategy S1P3 where we observed the optimal case of β > γ in almost all runs. Strategy S1P3 is also the only one to fulfill the $\alpha \ge \tfrac{1}{2}\min\{\beta, \gamma\}$ requirement of Theorem 1.

With the P parameter one has to keep in mind the restriction that it cannot grow infinitely like the other parameters. Once $P = \mathcal{P}$ there are no more positions to sample, so the variance and bias due to the P parameter will not decrease any further. Hence the choice of strategy in a practical situation has to be made under consideration of the number of positions $\mathcal{P}$ and the final level $L$ (which can be estimated from experimental runs).


We are going to skip the case without bias and go straight to the full four parameter case. As before we choose σ = 0.01, b = 0.1, and $\mathcal{P} = 2^{12}$.


1. For S1 we get on average α = 1.0 and β = 1.1 (Figure A.8a).

2. For N1 we get on average α = 0.8 and β = 1.0 (Figure A.8b).

3. For H1 we get on average α = 0.9 and β = 0.0 (Figure A.8c).

4. For P1 we get on average α = 0.6 and β = 1.2 (Figure A.9a).

For the converging strategies where all four parameters are increased, we tried the following:

1. For S1N1H1P3 we get on average α = 1.8 and β = 4.2 (Figure A.9b) while γ = 6.

2. For S2N1H1P3 we get on average α = 1.6 and β = 4.6 (Figure A.9c) while γ = 7.

3. For S2N1H2P2 we get on average α = 1.8 and β = 5.1 (Figure A.10a) while γ = 7.

4. For S2N1H2P3 we get on average α = 1.2 and β = 4.4 (Figure A.10b) while γ = 8.

5. For S4N2H1P3 we get on average α = 1.3 and β = 6.9 (Figure A.10c) while γ = 10.

Due to memory limitations we were not able to test even higher exponents and we also had to limit the maximum L during testing. Again the figures are averages over a few runs because of the randomness introduced by the choice of subportfolios. Because of this and the lower number of levels, the results are not too reliable. For instance it seems unlikely that the convergence rates should decrease from S2N1H2P2 to S2N1H2P3. However, it is quite clear that in this case with the full set of four parameters, because of the high γ values, the MLMC approach is unlikely to achieve the β > γ case or even fulfill the $\alpha \ge \tfrac{1}{2}\min\{\beta, \gamma\}$ requirement. This is where MIMC might be more suitable.

5.3 MLMC - Computational cost


In this case the choice of strategy is trivial because there is only the parameter S. Subsection 5.2.1 implies the β > γ case of Theorem 1, which means $\varepsilon^{-2}$ order of computational cost. Our empirical results show only slightly higher order $\varepsilon^{-2.19}$ cost (see Figure A.11a).


Here we employ the S2N1 strategy which promises the highest efficiency according to the results of Subsection 5.2.2. The convergence rates from Subsection 5.2.2 indicate the β < γ case of Theorem 1. This implies worse cost behavior than the ideal $\varepsilon^{-2}$ case but only slightly, with computational cost proportional to $\varepsilon^{-2.2}$. In our empirical results it seems that the error does not quite follow the expected polynomial growth (see Figure A.11b). Depending on what range of ε we consider to measure the tangent, we retrieve an order of computational cost between $\varepsilon^{-2}$ (considering the whole range) and $\varepsilon^{-3.17}$ (asymptotically).


Subsection 5.2.3 did not paint a clear picture of which strategy would be most efficient. We choose S2N1H2 because it fulfills the requirement on α. The β and γ values imply an $\varepsilon^{-3.2}$ cost according to Theorem 1. In our empirical results we observed on average $\varepsilon^{-3.9}$, although we had to reduce the maximum level due to memory limitations and the results were not very stable (see Figure A.11c).
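The predicted MLMC cost orders quoted in this section follow from the three cases of the complexity theorem (the same case distinction is stated for the quantile variant in Appendix B). A small helper, with a hypothetical name, makes the arithmetic explicit; the logarithmic factor of the β = γ case is ignored.

```python
def mlmc_cost_exponent(alpha, beta, gamma):
    """Exponent r such that the expected cost is of order eps^(-r), log factors ignored."""
    if beta >= gamma:
        return 2.0                        # optimal case (beta = gamma adds a log factor)
    return 2.0 + (gamma - beta) / alpha   # beta < gamma


s2n1 = mlmc_cost_exponent(2.2, 2.6, 3.0)      # rates reported for S2N1
s2n1h2 = mlmc_cost_exponent(2.0, 2.6, 5.0)    # rates reported for S2N1H2
```

These evaluate to roughly 2.18 and 3.2, matching the orders $\varepsilon^{-2.2}$ and $\varepsilon^{-3.2}$ stated in the text.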


Here we choose strategy S1P3 because of the results in Subsection 5.2.4. These do in fact imply the β > γ case with $\varepsilon^{-2}$ computational cost. Because of the randomness involved in the subportfolio choice we again report average values of multiple runs. These empirical results give on average an order $\varepsilon^{-1.8}$ computational cost (see Figure A.11d).


Judging by the convergence rates in Subsection 5.2.5 one should not expect very good results in this case. We chose strategy S2N1H2P2 as a middle ground even though it does not fulfill the requirement of Theorem 1 on α. Our empirical results actually suggest very slowly increasing cost of, on average, order $\varepsilon^{-1/2}$ (see Figure A.11e). However, due to memory limitations we only evaluated three different levels with relatively large error tolerances, so the results are not comprehensive. Already at the highest error tolerance the total cost was about as high as in the S2N1H2 case where we also observed memory issues. Further numerical experiments with more sophisticated memory allocation would be necessary to assess this case.

5.4 MIMC - Convergence rates

In this section we analyze convergence rates of the mixed difference estimators $\Delta Q_{\mathbf{l}}$ used in the MIMC approach. In contrast to the MLMC approach there is no need to find the optimal strategy here. Instead, we only analyze the behavior of the estimator along the parameter axes. We use the same parameters as in the corresponding subsections for MLMC.

Note that we do not discuss the analytical pricing case because in this case d = 1 and MIMC is equivalent to MLMC, as can be seen empirically in Figure A.1.


1. For S we get $\alpha_S = 0.9$ and $\beta_S = 1.1$ (Figure A.2a).

2. For N we get $\alpha_N = 0.5$ and $\beta_N = 1.0$ (Figure A.2b).


For this case of the pricing function we again observe somewhat unstable results for N and H.

1. For S we get $\alpha_S = 1.0$ and $\beta_S = 1.2$ (Figure A.4a).

2. For N we get $\alpha_N \approx 0.5$ and $\beta_N = 1.0$ (Figure A.4b).

3. For H we get $\alpha_H \approx 0.9$ and $\beta_H = 0.0$ (Figure A.4c).


For the position sampling tests we again report average values over multiple runs.

1. For S we get on average $\alpha_S = 1.0$ and $\beta_S = 1.2$ (Figure A.6a).

2. For P we get on average $\alpha_P = 0.7$ and $\beta_P = 1.2$ (Figure A.6b).


1. For S we get on average $\alpha_S = 0.9$ and $\beta_S = 1.0$ (Figure A.8a).

2. For N we get on average $\alpha_N = 0.7$ and $\beta_N = 0.9$ (Figure A.8b).

3. For H we get on average $\alpha_H = 0.8$ and $\beta_H = 0.0$ (Figure A.8c).

4. For P we get on average $\alpha_P = 0.8$ and $\beta_P = 1.3$ (Figure A.9a).

5.5 MIMC - Computational cost


In this case, because of $\beta_N = \gamma_N = 1$, we have η = 0 in Theorem 2. Because β < 2α for both parameters, we have case B) for the exponent $e_1$ with $d_2 = 1$, which means the expected computational cost is of order $\varepsilon^{-2}|\log \varepsilon|^2$. In our empirical experiments we got roughly $\varepsilon^{-2.8}$ (see Figure A.12a). This is an improvement over the MLMC approach where we had β < γ and is slightly better than regular MC.


Because of the $\beta_H$ value we have η ≈ 1 > 0. Again we have case B) for the exponent $e_2$ with $d_2 = 1$, which gives an expected cost of order $\varepsilon^{-3}$. Empirically we observe on average $\varepsilon^{-3.5}$ (see Figure A.12b), again with some variability as described earlier for this case.


In this case we have the optimal expected cost of order $\varepsilon^{-2}$ because η < 0 as both parameters fulfill β > γ. Our empirical results seem to confirm this with an average progression of $\varepsilon^{-1.9}$ (see Figure A.12c).


Due to the $\beta_H$ value we again arrive in the η > 0 case with η ≈ 5/4. Here $d_2 = 1$ and thus $e_2 = 0$, so the expected cost is of order $\varepsilon^{-2-5/4}$. Empirically we observed on average $\varepsilon^{-2.7}$ (see Figure A.12d), although the results are again for a progression of only two to three levels.

Chapter 6

Conclusion

We described how the multilevel Monte Carlo and the multi-index Monte Carlo approaches can be applied to the computationally challenging problem of calculating full Monte Carlo portfolio Value-at-Risk. Specifically we investigated the situation where the portfolio valuation itself requires Monte Carlo methods, which leads to nested simulation and consequently a high number of parameters that drive the overall computational complexity. We identified four functionally different parameters: The number of market scenarios, the number of Monte Carlo pricing paths per position, the number of time steps used in the Monte Carlo pricing, and the number of positions in the portfolio. Through numerical experiments we showed that both approaches can be used to reduce the computational cost when compared to regular Monte Carlo, although the exact results will vary from portfolio to portfolio. Specifically for the two-parameter model assumed in Gordy and Juneja [2010], both the MLMC and the MIMC approach were only slightly more efficient. Comparing the two approaches we found that the multi-index Monte Carlo approach proved superior in our example models mostly when the number of relevant parameters controlling the sample accuracy was high, i.e. at least three of the four parameters. This is somewhat expected considering the design of the multi-index estimator.

Further theoretical analysis as well as empirical experiments using real world portfolios would be needed to establish the validity of our results. In this work we described the problem setting, introduced a uniform notation for both approaches (MLMC and MIMC), and defined estimators that fulfill the prerequisites of both approaches. In particular the sampling of subportfolios within the estimators as a way of controlling the computational cost per sample is a new approach that has to our knowledge not been incorporated in VaR estimators before.

There are various areas for extension or for future research corresponding to the different topics we touched upon. MLMC or MIMC has only been analyzed for relatively few problems of nested simulation. Others, like Bermudan option pricing or credit valuation adjustment calculations, are potential open use cases. Instead of focusing on Value-at-Risk and quantile estimation, other portfolio risk measures could be analyzed, as has been done by Gordy and Juneja [2010] and Broadie et al. [2011] in the context of conventional Monte Carlo. The idea of subportfolio sampling could be extended by considering the heterogeneity of positions and sampling by their marginal utility or by using more sophisticated methods of rescaling the subportfolios.

In summary this work could be considered a small step in the direction towards practical use of multilevel Monte Carlo for Value-at-Risk calculation. Although there is still a lot of ground to cover, we identified some possible paths that seem promising for further investigation.

Appendix A

Empirical results

The following figures show convergence rates in terms of variance and expectation ofQl, ∆Ql, and ∆Ql. The x-axis always describes the level l.

Figure A.1: Analytical pricing.

48

Figure A.2: Monte Carlo pricing without bias: Individual parameters.

49

Figure A.3: Monte Carlo pricing without bias: Full strategies.

50

Figure A.4: Monte Carlo pricing with bias: Individual parameters.

51

Figure A.5: Monte Carlo pricing with bias: Full strategies.

52

Figure A.6: Position sampling with analytical pricing: Part I of II.

53

Figure A.7: Position sampling with analytical pricing: Part II of II.


Figure A.8: Position sampling with Monte Carlo pricing: Part I of III.


Figure A.9: Position sampling with Monte Carlo pricing: Part II of III.


Figure A.10: Position sampling with Monte Carlo pricing: Part III of III.


The following figures show the progression of the computational cost with decreasing error tolerance. The computational cost is reported according to the theoretical values for $C_l$ and $C_l$, so without consideration of implementation issues or overheads.


Figure A.11: MLMC strategies: Each plot represents one case of the assumed pricing functions and the single chosen strategy for that case. E.g. S2N1 is the strategy that was chosen for the case of Monte Carlo pricing without bias. For the two cases with position sampling the highest RMSE values (to the left on the x-axis) have been skipped because the absolute VaR value is much smaller here. The upper limits on the RMSE axis are due to memory constraints.


Figure A.12: MIMC strategies: The indicated parameters again represent one case of the assumed pricing functions. E.g. SN represents the case of Monte Carlo pricing without bias.


Appendix B

Excursion: Direct quantile MLMC

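Theorem 3 (Complexity Theorem - Quantile MLMC). Let $X$ denote a random variable on some probability space, and let $X_l$ denote the corresponding level $l$ numerical approximation. Let $q \in (0, 1)$. Let $Q$ denote the $q$-quantile of the distribution of $X$ and $Q_l$ the $q$-quantile of the distribution of $X_l$.

If there exist independent estimators $Y_l$ based on $S_l$ Monte Carlo samples of $X_l$, each with expected cost $C_l$ and variance $V_l$, and positive constants $\alpha, \beta, \gamma, c_1, c_2, c_3$ such that $\alpha \ge \tfrac{1}{2}\min(\beta, \gamma)$ and

i) $\left| E[Q_l - Q] \right| \le c_1 2^{-\alpha l}$

ii) $E[Y_l] = \begin{cases} E[Q_0], & l = 0, \\ E[Q_l - Q_{l-1}], & l > 0 \end{cases}$

iii) $V_l \le c_2 2^{-\beta l}$

iv) $C_l \le c_3 2^{\gamma l}$,

then there exists a positive constant $c_4$ such that for any $\varepsilon < e^{-1}$ there are values $L$ and $G_l$ for which the multilevel estimator

\[ Y = \sum_{l=0}^{L} Y_l \]

has a mean square error with bound

\[ \mathrm{MSE} \equiv E\left[(Y - E[Q])^2\right] < \varepsilon^2 \]

and an expected computational cost with bound

\[ E[C] \le \begin{cases} c_4 \varepsilon^{-2}, & \beta > \gamma, \\ c_4 \varepsilon^{-2} (\log \varepsilon)^2, & \beta = \gamma, \\ c_4 \varepsilon^{-2 - (\gamma - \beta)/\alpha}, & \beta < \gamma. \end{cases} \]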
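To make the three cost regimes of the theorem concrete, the following minimal sketch evaluates the right-hand side of the bound on E[C] for given rates. The function name `mlmc_cost_bound` and the default `c4 = 1.0` are assumptions for illustration only; the theorem guarantees that such a constant exists but does not give its value.

```python
import math


def mlmc_cost_bound(eps, alpha, beta, gamma, c4=1.0):
    """Evaluate the cost bound of the complexity theorem.

    c4 is a placeholder constant (assumption); only the scaling in eps
    is meaningful.
    """
    if not 0.0 < eps < math.exp(-1):
        raise ValueError("the theorem requires 0 < eps < 1/e")
    if min(alpha, beta, gamma) <= 0.0:
        raise ValueError("alpha, beta and gamma must be positive")
    if alpha < 0.5 * min(beta, gamma):
        raise ValueError("the theorem requires alpha >= min(beta, gamma) / 2")

    if beta > gamma:
        # Variance decays faster than cost grows: standard Monte Carlo rate.
        return c4 * eps ** -2
    if beta == gamma:
        # Borderline case: an extra squared-logarithm factor appears.
        return c4 * eps ** -2 * math.log(eps) ** 2
    # Cost grows faster than variance decays: the exponent worsens.
    return c4 * eps ** (-2.0 - (gamma - beta) / alpha)


for rates in [(1.0, 2.0, 1.0), (1.0, 1.0, 1.0), (1.0, 1.0, 2.0)]:
    print(rates, mlmc_cost_bound(0.01, *rates))
```

Comparing the three printed values for the same tolerance shows how the relation between β and γ determines the asymptotic cost.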


As estimator the canonical choice to trivially fulfill condition ii) would be

\[ Y_l := \begin{cases} Q_0, & l = 0, \\ Q_l - \frac{1}{M} \sum_{m=1}^{M} Q^m_{l-1}, & l > 0 \end{cases} \]

where $Q_l$ is again the empirical quantile¹ taken over $S_l$ VaR scenario simulations of the approximate portfolio value $X_l$, calculated with the parameters $N_l$, $H_l$, and $P_l$. The coarse estimations $Q^m_{l-1}$ are again calculated from the $M := S_l/S_{l-1}$ independent subsets of all $S_l$ samples.

¹ Ignoring the bias of the empirical quantile that we mentioned in Section 2.3.
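The level estimator for l > 0 can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in this work: the names `level_estimator`, `x_fine` and `x_coarse` are hypothetical, the fine and coarse portfolio values are assumed to be evaluated on the same S_l scenarios, and the empirical quantile is taken with NumPy's default (linearly interpolating) definition, which may differ from the quantile definition used elsewhere in the text.

```python
import numpy as np


def level_estimator(x_fine, x_coarse, q, M):
    """Y_l for l > 0: fine quantile minus the mean of M coarse quantiles.

    x_fine   -- S_l samples of the level-l approximation X_l
    x_coarse -- S_l samples of X_{l-1} on the same scenarios (assumption)
    M        -- number of disjoint coarse subsets, M = S_l / S_{l-1}
    """
    x_fine = np.asarray(x_fine, dtype=float)
    x_coarse = np.asarray(x_coarse, dtype=float)
    if x_fine.ndim != 1 or x_fine.shape != x_coarse.shape:
        raise ValueError("x_fine and x_coarse must be 1-D and of equal length")
    if M < 1 or x_fine.size % M != 0:
        raise ValueError("M must be positive and divide the sample count")

    # Fine quantile over all S_l samples.
    q_fine = np.quantile(x_fine, q)
    # Split the coarse samples into M disjoint subsets of size S_{l-1}
    # and take the empirical quantile within each subset.
    q_coarse = np.quantile(x_coarse.reshape(M, -1), q, axis=1)
    return q_fine - q_coarse.mean()


# Toy usage with synthetic data (not portfolio values from this work).
rng = np.random.default_rng(0)
scenarios = rng.standard_normal(4000)
x_fine = scenarios + 0.01 * rng.standard_normal(4000)
x_coarse = scenarios + 0.02 * rng.standard_normal(4000)
print(level_estimator(x_fine, x_coarse, q=0.01, M=4))
```

Splitting the coarse samples into M subsets keeps each coarse quantile based on S_{l-1} samples, so its expectation matches that of the level l-1 estimator, which is what condition ii) requires.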


Acronyms

VaR   Value-at-Risk
MC    Monte Carlo
LSMC  least squares Monte Carlo
MLMC  multilevel Monte Carlo
SDE   stochastic differential equation
SPDE  stochastic partial differential equation
MIMC  multi-index Monte Carlo
MSE   mean square error
RMSE  root mean square error


References

Claudio Albanese, Toufik Bellaj, Guillaume Gimonet, and Giacomo Pietronero. Coherent global market simulations and securitization measures for counterparty credit risk. Quantitative Finance, 11(1):1–20, 2011.

Claudio Albanese, Damiano Brigo, and Frank Oertel. Restructuring counterparty credit risk. International Journal of Theoretical and Applied Finance, 16(02):1350010, 2013.

Anna Rita Bacinello, Pietro Millossovich, Annamaria Olivieri, and Ermanno Pitacco. Variable annuities: A unifying valuation approach. Insurance: Mathematics and Economics, 49(3):285–297, 2011.

Daniel Bauer, Daniela Bergmann, and Andreas Reuss. Solvency II and nested simulations – a least-squares Monte Carlo approach. In Proceedings of the 2010 ICA Congress, 2010.

Mark Broadie, Yiping Du, and Ciamac C Moallemi. Efficient risk estimation via nested sequential simulation. Management Science, 57(6):1172–1194, 2011.

Karolina Bujok, BM Hambly, and Christoph Reisinger. Multilevel simulation of functionals of Bernoulli random variables with application to basket credit derivatives. Methodology and Computing in Applied Probability, 17(3):579–604, 2015.

Michael B Giles. Multilevel Monte Carlo path simulation. Operations Research, 56(3):607–617, 2008a.

Michael B Giles. Improved multilevel Monte Carlo convergence using the Milstein scheme. In Monte Carlo and Quasi-Monte Carlo Methods 2006, pages 343–358. Springer, 2008b.

Michael B Giles. Multilevel Monte Carlo methods. Acta Numerica, 24:259–328, 2015.

Paul Glasserman, Philip Heidelberger, and Perwez Shahabuddin. Variance reduction techniques for estimating value-at-risk. Management Science, 46(10):1349–1364, 2000.

Peter W Glynn. Importance sampling for Monte Carlo estimation of quantiles. In Mathematical Methods in Stochastic Simulation and Experimental Design: Proceedings of the 2nd St. Petersburg Workshop on Simulation, pages 180–185, 1996.

Michael B Gordy and Sandeep Juneja. Nested simulation in portfolio risk measurement. Management Science, 56(10):1833–1848, 2010.

Abdul-Lateef Haji-Ali. Pedestrian Flow in the Mean Field Limit. PhD thesis, King Abdullah University of Science and Technology (KAUST), 2012.

Abdul-Lateef Haji-Ali, Fabio Nobile, and Raul Tempone. Multi-index Monte Carlo: when sparsity meets sampling. Numerische Mathematik, pages 1–40, 2015.

Desmond J Higham. An algorithmic introduction to numerical simulation of stochastic differential equations. SIAM Review, 43(3):525–546, 2001.

Rob J Hyndman and Yanan Fan. Sample quantiles in statistical packages. The American Statistician, 50(4):361–365, 1996.

Philippe Jorion. Value at Risk: The new benchmark for managing financial risk, volume 3. McGraw-Hill New York, 2007.

J.P. Morgan. RiskMetrics technical document. J.P. Morgan, fourth edition, 1996.

Peter Eris Kloeden, Eckhard Platen, and Henri Schurz. Numerical solution of SDE through computer experiments. Springer Science & Business Media, 2012.

Ralf Korn, Elke Korn, and Gerald Kroisandt. Monte Carlo methods and models in finance and insurance. CRC Press, 2010.

Christopher Lester, Christian Adam Yates, Michael B Giles, and Ruth E Baker. An adaptive multi-level simulation algorithm for stochastic biological systems. The Journal of Chemical Physics, 142(2):024113, 2015.

Francis A Longstaff and Eduardo S Schwartz. Valuing American options by simulation: a simple least-squares approach. Review of Financial Studies, 14(1):113–147, 2001.

Alexander J McNeil, Rüdiger Frey, and Paul Embrechts. Quantitative risk management: Concepts, techniques and tools. Princeton University Press, 2015.

Siddhartha Mishra, Ch Schwab, and Jonas Sukys. Multi-level Monte Carlo finite volume methods for nonlinear systems of conservation laws in multi-dimensions. Journal of Computational Physics, 231(8):3365–3388, 2012.

Andrzej Okolewski and Tomasz Rychlik. Sharp distribution-free bounds on the bias in estimating quantiles via order statistics. Statistics & Probability Letters, 52(2):207–213, 2001.
