Bayesian post-processor and other enhancements of Subset Simulation for estimating failure probabilities in high dimensions

Konstantin M. Zuev a, James L. Beck a, Siu-Kui Au b, Lambros S. Katafygiotis c

a Division of Engineering and Applied Science, California Institute of Technology, Mail Code 104-44, Pasadena, CA 91125, USA
b Department of Building and Construction, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong
c Department of Civil and Environmental Engineering, The Hong Kong University of Science and Technology, Hong Kong, China

Computers and Structures 92-93 (2012) 283-296

Article history: Received 26 July 2011; Accepted 20 October 2011; Available online 21 November 2011

Keywords: Reliability engineering; Stochastic simulation methods; Markov chain Monte Carlo; Subset Simulation; Bayesian approach

Abstract

Estimation of small failure probabilities is one of the most important and challenging computational problems in reliability engineering. The failure probability is usually given by an integral over a high-dimensional uncertain parameter space that is difficult to evaluate numerically. This paper focuses on enhancements to Subset Simulation (SS), proposed by Au and Beck, which provides an efficient algorithm based on MCMC (Markov chain Monte Carlo) simulation for computing small failure probabilities for general high-dimensional reliability problems. First, we analyze the Modified Metropolis algorithm (MMA), an MCMC technique, which is used in SS for sampling from high-dimensional conditional distributions. The efficiency and accuracy of SS directly depends on the ergodic properties of the Markov chains generated by MMA, which control how fast the chain explores the parameter space. We present some observations on the optimal scaling of MMA for efficient exploration, and develop an optimal scaling strategy for this algorithm when it is employed within SS. Next, we provide a theoretical basis for the optimal value of the conditional failure probability p0, an important parameter one has to choose when using SS. We demonstrate that choosing any p0 ∈ [0.1, 0.3] will give similar efficiency as the optimal value of p0. Finally, a Bayesian post-processor SS+ for the original SS method is developed where the uncertain failure probability that one is estimating is modeled as a stochastic variable whose possible values belong to the unit interval. Simulated samples from SS are viewed as informative data relevant to the system's reliability. Instead of a single real number as an estimate, SS+ produces the posterior PDF of the failure probability, which takes into account both prior information and the information in the sampled data. This PDF quantifies the uncertainty in the value of the failure probability and it may be further used in risk analyses to incorporate this uncertainty. To demonstrate SS+, we consider its application to two different reliability problems: a linear reliability problem and reliability analysis of an elasto-plastic structure subjected to strong seismic ground motion. The relationship between the original SS and SS+ is also discussed.

© 2011 Elsevier Ltd. All rights reserved.

1. Introduction

One of the most important and challenging problems in reliability engineering is to estimate the failure probability pF for a system, that is, the probability of unacceptable system performance.
This is usually expressed as an integral over a high-dimensional uncertain parameter space:

p_F = \int_{\mathbb{R}^d} I_F(\theta)\, \pi(\theta)\, d\theta = E_\pi[I_F(\theta)],  (1)

where θ ∈ R^d represents the uncertain parameters needed to specify completely the excitation and dynamic model of the system; π(θ) is the joint probability density function (PDF) for θ; F ⊂ R^d is the failure domain in the parameter space (i.e. the set of parameter values that lead to performance of the system that is considered to be unacceptable); and I_F(θ) stands for the indicator function, i.e. I_F(θ) = 1 if θ ∈ F and I_F(θ) = 0 if θ ∉ F. The dimension d is typically large for dynamic reliability problems (e.g. d ~ 10^3) because the stochastic input time history is discretized in time. As a result, the usual numerical quadrature methods for integrals are not computationally feasible for evaluating (1).

Over the past decade, the engineering research community has realized the importance of advanced stochastic simulation methods for reliability analysis. As a result, many different efficient algorithms have been developed recently, e.g. Subset Simulation [2], Importance Sampling using Elementary Events [3], Line Sampling [28], the auxiliary domain method [25], Spherical Subset Simulation [24], and Horseracing Simulation [44], to name but a few.




This paper focuses on enhancements to Subset Simulation (SS), proposed by Au and Beck [2], which provides an efficient algorithm for computing failure probabilities for general high-dimensional reliability problems. It has been shown theoretically [2] and verified with different numerical examples (e.g. [4,5,38,39]) that SS gives much higher computational efficiency than standard Monte Carlo Simulation when estimating small failure probabilities. Recently, various modifications of SS have been proposed: SS with splitting [9], Hybrid SS [10], and Two-Stage SS [23]. It is important to highlight, however, that none of these modifications offer a drastic improvement over the original algorithm.

We start with the analysis of the Modified Metropolis algorithm (MMA), a Markov chain Monte Carlo technique used in SS, which is presented in Section 2. The efficiency and accuracy of SS directly depends on the ergodic properties of the Markov chains generated by MMA. In Section 3, we examine the optimal scaling of MMA, i.e. how to tune the parameters of the algorithm to make the resulting Markov chain converge to stationarity as fast as possible. We present a collection of observations on the optimal scaling of MMA for different numerical examples, and develop an optimal scaling strategy for MMA when it is employed within SS for estimating small failure probabilities.

One of the most important components of SS which affects its efficiency is the choice of the sequence of intermediate threshold values or, equivalently, the intermediate failure probabilities (see Section 2, where the original SS algorithm is described). In Section 4, a method for optimally choosing these probabilities is presented.

The usual interpretation of Monte Carlo methods is consistent with a purely frequentist approach, meaning that they can be interpreted in terms of the frequentist definition of probability, which identifies it with the long-run relative frequency of occurrence of an event. An alternative interpretation can be made based on the Bayesian approach, which views probability as a measure of the plausibility of a proposition conditional on incomplete information that does not allow us to establish the truth or falsehood of the proposition with certainty. Bayesian probability theory was, in fact, primarily developed by the mathematician and astronomer Laplace [29,30] for statistical analysis of astronomical observations. Moreover, Laplace developed the well-known Bayes' theorem in full generality, while Bayes did it only for a special case [6]. A complete development based on Laplace's theory, with numerous examples of its applications, was given by the mathematician and geophysicist Jeffreys [22] in the early 20th century. Despite its usefulness in applications, the work of Laplace and Jeffreys on probability was rejected in favor of the frequentist approach by most statisticians until late last century. Because of the absence of a strong rationale behind the theory at that time, it was perceived as subjective and not rigorous by many statisticians. A rigorous logical foundation for the Bayesian approach was given in the seminal work of the physicist Cox [11,12] and expounded by the physicist Jaynes [20,21], establishing Bayesian probability theory as a convenient mathematical language for inference and uncertainty quantification. Although the Bayesian approach usually leads to high-dimensional integrals that often cannot be evaluated analytically or numerically by straightforward quadrature, the development of Markov chain Monte Carlo algorithms and increasing computing power have led over the past few decades to an explosive growth of Bayesian papers in all research disciplines.

In Section 5 of this paper, a Bayesian post-processor for the original Subset Simulation method is developed, where the uncertain failure probability that one is estimating is modeled as a stochastic variable whose possible values belong to the unit interval. Although this failure probability is a constant defined by the integral in (1), its exact value is unknown because the integral cannot be evaluated; instead, we must infer its value from available relevant information. Instead of a single real number as an estimate, the post-processor, written as SS+ ("SS-plus") for short, produces the posterior PDF of the failure probability, which takes into account both relevant prior information and the information from the samples generated by SS. This PDF expresses the relative plausibility of each possible value of the failure probability based on this information. Since this PDF quantifies the uncertainty in the value of pF, it can be fully used in risk analyses (e.g. for life-cycle cost analysis, decision making under risk, etc.), or it can be used to give a point estimate such as the most probable value based on the available information.

    2. Subset Simulation

    2.1. Basic idea of Subset Simulation

The original and best known stochastic simulation algorithm for estimating high-dimensional integrals is Monte Carlo Simulation (MCS). In this method the failure probability pF is estimated by approximating the mean of I_F(θ) in (1) by its sample mean:

p_F \approx \hat{p}_F^{MC} = \frac{1}{N} \sum_{i=1}^{N} I_F(\theta^{(i)}),  (2)

where samples θ^(1), . . . , θ^(N) are independent and identically distributed (i.i.d.) samples from π(·). This estimate is just the fraction of samples that produce system failure according to a model of the system dynamics. Notice that each evaluation of I_F requires a deterministic system analysis to be performed to check whether the sample implies failure. The main advantage of MCS is that its efficiency does not depend on the dimension d of the parameter space. Indeed, straightforward calculation shows that the coefficient of variation (c.o.v.) of the Monte Carlo estimate (2), serving as a measure of accuracy in the usual interpretation of MCS, is given by:

\delta(\hat{p}_F^{MC}) = \sqrt{\frac{1 - p_F}{N p_F}}.  (3)

However, MCS has a serious drawback: it is inefficient in estimating small failure probabilities. If pF is very small, pF ≪ 1, then it follows from (3) that the number of samples N (or, equivalently, the number of system analyses) needed to achieve an acceptable level of accuracy is very large, N ∝ 1/pF ≫ 1. This deficiency of MCS has motivated research to develop more efficient stochastic simulation algorithms for estimating small failure probabilities in high dimensions.
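To make (2) and (3) concrete, the following minimal sketch (hypothetical Python code, not from the paper; the linear performance function g and all names are illustrative assumptions) estimates pF by direct MCS and reports the estimate together with its estimated c.o.v.:

```python
import numpy as np

def mcs_failure_probability(g, d, N, rng=np.random.default_rng(0)):
    """Direct Monte Carlo estimate of p_F = P(failure), eq. (2),
    for theta ~ N(0, I_d), with the c.o.v. from eq. (3)."""
    thetas = rng.standard_normal((N, d))         # i.i.d. samples from pi(.)
    failures = np.array([g(t) for t in thetas])  # one system analysis per sample
    p_hat = failures.mean()                      # fraction of failure samples
    cov = np.sqrt((1 - p_hat) / (N * p_hat)) if p_hat > 0 else np.inf
    return p_hat, cov

# Hypothetical linear failure criterion: sum(theta)/sqrt(d) > 3, so the
# exact p_F is 1 - Phi(3), approximately 1.35e-3.
d = 1000
g = lambda theta: theta.sum() / np.sqrt(d) > 3.0
print(mcs_failure_probability(g, d, N=100_000))
```

The large N required when p_hat is small is exactly the inefficiency discussed above.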

The basic idea of Subset Simulation [2] is the following: represent a very small failure probability pF as a product of larger probabilities, p_F = \prod_{j=1}^{m} p_j, where the factors pj are estimated sequentially, p_j \approx \hat{p}_j, to obtain an estimate \hat{p}_F^{SS} for pF as \hat{p}_F^{SS} = \prod_{j=1}^{m} \hat{p}_j. To reach this goal, let us consider a decreasing sequence of nested subsets of the parameter space, starting from the entire space and shrinking to the failure domain F:

\mathbb{R}^d = F_0 \supset F_1 \supset \cdots \supset F_{m-1} \supset F_m = F.  (4)

Subsets F1, . . . , F_{m−1} are called intermediate failure domains. As a result, the failure probability pF = P(F) can be rewritten as a product of conditional probabilities:

p_F = \prod_{j=1}^{m} P(F_j \mid F_{j-1}) = \prod_{j=1}^{m} p_j,  (5)

where pj = P(Fj|F_{j−1}) is the conditional probability at the (j − 1)th conditional level. Clearly, by choosing the intermediate failure domains appropriately, all conditional probabilities pj can be made large (for example, pF = 10^{-6} can be represented as a product of m = 6 conditional probabilities each equal to 0.1). Furthermore, they can be estimated, in principle, by the fraction of independent conditional samples that cause failure at the intermediate level:

p_j \approx \hat{p}_j^{MC} = \frac{1}{N} \sum_{i=1}^{N} I_{F_j}(\theta_{j-1}^{(i)}), \qquad \theta_{j-1}^{(i)} \overset{\text{i.i.d.}}{\sim} \pi(\cdot \mid F_{j-1}).  (6)

Hence, the original problem (estimation of the small failure probability pF) is replaced by a sequence of m intermediate problems (estimation of the larger failure probabilities pj, j = 1, . . . , m).

The first probability p1 = P(F1|F0) = P(F1) is straightforward to estimate by MCS, since (6) then requires sampling from π(·), which is assumed to be readily sampled. However, if j ≥ 2, to estimate pj using (6) one needs to generate independent samples from the conditional distribution π(·|F_{j−1}), which, in general, is not a trivial task. It is not efficient to use MCS for this purpose, especially at higher levels, but it can be done by a specifically tailored Markov chain Monte Carlo technique at the expense of generating dependent samples.

Markov chain Monte Carlo (MCMC) [31,34,35] is a class of algorithms for sampling from multi-dimensional target probability distributions that cannot be directly sampled, at least not efficiently. These methods are based on constructing a Markov chain that has the distribution of interest as its stationary distribution. By simulating samples from the Markov chain, they will eventually be draws from the target probability distribution, but they will not be independent samples. In Subset Simulation, the Modified Metropolis algorithm (MMA) [2], an MCMC technique based on the original Metropolis algorithm [33,18], is used for sampling from the conditional distributions π(·|F_{j−1}).

Remark 1. It was observed in [2] that the original Metropolis algorithm does not work in high-dimensional conditional probability spaces, because it produces a Markov chain with very highly correlated states. The geometrical reasons for this are discussed in [26].

2.2. Modified Metropolis algorithm

Suppose we want to generate a Markov chain with stationary distribution

\pi(\theta \mid F) = \frac{\pi(\theta)\, I_F(\theta)}{P(F)} = \frac{\prod_{k=1}^{d} \pi_k(\theta_k)\, I_F(\theta)}{P(F)},  (7)

where F ⊂ R^d is a subset of the parameter space. Without significant loss of generality, we assume here that \pi(\theta) = \prod_{k=1}^{d} \pi_k(\theta_k), i.e. components of θ are independent (but are not so when conditioned on F). MMA differs from the original Metropolis algorithm in the way the candidate state ξ = (ξ1, . . . , ξd) is generated. Instead of using a d-variate proposal PDF on R^d to directly obtain the candidate state, in MMA a sequence of univariate proposal PDFs is used. Namely, each coordinate ξk of the candidate state is generated separately using a univariate proposal distribution dependent on the coordinate θk of the current state. Then a check is made whether the d-variate candidate generated in such a way belongs to the subset F, in which case it is accepted as the next Markov chain state; otherwise it is rejected and the current MCMC sample is repeated. To summarize, the Modified Metropolis algorithm proceeds as follows:

Modified Metropolis algorithm [2]
Input:
▷ θ^(1) ∈ F, initial state of a Markov chain;
▷ N, total number of states, i.e. samples;
▷ π1(·), . . . , πd(·), marginal PDFs of θ1, . . . , θd, respectively;
▷ S1(·|α), . . . , Sd(·|α), univariate proposal PDFs depending on a parameter α ∈ R and satisfying the symmetry property Sk(β|α) = Sk(α|β), k = 1, . . . , d.
Algorithm:
for i = 1, . . . , N − 1 do
  % Generate a candidate state ξ:
  for k = 1, . . . , d do
    Sample ξ̃k ~ Sk(·|θk^(i))
    Compute the acceptance ratio

      r = \frac{\pi_k(\tilde{\xi}_k)}{\pi_k(\theta_k^{(i)})}  (8)

    Accept or reject ξ̃k by setting

      ξk = ξ̃k with probability min{1, r}; ξk = θk^(i) with probability 1 − min{1, r}.  (9)

  end for
  Check whether ξ ∈ F by system analysis and accept or reject ξ by setting

      θ^(i+1) = ξ if ξ ∈ F; θ^(i+1) = θ^(i) if ξ ∉ F.  (10)

end for
Output:
▶ θ^(1), . . . , θ^(N), N states of a Markov chain with stationary distribution π(·|F).

Schematically, the Modified Metropolis algorithm is shown in Fig. 1. For completeness and the reader's convenience, the proof that π(·|F) is the stationary distribution for the Markov chain generated by MMA is given in the appendix.

Fig. 1. Modified Metropolis algorithm.

Remark 2. The symmetry property Sk(β|α) = Sk(α|β) does not play a critical role. If Sk does not satisfy this property, by replacing the Metropolis ratio in (8) by the Metropolis–Hastings ratio

r = \frac{\pi_k(\tilde{\xi}_k)\, S_k(\theta_k^{(i)} \mid \tilde{\xi}_k)}{\pi_k(\theta_k^{(i)})\, S_k(\tilde{\xi}_k \mid \theta_k^{(i)})},  (11)

we obtain an MCMC algorithm referred to as the Modified Metropolis–Hastings algorithm.
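A compact implementation may help make the coordinate-wise structure of the listing concrete. The following sketch (hypothetical Python code, not part of the paper) implements MMA with symmetric Gaussian proposals of spread sigma for standard Gaussian marginals; the indicator `in_F` and all names are illustrative assumptions:

```python
import numpy as np

def modified_metropolis(theta0, in_F, N, sigma, rng=np.random.default_rng(0)):
    """Modified Metropolis algorithm (MMA) for sampling pi(.|F), where pi
    has independent standard Gaussian marginals; see eqs. (8)-(10).
    theta0 : initial state, must lie in F; in_F : indicator function of F."""
    d = len(theta0)
    chain = np.empty((N, d))
    chain[0] = theta0
    for i in range(N - 1):
        theta = chain[i]
        # Candidate built coordinate-by-coordinate, eqs. (8)-(9):
        xi_tilde = theta + sigma * rng.standard_normal(d)   # Sk(.|theta_k)
        log_r = -0.5 * (xi_tilde**2 - theta**2)             # log pi_k ratio, eq. (8)
        accept_k = np.log(rng.random(d)) < log_r            # min{1, r} per coordinate
        xi = np.where(accept_k, xi_tilde, theta)
        # Global accept/reject by a system analysis, eq. (10):
        chain[i + 1] = xi if in_F(xi) else theta
    return chain
```

Within SS, `in_F` at level j would be the map θ ↦ g(θ) > b_j, so each chain step costs one system analysis.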

Thus, if we run the Markov chain for sufficiently long (the "burn-in" period), starting from essentially any seed θ^(1) ∈ F, then for large N the distribution of θ^(N) will be approximately π(·|F). Note, however, that in any practical application it is very difficult to check whether the Markov chain has reached its stationary distribution. If the seed θ^(1) ~ π(·|F), then all states θ^(i) will be automatically distributed according to the target distribution, θ^(i) ~ π(·|F), since it is the stationary distribution for the Markov chain. This is called perfect sampling [35] and Subset Simulation has this property because of the way the seeds are chosen [2].

Let us assume now that we are given a seed θ_{j−1}^{(1)} ~ π(·|F_{j−1}), where j = 2, . . . , m. Then, using MMA, we can generate a Markov chain with N states starting from this seed and construct an estimate for pj similar to (6), where MCS samples are replaced by MCMC samples:

p_j \approx \hat{p}_j^{MCMC} = \frac{1}{N} \sum_{i=1}^{N} I_{F_j}(\theta_{j-1}^{(i)}), \qquad \theta_{j-1}^{(i)} \sim \pi(\cdot \mid F_{j-1}).  (12)

Note that all samples θ_{j−1}^{(1)}, . . . , θ_{j−1}^{(N)} in (12) are identically distributed in the stationary state of the Markov chain, but are not independent. Nevertheless, these MCMC samples can be used for statistical averaging as if they were i.i.d., although with some reduction in efficiency [14]. Namely, the more correlated θ_{j−1}^{(1)}, . . . , θ_{j−1}^{(N)} are, the less efficient is the estimate (12). The correlation between successive samples is due to the proposal PDFs Sk, which govern the generation of the next state of the Markov chain from the current one in MMA. Hence, the choice of proposal PDFs Sk controls the efficiency of the estimate (12), making this choice very important. It was observed in [2] that the efficiency of MMA depends on the spread of the proposal distributions, rather than on their type. Both small and large spreads tend to increase the dependence between successive samples, slowing the convergence of the estimator. Large spreads may reduce the acceptance rate in (10), increasing the number of repeated MCMC samples. Small spreads, on the contrary, may lead to a reasonably high acceptance rate, but still produce very correlated samples due to their close proximity. Finding the optimal spread of proposal distributions for MMA is a non-trivial task which is discussed in Section 3.

Remark 3. In [45] another modification of the Metropolis algorithm, called the Modified Metropolis–Hastings algorithm with Delayed Rejection (MMHDR), has been proposed. The key idea behind MMHDR is to reduce the correlation between states of the Markov chain. A way to achieve this goal is the following: whenever a candidate ξ is rejected in (10), instead of getting a repeated sample θ^(i+1) = θ^(i), as is the case in MMA, a new candidate ξ̃ is generated using a new set of proposal PDFs S̃k. Of course, the acceptance ratios (8) for the second candidate have to be adjusted in order to keep the target distribution stationary. In general, MMHDR generates less correlated samples than MMA but it is computationally more expensive.

2.3. Subset Simulation algorithm

Subset Simulation uses the estimates (6) for p1 and (12) for pj, j ≥ 2, to obtain the estimate for the failure probability:

p_F \approx \hat{p}_F^{SS} = \hat{p}_1^{MC} \prod_{j=2}^{m} \hat{p}_j^{MCMC}.  (13)

The remaining ingredient of Subset Simulation that we have to specify is the choice of intermediate failure domains F1, . . . , Fm. Usually, performance of a dynamical system is described by a certain positive-valued performance function g : R^d → R; for instance, g(θ) may represent some peak (maximum) response quantity when the system model is subjected to the uncertain excitation θ. Then the failure region, i.e. the unacceptable performance region, can be defined as the set of excitations that lead to the exceedance of some prescribed critical threshold b:

F = \{\theta \in \mathbb{R}^d : g(\theta) > b\}.  (14)

The sequence of intermediate failure domains can then be defined as

F_j = \{\theta \in \mathbb{R}^d : g(\theta) > b_j\},  (15)

where 0 < b1 < · · · < b_{m−1} < b_m = b. Intermediate threshold values define the values of the conditional probabilities pj = P(Fj|F_{j−1}) and, therefore, affect the efficiency of Subset Simulation. In practical cases it is difficult to make a rational choice of the bj-values in advance, so the bj are chosen adaptively (see (16)) so that the estimated conditional probabilities are equal to a fixed value p0 ∈ (0, 1). We will refer to p0 as the conditional failure probability.

Subset Simulation algorithm [2]
Input:
▷ p0, conditional failure probability;
▷ N, number of samples per conditional level.
Algorithm:
Set j = 0, number of conditional levels
Set NF(j) = 0, number of failure samples at level j
Sample θ_0^(1), . . . , θ_0^(N) i.i.d. from π(·)
for i = 1, . . . , N do
  if g_i = g(θ_0^(i)) > b do NF(j) ← NF(j) + 1 end if
end for
while NF(j)/N < p0 do
  j ← j + 1
  Sort {g_i}: g_{i_1} ≤ g_{i_2} ≤ · · · ≤ g_{i_N}
  Define

    b_j = \frac{g_{i_{N - N p_0}} + g_{i_{N - N p_0 + 1}}}{2}  (16)

  for k = 1, . . . , Np0 do
    Starting from the seed θ_j^{(1),k} = θ_{j−1}^{(i_{N−Np0+k})} ~ π(·|F_j), generate 1/p0 states of a Markov chain θ_j^{(1),k}, . . . , θ_j^{(1/p0),k} ~ π(·|F_j), using MMA.
  end for
  Renumber: {θ_j^{(i),k} : k = 1, . . . , Np0; i = 1, . . . , 1/p0} ↦ θ_j^(1), . . . , θ_j^(N) ~ π(·|F_j)
  for i = 1, . . . , N do
    if g_i = g(θ_j^(i)) > b do NF(j) ← NF(j) + 1 end if
  end for
end while
Output:
▶ \hat{p}_F^{SS}, estimate of pF:

    \hat{p}_F^{SS} = p_0^j\, \frac{N_F(j)}{N}.  (17)
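The complete method then reads as a short loop over levels. The sketch below (hypothetical Python code, not the authors' implementation) follows the listing above for a standard Gaussian parameter space; the fixed proposal spread sigma and all names are illustrative assumptions:

```python
import numpy as np

def subset_simulation(g, d, b, N=1000, p0=0.1, sigma=1.0,
                      rng=np.random.default_rng(0)):
    """Sketch of Subset Simulation with MMA for theta ~ N(0, I_d);
    failure is g(theta) > b, see eqs. (14)-(17). Assumes N*p0 and 1/p0
    are positive integers."""
    n_seeds, n_states = int(N * p0), int(round(1 / p0))
    thetas = rng.standard_normal((N, d))              # level 0: direct MCS
    gvals = np.array([g(t) for t in thetas])
    level = 0
    while (gvals > b).mean() < p0:
        level += 1
        order = np.argsort(gvals)
        # Adaptive intermediate threshold b_j, eq. (16)
        bj = 0.5 * (gvals[order[N - n_seeds - 1]] + gvals[order[N - n_seeds]])
        seeds = thetas[order[N - n_seeds:]]           # these satisfy g > b_j
        new_thetas, new_gvals = [], []
        for seed in seeds:                            # one chain per seed
            theta, gt = seed.copy(), g(seed)
            new_thetas.append(theta.copy()); new_gvals.append(gt)
            for _ in range(n_states - 1):             # MMA steps, eqs. (8)-(10)
                cand = theta + sigma * rng.standard_normal(d)
                acc = np.log(rng.random(d)) < -0.5 * (cand**2 - theta**2)
                xi = np.where(acc, cand, theta)
                gx = g(xi)
                if gx > bj:                           # xi in F_j: move
                    theta, gt = xi, gx
                new_thetas.append(theta.copy()); new_gvals.append(gt)
        thetas, gvals = np.array(new_thetas), np.array(new_gvals)
    return p0**level * (gvals > b).mean()             # eq. (17)
```

For instance, with g = lambda theta: theta.sum() / np.sqrt(len(theta)) and b = 3.0, the exact answer is 1 − Φ(3) ≈ 1.35 × 10^-3, which the sketch approximates with far fewer system analyses than direct MCS would need.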

Schematically, the Subset Simulation algorithm is shown in Fig. 2.

The adaptive choice of bj-values in (16) guarantees, first, that all seeds θ_j^{(1),k} are distributed according to π(·|Fj) and, second, that the estimated conditional probability P(Fj|F_{j−1}) is equal to p0. Here, for convenience, p0 is assumed to be chosen such that Np0 and 1/p0 are positive integers, although this is not strictly necessary. In [2] it is suggested to use p0 = 0.1. The optimal choice of the conditional failure probability is discussed in Section 4.

Fig. 2. Subset Simulation algorithm: disks and circles represent samples from π(·|F_{j−1}) and π(·|Fj), respectively; circled disks are the Markov chain seeds for π(·|Fj).

Remark 4. Subset Simulation provides an efficient stochastic simulation algorithm for computing failure probabilities for general reliability problems without using any specific information about the dynamic system other than an input–output model. This independence of a system's inherent properties makes Subset Simulation potentially useful for applications in different areas of science and engineering where the notion of "failure" has its own specific meaning, e.g. in Computational Finance to estimate the probability that a stock price will drop below a given threshold within a given period of time, in Computational Biology to estimate the probability of gene mutation, etc.

3. Tuning of the Modified Metropolis algorithm

The efficiency and accuracy of Subset Simulation directly depends on the ergodic properties of the Markov chain generated by the Modified Metropolis algorithm; in other words, on how fast the chain explores the parameter space and converges to its stationary distribution. The latter is determined by the choice of the one-dimensional proposal distributions Sk, which makes this choice very important. In spite of this, the choice of proposal PDFs is still largely an art. It was observed in [2] that the efficiency of MMA is not sensitive to the type of the proposal PDFs; however, it depends on their spread (e.g. their variance).

Optimal scaling refers to the need to tune the parameters of the algorithm to make the resulting Markov chain converge to stationarity as fast as possible. The issue of optimal scaling was recognized in the original paper by Metropolis et al. [33]. Gelman et al. [17] were the first authors to obtain theoretical results on the optimal scaling of the original Metropolis algorithm. They proved that for optimal sampling from a high-dimensional Gaussian distribution, the Metropolis algorithm should be tuned to accept approximately 23% of the proposed moves only. Since then many papers have been published on optimal scaling of the original Metropolis algorithm. In this section, in the spirit of [17], we address the following question, which is of high practical importance: what is the optimal variance of the univariate Gaussian proposal PDFs for simulating a high-dimensional Gaussian distribution conditional on some specific domain using MMA?

This section is organized as follows: in Section 3.1 we recall the original Metropolis algorithm and provide a brief overview of existing results on its optimal scaling; in Section 3.2 we present a collection of observations on the optimal scaling of the Modified Metropolis algorithm for different numerical examples, and discuss the optimal scaling strategy for MMA when it is employed within Subset Simulation for estimating small failure probabilities.

    3.1. Metropolis algorithm: a brief history of its optimal scaling

The Metropolis algorithm is the most popular class of MCMC algorithms. Let π be the target PDF on R^d; θ^(n) be the current state of the Markov chain; and S(·|θ^(n)) be a symmetric (i.e. S(α|β) = S(β|α)) d-variate proposal PDF depending on θ^(n). Then the Metropolis update θ^(n) → θ^(n+1) of the Markov chain works as follows: first, simulate a candidate state ξ ~ S(·|θ^(n)); next, compute the acceptance probability a(ξ|θ^(n)) = min{1, π(ξ)/π(θ^(n))}; and, finally, accept ξ as the next state of the Markov chain, i.e. set θ^(n+1) = ξ, with probability a(ξ|θ^(n)), or reject ξ by setting θ^(n+1) = θ^(n) with the remaining probability 1 − a(ξ|θ^(n)). It is easy to prove that such updating leaves π invariant, i.e. if θ^(n) is distributed according to π, then so is θ^(n+1). Hence the chain will eventually converge to π as its stationary distribution. Practically, this means that if we run the Markov chain for a long time, starting from any θ^(1) ∈ R^d, then for large N the distribution of θ^(N) will be approximately π.

The variance σ² of the proposal PDF S turns out to have a significant impact on the speed of convergence of the Markov chain to its stationary distribution. Indeed, if σ² is small, then the Markov chain explores its state space very slowly. On the other hand, if σ² is large, the probability of accepting a new candidate state is very low and this results in a chain remaining still for long periods of time. Since the Metropolis algorithm with extremal values of the variance of the proposal PDF produces a chain that explores its state space slowly, it is natural to expect the existence of an optimal σ² for which the convergence speed is maximized.

The importance of optimal scaling was already realized in the landmark paper [33] where the Metropolis algorithm was first introduced. Metropolis et al. developed an algorithm for generating samples from the Boltzmann distribution for solving numerical problems in statistical mechanics. In this work the uniform proposal PDF was used, S(ξ|θ^(n)) = U(ξ | θ^(n) − a, θ^(n) + a), and it was noted:

"It may be mentioned in this connection that the maximum displacement a must be chosen with some care; if too large, most moves will be forbidden, and if too small, the configuration will not change enough. In either case it will then take longer to come to equilibrium."

In [18] Hastings generalized the Metropolis algorithm. Namely, he showed that the proposal distribution need not be uniform, and it need not be symmetric. In the latter case, the acceptance probability must be slightly modified:

a(\xi \mid \theta^{(n)}) = \min\left\{1, \frac{\pi(\xi)\, S(\theta^{(n)} \mid \xi)}{\pi(\theta^{(n)})\, S(\xi \mid \theta^{(n)})}\right\}.

The corresponding algorithm is called the Metropolis–Hastings algorithm. Furthermore, Hastings emphasized that the original sampling method has a general nature and can be applied in different circumstances (not only in the framework of statistical mechanics) and that Markov chain theory (which is absent in [33]) is a natural language for the algorithm. Among other insights, Hastings made the following useful yet difficult to implement recommendation:

"Choose a proposal distribution so that the sample point in one step may move as large a distance as possible in the sample space, consistent with a low rejection rate."

Historically, the tuning of the proposal's variance was usually performed by trial-and-error, typically using rules of thumb of the following form: select σ² such that the corresponding acceptance rate, i.e. the average fraction of accepted candidate states, is between 30% and 70%. The rationale behind such rules is that too low an acceptance rate means that the Markov chain has many repeated samples, while too high an acceptance rate indicates that the chain moves very slowly. Although qualitatively correct, these rules suffered from the lack of theoretical justification for the lower and upper bounds on the acceptance rate. The first theoretical result on the optimal scaling of the Metropolis algorithm was obtained by Gelman et al. [17]. It was proved that in order to perform optimally in high-dimensional spaces, the algorithm should be tuned to accept as few as 23% of the proposed moves. This came as an unexpected and counter-intuitive result: it states that the Markov chain should stay still about 77% of the time in order to have the fastest convergence speed. Let us formulate the main result more precisely.

Suppose that all components of θ ∈ R^d are i.i.d., i.e. the target distribution π(θ) has the product form \pi(\theta) = \prod_{i=1}^{d} f(\theta_i), where the one-dimensional density f satisfies certain regularity conditions (namely, f is a C²-function and (log f)′ is Lipschitz continuous). Then the optimal random walk Gaussian proposal PDF S(\xi \mid \theta^{(n)}) = N(\xi \mid \theta^{(n)}, \sigma^2 I_d) has the following properties:

1. The optimal standard deviation is \sigma \approx 2.4/\sqrt{d I}, where I = E_f[((\log f)')^2] = \int_{-\infty}^{\infty} f'(x)^2 / f(x)\, dx measures the "roughness" of f. The smoother the density is, the smaller I is and, therefore, the larger σ is. In particular, for the one-dimensional case (d = 1) and standard Gaussian f (I = 1): σ ≈ 2.4 (a surprisingly high value!).

2. The acceptance rate of the corresponding Metropolis algorithm is approximately 44% for d = 1 and declines to 23% as d → ∞. Moreover, the asymptotic optimality of accepting 23% of proposed moves is approximately true for dimension as low as d = 6.

This result gives rise to the following useful heuristic strategy, which is easy to implement: tune the proposal variance so that the average acceptance rate is roughly 25%. In spite of the i.i.d. assumption for the target components, this result is believed to be robust and to hold under various perturbations of the target distribution. Being aware of the practical difficulties of choosing the optimal σ², Gelman et al. provided a very useful observation:

"Interestingly, if one cannot be optimal, it seems better to use too high a value of σ than too low."

Remark 5. This observation is consistent with the numerical result obtained recently in [45]: an increased variance of the second-stage proposal PDFs improves the performance of the MMHDR algorithm.

Since the pioneering work [17], the problem of optimal scaling has attracted the attention of many researchers, and optimal scaling results have been derived for other types of MCMC algorithms. For instance, the Metropolis-adjusted Langevin algorithm (MALA) was studied in [36], and it was proved that the asymptotically optimal acceptance rate for MALA is approximately 57%. For a more detailed overview of existing results on the optimal scaling of the Metropolis algorithm see [8] and the references cited therein.
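The 2.4/√d scaling and the associated acceptance rates are easy to reproduce empirically. The following sketch (hypothetical Python code, not from the paper) runs the plain Metropolis algorithm on a standard Gaussian target and prints the observed acceptance rate, which should be roughly 44% for d = 1 and near 23% for larger d:

```python
import numpy as np

def metropolis_acceptance_rate(d, n_steps=50_000, rng=np.random.default_rng(0)):
    """Random-walk Metropolis on N(0, I_d) with the proposal scale
    sigma = 2.4/sqrt(d) of Gelman et al. [17]; returns the empirical
    acceptance rate."""
    sigma = 2.4 / np.sqrt(d)
    theta = np.zeros(d)
    accepted = 0
    for _ in range(n_steps):
        xi = theta + sigma * rng.standard_normal(d)
        # log acceptance probability: log pi(xi) - log pi(theta)
        if np.log(rng.random()) < 0.5 * (theta @ theta - xi @ xi):
            theta, accepted = xi, accepted + 1
    return accepted / n_steps

for d in (1, 10, 100):
    print(d, metropolis_acceptance_rate(d))
```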

3.2. Optimal scaling of the Modified Metropolis algorithm

In this section we address two questions: what is the optimal variance σ² of the univariate Gaussian proposal PDFs Sk(·|μ) = N(·|μ, σ²), k = 1, . . . , d, for simulating a high-dimensional conditional Gaussian distribution π(·|F) = N(·|0, I_d) I_F(·)/P(F) using the Modified Metropolis algorithm, and what is the optimal scaling strategy for Modified Metropolis when it is employed within Subset Simulation for estimating small failure probabilities?

Let us first define what we mean by optimal variance. Let θ_{j−1}^{(i),k} be the ith sample in the kth Markov chain at simulation level j − 1. The conditional probability pj = P(Fj|F_{j−1}) is then estimated as follows:

p_j \approx \hat{p}_j = \frac{1}{N} \sum_{k=1}^{N_c} \sum_{i=1}^{N_s} I_{F_j}(\theta_{j-1}^{(i),k}), \qquad \theta_{j-1}^{(i),k} \sim \pi(\cdot \mid F_{j-1}),  (18)

where Nc is the number of Markov chains and Ns is the total number of samples simulated from each of these chains, Ns = N/Nc, so that the total number of Markov chain samples is N. An expression for the coefficient of variation (c.o.v.) of \hat{p}_j, derived in [2], is given by:

\delta_j = \sqrt{\frac{1 - p_j}{N p_j}\,(1 + \gamma_j)},  (19)

where

\gamma_j = 2 \sum_{i=1}^{N_s - 1} \left(1 - \frac{i}{N_s}\right) \frac{R_j^{(i)}}{R_j^{(0)}},  (20)

and

R_j^{(i)} = E\left[I_{F_j}(\theta_{j-1}^{(1),k})\, I_{F_j}(\theta_{j-1}^{(1+i),k})\right] - p_j^2  (21)

is the autocovariance of the stationary stochastic process X_i = I_{F_j}(\theta_{j-1}^{(i),k}) at lag i. The term \sqrt{(1 - p_j)/(N p_j)} in (19) is the c.o.v. of the MCS estimator with N independent samples. The c.o.v. of \hat{p}_j can thus be considered as the one in MCS with an effective number of independent samples N/(1 + γj). The efficiency of the estimator using dependent MCMC samples (γj > 0) is therefore reduced compared to the case when the samples are independent (γj = 0). Hence, γj given by (20) can be considered as a measure of correlation between the states of a Markov chain, and smaller values of γj imply higher efficiency.

Remark 6. Formula (19) was derived assuming that the Markov chain generated according to MMA is ergodic and that the samples generated by different chains are uncorrelated through the indicator function, i.e. E[I_{F_j}(\theta) I_{F_j}(\theta')] - p_j^2 = 0 if θ and θ′ are from different chains. The latter, however, may not always be true, since the seeds for each chain may be dependent. Nevertheless, the expression in (19) provides a useful theoretical description of the c.o.v. of \hat{p}_j.

Remark 7. The autocovariance sequence R_j^{(i)}, i = 0, . . . , Ns − 1, needed for the calculation of γj, can be estimated using the Markov chain samples at the (j − 1)th level by:

R_j^{(i)} \approx \frac{1}{N - i N_c} \sum_{k=1}^{N_c} \sum_{i'=1}^{N_s - i} I_{F_j}(\theta_{j-1}^{(i'),k})\, I_{F_j}(\theta_{j-1}^{(i'+i),k}) - \hat{p}_j^2.  (22)
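For concreteness, the estimates (20)–(22) can be computed from the indicator samples as in the following sketch (hypothetical Python code, not from the paper); `ind` is assumed to be an Ns × Nc array whose (i, k) entry is I_{F_j}(θ_{j−1}^{(i),k}):

```python
import numpy as np

def gamma_j(ind):
    """Correlation factor gamma_j of eq. (20), with the autocovariance
    sequence R_j^(i) estimated from the samples as in eq. (22).
    ind : (Ns, Nc) array of 0/1, ind[i, k] = I_{F_j}(theta_{j-1}^{(i),k})."""
    Ns, Nc = ind.shape
    N = Ns * Nc
    p_hat = ind.mean()
    R = np.empty(Ns)
    for i in range(Ns):
        # eq. (22): average of lag-i products over all chains
        R[i] = (ind[: Ns - i] * ind[i:]).sum() / (N - i * Nc) - p_hat**2
    lags = np.arange(1, Ns)
    return 2.0 * np.sum((1.0 - lags / Ns) * R[1:] / R[0])
```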

Note that, in general, γj depends on the number of samples Ns in the Markov chain, the conditional probability pj, the intermediate failure domains F_{j−1} and Fj, and the standard deviation σj of the proposal PDFs Sk(·|μ) = N(·|μ, σj²). According to the basic description of the Subset Simulation algorithm given in Section 2, pj = p0 for all j and Ns = 1/p0. The latter, as has already been mentioned, is not strictly necessary, yet convenient. In this section, the value p0 is chosen to be 0.1, as in the original paper [2]. In this setting, γj depends only on the standard deviation σj and the geometry of F_{j−1} and Fj. For a given reliability problem (i.e. for a given performance function g that defines the domains Fj for all j), σ_j^{opt} is said to be the optimal spread of the proposal PDFs at level j if it minimizes the value of γj:

\sigma_j^{opt} = \arg\min_{\sigma_j > 0} \gamma_j(\sigma_j).  (23)

We will refer to γj = γj(σj) as the γ-efficiency of the Modified Metropolis algorithm with proposal PDFs N(·|μ, σj²) at level j.

Consider two examples of the sequence of intermediate failure domains.

Example 1 (Exterior of a ball). Let θ = r·e ∈ R^d, where e is a unit vector and r = ||θ||. For many reasonable performance functions g, if r is large enough, then θ ∈ F = {θ ∈ R^d : g(θ) > b}, i.e. θ is a failure point, regardless of e. Therefore, the exterior of a ball, B_r = {θ ∈ R^d : ||θ|| ≥ r}, can serve as an idealized model of many failure domains. Define the intermediate failure domains as follows:

F_j = B_{r_j},  (24)

where the radii rj are chosen such that P(Fj|F_{j−1}) = p0, i.e. r_j^2 = F_{\chi_d^2}^{-1}(1 - p_0^j), where F_{\chi_d^2} denotes the cumulative distribution function (CDF) of the chi-square distribution with d degrees of freedom. The dimension d is chosen to be 10³.

Example 2 (Linear case). Consider a linear reliability problem with performance function g(θ) = ⟨a, θ⟩ + b, where a ∈ R^d and b ∈ R are fixed coefficients. The corresponding intermediate failure domains Fj are half-spaces defined as follows:

F_j = \{\theta \in \mathbb{R}^d : \langle \theta, e_a \rangle \ge \beta_j\},  (25)

where e_a = a/||a|| is the unit normal to the hyperplane specified by g, and the values of βj are chosen such that P(Fj|F_{j−1}) = p0, i.e. \beta_j = \Phi^{-1}(1 - p_0^j), where Φ denotes the CDF of the standard normal distribution. The dimension d is chosen to be 10³.

For both examples, γj as a function of σj is plotted in Fig. 3, and the approximate values of the optimal spread σ_j^{opt} are given in Table 1 for simulation levels j = 1, . . . , 6. As expected, the optimal spread σ_j^{opt} decreases when j increases and, based on the numerical values in Table 1, σ_j^{opt} seems to converge to approximately 0.3 and 0.4 in Examples 1 and 2, respectively. The following properties of the function γj = γj(σj) are worth mentioning:

(i) γj increases very rapidly when σj goes to zero;
(ii) γj has a deep trough around the optimal value σ_j^{opt} when j is large (e.g., j ≥ 4).

Interestingly, these observations are consistent with the statement given in [17] and cited above: if one cannot be optimal (due to (ii), it is indeed difficult to achieve optimality), it is better to use too high a value of σj than too low.

The question of interest now is what gain in efficiency can we achieve by a proper scaling of the Modified Metropolis algorithm when calculating small failure probabilities? We consider the following values of the failure probability: pF = 10^{−k}, k = 2, . . . , 6. The c.o.v. of the failure probability estimates obtained by Subset Simulation are given in Figs. 4 and 5 for Examples 1 and 2, respectively.

Table 1. Approximate values of the optimal spread for different simulation levels.

Simulation level j       1    2    3    4    5    6
Example 1, σ_j^{opt}     0.9  0.7  0.4  0.3  0.3  0.3
Example 2, σ_j^{opt}     1.1  0.8  0.6  0.4  0.4  0.4

Fig. 3. The γ-efficiency of the Modified Metropolis algorithm as a function of the spread σ for simulation levels j = 1, . . . , 6.
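The level thresholds in both examples are available in closed form, so they can be computed directly. A minimal sketch (hypothetical Python code, not from the paper, assuming SciPy is available):

```python
import numpy as np
from scipy import stats

d, p0 = 1000, 0.1
levels = np.arange(1, 7)

# Example 1 (exterior of a ball): r_j^2 = F_{chi2_d}^{-1}(1 - p0^j), eq. (24)
r = np.sqrt(stats.chi2.ppf(1 - p0**levels, df=d))

# Example 2 (linear case): beta_j = Phi^{-1}(1 - p0^j), eq. (25)
beta = stats.norm.ppf(1 - p0**levels)
print(r, beta)
```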


Fig. 4. The c.o.v. of pF estimates obtained by Subset Simulation for Example 1.

Fig. 5. The c.o.v. of pF estimates obtained by Subset Simulation for Example 2.

The dashed (solid) curves correspond to the case when N = 300 (N = 1000) samples are used per intermediate failure region. For the estimation of each value of the failure probability, two different MMAs are used within SS: the optimal algorithm with σj = σ_j^{opt} (marked with stars), and the reference algorithm with σj = 1 (marked with squares). The corresponding c.o.v.s are denoted by δopt and δ1, respectively. From Figs. 4 and 5 it follows that the smaller pF is, the more important it is to scale MMA optimally. When pF = 10^{−6}, the optimal c.o.v. δopt is approximately 80% of the reference c.o.v. δ1 for both examples, when N = 1000.

Despite its obvious usefulness, the optimal scaling of the Modified Metropolis algorithm is difficult to achieve in practice. First, as shown in Table 1, the values of the optimal spread σ_j^{opt} are different for different reliability problems. Second, even for a given reliability problem, finding σ_j^{opt} is computationally expensive because of property (ii) of γj; and our simulation results show that the qualitative properties (i) and (ii) generally hold for different reliability problems, not only for Examples 1 and 2. Therefore, we look for a heuristic for choosing σj that is easy to implement and yet gives near-optimal behavior.

It has been recognized for a long time that, when using an MCMC algorithm, it is useful to monitor its acceptance rate. Both the γ-efficiency γj and the acceptance rate ρj at level j depend on σj. For Examples 1 and 2, the approximate values of the acceptance rates that correspond to the reference value σj = 1 and the optimal spread σj = σ_j^{opt} are given in Table 2; γj as a function of ρj is plotted in Fig. 6 for simulation levels j = 1, . . . , 6. A key observation is that, contrary to (ii), γj is very flat around the optimal acceptance rate ρ_j^{opt}, which is defined as the acceptance rate that corresponds to the optimal spread, i.e. ρ_j^{opt} = ρ_j(σ_j^{opt}). Furthermore, according to our simulation results this behavior is typical, and not specific just to the considered examples. This observation gives rise to the following heuristic scaling strategy:

At simulation level j ≥ 1 select σj such that the corresponding acceptance rate ρj is between 30% and 50%.

This strategy is easy to implement in the context of Subset Simulation. At each simulation level j, Nc Markov chains are generated. Suppose we do not know the optimal spread σ_j^{opt} for our problem. We start with a reference value, say σ_j^{1:n} = 1, for the first n chains. Based only on these n chains, we calculate the corresponding acceptance rate ρ_j^{1:n}. If ρ_j^{1:n} is too low (i.e. smaller than 30%), we decrease the spread and use σ_j^{n+1:2n} < σ_j^{1:n} for the next n chains. If ρ_j^{1:n} is too large (i.e. larger than 50%), we increase the spread and use σ_j^{n+1:2n} > σ_j^{1:n} for the next n chains. We proceed like this until all Nc Markov chains have been generated. Note that, according to this procedure, σj is kept constant within a single chain and is changed only between chains; hence the Markovian property is not destroyed. The described strategy guarantees that the corresponding scaling of the Modified Metropolis algorithm is nearly optimal.

Fig. 6. The γ-efficiency of the Modified Metropolis algorithm as a function of the acceptance rate for simulation levels j = 1, . . . , 6.
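This batch-wise adaptation is straightforward to code. The sketch below (hypothetical Python code, not from the paper) adjusts σ between batches of n chains by a fixed multiplicative factor, which is one simple way to realize the rule above; the update factor 1.3 is an illustrative assumption:

```python
def adapt_spread(run_chains, Nc, n=10, sigma0=1.0,
                 lo=0.30, hi=0.50, factor=1.3):
    """Heuristic scaling strategy: keep sigma fixed within each batch of n
    chains, and adjust it between batches so the acceptance rate stays in
    [lo, hi]. `run_chains(n, sigma)` is assumed to generate n MMA chains
    with spread sigma and return (samples, acceptance_rate)."""
    sigma, samples = sigma0, []
    for _ in range(Nc // n):
        batch, acc_rate = run_chains(n, sigma)  # sigma constant within chains
        samples.extend(batch)
        if acc_rate < lo:       # too many repeated samples: shrink spread
            sigma /= factor
        elif acc_rate > hi:     # chain moves too locally: enlarge spread
            sigma *= factor
    return samples, sigma
```

Because σ changes only between chains, each chain is still a genuine Markov chain with a fixed transition kernel.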

    4. Optimal choice of conditional failure probability p0

The parameter p0 governs how many intermediate failure domains Fj are needed to reach the target failure domain F, which in turn affects the efficiency of Subset Simulation. A very small value of the conditional failure probability means that fewer intermediate levels are needed to reach F, but it results in a very large number of samples N needed at each level for accurate estimation of the small conditional probabilities pj = P(Fj|F_{j−1}). In the extreme case when p0 ≤ pF, Subset Simulation reduces to standard Monte Carlo simulation. On the other hand, increasing the value of p0 means that fewer samples are needed for accurate estimation at each level, but it increases the number of intermediate conditional levels m. In this section we provide a theoretical basis for the optimal value of the conditional failure probability.

Table 2. Approximate values of the acceptance rates ρj(1) and ρ_j^{opt} that correspond to the reference value σj = 1 and the optimal spread σj = σ_j^{opt}, respectively.

Simulation level j       1    2    3    4    5    6
Example 1, ρj(1)         49%  30%  20%  13%  9%   6%
Example 1, ρ_j^{opt}     51%  39%  47%  50%  44%  40%
Example 2, ρj(1)         53%  35%  23%  16%  11%  8%
Example 2, ρ_j^{opt}     52%  41%  40%  49%  43%  37%

We wish to choose the value p0 such that the coefficient of variation (c.o.v.) of the failure probability estimator \hat{p}_F^{SS} is as small as possible, for the same total number of samples. In [2], an analysis of the statistical properties of the Subset Simulation estimator is given. If the conditional failure domains are chosen so that the corresponding estimates of the conditional probabilities are all equal to p0, and the same number of samples N is used in the simulation at each conditional level, then the c.o.v. of the estimator \hat{p}_F^{SS} for a failure probability p_F = p_0^m (requiring m conditional levels) is approximated by

\delta^2 \approx \frac{m (1 - p_0)}{N p_0} (1 + \bar{\gamma}),  (26)

where \bar{\gamma} is the average correlation factor over all levels (assumed to be insensitive to p0) that reflects the correlation among the MCMC samples in each level and depends on the choice of the spread of the proposal PDFs. Since the total number of samples is NT = mN and the number of conditional levels is m = log pF / log p0, (26) can be rewritten as follows:

\delta^2 \approx \frac{1 - p_0}{p_0 (\log p_0)^2} \cdot \frac{(\log p_F)^2}{N_T} (1 + \bar{\gamma}).  (27)
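The p0-dependence in (27) is captured entirely by the factor (1 − p0)/(p0 (log p0)²), so the optimum can be checked numerically. A minimal sketch (hypothetical Python code, not from the paper):

```python
import numpy as np

# First factor of eq. (27); the remaining factors do not depend on p0.
h = lambda p0: (1 - p0) / (p0 * np.log(p0) ** 2)

p0 = np.linspace(0.01, 0.99, 981)
print(p0[np.argmin(h(p0))])   # approximately 0.2; h is flat over [0.1, 0.3]
```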

Fig. 7. Variation of δ as a function of p0 according to (27) for pF = 10^{−3}, NT = 2000, and γ̄ = 0, 2, 4, 6, 8, 10.

Note that for a given target failure probability pF and total number of samples NT, the second factor in (27) does not depend on p0. Thus, minimizing the first factor to minimize the c.o.v. δ yields the optimal value p_0^{opt} ≈ 0.2. Fig. 7 shows the variation of δ as a function of p0 according to (27) for pF = 10^{−3}, NT = 2000, and γ̄ = 0, 2, 4, 6, 8, 10. This figure indicates that δ is relatively insensitive to p0 around its optimal value. Note that the shape of the trend is invariant with respect to pF, NT and γ̄ because their effects are multiplicative. The figure shows that choosing 0.1 ≤ p0 ≤ 0.3 will practically lead to similar efficiency, and it is not necessary to fine-tune the value of the conditional failure probability p0 as long as Subset Simulation is implemented properly.

    5. Bayesian post-processor for Subset Simulation

In this section we develop a Bayesian post-processor SS+ for the original Subset Simulation algorithm described in Section 2, which provides more information about the value of pF than a single point estimate.

Recall that in SS the failure probability pF is represented as a product of conditional probabilities pj = P(Fj|F_{j−1}), each of which is estimated using (6) for j = 1 and (12) for j = 2, . . . , m. Let nj denote the number of samples θ_{j−1}^{(1)}, . . . , θ_{j−1}^{(N)} that belong to subset Fj. The estimate for probability pj is then:

\hat{p}_j = \frac{n_j}{N},  (28)

and the estimate for the failure probability defined by (1) is:

\hat{p}_F^{SS} = \prod_{j=1}^{m} \hat{p}_j = \prod_{j=1}^{m} \frac{n_j}{N}.  (29)

In order to construct a Bayesian post-processor for SS, we have to replace the frequentist estimates (28) in (29) by their Bayesian analogs. In other words, we have to treat all p1, . . . , pm and pF as stochastic variables and, following the Bayesian approach, proceed as follows:

1. Specify prior PDFs p(pj) for all pj = P(Fj|F_{j−1}), j = 1, . . . , m;
2. Update each prior PDF using the new data D_{j−1} = {θ_{j−1}^{(1)}, . . . , θ_{j−1}^{(N)} ~ π(·|F_{j−1})}, i.e. find the posterior PDFs p(pj|D_{j−1}) via Bayes' theorem;
3. Obtain the posterior PDF p(p_F | ∪_{j=0}^{m−1} D_j) of p_F = \prod_{j=1}^{m} p_j from p(p1|D0), . . . , p(pm|D_{m−1}).


Remark 8. The term "stochastic variable" is used rather than "random variable" to emphasize that it is a variable whose value is uncertain, not random, based on the limited information that we have available, and for which a probability model is chosen to describe the relative plausibility of each of its possible values [7,21]. The failure probability pF is a constant given by (1) which lies in [0, 1], but its exact value is unknown because the integral cannot be evaluated exactly, and so we quantify the plausibility of its values based on the samples that probe the performance function g.

To choose the prior distribution for each pj, we use the Principle of Maximum Entropy (PME), introduced by Jaynes [19]. The PME postulates that, subject to specified constraints, the prior PDF p which should be taken to represent the prior state of knowledge is the one that gives the largest measure of uncertainty, i.e. maximizes Shannon's entropy, which for a continuous variable is given by H(p) = -\int_{-\infty}^{\infty} p(x) \log p(x)\, dx. Since the set of all possible values for each stochastic variable pj is the unit interval, we impose this as the only constraint for p(pj), i.e. supp p(pj) = [0, 1]. It is well known that the uniform distribution is the maximum entropy distribution among all continuous distributions on [0, 1], so

p(p_j) = 1, \qquad 0 \le p_j \le 1.  (30)

Remark 9. We could choose a more informative prior PDF, perhaps based on previous experience with the failure probabilities for similar systems. If the amount of data is large (i.e. N is large), however, then the effect of the prior PDF on the posterior PDF will be negligible if the likelihood function has a unique global maximum. This phenomenon is usually referred to in the literature as the stability or robustness of Bayesian estimation.

Since the initial samples θ_0^{(1)}, . . . , θ_0^{(N)} are i.i.d. according to π, the sequence of zeros and ones, I_{F_1}(\theta_0^{(1)}), . . . , I_{F_1}(\theta_0^{(N)}), can be considered as Bernoulli trials and, therefore, the likelihood function p(D0|p1) is a binomial distribution, where D0 consists of the number of F1-failure samples n_1 = \sum_{k=1}^{N} I_{F_1}(\theta_0^{(k)}) and the total number of samples N. Hence, the posterior distribution of p1 is the beta distribution Be(n1 + 1, N − n1 + 1) (e.g. [16]) with parameters (n1 + 1) and (N − n1 + 1), i.e.

p(p_1 \mid D_0) = \frac{p_1^{n_1} (1 - p_1)^{N - n_1}}{B(n_1 + 1,\, N - n_1 + 1)},  (31)

which is actually the original Bayes' result [6]. The beta function B in (31) is a normalizing constant. If j ≥ 2, all MCMC samples θ_{j−1}^{(1)}, . . . , θ_{j−1}^{(N)} are distributed according to π(·|F_{j−1}); however, they are not independent. Nevertheless, analogously to the frequentist case, where we used these samples for statistical averaging (12) as if they were i.i.d., we can use an expression similar to (31) as a good approximation for the posterior PDF p(pj|D_{j−1}) for j ≥ 2, so:

p(p_j \mid D_{j-1}) = \frac{p_j^{n_j} (1 - p_j)^{N - n_j}}{B(n_j + 1,\, N - n_j + 1)}, \qquad j \ge 1,  (32)

where n_j = \sum_{k=1}^{N} I_{F_j}(\theta_{j-1}^{(k)}) is the number of Fj-failure samples. Note that in Subset Simulation the MCMC samples θ_{j−1}^{(1)}, . . . , θ_{j−1}^{(N)} consist of the states of multiple Markov chains with different initial seeds obtained from previous conditional levels. This makes the approximation (32) more accurate in comparison with the case of a single chain.
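In code, the posteriors (31)–(32) are one line per level. The sketch below (hypothetical Python code, not from the paper, assuming SciPy is available) returns the Beta posterior for each conditional level from the failure counts nj:

```python
from scipy import stats

def level_posteriors(n_fail, N):
    """Posterior Beta(n_j + 1, N - n_j + 1) for each conditional probability
    p_j, per eqs. (31)-(32), under the uniform prior (30).
    n_fail : list of failure counts n_j at levels j = 1,...,m."""
    return [stats.beta(nj + 1, N - nj + 1) for nj in n_fail]

# Example: m = 3 levels, N = 1000 samples per level, n_j = 100 at each level
posts = level_posteriors([100, 100, 100], 1000)
print([p.mean() for p in posts])   # posterior means (n_j + 1)/(N + 2)
```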

Remark 10. It is important to highlight that using the r.h.s. of (32) as the posterior PDF p(p_j | D_{j-1}) is equivalent to considering the samples \theta_{j-1}^{(1)}, \ldots, \theta_{j-1}^{(N)} as independent. This probability model ignores the information that the samples are generated by an MCMC algorithm, just as it ignores the fact that the random number generator used to generate these samples is, in fact, a completely deterministic procedure. One corollary of this neglected information is that generally \delta_{p(p_F)} < \delta_{\hat{p}_F^{SS}}, where \delta_{p(p_F)} is the c.o.v. of the posterior PDF of p_F and \delta_{\hat{p}_F^{SS}} is the c.o.v. of the original SS estimator \hat{p}_F^{SS} (see also the numerical examples in Section 6). Notice, however, that the two c.o.v.s are fundamentally different: \delta_{p(p_F)} is defined based on samples generated from a single run of SS, while the frequentist c.o.v. \delta_{\hat{p}_F^{SS}} is defined based on repeated runs of SS (an infinite number of them!).

The last step is to find the PDF of the product of stochastic variables p_F = \prod_{j=1}^{m} p_j, given the distributions (32) of all the factors p_j.

Remark 11. Products of random (or stochastic) variables play a central role in many different fields such as physics (interacting particle systems), number theory (asymptotic properties of arithmetical functions), statistics (asymptotic distributions of order statistics), etc. The theory of products of random variables is well covered in [15].

In general, finding the distribution of a product of stochastic variables is a non-trivial task. A well-known result is Rohatgi's formula [37]: if X_1 and X_2 are continuous stochastic variables with joint PDF f_{X_1,X_2}, then the PDF of Y = X_1 X_2 is

f_Y(y) = \int_{-\infty}^{+\infty} f_{X_1,X_2}\!\left(x, \frac{y}{x}\right) \frac{1}{|x|}\, dx.    (33)

This result is straightforward to derive, but it is difficult to implement, especially when the number of stochastic variables is more than two. In the special case of a product of independent beta variables, Tang and Gupta [40] derived an exact representation for the PDF and provided a recursive formula for computing the coefficients of this representation.
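For two independent factors, (33) can be evaluated directly by quadrature. A minimal sketch, assuming SciPy and illustrative beta parameters; for X_1, X_2 supported on (0,1) the integrand is non-zero only for x in (y, 1):

```python
# Sketch: numerical evaluation of Rohatgi's formula (33) for Y = X1 * X2
# with independent beta factors (illustrative parameters).
from scipy import stats
from scipy.integrate import quad

X1, X2 = stats.beta(2, 3), stats.beta(4, 2)

def product_pdf(y):
    # f_{X1,X2}(x, y/x) = f1(x) * f2(y/x) by independence.
    val, _ = quad(lambda x: X1.pdf(x) * X2.pdf(y / x) / x, y, 1.0)
    return val

print(product_pdf(0.3))
```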

Theorem 1 ([40]). Let X_1, \ldots, X_m be independent beta variables, X_j \sim Be(a_j, b_j), and let Y = X_1 X_2 \cdots X_m. Then the probability density function of Y can be written as follows:

f_Y(y) = \left( \prod_{j=1}^{m} \frac{\Gamma(a_j + b_j)}{\Gamma(a_j)} \right) y^{a_m - 1} (1-y)^{\sum_{j=1}^{m} b_j - 1} \sum_{r=0}^{\infty} \delta_r^{(m)} (1-y)^r, \quad 0 < y < 1,    (34)

where \Gamma is the gamma function and the coefficients \delta_r^{(m)} are defined by the following recurrence relation:

\delta_r^{(k)} = \frac{\Gamma\!\left(\sum_{j=1}^{k-1} b_j + r\right)}{\Gamma\!\left(\sum_{j=1}^{k} b_j + r\right)} \sum_{s=0}^{r} \frac{(a_k + b_k - a_{k-1})_s}{s!}\, \delta_{r-s}^{(k-1)}, \quad r = 0, 1, \ldots, \; k = 2, \ldots, m,    (35)

with initial values

\delta_0^{(1)} = \frac{1}{\Gamma(b_1)}, \qquad \delta_r^{(1)} = 0 \ \text{for} \ r \ge 1.    (36)

Here, for any real number a \in \mathbb{R}, (a)_s = a(a+1)\cdots(a+s-1) = \Gamma(a+s)/\Gamma(a).

We can obtain the posterior PDF of the stochastic variable p_F by applying this theorem to p_F = \prod_{j=1}^{m} p_j, where p_j \sim Be(n_j+1, N-n_j+1).
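A truncated evaluation of the series (34) with the recurrences (35)-(36) can be sketched as follows. The parameters here are small illustrative values; for SS-sized parameters (b_j of order N) the gamma factors overflow and a log-space implementation would be needed:

```python
# Sketch: truncated Tang-Gupta series (34)-(36) for a product of betas.
import numpy as np
from scipy.special import gamma, poch, factorial   # poch(a, s) = (a)_s

def product_beta_pdf(y, a, b, n_terms=80):
    m = len(a)
    delta = np.zeros(n_terms)
    delta[0] = 1.0 / gamma(b[0])                   # initial values, eq. (36)
    B = b[0]                                       # running sum b_1 + ... + b_{k-1}
    for k in range(1, m):                          # recurrence (35), k = 2..m
        new = np.zeros(n_terms)
        for r in range(n_terms):
            s = np.arange(r + 1)
            inner = np.sum(poch(a[k] + b[k] - a[k-1], s) / factorial(s)
                           * delta[r - s])
            new[r] = gamma(B + r) / gamma(B + b[k] + r) * inner
        delta, B = new, B + b[k]
    const = np.prod(gamma(np.add(a, b)) / gamma(np.array(a)))
    r = np.arange(n_terms)
    return (const * y**(a[-1] - 1) * (1 - y)**(np.sum(b) - 1)
            * np.sum(delta * (1 - y)**r))          # eq. (34), truncated

# Example: product of Be(2, 3) and Be(4, 2).
print(product_beta_pdf(0.3, [2.0, 4.0], [3.0, 2.0]))
```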


Let p^{SS+}(p_F | \cup_{j=0}^{m-1} D_j) denote the right-hand side of (34) with a_j = n_j + 1 and b_j = N - n_j + 1, and let \hat{p}_{MAP}^{SS} be the maximum a posteriori (MAP) estimate, i.e. the mode of p^{SS+}:

\hat{p}_{MAP}^{SS} = \arg\max_{p_F \in [0,1]} p^{SS+}\!\left(p_F \,\middle|\, \cup_{j=0}^{m-1} D_j\right).    (37)

Since the mode of a product of independent stochastic variables is equal to the product of the modes, and the mode of the beta variable X \sim Be(a, b) is (a-1)/(a+b-2), we have:

\hat{p}_{MAP}^{SS} = \operatorname{mode} p^{SS+}\!\left(p_F \,\middle|\, \cup_{j=0}^{m-1} D_j\right) = \prod_{j=1}^{m} \operatorname{mode} p(p_j | D_{j-1}) = \prod_{j=1}^{m} \frac{n_j}{N} = \hat{p}_F^{SS}.    (38)

Thus, the original estimate \hat{p}_F^{SS} of the failure probability obtained by the original Subset Simulation algorithm is just the MAP estimate \hat{p}_{MAP}^{SS} corresponding to the PDF p^{SS+}. This is a consequence of the choice of a uniform prior.
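Numerically, (38) reduces to a product of the per-level sample fractions. A one-line sketch with the same illustrative counts as above:

```python
# Sketch: MAP estimate (37)-(38) as the product of per-level modes n_j / N.
import numpy as np

level_counts, N = [100, 100, 57], 1000
p_map = np.prod([n_j / N for n_j in level_counts])  # equals the SS estimate
print(p_map)                                        # 5.7e-4
```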

Although (34) provides an exact expression for the PDF of Y, it contains an infinite sum that must be replaced by a truncated finite sum in actual computations. This means that in applications one has to use an approximation of the posterior PDF p^{SS+} based on (34). An alternative approach is to approximate the distribution of the product Y = \prod_{j=1}^{m} X_j by a single beta variable \tilde{Y} \sim Be(a, b), where the parameters a and b are chosen so that E[\tilde{Y}] = E[Y] and E[\tilde{Y}^k] is as close to E[Y^k] as possible for 2 \le k \le K, for some fixed K. This idea was first proposed in [42]. In general, the product of beta variables does not follow the beta distribution; nevertheless, it was shown in [13] that the product can be accurately approximated by a beta variable even in the case of K = 2.

Theorem 2 ([13]). Let X_1, \ldots, X_m be independent beta variables, X_j \sim Be(a_j, b_j), and let Y = X_1 X_2 \cdots X_m. Then Y is approximately distributed as \tilde{Y} \sim Be(a, b), where

a = \frac{\mu_1 (\mu_1 - \mu_2)}{\mu_2 - \mu_1^2}, \qquad b = \frac{(1 - \mu_1)(\mu_1 - \mu_2)}{\mu_2 - \mu_1^2},    (39)

and

\mu_1 = E[Y] = \prod_{j=1}^{m} \frac{a_j}{a_j + b_j}, \qquad \mu_2 = E[Y^2] = \prod_{j=1}^{m} \frac{a_j (a_j + 1)}{(a_j + b_j)(a_j + b_j + 1)}.    (40)

It is easy to check that if \tilde{Y} \sim Be(a, b) with a and b given by (39), then the first two moments of the stochastic variables Y and \tilde{Y} coincide, i.e. E[\tilde{Y}] = E[Y] and E[\tilde{Y}^2] = E[Y^2]. The accuracy of the approximation Y \mathrel{\dot\sim} Be(a, b) is discussed in [13].
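The moment matching of Theorem 2 is a few lines of arithmetic. A minimal sketch, with illustrative parameters a_j = n_j + 1, b_j = N - n_j + 1 and a hypothetical helper name:

```python
# Sketch: moment-matched beta approximation (39)-(40) to a product of betas.
import numpy as np

def beta_product_approx(a, b):
    """Return (a, b) of the single beta matching the first two moments."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    mu1 = np.prod(a / (a + b))                            # E[Y], eq. (40)
    mu2 = np.prod(a * (a + 1) / ((a + b) * (a + b + 1)))  # E[Y^2], eq. (40)
    common = (mu1 - mu2) / (mu2 - mu1**2)
    return mu1 * common, (1 - mu1) * common               # eq. (39)

print(beta_product_approx([101, 101, 58], [901, 901, 944]))
```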

Using Theorem 2, we can therefore approximate the posterior distribution p^{SS+} of the stochastic variable p_F by the beta distribution as follows:

p^{SS+}\!\left(p_F \,\middle|\, \cup_{j=0}^{m-1} D_j\right) \approx \tilde{p}^{SS+}\!\left(p_F \,\middle|\, \cup_{j=0}^{m-1} D_j\right) = Be(p_F | a, b), \quad \text{i.e.} \ p_F \mathrel{\dot\sim} Be(a, b),    (41)

where

a = \frac{\prod_{j=1}^{m} \frac{n_j+1}{N+2} \left(1 - \prod_{j=1}^{m} \frac{n_j+2}{N+3}\right)}{\prod_{j=1}^{m} \frac{n_j+2}{N+3} - \prod_{j=1}^{m} \frac{n_j+1}{N+2}}, \qquad b = \frac{\left(1 - \prod_{j=1}^{m} \frac{n_j+1}{N+2}\right) \left(1 - \prod_{j=1}^{m} \frac{n_j+2}{N+3}\right)}{\prod_{j=1}^{m} \frac{n_j+2}{N+3} - \prod_{j=1}^{m} \frac{n_j+1}{N+2}}.    (42)

Since the first two moments of p^{SS+} and \tilde{p}^{SS+} are equal (which also means that the c.o.v.s of p^{SS+} and \tilde{p}^{SS+} are equal), we have:

E_{\tilde{p}^{SS+}}[p_F] = E_{p^{SS+}}[p_F] = \prod_{j=1}^{m} E_{p(p_j|D_{j-1})}[p_j] = \prod_{j=1}^{m} \frac{n_j+1}{N+2},
E_{\tilde{p}^{SS+}}[p_F^2] = E_{p^{SS+}}[p_F^2] = \prod_{j=1}^{m} E_{p(p_j|D_{j-1})}[p_j^2] = \prod_{j=1}^{m} \frac{(n_j+1)(n_j+2)}{(N+2)(N+3)}.    (43)

Notice that

\lim_{N\to\infty} E_{\tilde{p}^{SS+}}[p_F] = \lim_{N\to\infty} \hat{p}_F^{SS}, \quad \text{and} \quad E_{\tilde{p}^{SS+}}[p_F] \approx \hat{p}_F^{SS} \ \text{when} \ N \ \text{is large},    (44)

so the mean value of the approximation \tilde{p}^{SS+} to the posterior PDF p^{SS+} is accurately approximated by the original estimate \hat{p}_F^{SS} of the failure probability p_F.

Let us now summarize the Bayesian post-processor of Subset Simulation. From the algorithmic point of view, SS+ differs from SS only in the produced output. Instead of a single real number as an estimate of p_F, SS+ produces the posterior PDF p^{SS+}(p_F | \cup_{j=0}^{m-1} D_j) of the failure probability, which takes into account both prior information and the sampled data \cup_{j=0}^{m-1} D_j generated by SS, while its approximation \tilde{p}^{SS+}(p_F | \cup_{j=0}^{m-1} D_j) is more convenient for further computations. The posterior PDF p^{SS+} and its approximation \tilde{p}^{SS+} are given by (34) and (41), respectively, where

n_j = \begin{cases} p_0 N, & \text{if } j < m, \\ N_F, & \text{if } j = m, \end{cases}    (45)

and m is the total number of intermediate levels in the run of the algorithm. The relationship between SS and SS+ is given by (38) and (44): the original estimate \hat{p}_F^{SS} of the failure probability based on the samples produced by the Subset Simulation algorithm coincides with the MAP estimate corresponding to p^{SS+}, and it also accurately approximates the mean of \tilde{p}^{SS+}. Also, the c.o.v.s of p^{SS+} and \tilde{p}^{SS+} coincide and can be computed using the first two moments in (43).
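Pulling (41)-(45) together, a single SS run can be post-processed in a few lines. The following sketch uses a hypothetical helper name and illustrative inputs:

```python
# Sketch: end-to-end SS+ post-processing of one Subset Simulation run.
import numpy as np
from scipy import stats

def ss_plus_posterior(p0, N, m, N_F):
    """Beta approximation (41)-(42) of the posterior PDF of p_F."""
    n = np.array([p0 * N] * (m - 1) + [N_F], dtype=float)   # counts n_j, eq. (45)
    mu1 = np.prod((n + 1) / (N + 2))                         # E[p_F], eq. (43)
    mu2 = np.prod((n + 1) * (n + 2) / ((N + 2) * (N + 3)))   # E[p_F^2], eq. (43)
    common = (mu1 - mu2) / (mu2 - mu1**2)
    return stats.beta(mu1 * common, (1 - mu1) * common)      # eq. (42)

post = ss_plus_posterior(p0=0.1, N=1000, m=3, N_F=57)
print(post.mean(), post.std() / post.mean())   # posterior mean and c.o.v.
```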

Remark 12. Note that to incorporate the uncertainty in the value of p_F, one can use the full PDF \tilde{p}^{SS+} for life-cycle cost analyses, decision making under risk, and so on, rather than just a point estimate of p_F. For instance, a performance loss function L often depends on the failure probability. In this case one can calculate the expected loss given by the following integral:

E[L(p_F)] = \int_0^1 L(p_F)\, \tilde{p}^{SS+}(p_F)\, dp_F,    (46)

which takes into account the uncertainty in the value of the failure probability.
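The integral (46) is a one-dimensional expectation and is easy to evaluate numerically. A sketch with a hypothetical loss function and illustrative posterior parameters:

```python
# Sketch: expected loss (46) under the beta-approximated posterior.
from scipy import stats

posterior = stats.beta(57.0, 999.0)   # illustrative Be(a, b) from (42)
L = lambda p_F: 1e6 * p_F             # hypothetical linear loss function
print(posterior.expect(L))            # integral (46); here 1e6 * E[p_F]
```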

Remark 13. We note that a Bayesian post-processor for Monte Carlo evaluation of the integral (1), denoted MC+, can be obtained as a special case of SS+. The posterior PDF for the failure probability p_F based on N i.i.d. samples D = \{\theta^{(1)}, \ldots, \theta^{(N)}\} from \pi(\cdot) is given by

p^{MC+}(p_F | D) = Be(p_F | n+1, N-n+1),    (47)

where n = \sum_{k=1}^{N} I_F(\theta^{(k)}) is the number of failure samples.

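The MC+ special case (47) is immediate to implement. A sketch with a toy one-dimensional failure event \{\theta \ge 3.09\} under a standard normal \pi (illustrative, with p_F \approx 10^{-3}):

```python
# Sketch of MC+ (47): the Monte Carlo posterior for p_F is a single beta.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
N = 10**5
theta = rng.standard_normal(N)
n = int(np.sum(theta >= 3.09))            # number of failure samples
posterior = stats.beta(n + 1, N - n + 1)  # eq. (47)
print(n / N, posterior.mean(), posterior.std())
```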
6. Illustrative examples

To demonstrate the Bayesian post-processor of Subset Simulation, we consider its application to two different reliability problems: a linear reliability problem and reliability analysis of an elasto-plastic structure subjected to strong seismic ground motion.

6.1. Linear reliability problem

As a first example, consider a linear failure domain. Let d = 10^3 be the dimension of the linear problem and suppose p_F = 10^{-3} is the exact failure probability. The failure domain F is defined as

F = \{\theta \in \mathbb{R}^d : \langle \theta, e \rangle \ge \beta\},    (48)

where e is a unit vector and \beta = \Phi^{-1}(1 - p_F) \approx 3.09 is the reliability index. This example is one where FORM [32,27] gives the exact failure probability in terms of \beta. Note that \theta^* = \beta e is the design point of the failure domain F [26,32]. The failure probability estimate \hat{p}_F^{SS} obtained by SS and the approximation of the posterior PDF \tilde{p}^{SS+} obtained by SS+ are given in Fig. 8, based on N = 10^3 samples at each level (m = 3 levels were needed). Observe that \tilde{p}^{SS+} is quite narrowly focused (with mean \mu_{\tilde{p}^{SS+}} = 1.064 \times 10^{-3} and c.o.v. \delta_{\tilde{p}^{SS+}} = 0.16) around \hat{p}_F^{SS} = 1.057 \times 10^{-3}, which is very close to the exact value. Note that the frequentist c.o.v. of the original SS estimator \hat{p}_F^{SS} is \delta_{\hat{p}_F^{SS}} = 0.28 (based on 50 independent runs of the algorithm).

[Fig. 8. The failure probability estimate \hat{p}_F^{SS} obtained by SS (N = 1000) and the approximation of the posterior PDF \tilde{p}^{SS+} obtained by SS+ (N = 1000). The posterior PDF has mean \mu_{\tilde{p}^{SS+}} = 1.064 \times 10^{-3} and c.o.v. \delta_{\tilde{p}^{SS+}} = 0.16.]
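The shape of the posterior in Fig. 8 can be reproduced from the beta approximation (41)-(42). The last-level count N_F = 106 below is a hypothetical choice, made so that \prod_j n_j / N matches the reported estimate of about 1.06 \times 10^{-3} (the exact count is not given in the paper):

```python
# Sketch: reconstructing the Fig. 8 posterior via the beta approximation.
import numpy as np
from scipy import stats

N, n = 1000, np.array([100.0, 100.0, 106.0])   # counts n_j; N_F is illustrative
mu1 = np.prod((n + 1) / (N + 2))
mu2 = np.prod((n + 1) * (n + 2) / ((N + 2) * (N + 3)))
c = (mu1 - mu2) / (mu2 - mu1**2)
post = stats.beta(mu1 * c, (1 - mu1) * c)
print(post.mean(), post.std() / post.mean())   # close to 1.06e-3 and 0.16
```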

6.2. Elasto-plastic structure subjected to ground motion

This example of a non-linear system is taken from [1]. Consider a structure that is modeled as a 2D six-story moment-resisting steel frame with two-node beam elements connecting the joints of the frame. The floors are assumed to be rigid in-plane and the joints are assumed to be rigid-plastic. The yield strength is assumed to be 317 MPa for all members. Under service load conditions, the floors and the roof are subjected to uniformly distributed static span loads of 24.7 kN/m and 13.2 kN/m, respectively. For the horizontal motion of the structure, masses are lumped at the floor levels; they include contributions from live loads and the dead loads of the floors and the frame members. The natural frequencies of the first two modes of vibration are computed to be 0.61 Hz and 1.71 Hz. Rayleigh damping is assumed so that the first two modes have 2% of critical damping. For a full description of the structure, see [1].

The structure is subject to uncertain earthquake excitations modeled as a nonstationary stochastic process. To simulate a time history of the ground motion acceleration for given moment magnitude M and epicentral distance r, a discrete-time white noise sequence

W_j = \sqrt{2\pi/\Delta t}\, Z_j, \quad j = 1, \ldots, N_t,

is first generated, where \Delta t = 0.03 s is the sampling time, N_t = 1001 is the number of time instants (which corresponds to a duration of 30 s), and Z_1, \ldots, Z_{N_t} are i.i.d. standard Gaussian variables. The white noise sequence is then modulated (multiplied) by an envelope function e(t; M, r) at the discrete time instants. The discrete Fourier transform is then applied to the modulated white-noise sequence. The resulting spectrum is multiplied with a radiation spectrum A(f; M, r) [1], after which the discrete inverse Fourier transform is applied to transform the sequence back to the time domain, yielding a sample of the ground acceleration time history. The synthetic ground motion a(t; Z, M, r) generated from the model is thus a function of the Gaussian vector Z = (Z_1, \ldots, Z_{N_t})^T and the stochastic excitation model parameters M and r. Here, M = 7 and r = 50 km are used. For more details about the ground motion sampling, refer to [1].

In this example, the uncertainty arises from the seismic excitation, and the uncertain parameters are \theta = Z, the i.i.d. Gaussian sequence Z_1, \ldots, Z_{N_t} that generates the synthetic ground motion. The system response of interest, g(\theta), is defined to be the peak (absolute) interstory drift ratio d_{max} = \max_{i=1,\ldots,6} d_i, where d_i is the maximum absolute interstory drift ratio of the ith story within the duration of study, 30 s.
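The modulate-transform-filter pipeline described above can be sketched as follows. The envelope e(t; M, r) and radiation spectrum A(f; M, r) below are hypothetical placeholders; the actual functional forms are given in [1]:

```python
# Sketch of the synthetic ground-motion pipeline (placeholder envelope
# and spectrum; illustrative only, not the model of [1]).
import numpy as np

def ground_motion(Z, dt=0.03):
    Nt = len(Z)
    t = np.arange(Nt) * dt
    W = np.sqrt(2 * np.pi / dt) * Z                  # discrete-time white noise
    envelope = np.exp(-0.5 * ((t - 10.0) / 5.0)**2)  # placeholder e(t; M, r)
    X = np.fft.rfft(W * envelope)                    # to frequency domain
    f = np.fft.rfftfreq(Nt, d=dt)
    A = 1.0 / np.sqrt(1.0 + (f / 5.0)**2)            # placeholder A(f; M, r)
    return np.fft.irfft(X * A, n=Nt)                 # back to time domain

rng = np.random.default_rng(1)
a = ground_motion(rng.standard_normal(1001))         # one acceleration sample
```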

The failure domain F \subset \mathbb{R}^{N_t} is defined as the exceedance of the peak interstory drift ratio in any one of the stories within the duration of study. That is,

F = \{\theta \in \mathbb{R}^{N_t} : d_{max}(\theta) > b\},    (49)

where b is a prescribed critical threshold. In this example, b = 0.5% is considered, which, according to [43], corresponds to the operational damage level. For this damage level, the structure may have a small amount of yielding.

The failure probability is estimated to be p_F = 8.9 \times 10^{-3} (based on 4 \times 10^4 Monte Carlo samples). In the application of Subset Simulation, three different implementation scenarios are considered: N = 500, N = 1000, and N = 2000 samples are simulated at each conditional level. The failure probability estimates \hat{p}_F^{SS} obtained by SS for these scenarios and the approximations of the corresponding posterior PDFs \tilde{p}^{SS+} obtained by SS+ are given in Fig. 9. Observe that the more samples are used (i.e. the more information about the system is extracted), the more narrowly \tilde{p}^{SS+} is focused around \hat{p}_F^{SS}, as expected. The coefficients of variation are \delta_{\tilde{p}^{SS+}} = 0.190, 0.134, and 0.095 for N = 500, N = 1000, and N = 2000, respectively. The corresponding frequentist coefficients of variation of the original SS estimator \hat{p}_F^{SS} are \delta_{\hat{p}_F^{SS}} = 0.303, 0.201, and 0.131 (based on 50 independent runs of the algorithm). In SS+, the coefficient of variation \delta_{\tilde{p}^{SS+}} can be considered as a measure of uncertainty, based on the generated samples.

[Fig. 9. The failure probability estimates \hat{p}_F^{SS} obtained by SS and the approximations of the posterior PDF \tilde{p}^{SS+} obtained by SS+ for three computational scenarios: N = 500, N = 1000, and N = 2000 samples at each conditional level.]
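The trend of the three c.o.v.s in Fig. 9 follows directly from (43). The per-level counts below are an illustrative reconstruction (p_0 = 0.1 and a last-level fraction of about 0.89 \approx 8.9 \times 10^{-3} / 0.1^2), not values reported in the paper:

```python
# Sketch: posterior c.o.v.s for the three Fig. 9 scenarios via eq. (43).
import numpy as np

def posterior_cov(n, N):
    mu1 = np.prod((n + 1) / (N + 2))
    mu2 = np.prod((n + 1) * (n + 2) / ((N + 2) * (N + 3)))
    return np.sqrt(mu2 - mu1**2) / mu1

for N in (500, 1000, 2000):
    n = np.array([0.1 * N, 0.1 * N, 0.89 * N])   # illustrative level counts
    print(N, posterior_cov(n, N))                # roughly 0.19, 0.13, 0.095
```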

7. Conclusions

This paper focuses on enhancements to the Subset Simulation (SS) method, an efficient algorithm for computing failure probabilities for general high-dimensional reliability problems proposed by Au and Beck [2].

First, we explore MMA (Modified Metropolis algorithm), an MCMC technique employed within SS. This exploration leads to the following nearly optimal scaling strategy for MMA: at simulation level j \ge 1, select \sigma_j such that the corresponding acceptance rate q_j is between 30% and 50%.

Next, we provide a theoretical basis for the optimal value of the conditional failure probability p_0. We demonstrate that choosing any p_0 \in [0.1, 0.3] will lead to similar efficiency, and it is not necessary to fine-tune the value of the conditional failure probability as long as SS is implemented properly.

Finally, a Bayesian extension SS+ of the original SS method is developed. In SS+, the uncertain failure probability p_F that one is estimating is modeled as a stochastic variable whose possible values belong to the unit interval. Instead of a single real number as an estimate as in SS, SS+ produces the posterior PDF p^{SS+}(p_F) of the failure probability, which takes into account both prior information and the information in the samples generated by SS. This PDF quantifies the uncertainty in the value of p_F based on the samples and prior information, and it may be used in risk analyses to incorporate this uncertainty; its approximation \tilde{p}^{SS+}(p_F) is more convenient for further computations. The original SS estimate corresponds to the most probable value in the Bayesian approach.

Acknowledgements

This work was supported by the National Science Foundation, under award number EAR-0941374 to the California Institute of Technology. This support is gratefully acknowledged. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect those of the National Science Foundation. Partial support is acknowledged from the Hong Kong Research Grants Council through the General Research Fund 9041550 (CityU 110210).

Appendix A

In this appendix we give a detailed proof that \pi(\cdot|F) is the stationary distribution for the Markov chain generated by the Modified Metropolis algorithm described in Section 2.

Theorem 3 ([2]). Let \theta^{(1)}, \theta^{(2)}, \ldots be the Markov chain generated by the Modified Metropolis algorithm; then \pi(\cdot|F) is a stationary distribution, i.e. if \theta^{(i)} is distributed according to \pi(\cdot|F), then so is \theta^{(i+1)}.

Proof. Let K denote the transition kernel of the Markov chain generated by MMA. From the structure of the algorithm it follows that K has the following form:

K\!\left(d\theta^{(i+1)} \,\middle|\, \theta^{(i)}\right) = k\!\left(d\theta^{(i+1)} \,\middle|\, \theta^{(i)}\right) + r(\theta^{(i)})\, \delta_{\theta^{(i)}}\!\left(d\theta^{(i+1)}\right),    (50)

where k describes the transitions from \theta^{(i)} to \theta^{(i+1)} \ne \theta^{(i)}, r(\theta^{(i)}) = 1 - \int_F k(d\theta' | \theta^{(i)}) is the probability of remaining at \theta^{(i)}, and \delta_\theta denotes the point mass at \theta (Dirac measure). Note that if \theta^{(i+1)} \ne \theta^{(i)}, then k(d\theta^{(i+1)} | \theta^{(i)}) can be expressed as a product of the component transition kernels:

k\!\left(d\theta^{(i+1)} \,\middle|\, \theta^{(i)}\right) = \prod_{j=1}^{d} k_j\!\left(d\theta_j^{(i+1)} \,\middle|\, \theta_j^{(i)}\right),    (51)

where k_j is the transition kernel for the jth component of \theta^{(i)}. By definition of the algorithm,

k_j\!\left(d\theta_j^{(i+1)} \,\middle|\, \theta_j^{(i)}\right) = S_j\!\left(\theta_j^{(i+1)} \,\middle|\, \theta_j^{(i)}\right) \min\left\{1, \frac{\pi_j(\theta_j^{(i+1)})}{\pi_j(\theta_j^{(i)})}\right\} d\theta_j^{(i+1)} \quad \text{for} \ \theta_j^{(i+1)} \ne \theta_j^{(i)}.    (52)
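One transition of the chain with the kernel (50)-(52) can be sketched as follows. This is not the authors' implementation; it assumes independent standard normal components \pi_j and uniform proposals S_j of width 2\sigma, both illustrative choices:

```python
# Sketch: one Modified Metropolis update consistent with (50)-(52).
import numpy as np

def mma_step(theta, in_F, sigma=1.0, rng=np.random.default_rng()):
    """One MMA transition targeting pi(.|F); in_F(theta) returns a bool."""
    xi = theta.copy()
    for j in range(len(theta)):                        # component-wise pre-accept
        cand = theta[j] + sigma * rng.uniform(-1, 1)   # symmetric proposal S_j
        ratio = np.exp(0.5 * (theta[j]**2 - cand**2))  # pi_j(cand)/pi_j(theta_j)
        if rng.uniform() < min(1.0, ratio):
            xi[j] = cand
    return xi if in_F(xi) else theta                   # accept only if still in F

theta = np.full(1000, 3.2)                             # a point inside the linear F (48)
new = mma_step(theta, lambda th: th.sum() / np.sqrt(len(th)) >= 3.09)
```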

Remark 14. A stationary distribution is unique, and is therefore the limiting distribution for a Markov chain, if the chain is aperiodic and irreducible (see, for example, [41]). In the case of MMA, aperiodicity is guaranteed by the fact that the probability of having a repeated sample \theta^{(i+1)} = \theta^{(i)} is not zero. A Markov chain with stationary distribution \pi(\cdot|F) is irreducible if, for any initial state, it has positive probability of entering any set to which \pi(\cdot|F) assigns positive probability. It is clear that MMA with standard proposal distributions (e.g. Gaussian, uniform, log-normal, etc.) generates irreducible Markov chains. In this case, \pi(\cdot|F) is therefore the unique stationary distribution of the MMA Markov chain.

    References

[1] Au SK. Reliability-based design sensitivity by efficient simulation. Comput Struct 2005;83:1048–61.
[2] Au SK, Beck JL. Estimation of small failure probabilities in high dimensions by subset simulation. Prob Eng Mech 2001;16(4):263–77.
[3] Au SK, Beck JL. First-excursion probabilities for linear systems by very efficient importance sampling. Prob Eng Mech 2001;16(3):193–207.
[4] Au SK, Beck JL. Subset Simulation and its application to seismic risk based on dynamic analysis. J Eng Mech 2003;129(8):901–17.
[5] Au SK, Ching J, Beck JL. Application of subset simulation methods to reliability benchmark problems. Struct Safe 2007;29:183–93.
[6] Bayes T. An essay towards solving a problem in the doctrine of chances. Philos Trans Roy Soc London 1763;53:370–418 [reprinted in Biometrika 1958;45:296–315].
[7] Beck JL. Bayesian system identification based on probability logic. Struct Control Health Monit 2010;17:825–47.
[8] Bedard M, Rosenthal JS. Optimal scaling of Metropolis algorithms: heading toward general target distributions. Can J Stat 2008;36:483–503.
[9] Ching J, Au SK, Beck JL. Reliability estimation of dynamical systems subject to stochastic excitation using subset simulation with splitting. Comput Methods Appl Mech Eng 2005;194(12–16):1557–79.
[10] Ching J, Beck JL, Au SK. Hybrid subset simulation method for reliability estimation of dynamical systems subject to stochastic excitation. Prob Eng Mech 2005;20(3):199–214.
[11] Cox RT. Probability, frequency, and reasonable expectation. Am J Phys 1946;14:1–13.
[12] Cox RT. The algebra of probable inference. Baltimore: The Johns Hopkins University Press; 1961.
[13] Fan D-Y. The distribution of the product of independent beta variables. Commun Stat Theor Methods 1991;20(12):4043–52.
[14] Doob JL. Stochastic processes. New York: Wiley; 1953.
[15] Galambos J, Simonelli I. Products of random variables: applications to problems of physics and to arithmetical functions. New York, Basel: Chapman & Hall/CRC Pure and Applied Mathematics; 2004.
[16] Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian data analysis. 2nd ed. CRC Press; 2003.
[17] Gelman A, Roberts GO, Gilks WR. Efficient Metropolis jumping rules. Bayesian Stat 1996;5:599–607.
[18] Hastings WK. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 1970;57(1):97–109.
[19] Jaynes ET. Information theory and statistical mechanics. Phys Rev 1957;106:620–30.
[20] Jaynes ET. In: Rosenkrantz RD, editor. Papers on probability, statistics, and statistical physics. Kluwer Academic Publishers; 1983.
[21] Jaynes ET. In: Bretthorst GL, editor. Probability theory: the logic of science. Cambridge University Press; 2003.
[22] Jeffreys H. Theory of probability. 3rd revised ed., 1961. Oxford: Oxford University Press; 1939.
[23] Katafygiotis LS, Cheung SH. A two-stage subset simulation-based approach for calculating the reliability of inelastic structural systems subjected to Gaussian random excitations. Comput Methods Appl Mech Eng 2005;194(12–16):1581–95.
[24] Katafygiotis LS, Cheung SH. Application of spherical subset simulation method and auxiliary domain method on a benchmark reliability study. Struct Safe 2007;29(3):194–207.
[25] Katafygiotis LS, Moan T, Cheung SH. Auxiliary domain method for solving multi-objective dynamic reliability problems for nonlinear structures. Struct Eng Mech 2007;25(3):347–63.
[26] Katafygiotis LS, Zuev KM. Geometric insight into the challenges of solving high-dimensional reliability problems. Prob Eng Mech 2008;23(2–3):208–18.
[27] Koo H, Der Kiureghian A. FORM, SORM and simulation techniques for nonlinear random vibrations. Report No. UCB/SEMM-2003/01. Berkeley, CA: Department of Civil & Environmental Engineering, University of California; February 2003.
[28] Koutsourelakis P, Pradlwarter HJ, Schuëller GI. Reliability of structures in high dimensions. Part I: Algorithms and applications. Prob Eng Mech 2004;19(4):409–17.
[29] Laplace PS. Théorie analytique des probabilités. Paris: Courcier; 1812.
[30] Laplace PS. Philosophical essay on probability. New York: Dover Publications; 1951.
[31] Liu JS. Monte Carlo strategies in scientific computing. New York: Springer-Verlag; 2001.
[32] Melchers R. Structural reliability analysis and prediction. Chichester: John Wiley & Sons; 1999.
[33] Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E. Equation of state calculations by fast computing machines. J Chem Phys 1953;21(6):1087–92.
[34] Neal RM. Probabilistic inference using Markov chain Monte Carlo methods. Technical Report CRG-TR-93-1, Dept. of Computer Science, University of Toronto; 1993.
[35] Robert CP, Casella G. Monte Carlo statistical methods. 2nd ed. New York: Springer-Verlag; 2004.
[36] Roberts GO, Rosenthal JS. Optimal scaling of discrete approximations to Langevin diffusions. J R Stat Soc Ser B Stat Methodol 1998;60:255–68.
[37] Rohatgi VK. An introduction to probability theory and mathematical statistics. New York: Wiley; 1976.
[38] Schuëller GI. Efficient Monte Carlo simulation procedures in structural uncertainty and reliability analysis: recent advances. Struct Eng Mech 2009;32(1):1–20.
[39] Schuëller GI, Pradlwarter HJ, Beck JL, Au SK, Katafygiotis LS, Ghanem R. Benchmark study on reliability estimation in higher dimensions of structural systems: an overview. In: Proceedings of the 6th European conference on structural dynamics. Rotterdam: Millpress; 2005.
[40] Tang J, Gupta AK. On the distribution of the product of independent beta random variables. Stat Prob Lett 1984;2:165–8.
[41] Tierney L. Markov chains for exploring posterior distributions. Ann Stat 1994;22(4):1701–62.
[42] Tukey JW, Wilks SS. Approximation of the distribution of the product of beta variables by a single beta variable. Ann Math Stat 1946;17(3):318–24.
[43] Vision 2000: Performance based seismic engineering of buildings. Tech. rep. Sacramento, CA: Structural Engineers Association of California; 2000.
[44] Zuev KM, Katafygiotis LS. Horseracing Simulation algorithm for evaluation of small failure probabilities. Prob Eng Mech 2011;26(2):157–64.
[45] Zuev KM, Katafygiotis LS. Modified Metropolis–Hastings algorithm with delayed rejection. Prob Eng Mech 2011;26(3):405–12.
