Ch4. Variance Reduction Techniques

Zhang Jin-Ting
Department of Statistics and Applied Probability
July 17, 2012



Outline

- Introduction
- The Basic Problem
- Variance Reduction Techniques

Introduction

- This chapter aims to improve the Monte Carlo integration estimator by reducing its variance using several useful techniques:
  - Stratified Sampling
  - Importance Sampling
  - Control Variates Method
  - Antithetic Variates Method


The Integration Problem

- Suppose we want to estimate an integral over some region, such as

    I_A = \int_S k(x)\,dx,

  where S is a subset of R^d, x denotes a generic point of R^d, and k is a given real-valued function on S; or

    I_B = \int_{R^d} h(x) f(x)\,dx,

  where h is a real-valued function on R^d and f is a given pdf on R^d.

The Transformed Problem: Monte Carlo Integration

- It is clear that I_B can be written as an expectation: I_B = E(h(X)), where X ~ f.
- Also, extend the definition of k to all of R^d by setting k(x) = 0 for every x that is not in S; then

    I_A = \int_{R^d} k(x)\,dx = \int_{R^d} \frac{k(x)}{f(x)} f(x)\,dx = E\left[\frac{k(X)}{f(X)}\right].   (1)


- Notice that k(x)/f(x) is well-defined except where f equals 0, which is a set of probability 0.
- This is a simple trick that will be especially useful in the method known as Importance Sampling.


Simple Sampling

- This leads to a natural Monte Carlo strategy for estimating the value of I_B, say.
- If we can generate iid random variables X_1, X_2, ... whose common pdf is f, then for every n,

    I_n = \frac{1}{n} \sum_{i=1}^{n} h(X_i)

  is an unbiased estimator of I_B.


- Moreover, the strong law of large numbers implies that I_n converges to I_B with probability 1 as n \to \infty.
- This method for estimating I_B is called simple sampling.

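The simple sampling estimator above can be sketched in a few lines of Python. This is a minimal illustration, not from the slides; the particular choices h(x) = e^x and X ~ U[0,1] (so that I_B = e - 1) are assumptions made only for demonstration:

```python
import math
import random

def simple_sampling(h, sample, n):
    """Unbiased Monte Carlo estimate of I_B = E[h(X)], X drawn by sample()."""
    return sum(h(sample()) for _ in range(n)) / n

random.seed(0)
# Illustrative target: I_B = E[h(U)] with U ~ U[0,1] and h(x) = exp(x),
# whose true value is e - 1.
est = simple_sampling(math.exp, random.random, 200_000)
print(est)  # close to e - 1
```

By the strong law of large numbers, increasing n drives the estimate toward the true integral with probability 1.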

The Variance Reduction Problem

- The variance of the simple sampling estimator I_n of I_B is

    var(I_n) = \frac{var(h(X))}{n} = \frac{\int_S h(x)^2 f(x)\,dx - I_B^2}{n}.   (2)

- The variance of the estimator determines the size of the confidence interval.
- The n in the denominator is hard to avoid in Monte Carlo, but there are various ways to reduce the numerator.


- The goal of this chapter is to explore alternative sampling schemes that can achieve smaller variance for the same amount of computational effort.

Stratified Sampling

Step 1: Range Partition

- Stratified sampling is a powerful and commonly used technique in population surveys and is also very useful in Monte Carlo computations.
- To evaluate I_B, stratified sampling partitions S into several disjoint sets S^{(1)}, ..., S^{(M)} (so that S = \cup_{i=1}^{M} S^{(i)}).


- For i = 1, ..., M, let

    a_i = \int_{S^{(i)}} f(x)\,dx = P(X \in S^{(i)}).

- Observe that a_1 + ... + a_M = 1. Fix integers n_1, ..., n_M such that n_1 + ... + n_M = n.


Step 2: Sub-sampling

- For each i, generate n_i samples X^{(i)}_1, ..., X^{(i)}_{n_i} from S^{(i)} having the conditional pdf

    g(x) = f(x)/a_i  if x \in S^{(i)},  and 0 otherwise.

- Let T_i = n_i^{-1} \sum_{j=1}^{n_i} h(X^{(i)}_j). Then

    E(T_i) = \int_{S^{(i)}} h(x) \frac{f(x)}{a_i}\,dx = \frac{1}{a_i} \int_{S^{(i)}} h(x) f(x)\,dx = I_i / a_i,

  defining I_i = \int_{S^{(i)}} h(x) f(x)\,dx.

Step 3: The Stratified Estimator

- Observe that I_1 + ... + I_M = I_B. The stratified estimator is

    T = \sum_{i=1}^{M} a_i T_i.

- It is unbiased because

    E(T) = \sum_{i=1}^{M} a_i E(T_i) = \sum_{i=1}^{M} a_i I_i / a_i = I_B.


- The variance of T is

    var(T) = \sum_{i=1}^{M} a_i^2 var(T_i),

  where, following from (2),

    var(T_i) = \frac{\int_{S^{(i)}} h(x)^2 \frac{f(x)}{a_i}\,dx - (I_i/a_i)^2}{n_i}.

Theorem (The Foundation of Stratified Sampling)
If n_i = n a_i for i = 1, ..., M, then the stratified estimator has smaller variance than the simple estimator I_n. In fact,

    var(I_n) = var(T) + \frac{1}{n} \sum_{i=1}^{M} a_i \left(\frac{I_i}{a_i} - I_B\right)^2.

- The choice n_i = n a_i, called "proportional allocation", gives a stratified estimator which has smaller variance than the simple estimator.

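The three steps (partition, sub-sample, combine with weights a_i) can be sketched in Python. This is an illustrative sketch, not code from the slides; it assumes X ~ U[0,1] split into M equal-probability strata (so a_i = 1/M) with proportional allocation n_i = n a_i, and the target \int_0^1 e^x dx = e - 1 is chosen only for demonstration:

```python
import math
import random

def stratified_estimate(h, n, M):
    """Stratified estimator T = sum_i a_i * T_i with proportional allocation,
    for X ~ U[0,1] partitioned into M equal strata S_i = [i/M, (i+1)/M]."""
    random.seed(1)
    a = 1.0 / M        # a_i = P(X in S_i); equal for uniform f and equal strata
    n_i = n // M       # proportional allocation n_i = n * a_i
    T = 0.0
    for i in range(M):
        lo = i / M
        # Step 2: draw from the conditional pdf on S_i (uniform on [lo, lo+a]).
        Ti = sum(h(lo + a * random.random()) for _ in range(n_i)) / n_i
        T += a * Ti    # Step 3: weight each stratum mean by a_i
    return T

# Illustrative target: I = \int_0^1 e^x dx = e - 1.
T = stratified_estimate(math.exp, 100_000, 10)
print(T)
```

Per the theorem, with proportional allocation the between-strata variation term is removed, so this estimator's variance is never larger than that of simple sampling with the same n.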

Importance Sampling

Properties of Importance Sampling

- Importance sampling is a very powerful method that can improve Monte Carlo efficiency by orders of magnitude in some problems.
- But it requires caution: an inappropriate implementation can reduce efficiency by orders of magnitude!


The Basic Idea

- The method works by sampling from an artificial probability distribution that is chosen by the user, and then reweighting the observations to get an unbiased estimate.
- The idea is based on the identity (1):

    I_A = \int_{R^d} k(x)\,dx = \int_{R^d} \frac{k(x)}{f(x)} f(x)\,dx = E\left[\frac{k(X)}{f(X)}\right].


- It implies that I_A can be estimated by

    J_n = \frac{1}{n} \sum_{i=1}^{n} \frac{k(X_i)}{f(X_i)},

  where the X_i are iid from f.
- We call J_n the importance sampling estimator based on f.
- The identity (1) implies that J_n is unbiased.


The Importance Sampling Procedure

- Suppose now one is interested in evaluating

    I_B = \int_{R^d} h(x) f(x)\,dx.

  The importance sampling procedure is as follows:
  (a) Draw X_1, ..., X_n from a trial density g.
  (b) Calculate the importance weights w_j = f(X_j)/g(X_j), for j = 1, ..., n.
  (c) Approximate I_B by

    J_{g,n} = \frac{\sum_{j=1}^{n} w_j h(X_j)}{\sum_{j=1}^{n} w_j}.   (3)

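Steps (a)-(c) can be sketched directly in Python. This is a hedged illustration, not from the slides; the target f = N(0,1) with h(x) = x^2 (so I_B = E[X^2] = 1) and the wider trial density g = N(0, 2^2) are assumptions chosen only to make the example concrete:

```python
import math
import random

def norm_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def importance_sampling(h, f_pdf, g_pdf, g_sample, n):
    """Self-normalized importance sampling estimate (3) of E_f[h(X)]."""
    xs = [g_sample() for _ in range(n)]           # (a) draw from trial density g
    ws = [f_pdf(x) / g_pdf(x) for x in xs]        # (b) importance weights w_j
    return sum(w * h(x) for w, x in zip(ws, xs)) / sum(ws)  # (c) J_{g,n}

random.seed(2)
# Illustrative target: E[X^2] = 1 for X ~ N(0,1), estimated via trial N(0, 4).
est = importance_sampling(lambda x: x * x,
                          norm_pdf,
                          lambda x: norm_pdf(x, 0.0, 2.0),
                          lambda: random.gauss(0.0, 2.0),
                          100_000)
print(est)
```

Here the trial density has heavier tails than f, so the weights f/g stay bounded; as the next slide notes, the error depends on how closely g matches the shape of h(x)f(x).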

- Thus, in order to make the estimation error small, one wants to choose g as "close" in shape to h(x)f(x) as possible.

An Alternative Importance Sampling Procedure

- A major advantage of using (3) instead of the unbiased estimate

    \hat{I}_B = \frac{1}{n} \sum_{j=1}^{n} w_j h(X_j)

  is that with the former we need only know the ratio f(X)/g(X) up to a multiplicative constant, whereas with the latter the ratio needs to be known exactly.
- Although it introduces a small bias, (3) often has a smaller mean squared error than the unbiased estimate \hat{I}_B.

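The constant-cancellation property of (3) is easy to check numerically. The sketch below (an illustration with assumed ingredients, not from the slides) uses a uniform trial density on [0,1] and an unnormalized target f(x) ∝ x^2, i.e. the Beta(3,1) density 3x^2 with its constant deliberately dropped; the self-normalized estimate of E_f[X] = 3/4 is unchanged whether or not the constant is included:

```python
import random

def self_normalized(h, f_unnorm, g_pdf, xs):
    """J_{g,n} from (3): any constant factor in f cancels between
    the numerator and the denominator."""
    ws = [f_unnorm(x) / g_pdf(x) for x in xs]
    return sum(w * h(x) for w, x in zip(ws, xs)) / sum(ws)

random.seed(3)
g = lambda x: 1.0                    # trial density: U[0,1]
xs = [random.random() for _ in range(50_000)]
f_unnorm = lambda x: x ** 2          # proportional to the Beta(3,1) pdf 3x^2
h = lambda x: x                      # target: E_f[X] = 3/4

est1 = self_normalized(h, f_unnorm, g, xs)             # constant omitted
est2 = self_normalized(h, lambda x: 3 * x ** 2, g, xs) # constant included
print(est1, est2)                                      # identical values
```

The unbiased form \hat{I}_B, by contrast, would return three times the correct answer if the constant 3 were dropped.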

Example 1

- Let h(x) = 4\sqrt{1 - x^2}, x \in [0, 1]. Let us imagine that we do not know how to evaluate I = \int_0^1 h(x)\,dx (which is \pi, of course).

Using Simple Sampling

- The simple sampling estimate is

    I_n = \frac{1}{n} \sum_{i=1}^{n} 4\sqrt{1 - U_i^2},

  where the U_i are iid U[0,1] random variables.
- This is unbiased, with variance

    var(I_n) = \frac{1}{n}\left(\int_0^1 h(x)^2\,dx - I^2\right) = \frac{1}{n}\left(\int_0^1 16(1 - x^2)\,dx - \pi^2\right) = \frac{0.797}{n}.


Using Inappropriate Importance Sampling

- Consider the importance sampling estimate based on the pdf g_b(x) = 2x, x \in [0, 1].
- It is easy to generate Y_i ~ g_b (the cdf is F(t) = t^2, so we can set Y_i = F^{-1}(U_i) = \sqrt{U_i}, where U_i ~ U[0, 1]).
- The importance sampling estimator is

    J_n^{(b)} = \frac{1}{n} \sum_{i=1}^{n} h(Y_i)/g_b(Y_i) = \frac{1}{n} \sum_{i=1}^{n} \frac{4\sqrt{1 - Y_i^2}}{2 Y_i}.


- J_n^{(b)} has mean I and variance

    var(J_n^{(b)}) = \frac{1}{n} var\left(\frac{h(Y)}{g_b(Y)}\right) = \frac{1}{n} \int_0^1 \left(\frac{h(x)}{g_b(x)} - I\right)^2 g_b(x)\,dx = +\infty,

  since the integrand behaves like 8/x near x = 0.
- Hence, the trial density g_b(x) = 2x is very bad, and we need to try a different one.


Using Appropriate Importance Sampling

- Let g_c(x) = (4 - 2x)/3, x \in [0, 1].
- The importance sampling estimator is

    J_n^{(c)} = \frac{1}{n} \sum_{i=1}^{n} \frac{4\sqrt{1 - Y_i^2}}{(4 - 2Y_i)/3},

  whose variance is

    var(J_n^{(c)}) = \frac{1}{n} var\left(\frac{h(Y)}{g_c(Y)}\right) = \frac{1}{n} \int_0^1 \left(\frac{h(x)}{g_c(x)} - I\right)^2 g_c(x)\,dx
                   = \frac{1}{n}\left[\int_0^1 \frac{16(1 - x^2)}{(4 - 2x)/3}\,dx - \pi^2\right] = 0.224/n.


- Thus, the importance sampling estimate of (c) can achieve the same size confidence interval as the simple sampling estimate of (a) while using only about one third as many generated random variables (0.224/0.797 ≈ 0.28).
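Example 1 can be reproduced in a short Python sketch comparing the two estimators (illustrative code, not from the slides; the inverse-cdf formula t = 2 - \sqrt{4 - 3u} for sampling from g_c is derived here from G(t) = (4t - t^2)/3 and is not given in the slides):

```python
import math
import random

random.seed(4)
n = 100_000
h = lambda x: 4.0 * math.sqrt(1.0 - x * x)

# (a) Simple sampling: variance 0.797/n.
I_simple = sum(h(random.random()) for _ in range(n)) / n

# (c) Importance sampling with g_c(x) = (4 - 2x)/3: variance 0.224/n.
# Inverse-cdf sampling: G(t) = (4t - t^2)/3 = u  gives  t = 2 - sqrt(4 - 3u).
g_c = lambda x: (4.0 - 2.0 * x) / 3.0

def draw_gc():
    return 2.0 - math.sqrt(4.0 - 3.0 * random.random())

I_imp = sum(h(y) / g_c(y) for y in (draw_gc() for _ in range(n))) / n

print(I_simple, I_imp)  # both close to pi
```

Note that the badly chosen trial g_b(x) = 2x is deliberately not run here: its estimator has infinite variance, so individual runs can be wildly off no matter how large n is.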

Control Variates Method

The Main Idea

- In this method, one uses a control variate C, which is correlated with the sample X, to produce a better estimate.

The Procedure

- Suppose the estimation of \mu = E(X) is of interest and \mu_C = E(C) is known.
- Then we can construct Monte Carlo samples of the form

    X(b) = X - b(C - \mu_C),

  which have the same mean as X, but a new variance

    var(X(b)) = var(X) - 2b\,Cov(X, C) + b^2 var(C).


- If the computation of Cov(X, C) and var(C) is easy, then we can let b = Cov(X, C)/var(C), in which case

    var(X(b)) = (1 - \rho_{XC}^2)\,var(X) < var(X).

A Special Case

- Another situation is when we know only that E(C) is equal to \mu. Then we can form X(b) = bX + (1 - b)C.
- It is easy to show that if C is correlated with X, we can always choose a proper b so that X(b) has a smaller variance than X.

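A minimal control-variates sketch in Python (an illustration, not from the slides): the target \mu = E[e^U] = e - 1 with U ~ U[0,1] and the control C = U with known \mu_C = 1/2 are assumed choices, and the optimal coefficient b = Cov(X, C)/var(C) is estimated from the sample itself:

```python
import math
import random

random.seed(5)
n = 100_000
us = [random.random() for _ in range(n)]
xs = [math.exp(u) for u in us]   # X = h(U); target mu = E[e^U] = e - 1
cs = us                          # control variate C = U with known mu_C = 1/2

x_bar = sum(xs) / n
c_bar = sum(cs) / n
cov_xc = sum((x - x_bar) * (c - c_bar) for x, c in zip(xs, cs)) / n
var_c = sum((c - c_bar) ** 2 for c in cs) / n
b = cov_xc / var_c               # b = Cov(X, C)/var(C), estimated empirically

# X(b) = X - b(C - mu_C): same mean, variance shrunk to (1 - rho^2) var(X).
est = x_bar - b * (c_bar - 0.5)
print(est)
```

Because e^U and U are very highly correlated, the residual variance (1 - \rho^2) var(X) here is a small fraction of var(X), so the corrected mean is far more accurate than x_bar alone. (Estimating b from the same sample introduces a small bias, negligible for large n.)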

Antithetic Variates Method

The Main Idea

- Suppose U is a random number used to produce a sample X from a distribution with cdf F, that is, X = F^{-1}(U); then X' = F^{-1}(1 - U) also follows distribution F.
- More generally, if g is a monotone function, then

    [g(u_1) - g(u_2)][g(1 - u_1) - g(1 - u_2)] \le 0

  for any u_1, u_2 \in [0, 1].


- For two independent uniform random variables U_1 and U_2, we have

    E\{[g(U_1) - g(U_2)][g(1 - U_1) - g(1 - U_2)]\} = 2\,Cov(X, X') \le 0,

  where X = g(U) and X' = g(1 - U).
- Therefore, var[(X + X')/2] \le var(X)/2, implying that using the pair X and X' is better than using two independent Monte Carlo draws for estimating E(X).


Example 2

- We return once more to the problem of estimating the integral I = \int_0^1 4\sqrt{1 - x^2}\,dx.
- Choose a large even value of n. As usual, our simple estimator and its variance are

    I_n = \frac{1}{n} \sum_{i=1}^{n} h(U_i),   var(I_n) = 0.797/n.


- The corresponding antithetic estimator and its variance are

    I_n^{An} = \frac{1}{n} \sum_{i=1}^{n/2} [h(U_i) + h(1 - U_i)],

    var(I_n^{An}) = \frac{1}{n^2} \cdot \frac{n}{2} [var(h(U_1)) + 2\,Cov(h(U_1), h(1 - U_1)) + var(h(1 - U_1))]
                  = \frac{1}{n} [var(h(U_1)) + Cov(h(U_1), h(1 - U_1))]
                  = 0.219/n.
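Example 2 can be sketched in Python as follows (an illustration, not code from the slides; h is the quarter-circle integrand from Example 1, and since h is monotone on [0,1] the antithetic pairing is guaranteed to reduce variance):

```python
import math
import random

random.seed(6)
n = 100_000  # a large even value of n
h = lambda x: 4.0 * math.sqrt(1.0 - x * x)

# Antithetic estimator: average h(U) and h(1 - U) over n/2 pairs,
# so each uniform draw is used twice with negatively correlated outputs.
total = 0.0
for _ in range(n // 2):
    u = random.random()
    total += h(u) + h(1.0 - u)
I_an = total / n
print(I_an)  # close to pi; variance 0.219/n vs 0.797/n for simple sampling
```

For the same number n of function evaluations, this roughly quarters the variance of the simple estimator while using only half as many generated uniforms.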