MATH 6630 - Applied Statistics: Assignment 3, Question 6.3

RUNXI LIU, CHAD MANNING, PETER MERCURIO

Envelope Function

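It is important to note that we are working with a density proportional to

$$q(x) = \exp\left(-\frac{|x|^3}{3}\right)$$

So, f(x) is only known up to some constant (i.e., $f(x) = c\,q(x)$ for an unknown normalizing constant $c$).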

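Now, the first step towards running the Monte Carlo simulations is determining a suitable envelope function M*g(x), where g(x) is a density we can sample from and M is a constant chosen so that M*g(x) >= q(x) for all x.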

Looking at the graph of q(x), we noticed q(x) is bell shaped and centered about zero. We can use a known bell-shaped distribution as the envelope function. Trying the standard normal, we find it a possible choice, but it requires a multiplicative constant.

Noticing that both are centered about zero, we tried q(0)/stdnorm(0) = 2.506628 as our multiplicative constant.

This is close, but the standard normal is a bit steeper than q(x), so we need a larger constant.
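One way to see how much larger the constant needs to be is to maximize the ratio q(x)/stdnorm(x) numerically. The following is a minimal sketch rather than part of the original assignment code; the names `q`, `ratio`, and `opt` are introduced here for illustration, and `dnorm` is used as the standard normal density.

```r
# Unnormalized target q(x) and its ratio to the standard normal density
q <- function(x) exp(-abs(x)^3 / 3)
ratio <- function(x) q(x) / dnorm(x)

# Ratio at zero: the first constant tried
ratio(0)                      # sqrt(2*pi), about 2.5066

# Largest value of the ratio on [0, 5]; the ratio is symmetric in x,
# so this also covers negative x
opt <- optimize(ratio, interval = c(0, 5), maximum = TRUE)
opt$maximum                   # location of the peak, near x = 1
opt$objective                 # about 2.9612
```

Analytically, the ratio equals $\sqrt{2\pi}\,\exp(x^2/2 - |x|^3/3)$, which peaks at |x| = 1 with value $\sqrt{2\pi}\,e^{1/6} \approx 2.9612$, so any constant at least that large yields a valid envelope.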

Next we tried a multiplicative constant of 3.

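It turns out to be a near perfect choice. It completely covers q(x) with only a minimal gap at the closest points. We will move forward with an envelope function of

$$3\,g(x) = \frac{3}{\sqrt{2\pi}}\exp\left(-\frac{x^2}{2}\right),$$

where g(x) is the standard normal density.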

Helper Functions

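n <- 100000
mu <- 0
sigma <- 1

# Function whose expectation we want: h(x) = x^2
h <- function(x){
  result <- x^2
  return(result)
}

# Unnormalized target density q(x) = exp(-|x|^3 / 3)
f <- function(x){
  result <- exp(-(abs(x)^3)/3)
  return(result)
}

# Normal density with mean mu and standard deviation sigma
g <- function(x){
  result <- (1/(sigma*sqrt(2*pi)))*exp(-((x-mu)^2)/(2*sigma^2))
  return(result)
}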

(a) Importance Sampling

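The basis of importance sampling is the identity

$$E[h(X)] = \int h(x)\,f(x)\,dx = \int h(x)\,\frac{f(x)}{g(x)}\,g(x)\,dx,$$

which suggests drawing $X_1, \dots, X_n$ from g(x) and estimating $E[h(X)]$ by $\frac{1}{n}\sum_{i=1}^{n} h(X_i)\,w^*(X_i)$, with $w^*(X_i) = f(X_i)/g(X_i)$.

However, since f(x) is only known up to some constant, standardized weights must be used.

Thus, $\frac{1}{n}\sum_{i=1}^{n} h(X_i)\,w^*(X_i)$ is replaced with

$$\hat{\mu} = \sum_{i=1}^{n} h(X_i)\,w(X_i), \qquad w(X_i) = \frac{q(X_i)/g(X_i)}{\sum_{j=1}^{n} q(X_j)/g(X_j)}.$$

Using this, we performed an Importance Sampling Monte Carlo simulation in R.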

Importance Sampling Monte Carlo Sim

impSamp <- function(n){
  xi <- rnorm(n, mean = 0, sd = 1)
  wi <- f(xi)/g(xi)
  w_tot <- sum(wi)
  result <- sum(h(xi)*wi/w_tot)
  # no 1/n due to using standardized weights
  return(result)
}

resultA <- impSamp(1000)

σ² = E(X²): resultA = 0.7756169

Importance Sampling Monte Carlo Sim

To get a better idea of the results, we ran Importance Sampling ten times for each of four values of n (1,000; 10,000; 100,000; and 1,000,000), storing each estimate alongside its n.

mA <- matrix(0, nrow = 40, ncol = 2)
for(j in 1:4){
  n <- 100*10^j
  for(k in 1:10){
    mA[(j-1)*10+k,1] <- impSamp(n)
    mA[(j-1)*10+k,2] <- n
  }
}
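The Average and Stdev rows reported for replicated runs can be computed along the following lines. This is a hedged sketch, not the authors' code: it uses only two small values of n to keep the run time short, redefines the sampler in compact form, and the names `q`, `impSampSketch`, `sizes`, `reps`, and `summary_tab` are illustrative assumptions.

```r
# Unnormalized target density q(x) = exp(-|x|^3 / 3)
q <- function(x) exp(-abs(x)^3 / 3)

# Importance sampling estimate of E(X^2) with standardized weights,
# drawing from the standard normal
impSampSketch <- function(n) {
  x <- rnorm(n)
  w <- q(x) / dnorm(x)
  sum(x^2 * w) / sum(w)
}

set.seed(1)
sizes <- c(1000, 10000)

# One column per sample size, ten replications (rows) per column
reps <- sapply(sizes, function(n) replicate(10, impSampSketch(n)))
colnames(reps) <- paste0("n=", sizes)

# Column-wise average and standard deviation of the ten estimates
summary_tab <- rbind(Average = colMeans(reps),
                     Stdev   = apply(reps, 2, sd))
summary_tab
```

Storing replications column-wise avoids the index arithmetic of a long results matrix and makes the per-n summaries one-liners.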

Importance Sampling Results

Each entry is an estimate of σ² = E(X²).

Run      n = 1,000    n = 10,000   n = 100,000  n = 1,000,000
1        0.7516285    0.7634052    0.7734620    0.7763682
2        0.7627245    0.7644562    0.7748544    0.7766396
3        0.7859611    0.7859749    0.7771132    0.7772884
4        0.7741302    0.7841372    0.7809199    0.7754430
5        0.7377784    0.7742504    0.7750274    0.7764379
6        0.7890373    0.7625906    0.7789396    0.7759531
7        0.8097240    0.7629171    0.7750663    0.7753413
8        0.7390862    0.7647078    0.7778266    0.7760890
9        0.7835412    0.7754659    0.7711946    0.7776442
10       0.7587058    0.7788852    0.7769269    0.7770332
Average  0.76923172   0.77167905   0.77613309   0.77642379
Stdev    0.023337119  0.009195734  0.00280412   0.000754657

As expected, the standard deviation of the results decreases with larger sample sizes. Averaging over multiple simulations also provides a more accurate estimate, σ² ≈ 0.7764.
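As a cross-check on the simulated values, σ² can also be computed without simulation, since it is a ratio of two one-dimensional integrals. The sketch below is an addition for illustration (the names `q`, `num`, `den`, `sigma2_numeric`, and `sigma2_closed` are not from the original code). The closed form follows from the substitution u = x³/3, which gives $\int x^2 q(x)\,dx = 2$ and $\int q(x)\,dx = 2 \cdot 3^{-2/3}\,\Gamma(1/3)$.

```r
# Unnormalized target density q(x) = exp(-|x|^3 / 3)
q <- function(x) exp(-abs(x)^3 / 3)

# sigma^2 = E(X^2) = integral of x^2 q(x) divided by integral of q(x)
num <- integrate(function(x) x^2 * q(x), -Inf, Inf)$value
den <- integrate(q, -Inf, Inf)$value
sigma2_numeric <- num / den

# Closed form obtained from the substitution u = x^3 / 3
sigma2_closed <- 3^(2/3) / gamma(1/3)

c(numeric = sigma2_numeric, closed = sigma2_closed)
```

Both values should agree with each other and with the simulation averages to roughly three or four decimal places.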

(b) Rejection Sampling

Rejection Sampling is performed via the following steps:

• Generate a candidate, y, from g(x)

• Generate u from U(0,1)

• If u <= q(y)/(M*g(y)), accept y and let x = y; otherwise, reject y

• If accepted, y is included in the sample. If rejected, y is discarded

• Repeat until n accepted observations are obtained

Rejection Sampling Code

rejSamp <- function(n, M = 3){  # M = 3 is the envelope constant chosen earlier
  count <- 1
  x <- numeric(n)
  while(count <= n){
    y <- rnorm(1, mean = 0, sd = 1)
    u <- runif(1, min = 0, max = 1)
    if(u <= f(y)/(M*g(y))){  # accept with probability q(y)/(M*g(y))
      x[count] <- y
      count <- count + 1
    }
  }
  return(x)
}

set.seed(24)
xB <- rejSamp(n)
resultB <- sum(h(xB))/n
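Drawing one candidate per loop iteration is slow in R for large n. A vectorized variant of the same accept/reject rule might look like the following sketch. It is an illustrative alternative and not the code used to produce the reported results; the name `rejSampVec`, the oversampling factor 1.3, and the default `M = 3` are assumptions made here.

```r
# Unnormalized target density q(x) = exp(-|x|^3 / 3)
q <- function(x) exp(-abs(x)^3 / 3)

# Vectorized rejection sampler with envelope M * dnorm(x)
rejSampVec <- function(n, M = 3) {
  x <- numeric(0)
  while (length(x) < n) {
    # Propose somewhat more candidates than are still needed
    m <- ceiling(1.3 * (n - length(x))) + 10
    y <- rnorm(m)
    u <- runif(m)
    # Keep candidates that pass the test u <= q(y) / (M * g(y))
    x <- c(x, y[u <= q(y) / (M * dnorm(y))])
  }
  x[seq_len(n)]  # accepted draws are i.i.d., so the first n suffice
}

set.seed(24)
xV <- rejSampVec(100000)
mean(xV^2)       # estimate of sigma^2 = E(X^2)
```

Because the acceptance test is applied element-wise to whole vectors, the loop body typically runs only once or twice regardless of n.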

Rejection Sampling Results

par(mfrow=c(1,1))
hist(xB, xlim=c(-3,3), freq=F, nclass=50, xlab="q(x)")
cf <- integrate(f, -Inf, Inf)$value   # normalizing constant of q(x)
f1 <- function(x) { f(x)/cf }         # normalized target density
curve(f1, -3, 3, add=T)

σ² = E(X²): resultB = 0.777088

(c) Riemann Sum Strategy

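The Riemann Sum strategy bears some similarities to Importance Sampling with standardized weights, where we substitute

$$(X_{[i+1]} - X_{[i]})\,q(X_{[i]})$$

for

$$\frac{q(X_i)}{g(X_i)},$$

with $X_{[1]} \le \dots \le X_{[n]}$ denoting the ordered sample, so that the estimate becomes

$$\hat{\sigma}^2 = \frac{\sum_{i=1}^{n-1} (X_{[i+1]} - X_{[i]})\,h(X_{[i]})\,q(X_{[i]})}{\sum_{i=1}^{n-1} (X_{[i+1]} - X_{[i]})\,q(X_{[i]})}.$$

An important difference between the two is that the Riemann Sum strategy requires the data to be from f(x), whereas Importance Sampling uses data from g(x). So to carry out the Riemann Sum strategy, we used the data from b) (denoted xB in the code).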

Riemann Sum Code

riemSum <- function(n, x){
  xC <- sort(x)  # need ordered results
  num <- 0
  denom <- 0
  for(i in 1:(n-1)){
    num <- num + ((xC[i+1]-xC[i])*h(xC[i])*f(xC[i]))
    denom <- denom + ((xC[i+1]-xC[i])*f(xC[i]))
  }
  result <- num/denom
  return(result)
}

resultC <- riemSum(n, xB)

σ² = E(X²): resultC = 0.776458
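The same ratio of sums can be written without an explicit loop by using `diff()` on the sorted sample. The sketch below is an illustrative rewrite under that assumption, not the code used for the reported results; `riemSumVec`, `q`, and `grid` are names introduced here. For a self-contained sanity check it is applied to evenly spaced nodes rather than to a random sample.

```r
# Unnormalized target density q(x) = exp(-|x|^3 / 3)
q <- function(x) exp(-abs(x)^3 / 3)

# Ratio of left-endpoint Riemann sums over the ordered nodes
riemSumVec <- function(x) {
  xs   <- sort(x)               # ordered nodes
  dx   <- diff(xs)              # spacings between consecutive nodes
  left <- xs[-length(xs)]       # left endpoint of each spacing
  wts  <- dx * q(left)          # spacing times unnormalized density
  sum(wts * left^2) / sum(wts)  # h(x) = x^2
}

# Deterministic check on a fine, evenly spaced grid
grid <- seq(-5, 5, length.out = 10001)
riemSumVec(grid)
```

Passing a rejection sample in place of `grid` gives the random-node version of the estimator described in this section.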

(d) Replicated Simulation & Comparison

Using four different n values (1,000; 10,000; 100,000; and 1,000,000), we ran the methods from b) and c) ten times each, applying both to the same Rejection Sampling data in each replication.

mBC <- matrix(0, nrow = 40, ncol = 3)
for(j in 1:4){
  n <- 100*10^j
  for(k in 1:10){
    x_mBC <- rejSamp(n)
    mBC[(j-1)*10+k,1] <- sum(h(x_mBC))/n       # Rejection Sampling estimate
    mBC[(j-1)*10+k,2] <- riemSum(n, x_mBC)     # Riemann Sum estimate
    mBC[(j-1)*10+k,3] <- n
  }
}

Simulation & Comparison Results

         n = 1,000                              n = 10,000
Run      Rejection Sampling σ²  Riemann Sum σ²  Run      Rejection Sampling σ²  Riemann Sum σ²
1        0.71661640             0.76019220      11       0.7760067              0.7764987
2        0.78623330             0.76874670      12       0.778928               0.7747991
3        0.77522200             0.77044330      13       0.8017436              0.7738262
4        0.76497340             0.76107530      14       0.7691605              0.7760012
5        0.78816050             0.76921670      15       0.7758177              0.7774516
6        0.78224420             0.76567910      16       0.7869697              0.7744884
7        0.72086930             0.78253920      17       0.7816305              0.775914
8        0.76705410             0.76386030      18       0.792519               0.7739267
9        0.75050460             0.76045400      19       0.7855754              0.7739287
10       0.73012860             0.76273990      20       0.7608644              0.7733679
Average  0.75820064             0.76649467      Average  0.78092155             0.77502025
Stdev    0.0271736              0.00677236      Stdev    0.01165042             0.0014

         n = 100,000                            n = 1,000,000
Run      Rejection Sampling σ²  Riemann Sum σ²  Run      Rejection Sampling σ²  Riemann Sum σ²
21       0.77164300             0.77603970      31       0.77812120             0.77644510
22       0.77228120             0.77609850      32       0.77736760             0.77643010
23       0.77675030             0.77630260      33       0.77541960             0.77645180
24       0.77431940             0.77621310      34       0.77644090             0.77644510
25       0.77870320             0.77709090      35       0.77760840             0.77647250
26       0.77863520             0.77641800      36       0.77536390             0.77645750
27       0.77663800             0.77625800      37       0.77489640             0.77645970
28       0.77335200             0.77632540      38       0.77615330             0.77646400
29       0.77428480             0.77640100      39       0.77687400             0.77642890
30       0.77670840             0.77638090      40       0.77610050             0.77644020
Average  0.77533155             0.77635281      Average  0.77643458             0.77644949
Stdev    0.00251322             0.00028803      Stdev    0.00105512             1.4293E-05

Conclusions

Again, the standard deviation of both methods decreases as n increases, as expected.

The other interesting thing of note is that, for each value of n, the standard deviation of the ten replications is consistently smaller for the Riemann Sum strategy: by a factor of about 4 at n = 1,000 (0.0068 versus 0.0272) and about 74 at n = 1,000,000 (0.000014 versus 0.0011).

For the larger n values, the average estimate is comparable in each case.
