MATH 6630 - Applied Statistics: Assignment 3, Question 6.3
RUNXI LIU, CHAD MANNING, PETER MERCURIO
Page 2
Envelope Function
It is important to note that we are working with a density proportional to $q(x) = \exp(-|x|^3/3)$.
So f(x) is known only up to a normalizing constant (i.e., $f(x) \propto q(x)$).
Now, the first step towards running a Monte Carlo simulation is determining a suitable envelope function g(x).
Page 3
Envelope Function
Looking at the graph of q(x), we noticed that q(x) is bell-shaped and centered at zero, so a known bell-shaped density is a natural candidate for the envelope function. The standard normal density is a possible choice, but only after scaling by a multiplicative constant.
Page 4
Envelope Function
Noticing that both are centered at zero, we tried q(0)/stdnorm(0) = 2.506628 as our multiplicative constant.
This is close, but just away from zero the standard normal falls off a bit more steeply than q(x), so the scaled normal dips below q(x) there and we need a larger constant.
Page 5
Envelope Function
Next we tried a multiplicative constant of 3.
It turns out to be a near-perfect choice: it completely covers q(x), leaving only a minimal gap where the two curves are closest. We will move forward with an envelope function of 3 g(x), where g(x) is the standard normal density.
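As a quick check on the choice of constant, the sketch below maximizes the ratio q(x)/g(x) numerically. It assumes g is the standard normal density (computed here with `dnorm`); the helper name `ratio` is introduced for illustration and is not part of the original code.

```r
# Unnormalized target density from the assignment: q(x) = exp(-|x|^3 / 3)
q <- function(x) exp(-abs(x)^3 / 3)

# Ratio of the target to the standard normal density (hypothetical helper)
ratio <- function(x) q(x) / dnorm(x)

# The ratio is symmetric about zero, so searching x >= 0 is enough
best <- optimize(ratio, interval = c(0, 3), maximum = TRUE)

best$maximum    # location of the largest ratio
best$objective  # smallest constant M for which M * g(x) covers q(x)

# The first attempt, q(0)/g(0), falls short of this value; M = 3 exceeds it
ratio(0) < best$objective
best$objective < 3
```

Under these assumptions the largest ratio sits near x = 1 and equals sqrt(2*pi)*exp(1/6), roughly 2.96, which is why 2.506628 was too small and 3 leaves only a thin gap.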
Page 6
Helper Functions
n <- 100000
mu <- 0
sigma <- 1

#h(x) = x^2, so E[h(X)] = E(X^2)
h <- function(x){
  result <- x^2
  return(result)
}

#unnormalized target density q(x)
f <- function(x){
  result <- exp(-(abs(x)^3)/3)
  return(result)
}

#normal(mu, sigma) density
g <- function(x){
  result <- (1/(sigma*sqrt(2*pi)))*exp(-((x-mu)^2)/(2*sigma^2))
  return(result)
}
Page 7
(a) Importance Sampling
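The basis of importance sampling is from
$$E[h(X)] = \int h(x)\, f(x)\, dx = \int h(x)\, \frac{f(x)}{g(x)}\, g(x)\, dx,$$
which suggests drawing $X_1, \dots, X_n$ from g(x) and weighting each draw by $f(X_i)/g(X_i)$.
However, since f(x) is only known up to some constant, standardized weights must be used.
Thus, $\frac{1}{n} \sum_{i=1}^{n} h(X_i)\, \frac{f(X_i)}{g(X_i)}$ is replaced with $\sum_{i=1}^{n} h(X_i)\, w(X_i)$, where $w(X_i) = \frac{q(X_i)/g(X_i)}{\sum_{j=1}^{n} q(X_j)/g(X_j)}$.
Using this, we perform an importance sampling Monte Carlo simulation in R.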
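To see why standardizing matters, the following sketch (an illustration added here, not the authors' code) rescales q by an arbitrary constant and compares the two estimators. The helper names `est_standardized` and `est_raw` are hypothetical, and `dnorm` stands in for g.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

q <- function(x) exp(-abs(x)^3 / 3)  # unnormalized target density
h <- function(x) x^2                 # we want E[h(X)] = E[X^2]

x <- rnorm(10000)                    # draws from g, the standard normal

# Standardized weights: the unknown constant cancels in the ratio
est_standardized <- function(scale) {
  w <- scale * q(x) / dnorm(x)
  sum(h(x) * w) / sum(w)
}

# Unstandardized weights: the estimate scales with the unknown constant
est_raw <- function(scale) {
  w <- scale * q(x) / dnorm(x)
  mean(h(x) * w)
}

c(est_standardized(1), est_standardized(10))  # same value up to rounding
c(est_raw(1), est_raw(10))                    # differ by a factor of 10
```

Because the scale factor appears in both the numerator and the denominator of the standardized estimator, it never needs to be known.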
Page 8
Importance Sampling Monte Carlo Sim
impSamp <- function(n){
  xi <- rnorm(n, mean = 0, sd = 1)
  wi <- f(xi)/g(xi)
  w_tot <- sum(wi)
  result <- sum(h(xi)*wi/w_tot) #no 1/n due to using standardized weights
  return(result)
}
resultA <- impSamp(1000)

σ² = E(X²)
resultA  0.7756169
Page 9
Importance Sampling Monte Carlo Sim
To get a better idea of the results, we wrote a loop that runs importance sampling over four different values of n (1,000; 10,000; 100,000; and 1,000,000), ten times each, storing every estimate alongside its n.

mA <- matrix(0, nrow = 40, ncol = 2)
for(j in 1:4){
  n <- 100*10^j
  for(k in 1:10){
    mA[(j-1)*10+k,1] <- impSamp(n)
    mA[(j-1)*10+k,2] <- n
  }
}
Page 10
Importance Sampling Results

Estimates of σ² = E(X²)

Run      n = 1,000    n = 10,000   n = 100,000  n = 1,000,000
1        0.7516285    0.7634052    0.7734620    0.7763682
2        0.7627245    0.7644562    0.7748544    0.7766396
3        0.7859611    0.7859749    0.7771132    0.7772884
4        0.7741302    0.7841372    0.7809199    0.7754430
5        0.7377784    0.7742504    0.7750274    0.7764379
6        0.7890373    0.7625906    0.7789396    0.7759531
7        0.8097240    0.7629171    0.7750663    0.7753413
8        0.7390862    0.7647078    0.7778266    0.7760890
9        0.7835412    0.7754659    0.7711946    0.7776442
10       0.7587058    0.7788852    0.7769269    0.7770332
Average  0.76923172   0.77167905   0.77613309   0.77642379
Stdev    0.023337119  0.009195734  0.00280412   0.000754657
As expected, the standard deviation of the results decreases with larger sample sizes, from about 0.023 at n = 1,000 to about 0.00075 at n = 1,000,000. Averaging over multiple simulations also provides a more accurate estimate, σ² ≈ 0.7764.
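For reference, the target value can also be checked by numerical integration, since σ² is the integral of x² q(x) divided by the integral of q(x). The sketch below assumes q(x) = exp(-|x|^3/3), as in the helper functions; `num`, `den`, and `sigma2` are names introduced here for illustration.

```r
q <- function(x) exp(-abs(x)^3 / 3)  # unnormalized target density

# Numerator and denominator of E[X^2] under f(x) = q(x) / (integral of q)
num <- integrate(function(x) x^2 * q(x), -Inf, Inf)$value
den <- integrate(q, -Inf, Inf)$value  # the unknown normalizing constant

sigma2 <- num / den
sigma2

# Closed form obtained with the substitution t = x^3 / 3, as a cross-check
3^(2 / 3) / gamma(1 / 3)
```

Both values come out at approximately 0.7765, consistent with the simulation averages for the larger sample sizes.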
Page 11
(b) Rejection Sampling
Rejection sampling with the envelope M*g(x) (here M = 3) is performed via the following steps:
• Generate a candidate, y, from g(x)
• Generate u from U(0,1)
• If u <= q(y)/(M*g(y)), accept y and let x = y; otherwise, reject y
• If accepted, y is included in the sample; if rejected, y is discarded
• Repeat until n accepted observations are obtained
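The steps above can be sketched in vectorized form, proposing candidates in batches rather than one at a time. This is an illustrative variant, not the authors' implementation: `rej_samp_vec` and its `batch` argument are hypothetical names, g is taken to be the standard normal density (`dnorm`), and M = 3 is the constant chosen earlier.

```r
q <- function(x) exp(-abs(x)^3 / 3)  # unnormalized target density
M <- 3                               # envelope constant: M * g(x) >= q(x)

rej_samp_vec <- function(n, batch = 2 * n) {
  x <- numeric(0)
  n_proposed <- 0
  while (length(x) < n) {
    y <- rnorm(batch)                     # candidates from g
    u <- runif(batch)                     # uniforms for the accept test
    keep <- u <= q(y) / (M * dnorm(y))    # accept with prob q(y) / (M g(y))
    x <- c(x, y[keep])
    n_proposed <- n_proposed + batch
  }
  # Return the first n accepted draws and the observed acceptance rate
  list(sample = x[seq_len(n)], accept_rate = length(x) / n_proposed)
}

set.seed(24)
out <- rej_samp_vec(10000)
mean(out$sample^2)   # Monte Carlo estimate of E[X^2]
out$accept_rate      # estimates (integral of q) / M
```

The acceptance rate estimates the integral of q(x) divided by M, so a value close to 1 indicates a tight envelope.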
Page 12
Rejection Sampling Code
rejSamp <- function(n){
  M <- 3 #envelope constant, so that M*g(x) >= f(x) for all x
  count <- 1
  x <- numeric(n)
  while(count <= n){
    y <- rnorm(1, mean = 0, sd = 1)
    u <- runif(1, min = 0, max = 1)
    if(u <= f(y)/(M*g(y))){ #accept with probability f(y)/(M*g(y))
      x[count] <- y
      count <- count + 1
    }
  }
  return(x)
}
set.seed(24)
xB <- rejSamp(n)
resultB <- sum(h(xB))/n
Page 13
Rejection Sampling Results
par(mfrow=c(1,1))
hist(xB, xlim=c(-3,3), freq=FALSE, nclass=50, xlab="q(x)")
cf <- integrate(f, -Inf, Inf)$value #normalizing constant of f
f1 <- function(x) { f(x)/cf } #normalized density
curve(f1, -3, 3, add=TRUE)

σ² = E(X²)
resultB  0.777088
Page 14
(c) Riemann Sum Strategy
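The Riemann sum strategy bears some similarities to importance sampling, where we substitute the spacing-based weights
$$\frac{(X_{[i+1]} - X_{[i]})\, q(X_{[i]})}{\sum_{j=1}^{n-1} (X_{[j+1]} - X_{[j]})\, q(X_{[j]})}$$
for the standardized weights used in importance sampling. Here $X_{[1]} \le \dots \le X_{[n]}$ denotes the ordered sample.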
Another important difference between the two is that the Riemann sum strategy requires the data to be drawn from f(x), whereas importance sampling uses data drawn from g(x). So to carry out the Riemann sum strategy, we used the rejection sample from (b) (denoted xB in the code).
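The estimator can be written compactly with `diff()` on the sorted sample. The sketch below is a vectorized illustration under the assumption that q(x) = exp(-|x|^3/3) and h(x) = x^2; the function name `riem_sum_vec` is introduced here. To keep the example self-contained and deterministic, it is demonstrated on an evenly spaced grid, where the estimator reduces to an ordinary left Riemann sum, rather than on a rejection sample.

```r
q <- function(x) exp(-abs(x)^3 / 3)  # unnormalized target density
h <- function(x) x^2                 # we want E[h(X)] = E[X^2]

riem_sum_vec <- function(x, h, q) {
  xs <- sort(x)                      # the estimator needs ordered points
  left <- xs[-length(xs)]            # X_[1], ..., X_[n-1]
  w <- diff(xs) * q(left)            # (X_[i+1] - X_[i]) * q(X_[i])
  sum(h(left) * w) / sum(w)          # the ratio cancels the unknown constant
}

# Deterministic demonstration on a fine grid covering the bulk of q
grid <- seq(-5, 5, length.out = 100001)
riem_sum_vec(grid, h, q)
```

With a sample drawn from f, such as xB from (b), the corresponding call would be `riem_sum_vec(xB, h, f)`.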
Page 15
Riemann Sum Code
riemSum <- function(n, x){
  xC <- sort(x) #need ordered results
  num <- 0
  denom <- 0
  for(i in 1:(n-1)){
    num <- num + ((xC[i+1]-xC[i])*h(xC[i])*f(xC[i]))
    denom <- denom + ((xC[i+1]-xC[i])*f(xC[i]))
  }
  result <- num/denom
  return(result)
}
resultC <- riemSum(n, xB)

σ² = E(X²)
resultC  0.776458
Page 16
(d) Replicated Simulation & Comparison
Using four different n values (1,000; 10,000; 100,000; and 1,000,000), we ran the methods from (b) and (c) ten times each, applying both to the same rejection sample in every run.

mBC <- matrix(0, nrow = 40, ncol = 3)
for(j in 1:4){
  n <- 100*10^j
  for(k in 1:10){
    x_mBC <- rejSamp(n)
    mBC[(j-1)*10+k,1] <- sum(h(x_mBC))/n
    mBC[(j-1)*10+k,2] <- riemSum(n, x_mBC)
    mBC[(j-1)*10+k,3] <- n
  }
}
Page 17
Simulation & Comparison Results
n = 1,000
Run      Rejection Sampling σ²  Riemann Sum σ²
1        0.71661640             0.76019220
2        0.78623330             0.76874670
3        0.77522200             0.77044330
4        0.76497340             0.76107530
5        0.78816050             0.76921670
6        0.78224420             0.76567910
7        0.72086930             0.78253920
8        0.76705410             0.76386030
9        0.75050460             0.76045400
10       0.73012860             0.76273990
Average  0.75820064             0.76649467
Stdev    0.0271736              0.00677236

n = 10,000
Run      Rejection Sampling σ²  Riemann Sum σ²
11       0.7760067              0.7764987
12       0.778928               0.7747991
13       0.8017436              0.7738262
14       0.7691605              0.7760012
15       0.7758177              0.7774516
16       0.7869697              0.7744884
17       0.7816305              0.775914
18       0.792519               0.7739267
19       0.7855754              0.7739287
20       0.7608644              0.7733679
Average  0.78092155             0.77502025
Stdev    0.01165042             0.0014
Page 18
Simulation & Comparison Results
n = 100,000
Run      Rejection Sampling σ²  Riemann Sum σ²
21       0.77164300             0.77603970
22       0.77228120             0.77609850
23       0.77675030             0.77630260
24       0.77431940             0.77621310
25       0.77870320             0.77709090
26       0.77863520             0.77641800
27       0.77663800             0.77625800
28       0.77335200             0.77632540
29       0.77428480             0.77640100
30       0.77670840             0.77638090
Average  0.77533155             0.77635281
Stdev    0.00251322             0.00028803

n = 1,000,000
Run      Rejection Sampling σ²  Riemann Sum σ²
31       0.77812120             0.77644510
32       0.77736760             0.77643010
33       0.77541960             0.77645180
34       0.77644090             0.77644510
35       0.77760840             0.77647250
36       0.77536390             0.77645750
37       0.77489640             0.77645970
38       0.77615330             0.77646400
39       0.77687400             0.77642890
40       0.77610050             0.77644020
Average  0.77643458             0.77644949
Stdev    0.00105512             1.4293E-05
Page 19
Conclusions
Again, the standard deviation of both methods decreases as n increases, as expected.
The other interesting thing to note is that, for each value of n, the standard deviation of the ten replications is consistently smaller for the Riemann sum strategy: by a factor of about 4 at n = 1,000 and about 74 at n = 1,000,000.
For the larger n values, the average estimates from the two methods are comparable (both about 0.7764 at n = 1,000,000).