MATH 6630 - Applied Statistics Assignment 3, Question 6.3 RUNXI LIU CHAD MANNING PETER MERCURIO

Upload: rachel-williams

Post on 12-Jan-2016

Envelope Function

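It is important to note that we are working with a density proportional to

$q(x) = \exp(-|x|^3/3)$

So f(x) is known only up to a constant (i.e., $f(x) = c\,q(x)$ for an unknown normalizing constant c). In the R code below, the function named f evaluates this unnormalized q(x).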

Now, the first step towards running a Monte Carlo simulation is determining a suitable envelope function g(x).

Looking at the graph of q(x), we noticed that q(x) is bell-shaped and centered about zero, so we can use a known bell-shaped distribution as the envelope function. The standard normal density is a possible choice, but it must be scaled by a multiplicative constant to lie above q(x).

Noticing that both are centered about zero, we tried q(0)/stdnorm(0) = 2.506628 (that is, √(2π)) as our multiplicative constant.

This is close, but the standard normal is a bit steeper than q(x), so we need a larger constant.
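Why matching the two curves at zero is not enough can be seen from the ratio of q(x) to the standard normal density. The short derivation below is not from the slides; it writes φ for the standard normal density and locates the maximum of the ratio.

$$
\frac{q(x)}{\phi(x)} = \sqrt{2\pi}\,\exp\!\left(\frac{x^2}{2} - \frac{|x|^3}{3}\right),
\qquad
\frac{d}{dx}\left(\frac{x^2}{2} - \frac{x^3}{3}\right) = x - x^2 = 0 \;\Rightarrow\; x = 1 \quad (x > 0)
$$

$$
\max_x \frac{q(x)}{\phi(x)} = \sqrt{2\pi}\,e^{1/6} \approx 2.9612
$$

The ratio equals 2.5066 at x = 0 but rises to about 2.96 at x = ±1, so any constant of at least 2.9612 gives a valid envelope.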

Next we tried a multiplicative constant of 3.

It turns out to be a near-perfect choice: it completely covers q(x), with only a minimal gap where the two curves come closest. We will move forward with an envelope function of

$e(x) = 3\,g(x)$, where g(x) is the standard normal density.
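As a rough numerical check of this choice, the sketch below evaluates the ratio q(x)/g(x) on a grid and computes the acceptance rate that a constant of 3 would imply for rejection sampling. The names q, M, grid, max_ratio, and accept_rate are illustrative and not from the slides; dnorm stands in for the standard normal g(x).

```r
# Unnormalized target density from the problem statement
q <- function(x) exp(-abs(x)^3 / 3)

M <- 3  # multiplicative constant chosen for the envelope

# Largest value of q(x)/g(x) on a fine grid; the envelope is valid if this is <= M
grid <- seq(-5, 5, by = 0.001)
max_ratio <- max(q(grid) / dnorm(grid))

# Expected acceptance rate of rejection sampling: (integral of q) / M
accept_rate <- integrate(q, -Inf, Inf)$value / M

print(c(max_ratio = max_ratio, accept_rate = accept_rate))
```

Under these assumptions the maximum ratio comes out just below 3, and roughly 86% of candidates would be accepted.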

Helper Functions

n <- 100000
mu <- 0
sigma <- 1

h <- function(x){
  result <- x^2
  return(result)
}

f <- function(x){
  result <- exp(-(abs(x)^3)/3)
  return(result)
}

g <- function(x){
  result <- (1/(sigma*sqrt(2*pi)))*exp(-((x-mu)^2)/(2*sigma^2))
  return(result)
}

(a) Importance Sampling

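The basis of importance sampling is the identity

$E[h(X)] = \int h(x)\,f(x)\,dx = \int h(x)\,\frac{f(x)}{g(x)}\,g(x)\,dx$

which suggests drawing $X_1, \dots, X_n$ from g(x) and estimating the expectation by $\frac{1}{n}\sum_{i=1}^{n} h(X_i)\,w^*(X_i)$, with $w^*(X_i) = f(X_i)/g(X_i)$.

However, since f(x) is only known up to some constant, standardized weights must be used.

Thus, $\frac{1}{n}\sum_{i=1}^{n} h(X_i)\,w^*(X_i)$ is replaced with $\sum_{i=1}^{n} h(X_i)\,w(X_i)$, where $w(X_i) = w^*(X_i) / \sum_{j=1}^{n} w^*(X_j)$.

Using this, we perform an Importance Sampling Monte Carlo simulation in R.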

Importance Sampling Monte Carlo Sim

impSamp <- function(n){
  xi <- rnorm(n, mean = 0, sd = 1)
  wi <- f(xi)/g(xi)
  w_tot <- sum(wi)
  result <- sum(h(xi)*wi/w_tot)  #no 1/n due to using standardized weights
  return(result)
}

resultA <- impSamp(1000)

σ² = E(X²)
resultA    0.7756169

To get a better idea of the results, we ran Importance Sampling in a loop over four different values of n (1,000; 10,000; 100,000; and 1,000,000), ten times each.

mA <- matrix(0, nrow = 40, ncol = 2)
for(j in 1:4){
  n <- 100*10^j
  for(k in 1:10){
    mA[(j-1)*10+k,1] <- impSamp(n)
    mA[(j-1)*10+k,2] <- n
  }
}

Importance Sampling Results

Each entry is an estimate of σ² = E(X²).

Run       n = 1,000     n = 10,000    n = 100,000   n = 1,000,000
1         0.7516285     0.7634052     0.7734620     0.7763682
2         0.7627245     0.7644562     0.7748544     0.7766396
3         0.7859611     0.7859749     0.7771132     0.7772884
4         0.7741302     0.7841372     0.7809199     0.7754430
5         0.7377784     0.7742504     0.7750274     0.7764379
6         0.7890373     0.7625906     0.7789396     0.7759531
7         0.8097240     0.7629171     0.7750663     0.7753413
8         0.7390862     0.7647078     0.7778266     0.7760890
9         0.7835412     0.7754659     0.7711946     0.7776442
10        0.7587058     0.7788852     0.7769269     0.7770332

Average   0.76923172    0.77167905    0.77613309    0.77642379
Stdev     0.023337119   0.009195734   0.00280412    0.000754657

As expected, the standard deviation of the results decreases with larger sample sizes. Averaging over multiple simulations also provides a more accurate estimate, σ² ≈ 0.7764.
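For reference, this particular target admits a closed-form answer, which gives a benchmark for the simulated values. The derivation below is not part of the slides; it uses the substitution t = x³/3 and the gamma function Γ.

$$
\int_0^\infty x^2 e^{-x^3/3}\,dx = \int_0^\infty e^{-t}\,dt = 1,
\qquad
\int_0^\infty e^{-x^3/3}\,dx = 3^{-2/3}\int_0^\infty t^{-2/3} e^{-t}\,dt = 3^{-2/3}\,\Gamma(1/3)
$$

$$
\sigma^2 = E(X^2) = \frac{\int_{-\infty}^{\infty} x^2 q(x)\,dx}{\int_{-\infty}^{\infty} q(x)\,dx}
= \frac{2}{2\cdot 3^{-2/3}\,\Gamma(1/3)} = \frac{3^{2/3}}{\Gamma(1/3)} \approx 0.77646
$$

The averages at n = 1,000,000 agree with this value to about four decimal places.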

(b) Rejection Sampling

Rejection Sampling is performed via the following steps:

• Generate a candidate, y, from g(x)

• Generate u from U(0,1)

• If u <= q(y)/(M*g(y)), accept y and let x = y; otherwise, reject y

• If accepted, y is included in the sample. If rejected, y is discarded

• Repeat until n accepted observations are obtained

Here M = 3 is the multiplicative constant of the envelope.

Rejection Sampling Code

rejSamp <- function(n){
  M <- 3  #envelope constant: the envelope is M*g(x)
  count <- 1
  x <- numeric(n)
  while(count <= n){
    y <- rnorm(1, mean = 0, sd = 1)
    u <- runif(1, min = 0, max = 1)
    if(u <= f(y)/(M*g(y))){  #accept y with probability q(y)/(M*g(y))
      x[count] <- y
      count <- count + 1
    }
  }
  return(x)
}

set.seed(24)
xB <- rejSamp(n)
resultB <- sum(h(xB))/n
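Drawing one candidate per loop iteration is slow in R for large n. A vectorized variant, sketched here under the assumption that the envelope is 3 times the standard normal density, proposes candidates in batches. The function name rejSampVec and its batch argument are hypothetical and not from the slides.

```r
# Unnormalized target density q(x) from the problem statement
q <- function(x) exp(-abs(x)^3 / 3)

# Hypothetical batched rejection sampler; envelope assumed to be M * dnorm(x)
rejSampVec <- function(n, M = 3, batch = 10000) {
  x <- numeric(0)
  while (length(x) < n) {
    y <- rnorm(batch)                          # candidates from g
    u <- runif(batch)                          # uniforms for the accept/reject test
    x <- c(x, y[u <= q(y) / (M * dnorm(y))])   # keep the accepted candidates
  }
  x[seq_len(n)]                                # return exactly n accepted draws
}

set.seed(1)
xV <- rejSampVec(50000)
mean(xV^2)  # estimate of sigma^2 = E(X^2)
```

Each pass through the while loop handles a whole batch, so the number of R-level iterations drops from roughly n/0.86 to a handful.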

Rejection Sampling Results

par(mfrow=c(1,1))
hist(xB, xlim=c(-3,3), freq=F, nclass=50, xlab="q(x)")
cf <- integrate(f, -Inf, Inf)$value  #normalizing constant of q(x)
f1 <- function(x) { f(x)/cf }
curve(f1, -3, 3, add=T)

σ² = E(X²)
resultB    0.777088

(c) Riemann Sum Strategy

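The Riemann Sum strategy bears some similarities to Importance Sampling, in that a weighted sum of h values is divided by the sum of the weights. Writing $x_{[1]} \le \dots \le x_{[n]}$ for the ordered sample, the estimate is

$\frac{\sum_{i=1}^{n-1} (x_{[i+1]} - x_{[i]})\,h(x_{[i]})\,q(x_{[i]})}{\sum_{i=1}^{n-1} (x_{[i+1]} - x_{[i]})\,q(x_{[i]})}$

An important difference between the two is that the Riemann Sum strategy requires the data to be drawn from f(x), whereas Importance Sampling uses data drawn from g(x). So to carry out the Riemann Sum strategy, we used the data from b) (denoted xB in the code).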

Riemann Sum Code

riemSum <- function(n, x){
  xC <- sort(x)  #need ordered results
  num <- 0
  denom <- 0
  for(i in 1:(n-1)){
    num <- num + ((xC[i+1]-xC[i])*h(xC[i])*f(xC[i]))
    denom <- denom + ((xC[i+1]-xC[i])*f(xC[i]))
  }
  result <- num/denom
  return(result)
}

resultC <- riemSum(n, xB)

σ² = E(X²)
resultC    0.776458
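The same ratio can be computed without an explicit loop. The sketch below is one possible vectorized form using diff(); the names riemSumVec, q, xs, gaps, and left are illustrative, and the small one-pass sampler is only a stand-in that produces draws from the target so the example runs on its own.

```r
# Unnormalized target density q(x) and the function h(x) = x^2
q <- function(x) exp(-abs(x)^3 / 3)
h <- function(x) x^2

# Hypothetical vectorized Riemann-sum estimator on an ordered sample
riemSumVec <- function(x) {
  xs <- sort(x)                 # ordered sample
  gaps <- diff(xs)              # spacings xs[i+1] - xs[i], length n - 1
  left <- xs[-length(xs)]       # left endpoints xs[1], ..., xs[n-1]
  sum(gaps * h(left) * q(left)) / sum(gaps * q(left))
}

# Stand-in sampler: one pass of rejection sampling with envelope 3 * dnorm(x)
set.seed(2)
y <- rnorm(40000)
x <- y[runif(40000) <= q(y) / (3 * dnorm(y))]

# Reference value by numerical integration, for comparison
truth <- integrate(function(t) h(t) * q(t), -Inf, Inf)$value /
  integrate(q, -Inf, Inf)$value

print(c(riemann = riemSumVec(x), truth = truth))
```

Because the estimator is a ratio, the unknown normalizing constant of f(x) cancels, which is why q(x) can be used directly.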

(d) Replicated Simulation & Comparison

Using four different n values (1,000; 10,000; 100,000; and 1,000,000), we ran each of the methods from b) and c) ten times, applying both to the same Rejection Sampling data in each replication.

mBC <- matrix(0, nrow = 40, ncol = 3)
for(j in 1:4){
  n <- 100*10^j
  for(k in 1:10){
    x_mBC <- rejSamp(n)
    mBC[(j-1)*10+k,1] <- sum(h(x_mBC))/n
    mBC[(j-1)*10+k,2] <- riemSum(n, x_mBC)
    mBC[(j-1)*10+k,3] <- n
  }
}

Simulation & Comparison Results

n = 1,000
Run       Rejection Sampling σ²   Riemann Sum σ²
1         0.71661640              0.76019220
2         0.78623330              0.76874670
3         0.77522200              0.77044330
4         0.76497340              0.76107530
5         0.78816050              0.76921670
6         0.78224420              0.76567910
7         0.72086930              0.78253920
8         0.76705410              0.76386030
9         0.75050460              0.76045400
10        0.73012860              0.76273990

Average   0.75820064              0.76649467
Stdev     0.0271736               0.00677236

n = 10,000
Run       Rejection Sampling σ²   Riemann Sum σ²
11        0.7760067               0.7764987
12        0.778928                0.7747991
13        0.8017436               0.7738262
14        0.7691605               0.7760012
15        0.7758177               0.7774516
16        0.7869697               0.7744884
17        0.7816305               0.775914
18        0.792519                0.7739267
19        0.7855754               0.7739287
20        0.7608644               0.7733679

Average   0.78092155              0.77502025
Stdev     0.01165042              0.0014

n = 100,000
Run       Rejection Sampling σ²   Riemann Sum σ²
21        0.77164300              0.77603970
22        0.77228120              0.77609850
23        0.77675030              0.77630260
24        0.77431940              0.77621310
25        0.77870320              0.77709090
26        0.77863520              0.77641800
27        0.77663800              0.77625800
28        0.77335200              0.77632540
29        0.77428480              0.77640100
30        0.77670840              0.77638090

Average   0.77533155              0.77635281
Stdev     0.00251322              0.00028803

n = 1,000,000
Run       Rejection Sampling σ²   Riemann Sum σ²
31        0.77812120              0.77644510
32        0.77736760              0.77643010
33        0.77541960              0.77645180
34        0.77644090              0.77644510
35        0.77760840              0.77647250
36        0.77536390              0.77645750
37        0.77489640              0.77645970
38        0.77615330              0.77646400
39        0.77687400              0.77642890
40        0.77610050              0.77644020

Average   0.77643458              0.77644949
Stdev     0.00105512              1.4293E-05

Conclusions

Again, the standard deviation of both methods decreases as n increases, as expected.

The other interesting thing of note is that for each value of n, the standard deviation of the ten replications is consistently, and sometimes substantially, smaller for the Riemann Sum strategy (for example, 1.4293E-05 versus 0.00105512 at n = 1,000,000).

For the larger n values, the average estimates of the two methods are comparable.
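One way to quantify this comparison is to regress log10 of the reported standard deviations on log10(n); the slope approximates the rate at which each method's spread shrinks. The sketch below uses only the Stdev rows from the results tables; the variable names are illustrative, and with four points per method the slopes are only a rough indication.

```r
# Sample sizes and the Stdev values reported in the results tables
n       <- c(1e3, 1e4, 1e5, 1e6)
sd_rej  <- c(0.0271736, 0.01165042, 0.00251322, 0.00105512)   # Rejection Sampling
sd_riem <- c(0.00677236, 0.0014, 0.00028803, 1.4293e-05)      # Riemann Sum

# Least-squares slope of log10(sd) against log10(n)
slope_rej  <- unname(coef(lm(log10(sd_rej)  ~ log10(n)))[2])
slope_riem <- unname(coef(lm(log10(sd_riem) ~ log10(n)))[2])

print(round(c(rejection = slope_rej, riemann = slope_riem), 2))
```

The fitted slopes are about -0.49 for Rejection Sampling and about -0.87 for the Riemann Sum, consistent with the usual n^(-1/2) Monte Carlo rate for the former and a faster rate for the latter.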