Replicate or Explore? DAE UTK, October 2019. Robert B. Gramacy ([email protected]; http://bobby.gramacy.com), Department of Statistics, Virginia Tech. Joint with Mickaël Binois (INRIA), Jiangeng Huang (VT), Mike Ludkovski (UCSB) and Chris Franck (VT).



  • Stochastic simulation

    Increasingly, data in geostatistics, machine learning, and computer simulation experiments involve signal-to-noise ratios which

    · may be low
    · and/or possibly change over the input space.

    Stochastic (computer) simulators, from physics, business and epidemiology, may exhibit both of those features simultaneously,

    · but let's start with the first (walk before we run).

    With noisy processes, more samples are needed to isolate the signal.

    What can be done? The canonical GP surrogate buckles under the weight of even modestly big data.

    · We must decompose an N × N matrix K_N, to obtain K_N^{-1} and |K_N|, at O(N^3) cost.

  • Replication

  • Woodbury trick

    Replication can be a powerful device for separating signal from noise, and can yield computational savings as well.

    Long story short, if N outputs are observed at n unique inputs, then the Woodbury identity provides sufficient statistics of size (n) rather than (N).

    · This leads to n × n matrices in likelihood and predictive equations,
    · and thus dramatically faster inference (via cubic decompositions) when N ≫ n.
    · Identities abound and derivatives (for MLEs) are analytic.

    Unlike alternatives such as stochastic kriging (Ankenman, et al., 2010), no approximations are made or asymptotic arguments required.

    The details (Binois, et al., 2018a) are for another time, but …

    http://www.stochastickriging.net/
    https://arxiv.org/abs/1611.05902
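The savings can be sketched numerically. Below is a minimal base-R illustration (not hetGP's internals): a full N × N covariance over a replicated design is solved using only n × n decompositions via the Woodbury identity. The Gaussian kernel, lengthscale, and nugget values are arbitrary choices for the demo.

```r
# Woodbury trick under replication: K_N = U K_n U' + g I, with U the N x n
# incidence matrix mapping each run to its unique input site.
set.seed(1)
n <- 5; a <- sample(1:4, n, replace = TRUE); N <- sum(a)  # replicate counts a_i
x <- seq(0, 1, length = n)                                # unique inputs
U <- matrix(0, N, n); U[cbind(1:N, rep(1:n, a))] <- 1     # incidence matrix
Kn <- exp(-outer(x, x, "-")^2 / 0.1)                      # Gaussian kernel, theta = 0.1
g <- 0.2                                                  # nugget
KN <- U %*% Kn %*% t(U) + g * diag(N)                     # full N x N covariance
Z <- rnorm(N)
direct <- solve(KN, Z)                                    # O(N^3) solve
# Woodbury: K_N^{-1} Z via an n x n inverse only
M <- solve(solve(Kn) + crossprod(U) / g)                  # crossprod(U) = diag(a)
wood <- Z / g - U %*% (M %*% (crossprod(U, Z) / g)) / g
max(abs(direct - wood))                                   # agreement to machine precision
```

The key structural fact is that `crossprod(U)` is diagonal with the replicate counts, so the inner inverse never exceeds n × n.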

  • For example

    … consider

    · n = 100 unique input locations, expanding to N ≈ 2500 when
    · each has a random number of replicates a_i ∼ Unif{1, 2, …, 50}.

    ## assumed reconstruction of the truncated design-building chunk (1d inputs)
    Xbar <- matrix(seq(0, 1, length=100), ncol=1)   # n = 100 unique sites
    a <- sample(1:50, 100, replace=TRUE)            # replicate counts a_i
    X <- Xbar[rep(1:100, a), , drop=FALSE]          # full design; N = sum(a) ≈ 2500

  • Comparing full-N to unique-n

    Let's use our new hetGP package to fit "full-N" and "unique-n" GPs.

    library(hetGP)

    Essentially no difference on lengthscale hyperparameter estimates …

    … but a big difference in time!
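The unique-n bookkeeping can be sketched in base R: collapse a replicated design to unique sites, replicate counts, and averaged responses, mirroring in spirit what hetGP's find_reps returns (the toy inputs below are invented for illustration).

```r
# Collapse a replicated design to sufficient statistics at unique sites:
# unique inputs X0, replicate counts mult (a_i), and averaged responses Z0.
X <- rep(c(0.2, 0.5, 0.8), times = c(2, 3, 1))  # replicated 1d design, N = 6
Z <- c(1, 3, 4, 5, 6, 10)                       # outputs at those runs
X0 <- unique(X)                                 # n = 3 unique inputs
idx <- match(X, X0)                             # which unique site each run hits
mult <- as.vector(table(idx))                   # replicate counts: 2, 3, 1
Z0 <- tapply(Z, idx, mean)                      # averages: 2, 5, 10
```

These n-sized quantities are exactly what the Woodbury-based likelihood needs, regardless of how large N grows.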

  • Heteroskedasticity

  • GPs disappoint on the motorcycle data.

    library(MASS)                                  # for the mcycle data
    hom <- mleHomGP(mcycle$times, mcycle$accel)    # assumed completion of truncated chunk

  • Latent GP noise

    Although there are many appealing methods for heteroskedastic GP modeling, predominantly from the machine learning literature (e.g., Goldberg, et al., 1998),

    · most are impractical (MCMC), except on the smallest data sets.

    The key ingredient, of latent noise variables δ_1, δ_2, …, δ_n stored diagonally in Δ_n, kept smooth under a (log) GP prior, has merit.

    · Those latents are easily subsumed into the (Woodbury) MLE framework.
    · Maximization is facilitated by closed-form derivatives with respect to the latent Δ_n values, all in O(n^3) time.

    Again, the details (Binois, et al., 2018a) are for another time, but …

    https://papers.nips.cc/paper/1444-regression-with-input-dependent-noise-a-gaussian-process-treatment.pdf
    https://arxiv.org/abs/1611.05902

  • Full mean and noise inference

    … let's check out an example via hetGP:

    ## assumed completion of the truncated chunk
    het2 <- mleHetGP(mcycle$times, mcycle$accel)        # heteroskedastic GP fit
    Xgrid <- matrix(seq(0, 60, length=301), ncol=1)     # prediction grid (assumed range)
    p2 <- predict(het2, Xgrid)
    ql <- qnorm(0.05, p2$mean, sqrt(p2$sd2 + p2$nugs))  # 90% predictive interval
    qu <- qnorm(0.95, p2$mean, sqrt(p2$sd2 + p2$nugs))

    Predictions and summary stats to visualize on the next slide.

  • par(mfrow=c(1,2))
    plot(mcycle$times, mcycle$accel, ylim=c(-160,90), ylab="acc", xlab="time")
    lines(Xgrid, p2$mean, col=2, lwd=2)
    lines(Xgrid, ql, col=2, lty=2); lines(Xgrid, qu, col=2, lty=2)
    plot(Xgrid, p2$nugs, type="l", lwd=2, ylab="s2", xlab="time", ylim=c(0,2e3))
    points(het2$X0, sapply(find_reps(mcycle[,1],mcycle[,2])$Zlist, var), col=3, pch=20)

  • Real examples

  • Example: Epidemics management

    One method of studying disease outbreak dynamics is based on stochastic compartmental modeling,

    · for example, so-called Susceptible, Infected & Recovered (SIR) models.

    Here we consider the total number of newly infected individuals

    f(x) := 𝔼{S_0 − lim_{T→∞} S_T ∣ (S_0, I_0, R_0) = x} = γ 𝔼{∫_0^∞ I_t dt ∣ x}

    under continuous time Markov dynamics with transitions S + I → 2I and I → R, solved with Monte Carlo (Hu et al., 2015).

    · The inputs are in 2d: x = (S_0, I_0);
    · the resulting surface is heteroskedastic
    · and has some very-high noise regions.

    https://arxiv.org/abs/1509.00980
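The Monte Carlo step can be sketched with a bare-bones simulation of the embedded jump chain of those two transitions (an illustrative stand-in, not the package's sirEval; the beta and gamma rates below are arbitrary demo values).

```r
# One stochastic SIR realization via the embedded jump chain: at each event,
# infection S+I -> 2I fires with probability proportional to beta*S*I,
# recovery I -> R with probability proportional to gamma*I.  Event times are
# not tracked since only the final count S0 - S_inf is needed.
sir1 <- function(S0, I0, beta = 0.5/600, gamma = 0.5) {
  S <- S0; I <- I0
  while (I > 0) {
    r_inf <- beta * S * I                  # infection rate
    r_rec <- gamma * I                     # recovery rate
    if (runif(1) < r_inf / (r_inf + r_rec)) {
      S <- S - 1; I <- I + 1               # new infection
    } else {
      I <- I - 1                           # recovery
    }
  }
  S0 - S                                   # newly infected individuals
}
set.seed(42)
fhat <- mean(replicate(100, sir1(S0 = 400, I0 = 20)))  # crude MC estimate of f(x)
```

Replicates of `sir1` at a fixed x = (S_0, I_0) are exactly the noisy, input-dependent observations the heteroskedastic GP is asked to smooth.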

  • Consider a space-filling design of size n = 200 unique runs, with a random number of replicates a_i ∈ {1, …, 100}, for i = 1, …, n.

    The hetGP package provides sirEval, returning the expected number of infecteds at the end of the simulation.

    ## assumed reconstruction of the truncated design chunk (unit square inputs)
    Xbar <- matrix(runif(200 * 2), ncol=2)       # space-filling stand-in, n = 200
    a <- sample(1:100, 200, replace=TRUE)        # replicate counts a_i

  • SIR hetGP fit

    ## assumed completion of the truncated chunk
    fit <- mleHetGP(X, Z)                        # X: replicated design; Z: sirEval outputs

    To help with the visuals on the next slide, the code below creates a dense grid in 2D and calls the predict method on the "hetGP"-class fit object.

    xx <- seq(0, 1, length=101)
    Xgrid2 <- as.matrix(expand.grid(xx, xx))     # dense 2d grid (names assumed)
    psir <- predict(fit, Xgrid2)

  • par(mfrow=c(1,2), mar=c(4,4,2,1)); cols <- heat.colors(128)  # assumed palette; chunk truncated

  • Example: assemble to order

    Inventory management simulator (Hong & Nelson, 2006); d = 8; N ≈ 5000.

    Proper scores (Gneiting & Raftery, 2007) to measure mean accuracy (MSE) relative to predicted variance (larger is better).

    http://mason.gmu.edu/~jxu13/ISC/HongNelsonCOMPASS.pdf
    http://amstat.tandfonline.com/doi/abs/10.1198/016214506000001437
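For Gaussian predictors the (logarithmic) proper score has a simple per-point form; below is a base-R sketch of that idea, up to additive constants and with invented toy numbers, showing why an overconfident predictive variance is penalized.

```r
# Gaussian log score, averaged over test points: -log(s2) - (y - mu)^2 / s2.
# Larger is better; it rewards accurate means relative to honest variances.
score <- function(y, mu, s2) mean(-log(s2) - (y - mu)^2 / s2)

y <- c(1.0, 2.0, 3.0)                       # toy test responses
honest <- score(y, mu = y + 0.5, s2 = rep(0.25, 3))  # variance matches the error
brash  <- score(y, mu = y + 0.5, s2 = rep(0.01, 3))  # same mean, overconfident variance
honest > brash                               # TRUE: overconfidence is punished
```

Two predictors with identical MSE can thus score very differently, which is the point of using proper scores rather than MSE alone for the ATO comparison.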

  • The hetGP package has a saved fit on space-filling training/testing data from this experiment.

    Reproducing the comparison on proper scores:

    data("ato")
    c(n=nrow(Xtrain), N=length(unlist(Ztrain)), time=out$time)

    ## n N time.elapsed
    ## 1000.000 5594.000 8583.767

    phet <- predict(out, Xtest)   # assumed completion of truncated chunk

  • Sequential design

  • One step at a time

    Model-based one-shot design is almost never appropriate in this setting.

    · Designs are hyperparameter sensitive for homoskedastic processes,
    · which is exacerbated when additional variance processes are in play.

    It makes sense to slow down and take things one step at a time.

    Choose the next point (x_{N+1}) by exploring its impact on the predictive equations.

  • IMSPE

    A common criterion is integrated mean-squared prediction error

    I_{n+1} ≡ IMSPE(x_1, …, x_n, x_{n+1}) = ∫_{x∈D} σ²_{n+1}(x) dx.

    IMSPE has a closed form as long as D is an easily integrable domain, such as a hyperrectangle:

    I_{n+1} = 𝔼{σ²_{n+1}(X)} = 𝔼{K_θ(X, X) − k_{n+1}(X)^⊤ K_{n+1}^{−1} k_{n+1}(X)}
            = 𝔼{K_θ(X, X)} − tr(K_{n+1}^{−1} W)

    · where W_{ij} = ∫_{x∈D} k(x_i, x) k(x_j, x) dx. E.g., in the Gaussian case with D = [0, 1]^d,

    W_{ij} = ∏_{k=1}^d (√(2πθ_k)/4) exp{−(x_{i,k} − x_{j,k})²/(2θ_k)} [erf{(2 − (x_{i,k} + x_{j,k}))/√(2θ_k)} + erf{(x_{i,k} + x_{j,k})/√(2θ_k)}].

    · Gradient of I_{n+1} ∣ x_1, …, x_n facilitates optimization for sequential design.

    https://arxiv.org/abs/1710.03206
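The one-dimensional Gaussian-case entry can be sanity-checked against numerical quadrature; a base-R sketch assuming k(x, x′) = exp{−(x − x′)²/θ} on D = [0, 1] (the inputs 0.3 and 0.7 and θ = 0.1 are arbitrary).

```r
# Closed-form W_ij on [0,1] for the Gaussian kernel, versus direct quadrature.
erf <- function(z) 2 * pnorm(z * sqrt(2)) - 1   # erf via the normal CDF
Wij <- function(xi, xj, theta)
  sqrt(2*pi*theta)/4 * exp(-(xi - xj)^2 / (2*theta)) *
    (erf((2 - (xi + xj)) / sqrt(2*theta)) + erf((xi + xj) / sqrt(2*theta)))

quad <- integrate(function(x) exp(-(0.3 - x)^2/0.1) * exp(-(0.7 - x)^2/0.1), 0, 1)
c(closed = Wij(0.3, 0.7, 0.1), numeric = quad$value)  # the two agree
```

Having W in closed form is what makes IMSPE, and its gradient, cheap enough to optimize at every sequential-design step.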

  • A replicating relation

    Binois, et al. (2018b) showed that the next point, x_{N+1}, will be a replicate when

    r(x_{N+1}) ≥ [k_n(x_{N+1})^⊤ K_n^{−1} W_n K_n^{−1} k_n(x_{N+1}) − 2 w_{n+1}^⊤ K_n^{−1} k_n(x_{N+1}) + w_{n+1,n+1}] / tr(B_{k∗} W_n) − σ²_n(x_{N+1}),

    where k∗ = argmin_{1≤k≤n} IMSPE(x_k) and B_k = (Υ_n^{−1})_{·,k} (Υ_n^{−1})_{k,·} / (τ²λ_k / (a_k(a_k + 1)) − (Υ_n^{−1})_{k,k}).

    However, actually finding a replicate in practice is doubly challenged:

    1. Numerical precision of "continuous" optimizers in a discrete setting.
    2. Myopic criteria (like IMSPE) from the perspective of replication.

    Binois, et al. (2018b) proposed an adaptive lookahead based implementation as remedy.

    https://arxiv.org/abs/1710.03206

  • Simple example

    fm

  • Starting up

    Here is a fit to a small space-filling design,

    · followed by a search via IMSPE with look-ahead over replication,
    · then an update of the fit to incorporate the new data.

    fr

  • Repeat a bunch

    Let's continue and gather a total of N = 500 samples in this way.

    Once that's done, gather a final prediction throughout the input space.

    for(i in 1:489) {
      ## find the next point and update (assumed completion of truncated body)
      opt <- IMSPE_optim(mod, h=5)
      X <- rbind(X, opt$par); Y <- c(Y, fm(opt$par))
      mod <- update(mod, Xnew=opt$par, Znew=Y[length(Y)])
    }
    xgrid <- matrix(seq(0, 1, length=301), ncol=1)  # assumed [0,1] input domain
    p <- predict(mod, xgrid)

  • Visualizing

    plot(xgrid, p$mean, type="l", ylim=c(-8,17), xlab="x", ylab="y"); points(X, Y)

    segments(mod$X0, rep(0, nrow(mod$X0))-8, mod$X0, (mod$mult-8)*0.65, col="gray")

    lines(xgrid, qnorm(0.05, p$mean, sqrt(p$sd2 + p$nugs)), col=2, lty=2)

    lines(xgrid, qnorm(0.95, p$mean, sqrt(p$sd2 + p$nugs)), col=2, lty=2)


  • Real example

  • Sequential ATO

    The ato data object loaded earlier contains a second saved "hetGP"-class model,

    · trained with an adaptive horizon IMSPE-based sequential design scheme.

    Design size and training time are quoted below.

    rbind(batch=c(n=nrow(out$X0), N=length(out$Z), time=out$time),
          adapt=c(n=nrow(out.a$X0), N=length(out.a$Z), time=out.a$time))

    ## n N time.elapsed
    ## batch 1000 5594 8583.767
    ## adapt 1194 2000 38737.974

    The adaptive design has a higher proportion of unique locations

    · but still a nontrivial degree of replication,
    · resulting in many fewer overall runs of the expensive ATO simulator.

  • Out-of-sample comparison

    With the same out-of-sample testing set from the previous score-based comparison,

    · the code below calculates predictions and scores under IMSPE-based sequential design.

    phet.a <- predict(out.a, Xtest)   # assumed completion of truncated chunk

  • Summarizing

    When the mean and the variance are changing non-linearly in the input space,

    · it is still possible to get very accurate fits via coupled GPs,
    · and it can be advantageous to have replication in the design.

    The more heteroskedasticity, the more replication.

    · Intuitively, that must be true: both signal and noise are changing, and replication is the only reliable tool for separating the two.

    Replication has the added benefit of yielding faster fitting of the GPs.

  • Thanks!

    Our hetGP package, facilitating everything in this reproducible Rmarkdown talk, is available on CRAN.

    · A reproducible vignette can be found with the package source.

    https://cran.r-project.org/web/packages/hetGP/vignettes/hetGP_vignette.pdf