Replicate or Explore? DAE UTK, October 2019. Robert B. Gramacy ([email protected]; http://bobby.gramacy.com), Department of Statistics, Virginia Tech. Joint with Mickaël Binois (INRIA), Jiangeng Huang (VT), Mike Ludkovski (UCSB) and Chris Franck (VT).



  • Stochastic simulation

    Increasingly, data in geostatistics, machine learning, and computer simulation experiments involve signal-to-noise ratios which

    · may be low
    · and/or possibly change over the input space.

    Stochastic (computer) simulators, from physics, business and epidemiology, may exhibit both of those features simultaneously,

    · but let's start with the first (walk before we run).

    With noisy processes, more samples are needed to isolate the signal.

    What can be done? The canonical GP surrogate buckles under the weight of even modestly big data.

    · We must decompose an N × N matrix K_N, to obtain K_N^{-1} and |K_N|, at O(N^3) cost.

  • Replication

  • Woodbury trick

    Replication can be a powerful device for separating signal from noise, and can yield computational savings as well.

    Long story short, if N outputs are observed at n unique inputs, then the Woodbury identity provides sufficient statistics of size (n) rather than (N).

    · This leads to n × n matrices in likelihood and predictive equations,
    · and thus dramatically faster inference (via cubic decompositions) when N ≫ n.
    · Identities abound and derivatives (for MLEs) are analytic.

    Unlike alternatives such as stochastic kriging (Ankenman, et al., 2010), no approximations are made or asymptotic arguments required.

    The details (Binois, et al., 2018a) are for another time, but …

    http://www.stochastickriging.net/
    https://arxiv.org/abs/1611.05902
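The savings can be sketched numerically. Below is a minimal base-R illustration (not hetGP's internals): a full N × N covariance over a replicated design is solved using only n × n decompositions via the Woodbury identity. The Gaussian kernel, lengthscale, and nugget values are arbitrary choices for the demo.

```r
# Woodbury trick under replication: K_N = U K_n U' + g I, with U the N x n
# incidence matrix mapping each run to its unique input site.
set.seed(1)
n <- 5; a <- sample(1:4, n, replace = TRUE); N <- sum(a)  # replicate counts a_i
x <- seq(0, 1, length = n)                                # unique inputs
U <- matrix(0, N, n); U[cbind(1:N, rep(1:n, a))] <- 1     # incidence matrix
Kn <- exp(-outer(x, x, "-")^2 / 0.1)                      # Gaussian kernel, theta = 0.1
g <- 0.2                                                  # nugget
KN <- U %*% Kn %*% t(U) + g * diag(N)                     # full N x N covariance
Z <- rnorm(N)
direct <- solve(KN, Z)                                    # O(N^3) solve
# Woodbury: K_N^{-1} Z via an n x n inverse only
M <- solve(solve(Kn) + crossprod(U) / g)                  # crossprod(U) = diag(a)
wood <- Z / g - U %*% (M %*% (crossprod(U, Z) / g)) / g
max(abs(direct - wood))                                   # agreement to machine precision
```

The key structural fact is that `crossprod(U)` is diagonal with the replicate counts, so the inner inverse never exceeds n × n.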

  • For example

    … consider

    · n = 100 unique input locations, expanding to N ≈ 2500 when
    · each has a random number of replicates a_i ∼ Unif{1, 2, …, 50}.

    ## assumed reconstruction of the truncated design-building chunk (1d inputs)
    Xbar <- matrix(seq(0, 1, length=100), ncol=1)   # n = 100 unique sites
    a <- sample(1:50, 100, replace=TRUE)            # replicate counts a_i
    X <- Xbar[rep(1:100, a), , drop=FALSE]          # full design; N = sum(a) ≈ 2500

  • Comparing full-N to unique-n

    Let's use our new hetGP package to fit "full-N" and "unique-n" GPs.

    library(hetGP)

    Essentially no difference on lengthscale hyperparameter estimates …

    … but a big difference in time!
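The unique-n bookkeeping can be sketched in base R: collapse a replicated design to unique sites, replicate counts, and averaged responses, mirroring in spirit what hetGP's find_reps returns (the toy inputs below are invented for illustration).

```r
# Collapse a replicated design to sufficient statistics at unique sites:
# unique inputs X0, replicate counts mult (a_i), and averaged responses Z0.
X <- rep(c(0.2, 0.5, 0.8), times = c(2, 3, 1))  # replicated 1d design, N = 6
Z <- c(1, 3, 4, 5, 6, 10)                       # outputs at those runs
X0 <- unique(X)                                 # n = 3 unique inputs
idx <- match(X, X0)                             # which unique site each run hits
mult <- as.vector(table(idx))                   # replicate counts: 2, 3, 1
Z0 <- tapply(Z, idx, mean)                      # averages: 2, 5, 10
```

These n-sized quantities are exactly what the Woodbury-based likelihood needs, regardless of how large N grows.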

  • Heteroskedasticity

  • GPs disappoint on the motorcycle data.

    library(MASS)                                  # for the mcycle data
    hom <- mleHomGP(mcycle$times, mcycle$accel)    # assumed completion of truncated chunk

  • Latent GP noise

    Although there are many appealing methods for heteroskedastic GP modeling, predominantly from the machine learning literature (e.g., Goldberg, et al., 1998),

    · most are impractical (MCMC), except on the smallest data sets.

    The key ingredient, of latent noise variables δ_1, δ_2, …, δ_n stored diagonally in Δ_n, kept smooth under a (log) GP prior, has merit.

    · Those latents are easily subsumed into the (Woodbury) MLE framework.
    · Maximization is facilitated by closed-form derivatives with respect to the latent Δ_n values, all in O(n^3) time.

    Again, the details (Binois, et al., 2018a) are for another time, but …

    https://papers.nips.cc/paper/1444-regression-with-input-dependent-noise-a-gaussian-process-treatment.pdf
    https://arxiv.org/abs/1611.05902

  • Full mean and noise inference

    … let's check out an example via hetGP:

    ## assumed completion of the truncated chunk
    het2 <- mleHetGP(mcycle$times, mcycle$accel)        # heteroskedastic GP fit
    Xgrid <- matrix(seq(0, 60, length=301), ncol=1)     # prediction grid (assumed range)
    p2 <- predict(het2, Xgrid)
    ql <- qnorm(0.05, p2$mean, sqrt(p2$sd2 + p2$nugs))  # 90% predictive interval
    qu <- qnorm(0.95, p2$mean, sqrt(p2$sd2 + p2$nugs))

    Predictions and summary stats to visualize on the next slide.

  • par(mfrow=c(1,2))
    plot(mcycle$times, mcycle$accel, ylim=c(-160,90), ylab="acc", xlab="time")
    lines(Xgrid, p2$mean, col=2, lwd=2)
    lines(Xgrid, ql, col=2, lty=2); lines(Xgrid, qu, col=2, lty=2)
    plot(Xgrid, p2$nugs, type="l", lwd=2, ylab="s2", xlab="time", ylim=c(0,2e3))
    points(het2$X0, sapply(find_reps(mcycle[,1],mcycle[,2])$Zlist, var), col=3, pch=20)

  • Real examples

  • Example: Epidemics management

    One method of studying disease outbreak dynamics is based on stochastic compartmental modeling,

    · for example, so-called Susceptible, Infected & Recovered (SIR) models.

    Here we consider the total number of newly infected individuals

    f(x) := 𝔼{S_0 − lim_{T→∞} S_T ∣ (S_0, I_0, R_0) = x} = γ 𝔼{∫_0^∞ I_t dt ∣ x}

    under continuous time Markov dynamics with transitions S + I → 2I and I → R, solved with Monte Carlo (Hu et al., 2015).

    · The inputs are in 2d: x = (S_0, I_0);
    · the resulting surface is heteroskedastic
    · and has some very-high noise regions.

    https://arxiv.org/abs/1509.00980
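The Monte Carlo step can be sketched with a bare-bones simulation of the embedded jump chain of those two transitions (an illustrative stand-in, not the package's sirEval; the beta and gamma rates below are arbitrary demo values).

```r
# One stochastic SIR realization via the embedded jump chain: at each event,
# infection S+I -> 2I fires with probability proportional to beta*S*I,
# recovery I -> R with probability proportional to gamma*I.  Event times are
# not tracked since only the final count S0 - S_inf is needed.
sir1 <- function(S0, I0, beta = 0.5/600, gamma = 0.5) {
  S <- S0; I <- I0
  while (I > 0) {
    r_inf <- beta * S * I                  # infection rate
    r_rec <- gamma * I                     # recovery rate
    if (runif(1) < r_inf / (r_inf + r_rec)) {
      S <- S - 1; I <- I + 1               # new infection
    } else {
      I <- I - 1                           # recovery
    }
  }
  S0 - S                                   # newly infected individuals
}
set.seed(42)
fhat <- mean(replicate(100, sir1(S0 = 400, I0 = 20)))  # crude MC estimate of f(x)
```

Replicates of `sir1` at a fixed x = (S_0, I_0) are exactly the noisy, input-dependent observations the heteroskedastic GP is asked to smooth.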

  • Consider a space-filling design of size n = 200 unique runs, with a random number of replicates a_i ∈ {1, …, 100}, for i = 1, …, n.

    The hetGP package provides sirEval, returning the expected number of infecteds at the end of the simulation.

    ## assumed reconstruction of the truncated design chunk (unit square inputs)
    Xbar <- matrix(runif(200 * 2), ncol=2)       # space-filling stand-in, n = 200
    a <- sample(1:100, 200, replace=TRUE)        # replicate counts a_i

  • SIR hetGP fit

    ## assumed completion of the truncated chunk
    fit <- mleHetGP(X, Z)                        # X: replicated design; Z: sirEval outputs

    To help with the visuals on the next slide, the code below creates a dense grid in 2D and calls the predict method on the "hetGP"-class fit object.

    xx <- seq(0, 1, length=101)
    Xgrid2 <- as.matrix(expand.grid(xx, xx))     # dense 2d grid (names assumed)
    psir <- predict(fit, Xgrid2)

  • par(mfrow=c(1,2), mar=c(4,4,2,1)); cols <- heat.colors(128)  # assumed palette; chunk truncated

  • Example: assemble to order

    Inventory management simulator (Hong & Nelson, 2006); d = 8; N ≈ 5000.

    Proper scores (Gneiting & Raftery, 2007) to measure mean accuracy (MSE) relative to predicted variance (larger is better).

    http://mason.gmu.edu/~jxu13/ISC/HongNelsonCOMPASS.pdf
    http://amstat.tandfonline.com/doi/abs/10.1198/016214506000001437
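For Gaussian predictors the (logarithmic) proper score has a simple per-point form; below is a base-R sketch of that idea, up to additive constants and with invented toy numbers, showing why an overconfident predictive variance is penalized.

```r
# Gaussian log score, averaged over test points: -log(s2) - (y - mu)^2 / s2.
# Larger is better; it rewards accurate means relative to honest variances.
score <- function(y, mu, s2) mean(-log(s2) - (y - mu)^2 / s2)

y <- c(1.0, 2.0, 3.0)                       # toy test responses
honest <- score(y, mu = y + 0.5, s2 = rep(0.25, 3))  # variance matches the error
brash  <- score(y, mu = y + 0.5, s2 = rep(0.01, 3))  # same mean, overconfident variance
honest > brash                               # TRUE: overconfidence is punished
```

Two predictors with identical MSE can thus score very differently, which is the point of using proper scores rather than MSE alone for the ATO comparison.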

  • The hetGP package has a saved fit on space-filling training/testing data from this experiment.

    Reproducing the comparison on proper scores:

    data("ato")
    c(n=nrow(Xtrain), N=length(unlist(Ztrain)), time=out$time)

    ## n N time.elapsed
    ## 1000.000 5594.000 8583.767

    phet <- predict(out, Xtest)   # assumed completion of truncated chunk

  • Sequential design

  • One step at a time

    Model-based one-shot design is almost never appropriate in this setting.

    · Designs are hyperparameter sensitive for homoskedastic processes,
    · which is exacerbated when additional variance processes are in play.

    It makes sense to slow down and take things one step at a time.

    Choose the next point (x_{N+1}) by exploring its impact on the predictive equations.

  • IMSPE

    A common criterion is integrated mean-squared prediction error

    I_{n+1} ≡ IMSPE(x_1, …, x_n, x_{n+1}) = ∫_{x∈D} σ²_{n+1}(x) dx.

    IMSPE has a closed form as long as D is an easily integrable domain, such as a hyperrectangle:

    I_{n+1} = 𝔼{σ²_{n+1}(X)} = 𝔼{K_θ(X, X) − k_{n+1}(X)^⊤ K_{n+1}^{−1} k_{n+1}(X)}
            = 𝔼{K_θ(X, X)} − tr(K_{n+1}^{−1} W)

    · where W_{ij} = ∫_{x∈D} k(x_i, x) k(x_j, x) dx. E.g., in the Gaussian case with D = [0, 1]^d,

    W_{ij} = ∏_{k=1}^d (√(2πθ_k)/4) exp{−(x_{i,k} − x_{j,k})²/(2θ_k)} [erf{(2 − (x_{i,k} + x_{j,k}))/√(2θ_k)} + erf{(x_{i,k} + x_{j,k})/√(2θ_k)}].

    · Gradient of I_{n+1} ∣ x_1, …, x_n facilitates optimization for sequential design.

    https://arxiv.org/abs/1710.03206
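The one-dimensional Gaussian-case entry can be sanity-checked against numerical quadrature; a base-R sketch assuming k(x, x′) = exp{−(x − x′)²/θ} on D = [0, 1] (the inputs 0.3 and 0.7 and θ = 0.1 are arbitrary).

```r
# Closed-form W_ij on [0,1] for the Gaussian kernel, versus direct quadrature.
erf <- function(z) 2 * pnorm(z * sqrt(2)) - 1   # erf via the normal CDF
Wij <- function(xi, xj, theta)
  sqrt(2*pi*theta)/4 * exp(-(xi - xj)^2 / (2*theta)) *
    (erf((2 - (xi + xj)) / sqrt(2*theta)) + erf((xi + xj) / sqrt(2*theta)))

quad <- integrate(function(x) exp(-(0.3 - x)^2/0.1) * exp(-(0.7 - x)^2/0.1), 0, 1)
c(closed = Wij(0.3, 0.7, 0.1), numeric = quad$value)  # the two agree
```

Having W in closed form is what makes IMSPE, and its gradient, cheap enough to optimize at every sequential-design step.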

  • A replicating relation

    Binois, et al. (2018b) showed that the next point, x_{N+1}, will be a replicate when

    r(x_{N+1}) ≥ [k_n(x_{N+1})^⊤ K_n^{−1} W_n K_n^{−1} k_n(x_{N+1}) − 2 w_{n+1}^⊤ K_n^{−1} k_n(x_{N+1}) + w_{n+1,n+1}] / tr(B_{k∗} W_n) − σ²_n(x_{N+1}),

    where k∗ = argmin_{1≤k≤n} IMSPE(x_k) and B_k = (Υ_n^{−1})_{·,k} (Υ_n^{−1})_{k,·} / (τ²λ_k / (a_k(a_k + 1)) − (Υ_n^{−1})_{k,k}).

    However, actually finding a replicate in practice is doubly challenged:

    1. Numerical precision of "continuous" optimizers in a discrete setting.
    2. Myopic criteria (like IMSPE) from the perspective of replication.

    Binois, et al. (2018b) proposed an adaptive lookahead based implementation as remedy.

    https://arxiv.org/abs/1710.03206

  • Simple example

    fm

  • Starting up

    Here is a fit to a small space-filling design,

    · followed by a search via IMSPE with look-ahead over replication,
    · then an update of the fit to incorporate the new data.

    fr

  • Repeat a bunch

    Let's continue and gather a total of N = 500 samples in this way.

    Once that's done, gather a final prediction throughout the input space.

    for(i in 1:489) {
      ## find the next point and update (assumed completion of truncated body)
      opt <- IMSPE_optim(mod, h=5)
      X <- rbind(X, opt$par); Y <- c(Y, fm(opt$par))
      mod <- update(mod, Xnew=opt$par, Znew=Y[length(Y)])
    }
    xgrid <- matrix(seq(0, 1, length=301), ncol=1)  # assumed [0,1] input domain
    p <- predict(mod, xgrid)

  • Visualizing

    plot(xgrid, p$mean, type="l", ylim=c(-8,17), xlab="x", ylab="y"); points(X, Y)

    segments(mod$X0, rep(0, nrow(mod$X0))-8, mod$X0, (mod$mult-8)*0.65, col="gray")

    lines(xgrid, qnorm(0.05, p$mean, sqrt(p$sd2 + p$nugs)), col=2, lty=2)

    lines(xgrid, qnorm(0.95, p$mean, sqrt(p$sd2 + p$nugs)), col=2, lty=2)


  • Real example

  • Sequential ATO

    The ato data object loaded earlier contains a second saved "hetGP"-class model,

    · trained with an adaptive horizon IMSPE-based sequential design scheme.

    Design size and training time are quoted below.

    rbind(batch=c(n=nrow(out$X0), N=length(out$Z), time=out$time),
          adapt=c(n=nrow(out.a$X0), N=length(out.a$Z), time=out.a$time))

    ## n N time.elapsed
    ## batch 1000 5594 8583.767
    ## adapt 1194 2000 38737.974

    The adaptive design has a higher proportion of unique locations

    · but still a nontrivial degree of replication,
    · resulting in many fewer overall runs of the expensive ATO simulator.

  • Out-of-sample comparison

    With the same out-of-sample testing set from the previous score-based comparison,

    · the code below calculates predictions and scores under IMSPE-based sequential design.

    phet.a <- predict(out.a, Xtest)   # assumed completion of truncated chunk

  • Summarizing

    When the mean and the variance are changing non-linearly in the input space,

    · it is still possible to get very accurate fits via coupled GPs,
    · and it can be advantageous to have replication in the design.

    The more heteroskedasticity, the more replication.

    · Intuitively, that must be true: both signal and noise are changing, and replication is the only reliable tool for separating the two.

    Replication has the added benefit of yielding faster fitting of the GPs.

  • Thanks!

    Our hetGP package, facilitating everything in this reproducible Rmarkdown talk, is available on CRAN.

    · A reproducible vignette can be found with the package source.

    https://cran.r-project.org/web/packages/hetGP/vignettes/hetGP_vignette.pdf