Density regression via deep learning
Brian Reich
North Carolina State University
June, 2019
Brian Reich, NC State Density regression using deep learning 1 / 55
Collaborators
This is largely the work of PhD students Neal Grantham and Rui Li
Others: Howard Bondell, Eric Laber, Krishna Pacifici, Rob Dunn
Density regression
- Most statistical analyses are mean regressions:
\[ E(Y \mid X) = \sum_{j=1}^{p} X_j \beta_j \]
- This is a reasonable first-order approximation and leads to a simple and interpretable model
- Density regression allows the entire distribution of the response to depend on covariates
- For example, a covariate might affect the mean, variance, skewness, etc.
- This provides a more comprehensive study of covariate effects and more realistic prediction distributions
Density regression
- Density regression is more challenging to fit than mean regression
- This is especially true when there are many covariates
- In this talk we use deep learning for density regression
- Deep learning is great for prediction but poor for inference
- We apply this method to two environmental applications where the objective is prediction
Application 1: Geolocation using microbiome data
- The microbiome is the community of microbial organisms
- Sequencing technology makes it possible to affordably identify microbes
- Our collaborators collected dust samples from the outer door frames of 1,300 homes in the US
- DNA sequencing revealed 50,000 fungal taxa
- Can we use the microbiome of the sample (X) to predict its location of origin (Y)?
Application 2: Solar energy forecasting
- Solar and wind energy forecasting is big business
- We use recent meteorology and numerical forecasts for short-term prediction
- Stochastic forecasts that assess uncertainty are crucial
- This allows for
  - prediction of features like exceeding thresholds
  - propagating uncertainty into economic models
Density regression with deep learning
- In both problems there are many predictors that might affect the predictive distribution
- We use the same general strategy for both problems:
  1. Randomly partition the prediction domain
  2. Train a deep learning classifier on the partitions
  3. Repeat many times and aggregate
Forensic geolocation model
- Let $Y \in D \subset \mathbb{R}^2$ be the spatial location of a sample
- The predictors $X$ are the $p$ binary indicators of the presence of each taxon in the sample
- To approximate the density of $Y \mid X$, we randomly partition the spatial domain into $K$ tiles
- Let $v_1, \dots, v_K \overset{iid}{\sim} \mathrm{Uniform}(D)$ be “seeds” that define the tiles
\[ P_k = \{ s : \|s - v_k\| < \|s - v_l\| \text{ for all } l \neq k \} \]
- $Y$ is reduced to the label $g$, where $g = k$ means $Y \in P_k$
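The tile definition above is a nearest-seed rule, which can be sketched in a few lines. This is only an illustration: it assumes the domain D is the unit square, the helper names `draw_seeds` and `tile_label` are hypothetical, and labels are zero-based (0, ..., K-1) rather than 1, ..., K.

```python
import math
import random

def draw_seeds(k, rng):
    """Draw k seeds uniformly on the unit square (assumed stand-in for D)."""
    return [(rng.random(), rng.random()) for _ in range(k)]

def tile_label(y, seeds):
    """Zero-based index of the seed nearest to y, i.e. the tile containing y."""
    return min(range(len(seeds)), key=lambda k: math.dist(y, seeds[k]))

rng = random.Random(0)
seeds = draw_seeds(5, rng)
locations = [(rng.random(), rng.random()) for _ in range(4)]
labels = [tile_label(y, seeds) for y in locations]   # the class labels g
```

Each location is thereby reduced to a class label that a standard classifier can be trained on.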
A random partition
Forensic geolocation model
- We then regress the labels $g$ onto $X$ using a multi-class classification algorithm
- Deep learning turned out to be the best classifier
- Let $\hat{\pi}_k(X)$ be the fitted probability of $Y \in P_k$ given $X$
- Assuming a uniform density within each tile gives the predictive density
\[ p(y \mid X) = \sum_{k=1}^{K} \hat{\pi}_k(X) \frac{1}{|P_k|} I(y \in P_k) \]
where $|P_k|$ is the area of tile $k$
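As a rough sketch of how this piecewise-constant density could be evaluated, the example below estimates tile areas by Monte Carlo and divides the tile probability by the tile area. The unit-square domain, the two fixed seeds, and the probabilities in `probs` are made-up assumptions for illustration; in the talk the probabilities come from a fitted classifier.

```python
import math
import random

def nearest(y, seeds):
    """Index of the seed closest to y, i.e. the tile that contains y."""
    return min(range(len(seeds)), key=lambda k: math.dist(y, seeds[k]))

def tile_areas(seeds, rng, n_draws=20000):
    """Monte Carlo estimate of each tile's area inside the unit square."""
    counts = [0] * len(seeds)
    for _ in range(n_draws):
        counts[nearest((rng.random(), rng.random()), seeds)] += 1
    return [c / n_draws for c in counts]

def predictive_density(y, seeds, probs, areas):
    """p(y | X) = pi_k(X) / |P_k| for the tile k that contains y."""
    k = nearest(y, seeds)
    return probs[k] / areas[k]

seeds = [(0.25, 0.5), (0.75, 0.5)]   # two tiles: left and right halves
probs = [0.8, 0.2]                   # made-up classifier output for one X
areas = tile_areas(seeds, random.Random(0))
density_left = predictive_density((0.1, 0.1), seeds, probs, areas)
density_right = predictive_density((0.9, 0.9), seeds, probs, areas)
```

With two equal tiles the density is roughly 0.8 / 0.5 on the left and 0.2 / 0.5 on the right, up to Monte Carlo error in the areas.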
Forensic geolocation model
- Pro: Can be fit quickly with standard software
- Cons: Reliance on the number and configuration of the tiles, and the predictive density is discontinuous
- Solution: Repeat many times with a different number of random seeds and average the predictive densities
- We call this method “Deep space”
- Properties: As $J, K \to \infty$ it can approximate any continuous conditional density function
The deep space algorithm
For $j = 1, \dots, J$
1. Draw the number of tiles $K_j \sim \mathrm{Uniform}(a, b)$
2. Draw the seeds $v_{j1}, \dots, v_{jK_j} \sim \mathrm{Uniform}(D)$
3. Train a classifier to obtain tile probabilities $\hat{\pi}_{j1}(X), \dots, \hat{\pi}_{jK_j}(X)$
The final predictive density is
\[ p(y \mid X) = \frac{1}{J} \sum_{j=1}^{J} \sum_{k=1}^{K_j} \hat{\pi}_{jk}(X) \frac{1}{|P_{jk}|} I(y \in P_{jk}) \]
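The loop above can be sketched end to end under strong simplifying assumptions. Here the domain is the unit square, $K_j$ is drawn as a uniform integer, tile areas are estimated by Monte Carlo, and the classifier is replaced by smoothed tile frequencies that ignore X. That stand-in is not the deep neural network used in the talk; it only makes the averaging step concrete. All function names are hypothetical.

```python
import math
import random

def nearest(y, seeds):
    """Index of the seed closest to y, i.e. the tile that contains y."""
    return min(range(len(seeds)), key=lambda k: math.dist(y, seeds[k]))

def fit_partition(train, rng, a=3, b=8, n_area=2000):
    """One replicate: draw tiles, then estimate tile probabilities and areas."""
    k = rng.randint(a, b)                        # K_j, uniform on {a, ..., b}
    seeds = [(rng.random(), rng.random()) for _ in range(k)]
    counts = [1.0] * k                           # add-one smoothing
    for y in train:
        counts[nearest(y, seeds)] += 1
    n_counts = sum(counts)
    probs = [c / n_counts for c in counts]       # stand-in for the classifier
    hits = [1.0] * k                             # smoothed Monte Carlo areas
    for _ in range(n_area):
        hits[nearest((rng.random(), rng.random()), seeds)] += 1
    n_hits = sum(hits)
    areas = [h / n_hits for h in hits]
    return seeds, probs, areas

def deep_space_density(y, fits):
    """Average of the J piecewise-constant densities at location y."""
    total = 0.0
    for seeds, probs, areas in fits:
        k = nearest(y, seeds)
        total += probs[k] / areas[k]
    return total / len(fits)

rng = random.Random(0)
train = [(rng.uniform(0, 0.4), rng.uniform(0, 0.4)) for _ in range(200)]
fits = [fit_partition(train, rng) for _ in range(20)]     # J = 20
dense = deep_space_density((0.2, 0.2), fits)
sparse = deep_space_density((0.8, 0.8), fits)
```

Averaging over partitions smooths out the tile boundaries of any single replicate; the estimate is much higher near the training cluster than far from it.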
National analysis
We compared the following models using cross-validation:
1. NN: Nearest neighbors analysis
2. RF: Geolocation using random forests
3. Net: Geolocation using a shallow neural network
4. DeepSpace (DS): Geolocation using a deep neural network with three hidden layers
5. State DNN: A deep neural network with US states as tiles
6. BDA: Naive Bayes classifier based on kernel-smoothed occurrence probability for each taxon
Methods 2-4 use $K \sim \mathrm{Uniform}(0.05n, 0.50n)$
Bayesian discriminant analysis (BDA)
As seen on TV!
Cross-validation results
                                           Area match (%)
Model       Median error (km)   Coverage   State   County   City
DeepSpace    97.8               96.3       60.2    23.6     19.4
Net         113.3               94.3       58.2    23.9     19.7
State DNN   211.0               -          57.1    -        -
RF          213.7               98.6       47.6    17.0     14.2
NN          247.9               90.0       44.6    14.6     12.1
BDA         263.7               91.0       31.9     1.6      0.8
Average errors for deep space
Regional analysis
- We also analyze the n = 116 samples from the central North Carolina counties of Wake, Durham and Orange
- By focusing on a small geographic area we can isolate the ability of the models to predict the origin of a sample when biogeographic differences are held relatively constant
- In this analysis we seek to determine if there is a limit to the resolution at which one may geolocate samples using fungal occupancy data
Cross-validation results
                                            Area match (%)
Model        Median error (km)   Coverage   County   City
County DNN   18.0                -          53.4     -
Net          19.2                90.5       49.1     25.9
BDA          19.5                90.5       40.5     19.0
DeepSpace    20.0                90.5       40.5     18.1
RF           20.2                93.1       36.2     19.0
NN           20.4                84.5       43.1     24.1
Global analysis
- With the generous funding of the US Department of Defense we gathered n = 399 samples from 28 countries
- The data span Eastern Europe, the Middle East, Africa, Asia, Oceania, and the Americas
- There were 10-20 sampling locations in each country
- Samples within each country often stem from a single major city
- We therefore compared the models only on their ability to detect a sample’s country of origin
Cross-validation results
Model         Classification accuracy
DeepSpace     89.5%
Country DNN   84.7%
Net           84.2%
RF            74.9%
NN            62.7%
Deep space confusion matrix
Summary
- The method works well at continental and global scales, but not at regional scales
- We have worked with the Department of Defense to implement this method
- Future work is to:
  - Incorporate covariates such as climate and land cover
  - Analyze samples of mixed origin
  - Generalize and study theory (next!)
Non-spatial applications
- Geolocation is an important but narrow problem
- After completing this we began generalizing the method to non-spatial problems
- In general density regression we have a univariate response $Y$, scaled so $Y \in [0, 1]$
- We apply the deep space algorithm to regress the density of $Y$ onto covariates $X$
- We call this the Deep Density Regression (DDR) method
The DDR algorithm
For $j = 1, \dots, J$
1. Draw the $K - 1$ cutpoints $v_{jk} \overset{iid}{\sim} \mathrm{Uniform}(0, 1)$
2. Sort the cutpoints so $y_{jk} = v_{j(k)}$ ($y_{j0} = 0$ and $y_{jK} = 1$)
3. Assign observations to bins, $g = k$ if $Y \in (y_{j,k-1}, y_{jk}) = P_{jk}$
4. Train a classifier to obtain tile probabilities $\hat{\pi}_{j1}(X), \dots, \hat{\pi}_{jK}(X)$
The final predictive density is
\[ p(y \mid X) = \frac{1}{J} \sum_{j=1}^{J} \sum_{k=1}^{K} \hat{\pi}_{jk}(X) \frac{1}{|P_{jk}|} I(y \in P_{jk}) \]
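A minimal sketch of these four steps for a scalar covariate follows. The classifier is a loudly assumed stand-in (bin frequencies among the m nearest training covariates), not the deep network of the talk, and the simulated data, the function names, and the defaults J = 25, K = 10, m = 50 are all hypothetical.

```python
import bisect
import random

def draw_cutpoints(k, rng):
    """K - 1 sorted Uniform(0, 1) cutpoints plus the end points 0 and 1."""
    return [0.0] + sorted(rng.random() for _ in range(k - 1)) + [1.0]

def bin_index(y, cuts):
    """Zero-based bin k with cuts[k] <= y < cuts[k + 1]; y = 1 joins the last bin."""
    return min(bisect.bisect_right(cuts, y) - 1, len(cuts) - 2)

def knn_bin_probs(x, xs, ys, cuts, m=50):
    """Stand-in classifier: bin frequencies among the m nearest covariates."""
    near = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x))[:m]
    counts = [0] * (len(cuts) - 1)
    for i in near:
        counts[bin_index(ys[i], cuts)] += 1
    return [c / len(near) for c in counts]

def ddr_density(y, x, xs, ys, J=25, K=10, seed=0):
    """Average over J random histograms of pi_k(x) / |P_k| at the point y."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(J):
        cuts = draw_cutpoints(K, rng)
        probs = knn_bin_probs(x, xs, ys, cuts)
        k = bin_index(y, cuts)
        total += probs[k] / (cuts[k + 1] - cuts[k])
    return total / J

# Made-up training data with the response clipped to [0, 1].
rng = random.Random(1)
xs = [rng.random() for _ in range(500)]
ys = [min(max(0.2 + 0.6 * x + rng.gauss(0, 0.05), 0.0), 1.0) for x in xs]
near_mode = ddr_density(0.5, 0.5, xs, ys)
far_tail = ddr_density(0.05, 0.5, xs, ys)
```

Because the bin widths are known exactly in one dimension, no area estimation is needed, unlike the spatial case.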
Loss function 1 - Multiclass regression
- Parameterize the bin probabilities as $\pi_k(X; \theta)$
- We model $\pi_k$ using a deep neural network with softmax (multinomial logistic) link
- In this model $\theta$ includes the biases and weights
- These parameters are estimated to minimize
\[ -\sum_{i=1}^{n} \sum_{k=1}^{K} I(y_{k-1} < Y_i < y_k) \log[\pi_k(X_i; \theta)] \]
- Then $\hat{\pi}_k(X) = \pi_k(X; \hat{\theta})$
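To make the loss concrete, the sketch below evaluates it for given network outputs. It assumes the network produces one score per bin that is passed through a softmax, and it uses zero-based bin labels; the function names and the example scores are illustrative, not taken from the talk.

```python
import math

def softmax(scores):
    """Map real-valued scores to bin probabilities that sum to one."""
    top = max(scores)                      # subtract the max for stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def multiclass_loss(bin_labels, bin_probs):
    """Negative log-likelihood: minus the sum of log pi_{g_i}(X_i)."""
    return -sum(math.log(p[g]) for g, p in zip(bin_labels, bin_probs))

# Three observations, K = 4 bins, equal scores give uniform probabilities.
probs = [softmax([0.0, 0.0, 0.0, 0.0]) for _ in range(3)]
loss = multiclass_loss([0, 2, 3], probs)   # equals 3 * log(4)
```

Only the probability of the observed bin enters each term, which is why this loss does not use the ordering of the bins.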
Loss function 2 - Binary cross-entropy loss
- The previous loss ignores bin ordering and is sensitive to $K$
- Denote the CDF as
\[ \mathrm{Prob}(Y \le y_k \mid X) = F_k(X; \theta) = \sum_{l=1}^{k} \pi_l(X; \theta) \]
and $\bar{F}_k(X; \theta) = 1 - F_k(X; \theta)$
- The loss function is then
\[ -\sum_{k=1}^{K} \sum_{i=1}^{n} \{ I(Y_i \le y_k) \log[F_k(X_i; \theta)] + I(Y_i > y_k) \log[\bar{F}_k(X_i; \theta)] \} \]
- As before, $\hat{\pi}_k(X) = \pi_k(X; \hat{\theta})$
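The following sketch evaluates this loss from bin probabilities and shows how it respects bin ordering. Labels are zero-based, and the sum runs over the interior cutpoints only, because the term for $k = K$ vanishes ($F_K = 1$ and no $Y_i$ exceeds $y_K$). The function name and the example probabilities are made up for illustration.

```python
import math

def cdf_loss(bin_labels, bin_probs):
    """Binary cross-entropy summed over the interior cutpoints y_1..y_{K-1}."""
    loss = 0.0
    for g, p in zip(bin_labels, bin_probs):
        cdf = 0.0
        for k in range(len(p) - 1):
            cdf += p[k]                    # F at the upper edge of bin k
            if g <= k:                     # observation lies at or below it
                loss -= math.log(cdf)
            else:                          # observation lies above it
                loss -= math.log(1.0 - cdf)
    return loss

# True bin is 0. Both forecasts give it probability 0.1, but one puts the
# remaining mass in the adjacent bin and the other in the farthest bin.
near_miss = [0.1, 0.7, 0.1, 0.1]
far_miss = [0.1, 0.1, 0.1, 0.7]
loss_near = cdf_loss([0], [near_miss])
loss_far = cdf_loss([0], [far_miss])
```

The multiclass loss is identical for these two forecasts (both equal $-\log 0.1$), whereas the CDF-based loss penalizes the far miss more heavily.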
Theory without covariates
- Theory has already been worked out for the histogram without covariates (e.g., Wasserman 2013)
- Assume the true PDF has bounded second derivative
- The fixed-bin histogram is consistent if $n, K \to \infty$ and $K/n \to 0$
- The optimal number of bins is $K = O(n^{1/3})$
- This holds for the random histogram without covariates
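The $n^{1/3}$ rate can be traced to the usual bias-variance trade-off for a histogram. The display below is the standard textbook approximation for equal bins of width $h = 1/K$ on $[0, 1]$, written here as a reminder rather than as a statement from the talk; it assumes $\int f'(u)^2 du$ is finite and positive.

```latex
% Approximate integrated risk: squared bias plus variance
R(h) \approx \frac{h^{2}}{12} \int_{0}^{1} f'(u)^{2} \, du + \frac{1}{nh}

% Setting dR/dh = 0 gives the optimal bin width and bin count
h^{*} = \left( \frac{6}{n \int_{0}^{1} f'(u)^{2} \, du} \right)^{1/3},
\qquad
K^{*} = \frac{1}{h^{*}} \propto n^{1/3},
\qquad
R(h^{*}) = O(n^{-2/3})
```

More bins reduce the squared-bias term but inflate the variance term, so the two are balanced when $K$ grows like $n^{1/3}$.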
Theory with covariates
- Assume there are $K$ fixed and equally-sized bins
- The true probability in bin $k$ is $\pi_k(X) = \int_{P_k} f(y \mid X)\,dy$
- Let $\hat{\pi}_k(X)$ be an estimator of $\pi_k(X)$
- Assume
  1. $f(y \mid X)$ has bounded second derivative
  2. $n, K \to \infty$ and $K/n \to 0$
  3. $\mathrm{Bias}(\hat{\pi}_k(X)) = o(1/K)$ for all $k$
  4. $\mathrm{Var}(\hat{\pi}_k(X)) = o(1/K^2)$ for all $k$
- Then the conditional density estimator
\[ f(y \mid X) = \sum_{k=1}^{K} \frac{1}{|P_k|} I(y \in P_k) \hat{\pi}_k(X) \]
is consistent
Theory with covariates
- This theorem is model agnostic
- If we assume that
\[ \pi_k(X) \approx \frac{\exp(X^T \beta_k)}{\sum_{l=1}^{K} \exp(X^T \beta_l)} \]
the parametric multinomial logistic linear model with a fixed number of covariates satisfies the theorem’s conditions
- We are still exploring the theoretical properties of deep learning
Simulation study
- We compare fixed and random histograms with the two loss functions for various numbers of breakpoints $K$
- We also compare with quantile random forests
- In all cases there are n = 6,000 training observations
- Models are compared using the continuous ranked probability score (CRPS) averaged over 1,000 test set observations
- Coverage is not included here, but is close to the nominal level for all methods with sufficiently large $K$
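For reference, the CRPS of a forecast CDF $F$ at an observation $y$ is $\int (F(t) - I(t \ge y))^2 dt$, with smaller values being better. The sketch below approximates it with a midpoint rule for a histogram forecast on $[0, 1]$; the function names, the grid size, and the example forecasts are assumptions for illustration only.

```python
import bisect

def hist_cdf(t, cuts, probs):
    """CDF of a piecewise-uniform density with bin edges cuts and masses probs."""
    if t <= cuts[0]:
        return 0.0
    if t >= cuts[-1]:
        return 1.0
    k = bisect.bisect_right(cuts, t) - 1
    below = sum(probs[:k])
    return below + probs[k] * (t - cuts[k]) / (cuts[k + 1] - cuts[k])

def crps(y_obs, cuts, probs, n_grid=2000):
    """Midpoint-rule approximation of the CRPS integral over the support."""
    lo, hi = cuts[0], cuts[-1]
    step = (hi - lo) / n_grid
    total = 0.0
    for i in range(n_grid):
        t = lo + (i + 0.5) * step
        indicator = 1.0 if t >= y_obs else 0.0
        total += (hist_cdf(t, cuts, probs) - indicator) ** 2 * step
    return total

flat = crps(0.5, [0.0, 1.0], [1.0])                         # uniform forecast
sharp = crps(0.5, [0.0, 0.4, 0.6, 1.0], [0.05, 0.9, 0.05])  # concentrated
```

For the uniform forecast and an observation at 0.5 the exact value is 1/12, and the forecast concentrated around the observation scores lower.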
Simulation study 1 - mixture of nonlinear regressions
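\[ Y = [\sin(X_1) + \varepsilon_1]\,\pi_1 + [2\sin(1.5 X_1 + 1) + \varepsilon_2]\,(1 - \pi_1) \]
- $X_1 \sim \mathrm{Uniform}(0, 10)$
- $\pi_1 \sim \mathrm{Bernoulli}(0.5)$
- $\varepsilon_1 \sim \mathrm{Normal}(0, 0.09)$
- $\varepsilon_2 \sim \mathrm{Normal}(0, 0.64)$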
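Data from this design could be generated roughly as follows. The sketch assumes the second argument of Normal is a variance (so the standard deviations are 0.3 and 0.8), which the slide does not state explicitly; the function name is hypothetical.

```python
import math
import random

def simulate_study1(n, rng):
    """Draw n pairs (x1, y) from the two-component mixture of regressions."""
    data = []
    for _ in range(n):
        x1 = rng.uniform(0, 10)
        if rng.random() < 0.5:                       # pi_1 = 1
            y = math.sin(x1) + rng.gauss(0, 0.3)     # variance 0.09 (assumed)
        else:                                        # pi_1 = 0
            y = 2 * math.sin(1.5 * x1 + 1) + rng.gauss(0, 0.8)   # variance 0.64
        data.append((x1, y))
    return data

data = simulate_study1(6000, random.Random(0))       # n = 6,000 as in the talk
```

The conditional density of Y given X1 is bimodal wherever the two sine curves are far apart, which is what makes this a useful test of density regression.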
Fitted PDF - Fixed bins and X = 2
Fitted PDF - Random bins and X = 2
Fitted PDF - Random bins and X = 5
Fitted PDF - Random bins and X = 8
Fitted CDF - Random bins and X = 2
Fitted CDF - Random bins and X = 5
Fitted CDF - Random bins and X = 8
Simulation study 2 - Heteroskedastic linear model
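\[ Y \mid \beta_1, \beta_2 \sim \mathrm{Normal}\left( X^T \beta_1, \exp(X^T \beta_2) \right) \]
- $X_1, \dots, X_5 \overset{iid}{\sim} \mathrm{Normal}(0, 1)$
- $\beta_1 \sim \mathrm{Normal}(0, I_5)$
- $\beta_2 \sim \mathrm{Normal}(0, 0.45 I_5)$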
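A rough sketch of this design is given below. It makes two assumptions that the slide leaves open: the second argument of each Normal is a variance, and the coefficient vectors are drawn once per data set rather than once per observation. The function name is hypothetical.

```python
import math
import random

def simulate_study2(n, rng, p=5):
    """Heteroskedastic linear model with random coefficient vectors."""
    beta1 = [rng.gauss(0, 1) for _ in range(p)]                # mean effects
    beta2 = [rng.gauss(0, math.sqrt(0.45)) for _ in range(p)]  # log-variance effects
    rows = []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in range(p)]
        mean = sum(xj * bj for xj, bj in zip(x, beta1))
        log_var = sum(xj * bj for xj, bj in zip(x, beta2))
        y = rng.gauss(mean, math.exp(0.5 * log_var))           # sd = sqrt(variance)
        rows.append((x, y))
    return beta1, beta2, rows

beta1, beta2, rows = simulate_study2(6000, random.Random(0))
```

Here the covariates change both the center and the spread of the response, so a mean regression alone would miss part of their effect.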
Simulation study 3 - Mixture of nonlinear regressions
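\[ Y = [10\sin(2\pi X_1 X_2) + 10 X_4 + \varepsilon_1]\,\pi_1 + [20(X_3 - 0.5)^2 + 5 X_5 + \varepsilon_2]\,(1 - \pi_1) \]
- $X_1, \dots, X_{10} \overset{iid}{\sim} \mathrm{Uniform}(0, 1)$
- $\pi_1 \sim \mathrm{Bernoulli}(0.5)$
- $\varepsilon_1 \sim \mathrm{Normal}(0, 2.25)$
- $\varepsilon_2 \sim \mathrm{Normal}(0, 1)$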
Simulation study 4 - Nonlinear non-Gaussian model
\[ Y = 10\sin(2\pi X_1 X_2) + 20(X_3 - 0.5)^2 + 10 X_4 + 5 X_5 + \varepsilon \]
- $X_1, \dots, X_{10} \overset{iid}{\sim} \mathrm{Uniform}(0, 1)$
- $\varepsilon \sim \mathrm{SkewNormal}(0, 1, -5)$
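One way to simulate this model with only the standard library is sketched below. It assumes SkewNormal(0, 1, -5) means location 0, scale 1 and shape -5, and it draws the error through the standard representation $\delta |U_0| + \sqrt{1 - \delta^2}\, U_1$ with $\delta = \alpha / \sqrt{1 + \alpha^2}$ and independent standard normals $U_0, U_1$. The function names are hypothetical.

```python
import math
import random

def skew_normal(alpha, rng):
    """Standard skew-normal draw with shape alpha (location 0, scale 1)."""
    delta = alpha / math.sqrt(1 + alpha ** 2)
    return delta * abs(rng.gauss(0, 1)) + math.sqrt(1 - delta ** 2) * rng.gauss(0, 1)

def simulate_study4(n, rng, p=10):
    """Nonlinear mean in X1..X5 (X6..X10 are noise) plus left-skewed error."""
    rows = []
    for _ in range(n):
        x = [rng.random() for _ in range(p)]
        mean = (10 * math.sin(2 * math.pi * x[0] * x[1])
                + 20 * (x[2] - 0.5) ** 2 + 10 * x[3] + 5 * x[4])
        rows.append((x, mean + skew_normal(-5.0, rng)))
    return rows

data = simulate_study4(1000, random.Random(0))
```

With a negative shape parameter the error has a long left tail and a mean of $\delta\sqrt{2/\pi}$, about -0.78, rather than zero.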
Summary of the simulation study
- The random histogram is slightly better than fixed bins
- The binary cross-entropy loss function is far superior to the multi-class loss function
- In particular, the binary cross-entropy loss is much less sensitive to $K$
- Predictive intervals attain the nominal coverage for large $K$
Solar power forecasting
- The Global Energy Forecasting Competition 2014 was an IEEE-sponsored competition on probabilistic forecasting
- The n = 25K responses $Y$ are the amount of solar power generation
- The competition organizers normalized $Y$ so that $Y \in [0, 1]$
- There are 45 covariates $X$ including
  - Solar irradiance, temperature, wind speed and direction, relative humidity, air pressure, etc.
  - Dummy variables for hour of the day and season
  - Dummy variables for the solar farm ID
Solar power forecasting cross validation
Example forecasts
The model was trained at the beginning of each month
Summary
I Density regression is under-utilized
I DDR is simple to implement using existing software
- We are working on a Python package
- DDR provides prediction intervals for deep learning
I This work was supported by the US DOD