Nonparametric Importance Sampling for Big Data
Abigael C. Nachtsheim
SCHOOL OF MATHEMATICAL AND STATISTICAL SCIENCES
Research Training Group Spring 2018
Advisor: Dr. Stufken
Motivation
• Goal: build a model that predicts well over the predictor space
• Massive amounts of data increasingly available
• Big data presents computational challenges
• First step: some method of data reduction
Data Reduction Overview
• Our data set consists of n observations
• n is very large
• From the full data, select s observations
• s << n
• the s observations make up the subdata
• Carry out data analysis on subdata only
Data Reduction Overview: Example
• Full data: 1 response, 9 predictors, 10,000,000 observations
• n = 10,000,000
• Choose s = 5,000
• Subdata: 1 response, 9 predictors, 5,000 observations
Data Reduction Overview
[Figure: the full data table (columns Obs, Y, X1 to X9; rows 1 to 10M) is reduced to a subdata table with the same columns and rows 1 to 5K.]
But how do we choose?
Selecting Subdata: Approach 1
• Goal: Subdata that is similar to full data
• Just take a simple random sample
- Fast
- Easy
• But this may not be the best sample for prediction
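Approach 1 can be sketched in a few lines: draw s row indices uniformly without replacement. The function name and the seed are illustrative choices, not part of the slides.

```python
import random

def simple_random_subdata(n, s, seed=0):
    """Return s distinct row indices drawn uniformly from 0..n-1."""
    rng = random.Random(seed)
    # range(n) is sampled lazily, so no list of n indices is materialised
    return sorted(rng.sample(range(n), s))

# Sizes from the earlier example: n = 10,000,000 and s = 5,000
srs_idx = simple_random_subdata(n=10_000_000, s=5_000)
print(len(srs_idx), srs_idx[:3])
```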
Selecting Subdata: Approach 2
• Goal: select an optimal subsample
- Determinant of information matrix
- Mean square error for prediction
• Select subdata carefully to optimize some criterion
• Improves properties of the estimator
Approach 2: Some Methods
• Leverage-based subsampling
• Shrinkage leveraging method
• Unweighted leveraging estimator
• Information-Based Optimal Subdata Selection (IBOSS)*
*Wang, H., Yang, M., & Stufken, J. (2017). Information-Based Optimal Subdata Selection for Big Data Linear Regression. Journal of the American Statistical Association
Approach 2 Example: IBOSS
• Goal: maximize determinant of subdata information matrix
• Some nice properties
- Unbiased estimators
- Variance of estimators → 0 as n → ∞
- Computationally efficient
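A rough sketch of the selection rule in the cited IBOSS paper, as we understand it: for each predictor in turn, keep the r = s/(2p) not-yet-selected rows with the smallest values and the r with the largest. The function name is hypothetical, and a full sort is used here only for brevity, so this sketch says nothing about the method's computational efficiency.

```python
import random

def iboss_indices(X, s):
    """X: list of rows, each a list of p predictor values. Returns selected row indices."""
    n, p = len(X), len(X[0])
    r = s // (2 * p)                      # rows taken from each tail of each predictor
    if r < 1 or 2 * r * p > n:
        raise ValueError("need 2p <= s and s <= n")
    selected, taken = [], set()
    for j in range(p):
        # rows not chosen for an earlier predictor, ordered by predictor j
        remaining = sorted((i for i in range(n) if i not in taken),
                           key=lambda i: X[i][j])
        picks = remaining[:r] + remaining[-r:]   # r smallest and r largest
        selected.extend(picks)
        taken.update(picks)
    return selected

rng = random.Random(1)
X_demo = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(1000)]
iboss_idx = iboss_indices(X_demo, s=40)
print(len(iboss_idx))
```

The selected rows sit at the extremes of each predictor, which is why the resulting subdata favours a linear model.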
Approach 2 Example: IBOSS
• Drawback: assumes linear model
• With big data we may not be able to guess the underlying model
Another Possibility?
• Nonparametric approach
- We don’t know the underlying model
• Goal: spread the subdata out throughout full region
Today’s Plan
1) Consider 2 new methods
- Clustering
- Space-filling design
2) Perform a simulation study to evaluate the methods
3) Conclusions
k-means Clustering
• Divide dataset into k initial clusters
• Assign each point to cluster with nearest mean
• Euclidean distance
• Update means
• Repeat
Minimizes within cluster sum of squares
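The steps above can be sketched as a plain Lloyd-style loop. Starting from k randomly chosen data points as the initial means is an assumption made here; the slides only say the data is divided into k initial clusters.

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Return (labels, means) after alternating assignment and mean updates."""
    rng = random.Random(seed)
    means = [list(p) for p in rng.sample(points, k)]   # assumed initialisation
    labels = [None] * len(points)
    for _ in range(iters):
        # assign each point to the cluster with the nearest mean (squared Euclidean distance)
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, means[c])))
            for p in points
        ]
        if new_labels == labels:        # assignments stopped changing
            break
        labels = new_labels
        # update each mean; an empty cluster keeps its previous mean
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                means[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, means

rng = random.Random(0)
blob_a = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
blob_b = [[rng.gauss(20, 1), rng.gauss(20, 1)] for _ in range(50)]
km_labels, km_means = kmeans(blob_a + blob_b, k=2)
print(km_means)
```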
Potential Method 1: Clustering
• Cluster full data using k-means
• Choose subsample from clusters based on cluster characteristics
We consider two clustering sampling strategies
Two Possible Strategies
1) Inversely proportional to density of cluster
• Sparse cluster → sample (proportionally) more points
• Dense cluster → sample (proportionally) fewer points
2) Equal subsample size from each cluster
• Take s/k points from each cluster
Both are attempts at selecting subsample uniformly from the full sample
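A small sketch of how the two strategies could turn cluster summaries into per-cluster sample sizes. The slides do not define density; here it is assumed to be cluster size divided by a hypothetical cluster "volume" (for example, the range of X in one dimension), and the sampling fraction is taken to be inversely proportional to it.

```python
def equal_allocation(sizes, s):
    """Strategy 2: take s/k points from each cluster (capped at the cluster size)."""
    k = len(sizes)
    return [min(s // k, n_j) for n_j in sizes]

def inverse_density_allocation(sizes, volumes, s):
    """Strategy 1 under the stated assumption: sampling fraction proportional to 1/density."""
    densities = [n_j / v_j for n_j, v_j in zip(sizes, volumes)]
    # points taken = cluster size * sampling fraction, rescaled to total s
    raw = [n_j / d_j for n_j, d_j in zip(sizes, densities)]
    total = sum(raw)
    # rounding means the result may not sum to exactly s in general
    return [min(n_j, round(s * r_j / total)) for n_j, r_j in zip(sizes, raw)]

# Hypothetical example: a sparse cluster (100 points, volume 400) and a dense one (900 points, volume 100)
alloc_equal = equal_allocation([100, 900], s=50)
alloc_inverse = inverse_density_allocation([100, 900], [400.0, 100.0], s=50)
print(alloc_equal, alloc_inverse)
```

Under this reading the sparse cluster contributes a far larger share of its points than the dense one, which is the behaviour the slide describes.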
Space Filling Designs
• Spread design points through experimental region
• Used when form of underlying model is unknown
Some Examples
• Sphere Packing Design
• Uniform Design
• Fast Flexible Filling Design
• Latin Hypercube Design*
*McKay, M., Beckman, R., & Conover, W. (1979). Comparison of Three Methods for Selecting Values of Input Variables in the Analysis of Output from a Computer Code. Technometrics, 21(2), 239-245.
Potential Method 2: Design
• Construct Latin hypercube design with k points
• Cluster full data around these points
• Sample equally from each cluster
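One possible reading of these three steps, as a sketch. The Latin hypercube is built over the observed range of each predictor, and "cluster around these points" is implemented as a single nearest-seed assignment; the slides may instead intend k-means started from the design points. All function names are hypothetical.

```python
import random

def latin_hypercube(k, bounds, rng):
    """k points; each dimension is cut into k equal strata with one point per stratum."""
    columns = []
    for lo, hi in bounds:
        strata = list(range(k))
        rng.shuffle(strata)
        columns.append([lo + (st + rng.random()) / k * (hi - lo) for st in strata])
    return [list(pt) for pt in zip(*columns)]

def design_subdata(X, k, s, seed=0):
    """Return (chosen row indices, design points)."""
    rng = random.Random(seed)
    bounds = [(min(col), max(col)) for col in zip(*X)]
    seeds = latin_hypercube(k, bounds, rng)
    clusters = [[] for _ in range(k)]
    for i, x in enumerate(X):
        # assumed clustering step: attach each row to its nearest design point
        nearest = min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(x, seeds[c])))
        clusters[nearest].append(i)
    chosen = []
    for members in clusters:
        # equal share per cluster; small clusters contribute all of their rows
        chosen.extend(rng.sample(members, min(s // k, len(members))))
    return chosen, seeds

rng = random.Random(2)
X_lhs = [[rng.uniform(0, 1), rng.uniform(0, 1)] for _ in range(500)]
lhs_chosen, lhs_seeds = design_subdata(X_lhs, k=5, s=50)
print(len(lhs_chosen), lhs_seeds)
```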
Simulation Study: Generate X
• One dimensional, mixture of Normals, n = 1000
• Z1 ~ N(-100, 10,000)
• Z2 ~ N(300, 1)
• wi ~ Bernoulli(0.1)
Xi = wi*Z1 + (1 – wi)*Z2
Simulation Study: Generate Y
• E(Yi | Xi) = -0.002222 * Xi^2
• Y(Xi) = E(Yi | Xi) + 30*εi
εi = independent standard normal errors
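The data-generating process for X and Y can be sketched as follows. The second argument of N(., .) is assumed to be a variance, so Z1 has standard deviation 100; the slides do not say which parameterisation is meant.

```python
import random

def simulate(n=1000, seed=0):
    """Draw (xs, ys) from the mixture model for X and the quadratic model for Y."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        w = 1 if rng.random() < 0.1 else 0     # w_i ~ Bernoulli(0.1)
        z1 = rng.gauss(-100, 100)              # assumed: variance 10,000, so sd = 100
        z2 = rng.gauss(300, 1)
        x = w * z1 + (1 - w) * z2              # X_i = w_i*Z1 + (1 - w_i)*Z2
        y = -0.002222 * x ** 2 + 30 * rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
    return xs, ys

sim_x, sim_y = simulate()
near_300 = sum(1 for x in sim_x if 290 < x < 310) / len(sim_x)
print(round(near_300, 3))
```

Roughly 90% of the X values land in a narrow band around 300, and the remaining 10% are spread widely around -100.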
Simulation Study Analysis
For each of 1000 data sets with n = 1000:
• Select subdata, s = 50 using each method
- Simple random sample
- IBOSS
- Cluster with inverse proportional size, k = 5
- Cluster with equal size, k = 5
- Space-filling design, k = 5
Simulation Study Analysis
• Using subdata only, estimate a model
- Use OLS
- Fit quadratic model
• Compute integrated predicted mean squared error
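A minimal sketch of this analysis step. The slides do not define the integrated predicted mean squared error; here it is assumed to be the squared gap between the fitted curve and the true mean, averaged over an evenly spaced grid on a chosen X range. The predictor is standardised before solving the normal equations purely for numerical stability, which leaves the OLS predictions unchanged.

```python
def solve(A, c):
    """Solve A b = c by Gaussian elimination with partial pivoting."""
    n = len(c)
    M = [row[:] + [ci] for row, ci in zip(A, c)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for j in range(col, n + 1):
                M[r][j] -= f * M[col][j]
    b = [0.0] * n
    for i in reversed(range(n)):
        b[i] = (M[i][n] - sum(M[i][j] * b[j] for j in range(i + 1, n))) / M[i][i]
    return b

def fit_quadratic(xs, ys):
    """OLS fit of a quadratic in x; returns a prediction function."""
    m = sum(xs) / len(xs)
    sc = max(abs(x - m) for x in xs) or 1.0
    rows = [[1.0, (x - m) / sc, ((x - m) / sc) ** 2] for x in xs]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    c = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    b = solve(A, c)
    return lambda x: b[0] + b[1] * (x - m) / sc + b[2] * ((x - m) / sc) ** 2

def integrated_mse(predict, true_mean, lo, hi, grid=1001):
    """Assumed definition: mean squared gap to the true mean over a grid on [lo, hi]."""
    pts = [lo + (hi - lo) * i / (grid - 1) for i in range(grid)]
    return sum((predict(x) - true_mean(x)) ** 2 for x in pts) / grid

true_mean = lambda x: -0.002222 * x ** 2
# Sanity check on noiseless data: the fit should recover the true curve
check_x = [-300 + 6 * i for i in range(101)]
check_fit = fit_quadratic(check_x, [true_mean(x) for x in check_x])
check_imse = integrated_mse(check_fit, true_mean, -300, 300)
print(check_imse)
```

In the simulation study the same two functions would be applied to each method's subdata, so that methods can be compared on a common criterion.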
Simulation Results
[Figure: plot of the simulated data. Annotations: "90% of the data is here", "10% of the data is here", and "This is the true response: Y = -0.002222*X^2".]
[Figures, one slide per method: Simple Random Sample; IBOSS; Cluster: Equal Sizes; Cluster: Inverse Proportional Sizes; Space-filling Design; Full Data.]
Toy Example: Results
Method                   Predicted RMSE
Simple Random Sample     59,498
IBOSS                    25.76
Cluster: Inverse Prop.   12.46
Space-Filling Design     9.33
Cluster: Equal           9.31
Full Data                4.97
Example with Real Data
• n = 4.2 million
• p = 15
• 1 continuous response
• Used in the IBOSS paper
Example with Real Data
• Construct subdata of size s = 2,000
• Consider 4 methods:
- Simple random sample
- IBOSS
- Space-filling design
- Cluster: Equal
Example with Real Data
• Fit two models
- First-order linear model (as in IBOSS paper)
- Second-order linear model
• Compute holdout predicted mean squared error
Real Data Results: First-Order Model
Using 2,000 observations:
Method                   Predicted MSE
IBOSS                    434.56
Simple random sample     0.0106
Cluster: Equal           0.0118
Space-filling design     0.0148
Using 4.2 million observations:
Predicted MSE from the full data: 0.0105
Real Data Results: Second-Order Model
Using 2,000 observations:
Method                   Predicted MSE
IBOSS                    90,545.1
Simple random sample     0.0085
Cluster: Equal           0.0053
Space-filling design     0.0038
Using 4.2 million observations:
Predicted MSE from the full data: 0.0022
Preliminary Conclusions
• We can spread points uniformly using clustering and space-filling methods
• If the goal is prediction: clustering and space-filling methods are as good as or better than a simple random sample
• Space-filling design method performs best with quadratic model
Future work
1) More extensive simulation study involving
• Different sizes of k
• Different underlying models
2) Explore alternative methods to choose seed points
• Fast Flexible Filling Design
• Uniform random sample
3) Nearest neighbor to seed points rather than cluster
4) Consider large sample properties