Approximate l-fold Cross-Validation with Least Squares SVM and Kernel Ridge Regression
Dr. Richard Edwards (UT, Amazon), Hao Zhang (UT), Dr. Joshua New (ORNL), Dr. Lynne Parker (UT)
2 Presentation name
Energy is the Defining Challenge of Our Time
• Buildings in U.S. – 41% of primary energy/carbon,
72% of electricity, 34% of gas
• Buildings in China – 60% of urban building floor
space in 2030 has yet to be built
• Buildings in India – 67% of all building floor space
in 2030 has yet to be built
Global energy consumption will increase 50% by 2030
“Upgrading the energy efficiency of America’s buildings is one of the fastest, easiest, and cheapest ways to save money, cut down on harmful pollution, and create good jobs…”
President Obama, December 2, 2011, while announcing Better Buildings Challenge
Figure 1. U.S. Primary energy consumption, 2006 Source: Building Energy Data Book, U.S. DOE, Prepared by D&R International, Ltd., September 2008.
The Autotune Idea
Making building energy models more useful by calibrating them to data
E+ Input Model
ORNL High Performance Computing Resources
Multi-million dollar cost share and infrastructure on 6 supercomputers, including the world’s fastest
Currently use 128,000+ cores to run over 530,000 EnergyPlus simulations and write 45TB of data in 68 minutes
Jaguar: 224k cores, 360TB memory, 10PB of disk, 1.7 petaflops; cost: $104 million; DOE BTO: 500k hours granted (CY12)
Nautilus: 1024 cores, shared-memory
DOE BTO: 30k hours granted (CY11), 200k hours granted (CY12), 150k hours (CY13)
Frost: 2048 SGI Altix; 136 nodes; 200k hours granted (CY13)
Lens cluster: 77 nodes – 45x128GB, 32x 64GB with NVIDIA 880 and Tesla dual-GPU EVEREST visualization (CY13)
Gordon (12,608 cores): 250k hours (CY13)
Kraken (112,896 cores): 100k hours (CY13)
Titan fully utilized
Computational Complexity
E+ Input Model
Problems/Opportunities:
• Thousands of parameters per E+ input file
• We chose to vary 156
• Brute-force = 5x10^52 simulations
main_Tot  None_Tot(1)  None_Tot(2)  HP1_in_Tot  HP1_out_Tot  HP1_back_Tot  HP1_in_fan_Tot  HP1_comp_Tot  HP2_in_Tot  HP2_out_Tot  HP2_back_Tot  HP2_in_fan_Tot
1172.5    0            0            6.75        18.75        0             0               0             6.75        18           0             0
E+ parameters
The Universe: 13.75 billion years?
Need 2.8x10^28 of those
MLSuite
Nautilus Supercomputer
• Matlab+packages, R, libSVM
• Support Vector Machines
• Genetic Algorithms
• FF/Recurrent Neural Networks
• (Non-)Linear Regression
• Self-Organizing Maps
• C/K-Means
• Ensemble Learning
Big Data Opportunities
• EnergyPlus - Whole building energy sim – 600k lines Fortran
• Input: 1,000-3,000 parameters for a standard building
– Geometry, equipment, schedules, weather, ~8 properties/material
– We vary a subset of these (~156)
• Output: annual at 15 min intervals
– ~35 MB csv file (35k rows, 96 fields)
• Four types of buildings
– Residential – ZEBRAlliance house #1: 5M simulations
– Warehouses: 1M
– Stand-alone retail: 1M
– Medium office: 1M
• 8M simulations*35MB ≈ 270TB, http://autotune.roofcalc.com
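The data-volume estimate on this slide can be checked with a few lines of arithmetic. The sketch below is illustrative only: the dictionary keys and variable names are made up for the example, and the per-simulation size is the approximate 35 MB quoted above. It also shows that one year at 15-minute intervals gives the roughly 35k rows mentioned for the output file.

```python
# Simulation counts per building type, taken from the slide.
sim_counts = {
    "residential_zebralliance_house_1": 5_000_000,
    "warehouse": 1_000_000,
    "stand_alone_retail": 1_000_000,
    "medium_office": 1_000_000,
}

MB_PER_SIMULATION = 35  # approximate size of one output csv file

# One year sampled every 15 minutes: 365 days * 24 hours * 4 samples/hour.
rows_per_year = 365 * 24 * 4

total_sims = sum(sim_counts.values())
total_mb = total_sims * MB_PER_SIMULATION

# The total depends on the unit convention, which the slide does not state.
total_tb_decimal = total_mb / 1_000_000   # 1 TB = 10^6 MB
total_tib_binary = total_mb / (1024 ** 2)  # 1 TiB = 2^20 MiB

print(f"rows per annual output: {rows_per_year}")
print(f"total simulations:      {total_sims}")
print(f"decimal terabytes:      {total_tb_decimal:.0f}")
print(f"binary tebibytes:       {total_tib_binary:.0f}")
```

The two conventions bracket the rounded figure on the slide, so the quoted total is best read as an approximation.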
Richard’s slides
• Theoretical contributions to learners
Autotune calibration of building energy models
MLSuite - HPC-enabled suite of 12+ machine learning algorithms for large data mining
                            ASHRAE G14 Requires   Autotune Results
Using Monthly utility data
  CV(RMSE)                  30%                   0.318%
  NMBE                      10%                   0.059%
Using Hourly utility data
  CV(RMSE)                  15%                   0.483%
  NMBE                      5%                    0.067%
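For readers unfamiliar with the two calibration metrics in the table, the following is a minimal sketch of how CV(RMSE) and NMBE are commonly computed from measured and simulated energy use. It is not the Autotune code. The function names and sample data are hypothetical, and the `n - p` denominator with `p = 1` and the measured-minus-simulated sign convention are assumptions, since conventions vary between references.

```python
import math


def cv_rmse(measured, simulated, p=1):
    """Coefficient of variation of the RMSE, in percent.

    Assumes an (n - p) denominator with p = 1 adjustable parameter.
    """
    n = len(measured)
    mean_measured = sum(measured) / n
    squared_error = sum((m - s) ** 2 for m, s in zip(measured, simulated))
    return 100.0 * math.sqrt(squared_error / (n - p)) / mean_measured


def nmbe(measured, simulated, p=1):
    """Normalized mean bias error, in percent (measured minus simulated)."""
    n = len(measured)
    mean_measured = sum(measured) / n
    bias = sum(m - s for m, s in zip(measured, simulated))
    return 100.0 * bias / ((n - p) * mean_measured)


# Hypothetical utility readings and model outputs, for illustration only.
measured_kwh = [100.0, 120.0, 110.0, 90.0]
simulated_kwh = [98.0, 123.0, 108.0, 93.0]

print(f"CV(RMSE) = {cv_rmse(measured_kwh, simulated_kwh):.3f}%")
print(f"NMBE     = {nmbe(measured_kwh, simulated_kwh):.3f}%")
```

CV(RMSE) penalizes scatter regardless of sign, while NMBE lets over- and under-predictions cancel, which is why calibration criteria usually require both to fall under their thresholds.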
Autotune could have saved 2+ man-months of effort (over 2 calendar years) modeling 1 field demonstration building
Within 30¢/day (actual use $4.97/day)
Residential Commercial
Hourly – 8% Monthly – 15%
Average error of each input parameter
The Autotune Team
• Jibo Sanyal
• Mahabir Bhandari
• Som Shrestha
• Joshua New
• Aaron Garrett
• Buzz Karpay
• Richard Edwards
http://autotune.roofcalc.com