Data Centric HPC for Numerical Weather Forecasting
DESCRIPTION
Presentation at the HPC for Big Data Workshop at the 2014 International Conference on Parallel Processing. The paper is published by IEEE in the Proceedings of the 2014 ICPPW.
TRANSCRIPT
![Page 1: Data Centric HPC for Numerical Weather Forecasting](https://reader034.vdocument.in/reader034/viewer/2022051817/5492db5db47959564d8b470b/html5/thumbnails/1.jpg)
DATA CENTRIC HPC FOR NUMERICAL
WEATHER FORECASTING
James Faeldon
Delfin Jay Sabido III
Karen España
IBM Philippines, STG Labs
![Page 2: Data Centric HPC for Numerical Weather Forecasting](https://reader034.vdocument.in/reader034/viewer/2022051817/5492db5db47959564d8b470b/html5/thumbnails/2.jpg)
Extreme Weather Events
• The Philippines is home to devastating typhoons.
• 19 typhoons a year and intense monsoon rains that can
cause widespread flooding.
• Research collaboration by the Philippine Government,
University of the Philippines and IBM (2013).
Figure: Typhoon Tracks, Eastern Hemisphere. The strongest typhoons group near the Philippines. Image courtesy of NOAA.
Figure: Super Typhoon Haiyan (Nov 2013), before and after. Image courtesy of DigitalGlobe.
![Page 3: Data Centric HPC for Numerical Weather Forecasting](https://reader034.vdocument.in/reader034/viewer/2022051817/5492db5db47959564d8b470b/html5/thumbnails/3.jpg)
Coupled Models for Pre-Disaster Planning
• Numerical weather model forecasts typhoon track and intensity.
• Machine learning model predicts affected population and damages.
• Optimization model recommends relief supplies pre-positioning and allocation.
Typhoons can be forecast a few days in advance, but we need more reports and better visualization and data exploration tools to shorten analysis cycles and facilitate timely decisions.
Operations Center
![Page 4: Data Centric HPC for Numerical Weather Forecasting](https://reader034.vdocument.in/reader034/viewer/2022051817/5492db5db47959564d8b470b/html5/thumbnails/4.jpg)
Operational Forecasting Schedule Runs
Data-Intensive
Compute-Intensive
Data-intensive processes are increasingly becoming the bottleneck in the operational forecasting workflow.
![Page 5: Data Centric HPC for Numerical Weather Forecasting](https://reader034.vdocument.in/reader034/viewer/2022051817/5492db5db47959564d8b470b/html5/thumbnails/5.jpg)
Drivers for Increased Data Processing
Analytics Big Data
![Page 6: Data Centric HPC for Numerical Weather Forecasting](https://reader034.vdocument.in/reader034/viewer/2022051817/5492db5db47959564d8b470b/html5/thumbnails/6.jpg)
Operational Forecasting Data Challenges
ETL
Quality Control
Sampling
Verification
Machine Learning
Ensemble Forecasts
Model Output Statistics
Update relief operations plan based on new forecast
+ 7 historical days
663 Gb per forecast
6-hour processing and analysis window

Real-time Sensor Data

| Source | Qty | Unit Size | Total Size |
| --- | --- | --- | --- |
| AWS | 733 | 7Kb/day | 5Mb/day |
| Satellite | 1 | 480Mb/day | 480Mb/day |
| Radar | 7 | 9Gb/day | 63Gb/day |

Forecast Data

| Resolution | Cells | Grid Dimensions | Total Size |
| --- | --- | --- | --- |
| 12km | 5.2 M | 307 x 481 x 35 | 81Gb/forecast |
| 4km | 8.8 M | 619 x 406 x 35 | 138Gb/forecast |
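
The cell counts and daily sensor totals on this slide follow from simple multiplication. The sketch below reproduces that arithmetic using only the figures in the two tables. The function names are illustrative, and the decimal unit conversion (1 Mb = 1000 Kb, 1 Gb = 1000 Mb) is an assumption, since the slide does not say which convention it uses.

```python
# Grid dimensions exactly as listed in the Forecast Data table.
GRIDS = {"12km": (307, 481, 35), "4km": (619, 406, 35)}

# Sensor feeds from the Real-time Sensor Data table:
# (number of sources, size per source per day in Kb).
# Decimal conversion is assumed: 480 Mb = 480_000 Kb, 9 Gb = 9_000_000 Kb.
SENSORS_KB_PER_DAY = {
    "AWS": (733, 7),
    "Satellite": (1, 480_000),
    "Radar": (7, 9_000_000),
}


def cell_count(dims):
    """Total number of grid cells: the product of the three dimensions."""
    nx, ny, nz = dims
    return nx * ny * nz


def cells_in_millions(dims):
    """Cell count rounded to one decimal place, in millions."""
    return round(cell_count(dims) / 1e6, 1)


def total_kb_per_day(qty, unit_kb):
    """Daily volume of one sensor type: sources times size per source."""
    return qty * unit_kb


for name, dims in GRIDS.items():
    print(name, cell_count(dims), cells_in_millions(dims))

for name, (qty, unit_kb) in SENSORS_KB_PER_DAY.items():
    print(name, total_kb_per_day(qty, unit_kb), "Kb/day")
```

The 12km grid works out to 5,168,345 cells and the 4km grid to 8,795,990 cells, which match the rounded 5.2 M and 8.8 M in the table. Radar dominates the sensor volume at 63,000,000 Kb (63 Gb) per day.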
![Page 7: Data Centric HPC for Numerical Weather Forecasting](https://reader034.vdocument.in/reader034/viewer/2022051817/5492db5db47959564d8b470b/html5/thumbnails/7.jpg)
Project Goals
• Manage and process time-sensitive data arriving from remote sensors and weather forecasts.
• Reduce data analysis cycles to facilitate timely decisions.
![Page 8: Data Centric HPC for Numerical Weather Forecasting](https://reader034.vdocument.in/reader034/viewer/2022051817/5492db5db47959564d8b470b/html5/thumbnails/8.jpg)
Numerical Weather Model
Post-Processing
MapReduce, NoSQL Database
Stream Pre-Processing
Data Warehouse, OLAP Database
Weather Sensors
Observations
Structured Data
Data Assimilation
Forecast Data
1 Remote sensor data in various formats.
2 Quality Control, Interpolation, Sampling, Filtering, Classification
3 High Performance Computing
4 Store structured and unstructured data for analysis and post-processing
5 Business intelligence, data mining, visualization, verification
6 Dashboards and Reports
Automated End-to-End Process
Decision Support Tool
Reports
![Page 9: Data Centric HPC for Numerical Weather Forecasting](https://reader034.vdocument.in/reader034/viewer/2022051817/5492db5db47959564d8b470b/html5/thumbnails/9.jpg)
Hardware Infrastructure
Traditional HPC (BlueGene/P)
Commodity Servers (x86)
Elastic Cloud Computing (Virtual Machines)
In-situ Big Data
MapReduce
Real-time Data Processing
OLAP
Visualization
Numerical Weather Models
MPP Jobs
![Page 10: Data Centric HPC for Numerical Weather Forecasting](https://reader034.vdocument.in/reader034/viewer/2022051817/5492db5db47959564d8b470b/html5/thumbnails/10.jpg)
Weather Model
• WRF ARW v3.5 limited area model
• 3.4 hours using 2048 cores on BlueGene/P (850 MHz).
![Page 11: Data Centric HPC for Numerical Weather Forecasting](https://reader034.vdocument.in/reader034/viewer/2022051817/5492db5db47959564d8b470b/html5/thumbnails/11.jpg)
Pre-Processing
• Stream Processing, ETL, R, Python
• Multi-stage quality control of remote sensor data.
• Spatio-temporal interpolation and sampling.
• Star-schema data warehouse.
• NoSQL with MapReduce.
Observations, Forecast Raw Data
NetCDF, Image, CSV
Staging Files
Low-latency Stream Processing
ETL
Custom Scripts
Quality Control, Sampling, Filtering
NoSQL
Data Warehouse
BI Cubes
Structured point or topological data (small <1TB), emphasis on data consistency.
Gridded high-resolution data (big >1TB), emphasis on availability and scalability. Input to coupled models down the line.
Data stores for post processing…
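
As an illustration of what "multi-stage quality control of remote sensor data" can look like, here is a minimal two-stage sketch: a range check followed by a step (spike) check per station. It is not the project's actual pipeline. The `Reading` record, the temperature variable, and both thresholds are hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """Hypothetical sensor record; field names are assumptions."""
    station: str
    hour: int
    temp_c: float


# Hypothetical thresholds, chosen only for illustration.
TEMP_RANGE_C = (-10.0, 50.0)   # stage 1: physically plausible range
MAX_STEP_C = 8.0               # stage 2: largest accepted jump between readings


def range_check(readings):
    """Stage 1: drop readings outside the plausible range."""
    lo, hi = TEMP_RANGE_C
    return [r for r in readings if lo <= r.temp_c <= hi]


def step_check(readings):
    """Stage 2: drop a reading that jumps more than MAX_STEP_C from the
    last accepted reading of the same station."""
    last_accepted = {}
    kept = []
    for r in sorted(readings, key=lambda r: (r.station, r.hour)):
        prev = last_accepted.get(r.station)
        if prev is None or abs(r.temp_c - prev) <= MAX_STEP_C:
            kept.append(r)
            last_accepted[r.station] = r.temp_c
    return kept


def quality_control(readings):
    """Run the stages in order; each stage sees only what the previous kept."""
    return step_check(range_check(readings))


sample = [
    Reading("S1", 0, 27.0),
    Reading("S1", 1, 27.5),
    Reading("S1", 2, 99.0),   # fails the range check
    Reading("S1", 3, 40.0),   # fails the step check (27.5 -> 40.0)
    Reading("S1", 4, 28.0),
]
print([r.hour for r in quality_control(sample)])  # [0, 1, 4]
```

Chaining the stages as plain functions keeps each check independently testable, and the same functions could run inside a stream operator or a batch ETL script.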
![Page 12: Data Centric HPC for Numerical Weather Forecasting](https://reader034.vdocument.in/reader034/viewer/2022051817/5492db5db47959564d8b470b/html5/thumbnails/12.jpg)
Post Processing
• Business Intelligence Cubes
  • Multi-dimensional analysis
  • Dashboards and reports
  • GIS Integration
• MapReduce Views (NoSQL)
  • Model Verification
  • Ensemble Forecasts/MOS
  • Ad-Hoc Data Mining
Multi-Dimensional Cubes
MapReduce Views
Reports and Dashboards
Reports and visualizations are generated using BI and data visualization tools.
Custom Scripts
Coupled Models
Model Output Statistics
Reports and Dashboards
Downstream predictive models use MapReduce views as their data source.
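
To make the idea of a MapReduce view for model verification concrete, the sketch below emulates one in plain Python: the map step emits a squared forecast error per station, the reduce step sums them, and a per-station RMSE is derived from the result. This is an illustration only. The record fields, the rainfall variable, and the `run_view` helper are assumptions and do not reflect any particular NoSQL database's API.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical documents pairing a forecast value with its observation.
records = [
    {"station": "A", "forecast_mm": 10.0, "observed_mm": 12.0},
    {"station": "A", "forecast_mm": 20.0, "observed_mm": 16.0},
    {"station": "B", "forecast_mm": 5.0, "observed_mm": 5.0},
]


def map_fn(doc):
    """Emit (station, (squared error, count)) for one document."""
    err = doc["forecast_mm"] - doc["observed_mm"]
    yield doc["station"], (err * err, 1)


def reduce_fn(values):
    """Sum squared errors and counts. Sums are associative, so partial
    results can be reduced again without changing the answer."""
    total_sq = sum(v[0] for v in values)
    count = sum(v[1] for v in values)
    return total_sq, count


def run_view(docs):
    """Group mapped values by key, then reduce each group."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return {key: reduce_fn(vals) for key, vals in groups.items()}


def rmse_by_station(docs):
    """Root-mean-square error per station, derived from the view."""
    return {k: sqrt(sq / n) for k, (sq, n) in run_view(docs).items()}


print(run_view(records))
print(rmse_by_station(records))
```

Keeping the reduce output as (sum, count) rather than a finished average is what lets the view be updated incrementally as new forecasts arrive.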
![Page 13: Data Centric HPC for Numerical Weather Forecasting](https://reader034.vdocument.in/reader034/viewer/2022051817/5492db5db47959564d8b470b/html5/thumbnails/13.jpg)
Current Challenges and Future Directions
• Improvements in geostatistics: gridded data to topological features.
  • River basins, flood-prone areas, political boundaries and other locations of interest.
  • Generating statistics makes for very data-intensive processing.
  • Potential for parallelization.
• Efficient stream processing engine for larger tuples with longer sliding windows.
  • Complex quality control and verification require longer time-series statistics spanning multi-day historical observed and forecasted data.
  • Strategy: can we keep all data processing in memory, with caching, etc.?
• Efficient MapReduce views on array-based data models and other approaches.
• Improvements on data warehousing schema.
  • Ongoing improvements for handling spatio-temporal data.
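
One common way to keep long sliding-window statistics in memory is to maintain running sums and evict expired tuples as new ones arrive, so each update costs constant amortized time regardless of window length. The sketch below shows that technique for a windowed mean. The class name and the hour-based timestamps are assumptions for illustration, not part of the system described in the slides.

```python
from collections import deque


class SlidingWindowStats:
    """Running mean over the most recent `window_hours` of readings."""

    def __init__(self, window_hours):
        self.window_hours = window_hours
        self._items = deque()   # (hour, value) pairs, oldest first
        self._sum = 0.0

    def add(self, hour, value):
        """Append a reading, then evict everything outside the window."""
        self._items.append((hour, value))
        self._sum += value
        cutoff = hour - self.window_hours
        while self._items and self._items[0][0] <= cutoff:
            _, old = self._items.popleft()
            self._sum -= old

    def count(self):
        return len(self._items)

    def mean(self):
        """Mean of the readings in the window, or None if it is empty."""
        if not self._items:
            return None
        return self._sum / len(self._items)


window = SlidingWindowStats(window_hours=3)
for hour, value in enumerate([1.0, 2.0, 3.0, 4.0, 5.0]):
    window.add(hour, value)
print(window.count(), window.mean())  # 3 4.0
```

For a multi-day window such as seven historical days, the same class would be created with `window_hours=7 * 24`. With very long windows the running sum can accumulate floating-point drift, so a production version would likely recompute it periodically.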
![Page 14: Data Centric HPC for Numerical Weather Forecasting](https://reader034.vdocument.in/reader034/viewer/2022051817/5492db5db47959564d8b470b/html5/thumbnails/14.jpg)
Summary
• Planning for extreme weather events is a time-critical workflow that involves complex analysis of large data-sets from various sources.
• Recent advances in Big Data and HPC enable the architecture of real-world disaster planning applications.
• Current integration schemes use intermediary staging files and ETL-like scripts.
• Better algorithms and techniques are needed to improve performance and integration.