Precipitation Nowcasting Leveraging Deep Learning and HPC Systems to Optimize the Data Pipeline
AMS 2018 Copyright 2018 Cray Inc. 2
Agenda
● Introduction
● Motivation
● Dataset
● Prediction Modeling
● Data Pipelines
● Results
● Q&A
Introduction
● Nowcasting
  ● Predict precipitation locations and rates at a regional level over a short timeframe
  ● Traditional Approach: Numerical Weather Prediction
    ● Requires an extended lead time between newly acquired data and release of forecasts
● Deep Learning
  ● Branch of Machine Learning based on Neural Networks
  ● Deep implies multiple layers of computation between inputs and output
  ● Pattern Matching
Motivation
● Why is short-term nowcasting important?
  ● Provide reliable ride-to-work forecasts
  ● Predict rapid formation of severe precipitation events: Flash Flood Warning!
● Why Deep Learning?
  ● Traditional Nowcasting relies on Numerical Weather Prediction (NWP), which is slow to respond to new data
  ● Deep Learning learns from past rainfall patterns
  ● Trained models are computationally cheap to utilize
● Why HPC?
  ● A lot of data! Training can utilize decades of observations
  ● Models can be trained and run for inference on small regions in parallel
Prototype Nowcasting System
● Small: 4 stations
  ● KATX (Seattle), KTLX (Oklahoma City), KTLH (Tallahassee), KBUF (Buffalo)
● Variable sized dataset, as large as 7 years of historical rainfall data
  ● Total size (raw data): 4 TB
  ● Total size (processed): 684 GB
● Examine Pipeline Performance and Bottlenecks
● Explore Nowcasting Performance via Deep Learning
Dataset Processing
Data Collection
• Historical Radar Data (NETCDF)
• Geographical Region
• Days with over 0.1 inches of precipitation, info from NOAA – NCDC
• Radar scans every 5-10 minutes throughout the day
Transformation
• Raw radial data structure converted to evenly spaced Cartesian grid (Tensors with float32)
• Resolution scaling and clipping
• Configure dimensionality
• Sequencing
• 2 channels – Reflectivity, Velocity
• Uses Py-ART package
Sampling
• Time-series
• Inputs and Labels
• Random sampling
BigDL Framework
• Apache Spark on Urika-XC/GX
• Implemented in Jupyter notebooks and Python
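The sequencing and sampling steps listed above can be sketched roughly as follows. This is a minimal illustration, not the presenters' code: the function names, the window lengths (`n_in`, `n_out`), and the integer stand-ins for 2-channel radar grids are all assumptions made for the example.

```python
import random

def make_sequences(frames, n_in, n_out, stride=1):
    """Slice chronologically ordered frames into (inputs, labels) pairs.

    Each pair holds n_in consecutive frames as the input and the
    following n_out frames as the label. (Hypothetical helper.)
    """
    window = n_in + n_out
    pairs = []
    for start in range(0, len(frames) - window + 1, stride):
        inputs = frames[start:start + n_in]
        labels = frames[start + n_in:start + window]
        pairs.append((inputs, labels))
    return pairs

def sample_sequences(pairs, k, seed=0):
    """Draw up to k sequence pairs at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(pairs, min(k, len(pairs)))

# Integers stand in for radar grids (assumption: one frame per scan).
frames = list(range(10))
pairs = make_sequences(frames, n_in=4, n_out=2)
subset = sample_sequences(pairs, k=3)
print(len(pairs), pairs[0])
```

In a real pipeline each frame would be a gridded tensor, and windows would need to avoid spanning gaps between non-consecutive days.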
Prediction Modeling
● Convolutional Recurrent Neural Network
  ● Convolutional Neural Network – Spatial Patterns
  ● Recurrent Neural Network – Temporal Patterns
  ● ConvLSTM – Convolutional Long Short-Term Memory Network
● Sequence to Sequence
  ● Encoder Decoder
  ● Use recent history to predict future changes
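The slides name ConvLSTM without giving its equations. For reference, one commonly used formulation (without peephole connections; the exact variant used by the presenters is not stated) replaces the matrix products of a standard LSTM with convolutions. Here $*$ denotes convolution, $\odot$ the element-wise product, $X_t$ the input frame, and $H_t$, $C_t$ the hidden and cell states:

```latex
\begin{aligned}
i_t &= \sigma\left(W_{xi} * X_t + W_{hi} * H_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_{xf} * X_t + W_{hf} * H_{t-1} + b_f\right) \\
o_t &= \sigma\left(W_{xo} * X_t + W_{ho} * H_{t-1} + b_o\right) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tanh\left(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c\right) \\
H_t &= o_t \odot \tanh\left(C_t\right)
\end{aligned}
```

The convolutions capture spatial patterns within each frame, while the recurrence over $t$ carries temporal patterns forward.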
Pipeline: Data Processing
[Diagram: data-processing pipeline. Box labels: Distributed File-System; Save As NumPy Arrays; Convert to RDDs; BigDL; Feed-Forward; Compute Gradients; Iterate on Dataset]
Pipeline: Distributed Training
[Diagram: distributed-training pipeline. Box labels: Distributed File-System; Split Data By Region (KATX, KTLH, KBUF, KTLX, …); Hyper-Parameter Optimization; Train Unique Networks for each region; Trained Networks]
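The idea of splitting data by region and training one network per region can be sketched as below. This is a toy illustration under stated assumptions: `split_by_region` and `train_region` are hypothetical names, the "model" is just a per-station mean standing in for a real network, and threads stand in for the separate nodes or GPUs an HPC system would use.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def split_by_region(samples):
    """Group (station, value) samples by radar station ID."""
    by_region = defaultdict(list)
    for station, value in samples:
        by_region[station].append(value)
    return dict(by_region)

def train_region(station, values):
    """Placeholder for training: the 'model' is the mean of the values."""
    return station, {"mean": sum(values) / len(values), "n": len(values)}

# Made-up samples; real inputs would be radar sequences per station.
samples = [("KATX", 1.0), ("KTLH", 4.0), ("KATX", 3.0),
           ("KBUF", 2.0), ("KTLX", 5.0)]
by_region = split_by_region(samples)

# Each region is independent, so the jobs can run concurrently.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(train_region, s, v) for s, v in by_region.items()]
    trained = dict(f.result() for f in futures)

print(sorted(trained))
```

Because no region depends on another, the same pattern extends to one training job per node, with hyper-parameter optimization run separately for each region.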
Idealized Training Timeline
● Station: KATX
● Dataset size:
  ● 118,342 Sequences
  ● 101 GB
● Parameters
● Systems:
  ● Data processing: Cray Urika-GX – 1024 cores
  ● Training: Cray CS-Storm – 8 Nvidia P100 GPUs
Process     Wall-Time     Proportion
Download    13 hours      32%
Spark       4 hours       10%
Training    24 hours      58%
Inference   10 seconds    0%
Scaling
● TensorFlow via Cray MPI Com. Plugin
● Nvidia Tesla P100 GPUs
● Batch size of 4 samples per device
● Throughput in Samples/Second
Device Count    Throughput (Samples/Second)    Scaling Efficiency
1               25.8                           1.0
2               51.6                           1.0
4               102.7                          0.995
8               205.4                          0.995
16              410.5                          0.994
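The scaling-efficiency column is consistent with the usual definition: measured throughput on N devices divided by N times the single-device throughput. The sketch below recomputes it from the throughput figures in the table; the function name is illustrative, and the definition is an assumption since the slide does not state it.

```python
# Throughput in samples/second, taken from the table above.
throughput = {1: 25.8, 2: 51.6, 4: 102.7, 8: 205.4, 16: 410.5}

def scaling_efficiency(throughput_by_devices):
    """Return throughput(N) / (N * throughput(1)) for each device count."""
    base = throughput_by_devices[1]
    return {n: t / (n * base)
            for n, t in sorted(throughput_by_devices.items())}

eff = scaling_efficiency(throughput)
for n, e in eff.items():
    print(f"{n:>2} devices: {e:.3f}")
```

An efficiency of 1.0 means perfectly linear scaling; values just under 1.0 reflect a small communication overhead as devices are added.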
Model Performance
[Figure: four panels titled "KTLH CSI and Average Error", "KTLX CSI and Average Error", "KBUF CSI and Average Error", and "KATX CSI and Average Error". Horizontal axis ticks 1 to 6 (axis label not recoverable). Series: CSI-ConvLSTM, CSI-Persistence, MAE-ConvLSTM, MAE-Persistence]
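For readers unfamiliar with the metrics in these panels, the sketch below shows the standard definitions of the Critical Success Index, CSI = hits / (hits + misses + false alarms), and mean absolute error (MAE), together with a persistence baseline that repeats the last observed frame. The threshold of 0.5 and the sample values are made up for illustration; the slides do not give the threshold or units used.

```python
def csi(pred, obs, threshold):
    """Critical Success Index over flattened grids of rain values."""
    hits = misses = false_alarms = 0
    for p, o in zip(pred, obs):
        p_event, o_event = p >= threshold, o >= threshold
        if p_event and o_event:
            hits += 1
        elif o_event:
            misses += 1
        elif p_event:
            false_alarms += 1
    denom = hits + misses + false_alarms
    return hits / denom if denom else float("nan")

def mae(pred, obs):
    """Mean absolute error between predicted and observed values."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)

def persistence_forecast(last_frame, n_steps):
    """Baseline: assume the last observed frame persists unchanged."""
    return [list(last_frame) for _ in range(n_steps)]

# Hypothetical flattened 2x2 grids (values and threshold are assumptions).
last_observed = [0.0, 0.6, 1.2, 0.0]
future = [[0.0, 0.7, 1.0, 0.2], [0.3, 0.9, 0.0, 0.4]]

forecast = persistence_forecast(last_observed, n_steps=len(future))
scores = [(csi(f, o, threshold=0.5), mae(f, o))
          for f, o in zip(forecast, future)]
for step, (c, m) in enumerate(scores, start=1):
    print(f"step {step}: CSI={c:.2f} MAE={m:.3f}")
```

Persistence typically degrades as the forecast horizon grows, which is why it serves as the baseline a learned model needs to beat.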
Effect of dataset size
● More data correlates with a higher-performing prediction model
● Station: KATX (Seattle)
[Figure: "Average CSI for varying sized datasets". Horizontal axis: Years in Dataset (0 to 8). Vertical axis: Critical Success Index (0.44 to 0.458)]
Years    Size (GB)    Sequences
1        40           43,171
3        109          118,342
5        143          155,337
7        217          235,301
Sample Prediction + Q/A