TRANSCRIPT
11.04.2018
1
SELECTED COMPARISONS BETWEEN MACHINE LEARNING &
DEEP LEARNING IN EARTH SCIENCE APPLICATIONS
PROF. DR. – ING. MORRIS RIEDEL, UNIVERSITY OF ICELAND / JUELICH SUPERCOMPUTING CENTRE
HEAD OF CROSS-SECTIONAL TEAM DEEP LEARNING & RDA CO-CHAIR INTEREST GROUP BIG DATA
WARM-UP AFTER COFFEE BREAK
5th April 2018 Page 2
[8] The Deep Learning Revolution, YouTube
OUTLINE
Traditional Machine Learning
Feature Engineering & Modeling
Remote Sensing Application Example
Deep Learning
Automated Feature Engineering
Network Topologies & Modeling
Remote Sensing Application Example
Advantages & Disadvantages
Modular Supercomputing Architecture
Transfer Learning & Other Models
Summary & References
TRADITIONAL MACHINE LEARNING
Feature Engineering – Modeling – Remote Sensing Application Example
TRADITIONAL MACHINE LEARNING Overview of Techniques – Supervised Classification (focus in this talk)
Classification: groups of data exist; new data is classified into the existing groups
Clustering: no groups of data exist; groups are created from data points close to each other
Regression: identify a line with a certain slope describing the data
TRADITIONAL MACHINE LEARNING Supervised Classification Example – Remote Sensing Dataset & Code
One of the challenges in remote sensing is to classify land cover
into distinct classes based on hyperspectral datasets obtained from
airborne and satellite sensors
One dataset used is the Indian Pines AVIRIS dataset, acquired over an
agricultural site composed of fields with regular geometry
(200 spectral bands, 1417x617 pixels, spatial resolution of 20 m,
52 land cover classes after 6 were discarded)
Before classification, feature engineering is applied to the raw
hyperspectral data, in this case the Self Dual Attribute Profile (SDAP)
The classification is performed via a tuned version of the piSVM
MPI code, a parallel Support Vector Machine (SVM) with a
radial basis function (RBF) kernel
[4] C. Cortes and V. Vapnik, 1995
[1] G. Cavallaro and M. Riedel, et al. , 2015
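The RBF kernel named on this slide can be written out in a few lines. The following is a minimal pure-Python sketch, not the piSVM implementation; the function name `rbf_kernel`, the parameter name `gamma`, and the example pixel values are illustrative assumptions.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel K(x, y) = exp(-gamma * ||x - y||^2) for two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

# Three hypothetical pixels, each described by three spectral band values.
pixel_a = [0.2, 0.4, 0.1]
pixel_b = [0.2, 0.4, 0.1]
pixel_c = [0.9, 0.1, 0.7]

print(rbf_kernel(pixel_a, pixel_b))  # identical pixels give the maximum similarity of 1.0
print(rbf_kernel(pixel_a, pixel_c))  # more distant pixels give a value closer to 0
```

A larger `gamma` makes the similarity fall off faster with distance, which is why the kernel parameter is tuned together with the SVM cost parameter.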
TRADITIONAL MACHINE LEARNING Supervised Classification Example – Remote Sensing Dataset & Results
Traditional Methods
Support Vector Machine (SVM)
Substantial manual feature engineering,
e.g. Self Dual Attribute Profile (SDAP)
10-fold cross-validation
Achieved 77.02 % accuracy
[4] C. Cortes and V. Vapnik, 1995
[1] G. Cavallaro and M. Riedel, et al. , 2015
[6] M. Riedel, Invited YouTube Tutorial on Machine Learning with
Remote Sensing Datasets, Ghent University
(Figure labels: 1 m, 10 m, 30 m)
[7] G. Cavallaro et al.
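The 10-fold cross-validation mentioned on this slide partitions the labelled samples into ten disjoint folds and uses each fold once for validation. A minimal sketch of the index bookkeeping, assuming shuffled folds of near-equal size; the helper names `k_fold_indices` and `cross_validation_splits` are hypothetical and not taken from piSVM:

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Split sample indices 0..n_samples-1 into k disjoint folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i.
    return [indices[i::k] for i in range(k)]

def cross_validation_splits(n_samples, k=10, seed=0):
    """Yield (train_indices, validation_indices), one pair per fold."""
    folds = k_fold_indices(n_samples, k, seed)
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation

for train, validation in cross_validation_splits(n_samples=25, k=10):
    print(len(train), len(validation))
```

For model selection, each candidate parameter setting would be trained on every `train` split and scored on the matching `validation` split, and the mean score decides which setting is kept.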
DEEP LEARNING Automated Feature Engineering – Network Topologies & Modeling – Remote Sensing Application Example
SIMULATION AND DATA LABS (SDL) Juelich Supercomputing Centre & Deep Learning
Communities
Research Groups
Simulation Labs
Cross-Sectional Teams
Data Life Cycle Labs
Exascale co-Design
Facilities
PADC
DEEP-EST EU Project
Convergence: SDL Cross-Sectional Team Deep Learning
HPC Systems: JURECA & JUQUEEN
Modular Supercomputer: JUWELS
DEEP LEARNING 101 Short Overview & Role of Cross-Sectional Team Deep Learning at Juelich Supercomputing Centre
Innovative & disruptive approach in
geospatial data analysis
Provide cutting edge deep learning tools that take advantage of JSC HPC machines
Advance deep learning applications and research on HPC prototypes (e.g. DEEP-EST)
Engage with industry (industrial relations team) & support SMEs (e.g. Soccerwatch)
Offer tutorials & application enabling support for commercial & scientific users
[5] M. Riedel, Invited YouTube Tutorial on Deep Learning, Ghent University
DEEP LEARNING Supervised Classification Example – Remote Sensing Dataset & Code
One of the challenges in remote sensing is to classify land cover into
distinct classes based on hyperspectral datasets obtained from
airborne and satellite sensors over time (i.e. time series)
As a pre-study to time series, one dataset used is the Indian Pines
AVIRIS dataset, acquired over an agricultural site composed of fields
with regular geometry (200 spectral bands, 1417x617 pixels, spatial
resolution of 20 m, 58 instead of 52 land cover classes)
Unlike traditional methods, which apply (semi-)manual feature
engineering before classification, recent deep learning techniques
learn features automatically
The classification is performed via the TensorFlow deep learning
framework with Convolutional Neural Networks (CNNs), which work
very well for data with spatial properties
[2] J. Lange, G. Cavallaro, M. Riedel, et al., 2018
DEEP LEARNING Programming with TensorFlow
[5] M. Riedel, Invited YouTube Tutorial on Deep Learning, Ghent University
TensorFlow is an open source library for deep learning models using a flow graph approach
TensorFlow nodes model mathematical operations, and the graph edges between the nodes are so-called tensors (also known as multi-dimensional arrays)
TensorFlow supports CPUs and GPUs (GPU versions are much faster than CPU versions)
TensorFlow works with the high-level deep learning tool Keras in order to create models fast
[10] Tensorflow Deep Learning Framework
[11] A Tour of Tensorflow
[12] Distributed & Cloud Computing Book
DEEP LEARNING Programming with TensorFlow – What is a Tensor?
[5] M. Riedel, Invited YouTube Tutorial on Deep Learning, Ghent University
[13] Big Data Tips, What is a Tensor?
A tensor is nothing more than a multi-dimensional array, often used in scientific & engineering environments
Tensors are best understood by comparing them with vectors or matrices and their dimensions
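The comparison with vectors and matrices can be made concrete with nested lists. The following is a small illustrative sketch in plain Python; the helper `tensor_shape` and the example values are assumptions for illustration and not part of TensorFlow.

```python
def tensor_shape(t):
    """Return the shape of a nested-list tensor, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    shape = []
    while isinstance(t, list):
        shape.append(len(t))
        t = t[0] if t else None
    return tuple(shape)

scalar = 5.0                                   # rank 0: a single number
vector = [1.0, 2.0, 3.0]                       # rank 1: shape (3,)
matrix = [[1.0, 2.0], [3.0, 4.0]]              # rank 2: shape (2, 2)
# Rank 3: a hypothetical 2x2-pixel image patch with 3 spectral bands per pixel.
patch = [[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
         [[0.7, 0.8, 0.9], [1.0, 1.1, 1.2]]]

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix), ("patch", patch)]:
    shape = tensor_shape(t)
    print(name, "rank", len(shape), "shape", shape)
```

A hyperspectral image fits the same pattern: rows, columns, and spectral bands form a rank-3 tensor.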
DEEP LEARNING Programming with TensorFlow & Keras
[5] M. Riedel, Invited YouTube Tutorial on Deep Learning, Ghent University
[9] Keras Python Deep Learning Library
Keras inherently supports the creation of artificial neural networks using Dense layers and optimizers (e.g. SGD)
Includes regularization (e.g. weight decay) and momentum
keras.layers.Dense(units,
                   activation=None,
                   use_bias=True,
                   kernel_initializer='glorot_uniform',
                   bias_initializer='zeros',
                   kernel_regularizer=None,
                   bias_regularizer=None,
                   activity_regularizer=None,
                   kernel_constraint=None,
                   bias_constraint=None)
keras.optimizers.SGD(lr=0.01,
                     momentum=0.0,
                     decay=0.0,
                     nesterov=False)
Keras is a high-level deep learning library implemented in Python that works on top of other, rather low-level
deep learning frameworks such as TensorFlow, CNTK, or Theano
The key idea behind Keras is to enable faster experimentation with deep networks
Models created with Keras run seamlessly on CPUs and GPUs via the low-level frameworks
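To illustrate what the four SGD parameters shown on this slide (lr, momentum, decay, nesterov) control, here is a minimal pure-Python sketch of one update step for a single scalar parameter. It follows the common momentum formulation with time-based learning rate decay; the function name `sgd_step` is hypothetical, and the exact Keras implementation may differ in details.

```python
def sgd_step(param, grad, velocity, iteration, lr=0.01, momentum=0.0, decay=0.0, nesterov=False):
    """Apply one SGD update to a single scalar parameter and return (param, velocity)."""
    # Time-based learning rate decay: the step size shrinks as iterations accumulate.
    effective_lr = lr / (1.0 + decay * iteration)
    # The velocity accumulates a decaying sum of past gradient steps.
    velocity = momentum * velocity - effective_lr * grad
    if nesterov:
        # Nesterov momentum looks ahead along the updated velocity.
        param = param + momentum * velocity - effective_lr * grad
    else:
        param = param + velocity
    return param, velocity

# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, v = 0.0, 0.0
for step in range(200):
    w, v = sgd_step(w, 2.0 * (w - 3.0), v, step, lr=0.1, momentum=0.9)
print(round(w, 3))
```

With momentum=0.0 and decay=0.0 the step reduces to plain gradient descent, which is the default configuration in the signature above.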
DEEP LEARNING Programming with TensorFlow & Keras – Supervised Classification Example – Network Topology
[5] M. Riedel, Invited YouTube Tutorial on Deep Learning, Ghent University
Classify pixels in a hyperspectral remote sensing image for which ground truth/labels are available
Created CNN architecture for a specific hyperspectral land cover type classification problem
Used the Indian Pines dataset (for comparison with other approaches) with all labelled pixels/classes
Performed no manual feature engineering to obtain good results (i.e. accuracy)
[2] J. Lange, G. Cavallaro, M. Riedel, et al., IGARSS 2018
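The building block that lets a CNN learn features without manual engineering is the convolution of a small kernel over the input. The following is a minimal pure-Python sketch of a 2D convolution without padding; it is not the architecture from [2], and the function name `conv2d_valid`, the toy image, and the hand-picked kernel are assumptions for illustration. In a trained CNN the kernel values are learned from the labelled pixels.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (both nested lists) without padding and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the image window under the kernel (cross-correlation,
            # which is what deep learning frameworks call convolution).
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        output.append(row)
    return output

# Hypothetical 4x4 single-band image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
# Hand-picked 2x2 kernel that responds to left-to-right intensity increases.
kernel = [[-1, 1],
          [-1, 1]]

feature_map = conv2d_valid(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The feature map is non-zero only where the edge lies, which is the kind of spatial pattern a convolutional layer picks up.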
DEEP LEARNING Programming with TensorFlow & Keras – Supervised Classification Example – Model Code Example
[5] M. Riedel, Invited YouTube Tutorial on Deep Learning, Ghent University
[2] J. Lange, G. Cavallaro, M. Riedel, et al., IGARSS 2018
DEEP LEARNING Programming with TensorFlow & Keras – Supervised Classification Example – Selected Results
[5] M. Riedel, Invited YouTube Tutorial on Deep Learning, Ghent University
SVM comparison: ~77 % with manual feature engineering
Blue: correctly classified
Red: incorrectly classified
[2] J. Lange, G. Cavallaro, M. Riedel, et al., IGARSS 2018
SELECTED COMPARISONS Supervised Classification Example – Remote Sensing Dataset & Results
Traditional Methods
Support Vector Machine (SVM)
Substantial manual feature engineering
10-fold cross-validation for model selection
Achieved 77.02 % accuracy
Convolutional Neural Networks (CNNs)
Automated feature learning
Achieved 84.40 % accuracy
SVM + Feature Engineering (~3 years) vs. CNN architecture setup (~1 month)
[2] J. Lange, G. Cavallaro, M. Riedel, et al., 2018
RELEVANT TRENDS Transfer Learning – Modular Supercomputing Architecture
DEEP SERIES OF PROJECTS EU Projects Driven by Co-Design of HPC Applications
3 EU Exascale projects: DEEP, DEEP-ER, DEEP-EST
27 partners, coordinated by JSC
EU funding: 30 M€, JSC part > 5.3 M€
Nov 2011 – Jun 2020
[17] M. Goetz & M. Riedel, et al., 2015 (classification, clustering, deep learning)
Juelich Supercomputing Centre implements the designs of the DEEP projects
[Figure: modular architecture diagram
Modules: GPU Module, Many-core Booster, Cluster Module, Data Analytics Module, Network Attached Memory Module, Storage Module
Node labels: BN, CN, DN, GN, NAM, Disk
Annotations: Array Databases (e.g. Rasdaman, SciDB); Intel Nervana & Neon]
JSC – MODULAR SUPERCOMPUTING ARCHITECTURE
Roadmap
[Figure annotations:
ML training, deep learning, data, models
Innovative ideas, e.g. trained models in memory
Innovative ideas, e.g. use of deep learning optimized chip designs
Deep learning, ML testing, inference
Data storage module for geospatial datasets?
Geospatial data]
JSC Juelich Supercomputing Centre
[Figure: JSC systems, with the labels General Purpose Cluster and Highly scalable]
IBM Power 4+ JUMP (2004), 9 TFlop/s
IBM Power 6 JUMP, 9 TFlop/s
IBM Blue Gene/L JUBL, 45 TFlop/s
IBM Blue Gene/P JUGENE, 1 PFlop/s
HPC-FF, 100 TFlop/s
JUROPA, 200 TFlop/s
IBM Blue Gene/Q JUQUEEN (2012), 5.9 PFlop/s
JURECA Cluster (2015), 2.2 PFlop/s
JURECA Booster (2017), 5 PFlop/s
JUWELS Cluster Module (2018), 12 PFlop/s
JUWELS Scalable Module (2019/20), 50+ PFlop/s
File Server: GPFS, Lustre
Hierarchical Storage Server
TRADITIONAL MACHINE LEARNING Supervised Classification – Modular Supercomputing Architecture
(1) The training and testing datasets of the remote sensing
application are used many times in the process, so it makes sense
to put them into the DEEP-EST Network Attached Memory (NAM) module
(2) Training with piSVM to generate a model requires
powerful CPUs with a good interconnect for the inherent
optimization process and thus can take advantage of the
DEEP-EST CLUSTER module (uses the training dataset,
requires piSVM parameters for kernel and cost)
(3) Instead of writing the trained SVM model
(i.e. a file with support vectors) to disk, it makes sense to
put this model into the DEEP-EST NAM module
(4) Testing with piSVM to evaluate the model
accuracy requires neither powerful CPUs nor a good
interconnect but scales perfectly (i.e. nicely parallel) and thus
can take advantage of the BOOSTER module
(uses the testing dataset and the model file residing in NAM); prediction and
inference using models are largely suited to the BOOSTER module too
(5) If the accuracy is too low, go back to (2) to change parameters
[3] E. Erlingsson, G. Cavallaro, M. Riedel, et al., 2018
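The loop formed by steps (2) to (5) can be sketched as a simple parameter search. This is an illustrative sketch only: `select_parameters` and `toy_train_and_score` are hypothetical names, and the toy scoring function merely stands in for a real piSVM training and testing run with kernel parameter gamma and cost.

```python
import itertools

def select_parameters(train_and_score, gammas, costs, target_accuracy):
    """Try (gamma, cost) pairs until one reaches target_accuracy; return the best triple seen."""
    best = (None, None, float("-inf"))
    for gamma, cost in itertools.product(gammas, costs):
        accuracy = train_and_score(gamma, cost)   # steps (2)-(4): train, keep model, test
        if accuracy > best[2]:
            best = (gamma, cost, accuracy)
        if accuracy >= target_accuracy:            # step (5): stop once accuracy is sufficient
            break
    return best

def toy_train_and_score(gamma, cost):
    """Hypothetical stand-in for a train/test run; its score peaks at gamma=0.1, cost=10."""
    return 0.8 - abs(gamma - 0.1) - 0.01 * abs(cost - 10)

result = select_parameters(toy_train_and_score,
                           gammas=[0.01, 0.1, 1.0],
                           costs=[1, 10, 100],
                           target_accuracy=0.75)
print(result)
```

Because the datasets and the model file stay in the NAM across iterations, only the parameters change between passes through the loop.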
DEEP LEARNING Supervised Classification – CNN Design & Setup
(1) The training and testing datasets of the remote sensing
application are used many times in the process, so it makes sense to
put them into the DEEP-EST Network Attached Memory (NAM) module
(2) Training with CNNs in TensorFlow works best on many-core
CPUs for the inherent optimization process based on Stochastic
Gradient Descent (SGD), using the MPI collective available in the DEEP-EST
Global Collective Environment (GCE), and thus can take advantage
of the DEEP-EST BOOSTER module (uses the training dataset,
requires CNN architectural design parameters)
(3) Trained models of selected architectural CNN setups need to
be compared and thus can be put in the DEEP-EST NAM module
(4) Testing with TensorFlow to evaluate the model accuracy
also works quite well on many-core architectures and scales perfectly
(i.e. nicely parallel) and thus can take advantage of the BOOSTER module
(uses the testing dataset and CNN models residing in NAM)
(5) If the accuracy is too low, go back to (2) to change parameters
(Upcoming: potentially exploring the use of Intel Nervana chips & Neon with TensorFlow)
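One common way an MPI collective enters SGD training is data parallelism: every worker computes a gradient on its own mini-batch, and a collective averages the gradients before the weights are updated. Whether the DEEP-EST setup works exactly this way is an assumption here; the sketch below simulates the averaging in plain Python without MPI, and the names `allreduce_mean` and `data_parallel_sgd_step` are hypothetical.

```python
def allreduce_mean(worker_gradients):
    """Average per-worker gradient vectors element-wise, as a sum-allreduce followed by division would."""
    n_workers = len(worker_gradients)
    return [sum(values) / n_workers for values in zip(*worker_gradients)]

def data_parallel_sgd_step(weights, worker_gradients, lr=0.1):
    """Apply one SGD step using the gradient averaged over all workers."""
    mean_gradient = allreduce_mean(worker_gradients)
    return [w - lr * g for w, g in zip(weights, mean_gradient)]

# Three hypothetical workers, each holding the gradient from its own mini-batch.
gradients = [[1.0, -2.0], [3.0, 0.0], [2.0, 2.0]]
weights = [0.5, 0.5]

new_weights = data_parallel_sgd_step(weights, gradients)
print(new_weights)
```

After the collective, every worker holds the same averaged gradient and therefore the same updated weights, which keeps the model replicas in sync.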
TRANSFER LEARNING 101 Short Overview & Remote Sensing Application Example
Rare Data Application Example
Remote Sensing Datasets
Very little data is available to train a deep learning network
Common in remote sensing and other engineering
or academic disciplines
High risk of overfitting due to the
limited amount of labelled training data
Complexity: pixel-wise classification vs. whole scene
Too costly to acquire high-quality labels
(e.g. ground truth campaigns)
Pretrained network with ‘big data’ domain A
Final layers used to train network with ‘rare data’ domain B
Representations from very deep networks are
generic and can facilitate transfer learning between different application domains
The representation contained in the last layers of deep pretrained networks
has a major influence on classification accuracy
The earlier, shallower layers
affect the classification outcome only insignificantly
[14] J. Donahue et al., “Decaf: A deep convolutional activation feature for generic visual recognition”
[5] M. Riedel, Invited YouTube Tutorial on Deep Learning, Ghent University
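The idea of keeping the pretrained layers fixed and training only the final layer on the rare data can be sketched in plain Python. This is a toy illustration under stated assumptions: `frozen_features` is a hypothetical stand-in for the pretrained network from domain A, the final layer is a simple logistic regression, and the eight labelled samples are made up.

```python
import math

def frozen_features(x):
    """Hypothetical stand-in for the pretrained layers; these are never updated."""
    return [math.tanh(x[0] + x[1]), math.tanh(x[0] - x[1])]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_final_layer(samples, labels, epochs=200, lr=0.5):
    """Train only a logistic-regression output layer on top of the frozen features."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            f = frozen_features(x)
            error = sigmoid(sum(w * fi for w, fi in zip(weights, f)) + bias) - y
            weights = [w - lr * error * fi for w, fi in zip(weights, f)]
            bias -= lr * error
    return weights, bias

def predict(x, weights, bias):
    f = frozen_features(x)
    return 1 if sigmoid(sum(w * fi for w, fi in zip(weights, f)) + bias) >= 0.5 else 0

# A small labelled "rare data" set for domain B: label 1 when x0 + x1 is positive.
samples = [[0.5, 0.4], [0.9, -0.2], [0.3, 0.6], [0.8, 0.1],
           [-0.5, -0.4], [-0.9, 0.2], [-0.3, -0.6], [-0.7, -0.1]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = train_final_layer(samples, labels)
accuracy = sum(predict(x, weights, bias) == y for x, y in zip(samples, labels)) / len(samples)
print(accuracy)
```

Only three numbers are fitted here (two weights and a bias), which is why so few labelled samples suffice and the overfitting risk named on this slide is reduced.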
TRANSFER LEARNING 101 ImageNet Dataset
Dataset: ImageNet
Total number of images: 14,197,122
Number of images with bounding box annotations: 1,034,908
[18] J. Dean et al., ‘Large-Scale Deep Learning’
Apply transfer learning
[19] ImageNet Web page
TRANSFER LEARNING Pre-Trained Network Overfeat Example
Using the available OverFeat as pre-trained network
OverFeat is an improved version of AlexNet and is trained on 1.2 million labeled images from ImageNet
[15] D. Marmanis et al., ‘Deep Learning Earth Observation Classification Using ImageNet Pretrained Networks’, 2016
[16] P. Sermanet et al., ‘OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks’
TRANSFER LEARNING
Results
[15] D. Marmanis et al., ‘Deep Learning Earth Observation Classification Using ImageNet Pretrained Networks’, 2016
Data randomly taken from various city images and used with the trained
CNN that builds on ImageNet pre-training
Even on unseen data from completely different datasets, transfer learning
works well
Shown for scene-wide classification, not much for pixel-wise classification
Studies reveal the transferability of different layers in deep CNNs pretrained
with ImageNet
Transfer learning is relevant for all sciences and worth studying when
labels are lacking
UC Merced Land Dataset
Apply transfer learning
DEEP LEARNING Supervised Classification – Transfer Learning
(1) Studies have shown that transfer learning works well, especially for remote sensing data
without ground truth or labelled data (i.e. unsupervised); pre-trained networks trained on general
images like ImageNet (e.g. OverFeat) are available and are put into the DEEP-EST NAM
module to be re-used for unsupervised deep learning CNN training
(2) Based on pre-trained features, another CNN architectural setup is
trained with the real remote sensing data; here the DEEP-EST
DATA ANALYTICS module is an interesting approach, since the FPGA
might be used to compute the transformation from pre-trained
features into suitable inputs for the real training process of the CNN
based on remote sensing data
(3) Trained models of selected architectural CNN setups that have
been used with pre-trained features need to be compared and thus
can be put in the DEEP-EST NAM module
(4) Testing with TensorFlow to evaluate the model accuracy also works quite well on many-core architectures and scales perfectly (i.e.
nicely parallel) and thus can take advantage of the BOOSTER module (uses the testing dataset and CNN models residing in NAM)
(5) Testing results are written back to the DEEP-EST NAM per CNN architectural design; the FPGA in the NAM can compute the best obtained
accuracy for all the different setups
(6) If the accuracy is too low, consider moving back to step (1) to change
the pre-trained network or step (2) to create a better CNN architectural
OTHER MODELS LSTM for Time Series Analysis
[5] M. Riedel, Invited YouTube Tutorial on Deep Learning, Ghent University
Long Short Term Memory (LSTM) networks are a special kind of Recurrent Neural Networks (RNNs)
LSTMs learn long-term dependencies in data by remembering information for long periods of time
The LSTM chain structure consists of four neural network layers interacting in a specific way
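[Figure: LSTM model shown as a chain of three repeating cells for time steps t-1, t, and t+1.
Each cell receives the input xt and emits the hidden state ht, and contains three sigmoid (σ) layers
and one tanh layer combined through pointwise multiplication (x) and addition (+).
Each line carries an entire vector.]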
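The four interacting layers of one LSTM cell can be written down for the simplest case of scalar input and state. This is a minimal sketch of the standard LSTM step, not the model used in the talk; the function name `lstm_step`, the layer keys, and the hand-picked weights are assumptions for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step for scalar input, hidden state, and cell state.

    `params` maps each of the four layers ("f", "i", "g", "o") to a tuple
    (input weight, recurrent weight, bias).
    """
    def layer(name, activation):
        w_x, w_h, b = params[name]
        return activation(w_x * x_t + w_h * h_prev + b)

    f = layer("f", sigmoid)      # forget gate: how much of the old cell state to keep
    i = layer("i", sigmoid)      # input gate: how much of the candidate to write
    g = layer("g", math.tanh)    # candidate values for the cell state
    o = layer("o", sigmoid)      # output gate: how much of the cell state to expose
    c_t = f * c_prev + i * g     # pointwise multiply and add along the cell-state line
    h_t = o * math.tanh(c_t)
    return h_t, c_t

# Hypothetical hand-picked weights; in practice they are learned during training.
params = {"f": (0.5, 0.1, 0.0), "i": (0.6, 0.2, 0.0),
          "g": (1.0, 0.3, 0.0), "o": (0.4, 0.1, 0.0)}

h, c = 0.0, 0.0
for x in [0.5, -0.1, 0.8]:
    h, c = lstm_step(x, h, c, params)
    print(round(h, 4), round(c, 4))
```

The cell state `c` is what lets the network carry information across many time steps, since it is only scaled by the forget gate and added to, never squashed through a layer.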
OTHER MODELS LSTM – Application & Code Example
LSTM models work quite well for predicting power but need
to be trained and tuned for different power stations
Observation: some peaks cannot be ‘learned’
Ongoing Master thesis, further results pending
SUMMARY
Mindset
Traditional machine learning is still relevant
Select new approaches with a specific deep learning model per problem (CNN, LSTM, etc.)
Skillset
Basic knowledge of machine learning required for deep learning
Validation (i.e. model selection) and regularization still valid(!)
Toolset
Parallel versions of traditional machine learning methods exist (piSVM, HPDBSCAN)
TensorFlow & Keras are just one example but offer good performance given a good installation(!)
Explore technology trends, e.g. specific chips for deep learning, NAM, etc.
REFERENCES
REFERENCES (1)
[1] G. Cavallaro, M. Riedel, J.A. Benediktsson et al., ‘On Understanding Big Data Impacts in Remotely Sensed Image Classification using Support Vector
Machine Methods’, IEEE Journal of Selected Topics in Applied Earth Observation and Remote Sensing, 2015, DOI: 10.1109/JSTARS.2015.2458855
[2] J. Lange, G. Cavallaro, M. Goetz, E. Erlingsson, M. Riedel, ‘The Influence of Sampling Methods on Pixel-Wise Hyperspectral Image Classification with 3D
Convolutional Neural Networks’, Proceedings of the IGARSS 2018 Conference, to appear
[3] E. Erlingsson, G. Cavallaro, M. Riedel, H. Neukirchen, ‘Scaling Support Vector Machines Towards Exascale Computing for Classification of Large-Scale
High-Resolution Remote Sensing Images’, Proceedings of the IGARSS 2018 Conference, to appear
[4] C. Cortes and V. Vapnik, “Support-vector networks,” Machine Learning, vol. 20(3), pp. 273–297, 1995
[5] M. Riedel, ‘Deep Learning using a Convolutional Neural Network‘, Ghent University, Invited YouTube Tutorial,
Online: https://www.youtube.com/watch?v=gOL1_YIosYk&list=PLrmNhuZo9sgZUdaZ-f6OHK2yFW1kTS2qF
[6] M. Riedel, ‘Introduction to Machine Learning Algorithms‘, Ghent University, Invited YouTube Tutorial,
Online: https://www.youtube.com/watch?v=KgiuUZ3WeP8&list=PLrmNhuZo9sgbcWtMGN0i6G9HEvh08JG0J
[7] G. Cavallaro, N. Falco, M. Dalla Mura and J. A. Benediktsson, "Automatic Attribute Profiles," in IEEE Transactions on Image Processing, vol. 26, no. 4, pp.
1859-1872, April 2017, Online: http://ieeexplore.ieee.org/document/7842555/
[8] YouTube Video, ‘The Deep Learning Revolution’,
Online: https://www.youtube.com/watch?v=Dy0hJWltsyE
[9] Keras Python Deep Learning Library,
Online: https://keras.io/
REFERENCES (2)
[10] Tensorflow Deep Learning Framework,
Online: https://www.tensorflow.org/
[11] A Tour of Tensorflow,
Online: https://arxiv.org/pdf/1610.01178.pdf
[12] K. Hwang, G. C. Fox, J. J. Dongarra, ‘Distributed and Cloud Computing’, Book,
Online: http://store.elsevier.com/product.jsp?locale=en_EU&isbn=9780128002049
[13] Big Data Tips, ‘What is a Tensor?‘,
Online: http://www.big-data.tips/what-is-a-tensor
[14] J. Donahue et al., “Decaf: A deep convolutional activation feature for generic visual recognition,” unpublished paper, 2013,
Online: http://arxiv.org/abs/1310.1531
[15] Dimitrios Marmanis et al., ‘Deep Learning Earth Observation Classification Using ImageNet Pretrained Networks’, IEEE Geoscience and Remote Sensing
Letters, Volume 13 (1), 2016,
Online: http://ieeexplore.ieee.org/document/7342907/
[16] P. Sermanet et al., ‘OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks’,
Online: http://arxiv.org/abs/1312.6229
REFERENCES (3)
[17] M. Goetz, C. Bodenstein, M. Riedel, ‘HPDBSCAN – Highly Parallel DBSCAN’, in proceedings of the ACM/IEEE International Conference for High
Performance Computing, Networking, Storage, and Analysis (SC2015), Machine Learning in HPC Environments (MLHPC) Workshop, 2015,
Online: http://dx.doi.org/10.1145/2834892.2834894
[18] J. Dean et al., ‘Large scale deep learning’, Keynote GPU Technical Conference, 2015
[19] ImageNet Web page,
Online: http://image-net.org
ACKNOWLEDGEMENTS Previous & current members of the High Productivity Data Processing Research Group
Thesis Completed
PD Dr. G. Cavallaro
Dr. M. Goetz (now KIT)
Thesis Completed
Senior PhD Student A.S. Memon
Senior PhD Student M.S. Memon
MSc M. Richerzhagen
Thesis Completed
MSc P. Glock (now INM-1)
DEEP Learning Startup
MSc C. Bodenstein (now Soccerwatch.tv)
PhD Student E. Erlingsson
PhD Student S. Bakarat
Starting in Fall 2018
MSc Student G.S. Guðmundsson (Landsverkjun)
THANKS Talk shortly available under www.morrisriedel.de