
11.04.2018


SELECTED COMPARISONS BETWEEN MACHINE LEARNING & DEEP LEARNING IN EARTH SCIENCE APPLICATIONS

PROF. DR.-ING. MORRIS RIEDEL, UNIVERSITY OF ICELAND / JUELICH SUPERCOMPUTING CENTRE

HEAD OF CROSS-SECTIONAL TEAM DEEP LEARNING & RDA CO-CHAIR INTEREST GROUP BIG DATA

WARM-UP AFTER COFFEE BREAK

5th April 2018


[8] The Deep Learning Revolution, YouTube

OUTLINE



Traditional Machine Learning

Feature Engineering & Modeling

Remote Sensing Application Example

Deep Learning

Automated Feature Engineering

Network Topologies & Modeling

Remote Sensing Application Example

Advantages & Disadvantages

Modular Supercomputing Architecture

Transfer Learning & Other Models

Summary & References


TRADITIONAL MACHINE LEARNING


Feature Engineering – Modeling – Remote Sensing Application Example


TRADITIONAL MACHINE LEARNING Overview of Techniques – Supervised Classification (focus in this talk)


Classification: groups of data exist; new data is classified to the existing groups

Clustering: no groups of data exist; groups are created from data points close to each other

Regression: identify a line with a certain slope describing the data

TRADITIONAL MACHINE LEARNING Supervised Classification Example – Remote Sensing Dataset & Code

One of the challenges in remote sensing is to classify land cover into distinct classes based on hyperspectral datasets obtained from airborne and satellite sensors

One dataset used is the Indian Pines AVIRIS dataset, acquired over an agricultural site composed of fields with regular geometry (200 spectral bands, 1417x617 pixels, spatial resolution of 20 meters, 52 classes of different land cover, 6 further classes discarded)

Before classification, feature engineering is applied to the raw hyperspectral data; in this case the technique is the Self Dual Attribute Profile (SDAP)

The classification is performed with a tuned version of the piSVM MPI code, a parallel Support Vector Machine (SVM) with a radial basis function (RBF) kernel

[4] C. Cortes and V. Vapnik, 1995

[1] G. Cavallaro and M. Riedel, et al., 2015
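The RBF kernel mentioned above is commonly written as K(x, z) = exp(-gamma * ||x - z||^2). The following is a minimal pure-Python sketch of that formula only; it is not the piSVM code, and the function names, the value of `gamma` and the three short "spectra" are illustrative assumptions.

```python
import math

def rbf_kernel(x, z, gamma=0.5):
    """Radial basis function kernel: exp(-gamma * squared Euclidean distance)."""
    if len(x) != len(z):
        raise ValueError("vectors must have the same length")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def kernel_matrix(samples, gamma=0.5):
    """Symmetric Gram matrix over a list of feature vectors (e.g. pixel spectra)."""
    return [[rbf_kernel(a, b, gamma) for b in samples] for a in samples]

# Three made-up 4-band "spectra"; real Indian Pines pixels have 200 bands.
spectra = [[0.1, 0.2, 0.3, 0.4], [0.1, 0.2, 0.3, 0.5], [0.9, 0.8, 0.7, 0.6]]
for row in kernel_matrix(spectra, gamma=1.0):
    print([round(v, 3) for v in row])
```

Similar spectra give kernel values close to 1, dissimilar ones values close to 0; `gamma` controls how quickly the similarity falls off and is one of the parameters that has to be tuned.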


TRADITIONAL MACHINE LEARNING Supervised Classification Example – Remote Sensing Dataset & Results

Traditional Methods

Support Vector Machine (SVM)

Substantial manual feature engineering, e.g. Self Dual Attribute Profile (SDAP)

10-fold cross-validation

Achieved 77.02 % accuracy

[4] C. Cortes and V. Vapnik, 1995

[1] G. Cavallaro and M. Riedel, et al., 2015

[6] M. Riedel, Invited YouTube Tutorial on Machine Learning with Remote Sensing Datasets, Ghent University

[Figure labels: 1m, 10m, 30m]

[7] G. Cavallaro et al.
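The 10-fold cross-validation used for model selection can be sketched as follows. This is a generic illustration in pure Python, not the setup from [1]; the function names and the dummy scoring callback are assumptions.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Split sample indices into k disjoint folds of near-equal size."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i.
    return [indices[i::k] for i in range(k)]

def cross_validate(n_samples, score_fn, k=10, seed=0):
    """Average score_fn(train_idx, val_idx) over the k train/validation splits."""
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for i, val_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(score_fn(train_idx, val_idx))
    return sum(scores) / k

# Dummy callback: a real one would train on train_idx and return accuracy on val_idx.
mean_score = cross_validate(20, lambda train, val: len(train) / 20, k=10)
print(mean_score)
```

Each sample is used for validation exactly once, so the averaged score is less dependent on one particular split than a single hold-out estimate.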


DEEP LEARNING Automated Feature Engineering – Network Topologies & Modeling – Remote Sensing Application Example


SIMULATION AND DATA LABS (SDL) Juelich Supercomputing Centre & Deep Learning

[Figure: organization of the Simulation and Data Labs (SDL) at JSC. Labels: Communities, Research Groups, Simulation Labs, Cross-Sectional Teams, Data Life Cycle Labs, Exascale co-Design, Facilities, PADC, DEEP-EST EU Project, Convergence: SDL and Cross-Sectional Team Deep Learning, HPC Systems JURECA & JUQUEEN, Modular Supercomputer JUWELS]

DEEP LEARNING 101 Short Overview & Role of Cross-Sectional Team Deep Learning at Juelich Supercomputing Centre

Innovative & disruptive approach in geospatial data analysis

Provide cutting edge deep learning tools that take advantage of JSC HPC machines

Advance deep learning applications and research on HPC prototypes (e.g. DEEP-EST)

Engage with industry (industrial relations team) & support SMEs (e.g. Soccerwatch)

Offer tutorials & application enabling support for commercial & scientific users

[5] M. Riedel, Invited YouTube Tutorial on Deep Learning, Ghent University



DEEP LEARNING Supervised Classification Example – Remote Sensing Dataset & Code

One of the challenges in remote sensing is to classify land cover into distinct classes based on hyperspectral datasets obtained from airborne and satellite sensors over time (i.e. time series)

One dataset, used as a pre-study to time series, is the Indian Pines AVIRIS dataset, acquired over an agricultural site composed of fields with regular geometry (200 spectral bands, 1417x617 pixels, spatial resolution of 20 meters, 58 instead of 52 classes of different land cover)

Instead of traditional methods that apply (semi-)manual feature engineering before classification, recent deep learning techniques are able to learn features automatically

The classification is performed via the TensorFlow deep learning framework with Convolutional Neural Networks (CNNs), which work very well for data with spatial properties

[2] J. Lange, G. Cavallaro, M. Riedel, et al., 2018


DEEP LEARNING Programming with TensorFlow





TensorFlow is an open source library for deep learning models using a flow graph approach

TensorFlow nodes model mathematical operations, and the graph edges between the nodes are so-called tensors (also known as multi-dimensional arrays)

TensorFlow supports the use of CPUs and GPUs (GPU versions are typically much faster than CPU versions)

TensorFlow works with the high-level deep learning tool Keras in order to create models fast
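The flow graph idea (nodes are operations, edges carry tensors) can be illustrated without the library itself. The sketch below is a toy evaluator in pure Python and deliberately does not use or imitate the TensorFlow API; the class `Node` and the helper names are hypothetical.

```python
class Node:
    """One operation in a dataflow graph; its inputs are other nodes."""
    def __init__(self, op, *inputs):
        self.op = op
        self.inputs = inputs

    def evaluate(self, feed):
        # Evaluate upstream nodes first; the values passed along the edges
        # play the role of tensors (here: plain lists of floats).
        args = [node.evaluate(feed) for node in self.inputs]
        return self.op(feed, *args)

def placeholder(name):
    return Node(lambda feed: feed[name])

def constant(value):
    return Node(lambda feed: value)

def add(a, b):
    return Node(lambda feed, x, y: [p + q for p, q in zip(x, y)], a, b)

def multiply(a, b):
    return Node(lambda feed, x, y: [p * q for p, q in zip(x, y)], a, b)

# Build the graph y = w * x + b first, then run it with concrete data.
x = placeholder("x")
w = constant([2.0, 2.0, 2.0])
b = constant([1.0, 1.0, 1.0])
y = add(multiply(w, x), b)

print(y.evaluate({"x": [1.0, 2.0, 3.0]}))
```

Separating graph construction from execution is what allows a framework to place the same operations on CPUs or GPUs without changes to the model description.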

[10] TensorFlow Deep Learning Framework

[11] A Tour of TensorFlow

[12] Distributed & Cloud Computing Book

DEEP LEARNING Programming with TensorFlow – What is a Tensor?




[13] Big Data Tips, What is a Tensor?

A tensor is nothing more than a multi-dimensional array, often used in scientific & engineering environments

Tensors are best understood by comparing them with vectors or matrices and their dimensions
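To make the comparison with scalars, vectors and matrices concrete, the following pure-Python sketch computes the shape and rank of nested lists. The helper `tensor_shape` is a hypothetical name introduced for illustration; array libraries provide this information directly.

```python
def tensor_shape(t):
    """Return the shape of a regular nested list, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    if not isinstance(t, list):
        return ()          # a scalar is a rank-0 tensor
    if not t:
        return (0,)
    inner = tensor_shape(t[0])
    for item in t[1:]:
        if tensor_shape(item) != inner:
            raise ValueError("ragged nesting is not a tensor")
    return (len(t),) + inner

scalar = 3.0                                               # rank 0
vector = [1.0, 2.0, 3.0]                                   # rank 1
matrix = [[1.0, 2.0], [3.0, 4.0]]                          # rank 2
cube = [[[0.0] * 4 for _ in range(3)] for _ in range(2)]   # rank 3, e.g. rows x columns x bands

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix), ("cube", cube)]:
    shape = tensor_shape(t)
    print(name, "rank", len(shape), "shape", shape)
```

A hyperspectral image fits the rank-3 case naturally: two spatial dimensions plus one spectral dimension.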

DEEP LEARNING Programming with TensorFlow & Keras




[9] Keras Python Deep Learning Library

Keras inherently supports the creation of artificial neural networks using Dense layers and optimizers (e.g. SGD)

Includes regularization (e.g. weight decay) or momentum

keras.layers.Dense(units,
                   activation=None,
                   use_bias=True,
                   kernel_initializer='glorot_uniform',
                   bias_initializer='zeros',
                   kernel_regularizer=None,
                   bias_regularizer=None,
                   activity_regularizer=None,
                   kernel_constraint=None,
                   bias_constraint=None)

keras.optimizers.SGD(lr=0.01,
                     momentum=0.0,
                     decay=0.0,
                     nesterov=False)

Keras is a high-level deep learning library implemented in Python that works on top of existing, rather low-level deep learning frameworks like TensorFlow, CNTK, or Theano

The key idea behind Keras is to enable faster experimentation with deep networks

Created deep learning models run seamlessly on CPU and GPU via the low-level frameworks
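What a Dense layer and an SGD optimizer with momentum compute can be sketched in pure Python. This is not Keras code: the function names are hypothetical, and the time-based decay schedule lr / (1 + decay * iteration) is an assumption about how a `decay` argument of this kind is usually interpreted.

```python
import math

def dense_forward(inputs, kernel, bias, activation=None):
    """output[j] = activation(sum_i inputs[i] * kernel[i][j] + bias[j])."""
    units = len(bias)
    out = [sum(inputs[i] * kernel[i][j] for i in range(len(inputs))) + bias[j]
           for j in range(units)]
    return [activation(v) for v in out] if activation else out

def sgd_step(params, grads, velocity, iteration, lr=0.01, momentum=0.0, decay=0.0):
    """One SGD update with classical momentum and time-based learning-rate decay."""
    lr_t = lr / (1.0 + decay * iteration)
    new_velocity = [momentum * v - lr_t * g for v, g in zip(velocity, grads)]
    new_params = [p + v for p, v in zip(params, new_velocity)]
    return new_params, new_velocity

# Forward pass through a 2-input, 2-unit layer with tanh activation.
hidden = dense_forward([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]], [0.5, -0.5],
                       activation=math.tanh)
print([round(v, 4) for v in hidden])

# Minimise f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, v = [0.0], [0.0]
for it in range(200):
    grad = [2.0 * (w[0] - 3.0)]
    w, v = sgd_step(w, grad, v, it, lr=0.1, momentum=0.9)
print(round(w[0], 3))
```

Momentum reuses part of the previous update, which smooths the path towards the minimum; with `momentum=0.0` the step reduces to plain gradient descent.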

DEEP LEARNING Programming with TensorFlow & Keras – Supervised Classification Example – Network Topology





Classify pixels in a hyperspectral remote sensing image having groundtruth/labels available

Created CNN architecture for a specific hyperspectral land cover type classification problem

Used dataset of Indian Pines (compared to other approaches) using all labelled pixels/classes

Performed no manual feature engineering to obtain good results (aka accuracy)

[2] J. Lange, G. Cavallaro, M. Riedel, et al., IGARSS 2018
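The basic operation a CNN layer applies to spatial data is a small filter slid across the image. The sketch below shows a single-band 2D version in pure Python with a hand-set filter; in a real CNN the filter weights are learned during training, and the architecture in [2] uses 3D convolutions over hyperspectral cubes, which this illustration does not reproduce. All names and values are assumptions.

```python
def conv2d_valid(image, kernel):
    """'Valid' 2D cross-correlation, the sliding-window operation inside a CNN layer."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def relu(feature_map):
    """Rectified linear activation applied element-wise."""
    return [[max(0.0, v) for v in row] for row in feature_map]

# Made-up 4x4 single-band patch with a vertical edge, and a hand-set edge filter.
patch = [[0.0, 0.0, 1.0, 1.0]] * 4
edge_filter = [[-1.0, 1.0], [-1.0, 1.0]]
print(relu(conv2d_valid(patch, edge_filter)))
```

The feature map responds only where the edge lies under the filter, which is the kind of spatial feature that would otherwise have to be engineered manually.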

DEEP LEARNING Programming with TensorFlow & Keras – Supervised Classification Example – Model Code Example





[2] J. Lange, G. Cavallaro, M. Riedel, et al. , IGARSS 2018


DEEP LEARNING Programming with TensorFlow & Keras – Supervised Classification Example – Selected Results





SVM comparison: ~77% with manual feature engineering

Blue: correctly classified

Red: incorrectly classified

[2] J. Lange, G. Cavallaro, M. Riedel, et al., IGARSS 2018

SELECTED COMPARISONS Supervised Classification Example – Remote Sensing Dataset & Results

Traditional Methods

Support Vector Machine (SVM)

Substantial manual feature engineering

10-fold cross-validation for model selection

Achieved 77.02 % accuracy

Convolutional Neural Networks (CNNs)

Automated feature learning

Achieved 84.40 % accuracy

SVM + Feature Engineering (~3 years) vs. CNN architecture setup (~1 month)

[2] J. Lange, G. Cavallaro, M. Riedel, et al., 2018


RELEVANT TRENDS Transfer Learning – Modular Supercomputing Architecture


DEEP SERIES OF PROJECTS EU Projects Driven by Co-Design of HPC Applications

3 EU Exascale projects: DEEP, DEEP-ER, DEEP-EST

27 partners, coordinated by JSC

EU funding: 30 M€, JSC part > 5.3 M€

Nov 2011 – Jun 2020

[17] M. Goetz & M. Riedel, et al., 2015 (classification, clustering, deep learning)

Juelich Supercomputing Centre implements the DEEP projects designs

[Figure: modular supercomputing architecture. Modules: GPU Module, Many-core Booster, Cluster Module, Data Analytics Module, Network Attached Memory Module, Storage Module. Node labels: BN, CN, DN, GN, NAM, Disk. Annotations: Array Databases (e.g. Rasdaman, SciDB); Intel Nervana & Neon]

JSC – MODULAR SUPERCOMPUTING ARCHITECTURE Roadmap

[Figure: machine learning and deep learning mapped onto the modular architecture. Labels: ML training; deep learning; data; models; innovative ideas, e.g. trained models in memory; innovative ideas, e.g. use of deep learning optimized chip designs; ML testing, inference; data storage module for geospatial datasets?; geospatial data]


[Figure: JSC (Juelich Supercomputing Centre) systems roadmap, with the labels General Purpose Cluster, Highly scalable, File Server (GPFS, Lustre) and Hierarchical Storage Server. Systems shown:]

IBM Power 4+ JUMP (2004), 9 TFlop/s

IBM Power 6 JUMP, 9 TFlop/s

IBM Blue Gene/L JUBL, 45 TFlop/s

IBM Blue Gene/P JUGENE, 1 PFlop/s

HPC-FF, 100 TFlop/s

JUROPA, 200 TFlop/s

IBM Blue Gene/Q JUQUEEN (2012), 5.9 PFlop/s

JURECA Cluster (2015), 2.2 PFlop/s

JURECA Booster (2017), 5 PFlop/s

JUWELS Cluster Module (2018), 12 PFlop/s

JUWELS Scalable Module (2019/20), 50+ PFlop/s



TRADITIONAL MACHINE LEARNING Supervised Classification – Modular Supercomputing Architecture

(1) The training dataset and testing dataset of the remote sensing application are used many times in the process, so it makes sense to put them into the DEEP-EST Network Attached Memory (NAM) module

(2) Training with piSVM in order to generate a model requires powerful CPUs with a good interconnect for the inherent optimization process and thus can take advantage of the DEEP-EST CLUSTER module (uses the training dataset; requires piSVM parameters for kernel and cost)

(3) Instead of writing the trained SVM model (i.e. the file with support vectors) to disk, it makes sense to put this model into the DEEP-EST NAM module

(4) Testing with piSVM in order to evaluate the model accuracy requires neither powerful CPUs nor a good interconnect but scales perfectly (i.e. nicely parallel) and thus can take advantage of the BOOSTER module (uses the testing dataset & model file residing in NAM); prediction & inference using models is largely usable on the BOOSTER module too

(5) If the accuracy is too low, go back to (2) to change parameters

[3] E. Erlingsson, G. Cavallaro, M. Riedel, et al., 2018
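The train, test and retry loop of steps (2) to (5) can be sketched as a simple parameter search. The code below is a generic pure-Python illustration, not the piSVM workflow itself: `fake_train` and `fake_test` are stand-ins with a made-up score so that the sketch runs without data or an SVM implementation, and the candidate values for the kernel parameter and the cost are arbitrary.

```python
import itertools

def search_parameters(train_fn, test_fn, gammas, costs, target_accuracy):
    """Train/test over (gamma, cost) pairs; stop early once the target is reached."""
    best = None
    for gamma, cost in itertools.product(gammas, costs):
        model = train_fn(gamma, cost)              # step (2): training
        accuracy = test_fn(model)                  # step (4): testing
        if best is None or accuracy > best[0]:
            best = (accuracy, gamma, cost, model)  # step (3): keep the model
        if accuracy >= target_accuracy:
            break                                  # good enough, no return to step (2)
    return best

# Stand-in training/testing functions (hypothetical, for illustration only).
def fake_train(gamma, cost):
    return {"gamma": gamma, "cost": cost}

def fake_test(model):
    # Arbitrary made-up score that peaks at gamma=0.1, cost=10.
    return 1.0 - abs(model["gamma"] - 0.1) - abs(model["cost"] - 10) / 100.0

accuracy, gamma, cost, _ = search_parameters(
    fake_train, fake_test, gammas=[0.01, 0.1, 1.0], costs=[1, 10, 100],
    target_accuracy=0.99)
print(gamma, cost)
```

Because every iteration re-reads the same datasets and produces a new model, keeping both in fast shared memory rather than on disk is what makes the NAM module attractive for this loop.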


DEEP LEARNING Supervised Classification – CNN Design & Setup

(1) The training dataset and testing dataset of the remote sensing application are used many times in the process, so it makes sense to put them into the DEEP-EST Network Attached Memory (NAM) module

(2) Training with CNNs in TensorFlow works best on many-core CPUs for the inherent optimization process based on Stochastic Gradient Descent (SGD), with MPI collectives available in the DEEP-EST Global Collective Environment (GCE), and thus can take advantage of the DEEP-EST BOOSTER module (uses the training dataset; requires CNN architectural design parameters)

(3) Trained models of selected architectural CNN setups need to be compared and thus can be put in the DEEP-EST NAM module

(4) Testing with TensorFlow in order to evaluate the model accuracy also works quite well on many-core architectures and scales perfectly (i.e. nicely parallel) and thus can take advantage of the BOOSTER module (uses the testing dataset & CNN models residing in NAM)

(5) If the accuracy is too low, go back to (2) to change parameters

(Upcoming: potentially exploring the use of Intel Nervana Chips & Neon with Tensorflow)



TRANSFER LEARNING 101 Short Overview & Remote Sensing Application Example

Rare Data Application Example

Remote Sensing Datasets

Very little data is available to train a deep learning network

Common in remote sensing and other engineering or academic disciplines

High risk of overfitting due to the small amount of available training data with labels

Complexity: pixel-wise classification vs. whole scene

Too costly to acquire high quality labels (e.g. groundtruth campaigns)

[Figure: a network pretrained with ‘big data’ from domain A; its final layers are used to train a network with ‘rare data’ from domain B]

Representations from very deep networks are generic and can facilitate transfer learning between different application domains

The representation contained in the last layers of deep pretrained networks has a major influence on classification accuracy

Earlier (shallower) layers affect the classification outcome only insignificantly

[14] J. Donahue et al., ‘Decaf: A deep convolutional activation feature for generic visual recognition’
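One common way to apply this idea is to keep the pretrained layers frozen and train only a new final layer on the small labelled dataset. The sketch below illustrates that pattern in pure Python under strong simplifications: `frozen_features` is a hypothetical stand-in for a pretrained network, the final layer is a single logistic unit, and the eight samples are made up.

```python
import math

def frozen_features(x):
    """Stand-in for the layers of a pretrained network; never updated here."""
    return [x[0] + x[1], x[0] - x[1]]

def predict(weights, bias, feats):
    z = sum(w * f for w, f in zip(weights, feats)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def train_final_layer(samples, labels, epochs=200, lr=0.5):
    """Fit only the final (logistic) layer on top of the frozen features."""
    feats = [frozen_features(x) for x in samples]
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            error = predict(weights, bias, f) - y   # gradient of the log loss w.r.t. z
            weights = [w - lr * error * fi for w, fi in zip(weights, f)]
            bias -= lr * error
    return weights, bias

# Tiny made-up "rare data" set from domain B: label 1 when x0 + x1 is large.
samples = [[0.1, 0.2], [0.3, 0.3], [0.2, 0.5], [0.4, 0.1],
           [0.9, 0.8], [0.7, 0.6], [0.6, 0.9], [0.8, 0.5]]
labels = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]

weights, bias = train_final_layer(samples, labels)
correct = sum((predict(weights, bias, frozen_features(x)) > 0.5) == (y == 1.0)
              for x, y in zip(samples, labels))
print(correct, "of", len(samples), "training samples classified correctly")
```

Only three numbers are fitted here (two weights and a bias), which is why this pattern is less prone to overfitting on scarce labels than training a full deep network from scratch.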




TRANSFER LEARNING 101 ImageNet Dataset

Dataset: ImageNet

Total number of images: 14,197,122

Number of images with bounding box annotations: 1,034,908

[18] J. Dean et al., ‘Large-Scale Deep Learning’

[Figure label: apply transfer learning]

[19] ImageNet Web page

TRANSFER LEARNING Pre-Trained Network Overfeat Example


Using available Overfeat as pre-trained network

Overfeat is an improved version of AlexNet & is trained on 1.2 million labeled images from ImageNet

[15] D. Marmanis et al., ‘Deep Learning Earth Observation Classification Using ImageNet Pretrained Networks’, 2016

[16] P. Sermanet et al., ‘OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks’

TRANSFER LEARNING Results

[15] D. Marmanis et al., ‘Deep Learning Earth Observation Classification Using ImageNet Pretrained Networks’, 2016

Data randomly taken from various city images is used with the trained CNN that builds on a network pre-trained with ImageNet

Even on unseen data from completely different datasets, transfer learning works well

Shown for scene-wide classification, not much for pixel-wise classification

Studies reveal the transferability of different layers in deep CNNs pretrained with ImageNet

Transfer learning is relevant for all sciences & worth studying when labels are lacking

[Figure: transfer learning applied to the UC Merced Land Use Dataset]


DEEP LEARNING Supervised Classification – Transfer Learning

(1) Studies have shown that transfer learning works well, especially for remote sensing data without groundtruth or labelled data (i.e. unsupervised); pre-trained networks trained on general images like ImageNet (e.g. Overfeat) are available and are put into the DEEP-EST NAM module to be re-used for unsupervised deep learning CNN training

(2) Based on pre-trained features, another CNN architectural setup is trained with the real remote sensing data; the DEEP-EST DATA ANALYTICS module is an interesting approach here, since its FPGA might be used to compute the transformation from pre-trained features into suitable inputs for the real training process of the CNN based on remote sensing data

(3) Trained models of selected architectural CNN setups that have been used with pre-trained features need to be compared and thus can be put in the DEEP-EST NAM module

(4) Testing with TensorFlow in order to evaluate the model accuracy also works quite well on many-core architectures and scales perfectly (i.e. nicely parallel) and thus can take advantage of the BOOSTER module (uses the testing dataset & CNN models residing in NAM)

(5) Testing results are written back to the DEEP-EST NAM per CNN architectural design; the FPGA in the NAM can compute the best obtained accuracy across all the different setups

(6) If the accuracy is too low, consider moving back to step (1) to change the pre-trained network or to step (2) to create a better CNN architectural

OTHER MODELS LSTM for Time Series Analysis





Long Short-Term Memory (LSTM) networks are a special kind of Recurrent Neural Network (RNN)

LSTMs learn long-term dependencies in data by remembering information for long periods of time

The LSTM chain structure consists of four neural network layers interacting in a specific way
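The four interacting layers are usually described as a forget gate, an input gate, a candidate layer and an output gate. The sketch below implements one time step of this standard formulation for scalar inputs in pure Python; the parameter values are arbitrary and untrained, and the dictionary layout is an assumption made for brevity.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step for scalar input/state; p holds (w_x, w_h, bias) per layer."""
    def layer(name, squash):
        w_x, w_h, b = p[name]
        return squash(w_x * x_t + w_h * h_prev + b)

    f = layer("forget", sigmoid)       # how much of the old cell state to keep
    i = layer("input", sigmoid)        # how much of the candidate to write
    g = layer("candidate", math.tanh)  # proposed new cell content
    o = layer("output", sigmoid)       # how much of the cell state to expose
    c_t = f * c_prev + i * g           # pointwise multiply and add
    h_t = o * math.tanh(c_t)
    return h_t, c_t

# Arbitrary, untrained parameters purely for illustration.
params = {name: (0.5, 0.5, 0.0) for name in ("forget", "input", "candidate", "output")}
h, c = 0.0, 0.0
for x in [1.0, 0.5, -0.5]:
    h, c = lstm_step(x, h, c, params)
    print(round(h, 4), round(c, 4))
```

The cell state `c_t` is what lets information persist: when the forget gate stays close to 1 and the input gate close to 0, the previous state passes through a step almost unchanged.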

[Figure: LSTM model as a chain structure unrolled over the time steps t-1, t and t+1, with inputs x_t and outputs h_t; each cell combines sigmoid (σ) and tanh layers with pointwise multiplication (x) and addition (+); each line carries an entire vector]

OTHER MODELS LSTM – Application & Code Example


LSTM models work quite well to predict power but need to be trained and tuned for different power stations

Observation: some peaks cannot be ‘learned’

Ongoing Master thesis – further results pending

SUMMARY



Mindset

Think traditional machine learning still relevant

Selected new approaches with specific deep learning per problem (CNN, LSTM, etc.)

Skillset

Basic knowledge of machine learning required for deep learning

Validation (i.e. model selection) and regularization still valid(!)

Toolset

Parallel versions of traditional machine learning methods exist (piSVM, HPDBSCAN)

Tensorflow & Keras just one example but offer good performance given good install(!)

Explore technology trends, e.g. specific chips for deep learning, NAM, etc.


REFERENCES



REFERENCES (1)

[1] G. Cavallaro, M. Riedel, J.A. Benediktsson et al., ‘On Understanding Big Data Impacts in Remotely Sensed Image Classification using Support Vector Machine Methods’, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2015, DOI: 10.1109/JSTARS.2015.2458855

[2] J. Lange, G. Cavallaro, M. Goetz, E. Erlingsson, M. Riedel, ‘The Influence of Sampling Methods on Pixel-Wise Hyperspectral Image Classification with 3D Convolutional Neural Networks’, Proceedings of the IGARSS 2018 Conference, to appear

[3] E. Erlingsson, G. Cavallaro, M. Riedel, H. Neukirchen, ‘Scaling Support Vector Machines Towards Exascale Computing for Classification of Large-Scale High-Resolution Remote Sensing Images’, Proceedings of the IGARSS 2018 Conference, to appear

[4] C. Cortes and V. Vapnik, ‘Support-vector networks’, Machine Learning, vol. 20(3), pp. 273–297, 1995

[5] M. Riedel, ‘Deep Learning using a Convolutional Neural Network’, Ghent University, Invited YouTube Tutorial,

Online: https://www.youtube.com/watch?v=gOL1_YIosYk&list=PLrmNhuZo9sgZUdaZ-f6OHK2yFW1kTS2qF

[6] M. Riedel, ‘Introduction to Machine Learning Algorithms’, Ghent University, Invited YouTube Tutorial,

Online: https://www.youtube.com/watch?v=KgiuUZ3WeP8&list=PLrmNhuZo9sgbcWtMGN0i6G9HEvh08JG0J

[7] G. Cavallaro, N. Falco, M. Dalla Mura and J. A. Benediktsson, ‘Automatic Attribute Profiles’, IEEE Transactions on Image Processing, vol. 26, no. 4, pp. 1859-1872, April 2017,

Online: http://ieeexplore.ieee.org/document/7842555/

[8] YouTube Video, ‘The Deep Learning Revolution’,

Online: https://www.youtube.com/watch?v=Dy0hJWltsyE

[9] Keras Python Deep Learning Library,

Online: https://keras.io/


REFERENCES (2)

[10] TensorFlow Deep Learning Framework,

Online: https://www.tensorflow.org/

[11] A Tour of TensorFlow,

Online: https://arxiv.org/pdf/1610.01178.pdf

[12] K. Hwang, G. C. Fox, J. J. Dongarra, ‘Distributed and Cloud Computing’, Book,

Online: http://store.elsevier.com/product.jsp?locale=en_EU&isbn=9780128002049

[13] Big Data Tips, ‘What is a Tensor?’,

Online: http://www.big-data.tips/what-is-a-tensor

[14] J. Donahue et al., ‘Decaf: A deep convolutional activation feature for generic visual recognition’, unpublished paper, 2013,

Online: http://arxiv.org/abs/1310.1531

[15] D. Marmanis et al., ‘Deep Learning Earth Observation Classification Using ImageNet Pretrained Networks’, IEEE Geoscience and Remote Sensing Letters, Volume 13 (1), 2016,

Online: http://ieeexplore.ieee.org/document/7342907/

[16] P. Sermanet et al., ‘OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks’,

Online: http://arxiv.org/abs/1312.6229


REFERENCES (3)

[17] M. Goetz, C. Bodenstein, M. Riedel, ‘HPDBSCAN – Highly Parallel DBSCAN’, in Proceedings of the ACM/IEEE International Conference for High Performance Computing, Networking, Storage, and Analysis (SC2015), Machine Learning in HPC Environments (MLHPC) Workshop, 2015,

Online: http://dx.doi.org/10.1145/2834892.2834894

[18] J. Dean et al., ‘Large-Scale Deep Learning’, Keynote GPU Technical Conference, 2015

[19] ImageNet Web page,

Online: http://image-net.org


ACKNOWLEDGEMENTS Previous & current members of the High Productivity Data Processing Research Group

Thesis Completed

PD Dr. G. Cavallaro

Dr. M. Goetz (now KIT)

Thesis Completed

Senior PhD Student A.S. Memon

Senior PhD Student M.S. Memon

MSc M. Richerzhagen

Thesis Completed

MSc P. Glock (now INM-1)

DEEP Learning Startup

MSc C. Bodenstein (now Soccerwatch.tv)

PhD Student E. Erlingsson

PhD Student S. Bakarat

Starting in Fall 2018

MSc Student G.S. Guðmundsson (Landsverkjun)

THANKS Talk shortly available under www.morrisriedel.de
