building deep learning powered big data: spark summit east talk by jiao wang and yiheng wang

21
Building Deep Learning Powered Big Data Analytics using BigDL Yiheng Wang, JennieWang BDT / SSG / Intel

Upload: spark-summit

Post on 12-Apr-2017

200 views

Category:

Data & Analytics


1 download

TRANSCRIPT

Intel® Confidential — INTERNAL USE ONLY

Building Deep Learning PoweredBig Data Analytics using BigDLYiheng Wang, Jennie Wang

BDT / SSG / Intel

2

What is BigDL?

BigDL is a distributed deep learning library for Apache Spark*

3

Big data boost deep learning Production ML/DL system is Complex

Why BigDL?

Andrew NG, Baidu, NIPS 2015 Paper

4

Why BigDL?

BigDL open sourced on Dec 30, 2016

§ Write deep learning applications as standard Spark programs

§ Run on top of existing Spark or Hadoop clusters(No change to the clusters)

§ Rich deep learning support

§ High performance powered by Intel MKL and multi-threaded programming

§ Efficient scale-out with an all-reduce communications on Spark

5

usage and examples of bigdl

6

Fraud Transaction Detection

Fraud transaction detection is very import to finance companies. A good fraud detection solution can save a lot of money.

ML solution challenge§ Data cleaning§ Feature engineering§ Unbalanced data § Hyper parameter

7

Fraud Transaction Detection

§ History data is stored on Hive

§ Easily data preprocess/cleaning with Spark-SQL

§ Spark ML pipeline for complex feature engineering

§ Under sample + Bagging solve unbalance problem

§ Grid search for hyper parameter tuning

Powered by BigDL

8

Product Defect Detection and Classification

Data source

§ Cameras installed on manufactory pipeline

Task

§ Detect defect from the photos

§ Classify the defect

9

Product Defect Detection and Classification

(KeyStone ML Pipeline)

10

Object Detection on PASCAL(http://host.robots.ox.ac.uk/pascal/VOC/)

11

Fast-RCNN

§ Faster-RCNN is a popular object detection framework

§ It share the features between detection network and region proposal network

Ren, Shaoqing, et al. "Faster r-cnn: Towards real-time object detection with region proposal networks." Advances in neural information processing systems. 2015.

12

Object Detection with Fast-RCNN

See the code at: https://github.com/intel-analytics/BigDL/pull/387

13

Language Model with RNN

TextPreprocessing

RNNModelTraining

SentenceGenerating

§ Sentence Tokenizer

§ Dictionary Building

§ Input Document Transformer

Generated sentences with regard to trigger words.

14

RNN Model

See the code at: https://github.com/intel-analytics/BigDL/tree/master/dl/src/main/scala/com/intel/analytics/bigdl/models/rnn

15

Learn from Shakespeare Poems Output of RNN:

Long live the King . The King and Queen , and the Strange of the Veils of the rhapsodic . and grapple, and the entreatments of the pressure .

Upon her head , and in the world ? `` Oh, the gods ! O Jove ! To whom the king : `` O friends !

Her hair, nor loose ! If , my lord , and the groundlings of the skies . jocund and Tasso in the Staggering of the Mankind . and

16

Fine-tune Caffe/Torch Model on Spark

BigDL Model

Fine-tune

Melancholy

Sunny

Macro

CaffeModel

BigDLModel

TorchModel

Load

• Train on different dataset based on pre-trained model• Predict image style instead of type• Save training time and improve accuracy

Image source: https://www.flickr.com/photos/

17

Accuracy increases 10% and converge time decreases.

Fine-tune Caffe/Torch Model on Spark

18

Integration with Spark Streaming

Spark Streaming RDDs

EvaluatorBigDLModel

StreamWriter

BigDL integarates with Spark Streaming for runtime training and prediction

HDFS/S3

Kafka

Flume

Kinesis

Twitter

Train

Predict

19

Tight Integration with SparkSQLand DataFrames

df.select($’image’).withColumn(

“image_type”, ImgClassifier(“image”))

.filter($’image_type’ == ‘dog’)

.show()

Image classification on ImageNet(http://www.image-net.org)

20

More BigDL Examples

BigDL provide examples to help developer play with bigdl and start with popular models.

https://github.com/intel-analytics/BigDL/wiki/Examples

Models(Train and Inference example code):§ LeNet, Inception, VGG, ResNet, RNN, Auto-encoder

Examples:

• Text Classification

• Image Classification

• Load Torch/Caffe model