Hadoop Summit 2016
Posted on 15-Apr-2017
Page 1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Deep Learning using Spark and DL4J for fun and profit
Adam Gibson and Dhruv Kumar
2015, Version 1.0
Who are we?
Adam Gibson: co-founder of Skymind; wrote Deeplearning4j and ND4J
Dhruv Kumar: Sr. Solutions Architect, Hortonworks (HWX); MS, UMass; Mahout, ASF
In this talk
- What’s Deep Learning?
- Architectures
- Implementation and Libraries in Real Life
- Demo!
Deep Learning
• One of the many pattern recognition techniques in Data Science
• Excels at rich media applications:
  • Image recognition
  • Speech translation
  • Voice recognition
• Loosely inspired by models of the human brain
• Synonymous with artificial neural networks, specifically multi-layer (deep) networks
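To make "multi-layer network" concrete, here is a minimal sketch in plain Python (not DL4J) of a forward pass through a small fully connected network: each layer computes a weighted sum plus a bias and applies a sigmoid, and the output of one layer is the input of the next. The 2-3-1 layer sizes and all weights are made-up, untrained values chosen only for illustration.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Vector = List[float]
Layer = Tuple[Matrix, Vector]  # (weights, bias); weights has one row per output unit


def dense(weights: Matrix, bias: Vector, x: Sequence[float]) -> Vector:
    """Fully connected layer: output[i] = sum_j weights[i][j] * x[j] + bias[i]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def sigmoid(v: Sequence[float]) -> Vector:
    return [1.0 / (1.0 + math.exp(-z)) for z in v]


def forward(layers: Sequence[Layer], x: Sequence[float]) -> Vector:
    """Feed x through every layer in turn, applying a sigmoid after each one."""
    activation = list(x)
    for weights, bias in layers:
        activation = sigmoid(dense(weights, bias, activation))
    return activation


# Hypothetical, untrained 2-3-1 network: 2 inputs, 3 hidden units, 1 output.
layers: List[Layer] = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),
    ([[0.7, -0.5, 0.2]], [0.05]),
]
print(forward(layers, [1.0, 0.5]))
```

Stacking more `(weights, bias)` pairs in `layers` is all it takes to make the network "deeper"; training those weights is a separate step.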
Enterprise use cases
Doing this in real life for enterprise
Modern Data Applications in Enterprise: Connected, Fast, Intelligent
[Diagram: HDF for data in motion (perishable insights) and HDP for data at rest (historical insights) feed modern data apps with actionable intelligence across the Internet of Anything]
How do we realize MDA in a Hadoop-centric world?
[Diagram: streaming sources (raw network stream, network metadata stream, syslog, raw application logs, other streaming telemetry) flow through HDF into Hadoop (HDFS, HBase, Hive, Solr, YARN, Storm, Spark), alongside service management/workflow, SIEM and other data stores]
[Diagram: cluster layout with sources 1 to N, NiFi nodes (HDF), edge nodes, clients, master nodes 1 to 5, and worker nodes (HDP) running DataNodes 1 to 32, HBase, Storm and Kafka]
Detailed Reference Architecture
[Diagram: source data (server logs, application logs, firewall logs, CRM/ERP, sensors) enters via high-speed ingest (HDF, Flume, Sqoop, Kafka) and feeds three paths: real-time (Kafka forwarding to Storm/Spark Streaming, HBase/Phoenix event enrichment and real-time storage, Storm bolts to HDFS, JMS alerts, dashboards in Silk); batch and interactive (HDFS, Hive, Pig, Spark, with HiveServer and Spark-Thrift serving reporting, BI tools and an interactive UI framework); and machine learning (iterative ML with Spark-ML producing machine learning models)]
For Model Building: Typical Workflow
1. Ingest training data and store it
2. Split the data set into training, testing and validation sets
3. Vectorize and extract features to go into the next step
4. Architect a multi-layer network and initialize it
5. Feed data and train
6. Test and validate
7. Repeat steps 4 and 5 until the results are satisfactory
8. Store the model
9. Put the model in an app and start generalizing on real data
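Step 2 is easy to get subtly wrong, for example with an unshuffled or non-reproducible split. A minimal plain-Python sketch, assuming a 70/15/15 split and a fixed random seed; the function name `split_dataset` and the default ratios are illustrative choices, not part of any library mentioned in the talk.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(records: Sequence[T], train_frac: float = 0.7, test_frac: float = 0.15,
                  seed: int = 42) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle reproducibly, then split into (train, test, validation).

    Whatever remains after the train and test portions becomes the validation set.
    """
    if train_frac <= 0 or test_frac < 0 or train_frac + test_frac > 1:
        raise ValueError("need train_frac > 0, test_frac >= 0 and a sum of at most 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> same split on every run
    n_train = round(len(shuffled) * train_frac)
    n_test = round(len(shuffled) * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]
    return train, test, validation


train, test, validation = split_dataset(range(100))
print(len(train), len(test), len(validation))
```

Keeping the seed fixed means every experiment in the "repeat until satisfactory" loop is evaluated on the same held-out records.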
So what do you get?
1. Ingest training data and store it using NiFi or other ingest tools
2. Split the data set into training, testing and validation sets
3. Vectorize and extract features to go into the next step
4. Architect a multi-layer network and initialize it
5. Feed data and train
6. Test and validate
7. Repeat steps 4 and 5 until the results are satisfactory
8. Store the model
9. Put the model in an app and start generalizing on real data
Steps 2, 3, 4 and 5: Use libraries such as Deeplearning4j
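The loop in steps 4 to 8 can be sketched without any library. This plain-Python example uses a single sigmoid unit (logistic regression, the simplest possible "network") trained with stochastic gradient descent, and stops once validation accuracy reaches an assumed target of 95%. The toy data generator `make_toy_data`, the learning rate, the target and the JSON model format are all hypothetical choices; in practice a library such as Deeplearning4j would take over these steps.

```python
import json
import math
import random
from typing import List, Sequence, Tuple

Example = Tuple[List[float], int]  # (feature vector, label 0 or 1)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Probability of class 1 from a single sigmoid unit."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def accuracy(w: Sequence[float], b: float, data: Sequence[Example]) -> float:
    hits = sum((predict(w, b, x) >= 0.5) == (y == 1) for x, y in data)
    return hits / len(data)


def train(train_set: Sequence[Example], val_set: Sequence[Example], lr: float = 0.5,
          target_acc: float = 0.95, max_epochs: int = 200, seed: int = 0):
    rng = random.Random(seed)
    w = [rng.uniform(-0.1, 0.1) for _ in train_set[0][0]]  # step 4: initialize
    b = 0.0
    epoch, val_acc = 0, 0.0
    for epoch in range(1, max_epochs + 1):        # step 7: repeat until good enough
        for x, y in train_set:                    # step 5: feed data and train (SGD)
            err = predict(w, b, x) - y            # d(log-loss)/dz for a sigmoid unit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
        val_acc = accuracy(w, b, val_set)         # step 6: validate
        if val_acc >= target_acc:
            break
    return w, b, epoch, val_acc


def make_toy_data(n: int, seed: int) -> List[Example]:
    """Hypothetical data: label 1 when x0 + x1 > 1; points near the boundary are skipped."""
    rng = random.Random(seed)
    data: List[Example] = []
    while len(data) < n:
        x = [rng.random(), rng.random()]
        margin = x[0] + x[1] - 1.0
        if abs(margin) >= 0.1:
            data.append((x, 1 if margin > 0 else 0))
    return data


train_set, val_set, test_set = make_toy_data(300, 1), make_toy_data(60, 2), make_toy_data(60, 3)
w, b, epochs, val_acc = train(train_set, val_set)
print(f"stopped after {epochs} epoch(s), validation accuracy {val_acc:.2f}")
print(f"test accuracy {accuracy(w, b, test_set):.2f}")   # step 6: final test
model_json = json.dumps({"weights": w, "bias": b})       # step 8: store the model
```

The test set is only touched once, after training stops, so the reported test accuracy is not biased by the stopping decision.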
Deeplearning4j Architecture
DL4J: Canova for Vectorization and Ingest
• Canova uses an input/output format system, similar to Hadoop MapReduce's InputFormat and OutputFormat
• Supports all major types of input data (text, CSV, audio, image and video)
• Can be extended for specialized input formats
• Connects to Kafka
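Vectorization here means turning raw records into numeric feature vectors plus a label that a network can consume. The sketch below illustrates the idea for CSV input in plain Python; it is not Canova's API. The helper name `read_csv_records`, the one-hot label encoding and the sample column names are assumptions for illustration.

```python
import csv
import io
from typing import Iterator, List, Tuple


def read_csv_records(text: str, label_index: int, num_classes: int,
                     skip_header: bool = True) -> Iterator[Tuple[List[float], List[float]]]:
    """Yield (features, one_hot_label) pairs from CSV text.

    Hypothetical helper, not Canova's API: every column except label_index is parsed
    as a float feature; the label column must hold an integer class id in [0, num_classes).
    """
    reader = csv.reader(io.StringIO(text))
    if skip_header:
        next(reader, None)
    for row in reader:
        if not row:
            continue  # skip blank lines
        label = int(row[label_index])
        if not 0 <= label < num_classes:
            raise ValueError(f"label {label} outside [0, {num_classes})")
        features = [float(v) for i, v in enumerate(row) if i != label_index]
        one_hot = [1.0 if c == label else 0.0 for c in range(num_classes)]
        yield features, one_hot


sample = "sepal_length,sepal_width,species\n5.1,3.5,0\n6.2,2.9,1\n"
for features, label in read_csv_records(sample, label_index=2, num_classes=3):
    print(features, label)
```

Because it is a generator, records are vectorized lazily, which matters when the input is a large file or a stream rather than a small string.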
ND4J:
• N-dimensional array library
• Scientific computing for the JVM
• DL4J uses it for the linear algebra behind backpropagation
• Supports GPUs via CUDA and native CPU execution via jblas
• Deploys on Android
• DL4J code remains unchanged whether running on GPU or CPU
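The linear algebra behind backpropagation is mostly matrix products and transposes. As a minimal sketch, with plain Python lists standing in for ND4J's n-dimensional arrays, the gradient of the squared-error loss 0.5 * ||X W - Y||^2 for a dense layer without bias is X^T (X W - Y); the code checks that formula against a finite-difference estimate. The layer, loss and numbers are assumptions chosen for illustration.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def subtract(a: Matrix, b: Matrix) -> Matrix:
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def loss(x: Matrix, w: Matrix, y: Matrix) -> float:
    """Squared error 0.5 * ||X W - Y||^2 of a dense layer without bias or activation."""
    residual = subtract(matmul(x, w), y)
    return 0.5 * sum(v * v for row in residual for v in row)


def grad_w(x: Matrix, w: Matrix, y: Matrix) -> Matrix:
    """Backpropagated gradient dL/dW = X^T (X W - Y)."""
    return matmul(transpose(x), subtract(matmul(x, w), y))


def numeric_grad(x: Matrix, w: Matrix, y: Matrix, i: int, j: int, eps: float = 1e-6) -> float:
    """Central finite-difference estimate of dL/dW[i][j], used only as a check."""
    w_plus = [row[:] for row in w]
    w_minus = [row[:] for row in w]
    w_plus[i][j] += eps
    w_minus[i][j] -= eps
    return (loss(x, w_plus, y) - loss(x, w_minus, y)) / (2 * eps)


x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # batch of 3 examples, 2 features each
w = [[0.1, -0.2], [0.3, 0.4]]             # 2 inputs -> 2 outputs
y = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # targets

analytic = grad_w(x, w, y)
max_diff = max(abs(analytic[i][j] - numeric_grad(x, w, y, i, j))
               for i in range(len(w)) for j in range(len(w[0])))
print(analytic, max_diff)
```

A library like ND4J runs the same products on optimized native or GPU backends, which is why the calling code does not need to change between CPU and GPU.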
How to choose a neural net in DL4J core?
Demo!
Thank You
hortonworks.com