Deep Learning and Recurrent Neural Networks in the Enterprise
StampedeCon, St. Louis 2016
Josh Patterson, Skymind
Presenter: Josh Patterson
Past
Research in Swarm Algorithms: Real-time optimization techniques in mesh sensor networks
TVA / NERC: Smartgrid, Sensor Collection, and Big Data
Cloudera: Principal SA, Working with Fortune 500
Patterson Consulting: Working with Fortune 500 on Big Data, ML
Today
Skymind, Director Field Engineering
[email protected] / @jpatanooga
DL4J Co-creator
Co-author of the upcoming O'Reilly book "Deep Learning: A Practitioner's Approach"
Topics
• What is Deep Learning?
• DL4J
• Recurrent Neural Network Applications
WHAT IS DEEP LEARNING?
Defining Deep Learning
• Higher neuron counts than in previous generation neural networks
• Different and evolved ways to connect layers inside neural networks
• More computing power to train
• Automated feature learning
Automated Feature Learning
• Deep Learning can be thought of as workflows for automated feature construction
– From "feature construction" to "feature learning"
• As Yann LeCun says:
– "machines that learn to represent the world"
These are the features learned at each neuron in a Restricted Boltzmann Machine (RBM)
These features are passed to higher levels of RBMs to learn more complicated things.
Part of the “7” digit
Unreasonable Effectiveness: Benchmark Records
1. Text-to-speech synthesis (Fan et al., Microsoft, Interspeech 2014)
2. Language identification (Gonzalez-Dominguez et al., Google, Interspeech 2014)
3. Large vocabulary speech recognition (Sak et al., Google, Interspeech 2014)
4. Prosody contour prediction (Fernandez et al., IBM, Interspeech 2014)
5. Medium vocabulary speech recognition (Geiger et al., Interspeech 2014)
6. English to French translation (Sutskever et al., Google, NIPS 2014)
7. Audio onset detection (Marchi et al., ICASSP 2014)
8. Social signal classification (Brueckner & Schulter, ICASSP 2014)
9. Arabic handwriting recognition (Bluche et al., DAS 2014)
10. TIMIT phoneme recognition (Graves et al., ICASSP 2013)
11. Optical character recognition (Breuel et al., ICDAR 2013)
12. Image caption generation (Vinyals et al., Google, 2014)
13. Video to textual description (Donahue et al., 2014)
14. Syntactic parsing for Natural Language Processing (Vinyals et al., Google, 2014)
15. Photo-real talking heads (Soong and Wang, Microsoft, 2014)
Four Major Architectures
• Deep Belief Networks
• Convolutional Neural Networks
• Recurrent Neural Networks
• Recursive Neural Networks
Quick Usage Guide
• If I have timeseries or audio input
– I should use a Recurrent Neural Network
– Examples: fraud detection, anomaly detection
• If I have image input
– I should use a Convolutional Neural Network
• If I have video input
– I should use a hybrid Convolutional + Recurrent architecture
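The guide above can be condensed into a small lookup. This is purely illustrative (the names are my own, not part of any DL4J API):

```python
# Illustrative mapping of input modality to a suggested architecture,
# summarizing the usage guide above. Not part of DL4J.
ARCHITECTURE_FOR = {
    "timeseries": "Recurrent Neural Network",
    "audio": "Recurrent Neural Network",
    "image": "Convolutional Neural Network",
    "video": "Hybrid Convolutional + Recurrent Network",
}

def suggest_architecture(modality: str) -> str:
    """Return the suggested architecture for a given input modality."""
    return ARCHITECTURE_FOR[modality.lower()]

print(suggest_architecture("Video"))  # Hybrid Convolutional + Recurrent Network
```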
Convolutional Generated Art
The More Things Change…
• Deep Learning is still trying to answer the same fundamental questions such as:– “is this image a face?”
• The difference is that Deep Learning makes hard questions easier to answer, with better architectures and more computing power
– We do this by matching the correct architecture with the right problem
Building Deep Neural Networks with DL4J
DL4J
• "The Hadoop of Deep Learning"
– Java, Scala, and Python APIs
– ASF 2.0 licensed
• Java implementation
– Parallelization (YARN + Spark)
– GPU support
• Also supports multiple GPUs per host
• Runtime neutral
– Local
– Hadoop / YARN + Spark
• https://github.com/deeplearning4j/deeplearning4j
DL4J Workflow Toolchain
ETL (DataVec) → Vectorization (DataVec) → Modeling (DL4J) → Evaluation (Arbiter)
Execution platforms: Spark, single machine
ND4J linear algebra runtime: CPU, GPU
ND4J: The Need for Speed
• JavaCPP (like Cython, but for Java)
– Auto-generates JNI bindings for C++ by parsing classes
– Allows easy maintenance and deployment of C++ binaries in Java
• CPU backends
– OpenMP (multithreading within native operations)
– OpenBLAS or MKL (BLAS operations)
– SIMD extensions
• GPU backends
– DL4J supports CUDA 7.5 at the moment, and will support 8.0 as soon as it comes out
– Leverages cuDNN as well
Prepping Data is Time Consuming
http://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/#633ea7f67f75
Preparing Data for Modeling is Hard
DataVec
• DataVec is a tool for machine learning ETL (Extract, Transform, Load) operations
– Spark-enabled and focused on supporting DL4J
• Also performs vectorization
– Image, CSV, sequences (timeseries), and more
• Open source, ASF 2.0 licensed
– https://github.com/deeplearning4j/DataVec
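To make the vectorization step concrete, here is a conceptual pure-Python sketch of what it does: turning raw CSV-style records into fixed-length numeric vectors (numeric fields pass through, categorical fields get one-hot encoded). DataVec does this at scale on Spark; this sketch only illustrates the idea and uses no DataVec API:

```python
# Conceptual sketch of vectorization: raw records -> fixed-length
# numeric vectors, ready for a neural network. Illustrative only.
def one_hot(value, categories):
    """One-hot encode a categorical value against a known category list."""
    return [1.0 if value == c else 0.0 for c in categories]

def vectorize(record, categories):
    """record = (numeric_field, categorical_field) -> flat float vector."""
    numeric, categorical = record
    return [float(numeric)] + one_hot(categorical, categories)

cats = ["telecom", "finance", "retail"]
rows = [("3.5", "finance"), ("1.0", "retail")]
vectors = [vectorize(r, cats) for r in rows]
print(vectors)  # [[3.5, 0.0, 1.0, 0.0], [1.0, 0.0, 0.0, 1.0]]
```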
RECURRENT NEURAL NETWORK APPLICATIONS
Using DL4J for
Source: IDC White Paper - sponsored by EMC. As the Economy Contracts, the Digital Universe Expands. May 2009.
Transactional Data Explosion
• 2,500 exabytes of new information in 2012, with the Internet as the primary driver
• The digital universe grew by 62% last year to 800K petabytes, and will grow to 1.2 zettabytes this year
[Chart: relational vs. transactional (logs, sensors) data growth]
NERC Sensor Data Collection
openPDC PMU Data Collection, circa 2009
• 120 sensors
• 30 samples/second
• 4.3B samples/day
• Housed in Hadoop
Sensor Timeseries Classification with RNNs
• Recurrent Neural Networks have the ability to model change of input over time
• Older techniques (mostly) do not retain the time domain
– Hidden Markov Models do…
• but are more limited
• Key takeaway:
– For working with timeseries data, RNNs will be more accurate
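The key property claimed above, that an RNN models change of input over time, comes from its hidden state, which carries information about earlier inputs forward. A minimal vanilla-RNN step in pure Python (toy fixed weights, not trained values) shows this:

```python
import math

# Minimal vanilla-RNN step: h_t = tanh(w_h * h_{t-1} + w_x * x_t + b).
# Toy scalar weights, purely to illustrate the recurrence.
def rnn_step(h_prev, x, w_h=0.5, w_x=1.0, b=0.0):
    return math.tanh(w_h * h_prev + w_x * x + b)

def run_sequence(xs):
    """Feed a sequence through the recurrence and return the final hidden state."""
    h = 0.0
    for x in xs:
        h = rnn_step(h, x)
    return h

# Two sequences ending in the same input but with different histories end
# in different hidden states -- the network "remembers" the past.
print(run_sequence([0.0, 0.0, 1.0]))
print(run_sequence([1.0, 1.0, 1.0]))
```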
RNN Architectures
• Standard supervised learning
• Image captioning
• Sentiment analysis
• Video captioning, natural language translation
• Part-of-speech tagging
• Generative models for text
Anomaly Detection
• Model the normal patterns in the data
• Autoencoders give us the ability to look at data the model hasn't seen before
– Find anomalous patterns in sequences
– Can also use RNNs for pattern classification
• Interesting industry applications
– Telecom
– Financial services
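The scoring logic behind autoencoder-based anomaly detection is reconstruction error: inputs that look like the training data reconstruct well, anomalies do not. A real autoencoder is a trained neural network; this pure-Python sketch substitutes a trivial stand-in "model" (reconstruct every window as the mean profile of the normal training windows), which is enough to show the thresholding idea:

```python
# Sketch of reconstruction-error anomaly scoring. The "model" here is a
# stand-in (mean of normal windows), not a trained autoencoder.
def mean_profile(windows):
    """Element-wise mean of equal-length training windows (the 'normal' pattern)."""
    n = len(windows)
    return [sum(w[i] for w in windows) / n for i in range(len(windows[0]))]

def reconstruction_error(window, profile):
    """Mean squared error between a window and its 'reconstruction'."""
    return sum((a - b) ** 2 for a, b in zip(window, profile)) / len(window)

def is_anomalous(window, profile, threshold):
    return reconstruction_error(window, profile) > threshold

normal = [[1.0, 2.0, 1.0], [1.1, 2.1, 0.9], [0.9, 1.9, 1.1]]
profile = mean_profile(normal)
print(is_anomalous([1.0, 2.0, 1.0], profile, 0.5))   # False
print(is_anomalous([9.0, -3.0, 7.0], profile, 0.5))  # True
```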
Audio Applications
• Text-to-speech
• Recognize specific songs / audio
• Enables natural language interfaces
"Google is living a few years in the future and sending the rest of us messages"
-- Doug Cutting in 2013
• However
– Most organizations are not built like Google
• (and Jeff Dean does not work at your company…)
• Anyone building Next-Gen infrastructure has to consider these things
Certified on Two Hadoop Distributions
• Running Spark on Hadoop via YARN gives us
– Sharing cluster resources between heterogeneous workloads concurrently
– Access to the YARN scheduler's capabilities
– Better control of executors in Spark
– Kerberos support for security
• Certified on CDH 5.4
• Certified on HDP 2.4
– [ Coming later this month ]
Questions?
Thank you for your time and attention
"Deep Learning: A Practitioner's Approach" (O'Reilly, October 2016)
Running DL4J Workflows on Spark
• DataVec is built to scale out via Spark RDDs
– RDD<LabeledPoint>
– RDD<DataSet>
• DL4J uses the same MultiLayerConfiguration as the single-host version
– Uses SparkDl4jMultiLayer to drive training on Spark
– Performs parameter averaging
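Parameter averaging, mentioned above, is the distributed-training scheme: each Spark worker trains a model replica on its data shard, then the master averages the replicas' parameters into one model. This pure-Python sketch averages flat parameter vectors; DL4J performs the equivalent operation over ND4J arrays:

```python
# Sketch of parameter averaging across Spark workers. Illustrative only:
# real parameters are large ND4J arrays, shown here as short lists.
def average_parameters(worker_params):
    """Element-wise average of each worker's parameter vector."""
    n = len(worker_params)
    length = len(worker_params[0])
    return [sum(p[i] for p in worker_params) / n for i in range(length)]

workers = [
    [0.25, 0.50, 0.75],  # parameters after local training on shard 1
    [0.50, 0.25, 1.00],  # shard 2
    [0.00, 0.75, 0.50],  # shard 3
]
print(average_parameters(workers))  # [0.25, 0.5, 0.75]
```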
spark-submit --class io.skymind.spark.dl4j.datavec.BasicDataVecExample \
  --master yarn --num-executors 1 \
  --properties-file ./spark_extra.props \
  ./Skymind_spark-1.0-SNAPSHOT.jar