Hadoop Summit 2014: Distributed Deep Learning
DESCRIPTION
Deep Learning on Hadoop with DeepLearning4j and Metronome. Deep learning is useful for detecting anomalies such as fraud, spam, and money laundering; identifying similarities to augment search and text analytics; predicting customer lifetime value and churn; and recognizing faces and voices. Deeplearning4j is a scalable deep-learning architecture suitable for Hadoop and other big-data systems. It includes both a distributed deep-learning framework and a conventional single-machine one, so it also runs on a single thread. Training takes place on the cluster, which means it can process massive amounts of data. Nets are trained in parallel via iterative reduce and work equally well from Java, Scala, and Clojure. The distributed framework is built for data input and neural-net training at scale, and its output is meant to be highly accurate predictive models. The framework's neural nets include restricted Boltzmann machines, deep-belief networks, deep autoencoders, convolutional nets, and recursive neural tensor networks.
TRANSCRIPT
Deep Learning on Hadoop
Scaleout Deep Learning on YARN
Adam Gibson
Email: [email protected]
Twitter: @agibsonccc
Github: https://github.com/agibsonccc
Slideshare: http://slideshare.net/agibsonccc/
Instructor at http://zipfianacademy.com/
Wired coverage: http://www.wired.com/2014/06/skymind-deep-learning/
Josh Patterson
Email: [email protected]
Twitter: @jpatanooga
Github: https://github.com/jpatanooga
Past
Published in IAAI-09:
“TinyTermite: A Secure Routing Algorithm”
Grad work in meta-heuristics and ant algorithms
Tennessee Valley Authority (TVA)
Hadoop and the Smartgrid
Cloudera
Principal Solution Architect
Today: Patterson Consulting
Overview
• What is Deep Learning?
• Deep Belief Networks
• Implementation on Hadoop/YARN
• Results
What is Deep Learning?
What is Deep Learning?
An algorithm that learns simple features in lower layers
and more complex features in higher layers
Interesting Properties of Deep Learning
Reduces the overfitting problems common in neural networks.
Introduces new techniques for "unsupervised feature learning."
Introduces more automatic ways to discover which parts of your data should be fed into the learning algorithm.
Chasing Nature
Learning sparse representations of auditory signals
leads to filters that closely correspond to neurons in early audio processing in mammals
When applied to speech
Learned representations showed a striking resemblance to the cochlear filters in the auditory cortex
Yann LeCun on Deep Learning
Has become the dominant method for acoustic modeling in speech recognition
Quickly becoming the dominant method for several vision tasks such as
object recognition
object detection
semantic segmentation.
Deep Belief Networks
What is a Deep Belief Network?
Generative probabilistic model
Composed of one visible layer
Many hidden layers
Each hidden layer learns the relationships between units in the layer below
Higher-layer representations tend to become more complex
Restricted Boltzmann Machines
Unsupervised model: does feature learning by repeatedly sampling the input data, learning to reconstruct the data well enough for good feature detection. RBMs use different formulas for different kinds of data (see the sketch after this list):
Binary
Continuous
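To make the sampling idea concrete, here is a minimal sketch of one contrastive-divergence (CD-1) step using jblas, the matrix library DL4J builds on. The class, fields, and learning rate are illustrative assumptions, not DL4J's actual API; the "binary" flag marks where the formulas diverge for binary versus continuous (Gaussian) visible units.

    // Minimal CD-1 sketch with jblas; names and hyperparameters are illustrative.
    import org.jblas.DoubleMatrix;
    import org.jblas.MatrixFunctions;

    public class RbmSketch {
        DoubleMatrix W;      // weights: nVisible x nHidden
        DoubleMatrix vBias;  // visible bias: 1 x nVisible
        DoubleMatrix hBias;  // hidden bias: 1 x nHidden
        double lr = 0.01;    // learning rate (assumed value)

        RbmSketch(int nVisible, int nHidden) {
            W = DoubleMatrix.randn(nVisible, nHidden).mul(0.01);
            vBias = DoubleMatrix.zeros(1, nVisible);
            hBias = DoubleMatrix.zeros(1, nHidden);
        }

        static DoubleMatrix sigmoid(DoubleMatrix x) {
            return MatrixFunctions.exp(x.neg()).add(1.0).rdiv(1.0); // 1 / (1 + e^-x)
        }

        // One CD-1 update on a batch of inputs (one example per row).
        void update(DoubleMatrix v0, boolean binary) {
            // Positive phase: hidden probabilities given the data, then sample.
            DoubleMatrix h0 = sigmoid(v0.mmul(W).addRowVector(hBias));
            DoubleMatrix h0Sample = h0.gt(DoubleMatrix.rand(h0.rows, h0.columns));
            // Reconstruction: binary units pass through a sigmoid; continuous
            // (Gaussian) units use the linear activation directly.
            DoubleMatrix vAct = h0Sample.mmul(W.transpose()).addRowVector(vBias);
            DoubleMatrix v1 = binary ? sigmoid(vAct) : vAct;
            // Negative phase: hidden probabilities given the reconstruction.
            DoubleMatrix h1 = sigmoid(v1.mmul(W).addRowVector(hBias));
            // Update: difference of correlations, averaged over the batch.
            DoubleMatrix grad = v0.transpose().mmul(h0).sub(v1.transpose().mmul(h1));
            W.addi(grad.mul(lr / v0.rows));
            vBias.addi(v0.sub(v1).columnMeans().mul(lr));
            hBias.addi(h0.sub(h1).columnMeans().mul(lr));
        }
    }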
DeepLearning4J
Implementation in Java
Self-contained & built on Akka, Hazelcast, Jblas
Distributed to run faster and with more features than current Theano-based implementations.
Talks to any data source, expects one format.
Vectorized Implementation
Handles lots of data concurrently.
Any number of examples at once; the code does not change (see the sketch below).
Faster: Allows for native/GPU execution.
One format: Everything is a matrix.
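A minimal sketch of the idea in jblas: a batch of examples is a single matrix (one row per example), so one matrix multiply activates the whole batch, and the same code handles 1 or 10,000 examples. The sizes and names below are illustrative assumptions, not DL4J's API.

    import org.jblas.DoubleMatrix;
    import org.jblas.MatrixFunctions;

    public class VectorizedForward {
        public static void main(String[] args) {
            int batch = 128, nIn = 784, nOut = 500;          // assumed sizes
            DoubleMatrix X = DoubleMatrix.rand(batch, nIn);  // whole batch = one matrix
            DoubleMatrix W = DoubleMatrix.randn(nIn, nOut);  // layer weights
            DoubleMatrix b = DoubleMatrix.zeros(1, nOut);    // layer bias

            // One matrix multiply activates every example in the batch;
            // changing the batch size does not change this code.
            DoubleMatrix preAct = X.mmul(W).addRowVector(b);
            DoubleMatrix H = MatrixFunctions.exp(preAct.neg()).add(1.0).rdiv(1.0); // sigmoid

            System.out.println("activations: " + H.rows + " x " + H.columns);
        }
    }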
DL4J vs Theano Perf
GPUs are typically much faster than native CPU code for dense matrix math.
Theano is not distributed, and GPUs have comparatively little RAM.
DL4J allows for situations where you have to “throw CPUs at it.”
What are Good Applications for Deep Learning?
Image Processing
High MNIST Scores
Audio Processing
Current Champ on TIMIT dataset
Text / NLP Processing
Word2vec, etc
Deep Learning on Hadoop
Past Work: Parallel Iterative Algorithms on YARN
Started with
Parallel linear, logistic regression
Parallel Neural Networks
Packaged in Metronome
100% Java, ASF 2.0 Licensed, on github
Parameter Averaging
McDonald (2010), "Distributed Training Strategies for the Structured Perceptron"
Langford (2007), Vowpal Wabbit
Jeff Dean's work on parallel SGD: Downpour SGD
MapReduce vs. Parallel Iterative
[Diagram: a single-pass MapReduce flow (Input -> Map -> Reduce -> Output) contrasted with a parallel iterative, BSP-style flow, in which a group of processors runs repeated supersteps (Superstep 1, Superstep 2, ...) over the same data.]
SGD: Serial vs Parallel
[Diagram: the training data is divided into splits; Worker 1 through Worker N each train a partial model on their own split, and a master averages the partial models into a global model.]
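A minimal sketch of one averaging round of the pattern in the diagram, with a jblas matrix standing in for the model parameters. The Worker interface and method names are illustrative assumptions, not Metronome's actual API; in practice the workers run in parallel as YARN containers rather than in a loop.

    import org.jblas.DoubleMatrix;
    import java.util.List;

    public class ParameterAveraging {
        // Each worker trains a copy of the model on its own data split.
        interface Worker {
            DoubleMatrix trainOneEpoch(DoubleMatrix globalParams);
        }

        // One round: broadcast the global model, train on each split, average.
        static DoubleMatrix averagingRound(DoubleMatrix global, List<Worker> workers) {
            DoubleMatrix sum = DoubleMatrix.zeros(global.rows, global.columns);
            for (Worker w : workers) {                                // parallel on YARN in practice
                DoubleMatrix partial = w.trainOneEpoch(global.dup()); // worker gets a copy
                sum.addi(partial);                                    // accumulate partial models
            }
            return sum.div(workers.size());                           // averaged global model
        }
    }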
Managing Resources
Running through YARN on Hadoop is important
Allows for workflow scheduling
Allows for scheduler oversight
Allows jobs to be first-class citizens on Hadoop
and to share resources nicely
Parallelizing Deep Belief Networks
Two-phase training:
Pre-train
Fine-tune
Each phase can make multiple passes over the dataset
The entire network is averaged at the master (see the sketch below)
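A minimal sketch of that two-phase schedule, reusing the averagingRound helper from the parameter-averaging sketch above. The pass counts and the assumption that the same workers handle both phases are simplifications for illustration, not Metronome's actual API.

    import org.jblas.DoubleMatrix;
    import java.util.List;

    public class TwoPhaseTraining {
        static DoubleMatrix train(DoubleMatrix model,
                                  List<ParameterAveraging.Worker> workers,
                                  int pretrainPasses, int finetunePasses) {
            // Phase 1: unsupervised pre-training (e.g. RBM updates on each worker),
            // averaging the whole network at the master after every pass.
            for (int i = 0; i < pretrainPasses; i++)
                model = ParameterAveraging.averagingRound(model, workers);
            // Phase 2: supervised fine-tuning (e.g. backprop on labeled data),
            // again averaging at the master after every pass.
            for (int i = 0; i < finetunePasses; i++)
                model = ParameterAveraging.averagingRound(model, workers);
            return model;
        }
    }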
PreTrain and Lots of Data
We're exploring how to better leverage the unsupervised aspects of the pre-train phase of deep belief networks
Allows models to be built with far less labeled data
Allows us to more easily model the massive amounts of structured data in HDFS
Results
DBN Performance on Iterative Reduce (IR)
Faster to train.
Parameter averaging is an automatic form of regularization.
AdaGrad with iterative reduce allows for better generalization across different features and more even pacing.
Scale Out Metrics
Batches of records can be processed by as many workers as there are data splits
Message passing overhead is minimal
Exhibits linear scaling
Example: 3x workers, 3x faster learning
Usage From Command Line
Run Deep Learning on Hadoop
yarn jar iterativereduce-0.1-SNAPSHOT.jar [props file]
Evaluate model
./score_model.sh [props file]
Handwriting Renders
Faces Renders
…In Which We Gather Lots of Cat Photos
Future Direction
GPUs
Better Vectorization tooling
Move YARN version back over to JBLAS for matrices
References
“A Fast Learning Algorithm for Deep Belief Nets”
Hinton, G. E., Osindero, S. and Teh, Y. - Neural Computation (2006)
“Large Scale Distributed Deep Networks”
Dean, Corrado, Monga - NIPS (2012)
“Visually Debugging Restricted Boltzmann Machine Training with a 3D Example”
Yosinski, Lipson - Representation Learning Workshop (2012)