Deep Learning Primer - A Brief Introduction
DESCRIPTION
Deep learning is receiving phenomenal attention due to breakthrough results in several AI tasks and significant research investment by top technology companies such as Google, Facebook, Microsoft, and IBM. For someone who has not been introduced to this technology, it may be daunting to learn several concepts such as feature learning, Restricted Boltzmann Machines, and autoencoders all at once and then start applying them to their own AI applications. This presentation is the first of several in this series, which is intended for practitioners.
TRANSCRIPT
Deep Learning Primer
Anantharaman Narayana Iyer, 7th June 2014
What is Deep Learning? Deep learning is a Machine Learning technique distinguished by 2 defining characteristics:
1. Deep Architecture • Multiple layers of learning. • Methodologies to train these layers that get close to the global optimum, alleviating the effect of local minima arising from a non-convex objective function.
2. Feature Learning (aka Representation Learning) • Traditional machine learning system design, such as Logistic Regression, involves manual feature design. In contrast, a deep learning system automatically learns the features from the input (see the sketch after the diagram below).
[Diagram: Automatic Feature Extraction: Input → Machine Learning System (learns Features) → Output]
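To make the contrast concrete, here is a minimal NumPy sketch; the feature functions, nonlinearities, and layer sizes are illustrative assumptions. A traditional classifier runs on hand-designed features, while a deep network consumes raw pixels and learns its features in the hidden layers.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Traditional design: a human chooses the features the classifier sees.
def manual_features(img):
    # Two hand-picked descriptors: mean brightness and a crude edge score
    return np.array([img.mean(), np.abs(np.diff(img, axis=0)).mean()])

def logistic_regression(img, w, b):
    return sigmoid(w @ manual_features(img) + b)

# Deep learning design: raw pixels go in; hidden layers learn the features.
def deep_net(img, layers):
    h = img.reshape(-1)                  # raw pixels, no manual features
    for W, b in layers[:-1]:
        h = np.tanh(W @ h + b)           # learned, increasingly abstract features
    W_out, b_out = layers[-1]
    return sigmoid(W_out @ h + b_out)
```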
Why is there phenomenal interest?
• Considered the next big thing in Machine Learning by several experts
• Breakthrough results reported in:
– Speech Recognition: the Microsoft Audio Video Indexing Service (MAVIS) reduced word error rates by about 30% on 4 major benchmarks
– Object Recognition: MNIST digit recognition with an error rate of 0.27%; successful image recognition by Google
– Natural Language Processing: the SENNA system reported state-of-the-art results in tasks like POS tagging, Chunking, and Named Entity Recognition
• Substantial investments in this technology recently by top technology companies
Building a deep learning system
• There are many ways to build a deep learning system; the defining characteristics are:
– Multiple layers, where each layer performs a nonlinear transformation of the output generated by its preceding layer
– Automatic feature learning, where the features are progressively more abstract
– Hierarchical in nature
• Broad approaches/categorizations:
– Unsupervised or generative models
– Supervised discriminative models
– Hybrid (use an unsupervised model as an aid to perform superior discrimination)
• Common building blocks for unsupervised and hybrid approaches (an RBM sketch follows this list; an autoencoder sketch accompanies the implementation steps later):
– Restricted Boltzmann Machines (RBMs)
– Autoencoders
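The slides introduce RBMs only by name; as a rough illustration, here is a minimal NumPy sketch of a Bernoulli-Bernoulli RBM trained with one step of contrastive divergence (CD-1). The layer sizes, learning rate, and random data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class RBM:
    """Bernoulli-Bernoulli RBM trained with 1-step contrastive divergence."""
    def __init__(self, n_visible, n_hidden, lr=0.1):
        self.W = rng.normal(0, 0.01, (n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)    # visible biases
        self.b_h = np.zeros(n_hidden)     # hidden biases
        self.lr = lr

    def cd1_step(self, v0):
        # Positive phase: hidden activations driven by the data
        p_h0 = sigmoid(v0 @ self.W + self.b_h)
        h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
        # Negative phase: one Gibbs step gives the model's "reconstruction"
        p_v1 = sigmoid(h0 @ self.W.T + self.b_v)
        p_h1 = sigmoid(p_v1 @ self.W + self.b_h)
        # Approximate gradient: <v h>_data - <v h>_model
        n = v0.shape[0]
        self.W   += self.lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / n
        self.b_v += self.lr * (v0 - p_v1).mean(axis=0)
        self.b_h += self.lr * (p_h0 - p_h1).mean(axis=0)

# Usage on placeholder binary data, e.g. thresholded image patches
X = (rng.random((100, 64)) > 0.5).astype(float)
rbm = RBM(n_visible=64, n_hidden=32)
for _ in range(10):
    rbm.cd1_step(X)
```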
Application Example
Problem: Suppose we need to build a deep learning system to detect whether a given digital image contains a human face. The inputs are the image pixels and the output is binary.
• We can think of the human face as composed of a few key facial constituents such as ears, eyes, and nose. These in turn can be thought of as contours with well-defined edges, which are themselves constituted by specific patterns of pixels.
• We think of this as generating edges from the input pixels, generating the facial aspects from the edges, and from those detecting a human face.
• The role of a hidden layer in this system is to perform a nonlinear transform of its inputs (a lower level of abstraction) and produce a more abstract output (e.g. generating a nose object from the given contours), as sketched after this list.
• Thus we progressively move up in abstraction, starting from raw pixels and ending up with a face object.
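Concretely, each hidden layer computes h = f(Wx + b) for some nonlinearity f. A minimal sketch, assuming a sigmoid nonlinearity; the weights W and bias b would be learned during training:

```python
import numpy as np

def hidden_layer(x, W, b):
    # Nonlinear transform: map the previous layer's output x (lower
    # abstraction) to a more abstract representation h = sigmoid(Wx + b)
    return 1.0 / (1.0 + np.exp(-(W @ x + b)))
```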
High level implementation steps
• Suppose we implement the given application as a deep neural network (sketched end-to-end in code after this list):
– Pixel values constitute the input layer
– A single output unit constitutes the output layer
– We will have 2 hidden layers
• We will use a stacked autoencoder as the basic building block.
– An autoencoder (AE) neural network learns, using unsupervised learning, to produce an output that is the same as its input. Thus, given pixel values x as input, the goal of the AE is to produce an output image that is the same as the input.
– As we have 2 hidden layers, we will require 2 AEs, say AE1 and AE2. We will create a bottleneck by having a smaller number of hidden units than input units.
• Layerwise pretraining:
– Train AE1 on the available images (which may or may not contain a human face) without supervision. The outputs of the hidden units of AE1 then constitute the "learnt" features at a level of abstraction higher than the input pixels (e.g. edges from pixels).
– Cascade the output of the hidden layer of AE1 into AE2 and train AE2 to learn more abstract features (e.g. facial components from edges).
• Add a logistic regression layer as the output layer and stack the 2 AEs and the output layer to constitute a neural network.
• Fine-tune this network using backpropagation with a smaller number of labeled images.
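A minimal end-to-end NumPy sketch of these steps, under stated assumptions: 8x8 images (64 pixels), hidden layers of 32 and 16 units, plain SGD on squared reconstruction error for pretraining, and cross-entropy for fine-tuning. The data, labels, layer sizes, and learning rates are all placeholders, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class Autoencoder:
    """One AE: reconstructs its input through a smaller (bottleneck) hidden layer."""
    def __init__(self, n_in, n_hidden):
        self.W1 = rng.normal(0, 0.1, (n_hidden, n_in))   # encoder
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0, 0.1, (n_in, n_hidden))   # decoder
        self.b2 = np.zeros(n_in)

    def encode(self, x):
        return sigmoid(self.W1 @ x + self.b1)

    def train(self, X, lr=0.1, epochs=10):
        # Unsupervised: SGD on squared reconstruction error ||x_hat - x||^2
        for _ in range(epochs):
            for x in X:
                h = self.encode(x)
                x_hat = sigmoid(self.W2 @ h + self.b2)
                d_out = (x_hat - x) * x_hat * (1 - x_hat)  # decoder delta
                d_hid = (self.W2.T @ d_out) * h * (1 - h)  # encoder delta
                self.W2 -= lr * np.outer(d_out, h)
                self.b2 -= lr * d_out
                self.W1 -= lr * np.outer(d_hid, x)
                self.b1 -= lr * d_hid

# Layerwise pretraining (unlabeled images; random placeholders here)
n_pixels = 64                               # assumed 8x8 images
X = rng.random((200, n_pixels))
ae1 = Autoencoder(n_pixels, 32)             # bottleneck: 32 < 64 units
ae1.train(X)
H1 = np.array([ae1.encode(x) for x in X])   # learnt features, e.g. "edges"
ae2 = Autoencoder(32, 16)
ae2.train(H1)                               # more abstract features

# Stack the two encoders with a logistic regression output unit
w_out = rng.normal(0, 0.1, 16)
b_out = 0.0

def predict(x):
    return sigmoid(w_out @ ae2.encode(ae1.encode(x)) + b_out)

# Fine-tune the whole stack by backpropagation on a small labeled set
X_lab, y_lab = X[:50], rng.integers(0, 2, 50)  # placeholder face/no-face labels
lr = 0.1
for _ in range(50):
    for x, y in zip(X_lab, y_lab):
        h1 = ae1.encode(x)
        h2 = ae2.encode(h1)
        p = sigmoid(w_out @ h2 + b_out)
        d_out = p - y                          # cross-entropy + sigmoid delta
        d_h2 = d_out * w_out * h2 * (1 - h2)
        d_h1 = (ae2.W1.T @ d_h2) * h1 * (1 - h1)
        w_out -= lr * d_out * h2
        b_out -= lr * d_out
        ae2.W1 -= lr * np.outer(d_h2, h1); ae2.b1 -= lr * d_h2
        ae1.W1 -= lr * np.outer(d_h1, x);  ae1.b1 -= lr * d_h1
```

The pretraining gives the stacked network a sensible starting point; fine-tuning then adjusts all the encoder weights and the output unit jointly on the labeled data.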