NASSCOM Big Data and Analytics Summit 2015: Session IX: Transitioning from Predictive Analytics to Artificial Intelligence
TRANSCRIPT
Transitioning From Predictive Analytics To Artificial Intelligence
Presented by Rajeev Rastogi
Predictions with Multimedia (Images, Audio) Data
• Applications: Object recognition, speech recognition
• Raw data (image pixels, audio signals) too low-level, lacks predictive power
• Need higher level feature representations (with predictive power)
[Figure: example input data and prediction targets: an image maps to the label "Car"; an audio clip maps to the transcription "Hello World"]
Higher Level Features: Computer Vision
• Computer vision researchers have developed hand-crafted features over the past decade
• But, how do we automatically generate high-level feature representations?
[Figure: examples of hand-crafted computer vision features: SIFT and HoG descriptors]
Deep Learning: Deep Neural Networks
• Multiple hidden layers learn higher level feature representations
• Non-linear sigmoid function performs mapping between layers
• Deep architectures can model complex functions without an exponential blowup in the number of units
• Edge weights learned using backpropagation

[Figure: feed-forward network: input data feeds into multiple hidden layers, which feed into the outputs]

The sigmoid mapping sets each unit's activation probability to a squashed weighted sum of the previous layer's units:

p(s_i = 1) = 1 / (1 + exp(−(b_i + Σ_j w_ji s_j)))
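The per-layer sigmoid mapping above can be sketched in NumPy; the small weight matrix, bias vector, and inputs below are purely illustrative:

```python
import numpy as np

def sigmoid(x):
    # Logistic function: maps any real value into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def layer_forward(s_prev, W, b):
    # Activation of each unit i: p(s_i = 1) = sigmoid(b_i + sum_j w_ji * s_j)
    return sigmoid(W @ s_prev + b)

# Toy 3-input -> 2-unit hidden layer
s = np.array([1.0, 0.0, 1.0])
W = np.array([[0.5, -0.2, 0.1],
              [0.3,  0.8, -0.5]])
b = np.array([0.0, -0.1])
h = layer_forward(s, W, b)  # each entry lies in (0, 1)
```

Stacking several such layers, each taking the previous layer's activations as input, gives the deep architecture described above.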
Deep Learning: Model Pre-training
• Key challenge: the modeled functions are highly non-convex with many local minima, so poor initialization of the weights can cause the algorithm to get stuck in a bad local minimum.
• Model pre-training is used to initialize the weights: they are learned one layer at a time (with the previous layer's output as input) using unsupervised (e.g. RBMs) or supervised techniques.
[Figure: greedy layer-wise pre-training: step 1 trains the first hidden layer on the input; step 2 trains the second hidden layer on the first layer's output]
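A minimal sketch of greedy layer-wise pre-training, using a tied-weight sigmoid autoencoder per layer as a simple stand-in for the RBM training mentioned above; the layer sizes, learning rate, and random data are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pretrain_layer(X, n_hidden, lr=0.1, epochs=50):
    """Learn one layer's weights unsupervised by training a tied-weight
    autoencoder to reconstruct its input (a stand-in for an RBM step)."""
    n_vis = X.shape[1]
    W = rng.normal(0.0, 0.1, size=(n_vis, n_hidden))
    b_h = np.zeros(n_hidden)
    b_v = np.zeros(n_vis)
    for _ in range(epochs):
        H = sigmoid(X @ W + b_h)            # encode
        R = sigmoid(H @ W.T + b_v)          # decode with tied weights
        err = R - X                         # reconstruction error
        dZv = err * R * (1 - R)             # decoder pre-activation gradient
        dZh = (dZv @ W) * H * (1 - H)       # encoder pre-activation gradient
        W -= lr * (X.T @ dZh + dZv.T @ H) / len(X)
        b_h -= lr * dZh.mean(axis=0)
        b_v -= lr * dZv.mean(axis=0)
    return W, b_h

# Greedy stacking: each layer is pre-trained on the previous layer's output
X = rng.random((100, 8))
layers = []
inp = X
for n_hidden in (6, 4):                     # two hidden layers
    W, b = pretrain_layer(inp, n_hidden)
    layers.append((W, b))
    inp = sigmoid(inp @ W + b)              # input to the next pre-training step
```

The learned weights would then initialize the full network before supervised fine-tuning with backpropagation.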
Deep Learning: Example of Learned Features
• Training set: aligned images of faces
• Layer by layer, the learned features become higher-level: Pixels → Edges → Object parts (combinations of edges) → Object models
Andrew Ng. "Machine Learning and AI via Brain Simulations"
Deep Learning Success in Computer Vision
• ImageNet Large Scale Visual Recognition Challenge
– 1000 categories, 1.5 million labeled examples
• Deep learning model [Krizhevsky, Sutskever & Hinton 2012]
– 650K neurons, 832M synapses, 60M parameters
– Trained with backprop on GPU
• Top-5 error rate: 15% (an example counts as an error when the correct class isn't among the top 5 predictions)
• Previous state-of-the-art: 25% error
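The top-5 error metric can be computed directly from a model's class scores; the scores and labels below are synthetic, just to exercise the function:

```python
import numpy as np

def top5_error(scores, labels):
    """Fraction of examples whose true class is not among the 5
    highest-scoring predictions (the ILSVRC top-5 error)."""
    top5 = np.argsort(scores, axis=1)[:, -5:]      # indices of the 5 best classes
    hits = (top5 == labels[:, None]).any(axis=1)   # true class in the top 5?
    return 1.0 - hits.mean()

# Toy example: 3 examples, 10 classes
rng = np.random.default_rng(0)
scores = rng.random((3, 10))
labels = scores.argmax(axis=1)   # true class scored highest, so error is 0
err = top5_error(scores, labels)
```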
Deep Learning Success in Speech Recognition
• [Zeiler et al. 2013]
• Several large technology companies have deployed Deep Learning-based speech recognition systems in their products
Number of hidden layers | Word error rate (%)
1 | 16
2 | 12.8
4 | 11.4
8 | 10.9

GMM baseline: 15.4%
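Word error rate, the metric in the table above, is the word-level edit distance between the reference transcript and the recognizer's hypothesis, divided by the reference length. A minimal sketch using dynamic programming:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + insertions + deletions) / reference length,
    computed with word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # match / substitution
    return dp[-1][-1] / len(ref)

# One substitution ("word" for "world") over 5 reference words -> 0.2
wer = word_error_rate("hello world how are you", "hello word how are you")
```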
Challenges
• Deep learning models have millions of parameters, so billions of training examples are needed to learn them.
• Models are learned using backpropagation with stochastic gradient descent, which is inherently sequential and hard to parallelize, making it difficult to scale to large datasets.
• Deep learning models have numerous hyper-parameters (# hidden layers, # hidden units per layer, regularization, learning rate) that require tuning.
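One common (if expensive) response to the hyper-parameter tuning challenge is an exhaustive grid search over candidate values. A minimal sketch; the grid values are illustrative, and `train_and_eval` is a hypothetical placeholder for a real training run that returns validation error:

```python
import itertools

# Hypothetical grid over the hyper-parameters listed above
grid = {
    "n_hidden_layers": [1, 2, 4],
    "n_units": [128, 256],
    "learning_rate": [0.01, 0.1],
    "l2_reg": [0.0, 1e-4],
}

def train_and_eval(config):
    # Placeholder scoring function; a real run would train a network
    # with `config` and return its validation error.
    return (config["n_hidden_layers"] * 0.01
            + abs(config["learning_rate"] - 0.1)
            + config["l2_reg"])

def grid_search(grid, evaluate):
    # Try every combination of hyper-parameter values, keep the best
    keys = list(grid)
    best_cfg, best_err = None, float("inf")
    for values in itertools.product(*(grid[k] for k in keys)):
        cfg = dict(zip(keys, values))
        err = evaluate(cfg)
        if err < best_err:
            best_cfg, best_err = cfg, err
    return best_cfg, best_err

best_cfg, best_err = grid_search(grid, train_and_eval)
```

Because each grid point requires a full training run, the cost grows multiplicatively with each hyper-parameter, which is why tuning deep models is expensive in practice.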