Quoc Le, slides MLconf 11/15/13
![Page 1: Quoc le, slides MLconf 11/15/13](https://reader030.vdocument.in/reader030/viewer/2022020122/54b72e914a795916198b48ab/html5/thumbnails/1.jpg)
Large Scale Deep Learning
Quoc V. Le Google & CMU
![Page 2: Quoc le, slides MLconf 11/15/13](https://reader030.vdocument.in/reader030/viewer/2022020122/54b72e914a795916198b48ab/html5/thumbnails/2.jpg)
Deep Learning
• Google is using Machine Learning
• Machine Learning is difficult
• Requires domain knowledge from human experts
Deep Learning:
• Great performance on many problems
• Works well with a large amount of data
• Requires less domain knowledge
Focus:
• Scale deep learning to bigger models and bigger problems
![Page 4: Quoc le, slides MLconf 11/15/13](https://reader030.vdocument.in/reader030/viewer/2022020122/54b72e914a795916198b48ab/html5/thumbnails/4.jpg)
What is Deep Learning?
![Page 5: Quoc le, slides MLconf 11/15/13](https://reader030.vdocument.in/reader030/viewer/2022020122/54b72e914a795916198b48ab/html5/thumbnails/5.jpg)
What is Deep Learning?
x (images, audio, texts, etc.)
u = g(A x)
v = g(B u)
…
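The two equations on this slide, u = g(A x) and v = g(B u), can be sketched in a few lines of Python. This is only an illustration: the slide does not say which nonlinearity g is, so the logistic sigmoid here is an assumption, and the matrix sizes and values are made up.

```python
import math

def g(z):
    """Elementwise nonlinearity. The logistic sigmoid is an assumed choice."""
    return [1.0 / (1.0 + math.exp(-zi)) for zi in z]

def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def forward(A, B, x):
    """Compute the two layers from the slide."""
    u = g(matvec(A, x))  # u = g(A x)
    v = g(matvec(B, u))  # v = g(B u)
    return u, v

# Hypothetical toy sizes: 3 inputs -> 2 hidden units -> 1 output.
A = [[0.5, -0.2, 0.1],
     [0.3, 0.8, -0.5]]
B = [[1.0, -1.0]]
x = [1.0, 2.0, 3.0]

u, v = forward(A, B, x)
print(u, v)
```

Stacking more layers repeats the same pattern: each layer multiplies by its own matrix and applies g to the result.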
![Page 7: Quoc le, slides MLconf 11/15/13](https://reader030.vdocument.in/reader030/viewer/2022020122/54b72e914a795916198b48ab/html5/thumbnails/7.jpg)
High-level features by Deep Learning
Pixels
Edge detectors
…
Face detector, Cat detector
![Page 8: Quoc le, slides MLconf 11/15/13](https://reader030.vdocument.in/reader030/viewer/2022020122/54b72e914a795916198b48ab/html5/thumbnails/8.jpg)
Google’s DistBelief
Goal: Train deep learning models on many machines
Model: A multi-layered architecture
Forward pass to compute the features
Backward pass to compute the gradient
Model
Training Data
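As a rough illustration of the forward and backward passes, the sketch below uses a two-layer network with scalar weights a and b. The squared-error loss, the sigmoid nonlinearity, and all names are assumptions made for this example. The slide does not specify them, and this is not DistBelief code.

```python
import math

def g(z):
    """Logistic sigmoid (an assumed nonlinearity)."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(a, b, x):
    """Forward pass: compute the features u and the output v."""
    u = g(a * x)
    v = g(b * u)
    return u, v

def loss(a, b, x, y):
    """Assumed loss: half the squared error between output and target."""
    _, v = forward(a, b, x)
    return 0.5 * (v - y) ** 2

def backward(a, b, x, y):
    """Backward pass: apply the chain rule from the loss back to a and b."""
    u, v = forward(a, b, x)
    d_z2 = (v - y) * v * (1.0 - v)  # gradient at the output pre-activation b*u
    grad_b = d_z2 * u
    d_u = d_z2 * b                  # gradient flowing into the hidden feature
    d_z1 = d_u * u * (1.0 - u)      # gradient at the hidden pre-activation a*x
    grad_a = d_z1 * x
    return grad_a, grad_b

grad_a, grad_b = backward(0.5, -1.5, 2.0, 1.0)
print(grad_a, grad_b)
```

A gradient step would then move a and b a small distance against grad_a and grad_b.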
![Page 9: Quoc le, slides MLconf 11/15/13](https://reader030.vdocument.in/reader030/viewer/2022020122/54b72e914a795916198b48ab/html5/thumbnails/9.jpg)
Model partition with DistBelief
DistBelief distributes a model across multiple machines and multiple cores.
Model
Machine (Model Partition)
Training Data
![Page 10: Quoc le, slides MLconf 11/15/13](https://reader030.vdocument.in/reader030/viewer/2022020122/54b72e914a795916198b48ab/html5/thumbnails/10.jpg)
Model partition with DistBelief
DistBelief distributes a model across multiple machines and cores.
Model
Machine (Model Partition)
Core
Training Data
![Page 11: Quoc le, slides MLconf 11/15/13](https://reader030.vdocument.in/reader030/viewer/2022020122/54b72e914a795916198b48ab/html5/thumbnails/11.jpg)
Model partition with DistBelief
Stochastic Gradient Descent (SGD)
Model parameters are partitioned
Can use up to 1000 cores
Model
Training Data
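One simple way to picture partitioned model parameters is to split the rows of a layer's weight matrix into blocks, let each block compute its own slice of the output, and concatenate the slices. The sketch below runs the blocks one after another in a single process. The row-wise split and the function names are assumptions for illustration. The slides do not describe how DistBelief actually divides a model.

```python
def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def partition_rows(M, num_parts):
    """Split the rows of M into at most num_parts contiguous blocks."""
    size = (len(M) + num_parts - 1) // num_parts  # ceiling division
    return [M[i:i + size] for i in range(0, len(M), size)]

def partitioned_matvec(M, x, num_parts):
    """Each block stands in for one machine holding part of the parameters."""
    outputs = []
    for block in partition_rows(M, num_parts):
        outputs.extend(matvec(block, x))  # this block's slice of the output
    return outputs

# Hypothetical 4x3 weight matrix split across two simulated machines.
A = [[1.0, 0.0, 2.0],
     [0.5, 1.5, -1.0],
     [2.0, -2.0, 0.0],
     [0.0, 1.0, 1.0]]
x = [1.0, 2.0, 3.0]

print(partitioned_matvec(A, x, 2))
```

The partitioned result matches the unpartitioned product, so the split changes where the work happens, not what is computed.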
![Page 12: Quoc le, slides MLconf 11/15/13](https://reader030.vdocument.in/reader030/viewer/2022020122/54b72e914a795916198b48ab/html5/thumbnails/12.jpg)
Model partition with DistBelief
But training is still slow on large data sets
Can we add more parallelism?
Idea: Train multiple models on different partitions of the data, and merge them
Model
Training Data
![Page 13: Quoc le, slides MLconf 11/15/13](https://reader030.vdocument.in/reader030/viewer/2022020122/54b72e914a795916198b48ab/html5/thumbnails/13.jpg)
Data partition with DistBelief
Parameter Server: p’ = p + ∆p
Model Workers: send ∆p to the parameter server, receive p’
Data Shards
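The update p’ = p + ∆p can be sketched with a toy parameter server and two data shards. Everything here is a simplification under stated assumptions: the model is a single scalar parameter fitting y = p x, the data is made up, the class and function names are hypothetical, and the workers take turns in one process instead of communicating asynchronously across machines.

```python
class ParameterServer:
    """Holds the current parameter and applies updates sent by workers."""

    def __init__(self, p):
        self.p = p

    def fetch(self):
        return self.p

    def push(self, delta_p):
        self.p = self.p + delta_p  # p' = p + delta_p

def worker_step(server, shard, lr):
    """One worker: fetch p, compute a gradient on its own shard, push delta_p."""
    p = server.fetch()
    # Gradient of the mean of 0.5 * (p * x - y)^2 over the shard.
    grad = sum((p * x - y) * x for x, y in shard) / len(shard)
    server.push(-lr * grad)

# Made-up data generated from y = 3x, split into two shards.
data = [(x, 3.0 * x) for x in [1.0, 2.0, 3.0, 4.0]]
shards = [data[:2], data[2:]]

server = ParameterServer(0.0)
for _ in range(200):
    for shard in shards:  # simulated workers, run in turn
        worker_step(server, shard, lr=0.05)

print(server.p)
```

In a real asynchronous setting a worker may compute its ∆p from a slightly stale copy of p. This sequential simulation does not model that effect.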
![Page 14: Quoc le, slides MLconf 11/15/13](https://reader030.vdocument.in/reader030/viewer/2022020122/54b72e914a795916198b48ab/html5/thumbnails/14.jpg)
Parallelism in DistBelief
Model parallelism via model partitioning
Data parallelism via data partitioning and asynchronous communication
DistBelief can scale to a billion examples and use 100,000 cores or more
Thanks to its speed, DistBelief dramatically improves many applications
![Page 15: Quoc le, slides MLconf 11/15/13](https://reader030.vdocument.in/reader030/viewer/2022020122/54b72e914a795916198b48ab/html5/thumbnails/15.jpg)
Applications
• Voice Search
• Photo Search
• Text Understanding
![Page 16: Quoc le, slides MLconf 11/15/13](https://reader030.vdocument.in/reader030/viewer/2022020122/54b72e914a795916198b48ab/html5/thumbnails/16.jpg)
Voice Search
Speech frame
Hidden layers with 1000s of nodes
Classifier
Label
![Page 17: Quoc le, slides MLconf 11/15/13](https://reader030.vdocument.in/reader030/viewer/2022020122/54b72e914a795916198b48ab/html5/thumbnails/17.jpg)
Voice Search
![Page 18: Quoc le, slides MLconf 11/15/13](https://reader030.vdocument.in/reader030/viewer/2022020122/54b72e914a795916198b48ab/html5/thumbnails/18.jpg)
Applications
• Voice Search
• Photo Search
• Text Understanding
![Page 19: Quoc le, slides MLconf 11/15/13](https://reader030.vdocument.in/reader030/viewer/2022020122/54b72e914a795916198b48ab/html5/thumbnails/19.jpg)
Photo Search
![Page 20: Quoc le, slides MLconf 11/15/13](https://reader030.vdocument.in/reader030/viewer/2022020122/54b72e914a795916198b48ab/html5/thumbnails/20.jpg)
Cat detector
Front page of the New York Times
![Page 21: Quoc le, slides MLconf 11/15/13](https://reader030.vdocument.in/reader030/viewer/2022020122/54b72e914a795916198b48ab/html5/thumbnails/21.jpg)
Seat-belt Boston rocker
Archery Shredder
![Page 22: Quoc le, slides MLconf 11/15/13](https://reader030.vdocument.in/reader030/viewer/2022020122/54b72e914a795916198b48ab/html5/thumbnails/22.jpg)
Amusement park
Face
Hammock
![Page 23: Quoc le, slides MLconf 11/15/13](https://reader030.vdocument.in/reader030/viewer/2022020122/54b72e914a795916198b48ab/html5/thumbnails/23.jpg)
Google+ PhotoSearch
![Page 24: Quoc le, slides MLconf 11/15/13](https://reader030.vdocument.in/reader030/viewer/2022020122/54b72e914a795916198b48ab/html5/thumbnails/24.jpg)
Applications
• Voice Search
• Photo Search
• Text Understanding
![Page 25: Quoc le, slides MLconf 11/15/13](https://reader030.vdocument.in/reader030/viewer/2022020122/54b72e914a795916198b48ab/html5/thumbnails/25.jpg)
Text understanding
Very useful but also difficult
We should try to understand the meaning of words
Deep Learning can learn the meaning of words
![Page 26: Quoc le, slides MLconf 11/15/13](https://reader030.vdocument.in/reader030/viewer/2022020122/54b72e914a795916198b48ab/html5/thumbnails/26.jpg)
Text understanding
~100-D vector space
dolphin, whale, Obama, Clinton, Paris
![Page 27: Quoc le, slides MLconf 11/15/13](https://reader030.vdocument.in/reader030/viewer/2022020122/54b72e914a795916198b48ab/html5/thumbnails/27.jpg)
Predicting the next word in a sentence
the cat sat on the
E E E E (Word Matrix)
Hidden Layers
Classifier
E is a matrix of dimension ||Vocab|| x d
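The word matrix E can be pictured as a lookup table with one row of d numbers per vocabulary word. The sketch below is a minimal illustration with a made-up four-word vocabulary, d = 3, and random initial values. Concatenating the looked-up rows before the hidden layers is an assumption, since the slide does not say how the vectors are combined.

```python
import random

# Hypothetical tiny vocabulary and embedding size.
vocab = ["the", "cat", "sat", "on"]
word_to_id = {w: i for i, w in enumerate(vocab)}
d = 3

# E has one row per vocabulary word and d columns: ||Vocab|| x d.
rng = random.Random(0)
E = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in vocab]

def embed(words):
    """Look up each word's row of E and concatenate the rows (assumed)."""
    features = []
    for w in words:
        features.extend(E[word_to_id[w]])
    return features

context = ["the", "cat", "sat", "on"]
features = embed(context)
print(len(E), len(E[0]), len(features))
```

During training the rows of E would be adjusted together with the other parameters, which is how the word vectors come to carry meaning.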
![Page 28: Quoc le, slides MLconf 11/15/13](https://reader030.vdocument.in/reader030/viewer/2022020122/54b72e914a795916198b48ab/html5/thumbnails/28.jpg)
Visualizing the word vectors
• Example nearest neighbors of word vectors trained on Google News
apple Apple iPhone
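Nearest neighbors in a word vector space are commonly found by ranking words by cosine similarity. The sketch below shows the mechanics only. The 3-D vectors are invented for illustration and do not come from any trained model, and the slides do not state which similarity measure was used.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(word, vectors, k=1):
    """Return the k words whose vectors are most similar to the query word."""
    query = vectors[word]
    scored = [(cosine(query, v), w) for w, v in vectors.items() if w != word]
    scored.sort(reverse=True)
    return [w for _, w in scored[:k]]

# Made-up vectors for illustration only, not learned values.
toy_vectors = {
    "dolphin": [0.9, 0.1, 0.0],
    "whale":   [0.8, 0.2, 0.1],
    "Obama":   [0.1, 0.9, 0.1],
    "Clinton": [0.0, 0.8, 0.2],
    "Paris":   [0.1, 0.1, 0.9],
}

print(nearest("dolphin", toy_vectors))
print(nearest("Obama", toy_vectors))
```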
![Page 29: Quoc le, slides MLconf 11/15/13](https://reader030.vdocument.in/reader030/viewer/2022020122/54b72e914a795916198b48ab/html5/thumbnails/29.jpg)
Relation Extraction
Mikolov, Sutskever, Le. Learning the Meaning behind Words. Google Open Source Blog, 2013
![Page 30: Quoc le, slides MLconf 11/15/13](https://reader030.vdocument.in/reader030/viewer/2022020122/54b72e914a795916198b48ab/html5/thumbnails/30.jpg)
Machine Translation
![Page 31: Quoc le, slides MLconf 11/15/13](https://reader030.vdocument.in/reader030/viewer/2022020122/54b72e914a795916198b48ab/html5/thumbnails/31.jpg)
Summary
• Model partition
• Data partition
• Voice Search
• Photo Search
• Text Understanding
![Page 32: Quoc le, slides MLconf 11/15/13](https://reader030.vdocument.in/reader030/viewer/2022020122/54b72e914a795916198b48ab/html5/thumbnails/32.jpg)
Joint work with
Greg Corrado, Jeff Dean, Matthieu Devin, Kai Chen, Rajat Monga, Andrew Ng, Paul Tucker, Ke Yang
Additional Thanks:
Samy Bengio, Tom Dean, Josh Levenberg, Geoff Hinton, Tomas Mikolov, Mark Mao, Patrick Nguyen, Marc’Aurelio Ranzato, Mark Segal, Jon Shlens, Ilya Sutskever, Vincent Vanhoucke