modern systems for neural networks · 2016. 7. 14. · modern neural networks applications: nlp...
TRANSCRIPT
![Page 1: Modern Systems for Neural Networks · 2016. 7. 14. · Modern Neural Networks Applications: NLP Source: Andrej Karpathy: The Unreasonable Effectiveness of Recurrent Neural Networks](https://reader034.vdocument.in/reader034/viewer/2022052100/6039a1864db0242a1e7ebbb1/html5/thumbnails/1.jpg)
Modern Systems for Neural
NetworksValentin Dalibard
![Page 2: Modern Systems for Neural Networks · 2016. 7. 14. · Modern Neural Networks Applications: NLP Source: Andrej Karpathy: The Unreasonable Effectiveness of Recurrent Neural Networks](https://reader034.vdocument.in/reader034/viewer/2022052100/6039a1864db0242a1e7ebbb1/html5/thumbnails/2.jpg)
This talk
1.Practicalities of training Neural Networks
2.Leveraging heterogeneous hardware
Source: wikipedia
![Page 3: Modern Systems for Neural Networks · 2016. 7. 14. · Modern Neural Networks Applications: NLP Source: Andrej Karpathy: The Unreasonable Effectiveness of Recurrent Neural Networks](https://reader034.vdocument.in/reader034/viewer/2022052100/6039a1864db0242a1e7ebbb1/html5/thumbnails/3.jpg)
Modern Neural Networks Applications: Image classification
![Page 4: Modern Systems for Neural Networks · 2016. 7. 14. · Modern Neural Networks Applications: NLP Source: Andrej Karpathy: The Unreasonable Effectiveness of Recurrent Neural Networks](https://reader034.vdocument.in/reader034/viewer/2022052100/6039a1864db0242a1e7ebbb1/html5/thumbnails/4.jpg)
Modern Neural Networks Applications: NLP
Source: Andrej Karpathy: The Unreasonable
Effectiveness of Recurrent Neural Networks
Paul Graham generator:
“The surprised in investors weren’t going to raise money. I’m not
the company with the time there are all interesting quickly, don’t
have to get off the same programmers. There’s a super-angel
round fundraising, why do you can do. If you have a different
physical investment are become in people who reduced in a
startup with the way to argument the acquirer could see them just
that you’re also the founders will part of users’ affords that and an
alternation to the idea. [2] Don’t work at first member to see the
way kids will seem in advance of a bad successful startup. And if
you have to act the big company too.”
![Page 5: Modern Systems for Neural Networks · 2016. 7. 14. · Modern Neural Networks Applications: NLP Source: Andrej Karpathy: The Unreasonable Effectiveness of Recurrent Neural Networks](https://reader034.vdocument.in/reader034/viewer/2022052100/6039a1864db0242a1e7ebbb1/html5/thumbnails/5.jpg)
Modern Neural Networks Applications: Reinforcement Learning
![Page 6: Modern Systems for Neural Networks · 2016. 7. 14. · Modern Neural Networks Applications: NLP Source: Andrej Karpathy: The Unreasonable Effectiveness of Recurrent Neural Networks](https://reader034.vdocument.in/reader034/viewer/2022052100/6039a1864db0242a1e7ebbb1/html5/thumbnails/6.jpg)
Training Procedure: Stochastic Gradient Descent
Optimize the weights of the neurons to yield good predictions
Use “minibatches” of inputs to estimate the gradient
Source: wikipedia
![Page 7: Modern Systems for Neural Networks · 2016. 7. 14. · Modern Neural Networks Applications: NLP Source: Andrej Karpathy: The Unreasonable Effectiveness of Recurrent Neural Networks](https://reader034.vdocument.in/reader034/viewer/2022052100/6039a1864db0242a1e7ebbb1/html5/thumbnails/7.jpg)
Software platforms
Torch
(Lua)
Theano
(Python)
Tensorflow
(Python/C++)
Caffe
(C++)
KerasLasagne
![Page 8: Modern Systems for Neural Networks · 2016. 7. 14. · Modern Neural Networks Applications: NLP Source: Andrej Karpathy: The Unreasonable Effectiveness of Recurrent Neural Networks](https://reader034.vdocument.in/reader034/viewer/2022052100/6039a1864db0242a1e7ebbb1/html5/thumbnails/8.jpg)
Single Machine Setup:
One or a couple beefy GPUs
![Page 9: Modern Systems for Neural Networks · 2016. 7. 14. · Modern Neural Networks Applications: NLP Source: Andrej Karpathy: The Unreasonable Effectiveness of Recurrent Neural Networks](https://reader034.vdocument.in/reader034/viewer/2022052100/6039a1864db0242a1e7ebbb1/html5/thumbnails/9.jpg)
Distribution: Parameter Server Architecture
Source: Dean et al.: Large Scale Distributed Deep Networks
![Page 10: Modern Systems for Neural Networks · 2016. 7. 14. · Modern Neural Networks Applications: NLP Source: Andrej Karpathy: The Unreasonable Effectiveness of Recurrent Neural Networks](https://reader034.vdocument.in/reader034/viewer/2022052100/6039a1864db0242a1e7ebbb1/html5/thumbnails/10.jpg)
Trends in software architecture
Fewer bits per floating point
Integers rather than floating points
![Page 11: Modern Systems for Neural Networks · 2016. 7. 14. · Modern Neural Networks Applications: NLP Source: Andrej Karpathy: The Unreasonable Effectiveness of Recurrent Neural Networks](https://reader034.vdocument.in/reader034/viewer/2022052100/6039a1864db0242a1e7ebbb1/html5/thumbnails/11.jpg)
Optimizing the scheduling on a heterogeneous cluster
Which machines to use as workers? As parameter servers?
↗workers => ↗computational power & ↗communication
How much work to schedule on each worker?
Must load balance
![Page 12: Modern Systems for Neural Networks · 2016. 7. 14. · Modern Neural Networks Applications: NLP Source: Andrej Karpathy: The Unreasonable Effectiveness of Recurrent Neural Networks](https://reader034.vdocument.in/reader034/viewer/2022052100/6039a1864db0242a1e7ebbb1/html5/thumbnails/12.jpg)
Ways to do an Optimization
Random Search
Genetic algorithm /
Simulated annealing Bayesian Optimization
No overhead Slight overhead High overhead
High #evaluation Medium-high #evaluation Low #evaluation
![Page 13: Modern Systems for Neural Networks · 2016. 7. 14. · Modern Neural Networks Applications: NLP Source: Andrej Karpathy: The Unreasonable Effectiveness of Recurrent Neural Networks](https://reader034.vdocument.in/reader034/viewer/2022052100/6039a1864db0242a1e7ebbb1/html5/thumbnails/13.jpg)
Bayesian Optimization
Bayesian Optimization
Find parameter values with high
performance in the model
Evaluate the
objective function
at that point
Update the model
with this measurement
![Page 14: Modern Systems for Neural Networks · 2016. 7. 14. · Modern Neural Networks Applications: NLP Source: Andrej Karpathy: The Unreasonable Effectiveness of Recurrent Neural Networks](https://reader034.vdocument.in/reader034/viewer/2022052100/6039a1864db0242a1e7ebbb1/html5/thumbnails/14.jpg)
Bayesian Optimization
Predicted
Performance
Parameter
Space
Probabilistic
Model
Utility Function Performance
![Page 15: Modern Systems for Neural Networks · 2016. 7. 14. · Modern Neural Networks Applications: NLP Source: Andrej Karpathy: The Unreasonable Effectiveness of Recurrent Neural Networks](https://reader034.vdocument.in/reader034/viewer/2022052100/6039a1864db0242a1e7ebbb1/html5/thumbnails/15.jpg)
Structured Bayesian Optimization
Predicted
Performance
Parameter
Space
Probabilistic
Parameters
Utility FunctionPerformance &
Runtime properties
Probabilistic Program
![Page 16: Modern Systems for Neural Networks · 2016. 7. 14. · Modern Neural Networks Applications: NLP Source: Andrej Karpathy: The Unreasonable Effectiveness of Recurrent Neural Networks](https://reader034.vdocument.in/reader034/viewer/2022052100/6039a1864db0242a1e7ebbb1/html5/thumbnails/16.jpg)
Optimizing the scheduling of Neural Networks
Two separate models:
Individual machine model: How fast can a machine process k inputs
Network model: How long does it take to transfer the parameters from
parameter servers to workers
Iteratively learn the behavior
![Page 17: Modern Systems for Neural Networks · 2016. 7. 14. · Modern Neural Networks Applications: NLP Source: Andrej Karpathy: The Unreasonable Effectiveness of Recurrent Neural Networks](https://reader034.vdocument.in/reader034/viewer/2022052100/6039a1864db0242a1e7ebbb1/html5/thumbnails/17.jpg)
Optimizing the scheduling of Neural Networks
![Page 18: Modern Systems for Neural Networks · 2016. 7. 14. · Modern Neural Networks Applications: NLP Source: Andrej Karpathy: The Unreasonable Effectiveness of Recurrent Neural Networks](https://reader034.vdocument.in/reader034/viewer/2022052100/6039a1864db0242a1e7ebbb1/html5/thumbnails/18.jpg)
More CPU cores aren’t always better
![Page 19: Modern Systems for Neural Networks · 2016. 7. 14. · Modern Neural Networks Applications: NLP Source: Andrej Karpathy: The Unreasonable Effectiveness of Recurrent Neural Networks](https://reader034.vdocument.in/reader034/viewer/2022052100/6039a1864db0242a1e7ebbb1/html5/thumbnails/19.jpg)
Exposing Tradeoff
![Page 20: Modern Systems for Neural Networks · 2016. 7. 14. · Modern Neural Networks Applications: NLP Source: Andrej Karpathy: The Unreasonable Effectiveness of Recurrent Neural Networks](https://reader034.vdocument.in/reader034/viewer/2022052100/6039a1864db0242a1e7ebbb1/html5/thumbnails/20.jpg)
Conclusion
Growing demand for Neural networks platforms
Can leverage heterogeneous hardware but requires tuning
Bayesian Optimization can find good scheduling in a
relatively short time