Save the princess
![Page 2: Save the princess](https://reader034.vdocument.in/reader034/viewer/2022052514/5a6eefc57f8b9a70728b6d1d/html5/thumbnails/2.jpg)
We will build an AI that plays a silly little game. The AI is a policy network defined with Cortex and trained with a hot new algorithm from the paper, which we will first implement with Neanderthal and then make massively parallel with Onyx.
![Page 3: Save the princess](https://reader034.vdocument.in/reader034/viewer/2022052514/5a6eefc57f8b9a70728b6d1d/html5/thumbnails/3.jpg)
The game
• Find the shortest path to the princess
• Moves: up, down, left, right
• Don’t fall off the edge of the world
![Page 4: Save the princess](https://reader034.vdocument.in/reader034/viewer/2022052514/5a6eefc57f8b9a70728b6d1d/html5/thumbnails/4.jpg)
![Page 5: Save the princess](https://reader034.vdocument.in/reader034/viewer/2022052514/5a6eefc57f8b9a70728b6d1d/html5/thumbnails/5.jpg)
Computers playing computer games
![Page 6: Save the princess](https://reader034.vdocument.in/reader034/viewer/2022052514/5a6eefc57f8b9a70728b6d1d/html5/thumbnails/6.jpg)
Reinforcement learning
• Interact with the environment [embodied cognition]
• Learns not a single solution but which action to take in a given environment [model of the world + model of self, consciousness?]
• Learns via positive/negative feedback
![Page 7: Save the princess](https://reader034.vdocument.in/reader034/viewer/2022052514/5a6eefc57f8b9a70728b6d1d/html5/thumbnails/7.jpg)
Reinforcement learning: how it’s usually done
Train a deep neural network on raw sensor data, usually pixels (i.e. no feature engineering)
![Page 8: Save the princess](https://reader034.vdocument.in/reader034/viewer/2022052514/5a6eefc57f8b9a70728b6d1d/html5/thumbnails/8.jpg)
… but there is another way
![Page 9: Save the princess](https://reader034.vdocument.in/reader034/viewer/2022052514/5a6eefc57f8b9a70728b6d1d/html5/thumbnails/9.jpg)
![Page 10: Save the princess](https://reader034.vdocument.in/reader034/viewer/2022052514/5a6eefc57f8b9a70728b6d1d/html5/thumbnails/10.jpg)
Diagram: classic evolutionary algorithm vs. evolution strategies
• Classic evolutionary algorithm: population → sample weighted → mutate, crossover → next generation
• Evolution strategies: solution → populate → jitter, jitter, …, jitter → combine weighted → update
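The ES loop in the diagram fits in a few lines. This is a minimal sketch in plain Python (not the talk's Neanderthal code), assuming the common formulation from the evolution strategies literature: jitter the current solution with Gaussian noise of scale `sigma`, score every jittered copy, standardize the rewards, and move the solution by the reward-weighted sum of the noise. The function name and all parameter values are illustrative assumptions.

```python
import random


def evolution_strategies(reward, theta, sigma=0.1, alpha=0.01,
                         population=50, generations=300, seed=0):
    """Maximize `reward` over a parameter vector with a basic ES loop."""
    rng = random.Random(seed)
    theta = list(theta)
    n = len(theta)
    for _ in range(generations):
        # populate: one Gaussian noise vector per population member
        noise = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(population)]
        # jitter: score a perturbed copy of the current solution for each noise vector
        rewards = [reward([t + sigma * e for t, e in zip(theta, eps)]) for eps in noise]
        # standardize rewards so the step does not depend on their offset or scale
        mean = sum(rewards) / population
        std = (sum((r - mean) ** 2 for r in rewards) / population) ** 0.5 or 1.0
        weights = [(r - mean) / std for r in rewards]
        # combine weighted + update: theta += alpha / (population * sigma) * sum_i w_i * eps_i
        for j in range(n):
            step = sum(w * eps[j] for w, eps in zip(weights, noise))
            theta[j] += alpha / (population * sigma) * step
    return theta
```

Standardizing the rewards makes the update invariant to shifting or positively rescaling the reward, which is one reason ES cares little about the reward function's properties.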
![Page 11: Save the princess](https://reader034.vdocument.in/reader034/viewer/2022052514/5a6eefc57f8b9a70728b6d1d/html5/thumbnails/11.jpg)
Using ES to train a neural network
Benefits
• highly parallelizable
• more robust (fewer hyperparameters, more stable, doesn’t care about the properties of the reward function)
• can exploit structure
• less computationally expensive
Downsides
• takes longer to converge
• noise must lead to different outcomes
Instead of backpropagation, use ES on the weights
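To run ES on a network, its weights only need to be viewed as one flat vector that can be jittered, plus a forward pass to score them; no gradients are computed. A minimal sketch, assuming a hypothetical bias-free network with ReLU hidden layers whose layer shapes are given as `(outputs, inputs)` pairs:

```python
def unflatten(theta, shapes):
    """Split a flat parameter vector into row-major weight matrices."""
    matrices, k = [], 0
    for rows, cols in shapes:
        matrices.append([theta[k + r * cols:k + (r + 1) * cols] for r in range(rows)])
        k += rows * cols
    return matrices


def forward(theta, shapes, x):
    """Forward pass only: ES never needs gradients of the network."""
    layers = unflatten(theta, shapes)
    for i, w in enumerate(layers):
        x = [sum(wij * xj for wij, xj in zip(row, x)) for row in w]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]  # ReLU on hidden layers only
    return x


def n_params(shapes):
    """Length of the flat vector that ES jitters."""
    return sum(rows * cols for rows, cols in shapes)
```

Because the whole network is just a flat `theta`, an ES loop can treat it like any other parameter vector.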
![Page 12: Save the princess](https://reader034.vdocument.in/reader034/viewer/2022052514/5a6eefc57f8b9a70728b6d1d/html5/thumbnails/12.jpg)
Let’s build it!
![Page 13: Save the princess](https://reader034.vdocument.in/reader034/viewer/2022052514/5a6eefc57f8b9a70728b6d1d/html5/thumbnails/13.jpg)
1. ES
![Page 14: Save the princess](https://reader034.vdocument.in/reader034/viewer/2022052514/5a6eefc57f8b9a70728b6d1d/html5/thumbnails/14.jpg)
![Page 15: Save the princess](https://reader034.vdocument.in/reader034/viewer/2022052514/5a6eefc57f8b9a70728b6d1d/html5/thumbnails/15.jpg)
Neanderthal
• Blazing fast matrix and linear algebra library
• Based on ATLAS and LAPACK
• Runs on CPUs and GPUs
• A study in writing efficient code
• Somewhat terse API (fluokitten helps)
![Page 16: Save the princess](https://reader034.vdocument.in/reader034/viewer/2022052514/5a6eefc57f8b9a70728b6d1d/html5/thumbnails/16.jpg)
![Page 17: Save the princess](https://reader034.vdocument.in/reader034/viewer/2022052514/5a6eefc57f8b9a70728b6d1d/html5/thumbnails/17.jpg)
![Page 18: Save the princess](https://reader034.vdocument.in/reader034/viewer/2022052514/5a6eefc57f8b9a70728b6d1d/html5/thumbnails/18.jpg)
![Page 19: Save the princess](https://reader034.vdocument.in/reader034/viewer/2022052514/5a6eefc57f8b9a70728b6d1d/html5/thumbnails/19.jpg)
![Page 20: Save the princess](https://reader034.vdocument.in/reader034/viewer/2022052514/5a6eefc57f8b9a70728b6d1d/html5/thumbnails/20.jpg)
x + y, ax + y, ax + by
![Page 21: Save the princess](https://reader034.vdocument.in/reader034/viewer/2022052514/5a6eefc57f8b9a70728b6d1d/html5/thumbnails/21.jpg)
![Page 22: Save the princess](https://reader034.vdocument.in/reader034/viewer/2022052514/5a6eefc57f8b9a70728b6d1d/html5/thumbnails/22.jpg)
![Page 23: Save the princess](https://reader034.vdocument.in/reader034/viewer/2022052514/5a6eefc57f8b9a70728b6d1d/html5/thumbnails/23.jpg)
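Operations of the form a·x + y are all the ES update needs: the weighted combination of the noise vectors is a chain of axpy steps, the kind of BLAS-style operation a library like Neanderthal is built to run fast. A plain-Python sketch with hypothetical helper names; unlike BLAS `axpy`, which overwrites `y`, this version returns a new vector:

```python
def axpy(a, x, y):
    """a*x + y elementwise; returns a new vector instead of overwriting y."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def es_update(theta, noise, weights, alpha, sigma):
    """theta + alpha / (n * sigma) * sum_i weights[i] * noise[i], via repeated axpy."""
    acc = [0.0] * len(theta)
    for w, eps in zip(weights, noise):
        acc = axpy(w, eps, acc)  # acc <- w_i * eps_i + acc
    return axpy(alpha / (len(noise) * sigma), acc, theta)
```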
![Page 24: Save the princess](https://reader034.vdocument.in/reader034/viewer/2022052514/5a6eefc57f8b9a70728b6d1d/html5/thumbnails/24.jpg)
![Page 25: Save the princess](https://reader034.vdocument.in/reader034/viewer/2022052514/5a6eefc57f8b9a70728b6d1d/html5/thumbnails/25.jpg)
![Page 26: Save the princess](https://reader034.vdocument.in/reader034/viewer/2022052514/5a6eefc57f8b9a70728b6d1d/html5/thumbnails/26.jpg)
1.1 ES parallelized
![Page 27: Save the princess](https://reader034.vdocument.in/reader034/viewer/2022052514/5a6eefc57f8b9a70728b6d1d/html5/thumbnails/27.jpg)
Onyx
A masterless, cloud-scale, fault-tolerant, high-performance distributed computation system
![Page 28: Save the princess](https://reader034.vdocument.in/reader034/viewer/2022052514/5a6eefc57f8b9a70728b6d1d/html5/thumbnails/28.jpg)
Job = workflow + flow conditions + catalog

Workflow:

    [[:input :processing-1]
     [:input :processing-2]
     [:processing-1 :output-1]
     [:processing-2 :output-2]]

Flow conditions:

    [{:flow/from :input-stream
      :flow/to [:process-adults]
      :flow/predicate :my.ns/adult?
      :flow/doc "Emits segment if an adult."}]

Catalog:

    [{:onyx/name :add-5
      :onyx/fn :my/adder
      :onyx/type :function
      :my/n 5
      :onyx/params [:my/n]}
     {:onyx/name :in
      :onyx/plugin :onyx.plugin.core-async/input
      :onyx/type :input
      :onyx/medium :core.async
      :onyx/batch-size batch-size
      :onyx/max-peers 1
      :onyx/doc "Reads segments from a core.async channel"}
     {:onyx/name :out
      :onyx/plugin :onyx.plugin.core-async/output
      :onyx/type :output
      :onyx/medium :core.async
      :onyx/doc "Writes segments to a core.async channel"}]
![Page 29: Save the princess](https://reader034.vdocument.in/reader034/viewer/2022052514/5a6eefc57f8b9a70728b6d1d/html5/thumbnails/29.jpg)
Describing computation with data
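Because a job is plain data, ordinary code can inspect it before anything runs. As a hypothetical illustration (not part of Onyx's API), this Python sketch takes a workflow written as `[from, to]` edges, like the workflow above, and derives an execution order with Kahn's topological sort, rejecting cycles:

```python
from collections import defaultdict, deque


def task_order(workflow):
    """Topologically sort a workflow given as [from, to] edges (Kahn's algorithm)."""
    children = defaultdict(list)
    indegree = defaultdict(int)
    for src, dst in workflow:
        children[src].append(dst)
        indegree[dst] += 1
        indegree.setdefault(src, 0)  # make sure source tasks are registered too
    ready = deque(task for task in indegree if indegree[task] == 0)
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for child in children[task]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if len(order) != len(indegree):
        raise ValueError("workflow contains a cycle")
    return order
```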
![Page 30: Save the princess](https://reader034.vdocument.in/reader034/viewer/2022052514/5a6eefc57f8b9a70728b6d1d/html5/thumbnails/30.jpg)
![Page 31: Save the princess](https://reader034.vdocument.in/reader034/viewer/2022052514/5a6eefc57f8b9a70728b6d1d/html5/thumbnails/31.jpg)
Diagram (ES loop as an Onyx workflow): in → populate → jitter, jitter, …, jitter → update → out, monitor
![Page 32: Save the princess](https://reader034.vdocument.in/reader034/viewer/2022052514/5a6eefc57f8b9a70728b6d1d/html5/thumbnails/32.jpg)
Same workflow, annotated: same channel
![Page 33: Save the princess](https://reader034.vdocument.in/reader034/viewer/2022052514/5a6eefc57f8b9a70728b6d1d/html5/thumbnails/33.jpg)
Same workflow, annotated: accumulates state :(
![Page 34: Save the princess](https://reader034.vdocument.in/reader034/viewer/2022052514/5a6eefc57f8b9a70728b6d1d/html5/thumbnails/34.jpg)
Resilience and handling state
• Activity log
• Window and trigger states checkpointed
• Resume points (transfer state from job to job)
• Configurable flux policies (continue/kill/recover)
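The accumulated state comes from the update step: it can only run once every jittered result of a generation has arrived, so partial results must be held somewhere, which is the kind of state that checkpointed windows and triggers are meant to hold. A minimal sketch of that accumulation in plain Python, using a hypothetical class rather than Onyx's windowing API; `snapshot` returns the plain-data state one would checkpoint:

```python
from collections import defaultdict


class GenerationAccumulator:
    """Hold jitter results per generation until the whole population has reported."""

    def __init__(self, population):
        self.population = population
        self.pending = defaultdict(list)  # generation -> [(reward, noise), ...]

    def add(self, generation, reward, noise):
        """Record one result; return the full batch once it is complete, else None."""
        batch = self.pending[generation]
        batch.append((reward, noise))
        if len(batch) < self.population:
            return None  # still waiting: this partial batch is the state to protect
        return self.pending.pop(generation)  # trigger: hand the batch to the update step

    def snapshot(self):
        """Plain-data copy of the pending state, e.g. for a checkpoint."""
        return {g: list(batch) for g, batch in self.pending.items()}
```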
![Page 35: Save the princess](https://reader034.vdocument.in/reader034/viewer/2022052514/5a6eefc57f8b9a70728b6d1d/html5/thumbnails/35.jpg)
Computation graphs are a great way to structure data processing code
![Page 36: Save the princess](https://reader034.vdocument.in/reader034/viewer/2022052514/5a6eefc57f8b9a70728b6d1d/html5/thumbnails/36.jpg)
2. Policy network
![Page 37: Save the princess](https://reader034.vdocument.in/reader034/viewer/2022052514/5a6eefc57f8b9a70728b6d1d/html5/thumbnails/37.jpg)
Cortex
• Neural networks, regression and feature learning
• Clean idiomatic Clojure API
• Computation encoded as data (and makes good use of it)
• Uses core.matrix for heavy lifting
![Page 38: Save the princess](https://reader034.vdocument.in/reader034/viewer/2022052514/5a6eefc57f8b9a70728b6d1d/html5/thumbnails/38.jpg)
Encode the board as the network input: princess = 1, hero = -1
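A sketch of the encoding and of how a policy turns it into a move, assuming a square grid flattened row by row, empty cells encoded as 0 (an assumption), and a hypothetical single-layer linear policy with one score per move standing in for the Cortex network:

```python
MOVES = ["up", "down", "left", "right"]


def encode(size, hero, princess):
    """Flatten the grid row by row: princess = 1, hero = -1, empty cells = 0 (assumed)."""
    cells = [0.0] * (size * size)
    cells[hero[0] * size + hero[1]] = -1.0
    cells[princess[0] * size + princess[1]] = 1.0
    return cells


def choose_move(weights, board):
    """Hypothetical linear policy: one weight row per move, pick the best score."""
    scores = [sum(w * b for w, b in zip(row, board)) for row in weights]
    return MOVES[max(range(len(scores)), key=scores.__getitem__)]
```

In an ES setup, `weights` is exactly what gets jittered; the policy itself is only ever run forward.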
![Page 39: Save the princess](https://reader034.vdocument.in/reader034/viewer/2022052514/5a6eefc57f8b9a70728b6d1d/html5/thumbnails/39.jpg)
3. Game
![Page 40: Save the princess](https://reader034.vdocument.in/reader034/viewer/2022052514/5a6eefc57f8b9a70728b6d1d/html5/thumbnails/40.jpg)
Simulation
• Find the shortest path to the princess
• Don’t fall off the edge of the world
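The simulation itself is small. A minimal sketch with hypothetical names, assuming a square grid indexed by `(row, col)` with row 0 at the top; `step` returns `None` when the hero falls off the edge, and on an obstacle-free grid the shortest path length is the Manhattan distance:

```python
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(size, hero, move):
    """Apply one move on a size x size grid; None means the hero fell off the edge."""
    dr, dc = MOVES[move]
    row, col = hero[0] + dr, hero[1] + dc
    if 0 <= row < size and 0 <= col < size:
        return (row, col)
    return None


def shortest_path_length(hero, princess):
    """With no obstacles, the shortest path is the Manhattan distance."""
    return abs(hero[0] - princess[0]) + abs(hero[1] - princess[1])
```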
![Page 41: Save the princess](https://reader034.vdocument.in/reader034/viewer/2022052514/5a6eefc57f8b9a70728b6d1d/html5/thumbnails/41.jpg)
Reward function
• Play the entire game (planning)
• Collect multiple playthroughs to lessen the effects of randomness
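A hypothetical reward function along these lines: play complete games from random start positions and average the outcomes. The scoring below (-1 for falling off, 1 minus a small penalty per step beyond the shortest path, 0 for running out of steps) and the `policy(size, hero, princess)` signature are assumptions for illustration, not the talk's reward:

```python
import random

MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def play(policy, size, rng, max_steps=50):
    """Play one full game from a random start and score the outcome (hypothetical scoring)."""
    cells = [(r, c) for r in range(size) for c in range(size)]
    hero, princess = rng.sample(cells, 2)  # two distinct random cells
    optimal = abs(hero[0] - princess[0]) + abs(hero[1] - princess[1])
    for steps in range(1, max_steps + 1):
        dr, dc = MOVES[policy(size, hero, princess)]
        hero = (hero[0] + dr, hero[1] + dc)
        if not (0 <= hero[0] < size and 0 <= hero[1] < size):
            return -1.0  # fell off the edge of the world
        if hero == princess:
            return 1.0 - 0.01 * (steps - optimal)  # penalize detours
    return 0.0  # never reached the princess


def reward(policy, size=5, playthroughs=20, seed=0):
    """Average several playthroughs to dampen the randomness of the start positions."""
    rng = random.Random(seed)
    return sum(play(policy, size, rng) for _ in range(playthroughs)) / playthroughs
```

Scoring whole games rather than single moves lets the reward reflect planning: a move only pays off if the rest of the path does.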
![Page 42: Save the princess](https://reader034.vdocument.in/reader034/viewer/2022052514/5a6eefc57f8b9a70728b6d1d/html5/thumbnails/42.jpg)
Takeaways
![Page 43: Save the princess](https://reader034.vdocument.in/reader034/viewer/2022052514/5a6eefc57f8b9a70728b6d1d/html5/thumbnails/43.jpg)
Explore
Have fun
Go on an adventure!