Using Onyx in anger
TRANSCRIPT
![Page 2: Using Onyx in anger](https://reader031.vdocument.in/reader031/viewer/2022030307/58e4a14e1a28abf5428b62b7/html5/thumbnails/2.jpg)
Onyx
a masterless, cloud scale, fault tolerant, high performance distributed computation system
… written entirely in Clojure
![Page 3: Using Onyx in anger](https://reader031.vdocument.in/reader031/viewer/2022030307/58e4a14e1a28abf5428b62b7/html5/thumbnails/3.jpg)
Onyx at
• In production for almost a year
• ETL
• online machine learning
• offline (batch) machine learning
• ad-hoc analysis
![Page 4: Using Onyx in anger](https://reader031.vdocument.in/reader031/viewer/2022030307/58e4a14e1a28abf5428b62b7/html5/thumbnails/4.jpg)
Self-service infrastructure for data scientists
![Page 5: Using Onyx in anger](https://reader031.vdocument.in/reader031/viewer/2022030307/58e4a14e1a28abf5428b62b7/html5/thumbnails/5.jpg)
1. Onyx at a glance
2. How Onyx rewired my brain
3. Putting “data is code” to work
![Page 6: Using Onyx in anger](https://reader031.vdocument.in/reader031/viewer/2022030307/58e4a14e1a28abf5428b62b7/html5/thumbnails/6.jpg)
1. Onyx at a glance
2. How Onyx rewired my brain
3. Putting “data is code” to work
Describing computation with data
![Page 7: Using Onyx in anger](https://reader031.vdocument.in/reader031/viewer/2022030307/58e4a14e1a28abf5428b62b7/html5/thumbnails/7.jpg)
Onyx at a glance
![Page 8: Using Onyx in anger](https://reader031.vdocument.in/reader031/viewer/2022030307/58e4a14e1a28abf5428b62b7/html5/thumbnails/8.jpg)
Job = workflow + flow conditions + catalogue

Workflow
[[:input :processing-1]
 [:input :processing-2]
 [:processing-1 :output-1]
 [:processing-2 :output-2]]

Flow conditions
[{:flow/from :input-stream
  :flow/to [:process-adults]
  :flow/predicate :my.ns/adult?
  :flow/doc "Emits segment if an adult."}]

Catalogue
[{:onyx/name :add-5
  :onyx/fn :my/adder
  :onyx/type :function
  :my/n 5
  :onyx/params [:my/n]}
 {:onyx/name :in
  :onyx/plugin :onyx.plugin.core-async/input
  :onyx/type :input
  :onyx/medium :core.async
  :onyx/batch-size batch-size
  :onyx/max-peers 1
  :onyx/doc "Reads segments from a core.async channel"}
 {:onyx/name :out
  :onyx/plugin :onyx.plugin.core-async/output
  :onyx/type :output
  :onyx/medium :core.async
  :onyx/doc "Writes segments to a core.async channel"}]
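Because the workflow is an ordinary vector of vectors, plain Clojure functions can inspect it. The following is a minimal sketch reusing the workflow from the slide; the helpers `downstream` and `leaves` are hypothetical names introduced for illustration and are not part of Onyx.

```clojure
;; The workflow from the slide: each pair is an edge [from-task to-task].
(def workflow
  [[:input :processing-1]
   [:input :processing-2]
   [:processing-1 :output-1]
   [:processing-2 :output-2]])

(defn downstream
  "Hypothetical helper: tasks that directly receive segments from `task`."
  [workflow task]
  (set (for [[from to] workflow :when (= from task)] to)))

(defn leaves
  "Hypothetical helper: tasks that never appear as a source of an edge."
  [workflow]
  (let [sources (set (map first workflow))]
    (set (remove sources (map second workflow)))))

(downstream workflow :input) ; => #{:processing-1 :processing-2}
(leaves workflow)            ; => #{:output-1 :output-2}
```

No Onyx dependency is needed to ask these questions, which is one way to read "describing computation with data".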
![Page 9: Using Onyx in anger](https://reader031.vdocument.in/reader031/viewer/2022030307/58e4a14e1a28abf5428b62b7/html5/thumbnails/9.jpg)
Catalogue
[{:onyx/name :add-5
  :onyx/fn :my/adder
  :onyx/type :function
  :my/n 5
  :onyx/params [:my/n]}
 {:onyx/name :in
  :onyx/plugin :onyx.plugin.core-async/input
  :onyx/type :input
  :onyx/medium :core.async
  :onyx/batch-size batch-size
  :onyx/max-peers 1
  :onyx/doc "Reads segments from a core.async channel"}
 {:onyx/name :out
  :onyx/plugin :onyx.plugin.core-async/output
  :onyx/type :output
  :onyx/medium :core.async
  :onyx/doc "Writes segments to a core.async channel"}]

Vanilla Clojure function
(defn adder [n {:keys [x] :as segment}]
  (assoc segment :x (+ n x)))

Plugins (I/O)
seq, async, Kafka, Datomic, SQL, S3, SQS, …

Slide callouts: parameter (:my/n), self-documenting (:onyx/doc)
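To make the parameter callout concrete, here is a small sketch of how the values named under `:onyx/params` end up as leading arguments of the task function. `apply-task` is a hypothetical stand-in written for this illustration; it imitates the behaviour and is not Onyx's implementation.

```clojure
(defn adder
  "Task function from the slide: adds n to the :x value of a segment."
  [n {:keys [x] :as segment}]
  (assoc segment :x (+ n x)))

;; Catalogue entry from the slide.
(def add-5-entry
  {:onyx/name :add-5
   :onyx/fn :my/adder
   :onyx/type :function
   :my/n 5
   :onyx/params [:my/n]})

(defn apply-task
  "Hypothetical helper: look up each key listed under :onyx/params in the
  catalogue entry and pass the values as leading arguments, then the segment."
  [f entry segment]
  (apply f (concat (map entry (:onyx/params entry)) [segment])))

(apply-task adder add-5-entry {:x 1 :id "a"}) ; => {:x 6, :id "a"}
```

Changing `:my/n` in the catalogue entry changes the behaviour of the task without touching `adder`.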
![Page 10: Using Onyx in anger](https://reader031.vdocument.in/reader031/viewer/2022030307/58e4a14e1a28abf5428b62b7/html5/thumbnails/10.jpg)
Computation entirely described with data
data is code!
![Page 11: Using Onyx in anger](https://reader031.vdocument.in/reader031/viewer/2022030307/58e4a14e1a28abf5428b62b7/html5/thumbnails/11.jpg)
Everything can be run locally!
![Page 12: Using Onyx in anger](https://reader031.vdocument.in/reader031/viewer/2022030307/58e4a14e1a28abf5428b62b7/html5/thumbnails/12.jpg)
Testing without mocking
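One reading of this slide: task functions and flow predicates are plain functions over plain maps, so they can be exercised with literal data. The sketch below tests a predicate like the `:my.ns/adult?` referenced earlier. It assumes segments carry an `:age` key and that adulthood starts at 18; neither is stated in the talk. The four-argument shape (event, old segment, new segment, all new segments) follows the convention Onyx documents for flow-condition predicates.

```clojure
(require '[clojure.test :refer [deftest is]])

(defn adult?
  "Flow predicate sketch. The :age key and the threshold of 18 are assumptions."
  [_event _old-segment new-segment _all-new-segments]
  (>= (:age new-segment) 18))

;; No cluster, queue, or mock objects: the inputs are literal maps.
(deftest adult?-routes-only-adults
  (is (true? (adult? {} {:age 30} {:age 30} [{:age 30}])))
  (is (false? (adult? {} {:age 12} {:age 12} [{:age 12}]))))
```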
![Page 13: Using Onyx in anger](https://reader031.vdocument.in/reader031/viewer/2022030307/58e4a14e1a28abf5428b62b7/html5/thumbnails/13.jpg)
How Onyx rewired my brain
![Page 14: Using Onyx in anger](https://reader031.vdocument.in/reader031/viewer/2022030307/58e4a14e1a28abf5428b62b7/html5/thumbnails/14.jpg)
It’s not about scaling, but clean architecture
![Page 15: Using Onyx in anger](https://reader031.vdocument.in/reader031/viewer/2022030307/58e4a14e1a28abf5428b62b7/html5/thumbnails/15.jpg)
My go-to architecture
[Diagram: DB, Events, Kafka, Onyx (×3)]
Persist all events to S3
• time travel
• query with AWS Athena
![Page 16: Using Onyx in anger](https://reader031.vdocument.in/reader031/viewer/2022030307/58e4a14e1a28abf5428b62b7/html5/thumbnails/16.jpg)
Decomplect everything
![Page 17: Using Onyx in anger](https://reader031.vdocument.in/reader031/viewer/2022030307/58e4a14e1a28abf5428b62b7/html5/thumbnails/17.jpg)
Computation graphs
![Page 18: Using Onyx in anger](https://reader031.vdocument.in/reader031/viewer/2022030307/58e4a14e1a28abf5428b62b7/html5/thumbnails/18.jpg)
Putting “data is code” to work
![Page 19: Using Onyx in anger](https://reader031.vdocument.in/reader031/viewer/2022030307/58e4a14e1a28abf5428b62b7/html5/thumbnails/19.jpg)
Interlude: queryable data descriptions with spec
• s/registry, s/form
• Build a graph (Datomic)
Interact with your type system!
code is data!
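The two bullets can be sketched as follows: `s/form` returns a spec as data, `s/registry` returns every registered spec, and walking the forms yields edges that could be loaded into a graph store such as Datomic. This assumes Clojure 1.9 or later with `clojure.spec.alpha`; the specs and the helpers `spec-deps` and `spec-edges` are hypothetical examples, not taken from the talk.

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical specs, for illustration only.
(s/def :user/name string?)
(s/def :user/age pos-int?)
(s/def :event/signup (s/keys :req [:user/name :user/age]))

(defn spec-deps
  "Registered specs referenced inside the form of spec k."
  [k]
  (->> (s/form k)                       ; the spec as plain data
       (tree-seq coll? seq)             ; walk every nested element
       (filter qualified-keyword?)
       (filter (partial contains? (s/registry)))
       set))

(defn spec-edges
  "Edges [spec dependency] for the given spec names."
  [ks]
  (for [k ks, dep (spec-deps k)] [k dep]))

(spec-deps :event/signup) ; => #{:user/name :user/age}

;; Restrict the walk to the demo namespace; the registry also holds core specs.
(spec-edges (filter #(and (keyword? %) (= "event" (namespace %)))
                    (keys (s/registry))))
```

Each edge is itself data, so the resulting graph can be queried like any other dataset.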
![Page 20: Using Onyx in anger](https://reader031.vdocument.in/reader031/viewer/2022030307/58e4a14e1a28abf5428b62b7/html5/thumbnails/20.jpg)
Case study: autogenerating materialised views
[Diagram: Events, External data, Kafka, Onyx (×3), Materialised views]
Automatic view generation
• Event & attribute ontology
  • Manual (via spec)
  • Inferred
• Statistical analysis (seasonality detection, outlier removal, …)
![Page 21: Using Onyx in anger](https://reader031.vdocument.in/reader031/viewer/2022030307/58e4a14e1a28abf5428b62b7/html5/thumbnails/21.jpg)
Automatic view generation
1. Walk spec registry
2. Apply rules
   1. Define new view (spec)
   2. Trigger Onyx job that creates the view
⤾
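The loop above might look roughly like this in code. Everything here is a hypothetical sketch: the specs, the `sum-view` rule, and the shape of the view description are invented for illustration, and the step that registers the new view's spec and submits an Onyx job is left out. It assumes Clojure 1.9 or later with `clojure.spec.alpha`.

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical event specs.
(s/def :purchase/amount number?)
(s/def :purchase/user-id string?)
(s/def :event/purchase (s/keys :req [:purchase/user-id :purchase/amount]))

(defn sum-view
  "Hypothetical rule: for a numeric attribute, describe a view that sums it.
  Returns nil when the rule does not apply."
  [k]
  (when (= 'clojure.core/number? (s/form k))
    {:view/name (keyword "view" (str (name k) "-sum"))
     :view/source k
     :view/aggregation :sum}))

(defn generate-views
  "Walk the registry (limited to the demo namespace) and apply every rule."
  [rules]
  (for [k (keys (s/registry))
        :when (and (keyword? k) (= "purchase" (namespace k)))
        rule rules
        :let [view (rule k)]
        :when view]
    view))

(generate-views [sum-view])
;; => ({:view/name :view/amount-sum, :view/source :purchase/amount, :view/aggregation :sum})
```

In a full system each returned description would be turned into a spec and a job; newly registered view specs would then be picked up on the next walk, which is what the loop arrow suggests.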
![Page 22: Using Onyx in anger](https://reader031.vdocument.in/reader031/viewer/2022030307/58e4a14e1a28abf5428b62b7/html5/thumbnails/22.jpg)
Code is data or
data is code?
![Page 23: Using Onyx in anger](https://reader031.vdocument.in/reader031/viewer/2022030307/58e4a14e1a28abf5428b62b7/html5/thumbnails/23.jpg)
Takeaways
![Page 24: Using Onyx in anger](https://reader031.vdocument.in/reader031/viewer/2022030307/58e4a14e1a28abf5428b62b7/html5/thumbnails/24.jpg)
Onyx is production ready
![Page 25: Using Onyx in anger](https://reader031.vdocument.in/reader031/viewer/2022030307/58e4a14e1a28abf5428b62b7/html5/thumbnails/25.jpg)
Everything should be live and interactive
![Page 26: Using Onyx in anger](https://reader031.vdocument.in/reader031/viewer/2022030307/58e4a14e1a28abf5428b62b7/html5/thumbnails/26.jpg)
Computation graphs are a great way to structure data processing code
![Page 27: Using Onyx in anger](https://reader031.vdocument.in/reader031/viewer/2022030307/58e4a14e1a28abf5428b62b7/html5/thumbnails/27.jpg)
Queryable data and computation descriptions supercharge interactive development and are a great building block for automation
![Page 29: Using Onyx in anger](https://reader031.vdocument.in/reader031/viewer/2022030307/58e4a14e1a28abf5428b62b7/html5/thumbnails/29.jpg)
viebel.github.io/klipse/examples/onyx.html
onyxplatform.org
onyxplatform.org/jekyll/update/2017/02/08/Pyroclast-Preview-Simulation.html