A data layer in Clojure
![Page 2: A data layer in clojure](https://reader031.vdocument.in/reader031/viewer/2022022415/5a6eefc57f8b9a70728b6d23/html5/thumbnails/2.jpg)
• Started in machine learning
• Turned to data science and helped 20+ companies become data-driven
• Now leading the data science department at GoOpti
![Page 3: A data layer in clojure](https://reader031.vdocument.in/reader031/viewer/2022022415/5a6eefc57f8b9a70728b6d23/html5/thumbnails/3.jpg)
Self-service infrastructure for data scientists
![Page 4: A data layer in clojure](https://reader031.vdocument.in/reader031/viewer/2022022415/5a6eefc57f8b9a70728b6d23/html5/thumbnails/4.jpg)
The analytics chasm
Ideal. Almost real-time, can be done during brainstorming without disrupting flow
< 2min, < 20min, project
squeeze in somewhere in the day
fail
roadmap ahoy!
![Page 5: A data layer in clojure](https://reader031.vdocument.in/reader031/viewer/2022022415/5a6eefc57f8b9a70728b6d23/html5/thumbnails/5.jpg)
My go-to architecture
[Diagram labels: DB, Events, Kafka, Onyx (×3)]
Persist all events to S3
• time travel
• query with AWS Athena
![Page 6: A data layer in clojure](https://reader031.vdocument.in/reader031/viewer/2022022415/5a6eefc57f8b9a70728b6d23/html5/thumbnails/6.jpg)
Onyx
a masterless, cloud scale, fault tolerant, high performance distributed computation system
… written entirely in Clojure
![Page 7: A data layer in clojure](https://reader031.vdocument.in/reader031/viewer/2022022415/5a6eefc57f8b9a70728b6d23/html5/thumbnails/7.jpg)
Clojure at a glance
• Lisp running on JVM
• Functional, dynamic, immutable
• Excellent concurrency and state management support
• Unparalleled data manipulation
• Good Java interoperability
![Page 8: A data layer in clojure](https://reader031.vdocument.in/reader031/viewer/2022022415/5a6eefc57f8b9a70728b6d23/html5/thumbnails/8.jpg)
Onyx at
• In production for almost a year
• ETL
• online machine learning
• offline (batch) machine learning
• ad-hoc analysis
![Page 9: A data layer in clojure](https://reader031.vdocument.in/reader031/viewer/2022022415/5a6eefc57f8b9a70728b6d23/html5/thumbnails/9.jpg)
Onyx at a glance
![Page 10: A data layer in clojure](https://reader031.vdocument.in/reader031/viewer/2022022415/5a6eefc57f8b9a70728b6d23/html5/thumbnails/10.jpg)
Job = workflow + flow conditions + catalogue

Workflow:

    [[:input :processing-1]
     [:input :processing-2]
     [:processing-1 :output-1]
     [:processing-2 :output-2]]

Flow conditions:

    [{:flow/from :input-stream
      :flow/to [:process-adults]
      :flow/predicate :my.ns/adult?
      :flow/doc "Emits segment if an adult."}]

Catalogue:

    [{:onyx/name :add-5
      :onyx/fn :my/adder
      :onyx/type :function
      :my/n 5
      :onyx/params [:my/n]}
     {:onyx/name :in
      :onyx/plugin :onyx.plugin.core-async/input
      :onyx/type :input
      :onyx/medium :core.async
      :onyx/batch-size batch-size
      :onyx/max-peers 1
      :onyx/doc "Reads segments from a core.async channel"}
     {:onyx/name :out
      :onyx/plugin :onyx.plugin.core-async/output
      :onyx/type :output
      :onyx/medium :core.async
      :onyx/doc "Writes segments to a core.async channel"}]
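The flow condition above names a predicate, `:my.ns/adult?`, that the slide does not show. A minimal sketch of what it might look like, assuming segments carry an `:age` key and that "adult" means 18 or older (both are assumptions, not from the talk). Onyx flow predicates receive the event map, the old segment, the new segment, and all new segments:

```clojure
;; Hypothetical predicate that would live in the my.ns namespace.
;; The :age key and the threshold of 18 are illustrative assumptions.
(defn adult?
  [event old-segment new-segment all-new-segments]
  (>= (:age new-segment 0) 18))

;; It is an ordinary function, so it can be called directly at the REPL.
(adult? {} {:age 21} {:age 21} [{:age 21}])
(adult? {} {:age 17} {:age 17} [{:age 17}])
```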
![Page 11: A data layer in clojure](https://reader031.vdocument.in/reader031/viewer/2022022415/5a6eefc57f8b9a70728b6d23/html5/thumbnails/11.jpg)
Catalogue

    [{:onyx/name :add-5
      :onyx/fn :my/adder
      :onyx/type :function
      :my/n 5
      :onyx/params [:my/n]}
     {:onyx/name :in
      :onyx/plugin :onyx.plugin.core-async/input
      :onyx/type :input
      :onyx/medium :core.async
      :onyx/batch-size batch-size
      :onyx/max-peers 1
      :onyx/doc "Reads segments from a core.async channel"}
     {:onyx/name :out
      :onyx/plugin :onyx.plugin.core-async/output
      :onyx/type :output
      :onyx/medium :core.async
      :onyx/doc "Writes segments to a core.async channel"}]

Vanilla Clojure function:

    (defn adder [n {:keys [x] :as segment}]
      (assoc segment :x (+ n x)))

Plugins (I/O): seq, async, Kafka, Datomic, SQL, S3, SQS, …
Callouts: parameter (:my/n, passed via :onyx/params), self-documenting (:onyx/doc)
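To make the "parameter" callout concrete, here is a rough simulation in plain Clojure of how the values named in `:onyx/params` end up as leading arguments to the task function. The `apply-task` helper is hypothetical and only illustrates the idea; it is not part of Onyx.

```clojure
(defn adder [n {:keys [x] :as segment}]
  (assoc segment :x (+ n x)))

(def catalog-entry
  {:onyx/name :add-5
   :onyx/fn :my/adder
   :onyx/type :function
   :my/n 5
   :onyx/params [:my/n]})

;; Hypothetical helper (not an Onyx API): look up each key listed in
;; :onyx/params on the catalogue entry and pass the values before the segment.
(defn apply-task [f entry segment]
  (let [params (map entry (:onyx/params entry))]
    (apply f (concat params [segment]))))

(apply-task adder catalog-entry {:x 1})
;; => {:x 6}
```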
![Page 12: A data layer in clojure](https://reader031.vdocument.in/reader031/viewer/2022022415/5a6eefc57f8b9a70728b6d23/html5/thumbnails/12.jpg)
Computation entirely described with data
data is code!
![Page 13: A data layer in clojure](https://reader031.vdocument.in/reader031/viewer/2022022415/5a6eefc57f8b9a70728b6d23/html5/thumbnails/13.jpg)
Everything can be run locally!
![Page 14: A data layer in clojure](https://reader031.vdocument.in/reader031/viewer/2022022415/5a6eefc57f8b9a70728b6d23/html5/thumbnails/14.jpg)
Testing without mocking
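Because task functions are plain functions over plain maps, a test can feed them literal segments. A small sketch with `clojure.test`, reusing the `adder` function from the catalogue slide (the test name and sample segments are made up for illustration):

```clojure
(require '[clojure.test :refer [deftest is run-tests]])

(defn adder [n {:keys [x] :as segment}]
  (assoc segment :x (+ n x)))

;; No cluster, queue, or mock objects: segments are just maps.
(deftest adder-test
  (is (= {:x 6 :id 1} (adder 5 {:x 1 :id 1})))
  (is (= [{:x 6} {:x 7}]
         (map (partial adder 5) [{:x 1} {:x 2}]))))

(run-tests)
```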
![Page 15: A data layer in clojure](https://reader031.vdocument.in/reader031/viewer/2022022415/5a6eefc57f8b9a70728b6d23/html5/thumbnails/15.jpg)
Resilience and handling state
• Activity log
• Window and trigger states checkpointed
• Resume points
• Configurable flux policies
![Page 16: A data layer in clojure](https://reader031.vdocument.in/reader031/viewer/2022022415/5a6eefc57f8b9a70728b6d23/html5/thumbnails/16.jpg)
How Onyx rewired my brain
![Page 17: A data layer in clojure](https://reader031.vdocument.in/reader031/viewer/2022022415/5a6eefc57f8b9a70728b6d23/html5/thumbnails/17.jpg)
It’s not about scaling, but clean architecture
![Page 18: A data layer in clojure](https://reader031.vdocument.in/reader031/viewer/2022022415/5a6eefc57f8b9a70728b6d23/html5/thumbnails/18.jpg)
Decomplect everything
![Page 19: A data layer in clojure](https://reader031.vdocument.in/reader031/viewer/2022022415/5a6eefc57f8b9a70728b6d23/html5/thumbnails/19.jpg)
Computation graphs
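Since a workflow is just a vector of edges, the computation graph can be inspected with ordinary Clojure. A sketch using the workflow from the Job slide; the `adjacency` and `downstream` helpers are hypothetical names introduced here, not Onyx functions:

```clojure
(def workflow
  [[:input :processing-1]
   [:input :processing-2]
   [:processing-1 :output-1]
   [:processing-2 :output-2]])

;; Turn the edge list into a map of task -> set of direct successors.
(defn adjacency [edges]
  (reduce (fn [g [from to]] (update g from (fnil conj #{}) to))
          {}
          edges))

;; Collect every task reachable from the given task.
(defn downstream [graph task]
  (loop [seen #{}
         frontier (get graph task)]
    (if-let [t (first frontier)]
      (recur (conj seen t)
             (concat (rest frontier) (remove seen (get graph t))))
      seen)))

(downstream (adjacency workflow) :processing-1)
;; => #{:output-1}
```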
![Page 20: A data layer in clojure](https://reader031.vdocument.in/reader031/viewer/2022022415/5a6eefc57f8b9a70728b6d23/html5/thumbnails/20.jpg)
Machine learning with Onyx
• Hyperparameter server built on top of Onyx parameters
• Batch & streaming mode
• Monitoring: performance metrics, side channels for partial results/introspection into computation
• Everything is data, so it is easy to build tools around it
![Page 21: A data layer in clojure](https://reader031.vdocument.in/reader031/viewer/2022022415/5a6eefc57f8b9a70728b6d23/html5/thumbnails/21.jpg)
Onyx/Pyroclast
![Page 22: A data layer in clojure](https://reader031.vdocument.in/reader031/viewer/2022022415/5a6eefc57f8b9a70728b6d23/html5/thumbnails/22.jpg)
Putting “data is code” to work
![Page 23: A data layer in clojure](https://reader031.vdocument.in/reader031/viewer/2022022415/5a6eefc57f8b9a70728b6d23/html5/thumbnails/23.jpg)
Describing data with clojure.spec
composing smaller parts into the whole
code is data!
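A minimal sketch of composing small specs into a larger one with `clojure.spec.alpha`. The entity names (order, user, promo code, account age, country) are borrowed from the graph on the following slide; the predicates and the choice of required versus optional keys are assumptions made for illustration.

```clojure
(require '[clojure.spec.alpha :as s])

;; Small parts (predicates are illustrative assumptions)
(s/def ::country string?)
(s/def ::account-age nat-int?)
(s/def ::promo-code string?)

;; Composed into larger wholes
(s/def ::user (s/keys :req-un [::country ::account-age]))
(s/def ::order (s/keys :req-un [::user] :opt-un [::promo-code]))

(s/valid? ::order {:user {:country "SI" :account-age 42}})
;; => true
(s/valid? ::order {:user {:country "SI"}})
;; => false
```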
![Page 24: A data layer in clojure](https://reader031.vdocument.in/reader031/viewer/2022022415/5a6eefc57f8b9a70728b6d23/html5/thumbnails/24.jpg)
Queryable data descriptions
Turn spec into a graph
A fully interactive and open type system!
[Graph: nodes order, promo code, user, account age, country; edges labelled "always" or "maybe"]
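One way to turn specs into a queryable graph is to read them back as data with `s/form`. The sketch below assumes that "always" corresponds to required keys and "maybe" to optional keys, and that specs are defined with `s/keys` using `:req-un`/`:opt-un`; the spec definitions and the `spec-edges`/`spec-graph` helpers are hypothetical.

```clojure
(require '[clojure.spec.alpha :as s])

(s/def ::country string?)
(s/def ::account-age nat-int?)
(s/def ::promo-code string?)
(s/def ::user (s/keys :req-un [::country ::account-age]))
(s/def ::order (s/keys :req-un [::user] :opt-un [::promo-code]))

;; Edges leaving one spec: [from to :always] for required keys,
;; [from to :maybe] for optional ones. Non-keys specs have no edges.
(defn spec-edges [spec-kw]
  (let [form (s/form spec-kw)]
    (when (and (seq? form) (= `s/keys (first form)))
      (let [{:keys [req-un opt-un]} (apply hash-map (rest form))]
        (concat (for [k req-un] [spec-kw k :always])
                (for [k opt-un] [spec-kw k :maybe]))))))

;; Follow edges recursively from a root spec.
(defn spec-graph [root]
  (let [edges (spec-edges root)]
    (concat edges (mapcat (comp spec-graph second) edges))))

;; Example query: which attributes may be missing?
(filter #(= :maybe (nth % 2)) (spec-graph ::order))
```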
![Page 25: A data layer in clojure](https://reader031.vdocument.in/reader031/viewer/2022022415/5a6eefc57f8b9a70728b6d23/html5/thumbnails/25.jpg)
“Composition is about decomposing.”
— E. Normand
![Page 26: A data layer in clojure](https://reader031.vdocument.in/reader031/viewer/2022022415/5a6eefc57f8b9a70728b6d23/html5/thumbnails/26.jpg)
Case study: autogenerating materialised views
[Diagram labels: Events, External data, Kafka, Onyx (×3), Materialised views]
Automatic view generation
• Event & attribute ontology
  • Manual (via spec)
  • Inferred
• Statistical analysis (seasonality detection, outlier removal, …)
![Page 27: A data layer in clojure](https://reader031.vdocument.in/reader031/viewer/2022022415/5a6eefc57f8b9a70728b6d23/html5/thumbnails/27.jpg)
Automatic view generation
1. Walk spec registry
2. Apply rules
   1. Define new view (spec)
   2. Trigger Onyx job that creates the view
⤾
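The first two steps can be sketched in plain Clojure: walk `(s/registry)`, apply a rule, and emit view descriptions as data. Everything specific here is hypothetical: the `:event/order` spec, the single rule ("sum every numeric attribute of an event"), and the `:view/...` keys. Registering the new view spec and submitting the Onyx job are left out.

```clojure
(require '[clojure.spec.alpha :as s])

;; Illustrative event spec (an assumption, not from the talk)
(s/def :order/amount number?)
(s/def :order/country string?)
(s/def :event/order (s/keys :req [:order/amount :order/country]))

;; Step 1: walk the registry and pick out event specs.
(defn event-specs []
  (filter #(and (keyword? %) (= "event" (namespace %)))
          (keys (s/registry))))

;; Step 2: a sample rule that finds the numeric attributes of an event.
(defn numeric-attrs [event-kw]
  (let [form (s/form event-kw)]
    (when (seq? form)
      (let [{:keys [req]} (apply hash-map (rest form))]
        (filter #(= `number? (s/form %)) req)))))

;; Describe one sum view per numeric attribute, as plain data.
(defn view-definitions []
  (for [event (event-specs)
        attr  (numeric-attrs event)]
    {:view/name (keyword "view" (str (name event) "-" (name attr) "-sum"))
     :view/source event
     :view/aggregate [:sum attr]}))

(view-definitions)
```

Each resulting map could then be handed to whatever code builds and submits the corresponding job.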
![Page 28: A data layer in clojure](https://reader031.vdocument.in/reader031/viewer/2022022415/5a6eefc57f8b9a70728b6d23/html5/thumbnails/28.jpg)
Takeaways
![Page 29: A data layer in clojure](https://reader031.vdocument.in/reader031/viewer/2022022415/5a6eefc57f8b9a70728b6d23/html5/thumbnails/29.jpg)
Everything should be live and interactive
![Page 30: A data layer in clojure](https://reader031.vdocument.in/reader031/viewer/2022022415/5a6eefc57f8b9a70728b6d23/html5/thumbnails/30.jpg)
Computation graphs are a great way to structure data processing code
![Page 31: A data layer in clojure](https://reader031.vdocument.in/reader031/viewer/2022022415/5a6eefc57f8b9a70728b6d23/html5/thumbnails/31.jpg)
Queryable data and computation descriptions supercharge interactive development and are a great building block for automation
![Page 33: A data layer in clojure](https://reader031.vdocument.in/reader031/viewer/2022022415/5a6eefc57f8b9a70728b6d23/html5/thumbnails/33.jpg)
viebel.github.io/klipse/examples/onyx.html
onyxplatform.org
onyxplatform.org/jekyll/update/2017/02/08/Pyroclast-Preview-Simulation.html