Zendesk @ clj-melb
DESCRIPTION
SaaS companies can produce reams of data, but if your primary product is not analytics, this data often sits underutilised in a myriad of databases optimised for CRUD operations. The question is: how can a company exploit the hidden value of this data, effectively the ‘exhaust’ of day-to-day business operations, and turn it into value for the customer? At Zendesk, we’ve used the power of Clojure to build a batch analytics system on top of Hadoop that helps us gain insight into our data stores. Our presentation will provide an introduction to building such a system with Cascalog on top of Clojure for data processing and Midje for test automation. We’ll also give an overview of the tools we’re using for ad hoc queries in our production system.
TRANSCRIPT
Data @ Zendesk
Clojure, Cascalog, Hadoops and Datas
Web
Data
But…
● There is (too?) much of it
● It’s spread out
● Optimised for other stuff
We has Data!
Lower barrier to entry for analytics
What we want from our data
Add value for our customers
Understandable & Concise
not
Open Source
What we want from our solution
Extensible
Customisable
Headphones
We settled on
(def cascalog "Pretty")
(ns impatient.core
  (:use [cascalog.api]
        [cascalog.more-taps :only (hfs-delimited)])
  (:require [clojure.string :as s]
            [cascalog.ops :as c])
  (:gen-class))
(defmapcatop split [line]
  "reads in a line of string and splits it by regex"
  (s/split line #"[\[\]\\\(\),.)\s]+"))
(defn -main [in out & args]
  (?<- (hfs-delimited out)
       [?word ?count]
       ((hfs-delimited in :skip-header? true) _ ?line)
       (split ?line :> ?word)
       (c/count ?count)))
It was a journey, we learnt lots
● Taps & Sinks
● Group By, Aggregation & Filters
● Joins & Function Calls
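The topics above can be sketched in a few small queries. This is an illustration only, not the code shown in the talk: the namespace, the `tickets` and `accounts` data and the function names are all hypothetical, and in-memory vectors stand in for real Hadoop taps.

```clojure
(ns example.basics
  (:use [cascalog.api])
  (:require [cascalog.ops :as c]))

;; Hypothetical in-memory generators: [ticket-id account score] and [account plan]
(def tickets  [["t1" "acme" 4] ["t2" "acme" 2] ["t3" "globex" 5]])
(def accounts [["acme" "enterprise"] ["globex" "starter"]])

;; Taps & sinks: read tuples from a generator and write them to the stdout sink
(defn print-scores []
  (?<- (stdout)
       [?ticket ?score]
       (tickets ?ticket _ ?score)))

;; Filter: a predicate with no output variables keeps only matching tuples
(defn high-scores []
  (??<- [?ticket]
        (tickets ?ticket _ ?score)
        (>= ?score 4)))

;; Group by & aggregation: non-aggregated output variables become the grouping key
(defn tickets-per-account []
  (??<- [?account ?n]
        (tickets _ ?account _)
        (c/count ?n)))

;; Join & function call: sharing ?account between two generators joins them
(defn score-per-plan []
  (??<- [?plan ?total]
        (tickets _ ?account ?score)
        (accounts ?account ?plan)
        (c/sum ?score :> ?total)))
```

`??<-` runs the query and returns the result tuples as a Clojure sequence, which is convenient at a REPL; the order of the tuples should not be relied on.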
Cascalog Basics in Gorilla
(def review-scores (repeatedly 5000 rand))
(defn grab-score [x] {:score [x]})
; BAD - stack overflow
(def combine-score (partial merge-with concat))
; BETTER - no stack overflow, but wait for GC
(def combine-score (partial merge-with (comp doall concat)))
; BEST - snappy fast
(def combine-score (partial merge-with into))
(defparallelagg bucket-scores
  :init-var #'grab-score
  :combine-var #'combine-score)
; median is not defined on the slides; it is assumed to come from a helper or stats library
(defn median-scores [bucketed-scores]
  {:median-score (median (:score bucketed-scores))})
(??<- [?median-score]
      (review-scores :> ?score)
      (bucket-scores :< ?score :> ?bucketed-scores)
      (median-scores :< ?bucketed-scores :> ?median-score))
Learnings
Lazy sequences are not always your friend
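A minimal sketch of why the lazy version of `combine-score` misbehaves, using plain Clojure and no Cascalog. The names `combine-lazy`, `combine-eager` and `buckets` are hypothetical and the bucket count is arbitrary; the point is that `concat` only stacks up unrealised thunks, whereas `into` appends to a vector straight away.

```clojure
;; Same shape as the aggregator above: each score starts in a one-element bucket
(defn grab-score [x] {:score [x]})

(def combine-lazy  (partial merge-with concat)) ; nests one lazy seq per combine
(def combine-eager (partial merge-with into))   ; appends into a vector immediately

(def buckets (map grab-score (range 100000)))

;; Eager: every combine step is fully realised, so the result is a flat vector
(def eager-result (reduce combine-eager buckets))

;; Lazy: the reduce itself is cheap, but it leaves 100,000 nested concat calls
(def lazy-result (reduce combine-lazy buckets))

;; Realising the nested seq walks every layer on the call stack, which
;; typically throws StackOverflowError on a default-sized JVM stack
(def lazy-outcome
  (try
    (count (:score lazy-result))
    (catch StackOverflowError _ :stack-overflow)))
```

The failure only appears when the sequence is finally consumed, far from the code that built it, which is what makes this kind of bug hard to spot in an aggregator.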
Midje for Testing. And why it’s good
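As a rough idea of what Midje facts look like, here is a sketch that checks the two aggregator helpers from earlier. It is not taken from the talk: the namespace and the fact descriptions are hypothetical, and only `midje.sweet` is assumed to be on the classpath.

```clojure
(ns example.scores-test
  (:use [midje.sweet]))

;; Helpers copied from the aggregator example so the facts are self-contained
(defn grab-score [x] {:score [x]})
(def combine-score (partial merge-with into))

(facts "about bucketing review scores"
  (fact "a single score is wrapped in a one-element bucket"
    (grab-score 0.5) => {:score [0.5]})
  (fact "combining two buckets keeps every score, in order"
    (combine-score {:score [0.1]} {:score [0.9]}) => {:score [0.1 0.9]}))
```

Each fact reads as "expression => expected value", so the test doubles as documentation of what the helper is supposed to do.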
The Result
Bonus!
Clojure from Python (for prettier graphs)