zendesk @ clj-melb

Post on 23-Jun-2015

Category:

Technology

DESCRIPTION

SaaS companies can produce reams of data, but if your primary product is not analytics, this data often sits underutilised in a myriad of databases optimised for CRUD operations. The question is: how can a company exploit the hidden value of this data, effectively the ‘exhaust’ of your day-to-day business operations, and turn it into value for the customer? At Zendesk, we’ve used the power of Clojure to build a batch analytics system on top of Hadoop that helps us gain insight into our data stores. Our presentation will provide an introduction to building such a system with Cascalog on top of Clojure for data processing and Midje for test automation. We’ll also give an overview of the tools we’re using for ad hoc queries in our production system.

TRANSCRIPT

Data @ Zendesk

Clojure, Cascalog, Hadoops and Datas

Web

Data

But…

● There is (too?) much of it
● It’s spread out
● Optimised for other stuff

We has Data!

Lower barrier to entry for analytics

What we want from our data

Add value for our customers

Understandable & Concise

not

Open Source

What we want from our solution

Extensible

Customisable

Headphones

We settled on

(def cascalog "Pretty")

(ns impatient.core
  (:use [cascalog.api]
        [cascalog.more-taps :only (hfs-delimited)])
  (:require [clojure.string :as s]
            [cascalog.ops :as c])
  (:gen-class))

(defmapcatop split [line]
  "reads in a line of string and splits it by regex"
  (s/split line #"[\[\]\\\(\),.)\s]+"))

(defn -main [in out & args]
  (?<- (hfs-delimited out)
       [?word ?count]
       ((hfs-delimited in :skip-header? true) _ ?line)
       (split ?line :> ?word)
       (c/count ?count)))
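To see what the query computes without a Hadoop cluster, the same word count can be sketched in plain Clojure. This is only an in-memory illustration: `split-words` and `word-count` are hypothetical names, and the input is assumed to be a sequence of text lines rather than a delimited file.

```clojure
(require '[clojure.string :as s])

(defn split-words
  "Splits a line using the same regex as the slide's split operation."
  [line]
  (s/split line #"[\[\]\\\(\),.)\s]+"))

(defn word-count
  "Counts how often each word occurs across all lines (in memory, no Hadoop)."
  [lines]
  (frequencies (mapcat split-words lines)))

(word-count ["rain shadow" "rain, rain."])
;; => {"rain" 3, "shadow" 1}
```

In the Cascalog version, the grouping by `?word` is implicit: any output variable that is not produced by an aggregator becomes a grouping key.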

It was a journey, we learnt lots

● Taps & Sinks
● Group By, Aggregation & Filters
● Joins & Function Calls

Cascalog Basics in Gorilla

(def review-scores (repeatedly 5000 rand))

(defn grab-score [x] {:score [x]})

; BAD - stack overflow
(def combine-score (partial merge-with concat))

; BETTER - no stack overflow, but wait for GC
(def combine-score (partial merge-with (comp doall concat)))

; BEST - snappy fast
(def combine-score (partial merge-with into))

(defparallelagg bucket-scores
  :init-var #'grab-score
  :combine-var #'combine-score)

(defn median-scores [bucketed-scores]
  {:median-score (median (:score bucketed-scores))})

(??<- [?median-score]
      (review-scores :> ?score)
      (bucket-scores :< ?score :> ?bucketed-scores)
      (median-scores :< ?bucketed-scores :> ?median-score))
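The query above calls a `median` function that the slides do not define. A minimal sketch of one possible helper follows, assuming a non-empty collection of numbers; the implementation is an assumption for illustration, not the authors' code.

```clojure
(defn median
  "Median of a non-empty collection of numbers (hypothetical helper)."
  [xs]
  (let [sorted (vec (sort xs))
        n      (count sorted)
        mid    (quot n 2)]
    (if (odd? n)
      ;; Odd count: the middle element is the median.
      (nth sorted mid)
      ;; Even count: average the two middle elements.
      (/ (+ (nth sorted (dec mid)) (nth sorted mid)) 2))))

(median [3 1 2])
;; => 2
```

Because the whole bucket must be sorted, this helper needs every score in memory at once, which is why the aggregator collects the scores into a single vector first.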

Learnings

Lazy sequences are not always your friend
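The problem can be reproduced in plain Clojure, without Cascalog. The sketch below reuses the `grab-score` shape from the slides; `combine-lazy`, `combine-eager`, `bucket` and `try-count` are hypothetical names introduced for illustration. Each `concat` call wraps the previous result in another unrealised layer, so realising the final sequence can recurse once per element; whether it actually overflows depends on the JVM stack size.

```clojure
(defn grab-score [x] {:score [x]})

;; Lazy: every merge wraps the previous result in one more unrealised concat.
(def combine-lazy (partial merge-with concat))

;; Eager: into appends directly onto the existing vector, so nothing nests.
(def combine-eager (partial merge-with into))

(defn bucket
  "Folds all scores into a single {:score [...]} map using the given combiner."
  [combine scores]
  (reduce combine (map grab-score scores)))

(defn try-count
  "Counts the bucketed scores, or returns :stack-overflow if realising them
  exhausts the stack."
  [combine scores]
  (try
    (count (:score (bucket combine scores)))
    (catch StackOverflowError _ :stack-overflow)))

(def eager-result (try-count combine-eager (range 100000)))
(def lazy-result  (try-count combine-lazy (range 100000)))
```

On a default JVM, `lazy-result` is typically `:stack-overflow`, while `eager-result` is the full count.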

Midje for Testing. And why it’s good

The Result

Bonus!

Clojure from Python (for prettier graphs)
