Zendesk @ clj-melb

Data @ Zendesk: Clojure, Cascalog, Hadoops and Datas

Upload: chris-hausler

Posted on 23-Jun-2015


Category:

Technology



DESCRIPTION

SaaS companies can produce reams of data, but if your primary product is not analytics, this data often sits underutilised in a myriad of databases optimised for CRUD operations. The question is: how can a company exploit the hidden value of this data, effectively the ‘exhaust’ of its day-to-day business operations, and turn it into value for the customer? At Zendesk, we’ve used Clojure to build a batch analytics system on top of Hadoop that helps us gain insight into our data stores. Our presentation will provide an introduction to building such a system with Cascalog on top of Clojure for data processing and Midje for test automation. We’ll also give an overview of the tools we’re using for ad hoc queries in our production system.

TRANSCRIPT

Page 1: Zendesk @ clj-melb

Data @ Zendesk: Clojure, Cascalog, Hadoops and Datas

Page 2: Zendesk @ clj-melb
Page 3: Zendesk @ clj-melb
Page 4: Zendesk @ clj-melb

Web

Data

Page 5: Zendesk @ clj-melb

But…
● There is (too?) much of it
● It’s spread out
● Optimised for other stuff

We has Data!

Page 6: Zendesk @ clj-melb

Lower barrier to entry for analytics

What we want from our data

Add value for our customers

Page 7: Zendesk @ clj-melb

Understandable & Concise

not

Open Source

What we want from our solution

Extensible

Customisable

Page 8: Zendesk @ clj-melb

Headphones

Page 9: Zendesk @ clj-melb

We settled on

Page 10: Zendesk @ clj-melb

(def cascalog "Pretty")

(ns impatient.core
  (:use [cascalog.api]
        [cascalog.more-taps :only (hfs-delimited)])
  (:require [clojure.string :as s]
            [cascalog.ops :as c])
  (:gen-class))

(defmapcatop split [line]
  "reads in a line of string and splits it by regex"
  (s/split line #"[\[\]\\\(\),.)\s]+"))

(defn -main [in out & args]
  (?<- (hfs-delimited out)
       [?word ?count]
       ((hfs-delimited in :skip-header? true) _ ?line)
       (split ?line :> ?word)
       (c/count ?count)))
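
To see what the query above computes without a Hadoop cluster, the same split-then-count logic can be sketched in plain Clojure. This is only an illustration: the namespace and the names `split-words` and `word-count` are made up here and are not part of the talk, and the input is an in-memory seq of lines rather than a tap.

```clojure
(ns wordcount-sketch
  (:require [clojure.string :as s]))

(defn split-words
  "Splits a line using the same delimiter regex as the Cascalog `split` op."
  [line]
  (s/split line #"[\[\]\\\(\),.)\s]+"))

(defn word-count
  "Counts occurrences of each word across a seq of lines.
  Roughly what grouping on ?word and applying c/count does in the query."
  [lines]
  (frequencies (mapcat split-words lines)))

(word-count ["a b a" "b, c."])
;; => {"a" 2, "b" 2, "c" 1}
```

The Cascalog version expresses the same idea declaratively: the output variables `[?word ?count]` decide the grouping, and the query planner turns it into MapReduce jobs.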

Page 11: Zendesk @ clj-melb

It was a journey, we learnt lots

Page 12: Zendesk @ clj-melb

● Taps & Sinks
● Group By, Aggregation & Filters
● Joins & Function Calls

Cascalog Basics in Gorilla
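
The live demo itself is not captured in the transcript. As a rough sketch of the concepts listed on this slide, the queries below use small in-memory vectors in place of real taps. The data, the namespace and the function names are all hypothetical; only `??<-`, `cascalog.api` and `cascalog.ops/count` come from Cascalog, and running this assumes Cascalog and its Hadoop dependencies are on the classpath.

```clojure
(ns cascalog-basics-sketch
  (:use [cascalog.api])
  (:require [cascalog.ops :as c]))

;; Hypothetical in-memory generators standing in for taps.
(def people  [["alice" 28] ["bob" 33] ["carol" 41]])
(def tickets [["alice" "open"] ["alice" "solved"] ["bob" "open"]])

(defn under-35
  "Filter: a predicate with no output variables keeps only matching tuples."
  []
  (??<- [?name ?age]
        (people ?name ?age)
        (< ?age 35)))

(defn tickets-per-person
  "Group by + aggregation: ?name is the grouping key, c/count the aggregator."
  []
  (??<- [?name ?count]
        (tickets ?name _)
        (c/count ?count)))

(defn people-with-status
  "Join: using ?name in both generators joins them on that field."
  []
  (??<- [?name ?age ?status]
        (people ?name ?age)
        (tickets ?name ?status)))

(defn next-birthday
  "Function call: an ordinary Clojure function applied with :> for its output."
  []
  (??<- [?name ?next-age]
        (people ?name ?age)
        (inc ?age :> ?next-age)))
```

`??<-` runs the query and returns the tuples to the caller, which suits a notebook or REPL session; `?<-` with a sink such as `(stdout)` or `(hfs-delimited out)` writes them out instead.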

Page 13: Zendesk @ clj-melb

(def review-scores (repeatedly 5000 rand))

(defn grab-score [x]
  {:score [x]})

; BAD - stack overflow
(def combine-score (partial merge-with concat))
; BETTER - no stack overflow, but wait for GC
(def combine-score (partial merge-with (comp doall concat)))
; BEST - snappy fast
(def combine-score (partial merge-with into))

(defparallelagg bucket-scores
  :init-var #'grab-score
  :combine-var #'combine-score)

(defn median-scores [bucketed-scores]
  {:median-score (median (:score bucketed-scores))})

(??<- [?median-score]
      (review-scores :> ?score)
      (bucket-scores :< ?score :> ?bucketed-scores)
      (median-scores :< ?bucketed-scores :> ?median-score))

Learnings

Lazy sequences are not always your friend
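
The effect can be reproduced outside Cascalog. The sketch below is plain Clojure with hypothetical names (`partials`, `combine-with`) and an arbitrary size of 100000. Repeatedly merging with `concat` builds one lazy sequence nested inside another for every merge, and nothing is evaluated until the result is read; reading it then has to unwind the whole nesting at once. `into` appends to a vector eagerly, so there is nothing deferred.

```clojure
(ns lazy-concat-sketch)

;; One single-element map per score, like the output of grab-score.
(def partials (map (fn [x] {:score [x]}) (range 100000)))

(defn combine-with
  "Folds all partial maps together, combining :score values with f."
  [f]
  (reduce (partial merge-with f) partials))

;; Lazy: this returns quickly, but :score is 100000 nested concat calls.
;; Realising it, e.g. (count (:score (combine-with concat))),
;; typically throws StackOverflowError on a default JVM stack.

;; Eager: each merge appends to a persistent vector.
(count (:score (combine-with into)))
;; => 100000
```

Wrapping `concat` in `doall` also avoids the deep nesting, but it walks the whole accumulated sequence on every merge, which is consistent with the "wait for GC" remark on the slide.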

Page 14: Zendesk @ clj-melb

Midje for Testing. And why it’s good

Page 15: Zendesk @ clj-melb

The Result

Page 16: Zendesk @ clj-melb

Bonus!

Clojure from Python (for prettier graphs)