Open Analytics Meetup: Alex Poon

Post on 07-Jul-2015


Alex Poon, VP of Engineering

Storm @ Visual Revenue (an Outbrain Company)

Who are we?

What do we do?

[Pipeline diagram. Stages: Customer Traffic, Web Servers, Data Transform/Aggregation, Databases, Dashboard, Algo, Automation. Components shown: Kafka, Storm.]

•  14B page views per month

•  At peak, 8000-10000 per sec

•  Deployed Storm to production ~ 1 month ago

•  Storm cluster of ~50 instances on AWS

Before Storm

•  Built our own distributed data processing

•  ZMQ

•  Batch based process

•  Processing partitioned by hashing on customer

•  Advantages

•  Simple in-house system built from very basic components

•  Well understood

•  Disadvantages

•  Hard to scale, a constant battle to keep up with pings

•  Machine management was clumsy

•  Uneven distribution of traffic

•  Multiple processes doing similar work, wasting resources
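The uneven traffic distribution noted above is a common side effect of routing work by a hash of the customer. The following is a minimal sketch, not the actual in-house system: the customer names, ping counts, and worker count are all hypothetical, and MD5 is only an assumed choice of stable hash.

```python
import hashlib
from collections import Counter


def worker_for(customer_id: str, num_workers: int) -> int:
    """Map a customer to a worker with a stable hash (not Python's salted hash())."""
    digest = hashlib.md5(customer_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_workers


def load_per_worker(pings, num_workers):
    """Count how many pings each worker receives when routing by customer."""
    load = Counter({w: 0 for w in range(num_workers)})
    for customer_id in pings:
        load[worker_for(customer_id, num_workers)] += 1
    return load


# Hypothetical traffic: one large customer and ten small ones.
pings = ["big-news-site"] * 9000 + [f"small-site-{i}" for i in range(10)] * 100
load = load_per_worker(pings, num_workers=4)
print(sorted(load.items()))
```

Because every ping from one customer lands on the same worker, the worker that owns the largest customer carries at least 90% of this sample load no matter how many workers are added.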

Why Kafka/Storm?

•  Kafka

•  Open-source, distributed publish-subscribe messaging system

•  Storm

•  Open-source, real-time computation system for continuous computation

•  They are awesome

•  Distributed, highly scalable, and fault tolerant

•  High throughput

•  Reliable

•  Real-time

•  Great at in-memory analytics, and real-time decision support

Data Aggregation

[Topology diagram. Nodes and their intervals, in slide order: URL (15s), Aggregate (15s), Customer (15s), Front Page (15s), Position (5m), Arrangement (15s), Tweet (5m), Aggregate (15s), @Handle. Legend: Spout, Bolt.]
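The 15s and 5m labels in the diagram suggest fixed-interval aggregation. The sketch below shows one way such counting could work in plain Python, outside Storm. It assumes tumbling (non-overlapping) windows, which the slides do not confirm, and the event tuples and URLs are hypothetical.

```python
from collections import defaultdict

WINDOW_SECONDS = 15  # matches the 15s label; use 300 for the 5m nodes


def window_start(timestamp: float, window: int = WINDOW_SECONDS) -> int:
    """Return the start of the tumbling window that contains the timestamp."""
    return int(timestamp // window) * window


def aggregate(events, window: int = WINDOW_SECONDS):
    """Count page views per (window start, url) pair."""
    counts = defaultdict(int)
    for timestamp, url in events:
        counts[(window_start(timestamp, window), url)] += 1
    return dict(counts)


# Hypothetical page-view events as (seconds, url) pairs.
events = [(0.5, "/a"), (3.0, "/a"), (14.9, "/b"),
          (15.0, "/a"), (29.0, "/a"), (31.0, "/b")]
result = aggregate(events)
for key in sorted(result):
    print(key, result[key])
```

In a Storm topology the same grouping would typically live inside a bolt that emits its counts when a window closes. Here everything is computed in one pass to keep the example self-contained.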

Learning / Ideas

1. Kafka + ZooKeeper is extremely scalable and easy to set up. Check out the Brod library if you are doing Python

2. Use the Storm UI (Ganglia based) to monitor your cluster

3. Shell Bolts were inefficient and hard to debug (at least for us)

4. Upgrade to at least Storm 0.8.2, which adds capacity metrics on top of other goodies

5. Storm’s anchoring/replay capability is awesome but comes with a visible overhead

6. Use a good framework to manage your cluster; we use Salt Stack

7. Our unit tests are written in JUnit. Most built-in unit tests for Storm are only available in Clojure for now
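On point 5, it may help to see roughly what anchoring tracks. Storm's documentation describes an acker that keeps one 64-bit value per spout tuple and XORs in the id of every tuple that is emitted or acked. When the value returns to zero, the whole tuple tree has been processed. The sketch below is a simplified illustration of that idea, not Storm's code: the `AckTracker` class and the fixed ids are hypothetical (Storm uses random 64-bit ids).

```python
class AckTracker:
    """Tracks one spout tuple's tree with a single XOR-ed value."""

    def __init__(self):
        self.ack_val = 0
        self.started = False

    def emitted(self, tuple_id: int) -> None:
        """Record a tuple that was emitted anchored to this tree."""
        self.ack_val ^= tuple_id
        self.started = True

    def acked(self, tuple_id: int) -> None:
        """Record that a tuple in this tree was acknowledged."""
        self.ack_val ^= tuple_id

    def fully_processed(self) -> bool:
        """True once every emitted tuple has also been acked."""
        return self.started and self.ack_val == 0


# Hypothetical fixed ids for readability.
ROOT, CHILD_A, CHILD_B = 0x9F3A, 0x51C7, 0xE2B4

tracker = AckTracker()
tracker.emitted(ROOT)        # spout emits the root tuple
tracker.emitted(CHILD_A)     # a bolt emits two children anchored to the root
tracker.emitted(CHILD_B)
tracker.acked(ROOT)          # then acks the root
midway = tracker.fully_processed()

tracker.acked(CHILD_A)       # downstream bolts ack the children
tracker.acked(CHILD_B)
print(midway, tracker.fully_processed())
```

Every tuple in the tree has to be reported to the tracker at emit and ack time, which is one plausible source of the overhead mentioned above. If a tree is not fully acked within the timeout, the spout tuple is replayed.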

Thank You

Alex Poon

@alexpoon06 @Outbrain

Yes, it is true. We are Hiring!!

www.visualrevenue.com/jobs
