Creating Added Value with Big Data

by Klaas Bosteels (@klbostee)

Uploaded by klaas-bosteels on 24 May 2015


DESCRIPTION

This talk essentially tells the story of the data science team at Massive Media, the company behind Netlog.com and Twoo.com. After gaining invaluable first-hand experience with big data as a member of the information retrieval team at the music discovery website Last.fm, I joined Massive Media to conceive, build, and lead a brand-new big data and data science team. In doing so, I developed a clear perspective on how to introduce big data within a company and create added value from it, which is precisely what I would like to share in this talk.

TRANSCRIPT

Page 1: Creating Added Value with Big Data

CREATING ADDED VALUE WITH BIG DATA

by KLAAS BOSTEELS @klbostee

Page 2

MY CAREER PATH SO FAR

2007: Began working with big data as a PhD student

2009: Embarked on a data science career at Last.fm

• Data company at heart: one of the earliest Hadoop adopters worldwide, inventors of Ketama, and organisers of the first “NoSQL” meetup in San Francisco.

2011: Joined Massive Media as Lead Data Scientist

• Huge audience and tremendous potential, but a data science newcomer at the time.

Page 3

Second big product of Massive Media, after Netlog

2011: Initial launch of Twoo.com

2012: Biggest dating site worldwide according to comScore

2013: Massive Media acquired by InterActiveCorp

Page 4

IT’S A BIG FAMILY

IAC’s main personals brands:

Some other well-known IAC brands:

Page 6

BOOTSTRAP BY SAVING OR GAINING MONEY

You need some capital to get started

Saving money tends to be easier in practice

Real-world example:

• Analyzing CDN logs revealed abuse

• Stopping the abuse greatly reduced the bills
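Here is a minimal sketch of this kind of log analysis. It adds up the bytes served per referring domain and flags any referrer that accounts for an outsized share of the traffic. The talk does not say what form the abuse took. The log format (`timestamp referrer_domain bytes`), the sample lines, and the 50% threshold are all hypothetical.

```python
from collections import Counter


def bytes_per_referrer(lines):
    """Sum bytes served per referring domain from CDN log lines."""
    totals = Counter()
    for line in lines:
        # Hypothetical format: "<timestamp> <referrer_domain> <bytes>"
        _timestamp, referrer, size = line.split()
        totals[referrer] += int(size)
    return totals


def flag_suspicious(totals, share_threshold=0.5):
    """Return referrers whose share of all bytes served exceeds the threshold."""
    grand_total = sum(totals.values())
    return sorted(
        ref for ref, served in totals.items()
        if grand_total and served / grand_total > share_threshold
    )


if __name__ == "__main__":
    sample = [
        "2012-03-01T10:00 site.example 2000",
        "2012-03-01T10:01 hotlinker.example 9000",
        "2012-03-01T10:02 site.example 1000",
    ]
    print(flag_suspicious(bytes_per_referrer(sample)))  # ['hotlinker.example']
```

At CDN scale, the same aggregation would run as a distributed batch job rather than a single loop.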

Page 8

HADOOP

Not the holy grail, but deserves a central role

It has a vibrant community and is proven to be:

ECONOMICAL: runs on commodity hardware

SCALABLE: smart distributed processing

MAINTAINABLE: very robust and fault-tolerant

FLEXIBLE: predefined schemas not required
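Hadoop's "smart distributed processing" is based on the MapReduce programming model, which has three phases:

- A mapper turns each record into key/value pairs, independently of all other records.
- The framework groups those pairs by key. This is the shuffle.
- A reducer aggregates each group.

The single-process Python sketch below only simulates these phases to show the shape of the model. It is not Hadoop's API, and the `mapper`, `reducer`, and log format are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def run_mapreduce(records, mapper, reducer):
    """Simulate map, shuffle (sort + group by key) and reduce in one process."""
    # Map: each record is processed independently, which is what allows
    # the work to be spread over many commodity machines.
    pairs = [pair for record in records for pair in mapper(record)]
    # Shuffle: bring all values that share a key together.
    pairs.sort(key=itemgetter(0))
    # Reduce: aggregate the values of each key.
    return {
        key: reducer(key, [value for _, value in group])
        for key, group in groupby(pairs, key=itemgetter(0))
    }


def mapper(line):
    # Hypothetical log line: "<date> <user_id> <event>"
    date, _user, event = line.split()
    yield (date, event), 1


def reducer(_key, counts):
    return sum(counts)


if __name__ == "__main__":
    logs = ["2013-01-01 u1 login", "2013-01-01 u2 login", "2013-01-02 u1 message"]
    print(run_mapreduce(logs, mapper, reducer))
    # {('2013-01-01', 'login'): 2, ('2013-01-02', 'message'): 1}
```

No schema is declared up front: the mapper decides how to interpret each raw line, which is the flexibility the slide refers to.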

Page 9

STEP 3

BUILD DASHBOARDS

photo by Dawn Hopkins

Page 10

STATS PIPELINE BASED ON HADOOP

[Diagram: log collector, HDFS, MapReduce, HBase and dashboards; flows labelled "in batches" and "continuous"]

Page 11

STATS PIPELINE BASED ON HADOOP

[Same diagram, extended with a realtime processing path]

Cf. the “lambda architecture”, a term coined by @nathanmarz
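The lambda architecture has three parts:

- A batch layer periodically recomputes views from the complete, immutable dataset.
- A speed layer keeps incremental views for data that arrived after the last batch run.
- Queries merge the batch and realtime views.

The toy Python class below illustrates that idea for simple event counters. The class and method names are hypothetical. In a Hadoop-based pipeline, the batch view would be computed by MapReduce jobs rather than in memory.

```python
from collections import Counter


class LambdaCounter:
    """Toy lambda architecture that counts events per key."""

    def __init__(self):
        self.master_dataset = []         # append-only log of all events
        self.batch_view = Counter()      # computed from events up to the last batch run
        self.realtime_view = Counter()   # incremental counts for events since then

    def ingest(self, key):
        # Every event is stored in the master dataset and updates the speed layer.
        self.master_dataset.append(key)
        self.realtime_view[key] += 1

    def run_batch(self):
        # Batch layer: recompute the view from scratch over all events, then
        # drop the realtime increments, which the new batch view now covers.
        self.batch_view = Counter(self.master_dataset)
        self.realtime_view = Counter()

    def query(self, key):
        # Serving: merge the batch view with the realtime view.
        return self.batch_view[key] + self.realtime_view[key]


if __name__ == "__main__":
    stats = LambdaCounter()
    for event in ["signup", "signup", "message"]:
        stats.ingest(event)
    stats.run_batch()
    stats.ingest("signup")          # arrives after the batch run
    print(stats.query("signup"))    # 3
```

The sketch runs the batch step instantaneously. A real batch run takes hours, so the speed layer may only discard the increments that the finished run actually covered.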

Page 12

STATS PIPELINE BASED ON HADOOP

[Same diagram, extended with ad-hoc results alongside the realtime processing path]

Page 13

CUSTOM-TAILORED WEB INTERFACE

Annotation & exporting functionality

Supports A/B testing and cohort analysis

Various other nifty extras
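The interface supports two kinds of analysis, sketched below in Python:

- A two-proportion z-test, one common way to check whether an A/B test variant converts differently from its control.
- A retention table that groups users into cohorts by signup week.

The function names, input shapes, and numbers are hypothetical. The talk does not say which statistical test the dashboards use.

```python
import math
from collections import defaultdict


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: variants A and B convert equally."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled conversion rate under the null hypothesis.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def cohort_retention(signup_week, activity):
    """Share of each signup-week cohort that is active N weeks after signup.

    signup_week: {user_id: week_number}
    activity: iterable of (user_id, week_number) records
    Returns {(cohort_week, weeks_since_signup): fraction_active}.
    """
    cohort_sizes = defaultdict(int)
    for week in signup_week.values():
        cohort_sizes[week] += 1
    active = defaultdict(set)
    for user, week in activity:
        cohort = signup_week.get(user)
        # Ignore unknown users and activity from before signup.
        if cohort is not None and week >= cohort:
            active[(cohort, week - cohort)].add(user)
    return {
        key: len(users) / cohort_sizes[key[0]]
        for key, users in sorted(active.items())
    }


if __name__ == "__main__":
    z, p = two_proportion_z_test(200, 2000, 250, 2000)
    print(f"z = {z:.2f}, p = {p:.4f}")
    signups = {"u1": 0, "u2": 0, "u3": 1}
    activity = [("u1", 0), ("u2", 0), ("u1", 1), ("u3", 1), ("u3", 2)]
    print(cohort_retention(signups, activity))
    # {(0, 0): 1.0, (0, 1): 0.5, (1, 0): 1.0, (1, 1): 1.0}
```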

Page 14

STEP 4

ASSEMBLE A TEAM

photo by Jean-François Schmitz

Page 15

THE SECRET IS IN THE MIX

Hadoop’s tricks also apply to data science teams

• Avoid specialisation to allow easy distribution and scaling

• Exploit data locality by hiring people with a wide skill set

Great Data Scientists have the right mix of skills

• Hackers with a solid technical background

• An analytical mind versed in statistics and machine learning

• Clever and creative in everything they do

Page 17

STEP 5

EXPLORE & INNOVATE

photo by NASAr

Page 18

SOME TIPS AND TRICKS

Dare to fail and/or start from estimates

Introduce data exploration/innovation days

• Essentially “20% time” devoted to playing with data

• Incorporate collaborative brainstorming

• Goal is to find promising new projects to work on

Communicate findings to the rest of the company

• Fun and silliness are allowed

• Prototype early and often

Page 19

PRODUCT INSIGHTS & EXTENSIONS

E.g., recommendations and activity pattern analysis
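Item-to-item co-occurrence counting is one simple way to produce recommendations. Items that often appear together in users' histories are recommended to users who have one but not the other. This generic technique is chosen only for illustration and is not necessarily what Massive Media used. The data and function names below are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def cooccurrence_recommender(histories):
    """Build item -> Counter(other item -> number of users having both)."""
    co_counts = defaultdict(Counter)
    for items in histories.values():
        for a, b in permutations(set(items), 2):
            co_counts[a][b] += 1
    return co_counts


def recommend(user, histories, co_counts, k=3):
    """Score unseen items by how often they co-occur with the user's items."""
    seen = set(histories[user])
    scores = Counter()
    for item in seen:
        for other, count in co_counts.get(item, {}).items():
            if other not in seen:
                scores[other] += count
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


if __name__ == "__main__":
    histories = {
        "alice": ["jazz", "blues"],
        "bob": ["jazz", "blues", "soul"],
        "carol": ["jazz", "soul"],
        "dave": ["blues"],
    }
    co = cooccurrence_recommender(histories)
    print(recommend("dave", histories, co))  # ['jazz', 'soul']
```

Counting pairs per user history maps naturally onto a MapReduce job: each history is mapped to pairs, and the counts are summed per pair.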

Page 21

FIVE SIMPLE STEPS IS ALL IT TAKES

1. FOLLOW THE MONEY
2. EMBRACE HADOOP
3. BUILD DASHBOARDS
4. ASSEMBLE A TEAM
5. EXPLORE & INNOVATE

Page 22


Thanks! Questions?