Winning with Big Data: Secrets of the Successful Data Scientist

WINNING WITH BIG DATA: Secrets of the Successful Data Scientist. Michael Driscoll (@dataspora), SDForum BI SIG, June 15, 2010.

Upload: dataspora

Post on 06-Dec-2014

DESCRIPTION

The world is experiencing an Industrial Revolution of Data. In any given minute the machines around us are tracking billions of mouse clicks, credit card swipes, and GPS coordinates. And increasingly this data is being saved, aggregated, and analyzed. These massive data flows present big challenges to firms, but also new opportunities for deriving insights. Presented at the June 2010 gathering of the Bay Area's Business Intelligence Special Interest Group.

TRANSCRIPT

Page 1: Winning With Big Data:  Secrets of the Successful Data Scientist

WINNING WITH BIG DATA

Secrets of the Successful Data Scientist

Michael Driscoll, @dataspora

SDForum BI SIG, June 15, 2010

Page 2

WHY DATA MATTERS NOW

Page 3

THE INDUSTRIAL AGE OF DATA

Page 4

WHAT IS BIG DATA?

Data that is distributed.

class | size | manage with | how it fits | examples
small | < 10 GB | Excel, R | fits in one machine’s memory | thousands of sales figures
medium | 10 GB-1 TB | indexed files, monolithic DB | fits on one machine’s disk | millions of web pages
big | > 1 TB | Hadoop, distributed DBs | stored across many machines | billions of web clicks
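
The size classes on this slide can be written down as a small lookup. The sketch below is only an illustration in Python: the function name `size_class`, the use of decimal units (1 GB = 10^9 bytes), and the choice to count exactly 1 TB as medium are assumptions, not something the slide specifies.

```python
GB = 10**9   # assumption: decimal gigabytes
TB = 10**12  # assumption: decimal terabytes

# Tools listed on the slide for each class.
TOOLS = {
    "small": "Excel, R",
    "medium": "indexed files, monolithic DB",
    "big": "Hadoop, distributed DBs",
}

def size_class(num_bytes):
    """Return the slide's size class for a dataset of num_bytes bytes."""
    if num_bytes < 10 * GB:
        return "small"
    if num_bytes <= 1 * TB:
        return "medium"
    return "big"

for n in (2 * GB, 300 * GB, 5 * TB):
    cls = size_class(n)
    print(f"{n} bytes: {cls}, manage with {TOOLS[cls]}")
```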

Page 5

WHAT IS DATA SCIENCE?

Page 6

WHY DATA SCIENCE IS SEXY

Page 7

“The sexy job in the next ten years will be statisticians…” - Hal Varian

Page 8

Page 9

data: 1000 bytes

model: 2 bytes

Page 10

9 WAYS TO WIN WITH DATA

Page 11

1. CHOOSE THE RIGHT TOOL

You don’t need a chainsaw to cut butter.

Page 12

2. COMPRESS EVERYTHING

The world is IO-bound.

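# Dump, compress, ship over ssh, then decompress and load on the target.
# -p takes the password with no space; "-p mypass" would read mypass as a database name.
mysqldump -u myuser -pmypass sourceDB | gzip | \
  ssh [email protected] "gunzip | mysql -u myuser -pmypass targetDB"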
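
To get a feel for why compressing before moving data pays off, the following Python sketch measures how much gzip shrinks a block of repetitive, log-like rows. The rows are synthetic and the helper name `compression_ratio` is hypothetical; the ratio on real data will differ.

```python
import gzip

def compression_ratio(raw: bytes) -> float:
    """Size of the raw bytes divided by their gzip-compressed size."""
    return len(raw) / len(gzip.compress(raw))

# Synthetic click-log rows (made up for illustration only).
rows = "".join(f"{i},2010-06-15,click,/home\n" for i in range(10_000)).encode()

ratio = compression_ratio(rows)
print(f"{len(rows)} bytes raw, about {ratio:.1f}x smaller after gzip")
```

Fewer bytes on the wire or on disk means less time waiting on IO, at the cost of some CPU.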

Page 13

3. SPLIT UP YOUR DATA

Split, apply, combine.
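
A minimal Python sketch of the split, apply, combine pattern: split records into groups by a key, apply a function to each group, and combine the per-group results into one mapping. The function name, the field names, and the sales figures are all made up for illustration.

```python
from collections import defaultdict

def split_apply_combine(records, key, value, apply):
    """Group records by `key`, then apply `apply` to each group's values."""
    groups = defaultdict(list)
    for record in records:                     # split
        groups[record[key]].append(record[value])
    return {k: apply(v) for k, v in groups.items()}  # apply + combine

# Hypothetical sales records.
sales = [
    {"region": "west", "amount": 120},
    {"region": "east", "amount": 80},
    {"region": "west", "amount": 30},
    {"region": "east", "amount": 20},
]

totals = split_apply_combine(sales, "region", "amount", sum)
print(totals)  # {'west': 150, 'east': 100}
```

Because each group is processed independently, the apply step is the part that can be spread across cores or machines.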

Page 14

4. WORK WITH SAMPLES

Big Data is heavy, samples are light.

perl -ne "print if (rand() < 0.01)" data.csv > sample.csv
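
The same idea as the Perl one-liner, sketched in Python: keep each line independently with a fixed probability, so the input is streamed once and never held in memory. The function name `sample_lines` and the `seed` parameter (added so a sample can be reproduced) are assumptions for illustration.

```python
import random

def sample_lines(lines, rate=0.01, seed=None):
    """Yield each line independently with probability `rate`."""
    rng = random.Random(seed)
    for line in lines:
        if rng.random() < rate:
            yield line

# Synthetic input standing in for data.csv.
lines = [f"{i},value\n" for i in range(100_000)]
sample = list(sample_lines(lines, rate=0.01, seed=42))
print(f"kept {len(sample)} of {len(lines)} lines")
```

With a real file, `sample_lines(open("data.csv"))` would stream it line by line. The sample size is random, roughly 1% of the input rather than exactly 1%.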

Page 15

5. USE STATISTICS

Page 16

6. COPY FROM OTHERS

Use open source.

git clone git://github.com/kevinweil/hadoop-lzo

Page 17

7. ESCHEW CHART TYPOLOGIES

Charts are compositions, not containers.

Page 18

8. COLOR WITH CARE

Color can enhance or insult.

Page 19

9. TELL A STORY

People are listening.

Page 20

ONE SUCCESS STORY

Page 21

WHY DO TELCO CUSTOMERS LEAVE?

Sign up Leave

Goal: “less churn.”

Page 22

DATA: BILLIONS OF CALLS

… and millions of callers.

Page 23

DOES CALL QUALITY MATTER?

… a difference, but not significant.

Page 24

WHAT ABOUT SOCIAL NETWORKS?

Hmmm...

Page 25

BUILD THE CALL GRAPH

… but is it predictive?
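
One way to build a call graph from call records, sketched in Python. The slides do not say how the graph was actually constructed, so everything here is an assumption: records are taken to be (caller, callee) pairs, edges are undirected and unweighted, and the names are invented.

```python
from collections import defaultdict

def build_call_graph(calls):
    """Map each customer to the set of customers they exchanged calls with."""
    graph = defaultdict(set)
    for caller, callee in calls:
        if caller != callee:          # ignore self-calls
            graph[caller].add(callee)
            graph[callee].add(caller)
    return dict(graph)

# Hypothetical call records; repeated calls collapse into one edge.
calls = [
    ("alice", "bob"),
    ("bob", "carol"),
    ("alice", "bob"),
    ("dave", "alice"),
]

graph = build_call_graph(calls)
print(sorted(graph["alice"]))  # ['bob', 'dave']
```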

Page 26

EVOLUTION OF A CALL GRAPH

April

Page 27

EVOLUTION OF A CALL GRAPH

May

Page 28

EVOLUTION OF A CALL GRAPH

June

Page 29

EVOLUTION OF A CALL GRAPH

July

Page 30

700% INCREASE IN CHURN

when a cancellation occurs in a call network.
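
A comparison of this kind could be computed roughly as follows. This is a Python sketch under stated assumptions, not the analysis behind the slide: the function name, the two-period setup, and the toy graph are hypothetical, and the toy numbers are not meant to reproduce the 700% figure.

```python
def churn_by_exposure(graph, cancelled_earlier, cancelled_later):
    """Churn rate in the later period, split by whether a customer
    had a contact who cancelled in the earlier period."""
    counts = {True: [0, 0], False: [0, 0]}  # exposed -> [churned, total]
    for customer, contacts in graph.items():
        if customer in cancelled_earlier:
            continue  # already gone, so cannot churn later
        exposed = bool(contacts & cancelled_earlier)
        counts[exposed][0] += customer in cancelled_later
        counts[exposed][1] += 1
    return {e: (c / n if n else None) for e, (c, n) in counts.items()}

# Toy call graph (hypothetical): customer -> set of contacts.
graph = {
    "a": {"b", "c"}, "b": {"a"}, "c": {"a"},
    "d": {"e"}, "e": {"d"},
    "f": {"g"}, "g": {"f"},
}
rates = churn_by_exposure(graph, cancelled_earlier={"a"}, cancelled_later={"b", "d"})
print(rates)  # {True: 0.5, False: 0.25}
```

In the toy data, customers with a cancelled contact churn at twice the rate of the others; a gap like that is what would make the call graph predictive.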

Page 31

FINAL THOUGHTS

Page 32

THE BIG DATA STACK

Big Data Dedicated RDBMS

Analytics (R, SPSS, SAS, SAP)

Data Products (Content Filters, Rec Engines)

Data, Actions, Insights

Page 33

THANKS! QUESTIONS?

Michael [email protected]

@dataspora on Twitter

http://www.dataspora.com/blog

SDForum BI SIG, June 15, 2010