the rise of big data science

GILAD BARKAN The Rise of Big Data Science

Upload: gilad-barkan

Post on 27-Jan-2015

DESCRIPTION

This is an introductory lecture on one of today's buzziest technology domains. The domain encompasses many new concepts, keywords, theories and paradigm shifts, from computer science to business.

TRANSCRIPT

Page 1: The Rise of Big Data Science

GILAD BARKAN

The Rise of Big Data Science

Page 2: The Rise of Big Data Science

Big Data Science

Big Data

Data Science

Big Data

Science

Page 3: The Rise of Big Data Science

Big Data

Why ? What ? How ?

Page 5: The Rise of Big Data Science

Why Big Data ?

We live in an era flooded with information

In a world where data is power, big data is big power

Page 6: The Rise of Big Data Science

Why Big Data ?

Web 2.0

Page 7: The Rise of Big Data Science

Why should we care about Big Data ?

The big business opportunities

Competitive, fast-moving marketplace

Capitalize on business opportunities before everyone else

Existing channels to every person on the planet

Maximizing revenues from customers

Segment-of-1: more personal customer experiences

Page 8: The Rise of Big Data Science

Big Data

Why ? What ? How ?

Page 9: The Rise of Big Data Science

What is Big Data ?

Volume

Variety

Velocity

The 3 V’s

Page 11: The Rise of Big Data Science

Big Data - Volume

Page 12: The Rise of Big Data Science

Big Data - Volume

Big Users: More Users, All the Time

Global Online Population: 2 Billion

Smartphone Users: 1 Billion+

Hours Spent Online: 35 Billion Hours

Page 13: The Rise of Big Data Science

Big Data+

More Data

More Users

Page 14: The Rise of Big Data Science

What is Big Data ?

Volume

Variety

Velocity

The 3 V’s

Page 15: The Rise of Big Data Science

Big Data - Variety

Heterogeneous sources of data: structured and unstructured

Text, Log Files, Click Streams, Blogs, Tweets, Audio, Video, etc.

Trillions of Gigabytes (Zettabytes)

Structured Data (traditional SQL)

tables: 5 KB / record

Un/Semi-Structured Data (NoSQL)

text: 50 KB / record

images: 1000 KB / image

audio: 5000 KB / song

video: 700 MB / movie

Page 16: The Rise of Big Data Science

What is Big Data ?

Volume

Variety

Velocity

The 3 V’s

Page 17: The Rise of Big Data Science

Big Data - Velocity

How the hell does Google return an answer in 0.28 seconds by looking at 4 Billion pages?

Page 18: The Rise of Big Data Science

Big Data - Velocity

Online Advertisement - Real Time Bidding (RTB)

Page 19: The Rise of Big Data Science

Big Data - Velocity

Recommendations

Page 20: The Rise of Big Data Science

Big Data

Why ? What ? How ?

Page 21: The Rise of Big Data Science

How is Big Data Handled ?

The challenge is huge

Store, analyze and serve a huge volume and variety of data at high velocity

We can’t achieve this using a single machine, no matter how strong it is. Why?

Expensive (stay tuned)

Load balancing requests

Outbrain serves 3,000 requests per second

DG (MediaMind) serves 500K requests per second!!!

Not fault tolerant

Page 22: The Rise of Big Data Science

Distributing the Data

The Big Data Paradigm Shifts

Volume

Scale Up (Vertical): SQL Server

Scale Out (Horizontal): Hadoop Cluster of Nodes, HDFS (GFS)

Page 23: The Rise of Big Data Science

Big Data - Reducing Costs

Hadoop infrastructure is 5 times cheaper!!!

TCO (purchase + maintenance) for 3 years per 300 TB:

75-node cluster = 1 M$

DBMS server = 5 M$
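
The arithmetic behind the "5 times cheaper" claim can be checked with a short sketch. It uses only the figures quoted on the slide (1 M$ and 5 M$ for 300 TB over 3 years, 75 nodes); the function and constant names are illustrative, not part of the lecture.

```python
# Figures quoted on the slide: 3-year TCO (purchase + maintenance) for 300 TB.
CAPACITY_TB = 300
HADOOP_NODES = 75
HADOOP_TCO_USD = 1_000_000  # 75-node Hadoop cluster
DBMS_TCO_USD = 5_000_000    # DBMS server


def cost_per_tb(tco_usd: float, capacity_tb: float) -> float:
    """Total cost of ownership divided by the stored capacity."""
    return tco_usd / capacity_tb


hadoop_per_tb = cost_per_tb(HADOOP_TCO_USD, CAPACITY_TB)
dbms_per_tb = cost_per_tb(DBMS_TCO_USD, CAPACITY_TB)
ratio = dbms_per_tb / hadoop_per_tb

print(f"Hadoop: {hadoop_per_tb:,.0f} $/TB, DBMS: {dbms_per_tb:,.0f} $/TB")
print(f"DBMS costs {ratio:.1f}x as much per TB")
print(f"Storage per Hadoop node: {CAPACITY_TB / HADOOP_NODES:.0f} TB")
```

Under these numbers each Hadoop node holds about 4 TB, which is the scale-out idea in miniature: many modest machines instead of one large one.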

Page 24: The Rise of Big Data Science

Big Data Paradigm Shift - Computing

MapReduce Computing Paradigm

Exploiting the distributed architecture for large scale computations in parallel

Page 25: The Rise of Big Data Science

MapReduce

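“Hello MapReduce”: counting words

Hadoop Cluster: Master, Mappers, Reducer

Each Mapper counts the words (W) in one URL (URL 1, URL 2, URL 3) and emits {w, c} pairs (word, count C):

Mapper output: the 5, Cow 0, quick 2

Mapper output: the 7, Cow 1, quick 0

Mapper output: the 9, Cow 1, quick 3

Reducer output (sum per word): the 21, Cow 2, quick 5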
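
The word-count flow on this slide can be sketched in a few lines of Python. This is a single-process illustration of the map, shuffle and reduce steps, not Hadoop code; the function names are assumptions made for the example, and only the per-mapper counts come from the slide.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, int]


def map_words(document: str) -> List[Pair]:
    """Mapper: emit one (word, count) pair per distinct word in a document."""
    return list(Counter(document.lower().split()).items())


def shuffle(pairs: Iterable[Pair]) -> Dict[str, List[int]]:
    """Group the counts emitted by all mappers by word."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_counts(word: str, counts: List[int]) -> Pair:
    """Reducer: sum the partial counts for one word."""
    return word, sum(counts)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence (a cluster runs mappers in parallel)."""
    mapped = [pair for doc in documents for pair in map_words(doc)]
    return dict(reduce_counts(w, c) for w, c in shuffle(mapped).items())


# Per-mapper counts taken from the slide's three tables.
slide_pairs = [
    ("the", 5), ("cow", 0), ("quick", 2),
    ("the", 7), ("cow", 1), ("quick", 0),
    ("the", 9), ("cow", 1), ("quick", 3),
]
totals = dict(reduce_counts(w, c) for w, c in shuffle(slide_pairs).items())
print(totals)  # {'the': 21, 'cow': 2, 'quick': 5}
```

Because each mapper only needs its own document and the reducer only needs the small (word, count) pairs, the heavy work stays on the nodes that hold the data.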

Page 26: The Rise of Big Data Science

Big Data Paradigm Shift - NoSQL

Variety

Schema-less databases to support the variety of data

Complex SQL queries (joins, etc.) in a distributed data framework are extremely inefficient

Key-Value Store NoSQL

Key: user_id, url, image_id, video_id (any key, not a single primary key as in SQL)

Value: tables, text, images, video (any value)
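
A key-value store of the kind described on this slide can be sketched as follows. This is a toy in-memory illustration, assuming a fixed number of nodes and hash-based placement; the class name, the node count and the sample keys and values are hypothetical and do not correspond to any specific NoSQL product.

```python
import hashlib
from typing import Any, Dict, List


class ShardedKeyValueStore:
    """Toy schema-less store: any string key maps to any value, spread over nodes."""

    def __init__(self, num_nodes: int = 3) -> None:
        # Each "node" is just a dict here; in a cluster it would be a machine.
        self._nodes: List[Dict[str, Any]] = [{} for _ in range(num_nodes)]

    def _node_for(self, key: str) -> Dict[str, Any]:
        # A stable hash decides which node owns the key (horizontal scale-out).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self._nodes[int.from_bytes(digest[:8], "big") % len(self._nodes)]

    def put(self, key: str, value: Any) -> None:
        self._node_for(key)[key] = value

    def get(self, key: str, default: Any = None) -> Any:
        return self._node_for(key).get(key, default)

    def size(self) -> int:
        return sum(len(node) for node in self._nodes)


store = ShardedKeyValueStore(num_nodes=3)
store.put("user_id:42", {"name": "Dana", "segment": "premium"})  # table-like record
store.put("url:example.com/home", "<html>...</html>")            # text
store.put("image_id:7", b"\x89PNG...")                           # binary blob

print(store.get("user_id:42"))
print(store.size())  # 3
```

No schema is declared anywhere: the values are a dict, a string and raw bytes, and lookups never need a join, which is what makes this model easy to distribute.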

Page 27: The Rise of Big Data Science

Big Data Paradigm Shift –

Velocity

RAM-based DBs instead of traditional disk-based DBs

Store critical data in memory (much more expensive)

If the data doesn't come to the Alg, the Alg will come to the data

Traditional: the Alg is separate from the Data (Read / Write between them)

Today: the Alg sits with the Data (Read / Write in place)

Page 28: The Rise of Big Data Science

Big Data - Summary

Page 29: The Rise of Big Data Science

Big Data - Summary

BIG business opportunities

The 3 V’s: Volume, Variety, Velocity

Technological paradigm shifts

Page 30: The Rise of Big Data Science

Big Data Technological Paradigm Shifts

Volume: Scale up to Scale Out; MapReduce (Master, Mappers, Reducer)

Variety: NoSQL (Key-Value)

Velocity: the Alg moves to the Data

Page 31: The Rise of Big Data Science

Big Data - Summary

BIG business opportunities

The 3 V’s: Volume, Variety, Velocity

Computing and DB paradigm shifts

Flood of new (open source) technologies

Page 32: The Rise of Big Data Science

Flood of New Big Data Technologies

Open Source

Page 33: The Rise of Big Data Science

Big Data - Summary

BIG business opportunities

The 3 V’s: Volume, Variety, Velocity

Computing and DB paradigm shifts

Flood of new (open source) technologies

It’s definitely not just a buzz

Page 34: The Rise of Big Data Science

Big Buzz ?

Page 35: The Rise of Big Data Science

Big Data - Summary

BIG business opportunities

The 3 V’s: Volume, Variety, Velocity

Computing and DB paradigm shifts

Flood of new (open source) technologies

It’s definitely not just a buzz

It’s a real response to the world’s hectic pace of evolution

Reducing costs by an order of magnitude

Still, it doesn’t mean every business today will / should transform its technology stack to support big data

Page 36: The Rise of Big Data Science

Big Data Science

Big Data

Data Science

Big Data

Science

Page 37: The Rise of Big Data Science

Data Science

Why ? What ? How ?

Page 39: The Rise of Big Data Science

data scientists

Why Data Science ?

Page 40: The Rise of Big Data Science

Data has real value

Facebook acquires Onavo for ~150M$

Page 41: The Rise of Big Data Science

Data Science

Why ? What ? How ?

Page 42: The Rise of Big Data Science

Welcome to the Intelligent world

Data Science

Data Analysis

Data Mining

Automatic Decisioning

Predictive

Analytics

Machine Learning

Data Analytics

Page 43: The Rise of Big Data Science

Data Miners are the New Gold Miners

Page 44: The Rise of Big Data Science

Search

Page 45: The Rise of Big Data Science

Online Advertisement - Real Time Bidding (RTB)

Page 46: The Rise of Big Data Science

Recommendations

Page 47: The Rise of Big Data Science

Text Analysis

Page 48: The Rise of Big Data Science

CRM – Customers Churn Prediction

Page 49: The Rise of Big Data Science

Time Series Analysis

Page 50: The Rise of Big Data Science

Machine Learning

Classification

Clustering

Regression

Recommendation

Page 51: The Rise of Big Data Science

Classification

Amdocs Insight™ - why is the customer calling the Call Center ?

Classes: Third Party Charges, Pay Bill, Abnormal fee, Bill too high, Overage

Page 52: The Rise of Big Data Science

Clustering

Market Segmentation

Social Network Analysis
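
As a rough illustration of clustering for market segmentation, here is a minimal sketch using k-means, one common clustering algorithm. The slides do not name a specific algorithm, and the customer data (monthly spend, monthly visits) and starting centroids below are made up for the example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def closest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid nearest to the point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: (point[0] - centroids[i][0]) ** 2 + (point[1] - centroids[i][1]) ** 2,
    )


def k_means(points: Sequence[Point], initial: Sequence[Point], iterations: int = 10):
    """Alternate between assigning points to centroids and recomputing centroids."""
    centroids: List[Point] = list(initial)
    for _ in range(iterations):
        clusters: List[List[Point]] = [[] for _ in centroids]
        for p in points:
            clusters[closest(p, centroids)].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    labels = [closest(p, centroids) for p in points]
    return centroids, labels


# Hypothetical customers: (monthly spend in $, visits per month).
customers = [(10, 1), (12, 2), (11, 1), (90, 20), (95, 22), (88, 19)]
centroids, labels = k_means(customers, initial=[customers[0], customers[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The two resulting groups (low-spend and high-spend customers) are the kind of segments a marketing team could then target differently.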

Page 53: The Rise of Big Data Science

Regression

Housing price prediction

[Plot: price ($, in 1000’s) against size in m2; x-axis ticks 50 to 250, y-axis ticks 100 to 400; marked values 130, 280 and 215]
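
A housing-price regression like the one plotted here can be sketched with ordinary least squares on a single feature. The slides do not say which method was used, and the data points below are invented for illustration; they are not read off the slide's plot.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(size_m2: float, slope: float, intercept: float) -> float:
    """Predicted price (in $1000's) for a house of the given size."""
    return slope * size_m2 + intercept


# Hypothetical training data: size in m2 and price in $1000's.
sizes = [50, 100, 150, 200, 250]
prices = [110, 180, 240, 330, 390]

slope, intercept = fit_line(sizes, prices)
print(f"price ~ {slope:.2f} * size + {intercept:.1f}")
print(f"predicted price for 130 m2: {predict(130, slope, intercept):.1f}")
```

Once the line is fitted, predicting the price of an unseen house is a single multiplication and addition, which is what makes this the standard first example of predictive analytics.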

Page 54: The Rise of Big Data Science

The Data Scientist

Page 55: The Rise of Big Data Science

Data Scientist Skillset

Hands-on with tools, languages, technologies

MSc / PhD in Math, CS, Stats, Physics

Hands-on with the specific problem domain

Page 56: The Rise of Big Data Science

Data Science ≠ BI

Apply advanced statistical machine learning algorithms to:

dig deeper to find patterns that traditional BI tools may not reveal

cover a much wider spectrum of domains / applications

Predictive Analytics ≠ Exploratory Analytics

Page 57: The Rise of Big Data Science

Data Science Vs. Business Intelligence

Traditional BI: Exploratory Analytics

Big Data Science: Predictive Analytics

Page 58: The Rise of Big Data Science

Academia Response to Data Science

Page 59: The Rise of Big Data Science

Data Science

Why ? What ? How ?

Page 60: The Rise of Big Data Science

The Art of Data Science

We need at least a one-semester course for it

Still…

Page 61: The Rise of Big Data Science

Data Science Life Cycle

Understand Data

Prepare Data

Model

Evaluate

Deploy

Monitor

Offline Data Analysis

Run Time

Business Goal

Page 62: The Rise of Big Data Science

Big Data

Data Science

Big Data

Science

Closing the Loop

Technically speaking, what do you think? Is Big Data good or bad for Data Science ?

Page 63: The Rise of Big Data Science

The Bad - Finding a Needle in a Haystack

It’s the same treasure that hides – the problem is that the pile is now huge

Big Data Big Noise

Page 65: The Rise of Big Data Science

The Good - The Statistical View

Statistics is the fuel of predictive analytics!

The more data you have (Big Data), the better your predictive models will perform

Page 66: The Rise of Big Data Science

Law of Large Numbers
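
The law of large numbers says that the average of many independent samples approaches the expected value as the sample size grows. A small simulation can make this concrete; the fair-coin setup, the seed and the sample sizes are choices made for this sketch, not taken from the slides.

```python
import random


def sample_mean_of_coin_flips(n: int, rng: random.Random) -> float:
    """Fraction of heads in n simulated flips of a fair coin (expected value 0.5)."""
    return sum(rng.random() < 0.5 for _ in range(n)) / n


rng = random.Random(42)  # fixed seed so the run is repeatable
for n in (10, 100, 1_000, 10_000, 100_000):
    mean = sample_mean_of_coin_flips(n, rng)
    print(f"n={n:>7}  sample mean={mean:.4f}  |error|={abs(mean - 0.5):.4f}")
```

The error does not have to shrink at every step, but with large n it is very likely to be small, which is the statistical argument for why more data tends to help predictive models.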

Page 72: The Rise of Big Data Science

Combining the Good & Bad

Data is a function of quality and quantity

[Chart: Quantity (Small to Big) against Quality (Low to High)]

Page 73: The Rise of Big Data Science

Big Data Science - Summary

Big Data, Big Numbers, Big Opportunities

Big Data is the buzziest technology nowadays

Data Scientists

Are the ones who coax the treasures out of the big data for their companies

Are skilled in multiple disciplines

Are the new industry rock stars

Page 74: The Rise of Big Data Science

Thank You for your attention