big it data workshop pub

LutzFinger.com How to extract significant business value from big data September 20th 2016

Upload: lutz-finger

Post on 13-Apr-2017


Category:

Data & Analytics



TRANSCRIPT

Page 1: Big it data workshop   pub

LutzFinger.com

How to extract significant business value from big data

September 20th 2016

Page 2: Big it data workshop   pub


Lutz & Matt

Page 3: Big it data workshop   pub

Disclaimer

This presentation is solely our opinion and not necessarily the opinion of our employers Harvard, LinkedIn or Cornell.

Page 4: Big it data workshop   pub

Agenda: 9:00 - 17:00

9:00 The right Ask
9:45 Teamwork: Discover an Ask
10:30 Coffee Break
10:45 Data is King
11:15 Decision Tree
13:00 Lunch
14:00 Pitfalls with Data
14:30 Teamwork: Which Data?
15:30 Coffee Break
15:45 Innovation & Technology
16:30 Build A Team
16:45 Privacy & Ethics

Page 5: Big it data workshop   pub

Hype About Data

Page 6: Big it data workshop   pub


Hyped Data Scientists

image by Mike under Creative Commons

Page 7: Big it data workshop   pub


McK Study forecasted:

10 Times More Managers per Data Savvy Person

Page 8: Big it data workshop   pub


?

Page 9: Big it data workshop   pub

SCHOOLS COMPANIES KNOWLEDGE SKILLS MEMBERS JOBS

LinkedIn's vision is to create economic opportunity for every member of the global workforce.

Page 10: Big it data workshop   pub


Actionable Insights

Page 11: Big it data workshop   pub


ASK the right Questions.

MEASURE the right data – even if it is not Big data.

Take Actions and LEARN from them.

?

Page 12: Big it data workshop   pub


BIG DATA IS “BULLSHIT”

Page 13: Big it data workshop   pub

To Get Data is EASY
To Get The Right Data is HARD

To Get Insights is EASY
To Make Money from Data/Insights is HARD

Page 14: Big it data workshop   pub

THE ASK is the hardest part, but there are many use cases to get started.

Page 15: Big it data workshop   pub


The Right Question

Page 16: Big it data workshop   pub

Google had the right Question

is difficult to find

Page 17: Big it data workshop   pub


Fisheye Learning

Page 18: Big it data workshop   pub

Data Without Action

300+ Million Members at LinkedIn

60,000 with a Job Title that might fit

19,000 who switched after 3 to 8 years

24 who had the same career path

Page 19: Big it data workshop   pub

Data by itself is USELESS

Information by itself is often USELESS

Only Action Counts!

Data Reporting

prescriptive, predictive, actionable, data science … the holy grail

Page 20: Big it data workshop   pub

How To Work With Data?

Timeline from Past to Future:

What happened? | What is happening? | What is likely to happen?
Reporting, Dashboards | Real-Time Analytics | Predictive Analytics
Forensics & Data Mining | Real-Time Data Mining | Prescriptive Analytics
Why did it happen? | Why is it happening? | What should I do about it?

Ref. Gartner

Page 21: Big it data workshop   pub


Easiest - Start With Reporting

LinkedIn’s LMI Tool

Page 22: Big it data workshop   pub


Be Careful with Benchmarking

Page 23: Big it data workshop   pub


We Want Predictions

Page 24: Big it data workshop   pub


We Want Monetization

Page 25: Big it data workshop   pub

Examples At LinkedIn

People You May Know

Groups You May Like

Ads in Which You May Be Interested

Companies You May Want to Follow

Pulse

Similar Profiles

Page 26: Big it data workshop   pub

Many Other Good Ideas

• Banking: Card Fraud Detection
• Banking: Credit Scoring
• Media: Content Recommendation
• Health Care: Fraud Detection
• Medicine: Image Processing
• Medicine: Outliers Detection
• Education: Course Improvement
• Retail: Likelihood to Buy
• Books: Marketing Planning
• Manufacturing: Machine Failure Prediction
• Manufacturing: Optimization
• Insurance: Likelihood & Pricing
• Transportation: Route Planning
• Energy: Grid Utilization

Page 27: Big it data workshop   pub

About Innovation

By Alistair Croll

Page 28: Big it data workshop   pub

Team Work

Photo by Creative Sustainability under the Creative Commons (CC BY 2.0)

What Would You Like To Do With Data?
○ Is it Actionable? “So What?”
○ Is it Reporting or Predictions?
○ Is it Sustaining, Adjunct or Disruptive?

Please Stay REAL!

Page 29: Big it data workshop   pub

Agenda: 9:00 - 17:00

9:00 The right Ask
9:45 Teamwork: Discover an Ask
10:30 Coffee Break
10:45 Data is King
11:15 Decision Tree
13:00 Lunch
14:00 Pitfalls with Data
14:30 Teamwork: Which Data?
15:30 Coffee Break
15:45 Innovation
16:15 Technology
16:45 Build A Team

Page 30: Big it data workshop   pub


“Data is the new oil”- World Economic Forum

Page 31: Big it data workshop   pub


“DATA IS THE NEW OIL”

Oil Mine the oil

Use the oil

Goal

Page 32: Big it data workshop   pub

V OF “BIG DATA”

Volume: Data at scale (TB, PB …)

Variety: Data in many forms (structured, unstructured …)

Velocity: Speed (streaming, real time, near time …)

Veracity: Uncertainty (imprecise, not always up-to-date …)

Page 33: Big it data workshop   pub

DATA

Categorical:
• Ordinal: Monday, Tuesday, Wednesday
• Nominal: Man, Woman

Quantitative:
• Ratio: Kelvin, Height, Weight
• Interval: Celsius, Fahrenheit

Structure:
• Structured
• Unstructured
• Semi-structured / Meta data

Read more: “On the Theory of Scales of Measurement”, S. S. Stevens, 1946
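The difference between interval and ratio scales shows up as soon as you divide two values: ratios are only meaningful on a scale with a true zero. A minimal sketch in Python; the helper name `celsius_to_kelvin` and the two sample temperatures are illustrative assumptions, not taken from the slides.

```python
# Interval scale (Celsius): differences are meaningful, ratios are not.
# Ratio scale (Kelvin): a true zero exists, so ratios are meaningful.

def celsius_to_kelvin(celsius: float) -> float:
    """Convert a Celsius reading to Kelvin."""
    return celsius + 273.15

cold_c, warm_c = 10.0, 20.0   # illustrative readings (assumption)

# 20 / 10 = 2, yet 20 degrees Celsius is not "twice as hot" as 10.
naive_ratio = warm_c / cold_c
kelvin_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cold_c)

# Differences agree on both scales.
difference_c = warm_c - cold_c
difference_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cold_c)

print(f"Celsius ratio: {naive_ratio:.2f}, Kelvin ratio: {kelvin_ratio:.3f}")
print(f"Difference: {difference_c:.1f} C = {difference_k:.1f} K")
```

The Kelvin ratio stays close to 1, which is the physically meaningful comparison; the Celsius ratio of 2 is an artifact of where the scale puts its zero.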

Page 34: Big it data workshop   pub

What Has Troubled The Media Industry?

Page 35: Big it data workshop   pub

The Media Industry Is One Step Removed From The Customer

Photo by Norimutsu Nogami under the Creative Commons (CC BY 2.0)

They Do Not Know Who Reads What & When?

Page 36: Big it data workshop   pub


Facebook Knows

* only member - not necessarily ‘active’ members

Page 37: Big it data workshop   pub

& Size Matters

Network Size (Proportion by Members*)

* only member - not necessarily ‘active’ members

Page 38: Big it data workshop   pub


“Data is the new oil”- World Economic Forum

Photo by William Warby under the Creative Commons (CC BY 2.0)

Page 39: Big it data workshop   pub


$3.2 billion

Page 40: Big it data workshop   pub


Prediction

Photo by KOMUnews under the Creative Commons (CC BY 2.0)

Boring could be the New Sexy!

Page 41: Big it data workshop   pub


Innovation To Get Data

from Marketing Material of Ursa Space Systems

Page 42: Big it data workshop   pub


Also Governments Take Part

Page 43: Big it data workshop   pub


Public Data is Not Competitive

Page 44: Big it data workshop   pub


Look For Data Only You Own

taken from http://blogs.ubc.ca/mdaw15/2013/11/15/ipo-twitter-vs-facebook/

Page 45: Big it data workshop   pub

Data Might (Not) Be A Barrier To Entry

Page 46: Big it data workshop   pub

Data Might (Not) Be A Barrier To Entry

Page 47: Big it data workshop   pub

Data Is King

but not all data is equal.

Page 48: Big it data workshop   pub

The Tale of “Social Media” Data

Source: ‘Ask Measure Learn’ by O’Reilly Media

Page 49: Big it data workshop   pub

Structured Data Is Often Better

New York Weather in April 2013

Source: ‘Ask Measure Learn’ by O’Reilly Media

Page 50: Big it data workshop   pub


Sometimes, it’s worth it.

Source: Jeffrey Breen

RE @dave_mcgregor: Publicly pledging to never fly @delta again. The worst airline ever. U have lost my patronage forever du to ur incompetence

Completely unimpressed with @continental or @united. Poor communication, goofy reservations systems and all to turn my trip into a mess.

@SouthWestAir I know you don't make the weather. But at least pretend I am not a bother when I ask if the delay will make miss my connection

Page 51: Big it data workshop   pub


Page 52: Big it data workshop   pub


Pregnant Or Not?

Page 53: Big it data workshop   pub


Decision Trees Step by Step

by Maciej Lewandowski under Creative Commons (CC BY-SA 2.0)

Page 54: Big it data workshop   pub


Split Apples & Mandarins

Page 55: Big it data workshop   pub


What Is The Target Variable?

Page 56: Big it data workshop   pub


What Are The Features That Describe The Target?

Page 57: Big it data workshop   pub

What Are The Features That Describe The Target?

• Weight: light, medium, heavy - or x gram
• Size: round or not
• Color: green, orange, red
• Surface: flat or porous surface
• …

Page 58: Big it data workshop   pub

Which Feature Works Best?

● The variable with the most important information about the target variable.

● Which variable splits the group into subsets that are as homogeneous as possible with respect to the target variable? (pure vs. impure)

Page 59: Big it data workshop   pub


Color Red?

Color Orange?

Split on Color Red vs. Split on Color Orange

Which One Is Better?

Page 60: Big it data workshop   pub

We Need A Way To Describe Chaos

"Claude Elwood Shannon (1916-2001)" by Source. Licensed under Fair use via Wikipedia

Page 61: Big it data workshop   pub

ENTROPY

Entropy is a measure of disorder.

Entropy only tells us how impure one individual subset is.

Page 62: Big it data workshop   pub


ENTROPY & PROBABILITY

entropy = -p1 * log (p1) - p2 * log (p2) - ….
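The formula can be written as a small function. This is a sketch, assuming base-2 logarithms (the worked examples on the following slides use log2) and the usual convention that a class with probability 0 contributes nothing; the function name `entropy` is an illustrative choice.

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits: the sum of -p * log2(p), skipping p == 0."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

pure = entropy([1.0])         # one class only: no disorder
mixed = entropy([0.5, 0.5])   # two equally likely classes: maximum disorder

print(pure, mixed)
```

A pure group scores 0 and an evenly mixed two-class group scores 1, which is why values close to 1 are read as "very impure".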

Page 63: Big it data workshop   pub


● Highest Entropy Reduction

● Highest Information Gain

Page 64: Big it data workshop   pub

1st. Entropy Without Split

entropy = -p1 * log2(p1) - p2 * log2(p2)

Apples: 8 out of 15, p(apple) = 8/15

Mandarins: 7 out of 15, p(mandarin) = 7/15

ENTROPY (Without Split):

-p(apple)*log2(p(apple)) - p(mandarin)*log2(p(mandarin))

= 0.996791632 ≈ 1

very impure

Page 65: Big it data workshop   pub

entropy = -p1 * log2(p1) - p2 * log2(p2)

Color Red?

ENTROPY (Split on Red='no') = -6/8*log2(6/8) - 2/8*log2(2/8) = 0.81

ENTROPY (Split on Red='yes') = -6/7*log2(6/7) - 1/7*log2(1/7) = 0.59

ENTROPY (After Split on Red)

= 8/15 * ENTROPY (Split on Red='no') + 7/15 * ENTROPY (Split on Red='yes')

= 0.43 + 0.28 = 0.71

INFORMATION GAIN = Entropy (Before) - Entropy (After) = 1 - 0.71 = 0.29

Color Orange?

ENTROPY (Split on Orange='yes') = -6/6*log2(6/6) = 0

ENTROPY (Split on Orange='no') = -8/9*log2(8/9) - 1/9*log2(1/9) = 0.50

ENTROPY (After Split on Orange)

= 6/15 * ENTROPY (Split on Orange='yes') + 9/15 * ENTROPY (Split on Orange='no')

= 0 + 0.30 = 0.30

INFORMATION GAIN = Entropy (Before) - Entropy (After) = 1 - 0.30 = 0.70
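The two splits can be recomputed in a few lines. This is a sketch under stated assumptions: the class counts per branch are read off the fractions on the slide (6/8 and 2/8, 6/7 and 1/7, 6/6, 8/9 and 1/9), each branch is weighted by its own share of the 15 fruits, and the starting entropy is kept unrounded instead of being rounded to 1. The function names are illustrative.

```python
import math

def entropy(counts):
    """Shannon entropy (bits) of a group, given its class counts."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent, branches):
    """Parent entropy minus the size-weighted entropy of the branches."""
    total = sum(parent)
    after = sum(sum(branch) / total * entropy(branch) for branch in branches)
    return entropy(parent) - after

parent = [8, 7]                 # 8 apples, 7 mandarins
red_split = [[6, 2], [6, 1]]    # Red = 'no' (8 fruits), Red = 'yes' (7 fruits)
orange_split = [[6], [8, 1]]    # Orange = 'yes' (6 fruits), Orange = 'no' (9 fruits)

gain_red = information_gain(parent, red_split)
gain_orange = information_gain(parent, orange_split)

print(f"gain(red) = {gain_red:.2f}, gain(orange) = {gain_orange:.2f}")
```

Splitting on orange yields the clearly larger gain, so it is the better first question to ask.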

Page 66: Big it data workshop   pub

INFORMATION GAIN (IG)

Information Gain measures how much a given feature improves (decreases) entropy over the whole segmentation it creates.

How important is this feature for the prediction?

Page 67: Big it data workshop   pub

Decision Tree

Color Orange? ROOT NODE

LEAVES

Page 68: Big it data workshop   pub


Decision Tree

Color Orange?

Decision Tree Structure

Page 69: Big it data workshop   pub


Which Feature Would Be Better?

Page 70: Big it data workshop   pub


Heavy?

Always Start With Highest IG

Page 71: Big it data workshop   pub

BIG ML

Competitors:

● Algorithms.io
● SnapAnalytx
● Wise.io
● Predixion Software
● Google Prediction API

Page 72: Big it data workshop   pub


Pregnant Or Not?

Page 73: Big it data workshop   pub

Get Source

• Drag & Drop
• Often by Connecting

Page 74: Big it data workshop   pub

One Click DataBase

• Sense Check
• Any Outliers / Anything Strange

Page 75: Big it data workshop   pub


Split Training & Testing
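Outside a point-and-click tool, the same step is a shuffle followed by a cut. A minimal sketch in plain Python; the 80/20 ratio, the fixed seed and the function name `train_test_split` are assumptions for illustration, since the slides do not state a ratio.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and cut it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # seeded, so the split is repeatable
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(100))   # stand-in for 100 labeled records
train, test = train_test_split(rows)

print(len(train), len(test))
```

The model is trained only on `train`; `test` is held back so that quality is measured on records the model has never seen.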

Page 76: Big it data workshop   pub


Configure Model

Select The Objective Field - What To Train The Model On?

Page 77: Big it data workshop   pub


Done

Page 78: Big it data workshop   pub


Right hand column displaying scroll over for this high confidence node

Page 79: Big it data workshop   pub


Highest Information Gain

Page 80: Big it data workshop   pub


Now What?

Page 81: Big it data workshop   pub


Predicting

Page 82: Big it data workshop   pub


Predicting

Page 83: Big it data workshop   pub


Half Pregnant?

Page 84: Big it data workshop   pub

CONFUSION MATRIX

                          Reality: Bought       Reality: Did Not Buy
Classifier: Bought        (A) true positive     (B) false positive
Classifier: Did Not Buy   (C) false negative    (D) true negative
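The four cells can be counted directly from a list of actual outcomes and a list of predictions. A minimal sketch; the function name, the cell keys and the eight sample customers are illustrative assumptions.

```python
from collections import Counter

def confusion_counts(actual, predicted):
    """Count the four cells (A to D) of a binary confusion matrix."""
    cells = Counter()
    for truth, guess in zip(actual, predicted):
        if guess and truth:
            cells["A_true_positive"] += 1
        elif guess and not truth:
            cells["B_false_positive"] += 1
        elif not guess and truth:
            cells["C_false_negative"] += 1
        else:
            cells["D_true_negative"] += 1
    return cells

# Illustrative data: True = bought, False = did not buy.
actual    = [True, True, True, False, False, False, False, True]
predicted = [True, True, False, True, False, False, False, True]

counts = confusion_counts(actual, predicted)
print(dict(counts))
```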

Page 85: Big it data workshop   pub


Business Decision: Cut-Off Value

It depends on the Ask
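Moving the cut-off trades one kind of error for the other. The sketch below assumes a model that outputs a score between 0 and 1 per customer; the scores, outcomes and the three cut-off values are made up for illustration.

```python
def classify(scores, cutoff):
    """Label a customer as a predicted buyer when the score reaches the cut-off."""
    return [score >= cutoff for score in scores]

# Illustrative model scores (estimated probability of buying) and real outcomes.
scores = [0.95, 0.80, 0.65, 0.55, 0.40, 0.30, 0.20, 0.05]
bought = [True, True, False, True, False, True, False, False]

for cutoff in (0.3, 0.5, 0.7):
    predicted = classify(scores, cutoff)
    false_pos = sum(p and not b for p, b in zip(predicted, bought))
    false_neg = sum(b and not p for p, b in zip(predicted, bought))
    print(f"cut-off {cutoff}: {false_pos} false positives, {false_neg} false negatives")
```

A low cut-off misses few buyers but raises more false alarms; a high cut-off does the opposite. Which error is more expensive is a business question, not a modeling one.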

Page 86: Big it data workshop   pub

TRUE NEGATIVE RATE (Specificity)

# of true negatives / # of actual negatives (truth)

also: Specificity = 1 - False positive rate

                          Truth: Bought      Truth: Did Not Buy
Classifier: Bought        true positive      false positive
Classifier: Did Not Buy   false negative     true negative

Page 87: Big it data workshop   pub

PRECISION

# of true positives / Total in this prediction class

                          Truth: Bought      Truth: Did Not Buy
Classifier: Bought        true positive      false positive
Classifier: Did Not Buy   false negative     true negative
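Precision and specificity are both simple ratios of confusion-matrix cells. A small sketch with made-up counts; the function names and the numbers are illustrative assumptions.

```python
def precision(true_pos, false_pos):
    """Share of predicted buyers who actually bought."""
    return true_pos / (true_pos + false_pos)

def specificity(true_neg, false_pos):
    """Share of actual non-buyers correctly predicted as non-buyers."""
    return true_neg / (true_neg + false_pos)

def false_positive_rate(true_neg, false_pos):
    """Share of actual non-buyers wrongly predicted as buyers."""
    return false_pos / (true_neg + false_pos)

# Illustrative cell counts (assumption).
tp, fp, tn = 30, 10, 40

print(f"precision   = {precision(tp, fp):.2f}")
print(f"specificity = {specificity(tn, fp):.2f}")
print(f"1 - FPR     = {1 - false_positive_rate(tn, fp):.2f}")
```

The last two printed values agree, which is the identity Specificity = 1 - False positive rate.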

Page 88: Big it data workshop   pub


ROC CURVE

Better Model

Worse Model
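A ROC curve is what you get when you sweep the cut-off across all scores and plot the true positive rate against the false positive rate each time. The sketch below also computes the area under the curve, a common single-number summary that the slides do not mention; the scores and outcomes are made up for illustration.

```python
def roc_points(scores, actual):
    """Return (false positive rate, true positive rate) pairs, one per cut-off."""
    positives = sum(actual)
    negatives = len(actual) - positives
    points = [(0.0, 0.0)]
    # Sweep the cut-off from the highest score down to the lowest.
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(s >= cutoff and a for s, a in zip(scores, actual))
        fp = sum(s >= cutoff and not a for s, a in zip(scores, actual))
        points.append((fp / negatives, tp / positives))
    return points

def area_under_curve(points):
    """Trapezoidal area under the curve (1.0 = perfect, 0.5 = random guessing)."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]            # illustrative model scores
actual = [True, True, False, True, False, False]   # illustrative outcomes

auc = area_under_curve(roc_points(scores, actual))
print(f"AUC = {auc:.3f}")
```

A better model bends the curve toward the top-left corner and pushes the area toward 1; a worse model stays near the diagonal.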

Page 89: Big it data workshop   pub


Using The Model

Page 90: Big it data workshop   pub


Using The Model

Page 91: Big it data workshop   pub


Now How Can I Improve the Quality?

Page 92: Big it data workshop   pub


Page 93: Big it data workshop   pub


The Tale of Big Data

Page 94: Big it data workshop   pub

Overfitting

To tailor a model to training data at the expense of generalizing to previously unseen data points. The model becomes perfect at describing noise and spurious correlations.

TRADE OFF

Complexity of a Model & Overfitting Likelihood
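The trade-off can be seen on a toy dataset. This sketch is not a decision tree: it compares a model that memorizes every training point (nearest-neighbor lookup, standing in for a tree with one leaf per record) against a single hand-set split. The data, the two flipped labels and the threshold of 10 are all assumptions chosen to keep the example deterministic.

```python
def true_rule(x):
    """The real pattern behind the data (assumed for this toy example)."""
    return x >= 10

# Training data: even numbers, with two labels flipped to act as noise.
noisy = {4: True, 14: False}
train = [(x, noisy.get(x, true_rule(x))) for x in range(0, 20, 2)]
# Previously unseen data: odd numbers with clean labels.
test = [(x, true_rule(x)) for x in range(1, 20, 2)]

def complex_model(x):
    """Memorizes the training set: returns the label of the nearest point."""
    nearest = min(train, key=lambda pair: (abs(pair[0] - x), pair[0]))
    return nearest[1]

def simple_model(x):
    """A single split, like a tree with one node (threshold set by hand)."""
    return x >= 10

def accuracy(model, data):
    """Fraction of records the model labels correctly."""
    return sum(model(x) == label for x, label in data) / len(data)

for name, model in (("complex", complex_model), ("simple", simple_model)):
    print(f"{name}: train {accuracy(model, train):.0%}, test {accuracy(model, test):.0%}")
```

The memorizing model is perfect on the training data because it has also learned the two noisy labels, and it pays for that on unseen data; the simple model does the reverse.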

Page 95: Big it data workshop   pub


The More Nodes - The More Likely To Overfit

Page 96: Big it data workshop   pub

The Story of MORE Data

Decision Trees are good at identifying LOCAL patterns, but they often need more data.

by Claudia Perlich et al., “Tree Induction vs. Logistic Regression: A Learning-Curve Analysis”, Journal of Machine Learning Research 4 (2003) 211-255

Page 97: Big it data workshop   pub


Correlation vs. Causation

Page 98: Big it data workshop   pub

Team Work

Photo by Creative Sustainability under the Creative Commons (CC BY 2.0)

○ do only you have this data?
○ do you have a positive feedback loop?
○ is the data sustainable?
○ who else could get the data?
○ how much data is needed?

Page 99: Big it data workshop   pub


Page 100: Big it data workshop   pub


How Was Big Data Infrastructure Invented?

Page 101: Big it data workshop   pub

Issue Of Yahoo

CENTRALIZED SYSTEMS ARE EXPENSIVE
• diminishing returns in power (overhead issue)
• exponential cost to scale
• slow to transport (ETL) the data

Scan a 100 TB dataset on a 1000 node cluster:
• Remote Storage @ 10 MB/s = 165 min
• Local Storage @ 200 MB/s = 8 min

MAKE SYSTEMS FAULT TOLERANT
1000 nodes - a machine a day will break
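The scan times can be checked with simple arithmetic. The quoted 165 and 8 minutes match a 100 TB dataset spread evenly over 1000 nodes (100 GB per node); a 1000 TB dataset would take about ten times as long at the same speeds. The sketch assumes decimal units and perfectly parallel reads, and the function name is illustrative.

```python
def scan_minutes(dataset_tb, nodes, mb_per_second):
    """Minutes for every node to read its share of the dataset in parallel."""
    mb_per_node = dataset_tb * 1_000_000 / nodes   # 1 TB = 1,000,000 MB (decimal)
    return mb_per_node / mb_per_second / 60

remote = scan_minutes(100, 1000, 10)    # remote storage at 10 MB/s
local = scan_minutes(100, 1000, 200)    # local storage at 200 MB/s

print(f"remote: {remote:.0f} min, local: {local:.0f} min")
```

The factor of 20 between the two results is the whole argument for moving the computation to where the data already sits.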

Page 102: Big it data workshop   pub

The Vision

CHEAP Systems
• can run on commodity hardware

Computations are done DECENTRALIZED
• ability to ‘dispatch’ a task
• parallelize work-streams

Fault TOLERANT
• a failure, no matter where and when, is not an issue

Page 103: Big it data workshop   pub


Page 104: Big it data workshop   pub

Typical Workflow

· Load data into the cluster (HDFS writes)
· Analyze the data (Map Reduce)
· Store results in the cluster (HDFS writes)
· Read the results from the cluster (HDFS reads)

Sample Scenario:

How many times did our customers type the word “Refund” into emails sent to customer service?

Huge file containing all emails sent to customer service (File.txt)

Ref. Brad Hedlund .com
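The sample scenario maps naturally onto the two phases. The sketch below simulates the idea in plain Python on one machine; it does not use the Hadoop API, and the three email chunks are made up. In a real cluster each chunk would be a block of the file stored on a different node.

```python
from functools import reduce

def map_phase(chunk):
    """Runs where the data lives: count 'refund' in one chunk of email text."""
    words = chunk.lower().split()
    return sum(1 for word in words if word.strip('.,!?"\'') == "refund")

def reduce_phase(left, right):
    """Combine two partial counts into one."""
    return left + right

# Illustrative chunks of the email file (assumption).
chunks = [
    "Please send my refund. I asked for a Refund last week!",
    "Where is my order?",
    "Refund please",
]

partial_counts = [map_phase(chunk) for chunk in chunks]   # one task per chunk
total = reduce(reduce_phase, partial_counts, 0)

print(partial_counts, total)
```

Only the small partial counts travel over the network, not the emails themselves.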

Page 105: Big it data workshop   pub


How To Access HDFS

Hadoop Storage (HDFS / HBase / Solr)

Map Reduce

Page 106: Big it data workshop   pub

Via The Normal Languages

Hadoop Storage (HDFS / HBase / Solr)

Map Reduce

• MapReduce
• Hive (SQL Like)
• Pig / Cascading (Scripting Like)
• Giraph (Graph Oriented)
• Mahout (ML Engine)

Page 107: Big it data workshop   pub

Pro & Con

Hadoop Storage (HDFS / HBase / Solr)

Map Reduce

• MapReduce
• Hive (SQL Like)
• Pig / Cascading (Scripting Like)
• Giraph (Graph Oriented)
• Mahout (ML Engine)

Store

ETL: Extract / Transform / Load

DB / Key Value Store

Visualize

Pro: way better than traditional BI

Con: Heavy tech involvement. 12-18 months for a non-tech company to implement a schema

Page 108: Big it data workshop   pub

Hadoop 2.0

Hadoop Storage (HDFS / HBase / Solr)

Map Reduce / Spark / Tez

• MapReduce: Hive, Pig / Cascading, Giraph, Mahout
• Spark: Hive, Pig / Cascading, Giraph, Mahout
• Tez: Pig / Cascading, Hive
• Impala / Presto
• H2O / Oryx

SQL Like / Scripting Like / Graph Oriented / ML Engine

Store in DB

Visualize

Page 109: Big it data workshop   pub


Why Is It So Hard To Become Data Driven

Page 110: Big it data workshop   pub

Ingredients of Data Products

Ask: The question? The need? The Why?

Measure: The Data? The features? The algorithms?

Team: The right Skills? Collaboration

All of them are necessary - None of them are sufficient!

Page 111: Big it data workshop   pub

How To Ingest Ideas

Hack - Days & Incubator

Internal Process

External Competition

Close Collaboration between Business & Data Scientists

“All we do is Data” - Jeff Weiner

Page 112: Big it data workshop   pub


Page 113: Big it data workshop   pub


Old vs. New

Old School Today / Big data

Data Amount

IT Infrastructure

Data Types

Schema

When and How is the ASK formulated?

Page 114: Big it data workshop   pub


Old vs. New

Old School Today / Big data

Data Amount Gigabytes & Terabytes Petabytes & Exabytes

IT Infrastructure

Data Types

Schema

When and How is the ASK formulated?

Page 115: Big it data workshop   pub


Old vs. New

Old School Today / Big data

Data Amount Gigabytes & Terabytes Petabytes & Exabytes

IT Infrastructure Centralized Decentralized / Parallelized

Data Types

Schema

When and How is the ASK formulated?

Page 116: Big it data workshop   pub


Old vs. New

Old School Today / Big data

Data Amount Gigabytes & Terabytes Petabytes & Exabytes

IT Infrastructure Centralized Decentralized / Parallelized

Data Types Structured Structured & Unstructured

Schema

When and How is the ASK formulated?

Page 117: Big it data workshop   pub


Old vs. New

Old School Today / Big data

Data Amount Gigabytes & Terabytes Petabytes & Exabytes

IT Infrastructure Centralized Decentralized / Parallelized

Data Types Structured Structured & unstructured

Schema Stable schema Schema on the fly

When and How is the ASK formulated?

Page 118: Big it data workshop   pub

Old vs. New

                                      | Old School            | Today / Big data
Data Amount                           | Gigabytes & Terabytes | Petabytes & Exabytes
IT Infrastructure                     | Centralized           | Decentralized / Parallelized
Data Types                            | Structured            | Structured & unstructured
Schema                                | Stable schema         | Schema on the fly
When and How is the ASK formulated?   | Set ask               | Ad-hoc ask

Page 119: Big it data workshop   pub


How to build a Data Team

Page 120: Big it data workshop   pub


Page 121: Big it data workshop   pub


Data Scientist

Page 122: Big it data workshop   pub


Data Scientist

BI Analyst

Page 123: Big it data workshop   pub


Data Scientist

BI Analyst

Engineer

Page 124: Big it data workshop   pub


Data Scientist

BI Analyst

Engineer

Product Manager

Page 125: Big it data workshop   pub


Data Scientist

BI Analyst

Engineer

Product Manager

Communication Skills Domain Knowledge

Page 126: Big it data workshop   pub

There Is NO Data Science Shortage

Source: World Economic Forum - Human Capital Report 2016

Page 127: Big it data workshop   pub


There are 9 Million Data Enabled People

Page 128: Big it data workshop   pub


Page 129: Big it data workshop   pub


In the EU, insurers will no longer be allowed to take the gender of their customers into account for insurance premiums:

● young men's premiums will fall by up to 10%

● young women's premiums will rise by up to 30%

by: BBC News: http://www.bbc.com/news/business-12608777

Not Everything That Is Possible Is Legal

Page 130: Big it data workshop   pub

Let me analyze your Social Network Connections. If they are “trustworthy”, you will get credit more easily.

Ethical or Not?

by: BBC News: http://www.bbc.com/news/business-12608777

How About Community Profiling

Page 131: Big it data workshop   pub

Nobel Worthy!

Muhammad Yunus

Photo by University of Salford under Creative Commons CC BY 2.0

Page 132: Big it data workshop   pub


Thank You