MK99 – Big Data 1

Big data & cross-platform analytics

MOOC lectures

Pr. Clement Levallois

MK99 – Big Data 2

How to create value from data

MK99 – Big Data 3

How to read these slides

• Not a closed list, not a recipe!

• Rather, these are essential building blocks for a strategy of value creation based on data.

MK99 – Big Data 4

predict

The data you need
1. Domain-specific historical data
2. Data mashups
3. Feedback data on accuracy of prediction

The hard part
1. Hard to start (no data yet…)
2. Risk missing the long tail
3. Over-reliance

The people you need
1. Data scientists for predictive algo
2. Econometricians for modelling
3. Domain specialists for heuristics and quality control

The ones doing it
1. Predicting credit score
2. Predicting crime
3. Predicting deals

MK99 – Big Data 5

suggest (rec sys)

The data you need
1. A rich static dataset and/or historical data on past transactions
2. A feedback mechanism providing data to improve on the suggestion procedure

The hard part
1. Getting a proper feedback mechanism
2. Managing the serendipity problem
3. Finding the value proposition which goes beyond the simple “you purchased this, you’ll like that”

The people you need
1. Data scientists
2. Domain specialists for insights into the suggestion logic

The ones doing it
• Amazon’s product recommendation system
• Google’s “Related searches…”
• Retailer’s personalized recommendations
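
The simple “you purchased this, you’ll like that” logic named on the suggest slide can be sketched as a co-occurrence count over past transactions, with a small feedback hook. This is only an illustrative sketch: the basket data and the names `build_cooccurrence`, `suggest` and `record_feedback` are assumptions made for this example, not material from the course.

```python
from collections import Counter
from itertools import combinations

# Hypothetical past transactions (illustrative data, not from the course).
baskets = [
    {"tent", "sleeping bag", "stove"},
    {"tent", "sleeping bag"},
    {"tent", "lantern"},
    {"stove", "lantern"},
]

def build_cooccurrence(baskets):
    """Count how often each pair of items appears in the same basket."""
    counts = {}
    for basket in baskets:
        for a, b in combinations(sorted(basket), 2):
            counts.setdefault(a, Counter())[b] += 1
            counts.setdefault(b, Counter())[a] += 1
    return counts

def suggest(item, counts, k=2):
    """Return up to k items most often bought together with `item`."""
    scores = counts.get(item, Counter())
    # Highest count first; ties broken alphabetically for a stable result.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]

def record_feedback(counts, item, suggested, accepted):
    """Feedback hook (assumed design): reinforce an accepted suggestion,
    weaken a rejected one. Only the item -> suggested direction is updated."""
    counts.setdefault(item, Counter())[suggested] += 1 if accepted else -1

cooccurrence = build_cooccurrence(baskets)
before = suggest("tent", cooccurrence)
record_feedback(cooccurrence, "tent", "stove", accepted=True)
after = suggest("tent", cooccurrence)
print(before, after)
```

A sketch like this only ever suggests what was already bought together, which is one way to see the serendipity problem listed under the hard part.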

MK99 – Big Data 6

curate

The data you need
1. Dirty (inaccuracies, missing values, bad formatting)
2. Cheap but hard to collect
3. Preferably unstructured

The hard part
1. Slow progress
2. Must maintain continuity
3. Scaling up / right incentives for the workforce
4. Quality control

The people you need
1. Cheap but not unskilled labor to perform curation (hint: curators can be the users of a free service you provide to them)
2. Data scientists for quality control
3. Domain specialists for quality control

The ones doing it
1. Thomson Reuters curating and selling scientific data
2. Nielsen and IRI curating and selling retail data
3. IMDb curating and selling movie data

MK99 – Big Data 7

enrich

The data you need
1. Clean
2. Diverse

The hard part
1. Knowing which cocktail of data is valued by the market
2. Limit replicability
3. Establish legitimacy

The people you need
1. Specialists of APIs / data mashups / data integration
2. Possibly: specialists of linked data
3. Domain specialists for quality control

The ones doing it
1. Selling samples from the enriched dataset
2. Selling aggregated indicators
3. Selling dashboards

MK99 – Big Data 8

rank / match / compare

The data you need
1. Varied data
   1. Many attributes
   2. Large range of values

The hard part
1. Finding emergent, implicit attributes
2. Ensuring consistency of the ranking
3. Avoid gaming of the system by the users

The people you need
1. Data scientists
2. Hacker mentality to imagine how unstructured data can contribute to ranking
3. Domain specialists for quality control

The ones doing it already
1. Search engines ranking results
2. Yelp, Travelocity, etc… ranking destinations
3. Any system that needs to filter out best quality entities among a crowd of candidates

MK99 – Big Data 9

segment

The data you need
1. Varied data
   1. Many attributes
   2. Large range of values

The hard part
1. Choosing the relevant association measure
2. Judging the quality of the segmentation
3. Dealing with boundary cases
4. Choosing between supervised vs unsupervised methods
5. Deciding whether overlapping segments (instead of disjoint segments) are allowed.

The people you need
1. Data scientists (including network analysts)
2. Domain specialists for quality control

The ones doing it
1. All industries, when doing:
   1. Marketing research (segment markets)
   2. Product development (segment portfolio of products)
   3. Advertising (target ads to groups)
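
Several of the hard parts on the segment slide (association measure, boundary cases, overlapping vs disjoint segments) can be made concrete with a small sketch. It assumes Jaccard similarity as the association measure and predefined segment profiles, which is only one possible choice; the profiles, customers, threshold and function names are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical segment profiles and customers (illustrative only).
profiles = {
    "outdoor": {"tent", "stove", "boots"},
    "urban": {"sneakers", "backpack", "headphones"},
}
customers = {
    "ana": {"tent", "boots"},
    "ben": {"sneakers", "headphones"},
    "cho": {"boots", "backpack", "tent", "sneakers"},  # boundary case
}

def disjoint_segments(customers, profiles):
    """Each customer goes to the single best-matching segment.
    Ties go to the alphabetically first segment (an arbitrary rule)."""
    return {
        name: max(sorted(profiles), key=lambda seg: jaccard(items, profiles[seg]))
        for name, items in customers.items()
    }

def overlapping_segments(customers, profiles, threshold=0.3):
    """Each customer joins every segment whose similarity reaches the threshold."""
    return {
        name: sorted(
            seg for seg in profiles if jaccard(items, profiles[seg]) >= threshold
        )
        for name, items in customers.items()
    }

disjoint = disjoint_segments(customers, profiles)
overlapping = overlapping_segments(customers, profiles)
print(disjoint)
print(overlapping)
```

The customer `cho` matches both profiles equally well: the disjoint version has to break the tie arbitrarily, while the overlapping version keeps both memberships.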

MK99 – Big Data 10

classify

The data you need
1. Any dataset, the richer the better
2. A feedback mechanism providing data to improve on the classification procedure

The hard part
1. Evaluating the quality of the comparison
2. Dealing with boundary cases
3. Choosing between a supervised and unsupervised approach (how many categories?)

The people you need
1. Data scientists
2. Domain specialists for insights into the classification logic

The ones doing it
• Detectors of spam (this email: spam or not?)
• Algorithmic trading (this piece of news: buy or sell?)
• Medical apps: computer-aided diagnosis based on a set of measurements

MK99 – Big Data 11

Combos!

Curate + Enrich

Segment + Rank (within segments)

Enrich + Rank

Enrich + Suggest

(how to find most valuable customers in a CRM)
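
As an illustration of one combo, here is a minimal sketch of segmenting and then ranking within segments to find the most valuable customers in a CRM. Pairing the CRM remark with this particular combo is an assumption, as are the records, the use of revenue as the ranking criterion, and the function name `top_customers_per_segment`.

```python
from collections import defaultdict

# Hypothetical CRM records: (customer, segment, yearly revenue). Illustrative only.
crm = [
    ("ana", "retail", 1200),
    ("ben", "retail", 300),
    ("cho", "wholesale", 9000),
    ("dev", "wholesale", 15000),
    ("eli", "retail", 800),
]

def top_customers_per_segment(records, k=2):
    """Group customers by segment, then rank each group by revenue (highest first)."""
    by_segment = defaultdict(list)
    for customer, segment, revenue in records:
        by_segment[segment].append((customer, revenue))
    return {
        segment: [c for c, _ in sorted(members, key=lambda m: -m[1])[:k]]
        for segment, members in by_segment.items()
    }

top = top_customers_per_segment(crm)
print(top)
```

Ranking inside each segment keeps small-ticket segments from being crowded out by large accounts, which a single global ranking would not do.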

MK99 – Big Data 12

This slide presentation is part of a course offered by EMLYON Business School (www.em-lyon.com). Contact Clement Levallois (levallois [at] em-lyon.com) for more information.

