An explanation of machine learning for business



Upload: clement-levallois

Post on 19-Jun-2015




DESCRIPTION

Slides of the course on big data by Clement Levallois from EMLYON Business School, for business students. See the online video that accompanies these slides. Machine learning explained in simple terms to a business audience: what a training set and a test set are, and how machine learning differs from statistics.

TRANSCRIPT

Page 1: An explanation of machine learning for business

MK99 – Big Data 1

Big data & cross-platform analytics: MOOC lectures, Prof. Clement Levallois

A short note on machine learning for business

Machine Learning

• A family of techniques for making predictions based on data

• Why is it called machine learning?

– Machine: it is about algorithms running on computers, not equations solved with pen and paper

– Learning: the algorithms start out inaccurate. As they are fed data, they refine their parameters and become more accurate: they “learn”.
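To make the “learning” idea concrete, here is a minimal Python sketch. It assumes the simplest possible model, a prediction y = w × x with a single parameter w, and uses online gradient descent as the learning rule; the slides do not name a specific algorithm, and the data points are made up for illustration.

```python
# Hypothetical toy data: (x, y) pairs where the true relation is y = 3 * x.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0), (4.0, 12.0)]


def mean_squared_error(w, points):
    """Average squared gap between the model's predictions w * x and the actual y."""
    return sum((y - w * x) ** 2 for x, y in points) / len(points)


def learn(points, learning_rate=0.01, passes=50):
    """Online gradient descent: nudge the parameter w after every data point it is fed."""
    w = 0.0  # starting guess: before seeing any data, the model is inaccurate
    for _ in range(passes):
        for x, y in points:
            error = w * x - y
            # Step against the gradient of the squared error
            # (the constant factor 2 is folded into the learning rate).
            w -= learning_rate * error * x
    return w


w_learned = learn(data)
print(f"error before learning: {mean_squared_error(0.0, data):.2f}")
print(f"learned w = {w_learned:.4f}, error after learning: {mean_squared_error(w_learned, data):.6f}")
```

As the data points are fed in repeatedly, w moves from its starting value of 0 toward 3 and the error shrinks; that adjustment of parameters is what “learning” refers to here.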

Typical set up

1. We start with a training set

Data already collected: we know the actual values to be found. Ex: a list of consumers, their characteristics and their associated credit scores

2. The algorithms are trained on this set

-> A series of algorithms run on the training set. Their parameters are progressively adjusted so that they predict the actual values as accurately as possible.

3. A test set (“fresh data”) is brought in -> A list of consumer characteristics. Their credit scores are known but hidden from the algorithms.

4. Running the trained algorithms on the test set -> Predict the credit score for each consumer in the test set, using the algorithms trained in step 2

5. A measure of accuracy -> Given the correct values to be predicted in the test set, how accurate were the algorithms? Were the credit scores accurately predicted?

Actual values
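The five steps above can be sketched in Python. Everything here is an assumption for illustration: the consumers are randomly generated, income is their only characteristic, the “algorithm” is a single straight-line least-squares fit, and accuracy is measured as mean absolute error. Real credit-scoring models use many more characteristics and other algorithms.

```python
import random


def fit_line(points):
    """Least-squares fit of score = intercept + slope * income."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
             / sum((x - mean_x) ** 2 for x, _ in points))
    return mean_y - slope * mean_x, slope


rng = random.Random(42)  # fixed seed so the example is reproducible

# Hypothetical consumers: income (in thousands) and a credit score that depends on it, plus noise.
consumers = []
for _ in range(100):
    income = rng.uniform(20, 120)
    score = 300 + 5 * income + rng.gauss(0, 20)
    consumers.append((income, score))

# 1. Training set: data already collected, with the actual scores known.
training_set = consumers[:80]

# 2. Train: adjust the parameters (intercept, slope) to fit the training set.
intercept, slope = fit_line(training_set)

# 3. Test set: fresh consumers whose actual scores are known but hidden from the model.
test_set = consumers[80:]
test_inputs = [income for income, _ in test_set]
hidden_scores = [score for _, score in test_set]

# 4. Run the trained model on the test set.
predictions = [intercept + slope * income for income in test_inputs]

# 5. Measure accuracy: average absolute gap between predicted and actual scores.
mae = sum(abs(p - a) for p, a in zip(predictions, hidden_scores)) / len(hidden_scores)
print(f"slope={slope:.2f}, intercept={intercept:.1f}, test MAE={mae:.1f} points")
```

The hidden scores are only used in step 5, which mirrors the set up on the slide: the model never sees them while making its predictions.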

Vocabulary

• Data scientists “train” their model and then test it

• They are concerned with “out-of-sample” prediction

– That their model accurately predicts the data points in the training set (the “sample”) is trivial: the model was fitted to those very points

– It is the accuracy on the test set that matters!

– This is called an “out-of-sample” prediction
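A small Python sketch shows why in-sample accuracy proves little. It assumes a hypothetical “memorising” model (a one-nearest-neighbour rule that copies the score of the most similar training consumer) and randomly generated consumers. On the training set this model is perfect by construction, because each training consumer's nearest neighbour is itself; only the test set reveals how well it actually predicts.

```python
import random

rng = random.Random(0)  # fixed seed so the example is reproducible


def make_consumers(n):
    """Hypothetical consumers: (income in thousands, credit score) with a noisy linear link."""
    consumers = []
    for _ in range(n):
        income = rng.uniform(20, 120)
        consumers.append((income, 300 + 5 * income + rng.gauss(0, 20)))
    return consumers


def predict(training_set, income):
    """Memorising model: return the score of the training consumer with the closest income."""
    _, closest_score = min(training_set, key=lambda c: abs(c[0] - income))
    return closest_score


def mean_absolute_error(training_set, points):
    """Average absolute gap between the model's predictions and the actual scores."""
    return sum(abs(predict(training_set, x) - y) for x, y in points) / len(points)


training_set = make_consumers(80)
test_set = make_consumers(20)

in_sample = mean_absolute_error(training_set, training_set)
out_of_sample = mean_absolute_error(training_set, test_set)
print(f"in-sample error: {in_sample:.1f}  out-of-sample error: {out_of_sample:.1f}")
```

The in-sample error is exactly zero while the out-of-sample error is not, which is why data scientists judge a model on the test set.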

Why is machine learning (ML) so different from statistics?

• ML focuses on prediction, not causality

– Note: for this reason, ML cannot predict the effect of an intervention: it has no causal model.

• ML has a special concern for out-of-sample prediction

– It is therefore especially careful about over-fitting (a model that fits the training set closely but predicts poorly out of sample)

• ML picks its algorithms from different academic disciplines

– Text analysis, network relations, clustering: not just traditional statistics

• Coming from computer science, ML has affinities with big data

– Procedures optimized for speed and scale

But the best data scientists often started as statisticians or econometricians: see Hal Varian, Chief Economist at Google

• Kaggle is a website hosting ML competitions that anybody can join

• Goal: make the best predictions on a dataset, to win cash prizes

• Topics range from predicting clicks on ads to predicting epileptic seizures

• Always the same setup: a training set, a test set, and scoring based on accuracy.
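The competition setup can be sketched as a tiny scoring script in Python. The click labels, team names and predictions are hypothetical, and plain accuracy (the share of correct predictions) is assumed as the metric; actual Kaggle competitions each specify their own evaluation metric.

```python
def accuracy(predicted, actual):
    """Share of test cases where the prediction matches the hidden true value."""
    if len(predicted) != len(actual):
        raise ValueError("a submission must contain one prediction per test case")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical hidden test labels: 1 = the ad was clicked, 0 = it was not.
hidden_clicks = [0, 1, 0, 0, 1, 1, 0, 0]

# Hypothetical submissions from competitors, one prediction per test case.
submissions = {
    "team_a": [0, 1, 0, 0, 1, 0, 0, 0],
    "team_b": [1, 1, 0, 0, 1, 1, 1, 0],
    "always_no_click": [0] * 8,  # naive baseline that never predicts a click
}

# Score every submission against the hidden labels and rank from best to worst.
leaderboard = sorted(
    ((accuracy(preds, hidden_clicks), team) for team, preds in submissions.items()),
    reverse=True,
)
for score, team in leaderboard:
    print(f"{team}: {score:.3f}")
```

Competitors only see their score, never the hidden labels, so the ranking reflects out-of-sample prediction.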

This slide presentation is part of a course offered by EMLYON Business School (www.em-lyon.com)

Contact Clement Levallois (levallois [at] em-lyon.com) for more information.