An explanation of machine learning for business

Post on 19-Jun-2015

Category:

Business

DESCRIPTION

Slides from the course on big data by Clement Levallois of EMLYON Business School, for business students. See the online video that accompanies these slides. Machine learning explained in simple terms to a business audience: what a training set is, what a test set is, and how machine learning differs from statistics.

TRANSCRIPT

MK99 – Big Data 1

Big data & cross-platform analytics

MOOC lectures

Pr. Clement Levallois

A short note on machine learning for business

Machine Learning

• Family of techniques to formulate predictions, based on data

• Why is it called machine learning?

– Machine: it is about algorithms running on computers, not equations solved with pen and paper

– Learning: the algorithms start out with poor accuracy. Then they get more accurate as they are fed with data: the algorithm refines its parameters, it “learns”.
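To make the idea of “refining parameters” concrete, here is a minimal sketch in Python. It is not taken from the slides: the relationship y = 3x, the function name `learn_slope` and the learning rate are all assumptions chosen for illustration. The algorithm holds a single parameter, starts with a poor guess, and nudges it after each example it sees.

```python
def learn_slope(examples, learning_rate=0.05):
    """Learn w in the model y = w * x, one example at a time (hypothetical toy model)."""
    w = 0.0              # initial guess: the algorithm knows nothing yet
    history = []         # size of the prediction error before each update
    for x, y in examples:
        error = w * x - y                  # how far off the current prediction is
        history.append(abs(error))
        w -= learning_rate * error * x     # refine the parameter to shrink the error
    return w, history


# Made-up data that follows y = 3 * x exactly
examples = [(x, 3.0 * x) for x in [1.0, 2.0, 3.0, 4.0] * 25]

w, history = learn_slope(examples)
print(f"learned parameter: {w:.4f}")
print(f"first error: {history[0]:.4f}, last error: {history[-1]:.6f}")
```

The prediction error shrinks as more examples are processed, which is the sense in which the algorithm “learns”.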

Typical set up

1. We start with a training set

-> Data already collected, for which we know the actual values to be predicted. Ex: a list of consumers, their characteristics and their associated credit score

2. The algorithms are trained on this set

-> A series of algorithms run on the training set. Their parameters get adjusted so that the actual values are predicted as accurately as possible.

3. A test set (“fresh data”) is brought in

-> A list of consumer characteristics. Their credit scores are known but hidden from the algorithms.

4. The trained algorithms are run on the test set

-> Predict the credit score for each consumer in the test set, using the algorithms that were trained in step 2

5. A measure of accuracy

-> Given the correct values to be predicted in the test set, how accurate were the algorithms? Were the credit scores accurately predicted?
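The five steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the consumer data is synthetic (one made-up characteristic, income, and an invented formula for the credit score), and a simple least-squares line stands in for “the algorithms”. The names `make_consumer` and `fit_line` are hypothetical.

```python
import random

random.seed(0)  # fixed seed so the made-up data is reproducible


def make_consumer():
    """Return one synthetic consumer: (income, credit score). Invented formula."""
    income = random.uniform(20, 100)                 # in thousands per year
    score = 300 + 5 * income + random.gauss(0, 10)   # assumed relation plus noise
    return income, score


consumers = [make_consumer() for _ in range(100)]

# Step 1: the training set, where the actual credit scores are known
training_set = consumers[:80]
# Step 3: the test set ("fresh data"), whose scores are kept hidden during training
test_set = consumers[80:]


def fit_line(rows):
    """Step 2: adjust two parameters (slope, intercept) by ordinary least squares."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    variance = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(training_set)

# Step 4: predict a score for each test consumer, using only their characteristics
predictions = [slope * income + intercept for income, _ in test_set]

# Step 5: reveal the actual scores and measure how far off the predictions were
actual = [score for _, score in test_set]
mean_abs_error = sum(abs(p - a) for p, a in zip(predictions, actual)) / len(actual)
print(f"mean absolute error on the test set: {mean_abs_error:.1f} points")
```

The key design point is that `fit_line` never sees `test_set`: the error in step 5 is measured on data the model was not trained on.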

Vocabulary

• Data scientists “train” their model and then test it

• They are concerned with “out-of-sample” prediction

– That their model accurately predicts data points in the training set (the “sample”) is trivial: the model was fitted to those very points

– It is the accuracy on the test set that matters!

– This is called an “out-of-sample” prediction
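A small Python sketch shows why in-sample accuracy alone proves little. The data and the “model” here are invented for illustration: the model simply memorizes the training set as a lookup table, and falls back to the most common training label for any consumer it has not seen.

```python
from collections import Counter

# Made-up consumers: (age, income in thousands) -> credit rating
training_set = [
    ((25, 30), "bad"),
    ((40, 80), "good"),
    ((35, 60), "good"),
    ((22, 20), "bad"),
    ((50, 90), "good"),
]
test_set = [
    ((30, 70), "good"),
    ((45, 25), "bad"),
    ((28, 85), "good"),
    ((60, 22), "bad"),
]

# "Training": memorize every example, and note the most frequent label
memory = dict(training_set)
fallback = Counter(label for _, label in training_set).most_common(1)[0][0]


def predict(features):
    """Return the memorized label, or the fallback for unseen consumers."""
    return memory.get(features, fallback)


def accuracy(rows):
    """Share of rows whose predicted label matches the actual label."""
    correct = sum(1 for features, label in rows if predict(features) == label)
    return correct / len(rows)


in_sample = accuracy(training_set)
out_of_sample = accuracy(test_set)
print(f"in-sample accuracy:     {in_sample:.2f}")
print(f"out-of-sample accuracy: {out_of_sample:.2f}")
```

The memorizing model is perfect on the training set yet no better than always answering “good” on fresh data, which is why the test-set figure is the one that matters.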

Why is machine learning (ML) so different from statistics?

• ML does not focus on causality – just prediction!

– Note: for this reason, ML cannot predict the effect of an intervention: it has no causal model.

• ML has a special concern for out-of-sample prediction

– It is especially careful about over-fitting (a model that fits the training set closely but predicts poorly on new data)

• ML picks its algorithms from different academic disciplines

– Text, network relations, clustering, not just traditional statistics

• Coming from computer science, ML has affinities with big data

– Procedures optimized for speed and scale

But the best data scientists often started as statisticians / econometricians. See Hal Varian, Chief Economist at Google.

• Kaggle is a website hosting ML competitions that anybody can join

• Goal: make the best prediction on a dataset, with cash prizes for the winners

• Competitions range from predicting clicks on ads to predicting epileptic seizures

• Always the same setup: a training set, a test set, and a scoring based on accuracy.
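The scoring part of such a competition can be pictured with a short Python sketch. Everything here is hypothetical (team names, labels, and the use of plain accuracy as the metric); it does not reproduce Kaggle's actual scoring code, only the idea that submissions are compared against test-set answers the participants cannot see.

```python
# Actual test-set answers, known only to the organizer (made-up values)
hidden_labels = [1, 0, 0, 1, 1, 0, 1, 0]

# Predictions submitted by three hypothetical teams
submissions = {
    "team_a": [1, 0, 0, 1, 0, 0, 1, 0],
    "team_b": [1, 1, 0, 1, 0, 0, 0, 0],
    "team_c": [0, 1, 1, 0, 1, 0, 1, 0],
}


def score(predicted, actual):
    """Accuracy: the share of predictions that match the hidden answers."""
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    return matches / len(actual)


# Rank the teams from best to worst score
leaderboard = sorted(
    ((team, score(predicted, hidden_labels)) for team, predicted in submissions.items()),
    key=lambda entry: entry[1],
    reverse=True,
)

for rank, (team, team_score) in enumerate(leaderboard, start=1):
    print(f"{rank}. {team}: {team_score:.3f}")
```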

This slide presentation is part of a course offered by EMLYON Business School (www.em-lyon.com)

Contact Clement Levallois (levallois [at] em-lyon.com) for more information.
