An explanation of machine learning for business
Posted on 19-Jun-2015
MK99 – Big Data 1
Big data & cross-platform analytics
MOOC lectures, Pr. Clement Levallois
A short note on machine learning for business
Machine Learning
• A family of techniques for making predictions based on data
• Why is it called machine learning?
– Machine: it is about algorithms running on computers, not equations solved with pen and paper
– Learning: the algorithms start with little or no predictive accuracy. They become more accurate as they are fed data: the algorithm refines its parameters, it “learns”.
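To make the "learning" idea concrete, here is a minimal sketch, not taken from the course: a one-parameter model y_hat = w * x starts with an uninformed parameter (w = 0) and refines it one data point at a time using stochastic gradient descent on squared error. The data is hypothetical and noise-free (the true relationship is assumed to be y = 3x), and the learning rate and number of passes are illustrative choices.

```python
# Sketch of "learning": the parameter w is refined as data is fed to the model.
# Hypothetical data: the true relationship is assumed to be y = 3x (no noise).

def squared_error(w, data):
    """Mean squared error of the model y_hat = w * x on (x, y) pairs."""
    return sum((y - w * x) ** 2 for x, y in data) / len(data)

def learn(data, learning_rate=0.001, passes=20):
    w = 0.0  # uninformed starting point: early predictions are poor
    history = [squared_error(w, data)]
    for _ in range(passes):
        for x, y in data:
            # Nudge w in the direction that reduces the error on this point
            error = y - w * x
            w += learning_rate * error * x
        history.append(squared_error(w, data))  # error after each full pass
    return w, history

data = [(x, 3.0 * x) for x in range(1, 11)]
w, history = learn(data)
print(f"learned w = {w:.3f}")
print("error after each pass:", [round(e, 2) for e in history])
```

The error shrinks after every pass and w approaches 3: the model "gets more accurate while being fed with data" simply by adjusting its parameter.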
Typical set-up
1. We start with a training set
Data already collected: we know the actual values to be predicted. Ex: a list of consumers, their characteristics and their associated credit scores
2. The algorithms are trained on this set
-> A series of algorithms run on the training set. Their parameters are progressively adjusted so that the actual values are predicted as accurately as possible.
3. A test set (“fresh data”) is brought in
-> A list of consumer characteristics. Their credit scores are known but hidden.
4. The trained algorithms are run on the test set
-> Predict the credit score of each consumer in the test set, using the algorithms trained in step 2
5. A measure of accuracy
-> Given the correct values to be predicted in the test set, how accurate were the algorithms? Were the credit scores accurately predicted?
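The five steps of the set-up can be sketched end to end as follows. This is an illustration under stated assumptions, not the course's own code: the consumers are synthetic, generated from a single made-up characteristic ("income", in thousands) with an assumed linear link to the credit score plus noise, and one least-squares line stands in for "a series of algorithms". Accuracy is measured here as the mean absolute error on the hidden test scores.

```python
import random

# Hypothetical, synthetic consumers: the score is assumed to depend linearly on
# income (in thousands) plus noise. This is NOT real credit-scoring logic.
random.seed(0)

def make_consumer():
    income = random.uniform(20, 120)
    score = 300 + 4 * income + random.gauss(0, 20)
    return income, score

consumers = [make_consumer() for _ in range(200)]

# 1. Training set: characteristics AND actual scores are known
training_set = consumers[:150]

# 2. Train: fit a least-squares line  score = a + b * income
def train(rows):
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in rows)
         / sum((x - mean_x) ** 2 for x, _ in rows))
    a = mean_y - b * mean_x
    return a, b

a, b = train(training_set)

# 3. Test set: the scores exist but are kept hidden from the model
test_features = [income for income, _ in consumers[150:]]
hidden_scores = [score for _, score in consumers[150:]]

# 4. Predict each test consumer's score from characteristics alone
predictions = [a + b * x for x in test_features]

# 5. Measure accuracy: mean absolute error against the hidden actual values
mae = sum(abs(p - y) for p, y in zip(predictions, hidden_scores)) / len(hidden_scores)
print(f"fitted model: score = {a:.1f} + {b:.2f} * income")
print(f"mean absolute error on the test set: {mae:.1f} points")
```

The key design point is that the hidden scores are only used in step 5, never during training.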
Vocabulary
• Data scientists “train” their models and then test them
• They are concerned with “out-of-sample” prediction
– That their model accurately predicts the data points in the training set (the “sample”) is trivial
– It is the accuracy on the test set that matters!
– This is called an “out-of-sample” prediction
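Why in-sample accuracy is "trivial" can be shown with a model that simply memorises its training set. In this sketch (an illustration, not from the course), a 1-nearest-neighbour classifier copies the label of the closest training point. On the training set every point is its own nearest neighbour, so in-sample accuracy is perfect; on fresh test points it is lower. The data-generating rule (label 1 when x plus Gaussian noise exceeds 0.5) is a hypothetical assumption chosen so that no model can be perfect on new data.

```python
import random

# Hypothetical data: label is 1 if x + noise > 0.5, else 0.
# The noise guarantees that fresh data cannot be predicted perfectly.
random.seed(1)

def make_point():
    x = random.random()
    label = 1 if x + random.gauss(0, 0.2) > 0.5 else 0
    return x, label

train_set = [make_point() for _ in range(100)]
test_set = [make_point() for _ in range(100)]

def predict_1nn(x, training):
    """1-nearest-neighbour: copy the label of the closest training point.
    This model effectively memorises the training set."""
    _, nearest_label = min(training, key=lambda row: abs(row[0] - x))
    return nearest_label

def accuracy(rows, training):
    return sum(predict_1nn(x, training) == y for x, y in rows) / len(rows)

in_sample = accuracy(train_set, train_set)    # evaluated on the "sample"
out_of_sample = accuracy(test_set, train_set)  # evaluated on fresh data
print(f"in-sample accuracy:     {in_sample:.2f}")  # 1.00: each point is its own nearest neighbour
print(f"out-of-sample accuracy: {out_of_sample:.2f}")
```

The gap between the two numbers is exactly what out-of-sample testing is designed to reveal.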
Why is machine learning (ML) so different from statistics?
• ML focuses on prediction, not causality!
– Note: for this reason, ML cannot predict the effect of an intervention: it has no causal model.
• ML has a special concern for out-of-sample prediction
– Data scientists are especially careful about over-fitting
• ML draws its algorithms from different academic disciplines
– Methods for text, network relations and clustering, not just traditional statistics
• Coming from computer science, ML has affinities with big data
– Procedures optimized for speed and scale
But the best data scientists often started as statisticians or econometricians: see Hal Varian, Chief Economist at Google.
• Kaggle is a website hosting ML competitions; anybody can join
• Goal: make the best predictions on a dataset, for cash prizes
• Topics range from predicting clicks on ads to predicting epileptic seizures
• Always the same set-up: a training set, a test set, and scoring based on accuracy.
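The competition scoring can be sketched as follows. This is a hypothetical illustration, not Kaggle's actual system: the organiser keeps the test-set answers hidden, each team submits one prediction per test row, and submissions are ranked by plain accuracy. Real competitions define their own metric, and the team names and values below are made up.

```python
# Hypothetical competition scoring: hidden answers, submitted predictions,
# and a leaderboard ranked by accuracy. All names and values are made up.

hidden_answers = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]

submissions = {
    "team_a": [1, 0, 1, 0, 0, 1, 0, 0, 1, 1],
    "team_b": [1, 0, 1, 1, 0, 1, 0, 1, 1, 0],
    "team_c": [0, 0, 1, 1, 0, 0, 0, 0, 0, 0],
}

def accuracy(predictions, answers):
    """Share of test rows predicted correctly."""
    if len(predictions) != len(answers):
        raise ValueError("submission must contain one prediction per test row")
    return sum(p == a for p, a in zip(predictions, answers)) / len(answers)

leaderboard = sorted(
    ((team, accuracy(preds, hidden_answers)) for team, preds in submissions.items()),
    key=lambda item: item[1],
    reverse=True,  # highest accuracy first
)
for rank, (team, score) in enumerate(leaderboard, start=1):
    print(f"{rank}. {team}: {score:.2f}")
# prints:
# 1. team_b: 0.90
# 2. team_a: 0.80
# 3. team_c: 0.70
```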
This slide presentation is part of a course offered by EMLYON Business School (www.em-lyon.com)
Contact Clement Levallois (levallois [at] em-lyon.com) for more information.