TRANSCRIPT
Demystifying Machine Learning
Professor Paul Kennedy
Centre for Artificial Intelligence
School of Software, Faculty of Engineering & IT
• What is ML?
• Different types of ML & what they can and can’t do
• Some examples
• Brief overview of some common ML approaches
• How to go about solving problems with ML
What is Machine
Learning?
Machine Learning
• A computer program is said to learn from experience E with respect to tasks
T and performance measure P if its performance at tasks T, as measured by P,
improves with experience E (Mitchell, 1997).
• T = tasks
• e.g., predict if customer will take up offer
• P = performance measure
• e.g., % correctly predicted
• E = experience
• e.g., past examples of customers who did and didn’t take up offer.
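The performance measure P from the offer-prediction example can be sketched as a simple accuracy calculation; the labels below are invented for illustration.

```python
# P = % correctly predicted, computed over past customer examples (E).
# 1 = customer took up the offer, 0 = did not (hypothetical labels).

def accuracy(predictions, actuals):
    """Fraction of predictions that match the true outcomes."""
    correct = sum(p == a for p, a in zip(predictions, actuals))
    return correct / len(actuals)

actuals     = [1, 0, 1, 1, 0]
predictions = [1, 0, 0, 1, 0]

print(accuracy(predictions, actuals))  # 4 of 5 correct -> 0.8
```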
[Diagram: a task maps domain objects, via their features, through a model to an output (Flach 2012)]
[Diagram: the learning problem — a learning algorithm uses training data to produce the model that performs the task (Flach 2012)]
Different types of ML &
what they can and can’t do
Machine Learning
• Unsupervised methods
• Make sense of the data
• Supervised methods
• Learn a relationship between inputs and outputs from old
data.
• Use to predict output for new inputs.
• Others: reinforcement learning, semi-supervised learning,
transfer learning, one-class learning, …
Unsupervised Learning
• Make sense of the dataset by representing it in another form.
• Identify clusters or groups in the data.
• Identify frequent patterns in the data.
• e.g. clustering, association rule mining, neural networks, …
Supervised Learning
• Using existing data, learn the relationship between some ‘inputs’ to predict a known ‘output’ or target value.
• Then it can be used to make predictions for new ‘input’ data.
• e.g. classification and regression
• Decision trees, neural networks, support vector machines, random forest, …
What it can do
• Unsupervised:
• can divide the data points into groups that hopefully match reality, and assess how well the clusters make sense.
• Supervised:
• if there are enough data points with few enough attributes …
• that match the scope of the real-world domain …
• and there is a relationship between the inputs and outputs, …
• these methods can find a pattern that can generalise, and we can estimate its quality.
What it can’t do
• It’s not magic!
• Heavily reliant on the quality and amount of input /
training data
• Usually doesn’t ‘understand’ the problem in a ‘human’
way
• Sometimes cannot explain why the decision is made
Some examples
Predicting the 2012 US
election result
• Nate Silver used predictive analytics & statistics to correctly predict the outcomes of 50 out of 50 states from polling and related data.
• Republican pundits were confident in their landslide-win predictions; Democrat pundits predicted a razor-thin victory.
• Shows the power of a data-centric approach over “gut feeling”.
AlexNet
• Deep convolutional neural network trained on GPUs
• Famously won the ImageNet LSVRC-2012 competition by a large margin: a 15.3% error rate vs 26.2% for second place
• Boosted deep learning research
• Beaten in 2015 by Microsoft’s ResNet.
Krizhevsky et al, Communications of the ACM. 60 (6): 84–90.
Local Interpretable Model-agnostic
Explanations
LIME, Ribeiro et al, 2016
LIME, Ribeiro et al, 2016
Brown et al., Adversarial Patch, arXiv:1712.09665v2 [cs.CV] 17 May 2018
Brief overview of
methods
Unsupervised techniques
• Association analysis / Market Basket Analysis
• Identify frequent and/or interesting transactions from databases.
• e.g. if someone buys bread they are also likely to buy butter.
• Also gives measures of how often the rule appears (support) and/or is true (confidence)
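The bread-and-butter rule above can be made concrete with support and confidence; the transaction data below is invented for illustration.

```python
# Support: fraction of transactions containing the itemset.
# Confidence: how often the rule's right-hand side appears when the
# left-hand side does. Transactions are hypothetical shopping baskets.

transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
]

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    return support(lhs | rhs) / support(lhs)

print(support({"bread", "butter"}))       # 2 of 4 baskets -> 0.5
print(confidence({"bread"}, {"butter"}))  # 2 of 3 bread baskets -> 0.67
```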
Unsupervised techniques
• Clustering
• Identify groups within data where data points in the group are similar to one another but different to those in other groups.
• hierarchical, k-means, k-medoids, EM, DBSCAN, BIRCH, …
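A minimal sketch of one of the methods listed, k-means, on made-up 1-D data; real use would rely on a library such as scikit-learn.

```python
# k-means alternates two steps: assign each point to its nearest centroid,
# then move each centroid to the mean of its assigned points.

def kmeans(points, centroids, iterations=10):
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (keep the old centroid if its cluster is empty).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

points = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0]
centroids, clusters = kmeans(points, centroids=[0.0, 5.0])
print(centroids)  # two cluster centres, at 1.5 and 10.5
```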
Classification & prediction methods
• Linear regression, logistic regression, sparse variants …
• k-nearest neighbour classifiers, …
• Decision trees
• Random forest
• Artificial neural networks: multilayer perceptrons, deep networks,
convolutional neural networks, recurrent neural networks, …
• Support vector machines, …
• Naive Bayes, Bayesian networks, …
• Ensemble methods: gradient boosting, Adaboost, XGBoost, …
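The simplest of the classifiers listed, k-nearest neighbours, can be sketched in a few lines; the 2-D training points and labels below are invented.

```python
# k-NN: predict the majority label of the k training points closest
# (here by Euclidean distance) to the query point.
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """train: list of ((x, y), label) pairs."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    labels = [label for _, label in neighbours]
    return Counter(labels).most_common(1)[0][0]

train = [((0, 0), "no"), ((0, 1), "no"), ((1, 0), "no"),
         ((5, 5), "yes"), ((5, 6), "yes"), ((6, 5), "yes")]

print(knn_predict(train, (5.2, 5.1)))  # "yes"
print(knn_predict(train, (0.3, 0.4)))  # "no"
```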
Decision Tree
Random Forest
Neural Nets
Support Vector Machines
How to go about
solving problems
Fitting to the business
• Understanding the business context and, stronger, framing a business question.
• Translating the business question into a data analytics question.
• Collecting, understanding and processing data from across the business, and possibly externally.
• Building models and evaluating them.
• Deploying the results in the business to deliver benefits.
• An iterative process.
CRISP-DM view
CRoss-Industry Standard Process for Data Mining (CRISP-DM) methodology
Source: Kenneth Jensen / Wikimedia Commons / Public Domain
Validation
• Need to evaluate the
quality of models.
• Many approaches
• Hold out sets
• Bootstrap validation
• K-fold cross validation
• Train — used to train the model
• Validation — used to tune parameters
• Test — used to evaluate the model
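One of the approaches above, k-fold cross-validation, can be sketched with indices alone: the data is split into k folds, and each fold serves once as the held-out validation set while the rest trains the model.

```python
# k-fold cross-validation split over n data points. No real model is
# trained here; the sketch only shows how the indices are partitioned.

def kfold_indices(n, k):
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, f in enumerate(folds) if j != i for idx in f]
        yield train, validation

for train, validation in kfold_indices(n=6, k=3):
    print("train:", train, "validation:", validation)
# Each of the 6 indices appears in exactly one validation fold.
```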
Ways it can go wrong
• Answering the wrong business question
• Not deploying properly
• Model goes stale - the underlying problem is non-stationary
• Overfitting - the model has high training accuracy but doesn’t work well in the real world
• Underfitting - the model has low training accuracy
• p >> n, aka the curse of dimensionality
• Too many attributes for the number of data points
• Imbalanced classes - the model is biased towards the majority class, often not the one you’re interested in
• Bias - training data is biased or not representative of the real-world situation
• Insufficient data cleaning
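The imbalanced-classes pitfall above can be shown with a toy example: a model that always predicts the majority class scores high accuracy while never detecting the minority class you actually care about. The labels are invented.

```python
# 95% of labels belong to the majority class (0); a "classifier" that
# always predicts 0 looks accurate but is useless for the minority class.

labels = [0] * 95 + [1] * 5      # 95% majority, 5% minority
predictions = [0] * 100          # always predict the majority class

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
recall_minority = sum(p == y == 1 for p, y in zip(predictions, labels)) / 5

print(accuracy)         # 0.95 -- looks good
print(recall_minority)  # 0.0  -- never finds the class of interest
```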