
Demystifying Machine Learning
Professor Paul Kennedy
paul.kennedy@uts.edu.au
Centre for Artificial Intelligence
School of Software, Faculty of Engineering & IT

• What is ML?

• Different types of ML & what they can and can’t do

• Some examples

• Brief overview of some common ML approaches

• How to go about solving problems with ML

What is Machine Learning?

Machine Learning

• A computer program is said to learn from experience E with respect to tasks T and performance measure P if its performance at tasks T, measured by P, improves with experience E.

• T = tasks
  • e.g., predict if a customer will take up an offer
• P = performance measure
  • e.g., % correctly predicted
• E = experience
  • e.g., past examples of customers who did and didn't take up the offer.
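To make the T/P/E framing above concrete, here is a minimal sketch of computing P as the percentage of offers predicted correctly; the toy customer data and the simple age rule are illustrative assumptions, not from the slides.

```python
# Minimal sketch of the T/P/E framing: T = predict whether a customer takes
# up an offer, E = past labelled examples, P = % predicted correctly.
# The data and the simple rule below are illustrative assumptions only.

past_customers = [
    # (age, took_up_offer)
    (25, False), (34, True), (41, True), (23, False), (52, True), (29, False),
]

def predict(age: int) -> bool:
    """A toy 'model': assume older customers take up the offer."""
    return age >= 30

# Performance measure P: percentage of past examples predicted correctly.
correct = sum(predict(age) == took_up for age, took_up in past_customers)
accuracy = 100 * correct / len(past_customers)
print(f"P = {accuracy:.0f}% correctly predicted")
```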

[Figure: a task maps domain objects, described by features, through a model to an output (Flach 2012).]

[Figure: the learning problem: a learning algorithm produces the model from training data, and the model then solves the task (Flach 2012).]

Different types of ML & what it can and can't do

Machine Learning
• Unsupervised methods
  • Make sense of the data.
• Supervised methods
  • Learn a relationship between inputs and outputs from old data.
  • Use it to predict the output for new inputs.
• Others: reinforcement learning, semi-supervised learning, transfer learning, one-class learning, …

Unsupervised Learning
• Make sense of the dataset by representing it in another form.
• Identify clusters or groups in the data.
• Identify frequent patterns in the data.
• e.g. clustering, association rule mining, neural networks, …

Supervised Learning
• Using existing data, learn the relationship between some 'inputs' and a known 'output' or target value.
• The learned model can then be used to make predictions for new 'input' data.
• e.g. classification and regression
• Decision trees, neural networks, support vector machines, random forests, …

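A minimal sketch of the supervised workflow just described: fit a model on existing labelled data, then predict outputs for new inputs. The toy dataset and the choice of a k-nearest-neighbour classifier are illustrative assumptions, and scikit-learn is assumed to be available.

```python
# Minimal supervised-learning sketch: learn from old (input, output) pairs,
# then predict outputs for new inputs. The data here is a toy assumption.
from sklearn.neighbors import KNeighborsClassifier

# Existing data: inputs (age, income in $k) and known outputs (took up offer?)
X_old = [[25, 40], [34, 70], [41, 82], [23, 35], [52, 95], [29, 48]]
y_old = [0, 1, 1, 0, 1, 0]

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_old, y_old)          # learn the input-output relationship

X_new = [[38, 75], [26, 42]]     # new customers with unknown outcomes
print(model.predict(X_new))      # predicted outputs, e.g. [1 0]
```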

What it can do
• Unsupervised:
  • can divide the data points into groups that hopefully match reality, and assess how well the clusters make sense.
• Supervised:
  • if there are enough data points with few enough attributes …
  • that match the scope of the real-world domain …
  • and there is a relationship between the inputs and outputs, …
  • these methods can find a pattern that generalises, and we can estimate its quality.

What it can't do
• It's not magic!
• Heavily reliant on the quality and amount of input / training data.
• Usually doesn't 'understand' the problem in a 'human' way.
• Sometimes cannot explain why a decision is made.

Some examples

Predicting the 2012 US election result
• Nate Silver used predictive analytics and statistics to correctly predict the outcomes of 50 out of 50 states from polling and related data.
• Republican pundits were confident in their landslide-win predictions; Democrat pundits predicted a razor-thin victory.
• Shows the power of a data-centric approach over "gut feeling".

AlexNet
• Deep convolutional neural network using GPUs.
• Famously won the ImageNet LSVRC-2012 competition by a large margin: 15.3% vs 26.2% (second place) error rates.
• Boosted deep learning research.
• Beaten in 2015 by Microsoft's ResNet.

Krizhevsky et al., Communications of the ACM, 60 (6): 84–90.

Local Interpretable Model-agnostic Explanations (LIME)

[Figures: example LIME explanations, Ribeiro et al., 2016]

[Figure: adversarial patch attack, Brown et al., "Adversarial Patch", arXiv:1712.09665v2 [cs.CV], 17 May 2018]

Brief overview of methods

Unsupervised techniques
• Association analysis / Market Basket Analysis
  • Identify frequent and/or interesting patterns in transaction databases.
  • e.g. if someone buys bread they are also likely to buy butter.
  • Also measures how often the rule appears (support) and/or is true (confidence).
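A minimal sketch of how the two measures just mentioned can be computed for the bread-and-butter rule; the tiny transaction list is an illustrative assumption.

```python
# Minimal sketch: support and confidence for the rule {bread} -> {butter}.
# The transaction list is a toy assumption for illustration.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

n = len(transactions)
with_bread = [t for t in transactions if "bread" in t]
with_both = [t for t in with_bread if "butter" in t]

support = len(with_both) / n                    # how often the rule appears: 3/5
confidence = len(with_both) / len(with_bread)   # how often it is true: 3/4
print(f"support={support:.2f}, confidence={confidence:.2f}")
```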

Unsupervised techniques
• Clustering
  • Identify groups within data where data points in the group are similar to one another but different to those in other groups.
  • hierarchical, k-means, k-medoids, EM, DBSCAN, BIRCH, …
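As a quick illustration of clustering, a minimal k-means sketch; scikit-learn and the toy 2-D points are assumptions, since the slides do not prescribe a particular library.

```python
# Minimal k-means clustering sketch: group 2-D points into k=2 clusters.
# scikit-learn and the toy data are assumptions for illustration.
from sklearn.cluster import KMeans

points = [[1.0, 1.2], [0.8, 1.0], [1.1, 0.9],   # one loose group
          [8.0, 8.5], [8.2, 7.9], [7.9, 8.1]]   # another loose group

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.labels_)           # cluster assignment for each point
print(kmeans.cluster_centers_)  # the two cluster centroids
```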

Classification & prediction methods
• Linear regression, logistic regression, sparse variants, …
• k-nearest neighbour classifiers, …
• Decision trees
• Random forest (see the sketch after this list)
• Artificial neural networks: multilayer perceptrons, deep networks, convolutional neural networks, recurrent neural networks, …
• Support vector machines, …
• Naive Bayes, Bayesian networks, …
• Ensemble methods: gradient boosting, AdaBoost, XGBoost, …
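A minimal sketch comparing a single decision tree with a random forest on a toy dataset; scikit-learn, the synthetic data and the chosen parameters are all assumptions for illustration.

```python
# Minimal sketch: a single decision tree vs a random forest on synthetic data.
# scikit-learn, the synthetic dataset and all parameters are assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# The forest averages many de-correlated trees, which usually reduces variance.
print("decision tree test accuracy:", tree.score(X_test, y_test))
print("random forest test accuracy:", forest.score(X_test, y_test))
```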

Decision Tree
[Figure: an example decision tree]

Random Forest
[Figures: a random forest of decision trees voting on a query point]

Neural Nets
[Figure: a neural network producing a Yes / No output]

Support Vector Machines
[Figure: support vector machine illustration]

How to go about solving problems

Fitting to the business
• Understand the business context and, better still, frame a business question.
• Translate the business question into a data analytics question.
• Collect, understand and process data from across the business, and possibly externally.
• Build models and evaluate them.
• Deploy the results in the business to deliver benefits.
• It is an iterative process.

CRISP-DM view
CRoss-Industry Standard Process for Data Mining (CRISP-DM) methodology
[Figure source: Kenneth Jensen / Wikimedia Commons / Public Domain]

Validation
• Need to evaluate the quality of models.
• Many approaches:
  • Hold-out sets
  • Bootstrap validation
  • K-fold cross validation

• Train: used to train the model
• Validation: used to tune parameters
• Test: used to evaluate the model
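A minimal sketch of two of the evaluation approaches above, a hold-out train/test split and k-fold cross validation; scikit-learn, the synthetic data and the model choice are assumptions.

```python
# Minimal sketch: hold-out evaluation and k-fold cross validation.
# scikit-learn, the synthetic data and the model choice are assumptions.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
model = LogisticRegression(max_iter=1000)

# Hold-out: keep a test set aside, train on the rest, evaluate once.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
print("hold-out accuracy:", model.fit(X_train, y_train).score(X_test, y_test))

# K-fold cross validation: 5 train/validate splits, report the mean accuracy.
scores = cross_val_score(model, X, y, cv=5)
print("5-fold CV accuracy:", scores.mean())
```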

Ways it can go wrong
• Answering the wrong business question
• Not deploying properly
• Model goes stale: the underlying problem is non-stationary
• Overfitting: the model has a high training accuracy but doesn't work well in the real world (see the sketch after this list)
• Underfitting: the model has a low training accuracy
• p >> n, aka the curse of dimensionality
  • Too many attributes for the number of data points
• Imbalanced classes: predictions are biased towards the majority class, often not the one you're interested in
• Bias: training data is biased or not representative of the real-world situation
• Insufficient data cleaning
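A minimal sketch of spotting the overfitting failure mode listed above by comparing training and test accuracy; scikit-learn, the noisy synthetic data and the unconstrained tree are assumptions.

```python
# Minimal sketch: detecting overfitting by comparing train vs test accuracy.
# scikit-learn, the noisy synthetic data and the model choice are assumptions.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=20, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unconstrained tree can memorise the noisy training data.
overfit = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("train accuracy:", overfit.score(X_train, y_train))  # typically ~1.0
print("test accuracy: ", overfit.score(X_test, y_test))    # noticeably lower

# Limiting tree depth is one simple way to trade training fit for generalisation.
simpler = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
print("depth-3 test accuracy:", simpler.score(X_test, y_test))
```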

Q&A
paul.kennedy@uts.edu.au
