machine learning –en introduktion - sas retrieval ai computational neuroscience data mining data...

26
Copyright © 2015, SAS Institute Inc. All rights reserved. Machine learning – en introduktion Josefin Rosén, Senior Analytical Expert, SAS Institute [email protected] Twitter: @rosenjosefin #SASFORUMSE

Upload: vanquynh

Post on 22-Apr-2018

228 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: Machine learning –en introduktion - SAS Retrieval AI Computational Neuroscience Data Mining Data Science Machine Learning Pattern Recognition Vad är vad egentligen? Copyright ©

Copyright © 2015, SAS Institute Inc. All rights reserved.

Machine learning – en introduktion

Josefin Rosén, Senior Analytical Expert, SAS Institute

[email protected]

Twitter: @rosenjosefin

#SASFORUMSE

Page 2: Machine learning –en introduktion - SAS Retrieval AI Computational Neuroscience Data Mining Data Science Machine Learning Pattern Recognition Vad är vad egentligen? Copyright ©

Copyright © 2015, SAS Institute Inc. All rights reserved.

Machine learning – en introduktion

Agenda

Vad är machine learning?

När, var och hur används machine learning?

Exempel – deep learning

Machine learning i SAS

Page 3: Machine learning –en introduktion - SAS Retrieval AI Computational Neuroscience Data Mining Data Science Machine Learning Pattern Recognition Vad är vad egentligen? Copyright ©

Copyright © 2015, SAS Institute Inc. All rights reserved.

Machine learning – vad är det?

Wikipedia: Machine learning, a branch of artificial intelligence,

concerns the construction and study of systems that can learn

from data.

SAS: Machine learning is a branch of artificial intelligence that

automates the building of systems that learn from data, identify

patterns, and make decisions – with minimal human intervention.

Page 4: Machine learning –en introduktion - SAS Retrieval AI Computational Neuroscience Data Mining Data Science Machine Learning Pattern Recognition Vad är vad egentligen? Copyright ©

Copyright © 2015, SAS Institute Inc. All rights reserved.

Databases

Statistics

Information Retrieval

AI

Computational Neuroscience

Data Mining

Data Science

MachineLearning

PatternRecognition

Vad är vad egentligen?

Page 5: Machine learning –en introduktion - SAS Retrieval AI Computational Neuroscience Data Mining Data Science Machine Learning Pattern Recognition Vad är vad egentligen? Copyright ©

Copyright © 2015, SAS Institute Inc. All rights reserved.

”Komplicerade metoder,

men användbara resultat”

Machine learning – vad är det?

Page 6: Machine learning –en introduktion - SAS Retrieval AI Computational Neuroscience Data Mining Data Science Machine Learning Pattern Recognition Vad är vad egentligen? Copyright ©

Copyright © 2015, SAS Institute Inc. All rights reserved.

När används machine learning?

När modellens prediktionsnoggrannhet är viktigare än tolkningen

av modellen

När traditionella tillvägagångssätt inte passar, t ex när man har:

fler variabler än observationer

många korrelerade variabler

ostrukturerad data

fundamentalt ickelinjära eller ovanliga fenomen

Page 7: Machine learning –en introduktion - SAS Retrieval AI Computational Neuroscience Data Mining Data Science Machine Learning Pattern Recognition Vad är vad egentligen? Copyright ©

Copyright © 2015, SAS Institute Inc. All rights reserved.

Träningsdata

Regression

Beslutsträd

Neuralt nätverk

Page 8: Machine learning –en introduktion - SAS Retrieval AI Computational Neuroscience Data Mining Data Science Machine Learning Pattern Recognition Vad är vad egentligen? Copyright ©

Copyright © 2015, SAS Institute Inc. All rights reserved.

Var används machine learning?

Några exempel:

Rekommendationsapplikationer

Fraud detection

Prediktivt underhåll

Textanalys

Mönster och bildigenkänning

Den självkörande Google-bilen

Page 9: Machine learning –en introduktion - SAS Retrieval AI Computational Neuroscience Data Mining Data Science Machine Learning Pattern Recognition Vad är vad egentligen? Copyright ©

Copyright © 2015, SAS Institute Inc. All rights reserved.

Databases

Statistics

Information Retrieval

AI

Computational Neuroscience

Data Mining

Data Science

MachineLearning

PatternRecognition

Page 10: Machine learning –en introduktion - SAS Retrieval AI Computational Neuroscience Data Mining Data Science Machine Learning Pattern Recognition Vad är vad egentligen? Copyright ©

Copyright © 2015, SAS Institute Inc. All rights reserved.

Dat

a M

inin

gMachine Learning

*In semi-supervised learning, supervised prediction and classification algorithms are often combined with clustering.

SEMI-SUPERVISED LEARNING

Prediction and classification*Clustering*EM TSVMManifoldregularization Autoencoders

Multilayer perceptronRestricted Boltzmannmachines

SUPERVISED LEARNING

RegressionLASSO regressionLogistic regressionRidge regression

Decision treeGradient boostingRandom forests

Neural networks SVMNaïve BayesNeighborsGaussianprocesses

UNSUPERVISEDLEARNING

A priori rulesClustering

k-means clusteringMean shift clustering Spectral clustering

Kernel densityestimationNonnegative matrixfactorizationPCA

Kernel PCASparse PCA

Singular valuedecompositionSOM

Don’t know y

Know ySometimes

know y

Page 11: Machine learning –en introduktion - SAS Retrieval AI Computational Neuroscience Data Mining Data Science Machine Learning Pattern Recognition Vad är vad egentligen? Copyright ©

Copyright © 2015, SAS Institute Inc. All rights reserved.

Deep learning

Deep learning – att använda neurala nätverk med fler än två gömda lager

Används framgångsrikt bl a inom mönsterigenkänning

Bra på att extrahera features från ett dataset

Page 12: Machine learning –en introduktion - SAS Retrieval AI Computational Neuroscience Data Mining Data Science Machine Learning Pattern Recognition Vad är vad egentligen? Copyright ©

Copyright © 2015, SAS Institute Inc. All rights reserved.

MNIST träningsdata

784 variabler bildar en 28x28 digital grid

784-dimensionell inputvektor X = (x1,…,x784)

Varierande gråskala från 0 till 255

60,000 träningsbilder med label

10,000 testbilder utan label

Page 13: Machine learning –en introduktion - SAS Retrieval AI Computational Neuroscience Data Mining Data Science Machine Learning Pattern Recognition Vad är vad egentligen? Copyright ©

Copyright © 2015, SAS Institute Inc. All rights reserved.

MNIST exempel

Träna en stacked denoising autoencoder

Extrahera representativa features från MNIST data

Jämföra med PCA, två PCs

Page 14: Machine learning –en introduktion - SAS Retrieval AI Computational Neuroscience Data Mining Data Science Machine Learning Pattern Recognition Vad är vad egentligen? Copyright ©

Copyright © 2015, SAS Institute Inc. All rights reserved.

Stacked

denoising

autoencoder

h1

h2

h3

h4

h5

Partially Corrupted Input Features

Hidden Neurons

Hidden Neurons

Hidden Neurons

Hidden Neurons

Hidden Neurons

Uncorrupted Output Features Target Layer

Input Layer

Extractable FeaturesHidden layers

Page 15: Machine learning –en introduktion - SAS Retrieval AI Computational Neuroscience Data Mining Data Science Machine Learning Pattern Recognition Vad är vad egentligen? Copyright ©

Copyright © 2015, SAS Institute Inc. All rights reserved.

Record ID Pixel 1 Pixel 2 Pixel 3 Pixel 4 Pixel 5 Pixel 6 Pixel 7 Pixel 8 Pixel 9 Pixel 10 …

1 0 0 0 0 0 5 8 11 6 3 …

2 0 0 0 0 10 20 45 46 36 24 …

3 0 25 37 32 40 64 107 200 67 46 …

⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋱

Record ID Hidden Unit 1 Hidden Unit 2

1 0.98754 0.32453

2 0.76854 0.87345

3 0.87435 0.05464

⋮ ⋮ ⋮

h1

h2

h3

Partially Corrupted Input Features

Hidden Neurons

Hidden Neurons

Hidden Neurons

Input Layer

Extractable Features

Page 16: Machine learning –en introduktion - SAS Retrieval AI Computational Neuroscience Data Mining Data Science Machine Learning Pattern Recognition Vad är vad egentligen? Copyright ©

Copyright © 2015, SAS Institute Inc. All rights reserved.

Feature extraction – denoising autoencoder

Page 17: Machine learning –en introduktion - SAS Retrieval AI Computational Neuroscience Data Mining Data Science Machine Learning Pattern Recognition Vad är vad egentligen? Copyright ©

Copyright © 2015, SAS Institute Inc. All rights reserved.

Feature extraction - PCA

Page 18: Machine learning –en introduktion - SAS Retrieval AI Computational Neuroscience Data Mining Data Science Machine Learning Pattern Recognition Vad är vad egentligen? Copyright ©

Copyright © 2015, SAS Institute Inc. All rights reserved.

SAS machine learning algoritmer Neural networks

Decision trees

Random forests

Associations and sequence

discovery

Gradient boosting and bagging

Support vector machines

Nearest-neighbor mapping

K-means clustering

DBSCAN

Self-organizing maps

Local search optimization techniques

such as genetic algorithms

Expectation maximization

Multivariate adaptive regression

splines

Bayesian networks

Kernel density estimation

Principal components analysis

Singular value decomposition

Gaussian mixture models

Sequential covering rule building

Model ensembles

Recommendations

Page 19: Machine learning –en introduktion - SAS Retrieval AI Computational Neuroscience Data Mining Data Science Machine Learning Pattern Recognition Vad är vad egentligen? Copyright ©

Copyright © 2015, SAS Institute Inc. All rights reserved.

SAS-produkter som använder machine learning

SAS Enterprise Miner

SAS Text Miner

SAS In-Memory Statistics for Hadoop

SAS Visual Statistics

SAS/STAT

SAS/OR

SAS Factory Miner

Page 20: Machine learning –en introduktion - SAS Retrieval AI Computational Neuroscience Data Mining Data Science Machine Learning Pattern Recognition Vad är vad egentligen? Copyright ©

Copyright © 2015, SAS Institute Inc. All rights reserved.

Su

pe

rvis

ed

lea

rnin

g a

lgo

ritme

r

Algoritm SAS EM-noder SAS procedurer

Regression High Performance Regression

LARS

Partial Least Squares

Regression

ADAPTIVEREG

GAM

GENMOD

GLMSELECT

HPGENSELECT

HPLOGISTIC

HHPQUANTSELECT

HPREG

LOGISTIC

QUANTREG

QUANTSELECT

REG

Beslutsträd Decision Tree

High Performance Tree

ARBORETUM

HPSPLIT

Random forest High Performance Tree HPFOREST

Gradient boosting Gradient Boosting ARBORETUM

Neurala nätverk AutoNeural

DMNeural

High Performance Neural

Neural Network

HPNEURAL

NEURAL

Support vector machine High Performance Support Vector Machine HPSVM

Naïve Bayes HPBNET*

Neighbors Memory Based Reasoning DISCRIM

*PROC HPBNET kan lära sig olika nätverksstrukturer (naïve, TAN, PC, och MB) och automatiskt välja den bästa modellen

Page 21: Machine learning –en introduktion - SAS Retrieval AI Computational Neuroscience Data Mining Data Science Machine Learning Pattern Recognition Vad är vad egentligen? Copyright ©

Copyright © 2015, SAS Institute Inc. All rights reserved.

Unsupervised learning algoritmer

Algoritm SAS EM-noder SAS procedurer

A priori rules Association

Link Analysis

K-means klustring Cluster

High Performance Cluster

FASTCLUS

HPCLUS

Spektral klustring Custom lösning genom Base SAS och procedurerna

DISTANCE och PRINCOMP

Kernel density estimation KDE

Kernel PCA Custom lösning genom Base SAS och procedurerna

CORR, PRINCOMP och SCORE

Singular value decomposition HPTMINE

IML

Self organizing maps SOM/Kohonen

Page 22: Machine learning –en introduktion - SAS Retrieval AI Computational Neuroscience Data Mining Data Science Machine Learning Pattern Recognition Vad är vad egentligen? Copyright ©

Copyright © 2015, SAS Institute Inc. All rights reserved.

Semi-Supervised learning algoritmer

Algoritm SAS EM-noder SAS procedurer

Denoising autoencoders HPNEURAL

NEURAL

Page 23: Machine learning –en introduktion - SAS Retrieval AI Computational Neuroscience Data Mining Data Science Machine Learning Pattern Recognition Vad är vad egentligen? Copyright ©

Copyright © 2015, SAS Institute Inc. All rights reserved.

Big data

Beräkningsresurser

Kraftfulla datorer

Billig datalagring

Varför har machine learning fått ökat intresse?

“Space is big. You just won't believe how

vastly, hugely, mind-bogglingly big it is”

Douglas Adams i ”Liftarens guide till galaxen”

Page 24: Machine learning –en introduktion - SAS Retrieval AI Computational Neuroscience Data Mining Data Science Machine Learning Pattern Recognition Vad är vad egentligen? Copyright ©

Copyright © 2015, SAS Institute Inc. All rights reserved.

Page 25: Machine learning –en introduktion - SAS Retrieval AI Computational Neuroscience Data Mining Data Science Machine Learning Pattern Recognition Vad är vad egentligen? Copyright ©

Copyright © 2015, SAS Institute Inc. All rights reserved.

Mer läsning

• White papers

http://www.sas.com/en_us/whitepapers/machine-learning-with-sas-enterprise-miner-107521.html

http://support.sas.com/resources/papers/proceedings14/SAS313-2014.pdf

• SAS-länkar

http://www.sas.com/en_us/insights/analytics/machine-learning.html

http://www.sas.com/en_us/insights/articles/analytics/introduction-to-machine-learning-five-things-the-quants-wish-we-

knew.html

• SAS Data Mining Community https://communities.sas.com/community/support-communities/sas_data_mining_and_text_mining/

• Big Data Matters Webinar Series: www.sas.com/bigdatamatters

Page 26: Machine learning –en introduktion - SAS Retrieval AI Computational Neuroscience Data Mining Data Science Machine Learning Pattern Recognition Vad är vad egentligen? Copyright ©

Copyright © 2015, SAS Institute Inc. All rights reserved.

Tack!