TRANSCRIPT
-
A to Z of AI/ML: A Quick Introduction to Artificial Intelligence and Machine Learning Capabilities and Tools
EngCon 2017
Mark Crowley
Assistant Professor
Electrical and Computer Engineering
University of Waterloo
[email protected]
Sep 23, 2017
Mark Crowley A to Z of AI/ML Sep 23, 2017 1 / 112
-
Introduction
Outline
Introduction
What is AI?
Neural Networks
Convolutional Neural Networks
Do you need AI/ML?
-
Introduction
My Background
Waterloo : Assistant Professor, ECE Department since 2015
PhD at UBC in Computer Science with Prof. David Poole
Postdoc at Oregon State University
UW ECE ML Lab: https://uwaterloo.ca/scholar/mcrowley/lab
Waterloo Institute for Complexity and Innovation (WICI)
Research Fellow at ElementAI
Pattern Analysis and Machine Intelligence (PAMI)
http://waterloo.ai
List of faculty
Research projects (co-op/internships)
List of spinoff companies from UWaterloo (good place for project ideas)
-
What is AI?
What do you think of when you hear:
Artificial Intelligence
Machine Learning
-
What is AI? Landscape of Big Data/AI/ML
Data, Big Data, Machine Learning, AI, etc.
[Figure: landscape diagram with regions for Data, Big Data Tools, Data Analysis, Machine Learning, Artificial Intelligence, Human Decision Making and Automated Decision Making.]
Data (rows are data points, columns are features):
X    f1    f2      fk
x1   .5    104.2   China
x2   .34   92.0    USA
xn   .2    85.2    Canada
Outputs shown in the figure:
Reports, statistics, Charts, trends
Policies, Decision Rules, Summaries
Classification, Patterns, Predictions, Probabilities
Areas and methods placed on the figure: Multi-agent systems, CNN, DQN, Game Theory, Deep Learning, Reinforcement Learning, RNN, LSTM, ILP, Probabilistic Programming, Constraint Programming, SAT, SMP, A3C, Nat Lang Processing, Robotics, Vision
-
What is AI? Landscape of Big Data/AI/ML
Major Types/Areas of AI
Artificial Intelligence: some algorithm that enables computers to perform actions we define as requiring intelligence.
This is a moving target.
Search Based Heuristic Optimization (A*)
Evolutionary computation (genetic algorithms)
Logic Programming (inductive logic programming, fuzzy logic)
Probabilistic Reasoning Under Uncertainty (bayesian networks)
Computer Vision
Natural Language Processing
Robotics
Machine Learning
-
What is AI? Landscape of Big Data/AI/ML
Types of Machine Learning
Machine Learning: "Detect patterns in data, use the uncovered patterns to predict future data or other outcomes of interest" (Kevin Murphy, Google Research).
-
What is AI? Landscape of Big Data/AI/ML
Deep Learning
Deep Learning: methods which perform machine learning through the use of multilayer neural networks of some kind. Deep Learning can be applied in any of the three main types of ML:
Supervised Learning : very common, enormous improvement in recent years
Unsupervised Learning : just beginning, lots of potential
Reinforcement Learning : this has exploded recently (past 3 years), especially for video games
-
What is AI? Landscape of Big Data/AI/ML
Increasing Complexity of Supervised ML Methods
1 mean, mode, max, min - basic statistics and patterns
2 prediction/regression - least squares, ridge regression
3 linear classification - use distances and separation of data points (logistic regression, SVM, KNN)
4 Kernel Based Classification - define a mapping from original data to a new space, allow nonlinear divisions to be found
5 Decision trees - learn rules that divide data arbitrarily (C4.5, Random Forests, AdaBoost)
6 Neural Networks - learn function using neurons
7 Deep Neural Networks - same, but deep :)
8 Recurrent Neural Networks - adding links to past timesteps, learning with memory of the past
9 Convolutional Neural Networks - adding convolutional filters, good for images
10 Inception Resnets, Long Short-Term Memory (LSTM) Networks, Voxception Networks, .... oh it keeps going...
-
What is AI? Classification
One Example of ML: Classification
-
What is AI? Classification
Clustering vs. Classification
Clustering
Unsupervised
Uses unlabeled data
Organize patterns w.r.t. an optimization criterion
Requires a definition of similarity
Hard to evaluate
Examples: K-means, Fuzzy C-means, Hierarchical Clustering, DBScan
Classification
Supervised
Uses labeled data
Requires training phase
Domain sensitive
Easy to evaluate (you know the correct answer)
Examples: Naive Bayes, KNN, SVM, Decision Trees, Random Forests
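The contrast between the two columns can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the toy points, the labels "A"/"B", the starting centroids and the helper names `knn_predict` and `kmeans` are all made up here and do not come from the talk. KNN needs the labels; K-means only sees the points and a distance (similarity) definition.

```python
import math

def knn_predict(train_points, train_labels, query, k=3):
    """Supervised: majority vote among the k nearest labeled points."""
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    votes = {}
    for i in order[:k]:
        votes[train_labels[i]] = votes.get(train_labels[i], 0) + 1
    return max(votes, key=votes.get)

def kmeans(points, centroids, iterations=10):
    """Unsupervised: alternate nearest-centroid assignment and centroid update."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # New centroid = mean of its cluster; an empty cluster keeps its centroid.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids

# Hypothetical toy data: two well-separated groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(points, labels, (1.1, 1.0)))               # uses the labels
print(kmeans(points, centroids=[(0.0, 0.0), (6.0, 6.0)]))    # ignores the labels
```

Evaluating the classifier is easy because the labels give the correct answer; the clustering result can only be judged against the chosen similarity measure.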
-
What is AI? Classification
Classification Performance Depends on the Algorithm
A good example of these choices is Support Vector Machines (SVMs).
popular until the dawn of deep learning in the past few years
core idea: find a dividing hyperplane
many variations: plane can be linear, polynomial, Gaussian, high-dimensional
So what is the right approach? Experimentation!
-
What is AI? Classification
Classification Performance Depends on the Algorithm
So choose carefully... See http://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html
-
Neural Networks Building Upon Classic Machine Learning
Outline
Introduction
What is AI?
Neural Networks
Building Upon Classic Machine Learning
History Of Neural Networks
Improving Performance
Convolutional Neural Networks
Do you need AI/ML?
-
Neural Networks Building Upon Classic Machine Learning
Linear Regression vs. Logistic Regression
A simple type of Generalized Linear Model
Linear regression learns a function to predict a continuous output variable from continuous or discrete input variables
$Y = b_0 + \sum_i b_i X_i + \epsilon$
Logistic regression predicts the probability of an outcome, the appropriate class for an input vector, or the odds of one outcome being more likely than another.
-
Neural Networks Building Upon Classic Machine Learning
Logistic Regression as a Graphical Model
$o(x) = \sigma(w^T x) = \sigma\left(w_0 + \sum_i w_i x_i\right) = \frac{1}{1 + \exp\left(-\left(w_0 + \sum_i w_i x_i\right)\right)}$
-
Neural Networks Building Upon Classic Machine Learning
Logistic Regression Used as a Classifier
Logistic Regression can be used as a simple linear classifier.
Compare probabilities of each class P(Y = 0|X) and P(Y = 1|X). Treat the halfway point on the sigmoid as the decision boundary.
$P(Y = 1|X) > 0.5 \Rightarrow$ classify X in class 1
Decision boundary: $w_0 + \sum_i w_i x_i = 0$
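A minimal sketch of this decision rule in plain Python. The weights below are hypothetical values picked only for illustration, and the function names `predict_proba` and `classify` are made up for this example (they are not a library API).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w0, w, x):
    """P(Y = 1 | X = x) = sigmoid(w0 + sum_i w_i * x_i)."""
    return sigmoid(w0 + sum(wi * xi for wi, xi in zip(w, x)))

def classify(w0, w, x, threshold=0.5):
    """Class 1 if P(Y = 1 | X) exceeds the threshold, otherwise class 0."""
    return 1 if predict_proba(w0, w, x) > threshold else 0

# Hypothetical weights; the decision boundary is -1 + 2*x1 - x2 = 0.
w0, w = -1.0, [2.0, -1.0]

print(classify(w0, w, [2.0, 1.0]))       # linear score +2, above the boundary
print(classify(w0, w, [0.0, 1.0]))       # linear score -2, below the boundary
print(predict_proba(w0, w, [1.0, 1.0]))  # a point exactly on the boundary
```

Thresholding the probability at 0.5 is the same as checking the sign of the linear score, which is why the classifier is linear.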
-
Neural Networks Building Upon Classic Machine Learning
Training Logistic Regression Model via Gradient Descent
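Can't easily perform Maximum Likelihood Estimation in closed form
The negative log-likelihood of the logistic function is given by NLL and its gradient by g
$NLL(w) = \sum_{i=1}^{N} \log\left(1 + \exp\left(-\tilde{y}_i \, w^T x_i\right)\right)$
$g = \frac{\partial}{\partial w} NLL(w) = \sum_i \left(\sigma(w^T x_i) - y_i\right) x_i$
Here $w^T x_i = w_0 + \sum_j w_j x_{ij}$, $y_i \in \{0, 1\}$ is the class label and $\tilde{y}_i = 2 y_i - 1 \in \{-1, +1\}$ is its signed form.
Then we can update the parameters iteratively
$\theta_{k+1} = \theta_k - \eta_k g_k$
where $\theta$ denotes the parameters (here w) and $\eta_k$ is the learning rate or step size.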
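The update loop can be sketched as follows. Assumptions, none of which come from the slide: labels are 0/1, the NLL is written with signed labels 2y - 1, the step size is constant, and the six data points are made up. The names `nll`, `gradient` and `fit` are illustrative only.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def nll(w, X, y):
    """Negative log-likelihood using signed labels 2*y_i - 1 in {-1, +1}."""
    return sum(math.log(1.0 + math.exp(-(2 * y_i - 1) * dot(w, x_i)))
               for x_i, y_i in zip(X, y))

def gradient(w, X, y):
    """g = sum_i (sigmoid(w^T x_i) - y_i) * x_i, with labels y_i in {0, 1}."""
    g = [0.0] * len(w)
    for x_i, y_i in zip(X, y):
        err = sigmoid(dot(w, x_i)) - y_i
        for j, xj in enumerate(x_i):
            g[j] += err * xj
    return g

def fit(X, y, eta=0.05, steps=2000):
    """Plain gradient descent: w <- w - eta * g, with a constant step size."""
    w = [0.0] * len(X[0])
    for _ in range(steps):
        g = gradient(w, X, y)
        w = [wj - eta * gj for wj, gj in zip(w, g)]
    return w

# Made-up 1-D data; the leading 1.0 in each row is the bias input for w_0.
X = [[1.0, t] for t in (0.0, 1.0, 2.0, 3.0, 4.0, 5.0)]
y = [0, 0, 1, 0, 1, 1]

w = fit(X, y)
print("weights:", w)
print("NLL before:", nll([0.0, 0.0], X, y), "after:", nll(w, X, y))
```

A decaying step size (eta depending on k) would match the slide's notation more closely; a constant one keeps the sketch short.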
-
Neural Networks History Of Neural Networks
Neural Networks to learn f : X → Y
f can be a non-linear function
X (vector of) continuous and/or discrete variables
Y (vector of) continuous and/or discrete variables
Feedforward Neural networks - Represent f by network of non-linear (logistic/sigmoid/ReLU) units:
[Figure: feedforward network with input layer X, hidden layer H and output layer Y; each node is a nonlinear unit (Sigmoid/ReLU/ELU).]
-
Neural Networks History Of Neural Networks
Basic Three Layer Neural Network
Input Layer
vector data, each input collects one feature/dimension of the data and passes it on to the (first) hidden layer.
Hidden Layer
Each hidden unit computes a weighted sum of all the units from the input layer (or any previous layer) and passes it through a nonlinear activation function.
Output Layer
Each output unit computes a weighted sum of all the hidden units and passes it through a (possibly nonlinear) threshold function.
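The three layers can be sketched as a forward pass in plain Python. The layer sizes, the weights, and the choice of tanh for the hidden units and sigmoid for the output unit are assumptions made for this illustration; the helper names `layer` and `forward` are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases, activation):
    """Each unit: activation(bias + weighted sum of all inputs to the layer)."""
    return [activation(b + sum(w * x for w, x in zip(row, inputs)))
            for row, b in zip(weights, biases)]

def forward(x, W_hidden, b_hidden, W_out, b_out):
    hidden = layer(x, W_hidden, b_hidden, math.tanh)   # hidden layer
    return layer(hidden, W_out, b_out, sigmoid)        # output layer

# Hypothetical network: 2 inputs -> 3 hidden units -> 1 output.
W_hidden = [[0.5, -0.4], [0.1, 0.9], [-0.7, 0.2]]
b_hidden = [0.0, 0.1, -0.1]
W_out = [[1.0, -1.0, 0.5]]
b_out = [0.0]

print(forward([1.0, 2.0], W_hidden, b_hidden, W_out, b_out))
```

The input layer does no computation of its own here: it is simply the vector `x` handed to the first hidden layer.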
-
Neural Networks History Of Neural Networks
Properties of Neural Networks
Universality: Given a large enough layer of hidden units (or multiple layers), a NN can approximate any continuous function on a bounded input domain to arbitrary accuracy.
Representation Learning: classic statistical machine learning is about learning functions to map input data to output. But Neural Networks, and especially Deep Learning, are more about learning a representation in order to perform classification or some other task.
-
Neural Networks History Of Neural Networks
Hidden Layer: Adding Nonlinearity
Each hidden unit emits an output that is a nonlinear activation function of its net activation.
$y_j = f(net_j)$
This is essential to neural networks' power: if the activation is linear, the whole network collapses to just linear regression.
The output is thus thresholded through this nonlinear activation function.
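A small worked example of why the nonlinearity matters: two stacked linear layers compute exactly the same function as one linear layer whose matrix is the product of the two. The matrices and the input below are made-up values, biases are omitted to keep the arithmetic short, and `matvec` and `matmul` are throwaway helpers.

```python
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

W1 = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]]   # 2 inputs -> 3 hidden units
W2 = [[1.0, -2.0, 0.5]]                      # 3 hidden units -> 1 output
x = [-1.0, 2.0]

# Linear hidden units: the two layers collapse into the single matrix W2 * W1.
two_layer = matvec(W2, matvec(W1, x))
collapsed = matvec(matmul(W2, W1), x)
print(two_layer, collapsed)

# ReLU hidden units: negative net activations are cut to zero, so the output
# is no longer a linear function of x and no single matrix reproduces it.
with_relu = matvec(W2, [max(0.0, h) for h in matvec(W1, x)])
print(with_relu)
```

Here the hidden net activations are [3, -2, -1]; the linear network outputs 6.5 both ways, while the ReLU network outputs 3.0.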
-
Neural Networks History Of Neural Networks
Activation Functions
tanh was another common function.
sigmoid is now discouraged except for the final layer to obtain probabilities. Can saturate easily.
ReLU is the new standard activation function to use.
-
Neural Networks History Of Neural Networks
Rectified Linear Activation
[Figure 6.3 from Goodfellow (2016), Chapter 6, Deep Feedforward Networks: plot of g(z) = max{0, z} against z.]
Figure 6.3: The rectified linear activation function. This activation function is the default activation function recommended for use with most feedforward neural networks. Applying this function to the output of a linear transformation yields a nonlinear transformation. However, the function remains very close to linear, in the sense that is a piecewise linear function with two linear pieces. Because rectified linear units are nearly linear, they preserve many of the properties that make linear models easy to optimize with gradient-based methods. They also preserve many of the properties that make linear models generalize well. A common principle throughout computer science is that we can build complicated systems from minimal components. Much as a Turing machine's memory needs only to be able to store 0 or 1 states, we can build a universal function approximator from rectified linear functions.
(Goodfellow 2016)
Rectified Linear Units (ReLU) have become standard: $\max(0, net_j)$
strong signals are always easy to distinguish
most values are zero, derivative is mostly zero
they do not saturate as easily as sigmoid
new Exponential linear units (ELU) - evidence that they perform better than ReLU in some situations.
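The activation functions mentioned here are short enough to write out. This is a sketch in plain Python; the ELU form with alpha = 1.0 is the commonly used definition and is an assumption here, since the slide does not give a formula.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def relu(z):
    return max(0.0, z)

def relu_grad(z):
    return 1.0 if z > 0 else 0.0

def elu(z, alpha=1.0):
    """Exponential linear unit: identity for z > 0, smooth floor at -alpha below."""
    return z if z > 0 else alpha * (math.exp(z) - 1.0)

# Saturation: for a strong positive signal the sigmoid gradient is nearly zero,
# while the ReLU gradient stays at 1.
for z in (0.5, 5.0, 10.0):
    print(z, sigmoid_grad(z), relu_grad(z))
```

For negative inputs ReLU outputs exactly zero with zero gradient, whereas ELU keeps a small non-zero response.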
-
Neural Networks History Of Neural Networks
Gradient Descent
E: Error function (Mean Squared Error, cross-entropy loss, etc.)
For Neural Networks, E[w] is no longer convex in w
Gradient: $\nabla E[w] = \left[ \frac{\partial E}{\partial w_0}, \frac{\partial E}{\partial w_1}, \ldots, \frac{\partial E}{\partial w_d} \right]$
Training Update Rule: $\Delta w_i = -\eta \frac{\partial E}{\partial w_i}$ where $\eta$ is the training rate.
Note: For linear regression and some other models, the error function is convex in w. In ANNs it is not, so we must solve iteratively.
(Slides from Tom Mitchell ML Course, CMU, 2010)
-
Neural Networks History Of Neural Networks
Incremental Gradient Descent
Let error function be: $E_l[w] = \frac{1}{2} (y^l - o^l)^2$
Do until satisfied:
For each training example l in D
1 Compute the gradient $\nabla E_l[w]$
2 Update weights: $w = w - \eta \nabla E_l[w]$
Note: can also use batch gradient descent on many points at once.
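A minimal sketch of the incremental loop for the simplest possible model, a single linear unit o = w0 + w1 * x. The model, the step size, the fixed number of epochs (standing in for "until satisfied") and the data are all assumptions chosen to keep the example tiny.

```python
def sgd_linear_unit(data, eta=0.05, epochs=500):
    """Incremental gradient descent with per-example error E_l = 1/2 * (y - o)^2."""
    w0, w1 = 0.0, 0.0
    for _ in range(epochs):            # "do until satisfied", simplified
        for x, y in data:              # one weight update per training example
            o = w0 + w1 * x
            # dE_l/dw0 = -(y - o) and dE_l/dw1 = -(y - o) * x,
            # so w <- w - eta * gradient becomes:
            w0 += eta * (y - o)
            w1 += eta * (y - o) * x
    return w0, w1

# Made-up noise-free data generated from y = 2x + 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

print(sgd_linear_unit(data))   # approaches w0 = 1, w1 = 2
```

Batch gradient descent would instead sum the per-example gradients over all of `data` before making a single update.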
-
Neural Networks History Of Neural Networks
Backpropagation Algorithm
We need an iterative algorithm for getting the gradient efficiently.
For each training example:
1 Forward propagation: Input the training example to the network and compute outputs
2 Compute output units errors:
$\delta_k^l = o_k^l (1 - o_k^l)(y_k^l - o_k^l)$
3 Compute hidden units errors:
$\delta_h^l = o_h^l (1 - o_h^l) \sum_k w_{h,k} \delta_k^l$
4 Update network weights:
$w_{i,j} = w_{i,j} + \Delta w_{i,j}^l = w_{i,j} + \eta \delta_j^l o_i^l$
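The four steps can be sketched for a tiny network of sigmoid units. Assumptions made for this illustration: one hidden layer, squared error, biases handled as an extra constant input of 1.0, and made-up starting weights and training example. The names `forward` and `backprop_step` are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W_h, W_o):
    """Step 1. x carries a leading 1.0 bias input; so does the hidden vector."""
    hidden = [1.0] + [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W_h]
    out = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in W_o]
    return hidden, out

def backprop_step(x, y, W_h, W_o, eta=0.5):
    """One training example: updates W_h and W_o in place, returns the error."""
    hidden, out = forward(x, W_h, W_o)
    # Step 2: output unit errors, delta_k = o_k (1 - o_k) (y_k - o_k)
    delta_o = [o * (1.0 - o) * (t - o) for o, t in zip(out, y)]
    # Step 3: hidden unit errors, delta_h = o_h (1 - o_h) sum_k w_hk delta_k
    # (hidden[0] is the constant bias unit, so real units start at index 1)
    delta_h = [hidden[h + 1] * (1.0 - hidden[h + 1])
               * sum(W_o[k][h + 1] * delta_o[k] for k in range(len(W_o)))
               for h in range(len(W_h))]
    # Step 4: weight updates, w_ij <- w_ij + eta * delta_j * o_i
    for k, row in enumerate(W_o):
        for i in range(len(row)):
            row[i] += eta * delta_o[k] * hidden[i]
    for h, row in enumerate(W_h):
        for i in range(len(row)):
            row[i] += eta * delta_h[h] * x[i]
    return 0.5 * sum((t - o) ** 2 for o, t in zip(out, y))

# Hypothetical network (2 inputs + bias -> 2 hidden + bias -> 1 output).
W_h = [[0.1, -0.2, 0.3], [-0.1, 0.2, 0.1]]
W_o = [[0.2, -0.3, 0.1]]
x, y = [1.0, 0.5, -1.0], [1.0]

errors = [backprop_step(x, y, W_h, W_o) for _ in range(200)]
print(errors[0], errors[-1])   # the error on this example shrinks
```

The hidden errors are computed before any weight is changed, so step 3 uses the same weights that produced the forward pass.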
-
Neural Networks History Of Neural Networks
A Short History
40s Early work in NN goes back to the 40s with a simple model of the neuron by McCulloch and Pitts as a summing and thresholding device.
1958 Rosenblatt introduced the Perceptron, a two layer network (one input layer and one output node, with a bias in addition to the input features).
1969 Marvin Minsky: Perceptrons are just linear, AI goes logical, beginning of AI Winter
1980s Neural Network resurgence: Backpropagation (updating weights by gradient descent)
1990s SVMs! Kernels can do anything! (no, they can't)
-
Neural Networks History Of Neural Networks
A Short History
1993 LeNet 1 for digit recognition
2003 Deep Learning (Convolutional Nets Dropout/RBMs, Deep Belief Networks)
1986, 2006 Restricted Boltzmann Machines
2006 Neural Networks outperform RBF SVM on MNIST handwriting dataset (Hinton et al.)
2012 AlexNet for ImageNet challenge - this algorithm beat the competition with an error rate of 16% vs 26% for the next best
ImageNet: contains 15 million annotated images in over 22,000 categories.
ZFNet paper (2013) extends this and has a good description of network structure
2012-present Google Cat Youtube, speech recognition, self driving cars, computer defeats regional Go champion, ...
2014 GoogLeNet added many layers and introduced inception modules (allows parallel computation rather than serially connected CNN layers)
-
Neural Networks History Of Neural Networks
A Short History
2014 Generative Adversarial Networks (GANs) introduced.
2015 Microsoft algorithm beats human performance at ImageNet challenge.
2016 AlphaGo defeats Lee Sedol, one of the world's best Go players, using Deep Reinforcement Learning.
2016 DeepMind introduces A3C, a Deep RL algorithm that can learn to play Atari games from images by playing with no instructions.
-
Neural Networks History Of Neural Networks
Problems with ANNs
Overfitting
Very inefficient for images, timeseries, large numbers of inputs-outputs
Slow to train
Hard to interpret the resulting model
-
Neural Networks Improving Performance
Heuristics for Improving Backpropagation
There are a number of heuristics for training Neural Networks that are useful in practice (maybe we'll learn more today):
Fewer hidden nodes: just enough complexity to work, not so much that the network overfits.
Train multiple networks with different sizes and search for the best design.
Validation set: train on training set until error on validation set starts to rise, then evaluate on evaluation set.
Try different activation functions: tanh, ReLU, ELU, ...?
Dropout (Hinton 2014) - randomly ignore certain units during training, don't update them via gradient descent, leads to hidden units that specialize
Modify learning rate over time (cooling schedule)
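The validation-set heuristic is often implemented as early stopping. The sketch below only shows the stopping logic: the validation errors are made-up numbers, and the `patience` parameter (how many non-improving epochs to tolerate) is an assumption that the slide does not mention.

```python
def early_stopping_epoch(validation_errors, patience=2):
    """Return the epoch with the best validation error, giving up once the
    error has gone `patience` epochs without improving."""
    best_epoch, best_error = 0, float("inf")
    for epoch, error in enumerate(validation_errors):
        if error < best_error:
            best_epoch, best_error = epoch, error
        elif epoch - best_epoch >= patience:
            break   # validation error has started to rise: stop training
    return best_epoch

# Hypothetical validation error after each training epoch.
validation_errors = [0.90, 0.70, 0.60, 0.65, 0.70, 0.50]

print(early_stopping_epoch(validation_errors, patience=2))
print(early_stopping_epoch(validation_errors, patience=5))
```

In a real training loop the weights saved at the returned epoch would be the ones evaluated on the held-out evaluation set. A larger patience trains longer but can find later improvements, as the second call shows.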
-
Neural Networks Improving Performance
Dropout
Dropout (Hinton 2014) - randomly ignore certain units during training, don't update them via gradient descent, leads to hidden units that specialize.
With probability p don't include a weight in the gradient updates.
Reduces overfitting by encouraging robustness of weights in the network.
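A minimal sketch of dropout applied to one layer's activations. It uses the "inverted dropout" convention (rescaling the surviving units so the expected activation is unchanged), which is a common implementation choice and an assumption here rather than something stated on the slide. The function name and the values are illustrative.

```python
import random

def dropout(activations, p_drop, rng, training=True):
    """Zero each unit with probability p_drop during training and rescale the
    survivors by 1 / (1 - p_drop); at test time pass activations through."""
    if not training or p_drop == 0.0:
        return list(activations)
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)             # fixed seed so the sketch is repeatable
hidden = [0.2, 1.5, 0.7, 0.9, 0.1, 1.1]

print(dropout(hidden, p_drop=0.5, rng=rng))                   # training pass
print(dropout(hidden, p_drop=0.5, rng=rng, training=False))   # test pass
```

A dropped unit outputs zero, so no gradient flows through it for that example and its incoming weights are not updated.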
-
Neural Networks Improving Performance
Large, Shallow Models Overfit More
(Goodfellow 2016)
[Figure 6.7 from Goodfellow (2016), Chapter 6, Deep Feedforward Networks: test accuracy (percent) against number of parameters (×10^8), with curves for "3, convolutional", "3, fully connected" and "11, convolutional".]
Figure 6.7: Deeper models tend to perform better. This is not merely because the model is larger. This experiment from Goodfellow et al. (2014d) shows that increasing the number of parameters in layers of convolutional networks without increasing their depth is not nearly as effective at increasing test set performance. The legend indicates the depth of network used to make each curve and whether the curve represents variation in the size of the convolutional or the fully connected layers. We observe that shallow models in this context overfit at around 20 million parameters while deep ones can benefit from having over 60 million. This suggests that using a deep model expresses a useful preference over the space of functions the model can learn. Specifically, it expresses a belief that the function should consist of many simpler functions composed together. This could result either in learning a representation that is composed in turn of simpler representations (e.g., corners defined in terms of edges) or in learning a program with sequentially dependent steps (e.g., first locate a set of objects, then segment them from each other, then recognize them).
-
Convolutional Neural Networks
Outline
Introduction
What is AI?
Neural Networks
Convolutional Neural Networks
Motivation
Other Types of Deep Neural Networks
Do you need AI/ML?
-
Convolutional Neural Networks Motivation
Convolutional Network Structure
input data: image (e.g. 256x256 pixels x 3 channels RGB)
output : categorical label
-
Convolutional Neural Networks Motivation
Example Applications of CNNs
(Karpathy Blog, Oct 25, 2015 - http://karpathy.github.io/2015/10/25/selfie/)
-
Convolutional Neural Networks Motivation
Parameter sharing
[Figure 9.5 from Goodfellow (2016), Chapter 9, Convolutional Networks: two networks, each with inputs x1 to x5 and outputs s1 to s5.]
Figure 9.5: Parameter sharing: Black arrows indicate the connections that use a particular parameter in two different models. (Top) The black arrows indicate uses of the central element of a 3-element kernel in a convolutional model. Due to parameter sharing, this single parameter is used at all input locations. (Bottom) The single black arrow indicates the use of the central element of the weight matrix in a fully connected model. This model has no parameter sharing so the parameter is used only once.
Excerpt from the same page:
... for every location, we learn only one set. This does not affect the runtime of forward propagation (it is still O(k × n)) but it does further reduce the storage requirements of the model to k parameters. Recall that k is usually several orders of magnitude less than m. Since m and n are usually roughly the same size, k is practically insignificant compared to mn. Convolution is thus dramatically more efficient than dense matrix multiplication in terms of the memory requirements and statistical efficiency. For a graphical depiction of how parameter sharing works, see figure 9.5.
As an example of both of these first two principles in action, figure 9.6 shows how sparse connectivity and parameter sharing can dramatically improve the efficiency of a linear function for detecting edges in an image.
In the case of convolution, the particular form of parameter sharing causes the layer to have a property called equivariance to translation. To say a function is equivariant means that if the input changes, the output changes in the same way. Specifically, a function f(x) is equivariant to a function g if f(g(x)) = g(f(x)). In the case of convolution, if we let g be any function that translates the input, i.e., shifts it, then the convolution function is equivariant to g. For example, let I be a function giving image brightness at integer coordinates. Let g be a function ...
Convolution shares the same parameters across all spatial locations
Traditional matrix multiplication does not share any parameters
(Goodfellow 2016)
-
Convolutional Neural Networks Motivation
2D Convolution
(Goodfellow 2016)
[Figure 9.1 from Goodfellow (2016), Chapter 9, Convolutional Networks]
Input:
a b c d
e f g h
i j k l
Kernel / Tensor:
w x
y z
Output:
aw + bx + ey + fz    bw + cx + fy + gz    cw + dx + gy + hz
ew + fx + iy + jz    fw + gx + jy + kz    gw + hx + ky + lz
Figure 9.1: An example of 2-D convolution without kernel-flipping. In this case we restrict the output to only positions where the kernel lies entirely within the image, called "valid" convolution in some contexts. We draw boxes with arrows to indicate how the upper-left element of the output tensor is formed by applying the kernel to the corresponding upper-left region of the input tensor.
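The operation in the figure can be sketched directly in plain Python: slide the kernel over every position where it fits entirely inside the input, without flipping it. The function name `conv2d_valid` and the numeric values are made up for this illustration; the 3x4 input and 2x2 kernel shapes match the figure.

```python
def conv2d_valid(image, kernel):
    """2-D "valid" convolution without kernel flipping (cross-correlation)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

# A 3x4 input and a 2x2 kernel give a 2x3 output.
image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12]]
kernel = [[1, 0],
          [0, 1]]   # w = 1, x = 0, y = 0, z = 1

print(conv2d_valid(image, kernel))
```

With this kernel the upper-left output element is aw + fz = 1 + 6 = 7, matching the pattern aw + bx + ey + fz from the figure.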
-
Convolutional Neural Networks Motivation
A simple example
Edge detection by convolution with a kernel that subtracts the value of the neighbouring pixel on the left from every pixel.
Edge Detection by Convolution
[Figure 9.6 from Goodfellow (2016), Chapter 9, Convolutional Networks. Figure labels: Input, Kernel (-1 -1), Output.]
Figure 9.6: Efficiency of edge detection. The image on the right was formed by taking each pixel in the original image and subtracting the value of its neighboring pixel on the left. This shows the strength of all of the vertically oriented edges in the input image, which can be a useful operation for object detection. Both images are 280 pixels tall. The input image is 320 pixels wide while the output image is 319 pixels wide. This transformation can be described by a convolution kernel containing two elements, and requires 319 × 280 × 3 = 267,960 floating point operations (two multiplications and one addition per output pixel) to compute using convolution. To describe the same transformation with a matrix multiplication would take 320 × 280 × 319 × 280, or over eight billion, entries in the matrix, making convolution four billion times more efficient for representing this transformation. The straightforward matrix multiplication algorithm performs over sixteen billion floating point operations, making convolution roughly 60,000 times more efficient computationally. Of course, most of the entries of the matrix would be zero. If we stored only the nonzero entries of the matrix, then both matrix multiplication and convolution would require the same number of floating point operations to compute. The matrix would still need to contain 2 × 319 × 280 = 178,640 entries. Convolution is an extremely efficient way of describing transformations that apply the same linear transformation of a small, local region across the entire input. (Photo credit: Paula Goodfellow)
(Goodfellow 2016)
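The edge detector and the arithmetic in the caption can be checked with a short sketch. The tiny image is made up, and `horizontal_difference` is a hypothetical helper that implements "each pixel minus its left neighbour" as described in the caption; the image dimensions (280 tall, 320 wide) are the ones the caption quotes.

```python
def horizontal_difference(image):
    """Each output pixel is the input pixel minus its left neighbour,
    so the output is one pixel narrower than the input."""
    return [[row[c] - row[c - 1] for c in range(1, len(row))] for row in image]

# Made-up image with one vertical edge between the second and third columns.
image = [[0, 0, 5, 5],
         [0, 0, 5, 5]]
print(horizontal_difference(image))   # non-zero only at the edge

# Cost comparison for the 280 x 320 image from the caption.
height, width = 280, 320
out_width = width - 1
conv_flops = out_width * height * 3                     # 2 multiplies + 1 add per output pixel
dense_entries = (width * height) * (out_width * height)  # full matrix: inputs x outputs
sparse_entries = 2 * out_width * height                 # two non-zero entries per output pixel
print(conv_flops, dense_entries, sparse_entries)
```

The dense matrix needs over eight billion entries to express what the convolution describes with a two-element kernel.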
-
Convolutional Neural Networks Motivation
Other CNN Modifications
Pooling: Nearby pixels tend to represent the same thing/class/object. So, pool responses from nearby nodes. (e.g. mean, median, min, max)
Strides: number of pixels the filter shifts between adjacent applications, which sets how much adjacent filters overlap
Zero padding: adding a border of zeros around the input so the filter can also be applied at edge pixels; controls the size of the network and deals with edge effects
Connectivity: Alternate local connectivity options, partial connectivity
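How pooling, stride and padding change layer sizes can be sketched with two small helpers. Both function names are made up, and the size formula assumes the usual convention in which zero padding adds `padding` zeros on each side of the input and the stride is the step between filter positions.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Output positions along one dimension for an input of size n."""
    return (n + 2 * padding - kernel) // stride + 1

def max_pool_1d(values, size=2, stride=2):
    """Keep the largest response in each window of nearby values."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, stride)]

print(conv_output_size(320, kernel=2))                       # no padding: one pixel narrower
print(conv_output_size(256, kernel=3, padding=1))            # padding keeps the size
print(conv_output_size(256, kernel=3, stride=2, padding=1))  # stride 2 halves it
print(max_pool_1d([1, 3, 2, 5, 4, 0]))                       # pooled responses
```

Swapping `max` for a mean or median gives the other pooling variants listed above.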
-
Convolutional Neural Networks Other Types of Deep Neural Networks
Other Types of Deep Neural Networks
RBM: Restricted Boltzmann Machines (RBM) - older undirected deep model.
RNN: Recurrent Neural Networks (RNN) - allow links from outputs back to inputs, over time, good for time series learning
LSTM: Long Short-Term Memory networks - more complex form of RNN
integrate strategically remembered particular information from the past
formalizes a process for forgetting information over time.
useful if you need to learn patterns over time and your data features
DeepRL: Deep Reinforcement Learning
CNNs + Fully Connected Deep Network for learning a representation of a policy
Reinforcement Learning for updating the policy through experience to make improved decisions
Requires a value/reward function
GAN: Generative Adversarial Networks - train two networks at once
-
Convolutional Neural Networks Other Types of Deep Neural Networks
Generative Adversarial Networks
One network produces/hallucinates new answers (generative)
Second network distinguishes between the real and the generated answers (adversary/critic)
Blog with code: "GANs in 50 lines of code" (PyTorch). An easy way to get started.
-
Do you need AI/ML?
Outline
Introduction
What is AI?
Neural Networks
Convolutional Neural Networks
Do you need AI/ML?
Defining Your Questions
Designing Your AI/ML System
Languages and Libraries
Deep Learning Frameworks
Compute Resources
-
Do you need AI/ML? Defining Your Questions
Defining Your Questions
Is it a decision to be made?
Is there a pattern to detect?
Do you have data?
What kinds of questions do you have about the data?
Yes/No questions - Did X happen? Are A and B correlated?
Timing - When did X happen?
Anomaly detection - Is X strange/abnormal/unexpected?
Classification - What kind of Y is X?
Prediction - We've seen lots of (X,Y), now we want to know (X,?)
Do you have labels?
Can you give the right answer for some portion of the data?
Collecting labels: Automatic? Manual? Crowd-sourced? (e.g. Amazon Mechanical Turk)
Yes → Supervised Learning - Lots of options
No → Unsupervised Learning - Some options (getting better all the time)
-
Do you need AI/ML? Defining Your Questions
Answers and Constraints
What kind of answer do you need? (increasing difficulty)
Find patterns which are present in the data and view them
Most likely explanation for a pattern
Probability of (fact about X,A,B...) being true
A policy for actions to take in the future to maximize benefit
Guarantees that X will (or will not) happen (very hard)
How big is your data?
Is it static?
MB, GB, TB?
Is it streaming?
KB/sec, MB/sec
How many data points/rows/events will there be?
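Whether the data is static or streaming changes how even simple statistics are computed. As a hedged sketch (the class name is hypothetical), Welford's online algorithm keeps a running mean and variance one event at a time, so the full stream never has to be held in memory:

```python
class RunningStats:
    """Running mean and sample variance over a stream (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0

stats = RunningStats()
for event in [2, 4, 4, 4, 5, 5, 7, 9]:  # stand-in for a live stream
    stats.update(event)
print(stats.n, stats.mean, stats.variance)
```

For a static MB-sized file, loading everything and calling a library function is simpler; the incremental form matters once the data no longer fits in memory or never stops arriving.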
-
Do you need AI/ML? Designing Your AI/ML System
How to Design your AI/ML Question
Define your task:
Prediction, Clustering, Classification, Anomaly Detection?
Define objectives, error metrics, performance standards
Collect Data:
Set up data stream (storage, input flow, parallelization, Hadoop)
Preprocessing:
Noise/Outlier Filtering
Completing missing data (histograms, interpolation)
Normalization (scaling data)
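Two of the preprocessing steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions: missing values are marked as `None`, gaps lie between known values, and normalization means min-max scaling to [0, 1]. The function names are hypothetical.

```python
def interpolate_missing(values):
    """Fill None gaps by linear interpolation between the nearest known values.

    Leading or trailing gaps have no neighbour on one side and are left as None.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled

def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: nothing to scale
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

raw = [10.0, None, None, 40.0, 20.0]
completed = interpolate_missing(raw)
print(completed)
print(min_max_scale(completed))
```

Scaling parameters (here `lo` and `hi`) should be computed on the training data only and then reused on validation and test data.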
-
Do you need AI/ML? Designing Your AI/ML System
How to Design your AI/ML Question
Dimensionality Reduction / Feature Selection:
Choose features to use/extract from data
PCA/LDA/LLE/GDA
Choose Algorithm:
Consider goals, questions
Tractability
Experimental Design:
train/validate/test data sets
cross-validation
Run it! :
Deployment
Maintenance
Updating to new advances
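The experimental-design step can be sketched with index bookkeeping alone. The following is an illustrative sketch, not a prescribed procedure: it holds out a test set once, then runs k-fold cross-validation over the remaining indices. The function names, the 20% test fraction and the fixed seed are assumptions made for the example.

```python
import random

def hold_out(n, test_fraction=0.2, seed=0):
    """Shuffle indices 0..n-1 and set aside a final test set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]  # (development indices, test indices)

def k_fold(indices, k):
    """Yield (train, validate) index lists, one pair per fold."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]

dev, test = hold_out(10)
for train, validate in k_fold(dev, k=4):
    print("train:", sorted(train), "validate:", sorted(validate))
print("test (touched only once, at the end):", sorted(test))
```

Each development point is used for validation exactly once, and the test indices never appear in any fold.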
-
Do you need AI/ML? Languages and Libraries
Language Choices
Any language can be used for implementing/using AI/ML algorithms, but some make it much easier
C++: you can do it, but you may need to implement many things yourself
Java: many libraries for ML (Weka is a good open-source one, also Deeplearning4j)
Scala: leaner, functional language that compiles to JVM bytecode, good for prototyping, can reuse Java libraries (Deeplearning4j)
R: focused on statistical methods, with more and more machine learning libraries being implemented for it
Matlab: good for all the calculations; if you have the right libraries it's great (not cheap or very portable beyond school)
Python: most commonly used right now for deep learning, we're gonna need another slide ...
-
Do you need AI/ML? Languages and Libraries
Python
numpy - numerical library, implementation of array/matrix and linear algebra data structures (graphing comes from the separate matplotlib library)
pandas - table data structure, statistical analysis tools (implements many useful features from R)
scipy - includes all of the above and more, full installation of scientific libraries, basically turns Python into R+Matlab
scikit-learn - many standard machine learning algorithms implemented as easy-to-use Python APIs
jupyter notebooks - powerful web-based interfaces to Python for data analysis and machine learning
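As a small taste of the matrix and linear algebra support, here is a sketch that assumes numpy is installed; the numbers are made up for illustration.

```python
import numpy as np

# Solve the linear system A x = b
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)
print("solution:", x)

# Standardize each column of a small data table (rows = data points)
X = np.array([[0.5, 104.2],
              [0.34, 92.0],
              [0.2, 85.2]])
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
print("column means after scaling:", X_std.mean(axis=0))
```

The same operations written with plain Python lists would need explicit loops; numpy applies them to whole arrays at once.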
-
Do you need AI/ML? Deep Learning Frameworks
Deep Learning Frameworks
Caffe - older, easy to set up mockups, harder to install?
Theano - developed at the University of Montreal, great theoretical setup, very flexible, Python only
TensorFlow - made by Google, scales to many GPUs and servers, lots of optimization, requires planning of the whole system beforehand, most languages
PyTorch - easier to mock things up and try different designs, not as optimized for large-scale performance as TensorFlow
MXNet - Apache project backed by Amazon, supports most languages and OSs
Deeplearning4j - Java-focused framework
Keras - open interface to create models in multiple frameworks (TensorFlow, Theano, MXNet)
-
Do you need AI/ML? Compute Resources
Cloud Services
Several powerful services offer free access through a student account, which you can request directly.
AWS: Amazon Web Services - very large, has accessible APIs to connect to, many options for hardware to run on (but the best ones will cost extra)
Azure: Microsoft - lots of visual tools for composing AI/ML components
Google Cloud ML Engine: uses all the latest tools and TensorFlow models
None of these provide GPU servers for free; that will cost extra. (Deep learning will still work without a GPU, just more slowly.)
-
Do you need AI/ML? Compute Resources
Summary
Introduction
What is AI?
  Landscape of Big Data/AI/ML
  Classification
Neural Networks
  Building Upon Classic Machine Learning
  History Of Neural Networks
  Improving Performance
Convolutional Neural Networks
  Motivation
  Other Types of Deep Neural Networks
Do you need AI/ML?
  Defining Your Questions
  Designing Your AI/ML System
  Languages and Libraries
  Deep Learning Frameworks
  Compute Resources
-
Do you need AI/ML? Compute Resources
Useful Books
A book for each of three eras of Machine Learning:
[Goodfellow, 2016] Goodfellow, Bengio and Courville. Deep Learning, MIT Press, 2016.
http://www.deeplearningbook.org/ The website has a free online copy of the book.
[Murphy, 2012] Kevin Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012.
[Duda, 2001] R. O. Duda, P. E. Hart and D. G. Stork, Pattern Classification (2nd ed.), John Wiley and Sons, 2001.
-
Do you need AI/ML? Compute Resources
Useful Papers and Blogs
[lecun2015] Y. LeCun, Y. Bengio, and G. Hinton, Deep learning, Nature, vol. 521, no. 7553, pp. 436-444, 2015. Great references at back with comments on seminal papers.
[bengio2009] Y. Bengio, Learning Deep Architectures for AI, Foundations and Trends in Machine Learning, vol. 2, no. 1, 2009. An earlier general reference on the fundamentals of Deep Learning.
[krizhevsky2012] A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, Adv. Neural Inf. Process. Syst., pp. 1-9, 2012. The beginning of the current craze.
[Karpathy, 2015] Andrej Karpathy's Blog - http://karpathy.github.io Easy to follow explanations with code.
[deshpande2016] A. Deshpande. The 9 Deep Learning Papers You Need To Know About. github.io blog post, 2016. https://adeshpande3.github.io/adeshpande3.github.io/The-9-Deep-Learning-Papers-You-Need-To-Know-About.html