
  • A to Z of AI/ML: A Quick Introduction to Artificial Intelligence and

    Machine Learning Capabilities and Tools

    EngCon 2017

    Mark Crowley, Assistant Professor

    Electrical and Computer Engineering, University of Waterloo

    [email protected]

    Sep 23, 2017

  • Introduction

    Outline

    Introduction

    What is AI?

    Neural Networks

    Convolutional Neural Networks

    Do you need AI/ML?


  • Introduction

    My Background

    Waterloo : Assistant Professor, ECE Department since 2015

    PhD at UBC in Computer Science with Prof. David Poole

    Postdoc at Oregon State University

UW ECE ML Lab: https://uwaterloo.ca/scholar/mcrowley/lab

    Waterloo Institute for Complexity and Innovation (WICI)

    Research Fellow at ElementAI

    Pattern Analysis and Machine Intelligence (PAMI)

http://waterloo.ai

List of faculty

    Research projects (co-op/internships)

    List of spinoff companies from UWaterloo (a good place for project ideas)


  • What is AI?

What do you think of when you hear...

    Artificial Intelligence? Machine Learning?

  • What is AI? Landscape of Big Data/AI/ML

    Data, Big Data, Machine Learning, AI, etc.

    [Figure: landscape diagram relating Data, Big Data Tools, Data Analysis, Machine Learning, and Artificial Intelligence. A data table (points x1..xn with features f1..fk, e.g. x1 = (.5, 104.2, China)) feeds Data Analysis, which yields reports, statistics, charts, and trends for Human Decision Making, and Machine Learning, which yields classifications, patterns, predictions, and probabilities that drive Automated Decision Making via policies, decision rules, and summaries.]

  • What is AI? Landscape of Big Data/AI/ML

    Data, Big Data, Machine Learning, AI, etc.

    [Figure: the same landscape diagram, annotated with AI/ML subfields: Deep Learning (CNN, RNN, LSTM), Reinforcement Learning (DQN, A3C), Multi-agent Systems, Game Theory, Inductive Logic Programming (ILP), Probabilistic Programming, Constraint Programming, SAT, SMP, Natural Language Processing, Robotics, and Vision.]

  • What is AI? Landscape of Big Data/AI/ML

    Major Types/Areas of AI

    Artificial Intelligence: some algorithm to enable computers to perform actions we define as requiring intelligence. This is a moving target.

    Search-Based Heuristic Optimization (A*)

    Evolutionary Computation (genetic algorithms)

    Logic Programming (inductive logic programming, fuzzy logic)

    Probabilistic Reasoning Under Uncertainty (Bayesian networks)

    Computer Vision

    Natural Language Processing

    Robotics

    Machine Learning

  • What is AI? Landscape of Big Data/AI/ML

Types of Machine Learning

    Machine Learning: "Detect patterns in data, use the uncovered patterns to predict future data or other outcomes of interest." - Kevin Murphy, Google Research

  • What is AI? Landscape of Big Data/AI/ML

    Deep Learning

Deep Learning: methods which perform machine learning through the use of multilayer neural networks of some kind. Deep learning can be applied in any of the three main types of ML:

    Supervised Learning: very common, enormous improvement in recent years

    Unsupervised Learning: just beginning, lots of potential

    Reinforcement Learning: recently (the past 3 years) this has exploded, especially for video games

  • What is AI? Landscape of Big Data/AI/ML

    Increasing Complexity of Supervised ML Methods

1. Mean, mode, max, min - basic statistics and patterns

    2. Prediction/regression - least squares, ridge regression

    3. Linear classification - use distances and separation of data points (logistic regression, SVM, KNN)

    4. Kernel-based classification - define a mapping from the original data to a new space, allowing nonlinear divisions to be found

    5. Decision trees - learn rules that divide the data arbitrarily (C4.5, Random Forests, AdaBoost)

    6. Neural networks - learn a function using neurons

    7. Deep neural networks - same, but deep :)

    8. Recurrent neural networks - add links to past timesteps, learning with memory of the past

    9. Convolutional neural networks - add convolutional filters, good for images

    10. Inception ResNets, Long Short-Term Memory networks, Voxception networks, ... oh, it keeps going...

  • What is AI? Classification

    One Example of ML: Classification


  • What is AI? Classification

    Clustering vs. Classification

Clustering

    Unsupervised

    Uses unlabeled data

    Organizes patterns with respect to an optimization criterion

    Requires a definition of similarity

    Hard to evaluate

    Examples: K-means, Fuzzy C-means, Hierarchical Clustering, DBSCAN

    Classification

    Supervised

    Uses labeled data

    Requires a training phase

    Domain sensitive

    Easy to evaluate (you know the correct answer)

    Examples: Naive Bayes, KNN, SVM, Decision Trees, Random Forests
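To make the contrast concrete, here is a minimal scikit-learn sketch (the dataset is synthetic and all parameter choices are illustrative) that clusters the same data without labels, then classifies it with labels:

```python
# Clustering vs. classification on the same synthetic data.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

X, y = make_blobs(n_samples=300, centers=3, random_state=0)

# Clustering: unsupervised, uses only X; evaluating the result is the hard part.
clusters = KMeans(n_clusters=3, random_state=0).fit_predict(X)

# Classification: supervised, needs labels y and a training phase;
# evaluation is easy because the held-out correct answers are known.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
print("KNN test accuracy:", knn.score(X_te, y_te))
```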

  • What is AI? Classification

    Classification Performance Depends on the Algorithm

    A good example of these choices is Support Vector Machines (SVMs).

    Popular until the rise of deep learning in the past few years

    Core idea: find a dividing hyperplane

    Many variations: the decision boundary can be linear, polynomial, Gaussian (RBF), or high-dimensional

    So what is the right approach? Experimentation!

  • What is AI? Classification

    Classification Performance Depends on the Algorithm

So choose carefully... See http://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html
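As a sketch of that experimentation loop (synthetic data; the classifier choices are illustrative, not a recommendation), cross-validated accuracy makes the comparison concrete:

```python
# Compare a few classifiers on one dataset with 5-fold cross-validation.
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

X, y = make_moons(n_samples=500, noise=0.3, random_state=0)

classifiers = {
    "logistic regression": LogisticRegression(),
    "linear SVM": SVC(kernel="linear"),
    "RBF SVM": SVC(kernel="rbf"),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```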

  • Neural Networks Building Upon Classic Machine Learning

    Outline

    Introduction

    What is AI?

Neural Networks
        Building Upon Classic Machine Learning
        History Of Neural Networks
        Improving Performance

    Convolutional Neural Networks

    Do you need AI/ML?


  • Neural Networks Building Upon Classic Machine Learning

    Linear Regression vs. Logistic Regression

    A simple type of Generalized Linear Model

Linear regression learns a function to predict a continuous output variable from continuous or discrete input variables:

    $Y = b_0 + \sum_i b_i X_i + \epsilon$

    Logistic regression predicts the probability of an outcome: the appropriate class for an input vector, or the odds of one outcome being more likely than another.

  • Neural Networks Building Upon Classic Machine Learning

    Logistic Regression as a Graphical Model

$o(x) = \sigma(w^T x) = \sigma\left(w_0 + \sum_i w_i x_i\right) = \dfrac{1}{1 + \exp\left(-\left(w_0 + \sum_i w_i x_i\right)\right)}$

  • Neural Networks Building Upon Classic Machine Learning

    Logistic Regression Used as a Classifier

    Logistic Regression can be used as a simple linear classifier.

Compare the probabilities of each class, P(Y = 0|X) and P(Y = 1|X). Treat the halfway point on the sigmoid as the decision boundary:

    If $P(Y = 1|X) > 0.5$, classify X in class 1; the decision boundary is $w_0 + \sum_i w_i x_i = 0$.

  • Neural Networks Building Upon Classic Machine Learning

    Training Logistic Regression Model via Gradient Descent

We can't easily perform Maximum Likelihood Estimation in closed form.

    The negative log-likelihood of the logistic model is NLL and its gradient is g:

    $NLL(w) = \sum_{i=1}^{N} \log\left(1 + \exp\left(-\tilde{y}_i \left(w_0 + \sum_j w_j x_{ij}\right)\right)\right)$, with labels $\tilde{y}_i \in \{-1, +1\}$

    $g = \nabla_w NLL(w) = \sum_i \left(\sigma(w^T x_i) - y_i\right) x_i$, with labels $y_i \in \{0, 1\}$

    Then we can update the parameters iteratively:

    $\theta_{k+1} = \theta_k - \eta_k g_k$

    where $\eta_k$ is the learning rate or step size.
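A minimal NumPy sketch of this update rule (the synthetic data and all names here are illustrative, not from the slides):

```python
# Batch gradient descent on the logistic negative log-likelihood.
import numpy as np

rng = np.random.default_rng(0)
N, d = 200, 2
X = rng.normal(size=(N, d))
w_true = np.array([2.0, -3.0])
y = ((X @ w_true) > 0).astype(float)         # labels in {0, 1}

X1 = np.hstack([np.ones((N, 1)), X])         # prepend a column of ones for the bias w0
w = np.zeros(d + 1)
eta = 0.1                                    # learning rate (step size)

for k in range(500):
    p = 1 / (1 + np.exp(-(X1 @ w)))          # sigma(w^T x_i) for every example
    g = X1.T @ (p - y)                       # gradient: sum_i (sigma(w^T x_i) - y_i) x_i
    w -= eta / N * g                         # iterative update: w <- w - eta * g
print("learned weights:", w)
```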

  • Neural Networks History Of Neural Networks

Neural networks learn $f: X \to Y$, where $f$ can be a non-linear function:

    $X$: (vector of) continuous and/or discrete variables

    $Y$: (vector of) continuous and/or discrete variables

    Feedforward neural networks represent $f$ by a network of non-linear (logistic/sigmoid/ReLU) units:

    Input layer, $X$

    Output layer, $Y$

    Hidden layer, $H$

    Nonlinear unit: sigmoid/ReLU/ELU

  • Neural Networks History Of Neural Networks

    Basic Three Layer Neural Network

    Input Layer

Vector data: each input unit collects one feature/dimension of the data and passes it on to the (first) hidden layer.

    Hidden Layer

    Each hidden unit computes a weighted sum of all the units from the input layer (or any previous layer) and passes it through a nonlinear activation function.

    Output Layer

    Each output unit computes a weighted sum of all the hidden units and passes it through a (possibly nonlinear) threshold function.
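As a sketch, one forward pass through such a three-layer network looks like this in NumPy (the layer sizes and random weights are illustrative):

```python
# One forward pass: input layer -> hidden layer -> output layer.
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=4)                          # input layer: one feature per unit

W1, b1 = rng.normal(size=(5, 4)), np.zeros(5)   # input -> hidden weights and biases
W2, b2 = rng.normal(size=(3, 5)), np.zeros(3)   # hidden -> output weights and biases

h = sigmoid(W1 @ x + b1)   # hidden units: weighted sum + nonlinear activation
o = sigmoid(W2 @ h + b2)   # output units: weighted sum + (possibly nonlinear) threshold
print(o)
```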

  • Neural Networks History Of Neural Networks

    Properties of Neural Networks

Universality: given a large enough layer of hidden units (or multiple layers), a NN can represent any function.

    Representation Learning: classic statistical machine learning is about learning functions that map input data to output. Neural networks, and especially deep learning, are more about learning a representation in order to perform classification or some other task.

  • Neural Networks History Of Neural Networks

    Hidden Layer: Adding Nonlinearity

Each hidden unit emits an output that is a nonlinear activation function of its net activation:

    $y_j = f(net_j)$

    This is essential to a neural network's power: if it is linear, the whole model collapses to linear regression.

    The output is thus thresholded through this nonlinear activation function.

  • Neural Networks History Of Neural Networks

    Activation Functions

tanh was another common function.

    sigmoid is now discouraged except for the final layer, to obtain probabilities; it can over-saturate easily.

    ReLU is the new standard activation function to use.
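For reference, the activation functions named here as NumPy one-liners (a quick sketch, not a library API):

```python
import numpy as np

sigmoid = lambda z: 1 / (1 + np.exp(-z))               # saturates for large |z|
tanh    = np.tanh                                      # zero-centred, also saturates
relu    = lambda z: np.maximum(0, z)                   # max(0, z), the current default
elu     = lambda z, a=1.0: np.where(z > 0, z, a * (np.exp(z) - 1))

z = np.linspace(-3, 3, 7)
for name, f in [("sigmoid", sigmoid), ("tanh", tanh), ("ReLU", relu), ("ELU", elu)]:
    print(name, np.round(f(z), 2))
```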

  • Neural Networks History Of Neural Networks

    Rectified Linear Activation

[Figure 6.3 from Goodfellow et al. 2016: the rectified linear activation function, g(z) = max{0, z}. It is the default recommendation for most feedforward networks: applying it to the output of a linear transformation yields a nonlinear transformation, yet it remains piecewise linear with two pieces, so it preserves many of the properties that make linear models easy to optimize with gradient-based methods and generalize well. Much as a Turing machine's memory needs only to store 0 or 1 states, we can build a universal function approximator from rectified linear functions.]

    (Goodfellow 2016)

    Rectified Linear Units (ReLU) have become standard: $\max(0, net_j)$

    Strong signals are always easy to distinguish

    Most values are zero, and the derivative is mostly zero

    They do not saturate as easily as sigmoid

    New: Exponential Linear Units (ELU) - evidence that they perform better than ReLU in some situations.

  • Neural Networks History Of Neural Networks

Gradient Descent

    $E$: error function (Mean Squared Error, cross-entropy loss, etc.)

    For neural networks, $E[w]$ is no longer convex in $w$.

    Gradient: $\nabla E[w] = \left[\frac{\partial E}{\partial w_0}, \frac{\partial E}{\partial w_1}, \ldots, \frac{\partial E}{\partial w_d}\right]$

    Training update rule: $\Delta w_i = -\eta \frac{\partial E}{\partial w_i}$, where $\eta$ is the training rate.

    Note: for regression and others, this objective is convex. In ANNs it is not, so we must solve iteratively.

    (Slides from Tom Mitchell ML Course, CMU, 2010)

  • Neural Networks History Of Neural Networks

    Incremental Gradient Descent

Let the error function for example $l$ be: $E_l[w] = \frac{1}{2}(y^l - o^l)^2$

    Do until satisfied, for each training example $l$ in $D$:

    1. Compute the gradient $\nabla E_l[w]$

    2. Update the weights: $w = w - \eta \nabla E_l[w]$

    Note: can also use batch gradient descent on many points at once.

  • Neural Networks History Of Neural Networks

    Backpropagation Algorithm

We need an iterative algorithm for computing the gradient efficiently. For each training example:

    1. Forward propagation: input the training example to the network and compute the outputs

    2. Compute the output units' errors:

    $\delta_k^l = o_k^l (1 - o_k^l)(y_k^l - o_k^l)$

    3. Compute the hidden units' errors:

    $\delta_h^l = o_h^l (1 - o_h^l) \sum_k w_{h,k} \delta_k^l$

    4. Update the network weights:

    $w_{i,j} = w_{i,j} + \Delta w_{i,j}^l = w_{i,j} + \eta\, \delta_j^l o_i^l$
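A minimal NumPy sketch of one such step for a single example, following formulas 2-4 above (the network sizes and data are illustrative):

```python
# One backpropagation step for one training example, sigmoid units throughout.
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=4)                 # one training example
y = np.array([1.0, 0.0])               # its target output
W1 = rng.normal(size=(3, 4)) * 0.1     # input -> hidden weights
W2 = rng.normal(size=(2, 3)) * 0.1     # hidden -> output weights
eta = 0.5                              # learning rate

# 1. Forward propagation
o_h = sigmoid(W1 @ x)                  # hidden outputs
o_k = sigmoid(W2 @ o_h)                # network outputs

# 2. Output unit errors: delta_k = o_k (1 - o_k)(y_k - o_k)
delta_k = o_k * (1 - o_k) * (y - o_k)

# 3. Hidden unit errors: delta_h = o_h (1 - o_h) sum_k w_hk delta_k
delta_h = o_h * (1 - o_h) * (W2.T @ delta_k)

# 4. Weight updates: w_ij <- w_ij + eta * delta_j * o_i
W2 += eta * np.outer(delta_k, o_h)
W1 += eta * np.outer(delta_h, x)
```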

  • Neural Networks History Of Neural Networks

    A Short History

40s Early work in NNs goes back to the 1940s, with a simple model of the neuron by McCulloch and Pitts as a summing and thresholding device.

    1958 Rosenblatt introduced the Perceptron, a two-layer network (one input layer and one output node with a bias in addition to the input features).

    1969 Marvin Minsky: Perceptrons are just linear, AI goes logical, beginning of the AI Winter

    1980s Neural network resurgence: backpropagation (updating weights by gradient descent)

    1990s SVMs! Kernels can do anything! (no, they can't)

  • Neural Networks History Of Neural Networks

    A Short History

1993 LeNet 1 for digit recognition

    2003 Deep Learning (convolutional nets, Dropout/RBMs, Deep Belief Networks)

    1986, 2006 Restricted Boltzmann Machines

    2006 Neural networks outperform RBF SVMs on the MNIST handwriting dataset (Hinton et al.)

    2012 AlexNet wins the ImageNet challenge, beating the competition with an error rate of 16% vs. 26% for the next best. ImageNet contains 15 million annotated images in over 22,000 categories. The ZFNet paper (2013) extends this and has a good description of the network structure.

    2012-present Google Cat YouTube experiment, speech recognition, self-driving cars, computer defeats regional Go champion, ...

    2014 GoogLeNet added many layers and introduced inception modules (allowing parallel computation rather than serially connected CNN layers)

  • Neural Networks History Of Neural Networks

    A Short History

2014 Generative Adversarial Networks (GANs) introduced.

    2015 Microsoft's algorithm beats human performance on the ImageNet challenge.

    2016 AlphaGo defeats Lee Sedol, one of the world's best Go players, using Deep Reinforcement Learning.

    2016 DeepMind introduces the A3C Deep RL algorithm, which can learn to play Atari games from images alone, with no instructions.

  • Neural Networks History Of Neural Networks

    Outline

    Introduction

    What is AI?

Neural Networks
        Building Upon Classic Machine Learning
        History Of Neural Networks
        Improving Performance

    Convolutional Neural Networks

    Do you need AI/ML?


  • Neural Networks History Of Neural Networks

    Problems with ANNs

Overfitting

    Very inefficient for images, time series, and large numbers of inputs/outputs

    Slow to train

    Hard to interpret the resulting model

  • Neural Networks Improving Performance

    Heuristics for Improving Backpropagation

There are a number of heuristics for training neural networks that are useful in practice (maybe we'll learn more today):

    Fewer hidden nodes: just enough complexity to work, not so much that it overfits.

    Train multiple networks with different sizes and search for the best design.

    Validation set: train on the training set until error on the validation set starts to rise, then evaluate on the test set.

    Try different activation functions: tanh, ReLU, ELU, ...?

    Dropout (Hinton 2014): randomly ignore certain units during training (don't update them via gradient descent); leads to hidden units that specialize.

    Modify the learning rate over time (cooling schedule)

  • Neural Networks Improving Performance

    Dropout

Dropout (Hinton 2014): randomly ignore certain units during training (don't update them via gradient descent); leads to hidden units that specialize.

    With probability p, don't include a weight in the gradient updates.

    Reduces overfitting by encouraging robustness of the weights in the network.
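A minimal NumPy sketch of the mask (this shows the common "inverted dropout" scaling, which is an assumption beyond the slide, not necessarily Hinton's original formulation):

```python
# Dropout at training time: with probability p, a hidden unit is ignored.
import numpy as np

rng = np.random.default_rng(0)
h = rng.normal(size=8)          # hidden-layer activations for one example
p = 0.5                         # dropout probability

mask = (rng.random(h.shape) >= p).astype(float)
h_train = h * mask / (1 - p)    # scale survivors so the expected activation is unchanged
# At test time all units are used: h_test = h (no mask, no scaling).
```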

  • Neural Networks Improving Performance

    Large, Shallow Models Overfit More

[Figure 6.7 from Goodfellow et al. 2016: test accuracy (percent) vs. number of parameters for networks of depth 3 (convolutional), depth 3 (fully connected), and depth 11 (convolutional). Deeper models tend to perform better, and not merely because they are larger: increasing the number of parameters without increasing depth is far less effective at increasing test set performance. In this experiment (Goodfellow et al. 2014d), shallow models overfit at around 20 million parameters while deep ones benefit from over 60 million. This suggests a deep model expresses a useful preference over the space of functions: the function should consist of many simpler functions composed together, either as a representation composed of simpler representations (e.g., corners defined in terms of edges) or as a program with sequentially dependent steps (e.g., first locate a set of objects, then segment them from each other, then recognize them).]

    (Goodfellow 2016)

  • Convolutional Neural Networks

    Outline

    Introduction

    What is AI?

    Neural Networks

Convolutional Neural Networks
        Motivation
        Other Types of Deep Neural Networks

    Do you need AI/ML?


  • Convolutional Neural Networks Motivation

    Convolutional Network Structure

Input data: an image (e.g., 256x256 pixels x 3 RGB channels)

    Output: a categorical label

  • Convolutional Neural Networks Motivation

    Example Applications of CNNs

(Karpathy Blog, Oct 25, 2015 - http://karpathy.github.io/2015/10/25/selfie/)

  • Convolutional Neural Networks Motivation

Parameter Sharing

    [Figure 9.5 from Goodfellow et al. 2016: black arrows indicate the connections that use a particular parameter in two different models. (Top) The uses of the central element of a 3-element kernel in a convolutional model: due to parameter sharing, this single parameter is used at all input locations. (Bottom) The use of the central element of the weight matrix in a fully connected model: with no parameter sharing, the parameter is used only once.]

    Convolution shares the same parameters across all spatial locations; traditional matrix multiplication does not share any parameters. Sharing does not affect the runtime of forward propagation (still O(k x n)), but it reduces the storage requirements of the model to k parameters, where k is usually several orders of magnitude smaller than m. Since m and n are usually roughly the same size, k is practically insignificant compared to m x n, so convolution is dramatically more efficient than dense matrix multiplication in memory requirements and statistical efficiency. Parameter sharing also gives the convolution layer equivariance to translation: a function f(x) is equivariant to g if f(g(x)) = g(f(x)), and if g is any function that translates (shifts) the input, convolution is equivariant to g.

    (Goodfellow 2016)

  • Convolutional Neural Networks Motivation

2D Convolution

    [Figure 9.1 from Goodfellow et al. 2016: an example of 2-D convolution without kernel flipping. A 3x4 input with entries a..l is convolved with a 2x2 kernel (w, x; y, z), restricting the output to positions where the kernel lies entirely within the image ("valid" convolution in some contexts). The result is a 2x3 output whose upper-left element, aw + bx + ey + fz, is formed by applying the kernel to the corresponding upper-left region of the input.]

    (Goodfellow 2016)

  • Convolutional Neural Networks Motivation

A Simple Example: Edge Detection by Convolution

    Edge detection by convolution with a kernel that, for every pixel, subtracts the value of the neighbouring pixel on the left.

    [Figure 9.6 from Goodfellow et al. 2016: efficiency of edge detection. The output image is formed by taking each pixel in the input image and subtracting the value of its neighbouring pixel on the left, showing the strength of all vertically oriented edges, which can be a useful operation for object detection. Both images are 280 pixels tall; the input is 320 pixels wide and the output 319. The transformation is a convolution kernel containing two elements and requires 319 x 280 x 3 = 267,960 floating point operations (two multiplications and one addition per output pixel). Describing the same transformation with a matrix multiplication would take 320 x 280 x 319 x 280, or over eight billion, entries, making convolution four billion times more efficient for representing the transformation; the straightforward matrix multiplication performs over sixteen billion floating point operations, making convolution roughly 60,000 times more efficient computationally. Even storing only the nonzero entries, the matrix would still need 2 x 319 x 280 = 178,640 entries. Convolution is an extremely efficient way of describing transformations that apply the same linear transformation of a small, local region across the entire input. (Photo credit: Paula Goodfellow)]

    (Goodfellow 2016)
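A minimal NumPy sketch of the same edge detector (the toy image sizes mirror the figure; note that np.convolve flips its kernel, so passing [1, -1] implements the [-1, 1] left-neighbour filter):

```python
# Subtract each pixel's left neighbour: a two-element convolution per row.
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((280, 320))               # toy 280 x 320 grayscale image

edges = image[:, 1:] - image[:, :-1]         # output is 280 x 319

# Equivalent per-row convolution; the same kernel is shared at every location.
row0 = np.convolve(image[0], [1, -1], mode="valid")
assert np.allclose(row0, edges[0])
```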

  • Convolutional Neural Networks Motivation

Other CNN Modifications

    Pooling: nearby pixels tend to represent the same thing/class/object, so pool the responses from nearby nodes (e.g., mean, median, min, max)

    Strides: the step size between adjacent filter applications, which controls how much the filters overlap

    Zero padding: padding the border of the input with zeros so filters can cover edge pixels; controls the size of the network and deals with edge effects

    Connectivity: alternate local connectivity options, partial connectivity
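As a sketch of the pooling idea, 2x2 max pooling with stride 2 in NumPy (the feature-map size is illustrative):

```python
# Max pooling: keep the strongest response in each 2x2 neighbourhood.
import numpy as np

rng = np.random.default_rng(0)
fmap = rng.random((6, 6))                            # toy feature map

# Reshape into 2x2 blocks, then take the max within each block.
pooled = fmap.reshape(3, 2, 3, 2).max(axis=(1, 3))   # 6x6 -> 3x3
print(pooled.shape)
```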

  • Convolutional Neural Networks Other Types of Deep Neural Networks

    Other Types of Deep Neural Networks

    RBM: Restricted Boltzmann Machines - an older, undirected deep model

    RNN: Recurrent Neural Networks - allow links from outputs back to inputs over time; good for time-series learning

    LSTM: Long Short-Term Memory networks - a more complex form of RNN; they integrate strategically remembered information from the past and formalize a process for forgetting information over time; useful if you need to learn patterns over time in your data features

    DeepRL: Deep Reinforcement Learning - CNNs plus a fully connected deep network for learning a representation of a policy; reinforcement learning for updating the policy through experience to make improved decisions; requires a value/reward function

    GAN: Generative Adversarial Networks - train two networks at once

  • Convolutional Neural Networks Other Types of Deep Neural Networks

Generative Adversarial Networks

    One network produces/hallucinates new answers (generative)

    A second network distinguishes between the real and the generated answers (adversary/critic)

    Blog with code: "GANs in 50 lines of code" (PyTorch) - an easy way to get started

  • Do you need AI/ML?

    Outline

    Introduction

    What is AI?

    Neural Networks

    Convolutional Neural Networks

Do you need AI/ML?
        Defining Your Questions
        Designing Your AI/ML System
        Languages and Libraries
        Deep Learning Frameworks
        Compute Resources

  • Do you need AI/ML? Defining Your Questions

    Defining Your Questions

    Is it a decision to be made?

    Is there a pattern to detect?

    Do you have data?

What kinds of questions do you have about the data?

    Yes/No questions - Did X happen? Are A and B correlated?

    Timing - When did X happen?

    Anomaly detection - Is X strange/abnormal/unexpected?

    Classification - What kind of Y is X?

    Prediction - We've seen lots of (X, Y); now we want to know (X, ?)

    Do you have labels? Can you give the right answer for some portion of the data?

    Collecting labels: automatic? Manual? Crowd-sourced? (e.g., Amazon Mechanical Turk)

    Yes: Supervised Learning - lots of options

    No: Unsupervised Learning - some options (getting better all the time)

  • Do you need AI/ML? Defining Your Questions

    Answers and Constraints

    What kind of answer do you need? (increasing difficulty)

    Find patterns which are present in the data and view them

    Most likely explanation for a pattern

    Probability of (fact about X,A,B...) being true

    A policy for actions to take in the future to maximize benefit

    Guarantees that X will (or will not) happen (very hard)

    How big is your data?

    Is it static?

    MB, GB, TB?

    Is it streaming?

    KB/sec, MB/sec

    How many data points/rows/events will there be?


  • Do you need AI/ML? Designing Your AI/ML System

    How to Design your AI/ML Question

    Define your task:

Prediction, Clustering, Classification, Anomaly Detection?

    Define objectives, error metrics, performance standards

    Collect Data:

    Set up the data stream (storage, input flow, parallelization, Hadoop)

    Preprocessing:

    Noise/Outlier Filtering

    Completing missing data (histograms, interpolation)

    Normalization (scaling data)

  • Do you need AI/ML? Designing Your AI/ML System

    How to Design your AI/ML Question

Dimensionality Reduction / Feature Selection:

    Choose features to use/extract from data

    PCA/LDA/LLE/GDA

    Choose Algorithm:

    Consider goals, questions

    Tractability

    Experimental Design:

    train/validate/test data sets

    cross-validation

    Run it!:

    Deployment

    Maintenance

    Updating to new advances

  • Do you need AI/ML? Languages and Libraries

    Language Choices

Any language can be used for implementing/using AI/ML algorithms, but some make it much easier:

    C++: you can do it, but you may need to implement many things yourself

    Java: many libraries for ML (Weka is a good open-source one; Deeplearning4j)

    Scala: leaner, functional language that compiles to JVM bytecode; good for prototyping; can reuse Java libraries (Deeplearning4j)

    R: focused on statistical methods; more and more machine learning libraries are implemented for it

    Matlab: good for all the calculations; if you have the right libraries it's great (not cheap or very portable beyond school)

    Python: most commonly used right now for deep learning; we're going to need another slide ...

  • Do you need AI/ML? Languages and Libraries

    Python

numpy - numerical libraries; implementations of matrix and linear algebra data structures; graphing tools

    pandas - table data structure and statistical analysis tools (implements many useful features from R)

    scipy - includes all of the above and more; a full installation of scientific libraries that basically turns Python into R + Matlab

    scikit-learn - many standard machine learning algorithms implemented as easy-to-use Python APIs

    jupyter notebooks - powerful web-based interfaces to Python for data analysis and machine learning
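A minimal sketch of how these pieces fit together (the CSV file and column names are hypothetical, purely for illustration):

```python
# pandas for the data table, scikit-learn for the model.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

df = pd.read_csv("measurements.csv")        # hypothetical data file
X = df.drop(columns=["label"])              # feature columns
y = df["label"]                             # target column

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = RandomForestClassifier(n_estimators=100).fit(X_tr, y_tr)
print("test accuracy:", model.score(X_te, y_te))
```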

  • Do you need AI/ML? Deep Learning Frameworks

    Deep Learning Frameworks

Caffe - older; easy to set up mockups, but can be harder to install

    Theano - from the University of Montreal; great theoretical setup, very flexible; Python only

    TensorFlow - made by Google; scales to many GPUs and servers with lots of optimization, but requires planning the whole system beforehand; most languages

    PyTorch - easier to mock things up and try different designs; not as optimized for large-scale performance as TensorFlow

    MXNet - an Apache project (backed by Amazon); supports most languages and OSs

    Deeplearning4j - Java-focused framework

    Keras - an open interface for creating models in multiple frameworks (TensorFlow, Theano, MXNet)
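As a sketch of the Keras style (2017-era API with the TensorFlow backend assumed; the layer sizes and input shape are illustrative):

```python
# A tiny binary classifier: the same model definition runs on any Keras backend.
from keras.models import Sequential
from keras.layers import Dense

model = Sequential([
    Dense(64, activation="relu", input_shape=(20,)),  # hidden layer
    Dense(1, activation="sigmoid"),                   # binary output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
# model.fit(X_train, y_train, epochs=10, batch_size=32)  # given your own data
```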

  • Do you need AI/ML? Compute Resources

    Cloud Services

There are several powerful, free services you can access via a student account, which you can request directly.

    AWS: Amazon Web Services - very large; has accessible APIs to connect to; many options for hardware to run on (but the best ones will cost extra)

    Azure: Microsoft - lots of visual tools for composing AI/ML components

    Google Cloud ML Engine: uses all the latest tools and TensorFlow models

    None of these provide GPU servers for free; that will cost extra. (It will still work, just be slower for deep learning.)

  • Do you need AI/ML? Compute Resources

    Summary

    Introduction

What is AI?
        Landscape of Big Data/AI/ML
        Classification

    Neural Networks
        Building Upon Classic Machine Learning
        History Of Neural Networks
        Improving Performance

    Convolutional Neural Networks
        Motivation
        Other Types of Deep Neural Networks

    Do you need AI/ML?
        Defining Your Questions
        Designing Your AI/ML System
        Languages and Libraries
        Deep Learning Frameworks
        Compute Resources

  • Do you need AI/ML? Compute Resources

    Useful Books

Books from three eras of machine learning:

    [Goodfellow, 2016] Goodfellow, Bengio, and Courville. Deep Learning, MIT Press, 2016. http://www.deeplearningbook.org/ (the website has a free copy of the book as PDFs)

    [Murphy, 2012] Kevin Murphy. Machine Learning: A Probabilistic Perspective, MIT Press, 2012.

    [Duda, 2001] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification (2nd ed.), John Wiley and Sons, 2001.

  • Do you need AI/ML? Compute Resources

    Useful Papers and Blogs

[lecun2015] Y. LeCun, Y. Bengio, and G. Hinton. "Deep learning," Nature, vol. 521, no. 7553, pp. 436-444, 2015. Great references at the back, with comments on seminal papers.

    [bengio2009] Y. Bengio. "Learning Deep Architectures for AI," Foundations and Trends in Machine Learning, vol. 2, no. 1, 2009. An earlier general reference on the fundamentals of deep learning.

    [krizhevsky2012] A. Krizhevsky, I. Sutskever, and G. E. Hinton. "ImageNet Classification with Deep Convolutional Neural Networks," Adv. Neural Inf. Process. Syst., pp. 1-9, 2012. The beginning of the current craze.

    [karpathy2015] Andrej Karpathy's blog - http://karpathy.github.io - easy-to-follow explanations with code.

    [deshpande2016] A. Deshpande. "The 9 Deep Learning Papers You Need To Know About," github.io blog post, 2016. https://adeshpande3.github.io/adeshpande3.github.io/The-9-Deep-Learning-Papers-You-Need-To-Know-About.html
