
ARTIFICIAL NEURAL NETWORKS
AN EXAMPLE OF MACHINE LEARNING ALGORITHMS

TABLE OF CONTENTS - ANN

• Introduction machine learning

• Basics of mathematics behind ANNs

• Things to take into account


TABLE OF CONTENTS - ANN

• Introduction machine learning

MACHINE LEARNING

• Automated iterative algorithms to handle huge amounts of data

• Finding features and connections:

• Clustering

• Sorting

• Categorization

• Prediction

BASIC IDEAS OF MACHINE LEARNING

What are we usually doing?

• Unsupervised learning:

• Set algorithm gets input and find structures on its own

• By iterative update of some values

• building clusters, trees, connections

BASIC IDEAS OF MACHINE LEARNING

• Supervised learning: the algorithm gets input together with the desired output and learns the mapping

• Special cases:

• Semi-supervised learning – e.g. Netflix: you may also like

• Reinforcement learning – rewards and punishment feedback e.g. GO-Computer

ARTIFICIAL NEURAL NETWORKS

• Famous example of supervised machine learning

• Popular subcategories:

• Basic ANNs

• Convolutional neural networks

• Recurrent neural networks

• Connect desired input with desired output!

• See if the algorithm can find a good internal representation

PRINCIPLES OF ARTIFICIAL NEURAL NETWORKS

• Training phase → Adjustment

• Test phase → Evaluation

• Usage → Prediction
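A minimal sketch of how the three phases map onto code, assuming scikit-learn is available; the data here are synthetic stand-ins:

import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 samples, 4 features, 1 target
X = np.random.rand(100, 4)
y = np.random.rand(100)

# Training phase adjusts weights on X_train/y_train; the test phase
# evaluates on the held-out 20 %; usage then predicts on unseen samples.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
print(X_train.shape, X_test.shape)  # (80, 4) (20, 4)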

TABLE OF CONTENTS - ANN

• Basics of mathematics behind ANNs

PRINCIPLES OF ARTIFICIAL NEURAL NETWORKS

• Choose desired input and output

• Classifiers or regressors

• The algorithm tries to find its own way to connect them

• This needs a good and large dataset!

MATHEMATICAL REPRESENTATION OF THIS CONNECTION – TRAINING PHASE

• Initialization of weights

• Feed forward

• Compare the difference between calculated and real output with a loss function

• Iteratively update the weights by a distinct amount (learning rate) via backpropagation to minimize the loss function

• Repeat this process for a distinct number of iterations (epochs)
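A toy numeric sketch of exactly these steps for a single linear layer with mean squared error (synthetic data; the learning rate is chosen larger than the typical 0.001 mentioned later, so the loop converges in few epochs):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                 # 100 samples, 3 input nodes
y = X @ np.array([2.0, -1.0, 0.5]) + 1.0      # "true" relationship to recover

w = rng.normal(size=3)                        # 1) initialization of weights
b = 0.0
learning_rate = 0.05

for epoch in range(500):                      # 5) repeat for a number of epochs
    y_hat = X @ w + b                         # 2) feed forward
    error = y_hat - y
    loss = np.mean(error ** 2)                # 3) loss function (MSE)
    grad_w = 2 * X.T @ error / len(y)         # 4) gradients of the loss...
    grad_b = 2 * error.mean()
    w -= learning_rate * grad_w               #    ...scaled by the learning rate
    b -= learning_rate * grad_b

print(w.round(2), round(b, 2))                # approaches [2, -1, 0.5] and 1

This toy has no hidden layer, so step 4 collapses to a single gradient step; with hidden layers it becomes the chain rule applied layer by layer, which is backpropagation proper.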

IMPORTANT VOCABULARY FOR TRAINING PHASE

• Weights – factors to weight the values of different nodes

• All the learning lies in the adjustment of weights!

• Activation function – a function applied to the value of a node

• Loss function – e.g. mean squared error or mean absolute percentage error (regressors), binary crossentropy (classifiers)

• Difference between real output and calculated output

• Should be minimized

• Backpropagation algorithm

• Stochastic gradient descent

• Mechanism to iteratively adjust the weights
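As formulas (standard textbook notation, not verbatim from the slides): the mean squared error over n samples, and the gradient descent weight update with learning rate \eta:

\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2,
\qquad
w \leftarrow w - \eta\,\frac{\partial L}{\partial w}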

AND NOW ADD SOME HIDDEN LAYERS

The Artificial Neural Networks handbook: Part 1

www.datasciencecentral.com

TABLE OF CONTENTS - ANN

• Things to take into account

WHEN TO USE ARTIFICIAL NEURAL NETWORKS?

• It connects desired input with desired output!

• As long as you have enough training data…

• Always!?

• Classification and regression prediction problems

GARBAGE IN – GARBAGE OUT

• Limitations are quite clear if you understood the mathematics of the feed-forward algorithm

• What are these activation functions doing in our tiny network?

EXAMPLE OF A HOUSE

Bedrooms | Area m² | Age | Public transport lines | Urban | Rural | Price
       3 |     150 | 130 |                      2 |     0 |     1 | 200.000
       2 |     100 |   5 |                      5 |     1 |     0 | 350.000
       4 |     200 |  60 |                      1 |     0 |     1 | 100.000
       1 |      70 |  10 |                      1 |     1 |     0 |     ???

Rows: samples (each row is an input vector)

Columns: categories (each in a distinct input node)

EXAMPLE OF A HOUSE

(same table as above)

• Dummy variable trap: the Urban and Rural columns carry redundant information (Rural = 1 − Urban)

• Redundant information can trap the backprop algorithm with unwanted gradients

• For regression problems
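A sketch of how the trap is commonly avoided in practice with pandas: drop_first=True keeps only one of the redundant indicator columns (column and category names here are illustrative):

import pandas as pd

df = pd.DataFrame({"location": ["rural", "urban", "rural", "urban"]})

# drop_first=True drops the 'rural' indicator, since it is always the
# complement of 'urban' - one column carries all the information.
dummies = pd.get_dummies(df["location"], drop_first=True)
print(dummies)  # a single 'urban' indicator column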

EXAMPLE OF A HOUSE

Bedrooms | Area m² | Age | Public transport lines | Urban | Price   | Quickly sellable? | Days to sell
       3 |     150 | 130 |                      2 |     0 | 200.000 |                 0 |          274
       2 |     100 |   5 |                      5 |     1 | 350.000 |                 1 |           12
       4 |     200 |  60 |                      1 |     0 | 100.000 |                 1 |           23
       1 |      70 |  10 |                      1 |     1 | 300.000 |               ??? |          ???

Do you see a problem due to weight adjustment? Typical learning rate: 0.001

• The input columns differ by orders of magnitude (Price ≈ 10^5 vs. Bedrooms ≈ 10^0), so a fixed learning rate moves some weights far too fast and others far too slowly – this is why scaling (next slide) matters

INPUT

• Carefully think about your input

• Relative independency of columns/categories within a sample

• Dependency of rows/samples within a category: SCALING (see the sketch after the table below)

• Provide different types of input: time courses are usually a bad idea

• Try recurrent neural networks if time steps are of equal length

(same house table as above)
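A scaling sketch for the input columns of the house table, assuming scikit-learn's StandardScaler; without it, the Price column (order 10^5) dominates the weight updates of columns like Bedrooms (order 10^0):

import numpy as np
from sklearn.preprocessing import StandardScaler

# Input columns: Bedrooms, Area, Age, Transport lines, Urban, Price
X = np.array([
    [3, 150, 130, 2, 0, 200_000],
    [2, 100,   5, 5, 1, 350_000],
    [4, 200,  60, 1, 0, 100_000],
])
X_scaled = StandardScaler().fit_transform(X)  # per column: mean 0, std 1
print(X_scaled.round(2))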

OUTPUT

• Carefully think about your output

• It’s harder for the network to predict different outputs at the same time

• Nearly all weights contribute to each output

• Try to avoid mixing output types

• Classifiers are much easier to evaluate

• And often easier to train

(same house table as above)

WHAT’S THE EASIEST WAY TO BUILD AN ANN?

• Using Python and the KERAS or PYTORCH library

• 11–30 lines of code for the framework and a basic pipeline, from importing data to evaluation!
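A minimal sketch of such a pipeline in Keras (synthetic data stand in for a real dataset; the layer sizes and epoch count are arbitrary illustrative choices, not recommendations):

import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow import keras

X = np.random.rand(500, 6)                        # "import" data
y = (X.sum(axis=1) > 3).astype(int)               # binary target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = keras.Sequential([                        # framework
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),  # classifier output
])
model.compile(optimizer="sgd", loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(X_train, y_train, epochs=20, batch_size=32, verbose=0)  # training
print(model.evaluate(X_test, y_test, verbose=0))                  # evaluation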

So where is the problem?

Backpropagation is a very popular neural network learning algorithm because it is conceptually simple, computationally efficient, and because it often works. However, getting it to work well, and sometimes to work at all, can seem more of an art than a science. Designing and training a network using backprop requires making many seemingly arbitrary choices such as the number and types of nodes, layers, learning rates, training and test sets, and so forth. These choices can be critical, yet there is no foolproof recipe for deciding them because they are largely problem and data dependent.

Yann LeCun – Efficient BackProp

HYPERPARAMETERS – OUR PART IN THE PROCESS

• Number of hidden layers and their number of nodes

• Batch size and epochs (related to frequency of weight updates)

• Learning rate and optimizer (e.g. stochastic gradient descent)

• Loss functions and activation functions

• Regularization, dropouts, noise layers and other methods to avoid overfitting

• Architecture of the network

• The big problem: ALL PARAMETERS HAVE A TRADEOFF
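Where these knobs live in a typical Keras setup, as a sketch (the concrete values are arbitrary, not recommendations):

from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(32, activation="relu"),  # layers, nodes, activation
    keras.layers.Dropout(0.2),                  # dropout against overfitting
    keras.layers.Dense(1),
])
model.compile(
    optimizer=keras.optimizers.SGD(learning_rate=0.001),  # optimizer + learning rate
    loss="mse",                                           # loss function
)
# Batch size and epochs set the frequency and number of weight updates:
# model.fit(X, y, batch_size=32, epochs=100)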

EXAMPLE OF A MORE COMPLICATED ARCHITECTURE – RESIDUAL NETWORK LAYERS

• Show the output of an early layer to much deeper layers

• Refeeding predicted outputs

• Reshow the original input

• Focus the region of weight adjustment via auxiliary outputs and auxiliary loss functions

INFORMATION IS ALL – GARBAGE IN, GARBAGE OUT

• Preprocessing and scaling data is crucial

• Good and large datasets reduce overfitting dramatically

• Every additional type of information can boost your prediction ability dramatically

• Refeeding your initial guesses into the network can help – initial guesses are information

• Auxiliary inputs and outputs can give a good drive to your network

• The random initialization of weights can have a bigger impact than parameter adjustments!

CNN

• Basics

Labs.bawi.io

CONVOLUTIONAL NETS ARE 3D STRUCTURES

• Image processing – a 3D tensor, consisting of a 2D image split into RGB channels

http://cs231n.github.io/convolutional-networks/

CONVOLUTIONAL NETWORK + ANN

medium.freecodecamp.org/an-intuitive-guide-to-convolutional-neural-networks-260c2de0a050

ZERO-PADDING

CONVOLUTION

• Filter size (receptive field)

• Stride

• Padding

medium.freecodecamp.org/an-intuitive-guide-to-convolutional-neural-networks-260c2de0a050

CONVOLUTION

• 3D input needs 3D kernels

• Here there are 2 kernels/filters

• The number of filters is choosable = the number of features

• Stride?

• Output volume?

http://cs231n.github.io/convolutional-networks/
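The standard output-volume formula answers that question: per spatial dimension, output size = (W − F + 2P)/S + 1. A sketch, with example numbers mirroring the cs231n animation:

def conv_output_size(w, f, p, s):
    # w: input size, f: filter size, p: zero-padding, s: stride.
    # (w - f + 2*p) must be divisible by s for a valid setting.
    return (w - f + 2 * p) // s + 1

# 5x5 input, 3x3 filter, padding 1, stride 2  ->  3x3 output
# (the output depth equals the number of filters, here 2)
print(conv_output_size(5, 3, 1, 2))  # 3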

APPLYING AN ACTIVATION FUNCTION (E.G. RELU)

http://cs231n.github.io/convolutional-networks/
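For reference, ReLU is just an element-wise max(0, x):

import numpy as np

x = np.array([-3.0, -0.5, 0.0, 2.0])
print(np.maximum(0, x))  # [0. 0. 0. 2.]  - negatives are clamped to zero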

DIMENSION OF THE FILTERS/KERNELS

Researchgate.net

LET’S CHECK IF WE HAVE UNDERSTOOD THE CONVOLUTION

• Stride

• # Filters

• Filter size

• Padding

jeremyjordan.me/convnet-architectures/

RECEPTIVE FIELD REDUCES NUMBER OF CONNECTIONS

• 1) A 5x5x3 filter has 75 weights (+1 bias parameter)

• 2) Multiplied by the number of pixels in the CONV output layer

• 3) Multiplied by the number of filters

• Parameter sharing kicks out 2)

http://cs231n.github.io/convolutional-networks/
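A back-of-the-envelope count of what parameter sharing buys, using the 5x5x3 filter from above; the 28x28 output size and the 10 filters are made-up illustrative numbers:

weights_per_filter = 5 * 5 * 3 + 1   # 75 weights + 1 bias = 76
out_pixels = 28 * 28                 # pixels in the CONV output layer
n_filters = 10

without_sharing = weights_per_filter * out_pixels * n_filters
with_sharing = weights_per_filter * n_filters  # sharing kicks out factor 2)
print(without_sharing, with_sharing)           # 595840 vs. 760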

POOLING

• Size choosable – 2x2 pooling (stride 2) reduces the size by 75% without significant loss of information

• Speeds up processing

http://cs231n.github.io/convolutional-networks/
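A minimal numpy sketch of 2x2 max pooling: a 4x4 feature map shrinks to 2x2, i.e. 75% fewer values, keeping only the strongest activation per window:

import numpy as np

x = np.arange(16).reshape(4, 4)                  # toy 4x4 feature map
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))  # max over each 2x2 window
print(pooled)  # [[ 5  7]
               #  [13 15]]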

LET’S CHECK IF WE HAVE UNDERSTOOD EVERYTHING

• Stride

• # Filters

• Filter size

• Padding

• Pooling size

• What’s the blue stuff?

jeremyjordan.me/convnet-architectures/

LET’S CHECK IF WE HAVE UNDERSTOOD EVERYTHING

https://medium.com/machine-learning-bites/deeplearning-series-convolutional-neural-networks-a9c2f2ee1524
