(Artificial) Neural Networks: From Perceptron to MLP
Industrial AI Lab.
Prof. Seungchul Lee
Perceptron: Example
Perceptron: Forward Propagation
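The worked example on these slides lived in figures that did not survive extraction. As a stand-in, here is a minimal NumPy sketch of a perceptron's forward propagation; the weights, bias, and the AND task are illustrative assumptions, not values from the slides.

```python
import numpy as np

def perceptron_forward(x, w, b):
    """Forward propagation of a single perceptron: step(w . x + b)."""
    z = np.dot(w, x) + b        # weighted sum of inputs plus bias
    return 1 if z > 0 else 0    # hard-threshold (step) activation

# Illustrative weights that realize a logical AND of two binary inputs
w = np.array([1.0, 1.0])
b = -1.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, '->', perceptron_forward(np.array(x), w, b))
```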
From Perceptron to MLP
XOR Problem
• Minsky-Papert Controversy on XOR
  – Not linearly separable
  – Limitation of the perceptron
• Single neuron = one linear classification boundary
Artificial Neural Networks: MLP
• Multi-layer Perceptron (MLP) = Artificial Neural Network (ANN)
  – Multiple neurons = multiple linear classification boundaries (see the sketch below)
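As an illustration of this point, a minimal TensorFlow sketch that fits a two-hidden-neuron MLP to XOR. The architecture, optimizer, and epoch count are my assumptions, not the lecture's code, and an unlucky random initialization can occasionally fail to converge.

```python
import numpy as np
import tensorflow as tf

# XOR: the four points cannot be split by one linear boundary
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
y = np.array([[0], [1], [1], [0]], dtype=np.float32)

# Two hidden neurons give two linear boundaries that the output combines
model = tf.keras.Sequential([
    tf.keras.layers.Dense(2, activation='tanh', input_shape=(2,)),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer=tf.keras.optimizers.Adam(0.1),
              loss='binary_crossentropy')
model.fit(X, y, epochs=500, verbose=0)
print(model.predict(X, verbose=0).round().ravel())  # typically [0, 1, 1, 0]
```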
Artificial Neural Networks: Activation Function
• Differentiable nonlinear activation function
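Which functions the slide plots is not recoverable from the transcript; as an assumption, here is a sketch of three common choices and their derivatives (ReLU is differentiable almost everywhere, with a subgradient at zero):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)            # d/dz sigmoid(z)

def tanh_grad(z):
    return 1.0 - np.tanh(z) ** 2    # d/dz tanh(z)

def relu(z):
    return np.maximum(0.0, z)

def relu_grad(z):
    return (z > 0).astype(float)    # subgradient; not differentiable at 0

z = np.linspace(-2.0, 2.0, 5)
print(sigmoid(z), sigmoid_grad(z))
print(np.tanh(z), tanh_grad(z))
print(relu(z), relu_grad(z))
```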
Artificial Neural Networks
• In a compact representation
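The compact representation itself was an image; in standard matrix notation (my notation, an assumption about what the slide shows), one layer and a two-layer network read:

```latex
h = \sigma(W_1 x + b_1), \qquad \hat{y} = \sigma(W_2 h + b_2)
```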
Artificial Neural Networks
• A single layer is not enough to represent a complex relationship between input and output
  ⟹ perceptron with many layers and units
• Multi-layer perceptron
  – Features of features
  – Mappings of mappings
Another Perspective: ANN as Kernel Learning
Neuron
• We can represent this “neuron” as follows:
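The representation referred to here was a figure; a standard way to write a single neuron (my notation, an assumption) is:

```latex
y = \sigma\Big(\sum_i w_i x_i + b\Big) = \sigma(w^{\top} x + b)
```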
XOR Problem
• The main weakness of linear predictors is their lack of capacity.
• For classification, the populations have to be linearly separable.
Nonlinear Mapping
• The XOR example can be solved by pre-processing the data to make the two populations linearly separable, as in the sketch below.
Source: Dr. Francois Fleuret, EPFL
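The pre-processing shown on the slide is a figure; one concrete choice (a hypothetical feature map, possibly different from the lecture's) is 𝜙(x₁, x₂) = (x₁, x₂, x₁x₂), under which XOR becomes linearly separable:

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

# Hypothetical feature map: append the product x1 * x2
phi = np.column_stack([X, X[:, 0] * X[:, 1]])

# In phi-space the plane x1 + x2 - 2*(x1*x2) - 0.5 = 0 separates the classes
w, b = np.array([1.0, 1.0, -2.0]), -0.5
print((phi @ w + b > 0).astype(int))  # [0, 1, 1, 0], matching y
```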
Kernel
• Often we want to capture nonlinear patterns in data
  – Nonlinear regression: the input-output relationship may not be linear
  – Nonlinear classification: classes may not be separable by a linear boundary
• Linear models (e.g., linear regression, linear SVM) are just not rich enough
• Kernels make linear models work in nonlinear settings
  – Map the data to a higher-dimensional space where it exhibits linear patterns
  – Apply the linear model in the new input feature space
  – Mapping = changing the feature representation
Kernel + Neuron
• Nonlinear mapping 𝜙 (kernel) + neuron ⟹ kernel perceptron
Neuron + Neuron
• The nonlinear mapping 𝜙 (the kernel) can itself be represented by other neurons
• Nonlinear kernel
  – Nonlinear activation functions
Multi Layer Perceptron
• The nonlinear mapping 𝜙 (the kernel) can be represented by other neurons
• We can generalize this construction to an MLP
Summary
• Universal function approximator
• Universal function classifier
• Parameterized
Deep Artificial Neural Networks
• Complex/Nonlinear universal function approximator
– Linearly connected networks
– Simple nonlinear neurons
[Figure: layered network diagram; input → linear connections alternating with nonlinear neurons → output, split into a feature-learning stage and a classification stage (Class 1 / Class 2)]
Looking at Parameters
Logistic Regression in the Form of a Neural Network
• Neural network convention: bias units are not drawn
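In other words, logistic regression is a single neuron with a sigmoid activation. A minimal NumPy sketch, where the input, weights, and bias are illustrative values of my choosing:

```python
import numpy as np

def logistic_neuron(x, w, b):
    """Logistic regression = one neuron: sigmoid(w . x + b)."""
    z = np.dot(w, x) + b              # linear part, as in a perceptron
    return 1.0 / (1.0 + np.exp(-z))   # sigmoid turns it into a probability

x = np.array([0.5, -1.2])   # illustrative input
w = np.array([2.0, 1.0])    # illustrative weights
b = 0.1
print('P(y=1 | x) =', logistic_neuron(x, w, b))
```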
Nonlinearly Distributed Data
• An example to understand the network's behavior
  – Include a hidden layer
Multi Layers
• 𝑧 space
Multi Layers
• 𝑥 space
(Artificial) Neural Networks: Training
Industrial AI
Prof. Seungchul Lee
Training Neural Networks: Loss Function
• Measures error between target values and predictions
• Examples (standard forms below)
  – Squared loss (for regression)
  – Cross entropy (for classification)
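The formulas on the slide were images; the standard forms, in my notation with targets yᵢ and predictions ŷᵢ over N examples, are:

```latex
% Squared loss (regression)
\ell = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2

% Cross entropy (classification over K classes, one-hot targets)
\ell = -\frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{K} y_{i,k} \log \hat{y}_{i,k}
```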
Gradients in ANN
• Learning weights and biases from data using gradient descent
• 𝜕ℓ/𝜕𝜔: computed naively, too many computations are required to obtain this for every weight 𝜔
• The structure of a NN makes it tractable (a sketch follows):
  – Composition of functions
  – Chain rule
  – Dynamic programming
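A minimal NumPy sketch of those three ideas working together: manual backpropagation through a two-layer network. The shapes, random data, and squared loss are illustrative assumptions, not the lecture's example.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))             # 4 examples, 3 features (illustrative)
y = rng.normal(size=(4, 1))             # regression targets
W1, b1 = rng.normal(size=(3, 5)), np.zeros(5)
W2, b2 = rng.normal(size=(5, 1)), np.zeros(1)

# Forward pass: a composition of functions, caching intermediates
z1 = x @ W1 + b1
h = np.tanh(z1)                         # nonlinear activation
y_hat = h @ W2 + b2
loss = np.mean((y_hat - y) ** 2)        # squared loss

# Backward pass: chain rule, reusing cached values (dynamic programming)
d_yhat = 2 * (y_hat - y) / len(y)       # dloss/dy_hat
dW2 = h.T @ d_yhat                      # through y_hat = h @ W2 + b2
db2 = d_yhat.sum(axis=0)
dh = d_yhat @ W2.T
dz1 = dh * (1 - np.tanh(z1) ** 2)       # through h = tanh(z1)
dW1 = x.T @ dz1                         # through z1 = x @ W1 + b1
db1 = dz1.sum(axis=0)
print(loss, dW1.shape, db1.shape, dW2.shape, db2.shape)
```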
Training Neural Networks with TensorFlow
• Optimization procedure
• It is not easy to numerically compute gradients in a general network.
  – The good news: people have already done all the "hard work" of developing numerical solvers (or libraries)
  – There is a wide range of tools → we will use TensorFlow (sketched below)
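As a sketch of what such a library does for us, TensorFlow's GradientTape computes gradients of a loss automatically; this toy snippet is illustrative, not the lecture's code:

```python
import tensorflow as tf

w = tf.Variable(2.0)
b = tf.Variable(0.5)
x, y = tf.constant(1.5), tf.constant(4.0)

with tf.GradientTape() as tape:
    y_hat = w * x + b            # a tiny linear model
    loss = (y_hat - y) ** 2      # squared loss

# TensorFlow applies the chain rule through the recorded computation
dw, db = tape.gradient(loss, [w, b])
print(dw.numpy(), db.numpy())    # -1.5 and -1.0 for these values
```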
ANN in TensorFlow: MNIST
MNIST database
• Modified National Institute of Standards and Technology database
• Handwritten digit database
• 28 × 28 grayscale images
• Each image is flattened into a vector of 28 × 28 = 784 values
Our Network Model
• Input image (28 × 28), flattened
• Input layer (784) → hidden layer (100) → output layer (10)
• Digit prediction in one-hot encoding
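A minimal Keras sketch of this 784-100-10 network; the ReLU hidden activation, Adam optimizer, and epoch count are my assumptions, and the lecture's code may differ:

```python
import tensorflow as tf

# MNIST: 28 x 28 grayscale digits, labels 0-9
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
y_train_oh = tf.keras.utils.to_categorical(y_train, 10)  # one-hot targets
y_test_oh = tf.keras.utils.to_categorical(y_test, 10)

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),    # 28 x 28 -> 784
    tf.keras.layers.Dense(100, activation='relu'),    # hidden layer (100)
    tf.keras.layers.Dense(10, activation='softmax'),  # output layer (10)
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train_oh, epochs=5, verbose=0)
print(model.evaluate(x_test, y_test_oh, verbose=0))   # [loss, accuracy]
```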
Machine Learning and Deep Learning
Prof. Seungchul Lee
Industrial AI Lab.
Supervised Learning
[Figure: labeled training data (e.g., images of chairs and desks) used to learn a model that generalizes to unseen examples]
Machine Learning
• Image of digit 0
• Image of digit 1
Machine Learning vs. Deep Learning
• "Selecting good feature variables is important." → requires expert knowledge
• Machine learning: Data → Features → Decision
  – Signal processing + domain knowledge
• Deep learning: Data → Decision
  – Depends less on domain knowledge
  – AI-discovered features
• "Deep learning is less dependent on domain knowledge than machine learning."
Deep Learning
• Convolutional Neural Networks (CNN)
• Image pattern recognition problems
[Figure: CNN taking an image as input and outputting digit class scores (0, 1, …)]
Handwritten Digit Recognition
Ind
ust
rial
AI L
ab.
45
AI Core vs. AI + X
Artificial Intelligence and Engineering
• Generally correct in ideal cases
• Difficult to reflect real-world uncertainties