INF 5860 Machine learning for image classification
Lecture 2: Image classification and regression
Anne Solberg, January 24, 2017

Page 1:

INF 5860 Machine learning for image classification

Lecture 2: Image classification and regression
Anne Solberg, January 24, 2017

Page 2:

Today’s topics

• KNN (k-nearest neighbor)
• Cross-validation
• Linear regression
• Loss functions
• Minimizing loss functions using gradient descent
• Logistic regression – continued next week

Page 3:

Today’s read

• Note on introduction to classification
  – http://cs231n.github.io/classification
• Linear regression
  – Deep Learning Chap. 5.1 (brief introduction)
  – More details: read http://cs229.stanford.edu/notes/cs229-notes1.pdf (pages 1-7 and 16-19)
• Gradient descent
  – Deep Learning Chap. 4.3

Page 4:

Relevant additional video links:

• https://www.youtube.com/playlist?list=PL3FW7Lu3i5JvHM8ljYj-zLfQRF3EO8sYv
  – Lectures 2 and 3
  – Remark: they do not cover regression.

Page 5:

Image classification – introduction

[Image of a dog, labeled DOG]

Task: use the entire image to classify the image into one of a set of known classes.

Which object does the image contain?

Page 6:

Where we currently are

• KNN (k-nearest neighbor)
• Cross-validation
• Linear regression
• Loss functions
• Minimizing loss functions using gradient descent
• Logistic regression

Page 7:

From last week: k-Nearest-Neighbor (KNN) classification

• Classification of a new sample x(i) is done as follows (see the sketch below):
  – Out of the N training images, identify the k nearest neighbor images (measured by L1 or L2 distance), irrespective of their class labels.
  – Out of these k samples, count the number of images k_j that belong to class j, j = 1, 2, ..., M (if we have M classes).
  – Assign x(i) to the class j with the maximum count k_j.
• k should be odd, and must be selected a priori.
• The distance measure and k are hyperparameters.
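A minimal numpy sketch of this decision rule, assuming the training images are flattened into the rows of X_train (the function name and the brute-force distance computation are illustrative, not course code):

import numpy as np

def knn_predict(x_new, X_train, y_train, k=3):
    # L2 distance from x_new to every row (training sample) of X_train
    dists = np.sqrt(np.sum((X_train - x_new) ** 2, axis=1))
    # Indices of the k nearest neighbors, irrespective of class label
    nearest = np.argsort(dists)[:k]
    # Count the neighbors k_j belonging to each class j
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    # Assign the class with the maximum count
    return labels[np.argmax(counts)]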

Page 8:

Measuring similarity between two images

• L1-distance: d1(I1, I2) = Σ_r Σ_c |I1(r,c) − I2(r,c)|
• L2-distance: d2(I1, I2) = sqrt( Σ_r Σ_c (I1(r,c) − I2(r,c))² )

L2 is called the Euclidean distance.
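Both distances map directly to numpy; a small sketch assuming the two images are stored as equally sized float arrays img1 and img2:

import numpy as np

def l1_distance(img1, img2):
    # Sum of absolute pixel differences over all rows and columns
    return np.sum(np.abs(img1 - img2))

def l2_distance(img1, img2):
    # Square root of the sum of squared pixel differences
    return np.sqrt(np.sum((img1 - img2) ** 2))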

Page 9:

KNN

• The KNN classifier is mostly used for small datasets.
• There is no training, but classifying/predicting new data gets slow as the size of the training data increases.
• The parameter k (and possibly the distance measure) should be found by cross-validation.
  – If we have large amounts of data we would have a separate validation set, but we avoid using KNN for large datasets anyway.

Page 10:

Is KNN robust to changes in position, color, shape, contrast?

• All these images have the same L2-distance from the original.

Page 11:

Where are we

• KNN (k-nearest neighbor)
• Cross-validation
• Linear regression
• Loss functions
• Minimizing loss functions using gradient descent
• Logistic regression

Page 12:

Selecting k using cross-validation

• For each value of k:
  – Cross-validation: split the training data into d subsets/folds.
  – Train on data from d−1 folds.
  – Estimate the accuracy (compute the number of correctly classified images) on the remaining fold, and store the accuracy.
  – Repeat this d times and compute the average of the accuracies.
• Repeat with different values of k, and select the value that got the highest accuracy (see the sketch after this list).
• Now train on the entire training set using this value of k, then classify the test data set ONCE to get the accuracy of the classifier.
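A sketch of this selection loop, assuming a knn_predict-style classifier as above; the fold count d and the candidate values of k are illustrative:

import numpy as np

def select_k(X, y, k_values, d=10):
    folds_X = np.array_split(X, d)
    folds_y = np.array_split(y, d)
    mean_acc = {}
    for k in k_values:
        accs = []
        for i in range(d):
            # Train on d-1 folds, estimate accuracy on fold i
            X_tr = np.concatenate([f for j, f in enumerate(folds_X) if j != i])
            y_tr = np.concatenate([f for j, f in enumerate(folds_y) if j != i])
            preds = np.array([knn_predict(x, X_tr, y_tr, k) for x in folds_X[i]])
            accs.append(np.mean(preds == folds_y[i]))
        # Average accuracy over the d folds for this k
        mean_acc[k] = np.mean(accs)
    # Select the k with the highest average accuracy
    return max(mean_acc, key=mean_acc.get)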

Page 13:

• Example: 10000 training images, 5000 test images.
• Split the training images into 10 folds of 1000 images.
• Train on 9 folds (9000 images), compute the accuracy on the remaining 1000 images during cross-validation.
• After finding k: train on all 10000 images, and estimate the reported accuracy on the 5000 test images.

Page 14:

Cross-validation plot

For each value of k, the plot shows the mean value and the standard deviation of the classification accuracy.

Page 15:

Where are we

• KNN (k-nearest neighbor)
• Cross-validation
• Linear regression
• Loss functions
• Minimizing loss functions using gradient descent
• Logistic regression

Page 16:

Notation

• m – number of examples in the training data set
• nx – input size, dimension of the input data
• ny – output size: typically the number of classes
• Superscript x(i): training sample number i, if x is a single measurement for each sample.
• Later for images: matrix notation X for a set of m training images.

Page 17:

From linear regression to logistic regression for classification

• Logistic classification is done iteratively in a manner similar to neural nets.
• We study this method before proceeding to neural networks.
• First – a brief look at linear regression to introduce loss functions and gradient descent minimization.

Page 18:

The linear regression problem

• Predict the true values y based on a data vector x from a training data set.
• In regression, we want to predict y (a continuous number) based on data x.
  – Example: Predict the population in Norway based on measurements from 1990-2010.
• Linear hypothesis: ŷ = wx + b
• Learning will be based on comparing ŷ and y.

[Diagram: training set → learning algorithm → hypothesis h; h maps an input x to the estimated value ŷ]

Page 19:

Linear regression: training data set

Page 20:

[Plot – blue: true data points; red: estimated function; annotated with the residual y(i) − ŷ(i)]

The distance between the true (blue) and estimated (red) values will be used to measure how good the fit is. This is the Mean Square Error (MSE) if summed over all samples.

Page 21:

Where are we

• KNN (k-nearest neighbor)
• Cross-validation
• Linear regression
• Loss functions
• Minimizing loss functions using gradient descent
• Logistic regression

Page 22:

Error measure for learning linear regression: Mean square error (MSE)

• Mean square error over the training data set.
• Training data: a set of m samples {(x(i), y(i)), i = 1..m}.
• x(i) can consist of one or more variables/features, e.g. several measurements.

J(θ) = (1/2m) Σ_i (ŷ(i) − y(i))²

In vector form: J(θ) = (1/2m) ‖ŷ − y‖²  (L2-norm)

θ is the parameter we want to fit: the parameters of the line.
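In code, the vector form is one line of numpy; a sketch with illustrative names:

import numpy as np

def mse_cost(y_hat, y):
    # J = 1/(2m) * ||y_hat - y||^2 over the m training samples
    m = y.shape[0]
    return np.sum((y_hat - y) ** 2) / (2 * m)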

Page 23:

Gradient descent minimization

• Let’s see how gradient descent can be used to find the w that minimizes the MSE.
• Read Section 4.3 in Deep Learning.

Page 24:

Gradient descent intuition

Start from a point and take a step downhill in the steepest possible direction.
Repeat this until we end up in a local minimum.
If I start from a neighboring point, I should end up in the same minimum.

Page 25:

Gradient descent intuition

If we start from a different point, we might end up in another local minimum.
To find the direction, compute the local derivative at the point.

Page 26:

Iterative minimization outline

• Have a function J(w,b) (can be generalized to more than two parameters).
• Want to find the w,b that minimize J(w,b).
• Outline:
  1. Start with some value of w,b (e.g. w=0, b=0).
  2. Compute J(w,b) for the given value of w,b and change w,b in a manner that will decrease J(w,b).
  3. Repeat step 2 until we hopefully end up in a minimum.

Page 27:

Gradient descent principle

• Given a function J(w,b) of several variables.
• The directional derivative in a given direction is the slope of J(w,b) in that direction.
• To iteratively minimize J, we want to find the direction in which J decreases the fastest.
• This can be shown to be the direction opposite to the gradient.
• So we can minimize J by taking a step in the direction of the negative gradient.
• For a general function f(x), gradient descent proposes a new point x ← x − ε f′(x), where ε is the learning rate.
• This is done iteratively over a number of iterations.
• If ε is too small, the algorithm converges too slowly.
• If ε is too large, it may fail to converge, or diverge.
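As a toy illustration of the update x ← x − ε f′(x), here is gradient descent on a simple one-dimensional function; f and its derivative are made-up examples, not from the lecture:

def gradient_descent_1d(f_prime, x0, eps=0.1, n_iters=100):
    x = x0
    for _ in range(n_iters):
        x = x - eps * f_prime(x)  # step in the negative gradient direction
    return x

# Example: minimize f(x) = (x - 3)^2, with derivative f'(x) = 2(x - 3)
print(gradient_descent_1d(lambda x: 2 * (x - 3), x0=0.0))  # close to 3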

Page 28:

Back to linear regression

[Figure: an example function and contours of the function]

Model hypothesis: ŷ = wx + b

Page 29:

Gradient descent for linear regression

• Let w and b be the two unknown parameters in the linear model ŷ = wx + b.
• We want to minimize the criterion function J(w,b) by computing the derivatives with respect to w and b, and setting the derivatives to 0:

J(w,b) = (1/2m) Σ_i (w x(i) + b − y(i))²

• This is a quadratic (convex) function, so we can find the global minimum.
• Remark: in the literature you will see θ used as a vector of all the parameters.

Page 30:

Gradient descent for linear regression
Univariate x – a single feature/gray level

J(w,b) = (1/2m) Σ_i (w x(i) + b − y(i))²

∂J(w,b)/∂w = (1/m) Σ_i (w x(i) + b − y(i)) x(i)    (here we use the chain rule)

∂J(w,b)/∂b = (1/m) Σ_i (w x(i) + b − y(i))

• Here we sum the gradient over all x(i) in the training data set.
• This is called batch gradient descent.

Page 31:

Gradient descent algorithm for one variable x

Linear regression model: y = wx + b

Gradient descent: repeat until convergence:
  w = w − α (1/m) Σ_i (w x(i) + b − y(i)) x(i)
  b = b − α (1/m) Σ_i (w x(i) + b − y(i))

Page 32:

Example of J(w,b) for a general line (y = wx + b)

Page 33:

The result from gradient descent

Page 34:

The value of J overlaid on the values of w,b after every 50th iteration

Page 35:

J as a function of iterations

Page 36:

Implementing gradient descent

• Compute the new parameter values into temporary variables w1 and b1 first, then assign them simultaneously:
  w1 = w − α ∂J/∂w
  b1 = b − α ∂J/∂b
  w = w1, b = b1
• The sum over all samples x(i) can be done on vectors using np.sum() and other vector operations, as in the sketch below.
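A vectorized sketch of the whole loop for y = wx + b, using np.sum() as suggested; the zero initialization and fixed iteration count are assumptions of this sketch:

import numpy as np

def train_linear_regression(x, y, alpha=0.01, n_iters=1000):
    m = x.shape[0]
    w, b = 0.0, 0.0
    for _ in range(n_iters):
        err = w * x + b - y                    # (w x(i) + b - y(i)) for all i at once
        w1 = w - alpha * np.sum(err * x) / m   # compute both updates into temporaries...
        b1 = b - alpha * np.sum(err) / m
        w, b = w1, b1                          # ...then assign simultaneously
    return w, b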

Page 37:

Gradient descent in practice: finding the learning rate

• How do we make sure that the optimization runs correctly?
  – Make sure J decreases! Plot J as a function of the number of iterations.
• Computation of J should also be vectorized:

J(w,b) = (1/2m) Σ_i (ŷ(i) − y(i))²

• If α is too small: slow convergence.
• If α is too large: J may not decrease, and the iterations may not converge.
• α is a number between 0 and 1, often close to 0 (try 0.001, …, 0.01, …, 0.1, …, 1); see the sketch below.
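To follow this advice in practice, record J at every iteration and compare a few candidate learning rates; this sketch reuses the loop above (the data and the candidate values are made up):

import numpy as np

def cost_history(x, y, alpha, n_iters=200):
    m = x.shape[0]
    w, b = 0.0, 0.0
    history = []
    for _ in range(n_iters):
        err = w * x + b - y
        history.append(np.sum(err ** 2) / (2 * m))  # vectorized J(w,b)
        w -= alpha * np.sum(err * x) / m
        b -= alpha * np.sum(err) / m
    return history

# J should decrease steadily for a good alpha; for a too-large alpha it blows up
x, y = np.array([0.0, 1.0, 2.0]), np.array([1.0, 3.0, 5.0])
for alpha in (0.001, 0.01, 0.1, 1.0):
    J = cost_history(x, y, alpha)
    print(alpha, J[0], J[-1])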

Page 38:

Where are we

• KNN (k-nearest neighbor)
• Cross-validation
• Linear regression
• Loss functions
• Minimizing loss functions using gradient descent
• Logistic regression

Page 39:

Introduction to logistic regression

• Let us show how a regression problem can be transformed into a binary (2-class) classification problem using a nonlinear loss function.
• Then generalize to multiple classes (next week).

Page 40:

Introduction

[Plot: fish length along the x-axis; samples marked x, with Cod = 1 and Herring = 0]

Page 41:

What would linear regression give?

• Maybe we would threshold this?

[Plot: a linear fit to the Cod = 1 / Herring = 0 samples as a function of length]

Threshold: if wx + b < 0.5: class 0, otherwise class 1.

Page 42:

What would linear regression give?

[Plot: the same data with additional samples at larger lengths]

Page 43:

What if we fitted a function f(x) that is close to either 0 or 1?

• The hypothesis h(x) is now a non-linear function of x.
• Classification: y = 0 or 1.
• Threshold h(x): if h(x) > 0.5, set y = 1; otherwise set y = 0.
• Desirable to have h(x) ≤ 1.

[Plot: the Cod = 1 / Herring = 0 samples with an S-shaped fit]

Page 44:

Logistic regression model

• Want 0 ≤ h(x) ≤ 1 (binary problem).
• Let z = wx + b and h(x) = g(z) = 1/(1 + e^(−z)).
• g(z) is called the sigmoid function.

Page 45:

Decisions for logistic regression

• Decide y = 1 if h(x) > 0.5, and y = 0 otherwise.
• g(z) > 0.5 if z > 0, i.e. if wx + b > 0.
• g(z) < 0.5 if z < 0, i.e. if wx + b < 0.

h_θ(x) = g(θ^T x) = 1/(1 + e^(−θ^T x)),    g(z) = 1/(1 + e^(−z))

Here the compact notation θ means the vector of parameters [w, b].
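A small sketch of the sigmoid and this decision rule (names illustrative):

import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z)); squashes any real z into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def predict(x, w, b):
    # h(x) > 0.5 exactly when wx + b > 0, so decide y = 1 on that side
    return int(sigmoid(np.dot(w, x) + b) > 0.5)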

Page 46:

An example with 2 features

h(x) = g(b + w1·X(i,1) + w2·X(i,2))

Predict y = 1 if X(i,1) + 2·X(i,2) − 4 ≥ 0.
Decision boundary: X(i,1) + 2·X(i,2) − 4 = 0.

If we KNOW b, w1 and w2, classification is based on which side of the boundary we are on.

[Plot: the feature space with axes X(i,1) and X(i,2), split by the linear decision boundary]

Page 47:

Nonlinear boundary possible?

• The basic model gives a linear boundary.
• To get the non-linear boundary needed for this example, higher-order terms must be added (see the sketch below):

h(x) = g(θ0 + θ1·X(i,1) + θ2·X(i,2) + θ3·X(i,1)² + θ4·X(i,2)²)

Predict y = 1 if −1 + X(i,1)² + X(i,2)² ≥ 0.
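One way to realize this in code is to append the squared terms as extra feature columns before fitting the usual linear model; the helper below is an illustration, not from the lecture:

import numpy as np

def add_quadratic_features(X):
    # Given columns X(:,1), X(:,2), append X(:,1)^2 and X(:,2)^2.
    # The boundary stays linear in the expanded features but becomes
    # a curve (here a circle/ellipse) in the original two features.
    return np.hstack([X, X ** 2])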

Page 48:

Introducing a logistic cost function

• Training set:

X = [ X(0,0)   X(0,1)  ...  X(0,n)
      X(1,0)   X(1,1)  ...  X(1,n)
      ...
      X(m,0)   X(m,1)  ...  X(m,n) ]

y = [ y(1), y(2), ..., y(m) ]

Column X(:,j): all samples for feature j
Row X(i,:): all features for sample i

h_θ(x) = 1/(1 + e^(−θ^T x))

• How do we set the parameter vector θ to have high classification accuracy?

Page 49:

Logistic regression cost

Minimize

J(θ) = (1/m) Σ_i (1/2) (h(X(i,:)) − y(i))²

Due to the sigmoid function g(z), this is a non-quadratic, non-convex function. Instead, set

Cost(h(x), y) = −log(h(x))      if y = 1
Cost(h(x), y) = −log(1 − h(x))  if y = 0

Check that this function does what we want (mimic a probability):
• If y = 1 and h(x) = 1: Cost = 0 – correct classification, no cost.
• If y = 1 and h(x) → 0: Cost → ∞ – wrong classification, high cost.
• If y = 0 and h(x) = 0: Cost = 0 – correct classification, no cost.
• If y = 0 and h(x) → 1: Cost → ∞ – wrong classification, high cost.
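For y in {0, 1} exactly one of the two log terms is active, so both cases combine into one vectorized expression; a minimal sketch (the clipping against log(0) is an implementation detail not on the slide):

import numpy as np

def logistic_cost(h, y, eps=1e-12):
    # Mean of Cost(h(x), y) = -log(h) if y = 1, -log(1 - h) if y = 0
    h = np.clip(h, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))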

Page 50:

Cost function for logistic regression

• We have two classes, 1 and 0.
• Let us use a probabilistic model, with parameters θ = [w1, ..., w_nk, b] if we have nk features.
• P(y=1 | x, θ) = h(x)
• P(y=0 | x, θ) = 1 − h(x)
• This can be written more compactly as

p(y | x, θ) = h(x)^y (1 − h(x))^(1−y)

Page 51:

Cost function for logistic regression

• The likelihood of the parameter values is

L(θ) = Π_i p(y(i) | x(i), θ) = Π_i h(x(i))^y(i) (1 − h(x(i)))^(1−y(i))

• It is easier to maximize the log-likelihood:

l(θ) = Σ_i [ y(i) log h(x(i)) + (1 − y(i)) log(1 − h(x(i))) ]

• We will use gradient ascent to maximize this, taking a step in the positive gradient direction since we are maximizing, not minimizing.

Page 52:

Computing the gradient of the likelihood function

∂l(θ)/∂θ_j = Σ_i (y(i) − h(x(i))) x_j(i)

Here we used the fact that g′(z) = g(z)(1 − g(z)).

Page 53:

Gradient descent of J(θ) = −l(θ)

J(θ) = −(1/m) Σ_i [ y(i) log(h(X(i,:))) + (1 − y(i)) log(1 − h(X(i,:))) ]

To find θ: find the θ that minimizes J(θ) using gradient descent. Repeat:

θ_j = θ_j − α (1/m) Σ_i (h(X(i,:)) − y(i)) X(i,j)

This algorithm looks similar to linear regression, but now

h(x) = 1/(1 + e^(−θ^T x))
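Putting the pieces together, a vectorized sketch of this update; folding the bias b into θ via a column of ones is an assumption of the sketch, not something the slide specifies:

import numpy as np

def train_logistic_regression(X, y, alpha=0.1, n_iters=1000):
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend 1s so theta[0] acts as b
    m, n = Xb.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        h = 1.0 / (1.0 + np.exp(-Xb @ theta))  # sigmoid of all m samples at once
        grad = Xb.T @ (h - y) / m              # (1/m) * sum_i (h(X(i,:)) - y(i)) X(i,j)
        theta -= alpha * grad                  # gradient descent step on J = -l
    return theta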

Page 54:

Mandatory exercise 1

• Available in a few days.
• Complete two notebooks:
  – knn.ipynb and additional functions
  – softmax.ipynb and additional functions
• You can start as soon as the exercise is available.

Page 55:

Next week

• Generalizing logistic regression to multiple classes
• Adding regularization
• Using it to classify images