
INF5860 – Machine learning for image analysis

17.1.2018: Introduction
Anne Solberg, Tollef Jahren, Ole-Johan Skrede

Plan for today

• About the course
• Motivation: what deep learning can do
• Short background in image classification
• Information about exercises and jupyter/python

Lecturers• Tollef Jahren: [email protected]

• Ole-Johan Skrede: [email protected]

• Anne Schistad Solberg: [email protected]

Overview

Lectures:
- Machine learning
- Deep learning and image analysis
- Getting deep learning to work in practice
- Applications

Groups:
- Jupyter, numpy and image processing
- Simple image classification, basic neural nets and backpropagation
- Tensorflow – toolbox for deep learning
- Visualisation


Who is this course for?
• Core focus: Master and PhD students working on image analysis problems using deep learning
• Will be useful for others with a good background in Python and linear algebra who want to learn deep learning
• Demanding programming exercises: do all of them!
• Expertise wanted by many companies – good job prospects

Why is deep learning so interesting?

Why image analysis?
“Many tools require information extracted from images.”

Why machine learning in image analysis?
Image analysis is difficult!

- The human visual system is very good at recognizing objects.

- It is almost impossible to specify all variations of an object that can occur.

- Writing new code for each new task is a lot of work.

Why does machine learning help?
“Adapt and learn all parameters about an object from data, without needing an explicit model.”

[Diagram: program, data, input, output]

Training a machine learning system
• Training data: data with known objects, used to fit an algorithm to the data

• Validation data: data with known objects, used to measure how well the method generalizes to new data

• Test data: data with known objects, used to estimate the error rate of the system.
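The three-way split can be sketched in numpy (toy random data; the sizes and the 70/15/15 split are illustrative choices, not taken from the course):

```python
import numpy as np

# Hypothetical toy data: 1000 samples, 20 features, integer labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = rng.integers(0, 10, size=1000)

# Shuffle once, then split 70/15/15 into train/validation/test.
idx = rng.permutation(len(X))
train_idx, val_idx, test_idx = idx[:700], idx[700:850], idx[850:]

X_train, y_train = X[train_idx], y[train_idx]   # fit parameters here
X_val,   y_val   = X[val_idx],   y[val_idx]     # tune hyperparameters here
X_test,  y_test  = X[test_idx],  y[test_idx]    # report the final error rate here

print(len(X_train), len(X_val), len(X_test))    # 700 150 150
```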

Back to the 1990’s

• Neural networks were the buzzword, but why did they not work then?

• What we believed:
– They require too much labelled training data
– The learning time does not scale well
– They get stuck in local optima
– Adding more than 3–4 layers does not help

What WAS wrong with neural nets?

• We did not collect enough labelled data

• We did not have fast enough computers

• We did not understand the tricks needed to make learning work:
– Wrong weight initialization

– Did not monitor the learning process

– No redundancy in the nets (no Dropout)

2011: 25% error rate
2012: 16% error rate
2016: 3% error rate

Image classification - introduction


DOG

Task: use the entire image to classify the image into one of a set of known classes

Which object does the image contain?

Challenges - Illumination


Challenges - occlusion


Challenges – background clutter


Challenges - deformation


Challenges – intraclass variation


Conventional approach to image classification (from INF 4300)
• Goal: get the series of digits, e.g. 14159265358979323846…

Steps in the program:

1. Segment the image to find digit pixels.

2. Find the angle of rotation and rotate back.

3. Create region objects – one object per digit or connected component.

4. Compute features describing the shape of the objects.

5. Train a classifier on many objects of each digit.

6. Assign a class label to each new object, i.e., the class with the highest probability.

Edge detection using filters

Designing features is difficult

How would you describe the content of these images?

Two-layer net for image classification


xi: one input node for each pixel

W1(i,j): weight from input feature j to node i in the hidden layer

W2(j,k): weight from hidden node k to output node j

Hidden layer

One output node for each class; sk is the score for class k
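A minimal numpy sketch of the forward pass through such a two-layer net (the layer sizes, the ReLU activation, and the omission of bias terms are illustrative assumptions, not taken from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

n_pixels, n_hidden, n_classes = 32 * 32, 100, 10       # illustrative sizes

x = rng.normal(size=n_pixels)                          # one input node per pixel
W1 = rng.normal(size=(n_hidden, n_pixels)) * 0.01      # W1[i, j]: input j -> hidden i
W2 = rng.normal(size=(n_classes, n_hidden)) * 0.01     # W2[j, k]: hidden k -> output j

h = np.maximum(0, W1 @ x)   # hidden layer (ReLU chosen here for illustration)
s = W2 @ h                  # s[k] is the score for class k
```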

Architecture for a feed-forward net


Training a neural net - backpropagation


Compare the output to the true object class.
Send the differences backwards in the net.
Iteratively update the weights on all connections.
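The loop above can be sketched for a toy two-layer net with a softmax cross-entropy loss (all sizes, the ReLU activation, and the learning rate are illustrative assumptions; the net simply memorizes one example):

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_classes, lr = 64, 32, 10, 0.1

W1 = rng.normal(size=(n_hidden, n_in)) * 0.01
W2 = rng.normal(size=(n_classes, n_hidden)) * 0.01
x, true_class = rng.normal(size=n_in), 3          # one training example

for step in range(200):
    # Forward pass
    h = np.maximum(0, W1 @ x)                     # hidden activations (ReLU)
    s = W2 @ h                                    # class scores
    p = np.exp(s - s.max()); p /= p.sum()         # softmax probabilities

    # Compare the output to the true class: gradient of the loss w.r.t. scores
    ds = p.copy(); ds[true_class] -= 1.0

    # Send the differences backwards through the net
    dW2 = np.outer(ds, h)
    dh = W2.T @ ds
    dh[h <= 0] = 0.0                              # gradient through the ReLU
    dW1 = np.outer(dh, x)

    # Iteratively update the weights on all connections
    W1 -= lr * dW1
    W2 -= lr * dW2
```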

The danger of overfitting to training data


4-layer net (input layer not counted)

How does a fully connected network see the world?
A neural network or standard machine learning method has to learn that pixels close to each other are more related.

A cat moved from one part of the picture to another is viewed as a completely different object.

Finding features that are invariant to scale, shape, color, contrast, and location is difficult!

Convolutional networks enforce similar response to neighboring pixels
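A minimal sketch of why: the same small kernel is slid over the whole image, so every neighbourhood is processed by the same weights (plain "valid" cross-correlation, which is what deep-learning libraries call convolution; the loops are for clarity, not speed):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D 'convolution' (cross-correlation, as in most DL libraries)."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # The same weights are applied to every neighbourhood.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.random((8, 8))
kernel = rng.random((3, 3))
print(conv2d(image, kernel).shape)   # (6, 6)
```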

What do the layers learn?

How do deep networks create invariance?

Will all these variants need a lot of unique representations?

Reuse is important!

- A network can reuse representations for many purposes

- Eye detectors can be used for humans as well as cats

- Edges and corners are essential also for cars and other objects.

Generalizing out of class and context
Since all sample images contribute to the representations, it is possible to generalize without having seen all configurations.

A network can “understand” e.g. a woman with glasses, even though it has no examples of women with glasses in the training set. This is because it has learned that men and women are similar, and that men can have glasses.

Similarly, it can learn that fur can be black, and that cats therefore may have black fur, even though there is no black cat in the training set.

Transfer learning from a pretrained network
- Neural networks share representations across classes

- A network trained on many classes and many examples has many general representations

- You can reuse these features for many different applications

- Either do a simple classification on the features, or retrain only the last layer of the network
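Training only a new last layer can be sketched as a linear softmax classifier on fixed, precomputed features (the features below are random stand-ins with an artificial label rule, not real network activations; in practice you would run your images through e.g. an ImageNet model first):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for features from a pretrained network's last
# hidden layer.
n_samples, n_features = 200, 64
features = rng.normal(size=(n_samples, n_features))
labels = (features[:, 0] > 0).astype(int)        # 2 classes, for illustration

# Train only a new last layer: a linear softmax classifier on fixed features.
W = np.zeros((n_features, 2))
for step in range(300):
    scores = features @ W
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(scores); p /= p.sum(axis=1, keepdims=True)
    p[np.arange(n_samples), labels] -= 1.0       # d(loss)/d(scores)
    W -= 0.1 * (features.T @ p) / n_samples      # gradient descent step

accuracy = ((features @ W).argmax(axis=1) == labels).mean()
```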

What can you transfer to?
- Detecting special views in ultrasound

- Initially far from ImageNet

- Benefits from fine-tuning ImageNet features

- 300 patients, 11000 images

Standard Plane Localization in Fetal Ultrasound via Domain Transferred Deep Neural Networks

Recurrent networks (time series data) – Example: predicting the next character

Training a model on “hello”

Learn relations between characters

The hidden layers have recurrent connections (here representing information relating the characters)
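One recurrent forward pass over “hello” can be sketched in numpy (untrained random weights in the style of a vanilla char-RNN; the hidden size and weight scale are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

text = "hello"
chars = sorted(set(text))                    # ['e', 'h', 'l', 'o']
ix = {c: i for i, c in enumerate(chars)}
V, H = len(chars), 8                         # vocabulary and hidden size

# Recurrent weights: the hidden state carries information between characters.
Wxh = rng.normal(size=(H, V)) * 0.1
Whh = rng.normal(size=(H, H)) * 0.1
Why = rng.normal(size=(V, H)) * 0.1

h = np.zeros(H)
for c in text[:-1]:                          # feed 'h', 'e', 'l', 'l'
    x = np.zeros(V); x[ix[c]] = 1.0          # one-hot input character
    h = np.tanh(Wxh @ x + Whh @ h)           # hidden state update
    scores = Why @ h                         # scores over the next character

# 'scores' ranks candidates for the character after "hell"; after training
# it should favour 'o'.
print(scores.shape)   # (4,)
```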

Inputting Shakespeare
- Input a lot of text

- Try to predict the next character

- Learning spelling, quotes and punctuation automatically.

https://gist.github.com/karpathy/d4dee566867f8291f086

Inputting Shakespeare – ??? – Profit?

Inputting C code (Linux kernel)

Recurrent networks for image captioning

Application examples

Segmentation

Detection
- Simpler than segmentation

- Find a bounding box around each object

Keypoint detection
Match objects in different images, track objects

Depth estimation from one or two cameras

Estimating Depth from Monocular Images as Classification Using Deep Fully Convolutional Residual Networks

Example results from deep learning

https://www.youtube.com/watch?v=9reHvktowLY

https://www.youtube.com/watch?v=rhNxt0VccsE

https://www.youtube.com/watch?v=ohmajJTcpNk

https://www.youtube.com/watch?v=qWl9idsCuLQ

Generate captions

Image Captioning and Visual Question Answering Based on Attributes and External Knowledge

Image generation

StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks

Limitations of deep learning
- Complex tasks can be difficult to learn, particularly memory-intensive tasks

- It is easy to «fool» the network

- Structured data might be necessary, and this information must sometimes be built in

- Some tasks are (currently) untrainable

- Deep learning requires good knowledge of technical details, correct use, and «dirty tricks»

Web page
• http://www.uio.no/studier/emner/matnat/ifi/inf5860/2018/index.html

• Lectures with links to lecture foils, curriculum, weekly exercises, and mandatory exercises.

• Canvas: quiz and discussion forum
– See an introductory quiz about your background

• Messages

• Urgent messages are sent to [email protected], [email protected], which link to your Ifi/UiO official email – read this!


Group sessions
• Tuesday 12–14 and Wednesday 10–12
• Led by one (or more) of the lecturers
• 2x2 hours to give you time to solve the exercises with help
• Programming exercises as Jupyter notebooks
• Some theory exercises
• Theory quiz in Canvas


Groups next week (23-24.1):

• Tuesday 12.15–13: background lecture on image convolution

• Work on exercises in basic Python and setting up Jupyter notebooks, Tuesday 13.15–14 and Wednesday

Mandatory exercises:
• Required: use Python and Tensorflow!

• Exercise 1: Implement a kNN classifier and logistic classification using stochastic gradient descent in numpy – start after the next lecture!

• Exercise 2: Implement basic neural nets and backpropagation in Python.

• Exercise 3: Implement convolutional neural nets in Tensorflow.


Using jupyter notebooks
• Three options:

– Use the Linux computers in Modula
• Or on other Linux machines: /opt/ifi/anaconda2/bin/jupyter-notebook

– Use jupyterhub.uio.no
– Download Anaconda/Python 3 to your own computer

• Start the notebook from the folder where you have downloaded the exercise

Using jupyterhub.uio.no

• Log on to jupyterhub.uio.no

• In the home directory, make a folder INF5860

• Create subfolders for each week or exercise.

• Download the zip-file to the correct subfolder

• Open a new notebook and type the following to unzip the files:
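The slide's unzip snippet is not reproduced in this transcript; below is a minimal stand-in using Python's zipfile module (the file name exercise1.zip is hypothetical, and the demo zip is created only to make the example self-contained):

```python
import pathlib
import zipfile

# Create a stand-in zip so the example runs on its own; in practice you
# would already have downloaded the exercise zip-file into this folder.
pathlib.Path("demo.txt").write_text("hello")
with zipfile.ZipFile("exercise1.zip", "w") as zf:
    zf.write("demo.txt")

# The cell you would type in the notebook to unzip the downloaded files:
with zipfile.ZipFile("exercise1.zip") as zf:
    zf.extractall()
```

In a notebook cell you could equally run a shell command such as `!unzip exercise1.zip`.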

Curriculum
• Lecture notes, weekly exercises and mandatory exercises define the curriculum.

• The field is so new and rapidly changing that no printed book covers it well.

• Parts of the lectures are based on:
– Deep Learning, Ian Goodfellow, Yoshua Bengio, Aaron Courville, available at http://www.deeplearningbook.org/

• Each lecture will give links to relevant literature

• Very good material in the Stanford courses CS 229 and CS 231n. Links will be given.


Required background
• Good knowledge of Python programming

• Experience in image analysis programming desired
– Attend Tuesday at 12.15: lecture on convolution!
– Sliding windows, convolution, edge detection, basic pixel-based classification

• Good knowledge of mathematics (as a tool, not derivations of the theory)

– Linear algebra

– Gradient-based optimization

– Gradients and differentiation

Linear algebra required knowledge

• Vectors
• Dot products
• Matrix multiplication
• Matrix transpose and inversion


Required: image gradients

● The gradient in a point has a direction and a magnitude.

● The direction is the direction with maximum slope, and the magnitude is the slope in that direction.
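This can be illustrated with numpy's finite-difference gradient on a simple intensity ramp (the image is synthetic):

```python
import numpy as np

# A small synthetic image: intensity increases from left to right.
image = np.tile(np.arange(8, dtype=float), (8, 1))

# Central-difference approximations of the partial derivatives
# (np.gradient returns derivatives along axis 0, then axis 1).
gy, gx = np.gradient(image)

magnitude = np.hypot(gx, gy)        # slope in the steepest direction
direction = np.arctan2(gy, gx)      # angle of steepest ascent

# For this ramp the gradient points along +x with magnitude 1 everywhere.
print(magnitude[4, 4], direction[4, 4])   # 1.0 0.0
```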


Required: Gradient descent optimization

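A minimal gradient-descent sketch on a 1-D quadratic (the function, learning rate and step count are illustrative):

```python
# Minimize f(w) = (w - 3)^2, whose gradient is f'(w) = 2 (w - 3).
w, lr = 0.0, 0.1
for _ in range(100):
    grad = 2.0 * (w - 3.0)
    w -= lr * grad          # step against the gradient

print(round(w, 4))          # 3.0
```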

Repetition/introduction to pixel-based classification(from INF 4300)


Repetition of pixel-based classification(from INF 4300)

• xi – feature vector for pixel i

• ωi – the class label for pixel i

• K – the number of classes given in the training data


[Figure: multiband image with n spectral channels or features, and a mask with training pixels]

Concepts in classification
• Supervised classification: a dataset with known class labels is used to train the classifier.

– Training set: used to estimate the parameters of the classifier

– Validation set: used to estimate hyperparameters

– Test set: ONLY used at the end to estimate the final classification accuracy

– Classifier accuracy: computed by comparing estimated and true labels

• Parametric classification: computing the probability that an object belongs to a class.

– Let each class be represented by a probability density function. Assign a pixel to the class with the highest probability (or smallest distance).
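A minimal sketch of such a parametric classifier, with each class modelled by a 1-D Gaussian density (the class means and the data are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two classes, each modelled by a 1-D Gaussian (illustrative feature values).
train0 = rng.normal(2.0, 1.0, 100)    # training features for class 0
train1 = rng.normal(6.0, 1.0, 100)    # training features for class 1

# "Training" = estimating the density parameters per class.
mus = np.array([train0.mean(), train1.mean()])
sigmas = np.array([train0.std(), train1.std()])

def classify(x):
    # Gaussian log-densities for each class; pick the most probable class.
    logp = -0.5 * ((x - mus) / sigmas) ** 2 - np.log(sigmas)
    return int(np.argmax(logp))

print(classify(1.5), classify(6.5))   # 0 1
```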


• Now: introduction to kNN classification based on ALL PIXELS in the image

• Introducing the CIFAR-10 data set


The CIFAR image data set


10 classes
50 000 training images, size 32x32
10 000 test images

https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf

Measuring similarity between two images

• L1-distance: d1(I1, I2) = Σp |I1(p) − I2(p)|

• L2-distance: d2(I1, I2) = √( Σp (I1(p) − I2(p))² )

L2 is called the Euclidean distance.
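Both distances in a few lines of numpy, on two random CIFAR-sized images (random pixel values, for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
I1 = rng.integers(0, 256, size=(32, 32, 3))   # two random 32x32 RGB images
I2 = rng.integers(0, 256, size=(32, 32, 3))

d1 = np.sum(np.abs(I1 - I2))                  # L1: sum of absolute differences
d2 = np.sqrt(np.sum((I1 - I2) ** 2))          # L2: Euclidean distance
```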

Nearest neighbour image classifier

• For a new image, compute the L1- or L2-distance to all images in the training data set to find the single most similar image.

• Assign the image to the same class as the most similar image.

• Note: all training images must be stored.


k-Nearest-Neighbor classification
• Classification of a new sample xi is done as follows:

– Out of the N training images, identify the k nearest neighbor images (measured by L1 or L2) in the training set, irrespective of class label.

– Out of these k samples, count the number of images ki that belong to class i, i = 1, 2, ..., M (if we have M classes).

– Assign xi to the class i with the maximum count ki.

• k should be odd, and must be selected a priori.

• The distance measure and k are hyperparameters.
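The procedure above, sketched in numpy on toy clusters standing in for flattened images (cluster positions and sizes are invented):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training samples (L2)."""
    dists = np.sqrt(np.sum((X_train - x) ** 2, axis=1))
    nearest = y_train[np.argsort(dists)[:k]]    # labels of the k neighbors
    return int(np.argmax(np.bincount(nearest))) # majority vote

# Toy data: two well-separated clusters standing in for flattened images.
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0, 1, (20, 4)), rng.normal(5, 1, (20, 4))])
y_train = np.array([0] * 20 + [1] * 20)

print(knn_predict(X_train, y_train, np.full(4, 5.0), k=3))   # 1
```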


About kNN-classification
• If k=1 (1NN-classification), each sample is assigned to the same class as the closest sample in the training data set.

• If the number of training samples is very high, this can be a good rule.

• If k→∞ (with the number of training samples growing accordingly), this is theoretically a very good classifier.

• This classifier involves no ”training time”, but the time needed to classify one pattern xi depends on the number of training samples, as the distance to all points in the training set must be computed.

• ”Practical” values for k: 3<=k<=9

• Classification performance should always be computed on the test data set.

Next week:
• Logistic regression and classification
• Introduction to loss functions and gradient descent
• Classification as a regression problem with a learning rule

• Softmax-classification.
