Image Representation Usage Guide

Lu, Wang-Chou

2014/10/01 @ 林⼜⼝口

Why Image Representation?

Machine Learning Course @ Caltech

Xi = 500 x 375 = 187,500 D (raw pixels)

Low Dimensional Vector

Map of Representation

Robust

Computation Cost

Color Histogram, PCA, Sparse Coding, Bag of Visual Word, DPM, Deep Learning…

Outline

❖ Hand Crafted Features

❖ Machine Learning approach

❖ Hierarchical Approaches

❖ When to use? Real-time vs Precision

Hand Crafted Features

❖ Color Histogram, Template, Haar Features

❖ Interest Point Detector + HOG

❖ Bag of Visual Word

Simple Features

Histogram Based

Haar Features

Template Based
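As an illustration of the histogram-based feature, here is a minimal sketch of a joint RGB color histogram. The function name `color_histogram`, the choice of 8 bins per channel, and the random test image are assumptions made for this example, not something given on the slide.

```python
import numpy as np

def color_histogram(image, bins_per_channel=8):
    """Joint RGB histogram, L1-normalized. image: (H, W, 3) uint8 array."""
    # Quantize each channel from 0..255 down to 0..bins_per_channel-1.
    q = (image.astype(np.int64) * bins_per_channel) // 256
    # Combine the three quantized channels into one bin index per pixel.
    idx = (q[..., 0] * bins_per_channel + q[..., 1]) * bins_per_channel + q[..., 2]
    hist = np.bincount(idx.ravel(), minlength=bins_per_channel ** 3).astype(np.float64)
    return hist / hist.sum()

# Synthetic 500 x 375 image (assumption: stands in for a real photo).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(375, 500, 3), dtype=np.uint8)
h = color_histogram(image)
print(h.shape)  # (512,)
```

The 187,500-pixel image collapses to a 512-D vector that ignores where each color occurs, which is what makes it cheap but not very discriminative.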

SIFT Like Approach

HOG, 3780D: 7 x 15 overlapping blocks * (normalized 2 x 2 cell grid) * 9 bins. N. Dalal et al. [CVPR 2005]

SIFT, 128D, |V| = 1: 4 x 4 grid * 8 bins. David Lowe [IJCV 2004]
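The two descriptor lengths above follow from simple counting, sketched below. The 64 x 128 detection window, 8-pixel cells and one-cell block stride are the usual settings from the Dalal et al. paper and are assumptions here, since the slide only gives the 7 x 15 block count; the function names are illustrative.

```python
def hog_length(window=(64, 128), cell=8, block=2, stride_cells=1, bins=9):
    """Length of a HOG descriptor for one detection window."""
    cells_x = window[0] // cell                        # 8 cells across
    cells_y = window[1] // cell                        # 16 cells down
    blocks_x = (cells_x - block) // stride_cells + 1   # 7 overlapping blocks
    blocks_y = (cells_y - block) // stride_cells + 1   # 15 overlapping blocks
    # Each block holds block x block cells, each cell a histogram of `bins`.
    return blocks_x * blocks_y * block * block * bins

def sift_length(grid=4, bins=8):
    """Length of a SIFT descriptor: grid x grid cells, `bins` orientations each."""
    return grid * grid * bins

print(hog_length())   # 3780
print(sift_length())  # 128
```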

Bag of Visual Word + SPM

SVM

SIFT Descriptor

S. Lazebnik et al. [CVPR 2006]
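A rough sketch of the pipeline between the SIFT descriptors and the SVM: each descriptor is assigned to its nearest visual word, and word counts are collected over a 1 x 1 and a 2 x 2 grid. This is a simplification under stated assumptions: the codebook is random rather than learnt by k-means, the pyramid levels are left unweighted (Lazebnik et al. weight them), and all function names are made up for the example.

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Count how many descriptors fall on each visual word (nearest codeword)."""
    if len(descriptors) == 0:
        return np.zeros(len(codebook))
    # Squared Euclidean distance between every descriptor and every codeword.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    return np.bincount(words, minlength=len(codebook)).astype(np.float64)

def spm_histogram(descriptors, positions, codebook, image_size, grids=(1, 2)):
    """Concatenate BoW histograms over g x g grids (unweighted), L1-normalized."""
    width, height = image_size
    parts = []
    for g in grids:
        # Grid cell of each keypoint at this pyramid level.
        cx = np.minimum((positions[:, 0] * g / width).astype(int), g - 1)
        cy = np.minimum((positions[:, 1] * g / height).astype(int), g - 1)
        for j in range(g):
            for i in range(g):
                mask = (cx == i) & (cy == j)
                parts.append(bow_histogram(descriptors[mask], codebook))
    feat = np.concatenate(parts)
    return feat / max(feat.sum(), 1.0)

rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 128))        # 16 visual words (synthetic)
descriptors = rng.normal(size=(200, 128))    # 200 SIFT-like descriptors
positions = rng.uniform([0, 0], [500, 375], size=(200, 2))
feat = spm_histogram(descriptors, positions, codebook, image_size=(500, 375))
print(feat.shape)  # (80,)
```

The resulting fixed-length vector (16 words * 5 cells here) is what would be fed to the SVM.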

Machine Learning Approach

❖ Dimensionality Reduction

❖ PCA, Manifold Learning, Sparse Coding, LSH

❖ Deformable Part Model

❖ Neural Network

❖ Convolutional Neural Network

Principal Component Analysis

M. A. Turk et al. [CVPR 1991]
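A minimal sketch of PCA as a dimensionality reduction step, in the spirit of the eigenface approach cited above. It uses an SVD of the centered data; the synthetic 20 x 20 "images", the choice of 10 components and the function names are assumptions for illustration.

```python
import numpy as np

def pca_fit(X, k):
    """Return the data mean and the top-k principal directions of X (rows = samples)."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def pca_project(X, mean, components):
    """Project samples onto the principal directions."""
    return (X - mean) @ components.T

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 400))   # 100 flattened 20 x 20 images (synthetic)
mean, comps = pca_fit(X, k=10)
Z = pca_project(X, mean, comps)
print(Z.shape)  # (100, 10)
```

Each 400-D image is now a 10-D vector of coefficients on the leading "eigen-images".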

Manifold Learning

[ISOMAP, LLE 2003]

Sparse Coding

H Lee et al. [NIPS 2007]

Objective: reconstruction error + sparsity penalty

Y: Input Vector

B: Basis Matrix

Z: Weight
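The slide's formula is not preserved in this transcript. A common way to write the two terms is 0.5 * ||Y - B Z||^2 + lam * ||Z||_1, and the sketch below assumes that form. It solves only for the weights Z with the basis B held fixed, using ISTA (iterative soft thresholding), which is a swapped-in textbook solver and not the algorithm of the cited paper. The value of `lam`, the iteration count and the synthetic data are also assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Shrink each entry of v toward zero by t (proximal operator of the L1 norm)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(y, B, z, lam=0.1):
    """Reconstruction error plus sparsity penalty."""
    return 0.5 * np.sum((y - B @ z) ** 2) + lam * np.abs(z).sum()

def sparse_code(y, B, lam=0.1, n_iter=200):
    """Minimize the objective over z with ISTA, keeping the basis B fixed."""
    L = np.linalg.norm(B, 2) ** 2          # Lipschitz constant of the gradient
    z = np.zeros(B.shape[1])
    for _ in range(n_iter):
        grad = B.T @ (B @ z - y)           # gradient of the reconstruction term
        z = soft_threshold(z - grad / L, lam / L)
    return z

rng = np.random.default_rng(0)
B = rng.normal(size=(20, 50))
B /= np.linalg.norm(B, axis=0)             # unit-norm basis vectors
z_true = np.zeros(50)
z_true[[3, 17, 41]] = [1.0, -2.0, 1.5]     # only three active basis vectors
y = B @ z_true
z = sparse_code(y, B)
print(objective(y, B, np.zeros(50)), objective(y, B, z))
```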

Locality Sensitive Hashing Embedding
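The slide does not say which hashing family is meant. The sketch below assumes the random-hyperplane variant, where each bit records on which side of a random plane the vector falls, so that vectors at a small angle agree on most bits. The 64-bit code length and the synthetic vectors are illustrative choices.

```python
import numpy as np

def lsh_signature(X, planes):
    """One bit per hyperplane: which side of the plane each row of X falls on."""
    return (X @ planes.T > 0).astype(np.uint8)

def hamming(a, b):
    """Number of differing bits between two binary codes."""
    return int(np.count_nonzero(a != b))

rng = np.random.default_rng(0)
planes = rng.normal(size=(64, 128))       # 64 random hyperplanes in 128-D
x = rng.normal(size=128)
near = x + 0.05 * rng.normal(size=128)    # small perturbation of x
far = rng.normal(size=128)                # unrelated vector
codes = lsh_signature(np.stack([x, near, far]), planes)
print(hamming(codes[0], codes[1]), hamming(codes[0], codes[2]))
```

The 128-D vector is embedded into 64 bits, and Hamming distance between codes approximates the angle between the original vectors.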

Deformable Part Model

Pedro F. Felzenszwalb et al [PAMI 2010]

Neural Network

Tanh & Sigmoid nonlinear functions
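For reference, a small sketch of the two nonlinearities and of a single neuron that applies one of them to a weighted sum. The `neuron` helper and its example weights are illustrative assumptions.

```python
import math

def sigmoid(x):
    """Logistic function, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, bias, activation=math.tanh):
    """Weighted sum of the inputs followed by a nonlinear activation."""
    return activation(sum(w * x for w, x in zip(weights, inputs)) + bias)

# tanh is a rescaled sigmoid: tanh(x) = 2 * sigmoid(2x) - 1, output in (-1, 1).
x = 0.7
print(math.tanh(x), 2.0 * sigmoid(2.0 * x) - 1.0)
print(neuron([1.0, 2.0], [0.5, -0.25], bias=0.0))  # 0.0
```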

Convolutional Neural Network

LeCun 1989

Krizhevsky et al. [NIPS2012]

ReLU
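A short sketch of ReLU next to tanh, showing why it helps deep networks train: its gradient stays at 1 for any positive input, while the tanh gradient shrinks toward zero as the input grows. Function names are illustrative.

```python
import math

def relu(x):
    """Rectified linear unit: pass positive inputs, zero out the rest."""
    return max(0.0, x)

def relu_grad(x):
    """Derivative of ReLU (taken as 0 at x = 0)."""
    return 1.0 if x > 0 else 0.0

def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2

for x in (0.5, 2.0, 5.0):
    print(x, relu_grad(x), tanh_grad(x))
```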

State of the Art

GoogLeNet 2014

MSRA 2014

Deepness Table

Convolutional Neural Network & Deformable Part Model use max pooling; the others use sum pooling, also called histogram pooling.
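The difference between the two pooling styles can be sketched on a small response map. The `pool` helper and the non-overlapping 2 x 2 regions are assumptions for this example.

```python
import numpy as np

def pool(feature_map, size, mode="max"):
    """Pool non-overlapping size x size regions of a 2-D map."""
    h, w = feature_map.shape
    # Crop to a multiple of `size`, then view the map as a grid of blocks.
    cropped = feature_map[:h - h % size, :w - w % size]
    blocks = cropped.reshape(h // size, size, w // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))   # strongest response per region
    if mode == "sum":
        return blocks.sum(axis=(1, 3))   # accumulated response, as in a histogram
    raise ValueError("mode must be 'max' or 'sum'")

fm = np.arange(16).reshape(4, 4)
print(pool(fm, 2, "max"))  # [[ 5  7] [13 15]]
print(pool(fm, 2, "sum"))  # [[10 18] [42 50]]
```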

Image Representation Usage Guide

[Figure: representations placed by deepness against the hardware they suit]

Axis: Deepness, 1 to 5

Hardware: iPhone 5s, Tegra K1 or PC, Geforce Titan, HPC

Compute: 45 GFLOPS, 370 GFLOPS, 5.1 TFLOPS

Regions: Real Time Application, Interactive Application

Representations: Color Histogram; SIFT/HOG; Bag of Visual Word; BoW + SPM, Deformable Part Model; Convolutional Neural Network

GFLOPS are for single precision. PC: i7 3.5 GHz, 4 cores, parallel computing

TRAINING TIME NOT INCLUDED

Some Tips

❖ GPU ~= 50 CPU Cores

❖ Hand crafted features are shallow; higher-level feature templates need to be learnt.

❖ Do Dimensionality Reduction

❖ Deeper Features, More Training Data

❖ Handle Invariance: Registration vs Spatial Pooling

❖ The Learnt Deep Representation (CNN) is shareable
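To illustrate the spatial pooling side of the invariance tip, the sketch below shows that a response shifted by one pixel inside the same pooling region leaves the max-pooled output unchanged, while a larger shift does not. The one-hot 4 x 4 maps and the 2 x 2 pooling size are toy assumptions.

```python
import numpy as np

def max_pool(feature_map, size):
    """Max over non-overlapping size x size regions of a 2-D map."""
    h, w = feature_map.shape
    cropped = feature_map[:h - h % size, :w - w % size]
    return cropped.reshape(h // size, size, w // size, size).max(axis=(1, 3))

def one_hot_map(row, col, shape=(4, 4)):
    """A response map with a single activation at (row, col)."""
    m = np.zeros(shape)
    m[row, col] = 1.0
    return m

a = max_pool(one_hot_map(0, 0), 2)
b = max_pool(one_hot_map(1, 1), 2)   # shifted by one pixel, same pooling region
c = max_pool(one_hot_map(2, 2), 2)   # shifted into a different pooling region
print(np.array_equal(a, b), np.array_equal(a, c))  # True False
```

Pooling only absorbs shifts smaller than the pooling region; larger misalignment is where registration comes in.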
