statistical learning for image orientation · statistical learning for image orientation luke...

Statistical Learning for Image Orientation

Luke Barrington

Mentors: Nuno Vasconcelos, UCSD ECE Dept.; Babak Jafarian, Cal-(IT)²

Agenda

• Project Overview
• SVM Background
• Parameter Selection
• First Experiment
• Feature Selection Methods
• Future Work / Wrap Up / Questions

Project Overview

• Develop an image classification system for MMS-enabled cell phones
• Motivation: use by spectators at the Athens 2004 Olympic Games
• Use modern classification technologies to improve server-side organization and provide useful output to users

Goal for my project: image orientation

Sports Photographers

Image Orientation

• Camera phones, scanners, digital cameras, online image databases, …
• Device output has a standard dimension

At least 50% will be incorrectly oriented

Bulk, automated, corrective processing before display to human users

System Block Diagram

[Figure: components shown are the MMS/Internet server, an unordered directory, the classifier, user input, and the ordered output.]

Classifier Block Diagram

[Figure: TRAIN takes training data with data labels (+1/−1) and the user's parameters (kernel, error weights) and produces model parameters; CLASSIFY applies the model parameters to unlabeled data and outputs a class label (+1 or −1).]

Agenda: SVM Background


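Training = Learn Decision Boundary

[Figure: −1 and +1 training examples separated by the decision boundary; the margin and the support vectors are marked.]

Classification

[Figure: test points A and B are classified by which side of the decision boundary (defined by the support vectors) they fall on: Test A = −1, Test B = +1.]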
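To make the train/classify split concrete, here is a minimal sketch that learns a linear decision boundary from labeled points and then labels two test points, in the spirit of tests A and B. It is an assumption-laden illustration, not the project's code: it swaps in the Pegasos stochastic subgradient method (rather than the quadratic-programming solver an SVM package would use), drops the bias b so the boundary passes through the origin, and uses made-up 2-D points; `train_linear_svm` and `classify` are hypothetical names.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style stochastic subgradient descent on the regularized hinge loss.
    Learns w for the boundary x . w = 0 (no bias term, a simplification)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            # Example i is inside the margin (or misclassified) under the current w.
            violates = y[i] * dot(w, X[i]) < 1.0
            # Shrink w: the gradient step for the regularization term.
            w = [(1.0 - eta * lam) * wk for wk in w]
            if violates:
                # Hinge-loss subgradient step toward y_i * x_i.
                w = [wk + eta * y[i] * xk for wk, xk in zip(w, X[i])]
    return w

def classify(w, x):
    """Label a new point by the side of the learned boundary it falls on."""
    return 1 if dot(w, x) >= 0 else -1

# Hypothetical training data: +1 examples upper-right, -1 examples lower-left.
X = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0), (-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0)]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print(classify(w, (-1.0, -1.5)), classify(w, (1.5, 1.0)))
```

Each step either shrinks w or adds a positive multiple of y_i x_i, which is componentwise positive for this toy data, so w ends up pointing along (1, 1): the point (−1, −1.5) lands on the −1 side and (1.5, 1) on the +1 side.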

Extension to Multi-Class Problems

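• SVM is a binary decision classifier
• One solution is a decision tree format
• An n-class problem requires ½n(n−1) classifiers

[Figure: decision tree for classes A, B, C, D, with pairwise classifiers A/B and C/D at the first level and A/C, A/D, B/C, B/D at the second level.]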
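One plausible reading of the tree above is a knockout: A/B and C/D are tested first, and their winners meet in one of A/C, A/D, B/C or B/D. The sketch below assumes that reading; `decide(a, b)` is a hypothetical stand-in for a trained a/b SVM that returns the winning class. All ½n(n−1) pairwise classifiers must be trained, but only n − 1 of them are evaluated per test image.

```python
from itertools import combinations

def pairwise_classifiers(classes):
    """Every class pair; an n-class problem needs n(n-1)/2 binary SVMs."""
    return list(combinations(classes, 2))

def tournament(classes, decide):
    """Walk the decision tree: winners of each pairwise test meet in the next round.
    decide(a, b) stands in for the trained a/b classifier and returns a or b."""
    remaining = list(classes)
    while len(remaining) > 1:
        winners = [decide(remaining[i], remaining[i + 1])
                   for i in range(0, len(remaining) - 1, 2)]
        if len(remaining) % 2 == 1:
            winners.append(remaining[-1])  # odd class out advances unopposed
        remaining = winners
    return remaining[0]

# Hypothetical oracle for a test image whose true class is "C".
print(tournament("ABCD", lambda a, b: b if b == "C" else a))
```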

Multi-Class: 1-vs-All Classifier

• The alternative is a 1-vs-all classifier
• Train each class's classifier on all of the training data
• Label: +1 (in the class) or −1 (in any other class)
• Requires n classifiers for an n-class problem
• Output may be undetermined (all classifiers return −1)

Class   +1   −1
a       a    {b, …, n}
b       b    {a, c, …, n}
⋮       ⋮    ⋮
n       n    {a, …, n-1}
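The relabeling in the table and the "undetermined" case can be sketched as follows. The function names are hypothetical, and each classifier's binary output is assumed to already be available.

```python
def one_vs_all_labels(labels, target):
    """Relabel the full training set for the classifier of class `target`:
    +1 for examples in the class, -1 for examples in any other class."""
    return [1 if lab == target else -1 for lab in labels]

def one_vs_all_predict(outputs):
    """outputs maps each class to its classifier's binary output (+1 or -1).
    Returns the classes that claim the example; an empty list is the undetermined case."""
    return [cls for cls, out in outputs.items() if out == 1]

print(one_vs_all_labels(["a", "b", "c", "a"], "a"))
print(one_vs_all_predict({"a": -1, "b": -1, "c": -1}))
```

More than one classifier can also return +1; a common way to break such ties, and to resolve the all −1 case, is to compare the real-valued outputs f(x) instead of their signs.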

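Linear Support Vector Machine for separable data

Assume the training points can be divided by a linear hyperplane:

$$\mathbf{x}_i \cdot \mathbf{w} + b = 0$$

where $\mathbf{w}$ is the normal vector and $|b|/\lVert\mathbf{w}\rVert$ is the perpendicular distance from the hyperplane to the origin.

With appropriate scaling, the training data satisfy:

$$\mathbf{x}_i \cdot \mathbf{w} + b \ge +1 \quad \text{for } y_i = +1$$
$$\mathbf{x}_i \cdot \mathbf{w} + b \le -1 \quad \text{for } y_i = -1$$

Combine these into a single condition:

$$y_i(\mathbf{x}_i \cdot \mathbf{w} + b) - 1 \ge 0 \quad \forall i$$

The width of the margin is $2/\lVert\mathbf{w}\rVert$. A robust classifier has the widest margin, so we minimize $\lVert\mathbf{w}\rVert^2$.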
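A small worked check of these quantities, using a hand-picked (hypothetical) hyperplane w = (0.5, 0), b = −0.5, i.e. the line x₁ = 1, and four made-up points. The points (3, 0) and (−1, 0) sit exactly on the margin hyperplanes, so their constraints hold with equality.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def check_canonical_hyperplane(w, b, X, y):
    """Return (all constraints y_i(x_i.w + b) - 1 >= 0 hold?,
    margin width 2/||w||, distance |b|/||w|| from the hyperplane to the origin)."""
    norm_w = math.sqrt(dot(w, w))
    ok = all(yi * (dot(xi, w) + b) - 1 >= 0 for xi, yi in zip(X, y))
    return ok, 2.0 / norm_w, abs(b) / norm_w

X = [(3.0, 0.0), (4.0, 2.0), (-1.0, 0.0), (-2.0, 1.0)]
y = [1, 1, -1, -1]
ok, margin, dist = check_canonical_hyperplane((0.5, 0.0), -0.5, X, y)
print(ok, margin, dist)
```

The margin of 4 matches the gap between the margin hyperplanes x₁ = −1 and x₁ = 3, and the distance of 1 matches the boundary x₁ = 1.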

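Linear SVM, separable data (contd.)

Primal Lagrangian formulation:

$$L_P = \frac{1}{2}\lVert\mathbf{w}\rVert^2 - \sum_{i=1}^{l}\alpha_i y_i(\mathbf{x}_i \cdot \mathbf{w} + b) + \sum_{i=1}^{l}\alpha_i$$

Setting the gradient of $L_P$ with respect to $\mathbf{w}$ and $b$ to zero gives the conditions:

$$\mathbf{w} = \sum_{i=1}^{l}\alpha_i y_i \mathbf{x}_i \qquad \text{and} \qquad \sum_{i=1}^{l}\alpha_i y_i = 0$$

Substituting these equality constraints into $L_P$ gives the dual formulation:

$$L_D = \sum_{i=1}^{l}\alpha_i - \frac{1}{2}\sum_{i,j}\alpha_i\alpha_j y_i y_j\,\mathbf{x}_i \cdot \mathbf{x}_j$$

SVM training now amounts to maximizing $L_D$ with respect to the weights $\alpha_i$, subject to the constraints above and $\alpha_i \ge 0$.

Support vectors are all the training points where $\alpha_i > 0$. For the classifier's purposes, all other points can be ignored.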
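A worked instance of the dual on the smallest possible problem: one +1 point at (1, 0) and one −1 point at (−1, 0). The constraint Σᵢ αᵢyᵢ = 0 forces α₁ = α₂ = a, so L_D reduces to 2a − 2a² and a coarse scan over a is enough; this scan is an illustration only, not a general QP solver. The sketch then recovers w from the stationarity condition and b from a support vector, and confirms that the dual optimum equals the primal objective ½‖w‖².

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dual_objective(alpha, X, y):
    """L_D = sum_i alpha_i - 1/2 sum_{i,j} alpha_i alpha_j y_i y_j (x_i . x_j)."""
    l = len(X)
    quad = sum(alpha[i] * alpha[j] * y[i] * y[j] * dot(X[i], X[j])
               for i in range(l) for j in range(l))
    return sum(alpha) - 0.5 * quad

def recover_w(alpha, X, y):
    """Stationarity condition: w = sum_i alpha_i y_i x_i."""
    return [sum(alpha[i] * y[i] * X[i][k] for i in range(len(X)))
            for k in range(len(X[0]))]

X = [(1.0, 0.0), (-1.0, 0.0)]
y = [1, -1]
# sum_i alpha_i y_i = 0 forces alpha_1 = alpha_2 = a, so scan a over [0, 1].
best_a = max((k / 100 for k in range(101)), key=lambda a: dual_objective([a, a], X, y))
alpha = [best_a, best_a]
w = recover_w(alpha, X, y)
b = y[0] - dot(w, X[0])  # alpha_1 > 0, so x_1 is a support vector on the margin
print(best_a, w, b, dual_objective(alpha, X, y), 0.5 * dot(w, w))
```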

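Linear SVM for non-separable data

We modify the equations from the linear, separable case by adding non-negative "slack" variables $\xi_i$. The constraints become:

$$\mathbf{x}_i \cdot \mathbf{w} + b \ge +1 - \xi_i \quad \text{for } y_i = +1$$
$$\mathbf{x}_i \cdot \mathbf{w} + b \le -1 + \xi_i \quad \text{for } y_i = -1$$
$$\xi_i \ge 0 \quad \forall i$$

A training error occurs when $\xi_i > 1$, so $\sum_i \xi_i$ is an upper bound on the number of training errors.

Now we want to minimize:

$$\frac{\lVert\mathbf{w}\rVert^2}{2} + C\left(\sum_i \xi_i\right)^k$$

where C is a user-specified cost parameter.

For k = 1 this is again a constrained optimization problem, in which we maximize:

$$L_D = \sum_{i=1}^{l}\alpha_i - \frac{1}{2}\sum_{i,j}\alpha_i\alpha_j y_i y_j\,\mathbf{x}_i \cdot \mathbf{x}_j$$

subject to:

$$0 \le \alpha_i \le C \qquad \text{and} \qquad \sum_{i=1}^{l}\alpha_i y_i = 0$$

The training points with $\alpha_i > 0$ in the solution are the support vectors.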
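For a fixed (w, b), the smallest slack satisfying each constraint is ξᵢ = max(0, 1 − yᵢ(xᵢ·w + b)). The sketch below computes these slacks and the k = 1 objective for a hypothetical boundary w = (1, 0), b = 0 and four made-up points, one of which lies on the wrong side; the function names are illustrative.

```python
def slacks(w, b, X, y):
    """Smallest xi_i >= 0 with y_i (x_i . w + b) >= 1 - xi_i, for a fixed (w, b)."""
    return [max(0.0, 1.0 - yi * (sum(a * c for a, c in zip(xi, w)) + b))
            for xi, yi in zip(X, y)]

def soft_margin_objective(w, b, X, y, C, k=1):
    """||w||^2 / 2 + C (sum_i xi_i)^k."""
    xi = slacks(w, b, X, y)
    return 0.5 * sum(c * c for c in w) + C * sum(xi) ** k

X = [(2.0, 0.0), (0.5, 0.0), (-0.5, 0.0), (-2.0, 0.0)]
y = [1, 1, 1, -1]
xi = slacks((1.0, 0.0), 0.0, X, y)
errors = sum(1 for v in xi if v > 1)  # xi_i > 1 means x_i is misclassified
print(xi, errors, sum(xi), soft_margin_objective((1.0, 0.0), 0.0, X, y, C=1.0))
```

Here (0.5, 0) is correctly classified but inside the margin (ξ = 0.5), while (−0.5, 0) is a training error (ξ = 1.5); the total slack of 2.0 bounds the single error from above.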

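Non-linear SVMs

Generalize to a non-linear decision function for the classifier. Map the data to some other (higher-dimensional) Euclidean space $\mathcal{H}$:

$$\Phi : \mathbb{R}^d \to \mathcal{H}$$

Now the training problem depends on the data only through the dot products

$$\Phi(\mathbf{x}_i) \cdot \Phi(\mathbf{x}_j)$$

Find a kernel function such that

$$K(\mathbf{x}_i, \mathbf{x}_j) = \Phi(\mathbf{x}_i) \cdot \Phi(\mathbf{x}_j)$$

The kernel never requires computing $\Phi$ explicitly, so its result stays finite and cheap to evaluate even when $\mathcal{H}$ is very high- or infinite-dimensional. Use it to get the classifier output, where $\mathbf{s}_i$ are the $N_S$ support vectors:

$$f(\mathbf{x}) = \sum_{i=1}^{N_S}\alpha_i y_i\,\Phi(\mathbf{s}_i) \cdot \Phi(\mathbf{x}) + b = \sum_{i=1}^{N_S}\alpha_i y_i\,K(\mathbf{s}_i, \mathbf{x}) + b$$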
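A quick numerical check of the kernel idea: for 2-D inputs, the homogeneous quadratic kernel K(x, z) = (x·z)² equals an ordinary dot product after the explicit map Φ(x) = (x₁², √2·x₁x₂, x₂²). The decision function then needs only K. The support vectors, weights and labels in the example call are hypothetical values chosen for illustration.

```python
import math

def phi(x):
    """Explicit feature map R^2 -> R^3 for the kernel K(x, z) = (x . z)^2."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly2_kernel(x, z):
    """Same quantity computed in the original 2-D space."""
    return sum(a * b for a, b in zip(x, z)) ** 2

def decision_function(x, support_vectors, alphas, labels, b, kernel):
    """f(x) = sum_i alpha_i y_i K(s_i, x) + b over the N_S support vectors."""
    return sum(a * yi * kernel(s, x)
               for s, a, yi in zip(support_vectors, alphas, labels)) + b

x, z = (1, 2), (3, 4)
print(poly2_kernel(x, z), sum(p * q for p, q in zip(phi(x), phi(z))))
print(decision_function((2, 1), [(1, 0), (0, 1)], [0.5, 0.5], [1, -1], 0.0, poly2_kernel))
```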

Agenda: Parameter Selection


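SVM Parameters

Kernel:

• Linear (dot product): $K(\mathbf{x}_i, \mathbf{x}_j) = \mathbf{x}_i \cdot \mathbf{x}_j$
• Polynomial: $K(\mathbf{x}_i, \mathbf{x}_j) = (\mathbf{x}_i \cdot \mathbf{x}_j + 1)^p$
• RBF: $K(\mathbf{x}_i, \mathbf{x}_j) = e^{-\lVert\mathbf{x}_i - \mathbf{x}_j\rVert^2 / 2\sigma^2}$
• Sigmoid (neural network): $K(\mathbf{x}_i, \mathbf{x}_j) = \tanh(\kappa\,\mathbf{x}_i \cdot \mathbf{x}_j - \delta)$

C (error penalty):

• Sets an upper bound on the α coefficients
• A higher C gives more weight to outliers (training errors are penalized more heavily)
• A low C assumes many data points are outliers
• Assign a different C to each class based on the number of training data points in that class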
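The four kernels above translate directly into Python. The per-class C helper is a hypothetical illustration of "a different C for each class": it scales a base C by n / (number of classes × class count), the same balancing formula scikit-learn uses for `class_weight="balanced"`, so that errors on the rarer class cost more. The slides do not specify which scaling the project used.

```python
import math
from collections import Counter

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def linear_kernel(xi, xj):
    return dot(xi, xj)

def polynomial_kernel(xi, xj, p=2):
    return (dot(xi, xj) + 1) ** p

def rbf_kernel(xi, xj, sigma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sq_dist / (2 * sigma ** 2))

def sigmoid_kernel(xi, xj, kappa=1.0, delta=0.0):
    return math.tanh(kappa * dot(xi, xj) - delta)

def per_class_c(labels, base_c=1.0):
    """Hypothetical per-class C: inversely proportional to each class's size."""
    counts = Counter(labels)
    n = len(labels)
    return {cls: base_c * n / (len(counts) * cnt) for cls, cnt in counts.items()}

print(rbf_kernel([0, 0], [1, 0], sigma=1.0), per_class_c([1, 1, 1, -1]))
```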

Parameter Selection

• "Research" = correct choice of parameters
• The radial basis function (RBF) kernel generalizes best
• With the RBF kernel, the parameter space is 2-D (σ, C)
• Find the optimum set using grid search

[Figure: grid search over σ and C.]
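A minimal grid-search sketch over the 2-D (σ, C) space. The `evaluate(sigma, C)` callback is hypothetical: in practice it would train an RBF SVM on the training set and return accuracy on the validation set. The log-spaced grids are also an assumption, since the slides do not list the candidate values. Libraries that write the RBF kernel as exp(−γ‖x − z‖²), such as LIBSVM and scikit-learn, need γ = 1/(2σ²).

```python
import math

def grid_search(evaluate, sigmas, cs):
    """Exhaustive search over the (sigma, C) grid; returns (best_score, sigma, C)."""
    best = None
    for sigma in sigmas:
        for c in cs:
            score = evaluate(sigma, c)
            if best is None or score > best[0]:
                best = (score, sigma, c)
    return best

def sigma_to_gamma(sigma):
    """Convert the slide's sigma to gamma for kernels written exp(-gamma ||x - z||^2)."""
    return 1.0 / (2.0 * sigma ** 2)

# Log-spaced candidate grids (an assumption).
sigmas = [2.0 ** k for k in range(-2, 6)]
cs = [2.0 ** k for k in range(-2, 8)]

# Toy stand-in for validation accuracy, peaking at sigma = 4, C = 32.
toy_score = lambda s, c: -(math.log2(s) - 2) ** 2 - (math.log2(c) - 5) ** 2
print(grid_search(toy_score, sigmas, cs), sigma_to_gamma(4.0))
```

Log-spaced grids are a common choice because useful σ and C values often span several orders of magnitude; a finer grid can then be searched around the best coarse point.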

Agenda: First Experiment


Sample Problem: Car Orientation

[Figure: example car images labeled +1 and −1.]

Feature Space

• Grayscale pixel values (0 to 255)
• No input scaling
• Down-sample each image to 128x128 and row-scan it
• Input data is a 16384-dimensional vector (128 × 128)
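The raw-pixel feature vector can be sketched as a resize followed by a row scan. The slides do not say which resampling method was used; nearest-neighbour sampling here is an assumption, and `downsample` and `row_scan` are hypothetical names. Images are plain lists of rows of 0-255 grayscale values.

```python
def downsample(img, size=128):
    """Nearest-neighbour resize of a 2-D grayscale image (list of rows) to size x size."""
    h, w = len(img), len(img[0])
    return [[img[r * h // size][c * w // size] for c in range(size)] for r in range(size)]

def row_scan(img):
    """Concatenate rows top to bottom into one vector; raw 0-255 values, no scaling."""
    return [float(p) for row in img for p in row]

# Hypothetical 256x256 test pattern.
img = [[(r + c) % 256 for c in range(256)] for r in range(256)]
v = row_scan(downsample(img))
print(len(v))
```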

Data Set

• From MIT labs
• 516 images
• Training set: 400
• Validation set: 50
• RBF kernel, σ = 4, C = 25

Car Orientation Results

• Classifiers for {0°, 30°, 60°, 90°, 120°, 150°, 180°}
• In-class detection accuracy = 97.5 ± 2%
• False positives < 2%

Initial Results

• SVM solves this problem easily on a limited domain
• Training time can be significant (large feature size)
• Classification is almost instantaneous

Agenda: Feature Selection Methods


Expansion of Training Set

The efficacy of the classifier depends on its training data. Sources:

• Corel Database
• Sports images
• Crawled web sites

• Use subclasses to choose the feature space
• Train the "final" classifier on everything

Feature Space

• Grayscale pixel values
• Square images
• Each image provides 4 training examples
• Down-sample each image to 112x112
• Input data is a 12544-dimensional vector (112 × 112)

[Figure: example training images labeled +1 and −1.]
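One plausible way a square image yields 4 training examples is to rotate it by 0°, 90°, 180° and 270°, which keeps its dimensions unchanged. The sketch assumes that scheme and also assumes a single "upright vs. not upright" classifier, labeling the original +1 and the three rotations −1. Neither assumption is stated on the slides, and the function names are hypothetical.

```python
def rotate90(img):
    """Rotate a square image (list of rows) 90 degrees clockwise."""
    n = len(img)
    return [[img[n - 1 - c][r] for c in range(n)] for r in range(n)]

def orientation_examples(img):
    """Four (image, label) pairs: upright labeled +1, the 90/180/270-degree rotations -1."""
    examples = [(img, 1)]
    rotated = img
    for _ in range(3):
        rotated = rotate90(rotated)
        examples.append((rotated, -1))
    return examples

for image, label in orientation_examples([[1, 2], [3, 4]]):
    print(label, image)
```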

Results

• Best Corel class: 70% detection rate, 3% false positives
• Worst Corel class: 0% detection rate, 30% false positives
• Sports images: 63% detection rate, 14% false positives

Still only working on one specific class at a time.
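The slides do not define the two reported rates. Assuming the standard definitions, detection rate = TP / (TP + FN) and false-positive rate = FP / (FP + TN), they can be computed from +1/−1 labels as below; the function name and the toy labels are hypothetical.

```python
def detection_and_false_positive_rates(y_true, y_pred):
    """Detection rate = TP / (TP + FN); false-positive rate = FP / (FP + TN).
    Labels are +1 / -1; both classes must be present in y_true."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == -1)
    return tp / (tp + fn), fp / (fp + tn)

y_true = [1, 1, 1, 1, -1, -1, -1, -1, -1, -1]
y_pred = [1, 1, -1, -1, 1, -1, -1, -1, -1, -1]
print(detection_and_false_positive_rates(y_true, y_pred))
```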

New Feature Space

• Pixel data is over-sized and under-powered
• Need to capture edge information in a more succinct and compact format
• Edge orientation histogram
• Eigenvalues of the gradient matrix

Gradient Matrix

• The gradient matrix H comes from edge detection
• At an ideal edge of strength k, the gradient is $(k, 0)^T$ for vertical edges and $(0, k)^T$ for horizontal edges

•Construct a gradient field on an N × N grid of image sub-blocks

•Sub-block size depends on image size

•Eigenvectors of H are features

•Use N = 16, 8, 4, 2, 1 to capture both local and global features

•Feature size = $2(16^2 + 8^2 + 4^2 + 2^2 + 1^2) = 682$
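A sketch of the multi-scale edge feature, under stated assumptions: gradients come from central differences, H for each sub-block is the 2×2 matrix of summed products [[Σgx², Σgx·gy], [Σgx·gy, Σgy²]], and the 2 features per block are taken to be the components of H's principal unit eigenvector (the slides mention both eigenvalues and eigenvectors without fixing the choice). With N = 16, 8, 4, 2, 1 this gives 2 × 341 = 682 numbers. All function names are hypothetical.

```python
import math

def gradients(img):
    """Central-difference x and y gradients of a 2-D grayscale image (edges clamped)."""
    h, w = len(img), len(img[0])
    gx = [[(img[r][min(c + 1, w - 1)] - img[r][max(c - 1, 0)]) / 2.0 for c in range(w)]
          for r in range(h)]
    gy = [[(img[min(r + 1, h - 1)][c] - img[max(r - 1, 0)][c]) / 2.0 for c in range(w)]
          for r in range(h)]
    return gx, gy

def principal_eigvec(a, b, d):
    """Unit eigenvector of the largest eigenvalue of the symmetric matrix [[a, b], [b, d]]."""
    if a == 0 and b == 0 and d == 0:
        return (0.0, 0.0)  # flat block: no edge direction
    tr, det = a + d, a * d - b * b
    lam = tr / 2 + math.sqrt(max(tr * tr / 4 - det, 0.0))
    if abs(b) > 1e-12:
        v = (lam - d, b)
    else:
        v = (1.0, 0.0) if a >= d else (0.0, 1.0)
    n = math.hypot(v[0], v[1])
    return (v[0] / n, v[1] / n)

def block_features(gx, gy, n):
    """Split the gradient field into an n x n grid of sub-blocks; 2 features per block."""
    h, w = len(gx), len(gx[0])
    feats = []
    for i in range(n):
        for j in range(n):
            a = b = d = 0.0
            for r in range(i * h // n, (i + 1) * h // n):
                for c in range(j * w // n, (j + 1) * w // n):
                    a += gx[r][c] * gx[r][c]
                    b += gx[r][c] * gy[r][c]
                    d += gy[r][c] * gy[r][c]
            feats.extend(principal_eigvec(a, b, d))
    return feats

def edge_features(img, scales=(16, 8, 4, 2, 1)):
    """Concatenate block features over all scales (local to global)."""
    gx, gy = gradients(img)
    feats = []
    for n in scales:
        feats.extend(block_features(gx, gy, n))
    return feats

# Hypothetical 32x32 image with one vertical edge down the middle.
img = [[0 if c < 16 else 255 for c in range(32)] for _ in range(32)]
f = edge_features(img)
print(len(f), f[-2:])
```

For a vertical edge the whole-image (N = 1) block reports the direction (1, 0), matching the (k, 0)ᵀ gradient; note that an eigenvector's sign is arbitrary, so a real system might fold (x, y) and (−x, −y) together.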

Using Color Data

• Pixel intensity alone requires a 12544-dimensional input space
• Adding color (three channels instead of one) increases this by 200%
• HSV-space mean & variance at the image borders (Wang & Zhang 2003)
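A compact color feature in the spirit of "HSV mean & variance at image borders" can be sketched with Python's standard `colorsys` module. The strip width, the use of four separate border strips, and the resulting 24-number layout (4 borders × 3 channels × mean and variance) are assumptions for illustration, not the cited method's exact recipe. Hue is circular, so its plain arithmetic mean is a simplification.

```python
import colorsys

def border_hsv_features(rgb, width=2):
    """Mean and variance of H, S, V over the top, bottom, left and right border strips.
    rgb is a list of rows of (r, g, b) tuples with values in 0-255."""
    h, w = len(rgb), len(rgb[0])
    strips = [
        [rgb[r][c] for r in range(width) for c in range(w)],          # top
        [rgb[r][c] for r in range(h - width, h) for c in range(w)],   # bottom
        [rgb[r][c] for r in range(h) for c in range(width)],          # left
        [rgb[r][c] for r in range(h) for c in range(w - width, w)],   # right
    ]
    feats = []
    for strip in strips:
        hsv = [colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0) for r, g, b in strip]
        for ch in range(3):  # H, S, V
            vals = [p[ch] for p in hsv]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals)
            feats.extend([mean, var])
    return feats

# Hypothetical uniformly red 8x8 image.
red = [[(255, 0, 0)] * 8 for _ in range(8)]
print(len(border_hsv_features(red)), border_hsv_features(red)[:6])
```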

Current Work

• Apply the new feature space to the experiments reviewed above
• Explore color feature representations
• Train a general system across all classes
• Optimize the system to work with camera phone output
• Develop a web site / application for digital photos
• Add external modules (e.g. face detection, scene classification) to improve performance

Agenda: Future Work / Wrap Up / Questions

