BUILDING TEXT FEATURES FOR OBJECT IMAGE CLASSIFICATION

Gang Wang, Derek Hoiem, David Forsyth

Upload: nitara

Post on 08-Feb-2016

TRANSCRIPT

Page 1: Building Text features for object image classification

BUILDING TEXT FEATURES FOR OBJECT IMAGE CLASSIFICATION

Gang Wang, Derek Hoiem, David Forsyth

Page 2: Building Text features for object image classification

MAIN IDEA

Text-based image features are built using an auxiliary dataset of internet images annotated with tags.

This helps a visual classifier presented with an object viewed under novel circumstances.

So, basically: text classifier + image classifier = unified classifier.

Page 3: Building Text features for object image classification

WHAT ARE THEY TRYING TO DO?

Page 4: Building Text features for object image classification

CHALLENGES

Determine which objects are present in an image based on the text that surrounds similar images drawn from large collections.

Sounds easy, but the following vary: object appearance, pose, illumination.

Page 5: Building Text features for object image classification

LOW-LEVEL FEATURES CAN RESCUE, BUT...

Color, texture, and SIFT features can help if we have millions of training samples, but this is unrealistic.

So what can help? Millions of images on the internet: they are not tagged, but the text associated with them helps classification.

Page 6: Building Text features for object image classification

EUREKA!!!!!!

Easier to determine image content using surrounding text than with currently available image features.

Given a large enough dataset, we are bound to find images very similar to an input image. So they infer likely text for an input image from the text of similar images.

Page 7: Building Text features for object image classification

THE COMMON APPROACH

Approach: improve annotation quality or filter spurious search results that can be used for training.

The problem: noise or ambiguity in annotations can easily nullify any benefit.

Proposal: learn a distance metric that causes images with similar surrounding text to be similar in visual feature space.

Page 8: Building Text features for object image classification

THEIR APPROACH

Build text features for object image classification as they are expected to capture direct semantic meaning of an image.

Page 9: Building Text features for object image classification

APPROACH EXPLAINED

Dataset = training + test images.
Auxiliary dataset = internet images (Flickr) that have associated text.

For each training image:
Extract visual features.
Find the K nearest neighbor images from the internet dataset.
Use the text associated with these internet images to build a text feature.
Train!

Repeat for visual features and combine both.

Page 10: Building Text features for object image classification

VISUAL FEATURES

SIFT:

Used for image matching and object recognition.
They use it to detect and describe local patches.
Extract 1000 local patches from each image.
The descriptors are quantized into 1000 clusters, and each patch is denoted by a cluster index.
Finally, each image is represented as a normalized histogram of cluster indices.
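The quantization step above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the Euclidean assignment rule, and the toy 2-D descriptors (standing in for real SIFT descriptors and 1000 cluster centers) are all assumptions, and the list of descriptors is assumed to be non-empty.

```python
import math
from collections import Counter

def nearest_center(descriptor, centers):
    """Index of the cluster center closest to the descriptor (Euclidean)."""
    return min(range(len(centers)), key=lambda k: math.dist(descriptor, centers[k]))

def bow_histogram(descriptors, centers):
    """Normalized histogram of cluster indices for one image."""
    counts = Counter(nearest_center(d, centers) for d in descriptors)
    total = len(descriptors)
    return [counts[k] / total for k in range(len(centers))]

# Toy data (hypothetical): 2-D "descriptors" and 3 centers instead of 1000.
centers = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, 0.0), (0.9, 1.2), (1.1, 0.8), (4.8, 5.1)]
hist = bow_histogram(descriptors, centers)
```

Each histogram entry is the fraction of patches assigned to that cluster, so the entries sum to 1 regardless of how many patches an image has.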

Page 11: Building Text features for object image classification

GIST:
Powerful in scene categorization and retrieval.
They represent each image as a 960-dimensional GIST descriptor.

Color:
Quantize each channel to 8 bins.
Each pixel value is represented as an integer between 1 and 512 (8 × 8 × 8 = 512 joint bins).
512-dimensional histogram for each image.
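As a rough sketch of the color feature, the Python below maps each pixel to one of 512 joint bins and counts them. The RGB color space, the 8-bit channel range, the bin ordering, and the final normalization are assumptions made for illustration; the slide only states 8 bins per channel and a 512-dimensional histogram.

```python
def color_bin(r, g, b, bins_per_channel=8):
    """Map an 8-bit RGB pixel to a joint bin index in 1..512."""
    step = 256 // bins_per_channel  # 32 intensity levels per bin
    index = (r // step) * bins_per_channel ** 2 + (g // step) * bins_per_channel + (b // step)
    return index + 1  # 1-based, matching the range given on the slide

def color_histogram(pixels, bins_per_channel=8):
    """Histogram over joint color bins, normalized to sum to 1 (an assumption)."""
    hist = [0.0] * bins_per_channel ** 3
    for r, g, b in pixels:
        hist[color_bin(r, g, b, bins_per_channel) - 1] += 1.0
    n = len(pixels)
    return [h / n for h in hist]

# Toy image (hypothetical): four pixels given as (r, g, b) tuples.
pixels = [(0, 0, 0), (255, 255, 255), (255, 255, 255), (10, 20, 30)]
hist = color_histogram(pixels)
```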

Page 12: Building Text features for object image classification

Gradient:
Can be considered a global and coarse SIFT feature.
Divide the image into 4 × 4 cells.
In each cell, quantize the gradient into 16 bins.
The whole image is represented as a 256-dimensional vector (4 × 4 × 16 = 256).

Unified:
Concatenation of the 4 previously described features.
Let the above features be f1, f2, f3, f4.
Resultant feature: [w1f1, w2f2, w3f3, w4f4]
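The weighted concatenation can be written out directly. In this small Python sketch the function name and the toy vectors and weights are hypothetical; real inputs would be the SIFT, GIST, color, and gradient vectors.

```python
def unified_feature(features, weights):
    """Concatenate per-type feature vectors, each scaled by its weight."""
    if len(features) != len(weights):
        raise ValueError("need one weight per feature type")
    unified = []
    for f, w in zip(features, weights):
        unified.extend(w * x for x in f)
    return unified

# Toy vectors (hypothetical) standing in for f1..f4.
f1, f2, f3, f4 = [0.5, 0.5], [1.0], [0.2, 0.8], [0.0, 1.0]
weights = [2.0, 1.0, 0.5, 0.0]
u = unified_feature([f1, f2, f3, f4], weights)
```

A weight of zero removes a feature type from any distance computed on the unified vector, which is why the weights can be learned rather than fixed by hand.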

Page 13: Building Text features for object image classification

HOW TO FIND WEIGHTS:

Learn the weights from training images.
The aim is to force images from the same category to be close, and images from different categories to be far apart.
Randomly select N pairs of images from the training set.
For the i-th pair, Si = 1 if the two images share at least one object class; otherwise Si = 0.
Calculate the chi-square distance on each feature fj for the i-th pair, then learn the weights.
This can be solved directly using “fmincon” in Matlab.
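To make the weight-learning idea concrete, here is a hedged Python sketch. The objective is an assumption, since the slide's formula is not shown here: it fits exp(-sum_j w_j d_ij) to Si, so same-class pairs end up with a small weighted distance. Projected gradient descent with non-negative weights stands in for fmincon, and the pair distances are made up.

```python
import math

def loss_and_grad(w, dists, same):
    """Assumed objective: sum_i (exp(-sum_j w_j * d_ij) - S_i)^2, with its gradient."""
    loss = 0.0
    grad = [0.0] * len(w)
    for d, s in zip(dists, same):
        p = math.exp(-sum(wj * dj for wj, dj in zip(w, d)))
        loss += (p - s) ** 2
        for j, dj in enumerate(d):
            grad[j] += 2.0 * (p - s) * p * (-dj)
    return loss, grad

def learn_weights(dists, same, steps=2000, lr=0.1):
    """Projected gradient descent; clipping at zero enforces w_j >= 0."""
    w = [1.0] * len(dists[0])
    for _ in range(steps):
        _, grad = loss_and_grad(w, dists, same)
        w = [max(0.0, wj - lr * gj) for wj, gj in zip(w, grad)]
    return w

# Toy pairs (hypothetical): feature 0 separates same-class pairs (S=1)
# from different-class pairs (S=0); feature 1 carries no class information.
dists = [(0.1, 0.5), (0.2, 0.9), (1.5, 0.5), (2.0, 0.9)]
same = [1, 1, 0, 0]
w = learn_weights(dists, same)
```

On this toy data the uninformative feature is driven toward zero weight while the informative one keeps a positive weight, which is the behavior the slide describes.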

Page 14: Building Text features for object image classification

CHI SQUARE???

Chi-square distance (http://www.stat.lsu.edu/faculty/moser/exst7037/geometry.pdf):

Denominator is the normalization component for each point in X.

So for n dimensions:
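A minimal Python sketch of a chi-square distance between two n-dimensional histograms follows. It assumes the common form 0.5 * sum_k (x_k - y_k)^2 / (x_k + y_k); the exact variant on the slide may differ, so treat the constant factor and the handling of empty bins as assumptions.

```python
def chi_square_distance(x, y):
    """0.5 * sum_k (x_k - y_k)^2 / (x_k + y_k), skipping bins empty in both."""
    if len(x) != len(y):
        raise ValueError("histograms must have the same length")
    total = 0.0
    for xk, yk in zip(x, y):
        denom = xk + yk  # per-bin normalization term
        if denom > 0:
            total += (xk - yk) ** 2 / denom
    return 0.5 * total
```

For normalized histograms this form is 0 for identical inputs and 1 for histograms with no overlapping bins. The per-bin denominator means a difference in a sparsely populated bin counts for more than the same difference in a heavily populated one.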

Page 15: Building Text features for object image classification

FMINCON?????

Finds the minimum of a constrained nonlinear multivariable function.

x = fmincon(fun,x0,A,b)
x = fmincon(fun,x0,A,b,Aeq,beq)
x = fmincon(fun,x0,A,b,Aeq,beq,lb,ub)
...

http://www.mathworks.com/help/toolbox/optim/ug/fmincon.html

Page 16: Building Text features for object image classification

AUXILIARY DATASET

Collected from Flickr: 1 million images in total.
Of these, 700,000 images were collected for 58 object categories whose names come from the PASCAL and Caltech 256 datasets.
The rest were collected from a group called “10 million photos”: random images.

Page 17: Building Text features for object image classification

TEXT FEATURES

For each training/test image:
Find the K nearest neighbor images from the auxiliary dataset.
Extract the text associated with these images.
Build text features.

“Dogs! Dogs! Dogs!” treated as a single item.

Use only the frequent tags and group names (6000 in total) in the auxiliary dataset.

Text feature is a normalized histogram of tag and group name counts.
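The text-feature construction can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the neighbor search uses Euclidean distance on toy 2-D visual features, and the three-item vocabulary stands in for the 6000 frequent tags and group names.

```python
import math
from collections import Counter

def k_nearest(query, aux_features, k):
    """Indices of the k auxiliary images closest to the query (Euclidean here)."""
    order = sorted(range(len(aux_features)), key=lambda i: math.dist(query, aux_features[i]))
    return order[:k]

def text_feature(query, aux_features, aux_tags, vocabulary, k):
    """Normalized histogram of vocabulary items over the k nearest neighbors' text."""
    counts = Counter()
    for i in k_nearest(query, aux_features, k):
        counts.update(t for t in aux_tags[i] if t in vocabulary)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[t] / total for t in vocabulary]

# Toy auxiliary set (hypothetical): visual features plus tags and group names.
vocabulary = ["dog", "Dogs! Dogs! Dogs!", "car"]
aux_features = [(0.0, 0.0), (0.1, 0.1), (5.0, 5.0)]
aux_tags = [["dog", "Dogs! Dogs! Dogs!"], ["dog", "puppy"], ["car"]]
feat = text_feature((0.05, 0.0), aux_features, aux_tags, vocabulary, k=2)
```

Here the two nearest neighbors contribute "dog" twice and the group name once, while "puppy" is dropped because it is outside the vocabulary.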

Page 18: Building Text features for object image classification

CLASSIFIER

SVM classifier with a chi-squared kernel for text features.

The same classifier is used for visual features as well.

Page 19: Building Text features for object image classification

FUSION

Build a visual classifier.
Build a text classifier.
A third classifier is trained to combine the confidence values of the above two and give the final prediction.

The final classifier is a logistic regression, trained on a validation set.
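The fusion step can be illustrated with a tiny logistic regression over the two confidence values. This Python sketch is an assumption-laden stand-in: plain batch gradient descent replaces whatever solver was actually used, and the validation confidences and labels are invented.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_fusion(conf_pairs, labels, steps=5000, lr=0.5):
    """Fit p(y=1) = sigmoid(a*visual + b*text + c) by batch gradient descent."""
    a = b = c = 0.0
    n = len(labels)
    for _ in range(steps):
        ga = gb = gc = 0.0
        for (v, t), y in zip(conf_pairs, labels):
            err = sigmoid(a * v + b * t + c) - y
            ga += err * v
            gb += err * t
            gc += err
        a -= lr * ga / n
        b -= lr * gb / n
        c -= lr * gc / n
    return a, b, c

def fuse(params, v, t):
    """Combined confidence for one image from its visual and text confidences."""
    a, b, c = params
    return sigmoid(a * v + b * t + c)

# Toy validation set (hypothetical): (visual confidence, text confidence), label.
val = [(1.2, 0.8), (0.3, 1.5), (-0.4, 0.9), (-1.0, -0.7), (0.5, -1.2), (-0.2, -0.3)]
labels = [1, 1, 1, 0, 0, 0]
params = train_fusion(val, labels)
```

Training the combiner on held-out validation data, rather than on the data used to fit the two base classifiers, keeps it from trusting confidence values that are overly optimistic on training images.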

Page 20: Building Text features for object image classification

RESULTS

PASCAL VOC 2006: 10 object categories.
PASCAL VOC 2007: 20 object categories.

Performance is measured quantitatively using AUC (area under the ROC curve) for the 2006 dataset and AP (average precision) for the 2007 dataset.

Use 150 nearest neighbor images in all experiments.
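For reference, both metrics can be computed from classifier scores and binary labels as in the Python sketch below. The function names are hypothetical, and the AP here is the plain non-interpolated version; the official VOC evaluation uses an interpolated variant, so this sketch would not reproduce benchmark numbers exactly.

```python
def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(scores, labels):
    """Mean of the precision values at the rank of each positive (non-interpolated)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precisions.append(hits / rank)
    if not precisions:
        raise ValueError("need at least one positive")
    return sum(precisions) / len(precisions)
```

With scores [0.9, 0.8, 0.7, 0.6] and labels [1, 0, 1, 0], three of the four positive/negative pairs are ordered correctly, and the two positives sit at ranks 1 and 3.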

Page 21: Building Text features for object image classification

PERFORMANCE METRICS

Performance of text features built with different visual features.

Effects of combining text and visual classifiers.

Effects of varying the number of training images.

Performance of the text features built with a varying number of internet images.

Effects of category names.

Page 22: Building Text features for object image classification

For the 2006 dataset:
Text classifier outperforms GIST KNN for each feature.
Unified is best amongst all.
Combination(V) etc. are obtained by training a logistic regression classifier on the validation dataset using the confidence values returned by the individual classifiers.

Page 27: Building Text features for object image classification

VARYING NUMBER OF AUXILIARY IMAGES

Page 28: Building Text features for object image classification

EXCLUDING CATEGORY NAMES

Page 29: Building Text features for object image classification

QUESTIONS???