Building text features for object image classification
Group 1: Eddie Sun, Youngbum Kim, Yulong Wang
Which object is presented?
Why do we need text features?
Main idea & Insights
Main idea
◦ Determine which objects are present in an image based on the text that surrounds similar images.
Insights
◦ First, it is often easier to determine the image content using surrounding text than with currently available image features.
◦ Given a large enough dataset, we are bound to find very similar images to an input image, even when matching with simple image features.
Illustration for building text features
[Figure: Internet images with text; text features]
Framework of the approach
◦ K most similar images
◦ Texts of these similar images
◦ Training process
◦ Visual features: SIFT, Gist, Color, Gradient, and Unified (a combination of all the previous ones)
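The "K most similar images" step can be sketched as a nearest-neighbor search over feature vectors. This is a minimal illustration, not the presenters' implementation: the Euclidean distance, the function names, and the toy two-dimensional vectors and tags are all assumptions made for the example.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_most_similar(query, dataset, k):
    """Return the indices of the k dataset vectors closest to the query."""
    order = sorted(range(len(dataset)),
                   key=lambda i: euclidean(query, dataset[i]))
    return order[:k]

# Hypothetical toy data: three "Internet images", each described by a
# 2-D visual feature vector and a list of surrounding text tags.
internet_features = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
internet_tags = [["dog", "pet"], ["puppy", "dog"], ["car"]]

# Retrieve the 2 nearest images and collect their tags.
neighbors = k_most_similar([0.9, 0.8], internet_features, k=2)
neighbor_tags = [internet_tags[i] for i in neighbors]
```

A linear scan like this is only practical for small collections; at the scale of hundreds of thousands of Internet images, an approximate nearest-neighbor index would likely be needed.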
Experiment
Dataset
◦ The PASCAL Visual Object Classes Challenge
Experiment
Features
◦ SIFT
◦ Gist: an abstract representation of the scene that spontaneously activates memory representations of scene categories (a city, a mountain, etc.)
◦ Color: color features in the RGB space
◦ Gradient
◦ Unified: a concatenation of the above four features
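As a rough illustration of the Color and Unified features, the sketch below builds a per-channel RGB histogram and concatenates several feature vectors. The slides do not say how the color feature is computed, so the histogram, the bin count, and the function names are assumptions; the SIFT, Gist, and Gradient vectors are hypothetical placeholder values.

```python
def rgb_histogram(pixels, bins_per_channel=4):
    """Per-channel histogram of (r, g, b) pixels with values in 0..255.

    The result has 3 * bins_per_channel entries; each channel's bins
    sum to 1 when at least one pixel is given.
    """
    hist = [0.0] * (3 * bins_per_channel)
    width = 256 // bins_per_channel
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            # Clamp so that the top values fall into the last bin.
            bin_index = min(value // width, bins_per_channel - 1)
            hist[channel * bins_per_channel + bin_index] += 1.0
    total = len(pixels)
    return [h / total for h in hist] if total else hist

def unified_feature(*features):
    """Concatenate several feature vectors into one."""
    return [value for feature in features for value in feature]

# Hypothetical placeholder descriptors standing in for real SIFT, Gist,
# and Gradient outputs.
sift, gist, gradient = [0.1, 0.2], [0.3], [0.4, 0.5]
color = rgb_histogram([(255, 0, 0), (255, 0, 0)], bins_per_channel=2)
unified = unified_feature(sift, gist, color, gradient)
```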
Experiment
Summary
◦ How it works
◦ Results
How does it work?
Input image
1. Training images
2. Test images
Extract visual features
• SIFT
• Gist
• Color
• Gradient
• Unified
Get similar images based on visual features (Internet images dataset with text)
Return most similar images with their labels
• Dog, pet, animal
• Cute, puppy, canine
• Dog cool dogs, boxer
Construct text features from labels
• Text features: Dog, Puppy
Visual classifier and text classifier
Fusion classifier: merge
Final output: Dog
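The final merge step can be pictured as combining per-class scores from the visual and text classifiers. The slides do not specify how the fusion classifier works, so the weighted sum below, the `text_weight` parameter, and the example scores are assumptions chosen only to show the idea.

```python
def fuse_scores(visual_scores, text_scores, text_weight=0.5):
    """Weighted sum of per-class scores from two classifiers.

    A class missing from one classifier is treated as scoring 0.0 there.
    """
    fused = {}
    for label in visual_scores.keys() | text_scores.keys():
        v = visual_scores.get(label, 0.0)
        t = text_scores.get(label, 0.0)
        fused[label] = (1.0 - text_weight) * v + text_weight * t
    return fused

def predict(visual_scores, text_scores, text_weight=0.5):
    """Return the class with the highest fused score."""
    fused = fuse_scores(visual_scores, text_scores, text_weight)
    return max(fused, key=fused.get)

# Hypothetical scores: the visual classifier leans toward "cat",
# the text classifier strongly prefers "dog".
visual = {"dog": 0.4, "cat": 0.6}
text = {"dog": 0.9, "cat": 0.1}
label = predict(visual, text)
```

In this toy case the text classifier corrects the visual one. The weight would normally be tuned on training images rather than fixed at 0.5.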
Notes
• Unified feature: weighted average of the above 4 features
• Text features: normalized histogram of tag counts
• Learn parameters on training images
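A normalized histogram of tag counts could be built as in the sketch below. The vocabulary, the lowercasing, and the function name are assumptions for illustration; the tag lists are loosely based on the dog example from the slides.

```python
from collections import Counter

def text_feature(neighbor_tags, vocabulary):
    """Normalized histogram of tag counts over a fixed vocabulary.

    neighbor_tags is a list of tag lists, one per retrieved image.
    Tags outside the vocabulary are ignored.
    """
    counts = Counter(tag.lower() for tags in neighbor_tags for tag in tags)
    total = sum(counts[word] for word in vocabulary)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[word] / total for word in vocabulary]

# Tags of the most similar Internet images (toy data).
neighbor_tags = [
    ["Dog", "pet", "animal"],
    ["Cute", "puppy", "canine"],
    ["Dog", "cool", "dogs", "boxer"],
]
vocabulary = ["dog", "puppy", "cat"]  # hypothetical vocabulary
feature = text_feature(neighbor_tags, vocabulary)
```

Here "dog" appears twice and "puppy" once among the vocabulary words, so the feature puts two thirds of its mass on "dog", one third on "puppy", and none on "cat".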
Results
Text features are built from visual features
◦ Better visual features -> better text features
Combining visual and text classifiers
◦ Visual and text classifiers correct each other
Number of training images
◦ Small number of training images -> text classifiers outperform visual classifiers
◦ Combine -> always better
Number of Internet images in dataset
◦ 200,000 -> 600,000: big improvement
◦ 600,000 -> 1 million: very small improvement
Questions?
Thank you!