spot the dog: an overview of semantic retrieval of unannotated images in the semantic gap project

Spot the Dog: An overview of semantic retrieval of unannotated images in the Semantic Gap project

Semantic Image Retrieval - The User Perspective

Jonathon Hare

Intelligence, Agents, Multimedia Group

School of Electronics and Computer Science, University of Southampton

{jsh2}@ecs.soton.ac.uk

The previous talks described the issues associated with image retrieval from the practitioner's perspective: the problem that has become known as the ‘semantic gap’.

This presentation explores how novel computational and mathematical techniques can help improve content-based multimedia search by enabling textual search of unannotated imagery.

Introduction

Unannotated Imagery

Manually constructing metadata in order to index images is expensive.

Perhaps US$1-$5 per image for simple keywording.

More for archival-quality metadata (keywords, caption, title, description, dates, times, events).

Every day, the number of images is increasing.

In many domains, manually indexing everything is an impossible task!

Unannotated Imagery An Example

Kennel club image collection.

Relatively small (~60,000 images).

~7000 of those digitised.

~3000 of those have subject metadata (mostly keywords); the remainder have little or no information.

Each year, after the Crufts dog show, they expect to receive additional digital images (of the order of a few thousand) with little if any metadata other than the date and time (and only then if the camera is set up correctly).

An Overview of Our Approach

Conceptually simple idea: Teach a machine to learn the relationship between visual features of images and the metadata that describes them.

So, two stages:

Use exemplar image/metadata pairs to learn relationships.

Project learnt relationships to images without metadata in order to make them searchable.

Modelling Visual Information

To model the visual content of an image, we can extract descriptors, or feature-vectors.

Feature-vectors can describe many different aspects of the image content.

Low level features:

Fourier transforms, wavelet decomposition, texture histograms, colour histograms, shape primitives, filter primitives, etc.

Higher-level features:

Faces, objects, etc.

Visual Term Representations

A modern approach to modelling the content of an image is to treat it like a textual document.

Model an image as a collection of “visual terms”.

Analogous to words in a text document.

Feature-vectors can be transformed into visual terms through some mapping.

Visual Term Representations Bag-of-Terms

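For indexing purposes, we often discard the order and arrangement of terms and simply count the number of occurrences of each term.

Text: “The quick brown fox jumped over the lazy dog”

Terms: brown dog fox jumped lazy over quick the

Counts: [ 1 1 1 1 1 1 1 2 ]

An image can be summarised in the same way, as a vector of visual-term counts such as [ 1 2 0 0 6 ].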
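The counting step can be sketched in a few lines of Python. This is a minimal illustration, assuming simple lowercase whitespace tokenisation (the slides do not specify a tokeniser); `bag_of_terms` is a hypothetical helper name. It reproduces the term list and counts for the example sentence.

```python
from collections import Counter


def bag_of_terms(text: str) -> tuple[list[str], list[int]]:
    """Return the sorted vocabulary of `text` and the occurrence count of each term.

    Word order is discarded; only the number of occurrences is kept.
    """
    # Simple tokenisation (an assumption): lowercase and split on whitespace.
    counts = Counter(text.lower().split())
    vocabulary = sorted(counts)
    return vocabulary, [counts[term] for term in vocabulary]


terms, vector = bag_of_terms("The quick brown fox jumped over the lazy dog")
print(terms)   # ['brown', 'dog', 'fox', 'jumped', 'lazy', 'over', 'quick', 'the']
print(vector)  # [1, 1, 1, 1, 1, 1, 1, 2]
```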

Visual Term Representations Example: Global Colour Visual Terms

A common way of indexing the global colours used in an image is the colour histogram.

Each bin of the histogram counts the number of pixels whose colour falls within the range represented by that bin.

The colour histogram can thus be used directly as a term occurrence vector in which each bin represents a visual term.

[Figure: an example image and its colour histogram. Bin counts shown: 1569, 3408, 491, 0, 0, 902, 2146, 5026, 0, 0, 56, 3633, 0, 0, 0, 6827.]
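A minimal sketch of a colour histogram used as a visual-term vector, assuming NumPy, RGB input, and uniform quantisation into `levels` levels per channel (giving `levels**3` terms). The slides do not state which colour space or binning the authors used, and `colour_histogram_terms` is a hypothetical name.

```python
import numpy as np


def colour_histogram_terms(image: np.ndarray, levels: int = 4) -> np.ndarray:
    """Count the pixels of an H x W x 3 uint8 RGB image in each of levels**3 colour bins.

    Each bin index is treated as one visual term, so the result is a term
    occurrence vector of length levels**3.
    """
    # Map each 0-255 channel value to a level in 0..levels-1 (uniform bins: an assumption).
    quantised = (image.astype(np.int64) * levels) // 256
    # Combine the three channel levels into a single bin index, i.e. a visual-term id.
    r, g, b = quantised[..., 0], quantised[..., 1], quantised[..., 2]
    term_ids = (r * levels + g) * levels + b
    # Count the pixels falling in each bin, keeping empty bins as zeros.
    return np.bincount(term_ids.ravel(), minlength=levels ** 3)


# A 2 x 2 test image: two red pixels, one black pixel, one white pixel.
image = np.array([[[255, 0, 0], [255, 0, 0]],
                  [[0, 0, 0], [255, 255, 255]]], dtype=np.uint8)
histogram = colour_histogram_terms(image, levels=2)
print(histogram)  # [1 0 0 0 2 0 0 1]
```

Each position in the returned vector is a visual-term id, so the histogram can be indexed in the same way as the word counts in the bag-of-terms example.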

Visual Term Representations Example: Local interest-point based visual terms

Features are based on Lowe’s difference-of-Gaussian region detector and the SIFT feature vector.

A vocabulary of exemplar feature-vectors is learnt by applying k-means clustering to a training set of features.

Feature-vectors can then be quantised to discrete visual terms by finding the closest exemplar in the vocabulary.
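The vocabulary-learning and quantisation steps can be sketched with NumPy alone. This is an illustration under stated assumptions, not the authors' code: DoG/SIFT extraction is omitted and random 128-dimensional vectors stand in for SIFT descriptors; the clustering is plain Lloyd's k-means with a farthest-point initialisation (an arbitrary choice for this sketch, not stated in the slides); and `learn_vocabulary`, `quantise` and `squared_distances` are hypothetical names.

```python
import numpy as np


def squared_distances(features: np.ndarray, exemplars: np.ndarray) -> np.ndarray:
    """Squared Euclidean distance between every feature (rows) and every exemplar (columns)."""
    return ((features[:, None, :] - exemplars[None, :, :]) ** 2).sum(axis=2)


def quantise(features: np.ndarray, vocabulary: np.ndarray) -> np.ndarray:
    """Map each feature vector to the index of its closest exemplar, i.e. its visual term."""
    return squared_distances(features, vocabulary).argmin(axis=1)


def learn_vocabulary(features: np.ndarray, k: int, iterations: int = 20, seed: int = 0) -> np.ndarray:
    """Learn k exemplar feature-vectors (the visual vocabulary) with Lloyd's k-means."""
    rng = np.random.default_rng(seed)
    # Farthest-point initialisation: a random first exemplar, then repeatedly the
    # feature farthest from all exemplars chosen so far.
    centroids = features[[rng.integers(len(features))]].astype(float)
    while len(centroids) < k:
        nearest = squared_distances(features, centroids).min(axis=1)
        centroids = np.vstack([centroids, features[nearest.argmax()]])
    for _ in range(iterations):
        labels = quantise(features, centroids)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:  # leave an empty cluster's exemplar unchanged
                centroids[j] = members.mean(axis=0)
    return centroids


rng = np.random.default_rng(42)
# Stand-ins for 128-dimensional SIFT descriptors: two well-separated groups.
training = np.vstack([rng.normal(0.0, 1.0, (50, 128)), rng.normal(10.0, 1.0, (50, 128))])
vocabulary = learn_vocabulary(training, k=2)

# Features from one "image": 3 near the first group, 5 near the second.
image_features = np.vstack([rng.normal(0.0, 1.0, (3, 128)), rng.normal(10.0, 1.0, (5, 128))])
terms = quantise(image_features, vocabulary)
term_counts = np.bincount(terms, minlength=len(vocabulary))  # bag of visual terms
print(sorted(term_counts.tolist()))  # [3, 5]
```

A real vocabulary would be far larger than `k=2`; the slides do not give the size used.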

Semantic Spaces

Basic idea: create a large multidimensional space in which images, keywords (or other metadata) and visual terms can be placed.

In the training stage, learn how keywords are related to visual terms and images.

Place related visual terms, images and keywords close together within the space.

In the projection stage, unannotated images can be placed in the space based upon the visual terms they contain.

The placement should be such that they lie near the keywords that describe them.
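One way to build such a space is sketched below, assuming (the slides do not say) a latent-semantic-indexing-style truncated SVD of a combined matrix whose rows are keywords and visual terms and whose columns are training images. Unannotated images are folded in using only their visual-term counts. The toy data, the `rank`, and the names `build_space` and `project` are hypothetical, and keywords are scored by dot product (which equals the rank-reduced reconstruction of the keyword/image association) rather than by distance.

```python
import numpy as np

# Hypothetical toy training data (not from the slides).
keywords = ["sun", "sky", "horse", "grass"]
# Rows: keywords then visual terms; columns: training images. Entries are occurrence counts.
training = np.array([
    # img1 img2 img3 img4
    [1, 1, 0, 0],  # sun
    [1, 0, 0, 0],  # sky
    [0, 0, 1, 1],  # horse
    [0, 0, 1, 1],  # grass
    [4, 3, 0, 0],  # visual term "orange"
    [2, 3, 0, 0],  # visual term "blue"
    [0, 0, 4, 2],  # visual term "brown"
    [0, 0, 3, 5],  # visual term "green"
], dtype=float)


def build_space(matrix: np.ndarray, rank: int):
    """Training stage: truncated SVD of the term-by-image matrix (an LSI-style assumption)."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    u, s, vt = u[:, :rank], s[:rank], vt[:rank]
    term_positions = u * np.sqrt(s)      # keywords and visual terms
    image_positions = vt.T * np.sqrt(s)  # training images
    return u, s, term_positions, image_positions


def project(visual_counts: np.ndarray, u: np.ndarray, s: np.ndarray, n_keywords: int) -> np.ndarray:
    """Projection stage: fold an unannotated image into the space from its visual terms only."""
    query = np.concatenate([np.zeros(n_keywords), visual_counts])  # no keyword evidence
    return query @ u / np.sqrt(s)


u, s, term_positions, image_positions = build_space(training, rank=2)
new_image = project(np.array([4.0, 1.0, 0.0, 0.0]), u, s, len(keywords))  # mostly "orange"

# Score each keyword against the projected image and rank them.
scores = term_positions[: len(keywords)] @ new_image
ranked = [keywords[i] for i in np.argsort(-scores)]
print(ranked[:2])  # ['sun', 'sky']
```

Because the new image contains only orange and blue visual terms, it lands in the part of the space spanned by the sun/sky training images, so those keywords score highest while horse and grass score roughly zero.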

Semantic Spaces Conceptual Overview

Semantic Spaces Uses of the space

Once constructed, the semantic space has a number of uses:

Finding images (both annotated and unannotated) by keyword(s)/metadata.

Finding images (both annotated and unannotated) by semantically similar images.

Determining likely metadata for an image.

Examining keyword-keyword and keyword-visual term relationships.

Segmenting an image.
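Once positions exist, several of the uses listed above reduce to ranking by closeness. The sketch below assumes cosine similarity as the closeness measure and uses hypothetical 2-D positions and file names in place of a trained space; it covers finding images by keyword, finding images by image, and determining likely keywords.

```python
import numpy as np

# Hypothetical 2-D positions standing in for a trained semantic space.
keyword_positions = {
    "sun": np.array([0.9, 0.1]),
    "sky": np.array([0.7, 0.6]),
    "train": np.array([-0.2, 0.9]),
}
image_positions = {
    "sunset.jpg": np.array([1.0, 0.2]),
    "beach.jpg": np.array([0.8, 0.5]),
    "station.jpg": np.array([-0.1, 1.0]),
}


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two positions (1.0 means the same direction)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def rank(query: np.ndarray, candidates: dict[str, np.ndarray]) -> list[str]:
    """Order candidate names by decreasing cosine similarity to the query position."""
    return sorted(candidates, key=lambda name: cosine(query, candidates[name]), reverse=True)


# Finding images by keyword.
by_keyword = rank(keyword_positions["sun"], image_positions)
# Finding images by a query image (excluding the query itself).
others = {name: pos for name, pos in image_positions.items() if name != "sunset.jpg"}
by_image = rank(image_positions["sunset.jpg"], others)
# Determining likely metadata (suggesting keywords) for an image.
suggested = rank(image_positions["station.jpg"], keyword_positions)

print(by_keyword)  # ['sunset.jpg', 'beach.jpg', 'station.jpg']
print(by_image)    # ['beach.jpg', 'station.jpg']
print(suggested)   # ['train', 'sky', 'sun']
```

Because annotated and unannotated images occupy the same space, one ranking function serves all three kinds of query.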

Semantic Spaces Searching by Keyword

[Figure: searching for images about “SUN”. The keywords SUN and TRAIN are placed in the space, and the images lying near SUN are returned as ranked search results.]

Semantic Spaces Searching by Image

[Figure: a query image (“Search for images like this:”) and the semantically similar images retrieved from the space.]


Semantic Spaces Suggesting Keywords

[Figure: “Suggest keywords for this image:” an image placed in the space among the keywords SUN, SKY, MOUNTAIN, TREE and CAR; the suggested keywords, in order, are SKY, MOUNTAIN, TREE, SUN, CAR.]

Semantic Spaces Experimental Retrieval Results - Corel Dataset

Colour histograms used as visual terms (each bin represents a single term).

Standard experimental collection: 500 test images, 4500 training images.

Results are quite impressive: comparable with the Machine Translation auto-annotation technique (but remember we are using much simpler image features).

Works well for query keywords that are easily associated with a particular set of colours, but not so well for other keywords.


Top 15 images when querying for ‘sun’


Top 15 images when querying for ‘horse’


Top 15 images when querying for ‘foals’

Demo The K9 Retrieval System

We have built a demonstration system around the semantic space idea and applied it to images from the Kennel Club picture library (>7000 images, ∼3000 with keywords).

The system allows annotated images to be retrieved by keywords and concepts (keywords with thesaurus expansion).

Both annotated and unannotated images can also be retrieved using the semantic space and regular content-based techniques.

This brief demo will concentrate on retrieval of annotated images using keyword matching, and unannotated images using the semantic space.
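Concept search (keywords with thesaurus expansion) over annotated images can be sketched as set expansion followed by keyword matching. The thesaurus, annotations, `depth` parameter and function names below are all hypothetical; the slides do not describe the thesaurus the K9 system uses.

```python
# Hypothetical thesaurus and annotations, for illustration only.
THESAURUS = {
    "puppy": {"dog", "young animal"},
    "dog": {"hound", "canine"},
}

ANNOTATIONS = {
    "img001": {"dog", "show"},
    "img002": {"puppy", "grass"},
    "img003": {"canine", "agility"},
    "img004": {"cat"},
}


def expand(keyword: str, thesaurus: dict[str, set[str]], depth: int = 1) -> set[str]:
    """Return the keyword plus the thesaurus terms reachable within `depth` steps."""
    concept = {keyword}
    frontier = {keyword}
    for _ in range(depth):
        # Follow one thesaurus link from every newly added term.
        frontier = {related for term in frontier for related in thesaurus.get(term, set())} - concept
        concept |= frontier
    return concept


def search_annotated(keyword: str, depth: int = 1) -> list[str]:
    """Keyword matching against annotated images, using the expanded concept."""
    concept = expand(keyword, THESAURUS, depth)
    return sorted(image for image, words in ANNOTATIONS.items() if words & concept)


print(search_annotated("dog"))             # ['img001', 'img003']
print(search_annotated("puppy", depth=2))  # ['img001', 'img002', 'img003']
```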

Conclusions

Semantic retrieval of unannotated images is hard!

Our semantic space approach takes us some of the way, but there is still a long way to go.

Retrieval is limited by the choice of visual features, and how well those features relate to the keywords.

Questions?
