
Inferring Semantic Concepts from Community-Contributed Images and Noisy Tags

Jinhui Tang†, Shuicheng Yan†, Richang Hong†, Guo-Jun Qi‡, Tat-Seng Chua†

† National University of Singapore
‡ University of Illinois at Urbana-Champaign

Outline

Motivation

Sparse-Graph based Semi-supervised Learning

Handling of Noisy Tags

Inferring Concepts in Semantic Concept Space

Experiments

Summary and Future Work

Web Images and Metadata

Our task

No manual annotation is required.

Methods that can be used:

Learn models: SVM, GMM, …

Infer labels directly: k-NN, graph-based semi-supervised methods

Normal Graph-based Methods

A common disadvantage: these methods have parameters that require manual tuning, and performance is sensitive to that tuning.

The graphs are constructed based on visual distance, which creates many links between samples with unrelated concepts, so label information is propagated incorrectly.

Locally linear reconstruction still needs to select neighbors based on visual distance.

Key Ideas of Our Approach

Sparse Graph Based Learning

Noisy Tag Handling

Inferring Concepts in the Concept Space

Why Sparse Graph?

The human visual system seeks a sparse representation of an incoming image using a few visual words from a feature vocabulary (a finding from neuroscience).

Advantages:

Reduces concept-unrelated links, avoiding the propagation of incorrect information.

Practical for large-scale applications, since the sparse representation reduces the storage requirement and is feasible for large-scale numerical computation.

Normal Graph vs. Sparse Graph

(Figures: normal graph construction vs. sparse graph construction.)

Sparse Graph Construction

The ℓ1-norm based linear reconstruction error minimization can naturally lead to a sparse representation for the images*.

* J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma. Robust face recognition via sparse representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(2):210–227, Feb. 2009.

The sparse reconstruction can be obtained by solving the following convex optimization problem:

$$\min_w \|w\|_1, \quad \text{s.t. } x = Dw$$

where $w \in \mathbb{R}^n$ is the vector of reconstruction coefficients, $x \in \mathbb{R}^d$ is the feature vector of the image to be reconstructed, and $D \in \mathbb{R}^{d \times n}$ (d < n) is a matrix formed by the feature vectors of the other images in the dataset.
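As an illustration only (the slides do not name a solver), this basis-pursuit problem can be posed directly with the cvxpy modeling library; the helper name `sparse_reconstruction` is ours:

```python
import cvxpy as cp

def sparse_reconstruction(D, x):
    """Solve min_w ||w||_1 s.t. x = D w (basis pursuit).

    D: d x n matrix of feature vectors of the other images (d < n).
    x: d-dimensional feature vector of the image to reconstruct.
    """
    w = cp.Variable(D.shape[1])
    problem = cp.Problem(cp.Minimize(cp.norm1(w)), [D @ w == x])
    problem.solve()
    return w.value
```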

Sparse Graph Construction (cont.)

To handle noise on certain elements of x, reformulate $x = Dw + \xi$, where $\xi \in \mathbb{R}^d$ is the noise term. Then:

$$\min_{\hat{w}} \|\hat{w}\|_1, \quad \text{s.t. } x = B\hat{w}$$

where $B = [D, I] \in \mathbb{R}^{d \times (n+d)}$ and $\hat{w} = [w^T, \xi^T]^T$.

Set the edge weights of the sparse graph as:

$$w_{ij} = \begin{cases} \hat{w}_i(j), & j < i \\ \hat{w}_i(j-1), & j > i \\ 0, & j = i \end{cases}$$

where $\hat{w}_i$ is the sparse coefficient vector obtained by reconstructing image i from the remaining images.
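Putting the two pieces together, a minimal sketch (our own illustration, reusing the hypothetical `sparse_reconstruction` helper above with B = [D, I]) of how the weight matrix W could be assembled:

```python
import numpy as np

def build_sparse_graph(X):
    """Assemble the sparse-graph weight matrix from per-image sparse codes.

    X: n x d matrix with one image feature vector per row.
    """
    n, d = X.shape
    W = np.zeros((n, n))
    for i in range(n):
        D = np.delete(X, i, axis=0).T           # d x (n-1): the other images
        B = np.hstack([D, np.eye(d)])           # B = [D, I] absorbs noise
        w_hat = sparse_reconstruction(B, X[i])  # min ||w_hat||_1 s.t. x = B w_hat
        coeffs = w_hat[: n - 1]                 # drop the noise part xi
        W[i, :i] = coeffs[:i]                   # w_ij = w_hat_i(j),   j < i
        W[i, i + 1:] = coeffs[i:]               # w_ij = w_hat_i(j-1), j > i
    return W
```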

Semi-supervised Inference

The inference is formulated as a reconstruction-error minimization over the label vector f:

$$\min_f \sum_{i=1}^{N} \Big\| f_i - \sum_{j \neq i} w_{ij} f_j \Big\|^2, \quad \text{s.t. } f_i = y_i \ \text{if } x_i \in L$$

Result:

$$f_u^* = -M_{uu}^{-1} M_{ul}\, y$$

where $M = C^T C$ is a symmetric matrix, $C = I - W$, and M is partitioned into labeled/unlabeled blocks:

$$M = \begin{bmatrix} M_{ll} & M_{lu} \\ M_{ul} & M_{uu} \end{bmatrix}$$
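For modest graph sizes the closed-form result can be evaluated directly; a minimal sketch in NumPy (our illustration, assuming the l labeled samples are ordered first):

```python
import numpy as np

def closed_form_inference(W, y, l):
    """Compute f_u = -M_uu^{-1} M_ul y for the sparse graph W.

    W: n x n weight matrix with the l labeled samples ordered first.
    y: length-l vector of labels for the labeled samples.
    """
    n = W.shape[0]
    C = np.eye(n) - W
    M = C.T @ C                          # M = (I - W)^T (I - W), symmetric
    M_uu, M_ul = M[l:, l:], M[l:, :l]    # unlabeled/labeled partition
    # Solve the linear system rather than forming the inverse explicitly.
    return -np.linalg.solve(M_uu, M_ul @ y)
```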

Semi-supervised Inference (cont.)

The problem with computing $M_{uu}^{-1}$ in $f_u^* = -M_{uu}^{-1} M_{ul}\, y$:

$M_{uu}$ is typically very large for image annotation, so it is often computationally prohibitive to calculate its inverse directly.

An iterative solution with non-negative constraints may not be reasonable, since some samples may have negative contributions to other samples.

Solution: reformulate the problem as the linear system $M_{uu} f_u = -M_{ul}\, y$.

The generalized minimum residual method (GMRES) can then be used to iteratively solve this large-scale sparse system of linear equations effectively and efficiently.
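A minimal sketch of this step with SciPy's GMRES routine (GMRES is named in the slide; the SciPy choice and sparse storage are our assumptions):

```python
from scipy.sparse.linalg import gmres

def infer_unlabeled(M_uu, M_ul, y):
    """Iteratively solve M_uu f_u = -M_ul y without forming M_uu^{-1}.

    M_uu, M_ul: scipy.sparse blocks of M = (I - W)^T (I - W).
    y: label vector of the labeled samples.
    """
    f_u, info = gmres(M_uu, -(M_ul @ y))  # info == 0 signals convergence
    if info != 0:
        raise RuntimeError("GMRES did not converge")
    return f_u
```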

Different Types of Tags

(Legend for the example images: √ = correct; ? = ambiguous; m = missing.)

Handling of Noisy Tags

We cannot assume that the training tags are fixed during the inference process; the noisy training tags should be refined during label inference.

Solution: add two regularization terms to the inference framework to handle the noise:

$$\min_{f, \hat{f}_l} \|f - Wf\|^2 + \lambda_1 \|f_l - \hat{f}_l\|^2 + \lambda_2 \|\hat{f}_l - y\|_1$$

Handling of Noisy Tags (cont.)

Solution:

Set the original label vector as the initial estimate of the ideal label vector, i.e., set $\hat{f}_l = y$, and solve

$$\min_f \|f - Wf\|^2 + \lambda_1 \|f_l - \hat{f}_l\|^2$$

to obtain a refined $f_l$.

Then fix $f_l$ and solve

$$\min_{\hat{f}_l} \lambda_1 \|f_l - \hat{f}_l\|^2 + \lambda_2 \|\hat{f}_l - y\|_1$$

Finally, use the obtained $\hat{f}_l$ to replace y in the graph-based method above, and solve the sparse system of linear equations to infer the labels of the unlabeled samples.
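With $f_l$ fixed, the second subproblem decouples per component and has a closed form via soft-thresholding (a standard result for ℓ2+ℓ1 objectives of this shape; the derivation is ours, not from the slides):

```python
import numpy as np

def refine_label_vector(f_l, y, lam1, lam2):
    """Solve min over f_hat of lam1*||f_l - f_hat||^2 + lam2*||f_hat - y||_1.

    Componentwise: f_hat = y + soft_threshold(f_l - y, lam2 / (2 * lam1)).
    """
    t = lam2 / (2.0 * lam1)
    g = f_l - y
    return y + np.sign(g) * np.maximum(np.abs(g) - t, 0.0)
```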

Why Concept Space?

It is well known that inferring concepts from low-level visual features alone does not work well, due to the semantic gap.

To bridge this semantic gap, we construct a concept space and then infer the semantic concepts in this space. The semantic relations among different concepts are inherently embedded in this space, which helps the concept inference.

The requirements for the concept space

Low semantic gap: concepts in the constructed space should have small semantic gaps.

Informative: the concepts should cover the semantic space spanned by all useful concepts (tags).

Compact: the set of concepts forming the space should be small (i.e., the dimension of the concept space is low).

Concept Space Construction

Basic terms:

Ω: the set of all concepts
Θ: the constructed concept set

Three measures:

Semantic modelability: SM(Θ)
Coverage of the semantic concept space: CE(Θ, Ω)
Compactness: CP(Θ) = 1/#(Θ)

Objective:

$$\max_{\Theta} \; \alpha\, SM(\Theta) + \beta\, CE(\Theta, \Omega) + (1 - \alpha - \beta)\, CP(\Theta)$$

Solution for Concept Space Construction

Simplification: fix the size of the concept space:

$$\max_{\Theta} \; \alpha\, SM(\Theta) + (1 - \alpha)\, CE(\Theta, \Omega), \quad \text{s.t. } \#(\Theta) = m$$

This maximization can then be transformed into a standard quadratic programming problem. See the paper for more details.

Inferring Concepts in Concept Space

Image mapping: $x_i \rightarrow D(i)$, where

$$D(i) = [D_{c_1}(i), D_{c_2}(i), \ldots, D_{c_m}(i)]$$

Query concept mapping: $c_x \rightarrow Q(c_x)$, where

$$Q(c_x) = [p(c_x \mid c_1), p(c_x \mid c_2), \ldots, p(c_x \mid c_m)]$$

Ranking the given images:

$$\mathrm{sim}(Q(c_x), D(i)) = \frac{Q(c_x)\, D(i)^T}{\|Q(c_x)\|\, \|D(i)\|}$$
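A sketch of this ranking step in NumPy (our illustration; `D` stacks the per-image concept descriptors D(i) as rows):

```python
import numpy as np

def rank_images(Q_c, D):
    """Rank images by cosine similarity to the query concept mapping.

    Q_c: length-m vector [p(c_x|c_1), ..., p(c_x|c_m)].
    D:   n x m matrix whose i-th row is D(i).
    Returns image indices sorted from most to least similar.
    """
    sims = (D @ Q_c) / (np.linalg.norm(D, axis=1) * np.linalg.norm(Q_c) + 1e-12)
    return np.argsort(-sims)
```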

The Whole Framework

Experiments

Dataset: NUS-WIDE Lite version (55,615 images)

Low-level features: Color Histogram (CH) and Edge Direction Histogram (EDH), concatenated directly

Evaluation: 81 concepts; AP and MAP

Experiments

Ex1: Comparisons among Different Learning Methods

Ex2: Concept Inference with and without Concept Space

Ex3: Inference with Tags vs. Inference with Ground Truth

We achieve a MAP of 0.1598 by inference from tags in the concept space, which is comparable to the MAP obtained by inference from the ground truth of the training labels.

Summary

We studied the problem of inferring semantic concepts from community-contributed images and their associated noisy tags.

Three key points:

Sparse graph based label propagation

Noisy tag handling

Inference in a low-semantic-gap concept space

Future Work

Training set construction from web resources

Thanks! Questions?
