computational intelligence in media indexing and retrieval

1

Computational Intelligence in Media Indexing and Retrieval

Dr. Kim-Hui Yap

School of Electrical & Electronic Engineering Nanyang Technological University

April 2007

2

Goals of this Tutorial

- Introduce media indexing and retrieval
  - Motivation
  - Background
  - Applications
- Discuss some challenges faced by media (image) retrieval
- Discuss how Computational Intelligence can be used to address these issues

3

Media

- This tutorial will use image indexing and retrieval to demonstrate the relevant concepts and techniques
- Similar ideas can be extended to other media retrieval systems
- Media types: graphics, image, video, animation, audio, speech

4

Media Retrieval Methodologies

- Text-based retrieval
  - Keyword-based
  - Annotated by expert annotators
- Content-based retrieval
  - Centers on audio-visual content analysis
  - Query-by-example
- Metadata-based retrieval
  - Google/Yahoo search engines
  - Web mining
- Peer tagging
  - Annotation by voluntary peer users in a distributed manner
  - E.g. Flickr, YouTube

5

Motivations

Motivations for the development of efficient media retrieval systems:

- Explosion in the volume of media data over the Internet and wireless networks
- Increasing popularity of imaging devices such as digital cameras, prevalence of low-cost high-capacity storage devices, and increasing proliferation of image data over communications networks
- Emergence of new consumerism where media technologies meet consumers’ needs

6

Applications

- Education (e.g. mobile learning)
- Military (e.g. surveillance applications)
- Healthcare (e.g. biomedicine)
- Information (e.g. media search engines)
- Social (e.g. MySpace, Facebook)

7

Applications

8

Tutorial Outline

- Introduction
  - Image indexing and retrieval basics
  - Issues and challenges
- Computational Intelligence in Image Retrieval
  - Soft relevance framework for fuzzy perception
  - Pseudo-labeling and machine learning
  - Region-based image retrieval using neural network
  - Peer tagging and knowledge propagation
- Conclusion

9

GUI

MEDIA RETRIEVAL: SEE HOW IT CAN BE DONE USING COMPUTATIONAL INTELLIGENCE, WITHOUT THE FORCE

Find the Image… You must

10

Introduction

11

Text-based Image Retrieval

- Traditional text-based image retrieval search engines
  - Manual annotation of images by a few experts
  - Use keywords to denote images
- Issues faced by traditional text-based image retrieval
  - Exhausting and time consuming
  - Highly individual and subjective
  - An image is worth a thousand words
  - Some images are hard to describe using keywords

“Lake, house, tree, autumn, scenery…”

An image with complex textures that is hard to describe using keywords

“An image is worth a thousand words”

12

Content-based Image Retrieval (CBIR)

[Block diagram components: Query, Image Database, Image Content Analysis, Feature Vector, Feature Database, Decision Making, Result Visualization]

- Advantages over text-based image retrieval
  - Alleviate intensive human labor
  - Avoid subjectivity
  - Offer an alternative approach to perform a query, such as query-by-example (QBE)
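As a minimal sketch of query-by-example, the decision-making stage can be pictured as ranking stored feature vectors by their distance to the query's feature vector. The function names, the three-dimensional toy features and the file names below are assumptions for illustration only; they do not come from any of the systems discussed in this tutorial.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def query_by_example(query_vec, feature_db, top_k=3):
    """Return the names of the top_k database images closest to the query."""
    ranked = sorted(feature_db.items(),
                    key=lambda item: euclidean(query_vec, item[1]))
    return [name for name, _ in ranked[:top_k]]

# Hypothetical feature database: image name -> toy feature vector.
feature_db = {
    "sunset.jpg": [0.9, 0.1, 0.0],
    "forest.jpg": [0.1, 0.8, 0.1],
    "beach.jpg": [0.6, 0.3, 0.1],
}

best = query_by_example([0.8, 0.15, 0.05], feature_db, top_k=2)
```

In a real system the feature database would be built offline by the content analysis stage, so that only the query image needs to be analysed at search time.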

13

Existing CBIR Systems

- QBIC (M. Flickner et al. 1995)
- MARS (Y. Rui et al. 1998)
- Virage (G. Amarnath et al. 1997)
- Photobook (A. Pentland et al. 1996)
- VisualSEEk (J. R. Smith et al. 1996)
- PicToSeek (T. Gevers et al. 1996)
- PicHunter (I. J. Cox et al. 2000)

14

Peer Tagging-based Image Retrieval

- Distributed users (particularly Internet users) provide tags (keywords) to shared images
- Low cost, simple, easy, flexible and effort-sharing
- Flickr, YouTube

15

Issues and Challenges

16

Challenging Problems in Image Retrieval

- Semantic gap
- Small sample problems
- Image region perceptual importance
- Peer tagging and knowledge propagation

17

Semantic Gap

- A semantic gap exists between low-level visual features (color, texture, shape) and high-level human perception
- Uncertainty in the correspondence between
  - The information that one can extract from the visual data
  - The interpretation that the same data have for a user in a given situation

[Figure: the CBIR system describes an image as "Color: red, Texture: ruffled, Shape: round", while users describe it as "Flower, Rose, Plant"; the semantic gap lies between the two descriptions]

18

Visual Similarity vs Semantic Similarity

Can we address this perceptual ambiguity?

19

Small Sample Problem

- Learning from a small number of training samples is a challenging problem
- Image labeling is a time-consuming task and users are often unwilling to label too many images
- Can Computational Intelligence help to resolve this challenge?

20

Image Region Perceptual Importance

- Global features sometimes fail to match users’ object-level perceptions
- How do we determine the importance of different image regions and compare image similarity based on these local regions?

21

Tagging and Knowledge Propagation

- Tagging can be cumbersome in certain cases, e.g. mobile media
- Users sometimes will not annotate all the images
- Many existing image databases are not fully annotated
- How do we utilize the correlation between visual content and annotated text to propagate keywords from a small set of samples to the rest of the database?

22

What is Computational Intelligence (CI)?

A field of study that attempts to simulate human intelligence using computational algorithms

Computational Intelligence techniques:

- Neural Networks
- Fuzzy Logic
- Evolutionary Computations
- Support Vector Machines

23

Why Use Computational Intelligence in Image Retrieval?

- Computational intelligence is used in image retrieval due to its capability in systematic signal identification, intelligent information integration, and robust optimization.
- We will demonstrate how these techniques can be employed to address some of the challenges faced by image retrieval systems.

24

Structured Levels of Image Content Analysis

This tutorial will introduce the development of image retrieval systems from the low level (color, texture, shape) through the medium level (regions of attributes) to the high level (keywords, tags).

- Media Layer: images
- Feature Layer: color, texture, shape
- Object Layer: regions of attributes
- Concept Layer: keywords, tags

25

Soft Relevance Framework for Fuzzy Perception

26

Interactive CBIR Systems

[Block diagram components: Query, Image Database, Image Content Analysis, Feature Vector, Feature Database, Decision Making, Result Visualization, Relevance Feedback]

27

Relevance Feedback

- Why relevance feedback?
  - Narrow the semantic gap
  - Reduce user perceptual subjectivity

[Diagram: feedback loop linking Initial retrieval result, Display, User feedback, System learning, and Improved result after learning]

28

Fuzzy User Perception

Query

Retrieval result

Relevant or irrelevant?

29

Previous Approaches

- Binary labeling (Y. Rui et al. 1997, P. Muneesawang et al. 2002)
  - Binary feedback: relevant or irrelevant
  - Advantage: simple, easy to implement
  - Disadvantage: hard decision, crisp logic without considering the degree of relevance
- Multi-level labeling (Y. Rui et al. 1998, X. S. Zhou et al. 2001)
  - Feedback with degree of relevance: highly-relevant, relevant, no-opinion, irrelevant, highly-irrelevant
  - Advantage: more information due to detailed description
  - Disadvantage: cumbersome and tedious

30

Hierarchical Structure of User Information Priorities

31

Computational Intelligence in Soft Relevance Framework

- Fuzzy interpretation
  - Integrate potential imprecision of user perception into relevance feedback
- Fuzzy labeling
  - Relevant, irrelevant and fuzzy labels
  - An a posteriori probability estimator is used to evaluate the relevance of fuzzy images
- Machine learning
  - A progressive fuzzy radial basis function network (PFRBFN) is developed to learn the user information needs

32

Schematic Overview

[Block diagram]

- Offline Processing: Database Creation, Feature Extraction (color, texture)
- Online Querying: Query Selection, Feature Extraction, Similarity Comparison (Euclidean distance), Display and Relevance Feedback, Fuzzy Relevance Feedback, Machine Learning (PFRBFN)

33

Flowchart

(1) Retrieve initial results based on k-nearest neighbor (k-NN) search

(2) User feedback of relevant, irrelevant, and fuzzy images

(3) Two-stage clustering to determine the cluster centers for relevant, irrelevant, and fuzzy subnets

(4) Estimation of the soft relevance membership function

(5) Construction and training of the PFRBFN

(6) Retrieve new images from database based on trained PFRBFN

(7) Have termination criteria been satisfied? Yes: End. No: continue with another feedback iteration.

34

Feature Extraction

Concatenate the features to form 170-dimensional feature vectors

Features | Description | Dimension
Color histogram | HSV space is chosen; each H, S, V component is uniformly quantized into 8, 2 and 2 bins respectively | 32
Color auto-correlogram | The image is quantized into 4x4x4 = 64 colors in the RGB space [11]. The chessboard distance is chosen as the distance measure. | 64
Color moments | The first two moments (mean and standard deviation) from the R, G, B color channels are extracted | 6
Wavelet moments | Applying the wavelet transform to the image with a 3-level decomposition, the mean and the standard deviation of the transform coefficients are used to form the feature vector | 20
Gabor wavelet | Gabor wavelet filters spanning four scales (0.05, 0.1, 0.2 and 0.4) and six orientations ($\theta_0 = 0$, $\theta_{n+1} = \theta_n + \pi/6$) are applied to the image. The mean and standard deviation of the Gabor wavelet coefficients are used to form the feature vector | 48
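To make the first feature concrete, the following is a rough sketch of a 32-bin HSV color histogram with 8, 2 and 2 uniform bins for H, S and V. The function name, the pixel format (RGB tuples in 0..255) and the normalisation to unit sum are assumptions made for this illustration; the slides do not specify them. The last lines only check that the listed dimensions add up to 170.

```python
import colorsys

H_BINS, S_BINS, V_BINS = 8, 2, 2  # uniform quantization of H, S and V

def hsv_histogram(rgb_pixels):
    """Return a normalised 32-bin HSV histogram for (r, g, b) pixels in 0..255."""
    hist = [0.0] * (H_BINS * S_BINS * V_BINS)
    count = 0
    for r, g, b in rgb_pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # Clamp so that a component equal to 1.0 falls in the last bin.
        hi = min(int(h * H_BINS), H_BINS - 1)
        si = min(int(s * S_BINS), S_BINS - 1)
        vi = min(int(v * V_BINS), V_BINS - 1)
        hist[(hi * S_BINS + si) * V_BINS + vi] += 1.0
        count += 1
    return [x / count for x in hist] if count else hist

# Dimensions of the five feature groups listed in the table.
feature_dims = {
    "color_histogram": 32,
    "color_auto_correlogram": 64,
    "color_moments": 6,
    "wavelet_moments": 20,
    "gabor_wavelet": 48,
}
total_dim = sum(feature_dims.values())

pixels = [(255, 0, 0), (255, 0, 0), (0, 0, 255), (0, 0, 0)]
hist = hsv_histogram(pixels)
```

The remaining feature groups would be computed separately and appended to this histogram to form the full vector.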

35

Main Features of PFRBFN

The PFRBFN is developed with a few main considerations:

- to integrate the unique batch process of user feedback
- to reduce the computational time of the training process
- to integrate the potential fuzzy feedback from the users

A two-stage clustering algorithm is employed to simplify the network by reducing the number of hidden neurons. An efficient gradient descent-based learning strategy is employed to estimate the underlying network parameters by minimizing a cost function.

36

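Schematic Diagram of PFRBFN

[Network diagram: Input layer, Subnets, Output layer. The input is $\mathbf{x} = [x_1, x_2, \ldots, x_R]^{\mathrm{T}}$. The three subnets carry the weights $w^t_{P1}, \ldots, w^t_{P I_P}$, $w^t_{N1}, \ldots, w^t_{N I_N}$ and $w^t_{F1}, \ldots, w^t_{F I_F}$. The subnet response $U_t(\mathbf{x})$ is added to $F_{t-1}(\mathbf{x})$ to give the output $F_t(\mathbf{x})$.]

\[
F_t(\mathbf{x}) =
\begin{cases}
F_{t-1}(\mathbf{x}) + U_t(\mathbf{x}), & t = 1, 2, \ldots, T \\
0, & t = 0
\end{cases}
\]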

37

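PFRBFN Training (I)

- Two-stage clustering for PFRBFN subnet creation
  - Subtractive clustering
  - Fuzzy C-means (FCM)
- A posteriori estimator for soft relevance membership estimation of fuzzy images

\[
s(\mathbf{x}_j) = \frac{1}{M} \sum_{m=1}^{M} \frac{p(\mathbf{x}_{jm} \mid \omega_r) P(\omega_r)}{p(\mathbf{x}_{jm} \mid \omega_r) P(\omega_r) + p(\mathbf{x}_{jm} \mid \omega_i) P(\omega_i)}
\]

\[
p(\mathbf{x}_{jm} \mid \omega_q) = \frac{1}{(2\pi)^{d_m/2} \, |\boldsymbol{\Sigma}^q_m|^{1/2}} \exp\left[ -\frac{1}{2} (\mathbf{x}_{jm} - \boldsymbol{\mu}^q_m)^{\mathrm{T}} (\boldsymbol{\Sigma}^q_m)^{-1} (\mathbf{x}_{jm} - \boldsymbol{\mu}^q_m) \right]
\]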
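The a posteriori estimator can be sketched as follows, under the simplifying assumption that each of the M feature groups is a single scalar with a one-dimensional Gaussian class model (the slides use multivariate Gaussians per group). The function names, the equal priors and the toy parameters are hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    """One-dimensional Gaussian density (simplification of the multivariate form)."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def soft_relevance(x_groups, relevant_models, irrelevant_models, prior_r=0.5):
    """Average, over the feature groups, of the posterior probability of relevance.

    x_groups: list of M scalars, one per feature group.
    relevant_models / irrelevant_models: list of (mean, variance) per group.
    """
    prior_i = 1.0 - prior_r
    total = 0.0
    for x, (m_r, v_r), (m_i, v_i) in zip(x_groups, relevant_models, irrelevant_models):
        p_r = gaussian_pdf(x, m_r, v_r) * prior_r
        p_i = gaussian_pdf(x, m_i, v_i) * prior_i
        total += p_r / (p_r + p_i)
    return total / len(x_groups)

# A fuzzy image halfway between the two class means is maximally ambiguous.
halfway = soft_relevance([0.5], [(0.0, 1.0)], [(1.0, 1.0)])
# A fuzzy image sitting on the relevant mean leans towards "relevant".
near_relevant = soft_relevance([0.0], [(0.0, 1.0)], [(1.0, 1.0)])
```

The returned value lies in [0, 1] and can serve as the desired network output for a fuzzy image.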

38

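PFRBFN Training (II)

Kernel function

\[
f(\mathbf{x}, \mathbf{v}^t_{\alpha i}, \sigma^t_{\alpha i}) = \exp\left( -\frac{(\mathbf{x} - \mathbf{v}^t_{\alpha i})^{\mathrm{T}} \boldsymbol{\Lambda} (\mathbf{x} - \mathbf{v}^t_{\alpha i})}{2 (\sigma^t_{\alpha i})^2} \right), \quad t = 1, 2, \ldots, T, \quad i = 1, 2, \ldots, I_\alpha
\]

Desired output

\[
Y_t(\mathbf{x}_j) =
\begin{cases}
0, & \mathbf{x}_j \in N \\
1, & \mathbf{x}_j \in P \\
s(\mathbf{x}_j), & \mathbf{x}_j \in F
\end{cases}
\]

PFRBFN output

\[
F_t(\mathbf{x}) =
\begin{cases}
F_{t-1}(\mathbf{x}) + \sum_{\alpha \in \{P, N, F\}} \sum_{i=1}^{I_\alpha} w^t_{\alpha i} \, f(\mathbf{x}, \mathbf{v}^t_{\alpha i}, \sigma^t_{\alpha i}), & t = 1, 2, \ldots, T \\
0, & t = 0
\end{cases}
\]

Error function

\[
E_t = \frac{1}{2} \sum_{j=1}^{N_T} e_{jt}^2 = \frac{1}{2} \sum_{j=1}^{N_T} \left( Y_t(\mathbf{x}_j) - F_t(\mathbf{x}_j) \right)^2
\]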
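The kernel and the progressive output can be sketched in a few lines. This is an illustration only: the matrix Λ is assumed to be diagonal and is passed as a list, each subnet is a list of (weight, center, width) triples, and the numeric values (including the negative weight on the N subnet) are hypothetical rather than taken from the tutorial.

```python
import math

def kernel(x, v, sigma, lam):
    """exp(-(x - v)^T Lambda (x - v) / (2 sigma^2)) with a diagonal Lambda."""
    q = sum(l * (xi - vi) ** 2 for xi, vi, l in zip(x, v, lam))
    return math.exp(-q / (2.0 * sigma ** 2))

def subnet_response(x, subnets, lam):
    """U_t(x): weighted kernel sum over the P, N and F subnets."""
    return sum(w * kernel(x, v, sigma, lam)
               for units in subnets.values()
               for (w, v, sigma) in units)

def progressive_output(x, subnets_per_round, lam):
    """F_t(x) = F_{t-1}(x) + U_t(x), starting from F_0(x) = 0."""
    f = 0.0
    for subnets in subnets_per_round:
        f += subnet_response(x, subnets, lam)
    return f

# Hypothetical network built after one feedback round.
round_1 = {
    "P": [(1.0, [0.0, 0.0], 1.0)],
    "N": [(-1.0, [2.0, 0.0], 1.0)],
    "F": [(0.5, [1.0, 0.0], 1.0)],
}
lam = [1.0, 1.0]
score = progressive_output([0.0, 0.0], [round_1], lam)
```

Because each round only adds U_t to the previous output, earlier rounds do not need to be retrained when new feedback arrives.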

39

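PFRBFN Training (III)

Gradient descent learning

\[
\boldsymbol{\theta} = \arg\min_{\boldsymbol{\theta} \in \boldsymbol{\Theta}} (E_t) = \arg\min_{\boldsymbol{\theta} \in \boldsymbol{\Theta}} \left( \frac{1}{2} \sum_{j=1}^{N_T} \left( Y_t(\mathbf{x}_j) - F_t(\mathbf{x}_j) \right)^2 \right)
\]

(1) Weight estimation at the k-th learning iteration

\[
\frac{\partial E_t(k)}{\partial w^t_{\alpha i}(k)} = -\sum_{j=1}^{N_T} e_{jt}(k) \, f(\mathbf{x}_j, \mathbf{v}^t_{\alpha i}(k), \sigma^t_{\alpha i}(k))
\]

\[
w^t_{\alpha i}(k+1) = w^t_{\alpha i}(k) - \eta_1 \frac{\partial E_t(k)}{\partial w^t_{\alpha i}(k)}, \quad \alpha \in \{P, N, F\}, \quad i = 1, 2, \ldots, I_\alpha
\]

(2) Center estimation at the k-th learning iteration

\[
\frac{\partial E_t(k)}{\partial \mathbf{v}^t_{\alpha i}(k)} = -w^t_{\alpha i}(k) \sum_{j=1}^{N_T} e_{jt}(k) \, f(\mathbf{x}_j, \mathbf{v}^t_{\alpha i}(k), \sigma^t_{\alpha i}(k)) \, \frac{\boldsymbol{\Lambda} (\mathbf{x}_j - \mathbf{v}^t_{\alpha i}(k))}{(\sigma^t_{\alpha i}(k))^2}
\]

\[
\mathbf{v}^t_{\alpha i}(k+1) = \mathbf{v}^t_{\alpha i}(k) - \eta_2 \frac{\partial E_t(k)}{\partial \mathbf{v}^t_{\alpha i}(k)}, \quad \alpha \in \{P, N, F\}, \quad i = 1, 2, \ldots, I_\alpha
\]

(3) Width estimation at the k-th learning iteration

\[
\frac{\partial E_t(k)}{\partial \sigma^t_{\alpha i}(k)} = -w^t_{\alpha i}(k) \sum_{j=1}^{N_T} e_{jt}(k) \, f(\mathbf{x}_j, \mathbf{v}^t_{\alpha i}(k), \sigma^t_{\alpha i}(k)) \, \frac{(\mathbf{x}_j - \mathbf{v}^t_{\alpha i}(k))^{\mathrm{T}} \boldsymbol{\Lambda} (\mathbf{x}_j - \mathbf{v}^t_{\alpha i}(k))}{(\sigma^t_{\alpha i}(k))^3}
\]

\[
\sigma^t_{\alpha i}(k+1) = \sigma^t_{\alpha i}(k) - \eta_3 \frac{\partial E_t(k)}{\partial \sigma^t_{\alpha i}(k)}, \quad \alpha \in \{P, N, F\}, \quad i = 1, 2, \ldots, I_\alpha
\]

(4) Repeat steps (1)-(3) until convergence or a maximum number of iterations is reached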
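One learning iteration of steps (1)-(3) might look like the sketch below. It is a simplified illustration, not the tutorial's implementation: Λ is taken to be the identity, the three subnets are flattened into a single list of units, all three gradients are evaluated at the same iteration before any parameter is changed, and the learning rates and sample data are hypothetical.

```python
import math

def sq_dist(x, v):
    return sum((a - b) ** 2 for a, b in zip(x, v))

def kernel(x, v, sigma):
    # Gaussian kernel with Lambda assumed to be the identity matrix.
    return math.exp(-sq_dist(x, v) / (2.0 * sigma ** 2))

def forward(x, units):
    return sum(u["w"] * kernel(x, u["v"], u["sigma"]) for u in units)

def error(samples, units):
    """E = 1/2 * sum_j (Y(x_j) - F(x_j))^2."""
    return 0.5 * sum((y - forward(x, units)) ** 2 for x, y in samples)

def train_step(samples, units, eta_w=0.1, eta_v=0.1, eta_s=0.1):
    """One gradient descent iteration over weights, centers and widths."""
    errs = [(x, y - forward(x, units)) for x, y in samples]  # e_j at iteration k
    updated = []
    for u in units:
        w, v, s = u["w"], u["v"], u["sigma"]
        g_w, g_s, g_v = 0.0, 0.0, [0.0] * len(v)
        for x, e in errs:
            f = kernel(x, v, s)
            g_w += -e * f
            for d in range(len(v)):
                g_v[d] += -w * e * f * (x[d] - v[d]) / s ** 2
            g_s += -w * e * f * sq_dist(x, v) / s ** 3
        updated.append({
            "w": w - eta_w * g_w,
            "v": [vd - eta_v * g for vd, g in zip(v, g_v)],
            "sigma": s - eta_s * g_s,
        })
    return updated

# Toy data: one relevant sample (target 1) and one irrelevant sample (target 0).
samples = [([0.0], 1.0), ([1.0], 0.0)]
units = [{"w": 0.2, "v": [0.2], "sigma": 1.0}]
before = error(samples, units)
after = error(samples, train_step(samples, units))
```

In practice the step would be repeated until the error stops decreasing or an iteration limit is reached, as in step (4).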

40

Graphical User Interface

41

Image Database

100 categories, 10,000 color images

42

Objective System Performance (I)

- Experimental setup:
  - 100 queries
  - Top 25 results
  - Average precision-versus-recall graph of the PFRBFN and ARBFN (P. Muneesawang et al. 2002) methods after 5 iterations
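For readers unfamiliar with the evaluation measures, precision and recall over the top-k results can be computed as in this small sketch. The function name and the toy identifiers are assumptions; precision is taken here as hits divided by k and recall as hits divided by the number of relevant images in the database.

```python
def precision_recall_at_k(ranked_ids, relevant_ids, k):
    """Precision and recall of the first k retrieved items."""
    top = ranked_ids[:k]
    hits = sum(1 for item in top if item in relevant_ids)
    return hits / k, hits / len(relevant_ids)

# Hypothetical ranking returned for one query.
ranked = ["a", "b", "c", "d"]
relevant = {"a", "c", "e"}
precision, recall = precision_recall_at_k(ranked, relevant, k=2)
```

Averaging these values over all queries at several cut-off values of k gives the points of a precision-versus-recall graph.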

43

Objective System Performance (II)

- Experimental setup:
  - 100 queries
  - Retrieval accuracy of the PFRBFN and ARBFN methods in top 25 results

44

Subjective System Performance

- Experimental setup:
  - 150 queries
  - Retrieval accuracy of the PFRBFN, ARBFN and MARS methods in top 25 results

45

Case Study (I)

Objective: Looking for some home pets, especially dogs

(a) Initial retrieval results based on k-NN search

46

Case Study (II)

(b) Retrieval results with user marking the cat images as irrelevant

47

Case Study (III)

(c) Retrieval results with user marking the cat images as relevant

48

Case Study (IV)

(d) Retrieval results with user marking the cat images as fuzzy

49

Pseudo-labeling and Machine Learning

50

Small Sample Problem

- Small sample problem
  - Relevance feedback in CBIR systems uses only the labeled images for learning
  - Image labeling is a time-consuming task and users are often unwilling to label too many images
- Challenge
  - Learning from a small number of training samples

What is the solution?

51

Incorporating Unlabeled Images

- Discriminant Expectation Maximization (D-EM)
  - Incorporate unlabeled samples to estimate the underlying probability distribution
  - Y. Wu et al. 2000
- Transductive support vector machine (TSVM)
  - Incorporate unlabeled images to train an initial SVM, followed by standard active learning
  - T. Joachims et al. 1999, L. Wang et al. 2003
- Support vector machine (SVM) with prior knowledge
  - Incorporate prior knowledge into the SVM
  - L. Wang et al. 2004

52

Pseudo-labeling

- Idea of existing pseudo-labeling methods
  - Obtaining a large number of labeled images is labor intensive, while unlabeled images are readily available
  - Utilize the unlabeled images to augment the available labeled images
  - Each selected, unlabeled image is assigned a pseudo-label of either ‘relevant’ or ‘irrelevant’ based on a proposed algorithm
- Shortcoming:
  - Pseudo-labeled images are fuzzy in nature as they are not explicitly labeled by the users
  - The potential imprecision embedded in their class information should be taken into consideration

53

Computational Intelligence in Pseudo-labeling

- Fuzzy support vector machine with active learning
- Soft membership estimation of unlabeled images
- Label propagation
- Two-stage clustering of labeled samples into relevant or irrelevant classes

54

Active Learning

- Active learning is designed to achieve maximal information gain or minimize uncertainty in decision making
- Active learning selects the most informative samples to query the users for labeling
- SVM-based active learning
  - Select samples that are closest to the current SVM decision boundary as the most informative points
  - Samples that are farthest away from the boundary and on the positive side are considered the most relevant images
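The two selection rules can be sketched directly on the signed SVM decision values of the unlabeled images. The helper names and the sample values are assumptions for illustration; the decision values themselves would come from whatever SVM implementation is in use.

```python
def select_for_labeling(decision_values, n_query):
    """Indices of the n_query samples closest to the decision boundary."""
    order = sorted(range(len(decision_values)),
                   key=lambda i: abs(decision_values[i]))
    return order[:n_query]

def most_relevant(decision_values, n_return):
    """Indices of the n_return samples farthest on the positive side."""
    order = sorted(range(len(decision_values)),
                   key=lambda i: decision_values[i], reverse=True)
    return order[:n_return]

# Hypothetical signed distances of five unlabeled images to the boundary.
values = [2.1, -0.1, 0.4, -1.5, 0.05]
to_label = select_for_labeling(values, 2)
to_show = most_relevant(values, 2)
```

Images with small absolute decision values are the ones the classifier is least sure about, which is why labeling them is expected to be the most informative.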

55

Proposed Pseudo-labeling Framework for CBIR

(1) Perform a k-nearest neighbors (k-NN) search and return the top most similar images to the user for feedback

(2) The user provides feedback as either relevant or irrelevant on the images, and an initial SVM classifier is trained

(3) SVM active learning is employed by selecting $l_0$ unlabeled images that are closest to the current SVM decision boundary for the user to label

(4) After the user labels the $l_0$ images, add them to the previously labeled training set

(5) A two-stage clustering is performed separately on the labeled relevant and irrelevant images. The formed clusters are then used for unlabeled image selection and pseudo-label assignment

(6) A fuzzy metric is employed to evaluate the soft relevance membership of the pseudo-labeled images

(7) An FSVM is trained using a hybrid of the labeled and pseudo-labeled images

(8) Repeat steps (3)-(7) until the retrieval performance is satisfactory

56

Support Vector Machines

Map the input data into a high-dimensional feature space through a mapping function

Find the optimal separating hyperplane with minimal classification errors in this space:

\[
\mathbf{w} \cdot \mathbf{z} + b = 0
\]

57

Soft Errors of Fuzzy SVM

Figure source: Ronan Collobert, Dalle Molle Institute for Perceptual Artificial Intelligence (IDIAP)

58

Soft Membership Estimation of Pseudo-labeled Images

Objective: find a fuzzy membership mapping that assigns a relevance value in [0, 1] to each pseudo-labeled image

The membership function depends on two factors:

- Distance of the pseudo-labeled images to other labeled images ($w_1$)
- Agreement between the predicted labels obtained during the pseudo-labeling process and using the trained SVM ($w_2$)

\[
w_1(\mathbf{x}_P) =
\begin{cases}
\exp\left( -a_1 \dfrac{\min_i (\mathbf{x}_P - \mathbf{v}_{Si})^{\mathrm{T}} (\mathbf{x}_P - \mathbf{v}_{Si})}{\min_j (\mathbf{x}_P - \mathbf{v}_{Oj})^{\mathrm{T}} (\mathbf{x}_P - \mathbf{v}_{Oj})} \right), & \text{if } \dfrac{\min_i (\mathbf{x}_P - \mathbf{v}_{Si})^{\mathrm{T}} (\mathbf{x}_P - \mathbf{v}_{Si})}{\min_j (\mathbf{x}_P - \mathbf{v}_{Oj})^{\mathrm{T}} (\mathbf{x}_P - \mathbf{v}_{Oj})} < 1 \\
0, & \text{otherwise}
\end{cases}
\]

\[
w_2(\mathbf{x}_P) =
\begin{cases}
\dfrac{1}{1 + \exp(-a_2 y)}, & \text{pseudo-label is positive} \\
\dfrac{1}{1 + \exp(a_2 y)}, & \text{otherwise}
\end{cases}
\]

\[
g(\mathbf{x}_P) = w_1(\mathbf{x}_P) \, w_2(\mathbf{x}_P)
\]
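The two factors and their product can be sketched as below. Several points are assumptions made for this illustration: the cluster centers with subscripts S and O are read as belonging to the same class as the pseudo-label and to the opposite class respectively, y is read as the SVM decision value of the image, and the constants a1 and a2 are given arbitrary values.

```python
import math

def sq_dist(x, v):
    return sum((a - b) ** 2 for a, b in zip(x, v))

def w1(x, same_centers, opposite_centers, a1=1.0):
    """Distance factor: high when x is much closer to its own class clusters."""
    d_same = min(sq_dist(x, v) for v in same_centers)
    d_opp = min(sq_dist(x, v) for v in opposite_centers)
    if d_opp == 0.0:
        return 0.0  # x coincides with an opposite-class center
    ratio = d_same / d_opp
    return math.exp(-a1 * ratio) if ratio < 1.0 else 0.0

def w2(y, pseudo_label_positive, a2=1.0):
    """Agreement factor between the pseudo-label and the SVM output y."""
    z = -a2 * y if pseudo_label_positive else a2 * y
    return 1.0 / (1.0 + math.exp(z))

def membership(x, same_centers, opposite_centers, y, pseudo_label_positive):
    """g(x) = w1(x) * w2(x), a value in [0, 1]."""
    return w1(x, same_centers, opposite_centers) * w2(y, pseudo_label_positive)

# Hypothetical pseudo-labeled image and cluster centers.
g = membership([0.0, 0.0], [[1.0, 0.0]], [[2.0, 0.0]], y=0.0,
               pseudo_label_positive=True)
```

An image that lies closer to the opposite class than to its own receives a membership of zero and therefore has no influence on the subsequent training.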

59

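Fuzzy Support Vector Machine

Fuzzy membership $\mu_i$ is introduced to reflect different contributions of the input (labeled and pseudo-labeled images)

\[
\begin{aligned}
\text{minimize} \quad & \frac{1}{2} \|\mathbf{w}\|^2 + C \sum_{i=1}^{n} \mu_i \xi_i \\
\text{subject to} \quad & y_i (\mathbf{w} \cdot \mathbf{z}_i + b) \geq 1 - \xi_i, \quad \xi_i \geq 0, \quad i = 1, \ldots, n
\end{aligned}
\]

The optimization problem of the FSVM can be transformed into its dual problem

\[
\begin{aligned}
\text{maximize} \quad & \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j) \\
\text{subject to} \quad & \sum_{i=1}^{n} y_i \alpha_i = 0, \quad 0 \leq \alpha_i \leq \mu_i C, \quad i = 1, \ldots, n
\end{aligned}
\]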

60

System Performance (I)

- Experimental setup:
  - 100 queries
  - Five feedback iterations
  - Average precision-versus-recall graphs of the PLFSVM and SVM (S. Tong et al. 2001) methods after the first iteration of active learning for $l_0 = 10$

61

System Performance (II)

Retrieval accuracy of the PLFSVM and SVM methods in top 10 results for $l_0 = 10$

62

Region-based Image Retrieval (RBIR) Using Neural Network

63

Challenges in RBIR Systems

- Issues in RBIR systems
  - Choice of image similarity metric to decrease the impact of imprecise segmentation
  - Determine the importance of different image regions
  - Learning algorithm to progressively improve the retrieval accuracy through user interaction

64

Variable-Length Radial Basis Function Network (VLRBFN)

We introduce a VLRBFN to model and learn user perception of image similarity in RBIR systems

- The VLRBFN is developed for variable-length region-based image representation in RBIR systems
- A systematic region weight learning strategy is introduced
- A kernel function centered on variable-length representation (VLR) is introduced

65

General RBIR Overview

[Block diagram]

- Offline Processing: Database Creation, Image Segmentation (mean-shift algorithm), Feature Extraction (color, texture)
- Online Querying: Query Selection, Image Segmentation, Feature Extraction, Similarity Comparison (EMD distance), Display and Relevance Feedback, Relevance Feedback, Machine Learning (VLRBFN)

66

Region-based Image Representation

- Mean-shift algorithm
  - Mode seeking in non-parametric distributions
  - Partition images into homogeneous regions
- Extracted features for the regions
  - Color feature: color moments
  - Texture feature: wavelet moments
- Perceptual determination of region weights
  - Reflect the importance of the regions
  - Criteria: area, location

67

Region Segmentation

68

Region Weight Learning

- Perceptual determination
  - Determine the most salient region in each relevant image such that these regions jointly capture the semantic class of the user’s query
  - Use perceptual importance (area and location) together with feedback information
- Density estimation
  - Utilize the set of selected regions from all the relevant images for weight estimation of other unseen regions
  - Employ a one-class SVM (OCSVM) to estimate their density distribution
  - The importance of a test region is evaluated by determining how it differs from the estimated distribution

69

Image Similarity Metric

- Earth Mover’s Distance (EMD)
  - Measures the least amount of work needed to transform one image distribution into the other
  - Operates on variable-length representations
  - Suitable for region-based image similarity comparison
  - Y. Rubner et al. 2002
- Variable-length representation (VLR)
  - Each image is described by sets of weighted features
  - The size of the set may vary depending on the number of regions in different images

70

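Structure of Progressive VLRBFN

[Network diagram: input layer, hidden layer, output layer. The input is $\mathbf{X} = \{(\mathbf{C}_k, W_k)\}_{k=1}^{m}$, i.e. the pairs $(\mathbf{C}_1, W_1), (\mathbf{C}_2, W_2), \ldots, (\mathbf{C}_m, W_m)$. The hidden units are connected to the output through the weights $w^t_1, w^t_2, \ldots, w^t_i, \ldots, w^t_c$. The response $U_t(\mathbf{X})$ is added to $F_{t-1}(\mathbf{X})$ to give the output $F_t(\mathbf{X})$.]

\[
F_t(\mathbf{X}) =
\begin{cases}
F_{t-1}(\mathbf{X}) + U_t(\mathbf{X}), & t = 1, 2, \ldots, T \\
0, & t = 0
\end{cases}
\]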

71

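VLRBFN Training (I)

Kernel function

\[
f(\mathbf{X}_j, \mathbf{V}^t_i, \sigma^t_i) = \exp\left( -\frac{\mathrm{EMD}^2(\mathbf{X}_j, \mathbf{V}^t_i)}{2 (\sigma^t_i)^2} \right), \quad t = 1, 2, \ldots, T, \quad i = 1, 2, \ldots, c
\]

where

\[
\mathrm{EMD}(\mathbf{X}_p, \mathbf{X}_q) = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij} \, d(\mathbf{C}_{pi}, \mathbf{C}_{qj})}{\sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij}}
\]

VLRBFN output

\[
F_t(\mathbf{X}_j) =
\begin{cases}
F_{t-1}(\mathbf{X}_j) + \sum_{i=1}^{c} w^t_i \, f(\mathbf{X}_j, \mathbf{V}^t_i, \sigma^t_i), & t = 1, 2, \ldots, T \\
0, & t = 0
\end{cases}
\]

Error function

\[
E_t = \frac{1}{2} \sum_{j=1}^{N_T} e_{jt}^2 = \frac{1}{2} \sum_{j=1}^{N_T} \left( Y_t(\mathbf{X}_j) - F_t(\mathbf{X}_j) \right)^2
\]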
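Computing the general EMD requires solving a transportation problem, which is beyond a short example. The sketch below covers only a special case chosen for illustration: each region is described by a single scalar feature, the region weights of each image are normalised to sum to one, and the ground distance is the absolute difference. In that case the EMD equals the area between the two cumulative weight distributions. The function names and sample signatures are hypothetical.

```python
import math

def emd_1d(sig_p, sig_q):
    """EMD between two sets of (feature, weight) pairs with scalar features.

    Weights are normalised so that both signatures carry unit total mass.
    """
    total_p = sum(w for _, w in sig_p)
    total_q = sum(w for _, w in sig_q)
    events = sorted([(x, w / total_p) for x, w in sig_p] +
                    [(x, -w / total_q) for x, w in sig_q])
    work, cdf_diff, prev = 0.0, 0.0, events[0][0]
    for x, delta in events:
        work += abs(cdf_diff) * (x - prev)  # mass carried across this gap
        cdf_diff += delta
        prev = x
    return work

def emd_kernel(sig_x, sig_center, sigma):
    """Gaussian kernel on the EMD, exp(-EMD^2 / (2 sigma^2))."""
    return math.exp(-emd_1d(sig_x, sig_center) ** 2 / (2.0 * sigma ** 2))

# Two images with different numbers of regions (variable-length signatures).
image_a = [(0.0, 0.5), (2.0, 0.5)]
image_b = [(1.0, 1.0)]
distance = emd_1d(image_a, image_b)
```

Because the kernel depends on the images only through their EMD, the two signatures may contain different numbers of regions.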

72

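VLRBFN Training (II)

Gradient descent learning

\[
\boldsymbol{\theta} = \arg\min_{\boldsymbol{\theta} \in \boldsymbol{\Theta}} (E_t) = \arg\min_{\boldsymbol{\theta} \in \boldsymbol{\Theta}} \left( \frac{1}{2} \sum_{j=1}^{N_T} \left( Y_t(\mathbf{X}_j) - F_t(\mathbf{X}_j) \right)^2 \right)
\]

(1) Weight estimation at the l-th learning iteration

\[
\frac{\partial E_t(l)}{\partial w^t_i(l)} = -\sum_{j=1}^{N_T} e_{jt}(l) \, f(\mathbf{X}_j, \mathbf{V}^t_i(l), \sigma^t_i(l))
\]

\[
w^t_i(l+1) = w^t_i(l) - \eta_1 \frac{\partial E_t(l)}{\partial w^t_i(l)}, \quad i = 1, 2, \ldots, c
\]

(2) Width estimation at the l-th learning iteration

\[
\frac{\partial E_t(l)}{\partial \sigma^t_i(l)} = -w^t_i(l) \sum_{j=1}^{N_T} e_{jt}(l) \, f(\mathbf{X}_j, \mathbf{V}^t_i(l), \sigma^t_i(l)) \, \frac{\mathrm{EMD}^2(\mathbf{X}_j, \mathbf{V}^t_i(l))}{(\sigma^t_i(l))^3}
\]

\[
\sigma^t_i(l+1) = \sigma^t_i(l) - \eta_2 \frac{\partial E_t(l)}{\partial \sigma^t_i(l)}, \quad i = 1, 2, \ldots, c
\]

(3) Repeat steps (1)-(2) until convergence or a maximum number of iterations is reached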

73

System Performance (I)

- Experimental setup
  - 100 queries
  - Average precision-versus-recall graph in top 25 returned images after 1 and 5 feedback iterations
- Comparison
  - GRBFN method, K. H. Yap et al. 2005
  - RQPM and RSVM methods, F. Jing et al. 2004

74

System Performance (II)

- Observations of the developed method
  - Better retrieval performance than both the GRBFN and RQPM methods
  - Comparable retrieval performance to that of the RSVM method, but computationally more efficient

75

System Performance (III)

- Observation
  - Consistently provides superior results when compared with the GRBFN and RQPM methods
  - Achieves comparable retrieval performance to that of the RSVM method
  - Seven times faster than the RSVM method

76

Peer Tagging and Knowledge Propagation

77

Hybrid Keyword- and Content-based Image Retrieval

- Retrieving images via low-level features alone cannot achieve satisfactory results
- Keywords are the best descriptors for image semantics
- Build interactive CBIR systems which support high-level semantic queries
  - Integrate the strengths of content- and keyword-based image indexing and retrieval algorithms
  - Alleviate their respective difficulties

78

Peer Tagging in Image Retrieval

79

Peer Tagging

- Motivation
  - Challenge of the semantic gap
  - Full manual annotation of a complete database is tedious and expensive
- Peer tagging
  - Distributed users (particularly Internet users) provide tags (keywords) to shared images
- Advantages of peer tagging
  - Distribute annotation workload to multiple voluntary contributors
  - Low cost, simple, easy, flexible and effort-sharing
- Online media (image and video) tagging systems
  - Flickr, YouTube

80

Issues in Peer Tagging

- Human-computer interface (HCI) for image tagging
  - Tagging granularity (tagging on the segmented regions or the global images)
  - Provide a user-friendly environment for image tagging
- Tag generation and formation
  - Tag reliability (spelling mistakes, inaccurate, irrelevant, or esoteric tags)
  - Suggestive tagging
  - Tag cloud in Flickr.com
- Tag clustering
  - Identify groups of tags sharing similar semantic concepts
  - Tag clustering in Flickr.com

81

Knowledge Propagation

- Issues involved in the peer tagging services
  - The correlation between the tags and the media contents needs to be explored
  - Only a fraction of the images in the complete collection is annotated
- Knowledge propagation
  - Propagate keywords from a small sample to the whole population

[Figure: keyword-annotated images]

82


- Photographs the animal
- Sends MMS to zoo’s network server
- Matches the animal and retrieves more information
- Shares his photograph with the zoo’s database

[Phone screen labels: Send, Save; Gorilla, Monkey, Bear, Lion; Previous, More, Share]

Match: Quick Facts

- Gorillas are the biggest primates
- Gorillas, chimpanzees and humans are classified under the same family, Hominidae
- Gorillas are mainly folivorous, though they may supplement their diet with insects and small animals
- Gorillas live to approximately 35-40 years in the wild and about 50 years in captivity

Q: What is the key?
A: Domain information

83


Basic idea: perform analysis of media contents and their correspondence with the annotated keywords. Media content descriptors can be extracted and represented using feature vectors. An SVM-based classifier can then be trained to map the feature vectors to soft labels reflecting the likelihood that a keyword matches an image or video. The SVM-based classifiers are trained on the media data annotated by peer users. Once these classifiers are fully trained, they are used to propagate keywords, with associated probabilities, to other unannotated media in the data collection.
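The propagation step can be illustrated with a deliberately simplified stand-in. Instead of the SVM-based classifiers described above, the sketch below scores each keyword with a Gaussian-weighted vote over the annotated images, which also yields a soft label in [0, 1]. The function name, the threshold, the kernel width and the toy data are all assumptions for illustration.

```python
import math

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def propagate_keywords(annotated, unannotated, sigma=1.0, threshold=0.3):
    """Assign soft keyword scores to unannotated items.

    annotated: list of (feature_vector, set_of_keywords).
    unannotated: dict mapping item name -> feature_vector.
    Returns: dict mapping item name -> {keyword: score >= threshold}.
    """
    keywords = sorted({k for _, tags in annotated for k in tags})
    result = {}
    for name, feat in unannotated.items():
        weights = [math.exp(-sq_dist(feat, f) / (2.0 * sigma ** 2))
                   for f, _ in annotated]
        total = sum(weights)
        scores = {}
        for k in keywords:
            voted = sum(w for w, (_, tags) in zip(weights, annotated) if k in tags)
            score = voted / total
            if score >= threshold:
                scores[k] = score
        result[name] = scores
    return result

# Hypothetical peer-annotated samples and one unannotated image.
annotated = [
    ([0.0, 0.0], {"gorilla"}),
    ([0.1, 0.0], {"gorilla", "primate"}),
    ([5.0, 5.0], {"lion"}),
]
propagated = propagate_keywords(annotated, {"img1": [0.05, 0.0]})
```

Replacing the weighted vote with one trained classifier per keyword gives the structure described in the paragraph above, with the classifier output playing the role of the score.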

84

Conclusion

85

Conclusion

This tutorial shows that computational intelligence is instrumental in addressing some challenging issues in image indexing and retrieval:

- Issue: Fuzzy user perception
  - Technique: Soft relevance framework to integrate the users’ fuzzy perception of visual contents
- Issue: Small sample problem
  - Technique: Pseudo-labeling, active learning and FSVM
- Issue: Image region perceptual importance
  - Technique: Perceptual region estimation and VLRBFN
- Issue: Peer tagging and knowledge propagation
  - Technique: Develop domain-based keyword propagation

86

Selected Recent Publications

- Kim-Hui Yap and Kui Wu, “A soft relevance framework in content-based image retrieval systems,” IEEE Trans. Circuits and Systems for Video Technology, vol. 15, no. 12, pp. 1557–1568, Dec. 2005.
- Kui Wu and Kim-Hui Yap, “Fuzzy SVM for content-based image retrieval - A pseudo-label support vector machine framework,” IEEE Computational Intelligence Magazine, vol. 1, pp. 10–16, May 2006.
- K. Wu and K.-H. Yap, “Content-based image retrieval using fuzzy perceptual feedback,” accepted for publication in Multimedia Tools and Applications, Kluwer.
- K. Wu and K.-H. Yap, “A perceptual subjectivity notion in interactive content-based image retrieval systems,” in Intelligent Multimedia Processing with Soft Computing, Springer-Verlag, pp. 55–73, 2005.
- Kui Wu and Kim-Hui Yap, “Region-based image retrieval using radial basis function network,” in Proc. IEEE Int. Conf. Multimedia & Expo, Toronto, Canada, 2006.

87

References (I)

- M. Flickner, H. Sawhney, W. Niblack, J. Ashley, Q. Huang, B. Dom, M. Gorkani, J. Hafner, D. Lee, D. Petkovic, D. Steele, and P. Yanker, “Query by image and video content: The QBIC system,” IEEE Computer, vol. 28, no. 9, pp. 23–32, Sept. 1995.
- Y. Rui, T. S. Huang, and S. Mehrotra, “Content-based image retrieval with relevance feedback in MARS,” in Proc. IEEE Int. Conf. Image Processing, Washington D.C., USA, pp. 815–818, 1997.
- G. Amarnath and J. Ramesh, “Visual information retrieval,” Communications of the ACM, vol. 40, no. 5, pp. 70–79, May 1997.
- A. Pentland, R. W. Picard, and S. Sclaroff, “Photobook: content-based manipulation of image databases,” International Journal of Computer Vision, vol. 18, no. 3, pp. 233–254, June 1996.
- J. R. Smith and S. F. Chang, “VisualSEEk: a fully automated content-based image query system,” in Proc. ACM Multimedia, pp. 87–98, Nov. 1996.
- T. Gevers and A. W. M. Smeulders, “PicToSeek: Combining color and shape invariant features for image retrieval,” IEEE Trans. Image Processing, vol. 9, pp. 102–119, 2000.
- I. J. Cox, M. L. Miller, T. P. Minka, T. V. Papathomas, and P. N. Yianilos, “The Bayesian image retrieval system, PicHunter: Theory, implementation, and psychophysical experiments,” IEEE Trans. Image Processing, vol. 9, no. 1, pp. 20–37, 2000.
- A. W. M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, “Content-based image retrieval at the end of the early years,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 12, pp. 1349–1380, Dec. 2000.

88

References (II)

- Y. Rui, T. S. Huang, M. Ortega, and S. Mehrotra, “Relevance feedback: A power tool for interactive content-based image retrieval,” IEEE Trans. Circuits and Systems for Video Technology, vol. 8, no. 5, pp. 644–655, 1998.
- W. Y. Ma and B. S. Manjunath, “NeTra: A toolbox for navigating large image databases,” Multimedia Systems, vol. 7, no. 3, pp. 184–198, 1999.
- J. Z. Wang, J. Li, and G. Wiederhold, “SIMPLIcity: Semantic-sensitive integrated matching for picture libraries,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 9, pp. 947–963, 2001.
- C. Carson, S. Belongie, H. Greenspan, and J. Malik, “Blobworld: Image segmentation using expectation-maximization and its application to image querying,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 8, pp. 1026–1038, 2002.
- Y. X. Chen and J. Z. Wang, “A region-based fuzzy feature matching approach to content-based image retrieval,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 9, pp. 1252–1267, 2002.
- F. Jing, M. J. Li, H. J. Zhang, and B. Zhang, “Relevance feedback in region-based image retrieval,” IEEE Trans. Circuits and Systems for Video Technology, vol. 14, no. 5, pp. 672–681, 2004.
- R. Zhang and Z. Zhang, “Hidden semantic concept discovery in region based image retrieval,” in Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, pp. 996–1001, June 2004.
- Z. Stejic, Y. Takama, and K. Hirota, “Relevance feedback-based image retrieval interface incorporating region and feature saliency patterns as visualizable image similarity criteria,” IEEE Trans. Industrial Electronics, vol. 50, no. 5, pp. 839–852, 2003.

89

References (III)

- P. Muneesawang and L. Guan, “Automatic machine interactions for content-based image retrieval using a self-organizing tree map architecture,” IEEE Trans. Neural Networks, vol. 13, no. 4, pp. 821–834, July 2002.
- Y. Wu, Q. Tian, and T. S. Huang, “Discriminant-EM algorithm with application to image retrieval,” in Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, South Carolina, pp. 222–227, 2000.
- L. Wang and K. L. Chan, “Bootstrapping SVM active learning by incorporating unlabelled images for image retrieval,” in Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, pp. 629–634, 2003.
- S. Tong and E. Chang, “Support vector machine active learning for image retrieval,” in Proc. ACM Int. Conf. Multimedia, Ottawa, Canada, pp. 107–118, 2001.
- Y. Rubner, C. Tomasi, and L. Guibas, “The Earth Mover’s Distance as a metric for image retrieval,” International Journal of Computer Vision, vol. 40, pp. 99–123, 2000.
- J. Jeon, V. Lavrenko and R. Manmatha, “Automatic image annotation and retrieval using cross-media relevance models,” in Proc. Int. ACM SIGIR Conf. Research and Development in Information Retrieval, pp. 119–126, 2003.
- E. Chang, G. Kingshy, G. Sychay, and G. Wu, “CBSA: Content-based soft annotation for multimodal image retrieval using Bayes point machines,” IEEE Trans. Circuits and Systems for Video Technology, vol. 13, no. 1, pp. 26–38, Jan. 2003.
- V. N. Vapnik, The Nature of Statistical Learning Theory. New York: Springer-Verlag, 1995.
- S. Haykin, Neural Networks: A Comprehensive Foundation. Upper Saddle River, NJ: Prentice-Hall, 1999.

90

THANK YOU! ☺

Questions?
