Computational Intelligence in Media Indexing and Retrieval

Dr. Kim-Hui Yap, School of Electrical & Electronic Engineering, Nanyang Technological University, April 2007


Page 1: Computational Intelligence in Media Indexing and Retrieval

1

Computational Intelligence in Media Indexing and Retrieval

Dr. Kim-Hui Yap

School of Electrical & Electronic Engineering Nanyang Technological University

April 2007

Page 2: Computational Intelligence in Media Indexing and Retrieval

2

Goals of this Tutorial

Introduce media indexing and retrieval
  Motivation
  Background
  Applications

Discuss some challenges faced by media (image) retrieval
Discuss how Computational Intelligence can be used to address these issues

Page 3: Computational Intelligence in Media Indexing and Retrieval

3

Media

This tutorial uses image indexing and retrieval to demonstrate the relevant concepts and techniques
Similar ideas can be extended to other media retrieval systems

Media types: graphics, image, video, animation, audio, speech

Page 4: Computational Intelligence in Media Indexing and Retrieval

4

Media Retrieval Methodologies

Text-based retrieval
  Keyword-based
  Annotated by expert annotators

Content-based retrieval
  Centers on audio-visual content analysis
  Query-by-example

Metadata-based retrieval
  Google/Yahoo search engines
  Web mining

Peer tagging
  Annotation by voluntary peer users in a distributed manner
  E.g. Flickr, YouTube

Page 5: Computational Intelligence in Media Indexing and Retrieval

5

Motivations

Motivations for the development of efficient media retrieval systems:

Explosion in the volume of media data over the Internet and wireless networks
Increasing popularity of imaging devices such as digital cameras, prevalence of low-cost, high-capacity storage devices, and increasing proliferation of image data over communication networks
Emergence of a new consumerism in which media technologies meet consumers' needs

Page 6: Computational Intelligence in Media Indexing and Retrieval

6

Applications

Education (e.g. mobile learning)
Military (e.g. surveillance applications)
Healthcare (e.g. biomedicine)
Information (e.g. media search engines)
Social (e.g. MySpace, Facebook)

Page 7: Computational Intelligence in Media Indexing and Retrieval

7

Applications

Page 8: Computational Intelligence in Media Indexing and Retrieval

8

Tutorial Outline

Introduction
  Image indexing and retrieval basics
  Issues and challenges

Computational Intelligence in Image Retrieval
  Soft relevance framework for fuzzy perception
  Pseudo-labeling and machine learning
  Region-based image retrieval using neural network
  Peer tagging and knowledge propagation

Conclusion

Page 9: Computational Intelligence in Media Indexing and Retrieval

9

GUI

[Figure: screenshot of the demonstration GUI. Text in the figure: "Media retrieval: see how it can be done using computational intelligence, without the Force. Find the image… you must."]

Page 10: Computational Intelligence in Media Indexing and Retrieval

10

Introduction


Page 11: Computational Intelligence in Media Indexing and Retrieval

11

Text-based Image Retrieval

Traditional text-based image retrieval search engines
  Manual annotation of images by a few experts
  Use keywords to denote images

Issues faced by traditional text-based image retrieval
  Exhausting and time-consuming
  Highly individual and subjective
  "An image is worth a thousand words"
  Some images are hard to describe using keywords

[Figure: a scenery image annotated "Lake, house, tree, autumn, scenery…"]
[Figure: an image with complex textures that is hard to describe using keywords]

Page 12: Computational Intelligence in Media Indexing and Retrieval

12

Content-based Image Retrieval (CBIR)

[Block diagram. Components: Query, Image Database, Image Content Analysis, Feature Vector, Feature Database, Decision Making, Result Visualization]

Advantages over text-based image retrieval
  Alleviate intensive human labor
  Avoid subjectivity
  Offer an alternative way to perform a query, such as query-by-example (QBE)
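A minimal sketch of query-by-example, assuming each image has already been reduced to a numeric feature vector and that Euclidean distance is the similarity measure. The names `feature_db` and `query_by_example`, and the toy three-dimensional vectors, are illustrative and not taken from the tutorial.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def query_by_example(query_vec, feature_db, k=3):
    """Return the ids of the k database images closest to the query vector."""
    ranked = sorted(feature_db, key=lambda image_id: euclidean(query_vec, feature_db[image_id]))
    return ranked[:k]

# Hypothetical feature database: image id -> feature vector.
feature_db = {
    "sunset": [0.9, 0.1, 0.0],
    "forest": [0.1, 0.8, 0.1],
    "beach": [0.6, 0.3, 0.1],
}
top = query_by_example([0.8, 0.15, 0.05], feature_db, k=2)
print(top)
```

In a full system the feature database would be built offline by the content-analysis stage, and the ranked list would be passed to result visualization.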

Page 13: Computational Intelligence in Media Indexing and Retrieval

13

Existing CBIR Systems

QBIC (M. Flickner et al. 1995)
MARS (Y. Rui et al. 1998)
Virage (G. Amarnath et al. 1997)
Photobook (A. Pentland et al. 1996)
VisualSEEk (J. R. Smith et al. 1996)
PicToSeek (T. Gevers et al. 1996)
PicHunter (I. J. Cox et al. 2000)

Page 14: Computational Intelligence in Media Indexing and Retrieval

14

Peer Tagging-based Image Retrieval

Distributed users (particularly Internet users) provide tags (keywords) to shared images
Low cost, simple, easy, flexible and effort-sharing
Examples: Flickr, YouTube

Page 15: Computational Intelligence in Media Indexing and Retrieval

15

Issues and Challenges


Page 16: Computational Intelligence in Media Indexing and Retrieval

16

Challenging Problems in Image Retrieval

Semantic gap
Small sample problems
Image region perceptual importance
Peer tagging and knowledge propagation

Page 17: Computational Intelligence in Media Indexing and Retrieval

17

Semantic Gap

A semantic gap exists between low-level visual features (color, texture, shape) and high-level human perception
Uncertainty in the correspondence between:
  The information that one can extract from the visual data
  The interpretation that the same data have for a user in a given situation

[Figure: the semantic gap. CBIR system: "Color: red, Texture: ruffled, Shape: round". Users: "Flower, Rose, Plant".]

Page 18: Computational Intelligence in Media Indexing and Retrieval

18

Visual Similarity vs Semantic Similarity

Can we address this perceptual ambiguity?

Page 19: Computational Intelligence in Media Indexing and Retrieval

19

Small Sample Problem

Learning from a small number of training samples is a challenging problem
Image labeling is a time-consuming task and users are often unwilling to label too many images
Can Computational Intelligence help to resolve this challenge?

Page 20: Computational Intelligence in Media Indexing and Retrieval

20

Image Region Perceptual Importance

Global features sometimes fail to match users' object-level perceptions
How do we determine the importance of different image regions and compare image similarity based on these local regions?

Page 21: Computational Intelligence in Media Indexing and Retrieval

21

Tagging and Knowledge Propagation

Tagging can be cumbersome in certain cases, e.g. mobile media
Users sometimes will not annotate all the images
Many existing image databases are not fully annotated
How do we use the correlation between visual content and annotated text to propagate keywords from a small set of samples to the rest of the database?

Page 22: Computational Intelligence in Media Indexing and Retrieval

22

What is Computational Intelligence (CI)?

A field of study that attempts to simulate human intelligence using computational algorithms

[Diagram: Computational Intelligence Techniques: Neural Networks, Fuzzy Logic, Evolutionary Computations, Support Vector Machines]

Page 23: Computational Intelligence in Media Indexing and Retrieval

23

Why Use Computational Intelligence in Image Retrieval?

Computational intelligence is used in image retrieval because of its capability for systematic signal identification, intelligent information integration, and robust optimization.
We will demonstrate how these techniques can be employed to address some of the challenges faced by image retrieval systems.

Page 24: Computational Intelligence in Media Indexing and Retrieval

24

Structured Levels of Image Content Analysis

This tutorial will introduce the development of image retrieval systems from the low level (color, texture, shape), through the medium level (regions of attributes), to the high level (keywords, tags).

[Layer diagram]
Media Layer: images
Feature Layer: color, texture, shape
Object Layer: regions of attributes
Concept Layer: keywords, tags

Page 25: Computational Intelligence in Media Indexing and Retrieval

25

Soft Relevance Framework for Fuzzy Perception


Page 26: Computational Intelligence in Media Indexing and Retrieval

26

Interactive CBIR Systems

[Block diagram. Components: Query, Image Database, Image Content Analysis, Feature Vector, Feature Database, Decision Making, Result Visualization, Relevance Feedback]

Page 27: Computational Intelligence in Media Indexing and Retrieval

27

Relevance Feedback

Why relevance feedback?
  Narrow the semantic gap
  Reduce user perceptual subjectivity

[Figure: relevance feedback loop. Labels: Initial retrieval result, Display, User feedback, System learning, Improved result after learning]

Page 28: Computational Intelligence in Media Indexing and Retrieval

28

Fuzzy User Perception

[Figure: a query image and its retrieval result. Caption: "Relevant or irrelevant?"]

Page 29: Computational Intelligence in Media Indexing and Retrieval

29

Previous Approaches

Binary labeling (Y. Rui et al. 1997, P. Muneesawang et al. 2002)
  Binary feedback: relevant or irrelevant
  Advantage: simple, easy to implement
  Disadvantage: hard decision, crisp logic that ignores the degree of relevance

Multi-level labeling (Y. Rui et al. 1998, X. S. Zhou et al. 2001)
  Feedback with degree of relevance: highly relevant, relevant, no opinion, irrelevant, highly irrelevant
  Advantage: more information due to detailed description
  Disadvantage: cumbersome and tedious

Page 30: Computational Intelligence in Media Indexing and Retrieval

30

Hierarchical Structure of User Information Priorities

Page 31: Computational Intelligence in Media Indexing and Retrieval

31

Computational Intelligence in Soft Relevance Framework

Fuzzy interpretation
  Integrate potential imprecision of user perception into relevance feedback

Fuzzy labeling
  Relevant, irrelevant and fuzzy labels
  An a posteriori probability estimator is used to evaluate the relevance of fuzzy images

Machine learning
  A progressive fuzzy radial basis function network (PFRBFN) is developed to learn the user's information needs

Page 32: Computational Intelligence in Media Indexing and Retrieval

32

Schematic Overview

[Block diagram]
Offline processing: Feature Extraction (color, texture), Database Creation
Online querying: Query Selection, Feature Extraction, Similarity Comparison (Euclidean distance), Display and Relevance Feedback, Fuzzy Relevance Feedback, Machine Learning (PFRBFN)

Page 33: Computational Intelligence in Media Indexing and Retrieval

33

Flowchart

1. Retrieve initial results based on k-nearest neighbor (k-NN) search
2. User feedback of relevant, irrelevant, and fuzzy images
3. Two-stage clustering to determine the cluster centers for relevant, irrelevant, and fuzzy subnets
4. Estimation of the soft relevance membership function
5. Construction and training of the PFRBFN
6. Retrieve new images from the database based on the trained PFRBFN
7. Have the termination criteria been satisfied? Yes: end. No: continue with another feedback round.

Page 34: Computational Intelligence in Media Indexing and Retrieval

34

Feature Extraction

Concatenate the features to form 170-dimensional feature vectors

| Feature | Description | Dimension |
| Color histogram | HSV space is chosen; the H, S and V components are uniformly quantized into 8, 2 and 2 bins respectively | 32 |
| Color auto-correlogram | The chessboard distance is chosen as the distance measure. The image is quantized into 4x4x4 = 64 colors in the RGB space [11] | 64 |
| Color moments | The first two moments (mean and standard deviation) from the R, G, B color channels are extracted | 6 |
| Wavelet moments | The wavelet transform is applied to the image with a 3-level decomposition; the mean and standard deviation of the transform coefficients form the feature vector | 20 |
| Gabor wavelet | Gabor wavelet filters spanning four scales (0.05, 0.1, 0.2 and 0.4) and six orientations ($\theta_0 = 0$, $\theta_{n+1} = \theta_n + \pi/6$) are applied to the image; the mean and standard deviation of the Gabor wavelet coefficients form the feature vector | 48 |
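As a rough illustration of the first row of the feature table, the sketch below builds the 32-bin HSV color histogram (8 x 2 x 2 uniform bins) with Python's standard `colorsys` module. It assumes the image is supplied as a list of 8-bit RGB tuples and normalizes the histogram to sum to one; the pixel format, the normalization, and the bin ordering are assumptions made here, not details given in the tutorial. The dictionary only restates the dimensions listed in the table.

```python
import colorsys

H_BINS, S_BINS, V_BINS = 8, 2, 2  # uniform quantization of H, S, V

# Dimensions as listed in the feature table.
FEATURE_DIMS = {
    "color_histogram": 32,
    "color_auto_correlogram": 64,
    "color_moments": 6,
    "wavelet_moments": 20,
    "gabor_wavelet": 48,
}

def hsv_histogram(pixels):
    """Normalized 32-bin HSV histogram from a list of (r, g, b) tuples in 0..255."""
    hist = [0.0] * (H_BINS * S_BINS * V_BINS)
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # Values equal to 1.0 are clamped into the last bin.
        hb = min(int(h * H_BINS), H_BINS - 1)
        sb = min(int(s * S_BINS), S_BINS - 1)
        vb = min(int(v * V_BINS), V_BINS - 1)
        hist[(hb * S_BINS + sb) * V_BINS + vb] += 1.0
    total = len(pixels)
    return [count / total for count in hist] if total else hist

# Toy "image": two red pixels, one blue pixel, one black pixel.
pixels = [(255, 0, 0), (255, 0, 0), (0, 0, 255), (0, 0, 0)]
hist = hsv_histogram(pixels)
print(len(hist), sum(FEATURE_DIMS.values()))
```

The remaining features (auto-correlogram, moments, wavelet and Gabor statistics) would be computed separately and concatenated with this histogram.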

Page 35: Computational Intelligence in Media Indexing and Retrieval

35

Main Features of PFRBFN

The PFRBFN is developed with a few main considerations:
  To integrate the unique batch process of user feedback
  To reduce the computational time of the training process
  To integrate the potential fuzzy feedback from the users

A two-stage clustering algorithm is employed to simplify the network by reducing the number of hidden neurons.
An efficient gradient descent-based learning strategy is employed to estimate the underlying network parameters by minimizing a cost function.

Page 36: Computational Intelligence in Media Indexing and Retrieval

36

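Schematic Diagram of PFRBFN

[Network diagram. Input layer: components $x_1, x_2, \ldots, x_R$ of the input vector. Subnets: P, N and F, with output weights $w_{P1}^{t}, \ldots, w_{P I_P}^{t}$, $w_{N1}^{t}, \ldots, w_{N I_N}^{t}$ and $w_{F1}^{t}, \ldots, w_{F I_F}^{t}$. Output layer: $U_t(\mathbf{x})$, combined with $F_{t-1}(\mathbf{x})$ to give $F_t(\mathbf{x})$.]

$$ \mathbf{x} = [x_1, x_2, \ldots, x_R]^{\mathsf T} $$

$$ F_t(\mathbf{x}) = \begin{cases} F_{t-1}(\mathbf{x}) + U_t(\mathbf{x}), & t = 1, 2, \ldots, T \\ 0, & t = 0 \end{cases} $$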

Page 37: Computational Intelligence in Media Indexing and Retrieval

37

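PFRBFN Training (I)

Two-stage clustering for PFRBFN subnet creation
  Subtractive clustering
  Fuzzy C-means (FCM)

A posteriori estimator for soft relevance membership estimation of fuzzy images

$$ s(\mathbf{x}_j) = \frac{1}{M} \sum_{m=1}^{M} \frac{p(\mathbf{x}_{jm} \mid \omega_r) P(\omega_r)}{p(\mathbf{x}_{jm} \mid \omega_r) P(\omega_r) + p(\mathbf{x}_{jm} \mid \omega_i) P(\omega_i)} $$

$$ p(\mathbf{x}_{jm} \mid \omega_q) = \frac{1}{(2\pi)^{d_m/2} \, |\boldsymbol{\Sigma}_m^{q}|^{1/2}} \exp\left[ -\frac{1}{2} (\mathbf{x}_{jm} - \boldsymbol{\mu}_m^{q})^{\mathsf T} (\boldsymbol{\Sigma}_m^{q})^{-1} (\mathbf{x}_{jm} - \boldsymbol{\mu}_m^{q}) \right] $$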
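The a posteriori estimator on this slide can be illustrated with a small sketch. To keep it dependency-free, it assumes every feature component is a scalar, so each class-conditional density is a one-dimensional Gaussian described by a (mean, variance) pair; the tutorial's estimator uses multivariate Gaussians. The function names, the equal class priors, and the parameter layout are assumptions made for illustration.

```python
import math

def gaussian_pdf(x, mean, var):
    """One-dimensional Gaussian density (the d_m = 1 case of the class-conditional pdf)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def soft_relevance(x, rel_params, irr_params, prior_rel=0.5):
    """Average, over the M components of x, of the posterior probability of relevance.

    x          : list of M scalar feature components of one fuzzy image
    rel_params : list of M (mean, variance) pairs for the relevant class
    irr_params : list of M (mean, variance) pairs for the irrelevant class
    """
    prior_irr = 1.0 - prior_rel
    total = 0.0
    for xm, (mu_r, var_r), (mu_i, var_i) in zip(x, rel_params, irr_params):
        p_rel = gaussian_pdf(xm, mu_r, var_r) * prior_rel
        p_irr = gaussian_pdf(xm, mu_i, var_i) * prior_irr
        total += p_rel / (p_rel + p_irr)
    return total / len(x)

# A component halfway between the two class means is maximally ambiguous.
print(soft_relevance([0.5], [(1.0, 1.0)], [(0.0, 1.0)]))
```

A fuzzy image close to the relevant-class statistics receives a membership near 1, and one close to the irrelevant-class statistics a membership near 0.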

Page 38: Computational Intelligence in Media Indexing and Retrieval

38

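PFRBFN Training (II)

Kernel function

$$ f(\mathbf{x}, \mathbf{v}_{\alpha i}^{t}, \sigma_{\alpha i}^{t}) = \exp\left( -\frac{(\mathbf{x} - \mathbf{v}_{\alpha i}^{t})^{\mathsf T} \boldsymbol{\Lambda} (\mathbf{x} - \mathbf{v}_{\alpha i}^{t})}{2 (\sigma_{\alpha i}^{t})^{2}} \right), \quad t = 1, 2, \ldots, T, \quad i = 1, 2, \ldots, I_{\alpha} $$

Desired output

$$ Y_t(\mathbf{x}_j) = \begin{cases} 0, & \mathbf{x}_j \in N \\ 1, & \mathbf{x}_j \in P \\ s(\mathbf{x}_j), & \mathbf{x}_j \in F \end{cases} $$

PFRBFN output

$$ F_t(\mathbf{x}) = \begin{cases} F_{t-1}(\mathbf{x}) + \sum_{\alpha \in \{P, N, F\}} \sum_{i=1}^{I_{\alpha}} w_{\alpha i}^{t} \, f(\mathbf{x}, \mathbf{v}_{\alpha i}^{t}, \sigma_{\alpha i}^{t}), & t = 1, 2, \ldots, T \\ 0, & t = 0 \end{cases} $$

Error function

$$ E_t = \frac{1}{2} \sum_{j=1}^{N_T} e_{jt}^{2} = \frac{1}{2} \sum_{j=1}^{N_T} \left( Y_t(\mathbf{x}_j) - F_t(\mathbf{x}_j) \right)^{2} $$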
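The kernel, the progressive output and the error function on this slide can be sketched as follows. The sketch assumes the weighting matrix Λ is the identity (so the kernel is a plain Gaussian RBF) and flattens the P, N and F subnets of each feedback round into one list of (weight, center, width) triples; both simplifications, and all names, are choices made here for illustration.

```python
import math

def rbf_kernel(x, center, width):
    """Gaussian kernel exp(-||x - v||^2 / (2 sigma^2)); assumes Lambda = identity."""
    sq = sum((a - b) ** 2 for a, b in zip(x, center))
    return math.exp(-sq / (2.0 * width ** 2))

def network_increment(x, units):
    """U_t(x): weighted sum of kernels over all hidden units added in round t."""
    return sum(w * rbf_kernel(x, v, s) for w, v, s in units)

def progressive_output(x, units_per_round):
    """F_t(x) = F_{t-1}(x) + U_t(x), starting from F_0(x) = 0."""
    f = 0.0
    for units in units_per_round:
        f += network_increment(x, units)
    return f

def squared_error(samples, targets, units_per_round):
    """E_t = 1/2 * sum_j (Y_t(x_j) - F_t(x_j))^2 over the training samples."""
    return 0.5 * sum(
        (y - progressive_output(x, units_per_round)) ** 2
        for x, y in zip(samples, targets)
    )

# Two feedback rounds, one hidden unit each: (weight, center, width).
rounds = [
    [(1.0, [0.0, 0.0], 1.0)],
    [(-0.5, [1.0, 0.0], 1.0)],
]
print(progressive_output([0.0, 0.0], rounds))
```

Targets would be 1 for relevant samples, 0 for irrelevant samples, and the soft relevance value for fuzzy samples.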

Page 39: Computational Intelligence in Media Indexing and Retrieval

39

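PFRBFN Training (III)

Gradient descent learning

$$ \hat{\boldsymbol{\theta}} = \arg\min_{\boldsymbol{\theta} \in \boldsymbol{\Theta}} (E_t) = \arg\min_{\boldsymbol{\theta} \in \boldsymbol{\Theta}} \left( \frac{1}{2} \sum_{j=1}^{N_T} \left( Y_t(\mathbf{x}_j) - F_t(\mathbf{x}_j) \right)^{2} \right) $$

(1) Weight estimation at the k-th learning iteration

$$ \frac{\partial E_t(k)}{\partial w_{\alpha i}^{t}(k)} = -\sum_{j=1}^{N_T} e_{jt}(k) \, f(\mathbf{x}_j, \mathbf{v}_{\alpha i}^{t}(k), \sigma_{\alpha i}^{t}(k)) $$

$$ w_{\alpha i}^{t}(k+1) = w_{\alpha i}^{t}(k) - \eta_1 \frac{\partial E_t(k)}{\partial w_{\alpha i}^{t}(k)}, \quad \alpha \in \{P, N, F\}, \quad i = 1, 2, \ldots, I_{\alpha} $$

(2) Center estimation at the k-th learning iteration

$$ \frac{\partial E_t(k)}{\partial \mathbf{v}_{\alpha i}^{t}(k)} = -w_{\alpha i}^{t}(k) \sum_{j=1}^{N_T} e_{jt}(k) \, f(\mathbf{x}_j, \mathbf{v}_{\alpha i}^{t}(k), \sigma_{\alpha i}^{t}(k)) \, \frac{\boldsymbol{\Lambda} (\mathbf{x}_j - \mathbf{v}_{\alpha i}^{t}(k))}{(\sigma_{\alpha i}^{t}(k))^{2}} $$

$$ \mathbf{v}_{\alpha i}^{t}(k+1) = \mathbf{v}_{\alpha i}^{t}(k) - \eta_2 \frac{\partial E_t(k)}{\partial \mathbf{v}_{\alpha i}^{t}(k)}, \quad \alpha \in \{P, N, F\}, \quad i = 1, 2, \ldots, I_{\alpha} $$

(3) Width estimation at the k-th learning iteration

$$ \frac{\partial E_t(k)}{\partial \sigma_{\alpha i}^{t}(k)} = -w_{\alpha i}^{t}(k) \sum_{j=1}^{N_T} e_{jt}(k) \, f(\mathbf{x}_j, \mathbf{v}_{\alpha i}^{t}(k), \sigma_{\alpha i}^{t}(k)) \, \frac{(\mathbf{x}_j - \mathbf{v}_{\alpha i}^{t}(k))^{\mathsf T} \boldsymbol{\Lambda} (\mathbf{x}_j - \mathbf{v}_{\alpha i}^{t}(k))}{(\sigma_{\alpha i}^{t}(k))^{3}} $$

$$ \sigma_{\alpha i}^{t}(k+1) = \sigma_{\alpha i}^{t}(k) - \eta_3 \frac{\partial E_t(k)}{\partial \sigma_{\alpha i}^{t}(k)}, \quad \alpha \in \{P, N, F\}, \quad i = 1, 2, \ldots, I_{\alpha} $$

(4) Repeat steps (1)-(3) until convergence or a maximum number of iterations is reached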
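A hedged sketch of one batch gradient-descent iteration over weights, centers and widths, written for a single feedback round (so the previous output F_{t-1} is taken as zero) and with Λ assumed to be the identity. The learning rates, the lower clamp on the width, and the dictionary layout of a hidden unit are illustrative choices, not values from the tutorial.

```python
import math

def kernel(x, v, sigma):
    """Gaussian kernel with Lambda assumed to be the identity matrix."""
    sq = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-sq / (2.0 * sigma ** 2))

def predict(x, units):
    return sum(u["w"] * kernel(x, u["v"], u["sigma"]) for u in units)

def total_error(xs, ys, units):
    return 0.5 * sum((y - predict(x, units)) ** 2 for x, y in zip(xs, ys))

def gradient_step(xs, ys, units, eta_w=0.1, eta_v=0.05, eta_s=0.05):
    """One iteration of steps (1)-(3): update every weight, center and width."""
    errors = [y - predict(x, units) for x, y in zip(xs, ys)]  # e_j = Y(x_j) - F(x_j)
    updated = []
    for u in units:
        w, v, s = u["w"], u["v"], u["sigma"]
        f_vals = [kernel(x, v, s) for x in xs]
        grad_w = -sum(e * f for e, f in zip(errors, f_vals))
        grad_v = [
            -w * sum(e * f * (x[d] - v[d]) for e, f, x in zip(errors, f_vals, xs)) / s ** 2
            for d in range(len(v))
        ]
        grad_s = -w * sum(
            e * f * sum((a - b) ** 2 for a, b in zip(x, v))
            for e, f, x in zip(errors, f_vals, xs)
        ) / s ** 3
        updated.append({
            "w": w - eta_w * grad_w,
            "v": [vd - eta_v * g for vd, g in zip(v, grad_v)],
            # Clamp added in this sketch only, to keep the width positive.
            "sigma": max(s - eta_s * grad_s, 1e-3),
        })
    return updated

xs = [[0.0], [1.0], [2.0]]
ys = [1.0, 0.5, 0.0]  # relevant, fuzzy (soft target), irrelevant
units = [
    {"w": 0.0, "v": [0.0], "sigma": 1.0},
    {"w": 0.0, "v": [2.0], "sigma": 1.0},
]
before = total_error(xs, ys, units)
for _ in range(50):
    units = gradient_step(xs, ys, units)
after = total_error(xs, ys, units)
print(before, after)
```

In practice the loop would stop on a convergence test or an iteration cap, as in step (4).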

Page 40: Computational Intelligence in Media Indexing and Retrieval

40

Graphical User Interface

Page 41: Computational Intelligence in Media Indexing and Retrieval

41

Image Database

100 categories, 10,000 color images

Page 42: Computational Intelligence in Media Indexing and Retrieval

42

Objective System Performance (I)

Experimental setup:
  100 queries
  Top 25 results
  Average precision-versus-recall graph of the PFRBFN and ARBFN (P. Muneesawang et al. 2002) methods after 5 iterations
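For readers unfamiliar with the evaluation measures, the sketch below computes precision and recall over the top-k returned images, using the standard definitions. The function name and the toy ranking are illustrative; the tutorial's own evaluation code is not shown in the slides.

```python
def precision_recall_at_k(ranked_ids, relevant_ids, k):
    """Precision and recall over the top-k returned images.

    precision = relevant images among the top k / k
    recall    = relevant images among the top k / all relevant images
    """
    top = ranked_ids[:k]
    hits = sum(1 for image_id in top if image_id in relevant_ids)
    return hits / k, hits / len(relevant_ids)

ranked = ["a", "b", "c", "d"]      # system ranking, best first
relevant = {"a", "c", "e"}         # ground-truth relevant images
precision, recall = precision_recall_at_k(ranked, relevant, k=2)
print(precision, recall)
```

Averaging these values over all queries at several cut-offs gives a precision-versus-recall curve.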

Page 43: Computational Intelligence in Media Indexing and Retrieval

43

Objective System Performance (II)

Experimental setup:
  100 queries
  Retrieval accuracy of the PFRBFN and ARBFN methods in the top 25 results

Page 44: Computational Intelligence in Media Indexing and Retrieval

44

Subjective System Performance

Experimental setup:
  150 queries
  Retrieval accuracy of the PFRBFN, ARBFN and MARS methods in the top 25 results

Page 45: Computational Intelligence in Media Indexing and Retrieval

45

Case Study (I)

Objective: looking for some home pets, especially dogs

(a) Initial retrieval results based on k-NN search

Page 46: Computational Intelligence in Media Indexing and Retrieval

46

Case Study (II)

(b) Retrieval results with user marking the cat images as irrelevant

Page 47: Computational Intelligence in Media Indexing and Retrieval

47

Case Study (III)

(c) Retrieval results with user marking the cat images as relevant

Page 48: Computational Intelligence in Media Indexing and Retrieval

48

Case Study (IV)

(d) Retrieval results with user marking the cat images as fuzzy

Page 49: Computational Intelligence in Media Indexing and Retrieval

49

Pseudo-labeling and Machine Learning


Page 50: Computational Intelligence in Media Indexing and Retrieval

50

Small Sample Problem

Small sample problem
  Relevance feedback in CBIR systems uses only the labeled images for learning
  Image labeling is a time-consuming task and users are often unwilling to label too many images

Challenge
  Learning from a small number of training samples

What is the solution?

Page 51: Computational Intelligence in Media Indexing and Retrieval

51

Incorporating Unlabeled Images

Discriminant Expectation Maximization (D-EM)
  Incorporate unlabeled samples to estimate the underlying probability distribution
  Y. Wu et al. 2000

Transductive support vector machine (TSVM)
  Incorporate unlabeled images to train an initial SVM, followed by standard active learning
  T. Joachims et al. 1999, L. Wang et al. 2003

Support vector machine (SVM) with prior knowledge
  Incorporate prior knowledge into the SVM
  L. Wang et al. 2004

Page 52: Computational Intelligence in Media Indexing and Retrieval

52

Pseudo-labeling

Idea of existing pseudo-labeling methods
  Obtaining a large number of labeled images is labor-intensive, while unlabeled images are readily available
  Utilize the unlabeled images to augment the available labeled images
  Each selected unlabeled image is assigned a pseudo-label of either 'relevant' or 'irrelevant' based on a proposed algorithm

Shortcoming
  Pseudo-labeled images are fuzzy in nature as they are not explicitly labeled by the users
  The potential imprecision embedded in their class information should be taken into consideration

Page 53: Computational Intelligence in Media Indexing and Retrieval

53

Computational Intelligence in Pseudo-labeling

Fuzzy support vector machine with active learning
Soft membership estimation of unlabeled images
Label propagation
Two-stage clustering of labeled samples into relevant or irrelevant classes

Page 54: Computational Intelligence in Media Indexing and Retrieval

54

Active Learning

Active learning is designed to achieve maximal information gain or to minimize uncertainty in decision making
Active learning selects the most informative samples to query the users for labeling

SVM-based active learning
  Select the samples that are closest to the current SVM decision boundary as the most informative points
  Samples that are farthest from the boundary on the positive side are considered the most relevant images
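The two selection rules of SVM-based active learning can be sketched as below. For brevity the sketch assumes a linear decision function w·x + b with hand-picked parameters; a trained kernel SVM would supply its own decision values. All names and numbers are illustrative.

```python
def decision_value(x, w, b):
    """Signed score of x under a linear decision function (assumed for this sketch)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def most_informative(unlabeled, w, b, l):
    """The l unlabeled images closest to the decision boundary (smallest |score|)."""
    return sorted(unlabeled, key=lambda name: abs(decision_value(unlabeled[name], w, b)))[:l]

def most_relevant(unlabeled, w, b, k):
    """The k unlabeled images farthest from the boundary on the positive side."""
    return sorted(unlabeled, key=lambda name: decision_value(unlabeled[name], w, b), reverse=True)[:k]

w, b = [1.0, 0.0], 0.0
unlabeled = {
    "a": [0.1, 0.0],
    "b": [-2.0, 0.0],
    "c": [3.0, 0.0],
    "d": [-0.2, 0.0],
}
print(most_informative(unlabeled, w, b, l=2))  # shown to the user for labeling
print(most_relevant(unlabeled, w, b, k=1))     # returned as the retrieval result
```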

Page 55: Computational Intelligence in Media Indexing and Retrieval

55

Proposed Pseudo-labeling Framework for CBIR

(1) Perform a k-nearest neighbors (k-NN) search and return the top $l_0$ most similar images to the user for feedback

(2) The user labels the images as either relevant or irrelevant, and an initial SVM classifier is trained

(3) SVM active learning selects the l unlabeled images that are closest to the current SVM decision boundary for the user to label

(4) After the user labels the l images, they are added to the previously labeled training set

(5) A two-stage clustering is performed separately on the labeled relevant and irrelevant images. The resulting clusters are then used for unlabeled image selection and pseudo-label assignment

(6) A fuzzy metric is employed to evaluate the soft relevance membership of the pseudo-labeled images

(7) An FSVM is trained using a hybrid of the labeled and pseudo-labeled images

(8) Repeat steps (3)-(7) until the retrieval performance is satisfactory

Page 56: Computational Intelligence in Media Indexing and Retrieval

56

Support Vector Machines

Map the input data into a high-dimensional feature space through a mapping function

Find the optimal separating hyperplane with minimal classification errors in this space:

$$ \mathbf{w} \cdot \mathbf{z} + b = 0 $$

Page 57: Computational Intelligence in Media Indexing and Retrieval

57

Soft Errors of Fuzzy SVM

Figure source: Ronan Collobert, Dalle Molle Institute for Perceptual Artificial Intelligence (IDIAP)

Page 58: Computational Intelligence in Media Indexing and Retrieval

58

Soft Membership Estimation of Pseudo-labeled Images

Objective: find a fuzzy membership mapping that assigns a relevance value in [0, 1] to each pseudo-labeled image

The membership function depends on two factors:
  Distance of the pseudo-labeled images to other labeled images ($w_1$)
  Agreement between the predicted labels obtained during the pseudo-labeling process and those obtained using the trained SVM ($w_2$)

$$ w_1(\mathbf{x}_P) = \begin{cases} \exp\left( -a_1 \dfrac{\min_i (\mathbf{x}_P - \mathbf{v}_{Si})^{\mathsf T} (\mathbf{x}_P - \mathbf{v}_{Si})}{\min_j (\mathbf{x}_P - \mathbf{v}_{Oj})^{\mathsf T} (\mathbf{x}_P - \mathbf{v}_{Oj})} \right), & \text{if } \dfrac{\min_i (\mathbf{x}_P - \mathbf{v}_{Si})^{\mathsf T} (\mathbf{x}_P - \mathbf{v}_{Si})}{\min_j (\mathbf{x}_P - \mathbf{v}_{Oj})^{\mathsf T} (\mathbf{x}_P - \mathbf{v}_{Oj})} < 1 \\ 0, & \text{otherwise} \end{cases} $$

$$ w_2(\mathbf{x}_P) = \begin{cases} \dfrac{1}{1 + \exp(-a_2 y)}, & \text{if the pseudo-label is positive} \\ \dfrac{1}{1 + \exp(a_2 y)}, & \text{otherwise} \end{cases} $$

$$ g(\mathbf{x}_P) = w_1(\mathbf{x}_P) \, w_2(\mathbf{x}_P) $$
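A small sketch of the two-factor membership g = w1 · w2. It reads the subscripts S and O as cluster centers of the same class as the pseudo-label and of the opposite class, and y as the trained SVM's decision value for the image; these readings, the default values of a1 and a2, and the handling of a zero denominator are assumptions, since the slide does not define them.

```python
import math

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def w1(x, same_centers, opposite_centers, a1=1.0):
    """Distance factor: high when x is much closer to its own class clusters."""
    d_same = min(sq_dist(x, v) for v in same_centers)
    d_opp = min(sq_dist(x, v) for v in opposite_centers)
    if d_opp == 0.0:  # x sits on an opposite-class center (case added for this sketch)
        return 0.0
    ratio = d_same / d_opp
    return math.exp(-a1 * ratio) if ratio < 1.0 else 0.0

def w2(svm_output, pseudo_label_positive, a2=1.0):
    """Agreement factor: sigmoid of the SVM output, mirrored for negative pseudo-labels."""
    sign = -1.0 if pseudo_label_positive else 1.0
    return 1.0 / (1.0 + math.exp(sign * a2 * svm_output))

def membership(x, same_centers, opposite_centers, svm_output, pseudo_label_positive):
    """g(x) = w1(x) * w2(x), a value in [0, 1]."""
    return w1(x, same_centers, opposite_centers) * w2(svm_output, pseudo_label_positive)

g = membership([0.0, 0.0], [[0.0, 0.0]], [[1.0, 0.0]], svm_output=0.0, pseudo_label_positive=True)
print(g)
```

The resulting value can then be used as the per-sample membership of a pseudo-labeled image when training the fuzzy SVM.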

Page 59: Computational Intelligence in Media Indexing and Retrieval

59

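Fuzzy Support Vector Machine

A fuzzy membership is introduced to reflect the different contributions of the inputs (labeled and pseudo-labeled images)

$$ \begin{aligned} \text{minimize} \quad & \frac{1}{2} \|\mathbf{w}\|^{2} + C \sum_{i=1}^{n} \mu_i \xi_i \\ \text{subject to} \quad & y_i (\mathbf{w} \cdot \mathbf{z}_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \ldots, n \end{aligned} $$

The optimization problem of the FSVM can be transformed into its dual problem

$$ \begin{aligned} \text{maximize} \quad & \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j) \\ \text{subject to} \quad & \sum_{i=1}^{n} y_i \alpha_i = 0, \quad 0 \le \alpha_i \le \mu_i C, \quad i = 1, \ldots, n \end{aligned} $$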
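To make the role of the memberships concrete, the sketch below trains a linear fuzzy SVM by plain subgradient descent on the primal objective, in which each sample's hinge loss is scaled by μ_i·C. This is not the dual quadratic program stated on the slide: the linear kernel (z = x), the optimizer, the learning rate and the epoch count are all simplifying assumptions made for illustration.

```python
def train_linear_fsvm(xs, ys, mus, c=1.0, lr=0.01, epochs=500):
    """Minimize 1/2 ||w||^2 + C * sum_i mu_i * max(0, 1 - y_i (w . x_i + b))."""
    dim = len(xs[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the regularizer 1/2 ||w||^2
        grad_b = 0.0
        for x, y, mu in zip(xs, ys, mus):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1.0:  # slack is active; its penalty is weighted by mu_i * C
                for d in range(dim):
                    grad_w[d] -= c * mu * y * x[d]
                grad_b -= c * mu * y
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(x, w, b):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

xs = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]]
ys = [1, 1, -1, -1]
mus = [1.0, 1.0, 1.0, 0.3]  # last sample is pseudo-labeled with low membership
w, b = train_linear_fsvm(xs, ys, mus)
print([predict(x, w, b) for x in xs])
```

Samples with a small μ_i contribute less to the penalty term, so an unreliable pseudo-label pulls the boundary less than a user-provided label.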

Page 60: Computational Intelligence in Media Indexing and Retrieval

60

System Performance (I)

Experimental setup:
  100 queries
  Five feedback iterations
  Average precision-versus-recall graphs of the PLFSVM and SVM (S. Tong et al. 2001) methods after the first iteration of active learning for $l_0 = 10$

Page 61: Computational Intelligence in Media Indexing and Retrieval

61

System Performance (II)

Retrieval accuracy of the PLFSVM and SVM methods in the top 10 results for $l_0 = 10$

Page 62: Computational Intelligence in Media Indexing and Retrieval

62

Region-based Image Retrieval (RBIR) Using Neural Network


Page 63: Computational Intelligence in Media Indexing and Retrieval

63

Challenges in RBIR Systems

Issues in RBIR systems
  Choice of an image similarity metric that decreases the impact of imprecise segmentation
  Determining the importance of different image regions
  A learning algorithm that progressively improves the retrieval accuracy through user interaction

Page 64: Computational Intelligence in Media Indexing and Retrieval

64

Variable-Length Radial Basis Function Network (VLRBFN)

We introduce a VLRBFN to model and learn user perception of image similarity in RBIR systems

The VLRBFN is developed for variable-length region-based image representation in RBIR systems
A systematic region weight learning strategy is introduced
A kernel function centered on variable-length representation (VLR) is introduced

Page 65: Computational Intelligence in Media Indexing and Retrieval

65

General RBIR Overview

[Block diagram]
Offline processing: Image Segmentation (mean-shift algorithm), Feature Extraction (color, texture), Database Creation
Online querying: Query Selection, Image Segmentation, Feature Extraction, Similarity Comparison (EMD distance), Display and Relevance Feedback, Relevance Feedback, Machine Learning (VLRBFN)

Page 66: Computational Intelligence in Media Indexing and Retrieval

66

Region-based Image Representation

Mean-shift algorithm
  Mode seeking in non-parametric distributions
  Partition images into homogeneous regions

Extracted features for the regions
  Color feature: color moments
  Texture feature: wavelet moments

Perceptual determination of region weights
  Reflect the importance of the regions
  Criteria: area, location

Page 67: Computational Intelligence in Media Indexing and Retrieval

67

Region Segmentation

Page 68: Computational Intelligence in Media Indexing and Retrieval

68

Region Weight Learning

Perceptual determination
  Determine the most salient region in each relevant image such that these regions jointly capture the semantic class of the user's query
  Use perceptual importance (area and location) together with feedback information

Density estimation
  Utilize the set of selected regions from all the relevant images for weight estimation of other unseen regions
  Employ a one-class SVM (OCSVM) to estimate their density distribution
  The importance of a test region is evaluated by determining how it differs from the estimated distribution

Page 69: Computational Intelligence in Media Indexing and Retrieval

69

Image Similarity Metric

Earth Mover's Distance (EMD)
  Measures the least amount of work needed to transform one image distribution into the other
  Operates on variable-length representations
  Suitable for region-based image similarity comparison
  Y. Rubner et al. 2002

Variable-length representation (VLR)
  Each image is described by sets of weighted features
  The size of the set may vary depending on the number of regions in different images

Page 70: Computational Intelligence in Media Indexing and Retrieval

70

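Structure of Progressive VLRBFN

[Network diagram. Input layer: $\mathbf{X} = \{(\mathbf{C}_k, W_k)\}_{k=1}^{m}$, i.e. the weighted regions $(\mathbf{C}_1, W_1), (\mathbf{C}_2, W_2), \ldots, (\mathbf{C}_m, W_m)$. Hidden layer with output weights $w_1^{t}, w_2^{t}, \ldots, w_i^{t}, \ldots, w_c^{t}$. Output layer: $U_t(\mathbf{X})$, combined with $F_{t-1}(\mathbf{X})$ to give $F_t(\mathbf{X})$.]

$$ F_t(\mathbf{X}) = \begin{cases} F_{t-1}(\mathbf{X}) + U_t(\mathbf{X}), & t = 1, 2, \ldots, T \\ 0, & t = 0 \end{cases} $$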

Page 71: Computational Intelligence in Media Indexing and Retrieval

71

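VLRBFN Training (I)

Kernel function

$$ f(\mathbf{X}_j, \mathbf{V}_i^{t}, \sigma_i^{t}) = \exp\left( -\frac{\mathrm{EMD}^{2}(\mathbf{X}_j, \mathbf{V}_i^{t})}{2 (\sigma_i^{t})^{2}} \right), \quad t = 1, 2, \ldots, T, \quad i = 1, 2, \ldots, c $$

where

$$ \mathrm{EMD}(\mathbf{X}_p, \mathbf{X}_q) = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij} \, d(\mathbf{C}_{pi}, \mathbf{C}_{qj})}{\sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij}} $$

VLRBFN output

$$ F_t(\mathbf{X}_j) = \begin{cases} F_{t-1}(\mathbf{X}_j) + \sum_{i=1}^{c} w_i^{t} \, f(\mathbf{X}_j, \mathbf{V}_i^{t}, \sigma_i^{t}), & t = 1, 2, \ldots, T \\ 0, & t = 0 \end{cases} $$

Error function

$$ E_t = \frac{1}{2} \sum_{j=1}^{N_T} e_{jt}^{2} = \frac{1}{2} \sum_{j=1}^{N_T} \left( Y_t(\mathbf{X}_j) - F_t(\mathbf{X}_j) \right)^{2} $$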
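The EMD ratio and the EMD-based kernel can be illustrated as below. The sketch does not solve the transportation problem: it assumes an optimal flow matrix has already been obtained from a solver (not shown) and only evaluates the normalized work and the resulting kernel value. Euclidean ground distance between region features is also an assumption, as are the function names.

```python
import math

def ground_distance(c1, c2):
    """Distance between two region feature vectors (Euclidean, assumed here)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))

def emd_from_flow(regions_p, regions_q, flow):
    """EMD = sum_ij f_ij * d(C_pi, C_qj) / sum_ij f_ij for a given (optimal) flow.

    flow[i][j] is the mass moved from region i of image p to region j of image q.
    """
    work = sum(
        flow[i][j] * ground_distance(regions_p[i], regions_q[j])
        for i in range(len(regions_p))
        for j in range(len(regions_q))
    )
    total_flow = sum(sum(row) for row in flow)
    return work / total_flow

def emd_kernel(emd_value, sigma):
    """Kernel response exp(-EMD^2 / (2 sigma^2)) of one hidden unit."""
    return math.exp(-emd_value ** 2 / (2.0 * sigma ** 2))

# Two images with two regions each; each region carries weight 0.5.
regions_p = [[0.0], [1.0]]
regions_q = [[0.0], [3.0]]
flow = [[0.5, 0.0], [0.0, 0.5]]  # region i of p is matched to region i of q
emd = emd_from_flow(regions_p, regions_q, flow)
print(emd, emd_kernel(emd, sigma=1.0))
```

Because the flow matrix has one entry per region pair, the two images may have different numbers of regions, which is what makes EMD suitable for variable-length representations.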

Page 72: Computational Intelligence in Media Indexing and Retrieval

72

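VLRBFN Training (II)

Gradient descent learning

$$ \hat{\boldsymbol{\theta}} = \arg\min_{\boldsymbol{\theta} \in \boldsymbol{\Theta}} (E_t) = \arg\min_{\boldsymbol{\theta} \in \boldsymbol{\Theta}} \left( \frac{1}{2} \sum_{j=1}^{N_T} \left( Y_t(\mathbf{X}_j) - F_t(\mathbf{X}_j) \right)^{2} \right) $$

(1) Weight estimation at the l-th learning iteration

$$ \frac{\partial E_t(l)}{\partial w_i^{t}(l)} = -\sum_{j=1}^{N_T} e_{jt}(l) \, f(\mathbf{X}_j, \mathbf{V}_i^{t}(l), \sigma_i^{t}(l)) $$

$$ w_i^{t}(l+1) = w_i^{t}(l) - \eta_1 \frac{\partial E_t(l)}{\partial w_i^{t}(l)}, \quad i = 1, 2, \ldots, c $$

(2) Width estimation at the l-th learning iteration

$$ \frac{\partial E_t(l)}{\partial \sigma_i^{t}(l)} = -w_i^{t}(l) \sum_{j=1}^{N_T} e_{jt}(l) \, f(\mathbf{X}_j, \mathbf{V}_i^{t}(l), \sigma_i^{t}(l)) \, \frac{\mathrm{EMD}^{2}(\mathbf{X}_j, \mathbf{V}_i^{t}(l))}{(\sigma_i^{t}(l))^{3}} $$

$$ \sigma_i^{t}(l+1) = \sigma_i^{t}(l) - \eta_2 \frac{\partial E_t(l)}{\partial \sigma_i^{t}(l)}, \quad i = 1, 2, \ldots, c $$

(3) Repeat steps (1)-(2) until convergence or a maximum number of iterations is reached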

Page 73: Computational Intelligence in Media Indexing and Retrieval

73

System Performance (I)

Experimental setup
  100 queries
  Average precision-versus-recall graph in the top 25 returned images after 1 and 5 feedback iterations

Comparison
  GRBFN method, K. H. Yap et al. 2005
  RQPM and RSVM methods, F. Jing et al. 2004

Page 74: Computational Intelligence in Media Indexing and Retrieval

74

System Performance (II)

Observations on the developed method
  Better retrieval performance than both the GRBFN and RQPM methods
  Comparable retrieval performance to that of the RSVM method, but computationally more efficient

Page 75: Computational Intelligence in Media Indexing and Retrieval

75

System Performance (III)

Observation
  Consistently provides superior results when compared with the GRBFN and RQPM methods
  Achieves comparable retrieval performance to that of the RSVM method
  Seven times faster than the RSVM method

Page 76: Computational Intelligence in Media Indexing and Retrieval

76

Peer Tagging and Knowledge Propagation


Page 77: Computational Intelligence in Media Indexing and Retrieval

77

Hybrid Keyword- and Content-based Image Retrieval

Retrieving images via low-level features alone cannot achieve satisfactory results
Keywords are the best descriptors for image semantics
Build interactive CBIR systems which support high-level semantic queries
  Integrate the strengths of content- and keyword-based image indexing and retrieval algorithms
  Alleviate their respective difficulties

Peer Tagging in Image Retrieval

Peer Tagging

Motivation
  Challenge of semantic gap
  Full manual annotation of complete database is tedious and expensive

Peer tagging
  Distributed users (particularly Internet users) provide tags (keywords) to shared images

Advantages of peer tagging
  Distribute annotation workload to multiple voluntary contributors
  Low cost, simple, easy, flexible and effort-sharing

Online media (image and video) tagging systems
  Flickr, YouTube

Issues in Peer Tagging

Human-computer interface (HCI) for image tagging
  Tagging granularity (tagging on the segmented regions or the global images)
  Provide a user-friendly environment for image tagging

Tag generation and formation
  Tag reliability (misspelled, inaccurate, irrelevant, or esoteric tags)
  Suggestive tagging
  Tag cloud in Flickr.com

Tag clustering
  Identify groups of tags sharing similar semantic concepts
  Tag clustering in Flickr.com
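
One simple way to address misspelled tags through suggestive tagging is to map each raw tag onto the closest entry of a known vocabulary. The sketch below is only an illustration under assumptions: it presumes that a controlled vocabulary exists, uses Python's standard `difflib` string matcher, and the function name and the cutoff value are hypothetical. The slides do not say how Flickr or the tutorial's system handles this.

```python
import difflib


def suggest_tag(raw_tag, vocabulary, cutoff=0.8):
    """Return the vocabulary tag closest to raw_tag, or None if nothing is close.

    vocabulary is a list of accepted lowercase tags (an assumed resource).
    """
    tag = raw_tag.strip().lower()
    if tag in vocabulary:
        return tag
    # get_close_matches ranks candidates by string similarity ratio.
    close = difflib.get_close_matches(tag, vocabulary, n=1, cutoff=cutoff)
    return close[0] if close else None
```

For instance, with a vocabulary of animal names, a misspelled "Gorila" would be mapped to "gorilla", while an unrelated string would return `None` and could be flagged for review rather than stored as a tag.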

Knowledge Propagation

Issues involved in the peer tagging services
  The correlation between the tags and the media contents needs to be explored
  Only a fraction of the images in the complete collection is annotated

Knowledge propagation
  Propagate keywords from a small sample to the whole population

[Figure: keyword-annotated images]

Knowledge Propagation

[Figure: phone screens for a zoo scenario, showing a candidate list (Gorilla, Monkey, Bear, Lion) and a "Match Quick Facts" page, with Send, Save, Previous, Share and More buttons]

Photographs the animal
Sends MMS to zoo’s network server
Matches the animal and retrieves more information
Shares his photograph with the zoo’s database

Match Quick Facts
•Gorillas are the largest primates
•Gorillas, chimpanzees and humans are classified under the same family, Hominidae
•Gorillas are mainly folivorous, though they may supplement their diet with insects and small animals
•Gorillas live approximately 35-40 years in the wild and about 50 years in captivity

Q: What is the key?
A: Domain information

Knowledge Propagation

Basic idea: analyze the media contents and their correspondence with the annotated keywords.
Media content descriptors can be extracted and represented using feature vectors.
An SVM-based classifier can then be trained to map the feature vectors to soft labels reflecting the likelihood that a keyword matches an image/video.
The SVM-based classifiers are trained on the media data annotated by peer users.
Once these classifiers are fully trained, they are used to propagate keywords, with associated probabilities, to the unannotated media in the data collection.
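
The propagation idea can be illustrated with a small self-contained sketch. It is not the tutorial's implementation, and several parts are assumptions made for brevity: a linear SVM trained by full-batch subgradient descent on the regularised hinge loss stands in for the SVM-based classifier, a logistic function of the decision value stands in for the soft label, and the two-dimensional feature vectors and all function names are hypothetical.

```python
import math


def train_linear_svm(features, targets, lam=0.01, eta=0.1, epochs=200):
    """Full-batch subgradient descent on the regularised hinge loss.

    targets holds +1 (keyword present) or -1 (keyword absent).
    """
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [lam * wi for wi in w], 0.0
        for x, t in zip(features, targets):
            margin = t * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1.0:  # only margin violators contribute
                for k in range(dim):
                    grad_w[k] -= t * x[k] / n
                grad_b -= t / n
        w = [wi - eta * g for wi, g in zip(w, grad_w)]
        b -= eta * grad_b
    return w, b


def soft_label(model, x):
    """Squash the decision value into (0, 1); an assumed soft-label mapping."""
    w, b = model
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-score))


def propagate(tagged, untagged, keywords):
    """Train one classifier per keyword on peer-tagged media, then score the rest.

    tagged is a list of (feature_vector, set_of_tags) pairs.
    Returns one {keyword: soft_label} dictionary per untagged feature vector.
    """
    models = {}
    for keyword in keywords:
        xs = [feats for feats, _ in tagged]
        ts = [1 if keyword in tags else -1 for _, tags in tagged]
        models[keyword] = train_linear_svm(xs, ts)
    return [{kw: soft_label(models[kw], feats) for kw in keywords}
            for feats in untagged]
```

Each unannotated item receives one soft label per keyword, so a retrieval system could keep only the keywords whose label exceeds a chosen threshold or store the labels as weights for ranking.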

Conclusion

Introduction
  Image indexing and retrieval basics
  Issues and challenges

Computational Intelligence in Image Retrieval
  Soft relevance framework for fuzzy perception
  Pseudo-labeling and machine learning
  Region-based image retrieval using neural network
  Peer tagging and knowledge propagation

Conclusion

Conclusion

This tutorial shows that computational intelligence is instrumental in addressing some challenging issues in image indexing and retrieval:

Issue: Fuzzy user perception
  Technique: Soft relevance framework to integrate the users’ fuzzy perception of visual contents
Issue: Small sample problem
  Technique: Pseudo-labeling, active learning and FSVM
Issue: Image region perceptual importance
  Technique: Perceptual region estimation and VLRBFN
Issue: Peer tagging and knowledge propagation
  Technique: Develop domain-based keyword propagation

Selected Recent Publications

Kim-Hui Yap and Kui Wu, “A soft relevance framework in content-based image retrieval systems,” IEEE Trans. Circuits and Systems for Video Technology, vol. 15, no. 12, pp. 1557–1568, Dec. 2005.
Kui Wu and Kim-Hui Yap, “Fuzzy SVM for content-based image retrieval - A pseudo-label support vector machine framework,” IEEE Computational Intelligence Magazine, vol. 1, pp. 10–16, May 2006.
K. Wu and K.-H. Yap, “Content-based image retrieval using fuzzy perceptual feedback,” accepted for publication in Multimedia Tools and Applications, Kluwer.
K. Wu and K.-H. Yap, “A perceptual subjectivity notion in interactive content-based image retrieval systems,” in Intelligent Multimedia Processing with Soft Computing, Springer-Verlag, pp. 55–73, 2005.
Kui Wu and Kim-Hui Yap, “Region-based image retrieval using radial basis function network,” in Proc. IEEE Int. Conf. Multimedia & Expo, Toronto, Canada, 2006.

References (I)

M. Flickner, H. Sawhney, W. Niblack, J. Ashley, Q. Huang, B. Dom, M. Gorkani, J. Hafner, D. Lee, D. Petkovic, D. Steele, and P. Yanker, “Query by image and video content: The QBIC system,” IEEE Computer, vol. 28, no. 9, pp. 23–32, Sept. 1995.
Y. Rui, T. S. Huang, and S. Mehrotra, “Content-based image retrieval with relevance feedback in MARS,” in Proc. IEEE Int. Conf. Image Processing, Washington D.C., USA, pp. 815–818, 1997.
A. Gupta and R. Jain, “Visual information retrieval,” Communications of the ACM, vol. 40, no. 5, pp. 70–79, May 1997.
A. Pentland, R. W. Picard, and S. Sclaroff, “Photobook: Content-based manipulation of image databases,” International Journal of Computer Vision, vol. 18, no. 3, pp. 233–254, June 1996.
J. R. Smith and S. F. Chang, “VisualSEEk: A fully automated content-based image query system,” in Proc. ACM Multimedia, pp. 87–98, Nov. 1996.
T. Gevers and A. W. M. Smeulders, “PicToSeek: Combining color and shape invariant features for image retrieval,” IEEE Trans. Image Processing, vol. 9, pp. 102–119, 2000.
I. J. Cox, M. L. Miller, T. P. Minka, T. V. Papathomas, and P. N. Yianilos, “The Bayesian image retrieval system, PicHunter: Theory, implementation, and psychophysical experiments,” IEEE Trans. Image Processing, vol. 9, no. 1, pp. 20–37, 2000.
A. W. M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, “Content-based image retrieval at the end of the early years,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 12, pp. 1349–1380, Dec. 2000.

References (II)

Y. Rui, T. S. Huang, M. Ortega, and S. Mehrotra, “Relevance feedback: A power tool for interactive content-based image retrieval,” IEEE Trans. Circuits and Systems for Video Technology, vol. 8, no. 5, pp. 644–655, 1998.
W. Y. Ma and B. S. Manjunath, “NeTra: A toolbox for navigating large image databases,” Multimedia Systems, vol. 7, no. 3, pp. 184–198, 1999.
J. Z. Wang, J. Li, and G. Wiederhold, “SIMPLIcity: Semantics-sensitive integrated matching for picture libraries,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 9, pp. 947–963, 2001.
C. Carson, S. Belongie, H. Greenspan, and J. Malik, “Blobworld: Image segmentation using expectation-maximization and its application to image querying,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 8, pp. 1026–1038, 2002.
Y. X. Chen and J. Z. Wang, “A region-based fuzzy feature matching approach to content-based image retrieval,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 9, pp. 1252–1267, 2002.
F. Jing, M. J. Li, H. J. Zhang, and B. Zhang, “Relevance feedback in region-based image retrieval,” IEEE Trans. Circuits and Systems for Video Technology, vol. 14, no. 5, pp. 672–681, 2004.
R. Zhang and Z. Zhang, “Hidden semantic concept discovery in region based image retrieval,” in Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, pp. 996–1001, June 2004.
Z. Stejic, Y. Takama, and K. Hirota, “Relevance feedback-based image retrieval interface incorporating region and feature saliency patterns as visualizable image similarity criteria,” IEEE Trans. Industrial Electronics, vol. 50, no. 5, pp. 839–852, 2003.

References (III)

P. Muneesawang and L. Guan, “Automatic machine interactions for content-based image retrieval using a self-organizing tree map architecture,” IEEE Trans. Neural Networks, vol. 13, no. 4, pp. 821–834, July 2002.
Y. Wu, Q. Tian, and T. S. Huang, “Discriminant-EM algorithm with application to image retrieval,” in Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, South Carolina, pp. 222–227, 2000.
L. Wang and K. L. Chan, “Bootstrapping SVM active learning by incorporating unlabelled images for image retrieval,” in Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, pp. 629–634, 2003.
S. Tong and E. Chang, “Support vector machine active learning for image retrieval,” in Proc. ACM Int. Conf. Multimedia, Ottawa, Canada, pp. 107–118, 2001.
Y. Rubner, C. Tomasi, and L. Guibas, “The Earth Mover’s Distance as a metric for image retrieval,” International Journal of Computer Vision, vol. 40, pp. 99–123, 2000.
J. Jeon, V. Lavrenko, and R. Manmatha, “Automatic image annotation and retrieval using cross-media relevance models,” in Proc. Int. ACM SIGIR Conf. Research and Development in Information Retrieval, pp. 119–126, 2003.
E. Chang, K. Goh, G. Sychay, and G. Wu, “CBSA: Content-based soft annotation for multimodal image retrieval using Bayes point machines,” IEEE Trans. Circuits and Systems for Video Technology, vol. 13, no. 1, pp. 26–38, Jan. 2003.
V. N. Vapnik, The Nature of Statistical Learning Theory. New York: Springer-Verlag, 1995.
S. Haykin, Neural Networks: A Comprehensive Foundation. Upper Saddle River, NJ: Prentice-Hall, 1999.

THANK YOU! ☺

Questions?