Research Article

Measuring Semantic Relatedness between Flickr Images: From a Social Tag Based View

Zheng Xu,1 Xiangfeng Luo,2 Yunhuai Liu,1 Lin Mei,1 and Chuanping Hu1

1 The Third Research Institute of Ministry of Public Security, 339 Bisheng Road, Shanghai 201142, China
2 Shanghai University, 149 Yanchang Road, Shanghai 200472, China

Correspondence should be addressed to Zheng Xu; xuzheng@shu.edu.cn

Received 29 October 2013; Accepted 19 December 2013; Published 23 February 2014

Academic Editors: J. Shu and F. Yu

Copyright © 2014 Zheng Xu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Hindawi Publishing Corporation, The Scientific World Journal, Volume 2014, Article ID 758089, 12 pages, http://dx.doi.org/10.1155/2014/758089

Relatedness measurement between multimedia such as images and videos plays an important role in computer vision, which is a base for many multimedia related applications including clustering, searching, recommendation, and annotation. Recently, with the explosion of social media, users can upload media data and annotate content with descriptive tags. In this paper, we aim at measuring the semantic relatedness of Flickr images. Firstly, four information theory based functions are used to measure the semantic relatedness of tags. Secondly, the integration of tag pairs based on a bipartite graph is proposed to remove the noise and redundancy. Thirdly, the order information of tags is added to measure the semantic relatedness, which emphasizes the tags with high positions. A data set including 1000 images from Flickr is used to evaluate the proposed method. Two data mining tasks including clustering and searching are performed by the proposed method, which shows the effectiveness and robustness of the proposed method. Moreover, some applications such as searching and faceted exploration are introduced using the proposed method, which shows that the proposed method has broad prospects on web based tasks.

1. Introduction

Relatedness measurement, especially similarity between multimedia such as images and videos, plays an important role in computer vision. Image similarity is a base for many multimedia related applications including image clustering [1], searching [2, 3], recommendation [4], and annotation [5]. The relatedness problem is relevant to two aspects: image representation and relatedness measurement. The former aspect needs an appropriate model to preserve the related information of an image. The latter aspect requires an effective method to compute the relatedness accurately.

In the early stage, relatedness measurement was based on low-level visual features such as texture [6, 7], shape [8], and gradient [9]. These visual features are used to represent effective information of an image. Some distance metrics including Chi-Square distance [10], Euclidean distance [11], histogram intersection [12], and EMD distance [13] are used. Overall, these methods ignore high-level features such as semantic information, which can be understood by machines and people easily. These methods are therefore limited in applications which need semantic level information.

Recently, with the explosion of community contributed multimedia content available online, many social media repositories (e.g., Flickr (http://www.flickr.com), YouTube (http://www.youtube.com), and Zooomr (http://www.zooomr.com)) allow users to upload media data and annotate content with descriptive keywords, which are called social tags. We take Flickr, one of the most popular and earliest photo sharing sites, as an example to study the relatedness measurement between images. Flickr provides an open platform for users to publish their personal images freely. The principal purpose of tagging is to make images more accessible to the public. The success of Flickr proves that users are willing to participate in this semantic context through manual annotations [14]. Flickr uses a promising approach for manual metadata generation named “social tagging,” which requires all the users in the social network to label the web resources with their own keywords and share them with others. The characteristics of social tags are as follows.



Figure 1: The illustration of four kinds of correlations between concepts (synonymy, similarity, meronymy, and cooccurrence). Concept labels shown in the figure: PC, Computer, Dog, Wolf, Hand, Finger, Doctor, Nurse.

(1) Ontology free. Ontology based labeling defines an ontology and then lets users label the web resources using the semantic markups in the ontology. Social tagging requires all the users in the social network to label the web resources with their own keywords and share them with others. Different from ontology based annotation, there is no predefined ontology or taxonomy in social tagging. Thus, the tagging task is more convenient for users.

(2) User oriented. Users can annotate images with their favorite tags. The tags of an image are determined by users' cognitive ability. For the same image, users may give different tags. Each image has at least one tag, and each tag may appear in many different images.

(3) Semantic loss. Irrelevant social tags frequently appear, and users typically will not tag all semantic objects in the image, which is called semantic loss. Polysemy, synonymy, and ambiguity are further drawbacks of social tagging.

Based on the above characteristics, we aim at measuring semantic relatedness between images using social tags. It is observed that the correlations between the concepts of images can be divided into four kinds: synonymy, similarity, meronymy, and cooccurrence, as illustrated in Figure 1. Synonymy means the same object with different names. Similarity denotes that two objects are similar. Meronymy means that two objects follow a part-of relation. Cooccurrence means that two objects frequently appear together. Overall, the above four correlations can be summarized as semantic relatedness [15]. Semantic relatedness is a more generic concept than semantic similarity. Similar concepts are usually considered to be related for their likeness (synonymy); dissimilar concepts can also be semantically related, as with meronymy or cooccurrence. In this paper, we focus on measuring semantic relatedness between images, for the following reasons.

(1) Semantic relatedness follows the cognitive mechanism of people. In [16], the author suggests that the association relation is the basic mechanism of the brain. When a person knows a concept such as “hospital,” she or he may index a related concept such as “doctor” for appropriate understanding of the original concept. Since the goal of relatedness measurement is to facilitate related applications such as searching and recommendation, the proposed method should follow the user's cognitive mechanism.

(2) Semantic relatedness can be used to organize images based on their associations. In recent work such as Linked Open Data (LOD) [17] and Semantic Link Network (SLN) [18–20], the resources are managed by their semantic relations. The proposed semantic relatedness measures can be used to build semantic links between resources, especially images, which can be easily applied in real applications.

The major contributions of this paper are summarized as follows.

(1) We propose a framework to measure semantic relatedness between Flickr images using tags. Firstly, cooccurrence measures are used to compute the relatedness of tags between two images. Secondly, we transform the integration of tag relatedness into the assignment problem in a bipartite graph, which finds an appropriate matching for the semantic relatedness of images. Finally, a decline factor considering the position information of tags is used in the proposed framework, which reduces the noise and redundancy in the social tags.

(2) A real data set including 1000 images from Flickr with ten classes is used in our experiments. Two evaluation methods, clustering and retrieval, are performed, which show that the proposed method can measure the semantic relatedness between Flickr images accurately and robustly.

(3) We extend the relatedness measures between concepts to the level of images. Since the association relation is the basic mechanism of the brain, the proposed relatedness measurement can facilitate related applications such as searching and recommendation.

The rest of the paper is organized as follows. Section 2 gives the related work on social tags and image similarity measures. The problem definition is introduced in Section 3. Section 4 proposes the method for measuring semantic relatedness of images. Experiments are presented in Section 5. Conclusions are made in the last section.

Figure 2: The illustration of a pair of images from Flickr (image titles: “Big Ben at night London through my lens” and “London eye”).

2. Related Work

In this section, we cover two aspects related to the proposed work. Research on social tags is introduced first. Then, we give the related work on image similarity measures.

2.1. On Social Tags. In the area of usage patterns and semantic values of social tags, Golder and Huberman [21] mined usage patterns of social tags based on the delicious (del.icio.us/post) data set. Al-Khalifa and Davis [22] concluded that social tags were semantically richer than automatically extracted keywords. Suchanek et al. [23] used YAGO (http://www.mpi-inf.mpg.de/yago-naga/yago) and WordNet (http://wordnet.princeton.edu) to check the meaning of social tags and concluded that top tags were usually meaningful. Halpin et al. [24] examined why and how the power law distribution of tag usage frequency was formed in a mature social tagging system over time.

Besides research on mining social tags, some studies modeled the network structure of social tags. Cattuto et al. [25] investigated the network features of a social tag system, seen as a tripartite graph, using metrics adapted from classical network measures. Lambiotte and Ausloos [26] described social tag systems as a tripartite network with users, tags, and annotated items. The proposed tripartite network was projected into bipartite and unipartite networks to discover its structures. In [27], the social tag system was modeled as a tripartite graph which extends the traditional bipartite model of ontologies with a social dimension.

Recently, many researchers have investigated the applications of social tags in information retrieval and ranking. In [28], the authors empirically study the potential value of social annotations for web search. Zhou et al. [29] proposed a model using latent Dirichlet allocation, which incorporates the topical background of documents and social tags. Xu et al. [30] developed a language model for information retrieval based on the metadata property of social tags and their relationships to annotated documents. Bao et al. [31] introduced two ranking methods: SocialSimRank, which ranked pages based on the semantic similarity between tags and pages, and SocialPageRank, which ranked returned pages based on their popularity. Schenkel et al. [32] developed a top-$k$ algorithm which ranked search results based on the tags shared by the user who issued the query and the users who annotated the returned documents with the query tags.

2.2. On Measuring Image Similarity. Measuring semantic similarity is a basic issue in the computer vision field. Usually, some low-level visual features are used for similarity measures. For example, shape features, texture features, and gradient features can be extracted from images. Based on the extracted low-level features, distance metrics such as the Euclidean distance, the Chi-Square distance, the histogram intersection, and the EMD distance are used. In this paper, the proposed method addresses the problem with semantic-level features such as social tags.

Different from the methods using low-level features, a number of recent papers build image representations based on the outputs of concept classifiers [33]. Our observation is that Flickr provides the related social tags given by web users, which reflect how people on the internet tend to annotate images. Several previous methods [34] learn object models from internet images. These methods tend to gather training examples using image search results. Besides, these approaches have to alternate between finding good examples and updating object models in order to be robust against noisy images. On the other hand, some papers [35] use images from Flickr groups rather than search engines, which are claimed to be clean enough to produce good classifiers.

3. Problem Definition

In this paper, we study the problem of measuring semantic relatedness between images or videos with manually provided social tags. Here, a social tag refers to a concept provided by users that is semantically related to the content of an image or a video. The input of the proposed method is a pair of images or videos with social tags. The goal of the proposed method is to identify the semantic relatedness between two images or videos. Figure 2 shows a pair of images from Flickr with social tags. These two images are about “Big Ben” and “London eye.” They may be dissimilar according to traditional similarity measurement, since they share little low-level visual similarity. But the two images are semantically related, since they are both famous sights of London. With the proposed method, we can compute their semantic relatedness even though they share few visual features.

3.1. Basic Definitions. We first introduce three important definitions in this paper: the social tags set of an image, the semantic relatedness between tags, and the semantic relatedness between images.

Definition 1 (social tags set of an image). The social tags (denoted by $t$) set of an image $f$ (denoted by $s(f)$) is the set of tags provided by users for the image:

$$s(f) = \{t_1, t_2, \ldots, t_{|s(f)|}\}. \tag{1}$$

For example, in Figure 2, the tags of the right image are “London” and “eye” rather than “London eye.” Since Flickr provides the tags of each image, we simply download the tags from Flickr. We do not perform any NLP operations on the tags.

Definition 2 (semantic relatedness between tags). The semantic relatedness between tags (denoted by $\mathrm{sr}(t_1, t_2)$) is the expected correlation of a pair of tags $t_1$ and $t_2$.

Definition 3 (semantic relatedness between images). The semantic relatedness between images (denoted by $\mathrm{sr}(f_1, f_2)$) is the expected correlation of a pair of images $f_1$ and $f_2$.

The range of $\mathrm{sr}(t_1, t_2)$ and $\mathrm{sr}(f_1, f_2)$ is from 0 to 1. A high value indicates that the two tags or images are more likely to be semantically related. Please notice that the definition of $\mathrm{sr}(f_1, f_2)$ can also be extended to videos with social tags.

3.2. Basic Heuristics. Based on common sense and our observations on real data, we have five heuristics that serve as the base of our computation model.

Heuristic 1. Usually, each tag of an image appears only one time.

Different from writing sentences, users usually annotate an image with different tags. For example, the possibility of using tags “apple apple apple” for an image is very low. Therefore, in this paper, we do not employ any weighting scheme for tags such as tf-idf [36].

Heuristic 2. The order of the tags may reflect their correlation with the annotated image.

Different tags reflect different aspects of an image. According to Heuristic 1, the weight of a tag with respect to the image cannot be obtained. Fortunately, the order of the tags can be obtained, since users provide tags one by one.

Heuristic 3. The number of tags of an image may not be relevant to the annotation correctness.

Different users may give different tags for the same image. For example, users may give tags such as “apple iPhone” or “iPhone4 mobile phone” for the same image of an iPhone. It is hard to say which annotation is better, though the latter has three tags.

Heuristic 4. Usually, some tags may be redundant for annotating an image.

Users may give similar tags for an image. For example, the tags “apple iPhone” may be redundant, since “iPhone” is semantically very similar to “apple.”

Heuristic 5. Usually, some tags may be noisy for annotating an image.

Users may give inappropriate or even false tags for an image. For example, the tag “iPhone” is false for an image of an iPod.

4. Computation Model

In this section, we propose the computation model for measuring semantic relatedness between images. Based on the above five heuristics, the social tags provided by users are used in our computation model. Overall, the proposed computation model is divided into three steps.

(1) Tag relatedness computation. In this step, based on Heuristic 1, the relatedness of all tag pairs between two images is computed.

(2) Semantic relatedness integration. In this step, based on Heuristics 3–5, we measure semantic relatedness between images.

(3) Tag order revision. In this step, based on Heuristic 2, the image relatedness from step 2 is revised.

Table 1 shows the variables and parameters used in the following discussion. Figure 3 illustrates an overview of the proposed computation model.

4.1. Tag Relatedness Computation. According to Definition 1, an image can be represented as a set of tags provided by users. To obtain the semantic relatedness of a pair of images, we can measure the semantic relatedness between the tags of these images. For example, given two images with tags “apple iPhone” and “iPod Nano,” we can measure the semantic relatedness between these tags. Since each tag usually appears once according to Heuristic 1, the semantic relatedness between tags can be computed without considering their weight.

Many different methods of measuring semantic relatedness between concepts have been proposed, which can be divided into two groups [37]: taxonomy-based methods and web-based methods. Taxonomy-based methods use information theory and a hierarchical taxonomy such as WordNet to measure semantic relatedness. In contrast, web-based methods use the web as a live and active corpus instead of a hierarchical taxonomy.

Figure 3: The illustration of the proposed method. Panels: (a) input, (b) tag extraction, (c) page counts repository, (d) tag relatedness computation, (e) assignment in bipartite graph, (f) applications (searching, recommendation, clustering).

Table 1: The variables and parameters used in the proposed computation model.

Name                         Description
$f$                          An image
$t$                          A tag
$s(f)$                       Tags set of an image
$\mathrm{sr}(t_1, t_2)$      Semantic relatedness of two tags
$\mathrm{sr}(f_1, f_2)$      Semantic relatedness of two images
$N(t)$                       Page counts of a tag
$N(s(f))$                    Set of page counts of an image
$\mathrm{pos}(t)$            Position information of a tag

In the proposed computation model, each tag can be seen as a concept with explicit meaning. Thus, we use equations based on the cooccurrence of two concepts to measure their semantic relatedness. The core idea is that “you shall know a word by the company it keeps” [38]. In this section, four popular cooccurrence measures (i.e., Jaccard, Overlap, Dice, and PMI) are used to measure semantic relatedness between tags.

Besides cooccurrence measures, the page counts of each tag from a search engine are used. Page counts mean the number of web pages containing the query $q$. For example, the page counts of the query “Obama” in Google (http://www.google.com) are 1,210,000,000 (the data was obtained on 9/28/2012). Moreover, page counts for the query “$q$ AND $p$” can be considered as a measure of cooccurrence of queries $q$ and $p$. For the remainder of this paper, we use the notation $N(p)$ to denote the page counts of the tag $p$ in Google. However, the respective page counts for the tags $p$ and $q$ alone are not enough for measuring semantic relatedness. The page counts for the query “$q$ AND $p$” should also be considered. For example, when we query “Obama” and “United States” in Google, we find 485,000,000 Web pages; that is, $N(\text{Obama} \cap \text{United States}) = 485{,}000{,}000$. The four cooccurrence measures (i.e., Jaccard, Overlap, Dice, and PMI) between two tags $p$ and $q$ are as follows:

$$\mathrm{Jaccard}(p, q) = \frac{N(p \cap q)}{N(p) + N(q) - N(p \cap q)}, \tag{2}$$

where $p \cap q$ denotes the conjunction query “$p$ AND $q$.” Consider

$$\mathrm{Overlap}(p, q) = \frac{N(p \cap q)}{\min(N(p), N(q))}, \tag{3}$$

where $\min(N(p), N(q))$ means the lower of $N(p)$ and $N(q)$. Consider

$$\mathrm{Dice}(p, q) = \frac{2 \cdot N(p \cap q)}{N(p) + N(q)}. \tag{4}$$

According to probability and information theory, the mutual information (MI) of two random variables is a quantity that measures the mutual dependence of the two variables. Pointwise mutual information (PMI) is a variant of MI (see (5)):

$$\mathrm{PMI}(p, q) = \frac{\log\left(\dfrac{N \cdot N(p \cap q)}{N(p) \cdot N(q)}\right)}{\log N}, \tag{5}$$

where $N$ is the number of Web pages in the search engine, which is set to $N = 10^{11}$ according to the number of indexed pages reported by Google.
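The four measures in (2)–(5) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names, the handling of zero counts, and the sample page counts are all assumptions made here, and only the formulas and the value $N = 10^{11}$ come from the text.

```python
import math

# Index size N, following the text's setting N = 10^11.
TOTAL_PAGES = 10 ** 11


def jaccard(n_p, n_q, n_pq):
    """Eq. (2): N(p AND q) / (N(p) + N(q) - N(p AND q))."""
    union = n_p + n_q - n_pq
    return n_pq / union if union > 0 else 0.0


def overlap(n_p, n_q, n_pq):
    """Eq. (3): N(p AND q) / min(N(p), N(q))."""
    smaller = min(n_p, n_q)
    return n_pq / smaller if smaller > 0 else 0.0


def dice(n_p, n_q, n_pq):
    """Eq. (4): 2 * N(p AND q) / (N(p) + N(q))."""
    total = n_p + n_q
    return 2 * n_pq / total if total > 0 else 0.0


def pmi(n_p, n_q, n_pq, total_pages=TOTAL_PAGES):
    """Eq. (5): log(N * N(p AND q) / (N(p) * N(q))) / log N.

    Returning 0.0 when any count is zero is a convention chosen for this
    sketch; the text does not say how zero counts are handled.
    """
    if n_p <= 0 or n_q <= 0 or n_pq <= 0:
        return 0.0
    return math.log(total_pages * n_pq / (n_p * n_q)) / math.log(total_pages)


# Made-up page counts, for illustration only.
n_p, n_q, n_pq = 1_000_000, 500_000, 100_000
print(jaccard(n_p, n_q, n_pq))
print(overlap(n_p, n_q, n_pq))
print(dice(n_p, n_q, n_pq))
print(pmi(n_p, n_q, n_pq))
```

Note that the PMI form in (5) can fall below 0 when two tags cooccur less often than independence would predict, so a caller that needs values in [0, 1] may have to clip the result.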

Through (2)–(5), we can compute the tag relatedness as follows.

(1) Extract the tags from two images $f_1$ and $f_2$, which are denoted by

$$s(f_1) = \{t_1, t_2, \ldots, t_{|s(f_1)|}\}, \qquad s(f_2) = \{t_1, t_2, \ldots, t_{|s(f_2)|}\}. \tag{6}$$

(2) Issue the tags from $f_1$ and $f_2$ as queries to the web search engine (in this paper, we choose Google for its convenient API (http://developers.google.com)). The page counts can be denoted by

$$N(s(f_1)) = \{N(t_1), N(t_2), \ldots, N(t_{|s(f_1)|})\}, \qquad N(s(f_2)) = \{N(t_1), N(t_2), \ldots, N(t_{|s(f_2)|})\}. \tag{7}$$

(3) Compute the semantic relatedness between each tag pair from $f_1$ and $f_2$ by (2)–(5). For example, if we use PMI to compute tag semantic relatedness, the equation is

$$\mathrm{sr}(t_i, t_j) = \frac{\log\left(\dfrac{N \cdot N(t_i \cap t_j)}{N(t_i) \cdot N(t_j)}\right)}{\log N}, \qquad t_i \in s(f_1) \wedge t_j \in s(f_2). \tag{8}$$

From the above steps, the tag relatedness can be computed, which is denoted as a triple $\langle t_i, t_j, \mathrm{sr}(t_i, t_j) \rangle$. In the next section, we will give a detailed analysis for choosing the best measure from (2)–(5).
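The three steps above can be sketched as follows, using PMI as in (8). The `page_count` callable is a hypothetical stand-in for the search engine (no real search API is called here), and the counts in the lookup tables are invented for illustration.

```python
import math
from itertools import product

TOTAL_PAGES = 10 ** 11  # N in Eq. (8), as set in the text


def pmi(n_p, n_q, n_pq, total_pages=TOTAL_PAGES):
    """Eq. (8); zero counts map to 0.0 (a convention of this sketch)."""
    if min(n_p, n_q, n_pq) <= 0:
        return 0.0
    return math.log(total_pages * n_pq / (n_p * n_q)) / math.log(total_pages)


def tag_relatedness(tags_1, tags_2, page_count, measure=pmi):
    """Return the triples <t_i, t_j, sr(t_i, t_j)> for every cross-image tag pair.

    page_count is a hypothetical callable: page_count(t) gives N(t) and
    page_count(t, u) gives N(t AND u).
    """
    triples = []
    for t_i, t_j in product(tags_1, tags_2):
        sr = measure(page_count(t_i), page_count(t_j), page_count(t_i, t_j))
        triples.append((t_i, t_j, sr))
    return triples


# Made-up counts standing in for search engine results.
SINGLE = {"apple": 2_000_000, "iPhone": 1_000_000, "iPod": 800_000, "Nano": 400_000}
PAIR = {
    frozenset(("apple", "iPod")): 300_000,
    frozenset(("apple", "Nano")): 50_000,
    frozenset(("iPhone", "iPod")): 200_000,
    frozenset(("iPhone", "Nano")): 40_000,
}


def fake_page_count(t, u=None):
    """Look up N(t) or N(t AND u) in the made-up tables above."""
    if u is None:
        return SINGLE.get(t, 0)
    return PAIR.get(frozenset((t, u)), 0)


triples = tag_relatedness(["apple", "iPhone"], ["iPod", "Nano"], fake_page_count)
for t_i, t_j, sr in triples:
    print(t_i, t_j, round(sr, 3))
```

Passing the measure as a parameter makes it easy to swap PMI for Jaccard, Overlap, or Dice when comparing the four measures.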

Overall, each tag is issued as a query to obtain its page counts. Then, cooccurrence based measures are used to compute the semantic relatedness between tags. The reasons for using page counts based measures are as follows.

(1) Appropriate computation complexity. Since the relatedness between each tag pair of two images must be computed, the proposed method must have low complexity. Web search engines such as Google provide an API for users to obtain the page counts of each query. The web search engine thus gives an appropriate interface for the proposed computation model.

(2) Explicit semantics. A tag given by users may not be a correct concept in a taxonomy. For example, users may give a tag “Bling Bling” for an image of a lovely girl. The word “Bling” cannot be found in many taxonomies such as WordNet. The proposed method uses the web search engine as an open intermediary. The explicit semantics of newly emerging concepts can be obtained from the web easily.
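As a rough count related to reason (1), assume that every single tag and every cross-image tag pair is issued as a separate query with no caching (an assumption made here, since the text does not describe how requests are batched). The number of page-count queries $Q$ needed for one image pair is then

$$Q(f_1, f_2) = |s(f_1)| + |s(f_2)| + |s(f_1)| \cdot |s(f_2)|.$$

For instance, two images with 5 and 2 tags would need $5 + 2 + 5 \cdot 2 = 17$ queries, so the cost grows with the product of the two tag set sizes.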

4.2. Semantic Relatedness Integration. In Section 4.1, we compute the tag pair relatedness of two images. The tag pair relatedness of two images $f_1$ and $f_2$ can be treated as a bipartite graph, which is denoted by

$$G = (V, E), \qquad V = \{f_1, f_2\}, \qquad E = \{\langle t_i, t_j, \mathrm{sr}(t_i, t_j) \rangle \mid t_i \in s(f_1) \wedge t_j \in s(f_2)\}. \tag{9}$$

Based on (9), we turn the semantic relatedness integration of all tag pairs into the problem of assignment in a bipartite graph. We want to find the best matching of the bipartite graph $G$.

A matching is defined as $M \subseteq E$ such that no two edges in $M$ share a common end vertex. An assignment in a bipartite graph is a matching $M$ such that each node of the graph has an incident edge in $M$. Suppose that the set of vertices is partitioned into two sets $f_1$ and $f_2$ and that the edges of the graph have an associated weight given by a function $f : (f_1, f_2) \rightarrow [0, 1]$. The function $\mathrm{maxRel} : (f, f_1, f_2) \rightarrow [0, 1]$ returns the maximum weighted assignment, that is, an assignment such that the average of the weights of the edges is highest. Figure 4 shows a graphical representation of the semantic relatedness integration, where the bold lines constitute the matching $M$.

Based on the expression of the assignment in bipartite graphs, we have

$$\mathrm{maxRel}(f, f_1, f_2) =
\begin{cases}
\dfrac{\max \sum_{i \in I,\, j \in J} \mathrm{sr}(t_i, t_j)}{|s(f_1)|}, & |s(f_1)| \le |s(f_2)|, \\[2ex]
\dfrac{\max \sum_{i \in I,\, j \in J} \mathrm{sr}(t_i, t_j)}{|s(f_2)|}, & |s(f_1)| > |s(f_2)|,
\end{cases}
\qquad I = [1, \ldots, |s(f_1)|], \quad J = [1, \ldots, |s(f_2)|]. \tag{10}$$

Figure 4: Graphical representation of the assignment in bipartite graphs problem. Tags $t_1$, $t_2$, $t_3$ of $f_1$ are connected to tags $q_1$, $q_2$, $q_3$, $q_4$ of $f_2$ by weighted edges, and $\mathrm{maxRel}(f_1, f_2) = (\mathrm{sr}(t_1, q_1) + \mathrm{sr}(t_2, q_2) + \mathrm{sr}(t_3, q_4))/3 = (1.0 + 0.8 + 0.7)/3 = 0.83$.

Applying the assignment in bipartite graphs problem to our context, the variables $f_1$ and $f_2$ represent the two images whose semantic relatedness is computed. Here, $f_1$ and $f_2$ are composed of the tag sets $s(f_1)$ and $s(f_2)$, and $|s(f_1)| > |s(f_2)|$ means that the number of tags in $s(f_2)$ is lower than that of $s(f_1)$. According to Heuristic 3, we divide the result of the maximization by the lower cardinality of $s(f_1)$ and $s(f_2)$. In this way, the influence of the number of tags is reduced, and the semantic relatedness of two images is symmetric.
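A minimal sketch of the maxRel computation in (10) is given below. It uses exhaustive search over one-to-one assignments, which is an illustrative choice made here and is only practical for small tag sets; a polynomial-time assignment solver such as the Hungarian algorithm would be the usual choice for larger ones. The weight matrix loosely follows Figure 4: only the values quoted in the text are taken from the source, and the remaining entries are assumed.

```python
from itertools import permutations


def max_rel(weights):
    """Maximum-weight one-to-one assignment, averaged over the smaller tag set.

    weights[i][j] is sr(t_i, t_j) for t_i in s(f1) and t_j in s(f2).
    Brute force over permutations: fine for a handful of tags only.
    """
    if not weights or not weights[0]:
        return 0.0
    rows, cols = len(weights), len(weights[0])
    # Transpose if needed so that rows index the smaller side; dividing by
    # the smaller cardinality keeps the result symmetric in f1 and f2.
    if rows > cols:
        weights = [list(col) for col in zip(*weights)]
        rows, cols = cols, rows
    best = max(
        sum(weights[i][j] for i, j in enumerate(chosen))
        for chosen in permutations(range(cols), rows)
    )
    return best / rows


# Only sr(t1,q1)=1.0, sr(t2,q2)=0.8, sr(t2,q3)=0.3, sr(t3,q2)=0.9 and
# sr(t3,q4)=0.7 are stated in the text; the other entries are assumed.
W = [
    [1.0, 0.7, 0.2, 0.5],  # t1 against q1..q4
    [0.1, 0.8, 0.3, 0.1],  # t2
    [0.1, 0.9, 0.1, 0.7],  # t3
]
print(round(max_rel(W), 2))  # 0.83
```

Calling `max_rel` on the transposed matrix gives the same value, which mirrors the symmetry property described above.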

Besides the cardinality of the two tag sets $s(f_1)$ and $s(f_2)$, the maxRel function is affected by the relatedness between each pair of tags. According to Heuristics 4 and 5, redundancy and noise should be avoided. In the maxRel function, a one-to-one map is applied to the tags in $s(f_1)$ and $s(f_2)$. Thus, the proposed maxRel function varies with respect to the nature of the two images.

Adopting the proposed maxRel function, we are sure to find the global maximum relatedness that can be obtained by pairing the elements of the two tag sets. Alternative methods are able to find only a local maximum: they scroll through the elements of the first set and, after calculating the relatedness with all the elements of the second set, select the one with the maximum relatedness. Since every element in one set must be connected to at most one element in the other set, such a procedure finds only a local maximum, because the result depends on the order in which the comparisons occur. Consider the example in Figure 4: $t_1$ will be paired with $q_1$ (weight = 1.0). But when analyzing $t_3$, the maximum weight is with $q_2$ (weight = 0.9). This means that $t_2$ can no longer be paired with $q_2$, even though its weight with $q_2$ is maximum, since $q_2$ is already matched to $t_3$. As a consequence, $t_2$ will be paired with $q_3$, and the average of the selected weights will be (1.0 + 0.3 + 0.9)/3 = 0.73, which is considerably lower than the value obtained using MaxRel, where the average of the weights was (1.0 + 0.8 + 0.7)/3 = 0.83.
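The contrast between the global and the order-dependent matching can be reproduced with a minimal sketch. Only the five weights quoted in the text are taken from the Figure 4 example; the remaining entries of the matrix `W`, the function names `max_rel` and `greedy_rel`, and the heaviest-edge-first ordering of the greedy variant are assumptions made for illustration. Exhaustive search over permutations stands in for a proper assignment solver and is only practical for short tag lists.

```python
from itertools import permutations

# Rows: tags t1..t3 of f1; columns: tags q1..q4 of f2.
# Quoted in the text: sr(t1,q1)=1.0, sr(t2,q2)=0.8, sr(t2,q3)=0.3,
# sr(t3,q2)=0.9, sr(t3,q4)=0.7. All other entries are assumed values.
W = [
    [1.0, 0.7, 0.2, 0.1],
    [0.5, 0.8, 0.3, 0.1],
    [0.1, 0.9, 0.1, 0.7],
]


def max_rel(w):
    """Best one-to-one matching, averaged over the smaller tag set."""
    rows, cols = len(w), len(w[0])
    if rows > cols:  # transpose so that the rows are the smaller side
        w = [list(col) for col in zip(*w)]
        rows, cols = cols, rows
    best = max(
        sum(w[i][j] for i, j in enumerate(assign))
        for assign in permutations(range(cols), rows)
    )
    return best / rows


def greedy_rel(w):
    """Order-dependent heuristic: always take the heaviest free edge."""
    edges = sorted(
        ((w[i][j], i, j) for i in range(len(w)) for j in range(len(w[0]))),
        reverse=True,
    )
    used_rows, used_cols, total = set(), set(), 0.0
    for weight, i, j in edges:
        if i not in used_rows and j not in used_cols:
            used_rows.add(i)
            used_cols.add(j)
            total += weight
    return total / min(len(w), len(w[0]))


print(round(max_rel(W), 2))     # 0.83
print(round(greedy_rel(W), 2))  # 0.73
```

With these weights the greedy variant pairs $t_3$ with $q_2$ before $t_2$ is considered, which forces $t_2$ onto $q_3$, as described above.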

Overall, the cardinality of the two tag sets is used to follow Heuristic 3. The one-to-one map of tag pairs is used to follow Heuristics 4 and 5. The MaxRel function is used to find the best matching between the tags of two images when integrating their semantic relatedness.

4.3. Tag Order Revision. According to Heuristic 2, the order of tags should be considered when computing the semantic relatedness between two images. Intuitively, the tags appearing in the first positions may be more important than the later tags. Some research [39] suggests that people tend to select popular items as their tags. Meanwhile, the top popular tags are indeed the “meaningful” ones.

In this section, the MaxRel function proposed in Section 4.2 is revised to consider the order of tags. For example, the relatedness of tag pairs in high positions should be enhanced, which is summarized as a constraint schema.

Schema 1 (tag relatedness declining). This schema means that the relatedness of a matched tag pair of two images $f_1$ and $f_2$ should decline with the positions of its tags in the MaxRel function. In other words, tag pairs in high positions are emphasized, and the contribution of tag pairs in lower positions is reduced.

We add a decline factor to the MaxRel function; the detailed steps are as follows.

(1) According to the MaxRel function in Section 4.2, the best matching tag pairs are selected, which is denoted by

\[
\mathrm{maxRel}(f_1, f_2) = \sum \mathrm{sr}(t_i, t_j), \qquad t_i \in s(f_1) \text{ and } t_j \in s(f_2). \tag{11}
\]

Of course, the selected tag pairs are the best matching of the bipartite graph between images $f_1$ and $f_2$.

(2) Compute the position information of each tag, which is denoted by $\mathrm{Pos}(t_i)$:

\[
\mathrm{Pos}(t_i) = \frac{|s(f)| + 1 - i}{|s(f)|}, \qquad t_i \in s(f). \tag{12}
\]

(3) Add the position information of each tag to (11), which can be seen as a decline factor:

\[
\mathrm{sr}(f_1, f_2) = \sum \mathrm{Pos}(t_i) \ast \mathrm{sr}(t_i, t_j) \ast \mathrm{Pos}(t_j), \qquad t_i \in s(f_1) \text{ and } t_j \in s(f_2). \tag{13}
\]

(4) Of course, similar to the MaxRel function, the result of (13) should be normalized; here it is divided by the sum of the position weights of the selected pairs:

\[
\mathrm{sr}(f_1, f_2) = \frac{\sum \mathrm{Pos}(t_i) \ast \mathrm{sr}(t_i, t_j) \ast \mathrm{Pos}(t_j)}{\sum \mathrm{Pos}(t_i) \ast \mathrm{Pos}(t_j)}. \tag{14}
\]

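We also consider the example in Figure 4. According to (14), the semantic relatedness is revised as

\[
\left(1 \cdot 1.0 \cdot 1 + \frac{2}{3} \cdot 0.8 \cdot \frac{3}{4} + \frac{1}{3} \cdot 0.7 \cdot \frac{1}{4}\right) \times \left(1 \cdot 1 + \frac{2}{3} \cdot \frac{3}{4} + \frac{1}{3} \cdot \frac{1}{4}\right)^{-1} = 0.92. \tag{15}
\]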
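The arithmetic in (15) can be checked with a short sketch of (12) and (14). The helper names `pos` and `weighted_relatedness` and the triple layout `(i, j, sr)` are illustrative assumptions; the matched pairs and their weights are the ones from the Figure 4 example, with 1-based tag positions.

```python
def pos(i, n):
    """Position weight from (12) for the i-th tag (1-based) of n tags."""
    return (n + 1 - i) / n


def weighted_relatedness(pairs, n1, n2):
    """Normalized, position-weighted relatedness from (14).

    pairs holds (i, j, sr) triples for the matched tags t_i and q_j.
    """
    num = sum(pos(i, n1) * sr * pos(j, n2) for i, j, sr in pairs)
    den = sum(pos(i, n1) * pos(j, n2) for i, j, _ in pairs)
    return num / den


# Matched pairs of Figure 4: (t1, q1), (t2, q2), (t3, q4).
matched = [(1, 1, 1.0), (2, 2, 0.8), (3, 4, 0.7)]
print(round(weighted_relatedness(matched, n1=3, n2=4), 2))  # 0.92
```

The pair ($t_3$, $q_4$) sits in low positions in both lists, so its weight of 0.7 contributes little, which is why the revised value (0.92) is higher than the unweighted average (0.83).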

Besides adding the decline factor to the MaxRel function, we also add a second constraint schema, identical tag pruning.

Input: The tag sets of two images $f_1$ and $f_2$, which are $s(f_1)$ and $s(f_2)$.
Output: The semantic relatedness of the two images $f_1$ and $f_2$.
for each $t_i \in s(f_1)$    /* page counts and position initialization */
    $N(s(f_1)) \leftarrow N(t_i)$
    $\mathrm{Pos}(s(f_1)) \leftarrow \mathrm{Pos}(t_i)$
for each $t_j \in s(f_2)$
    $N(s(f_2)) \leftarrow N(t_j)$
    $\mathrm{Pos}(s(f_2)) \leftarrow \mathrm{Pos}(t_j)$
for each $t_i \in s(f_1)$
    for each $t_j \in s(f_2)$
        if ($t_i == t_j$) $\mathrm{sr}(t_i, t_j) = 0$    /* pruning */
        else $\mathrm{sr}(t_i, t_j) = f(N(t_i), N(t_j))$    /* relatedness */
return $\mathrm{maxRel}(f_1, f_2) = f(\mathrm{Pos}(t_i), \mathrm{Pos}(t_j), \mathrm{sr}(t_i, t_j))$

Algorithm 1: MaxRel.
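One way to read Algorithm 1 end to end is sketched below, under several assumptions: the tag relatedness is passed in as a symmetric callable `tag_sr` rather than computed from page counts, the matching is selected on the unweighted relatedness first and the position weights are applied afterwards (steps (1) to (4) of Section 4.3), and the exhaustive search is only suitable for short tag lists. The names `image_relatedness`, `tag_sr`, `known`, and `lookup` are hypothetical.

```python
from itertools import permutations


def image_relatedness(tags1, tags2, tag_sr):
    """Sketch of Algorithm 1 for two ordered tag lists.

    tag_sr(a, b) is a caller-supplied, symmetric tag relatedness.
    """
    if not tags1 or not tags2:
        return 0.0
    if len(tags1) > len(tags2):  # keep the shorter list first
        tags1, tags2 = tags2, tags1
    n1, n2 = len(tags1), len(tags2)

    def sr(a, b):
        # Identical tag pruning: the same tag on both images scores 0.
        return 0.0 if a == b else tag_sr(a, b)

    # Best one-to-one matching by exhaustive search.
    best = max(
        permutations(range(n2), n1),
        key=lambda cols: sum(sr(tags1[i], tags2[j]) for i, j in enumerate(cols)),
    )

    # Position weights from (12); i and j are 0-based here.
    num = den = 0.0
    for i, j in enumerate(best):
        weight = ((n1 - i) / n1) * ((n2 - j) / n2)
        num += weight * sr(tags1[i], tags2[j])
        den += weight
    return num / den


# Weights quoted for Figure 4; every other pair is assumed to score 0.
known = {("t1", "q1"): 1.0, ("t2", "q2"): 0.8, ("t2", "q3"): 0.3,
         ("t3", "q2"): 0.9, ("t3", "q4"): 0.7}


def lookup(a, b):
    return known.get((a, b), known.get((b, a), 0.0))


print(round(image_relatedness(["t1", "t2", "t3"],
                              ["q1", "q2", "q3", "q4"], lookup), 2))  # 0.92
```

Because the shorter list is always placed first and `lookup` is symmetric, swapping the two images gives the same value.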

Schema 2 (identical tag pruning). This schema means that the identical tag pairs of two images $f_1$ and $f_2$ should be pruned in the MaxRel function. In other words, the semantic relatedness of the same tag of two images is set to 0.

The above schema ensures that the proposed method measures the relatedness of two images. If we did not prune the identical tag pairs of two images, the proposed method would be transformed into a similarity measure. For example, the cosine similarity [36] between two tag vectors is determined by the number of identical elements of the two vectors. The overall algorithm of the proposed computation model is presented in Algorithm 1.
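To make the contrast with a similarity measure concrete, the following sketch computes the cosine similarity of two images represented as binary tag vectors, in which case the numerator is simply the number of shared tags. The binary representation and the function name `tag_cosine` are assumptions for illustration; the example tags are taken from Table 3.

```python
from math import sqrt


def tag_cosine(tags_a, tags_b):
    """Cosine similarity of two tag sets seen as binary vectors."""
    a, b = set(tags_a), set(tags_b)
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


print(tag_cosine(["Gucci", "Trainers"], ["Gucci", "Cruise"]))  # 0.5
print(tag_cosine(["Keepall"], ["Alma"]))                       # 0.0
```

The second call shows the limitation: two images whose tags do not overlap receive a score of 0 even if the tags are semantically related.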

5. Experimental Results

In this section, we evaluate the results of using the proposed method for relatedness measurement. In Section 5.1, we introduce the data set for the evaluation. In Section 5.2, we determine which cooccurrence function to use for the tag relatedness measures. In Sections 5.3 and 5.4, clustering and retrieval are used to evaluate the proposed method.

5.1. The Data Sets. We choose Flickr groups as the resources for building data sets. Users on online photo sharing sites like Flickr have organized many millions of photos into hundreds of thousands of semantically themed groups. These groups expose implicit choices that users make about which images are similar. Flickr group membership is usually less noisy than Flickr tags because images are screened by group members. We download 1000 images from ten groups. These ten groups can be divided into two classes. The first class includes five groups, which are car, phone, flower, dog, and boat. The second class consists of another five groups, which are Louis Vuitton, Dior, Gucci, Cartier, and Chanel. These images are selected by humans, which reduces the noise of the data set. The reason why we choose two classes of groups is that we want to test the accuracy of the proposed method on data sets with different degrees of semantic relatedness. The semantic relatedness within the second class is higher than within the first class, since the second class is all about luxury brands. For example, almost all these brands produce handbags. Thus, if the proposed method does well in these groups, we may say that it can measure the semantic relatedness between Flickr images accurately and robustly. Table 2 gives the detailed information of the data set. Table 3 gives some selected tags from group 2.

Table 2: The detailed information of the data set.

| Group 1 | Average tags per image | Group 2 | Average tags per image |
|---|---|---|---|
| Car | 4.4 | Louis Vuitton | 3.1 |
| Phone | 3.5 | Dior | 3.2 |
| Flower | 2.2 | Gucci | 2.9 |
| Dog | 5.6 | Cartier | 2.8 |
| Boat | 3.1 | Chanel | 2.6 |

5.2. Relatedness Function Selection. In Section 4.1, four cooccurrence measures (i.e., Jaccard, Overlap, Dice, and PMI) are given for relatedness measures between tags. In [40], Rubenstein and Goodenough proposed a data set containing 28 word pairs rated by a group of 51 human subjects, which is a reliable benchmark for evaluating semantic similarity measures. The higher the correlation coefficient against the R-G ratings is, the more accurate the method for measuring semantic similarity between words is. Figure 5 gives the correlation coefficients of the four functions against the R-G test set. From Figure 5, we can say that PMI performs best on relatedness measures, since it has the highest correlation coefficient. Thus, in the later experiments, we select PMI as the relatedness measure between tags.
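For orientation, the four cooccurrence measures can be sketched in their common count-based forms. These are the textbook definitions and may differ in normalization or thresholds from the exact formulas of Section 4.1; the argument names (`n_a` and `n_b` for the counts of each tag, `n_ab` for their joint count, `total` for the collection size) and the zero-count conventions are assumptions.

```python
from math import log2


def jaccard(n_a, n_b, n_ab):
    union = n_a + n_b - n_ab
    return n_ab / union if union else 0.0


def overlap(n_a, n_b, n_ab):
    smaller = min(n_a, n_b)
    return n_ab / smaller if smaller else 0.0


def dice(n_a, n_b, n_ab):
    return 2 * n_ab / (n_a + n_b) if (n_a + n_b) else 0.0


def pmi(n_a, n_b, n_ab, total):
    """Pointwise mutual information (base 2); 0.0 if any count is zero."""
    if not (n_a and n_b and n_ab):
        return 0.0
    return log2((n_ab / total) / ((n_a / total) * (n_b / total)))


# Hypothetical counts: tag A on 10 items, tag B on 20, both on 5, of 100.
print(jaccard(10, 20, 5), overlap(10, 20, 5), round(dice(10, 20, 5), 3))
print(round(pmi(10, 20, 5, 100), 3))
```

Unlike the three set-overlap ratios, PMI compares the joint frequency with what independence would predict, so it is not bounded by 1 and can be negative.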

Table 3: The selected tags of group 2 from Flickr.

| Group 2 | Tags | Tags | Tags | Tags | Tags |
|---|---|---|---|---|---|
| Louis Vuitton | “Louis Vuitton”, “Keepall” | “Louis Vuitton”, “Alma” | “Louis Vuitton”, “Tivoli” | “Louis Vuitton”, “Bolsas” | “LV”, “Multicolore” |
| Dior | “DIOR”, “lipstick”, “makeup” | “Dior”, “Diorskin Nude”, “Tan Sun Powder” | “Dior”, “Makeup”, “Palette” | “Dior”, “Addict 2” | “Dior”, “Jadore”, “Perfume” |
| Gucci | “Gucci”, “Leather Belts” | “Gucci”, “Trainers” | “Gucci”, “Jolie Leopard”, “Orange” | “Replica”, “Gucci”, “Handbags” | “Gucci”, “Cruise” |
| Cartier | “Cartier”, “Pasha”, “Chronograph” | “CARTIER”, “Love Bracelet” | “Cartier”, “Santos Galbee” | “Calibre”, “Cartier” | “Cartier Watch”, “Tank Francaise” |
| Chanel | “Chanel”, “Coco Noir” | “Chanel”, “Chanel Riva”, “Chanel nail polish” | “Coco Mademoiselle” | “Chanel”, “No 5” | “Chance”, “Chanel” |

5.3. Evaluation on Image Clustering. In this section, we evaluate the correctness of using tag order. In Section 4.3, we add the position information of each tag to the semantic relatedness measures. The tags in high positions are treated as the major elements for semantic relatedness measures. We evaluate the use of tag order by a clustering task. We employ the proposed semantic relatedness of images in the K-means [41] clustering model. Since the K-means model depends on the initial points, we randomly select the core points 100 times. We evaluate the effectiveness of clustering with three quality measures: F-measure, Purity, and Entropy [41]. We treat each cluster as if it were the result of the proposed method and each class as if it were the desired set of images. Generally, we would like to maximize the F-measure and Purity and minimize the Entropy of the clusters to achieve a high-quality clustering. Moreover, we compare the clustering results of the proposed method with and without tag order. Figures 6 and 7 give the clustering results of the group 1 and group 2 data sets. From Figures 6 and 7, we can conclude the following.

(1) The proposed method performs better than cosine-based clustering. This result can be obtained from Figures 6 and 7. The three metrics (F-measure, Purity, and Entropy) of the proposed method are better than those of cosine-based clustering. This may be caused by the inherent feature of the proposed method: it is based on semantic relatedness rather than on the tag cooccurrence on which cosine-based clustering relies. If the tags of two images do not overlap, cosine-based clustering cannot relate them.

(2) The schema on using tag order is effective. This result can also be obtained from Figures 6 and 7. The three metrics of the variant using tag order are the best (highest F-measure and Purity, lowest Entropy). The position information reflects the importance of each tag. The proposed method emphasizes the tags in high positions, which raises the performance on image clustering.

(3) The proposed method is robust on different data sets. The proposed method performs well on both the group 1 and group 2 data sets. It is worth noting that the difference between the proposed method and the cosine method on group 2 is higher than that on group 1. The reason is that the semantic correlation of group 2 is stronger than that of group 1. In other words, the performance of the proposed method relies on the semantic correlation of the classes in the data sets. The stronger the semantic correlation between the classes of data, the better the proposed method performs.
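The three clustering quality measures used in Section 5.3 can be sketched as follows, using the class-weighted F-measure, Purity, and size-weighted Entropy in the usual form for clustering evaluation. The base-2 logarithm, the function name `clustering_scores`, and the toy labels are assumptions; the paper's exact normalization is not shown in this section.

```python
from collections import Counter
from math import log2


def clustering_scores(classes, clusters):
    """Return (f_measure, purity, entropy) for two parallel label lists."""
    n = len(classes)
    class_sizes = Counter(classes)
    cluster_sizes = Counter(clusters)
    joint = Counter(zip(classes, clusters))  # members of class c in cluster k

    # Purity: share of items that belong to the majority class of their cluster.
    purity = sum(
        max(joint[(c, k)] for c in class_sizes) for k in cluster_sizes
    ) / n

    # Entropy: class entropy inside each cluster, weighted by cluster size.
    entropy = 0.0
    for k, n_k in cluster_sizes.items():
        h = 0.0
        for c in class_sizes:
            p = joint[(c, k)] / n_k
            if p > 0:
                h -= p * log2(p)
        entropy += (n_k / n) * h

    # F-measure: best-matching cluster per class, weighted by class size.
    f_measure = 0.0
    for c, n_c in class_sizes.items():
        best = 0.0
        for k, n_k in cluster_sizes.items():
            n_ck = joint[(c, k)]
            if n_ck:
                precision, recall = n_ck / n_k, n_ck / n_c
                best = max(best, 2 * precision * recall / (precision + recall))
        f_measure += (n_c / n) * best
    return f_measure, purity, entropy


# Toy labels (hypothetical): a perfect clustering of two classes.
print(clustering_scores(["car", "car", "dog", "dog"], [0, 0, 1, 1]))
```

A perfect clustering gives an F-measure and Purity of 1 and an Entropy of 0, which matches the direction of optimization stated above.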

5.4. Evaluation on Image Searching. In this section, we evaluate the proposed method on a query-based image searching task. Five queries from group 2 are selected as the test set: “Louis Vuitton,” “Gucci,” “Chanel,” “Cartier,” and “Dior.” These queries are searched in Flickr, and the top 50 images for each query are obtained as the data set. Moreover, we remove the query from the tags of each image. For example, for the query “Cartier,” the tag “Cartier” is removed from the top 50 images. The reason for this operation is that the proposed method is based on semantic relatedness rather than cooccurrence. We choose cut-off point precision to evaluate the proposed method on image searching. The cut-off point precision ($P_n$) is the percentage of correct results among the top $n$ returned results. We compute $P_1$, $P_5$, and $P_{10}$ on the group 2 test set. Table 4 lists the comparison of the cut-off point precision between the proposed method and Flickr. From the experimental results, we can conclude the following.

(1) The proposed method performs better than Flickr. In Table 4, the $P_1$, $P_5$, and $P_{10}$ of the proposed method are equal to or higher than those of Flickr for every query. The experimental results support the effectiveness of the proposed method on the image searching task.

(2) The proposed method can handle the relatedness searching problem. The proposed method can measure the semantic relatedness of two images robustly and correctly.

(3) The proposed method can support the faceted exploration of image search. Faceted exploration of search results is widely used in search interfaces for structured databases. Recently, faceted exploration has also been appearing in online search engines in the form of search assistants. The proposed method can measure the semantic relatedness of two images. Given the search queries, we can select the related images for faceted search.
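The cut-off point precision used in Section 5.4 is simple enough to state as code. The sketch below assumes that each returned image has already been judged correct or incorrect; the function name `precision_at_n` and the judgments in `flags` are hypothetical and are not taken from the experiments.

```python
def precision_at_n(relevant_flags, n):
    """Percentage of correct results among the top n returned results."""
    top = relevant_flags[:n]
    return 100.0 * sum(top) / n


# Hypothetical relevance judgments for one ranked result list.
flags = [True, True, False, True, True, False, True, True, True, True]
for n in (1, 5, 10):
    print(n, precision_at_n(flags, n))
```

Dividing by `n` rather than by the number of returned results means that a list shorter than `n` is penalized for the missing positions.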

Table 4: The comparison of the cut-off point precision (%) between the proposed method and Flickr.

| Cut-off point | Louis Vuitton | Gucci | Dior | Chanel | Cartier |
|---|---|---|---|---|---|
| $P_1$ | 100 | 100 | 100 | 100 | 100 |
| $P_1$ (Flickr) | 100 | 100 | 0 | 100 | 100 |
| $P_5$ | 100 | 100 | 100 | 100 | 100 |
| $P_5$ (Flickr) | 80 | 60 | 60 | 60 | 80 |
| $P_{10}$ | 100 | 100 | 100 | 100 | 100 |
| $P_{10}$ (Flickr) | 90 | 70 | 70 | 80 | 80 |

6. Conclusions

This paper discusses semantic relatedness measures, systematically puts forward a method to measure the semantic relatedness of two images based on their tags, and justifies its validity through experiments. The major contributions are summarized as follows.

(1) We propose a framework to measure semantic relatedness between Flickr images using tags. Firstly, the cooccurrence measures are used to compute the relatedness of tags between two images. Secondly, we transform the tag relatedness integration into the assignment in bipartite graphs problem, which can find an appropriate matching for the semantic relatedness of images. Finally, a decline factor considering the position information of tags is used in the proposed framework, which reduces the noise and redundancy in the social tags.

(2) A real data set including 1000 images from Flickr with ten classes is used in our experiments. Two evaluation methods, clustering and searching, are performed, which shows that the proposed method can measure the semantic relatedness between Flickr images accurately and robustly.

(3) We extend the relatedness measures between concepts to the level of images. Since the association relation is the basic mechanism of the brain, the proposed relatedness measurement can facilitate related applications such as searching and recommendation.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported in part by the National Science and Technology Major Project under Grant no. 2013ZX01033002-003, in part by the National High Technology Research and Development Program of China (863 Program) under Grant nos. 2013AA014601 and 2013AA014603, in part by the National Key Technology Support Program under Grant no. 2012BAH07B01, in part by the National Science Foundation of China under Grant no. 61300202, and in part by the Science Foundation of Shanghai under Grant no. 13ZR1452900.

Figure 5: The correlation of four selected functions. [Bar chart of the correlation per function: Jaccard 0.346, Dice 0.395, Overlap 0.421, PMI 0.579.]

Figure 6: The clustering results of group 1 data sets. [Bar chart of F-measure, Purity, and Entropy. Using tag order: 0.912, 0.967, 0.011. Not using: 0.857, 0.922, 0.018. Cosine: 0.732, 0.751, 0.056.]

Figure 7: The clustering results of group 2 data sets. [Bar chart of F-measure, Purity, and Entropy. Using tag order: 0.876, 0.927, 0.023. Not using: 0.827, 0.852, 0.031. Cosine: 0.632, 0.655, 0.085.]

References

[1] J. Goldberger, S. Gordon, and H. Greenspan, “Unsupervised image-set clustering using an information theoretic framework,” IEEE Transactions on Image Processing, vol. 15, no. 2, pp. 449–458, 2006.

[2] T. Evgeniou, M. Pontil, C. Papageorgiou, and T. Poggio, “Image representations and feature selection for multimedia database search,” IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 4, pp. 911–920, 2003.

[3] R. Ji, H. Yao, X. Sun, B. Zhong, and W. Gao, “Towards semantic embedding in visual vocabulary,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR ’10), pp. 918–925, June 2010.

[4] J. Fan, D. A. Keim, Y. Gao, H. Luo, and Z. Li, “JustClick: personalized image recommendation via exploratory search from large-scale Flickr images,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 19, no. 2, pp. 273–288, 2009.

[5] T. Gong, S. Li, and C. L. Tan, “A semantic similarity language model to improve automatic image annotation,” in Proceedings of the 22nd International Conference on Tools with Artificial Intelligence (ICTAI ’10), pp. 197–203, October 2010.

[6] C. Schmid and R. Mohr, “Local grayvalue invariants for image retrieval,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 5, pp. 530–535, 1997.

[7] M. Varma and A. Zisserman, “A statistical approach to texture classification from single images,” International Journal of Computer Vision, vol. 62, no. 1-2, pp. 61–81, 2005.

[8] S. Belongie, J. Malik, and J. Puzicha, “Shape matching and object recognition using shape contexts,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 4, pp. 509–522, 2002.

[9] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR ’05), vol. 1, pp. 886–893, June 2005.

[10] D. Huang, M. Ardabilian, Y. Wang, and L. Chen, “Asymmetric 3D/2D face recognition based on LBP facial representation and canonical correlation analysis,” in Proceedings of the 16th IEEE International Conference on Image Processing (ICIP ’09), pp. 3325–3328, November 2009.

[11] L. Wang, Y. Zhang, and J. Feng, “On the Euclidean distance of images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 8, pp. 1334–1339, 2005.

[12] W. Jia, H. Zhang, X. He, and Q. Wu, “Gaussian weighted histogram intersection for license plate classification,” in Proceedings of the 18th International Conference on Pattern Recognition (ICPR ’06), pp. 574–577, August 2006.

[13] Y. Rubner, C. Tomasi, and L. J. Guibas, “A metric for distributions with applications to image databases,” in Proceedings of the IEEE 6th International Conference on Computer Vision, pp. 59–66, January 1998.

[14] L. Wu, X.-S. Hua, N. Yu, W.-Y. Ma, and S. Li, “Flickr distance: a relationship measure for visual concepts,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 5, pp. 863–875, 2012.

[15] D. Cai, “An information-theoretic foundation for the measurement of discrimination information,” IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 9, pp. 1262–1273, 2010.

[16] P. van den Broek, “Using texts in science education: cognitive processes and knowledge representation,” Science, vol. 328, no. 5977, pp. 453–456, 2010.

[17] C. Bizer, T. Heath, and T. Berners-Lee, “Linked data - the story so far,” International Journal on Semantic Web and Information Systems, vol. 5, no. 3, pp. 1–22, 2009.

[18] H. Zhuge, “Communities and emerging semantics in semantic link network: discovery and learning,” IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 6, pp. 785–799, 2009.

[19] H. Zhuge, “Semantic linking through spaces for cyber-physical-socio intelligence: a methodology,” Artificial Intelligence, vol. 175, no. 5-6, pp. 988–1019, 2011.

[20] X. Luo, Z. Xu, J. Yu, and X. Chen, “Building association link network for semantic link on web resources,” IEEE Transactions on Automation Science and Engineering, vol. 8, no. 3, pp. 482–494, 2011.

[21] S. A. Golder and B. A. Huberman, “Usage patterns of collaborative tagging systems,” Journal of Information Science, vol. 32, no. 2, pp. 198–208, 2006.

[22] H. S. Al-Khalifa and H. C. Davis, “Measuring the semantic value of folksonomies,” in Proceedings of the Innovations in Information Technology (IIT ’06), pp. 1–5, November 2006.

[23] F. M. Suchanek, M. Vojnovic, and D. Gunawardena, “Social tags: meaning and suggestions,” in Proceedings of the 17th ACM Conference on Information and Knowledge Management (CIKM ’08), pp. 223–232, October 2008.

[24] H. Halpin, V. Robu, and H. Shepherd, “The complex dynamics of collaborative tagging,” in Proceedings of the 16th International World Wide Web Conference (WWW ’07), pp. 211–220, May 2007.

[25] C. Cattuto, C. Schmitz, A. Baldassarri, et al., “Network properties of folksonomies,” AI Communications, vol. 20, no. 4, pp. 245–262, 2007.

[26] R. Lambiotte and M. Ausloos, “Collaborative tagging as a tripartite network,” in Computational Science, vol. 3393 of Lecture Notes in Computer Science, pp. 1114–1117, 2006.

[27] U. Maulik, S. Bandyopadhyay, and I. Saha, “Integrating clustering and supervised learning for categorical data analysis,” IEEE Transactions on Systems, Man, and Cybernetics A, vol. 40, no. 4, pp. 664–675, 2010.

[28] D. Ramage, P. Heymann, C. D. Manning, and H. Garcia-Molina, “Clustering the tagged web,” in Proceedings of the 2nd ACM International Conference on Web Search and Data Mining (WSDM ’09), pp. 54–63, February 2009.

[29] D. Zhou, J. Bian, S. Zheng, H. Zha, and C. L. Giles, “Exploring social annotations for information retrieval,” in Proceedings of the 17th International Conference on World Wide Web (WWW ’08), pp. 715–724, April 2008.

[30] S. Xu, S. Bao, Y. Cao, and Y. Yu, “Using social annotations to improve language model for information retrieval,” in Proceedings of the 16th ACM Conference on Information and Knowledge Management (CIKM ’07), pp. 1003–1006, November 2007.

[31] S. Bao, G. Xue, X. Wu, Y. Yu, B. Fei, and Z. Su, “Optimizing web search using social annotations,” in Proceedings of the 16th International World Wide Web Conference (WWW ’07), pp. 501–510, May 2007.

[32] R. Schenkel, T. Crecelius, M. Kacimi, et al., “Efficient top-k querying over social-tagging networks,” in Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (ACM SIGIR ’08), pp. 523–530, July 2008.

[33] N. Rasiwasia, P. J. Moreno, and N. Vasconcelos, “Bridging the gap: query by semantic example,” IEEE Transactions on Multimedia, vol. 9, no. 5, pp. 923–938, 2007.

[34] R. Fergus, L. Fei-Fei, P. Perona, and A. Zisserman, “Learning object categories from Google’s image search,” in Proceedings of the 10th IEEE International Conference on Computer Vision (ICCV ’05), vol. 2, pp. 1816–1823, October 2005.

[35] G. Wang, D. Hoiem, and D. Forsyth, “Learning image similarity from Flickr groups using fast kernel machines,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 11, pp. 2177–2188, 2012.

[36] G. Salton, A. Wong, and C. S. Yang, “A vector space model for automatic indexing,” Communications of the ACM, vol. 18, no. 11, pp. 613–620, 1975.

[37] Z. Xu, X. Luo, J. Yu, and W. Xu, “Measuring semantic similarity between words by removing noise and redundancy in web snippets,” Concurrency Computation: Practice and Experience, vol. 23, no. 18, pp. 2496–2510, 2011.

[38] R. Firth, “A synopsis of linguistic theory 1930–1955,” in Studies in Linguistic Analysis, Philological Society, Oxford, UK, 1957.

[39] M. Vojnovic, J. Cruise, D. Gunawardena, and P. Marbach, “Ranking and suggesting popular items,” IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 8, pp. 1133–1146, 2009.

[40] H. Rubenstein and B. Goodenough, “Contextual correlates of synonymy,” Communications of the ACM, vol. 8, no. 10, pp. 627–633, 1965.

[41] M. Steinbach, G. Karypis, and V. Kumar, “A comparison of document clustering techniques,” in Proceedings of the KDD Workshop on Text Mining, 2000.

Figure 1: The illustration of four kinds of correlations between concepts. [Four panels: Synonymy, Similar, Meronymy, and Cooccurrence; example concept labels: PC, Computer, Dog, Wolf, Hand, Finger, Doctor, Nurse.]

(1) Ontology free. Ontology-based labeling first defines an ontology and then lets users label the web resources using the semantic markups in the ontology. Social tagging, in contrast, lets all the users in the social network label the web resources with their own keywords and share them with others. Different from ontology-based annotation, there is no predefined ontology or taxonomy in social tagging. Thus, the tagging task is more convenient for users.

(2) User oriented. The users can annotate images with their favorite tags. The tags of an image are determined by the users’ cognitive ability. For the same image, users may give different tags. Each image may have at least one tag, and each tag may appear in many different images.

(3) Semantic loss. Irrelevant social tags frequently appear, and users typically will not tag all semantic objects in the image, which is called semantic loss. Polysemy, synonymy, and ambiguity are some drawbacks of social tagging.

Based on the above characteristics, we aim at measuring semantic relatedness between images using social tags. It is observed that the correlations between the concepts of images can be divided into four kinds: synonymy, similarity, meronymy, and cooccurrence, as illustrated in Figure 1. Synonymy means the same object with different names. Similarity denotes that two objects are similar. Meronymy means that two objects follow a part-of relation. Cooccurrence means that two objects frequently appear together. Overall, the above four correlations can be summarized as semantic relatedness [15]. Semantic relatedness is a more generic concept than semantic similarity. Similar concepts are usually considered to be related because of their likeness (synonymy); dissimilar concepts can also be semantically related, for example, through meronymy or cooccurrence. In this paper, we focus on measuring semantic relatedness between images for the following reasons.

(1) Semantic relatedness follows the cognitive mechanism of people. In [16], the author suggests that the association relation is the basic mechanism of the brain. When people know a concept such as “hospital,” they may index a related concept such as “doctor” for an appropriate understanding of the original concept. Since the goal of relatedness measurement is to facilitate related applications such as searching and recommendation, the proposed method should follow the user’s cognitive mechanism.

(2) Semantic relatedness can be used to organize images based on their associations. In recent literature such as Linked Open Data (LOD) [17] and Semantic Link Network (SLN) [18–20], the resources are managed by their semantic relations. The proposed semantic relatedness measures can be used to build semantic links between resources, especially images, which can be easily applied in real applications.

The major contributions of this paper are summarized as follows.

(1) We propose a framework to measure semantic relatedness between Flickr images using tags. Firstly, the cooccurrence measures are used to compute the relatedness of tags between two images. Secondly, we transform the tag relatedness integration into the assignment in bipartite graphs problem, which can find an appropriate matching for the semantic relatedness of images. Finally, a decline factor considering the position information of tags is used in the proposed framework, which reduces the noise and redundancy in the social tags.

(2) A real data set including 1000 images from Flickr with ten classes is used in our experiments. Two evaluation methods, clustering and retrieval, are performed, which shows that the proposed method can measure the semantic relatedness between Flickr images accurately and robustly.

(3) We extend the relatedness measures between concepts to the level of images. Since the association relation is the basic mechanism of the brain, the proposed relatedness measurement can facilitate related applications such as searching and recommendation.

The rest of the paper is organized as follows. Section 2 gives the related work on social tags and image similarity measures. The problem definition is introduced in Section 3. Section 4 proposes the method for measuring the semantic relatedness of images. Experiments are presented in Section 5. Conclusions are made in the last section.

Figure 2: The illustration of a pair of images from Flickr. [Two photos, shown with the labels “Big Ben at night,” “London through my lens,” and “London eye.”]

2. Related Work

In this section, we cover two aspects related to the proposed work. Research on social tags is introduced first. Then, we give the related work on image similarity measures.

21 On Social Tags In the area about the usage patternsand semantic values of social tags Golder and Huberman[21] mined usage patterns of social tags based on thedelicious (deliciouspost) data set Al-Khalifa and Davis[22] concluded that social tags were semantically richerthan automatically extracted keywords Suchanek et al [23]used YAGO (httpwwwmpi-infmpgdeyago-nagayago)and WordNet (httpwordnetprincetonedu) to check themeaning of social tags and concluded that top tags wereusually meaningful Halpin et al [24] examined why and howthe power law distribution of tag usage frequency was formedin a mature social tagging system over time

Besides research on mining social tags, some studies modeled the network structure of social tags. Cattuto et al. [25] investigated the network features of a social tags system, which is seen as a tripartite graph, using metrics adapted from classical network measures. Lambiotte and Ausloos [26] described social tags systems as a tripartite network with users, tags, and annotated items. The proposed tripartite network was projected into bipartite and unipartite networks to discover its structures. In [27], the social tags system was modeled as a tripartite graph which extends the traditional bipartite model of ontologies with a social dimension.

Recently, many researchers have investigated the applications of social tags in information retrieval and ranking. In [28], the authors empirically studied the potential value of social annotations for web search. Zhou et al. [29] proposed a model using latent Dirichlet allocation, which incorporates the topical background of documents and social tags. Xu et al. [30] developed a language model for information retrieval based on the metadata property of social tags and their relationships to annotated documents. Bao et al. [31] introduced two ranking methods: SocialSimRank, which ranked pages based on the semantic similarity between tags and pages, and SocialPageRank, which ranked returned pages based on their popularity. Schenkel et al. [32] developed a top-$k$ algorithm which ranked search results based on the tags shared by the user who issued the query and the users who annotated the returned documents with the query tags.

2.2. On Measuring Images Similarity. Measuring semantic similarity is a basic issue in the computer vision field. Usually, some low-level visual features are used for similarity measures. For example, shape features, texture features, and gradient features can be extracted from images. Based on the extracted low-level features, distance metrics such as the Euclidean distance, the Chi-Square distance, the histogram intersection, and the EMD distance are used. In this paper, the proposed method addresses the problem with semantic-level features such as social tags.

Different from the methods using low-level features, a number of recent papers build image representations based on the outputs of concept classifiers [33]. Our observation is that Flickr provides the related social tags given by web users, which reflect how people on the internet tend to annotate images. Several previous methods [34] learn object models from internet images. These methods tend to gather training examples using image search results. Besides, these approaches have to alternate between finding good examples and updating object models in order to be robust against noisy images. On the other hand, some papers [35] use images from Flickr groups rather than search engines, which are claimed to be clean enough to produce good classifiers.

3. Problem Definition

In this paper, we study the problem of measuring semantic relatedness between images or videos with manually provided social tags. Here, a social tag refers to a concept provided by users which is semantically related to the content of an image or a video. The input of the proposed method is a pair of images or videos with social tags. The goal of the proposed method is to identify the semantic relatedness between two


images or videos. Figure 2 shows a pair of images from Flickr with social tags. These two images are about “Big Ben” and “London eye.” They may be dissimilar according to traditional similarity measurement since they share little low-level visual similarity. But the two images are semantically related since both show famous sights of London. With the proposed method, we can compute their semantic relatedness even though they share few visual features.

3.1. Basic Definitions. We first introduce three important definitions in this paper: the social tags set of an image, the semantic relatedness between tags, and the semantic relatedness between images.

Definition 1 (social tags set of an image). The social tags (denoted by $t$) set of an image $f$ (denoted by $s(f)$) is the set of tags provided by users for the image:

$$s(f) = \{t_1, t_2, \ldots, t_{|s(f)|}\}. \tag{1}$$

For example, in Figure 2, the tags of the right image are “London” and “eye” rather than “London eye.” Since Flickr provides the related tags of each image, we simply download the tags from Flickr. We do not perform any NLP operations on the tags.

Definition 2 (semantic relatedness between tags). The semantic relatedness between tags (denoted by $\mathrm{sr}(t_1, t_2)$) is the expected correlation of a pair of tags $t_1$ and $t_2$.

Definition 3 (semantic relatedness between images). The semantic relatedness between images (denoted by $\mathrm{sr}(f_1, f_2)$) is the expected correlation of a pair of images $f_1$ and $f_2$.

The range of $\mathrm{sr}(t_1, t_2)$ and $\mathrm{sr}(f_1, f_2)$ is from 0 to 1. A high value indicates a higher confidence that the tags or images are semantically related. Please notice that the definition of $\mathrm{sr}(f_1, f_2)$ can also be extended to videos with social tags.

3.2. Basic Heuristics. Based on common sense and our observations on real data, we have five heuristics that serve as the basis of our computation model.

Heuristic 1. Usually, each tag of an image appears only once.

Different from writing sentences, users usually annotate an image with distinct tags. For example, the possibility of using the tags “apple apple apple” for an image is very low. Therefore, in this paper, we do not employ any weighting scheme for tags such as tf-idf [36].

Heuristic 2. The order of the tags may reflect their correlation with the annotated image.

Different tags reflect different aspects of an image. According to Heuristic 1, the weight of a tag against the image cannot be obtained from its frequency. Fortunately, the order of the tags can be obtained since a user may provide tags one by one.

Heuristic 3. The number of tags of an image may not be relevant to the annotation correctness.

Different users may give different tags to the same image. For example, users may give tags such as “apple iPhone” or “iPhone4 mobile phone” to the same image about an iPhone. It is hard to say which annotation is better, though the latter has three tags.

Heuristic 4. Usually, some tags may be redundant for annotating an image.

Users may give similar tags to an image. For example, the tags “apple iPhone” may be redundant since iPhone is semantically very similar to apple.

Heuristic 5. Usually, some tags may be noisy for annotating an image.

Users may give inappropriate or even false tags to an image. For example, the tag “iPhone” is false for an image about the iPod.

4. Computation Model

In this section, we propose the computation model for measuring semantic relatedness between images. Based on the above five heuristics, the social tags provided by users are used in our computation model. Overall, the proposed computation model is divided into three steps.

(1) Tag relatedness computation. In this step, based on Heuristic 1, the relatedness of all tag pairs between two images is computed.

(2) Semantic relatedness integration. In this step, based on Heuristics 3–5, we measure the semantic relatedness between images.

(3) Tag order revision. In this step, based on Heuristic 2, the image relatedness from step 2 is revised.

Table 1 shows the variables and parameters used in the following discussion. Figure 3 illustrates an overview of the proposed computation model.

4.1. Tag Relatedness Computation. According to Definition 1, an image can be represented as a set of tags provided by users. To obtain the semantic relatedness of a pair of images, we can measure the semantic relatedness between the tags of these images. For example, given two images with tags “apple iPhone” and “iPod Nano,” we can measure the semantic relatedness between these tags. Since each tag usually occurs only once according to Heuristic 1, the semantic relatedness between tags can be computed without considering their weight.

Many different methods of measuring semantic relatedness between concepts have been proposed, which can be divided


Figure 3: The illustration of the proposed method. Panels: (a) input; (b) tag extraction; (c) page counts repository; (d) tag relatedness computation; (e) assignment in bipartite graph; (f) applications (searching, recommendation, and clustering). The example tags are $f_1$: Big Ben, night, clock, tower, England and $f_2$: London eye.

Table 1: The variables and parameters used in the proposed computation model.

Name | Description
$f$ | An image
$t$ | A tag
$s(f)$ | Tags set of an image
$\mathrm{sr}(t_1, t_2)$ | Semantic relatedness of two tags
$\mathrm{sr}(f_1, f_2)$ | Semantic relatedness of two images
$N(t)$ | Page counts of a tag
$N(s(f))$ | Set of page counts of an image
$\mathrm{pos}(t)$ | Position information of a tag

into two aspects [37]: taxonomy-based methods and web-based methods. Taxonomy-based methods use information theory and a hierarchical taxonomy such as WordNet to measure semantic relatedness. In contrast, web-based methods use the web as a live and active corpus instead of a hierarchical taxonomy.

In the proposed computation model, each tag can be seen as a concept with explicit meaning. Thus, we use equations based on the cooccurrence of two concepts to measure their semantic relatedness. The core idea is that “you shall know a word by the company it keeps” [38]. In this section, four popular cooccurrence measures (i.e., Jaccard, Overlap, Dice, and PMI) are used to measure semantic relatedness between tags.

Besides cooccurrence measures, the page counts of each tag from a search engine are used. Page counts mean the number of web pages containing the query $q$. For example, the page count of the query “Obama” in Google (http://www.google.com) is 1,210,000,000 (retrieved on 9/28/2012). Moreover, the page count for the query “$q$ AND $p$” can be considered as a measure of cooccurrence of queries $q$ and $p$. For the remainder of this paper, we use the notation $N(p)$ to denote the page count of the tag $p$ in Google. However, the respective page counts for the tags $p$ and $q$ are not enough for measuring semantic relatedness; the page count for the query “$q$ AND $p$” should also be considered. For example, when we query “Obama” AND “United States” in Google, we find 485,000,000 web pages; that is, $N(\text{Obama} \cap \text{United States}) = 485{,}000{,}000$. The four cooccurrence measures (i.e., Jaccard, Overlap, Dice, and PMI) between two tags $p$ and $q$ are as follows:

$$\mathrm{Jaccard}(p, q) = \frac{N(p \cap q)}{N(p) + N(q) - N(p \cap q)}, \tag{2}$$

where $p \cap q$ denotes the conjunction query “$p$ AND $q$.”


Consider

$$\mathrm{Overlap}(p, q) = \frac{N(p \cap q)}{\min(N(p), N(q))}, \tag{3}$$

where $\min(N(p), N(q))$ means the lower of $N(p)$ and $N(q)$. Consider

$$\mathrm{Dice}(p, q) = \frac{2 \cdot N(p \cap q)}{N(p) + N(q)}. \tag{4}$$

According to probability and information theory, the mutual information (MI) of two random variables is a quantity that measures the mutual dependence of the two variables. Pointwise mutual information (PMI) is a variant of MI (see (5)):

$$\mathrm{PMI}(p, q) = \frac{\log\left(\dfrac{N \cdot N(p \cap q)}{N(p) \cdot N(q)}\right)}{\log N}, \tag{5}$$

where $N$ is the number of web pages in the search engine, which is set to $N = 10^{11}$ according to the number of indexed pages reported by Google.
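As a minimal sketch of (2)–(5), the following Python functions compute the four measures from raw page counts. The function names, the guards against zero counts, and the sample counts are illustrative assumptions that do not come from the paper; only the value $N = 10^{11}$ follows the text.

```python
import math

TOTAL_PAGES = 1e11  # N in (5), the index size stated in the text


def jaccard(n_p, n_q, n_pq):
    """Equation (2): N(p AND q) / (N(p) + N(q) - N(p AND q))."""
    union = n_p + n_q - n_pq
    return n_pq / union if union > 0 else 0.0


def overlap(n_p, n_q, n_pq):
    """Equation (3): N(p AND q) / min(N(p), N(q))."""
    smaller = min(n_p, n_q)
    return n_pq / smaller if smaller > 0 else 0.0


def dice(n_p, n_q, n_pq):
    """Equation (4): 2 * N(p AND q) / (N(p) + N(q))."""
    total = n_p + n_q
    return 2 * n_pq / total if total > 0 else 0.0


def pmi(n_p, n_q, n_pq, n_total=TOTAL_PAGES):
    """Equation (5): log(N * N(p AND q) / (N(p) * N(q))) / log N."""
    if n_p <= 0 or n_q <= 0 or n_pq <= 0:
        return 0.0  # assumption: an undefined logarithm is treated as "unrelated"
    return math.log(n_total * n_pq / (n_p * n_q)) / math.log(n_total)


# Hypothetical page counts, chosen only to keep the arithmetic easy to follow.
n_p, n_q, n_pq = 100, 200, 50
print(jaccard(n_p, n_q, n_pq))  # 50 / 250
print(overlap(n_p, n_q, n_pq))  # 50 / 100
print(dice(n_p, n_q, n_pq))     # 100 / 300
print(pmi(1e5, 1e5, 1e5))       # log(1e6) / log(1e11)
```

Note that (5) as written can fall below 0 when two tags cooccur less often than chance would suggest. The text does not say how such values are mapped into the range from 0 to 1, so the sketch leaves them unchanged.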

Through (2)–(5), we can compute the tag relatedness as follows.

(1) Extract the tags from two images $f_1$ and $f_2$, which are denoted by

$$s(f_1) = \{t_1, t_2, \ldots, t_{|s(f_1)|}\}, \qquad s(f_2) = \{t_1, t_2, \ldots, t_{|s(f_2)|}\}. \tag{6}$$

(2) Issue the tags from $f_1$ and $f_2$ as queries to the web search engine (in this paper, we choose Google for its convenient API (http://developers.google.com)). The page counts can be denoted by

$$N(s(f_1)) = \{N(t_1), N(t_2), \ldots, N(t_{|s(f_1)|})\}, \qquad N(s(f_2)) = \{N(t_1), N(t_2), \ldots, N(t_{|s(f_2)|})\}. \tag{7}$$

(3) Compute the semantic relatedness between each tag pair from $f_1$ and $f_2$ by (2)–(5). For example, if we use PMI to compute tag semantic relatedness, the equation is

$$\mathrm{sr}(t_i, t_j) = \frac{\log\left(\dfrac{N \cdot N(t_i \cap t_j)}{N(t_i) \cdot N(t_j)}\right)}{\log N}, \quad t_i \in s(f_1) \wedge t_j \in s(f_2). \tag{8}$$

From the above steps, the tag relatedness can be computed, which is denoted as a triple $\langle t_i, t_j, \mathrm{sr}(t_i, t_j)\rangle$. In Section 5.2, we give the detailed analysis for choosing the best measure from (2)–(5).

Overall, each tag is first issued as a query to obtain its page counts. Then cooccurrence based measures are used to compute the semantic relatedness between tags. The reasons for using page counts based measures are as follows.

(1) Appropriate computation complexity. Since the relatedness between each tag pair of two images should be computed, the proposed method must have low complexity. Web search engines such as Google provide an API for users to retrieve the page counts of each query. The web search engine thus gives an appropriate interface for the proposed computation model.

(2) Explicit semantics. A tag given by users may not be a correct concept in a taxonomy. For example, users may give the tag “Bling Bling” to an image about a lovely girl. The word “Bling” cannot be found in many taxonomies such as WordNet. The proposed method uses the web search engine as an open intermediary. The explicit semantics of newly emerging concepts can be obtained from the web easily.

4.2. Semantic Relatedness Integration. In Section 4.1, we compute the tag pair relatedness of two images. The tag pair relatedness of two images $f_1$ and $f_2$ can be treated as a bipartite graph, which is denoted by

$$G = (V, E), \quad V = \{f_1, f_2\}, \quad E = \{\langle t_i, t_j, \mathrm{sr}(t_i, t_j)\rangle \mid t_i \in s(f_1) \wedge t_j \in s(f_2)\}. \tag{9}$$

Based on (9), we change the semantic relatedness integration of all tag pairs into the problem of assignment in a bipartite graph. We want to find the best matching of the bipartite graph $G$.

A matching is defined as $M \subseteq E$ so that no two edges in $M$ share a common end vertex. An assignment in a bipartite graph is a matching $M$ so that each node of the graph has an incident edge in $M$. Suppose that the set of vertices is partitioned into two sets $f_1$ and $f_2$ and that the edges of the graph have an associated weight given by a function $f: (f_1, f_2) \rightarrow [0 \cdots 1]$. The function $\mathrm{maxRel}: (f, f_1, f_2) \rightarrow [0 \cdots 1]$ returns the maximum weighted assignment, that is, an assignment so that the average of the weights of the edges is highest. Figure 4 shows a graphical representation of the semantic relatedness integration, where the bold lines constitute the matching $M$.

Based on the expression of the assignment in bipartite graphs, we have

$$\mathrm{maxRel}(f, f_1, f_2) = \begin{cases} \dfrac{\max \sum_{j \in J,\, i \in I} \mathrm{sr}(t_i, t_j)}{|s(f_1)|}, & |s(f_1)| \le |s(f_2)|, \\[2ex] \dfrac{\max \sum_{j \in J,\, i \in I} \mathrm{sr}(t_i, t_j)}{|s(f_2)|}, & |s(f_1)| > |s(f_2)|, \end{cases} \qquad I = [1 \cdots |s(f_1)|], \; J = [1 \cdots |s(f_2)|]. \tag{10}$$

Applying the assignment problem in bipartite graphs to our context, the variables $f_1$ and $f_2$ represent the two images whose semantic relatedness is computed. For example, given that $f_1$ and


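Figure 4: Graphical representation of the assignment in bipartite graphs problem. The tags $t_1$, $t_2$, $t_3$ of $f_1$ are connected to the tags $q_1$, $q_2$, $q_3$, $q_4$ of $f_2$ by weighted edges; for the bold matching, $\mathrm{maxRel}(f_1, f_2) = (\mathrm{sr}(t_1, q_1) + \mathrm{sr}(t_2, q_2) + \mathrm{sr}(t_3, q_4))/3 = (1.0 + 0.8 + 0.7)/3 = 0.83$.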

$f_2$ are composed of the tags $s(f_1)$ and $s(f_2)$, $|s(f_1)| > |s(f_2)|$ means that the number of tags in $s(f_2)$ is lower than that in $s(f_1)$. According to Heuristic 3, we divide the result of the maximization by the lower cardinality of $s(f_1)$ and $s(f_2)$. In this way, the influence of the number of tags is reduced and the semantic relatedness of two images is symmetric.

Besides the cardinality of the two tag sets $s(f_1)$ and $s(f_2)$, the maxRel function is affected by the relatedness between each pair of tags. According to Heuristics 4 and 5, redundancy and noise should be avoided. In the maxRel function, a one-to-one map is applied to the tags in $s(f_1)$ and $s(f_2)$. Thus, the proposed maxRel function varies with respect to the nature of the two images.

Adopting the proposed maxRel function, we are sure to find the global maximum relatedness that can be obtained by pairing the elements of the two tag sets. Alternative methods are able to find only a local maximum, since they scroll through the elements of the first set and, after calculating the relatedness with all the elements of the second set, select the one with the maximum relatedness. Since every element in one set must be connected to at most one element in the other set, the result of such a procedure depends on the order in which the comparisons occur. For example, in Figure 4, $t_1$ will be paired to $q_1$ (weight = 1.0). But when analyzing $t_3$, the maximum weight is with $q_2$ (weight = 0.9). This means that $t_2$ can no longer be paired to $q_2$, even if that weight is its maximum, since $q_2$ is already matched to $t_3$. As a consequence, $t_2$ will be paired to $q_3$ and the average of the selected weights will be (1.0 + 0.3 + 0.9)/3 = 0.73, which is considerably lower than with MaxRel, where the average of the weights was (1.0 + 0.8 + 0.7)/3 = 0.83.
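The contrast between the global matching and the order-dependent procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the exhaustive search stands in for whatever assignment solver the authors used (the paper does not name one), the function names are hypothetical, and only the five weights quoted in the text (1.0, 0.8, 0.7, 0.9, and 0.3) are taken from Figure 4, while the remaining matrix entries are placeholders.

```python
from itertools import permutations


def max_rel(weights):
    """Average weight of the best one-to-one matching (exhaustive search)."""
    rows, cols = len(weights), len(weights[0])
    if rows > cols:  # always match from the smaller tag set, as in (10)
        weights = [list(col) for col in zip(*weights)]
        rows, cols = cols, rows
    best = max(
        sum(weights[i][j] for i, j in enumerate(assignment))
        for assignment in permutations(range(cols), rows)
    )
    return best / rows


def greedy_rel(weights, order):
    """Order-dependent baseline: each row takes its best still-free column."""
    free = set(range(len(weights[0])))
    total = 0.0
    for i in order:
        j = max(free, key=lambda col: weights[i][col])
        free.remove(j)
        total += weights[i][j]
    return total / len(order)


# Rows are t1..t3 and columns are q1..q4. Only the weights quoted in the text
# (1.0, 0.8, 0.7, 0.9, 0.3) come from the paper; the others are placeholders.
W = [
    [1.0, 0.2, 0.1, 0.1],
    [0.5, 0.8, 0.3, 0.1],
    [0.1, 0.9, 0.1, 0.7],
]

print(round(max_rel(W), 2))                # 0.83
print(round(greedy_rel(W, [0, 2, 1]), 2))  # 0.73
```

Exhaustive search is only practical for the handful of tags a Flickr image usually carries. For larger sets, a Hungarian-style solver such as SciPy's `scipy.optimize.linear_sum_assignment` called with `maximize=True` would be a natural substitute.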

Overall, the cardinality of the two tag sets is used to follow Heuristic 3. The one-to-one map of tag pairs is used to follow Heuristics 4 and 5. The MaxRel function is used to find the best matching for integrating the semantic relatedness of two images.

4.3. Tag Order Revision. According to Heuristic 2, the order of tags should be considered when computing the semantic relatedness between two images. Intuitively, the tags appearing in the first positions may be more important than the later tags. Some research [39] suggests that people tend to select popular items as their tags. Meanwhile, the top popular tags are indeed the “meaningful” ones.

In this section, the MaxRel function proposed in Section 4.2 is revised to consider the order of tags. For example, the relatedness of tag pairs in high positions should be enhanced, which is summarized as a constraint schema.

Schema 1 (tag relatedness declining). This schema means that the relatedness of a tag pair of two images $f_1$ and $f_2$ should decline with the positions of its tags in the MaxRel function. In other words, the semantic relatedness of tag pairs in high positions is emphasized.

We add a decline factor to the MaxRel function, and the detailed steps are as follows.

(1) According to the MaxRel function in Section 4.2, the best matching tag pairs are selected, which is denoted by

$$\mathrm{maxRel}(f_1, f_2) = \sum \mathrm{sr}(t_i, t_j), \quad t_i \in s(f_1) \wedge t_j \in s(f_2). \tag{11}$$

The selected tag pairs are the best matching of the bipartite graph between images $f_1$ and $f_2$.

(2) Compute the position information of each tag, which is denoted by $\mathrm{Pos}(t_i)$:

$$\mathrm{Pos}(t_i) = \frac{|s(f)| + 1 - i}{|s(f)|}, \quad t_i \in s(f). \tag{12}$$

(3) Add the position information of each tag to (11), which can be seen as a decline factor:

$$\mathrm{sr}(f_1, f_2) = \sum \mathrm{Pos}(t_i) \cdot \mathrm{sr}(t_i, t_j) \cdot \mathrm{Pos}(t_j), \quad t_i \in s(f_1) \wedge t_j \in s(f_2). \tag{13}$$

(4) Similar to the MaxRel function, the result of (13) should be normalized, here by the sum of the position factors:

$$\mathrm{sr}(f_1, f_2) = \frac{\sum \mathrm{Pos}(t_i) \cdot \mathrm{sr}(t_i, t_j) \cdot \mathrm{Pos}(t_j)}{\sum \mathrm{Pos}(t_i) \cdot \mathrm{Pos}(t_j)}. \tag{14}$$

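We again consider the example in Figure 4. According to (14), the semantic relatedness is revised as

$$\left(1 \cdot 1.0 \cdot 1 + \frac{2}{3} \cdot 0.8 \cdot \frac{3}{4} + \frac{1}{3} \cdot 0.7 \cdot \frac{1}{4}\right) \times \left(1 \cdot 1 + \frac{2}{3} \cdot \frac{3}{4} + \frac{1}{3} \cdot \frac{1}{4}\right)^{-1} = 0.92. \tag{15}$$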
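The arithmetic in (15) can be checked with a short Python sketch of (12) and (14). The function names and the triple layout are assumptions made for illustration; the matched pairs, their weights, and the tag counts (3 and 4) are those of Figure 4.

```python
def pos(i, n):
    """Equation (12): position weight of the i-th tag (1-based) among n tags."""
    return (n + 1 - i) / n


def revised_sr(matching, n1, n2):
    """Equation (14): position-weighted average over the matched tag pairs.

    matching holds (i, j, sr) triples with 1-based tag positions i and j.
    """
    numerator = sum(pos(i, n1) * sr * pos(j, n2) for i, j, sr in matching)
    denominator = sum(pos(i, n1) * pos(j, n2) for i, j, _ in matching)
    return numerator / denominator


# Matching from Figure 4: (t1, q1), (t2, q2), (t3, q4) with 3 and 4 tags.
figure4_matching = [(1, 1, 1.0), (2, 2, 0.8), (3, 4, 0.7)]
print(round(revised_sr(figure4_matching, 3, 4), 2))  # 0.92
```

The revised value (0.92) exceeds the unweighted average (0.83) because the strongest pair, $(t_1, q_1)$, also sits in the first position of both tag lists and therefore dominates the weighted average.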

Besides adding the decline factor to the MaxRel function, we also add a constraint schema: identical tag pruning.


Input: The tag sets of two images $f_1$ and $f_2$, which are $s(f_1)$ and $s(f_2)$.
Output: The semantic relatedness of two images $f_1$ and $f_2$.
for each $t_i \in s(f_1)$    /* page counts and position initial */
    $N(s(f_1)) \leftarrow N(t_i)$
    $\mathrm{Pos}(s(f_1)) \leftarrow \mathrm{Pos}(t_i)$
for each $t_j \in s(f_2)$
    $N(s(f_2)) \leftarrow N(t_j)$
    $\mathrm{Pos}(s(f_2)) \leftarrow \mathrm{Pos}(t_j)$
for each $t_i \in s(f_1)$
    for each $t_j \in s(f_2)$
        if ($t_i == t_j$) $\mathrm{sr}(t_i, t_j) = 0$    /* pruning */
        else $\mathrm{sr}(t_i, t_j) = f(N(t_i), N(t_j))$    /* relatedness */
return $\mathrm{maxRel}(f_1, f_2) = f(\mathrm{Pos}(t_i), \mathrm{Pos}(t_j), \mathrm{sr}(t_i, t_j))$

Algorithm 1: MaxRel.

Schema 2 (identical tag pruning). This schema means that the identical tag pairs of two images $f_1$ and $f_2$ should be pruned in the MaxRel function. In other words, the semantic relatedness of the same tag of two images is set to 0.

The above schema ensures that the proposed method measures the relatedness of two images. If we do not prune the identical tag pairs of two images, the proposed method degenerates into a similarity measure. For example, the cosine similarity [36] between two tag sets essentially counts the identical elements of two vectors. The overall algorithm of the proposed computation model is presented in Algorithm 1.
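To make the contrast with a similarity measure concrete, the sketch below computes the cosine similarity of two tag sets viewed as binary vectors. Treating tags as binary vector components is our reading of the cosine baseline, not something the paper spells out, and the tokenization of the example tags is likewise an assumption.

```python
import math


def cosine_tag_similarity(tags_a, tags_b):
    """Cosine similarity of two tag sets viewed as binary vectors."""
    a, b = set(tags_a), set(tags_b)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


# Tags of the two example images (tokenization assumed for illustration).
big_ben = ["Big Ben", "night", "clock", "tower", "England"]
london_eye = ["London", "eye"]
print(cosine_tag_similarity(big_ben, london_eye))  # 0.0, no shared tag
print(cosine_tag_similarity(["Gucci", "Trainers"], ["Gucci", "Cruise"]))  # 0.5
```

Because the score depends only on shared tags, two images with disjoint tag sets always score 0, however related their tags are. This is the case that the page-count based relatedness is meant to cover.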

5. Experimental Results

In this section, we evaluate the results of using the proposed method for relatedness measurement. In Section 5.1, we introduce the data set for the evaluation. In Section 5.2, we determine which cooccurrence function to use for tag relatedness measures. In Sections 5.3 and 5.4, clustering and retrieval are used to evaluate the proposed method.

5.1. The Data Sets. We choose Flickr groups as the resources for building data sets. Users on online photo sharing sites like Flickr have organized many millions of photos into hundreds of thousands of semantically themed groups. These groups expose implicit choices that users make about which images are similar. Flickr group membership is usually less noisy than Flickr tags because images are screened by group members. We download 1000 images from ten groups. These ten groups can be divided into two classes. The first class includes five groups, which are car, phone, flower, dog, and boat. The second class consists of another five groups, which are Louis Vuitton, Dior, Gucci, Cartier, and Chanel. These images are selected by humans, which reduces the noise of the data set. The reason why we choose two classes of groups is that we want to test the accuracy of the proposed method against the semantic relatedness within a data set. The semantic relatedness within the second class is higher than within the first class since the second class is all about luxury brands. For example, almost all these brands produce handbags. Thus, if the proposed method can do well in these groups, we may say

Table 2: The detailed information of the data set.

Group 1 | Average tags per image | Group 2 | Average tags per image
Car | 4.4 | Louis Vuitton | 3.1
Phone | 3.5 | Dior | 3.2
Flower | 2.2 | Gucci | 2.9
Dog | 5.6 | Cartier | 2.8
Boat | 3.1 | Chanel | 2.6

that it can measure the semantic relatedness between Flickr images accurately and robustly. Table 2 gives the detailed information of the data set. Table 3 gives some selected tags from group 2.

5.2. Relatedness Function Selection. In Section 4.1, four cooccurrence measures (i.e., Jaccard, Overlap, Dice, and PMI) are given for relatedness measures between tags. In [40], Rubenstein and Goodenough proposed a data set containing 28 word pairs rated by a group of 51 human subjects, which is a reliable benchmark for evaluating semantic similarity measures. The higher the correlation coefficient against the R-G ratings is, the more accurate the method for measuring semantic similarity between words is. Figure 5 gives the correlation coefficients of the four functions against the R-G test set. From Figure 5, we can say that PMI performs best on relatedness measures because it has the highest correlation coefficient. Thus, in the later experiments, we select PMI as the relatedness measure between tags.
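The selection criterion can be illustrated with a short Python sketch. The paper does not state which correlation coefficient was used, so the Pearson coefficient below is an assumption, and the ratings and scores are made-up values rather than R-G data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: human ratings and one measure's scores for four word pairs.
human = [3.9, 3.1, 1.2, 0.4]
scores = [0.82, 0.60, 0.35, 0.10]
print(round(pearson(human, scores), 3))
```

Running the same computation once per measure and keeping the measure with the highest coefficient mirrors the comparison summarized in Figure 5.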

5.3. Evaluation on Image Clustering. In this section, we evaluate the correctness of using tag order. In Section 4.3, we add the position information of each tag to the semantic relatedness measures. The tags in high positions are treated as the major elements for semantic relatedness measures. We evaluate the use of tag order by the clustering task. We employ the proposed semantic relatedness of images in the $K$-means [41] clustering model. Since the $K$-means model depends on the initial points, we randomly select core points 100 times. We evaluate the effectiveness of image clustering with three quality measures: $F$-measure, Purity, and Entropy [41]. We treat each cluster as if it were the result of the proposed


Table 3: The selected tags of group 2 from Flickr.

Group 2 | Tags | Tags | Tags | Tags | Tags
Louis Vuitton | “Louis Vuitton”, “Keepall” | “Louis Vuitton”, “Alma” | “Louis Vuitton”, “Tivoli” | “Louis Vuitton”, “Bolsas” | “LV”, “Multicolore”
Dior | “DIOR”, “lipstick”, “makeup” | “Dior”, “Diorskin Nude”, “Tan Sun Powder” | “Dior”, “Makeup”, “Palette” | “Dior”, “Addict 2” | “Dior”, “Jadore”, “Perfume”
Gucci | “Gucci”, “Leather Belts” | “Gucci”, “Trainers” | “Gucci”, “Jolie Leopard”, “Orange” | “Replica”, “Gucci”, “Handbags” | “Gucci”, “Cruise”
Cartier | “Cartier”, “Pasha”, “Chronograph” | “CARTIER”, “Love Bracelet” | “Cartier”, “Santos Galbee” | “Calibre”, “Cartier” | “Cartier Watch”, “Tank Francaise”
Chanel | “Chanel”, “Coco Noir” | “Chanel”, “Chanel Riva” | “Chanel nail polish”, “Coco Mademoiselle” | “Chanel”, “No 5” | “Chance”, “Chanel”

method and each class as if it were the desired set of images. Generally, we would like to maximize the $F$-measure and Purity and minimize the Entropy of the clusters to achieve high-quality image clustering. Moreover, we compare the clustering results of the proposed method with and without tag order. Figures 6 and 7 give the clustering results of the group 1 and group 2 data sets. From Figures 6 and 7, we can conclude the following.

(1) The proposed method performs better than cosine based clustering. This result can be obtained from Figures 6 and 7. The three metrics, $F$-measure, Purity, and Entropy, of the proposed method are better than those of cosine based clustering. This may be caused by the inherent feature of the proposed method: it is based on semantic relatedness rather than on the tag cooccurrence used by cosine based clustering. If the tags of two images do not overlap, cosine based clustering cannot relate them.

(2) The schema on using tag order is effective. This result can also be obtained from Figures 6 and 7. The three metrics, $F$-measure, Purity, and Entropy, are the best when tag order is used. The position information reflects the importance of each tag. The proposed method emphasizes the tags in high positions, which raises the performance on image clustering.

(3) The proposed method is robust on different data sets. The proposed method performs well on both the group 1 and group 2 data sets. It is worth noting that the difference between the proposed method and the cosine method is larger on group 2 than on group 1. The reason is that the semantic correlation within group 2 is stronger than within group 1. In other words, the performance of the proposed method relies on the semantic correlation of classes in data sets. The stronger the semantic correlation between classes of data, the better the proposed method performs.
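For readers unfamiliar with the cluster quality measures used above, the following Python sketch computes Purity and Entropy from the true class labels found in each cluster. These are common textbook definitions and may differ from the exact variants of [41] (for example, in normalization); the clustering shown is hypothetical and every cluster is assumed to be non-empty.

```python
import math
from collections import Counter


def purity(clusters):
    """Fraction of images that carry the majority class label of their cluster."""
    total = sum(len(c) for c in clusters)
    return sum(max(Counter(c).values()) for c in clusters) / total


def entropy(clusters):
    """Size-weighted average of the class-label entropy (in bits) per cluster."""
    total = sum(len(c) for c in clusters)
    result = 0.0
    for c in clusters:
        probs = [count / len(c) for count in Counter(c).values()]
        cluster_entropy = -sum(p * math.log2(p) for p in probs)
        result += len(c) / total * cluster_entropy
    return result


# Hypothetical clustering: each inner list holds the true class labels of one cluster.
clusters = [
    ["car", "car", "car", "boat"],
    ["dog", "dog"],
    ["boat", "boat", "car", "dog"],
]
print(purity(clusters))
print(entropy(clusters))
```

A perfect clustering gives a Purity of 1 and an Entropy of 0, which is why the former is maximized and the latter minimized.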

5.4. Evaluation on Image Searching. In this section, we evaluate the proposed method on a query-based image searching task. Five queries from group 2 are selected as the test set, including “Louis Vuitton,” “Gucci,” “Chanel,” “Cartier,” and “Dior.” These queries are searched in Flickr. The top 50 images are obtained as the data set. Moreover, we remove the query from the tags of each image. For example, the tag “Cartier” is removed from the top 50 images of the query “Cartier.” The reason for that operation is that the proposed method is based on semantic relatedness rather than cooccurrence. We choose cut-off point precision to evaluate the proposed method on image searching. The cut-off point precision ($Pn$) means the percentage of correct results among the top $n$ returned results. We compute the $P1$, $P5$, and $P10$ of the group 2 test set. Table 4 lists the comparison of the cut-off point precision between the proposed method and Flickr. From the experimental results, we can conclude the following.

(1) The proposed method performs better than Flickr. In Table 4, the $P1$, $P5$, and $P10$ of the proposed method are at least as high as those of Flickr. The experimental results support the correctness of the proposed method on the image searching task.

(2) The proposed method can handle the relatedness searching problem. The proposed method can measure the semantic relatedness of two images robustly and correctly.

(3) The proposed method can support the faceted exploration of image search. Faceted exploration of search results is widely used in search interfaces for structured databases. Recently, faceted exploration has also been appearing in online search engines in the form of search assistants. The proposed method can measure the semantic relatedness of two images. Given the search queries, we can select the related images for faceted search.
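The cut-off point precision used in this evaluation is simple to compute. The sketch below assumes that each returned image has been judged as correct or incorrect; the judgments shown are hypothetical and are not the data behind Table 4.

```python
def precision_at(relevant_flags, n):
    """Share of correct results among the top n, as a percentage."""
    top = relevant_flags[:n]
    return 100.0 * sum(top) / n


# Hypothetical relevance judgments for a ranked result list (True = correct).
ranked = [True, True, False, True, False, True, True, False, True, True]
print(precision_at(ranked, 1))   # 100.0
print(precision_at(ranked, 5))   # 60.0
print(precision_at(ranked, 10))  # 70.0
```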


Table 4: The comparison of the cut-off point precision (%) between the proposed method and Flickr.

Cut-off point | Louis Vuitton | Gucci | Dior | Chanel | Cartier
$P1$ | 100 | 100 | 100 | 100 | 100
$P1$ (Flickr) | 100 | 100 | 0 | 100 | 100
$P5$ | 100 | 100 | 100 | 100 | 100
$P5$ (Flickr) | 80 | 60 | 60 | 60 | 80
$P10$ | 100 | 100 | 100 | 100 | 100
$P10$ (Flickr) | 90 | 70 | 70 | 80 | 80

6. Conclusions

This paper discusses semantic relatedness measures systematically, puts forward a method to measure the semantic relatedness of two images based on their tags, and justifies its validity through experiments. The major contributions are summarized as follows.

(1) We propose a framework to measure semantic relatedness between Flickr images using tags. Firstly, cooccurrence measures are used to compute the relatedness of tags between two images. Secondly, we transform the integration of tag relatedness into an assignment problem in a bipartite graph, which finds an appropriate matching for the semantic relatedness of images. Finally, a decline factor considering the position information of tags is used in the proposed framework, which reduces the noise and redundancy in the social tags.

(2) A real data set including 1000 images from Flickr with ten classes is used in our experiments. Two evaluation tasks, clustering and searching, are performed, which show that the proposed method can measure the semantic relatedness between Flickr images accurately and robustly.

(3) We extend the relatedness measures between concepts to the level of images. Since association is a basic mechanism of the brain, the proposed relatedness measurement can facilitate related applications such as searching and recommendation.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported in part by the National Science and Technology Major Project under Grant no. 2013ZX01033002-003, in part by the National High Technology Research and Development Program of China (863 Program) under Grant nos. 2013AA014601 and 2013AA014603, in part by the National Key Technology Support Program under Grant no. 2012BAH07B01, in part by the National Science Foundation of China under Grant no. 61300202, and in part by the Science Foundation of Shanghai under Grant no. 13ZR1452900.

Figure 5: The correlation of four selected functions (bar chart; x-axis: Jaccard, Dice, Overlap, PMI; y-axis: correlation, 0 to 0.7; extracted bar labels: 0.346, 0.395, 0.421, 0.579).

Figure 6: The clustering results of group 1 data sets (bar chart; x-axis: F-measure, purity, entropy; series: using tag order, not using, cosine; y-axis: 0 to 1; extracted bar labels: 0.912, 0.967, 0.011, 0.857, 0.922, 0.018, 0.732, 0.751, 0.056).

Figure 7: The clustering results of group 2 data sets (bar chart; x-axis: F-measure, purity, entropy; series: using tag order, not using, cosine; y-axis: 0 to 1; extracted bar labels: 0.876, 0.927, 0.023, 0.827, 0.852, 0.031, 0.632, 0.655, 0.085).

References

[1] J. Goldberger, S. Gordon, and H. Greenspan, "Unsupervised image-set clustering using an information theoretic framework," IEEE Transactions on Image Processing, vol. 15, no. 2, pp. 449–458, 2006.

[2] T. Evgeniou, M. Pontil, C. Papageorgiou, and T. Poggio, "Image representations and feature selection for multimedia database search," IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 4, pp. 911–920, 2003.

[3] R. Ji, H. Yao, X. Sun, B. Zhong, and W. Gao, "Towards semantic embedding in visual vocabulary," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '10), pp. 918–925, June 2010.

[4] J. Fan, D. A. Keim, Y. Gao, H. Luo, and Z. Li, "JustClick: personalized image recommendation via exploratory search from large-scale Flickr images," IEEE Transactions on Circuits and Systems for Video Technology, vol. 19, no. 2, pp. 273–288, 2009.

[5] T. Gong, S. Li, and C. L. Tan, "A semantic similarity language model to improve automatic image annotation," in Proceedings of the 22nd International Conference on Tools with Artificial Intelligence (ICTAI '10), pp. 197–203, October 2010.

[6] C. Schmid and R. Mohr, "Local grayvalue invariants for image retrieval," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 5, pp. 530–535, 1997.

[7] M. Varma and A. Zisserman, "A statistical approach to texture classification from single images," International Journal of Computer Vision, vol. 62, no. 1-2, pp. 61–81, 2005.

[8] S. Belongie, J. Malik, and J. Puzicha, "Shape matching and object recognition using shape contexts," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 4, pp. 509–522, 2002.

[9] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), vol. 1, pp. 886–893, June 2005.

[10] D. Huang, M. Ardabilian, Y. Wang, and L. Chen, "Asymmetric 3D/2D face recognition based on LBP facial representation and canonical correlation analysis," in Proceedings of the 16th IEEE International Conference on Image Processing (ICIP '09), pp. 3325–3328, November 2009.

[11] L. Wang, Y. Zhang, and J. Feng, "On the Euclidean distance of images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 8, pp. 1334–1339, 2005.

[12] W. Jia, H. Zhang, X. He, and Q. Wu, "Gaussian weighted histogram intersection for license plate classification," in Proceedings of the 18th International Conference on Pattern Recognition (ICPR '06), pp. 574–577, August 2006.

[13] Y. Rubner, C. Tomasi, and L. J. Guibas, "A metric for distributions with applications to image databases," in Proceedings of the IEEE 6th International Conference on Computer Vision, pp. 59–66, January 1998.

[14] L. Wu, X.-S. Hua, N. Yu, W.-Y. Ma, and S. Li, "Flickr distance: a relationship measure for visual concepts," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 5, pp. 863–875, 2012.

[15] D. Cai, "An information-theoretic foundation for the measurement of discrimination information," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 9, pp. 1262–1273, 2010.

[16] P. van den Broek, "Using texts in science education: cognitive processes and knowledge representation," Science, vol. 328, no. 5977, pp. 453–456, 2010.

[17] C. Bizer, T. Heath, and T. Berners-Lee, "Linked data - the story so far," International Journal on Semantic Web and Information Systems, vol. 5, no. 3, pp. 1–22, 2009.

[18] H. Zhuge, "Communities and emerging semantics in semantic link network: discovery and learning," IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 6, pp. 785–799, 2009.

[19] H. Zhuge, "Semantic linking through spaces for cyber-physical-socio intelligence: a methodology," Artificial Intelligence, vol. 175, no. 5-6, pp. 988–1019, 2011.

[20] X. Luo, Z. Xu, J. Yu, and X. Chen, "Building association link network for semantic link on web resources," IEEE Transactions on Automation Science and Engineering, vol. 8, no. 3, pp. 482–494, 2011.

[21] S. A. Golder and B. A. Huberman, "Usage patterns of collaborative tagging systems," Journal of Information Science, vol. 32, no. 2, pp. 198–208, 2006.

[22] H. S. Al-Khalifa and H. C. Davis, "Measuring the semantic value of folksonomies," in Proceedings of the Innovations in Information Technology (IIT '06), pp. 1–5, November 2006.

[23] F. M. Suchanek, M. Vojnovic, and D. Gunawardena, "Social tags: meaning and suggestions," in Proceedings of the 17th ACM Conference on Information and Knowledge Management (CIKM '08), pp. 223–232, October 2008.

[24] H. Halpin, V. Robu, and H. Shepherd, "The complex dynamics of collaborative tagging," in Proceedings of the 16th International World Wide Web Conference (WWW '07), pp. 211–220, May 2007.

[25] C. Cattuto, C. Schmitz, A. Baldassarri, et al., "Network properties of folksonomies," AI Communications, vol. 20, no. 4, pp. 245–262, 2007.

[26] R. Lambiotte and M. Ausloos, "Collaborative tagging as a tripartite network," in Computational Science, vol. 3393 of Lecture Notes in Computer Science, pp. 1114–1117, 2006.

[27] U. Maulik, S. Bandyopadhyay, and I. Saha, "Integrating clustering and supervised learning for categorical data analysis," IEEE Transactions on Systems, Man, and Cybernetics A, vol. 40, no. 4, pp. 664–675, 2010.

[28] D. Ramage, P. Heymann, C. D. Manning, and H. Garcia-Molina, "Clustering the tagged web," in Proceedings of the 2nd ACM International Conference on Web Search and Data Mining (WSDM '09), pp. 54–63, February 2009.

[29] D. Zhou, J. Bian, S. Zheng, H. Zha, and C. L. Giles, "Exploring social annotations for information retrieval," in Proceedings of the 17th International Conference on World Wide Web (WWW '08), pp. 715–724, April 2008.

[30] S. Xu, S. Bao, Y. Cao, and Y. Yu, "Using social annotations to improve language model for information retrieval," in Proceedings of the 16th ACM Conference on Information and Knowledge Management (CIKM '07), pp. 1003–1006, November 2007.

[31] S. Bao, G. Xue, X. Wu, Y. Yu, B. Fei, and Z. Su, "Optimizing web search using social annotations," in Proceedings of the 16th International World Wide Web Conference (WWW '07), pp. 501–510, May 2007.

[32] R. Schenkel, T. Crecelius, M. Kacimi, et al., "Efficient top-k querying over social-tagging networks," in Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (ACM SIGIR '08), pp. 523–530, July 2008.

[33] N. Rasiwasia, P. J. Moreno, and N. Vasconcelos, "Bridging the gap: query by semantic example," IEEE Transactions on Multimedia, vol. 9, no. 5, pp. 923–938, 2007.

[34] R. Fergus, L. Fei-Fei, P. Perona, and A. Zisserman, "Learning object categories from Google's image search," in Proceedings of the 10th IEEE International Conference on Computer Vision (ICCV '05), vol. 2, pp. 1816–1823, October 2005.

[35] G. Wang, D. Hoiem, and D. Forsyth, "Learning image similarity from Flickr groups using fast kernel machines," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 11, pp. 2177–2188, 2012.

[36] G. Salton, A. Wong, and C. S. Yang, "A vector space model for automatic indexing," Communications of the ACM, vol. 18, no. 11, pp. 613–620, 1975.

[37] Z. Xu, X. Luo, J. Yu, and W. Xu, "Measuring semantic similarity between words by removing noise and redundancy in web snippets," Concurrency Computation Practice and Experience, vol. 23, no. 18, pp. 2496–2510, 2011.

[38] J. R. Firth, "A synopsis of linguistic theory 1930–1955," in Studies in Linguistic Analysis, Philological Society, Oxford, UK, 1957.

[39] M. Vojnovic, J. Cruise, D. Gunawardena, and P. Marbach, "Ranking and suggesting popular items," IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 8, pp. 1133–1146, 2009.

[40] H. Rubenstein and J. B. Goodenough, "Contextual correlates of synonymy," Communications of the ACM, vol. 8, no. 10, pp. 627–633, 1965.

[41] M. Steinbach, G. Karypis, and V. Kumar, "A comparison of document clustering techniques," in Proceedings of the KDD Workshop on Text Mining, 2000.


Figure 2: The illustration of a pair of images from Flickr. The images are titled "Big Ben at night London through my lens" and "London eye."

Section 4 proposes the method for measuring semantic relatedness of images. Experiments are presented in Section 5. Conclusions are made in the last section.

2. Related Work

In this section, we cover two aspects related to the proposed work. Research on social tags is introduced first. Then we review related work on image similarity measures.

2.1. On Social Tags. In the area of usage patterns and semantic values of social tags, Golder and Huberman [21] mined usage patterns of social tags based on the delicious (deliciouspost) data set. Al-Khalifa and Davis [22] concluded that social tags were semantically richer than automatically extracted keywords. Suchanek et al. [23] used YAGO (http://www.mpi-inf.mpg.de/yago-naga/yago) and WordNet (http://wordnet.princeton.edu) to check the meaning of social tags and concluded that top tags were usually meaningful. Halpin et al. [24] examined why and how the power law distribution of tag usage frequency was formed in a mature social tagging system over time.

Besides research on mining social tags, some studies modeled the network structure of social tags. Cattuto et al. [25] investigated the network features of a social tag system, which is seen as a tripartite graph, using metrics adapted from classical network measures. Lambiotte and Ausloos [26] described social tag systems as a tripartite network with users, tags, and annotated items. The proposed tripartite network was projected into bipartite and unipartite networks to discover its structures. In [27], the social tag system was modeled as a tripartite graph which extends the traditional bipartite model of ontologies with a social dimension.

Recently, many researchers have investigated the applications of social tags in information retrieval and ranking. In [28], the authors empirically studied the potential value of social annotations for web search. Zhou et al. [29] proposed a model using latent Dirichlet allocation, which incorporates the topical background of documents and social tags. Xu et al. [30] developed a language model for information retrieval based on the metadata property of social tags and their relationships to annotated documents. Bao et al. [31] introduced two ranking methods: SocialSimRank, which ranked pages based on the semantic similarity between tags and pages, and SocialPageRank, which ranked returned pages based on their popularity. Schenkel et al. [32] developed a top-$k$ algorithm which ranked search results based on the tags shared by the user who issued the query and the users who annotated the returned documents with the query tags.

2.2. On Measuring Image Similarity. Measuring semantic similarity is a basic issue in the computer vision field. Usually, low-level visual features are used for similarity measures. For example, shape features, texture features, and gradient features can be extracted from images. Based on the extracted low-level features, distance metrics such as the Euclidean distance, the Chi-Square distance, the histogram intersection, and the EMD distance are used. In this paper, the proposed method addresses the problem with semantic-level features such as social tags.

Different from the methods using low-level features, a number of recent papers build image representations based on the outputs of concept classifiers [33]. Our observation is that Flickr provides related social tags from web users, which reflect how people on the internet tend to annotate images. Several previous methods [34] learn object models from internet images. These methods tend to gather training examples using image search results. Besides, their approaches have to alternate between finding good examples and updating object models in order to be robust against noisy images. On the other hand, some papers [35] use images from Flickr groups rather than search engines, which are claimed to be clean enough to produce good classifiers.

3. Problem Definition

In this paper, we study the problem of measuring semantic relatedness between images or videos with manually provided social tags. Here, a social tag refers to a concept provided by users which is semantically related to the content of an image or a video. The input of the proposed method is a pair of images or videos with social tags. The goal of the proposed method is to identify the semantic relatedness between the two images or videos. Figure 2 shows a pair of images from Flickr with social tags. These two images are about "Big Ben" and "London eye." They may be dissimilar according to traditional similarity measurement since they share little low-level visual similarity. But the two images are semantically related since both show famous sights of London. With the proposed method, we can compute their semantic relatedness even though they may share few similar visual features.

3.1. Basic Definitions. We first introduce three important definitions in this paper: the social tags set of an image, the semantic relatedness between tags, and the semantic relatedness between images.

Definition 1 (social tags set of an image). The social tags (denoted by $t$) set of an image $f$ (denoted by $s(f)$) is the set of tags provided by users for the image:

$$s(f) = \{t_1, t_2, \ldots, t_{|s(f)|}\}. \tag{1}$$

For example, in Figure 2, the tags of the right image are "London" and "eye" rather than "London eye." Since Flickr provides the tags of each image, we just download the tags from Flickr. We do not perform any NLP operations on the tags.

Definition 2 (semantic relatedness between tags). The semantic relatedness between tags (denoted by $\mathrm{sr}(t_1, t_2)$) is the expected correlation of a pair of tags $t_1$ and $t_2$.

Definition 3 (semantic relatedness between images). The semantic relatedness between images (denoted by $\mathrm{sr}(f_1, f_2)$) is the expected correlation of a pair of images $f_1$ and $f_2$.

The range of $\mathrm{sr}(t_1, t_2)$ and $\mathrm{sr}(f_1, f_2)$ is from 0 to 1. A high value indicates that the tags or images are more likely to be semantically related. Please note that the definition of $\mathrm{sr}(f_1, f_2)$ can also be extended to videos with social tags.

3.2. Basic Heuristics. Based on common sense and our observations on real data, we have five heuristics that serve as the base of our computation model.

Heuristic 1. Usually, each tag of an image appears only once.

Different from writing sentences, users usually annotate an image with distinct tags. For example, the probability of using the tags "apple, apple, apple" for an image is very low. Therefore, in this paper, we do not employ any weighting scheme for tags such as tf-idf [36].

Heuristic 2. The order of the tags may reflect their correlation with the annotated image.

Different tags reflect different aspects of an image. According to Heuristic 1, the weight of a tag against the image cannot be obtained. Fortunately, the order of the tags can be obtained since users may provide tags one by one.

Heuristic 3. The number of tags of an image may not be relevant to the annotation correctness.

Different users may give different tags for the same image. For example, users may give tags such as "apple, iPhone" or "iPhone4, mobile, phone" for the same image of an iPhone. It is hard to say which annotation is better, though the latter annotation has three tags.

Heuristic 4. Usually, some tags may be redundant for annotating an image.

Users may give similar tags for an image. For example, the tags "apple, iPhone" may be redundant since iPhone is semantically very similar to apple.

Heuristic 5. Usually, some tags may be noisy for annotating an image.

Users may give inappropriate or even false tags for an image. For example, the tag "iPhone" is false for an image of an iPod.

4. Computation Model

In this section, we propose the computation model for measuring semantic relatedness between images. Based on the above five heuristics, the social tags provided by users are used in our computation model. Overall, the proposed computation model is divided into three steps.

(1) Tag relatedness computation. In this step, based on Heuristic 1, the relatedness of all tag pairs between two images is computed.

(2) Semantic relatedness integration. In this step, based on Heuristics 3–5, we measure the semantic relatedness between images.

(3) Tag order revision. In this step, based on Heuristic 2, the image relatedness from step 2 is revised.

Table 1 shows the variables and parameters used in the following discussion. Figure 3 illustrates an overview of the proposed computation model.

Table 1: The variables and parameters used in the proposed computation model.

Name                   Description
$f$                    An image
$t$                    A tag
$s(f)$                 Tags set of an image
$\mathrm{sr}(t_1, t_2)$    Semantic relatedness of two tags
$\mathrm{sr}(f_1, f_2)$    Semantic relatedness of two images
$N(t)$                 Page counts of a tag
$N(s(f))$              Set of page counts of an image
$\mathrm{pos}(t)$      Position information of a tag

Figure 3: The illustration of the proposed method. Panels: (a) input; (b) tag extraction; (c) page counts repository; (d) tag relatedness computation; (e) assignment in bipartite graph; (f) applications (searching, recommendation, clustering). In the illustrated example, $f_1$ carries the tags Big Ben, night, clock, tower, and England, and $f_2$ carries London eye.

4.1. Tag Relatedness Computation. According to Definition 1, an image can be represented as a set of tags provided by users. To obtain the semantic relatedness of a pair of images, we can measure the semantic relatedness between the tags of these images. For example, given two images with the tags "apple, iPhone" and "iPod, Nano," we can measure the semantic relatedness between these tags. Since each tag usually appears only once according to Heuristic 1, the semantic relatedness between tags can be computed without considering their weight.

Many different methods of measuring semantic relatedness between concepts have been proposed, which can be divided into two categories [37]: taxonomy-based methods and web-based methods. Taxonomy-based methods use information theory and a hierarchical taxonomy such as WordNet to measure semantic relatedness. In contrast, web-based methods use the web as a live and active corpus instead of a hierarchical taxonomy.

In the proposed computation model, each tag can be seen as a concept with explicit meaning. Thus, we use equations based on the cooccurrence of two concepts to measure their semantic relatedness. The core idea is that "you shall know a word by the company it keeps" [38]. In this section, four popular cooccurrence measures (i.e., Jaccard, Overlap, Dice, and PMI) are used to measure semantic relatedness between tags.

Besides cooccurrence measures, the page counts of each tag from a search engine are used. Page counts mean the number of web pages containing the query $q$. For example, the page count of the query "Obama" in Google (http://www.google.com) is 1,210,000,000 (the data was obtained on 9/28/2012). Moreover, page counts for the query "$q$ AND $p$" can be considered as a measure of cooccurrence of queries $q$ and $p$. For the remainder of this paper, we use the notation $N(p)$ to denote the page counts of the tag $p$ in Google. However, the respective page counts for the tags $p$ and $q$ alone are not enough for measuring semantic relatedness; the page counts for the query "$q$ AND $p$" should also be considered. For example, when we query "Obama" and "United States" in Google, we find 485,000,000 web pages, that is, $N(\text{Obama} \cap \text{United States}) = 485{,}000{,}000$. The four cooccurrence measures (i.e., Jaccard, Overlap, Dice, and PMI) between two tags $p$ and $q$ are as follows:

$$\mathrm{Jaccard}(p, q) = \frac{N(p \cap q)}{N(p) + N(q) - N(p \cap q)}, \tag{2}$$

where $p \cap q$ denotes the conjunction query "$p$ AND $q$";

$$\mathrm{Overlap}(p, q) = \frac{N(p \cap q)}{\min(N(p), N(q))}, \tag{3}$$

where $\min(N(p), N(q))$ is the smaller of $N(p)$ and $N(q)$;

$$\mathrm{Dice}(p, q) = \frac{2 \cdot N(p \cap q)}{N(p) + N(q)}. \tag{4}$$

According to probability and information theory, the mutual information (MI) of two random variables is a quantity that measures the mutual dependence of the two variables. Pointwise mutual information (PMI) is a variant of MI (see (5)):

$$\mathrm{PMI}(p, q) = \frac{\log\left(\dfrac{N \cdot N(p \cap q)}{N(p) \cdot N(q)}\right)}{\log N}, \tag{5}$$

where $N$ is the number of web pages in the search engine, which is set to $N = 10^{11}$ according to the number of indexed pages reported by Google.
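The four measures in (2)–(5) translate directly into code once the page counts are available. The sketch below is illustrative only: the function names are hypothetical, returning 0.0 when a count is zero is an assumption (the text does not say how missing cooccurrence is handled), and the page count used for "United States" is invented for the example, while the other two counts are the ones quoted above.

```python
import math

TOTAL_PAGES = 10 ** 11  # N in Eq. (5), the value stated in the text


def jaccard(n_p, n_q, n_pq):
    """Eq. (2): cooccurrence count over the size of the union."""
    denom = n_p + n_q - n_pq
    return n_pq / denom if denom > 0 else 0.0


def overlap(n_p, n_q, n_pq):
    """Eq. (3): cooccurrence count over the smaller page count."""
    denom = min(n_p, n_q)
    return n_pq / denom if denom > 0 else 0.0


def dice(n_p, n_q, n_pq):
    """Eq. (4): twice the cooccurrence count over the sum of page counts."""
    denom = n_p + n_q
    return 2 * n_pq / denom if denom > 0 else 0.0


def pmi(n_p, n_q, n_pq, total=TOTAL_PAGES):
    """Eq. (5): PMI normalized by log N; zero counts give 0.0 (assumption)."""
    if min(n_p, n_q, n_pq) <= 0:
        return 0.0
    return math.log(total * n_pq / (n_p * n_q)) / math.log(total)


n_obama = 1_210_000_000  # N(Obama), from the text
n_both = 485_000_000     # N(Obama AND United States), from the text
n_us = 5_000_000_000     # N(United States): hypothetical, not given in the text

scores = {
    "jaccard": jaccard(n_obama, n_us, n_both),
    "overlap": overlap(n_obama, n_us, n_both),
    "dice": dice(n_obama, n_us, n_both),
    "pmi": pmi(n_obama, n_us, n_both),
}
```

Note that (5) can become negative when two tags cooccur less often than chance would predict; the sketch does not clamp such values.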

Through (2)–(5), we can compute the tag relatedness as follows.

(1) Extract the tags from two images $f_1$ and $f_2$, which are denoted by

$$s(f_1) = \{t_1, t_2, \ldots, t_{|s(f_1)|}\}, \qquad s(f_2) = \{t_1, t_2, \ldots, t_{|s(f_2)|}\}. \tag{6}$$

(2) Issue the tags from $f_1$ and $f_2$ as queries to the web search engine (in this paper, we choose Google for its convenient API (http://developers.google.com)). The page counts can be denoted by

$$N(s(f_1)) = \{N(t_1), N(t_2), \ldots, N(t_{|s(f_1)|})\}, \qquad N(s(f_2)) = \{N(t_1), N(t_2), \ldots, N(t_{|s(f_2)|})\}. \tag{7}$$

(3) Compute the semantic relatedness between each tag pair from $f_1$ and $f_2$ by (2)–(5). For example, if we use PMI to compute tag semantic relatedness, the equation is

$$\mathrm{sr}(t_i, t_j) = \frac{\log\left(\dfrac{N \cdot N(t_i \cap t_j)}{N(t_i) \cdot N(t_j)}\right)}{\log N}, \qquad t_i \in s(f_1) \text{ and } t_j \in s(f_2). \tag{8}$$

From the above steps, the tag relatedness can be computed, which is denoted as a triple $\langle t_i, t_j, \mathrm{sr}(t_i, t_j) \rangle$. In the next section, we will give the detailed analysis for choosing the best measure from (2)–(5).
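The three steps above can be sketched as a small routine that produces the triples $\langle t_i, t_j, \mathrm{sr}(t_i, t_j) \rangle$ for every tag pair. This is a sketch under stated assumptions, not the authors' implementation: the search engine is replaced by two hypothetical lookup tables (`page_count` and `pair_count`) filled with invented numbers, the helper names are illustrative, and a tag pair without a recorded cooccurrence count is scored 0.0.

```python
import math
from itertools import product

TOTAL_PAGES = 10 ** 11  # N, as set in the text


def pmi(n_p, n_q, n_pq, total=TOTAL_PAGES):
    """Eq. (8); returns 0.0 when any count is zero (an assumption of this sketch)."""
    if min(n_p, n_q, n_pq) <= 0:
        return 0.0
    return math.log(total * n_pq / (n_p * n_q)) / math.log(total)


def tag_pair_triples(tags_1, tags_2, page_count, pair_count):
    """Return (t_i, t_j, sr(t_i, t_j)) for every t_i in s(f_1) and t_j in s(f_2)."""
    triples = []
    for t_i, t_j in product(tags_1, tags_2):
        # Cooccurrence counts are stored under an unordered pair of tags.
        n_pair = pair_count.get(frozenset((t_i, t_j)), 0)
        triples.append((t_i, t_j, pmi(page_count[t_i], page_count[t_j], n_pair)))
    return triples


# Hypothetical page counts, standing in for search engine responses.
page_count = {"clock": 4_000_000, "tower": 3_000_000,
              "London": 9_000_000, "eye": 8_000_000}
pair_count = {
    frozenset(("clock", "London")): 600_000,
    frozenset(("tower", "London")): 900_000,
    frozenset(("clock", "eye")): 50_000,
}

triples = tag_pair_triples(["clock", "tower"], ["London", "eye"],
                           page_count, pair_count)
```

For two images with $|s(f_1)|$ and $|s(f_2)|$ tags, the routine issues $|s(f_1)| \cdot |s(f_2)|$ pair lookups, which is why cheap page-count queries matter for this step.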

Overall, the page counts of each tag are queried first. Then cooccurrence-based measures are used to compute the semantic relatedness between tags. The reasons for using page-count-based measures are as follows.

(1) Appropriate computation complexity. Since the relatedness between each tag pair of two images should be computed, the proposed method must have low complexity. Web search engines such as Google provide APIs for users to retrieve the page counts of each query. The web search engine thus gives an appropriate interface for the proposed computation model.

(2) Explicit semantics. The tag given by users may not be a correct concept in a taxonomy. For example, users may give the tag "Bling Bling" for an image of a lovely girl. The word "Bling" cannot be found in many taxonomies such as WordNet. The proposed method uses the web search engine as an open intermediary. The explicit semantics of newly emerging concepts can be obtained from the web easily.

4.2. Semantic Relatedness Integration. In Section 4.1, we compute the tag pair relatedness of two images. The tag pair relatedness of two images $f_1$ and $f_2$ can be treated as a bipartite graph, which is denoted by

$$G = (V, E), \qquad V = f_1 \cup f_2, \qquad E = \{\langle t_i, t_j, \mathrm{sr}(t_i, t_j) \rangle : t_i \in s(f_1) \text{ and } t_j \in s(f_2)\}. \tag{9}$$

Based on (9), we reduce the semantic relatedness integration of all tag pairs to the problem of assignment in a bipartite graph. We want to find the best matching of the bipartite graph $G$.

A matching is defined as $M \subseteq E$ such that no two edges in $M$ share a common end vertex. An assignment in a bipartite graph is a matching $M$ such that each node of the graph has an incident edge in $M$. Suppose that the set of vertices is partitioned into two sets $f_1$ and $f_2$ and that the edges of the graph have an associated weight given by a function $f: (f_1, f_2) \to [0, 1]$. The function $\mathrm{maxRel}: (f, f_1, f_2) \to [0, 1]$ returns the maximum weighted assignment, that is, an assignment such that the average of the weights of the edges is highest. Figure 4 shows a graphical representation of the semantic relatedness integration, where the bold lines constitute the matching $M$.

Based on the expression of the assignment in bipartite graphs, we have

$$\mathrm{maxRel}(f, f_1, f_2) = \begin{cases} \dfrac{\max \sum_{i \in I,\, j \in J} \mathrm{sr}(t_i, t_j)}{|s(f_1)|}, & |s(f_1)| \le |s(f_2)|, \\[2ex] \dfrac{\max \sum_{i \in I,\, j \in J} \mathrm{sr}(t_i, t_j)}{|s(f_2)|}, & |s(f_1)| > |s(f_2)|, \end{cases} \qquad I = [1, \ldots, |s(f_1)|], \quad J = [1, \ldots, |s(f_2)|]. \tag{10}$$
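A minimal sketch of (10) is given below. It enumerates every one-to-one matching by brute force, which is only practical for the short tag lists typical of a single image; a polynomial-time assignment solver (for example, the Hungarian algorithm) would be the usual substitute for larger inputs and is not something the text prescribes. The function name `max_rel` is hypothetical. In the weight matrix, only the entries $t_1$-$q_1$ = 1.0, $t_2$-$q_2$ = 0.8, $t_2$-$q_3$ = 0.3, $t_3$-$q_2$ = 0.9, and $t_3$-$q_4$ = 0.7 are taken from the Figure 4 example; the remaining entries are assumed values chosen for illustration.

```python
from itertools import permutations


def max_rel(weights):
    """Best average weight over all one-to-one matchings (Eq. (10)).

    weights[i][j] holds sr(t_i, q_j). The sum is divided by the smaller of
    the two tag-set sizes, so the result does not depend on argument order.
    """
    if not weights or not weights[0]:
        return 0.0
    rows, cols = len(weights), len(weights[0])
    if rows > cols:
        # Transpose so that the rows always index the smaller tag set.
        weights = [list(column) for column in zip(*weights)]
        rows, cols = cols, rows
    best = max(
        sum(weights[i][j] for i, j in enumerate(assignment))
        for assignment in permutations(range(cols), rows)
    )
    return best / rows


# Rows: t1, t2, t3; columns: q1, q2, q3, q4.
# Entries not quoted in the text are assumed for illustration.
W = [
    [1.0, 0.7, 0.2, 0.5],
    [0.1, 0.8, 0.3, 0.1],
    [0.1, 0.9, 0.1, 0.7],
]

score = max_rel(W)
score_swapped = max_rel([list(column) for column in zip(*W)])
```

With these weights the best matching pairs $t_1$ with $q_1$, $t_2$ with $q_2$, and $t_3$ with $q_4$, and swapping the two images leaves the score unchanged.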

Using the assignment in bipartite graphs problem to ourcontext the variables 119891

1and 119891

2represent the two images to

compute the semantic relatedness For example that 1198911and

The Scientific World Journal 7

10

0702

0508

01

0101

07

0309

f1 f2

t1q1

t2

q2

t3

q3

q4

maxRel(f1 f2) = (sr(t1 q1) + sr(t2 q2) + sr(t3 q4))3 =

(10 + 08 + 07)3 = 083

Figure 4 Graphical representation of the assignment in bipartitegraphs problem

1198912are composed of the tags 119904(119891

1) and 119904(119891

2) |119904(119891

1)| gt |119904(119891

2)|

means that the number of tags in 119904(1198912) is lower than that of

119904(1198911) According to Heuristic 3 we divide the result of the

maximization by the lower cardinality of 119904(1198911) or 119904(119891

2) In this

way the influence of the number of tags is reduced and thesemantic relatedness of two images is symmetric

Beside the cardinality of two tags set 119904(1198911) and 119904(119891

2) the

maxRel function is affected by the relatedness between eachpair of tags According to Heuristics 4 and 5 the redundancyand noise should be avoided In maxRel function the one-to-one map is applied to the tags 119904(119891

1) and 119904(119891

2) Thus the

proposed maxRel function varies with respect to the natureof two images

By adopting the proposed maxRel function, we are sure to find the global maximum relatedness that can be obtained by pairing the elements of the two tag sets. Alternative methods find only a local maximum: they scroll through the elements of the first set and, after calculating the relatedness with all the elements of the second set, select the one with the maximum relatedness. Since every element in one set can be connected to at most one element in the other set, the result of such a procedure depends on the order in which the comparisons occur. Consider the example in Figure 4: $t_1$ will be paired with $q_1$ (weight = 1.0). But when analyzing $t_3$, the maximum weight is with $q_2$ (weight = 0.9). This means that $t_2$ can no longer be paired with $q_2$, even though that is its maximum weight, since $q_2$ is already matched to $t_3$. As a consequence, $t_2$ will be paired with $q_3$, and the average of the selected weights will be (1.0 + 0.3 + 0.9)/3 = 0.73, which is considerably lower than with MaxRel, where the average of the weights was (1.0 + 0.8 + 0.7)/3 = 0.83.
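The contrast between the optimal and the greedy pairing can be reproduced with a minimal sketch. The weight matrix below is an assumption: only the five weights quoted in the text (1.0, 0.8, 0.7, 0.9, and 0.3) are taken from the example, and the remaining entries are illustrative fill-ins. The function names `max_rel` and `greedy_rel` are hypothetical, and the exhaustive enumeration stands in for a proper assignment solver, which would be needed for large tag sets.

```python
from itertools import permutations


def max_rel(weights):
    """Best one-to-one assignment, averaged over the smaller tag set.

    weights[i][j] plays the role of sr(t_i, q_j). Exhaustive search is
    used here for clarity; it is only practical for small tag sets.
    """
    rows, cols = len(weights), len(weights[0])
    if rows > cols:
        # Transpose so that the smaller set is always the one assigned.
        weights = [list(col) for col in zip(*weights)]
        rows, cols = cols, rows
    best = max(
        sum(weights[i][j] for i, j in enumerate(perm))
        for perm in permutations(range(cols), rows)
    )
    return best / rows


def greedy_rel(weights):
    """Greedy pairing: repeatedly take the heaviest edge that is still free."""
    rows, cols = len(weights), len(weights[0])
    edges = sorted(
        ((weights[i][j], i, j) for i in range(rows) for j in range(cols)),
        reverse=True,
    )
    used_rows, used_cols, total = set(), set(), 0.0
    for w, i, j in edges:
        if i not in used_rows and j not in used_cols:
            used_rows.add(i)
            used_cols.add(j)
            total += w
    return total / min(rows, cols)


# Rows are t1..t3, columns are q1..q4 (partly hypothetical values).
W = [
    [1.0, 0.7, 0.2, 0.5],
    [0.1, 0.8, 0.3, 0.1],
    [0.1, 0.9, 0.1, 0.7],
]
print(round(max_rel(W), 2), round(greedy_rel(W), 2))
```

With this matrix the optimal assignment selects the pairs with weights 1.0, 0.8, and 0.7, while the greedy pairing selects 1.0, 0.9, and 0.3, matching the two averages discussed above.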

Overall, dividing by the lower cardinality of the two tag sets follows Heuristic 3, and the one-to-one mapping of tag pairs follows Heuristics 4 and 5. The MaxRel function thus finds the best-matching integration of the tag relatedness values of two images.

4.3. Tag Order Revision. According to Heuristic 2, the order of tags should be considered when computing the semantic relatedness between two images. Intuitively, the tags appearing in the first positions may be more important than the later tags. Some research [39] suggests that people tend to select popular items as their tags. Meanwhile, the most popular tags are indeed the “meaningful” ones.

In this section, the MaxRel function proposed in Section 4.2 is revised to consider the order of tags. Specifically, the relatedness of tag pairs in high positions should be enhanced, which is summarized as a constraint schema.

Schema 1 (tag relatedness declining). This schema means that the relatedness of a tag pair of two images $f_1$ and $f_2$ should decline with the positions of its tags in the MaxRel function. In other words, tag pairs in high positions contribute more to the semantic relatedness than pairs of later tags.

We add a decline factor to the MaxRel function; the detailed steps are as follows.

(1) According to the MaxRel function in Section 4.2, the best matching tag pairs are selected, which is denoted by

$$\text{maxRel}(f_1, f_2) = \sum \text{sr}(t_i, t_j), \quad t_i \in s(f_1) \wedge t_j \in s(f_2). \qquad (11)$$

Of course, the selected tag pairs are the best matching of the bipartite graph between images $f_1$ and $f_2$.

(2) Compute the position information of each tag, which is denoted by $\text{Pos}(t_i)$:

$$\text{Pos}(t_i) = \frac{|s(f)| + 1 - i}{|s(f)|}, \quad t_i \in s(f). \qquad (12)$$

(3) Add the position information of each tag to (11), which can be seen as a decline factor:

$$\text{sr}(f_1, f_2) = \sum \text{Pos}(t_i) \ast \text{sr}(t_i, t_j) \ast \text{Pos}(t_j), \quad t_i \in s(f_1) \wedge t_j \in s(f_2). \qquad (13)$$

(4) Similar to the MaxRel function, the result of the maximization should be normalized. Here we divide by the sum of the position products of the selected tag pairs:

$$\text{sr}(f_1, f_2) = \frac{\sum \text{Pos}(t_i) \ast \text{sr}(t_i, t_j) \ast \text{Pos}(t_j)}{\sum \text{Pos}(t_i) \ast \text{Pos}(t_j)}. \qquad (14)$$

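We again consider the example in Figure 4. According to (14), the semantic relatedness is revised as

$$\left(1 \cdot 1.0 \cdot 1 + \frac{2}{3} \cdot 0.8 \cdot \frac{3}{4} + \frac{1}{3} \cdot 0.7 \cdot \frac{1}{4}\right) \times \left(1 \cdot 1 + \frac{2}{3} \cdot \frac{3}{4} + \frac{1}{3} \cdot \frac{1}{4}\right)^{-1} = 0.92. \qquad (15)$$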
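The arithmetic of this worked example can be checked with a short sketch of (12) and (14). The function names `pos` and `order_weighted_sr` and the triple layout of `pairs` are illustrative assumptions, not part of the original method description; the matched pairs and their weights are those of the Figure 4 example.

```python
def pos(i, n):
    """Equation (12): position weight of the i-th tag (1-based) among n tags."""
    return (n + 1 - i) / n


def order_weighted_sr(matched, n1, n2):
    """Equation (14) for an already selected matching.

    matched is a list of (i, j, sr) triples: the 1-based position i of the
    tag in s(f1), the 1-based position j of its partner in s(f2), and the
    tag relatedness sr of that pair.
    """
    num = sum(pos(i, n1) * sr * pos(j, n2) for i, j, sr in matched)
    den = sum(pos(i, n1) * pos(j, n2) for i, j, _ in matched)
    return num / den


# Figure 4: t1-q1 (1.0), t2-q2 (0.8), t3-q4 (0.7); |s(f1)| = 3, |s(f2)| = 4.
pairs = [(1, 1, 1.0), (2, 2, 0.8), (3, 4, 0.7)]
print(round(order_weighted_sr(pairs, 3, 4), 2))  # 0.92
```

Because the first pair holds both the highest positions and the highest weight, the order-aware score (0.92) exceeds the unweighted average (0.83).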

Besides adding a decline factor to the MaxRel function, we also add a second constraint schema: identical tag pruning.

Input: the tag sets $s(f_1)$ and $s(f_2)$ of two images $f_1$ and $f_2$
Output: the semantic relatedness of the two images $f_1$ and $f_2$
for each $t_i \in s(f_1)$ /* page counts and position initialization */
    $N(s(f_1)) \leftarrow N(t_i)$
    $\text{Pos}(s(f_1)) \leftarrow \text{Pos}(t_i)$
for each $t_j \in s(f_2)$
    $N(s(f_2)) \leftarrow N(t_j)$
    $\text{Pos}(s(f_2)) \leftarrow \text{Pos}(t_j)$
for each $t_i \in s(f_1)$
    for each $t_j \in s(f_2)$
        if ($t_i == t_j$) $\text{sr}(t_i, t_j) = 0$ /* pruning */
        else $\text{sr}(t_i, t_j) = f(N(t_i), N(t_j))$ /* relatedness */
return $\text{maxRel}(f_1, f_2) = f(\text{Pos}(t_i), \text{Pos}(t_j), \text{sr}(t_i, t_j))$

Algorithm 1: MaxRel.

Schema 2 (identical tag pruning). This schema means that the identical tag pairs of two images $f_1$ and $f_2$ should be pruned in the MaxRel function. In other words, the semantic relatedness of a tag that appears in both images is set to 0.

The above schema ensures that the method measures the relatedness of two images. If we did not prune the identical tag pairs of two images, the proposed method would turn into a similarity measure. For example, the cosine similarity [36] between two tag vectors essentially counts the identical elements of the two vectors. The overall algorithm of the proposed computation model is presented in Algorithm 1.
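The steps of Algorithm 1 can be put together in a minimal end-to-end sketch. Several parts are assumptions: `image_relatedness` is a hypothetical name, `tag_sr` is a pluggable stand-in for the page-count-based tag measure, the toy lookup table reuses the weights quoted for Figure 4 with an arbitrary default of 0.1 for all other pairs, and exhaustive enumeration replaces a real assignment solver.

```python
from itertools import permutations


def image_relatedness(tags1, tags2, tag_sr):
    """Order-aware relatedness of two ordered tag lists (illustrative only).

    tag_sr(a, b) is any tag-level relatedness in [0, 1].
    """
    if not tags1 or not tags2:
        return 0.0
    n1, n2 = len(tags1), len(tags2)
    # Schema 2: identical tags are pruned, i.e. their relatedness is 0.
    w = [[0.0 if a == b else tag_sr(a, b) for b in tags2] for a in tags1]

    # Enumerate every one-to-one matching that covers the smaller tag set.
    if n1 <= n2:
        matchings = (list(enumerate(p)) for p in permutations(range(n2), n1))
    else:
        matchings = ([(i, j) for j, i in enumerate(p)]
                     for p in permutations(range(n1), n2))
    best = max(matchings, key=lambda m: sum(w[i][j] for i, j in m))

    # Equation (12) with 0-based indices: Pos = (n - index) / n.
    num = sum((n1 - i) / n1 * w[i][j] * (n2 - j) / n2 for i, j in best)
    den = sum((n1 - i) / n1 * (n2 - j) / n2 for i, j in best)
    return num / den  # equation (14)


# Toy tag-level measure: known Figure 4 weights, 0.1 for everything else.
table = {("t1", "q1"): 1.0, ("t2", "q2"): 0.8, ("t3", "q4"): 0.7,
         ("t3", "q2"): 0.9, ("t2", "q3"): 0.3}


def toy_sr(a, b):
    return table.get((a, b), 0.1)


score = image_relatedness(["t1", "t2", "t3"], ["q1", "q2", "q3", "q4"], toy_sr)
print(round(score, 2))
```

The matching is selected on the unweighted tag relatedness first, as in step (1), and the position factors are applied only afterwards, as in steps (3) and (4).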

5. Experimental Results

In this section, we evaluate the proposed method for relatedness measurement. In Section 5.1, we introduce the data set used for the evaluation. In Section 5.2, we select the cooccurrence function used for tag relatedness. In Sections 5.3 and 5.4, clustering and retrieval tasks are used to evaluate the proposed method.

5.1. The Data Sets. We choose Flickr groups as the resource for building data sets. Users of online photo sharing sites like Flickr have organized many millions of photos into hundreds of thousands of semantically themed groups. These groups expose implicit choices that users make about which images are similar. Flickr group membership is usually less noisy than Flickr tags because images are screened by group members. We download 1000 images from ten groups. These ten groups can be divided into two classes. The first class includes five groups: car, phone, flower, dog, and boat. The second class consists of another five groups: Louis Vuitton, Dior, Gucci, Cartier, and Chanel. These images are selected by humans, which reduces the noise of the data set. We choose two classes of groups because we want to test the accuracy of the proposed method against the semantic relatedness of the data set. The semantic relatedness within the second class is higher than within the first, since the second class is all about luxury brands. For example, almost all these brands produce handbags. Thus, if the proposed method does well on these groups, we may say that it can measure the semantic relatedness between Flickr images accurately and robustly. Table 2 gives the detailed information of the data set. Table 3 gives some selected tags from group 2.

Table 2: The detailed information of the data set.

| Group 1 | Average tags per image | Group 2 | Average tags per image |
|---|---|---|---|
| Car | 4.4 | Louis Vuitton | 3.1 |
| Phone | 3.5 | Dior | 3.2 |
| Flower | 2.2 | Gucci | 2.9 |
| Dog | 5.6 | Cartier | 2.8 |
| Boat | 3.1 | Chanel | 2.6 |

5.2. Relatedness Function Selection. In Section 4.1, four cooccurrence measures (i.e., Jaccard, Overlap, Dice, and PMI) are given for measuring relatedness between tags. In [40], Rubenstein and Goodenough proposed a data set containing 28 word pairs rated by a group of 51 human subjects, which is a reliable benchmark for evaluating semantic similarity measures. The higher the correlation coefficient against the R-G ratings, the more accurate the method for measuring semantic similarity between words. Figure 5 gives the correlation coefficient of the four functions against the R-G test set. From Figure 5, we can say that PMI performs best, since it has the highest correlation coefficient. Thus, in the later experiments, we select PMI as the relatedness measure between tags.
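The comparison against human ratings can be sketched as follows, assuming that the correlation coefficient is the Pearson coefficient (the text does not say which coefficient is used). The function name `pearson` and both rating lists are hypothetical and serve only to show the call; they are not values from the R-G data set.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (dev_x * dev_y)


# Hypothetical human ratings and measure scores for four word pairs.
human = [3.9, 3.5, 0.4, 1.1]
scores = [0.8, 0.7, 0.1, 0.3]
print(pearson(human, scores))
```

A measure whose scores rise and fall with the human ratings yields a coefficient close to 1, which is the sense in which one function is said to perform better than another.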

5.3. Evaluation on Image Clustering. In this section, we evaluate the correctness of using tag order. In Section 4.3, we add the position information of each tag to the semantic relatedness measure. The tags in high positions are treated as the major elements for semantic relatedness measurement. We evaluate the use of tag order with a clustering task. We plug the proposed semantic relatedness of images into the $K$-means [41] clustering model. Since the $K$-means model depends on the initial points, we randomly select core points 100 times. We evaluate the effectiveness of clustering with three quality measures: $F$-measure, Purity, and Entropy [41]. We treat each cluster as if it were the result of the proposed

Table 3: The selected tags of group 2 from Flickr.

| Group 2 | Tags | Tags | Tags | Tags | Tags |
|---|---|---|---|---|---|
| Louis Vuitton | “Louis Vuitton” “Keepall” | “Louis Vuitton” “Alma” | “Louis Vuitton” “Tivoli” | “Louis Vuitton” “Bolsas” | “LV” “Multicolore” |
| Dior | “DIOR” “lipstick” “makeup” | “Dior” “Diorskin Nude” “Tan Sun Powder” | “Dior” “Makeup” “Palette” | “Dior” “Addict 2” | “Dior” “Jadore” “Perfume” |
| Gucci | “Gucci” “Leather Belts” | “Gucci” “Trainers” | “Gucci” “Jolie Leopard” “Orange” | “Replica” “Gucci” “Handbags” | “Gucci” “Cruise” |
| Cartier | “Cartier” “Pasha” “Chronograph” | “CARTIER” “Love Bracelet” | “Cartier” “Santos Galbee” | “Calibre” “Cartier” | “Cartier Watch” “Tank Francaise” |
| Chanel | “Chanel” “Coco Noir” | “Chanel” “Chanel Riva” “Chanel nail polish” | “Coco Mademoiselle” | “Chanel” “No 5” | “Chance” “Chanel” |

method and each class as if it were the desired set of images. Generally, we would like to maximize the $F$-measure and Purity and minimize the Entropy of the clusters to achieve high-quality clustering. Moreover, we compare the clustering results of the proposed method with and without tag order. Figures 6 and 7 give the clustering results on the group 1 and group 2 data sets. From Figures 6 and 7, we can conclude the following.

(1) The proposed method performs better than cosine-based clustering. This result can be obtained from Figures 6 and 7. All three metrics ($F$-measure, purity, and entropy) of the proposed method are better than those of cosine-based clustering. This may be caused by an inherent feature of the proposed method: it is based on semantic relatedness rather than on the tag cooccurrence used by cosine-based clustering. If the tags of two images do not overlap, cosine-based clustering cannot relate them.

(2) The schema on using tag order is effective. This result can also be obtained from Figures 6 and 7. All three metrics are best when tag order is used: the $F$-measure and purity are the highest and the entropy is the lowest. The position information reflects the importance of each tag. The proposed method emphasizes the tags in high positions, which raises the performance on image clustering.

(3) The proposed method is robust across different data sets. It performs well on both the group 1 and group 2 data sets. It is worth noting that the difference between the proposed method and the cosine method is larger on group 2 than on group 1. The reason is that the semantic correlation within group 2 is stronger than within group 1. In other words, the performance of the proposed method relies on the semantic correlation of the classes in the data set: the stronger the semantic correlation between the classes, the greater the advantage of the proposed method.
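Two of the quality measures used in this evaluation, purity and entropy, can be sketched as follows. This uses the common definitions from the document clustering literature and assumes the natural logarithm without further normalization; the text does not state which variant was used, and the $F$-measure is omitted for brevity. The function names and the toy clusters are illustrative.

```python
from collections import Counter
from math import log


def purity(clusters):
    """Fraction of items that belong to the majority class of their cluster.

    clusters is a list of lists; each inner list holds the true class
    labels of the items assigned to one cluster.
    """
    total = sum(len(c) for c in clusters)
    return sum(max(Counter(c).values()) for c in clusters if c) / total


def entropy(clusters):
    """Size-weighted average of the class entropy inside each cluster."""
    total = sum(len(c) for c in clusters)
    result = 0.0
    for c in clusters:
        if not c:
            continue
        h = -sum((k / len(c)) * log(k / len(c)) for k in Counter(c).values())
        result += len(c) / total * h
    return result


# Toy example: one pure cluster and one cluster with a misplaced image.
toy = [["car", "car", "car"], ["dog", "dog", "car"]]
print(round(purity(toy), 3), round(entropy(toy), 3))
```

A perfect clustering has purity 1 and entropy 0, which is why higher purity and lower entropy are both read as better results.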

5.4. Evaluation on Image Searching. In this section, we evaluate the proposed method on a query-based image searching task. Five queries from group 2 are selected as the test set: “Louis Vuitton,” “Gucci,” “Chanel,” “Cartier,” and “Dior.” These queries are searched in Flickr, and the top 50 images of each query are obtained as the data set. Moreover, we remove the query term from the tags of each image. For example, the tag “Cartier” is removed from the top 50 images of the query “Cartier.” The reason for this operation is that the proposed method is based on semantic relatedness rather than cooccurrence. We choose cut-off point precision to evaluate the proposed method on image searching. The cut-off point precision ($P_n$) is the percentage of correct results among the top $n$ returned results. We compute $P_1$, $P_5$, and $P_{10}$ on the group 2 test set. Table 4 lists the comparison of the cut-off point precision between the proposed method and Flickr. From the experimental results, we can conclude the following.

(1) The proposed method performs better than Flickr. In Table 4, the $P_1$, $P_5$, and $P_{10}$ of the proposed method are at least as high as those of Flickr for every query and strictly higher at $P_5$ and $P_{10}$. The experimental results support the correctness of the proposed method on the image searching task.

(2) The proposed method can handle the relatedness searching problem. It can measure the semantic relatedness of two images robustly and correctly.

(3) The proposed method can support the faceted exploration of image search. Faceted exploration of search results is widely used in search interfaces for structured databases. Recently, faceted exploration has also been appearing in online search engines in the form of search assistants. Since the proposed method can measure the semantic relatedness of two images, given the search queries we can select the related images for faceted search.
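The cut-off point precision used in this section can be sketched in a few lines. The function name `precision_at` and the relevance judgments below are hypothetical; they illustrate the computation and are not the judgments behind Table 4.

```python
def precision_at(n, relevant_flags):
    """Cut-off point precision P_n in percent.

    relevant_flags[k] is True when the result at rank k + 1 is correct.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return 100.0 * sum(relevant_flags[:n]) / n


# Hypothetical judgments for the top 10 results of one query.
flags = [True, True, False, True, True, True, True, True, True, False]
print(precision_at(1, flags), precision_at(5, flags), precision_at(10, flags))
```

For this list, the first result is correct, four of the top five are correct, and eight of the top ten are correct.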

Table 4: The comparison of the cut-off point precision (%) between the proposed method and Flickr.

| Cut-off point | Louis Vuitton | Gucci | Dior | Chanel | Cartier |
|---|---|---|---|---|---|
| $P_1$ | 100 | 100 | 100 | 100 | 100 |
| $P_1$ (Flickr) | 100 | 100 | 0 | 100 | 100 |
| $P_5$ | 100 | 100 | 100 | 100 | 100 |
| $P_5$ (Flickr) | 80 | 60 | 60 | 60 | 80 |
| $P_{10}$ | 100 | 100 | 100 | 100 | 100 |
| $P_{10}$ (Flickr) | 90 | 70 | 70 | 80 | 80 |

6. Conclusions

This paper discusses semantic relatedness measures systematically, puts forward a method to measure the semantic relatedness of two images based on their tags, and justifies its validity through experiments. The major contributions are summarized as follows.

(1) We propose a framework to measure semantic relatedness between Flickr images using tags. Firstly, cooccurrence measures are used to compute the relatedness of tags between two images. Secondly, we transform the integration of tag relatedness into the assignment problem in bipartite graphs, which finds an appropriate matching for the semantic relatedness of images. Finally, a decline factor considering the position information of tags is used in the proposed framework, which reduces the noise and redundancy in the social tags.

(2) A real data set including 1000 images from Flickr with ten classes is used in our experiments. Two evaluation tasks, clustering and searching, are performed, which show that the proposed method can measure the semantic relatedness between Flickr images accurately and robustly.

(3) We extend the relatedness measures between concepts to the level of images. Since association is a basic mechanism of the brain, the proposed relatedness measurement can facilitate related applications such as searching and recommendation.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported in part by the National Science and Technology Major Project under Grant no. 2013ZX01033002-003, in part by the National High Technology Research and Development Program of China (863 Program) under Grant nos. 2013AA014601 and 2013AA014603, in part by National Key Technology Support Program under Grant no. 2012BAH07B01, in part by the National Science Foundation of China under Grant no. 61300202, and in part by the Science Foundation of Shanghai under Grant no. 13ZR1452900.

Figure 5: The correlation of four selected functions. [Bar chart of the correlation coefficient: Jaccard 0.346, Dice 0.395, Overlap 0.421, PMI 0.579.]

Figure 6: The clustering results of group 1 data sets. [Bar chart of $F$-measure, purity, and entropy. Using tag order: 0.912, 0.967, 0.011; not using tag order: 0.857, 0.922, 0.018; cosine: 0.732, 0.751, 0.056.]

Figure 7: The clustering results of group 2 data sets. [Bar chart of $F$-measure, purity, and entropy. Using tag order: 0.876, 0.927, 0.023; not using tag order: 0.827, 0.852, 0.031; cosine: 0.632, 0.655, 0.085.]

References

[1] J. Goldberger, S. Gordon, and H. Greenspan, “Unsupervised image-set clustering using an information theoretic framework,” IEEE Transactions on Image Processing, vol. 15, no. 2, pp. 449–458, 2006.

[2] T. Evgeniou, M. Pontil, C. Papageorgiou, and T. Poggio, “Image representations and feature selection for multimedia database search,” IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 4, pp. 911–920, 2003.


[3] R. Ji, H. Yao, X. Sun, B. Zhong, and W. Gao, “Towards semantic embedding in visual vocabulary,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR ’10), pp. 918–925, June 2010.

[4] J. Fan, D. A. Keim, Y. Gao, H. Luo, and Z. Li, “JustClick: personalized image recommendation via exploratory search from large-scale Flickr images,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 19, no. 2, pp. 273–288, 2009.

[5] T. Gong, S. Li, and C. L. Tan, “A semantic similarity language model to improve automatic image annotation,” in Proceedings of the 22nd International Conference on Tools with Artificial Intelligence (ICTAI ’10), pp. 197–203, October 2010.

[6] C. Schmid and R. Mohr, “Local grayvalue invariants for image retrieval,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 5, pp. 530–535, 1997.

[7] M. Varma and A. Zisserman, “A statistical approach to texture classification from single images,” International Journal of Computer Vision, vol. 62, no. 1-2, pp. 61–81, 2005.

[8] S. Belongie, J. Malik, and J. Puzicha, “Shape matching and object recognition using shape contexts,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 4, pp. 509–522, 2002.

[9] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR ’05), vol. 1, pp. 886–893, June 2005.

[10] D. Huang, M. Ardabilian, Y. Wang, and L. Chen, “Asymmetric 3D/2D face recognition based on LBP facial representation and canonical correlation analysis,” in Proceedings of the 16th IEEE International Conference on Image Processing (ICIP ’09), pp. 3325–3328, November 2009.

[11] L. Wang, Y. Zhang, and J. Feng, “On the Euclidean distance of images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 8, pp. 1334–1339, 2005.

[12] W. Jia, H. Zhang, X. He, and Q. Wu, “Gaussian weighted histogram intersection for license plate classification,” in Proceedings of the 18th International Conference on Pattern Recognition (ICPR ’06), pp. 574–577, August 2006.

[13] Y. Rubner, C. Tomasi, and L. J. Guibas, “A metric for distributions with applications to image databases,” in Proceedings of the IEEE 6th International Conference on Computer Vision, pp. 59–66, January 1998.

[14] L. Wu, X.-S. Hua, N. Yu, W.-Y. Ma, and S. Li, “Flickr distance: a relationship measure for visual concepts,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 5, pp. 863–875, 2012.

[15] D. Cai, “An information-theoretic foundation for the measurement of discrimination information,” IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 9, pp. 1262–1273, 2010.

[16] P. van den Broek, “Using texts in science education: cognitive processes and knowledge representation,” Science, vol. 328, no. 5977, pp. 453–456, 2010.

[17] C. Bizer, T. Heath, and T. Berners-Lee, “Linked data - the story so far,” International Journal on Semantic Web and Information Systems, vol. 5, no. 3, pp. 1–22, 2009.

[18] H. Zhuge, “Communities and emerging semantics in semantic link network: discovery and learning,” IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 6, pp. 785–799, 2009.

[19] H. Zhuge, “Semantic linking through spaces for cyber-physical-socio intelligence: a methodology,” Artificial Intelligence, vol. 175, no. 5-6, pp. 988–1019, 2011.

[20] X. Luo, Z. Xu, J. Yu, and X. Chen, “Building association link network for semantic link on web resources,” IEEE Transactions on Automation Science and Engineering, vol. 8, no. 3, pp. 482–494, 2011.

[21] S. A. Golder and B. A. Huberman, “Usage patterns of collaborative tagging systems,” Journal of Information Science, vol. 32, no. 2, pp. 198–208, 2006.

[22] H. S. Al-Khalifa and H. C. Davis, “Measuring the semantic value of folksonomies,” in Proceedings of the Innovations in Information Technology (IIT ’06), pp. 1–5, November 2006.

[23] F. M. Suchanek, M. Vojnovic, and D. Gunawardena, “Social tags: meaning and suggestions,” in Proceedings of the 17th ACM Conference on Information and Knowledge Management (CIKM ’08), pp. 223–232, October 2008.

[24] H. Halpin, V. Robu, and H. Shepherd, “The complex dynamics of collaborative tagging,” in Proceedings of the 16th International World Wide Web Conference (WWW ’07), pp. 211–220, May 2007.

[25] C. Cattuto, C. Schmitz, A. Baldassarri et al., “Network properties of folksonomies,” AI Communications, vol. 20, no. 4, pp. 245–262, 2007.

[26] R. Lambiotte and M. Ausloos, “Collaborative tagging as a tripartite network,” in Computational Science, vol. 3393 of Lecture Notes in Computer Science, pp. 1114–1117, 2006.

[27] U. Maulik, S. Bandyopadhyay, and I. Saha, “Integrating clustering and supervised learning for categorical data analysis,” IEEE Transactions on Systems, Man, and Cybernetics A, vol. 40, no. 4, pp. 664–675, 2010.

[28] D. Ramage, P. Heymann, C. D. Manning, and H. Garcia-Molina, “Clustering the tagged web,” in Proceedings of the 2nd ACM International Conference on Web Search and Data Mining (WSDM ’09), pp. 54–63, February 2009.

[29] D. Zhou, J. Bian, S. Zheng, H. Zha, and C. L. Giles, “Exploring social annotations for information retrieval,” in Proceedings of the 17th International Conference on World Wide Web (WWW ’08), pp. 715–724, April 2008.

[30] S. Xu, S. Bao, Y. Cao, and Y. Yu, “Using social annotations to improve language model for information retrieval,” in Proceedings of the 16th ACM Conference on Information and Knowledge Management (CIKM ’07), pp. 1003–1006, November 2007.

[31] S. Bao, G. Xue, X. Wu, Y. Yu, B. Fei, and Z. Su, “Optimizing web search using social annotations,” in Proceedings of the 16th International World Wide Web Conference (WWW ’07), pp. 501–510, May 2007.

[32] R. Schenkel, T. Crecelius, M. Kacimi et al., “Efficient top-k querying over social-tagging networks,” in Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (ACM SIGIR ’08), pp. 523–530, July 2008.

[33] N. Rasiwasia, P. J. Moreno, and N. Vasconcelos, “Bridging the gap: query by semantic example,” IEEE Transactions on Multimedia, vol. 9, no. 5, pp. 923–938, 2007.

[34] R. Fergus, L. Fei-Fei, P. Perona, and A. Zisserman, “Learning object categories from Google’s image search,” in Proceedings of the 10th IEEE International Conference on Computer Vision (ICCV ’05), vol. 2, pp. 1816–1823, October 2005.


[35] G. Wang, D. Hoiem, and D. Forsyth, “Learning image similarity from flickr groups using fast kernel machines,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 11, pp. 2177–2188, 2012.

[36] G. Salton, A. Wong, and C. S. Yang, “A vector space model for automatic indexing,” Communications of the ACM, vol. 18, no. 11, pp. 613–620, 1975.

[37] Z. Xu, X. Luo, J. Yu, and W. Xu, “Measuring semantic similarity between words by removing noise and redundancy in web snippets,” Concurrency Computation Practice and Experience, vol. 23, no. 18, pp. 2496–2510, 2011.

[38] J. R. Firth, “A synopsis of linguistic theory 1930–1955,” in Studies in Linguistic Analysis, Philological Society, Oxford, UK, 1957.

[39] M. Vojnovic, J. Cruise, D. Gunawardena, and P. Marbach, “Ranking and suggesting popular items,” IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 8, pp. 1133–1146, 2009.

[40] H. Rubenstein and J. B. Goodenough, “Contextual correlates of synonymy,” Communications of the ACM, vol. 8, no. 10, pp. 627–633, 1965.

[41] M. Steinbach, G. Karypis, and V. Kumar, “A comparison of document clustering techniques,” in Proceedings of the KDD Workshop on Text Mining, 2000.



images or videos. Figure 2 shows a pair of images from Flickr with social tags. These two images are about “Big Ben” and “London eye.” They may be dissimilar according to traditional similarity measurement, since they share little low-level visual similarity. But the two images are semantically related, since both show famous sights of London. With the proposed method, we can compute their semantic relatedness even though they share few visual features.

3.1. Basic Definitions. We first introduce three important definitions in this paper: the social tags set of an image, the semantic relatedness between tags, and the semantic relatedness between images.

Definition 1 (social tags set of an image). The social tags set of an image $f$ (denoted by $s(f)$) is the set of tags (each denoted by $t$) provided by users for that image:

$$s(f) = \{t_1, t_2, \ldots, t_{|s(f)|}\}. \qquad (1)$$

For example, in Figure 2, the tags of the right image are “London” and “eye” rather than “London eye.” Since Flickr provides the tags of each image, we simply download the tags from Flickr. We do not perform any NLP operations on the tags.

Definition 2 (semantic relatedness between tags). The semantic relatedness between tags (denoted by $\text{sr}(t_1, t_2)$) is the expected correlation of a pair of tags $t_1$ and $t_2$.

Definition 3 (semantic relatedness between images). The semantic relatedness between images (denoted by $\text{sr}(f_1, f_2)$) is the expected correlation of a pair of images $f_1$ and $f_2$.

The range of $\text{sr}(t_1, t_2)$ and $\text{sr}(f_1, f_2)$ is from 0 to 1. A higher value indicates that the two tags or images are more likely to be semantically related. Please notice that the definition of $\text{sr}(f_1, f_2)$ can also be extended to videos with social tags.

3.2. Basic Heuristics. Based on common sense and our observations on real data, we have five heuristics that serve as the base of our computation model.

Heuristic 1. Usually, each tag of an image appears only one time.

Different from writing sentences, users usually annotate an image with different tags. For example, the possibility of using the tags “apple, apple, apple” for an image is very low. Therefore, in this paper, we do not employ any weighting scheme for tags, such as tf-idf [36].

Heuristic 2. The order of the tags may reflect their correlation with the annotated image.

Different tags reflect different aspects of an image. According to Heuristic 1, the weight of a tag with respect to the image cannot be obtained. Fortunately, the order of the tags can be obtained, since users provide tags one by one.

Heuristic 3. The number of tags of an image may not be relevant to the annotation correctness.

Different users may give different tags to the same image. For example, users may give tags such as “apple, iPhone” or “iPhone4, mobile, phone” to the same image of an iPhone. It is hard to say which annotation is better, even though the latter has three tags.

Heuristic 4. Usually, some tags may be redundant for annotating an image.

Users may give similar tags to an image. For example, the tags “apple, iPhone” may be redundant, since iPhone is semantically very similar to apple.

Heuristic 5. Usually, some tags may be noisy for annotating an image.

Users may give inappropriate or even false tags to an image. For example, the tag “iPhone” is false for an image of an iPod.

4. Computation Model

In this section, we propose the computation model for measuring semantic relatedness between images. Based on the above five heuristics, the social tags provided by users are used in our computation model. Overall, the proposed computation model is divided into three steps.

(1) Tag relatedness computation. In this step, based on Heuristic 1, the relatedness of all of the tag pairs between two images is computed.

(2) Semantic relatedness integration. In this step, based on Heuristics 3–5, we measure the semantic relatedness between images.

(3) Tag order revision. In this step, based on Heuristic 2, the image relatedness from step 2 is revised.

Table 1 shows the variables and parameters used in the following discussion. Figure 3 illustrates an overview of the proposed computation model.

4.1. Tag Relatedness Computation. According to Definition 1, an image can be represented as a set of tags provided by users. To obtain the semantic relatedness of a pair of images, we can measure the semantic relatedness between the tags of these images. For example, for two images with the tags “apple, iPhone” and “iPod, Nano,” we can measure the semantic relatedness between these tags. Since each tag usually appears only once according to Heuristic 1, the semantic relatedness between tags can be computed without considering their weight.

Many different methods for measuring semantic relatedness between concepts have been proposed, which can be divided into two kinds [37]: taxonomy-based methods and web-based methods. Taxonomy-based methods use information theory and a hierarchical taxonomy such as WordNet to measure semantic relatedness. In contrast, web-based methods use the web as a live and active corpus instead of a hierarchical taxonomy.

Figure 3: The illustration of the proposed method. [Pipeline with six stages: (a) input, (b) tag extraction, (c) page counts repository, (d) tag relatedness computation, (e) assignment in bipartite graph, and (f) applications (searching, recommendation, clustering). The example uses the images $f_1$ (Big Ben, night, clock, tower, England) and $f_2$ (London eye).]

Table 1: The variables and parameters used in the proposed computation model.

| Name | Description |
|---|---|
| $f$ | An image |
| $t$ | A tag |
| $s(f)$ | Tags set of an image |
| $\text{sr}(t_1, t_2)$ | Semantic relatedness of two tags |
| $\text{sr}(f_1, f_2)$ | Semantic relatedness of two images |
| $N(t)$ | Page counts of a tag |
| $N(s(f))$ | Set of page counts of an image |
| $\text{Pos}(t)$ | Position information of a tag |

In the proposed computation model each tag can beseen as a concept with explicit meaning Thus we use someequations based on cooccurrence of two concepts to measuretheir semantic relatedness The core idea is that ldquoyou shallknow a word by the company it keepsrdquo [38] In this section

four popular cooccurrence measures (ie Jaccard OverlapDice and PMI) are proposed tomeasure semantic relatednessbetween tags

Besides cooccurrence measures, the page counts of each tag from a search engine are used. Page counts mean the number of web pages containing the query $q$. For example, the page counts of the query “Obama” in Google (http://www.google.com) are 1,210,000,000 (the data was retrieved on 9/28/2012). For the remainder of this paper, we use the notation $N(p)$ to denote the page counts of the tag $p$ in Google. However, the respective page counts of the tags $p$ and $q$ alone are not enough for measuring semantic relatedness. The page counts for the query “$q$ AND $p$,” which can be considered as a measure of the cooccurrence of $q$ and $p$, should also be considered. For example, when we query “Obama” and “United States” in Google, we can find 485,000,000 Web pages, that is, $N(\text{Obama} \cap \text{United States}) = 485{,}000{,}000$. The four cooccurrence measures (i.e., Jaccard, Overlap, Dice, and PMI) between two tags $p$ and $q$ are as follows:

$$\mathrm{Jaccard}(p, q) = \frac{N(p \cap q)}{N(p) + N(q) - N(p \cap q)}, \tag{2}$$

where $p \cap q$ denotes the conjunction query “$p$ AND $q$.” Consider

$$\mathrm{Overlap}(p, q) = \frac{N(p \cap q)}{\min(N(p), N(q))}, \tag{3}$$

where $\min(N(p), N(q))$ means the lower number of $N(p)$ or $N(q)$. Consider

$$\mathrm{Dice}(p, q) = \frac{2 \cdot N(p \cap q)}{N(p) + N(q)}. \tag{4}$$

According to probability and information theory, the mutual information (MI) of two random variables is a quantity that measures the mutual dependence of the two variables. Pointwise mutual information (PMI) is a variant of MI (see (5)):

$$\mathrm{PMI}(p, q) = \frac{\log\left(\dfrac{N \cdot N(p \cap q)}{N(p) \cdot N(q)}\right)}{\log N}, \tag{5}$$

where $N$ is the number of Web pages in the search engine, which is set to $N = 10^{11}$ according to the number of indexed pages reported by Google.
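As a minimal sketch, the four measures in (2)–(5) can be written directly in terms of page counts. The function names, the zero-count handling, and the page count assumed for “United States” below are illustrative assumptions; only the two Obama counts and $N = 10^{11}$ come from the text.

```python
import math

N_TOTAL = 10**11  # N in Eq. (5), the assumed number of indexed pages


def jaccard(n_p, n_q, n_pq):
    """Eq. (2): N(p AND q) / (N(p) + N(q) - N(p AND q))."""
    denom = n_p + n_q - n_pq
    return n_pq / denom if denom > 0 else 0.0


def overlap(n_p, n_q, n_pq):
    """Eq. (3): N(p AND q) / min(N(p), N(q))."""
    denom = min(n_p, n_q)
    return n_pq / denom if denom > 0 else 0.0


def dice(n_p, n_q, n_pq):
    """Eq. (4): 2 * N(p AND q) / (N(p) + N(q))."""
    denom = n_p + n_q
    return 2 * n_pq / denom if denom > 0 else 0.0


def pmi(n_p, n_q, n_pq, n_total=N_TOTAL):
    """Eq. (5). Mapping zero counts to 0.0 is an assumption, not from the text."""
    if min(n_p, n_q, n_pq) <= 0:
        return 0.0
    return math.log(n_total * n_pq / (n_p * n_q)) / math.log(n_total)


# Page counts quoted in the text for "Obama" and "Obama AND United States".
n_obama = 1_210_000_000
n_both = 485_000_000
# Hypothetical page count for "United States" (not given in the text).
n_us = 5_000_000_000

for measure in (jaccard, overlap, dice, pmi):
    print(measure.__name__, measure(n_obama, n_us, n_both))
```

Note that (5) as written is not clipped, so it can become negative when two tags cooccur less often than expected by chance.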

Through (2)–(5), we can compute the tag relatedness as follows.

(1) Extract the tags from two images $f_1$ and $f_2$, which are denoted by

$$\begin{aligned} s(f_1) &= \{t_1, t_2, \ldots, t_{|s(f_1)|}\}, \\ s(f_2) &= \{t_1, t_2, \ldots, t_{|s(f_2)|}\}. \end{aligned} \tag{6}$$

(2) Issue the tags from $f_1$ and $f_2$ as queries to the web search engine (in this paper, we choose Google for its convenient API (http://developers.google.com)). The page counts can be denoted by

$$\begin{aligned} N(s(f_1)) &= \{N(t_1), N(t_2), \ldots, N(t_{|s(f_1)|})\}, \\ N(s(f_2)) &= \{N(t_1), N(t_2), \ldots, N(t_{|s(f_2)|})\}. \end{aligned} \tag{7}$$

(3) Compute the semantic relatedness between each tag pair from $f_1$ and $f_2$ by (2)–(5). For example, if we use PMI to compute the tag semantic relatedness, the equation can be

$$\mathrm{sr}(t_i, t_j) = \frac{\log\left(\dfrac{N \cdot N(t_i \cap t_j)}{N(t_i) \cdot N(t_j)}\right)}{\log N}, \quad t_i \in s(f_1) \wedge t_j \in s(f_2). \tag{8}$$

From the above steps, the tag relatedness can be computed, which is denoted as a triple $\langle t_i, t_j, \mathrm{sr}(t_i, t_j) \rangle$. In the next section, we will give the detailed analysis for choosing the best measure from (2)–(5).
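The three steps above can be sketched as follows, assuming that the page counts have already been retrieved and stored in two lookup tables. The dictionaries `single_counts` and `pair_counts`, the function names, and all counts are hypothetical; no real search engine API is called here.

```python
import math

N_TOTAL = 10**11  # N in Eq. (8)


def pmi(n_p, n_q, n_pq, n_total=N_TOTAL):
    """PMI-based tag relatedness; zero counts are mapped to 0.0 (an assumption)."""
    if min(n_p, n_q, n_pq) <= 0:
        return 0.0
    return math.log(n_total * n_pq / (n_p * n_q)) / math.log(n_total)


def tag_pair_relatedness(tags1, tags2, single_counts, pair_counts):
    """Return the triples <t_i, t_j, sr(t_i, t_j)> for all tag pairs of two images."""
    triples = []
    for t_i in tags1:
        for t_j in tags2:
            # Conjunction queries are unordered, so a frozenset is used as the key.
            n_ij = pair_counts.get(frozenset((t_i, t_j)), 0)
            score = pmi(single_counts[t_i], single_counts[t_j], n_ij)
            triples.append((t_i, t_j, score))
    return triples


# Hypothetical page counts for illustration only.
single_counts = {"apple": 2_000_000_000, "iPhone": 1_500_000_000,
                 "iPod": 400_000_000, "Nano": 300_000_000}
pair_counts = {
    frozenset(("apple", "iPod")): 150_000_000,
    frozenset(("apple", "Nano")): 40_000_000,
    frozenset(("iPhone", "iPod")): 120_000_000,
}

for t_i, t_j, score in tag_pair_relatedness(
        ["apple", "iPhone"], ["iPod", "Nano"], single_counts, pair_counts):
    print(t_i, t_j, round(score, 3))
```

Caching the page counts in this way keeps the number of search engine queries proportional to the number of distinct tags and tag pairs.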

Overall, the page counts of each tag are obtained first. Then, cooccurrence based measures are used to compute the semantic relatedness between tags. The reasons for using page counts based measures are as follows.

(1) Appropriate computation complexity. Since the relatedness between each tag pair of two images should be computed, the proposed method must have low complexity. Web search engines such as Google provide an API for users to retrieve the page counts of each query, which gives an appropriate interface for the proposed computation model.

(2) Explicit semantics. The tag given by users may not be a correct concept in a taxonomy. For example, users may give the tag “Bling Bling” to an image about a lovely girl. The word “Bling” cannot be found in many taxonomies such as WordNet. The proposed method uses the web search engine as an open intermediary, so the explicit semantics of newly emerging concepts can be obtained from the web easily.

4.2. Semantic Relatedness Integration. In Section 4.1, we compute the tag pair relatedness of two images. The tag pair relatedness of two images $f_1$ and $f_2$ can be treated as a bipartite graph, which is denoted by

$$\begin{aligned} G &= (V, E), \\ V &= \{f_1, f_2\}, \\ E &= \{\langle t_i, t_j, \mathrm{sr}(t_i, t_j) \rangle \mid t_i \in s(f_1) \wedge t_j \in s(f_2)\}. \end{aligned} \tag{9}$$

Based on (9), we change the semantic relatedness integration of all tag pairs to the problem of assignment in a bipartite graph. We want to find a best matching of the bipartite graph $G$.

A matching is defined as $M \subseteq E$ so that no two edges in $M$ share a common end vertex. An assignment in a bipartite graph is a matching $M$ so that each node of the graph has an incident edge in $M$. Suppose that the set of vertices is partitioned into two sets $f_1$ and $f_2$ and that the edges of the graph have an associated weight given by a function $f : (f_1, f_2) \to [0 \cdots 1]$. The function $\mathrm{maxRel} : (f, f_1, f_2) \to [0 \cdots 1]$ returns the maximum weighted assignment, that is, an assignment so that the average of the weights of the edges is highest. Figure 4 shows a graphical representation of the semantic relatedness integration, where the bold lines constitute the matching $M$.

Based on the expression of the assignment in bipartite graphs, we have

$$\mathrm{maxRel}(f, f_1, f_2) = \begin{cases} \dfrac{\max\left(\sum_{i \in I, j \in J} \mathrm{sr}(t_i, t_j)\right)}{|s(f_1)|}, & |s(f_1)| \le |s(f_2)|, \\[2ex] \dfrac{\max\left(\sum_{i \in I, j \in J} \mathrm{sr}(t_i, t_j)\right)}{|s(f_2)|}, & |s(f_1)| > |s(f_2)|, \end{cases} \quad I = [1 \cdots |s(f_1)|],\ J = [1 \cdots |s(f_2)|]. \tag{10}$$

Applying the assignment in bipartite graphs problem to our context, the variables $f_1$ and $f_2$ represent the two images whose semantic relatedness is to be computed. For example, suppose that $f_1$ and $f_2$ are composed of the tags $s(f_1)$ and $s(f_2)$; $|s(f_1)| > |s(f_2)|$ means that the number of tags in $s(f_2)$ is lower than that of $s(f_1)$. According to Heuristic 3, we divide the result of the maximization by the lower cardinality of $s(f_1)$ or $s(f_2)$. In this way, the influence of the number of tags is reduced and the semantic relatedness of two images is symmetric.

Figure 4: Graphical representation of the assignment in bipartite graphs problem. The tags $t_1$, $t_2$, and $t_3$ of $f_1$ are connected to the tags $q_1$, $q_2$, $q_3$, and $q_4$ of $f_2$ by weighted edges, and $\mathrm{maxRel}(f_1, f_2) = (\mathrm{sr}(t_1, q_1) + \mathrm{sr}(t_2, q_2) + \mathrm{sr}(t_3, q_4))/3 = (1.0 + 0.8 + 0.7)/3 = 0.83$.

Besides the cardinality of the two tag sets $s(f_1)$ and $s(f_2)$, the maxRel function is affected by the relatedness between each pair of tags. According to Heuristics 4 and 5, redundancy and noise should be avoided. In the maxRel function, a one-to-one map is applied to the tags $s(f_1)$ and $s(f_2)$. Thus, the proposed maxRel function varies with respect to the nature of the two images.

Adopting the proposed maxRel function, we are sure to find the global maximum relatedness that can be obtained by pairing the elements in the two tag sets. Alternative methods are able to find only a local maximum, since they scroll through the elements in the first set and, after calculating the relatedness with all the elements in the second set, select the one with the maximum relatedness. Since every element in one set must be connected to at most one element in the other set, the result of such a procedure depends on the order in which the comparisons occur. For example, considering the example in Figure 4, $t_1$ will be paired to $q_1$ (weight = 1.0). But when analyzing $t_3$, the maximum weight is with $q_2$ (weight = 0.9). This means that $t_2$ can no longer be paired to $q_2$, even if that weight is its maximum, since $q_2$ is already matched to $t_3$. As a consequence, $t_2$ will be paired to $q_3$, and the average of the selected weights will be $(1.0 + 0.3 + 0.9)/3 = 0.73$, which is considerably lower than using MaxRel, where the average of the weights was $(1.0 + 0.8 + 0.7)/3 = 0.83$.
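The difference between the global and the order-dependent pairing can be reproduced with a small sketch. The weight matrix below is an assumption: only the five weights quoted in the text (1.0, 0.8, 0.3, 0.9, and 0.7) are taken from the example, and the remaining entries are filled in for illustration. The exhaustive search is a simple stand-in for any maximum weight assignment solver and is practical only for a handful of tags.

```python
from itertools import permutations

# Rows are t1..t3 (image f1), columns are q1..q4 (image f2).
# From the text: sr(t1,q1)=1.0, sr(t2,q2)=0.8, sr(t2,q3)=0.3,
# sr(t3,q2)=0.9, sr(t3,q4)=0.7. All other entries are assumed.
W = [
    [1.0, 0.7, 0.2, 0.5],
    [0.1, 0.8, 0.3, 0.1],
    [0.1, 0.9, 0.1, 0.7],
]


def max_rel(weights):
    """Exhaustive maximum weight assignment, averaged over the smaller tag set."""
    rows, cols = len(weights), len(weights[0])
    if rows > cols:  # transpose so that the rows are the smaller side
        weights = [list(col) for col in zip(*weights)]
        rows, cols = cols, rows
    best = max(
        sum(weights[i][j] for i, j in enumerate(perm))
        for perm in permutations(range(cols), rows)
    )
    return best / rows


def greedy_rel(weights, order):
    """Order-dependent pairing: each row takes its best column that is still free."""
    used, total = set(), 0.0
    for i in order:
        free = [c for c in range(len(weights[i])) if c not in used]
        j = max(free, key=lambda c: weights[i][c])
        used.add(j)
        total += weights[i][j]
    return total / len(order)


print(round(max_rel(W), 2))                # 0.83
print(round(greedy_rel(W, [0, 2, 1]), 2))  # 0.73
```

For larger tag sets, a polynomial-time assignment solver such as the Hungarian algorithm would be the natural replacement for the exhaustive search.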

Overall, the cardinality of the two tag sets is used to follow Heuristic 3. The one-to-one map of tag pairs is used to follow Heuristics 4 and 5. The MaxRel function is used to find the best semantic relatedness integration of two images.

4.3. Tag Order Revision. According to Heuristic 2, the order of tags should be considered when computing the semantic relatedness between two images. Intuitively, the tags appearing in the first positions may be more important than the later tags. Some research [39] suggests that people tend to select popular items as their tags. Meanwhile, the top popular tags are indeed the “meaningful” ones.

In this section, the MaxRel function proposed in Section 4.2 is revised considering the order of tags. For example, the relatedness of tag pairs with high positions should be enhanced, which is summarized as a constraint schema.

Schema 1 (tag relatedness declining). This schema means that the relatedness of each tag pair of two images $f_1$ and $f_2$ should be weighted by the positions of its tags in the MaxRel function. In other words, the contribution of a tag pair declines as its tags appear in lower positions.

We add a decline factor to the MaxRel function, and the detailed steps are as follows.

(1) According to the MaxRel function in Section 4.2, the best matching tag pairs are selected, which is denoted by

$$\mathrm{maxRel}(f_1, f_2) = \sum \mathrm{sr}(t_i, t_j), \quad t_i \in s(f_1) \wedge t_j \in s(f_2). \tag{11}$$

The selected tag pairs are the best matching of the bipartite graph between images $f_1$ and $f_2$.

(2) Compute the position information of each tag, which is denoted by $\mathrm{Pos}(t_i)$:

$$\mathrm{Pos}(t_i) = \frac{|s(f)| + 1 - i}{|s(f)|}, \quad t_i \in s(f). \tag{12}$$

(3) Add the position information of each tag to (11), which can be seen as a decline factor:

$$\mathrm{sr}(f_1, f_2) = \sum \mathrm{Pos}(t_i) \cdot \mathrm{sr}(t_i, t_j) \cdot \mathrm{Pos}(t_j), \quad t_i \in s(f_1) \wedge t_j \in s(f_2). \tag{13}$$

(4) Similar to the MaxRel function, the result of the maximization should be normalized. Here it is divided by the sum of the position products of the selected tag pairs:

$$\mathrm{sr}(f_1, f_2) = \frac{\sum \mathrm{Pos}(t_i) \cdot \mathrm{sr}(t_i, t_j) \cdot \mathrm{Pos}(t_j)}{\sum \mathrm{Pos}(t_i) \cdot \mathrm{Pos}(t_j)}. \tag{14}$$

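We again consider the example in Figure 4. According to (14), the semantic relatedness is revised as

$$\left(1 \cdot 1.0 \cdot 1 + \frac{2}{3} \cdot 0.8 \cdot \frac{3}{4} + \frac{1}{3} \cdot 0.7 \cdot \frac{1}{4}\right) \times \left(1 \cdot 1 + \frac{2}{3} \cdot \frac{3}{4} + \frac{1}{3} \cdot \frac{1}{4}\right)^{-1} = 0.92. \tag{15}$$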
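The arithmetic in (15) can be checked with a short sketch of (12) and (14). The function names and the triple-based representation of the matching are illustrative assumptions; the matching itself and its weights are those of the Figure 4 example.

```python
def pos(i, size):
    """Eq. (12): position weight of the i-th tag (1-based) in a list of `size` tags."""
    return (size + 1 - i) / size


def position_weighted_relatedness(matching, size1, size2):
    """Eq. (14). `matching` holds (i, j, sr) triples with 1-based tag positions."""
    num = sum(pos(i, size1) * sr * pos(j, size2) for i, j, sr in matching)
    den = sum(pos(i, size1) * pos(j, size2) for i, j, _ in matching)
    return num / den if den else 0.0


# Best matching of the Figure 4 example: (t1, q1), (t2, q2), (t3, q4),
# where f1 has 3 tags and f2 has 4 tags.
matching = [(1, 1, 1.0), (2, 2, 0.8), (3, 4, 0.7)]
print(round(position_weighted_relatedness(matching, 3, 4), 2))  # 0.92
```

Because the weights in the numerator and the denominator are the same position products, the result is a weighted average of the matched relatedness values and stays within their range.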

Besides adding a decline factor to the MaxRel function, we also add a second constraint schema: identical tag pruning.


Input: The tag sets of two images $f_1$ and $f_2$, which are $s(f_1)$ and $s(f_2)$
Output: The semantic relatedness of two images $f_1$ and $f_2$
for each $t_i \in s(f_1)$   /* page counts and position initial */
    $N(s(f_1)) \leftarrow N(t_i)$
    $\mathrm{Pos}(s(f_1)) \leftarrow \mathrm{Pos}(t_i)$
for each $t_j \in s(f_2)$
    $N(s(f_2)) \leftarrow N(t_j)$
    $\mathrm{Pos}(s(f_2)) \leftarrow \mathrm{Pos}(t_j)$
for each $t_i \in s(f_1)$
    for each $t_j \in s(f_2)$
        if ($t_i == t_j$) $\mathrm{sr}(t_i, t_j) = 0$   /* pruning */
        else $\mathrm{sr}(t_i, t_j) = f(N(t_i), N(t_j))$   /* relatedness */
return $\mathrm{maxRel}(f_1, f_2) = f(\mathrm{Pos}(t_i), \mathrm{Pos}(t_j), \mathrm{sr}(t_i, t_j))$

Algorithm 1: MaxRel.

Schema 2 (identical tag pruning). This schema means that the identical tag pairs of two images $f_1$ and $f_2$ should be pruned in the MaxRel function. In other words, the semantic relatedness of the same tag of two images is set as 0.

The above schema is used to ensure that the proposed method measures the relatedness of two images. If we do not prune the identical tag pairs of two images, the proposed method will be transformed into a similarity measure. For example, the cosine similarity [36] between two tag sets essentially counts the identical elements of two vectors. The overall algorithm of the proposed computation model is presented in Algorithm 1.
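Algorithm 1 can be sketched end to end as follows. This is one possible reading rather than a definitive implementation: it assumes PMI as the tag relatedness function, precomputed page counts in the hypothetical dictionaries `single_counts` and `pair_counts`, an exhaustive search for the best matching on the unweighted relatedness, and the normalization of (14) applied afterwards.

```python
import math
from itertools import permutations

N_TOTAL = 10**11  # N in Eq. (5)


def pmi(n_p, n_q, n_pq, n_total=N_TOTAL):
    if min(n_p, n_q, n_pq) <= 0:
        return 0.0  # assumption: undefined PMI is mapped to 0
    return math.log(n_total * n_pq / (n_p * n_q)) / math.log(n_total)


def pos(i, size):
    return (size + 1 - i) / size  # Eq. (12), i is 1-based


def image_relatedness(tags1, tags2, single_counts, pair_counts):
    """Sketch of Algorithm 1 for two ordered, non-empty tag lists."""
    if len(tags1) > len(tags2):  # keep the shorter list on the row side
        tags1, tags2 = tags2, tags1
    # Pairwise tag relatedness with identical tag pruning (Schema 2).
    sr = [[0.0 if a == b else
           pmi(single_counts[a], single_counts[b],
               pair_counts.get(frozenset((a, b)), 0))
           for b in tags2] for a in tags1]
    # Best one-to-one matching by exhaustive search (small tag lists only).
    rows, cols = len(tags1), len(tags2)
    best = max(permutations(range(cols), rows),
               key=lambda perm: sum(sr[i][j] for i, j in enumerate(perm)))
    # Position-weighted average of the matched pairs, Eq. (14).
    num = sum(pos(i + 1, rows) * sr[i][j] * pos(j + 1, cols)
              for i, j in enumerate(best))
    den = sum(pos(i + 1, rows) * pos(j + 1, cols) for i, j in enumerate(best))
    return num / den


# Hypothetical page counts for illustration only.
single_counts = {"Gucci": 900_000_000, "Trainers": 200_000_000,
                 "Handbags": 350_000_000}
pair_counts = {frozenset(("Gucci", "Handbags")): 60_000_000,
               frozenset(("Trainers", "Handbags")): 5_000_000,
               frozenset(("Gucci", "Trainers")): 20_000_000}
print(image_relatedness(["Gucci", "Trainers"], ["Gucci", "Handbags"],
                        single_counts, pair_counts))
```

Swapping the two tag lists does not change the result in this sketch, since each position weight depends only on the tag's rank within its own list.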

5. Experimental Results

In this section, we evaluate the results of using the proposed method for relatedness measurement. In Section 5.1, we introduce the data set for the evaluation. In Section 5.2, we determine which cooccurrence function to use for tag relatedness measures. In Sections 5.3 and 5.4, clustering and retrieval are used to evaluate the proposed method.

5.1. The Data Sets. We choose Flickr groups as the resources for building data sets. Users on online photo sharing sites like Flickr have organized many millions of photos into hundreds of thousands of semantically themed groups. These groups expose implicit choices that users make about which images are similar. Flickr group membership is usually less noisy than Flickr tags because images are screened by group members. We download 1000 images from ten groups. These ten groups can be divided into two classes. The first class includes five groups, which are car, phone, flower, dog, and boat. The second class consists of another five groups, which are Louis Vuitton, Dior, Gucci, Cartier, and Chanel. These images are selected by humans, which reduces the noise of the data set. The reason why we choose two classes of groups is that we want to test the accuracy of the proposed method against the semantic relatedness of the data set. The semantic relatedness of the second class is higher than that of the first class, since the second class is all about luxury brands. For example, almost all these brands produce handbags. Thus, if the proposed method can do well in these groups, we may say that it can measure the semantic relatedness between Flickr images accurately and robustly. Table 2 gives the detailed information of the data set. Table 3 gives some selected tags from group 2.

Table 2: The detailed information of the data set.

| Group 1 | Average tags per image | Group 2 | Average tags per image |
| --- | --- | --- | --- |
| Car | 4.4 | Louis Vuitton | 3.1 |
| Phone | 3.5 | Dior | 3.2 |
| Flower | 2.2 | Gucci | 2.9 |
| Dog | 5.6 | Cartier | 2.8 |
| Boat | 3.1 | Chanel | 2.6 |

5.2. Relatedness Function Selection. In Section 4.1, four cooccurrence measures (i.e., Jaccard, Overlap, Dice, and PMI) are given for relatedness measures between tags. In [40], Rubenstein and Goodenough proposed a data set containing 28 word pairs rated by a group of 51 human subjects, which is a reliable benchmark for evaluating semantic similarity measures. The higher the correlation coefficient against the R-G ratings is, the more accurate the method for measuring semantic similarity between words is. Figure 5 gives the correlation coefficients of the four functions against the R-G test set. From Figure 5, we can say that PMI performs best on relatedness measures, since it has the highest correlation coefficient. Thus, in the later experiments, we select PMI as the relatedness measure between tags.

Table 3: The selected tags of group 2 from Flickr.

| Group 2 | Tags | Tags | Tags | Tags | Tags |
| --- | --- | --- | --- | --- | --- |
| Louis Vuitton | “Louis Vuitton”, “Keepall” | “Louis Vuitton”, “Alma” | “Louis Vuitton”, “Tivoli” | “Louis Vuitton”, “Bolsas” | “LV”, “Multicolore” |
| Dior | “DIOR”, “lipstick”, “makeup” | “Dior”, “Diorskin Nude”, “Tan Sun Powder” | “Dior”, “Makeup”, “Palette” | “Dior”, “Addict 2” | “Dior”, “Jadore”, “Perfume” |
| Gucci | “Gucci”, “Leather Belts” | “Gucci”, “Trainers” | “Gucci”, “Jolie Leopard”, “Orange” | “Replica”, “Gucci”, “Handbags” | “Gucci”, “Cruise” |
| Cartier | “Cartier”, “Pasha”, “Chronograph” | “CARTIER”, “Love Bracelet” | “Cartier”, “Santos Galbee” | “Calibre”, “Cartier” | “Cartier Watch”, “Tank Francaise” |
| Chanel | “Chanel”, “Coco Noir” | “Chanel”, “Chanel Riva”, “Chanel nail polish” | “Coco Mademoiselle” | “Chanel”, “No 5” | “Chance”, “Chanel” |

5.3. Evaluation on Image Clustering. In this section, we evaluate the correctness of using tag order. In Section 4.3, we add the position information of each tag to the semantic relatedness measures. The tags with high positions are treated as the major elements for semantic relatedness measures. We evaluate the use of tag order by the clustering task. We employ the proposed semantic relatedness of images in the $K$-means [41] clustering model. Since the $K$-means model depends on the initial points, we randomly select the core points 100 times. We evaluate the effectiveness of clustering with three quality measures: $F$-measure, Purity, and Entropy [41]. We treat each cluster as if it were the result of the proposed method and each class as if it were the desired set of images. Generally, we would like to maximize the $F$-measure and Purity and minimize the Entropy of the clusters to achieve a high-quality clustering. Moreover, we compare the clustering results of the proposed method with and without tag order. Figures 6 and 7 give the clustering results of the group 1 and group 2 data sets. From Figures 6 and 7, we can conclude the following.

(1) The proposed method performs better than cosine based clustering. This result can be obtained from Figures 6 and 7. The three metrics, including $F$-measure, purity, and entropy, of the proposed method are better than those of cosine based clustering. This may be caused by the inherent feature of the proposed method. The proposed method is based on semantic relatedness rather than on the cooccurrence used by cosine based clustering. If the tags of two images do not overlap, cosine based clustering cannot relate the two images.

(2) The schema of using tag order is effective. This result can also be obtained from Figures 6 and 7. With tag order, the $F$-measure and purity are the highest and the entropy is the lowest. The position information reflects the importance of each tag. The proposed method emphasizes the tags with high order, which raises the performance on image clustering.

(3) The proposed method is robust on different data sets. The proposed method performs well on both the group 1 and group 2 data sets. It is worth noting that the difference between the proposed method and the cosine method on group 2 is higher than that on group 1. The reason is that the semantic correlation of group 2 is stronger than that of group 1. In other words, the performance of the proposed method relies on the semantic correlation of classes in the data sets. The stronger the semantic correlation between the classes of the data, the larger the advantage of the proposed method.
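For reference, the three quality measures can be computed as in the following sketch. The formulas are the definitions commonly used in the document clustering literature; the exact variants used in the experiments (for example, the logarithm base of the entropy) are not stated in the text, so they are assumptions here, as are the function and variable names.

```python
import math
from collections import Counter


def clustering_quality(clusters, labels):
    """Return (F-measure, purity, entropy) for clusters given as lists of item ids.

    `labels` maps each item id to its true class (here, its Flickr group).
    """
    n = sum(len(c) for c in clusters)
    class_sizes = Counter(labels[x] for c in clusters for x in c)
    purity = entropy = 0.0
    best_f = Counter()  # best F value found for each class over all clusters
    for cluster in clusters:
        if not cluster:
            continue
        counts = Counter(labels[x] for x in cluster)
        size = len(cluster)
        # Purity: weighted share of the dominant class in each cluster.
        purity += max(counts.values()) / n
        # Entropy: weighted class entropy of each cluster (natural logarithm).
        entropy -= (size / n) * sum(
            (k / size) * math.log(k / size) for k in counts.values())
        for cls, k in counts.items():
            precision, recall = k / size, k / class_sizes[cls]
            f = 2 * precision * recall / (precision + recall)
            best_f[cls] = max(best_f[cls], f)
    f_measure = sum(class_sizes[c] / n * best_f[c] for c in class_sizes)
    return f_measure, purity, entropy


# Hypothetical toy data: four images from two groups, clustered perfectly.
labels = {1: "car", 2: "car", 3: "dog", 4: "dog"}
print(clustering_quality([[1, 2], [3, 4]], labels))
```

A perfect clustering yields an $F$-measure and purity of 1 and an entropy of 0, which matches the direction of the comparison in Figures 6 and 7.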

5.4. Evaluation on Image Searching. In this section, we evaluate the proposed method on a query-based image searching task. Five queries from group 2 are selected as the test set, including “Louis Vuitton,” “Gucci,” “Chanel,” “Cartier,” and “Dior.” These queries are searched in Flickr, and the top 50 images are obtained as the data set. Moreover, we remove the query from the tags of each image. For example, the tag “Cartier” is removed from the top 50 images of the query “Cartier.” The reason for this operation is that the proposed method is based on semantic relatedness rather than on cooccurrence. We choose cut-off point precision to evaluate the proposed method on image searching. The cut-off point precision (P@n) means the percentage of correct results among the top $n$ returned results. We compute the P@1, P@5, and P@10 of the group 2 test set. Table 4 lists the comparison of the cut-off point precision between the proposed method and Flickr. From the experimental results, we can conclude the following.

(1) The proposed method performs better than Flickr. In Table 4, the P@5 and P@10 of the proposed method are higher than those of Flickr for every query, and its P@1 is at least as high. The experimental results support the correctness of the proposed method on the image searching task.

(2) The proposed method can handle the relatedness searching problem. The proposed method can measure the semantic relatedness of two images robustly and correctly.

(3) The proposed method can support the faceted exploration of image search. Faceted exploration of search results is widely used in search interfaces for structured databases. Recently, faceted exploration has also been appearing in online search engines in the form of search assistants. The proposed method can measure the semantic relatedness of two images. Given the search queries, we can select the related images for faceted search.


Table 4: The comparison of the cut-off point precision (%) between the proposed method and Flickr.

| Cut-off point | Louis Vuitton | Gucci | Dior | Chanel | Cartier |
| --- | --- | --- | --- | --- | --- |
| P@1 | 100 | 100 | 100 | 100 | 100 |
| P@1 (Flickr) | 100 | 100 | 0 | 100 | 100 |
| P@5 | 100 | 100 | 100 | 100 | 100 |
| P@5 (Flickr) | 80 | 60 | 60 | 60 | 80 |
| P@10 | 100 | 100 | 100 | 100 | 100 |
| P@10 (Flickr) | 90 | 70 | 70 | 80 | 80 |
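The cut-off point precision used in Table 4 can be computed as in the following sketch. The function name and the relevance judgments are hypothetical and are not taken from the experiments.

```python
def precision_at_n(relevance, n):
    """Percentage of correct results among the top n returned results."""
    if n <= 0:
        return 0.0
    # Missing results (fewer than n returned) count as incorrect.
    return 100.0 * sum(relevance[:n]) / n


# Hypothetical relevance judgments for the top 10 results of one query
# (True means the result was judged correct).
judged = [True, True, False, True, True, True, False, True, True, True]
print(precision_at_n(judged, 1))   # 100.0
print(precision_at_n(judged, 5))   # 80.0
print(precision_at_n(judged, 10))  # 80.0
```

With such judgments, each cell of Table 4 corresponds to one call of this function for one query and one cut-off point.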

6. Conclusions

This paper discusses semantic relatedness measures, puts forward a method to measure the semantic relatedness of two images based on their tags, and justifies its validity through experiments. The major contributions are summarized as follows.

(1) We propose a framework to measure semantic relatedness between Flickr images using tags. Firstly, the cooccurrence measures are used to compute the relatedness of tags between two images. Secondly, we transform the tag relatedness integration to the assignment in bipartite graph problem, which can find an appropriate matching for the semantic relatedness of images. Finally, a decline factor considering the position information of tags is used in the proposed framework, which reduces the noise and redundancy in the social tags.

(2) A real data set including 1000 images from Flickr with ten classes is used in our experiments. Two evaluation tasks, clustering and searching, are performed, which show that the proposed method can measure the semantic relatedness between Flickr images accurately and robustly.

(3) We extend the relatedness measures between concepts to the level of images. Since the association relation is the basic mechanism of the brain, the proposed relatedness measurement can facilitate related applications such as searching and recommendation.

Figure 5: The correlation of four selected functions. Bar values in axis order: Jaccard 0.346, Dice 0.395, Overlap 0.421, and PMI 0.579.

Figure 6: The clustering results of group 1 data sets. $F$-measure, purity, and entropy: using tag order 0.912, 0.967, and 0.011; not using tag order 0.857, 0.922, and 0.018; cosine 0.732, 0.751, and 0.056.

Figure 7: The clustering results of group 2 data sets. $F$-measure, purity, and entropy: using tag order 0.876, 0.927, and 0.023; not using tag order 0.827, 0.852, and 0.031; cosine 0.632, 0.655, and 0.085.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported in part by the National Science and Technology Major Project under Grant no. 2013ZX01033002-003, in part by the National High Technology Research and Development Program of China (863 Program) under Grant nos. 2013AA014601 and 2013AA014603, in part by the National Key Technology Support Program under Grant no. 2012BAH07B01, in part by the National Science Foundation of China under Grant no. 61300202, and in part by the Science Foundation of Shanghai under Grant no. 13ZR1452900.

References

[1] J. Goldberger, S. Gordon, and H. Greenspan, “Unsupervised image-set clustering using an information theoretic framework,” IEEE Transactions on Image Processing, vol. 15, no. 2, pp. 449–458, 2006.

[2] T. Evgeniou, M. Pontil, C. Papageorgiou, and T. Poggio, “Image representations and feature selection for multimedia database search,” IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 4, pp. 911–920, 2003.


[3] R. Ji, H. Yao, X. Sun, B. Zhong, and W. Gao, “Towards semantic embedding in visual vocabulary,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR ’10), pp. 918–925, June 2010.

[4] J. Fan, D. A. Keim, Y. Gao, H. Luo, and Z. Li, “JustClick: personalized image recommendation via exploratory search from large-scale Flickr images,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 19, no. 2, pp. 273–288, 2009.

[5] T. Gong, S. Li, and C. L. Tan, “A semantic similarity language model to improve automatic image annotation,” in Proceedings of the 22nd International Conference on Tools with Artificial Intelligence (ICTAI ’10), pp. 197–203, October 2010.

[6] C. Schmid and R. Mohr, “Local grayvalue invariants for image retrieval,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 5, pp. 530–535, 1997.

[7] M. Varma and A. Zisserman, “A statistical approach to texture classification from single images,” International Journal of Computer Vision, vol. 62, no. 1-2, pp. 61–81, 2005.

[8] S. Belongie, J. Malik, and J. Puzicha, “Shape matching and object recognition using shape contexts,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 4, pp. 509–522, 2002.

[9] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR ’05), vol. 1, pp. 886–893, June 2005.

[10] D. Huang, M. Ardabilian, Y. Wang, and L. Chen, “Asymmetric 3D/2D face recognition based on LBP facial representation and canonical correlation analysis,” in Proceedings of the 16th IEEE International Conference on Image Processing (ICIP ’09), pp. 3325–3328, November 2009.

[11] L. Wang, Y. Zhang, and J. Feng, “On the Euclidean distance of images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 8, pp. 1334–1339, 2005.

[12] W. Jia, H. Zhang, X. He, and Q. Wu, “Gaussian weighted histogram intersection for license plate classification,” in Proceedings of the 18th International Conference on Pattern Recognition (ICPR ’06), pp. 574–577, August 2006.

[13] Y. Rubner, C. Tomasi, and L. J. Guibas, “A metric for distributions with applications to image databases,” in Proceedings of the IEEE 6th International Conference on Computer Vision, pp. 59–66, January 1998.

[14] L. Wu, X.-S. Hua, N. Yu, W.-Y. Ma, and S. Li, “Flickr distance: a relationship measure for visual concepts,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 5, pp. 863–875, 2012.

[15] D. Cai, “An information-theoretic foundation for the measurement of discrimination information,” IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 9, pp. 1262–1273, 2010.

[16] P. van den Broek, “Using texts in science education: cognitive processes and knowledge representation,” Science, vol. 328, no. 5977, pp. 453–456, 2010.

[17] C. Bizer, T. Heath, and T. Berners-Lee, “Linked data - the story so far,” International Journal on Semantic Web and Information Systems, vol. 5, no. 3, pp. 1–22, 2009.

[18] H. Zhuge, “Communities and emerging semantics in semantic link network: discovery and learning,” IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 6, pp. 785–799, 2009.

[19] H. Zhuge, “Semantic linking through spaces for cyber-physical-socio intelligence: a methodology,” Artificial Intelligence, vol. 175, no. 5-6, pp. 988–1019, 2011.

[20] X. Luo, Z. Xu, J. Yu, and X. Chen, “Building association link network for semantic link on web resources,” IEEE Transactions on Automation Science and Engineering, vol. 8, no. 3, pp. 482–494, 2011.

[21] S. A. Golder and B. A. Huberman, “Usage patterns of collaborative tagging systems,” Journal of Information Science, vol. 32, no. 2, pp. 198–208, 2006.

[22] H. S. Al-Khalifa and H. C. Davis, “Measuring the semantic value of folksonomies,” in Proceedings of the Innovations in Information Technology (IIT ’06), pp. 1–5, November 2006.

[23] F. M. Suchanek, M. Vojnovic, and D. Gunawardena, “Social tags: meaning and suggestions,” in Proceedings of the 17th ACM Conference on Information and Knowledge Management (CIKM ’08), pp. 223–232, October 2008.

[24] H. Halpin, V. Robu, and H. Shepherd, “The complex dynamics of collaborative tagging,” in Proceedings of the 16th International World Wide Web Conference (WWW ’07), pp. 211–220, May 2007.

[25] C. Cattuto, C. Schmitz, A. Baldassarri, et al., “Network properties of folksonomies,” AI Communications, vol. 20, no. 4, pp. 245–262, 2007.

[26] R. Lambiotte and M. Ausloos, “Collaborative tagging as a tripartite network,” in Computational Science, vol. 3393 of Lecture Notes in Computer Science, pp. 1114–1117, 2006.

[27] U. Maulik, S. Bandyopadhyay, and I. Saha, “Integrating clustering and supervised learning for categorical data analysis,” IEEE Transactions on Systems, Man, and Cybernetics A, vol. 40, no. 4, pp. 664–675, 2010.

[28] D. Ramage, P. Heymann, C. D. Manning, and H. Garcia-Molina, “Clustering the tagged web,” in Proceedings of the 2nd ACM International Conference on Web Search and Data Mining (WSDM ’09), pp. 54–63, February 2009.

[29] D. Zhou, J. Bian, S. Zheng, H. Zha, and C. L. Giles, “Exploring social annotations for information retrieval,” in Proceedings of the 17th International Conference on World Wide Web (WWW ’08), pp. 715–724, April 2008.

[30] S. Xu, S. Bao, Y. Cao, and Y. Yu, “Using social annotations to improve language model for information retrieval,” in Proceedings of the 16th ACM Conference on Information and Knowledge Management (CIKM ’07), pp. 1003–1006, November 2007.

[31] S. Bao, G. Xue, X. Wu, Y. Yu, B. Fei, and Z. Su, “Optimizing web search using social annotations,” in Proceedings of the 16th International World Wide Web Conference (WWW ’07), pp. 501–510, May 2007.

[32] R. Schenkel, T. Crecelius, M. Kacimi, et al., “Efficient top-k querying over social-tagging networks,” in Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (ACM SIGIR ’08), pp. 523–530, July 2008.

[33] N. Rasiwasia, P. J. Moreno, and N. Vasconcelos, “Bridging the gap: query by semantic example,” IEEE Transactions on Multimedia, vol. 9, no. 5, pp. 923–938, 2007.

[34] R. Fergus, L. Fei-Fei, P. Perona, and A. Zisserman, “Learning object categories from Google’s image search,” in Proceedings of the 10th IEEE International Conference on Computer Vision (ICCV ’05), vol. 2, pp. 1816–1823, October 2005.

12 The Scientific World Journal

[35] G. Wang, D. Hoiem, and D. Forsyth, “Learning image similarity from flickr groups using fast kernel machines,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 11, pp. 2177–2188, 2012.

[36] G. Salton, A. Wong, and C. S. Yang, “A vector space model for automatic indexing,” Communications of the ACM, vol. 18, no. 11, pp. 613–620, 1975.

[37] Z. Xu, X. Luo, J. Yu, and W. Xu, “Measuring semantic similarity between words by removing noise and redundancy in web snippets,” Concurrency Computation Practice and Experience, vol. 23, no. 18, pp. 2496–2510, 2011.

[38] R. Firth, “A synopsis of linguistic theory 1930–1955,” in Studies in Linguistic Analysis, Philological Society, Oxford, UK, 1957.

[39] M. Vojnovic, J. Cruise, D. Gunawardena, and P. Marbach, “Ranking and suggesting popular items,” IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 8, pp. 1133–1146, 2009.

[40] H. Rubenstein and B. Goodenough, “Contextual correlates of synonymy,” Communications of the ACM, vol. 8, no. 10, pp. 627–633, 1965.

[41] M. Steinbach, G. Karypis, and V. Kumar, “A comparison of document clustering techniques,” in Proceedings of the KDD Workshop on Text Mining, 2000.


Figure 3: The illustration of the proposed method. Panels: (a) input; (b) tag extraction; (c) page counts repository; (d) tag relatedness computation; (e) assignment in bipartite graph; (f) applications (searching, recommendation, and clustering). In the example, image $f_1$ carries the tags “Big Ben,” “night,” “clock,” “tower,” and “England,” and image $f_2$ carries the tag “London eye.”

Table 1: The variables and parameters used in the proposed computation model.

Name                        Description
$f$                         An image
$t$                         A tag
$s(f)$                      Tags set of an image
$\mathrm{sr}(t_1, t_2)$     Semantic relatedness of two tags
$\mathrm{sr}(f_1, f_2)$     Semantic relatedness of two images
$N(t)$                      Page counts of a tag
$N(s(f))$                   Set of page counts of an image
$\mathrm{pos}(t)$           Position information of a tag

into two aspects [37]: taxonomy-based methods and web-based methods. Taxonomy-based methods use information theory and a hierarchical taxonomy such as WordNet to measure semantic relatedness. In contrast, web-based methods use the web as a live and active corpus instead of a hierarchical taxonomy.

In the proposed computation model, each tag can be seen as a concept with explicit meaning. Thus, we use equations based on the cooccurrence of two concepts to measure their semantic relatedness. The core idea is that “you shall know a word by the company it keeps” [38]. In this section, four popular cooccurrence measures (i.e., Jaccard, Overlap, Dice, and PMI) are used to measure the semantic relatedness between tags.

Besides the cooccurrence measures, the page counts of each tag from a search engine are used. The page count of a query $q$ is the number of web pages containing $q$. For example, the page count of the query “Obama” in Google (http://www.google.com) is 1,210,000,000 (retrieved on 9/28/2012). For the remainder of this paper, we use the notation $N(p)$ to denote the page count of the tag $p$ in Google. However, the individual page counts of the tags $p$ and $q$ are not enough for measuring semantic relatedness. The page count for the query “$q$ AND $p$,” which can be considered as a measure of the cooccurrence of $q$ and $p$, should also be considered. For example, when we query “Obama” and “United States” in Google, we find 485,000,000 web pages; that is, $N(\text{Obama} \cap \text{United States}) = 485{,}000{,}000$. The four cooccurrence measures (i.e., Jaccard, Overlap, Dice, and PMI) between two tags $p$ and $q$ are as follows:

$$\mathrm{Jaccard}(p, q) = \frac{N(p \cap q)}{N(p) + N(q) - N(p \cap q)}, \tag{2}$$

where $p \cap q$ denotes the conjunction query “$p$ AND $q$.” Consider

$$\mathrm{Overlap}(p, q) = \frac{N(p \cap q)}{\min(N(p), N(q))}, \tag{3}$$

where $\min(N(p), N(q))$ is the lower of $N(p)$ and $N(q)$. Consider

$$\mathrm{Dice}(p, q) = \frac{2 \cdot N(p \cap q)}{N(p) + N(q)}. \tag{4}$$

According to probability and information theory, the mutual information (MI) of two random variables is a quantity that measures the mutual dependence of the two variables. Pointwise mutual information (PMI) is a variant of MI (see (5)):

$$\mathrm{PMI}(p, q) = \frac{\log\left(\dfrac{N \cdot N(p \cap q)}{N(p) \cdot N(q)}\right)}{\log N}, \tag{5}$$

where $N$ is the number of web pages in the search engine, which is set to $N = 10^{11}$ according to the number of indexed pages reported by Google.
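The four measures in (2)–(5) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names are ours, the zero-count conventions are assumptions, and the page count used for “United States” alone is a hypothetical value, since the text reports only $N(\text{Obama})$ and the joint count.

```python
import math

# N in equation (5): the number of indexed pages, set to 10^11 in the text.
TOTAL_PAGES = 10 ** 11


def jaccard(n_p, n_q, n_pq):
    """Equation (2): joint count over the size of the union."""
    denom = n_p + n_q - n_pq
    return n_pq / denom if denom > 0 else 0.0


def overlap(n_p, n_q, n_pq):
    """Equation (3): joint count over the smaller individual count."""
    smaller = min(n_p, n_q)
    return n_pq / smaller if smaller > 0 else 0.0


def dice(n_p, n_q, n_pq):
    """Equation (4): twice the joint count over the sum of the counts."""
    total = n_p + n_q
    return 2 * n_pq / total if total > 0 else 0.0


def pmi(n_p, n_q, n_pq, total_pages=TOTAL_PAGES):
    """Equation (5): log-ratio of observed to expected cooccurrence, scaled by log N.

    Returning 0.0 when any count is zero is a convention assumed here;
    the text does not say how zero counts are handled.
    """
    if n_p <= 0 or n_q <= 0 or n_pq <= 0:
        return 0.0
    return math.log(total_pages * n_pq / (n_p * n_q)) / math.log(total_pages)


# Counts from the text, plus one hypothetical value.
n_obama = 1_210_000_000
n_obama_and_us = 485_000_000
n_us = 5_000_000_000  # hypothetical: not reported in the text

for measure in (jaccard, overlap, dice, pmi):
    print(measure.__name__, measure(n_obama, n_us, n_obama_and_us))
```

Note that the PMI in (5) can be negative when two tags cooccur less often than chance would predict, whereas the other three measures always lie in $[0, 1]$.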

Through (2)–(5), we can compute the tag relatedness as follows.

(1) Extract the tags from the two images $f_1$ and $f_2$, which are denoted by

$$\begin{aligned} s(f_1) &= \{t_1, t_2, \ldots, t_{|s(f_1)|}\}, \\ s(f_2) &= \{t_1, t_2, \ldots, t_{|s(f_2)|}\}. \end{aligned} \tag{6}$$

(2) Issue the tags from $f_1$ and $f_2$ as queries to the web search engine (in this paper, we choose Google for its convenient API (http://developers.google.com)). The page counts can be denoted by

$$\begin{aligned} N(s(f_1)) &= \{N(t_1), N(t_2), \ldots, N(t_{|s(f_1)|})\}, \\ N(s(f_2)) &= \{N(t_1), N(t_2), \ldots, N(t_{|s(f_2)|})\}. \end{aligned} \tag{7}$$

(3) Compute the semantic relatedness between each tag pair from $f_1$ and $f_2$ by (2)–(5). For example, if we use PMI to compute the tag semantic relatedness, the equation is

$$\mathrm{sr}(t_i, t_j) = \frac{\log\left(\dfrac{N \cdot N(t_i \cap t_j)}{N(t_i) \cdot N(t_j)}\right)}{\log N}, \quad t_i \in s(f_1) \wedge t_j \in s(f_2). \tag{8}$$

From the above steps, the tag relatedness can be computed, which is denoted as a triple $\langle t_i, t_j, \mathrm{sr}(t_i, t_j) \rangle$. In the next section, we give a detailed analysis for choosing the best measure from (2)–(5).

Overall, the page counts of each tag are obtained first. Then, cooccurrence-based measures are used to compute the semantic relatedness between tags. The reasons for using page-count-based measures are as follows.

(1) Appropriate computation complexity. Since the relatedness between each tag pair of two images should be computed, the proposed method must have low complexity. Web search engines such as Google provide an API for users to obtain the page counts of each query. The web search engine thus gives an appropriate interface for the proposed computation model.

(2) Explicit semantics. The tag given by users may not be a correct concept in a taxonomy. For example, users may give the tag “Bling Bling” to an image of a lovely girl. The word “Bling” cannot be found in many taxonomies such as WordNet. The proposed method uses the web search engine as an open intermediary, so the explicit semantics of newly emerging concepts can easily be obtained from the web.

4.2. Semantic Relatedness Integration. In Section 4.1, we compute the tag pair relatedness of two images. Obviously, the tag pair relatedness of two images $f_1$ and $f_2$ can be treated as a bipartite graph, which is denoted by

$$\begin{aligned} G &= (V, E), \\ V &= \{f_1, f_2\}, \\ E &= \{\langle t_i, t_j, \mathrm{sr}(t_i, t_j) \rangle \mid t_i \in s(f_1) \wedge t_j \in s(f_2)\}. \end{aligned} \tag{9}$$

Based on (9), we turn the semantic relatedness integration of all tag pairs into the problem of assignment in a bipartite graph. We want to find the best matching of the bipartite graph $G$.

A matching is defined as $M \subseteq E$ such that no two edges in $M$ share a common end vertex. An assignment in a bipartite graph is a matching $M$ such that each node of the graph has an incident edge in $M$. Suppose that the set of vertices is partitioned into two sets $f_1$ and $f_2$ and that the edges of the graph have an associated weight given by a function $f : (f_1, f_2) \rightarrow [0 \cdots 1]$. The function $\mathrm{maxRel} : (f, f_1, f_2) \rightarrow [0 \cdots 1]$ returns the maximum weighted assignment, that is, an assignment such that the average of the weights of the edges is highest. Figure 4 shows a graphical representation of the semantic relatedness integration, where the bold lines constitute the matching $M$.

Based on the expression of the assignment in bipartite graphs, we have

$$\mathrm{maxRel}(f, f_1, f_2) = \begin{cases} \dfrac{\max \sum_{j \in J,\, i \in I} \mathrm{sr}(t_i, t_j)}{|s(f_1)|}, & |s(f_1)| \le |s(f_2)|, \\[2ex] \dfrac{\max \sum_{j \in J,\, i \in I} \mathrm{sr}(t_i, t_j)}{|s(f_2)|}, & |s(f_1)| > |s(f_2)|, \end{cases} \quad I = [1 \cdots |s(f_1)|], \; J = [1 \cdots |s(f_2)|]. \tag{10}$$

Applying the assignment in bipartite graphs problem to our context, the variables $f_1$ and $f_2$ represent the two images whose semantic relatedness is computed. For example, suppose that $f_1$ and $f_2$ are composed of the tags $s(f_1)$ and $s(f_2)$; then $|s(f_1)| > |s(f_2)|$ means that the number of tags in $s(f_2)$ is lower than that in $s(f_1)$. According to Heuristic 3, we divide the result of the maximization by the lower cardinality of $s(f_1)$ or $s(f_2)$. In this way, the influence of the number of tags is reduced and the semantic relatedness of two images is symmetric.

Figure 4: Graphical representation of the assignment in bipartite graphs problem. The tags $t_1$, $t_2$, and $t_3$ of $f_1$ are connected to the tags $q_1$, $q_2$, $q_3$, and $q_4$ of $f_2$ by weighted edges. The bold matching gives $\mathrm{maxRel}(f_1, f_2) = (\mathrm{sr}(t_1, q_1) + \mathrm{sr}(t_2, q_2) + \mathrm{sr}(t_3, q_4))/3 = (1.0 + 0.8 + 0.7)/3 = 0.83$.

Besides the cardinality of the two tag sets $s(f_1)$ and $s(f_2)$, the maxRel function is affected by the relatedness between each pair of tags. According to Heuristics 4 and 5, redundancy and noise should be avoided. In the maxRel function, a one-to-one map is applied to the tags in $s(f_1)$ and $s(f_2)$. Thus, the proposed maxRel function varies with the nature of the two images.

By adopting the proposed maxRel function, we are sure to find the global maximum relatedness that can be obtained by pairing the elements of the two tag sets. Alternative methods are able to find only a local maximum: they scroll through the elements of the first set and, after calculating the relatedness with all the elements of the second set, select the one with the maximum relatedness. Since every element in one set must be connected to at most one element in the other set, such a procedure finds only a local maximum, because the result depends on the order in which the comparisons occur. Consider the example in Figure 4: $t_1$ will be paired with $q_1$ (weight = 1.0). But when analyzing $t_3$, the maximum weight is with $q_2$ (weight = 0.9). This means that $t_2$ can no longer be paired with $q_2$, even though that weight is the maximum for $t_2$, since $q_2$ is already matched to $t_3$. As a consequence, $t_2$ will be paired with $q_3$, and the average of the selected weights will be $(1.0 + 0.3 + 0.9)/3 = 0.73$, which is considerably lower than with MaxRel, where the average of the weights was $(1.0 + 0.8 + 0.7)/3 = 0.83$.

Overall, the cardinality of the two tag sets is used to follow Heuristic 3. The one-to-one map of tag pairs is used to follow Heuristics 4 and 5. The MaxRel function is used to find the best semantic relatedness integration of two images.
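The difference between the global assignment and the order-dependent procedure can be reproduced with a short Python sketch. It is an illustration under stated assumptions: the exhaustive search below is only practical for the handful of tags a Flickr image usually carries (a polynomial-time method such as the Hungarian algorithm would be the usual choice for larger sets), the function names are ours, and only five entries of the weight matrix (1.0, 0.8, 0.7, 0.9, and 0.3) are taken from the discussion of Figure 4; the remaining entries are hypothetical fillers.

```python
from itertools import permutations


def max_rel(weights):
    """Exhaustive maximum-weight one-to-one assignment, as in equation (10).

    weights[i][j] holds sr(t_i, q_j). The sum of the matched weights is
    divided by the size of the smaller tag set. Returns (score, pairs),
    where pairs are (row, column) indices into the original matrix.
    """
    if not weights or not weights[0]:
        return 0.0, []
    transposed = len(weights) > len(weights[0])
    if transposed:  # make the rows the smaller side
        weights = [list(column) for column in zip(*weights)]
    rows, cols = len(weights), len(weights[0])
    best_sum, best_choice = float("-inf"), None
    for choice in permutations(range(cols), rows):
        total = sum(weights[i][j] for i, j in enumerate(choice))
        if total > best_sum:
            best_sum, best_choice = total, choice
    pairs = list(enumerate(best_choice))
    if transposed:
        pairs = [(j, i) for i, j in pairs]
    return best_sum / rows, pairs


def greedy_rel(weights, order):
    """Order-dependent baseline: each row takes its best still-free column.

    Assumes there are at least as many columns as rows.
    """
    used, total = set(), 0.0
    for i in order:
        free = [j for j in range(len(weights[i])) if j not in used]
        j = max(free, key=lambda c: weights[i][c])
        used.add(j)
        total += weights[i][j]
    return total / len(order)


# Rows: t1, t2, t3. Columns: q1, q2, q3, q4.
# Entries not mentioned in the text are hypothetical.
FIGURE_4 = [
    [1.0, 0.7, 0.2, 0.5],
    [0.1, 0.8, 0.3, 0.1],
    [0.1, 0.9, 0.1, 0.7],
]

score, pairs = max_rel(FIGURE_4)
print(round(score, 2), pairs)                         # 0.83 [(0, 0), (1, 1), (2, 3)]
print(round(greedy_rel(FIGURE_4, order=[0, 2, 1]), 2))  # 0.73
```

Processing the rows in the order $t_1$, $t_3$, $t_2$ reproduces the 0.73 of the text, while the exhaustive search recovers the matching $t_1$–$q_1$, $t_2$–$q_2$, $t_3$–$q_4$ with average 0.83.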

4.3. Tag Order Revision. According to Heuristic 2, the order of tags should be considered when computing the semantic relatedness between two images. Intuitively, the tags appearing in the first positions may be more important than the later tags. Some research [39] suggests that people tend to select popular items as their tags. Meanwhile, the top popular tags are indeed the “meaningful” ones.

In this section, the MaxRel function proposed in Section 4.2 is revised to consider the order of tags. For example, the relatedness of tag pairs with high positions should be enhanced, which is summarized as a constraint schema.

Schema 1 (tag relatedness declining). This schema means that the contribution of a tag pair of two images $f_1$ and $f_2$ to the MaxRel function should decline with the positions of its tags. In other words, the semantic relatedness of tag pairs in high positions is emphasized over that of tag pairs in low positions.

We add a decline factor to the MaxRel function; the detailed steps are as follows.

(1) According to the MaxRel function in Section 4.2, the best matching tag pairs are selected, which is denoted by

$$\mathrm{maxRel}(f_1, f_2) = \sum \mathrm{sr}(t_i, t_j), \quad t_i \in s(f_1) \wedge t_j \in s(f_2). \tag{11}$$

The selected tag pairs are the best matching of the bipartite graph between images $f_1$ and $f_2$.

(2) Compute the position information of each tag, which is denoted by $\mathrm{Pos}(t_i)$:

$$\mathrm{Pos}(t_i) = \frac{|s(f)| + 1 - i}{|s(f)|}, \quad t_i \in s(f). \tag{12}$$

(3) Add the position information of each tag to (11), which can be seen as a decline factor:

$$\mathrm{sr}(f_1, f_2) = \sum \mathrm{Pos}(t_i) \cdot \mathrm{sr}(t_i, t_j) \cdot \mathrm{Pos}(t_j), \quad t_i \in s(f_1) \wedge t_j \in s(f_2). \tag{13}$$

(4) Similar to the MaxRel function, the result of the maximization should be normalized; here the sum in (13) is divided by the sum of the position products of the matched pairs:

$$\mathrm{sr}(f_1, f_2) = \frac{\sum \mathrm{Pos}(t_i) \cdot \mathrm{sr}(t_i, t_j) \cdot \mathrm{Pos}(t_j)}{\sum \mathrm{Pos}(t_i) \cdot \mathrm{Pos}(t_j)}. \tag{14}$$

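We again consider the example in Figure 4. According to (14), the semantic relatedness is revised as

$$\left(1 \cdot 1.0 \cdot 1 + \frac{2}{3} \cdot 0.8 \cdot \frac{3}{4} + \frac{1}{3} \cdot 0.7 \cdot \frac{1}{4}\right) \times \left(1 \cdot 1 + \frac{2}{3} \cdot \frac{3}{4} + \frac{1}{3} \cdot \frac{1}{4}\right)^{-1} = 0.92. \tag{15}$$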
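The arithmetic in (15) can be checked with a small Python sketch of (12) and (14). The function names and the input format (a list of matched pairs with 1-based tag positions) are our own choices for illustration.

```python
def pos(i, n):
    """Equation (12): position weight of the i-th tag (1-based) among n tags."""
    return (n + 1 - i) / n


def weighted_relatedness(matched, n1, n2):
    """Equation (14): position-weighted average over the matched tag pairs.

    matched is a list of (i, j, sr) triples, where i and j are the 1-based
    positions of the paired tags in s(f1) and s(f2), and sr is their relatedness.
    """
    numerator = sum(pos(i, n1) * sr * pos(j, n2) for i, j, sr in matched)
    denominator = sum(pos(i, n1) * pos(j, n2) for i, j, _ in matched)
    return numerator / denominator if denominator > 0 else 0.0


# The bold matching of Figure 4: t1-q1, t2-q2, t3-q4, with |s(f1)| = 3 and |s(f2)| = 4.
matched = [(1, 1, 1.0), (2, 2, 0.8), (3, 4, 0.7)]
print(round(weighted_relatedness(matched, n1=3, n2=4), 2))  # 0.92
```

The weighted score (0.92) is higher than the unweighted average (0.83) because the strongest pair, $t_1$–$q_1$, also occupies the first position in both tag lists and therefore receives the largest weight.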

Besides adding a decline factor to the MaxRel function, we also add a second constraint schema: identical tag pruning.


Input: The tag sets of two images $f_1$ and $f_2$, which are $s(f_1)$ and $s(f_2)$
Output: The semantic relatedness of two images $f_1$ and $f_2$
for each $t_i \in s(f_1)$    /* page counts and position initial */
    $N(s(f_1)) \leftarrow N(t_i)$
    $\mathrm{Pos}(s(f_1)) \leftarrow \mathrm{Pos}(t_i)$
for each $t_j \in s(f_2)$
    $N(s(f_2)) \leftarrow N(t_j)$
    $\mathrm{Pos}(s(f_2)) \leftarrow \mathrm{Pos}(t_j)$
for each $t_i \in s(f_1)$
    for each $t_j \in s(f_2)$
        if ($t_i == t_j$) $\mathrm{sr}(t_i, t_j) = 0$    /* pruning */
        else $\mathrm{sr}(t_i, t_j) = f(N(t_i), N(t_j))$    /* relatedness */
return $\mathrm{maxRel}(f_1, f_2) = f(\mathrm{Pos}(t_i), \mathrm{Pos}(t_j), \mathrm{sr}(t_i, t_j))$

Algorithm 1: MaxRel.
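One possible reading of Algorithm 1 as runnable Python is sketched below. Several points are assumptions rather than statements from the paper: the page counts are supplied by two caller-provided lookup functions (`count` and `joint_count`, hypothetical names standing in for search engine queries), PMI from (5) is used as the tag relatedness, the best matching is found by exhaustive search on the unweighted relatedness and only then weighted by position as in (14), and tags are compared as exact strings.

```python
import math
from itertools import permutations

TOTAL_PAGES = 10 ** 11  # N in equation (5)


def pmi(n_p, n_q, n_pq, total_pages=TOTAL_PAGES):
    """Equation (5); zero counts map to 0.0 (a convention assumed here)."""
    if n_p <= 0 or n_q <= 0 or n_pq <= 0:
        return 0.0
    return math.log(total_pages * n_pq / (n_p * n_q)) / math.log(total_pages)


def image_relatedness(tags1, tags2, count, joint_count):
    """Sketch of Algorithm 1 for two ordered tag lists.

    count(t) returns the page count of tag t; joint_count(t, u) returns the
    page count of the conjunction query "t AND u". Both are supplied by the caller.
    """
    if not tags1 or not tags2:
        return 0.0
    if len(tags1) > len(tags2):  # keep the shorter list on the row side
        tags1, tags2 = tags2, tags1
    n1, n2 = len(tags1), len(tags2)

    # Pairwise relatedness with identical tag pruning (Schema 2).
    sr = [[0.0 if t == u else pmi(count(t), count(u), joint_count(t, u))
           for u in tags2] for t in tags1]

    # Best one-to-one matching by exhaustive search (small tag sets only).
    best = max(permutations(range(n2), n1),
               key=lambda cols: sum(sr[i][j] for i, j in enumerate(cols)))

    def pos(index, size):
        """Equation (12), written for a 0-based index."""
        return (size - index) / size

    # Position-weighted average of the matched pairs, equation (14).
    numerator = sum(pos(i, n1) * sr[i][j] * pos(j, n2) for i, j in enumerate(best))
    denominator = sum(pos(i, n1) * pos(j, n2) for i, j in enumerate(best))
    return numerator / denominator
```

In practice the two lookups would wrap a cache of search engine results, so that each tag and each tag pair is queried only once.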

Schema 2 (identical tag pruning). This schema means that the identical tag pairs of two images $f_1$ and $f_2$ should be pruned in the MaxRel function. In other words, the semantic relatedness of the same tag of two images is set to 0.

The above schema is used to ensure that the proposed method measures the relatedness of two images. If we do not prune the identical tag pairs of two images, the proposed method degenerates into a similarity measure. For example, the cosine similarity [36] between two tag vectors is determined by the number of identical elements of the two vectors. The overall algorithm of the proposed computation model is presented in Algorithm 1.

5. Experimental Results

In this section, we evaluate the results of using the proposed method for relatedness measurement. In Section 5.1, we introduce the data set for the evaluation. In Section 5.2, we determine which cooccurrence function to use for tag relatedness measures. In Sections 5.3 and 5.4, clustering and retrieval are used to evaluate the proposed method.

5.1. The Data Sets. We choose Flickr groups as the resources for building data sets. Users on online photo sharing sites like Flickr have organized many millions of photos into hundreds of thousands of semantically themed groups. These groups expose implicit choices that users make about which images are similar. Flickr group membership is usually less noisy than Flickr tags because images are screened by group members. We download 1000 images from ten groups. These ten groups can be divided into two classes. The first class includes five groups: car, phone, flower, dog, and boat. The second class consists of another five groups: Louis Vuitton, Dior, Gucci, Cartier, and Chanel. These images are selected by humans, which reduces the noise of the data set. We choose two classes of groups because we want to test the accuracy of the proposed method against the semantic relatedness of the data set. The semantic relatedness of the second class is higher than that of the first class, since the second class is all about luxury brands. For example, almost all these brands produce handbags. Thus, if the proposed method can do well in these groups, we may say that it can measure the semantic relatedness between Flickr images accurately and robustly. Table 2 gives the detailed information of the data set. Table 3 gives some selected tags from group 2.

Table 2: The detailed information of the data set.

Group 1    Average tags per image    Group 2          Average tags per image
Car        4.4                       Louis Vuitton    3.1
Phone      3.5                       Dior             3.2
Flower     2.2                       Gucci            2.9
Dog        5.6                       Cartier          2.8
Boat       3.1                       Chanel           2.6

5.2. Relatedness Function Selection. In Section 4.1, four cooccurrence measures (i.e., Jaccard, Overlap, Dice, and PMI) are given for relatedness measures between tags. In [40], Rubenstein and Goodenough proposed a data set containing 28 word pairs rated by a group of 51 human subjects, which is a reliable benchmark for evaluating semantic similarity measures. The higher the correlation coefficient against the R-G ratings is, the more accurate the method for measuring semantic similarity between words is. Figure 5 gives the correlation coefficients of the four functions against the R-G test set. From Figure 5, we can see that PMI performs best on relatedness measures, with the highest correlation coefficient. Thus, in the later experiments, we select PMI as the relatedness measure between tags.
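The selection criterion above compares each measure's scores with the human ratings by a correlation coefficient. The text does not say which coefficient is used; the sketch below assumes the Pearson coefficient, and the function name is ours.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / math.sqrt(var_x * var_y)
```

To rank the four measures, one would compute each measure for every benchmark word pair and pass the resulting scores, together with the human ratings for the same pairs, to `pearson`.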

Table 3: The selected tags of group 2 from Flickr. Each row lists the tag sets of five sample images.

Louis Vuitton: (1) “Louis Vuitton,” “Keepall”; (2) “Louis Vuitton,” “Alma”; (3) “Louis Vuitton,” “Tivoli”; (4) “Louis Vuitton,” “Bolsas”; (5) “LV,” “Multicolore”
Dior: (1) “DIOR,” “lipstick,” “makeup”; (2) “Dior,” “Diorskin Nude,” “Tan Sun Powder”; (3) “Dior,” “Makeup,” “Palette”; (4) “Dior,” “Addict 2”; (5) “Dior,” “Jadore,” “Perfume”
Gucci: (1) “Gucci,” “Leather Belts”; (2) “Gucci,” “Trainers”; (3) “Gucci,” “Jolie Leopard,” “Orange”; (4) “Replica,” “Gucci,” “Handbags”; (5) “Gucci,” “Cruise”
Cartier: (1) “Cartier,” “Pasha,” “Chronograph”; (2) “CARTIER,” “Love Bracelet”; (3) “Cartier,” “Santos Galbee”; (4) “Calibre,” “Cartier”; (5) “Cartier Watch,” “Tank Francaise”
Chanel: (1) “Chanel,” “Coco Noir”; (2) “Chanel,” “Chanel Riva”; (3) “Chanel nail polish,” “Coco Mademoiselle”; (4) “Chanel,” “No 5”; (5) “Chance,” “Chanel”

5.3. Evaluation on Image Clustering. In this section, we evaluate the correctness of using tag order. In Section 4.3, we add the position information of each tag to the semantic relatedness measures. The tags with high positions are treated as the major elements for semantic relatedness measures. We evaluate the use of tag order by the clustering task. We employ the proposed semantic relatedness of images in the $K$-means [41] clustering model. Since the $K$-means model depends on the initial points, we randomly select the core points 100 times. We evaluate the effectiveness of clustering with three quality measures: $F$-measure, Purity, and Entropy [41]. We treat each cluster as if it were the result of the proposed method and each class as if it were the desired set of images. Generally, we would like to maximize the $F$-measure and Purity and minimize the Entropy of the clusters to achieve high-quality clustering. Moreover, we compare the clustering results of the proposed method with and without tag order. Figures 6 and 7 give the clustering results of the group 1 and group 2 data sets. From Figures 6 and 7, we can conclude the following.

(1) The proposed method performs better than cosine-based clustering. This result can be obtained from Figures 6 and 7. The three metrics ($F$-measure, purity, and entropy) of the proposed method are better than those of cosine-based clustering. This may be caused by an inherent feature of the proposed method: it is based on semantic relatedness rather than on the cooccurrence used by cosine-based clustering. If the tags of two images do not overlap, cosine-based clustering may be unavailable.

(2) The schema of using tag order is effective. This result can also be obtained from Figures 6 and 7. With tag order, the $F$-measure and purity are the highest and the entropy is the lowest. The position information reflects the importance of each tag. The proposed method emphasizes the tags with high order, which raises the performance on image clustering.

(3) The proposed method is robust across different data sets. The proposed method performs well on both the group 1 and group 2 data sets. It is worth noting that the difference between the proposed method and the cosine method is higher for group 2 than for group 1. The reason is that the semantic correlation of group 2 is stronger than that of group 1. In other words, the performance of the proposed method relies on the semantic correlation of the classes in the data sets. The stronger the semantic correlation between the classes of data, the better the proposed method performs.
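For readers who want to reproduce the three quality measures, the following Python sketch computes them from a list of true class labels and a list of cluster assignments. It follows the common definitions associated with [41] as we understand them (size-weighted purity and entropy over clusters, and a class-weighted best-match $F$-measure); the use of the natural logarithm for entropy is an assumption, since the text does not state the base.

```python
import math
from collections import Counter


def clustering_quality(labels, clusters):
    """Return (f_measure, purity, entropy) for parallel lists.

    labels[k] is the true class of item k; clusters[k] is its cluster id.
    """
    n = len(labels)
    if n == 0 or n != len(clusters):
        raise ValueError("need two non-empty lists of equal length")
    class_sizes = Counter(labels)
    cluster_sizes = Counter(clusters)
    joint = Counter(zip(labels, clusters))  # missing pairs count as 0

    # Purity: share of items that belong to the majority class of their cluster.
    purity = sum(max(joint[(c, k)] for c in class_sizes) for k in cluster_sizes) / n

    # Entropy: size-weighted entropy of the class distribution in each cluster.
    entropy = 0.0
    for k, size in cluster_sizes.items():
        for c in class_sizes:
            p = joint[(c, k)] / size
            if p > 0:
                entropy -= (size / n) * p * math.log(p)

    # F-measure: for each class, the best F1 over all clusters, weighted by class size.
    f_measure = 0.0
    for c, c_size in class_sizes.items():
        best = 0.0
        for k, k_size in cluster_sizes.items():
            hits = joint[(c, k)]
            if hits:
                precision, recall = hits / k_size, hits / c_size
                best = max(best, 2 * precision * recall / (precision + recall))
        f_measure += (c_size / n) * best
    return f_measure, purity, entropy
```

A perfect clustering yields an $F$-measure and purity of 1 and an entropy of 0, which matches the direction of the comparison made in the list above.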

5.4. Evaluation on Image Searching. In this section, we evaluate the proposed method on a query-based image searching task. Five queries from group 2 are selected as the test set: “Louis Vuitton,” “Gucci,” “Chanel,” “Cartier,” and “Dior.” These queries are searched in Flickr, and the top 50 images are obtained as the data set. Moreover, we remove the query terms from the tags of each image. For example, for the query “Cartier,” the tag “Cartier” is removed from the top 50 images. The reason for this operation is that the proposed method is based on semantic relatedness rather than cooccurrence. We choose cut-off point precision to evaluate the proposed method on image searching. The cut-off point precision ($Pn$) is the percentage of correct results among the top $n$ returned results. We compute the $P1$, $P5$, and $P10$ of the group 2 test set. Table 4 lists the comparison of the cut-off point precision between the proposed method and Flickr. From the experimental results, we can conclude the following.

(1) The proposed method performs better than Flickr. In Table 4, the $P1$, $P5$, and $P10$ of the proposed method are at least as high as those of Flickr for every query, and strictly higher for $P5$ and $P10$. The experimental results support the correctness of the proposed method on the image searching task.

(2) The proposed method can handle the relatedness searching problem. The proposed method can measure the semantic relatedness of two images robustly and correctly.

(3) The proposed method can support the faceted exploration of image search. Faceted exploration of search results is widely used in search interfaces for structured databases. Recently, faceted exploration has also been appearing in online search engines in the form of search assistants. The proposed method can measure the semantic relatedness of two images. Given the search queries, we can select the related images for faceted search.
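The cut-off point precision used in this evaluation is straightforward to compute. The sketch below is illustrative: the function name is ours, and the ranked relevance judgments are hypothetical, chosen only so that they reproduce the Flickr row for “Dior” in Table 4 (the paper does not list per-image judgments).

```python
def precision_at_n(relevant, n):
    """Cut-off point precision Pn, in percent.

    relevant is a ranked list of booleans, one per returned result,
    with True marking a correct result.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return 100.0 * sum(relevant[:n]) / n


# Hypothetical judgments for the top 10 results of one query.
ranking = [False, True, True, True, False, True, True, True, True, False]

for n in (1, 5, 10):
    print(n, precision_at_n(ranking, n))  # 0.0, 60.0, 70.0
```

Dividing by $n$ rather than by the number of results actually returned follows the usual convention for precision at a cut-off; a result list shorter than $n$ is therefore penalized.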

Table 4: The comparison of the cut-off point precision (in percent) between the proposed method and Flickr.

Cut-off point    Louis Vuitton    Gucci    Dior    Chanel    Cartier
P1               100              100      100     100       100
P1 (Flickr)      100              100      0       100       100
P5               100              100      100     100       100
P5 (Flickr)      80               60       60      60        80
P10              100              100      100     100       100
P10 (Flickr)     90               70       70      80        80

6. Conclusions

This paper discusses semantic relatedness measures, systematically puts forward a method to measure the semantic relatedness of two images based on their tags, and justifies its validity through experiments. The major contributions are summarized as follows.

(1) We propose a framework to measure semantic relatedness between Flickr images using tags. Firstly, cooccurrence measures are used to compute the relatedness of tags between two images. Secondly, we transform the tag relatedness integration into the problem of assignment in a bipartite graph, which finds an appropriate matching for the semantic relatedness of images. Finally, a decline factor considering the position information of tags is used in the proposed framework, which reduces the noise and redundancy in the social tags.

(2) A real data set including 1000 images from Flickr with ten classes is used in our experiments. Two evaluation methods, clustering and searching, are performed, which show that the proposed method can measure the semantic relatedness between Flickr images accurately and robustly.

(3) We extend the relatedness measures between concepts to the level of images. Since the association relation is the basic mechanism of the brain, the proposed relatedness measurement can facilitate related applications such as searching and recommendation.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported in part by the National Science and Technology Major Project under Grant no. 2013ZX01033002-003, in part by the National High Technology Research and Development Program of China (863 Program) under Grant nos. 2013AA014601 and 2013AA014603, in part by the National Key Technology Support Program under Grant no. 2012BAH07B01, in part by the National Science Foundation of China under Grant no. 61300202, and in part by the Science Foundation of Shanghai under Grant no. 13ZR1452900.

Figure 5: The correlation of four selected functions. Bar values, in the order shown on the horizontal axis: Jaccard 0.346, Dice 0.395, Overlap 0.421, PMI 0.579.

Figure 6: The clustering results of group 1 data sets.

Method             F-measure    Purity    Entropy
Using tag order    0.912        0.967     0.011
Not using          0.857        0.922     0.018
Cosine             0.732        0.751     0.056

Figure 7: The clustering results of group 2 data sets.

Method             F-measure    Purity    Entropy
Using tag order    0.876        0.927     0.023
Not using          0.827        0.852     0.031
Cosine             0.632        0.655     0.085

References

[1] J. Goldberger, S. Gordon, and H. Greenspan, “Unsupervised image-set clustering using an information theoretic framework,” IEEE Transactions on Image Processing, vol. 15, no. 2, pp. 449–458, 2006.

[2] T. Evgeniou, M. Pontil, C. Papageorgiou, and T. Poggio, “Image representations and feature selection for multimedia database search,” IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 4, pp. 911–920, 2003.

[3] R. Ji, H. Yao, X. Sun, B. Zhong, and W. Gao, “Towards semantic embedding in visual vocabulary,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR ’10), pp. 918–925, June 2010.

[4] J. Fan, D. A. Keim, Y. Gao, H. Luo, and Z. Li, “JustClick: personalized image recommendation via exploratory search from large-scale Flickr images,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 19, no. 2, pp. 273–288, 2009.

[5] T. Gong, S. Li, and C. L. Tan, “A semantic similarity language model to improve automatic image annotation,” in Proceedings of the 22nd International Conference on Tools with Artificial Intelligence (ICTAI ’10), pp. 197–203, October 2010.

[6] C. Schmid and R. Mohr, “Local grayvalue invariants for image retrieval,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 5, pp. 530–535, 1997.

[7] M. Varma and A. Zisserman, “A statistical approach to texture classification from single images,” International Journal of Computer Vision, vol. 62, no. 1-2, pp. 61–81, 2005.

[8] S. Belongie, J. Malik, and J. Puzicha, “Shape matching and object recognition using shape contexts,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 4, pp. 509–522, 2002.

[9] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR ’05), vol. 1, pp. 886–893, June 2005.

[10] D. Huang, M. Ardabilian, Y. Wang, and L. Chen, “Asymmetric 3D/2D face recognition based on LBP facial representation and canonical correlation analysis,” in Proceedings of the 16th IEEE International Conference on Image Processing (ICIP ’09), pp. 3325–3328, November 2009.

[11] L. Wang, Y. Zhang, and J. Feng, “On the Euclidean distance of images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 8, pp. 1334–1339, 2005.

[12] W. Jia, H. Zhang, X. He, and Q. Wu, “Gaussian weighted histogram intersection for license plate classification,” in Proceedings of the 18th International Conference on Pattern Recognition (ICPR ’06), pp. 574–577, August 2006.

[13] Y. Rubner, C. Tomasi, and L. J. Guibas, “A Metric for distributions with applications to image databases,” in Proceedings of the IEEE 6th International Conference on Computer Vision, pp. 59–66, January 1998.

[14] L. Wu, X.-S. Hua, N. Yu, W.-Y. Ma, and S. Li, “Flickr distance: a relationship measure for visual concepts,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 5, pp. 863–875, 2012.

[15] D. Cai, “An information-theoretic foundation for the measurement of discrimination information,” IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 9, pp. 1262–1273, 2010.

[16] P. van den Broek, “Using texts in science education: cognitive processes and knowledge representation,” Science, vol. 328, no. 5977, pp. 453–456, 2010.

[17] C. Bizer, T. Heath, and T. Berners-Lee, “Linked data - the story so far,” International Journal on Semantic Web and Information Systems, vol. 5, no. 3, pp. 1–22, 2009.

[18] H. Zhuge, “Communities and emerging semantics in semantic link network: discovery and learning,” IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 6, pp. 785–799, 2009.

[19] H. Zhuge, “Semantic linking through spaces for cyber-physical-socio intelligence: a methodology,” Artificial Intelligence, vol. 175, no. 5-6, pp. 988–1019, 2011.

[20] X. Luo, Z. Xu, J. Yu, and X. Chen, “Building association link network for semantic link on web resources,” IEEE Transactions on Automation Science and Engineering, vol. 8, no. 3, pp. 482–494, 2011.

[21] S. A. Golder and B. A. Huberman, “Usage patterns of collaborative tagging systems,” Journal of Information Science, vol. 32, no. 2, pp. 198–208, 2006.

[22] H. S. Al-Khalifa and H. C. Davis, “Measuring the semantic value of folksonomies,” in Proceedings of the Innovations in Information Technology (IIT ’06), pp. 1–5, November 2006.

[23] F. M. Suchanek, M. Vojnovic, and D. Gunawardena, “Social tags: meaning and suggestions,” in Proceedings of the 17th ACM Conference on Information and Knowledge Management (CIKM ’08), pp. 223–232, October 2008.

[24] H. Halpin, V. Robu, and H. Shepherd, “The complex dynamics of collaborative tagging,” in Proceedings of the 16th International World Wide Web Conference (WWW ’07), pp. 211–220, May 2007.

[25] C. Cattuto, C. Schmitz, A. Baldassarri et al., “Network properties of folksonomies,” AI Communications, vol. 20, no. 4, pp. 245–262, 2007.

[26] R. Lambiotte and M. Ausloos, “Collaborative tagging as a tripartite network,” in Computational Science, vol. 3393 of Lecture Notes in Computer Science, pp. 1114–1117, 2006.

[27] U. Maulik, S. Bandyopadhyay, and I. Saha, “Integrating clustering and supervised learning for categorical data analysis,” IEEE Transactions on Systems, Man, and Cybernetics A, vol. 40, no. 4, pp. 664–675, 2010.

[28] D. Ramage, P. Heymann, C. D. Manning, and H. Garcia-Molina, “Clustering the tagged web,” in Proceedings of the 2nd ACM International Conference on Web Search and Data Mining (WSDM ’09), pp. 54–63, February 2009.

[29] D. Zhou, J. Bian, S. Zheng, H. Zha, and C. L. Giles, “Exploring social annotations for information retrieval,” in Proceedings of the 17th International Conference on World Wide Web (WWW ’08), pp. 715–724, April 2008.

[30] S. Xu, S. Bao, Y. Cao, and Y. Yu, “Using social annotations to improve language model for information retrieval,” in Proceedings of the 16th ACM Conference on Information and Knowledge Management (CIKM ’07), pp. 1003–1006, November 2007.

[31] S. Bao, G. Xue, X. Wu, Y. Yu, B. Fei, and Z. Su, “Optimizing web search using social annotations,” in Proceedings of the 16th International World Wide Web Conference (WWW ’07), pp. 501–510, May 2007.

[32] R. Schenkel, T. Crecelius, M. Kacimi et al., “Efficient top-k querying over social-tagging networks,” in Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (ACM SIGIR ’08), pp. 523–530, July 2008.

[33] N. Rasiwasia, P. J. Moreno, and N. Vasconcelos, “Bridging the gap: query by semantic example,” IEEE Transactions on Multimedia, vol. 9, no. 5, pp. 923–938, 2007.

[34] R. Fergus, L. Fei-Fei, P. Perona, and A. Zisserman, “Learning object categories from Google’s image search,” in Proceedings of the 10th IEEE International Conference on Computer Vision (ICCV ’05), vol. 2, pp. 1816–1823, October 2005.

[35] G. Wang, D. Hoiem, and D. Forsyth, “Learning image similarity from flickr groups using fast kernel machines,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 11, pp. 2177–2188, 2012.

[36] G. Salton, A. Wong, and C. S. Yang, “A vector space model for automatic indexing,” Communications of the ACM, vol. 18, no. 11, pp. 613–620, 1975.

[37] Z. Xu, X. Luo, J. Yu, and W. Xu, “Measuring semantic similarity between words by removing noise and redundancy in web snippets,” Concurrency Computation Practice and Experience, vol. 23, no. 18, pp. 2496–2510, 2011.

[38] R. Firth, “A synopsis of linguistic theory 1930–1955,” in Studies in Linguistic Analysis, Philological Society, Oxford, UK, 1957.

[39] M. Vojnovic, J. Cruise, D. Gunawardena, and P. Marbach, “Ranking and suggesting popular items,” IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 8, pp. 1133–1146, 2009.

[40] H. Rubenstein and B. Goodenough, “Contextual correlates of synonymy,” Communications of the ACM, vol. 8, no. 10, pp. 627–633, 1965.

[41] M. Steinbach, G. Karypis, and V. Kumar, “A comparison of document clustering techniques,” in Proceedings of the KDD Workshop on Text Mining, 2000.


Consider

\[
\mathrm{Overlap}(p, q) = \frac{N(p \cap q)}{\min(N(p), N(q))}, \tag{3}
\]

where $\min(N(p), N(q))$ means the lower number of $N(p)$ or $N(q)$. Consider

\[
\mathrm{Dice}(p, q) = \frac{2 \cdot N(p \cap q)}{N(p) + N(q)}. \tag{4}
\]

According to probability and information theory, the mutual information (MI) of two random variables is a quantity that measures the mutual dependence of the two variables. Pointwise mutual information (PMI) is a variant of MI (see (5)):

\[
\mathrm{PMI}(p, q) = \frac{\log\left(\dfrac{N \cdot N(p \cap q)}{N(p) \cdot N(q)}\right)}{\log N}, \tag{5}
\]

where $N$ is the number of Web pages in the search engine, which is set to $N = 10^{11}$ according to the number of indexed pages reported by Google.

Through (2)–(5), we can compute the tag relatedness as follows.

(1) Extract the tags from two images $f_1$ and $f_2$, which are denoted by

\[
\begin{aligned}
s(f_1) &= \{t_1, t_2, \ldots, t_{|s(f_1)|}\}, \\
s(f_2) &= \{t_1, t_2, \ldots, t_{|s(f_2)|}\}.
\end{aligned}
\tag{6}
\]

(2) Issue the tags from $f_1$ and $f_2$ as queries to the web search engine (in this paper, we choose Google for its convenient API (http://developers.google.com)); the page counts can be denoted by

\[
\begin{aligned}
N(s(f_1)) &= \{N(t_1), N(t_2), \ldots, N(t_{|s(f_1)|})\}, \\
N(s(f_2)) &= \{N(t_1), N(t_2), \ldots, N(t_{|s(f_2)|})\}.
\end{aligned}
\tag{7}
\]

(3) Compute the semantic relatedness between each tag pair from $f_1$ and $f_2$ by (2)–(5). For example, if we use PMI to compute tag semantic relatedness, the equation can be

\[
\mathrm{sr}(t_i, t_j) = \frac{\log\left(\dfrac{N \cdot N(t_i \cap t_j)}{N(t_i) \cdot N(t_j)}\right)}{\log N}, \quad t_i \in s(f_1) \wedge t_j \in s(f_2). \tag{8}
\]

From the above steps, the tag relatedness can be computed, which is denoted as a triple $\langle t_i, t_j, \mathrm{sr}(t_i, t_j) \rangle$. In the next section, we will give the detailed analysis for choosing the best measure from (2)–(5).
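The measures in (3)–(5) can be sketched in a few lines of Python, assuming the page counts $N(p)$, $N(q)$, and $N(p \cap q)$ have already been retrieved. The function names, the example counts, and the choice to return 0 when a count is zero are illustrative assumptions, not part of the original method description.

```python
import math

N_PAGES = 10 ** 11  # N in (5): number of indexed pages assumed in the text


def overlap(n_p, n_q, n_pq):
    """Eq. (3): co-occurrence count over the smaller single-tag count."""
    smaller = min(n_p, n_q)
    return n_pq / smaller if smaller else 0.0


def dice(n_p, n_q, n_pq):
    """Eq. (4): twice the co-occurrence count over the sum of both counts."""
    total = n_p + n_q
    return 2 * n_pq / total if total else 0.0


def pmi(n_p, n_q, n_pq, n_pages=N_PAGES):
    """Eq. (5)/(8): PMI normalized by log N."""
    if n_p == 0 or n_q == 0 or n_pq == 0:
        # Assumption: tags that never co-occur get zero relatedness,
        # which also avoids taking log(0).
        return 0.0
    return math.log(n_pages * n_pq / (n_p * n_q)) / math.log(n_pages)


# Hypothetical page counts for two tags and their joint query.
print(overlap(100, 300, 50))   # 0.5
print(dice(100, 300, 50))      # 0.25
print(round(pmi(10**6, 10**6, 10**6), 4))  # 0.4545
```

Note that the PMI value becomes negative whenever $N \cdot N(p \cap q) < N(p) \cdot N(q)$, that is, when two tags co-occur less often than independence would predict.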

Overall, the page counts of each tag are retrieved first. Then, cooccurrence based measures are used to compute the semantic relatedness between tags. The reasons for using page counts based measures are as follows.

(1) Appropriate computation complexity. Since the relatedness between each tag pair of two images should be computed, the proposed method must have low complexity. Web search engines such as Google provide an API for users to obtain the page counts of each query, which gives an appropriate interface for the proposed computation model.

(2) Explicit semantics. The tag given by users may not be a correct concept in a taxonomy. For example, users may give a tag “Bling Bling” for an image about a lovely girl. The word “Bling” cannot be found in many taxonomies such as WordNet. The proposed method uses the web search engine as an open intermediary, so the explicit semantics of newly emerging concepts can be obtained from the web easily.

4.2. Semantic Relatedness Integration. In Section 4.1, we compute the tag pair relatedness of two images. The tag pair relatedness of two images $f_1$ and $f_2$ can be treated as a bipartite graph, which is denoted by

\[
\begin{aligned}
G &= (V, E), \\
V &= \{f_1, f_2\}, \\
E &= \{\langle t_i, t_j, \mathrm{sr}(t_i, t_j) \rangle\}, \quad t_i \in s(f_1) \wedge t_j \in s(f_2).
\end{aligned}
\tag{9}
\]

Based on (9), we recast the semantic relatedness integration of all tag pairs as the assignment problem in a bipartite graph. We want to find a best matching of the bipartite graph $G$.

A matching is defined as $M \subseteq E$ such that no two edges in $M$ share a common end vertex. An assignment in a bipartite graph is a matching $M$ such that each node of the graph has an incident edge in $M$. Suppose that the set of vertices is partitioned into two sets $f_1$ and $f_2$ and that the edges of the graph have an associated weight given by a function $f : (f_1, f_2) \rightarrow [0 \cdots 1]$. The function $\mathrm{maxRel} : (f, f_1, f_2) \rightarrow [0 \cdots 1]$ returns the maximum weighted assignment, that is, an assignment such that the average of the weights of the edges is highest. Figure 4 shows a graphical representation of the semantic relatedness integration, where the bold lines constitute the matching $M$.

Based on the expression of the assignment in bipartite graphs, we have

\[
\mathrm{maxRel}(f, f_1, f_2) =
\begin{cases}
\dfrac{\max \sum_{i \in I,\, j \in J} \mathrm{sr}(t_i, t_j)}{|s(f_1)|}, & |s(f_1)| \le |s(f_2)|, \\[2ex]
\dfrac{\max \sum_{i \in I,\, j \in J} \mathrm{sr}(t_i, t_j)}{|s(f_2)|}, & |s(f_1)| > |s(f_2)|,
\end{cases}
\qquad I = [1 \cdots |s(f_1)|], \quad J = [1 \cdots |s(f_2)|].
\tag{10}
\]

Applying the assignment problem in bipartite graphs to our context, the variables $f_1$ and $f_2$ represent the two images whose semantic relatedness is to be computed. For example, suppose that $f_1$ and $f_2$ are composed of the tags $s(f_1)$ and $s(f_2)$; $|s(f_1)| > |s(f_2)|$ means that the number of tags in $s(f_2)$ is lower than that of $s(f_1)$. According to Heuristic 3, we divide the result of the maximization by the lower cardinality of $s(f_1)$ or $s(f_2)$. In this way, the influence of the number of tags is reduced and the semantic relatedness of two images is symmetric.

[Figure 4 (diagram): bipartite graph between the tags $t_1, t_2, t_3$ of $f_1$ and the tags $q_1, q_2, q_3, q_4$ of $f_2$ with weighted edges; edge labels in extraction order: 1.0, 0.7, 0.2, 0.5, 0.8, 0.1, 0.1, 0.1, 0.7, 0.3, 0.9. $\mathrm{maxRel}(f_1, f_2) = (\mathrm{sr}(t_1, q_1) + \mathrm{sr}(t_2, q_2) + \mathrm{sr}(t_3, q_4))/3 = (1.0 + 0.8 + 0.7)/3 = 0.83$.]

Figure 4: Graphical representation of the assignment in bipartite graphs problem.

Besides the cardinality of the two tag sets $s(f_1)$ and $s(f_2)$, the maxRel function is affected by the relatedness between each pair of tags. According to Heuristics 4 and 5, redundancy and noise should be avoided. In the maxRel function, a one-to-one map is applied to the tags of $s(f_1)$ and $s(f_2)$. Thus, the proposed maxRel function varies with respect to the nature of the two images.

Adopting the proposed maxRel function, we are sure to find the global maximum relatedness that can be obtained by pairing the elements of the two tag sets. Alternative methods are able to find only a local maximum, since they scroll through the elements of the first set and, after calculating the relatedness with all the elements of the second set, select the one with the maximum relatedness. Since every element in one set must be connected to at most one element in the other set, such a procedure finds only a local maximum, because the result depends on the order in which the comparisons occur. For example, in Figure 4, $t_1$ will be paired to $q_1$ (weight = 1.0). But when analyzing $t_3$, the maximum weight is with $q_2$ (weight = 0.9). This means that $t_2$ can no longer be paired to $q_2$, even though that weight is its maximum, since $q_2$ is already matched to $t_3$. As a consequence, $t_2$ will be paired to $q_3$ and the average of the selected weights will be $(1.0 + 0.3 + 0.9)/3 = 0.73$, which is considerably lower than using MaxRel, where the average of the weights was $(1.0 + 0.8 + 0.7)/3 = 0.83$.

Overall, the cardinality of the two tag sets is used to follow Heuristic 3. The one-to-one map of tag pairs is used to follow Heuristics 4 and 5. The MaxRel function is used to find the best matching for the semantic relatedness integration of two images.
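The difference between the global assignment and an order-dependent pairing can be illustrated with a small Python sketch. It is a minimal illustration under stated assumptions: the exhaustive search stands in for whichever assignment solver is actually used, the function names are hypothetical, and the weight matrix reproduces only the five weights quoted in the text for Figure 4 (1.0, 0.8, 0.7, 0.9, 0.3); the remaining entries are assumed values.

```python
from itertools import permutations


def max_rel(weights):
    """Average weight of the best one-to-one matching (exhaustive search)."""
    rows, cols = len(weights), len(weights[0])
    if rows > cols:
        # Transpose so that the smaller tag set indexes the rows.
        weights = [list(col) for col in zip(*weights)]
        rows, cols = cols, rows
    best = max(
        sum(weights[i][j] for i, j in enumerate(perm))
        for perm in permutations(range(cols), rows)
    )
    return best / rows  # divide by the lower cardinality, as in (10)


def greedy_rel(weights, order):
    """Order-dependent baseline: each row takes its best still-free column.

    Assumes the number of rows does not exceed the number of columns.
    """
    free = set(range(len(weights[0])))
    total = 0.0
    for i in order:
        j = max(free, key=lambda c: weights[i][c])
        free.remove(j)
        total += weights[i][j]
    return total / len(order)


# Rows: t1, t2, t3; columns: q1, q2, q3, q4 (partly assumed values).
WEIGHTS = [
    [1.0, 0.7, 0.2, 0.1],
    [0.5, 0.8, 0.3, 0.1],
    [0.1, 0.9, 0.1, 0.7],
]

print(round(max_rel(WEIGHTS), 2))                # 0.83
print(round(greedy_rel(WEIGHTS, [0, 2, 1]), 2))  # 0.73 (order t1, t3, t2)
```

The exhaustive search grows factorially with the number of tags, so it is only practical for the short tag lists considered here. For larger sets, a polynomial-time solver for the assignment problem, such as the Hungarian algorithm, would be the usual choice.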

4.3. Tag Order Revision. According to Heuristic 2, the order of tags should be considered when computing the semantic relatedness between two images. Intuitively, the tags appearing in the first positions may be more important than the later tags. Some research [39] suggests that people tend to select popular items as their tags. Meanwhile, the top popular tags are indeed the “meaningful” ones.

In this section, the MaxRel function proposed in Section 4.2 is revised considering the order of tags. For example, the relatedness of a tag pair with high positions should be enhanced, which is summarized as a constraint schema.

Schema 1 (tag relatedness declining). This schema means that the relatedness of a tag pair of two images $f_1$ and $f_2$ should decline with the positions of its tags in the MaxRel function. In other words, tag pairs with high positions are emphasized and tag pairs with later positions are weighted down.

We add a decline factor to the MaxRel function, and the detailed steps are as follows.

(1) According to the MaxRel function in Section 4.2, the best matching tag pairs are selected, which is denoted by

\[
\mathrm{maxRel}(f_1, f_2) = \sum \mathrm{sr}(t_i, t_j), \quad t_i \in s(f_1) \wedge t_j \in s(f_2). \tag{11}
\]

The selected tag pairs are the best matching of the bipartite graph between images $f_1$ and $f_2$.

(2) Compute the position information of each tag, which is denoted by $\mathrm{Pos}(t_i)$:

\[
\mathrm{Pos}(t_i) = \frac{|s(f)| + 1 - i}{|s(f)|}, \quad t_i \in s(f). \tag{12}
\]

(3) Add the position information of each tag to (11), which can be seen as a decline factor:

\[
\mathrm{sr}(f_1, f_2) = \sum \mathrm{Pos}(t_i) \cdot \mathrm{sr}(t_i, t_j) \cdot \mathrm{Pos}(t_j), \quad t_i \in s(f_1) \wedge t_j \in s(f_2). \tag{13}
\]

(4) Similar to the MaxRel function, the result of (13) should be normalized; we divide it by the sum of the position weights of the selected tag pairs:

\[
\mathrm{sr}(f_1, f_2) = \frac{\sum \mathrm{Pos}(t_i) \cdot \mathrm{sr}(t_i, t_j) \cdot \mathrm{Pos}(t_j)}{\sum \mathrm{Pos}(t_i) \cdot \mathrm{Pos}(t_j)}. \tag{14}
\]

We also consider the example in Figure 4. According to (14), the semantic relatedness is revised as

\[
\left(1 \cdot 1.0 \cdot 1 + \frac{2}{3} \cdot 0.8 \cdot \frac{3}{4} + \frac{1}{3} \cdot 0.7 \cdot \frac{1}{4}\right) \times \left(1 \cdot 1 + \frac{2}{3} \cdot \frac{3}{4} + \frac{1}{3} \cdot \frac{1}{4}\right)^{-1} = 0.92. \tag{15}
\]

Besides adding the decline factor to the MaxRel function, we also add a constraint schema: identical tag pruning.

Input: The tag sets of two images $f_1$ and $f_2$, which are $s(f_1)$ and $s(f_2)$.
Output: The semantic relatedness of the two images $f_1$ and $f_2$.
for each $t_i \in s(f_1)$    /* page counts and position initialization */
    $N(s(f_1)) \leftarrow N(t_i)$
    $\mathrm{Pos}(s(f_1)) \leftarrow \mathrm{Pos}(t_i)$
for each $t_j \in s(f_2)$
    $N(s(f_2)) \leftarrow N(t_j)$
    $\mathrm{Pos}(s(f_2)) \leftarrow \mathrm{Pos}(t_j)$
for each $t_i \in s(f_1)$
    for each $t_j \in s(f_2)$
        if ($t_i == t_j$) $\mathrm{sr}(t_i, t_j) = 0$    /* pruning */
        else $\mathrm{sr}(t_i, t_j) = f(N(t_i), N(t_j))$    /* relatedness */
return $\mathrm{maxRel}(f_1, f_2) = f(\mathrm{Pos}(t_i), \mathrm{Pos}(t_j), \mathrm{sr}(t_i, t_j))$

Algorithm 1: MaxRel.

Schema 2 (identical tag pruning). This schema means that the identical tag pairs of two images $f_1$ and $f_2$ should be pruned in the MaxRel function. In other words, the semantic relatedness of the same tag of two images is set to 0.

The above schema is used to ensure that the method measures the relatedness of two images. If we do not prune the identical tag pairs of two images, the proposed method will be transformed into a similarity measure. For example, the cosine similarity [36] between two tag vectors counts the identical elements of the two vectors. The overall algorithm of the proposed computation model is presented in Algorithm 1.
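The contrast between an overlap-driven similarity and the pruned relatedness can be sketched in Python as follows. This is a minimal illustration, assuming tag sets are treated as binary vectors for the cosine measure; the function names and the example tags (taken from Table 3) are used for illustration only.

```python
import math


def cosine_tags(tags_a, tags_b):
    """Cosine similarity of two tag sets viewed as binary vectors."""
    a, b = set(tags_a), set(tags_b)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def pruned_sr(t_i, t_j, sr):
    """Schema 2: identical tags contribute no relatedness."""
    return 0.0 if t_i == t_j else sr


# Two images about related products that share no tag at all.
print(cosine_tags({"Gucci", "Handbags"}, {"Louis Vuitton", "Alma"}))  # 0.0
# Identical tag sets reach the maximum cosine value.
print(cosine_tags({"Dior", "Makeup"}, {"Dior", "Makeup"}))            # 1.0
# With pruning, the shared tag "Dior" is ignored by the relatedness step.
print(pruned_sr("Dior", "Dior", 1.0))                                 # 0.0
```

The first call shows why a purely overlap-based measure cannot relate images whose tags are disjoint, which is the case the page count based relatedness is meant to cover.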

5. Experimental Results

In this section, we evaluate the results of using the proposed method for relatedness measurement. In Section 5.1, we introduce the data set for the evaluation. In Section 5.2, we determine which cooccurrence function to use for tag relatedness measures. In Sections 5.3 and 5.4, clustering and retrieval are used to evaluate the proposed method.

5.1. The Data Sets. We choose Flickr groups as the resources for building data sets. Users on online photo sharing sites like Flickr have organized many millions of photos into hundreds of thousands of semantically themed groups. These groups expose implicit choices that users make about which images are similar. Flickr group membership is usually less noisy than Flickr tags because images are screened by group members. We download 1000 images from ten groups. These ten groups can be divided into two classes. The first class includes five groups, which are car, phone, flower, dog, and boat. The second class consists of another five groups, which are Louis Vuitton, Dior, Gucci, Cartier, and Chanel. These images are selected by humans, which reduces the noise of the data set. The reason why we choose two classes of groups is that we want to test the accuracy of the proposed method against the semantic relatedness of the data set. The semantic relatedness of the second class is higher than that of the first, since the second class is all about luxury brands. For example, almost all these brands produce handbags. Thus, if the proposed method can do well in these groups, we may say that it can measure the semantic relatedness between Flickr images accurately and robustly. Table 2 gives the detailed information of the data set. Table 3 gives some selected tags from group 2.

Table 2: The detailed information of the data set.

Group 1 | Average tags per image | Group 2 | Average tags per image
--- | --- | --- | ---
Car | 4.4 | Louis Vuitton | 3.1
Phone | 3.5 | Dior | 3.2
Flower | 2.2 | Gucci | 2.9
Dog | 5.6 | Cartier | 2.8
Boat | 3.1 | Chanel | 2.6

5.2. Relatedness Function Selection. In Section 4.1, four cooccurrence measures (i.e., Jaccard, Overlap, Dice, and PMI) are given for relatedness measures between tags. In [40], Rubenstein and Goodenough proposed a data set containing 28 word pairs rated by a group of 51 human subjects, which is a reliable benchmark for evaluating semantic similarity measures. The higher the correlation coefficient against the R-G ratings, the more accurate the method for measuring semantic similarity between words. Figure 5 gives the correlation coefficient of the four functions against the R-G test set. From Figure 5, we can say that PMI performs best on relatedness measures because it has the highest correlation coefficient. Thus, in the later experiments, we select PMI as the relatedness measure between tags.
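The comparison against human ratings can be sketched as follows, assuming that the correlation coefficient is the Pearson coefficient (the text does not name the variant). The function name and the scores below are made-up placeholders, not values from the R-G data set.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical example: human ratings for four word pairs versus the
# scores that some relatedness function assigns to the same pairs.
human_ratings = [3.9, 3.1, 1.2, 0.4]
method_scores = [0.81, 0.55, 0.30, 0.12]
print(pearson(human_ratings, method_scores) > 0.9)  # True
```

A measure whose scores rise and fall with the human ratings obtains a coefficient close to 1, which is the criterion used above to select among the four functions.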

Table 3: The selected tags of group 2 from Flickr (the tag sets of five images per group, separated by semicolons).

Group 2 | Tags
--- | ---
Louis Vuitton | “Louis Vuitton”, “Keepall”; “Louis Vuitton”, “Alma”; “Louis Vuitton”, “Tivoli”; “Louis Vuitton”, “Bolsas”; “LV”, “Multicolore”
Dior | “DIOR”, “lipstick”, “makeup”; “Dior”, “Diorskin Nude”, “Tan Sun Powder”; “Dior”, “Makeup”, “Palette”; “Dior”, “Addict 2”; “Dior”, “Jadore”, “Perfume”
Gucci | “Gucci”, “Leather Belts”; “Gucci”, “Trainers”; “Gucci”, “Jolie Leopard”, “Orange”; “Replica”, “Gucci”, “Handbags”; “Gucci”, “Cruise”
Cartier | “Cartier”, “Pasha”, “Chronograph”; “CARTIER”, “Love Bracelet”; “Cartier”, “Santos Galbee”; “Calibre”, “Cartier”; “Cartier Watch”, “Tank Francaise”
Chanel | “Chanel”, “Coco Noir”; “Chanel”, “Chanel Riva”; “Chanel nail polish”, “Coco Mademoiselle”; “Chanel”, “No 5”; “Chance”, “Chanel”

5.3. Evaluation on Image Clustering. In this section, we evaluate the correctness of using tag order. In Section 4.3, we add the position information of each tag to the semantic relatedness measures. The tags with high positions are treated as the major elements for semantic relatedness measures. We evaluate the use of tag order by the clustering task. We employ the proposed semantic relatedness of images in the $K$-means [41] clustering model. Since the $K$-means model depends on the initial points, we randomly select core points 100 times. We evaluate the effectiveness of document clustering with three quality measures: $F$-measure, Purity, and Entropy [41]. We treat each cluster as if it were the result of the proposed method and each class as if it were the desired set of images. Generally, we would like to maximize the $F$-measure and Purity and minimize the Entropy of the clusters to achieve a high-quality document clustering. Moreover, we compare the clustering results of the proposed method with and without tag order. Figures 6 and 7 give the clustering results of group 1 and group 2 data sets. From Figures 6 and 7, we can conclude the following.

(1) The proposed method performs better than cosine based clustering. This result can be obtained from Figures 6 and 7: the three metrics ($F$-measure, purity, and entropy) of the proposed method are better than those of cosine based clustering. This may be caused by an inherent feature of the proposed method, which is based on semantic relatedness rather than on the cooccurrence used by cosine based clustering. If the tags of two images do not overlap, cosine based clustering may be unavailable.

(2) The schema of using tag order is effective. This result can also be obtained from Figures 6 and 7: using tag order gives the best values of the three metrics (the highest $F$-measure and purity and the lowest entropy). The position information reflects the importance of each tag. The proposed method emphasizes the tags with high order, which raises the performance on image clustering.

(3) The proposed method is robust across different data sets. The proposed method performs well on both the group 1 and group 2 data sets. It is worth noting that the difference between the proposed method and the cosine method is larger for group 2 than for group 1. The reason is that the semantic correlation of group 2 is stronger than that of group 1. In other words, the performance of the proposed method relies on the semantic correlation of classes in the data sets. The stronger the semantic correlation between classes of data, the better the proposed method performs.
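The three quality measures can be computed from a list of true class labels and a list of cluster assignments. The sketch below follows the usual definitions from the document clustering literature [41]; the function name, the base-2 logarithm for the entropy, and the toy labels are assumptions made for illustration.

```python
import math
from collections import Counter


def clustering_quality(classes, clusters):
    """Return (F-measure, purity, entropy) for parallel label lists."""
    n = len(classes)
    class_sizes = Counter(classes)
    cluster_sizes = Counter(clusters)
    joint = Counter(zip(classes, clusters))  # images of class c in cluster k

    # Purity: each cluster is credited with its dominant class.
    purity = sum(
        max(joint[(c, k)] for c in class_sizes) for k in cluster_sizes
    ) / n

    # Entropy: size-weighted class entropy inside each cluster.
    entropy = 0.0
    for k, n_k in cluster_sizes.items():
        cluster_entropy = 0.0
        for c in class_sizes:
            p = joint[(c, k)] / n_k
            if p > 0:
                cluster_entropy -= p * math.log2(p)
        entropy += n_k / n * cluster_entropy

    # F-measure: best F over clusters for each class, weighted by class size.
    f_measure = 0.0
    for c, n_c in class_sizes.items():
        best = 0.0
        for k, n_k in cluster_sizes.items():
            n_ck = joint[(c, k)]
            if n_ck:
                precision, recall = n_ck / n_k, n_ck / n_c
                best = max(best, 2 * precision * recall / (precision + recall))
        f_measure += n_c / n * best
    return f_measure, purity, entropy


labels = ["car", "car", "dog", "dog"]
print(clustering_quality(labels, [0, 0, 1, 1]))  # perfect clustering
print(clustering_quality(labels, [0, 0, 0, 0]))  # everything in one cluster
```

For the perfect clustering, the $F$-measure and purity are 1 and the entropy is 0. When all four images fall into one cluster, the purity drops to 0.5 and the entropy rises to 1, which matches the direction of optimization stated above.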

5.4. Evaluation on Image Searching. In this section, we evaluate the proposed method on a query-based image searching task. Five queries from group 2 are selected as the test set, including “Louis Vuitton,” “Gucci,” “Chanel,” “Cartier,” and “Dior.” These queries are searched in Flickr. The top 50 images are obtained as the data set. Moreover, we remove the query from the tags of each image. For example, the tag “Cartier” is removed from the top 50 images of the query “Cartier.” The reason for that operation is that the proposed method is based on semantic relatedness rather than cooccurrence. We choose cut-off point precision to evaluate the proposed method on image searching. The cut-off point precision ($P@n$) is the percentage of correct results among the top $n$ returned results. We compute the $P@1$, $P@5$, and $P@10$ of the group 2 test set. Table 4 lists the comparison of the cut-off point precision between the proposed method and Flickr. From the experimental results, we can conclude the following.

(1) The proposed method performs better than Flickr. In Table 4, the $P@1$, $P@5$, and $P@10$ of the proposed method are at least as high as those of Flickr for every query, and the $P@5$ and $P@10$ are strictly higher. The experimental results support the correctness of the proposed method on the image searching task.

(2) The proposed method can handle the relatedness searching problem. The proposed method can measure the semantic relatedness of two images robustly and correctly.

(3) The proposed method can support the faceted exploration of image search. Faceted exploration of search results is widely used in search interfaces for structured databases. Recently, faceted exploration has also appeared in online search engines in the form of search assistants. The proposed method can measure the semantic relatedness of two images. Given the search queries, we can select the related images for faceted search.

Table 4: The comparison of the cut-off point precision (%) between the proposed method and Flickr.

Cut-off point | Louis Vuitton | Gucci | Dior | Chanel | Cartier
--- | --- | --- | --- | --- | ---
$P@1$ | 100 | 100 | 100 | 100 | 100
$P@1$ (Flickr) | 100 | 100 | 0 | 100 | 100
$P@5$ | 100 | 100 | 100 | 100 | 100
$P@5$ (Flickr) | 80 | 60 | 60 | 60 | 80
$P@10$ | 100 | 100 | 100 | 100 | 100
$P@10$ (Flickr) | 90 | 70 | 70 | 80 | 80
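The cut-off point precision can be computed from a ranked list of correctness judgments, as in the following Python sketch. The function name and the judgment list are hypothetical; they do not correspond to any query in Table 4.

```python
def precision_at(relevant_flags, n):
    """Fraction of the top-n ranked results that are judged correct.

    If fewer than n results are available, the missing ones count as
    incorrect, because the sum is still divided by n.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return sum(1 for flag in relevant_flags[:n] if flag) / n


# Hypothetical judgments for the ten top-ranked images of one query.
judged = [True, True, False, True, False, True, True, False, True, True]
print(precision_at(judged, 1))   # 1.0
print(precision_at(judged, 5))   # 0.6
print(precision_at(judged, 10))  # 0.7
```

Multiplying the returned fraction by 100 gives the percentage form used in Table 4.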

6. Conclusions

This paper discusses semantic relatedness measures, puts forward a method to measure the semantic relatedness of two images based on their tags, and justifies its validity through experiments. The major contributions are summarized as follows.

(1) We propose a framework to measure semantic relatedness between Flickr images using tags. Firstly, the cooccurrence measures are used to compute the relatedness of tags between two images. Secondly, we transform the tag relatedness integration into the assignment problem in a bipartite graph, which can find an appropriate matching for the semantic relatedness of images. Finally, a decline factor considering the position information of tags is used in the proposed framework, which reduces the noise and redundancy in the social tags.

(2) A real data set including 1000 images from Flickr with ten classes is used in our experiments. Two evaluation methods, clustering and searching, are performed, which show that the proposed method can measure the semantic relatedness between Flickr images accurately and robustly.

(3) We extend the relatedness measures between concepts to the level of images. Since association is a basic mechanism of the brain, the proposed relatedness measurement can facilitate related applications such as searching and recommendation.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported in part by the National Science and Technology Major Project under Grant no. 2013ZX01033002-003, in part by the National High Technology Research and Development Program of China (863 Program) under Grant nos. 2013AA014601 and 2013AA014603, in part by the National Key Technology Support Program under Grant no. 2012BAH07B01, in part by the National Science Foundation of China under Grant no. 61300202, and in part by the Science Foundation of Shanghai under Grant no. 13ZR1452900.

[Figure 5 (bar chart): correlation for Jaccard, Dice, Overlap, and PMI; bar labels in extraction order: 0.346, 0.395, 0.421, 0.579.]

Figure 5: The correlation of four selected functions.

[Figure 6 (bar chart, Group 1): $F$-measure, Purity, and Entropy for three methods (using tag order, not using tag order, cosine); bar labels in extraction order: 0.912, 0.967, 0.011; 0.857, 0.922, 0.018; 0.732, 0.751, 0.056.]

Figure 6: The clustering results of group 1 data sets.

[Figure 7 (bar chart, Group 2): $F$-measure, Purity, and Entropy for three methods (using tag order, not using tag order, cosine); bar labels in extraction order: 0.876, 0.927, 0.023; 0.827, 0.852, 0.031; 0.632, 0.655, 0.085.]

Figure 7: The clustering results of group 2 data sets.

References

[1] J. Goldberger, S. Gordon, and H. Greenspan, “Unsupervised image-set clustering using an information theoretic framework,” IEEE Transactions on Image Processing, vol. 15, no. 2, pp. 449–458, 2006.

[2] T. Evgeniou, M. Pontil, C. Papageorgiou, and T. Poggio, “Image representations and feature selection for multimedia database search,” IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 4, pp. 911–920, 2003.

[3] R. Ji, H. Yao, X. Sun, B. Zhong, and W. Gao, “Towards semantic embedding in visual vocabulary,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR ’10), pp. 918–925, June 2010.

[4] J. Fan, D. A. Keim, Y. Gao, H. Luo, and Z. Li, “JustClick: personalized image recommendation via exploratory search from large-scale Flickr images,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 19, no. 2, pp. 273–288, 2009.

[5] T. Gong, S. Li, and C. L. Tan, “A semantic similarity language model to improve automatic image annotation,” in Proceedings of the 22nd International Conference on Tools with Artificial Intelligence (ICTAI ’10), pp. 197–203, October 2010.

[6] C. Schmid and R. Mohr, “Local grayvalue invariants for image retrieval,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 5, pp. 530–535, 1997.

[7] M. Varma and A. Zisserman, “A statistical approach to texture classification from single images,” International Journal of Computer Vision, vol. 62, no. 1-2, pp. 61–81, 2005.

[8] S. Belongie, J. Malik, and J. Puzicha, “Shape matching and object recognition using shape contexts,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 4, pp. 509–522, 2002.

[9] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR ’05), vol. 1, pp. 886–893, June 2005.

[10] D. Huang, M. Ardabilian, Y. Wang, and L. Chen, “Asymmetric 3D/2D face recognition based on LBP facial representation and canonical correlation analysis,” in Proceedings of the 16th IEEE International Conference on Image Processing (ICIP ’09), pp. 3325–3328, November 2009.

[11] L. Wang, Y. Zhang, and J. Feng, “On the Euclidean distance of images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 8, pp. 1334–1339, 2005.

[12] W. Jia, H. Zhang, X. He, and Q. Wu, “Gaussian weighted histogram intersection for license plate classification,” in Proceedings of the 18th International Conference on Pattern Recognition (ICPR ’06), pp. 574–577, August 2006.

[13] Y. Rubner, C. Tomasi, and L. J. Guibas, “A metric for distributions with applications to image databases,” in Proceedings of the IEEE 6th International Conference on Computer Vision, pp. 59–66, January 1998.

[14] L. Wu, X.-S. Hua, N. Yu, W.-Y. Ma, and S. Li, “Flickr distance: a relationship measure for visual concepts,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 5, pp. 863–875, 2012.

[15] D. Cai, “An information-theoretic foundation for the measurement of discrimination information,” IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 9, pp. 1262–1273, 2010.

[16] P. van den Broek, “Using texts in science education: cognitive processes and knowledge representation,” Science, vol. 328, no. 5977, pp. 453–456, 2010.

[17] C. Bizer, T. Heath, and T. Berners-Lee, “Linked data - the story so far,” International Journal on Semantic Web and Information Systems, vol. 5, no. 3, pp. 1–22, 2009.

[18] H. Zhuge, “Communities and emerging semantics in semantic link network: discovery and learning,” IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 6, pp. 785–799, 2009.

[19] H. Zhuge, “Semantic linking through spaces for cyber-physical-socio intelligence: a methodology,” Artificial Intelligence, vol. 175, no. 5-6, pp. 988–1019, 2011.

[20] X. Luo, Z. Xu, J. Yu, and X. Chen, “Building association link network for semantic link on web resources,” IEEE Transactions on Automation Science and Engineering, vol. 8, no. 3, pp. 482–494, 2011.

[21] S. A. Golder and B. A. Huberman, “Usage patterns of collaborative tagging systems,” Journal of Information Science, vol. 32, no. 2, pp. 198–208, 2006.

[22] H. S. Al-Khalifa and H. C. Davis, “Measuring the semantic value of folksonomies,” in Proceedings of the Innovations in Information Technology (IIT ’06), pp. 1–5, November 2006.

[23] F. M. Suchanek, M. Vojnovic, and D. Gunawardena, “Social tags: meaning and suggestions,” in Proceedings of the 17th ACM Conference on Information and Knowledge Management (CIKM ’08), pp. 223–232, October 2008.

[24] H. Halpin, V. Robu, and H. Shepherd, “The complex dynamics of collaborative tagging,” in Proceedings of the 16th International World Wide Web Conference (WWW ’07), pp. 211–220, May 2007.

[25] C. Cattuto, C. Schmitz, A. Baldassarri et al., “Network properties of folksonomies,” AI Communications, vol. 20, no. 4, pp. 245–262, 2007.

[26] R. Lambiotte and M. Ausloos, “Collaborative tagging as a tripartite network,” in Computational Science, vol. 3393 of Lecture Notes in Computer Science, pp. 1114–1117, 2006.

[27] U. Maulik, S. Bandyopadhyay, and I. Saha, “Integrating clustering and supervised learning for categorical data analysis,” IEEE Transactions on Systems, Man, and Cybernetics A, vol. 40, no. 4, pp. 664–675, 2010.

[28] D. Ramage, P. Heymann, C. D. Manning, and H. Garcia-Molina, “Clustering the tagged web,” in Proceedings of the 2nd ACM International Conference on Web Search and Data Mining (WSDM ’09), pp. 54–63, February 2009.

[29] D. Zhou, J. Bian, S. Zheng, H. Zha, and C. L. Giles, “Exploring social annotations for information retrieval,” in Proceedings of the 17th International Conference on World Wide Web (WWW ’08), pp. 715–724, April 2008.

[30] S. Xu, S. Bao, Y. Cao, and Y. Yu, “Using social annotations to improve language model for information retrieval,” in Proceedings of the 16th ACM Conference on Information and Knowledge Management (CIKM ’07), pp. 1003–1006, November 2007.

[31] S. Bao, G. Xue, X. Wu, Y. Yu, B. Fei, and Z. Su, “Optimizing web search using social annotations,” in Proceedings of the 16th International World Wide Web Conference (WWW ’07), pp. 501–510, May 2007.

[32] R. Schenkel, T. Crecelius, M. Kacimi et al., “Efficient top-k querying over social-tagging networks,” in Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (ACM SIGIR ’08), pp. 523–530, July 2008.

[33] N. Rasiwasia, P. J. Moreno, and N. Vasconcelos, “Bridging the gap: query by semantic example,” IEEE Transactions on Multimedia, vol. 9, no. 5, pp. 923–938, 2007.

[34] R. Fergus, L. Fei-Fei, P. Perona, and A. Zisserman, “Learning object categories from Google’s image search,” in Proceedings of the 10th IEEE International Conference on Computer Vision (ICCV ’05), vol. 2, pp. 1816–1823, October 2005.


[35] G. Wang, D. Hoiem, and D. Forsyth, “Learning image similarity from flickr groups using fast kernel machines,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 11, pp. 2177–2188, 2012.

[36] G. Salton, A. Wong, and C. S. Yang, “A vector space model for automatic indexing,” Communications of the ACM, vol. 18, no. 11, pp. 613–620, 1975.

[37] Z. Xu, X. Luo, J. Yu, and W. Xu, “Measuring semantic similarity between words by removing noise and redundancy in web snippets,” Concurrency and Computation: Practice and Experience, vol. 23, no. 18, pp. 2496–2510, 2011.

[38] J. R. Firth, “A synopsis of linguistic theory 1930–1955,” in Studies in Linguistic Analysis, Philological Society, Oxford, UK, 1957.

[39] M. Vojnovic, J. Cruise, D. Gunawardena, and P. Marbach, “Ranking and suggesting popular items,” IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 8, pp. 1133–1146, 2009.

[40] H. Rubenstein and J. B. Goodenough, “Contextual correlates of synonymy,” Communications of the ACM, vol. 8, no. 10, pp. 627–633, 1965.

[41] M. Steinbach, G. Karypis, and V. Kumar, “A comparison of document clustering techniques,” in Proceedings of the KDD Workshop on Text Mining, 2000.


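Figure 4: Graphical representation of the assignment in bipartite graphs problem. The tags $t_1$, $t_2$, $t_3$ of $f_1$ are matched to the tags $q_1$, $q_2$, $q_3$, $q_4$ of $f_2$, and $\mathrm{maxRel}(f_1, f_2) = (\mathrm{sr}(t_1, q_1) + \mathrm{sr}(t_2, q_2) + \mathrm{sr}(t_3, q_4))/3 = (1.0 + 0.8 + 0.7)/3 = 0.83$.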

$f_2$ are composed of the tags $s(f_1)$ and $s(f_2)$. $|s(f_1)| > |s(f_2)|$ means that the number of tags in $s(f_2)$ is lower than that of $s(f_1)$. According to Heuristic 3, we divide the result of the maximization by the lower cardinality of $s(f_1)$ or $s(f_2)$. In this way, the influence of the number of tags is reduced and the semantic relatedness of two images is symmetric.

Besides the cardinality of the two tag sets $s(f_1)$ and $s(f_2)$, the MaxRel function is affected by the relatedness between each pair of tags. According to Heuristics 4 and 5, redundancy and noise should be avoided. In the MaxRel function, a one-to-one mapping is applied to the tags in $s(f_1)$ and $s(f_2)$. Thus, the proposed MaxRel function varies with respect to the nature of the two images.

Adopting the proposed MaxRel function, we are sure to find the global maximum relatedness that can be obtained by pairing the elements in the two tag sets. Alternative methods are able to find only a local maximum: they scroll through the elements in the first set and, after calculating the relatedness with all the elements in the second set, select the one with the maximum relatedness. Since every element in one set must be connected to at most one element in the other set, the result of such a procedure depends on the order in which the comparisons occur. For example, in Figure 4, $t_1$ will be paired with $q_1$ (weight = 1.0). But when analyzing $t_3$, the maximum weight is with $q_2$ (weight = 0.9). This means that $t_2$ can no longer be paired with $q_2$, even though that weight is the maximum for $t_2$, since $q_2$ is already matched to $t_3$. As a consequence, $t_2$ will be paired with $q_3$, and the average of the selected weights will be $(1.0 + 0.3 + 0.9)/3 = 0.73$, which is considerably lower than using MaxRel, where the average of the weights was $(1.0 + 0.8 + 0.7)/3 = 0.83$.

Overall, the cardinality of the two tag sets is used to follow Heuristic 3. The one-to-one mapping of tag pairs is used to follow Heuristics 4 and 5. The MaxRel function is used to find the best matching for integrating the semantic relatedness of two images.
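The difference between the global matching and the order-dependent procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `WEIGHTS`, `max_rel`, and `greedy_rel` are hypothetical, only five of the matrix entries are taken from the example in the text (the rest are invented so that the matrix is complete), and the exhaustive search stands in for a proper assignment algorithm.

```python
from itertools import permutations

# Hypothetical tag-relatedness matrix in the spirit of Figure 4:
# rows are tags t1..t3 of f1, columns are tags q1..q4 of f2.
# Only sr(t1,q1)=1.0, sr(t2,q2)=0.8, sr(t2,q3)=0.3, sr(t3,q2)=0.9 and
# sr(t3,q4)=0.7 come from the text; the other entries are made up.
WEIGHTS = [
    [1.0, 0.7, 0.2, 0.5],
    [0.1, 0.8, 0.3, 0.1],
    [0.1, 0.9, 0.1, 0.7],
]


def max_rel(weights):
    """Best one-to-one matching, averaged over the smaller tag set."""
    rows, cols = len(weights), len(weights[0])
    if rows > cols:  # transpose so the smaller set indexes the rows
        weights = [list(column) for column in zip(*weights)]
        rows, cols = cols, rows
    best = max(
        sum(weights[i][j] for i, j in enumerate(assignment))
        for assignment in permutations(range(cols), rows)
    )
    return best / rows


def greedy_rel(weights, order):
    """Order-dependent baseline: each tag takes its best unused partner.

    Assumes the rows index the smaller tag set.
    """
    used, total = set(), 0.0
    for i in order:
        free = [j for j in range(len(weights[i])) if j not in used]
        j = max(free, key=lambda col: weights[i][col])
        used.add(j)
        total += weights[i][j]
    return total / len(order)


print(round(max_rel(WEIGHTS), 2))                # 0.83
print(round(greedy_rel(WEIGHTS, [0, 2, 1]), 2))  # 0.73
```

Visiting the tags in the order $t_1$, $t_3$, $t_2$ reproduces the lower value from the example, while the order $t_1$, $t_2$, $t_3$ happens to reach the optimum, which is exactly the order dependence described above. The exhaustive search grows factorially with the number of tags, so for longer tag lists a polynomial-time assignment solver such as the Hungarian algorithm would be the natural replacement.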

4.3. Tag Order Revision. According to Heuristic 2, the order of tags should be considered when computing the semantic relatedness between two images. Intuitively, the tags appearing in the first positions may be more important than the later tags. Some research [39] suggests that people tend to select popular items as their tags. Meanwhile, the top popular tags are indeed the “meaningful” ones.

In this section, the MaxRel function proposed in Section 4.2 is revised to consider the order of tags. For example, the relatedness of tag pairs in high positions should be enhanced, which is summarized as a constraint schema.

Schema 1 (tag relatedness declining). This schema means that the relatedness of a tag pair of two images $f_1$ and $f_2$ should decline with the positions of the two tags. In other words, tag pairs in high positions contribute more to the semantic relatedness of the two images than tag pairs in low positions.

We add a decline factor to the MaxRel function. The detailed steps are as follows.

(1) According to the MaxRel function in Section 4.2, the best matching tag pairs are selected, which is denoted by

$$\mathrm{maxRel}(f_1, f_2) = \sum \mathrm{sr}(t_i, t_j), \quad t_i \in s(f_1) \text{ and } t_j \in s(f_2). \tag{11}$$

The selected tag pairs are the best matching of the bipartite graph between images $f_1$ and $f_2$.

(2) Compute the position information of each tag, which is denoted by $\mathrm{Pos}(t_i)$:

$$\mathrm{Pos}(t_i) = \frac{|s(f)| + 1 - i}{|s(f)|}, \quad t_i \in s(f). \tag{12}$$

(3) Add the position information of each tag to (11), which can be seen as a decline factor:

$$\mathrm{sr}(f_1, f_2) = \sum \mathrm{Pos}(t_i) \cdot \mathrm{sr}(t_i, t_j) \cdot \mathrm{Pos}(t_j), \quad t_i \in s(f_1) \text{ and } t_j \in s(f_2). \tag{13}$$

(4) Similar to the MaxRel function, the result of the maximization should be normalized. Here it is divided by the sum of the position products of the matched pairs:

$$\mathrm{sr}(f_1, f_2) = \frac{\sum \mathrm{Pos}(t_i) \cdot \mathrm{sr}(t_i, t_j) \cdot \mathrm{Pos}(t_j)}{\sum \mathrm{Pos}(t_i) \cdot \mathrm{Pos}(t_j)}. \tag{14}$$

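We also consider the example in Figure 4. According to (14), the semantic relatedness is revised as

$$\left(1 \cdot 1.0 \cdot 1 + \frac{2}{3} \cdot 0.8 \cdot \frac{3}{4} + \frac{1}{3} \cdot 0.7 \cdot \frac{1}{4}\right) \times \left(1 \cdot 1 + \frac{2}{3} \cdot \frac{3}{4} + \frac{1}{3} \cdot \frac{1}{4}\right)^{-1} = 0.92. \tag{15}$$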
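The arithmetic of (12), (14), and (15) can be checked with a short Python sketch. The function names `pos` and `weighted_rel` are hypothetical, and the sketch assumes that the best matching has already been found, so it only applies the decline factor to the matched pairs.

```python
def pos(i, size):
    """Eq. (12): position weight of the i-th tag (1-based) among `size` tags."""
    return (size + 1 - i) / size


def weighted_rel(matched, size1, size2):
    """Eq. (14): position-weighted average over already matched tag pairs.

    `matched` holds (i, j, sr) triples: the 1-based positions of the two
    tags in their own tag lists and the relatedness of the matched pair.
    """
    numerator = sum(pos(i, size1) * sr * pos(j, size2) for i, j, sr in matched)
    denominator = sum(pos(i, size1) * pos(j, size2) for i, j, _ in matched)
    return numerator / denominator


# Matching from Figure 4: (t1, q1), (t2, q2), (t3, q4).
pairs = [(1, 1, 1.0), (2, 2, 0.8), (3, 4, 0.7)]
print(round(weighted_rel(pairs, 3, 4), 2))  # 0.92
```

With three tags in $f_1$ and four in $f_2$, the position weights are $1$, $2/3$, $1/3$ and $1$, $3/4$, $1/4$ for the matched tags, which gives $1.4583/1.5833 \approx 0.92$ as in (15).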

Besides adding the decline factor to the MaxRel function, we also add a constraint schema: identical tag pruning.

Input: The tag sets of two images f1 and f2, which are s(f1) and s(f2).
Output: The semantic relatedness of two images f1 and f2.

for each t_i in s(f1)    /* page counts and position initialization */
    N(s(f1)) <- N(t_i)
    Pos(s(f1)) <- Pos(t_i)
for each t_j in s(f2)
    N(s(f2)) <- N(t_j)
    Pos(s(f2)) <- Pos(t_j)
for each t_i in s(f1)
    for each t_j in s(f2)
        if (t_i == t_j) sr(t_i, t_j) = 0    /* pruning */
        else sr(t_i, t_j) = f(N(t_i), N(t_j))    /* relatedness */
return maxRel(f1, f2) = f(Pos(t_i), Pos(t_j), sr(t_i, t_j))

Algorithm 1: MaxRel.

Schema 2 (identical tag pruning). This schema means that the identical tag pairs of two images $f_1$ and $f_2$ should be pruned in the MaxRel function. In other words, the semantic relatedness of the same tag of two images is set as 0.

The above schema is used to ensure that the proposed method measures the relatedness of two images. If we do not prune the identical tag pairs of two images, the proposed method will be transformed into a similarity measure. For example, the cosine similarity [36] between two tag vectors depends on the number of identical elements of the two vectors. The overall algorithm of the proposed computation model is presented in Algorithm 1.
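The contrast between pruning and a similarity measure can be illustrated as follows. This is a minimal sketch, assuming binary tag vectors for the cosine similarity (the vector space model in [36] also allows weighted terms); `pruned_sr` and `cosine_tags` are hypothetical names, and `relatedness` stands for any tag relatedness function.

```python
import math


def pruned_sr(tag1, tag2, relatedness):
    """Schema 2: identical tags contribute 0; other pairs use `relatedness`."""
    return 0.0 if tag1 == tag2 else relatedness(tag1, tag2)


def cosine_tags(tags1, tags2):
    """Cosine similarity of two binary tag vectors (assumed representation)."""
    a, b = set(tags1), set(tags2)
    if not a or not b:
        return 0.0
    # For binary vectors the dot product is the number of shared tags.
    return len(a & b) / math.sqrt(len(a) * len(b))


# Tag lists taken from Table 3; matching is case-sensitive here.
print(cosine_tags(["Dior", "lipstick", "makeup"], ["Dior", "Makeup", "Palette"]))
print(cosine_tags(["Gucci", "Handbags"], ["Chanel", "Coco Noir"]))  # 0.0
```

Under this representation the cosine score is driven only by shared tags, so two images whose tag lists do not overlap always score 0, whereas the pruned relatedness ignores shared tags and relies on how related the remaining tags are.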

5. Experimental Results

In this section, we evaluate the results of using the proposed method for relatedness measurement. In Section 5.1, we introduce the data set for the evaluation. In Section 5.2, we select the cooccurrence function used for tag relatedness measures. In Sections 5.3 and 5.4, clustering and retrieval are used to evaluate the proposed method.

5.1. The Data Sets. We choose Flickr groups as the resources for building data sets. Users on online photo sharing sites like Flickr have organized many millions of photos into hundreds of thousands of semantically themed groups. These groups expose implicit choices that users make about which images are similar. Flickr group membership is usually less noisy than Flickr tags because images are screened by group members. We download 1000 images from ten groups. These ten groups can be divided into two classes. The first class includes five groups: car, phone, flower, dog, and boat. The second class consists of another five groups: Louis Vuitton, Dior, Gucci, Cartier, and Chanel. These images are selected by humans, which reduces the noise of the data set. We choose two classes of groups because we want to test the accuracy of the proposed method against the semantic relatedness of the data set. The semantic relatedness of the second class is higher than that of the first class, since the second class is all about luxury brands. For example, almost all these brands produce handbags. Thus, if the proposed method does well on these groups, we may say that it can measure the semantic relatedness between Flickr images accurately and robustly. Table 2 gives the detailed information of the data set. Table 3 gives some selected tags from group 2.

Table 2: The detailed information of the data set.

| Group 1 | Average tags per image | Group 2 | Average tags per image |
|---|---|---|---|
| Car | 4.4 | Louis Vuitton | 3.1 |
| Phone | 3.5 | Dior | 3.2 |
| Flower | 2.2 | Gucci | 2.9 |
| Dog | 5.6 | Cartier | 2.8 |
| Boat | 3.1 | Chanel | 2.6 |

5.2. Relatedness Function Selection. In Section 4.1, four cooccurrence measures (i.e., Jaccard, Overlap, Dice, and PMI) are given for relatedness measures between tags. In [40], Rubenstein and Goodenough proposed a data set containing 28 word pairs rated by a group of 51 human subjects, which is a reliable benchmark for evaluating semantic similarity measures. The higher the correlation coefficient against the R-G ratings, the more accurate the method for measuring semantic similarity between words. Figure 5 gives the correlation coefficients of the four functions against the R-G test set. From Figure 5, we can see that PMI performs best on relatedness measures, since it has the highest correlation coefficient. Thus, in the later experiments, we select PMI as the relatedness measure between tags.
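For orientation, the four cooccurrence measures can be written from counts as in the sketch below. The formulas are the commonly used textbook definitions and may differ in detail (for example, in thresholds or the logarithm base) from those given in Section 4.1; the function names, the total count `n_total`, and the example counts are all hypothetical.

```python
import math


def jaccard(n_a, n_b, n_ab):
    """Shared count over the count of items carrying either tag."""
    union = n_a + n_b - n_ab
    return n_ab / union if union else 0.0


def overlap(n_a, n_b, n_ab):
    """Shared count over the count of the rarer tag."""
    smaller = min(n_a, n_b)
    return n_ab / smaller if smaller else 0.0


def dice(n_a, n_b, n_ab):
    """Twice the shared count over the sum of both counts."""
    total = n_a + n_b
    return 2 * n_ab / total if total else 0.0


def pmi(n_a, n_b, n_ab, n_total):
    """Pointwise mutual information (base 2) from raw counts."""
    if not (n_a and n_b and n_ab):
        return 0.0
    p_a, p_b, p_ab = n_a / n_total, n_b / n_total, n_ab / n_total
    return math.log2(p_ab / (p_a * p_b))


# Hypothetical counts: tag a on 1000 items, tag b on 500, both on 100,
# out of 1,000,000 indexed items.
print(round(jaccard(1000, 500, 100), 4))       # 0.0714
print(round(overlap(1000, 500, 100), 4))       # 0.2
print(round(dice(1000, 500, 100), 4))          # 0.1333
print(round(pmi(1000, 500, 100, 10**6), 2))    # 7.64
```

Unlike the other three measures, PMI is not bounded by 1, so its values would need rescaling before being mixed with scores in the range [0, 1].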

Table 3: The selected tags of group 2 from Flickr.

| Group 2 | Tags | Tags | Tags | Tags | Tags |
|---|---|---|---|---|---|
| Louis Vuitton | “Louis Vuitton” “Keepall” | “Louis Vuitton” “Alma” | “Louis Vuitton” “Tivoli” | “Louis Vuitton” “Bolsas” | “LV” “Multicolore” |
| Dior | “DIOR” “lipstick” “makeup” | “Dior” “Diorskin Nude” “Tan Sun Powder” | “Dior” “Makeup” “Palette” | “Dior” “Addict 2” | “Dior” “Jadore” “Perfume” |
| Gucci | “Gucci” “Leather Belts” | “Gucci” “Trainers” | “Gucci” “Jolie Leopard” “Orange” | “Replica” “Gucci” “Handbags” | “Gucci” “Cruise” |
| Cartier | “Cartier” “Pasha” “Chronograph” | “CARTIER” “Love Bracelet” | “Cartier” “Santos Galbee” | “Calibre” “Cartier” | “Cartier Watch” “Tank Francaise” |
| Chanel | “Chanel” “Coco Noir” | “Chanel” “Chanel Riva” | “Chanel nail polish” “Coco Mademoiselle” | “Chanel” “No 5” | “Chance” “Chanel” |

5.3. Evaluation on Image Clustering. In this section, we evaluate the correctness of using tag order. In Section 4.3, we add the position information of each tag to the semantic relatedness measures. The tags in high positions are treated as the major elements for semantic relatedness measures. We evaluate the use of tag order by the clustering task. We employ the proposed semantic relatedness of images in the $K$-means [41] clustering model. Since the $K$-means model depends on the initial points, we randomly select the core points 100 times. We evaluate the effectiveness of the clustering with three quality measures: $F$-measure, Purity, and Entropy [41]. We treat each cluster as if it were the result of the proposed method and each class as if it were the desired set of images. Generally, we would like to maximize the $F$-measure and Purity and minimize the Entropy of the clusters to achieve a high-quality clustering. Moreover, we compare the clustering results of the proposed method with and without tag order. Figures 6 and 7 give the clustering results of the group 1 and group 2 data sets. From Figures 6 and 7, we can conclude the following.

(1) The proposed method performs better than cosine based clustering. This result can be obtained from Figures 6 and 7. The three metrics ($F$-measure, purity, and entropy) of the proposed method are better than those of cosine based clustering. This may be caused by the inherent feature of the proposed method. The proposed method is based on semantic relatedness rather than on the tag overlap used by cosine based clustering. If the tags of two images do not overlap, cosine based clustering cannot relate them.

(2) The schema of using tag order is effective. This result can also be obtained from Figures 6 and 7. The three metrics ($F$-measure, purity, and entropy) of using tag order are the best. The position information reflects the importance of each tag. The proposed method emphasizes the tags in high positions, which raises the performance on image clustering.

(3) The proposed method is robust across different data sets. The proposed method performs well on both the group 1 and group 2 data sets. It is worth noting that the difference between the proposed method and the cosine method on group 2 is larger than that on group 1. The reason is that the semantic correlation within group 2 is stronger than within group 1. In other words, the performance of the proposed method relies on the semantic correlation of the classes in the data sets. The stronger the semantic correlation between the classes of data, the better the proposed method performs relative to the cosine method.
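Two of the three quality measures reported in Figures 6 and 7 can be sketched as below, following the usual size-weighted definitions of purity and entropy. The logarithm base and the absence of any normalization are assumptions, so the scale of the entropy values may not match the figures; the function name and the toy labels are hypothetical, and the $F$-measure is left out for brevity.

```python
import math
from collections import Counter


def purity_and_entropy(clusters, classes):
    """Size-weighted purity and entropy of a clustering.

    `clusters` and `classes` are parallel lists: the cluster assigned to
    each image and the ground-truth group of that image.
    """
    total = len(clusters)
    by_cluster = {}
    for cluster, label in zip(clusters, classes):
        by_cluster.setdefault(cluster, Counter())[label] += 1

    purity = entropy = 0.0
    for counts in by_cluster.values():
        size = sum(counts.values())
        # Purity: share of images that belong to the cluster's majority class.
        purity += max(counts.values()) / total
        # Entropy of the class distribution inside this cluster (base 2).
        cluster_entropy = -sum(
            (c / size) * math.log2(c / size) for c in counts.values()
        )
        entropy += (size / total) * cluster_entropy
    return purity, entropy


# Toy example: one mixed cluster and one pure cluster.
purity, entropy = purity_and_entropy(
    [0, 0, 0, 1, 1, 1], ["car", "car", "dog", "dog", "dog", "dog"]
)
print(round(purity, 3), round(entropy, 3))  # 0.833 0.459
```

A perfect clustering gives purity 1 and entropy 0, which is why higher purity and lower entropy are read as better in the comparison above.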

5.4. Evaluation on Image Searching. In this section, we evaluate the proposed method on a query-based image searching task. Five queries from group 2 are selected as the test set: “Louis Vuitton,” “Gucci,” “Chanel,” “Cartier,” and “Dior.” These queries are searched in Flickr, and the top 50 images of each query are obtained as the data set. Moreover, we remove the query from the tags of each image. For example, the tag “Cartier” is removed from the top 50 images of the query “Cartier.” The reason for this operation is that the proposed method is based on semantic relatedness rather than cooccurrence. We choose cut-off point precision to evaluate the proposed method on image searching. The cut-off point precision ($P_n$) is the percentage of correct results among the top $n$ returned results. We compute $P_1$, $P_5$, and $P_{10}$ on the group 2 test set. Table 4 lists the comparison of the cut-off point precision between the proposed method and Flickr. From the experimental results, we can conclude the following.

(1) The proposed method performs better than Flickr. In Table 4, the $P_1$, $P_5$, and $P_{10}$ of the proposed method are at least as high as those of Flickr for every query and are strictly higher for all $P_5$ and $P_{10}$ values. The experimental results support the correctness of the proposed method on the image searching task.

(2) The proposed method can handle the relatedness searching problem. The proposed method can measure the semantic relatedness of two images robustly and correctly.

(3) The proposed method can support the faceted exploration of image search. Faceted exploration of search results is widely used in search interfaces for structured databases. Recently, faceted exploration has also appeared in online search engines in the form of search assistants. The proposed method can measure the semantic relatedness of two images. Given the search queries, we can select the related images for faceted search.

Table 4: The comparison of the cut-off point precision (in percent) between the proposed method and Flickr.

| Cut-off point | Louis Vuitton | Gucci | Dior | Chanel | Cartier |
|---|---|---|---|---|---|
| $P_1$ | 100 | 100 | 100 | 100 | 100 |
| $P_1$ (Flickr) | 100 | 100 | 0 | 100 | 100 |
| $P_5$ | 100 | 100 | 100 | 100 | 100 |
| $P_5$ (Flickr) | 80 | 60 | 60 | 60 | 80 |
| $P_{10}$ | 100 | 100 | 100 | 100 | 100 |
| $P_{10}$ (Flickr) | 90 | 70 | 70 | 80 | 80 |
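The cut-off point precision used in this evaluation can be computed as in the following sketch. The function name `precision_at` and the relevance judgments are hypothetical and only illustrate how a value such as 80 arises; they are not the judgments used in the experiments.

```python
def precision_at(relevance, n):
    """Percentage of correct results among the top n returned results.

    `relevance` lists, in ranked order, whether each returned image was
    judged correct (True/1) or not (False/0).
    """
    if n <= 0:
        return 0.0
    return 100.0 * sum(relevance[:n]) / n


# Hypothetical judgments for the top 10 results of one query.
judged = [1, 1, 0, 1, 1, 1, 1, 1, 0, 1]
print(precision_at(judged, 1))   # 100.0
print(precision_at(judged, 5))   # 80.0
print(precision_at(judged, 10))  # 80.0
```

Dividing by $n$ rather than by the number of returned results means that a query returning fewer than $n$ images is penalized for the missing ones, which is one of several possible conventions.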


[28] D Ramage P Heymann C D Manning and H Garcia-Molina ldquoClustering the tagged webrdquo in Proceedings of the 2ndACM International Conference on Web Search and Data Mining(WSDM rsquo09) pp 54ndash63 February 2009

[29] D Zhou J Bian S Zheng H Zha and C L G C Lee GilesldquoExploring social annotations for information retrievalrdquo inProceedings of the 17th International Conference on World WideWeb (WWW rsquo08) pp 715ndash724 April 2008

[30] S Xu S Bad Y Cao and Y Yu ldquoUsing social annotations toimprove language model for information retrievalrdquo in Proceed-ings of the 16th ACM Conference on Information and KnowledgeManagement (CIKM rsquo07) pp 1003ndash1006 November 2007

[31] S Bao G Xue X Wu Y Yu B Fei and Z Su ldquoOptimizingweb search using social annotationsrdquo in Proceedings of the 16thInternationalWorldWideWeb Conference (WWW rsquo07) pp 501ndash510 May 2007

[32] R Schenkel T Crecelius M Kacimi et al ldquoEfficient top-k que-rying over social-tagging networksrdquo in Proceedings of the 31stAnnual International ACM SIGIR Conference on Research andDevelopment in Information Retrieval (ACM SIGIR rsquo08) pp523ndash530 July 2008

[33] N Rasiwasia P J Moreno and N Vasconcelos ldquoBridging thegap query by semantic examplerdquo IEEE Transactions on Multi-media vol 9 no 5 pp 923ndash938 2007

[34] R Fergus L Fei-Fei P Perona and A Zisserman ldquoLearningobject categories from Googlersquos image searchrdquo in Proceedingsof the 10th IEEE International Conference on Computer Vision(ICCV rsquo05) vol 2 pp 1816ndash1823 October 2005

12 The Scientific World Journal

[35] GWang D Hoiem and D Forsyth ldquoLearning image similarityfrom flickr groups using fast kernel machinesrdquo IEEE Transac-tions on Pattern Analysis and Machine Intelligence vol 34 no11 pp 2177ndash2188 2012

[36] G Salton A Wong and C S Yang ldquoA vector space model forautomatic indexingrdquo Communications of the ACM vol 18 no11 pp 613ndash620 1975

[37] Z Xu X Luo J Yu andW Xu ldquoMeasuring semantic similaritybetween words by removing noise and redundancy in websnippetsrdquo Concurrency Computation Practice and Experiencevol 23 no 18 pp 2496ndash2510 2011

[38] R Firth ldquoA synopsis of linguistic theory 1930ndash1955rdquo in Studiesin Linguistic Analysis Philological Society Oxford UK 1957

[39] M Vojnovic J Cruise D Gunawardena and P MarbachldquoRanking and suggesting popular itemsrdquo IEEE Transactions onKnowledge and Data Engineering vol 21 no 8 pp 1133ndash11462009

[40] H Rubenstein and B Goodenough ldquoContextual correlates ofsynonymyrdquo Communications of the ACM vol 8 no 10 pp 627ndash633 1965

[41] M Steinbach G Karypis and V Kumar ldquoA comparison of doc-ument clustering techniquesrdquo in Proceedings of the KDDWork-shop on Text Mining 2000


Algorithm 1: MaxRel.

Input: The tag sets $s(f_1)$ and $s(f_2)$ of two images $f_1$ and $f_2$.
Output: The semantic relatedness of the two images $f_1$ and $f_2$.

for each $t_i \in s(f_1)$    /* page counts and position initial */
    $N(s(f_1)) \leftarrow N(t_i)$
    $\mathrm{Pos}(s(f_1)) \leftarrow \mathrm{Pos}(t_i)$
for each $t_j \in s(f_2)$
    $N(s(f_2)) \leftarrow N(t_j)$
    $\mathrm{Pos}(s(f_2)) \leftarrow \mathrm{Pos}(t_j)$
for each $t_i \in s(f_1)$
    for each $t_j \in s(f_2)$
        if ($t_i == t_j$) $\mathrm{sr}(t_i, t_j) = 0$    /* pruning */
        else $\mathrm{sr}(t_i, t_j) = f(N(t_i), N(t_j))$    /* relatedness */
return $\mathrm{maxRel}(f_1, f_2) = f(\mathrm{Pos}(t_i), \mathrm{Pos}(t_j), \mathrm{sr}(t_i, t_j))$

Schema 2 (identical tag pruning). This schema means that the identical tag pairs of two images $f_1$ and $f_2$ should be pruned in the MaxRel function. In other words, the semantic relatedness of a tag that appears in both images is set to 0.

The above schema ensures that the method measures the relatedness of two images rather than their similarity. If we do not prune the identical tag pairs of two images, the proposed method degenerates into a similarity measure. For example, the cosine similarity [36] between two tag sets essentially counts the identical elements of the two vectors. The overall algorithm of the proposed computation model is presented in Algorithm 1.
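The pruning rule of Schema 2 and the pairwise loop of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy counts, and the choice of the Jaccard coefficient as the relatedness function are all hypothetical, and the final position-weighted integration step of Algorithm 1 is omitted because its exact form is not given in this part of the paper.

```python
from itertools import product


def jaccard(n_i: int, n_j: int, n_ij: int) -> float:
    """Jaccard coefficient from counts (an assumed choice of relatedness function)."""
    union = n_i + n_j - n_ij
    return n_ij / union if union > 0 else 0.0


def pairwise_relatedness(tags_1, tags_2, count, co_count):
    """Return sr(t_i, t_j) for every tag pair, pruning identical tags (Schema 2)."""
    sr = {}
    for t_i, t_j in product(tags_1, tags_2):
        if t_i == t_j:
            # Identical tag pruning: the same tag contributes no relatedness.
            sr[(t_i, t_j)] = 0.0
        else:
            pair = frozenset((t_i, t_j))
            sr[(t_i, t_j)] = jaccard(count[t_i], count[t_j], co_count.get(pair, 0))
    return sr


# Hypothetical occurrence and cooccurrence counts, for illustration only.
count = {"gucci": 100, "handbag": 80, "dior": 60}
co_count = {
    frozenset(("gucci", "handbag")): 30,
    frozenset(("dior", "handbag")): 20,
}
sr = pairwise_relatedness(["gucci", "handbag"], ["dior", "handbag"], count, co_count)
```

In this toy example the pair ("handbag", "handbag") is pruned to 0, so the two images are related only through pairs of distinct tags such as ("gucci", "handbag").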

5. Experimental Results

In this section, we evaluate the proposed method for relatedness measurement. In Section 5.1, we introduce the data set for the evaluation. In Section 5.2, we select the cooccurrence function used for tag relatedness measures. In Sections 5.3 and 5.4, clustering and retrieval tasks are used to evaluate the proposed method.

5.1. The Data Sets. We choose Flickr groups as the resources for building data sets. Users on online photo sharing sites like Flickr have organized many millions of photos into hundreds of thousands of semantically themed groups. These groups expose implicit choices that users make about which images are similar. Flickr group membership is usually less noisy than Flickr tags because images are screened by group members. We download 1000 images from ten groups. These ten groups can be divided into two classes. The first class includes five groups, which are car, phone, flower, dog, and boat. The second class consists of another five groups, which are Louis Vuitton, Dior, Gucci, Cartier, and Chanel. The images are selected by humans, which reduces the noise of the data set. We choose two classes of groups because we want to test the accuracy of the proposed method on data sets with different degrees of semantic relatedness. The semantic relatedness within the second class is higher than that within the first class, since the second class is all about luxury brands. For example, almost all these brands produce handbags. Thus, if the proposed method can do well in these groups, we may say that it can measure the semantic relatedness between Flickr images accurately and robustly. Table 2 gives the detailed information of the data set. Table 3 gives some selected tags from group 2.

Table 2: The detailed information of the data set.

| Group 1 | Average tags per image | Group 2 | Average tags per image |
|---|---|---|---|
| Car | 4.4 | Louis Vuitton | 3.1 |
| Phone | 3.5 | Dior | 3.2 |
| Flower | 2.2 | Gucci | 2.9 |
| Dog | 5.6 | Cartier | 2.8 |
| Boat | 3.1 | Chanel | 2.6 |

5.2. Relatedness Function Selection. In Section 4.1, four cooccurrence measures (i.e., Jaccard, Overlap, Dice, and PMI) are given for relatedness measures between tags. In [40], Rubenstein and Goodenough proposed a data set containing 28 word pairs rated by a group of 51 human subjects, which is a reliable benchmark for evaluating semantic similarity measures. The higher the correlation coefficient against the R-G ratings is, the more accurate the method for measuring semantic similarity between words is. Figure 5 gives the correlation coefficients of the four functions against the R-G test set. From Figure 5, we can see that PMI performs best, since it has the highest correlation coefficient. Thus, in the later experiments, we select PMI as the relatedness measure between tags.
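The scoring of a relatedness function against human ratings can be sketched as below. The text does not say which correlation coefficient is used, so the Pearson coefficient here is an assumption; the function name and the toy numbers are hypothetical and are not taken from the R-G data set.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical human ratings and scores of one candidate measure
# for the same three word pairs (illustrative values only).
human_ratings = [3.9, 3.1, 0.4]
measure_scores = [0.8, 0.5, 0.1]
correlation = pearson(human_ratings, measure_scores)
```

Running the same computation once per candidate function and keeping the one with the largest coefficient mirrors the selection described above.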

Table 3: The selected tags of group 2 from Flickr.

| Group 2 | Tags | Tags | Tags | Tags | Tags |
|---|---|---|---|---|---|
| Louis Vuitton | "Louis Vuitton", "Keepall" | "Louis Vuitton", "Alma" | "Louis Vuitton", "Tivoli" | "Louis Vuitton", "Bolsas" | "LV", "Multicolore" |
| Dior | "DIOR", "lipstick", "makeup" | "Dior", "Diorskin Nude", "Tan Sun Powder" | "Dior", "Makeup", "Palette" | "Dior", "Addict 2" | "Dior", "J'adore", "Perfume" |
| Gucci | "Gucci", "Leather Belts" | "Gucci", "Trainers" | "Gucci", "Jolie Leopard", "Orange" | "Replica", "Gucci", "Handbags" | "Gucci", "Cruise" |
| Cartier | "Cartier", "Pasha", "Chronograph" | "CARTIER", "Love Bracelet" | "Cartier", "Santos Galbee" | "Calibre", "Cartier" | "Cartier Watch", "Tank Francaise" |
| Chanel | "Chanel", "Coco Noir" | "Chanel", "Chanel Riva", "Chanel nail polish" | "Coco Mademoiselle" | "Chanel", "No. 5" | "Chance", "Chanel" |

5.3. Evaluation on Image Clustering. In this section, we evaluate the benefit of using tag order. In Section 4.3, we add the position information of each tag to the semantic relatedness measures. The tags with high positions are treated as the major elements for semantic relatedness measures. We evaluate the use of tag order with a clustering task. We plug the proposed semantic relatedness of images into the $K$-means [41] clustering model. Since the $K$-means model depends on the initial points, we randomly select the core points 100 times. We evaluate the effectiveness of the clustering with three quality measures from the document clustering literature: $F$-measure, Purity, and Entropy [41]. We treat each cluster as if it were the result of the proposed method and each class as if it were the desired set of images. Generally, we would like to maximize the $F$-measure and Purity and minimize the Entropy of the clusters to achieve a high-quality clustering. Moreover, we compare the clustering results of the proposed method with and without tag order. Figures 6 and 7 give the clustering results of the group 1 and group 2 data sets. From Figures 6 and 7, we can conclude the following.

(1) The proposed method performs better than cosine based clustering. This result can be obtained from Figures 6 and 7. The $F$-measure and purity of the proposed method are higher, and its entropy is lower, than those of cosine based clustering. This may be caused by an inherent feature of the proposed method: it is based on semantic relatedness rather than on the cooccurrence of identical tags used by cosine based clustering. If the tags of two images do not overlap, cosine based clustering cannot relate them.

(2) The schema of using tag order is effective. This result can also be obtained from Figures 6 and 7. With tag order, the $F$-measure and purity are the highest and the entropy is the lowest. The position information reflects the importance of each tag. The proposed method emphasizes the tags with high positions, which raises the performance on image clustering.

(3) The proposed method is robust across different data sets. The proposed method performs well on both the group 1 and group 2 data sets. It is worth noting that the difference between the proposed method and the cosine method is larger for group 2 than for group 1. The reason is that the semantic correlation of group 2 is stronger than that of group 1. In other words, the performance of the proposed method relies on the semantic correlation of the classes in the data set. The stronger the semantic correlation between the classes of data, the better the proposed method performs.
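The three quality measures used in this section can be computed from the class labels and the cluster assignments as sketched below. The definitions follow the commonly used forms from the document clustering literature; the use of a base-2 logarithm without normalization, the function name, and the toy labels are assumptions made for illustration and may differ from the exact variant used in the experiments.

```python
import math
from collections import Counter


def clustering_quality(labels, clusters):
    """Return (F-measure, purity, entropy) for a flat clustering."""
    n = len(labels)
    class_sizes = Counter(labels)
    cluster_sizes = Counter(clusters)
    joint = Counter(zip(labels, clusters))  # n_ck: items of class c in cluster k

    # Purity: weighted share of the dominant class in each cluster.
    purity = sum(
        max(joint[(c, k)] for c in class_sizes) for k in cluster_sizes
    ) / n

    # Entropy: weighted class entropy inside each cluster (lower is better).
    entropy = 0.0
    for k, n_k in cluster_sizes.items():
        h = 0.0
        for c in class_sizes:
            n_ck = joint[(c, k)]
            if n_ck:
                p = n_ck / n_k
                h -= p * math.log2(p)
        entropy += (n_k / n) * h

    # F-measure: for each class take its best-matching cluster, then weight by class size.
    f_measure = 0.0
    for c, n_c in class_sizes.items():
        best = 0.0
        for k, n_k in cluster_sizes.items():
            n_ck = joint[(c, k)]
            if n_ck:
                precision = n_ck / n_k
                recall = n_ck / n_c
                best = max(best, 2 * precision * recall / (precision + recall))
        f_measure += (n_c / n) * best

    return f_measure, purity, entropy


# Hypothetical labels: a perfect clustering and a degenerate single cluster.
perfect = clustering_quality(["car", "car", "dog", "dog"], [0, 0, 1, 1])
merged = clustering_quality(["car", "car", "dog", "dog"], [0, 0, 0, 0])
```

A perfect clustering yields an $F$-measure and purity of 1 and an entropy of 0, which is why higher values of the first two and lower values of the third indicate better clusters.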

5.4. Evaluation on Image Searching. In this section, we evaluate the proposed method on a query-based image searching task. Five queries from group 2 are selected as the test set: "Louis Vuitton," "Gucci," "Chanel," "Cartier," and "Dior." These queries are searched in Flickr, and the top 50 images of each query are obtained as the data set. Moreover, we remove the query term from the tags of each image. For example, for the query "Cartier," the tag "Cartier" is removed from the top 50 images. The reason for this operation is that the proposed method is based on semantic relatedness rather than cooccurrence. We choose cut-off point precision to evaluate the proposed method on image searching. The cut-off point precision (P@n) is the percentage of correct results among the top $n$ returned results. We compute P@1, P@5, and P@10 on the group 2 test set. Table 4 lists the comparison of the cut-off point precision between the proposed method and Flickr. From the experimental results, we can conclude the following.

(1) The proposed method performs better than Flickr. In Table 4, the P@5 and P@10 of the proposed method are higher than those of Flickr for every query, and its P@1 is at least as high. The experimental results support the effectiveness of the proposed method on the image searching task.

(2) The proposed method can handle the relatedness searching problem. The proposed method can measure the semantic relatedness of two images robustly and correctly.

(3) The proposed method can support the faceted exploration of image search. Faceted exploration of search results is widely used in search interfaces for structured databases. Recently, faceted exploration has also appeared in online search engines in the form of search assistants. The proposed method can measure the semantic relatedness of two images. Given the search queries, we can select the related images for faceted search.

Table 4: The comparison of the cut-off point precision (%) between the proposed method and Flickr.

| Cut-off point | Louis Vuitton | Gucci | Dior | Chanel | Cartier |
|---|---|---|---|---|---|
| P@1 | 100 | 100 | 100 | 100 | 100 |
| P@1 (Flickr) | 100 | 100 | 0 | 100 | 100 |
| P@5 | 100 | 100 | 100 | 100 | 100 |
| P@5 (Flickr) | 80 | 60 | 60 | 60 | 80 |
| P@10 | 100 | 100 | 100 | 100 | 100 |
| P@10 (Flickr) | 90 | 70 | 70 | 80 | 80 |
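The cut-off point precision reported in Table 4 can be computed as in the following sketch. The function name and the relevance list are hypothetical; treating a result list shorter than $n$ as having incorrect results in the missing positions is an assumption of this sketch.

```python
def precision_at_n(relevance, n):
    """Fraction of correct results among the top n returned results.

    `relevance` is a ranked list of booleans (True = correct result).
    Positions missing from a short list count as incorrect (an assumption).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return sum(1 for is_correct in relevance[:n] if is_correct) / n


# Hypothetical ranked judgments for one query: the third result is wrong.
ranked = [True, True, False, True, True, True, True, True, True, True]
p_at_1 = precision_at_n(ranked, 1)
p_at_5 = precision_at_n(ranked, 5)
p_at_10 = precision_at_n(ranked, 10)
```

With this hypothetical ranking, four of the top five results are correct, which corresponds to an entry of 80 in the percentage scale of Table 4.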

6. Conclusions

This paper discusses semantic relatedness measures, puts forward a method to measure the semantic relatedness of two images based on their tags, and justifies its validity through experiments. The major contributions are summarized as follows.

(1) We propose a framework to measure the semantic relatedness between Flickr images using tags. Firstly, cooccurrence measures are used to compute the relatedness of tags between two images. Secondly, we transform the integration of tag relatedness into an assignment problem in a bipartite graph, which finds an appropriate matching for the semantic relatedness of images. Finally, a decline factor considering the position information of tags is used in the proposed framework, which reduces the noise and redundancy in the social tags.

(2) A real data set including 1000 images from Flickr with ten classes is used in our experiments. Two evaluation tasks, clustering and searching, are performed, which show that the proposed method can measure the semantic relatedness between Flickr images accurately and robustly.

(3) We extend the relatedness measures between concepts to the level of images. Since association is a basic mechanism of the brain, the proposed relatedness measurement can facilitate related applications such as searching and recommendation.

Figure 5: The correlation of four selected functions (vertical axis: correlation). Jaccard: 0.346; Dice: 0.395; Overlap: 0.421; PMI: 0.579.

Figure 6: The clustering results of group 1 data sets ($F$-measure, Purity, Entropy). Using tag order: 0.912, 0.967, 0.011; not using tag order: 0.857, 0.922, 0.018; cosine: 0.732, 0.751, 0.056.

Figure 7: The clustering results of group 2 data sets ($F$-measure, Purity, Entropy). Using tag order: 0.876, 0.927, 0.023; not using tag order: 0.827, 0.852, 0.031; cosine: 0.632, 0.655, 0.085.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported in part by the National Science and Technology Major Project under Grant no. 2013ZX01033002-003, in part by the National High Technology Research and Development Program of China (863 Program) under Grant nos. 2013AA014601 and 2013AA014603, in part by the National Key Technology Support Program under Grant no. 2012BAH07B01, in part by the National Science Foundation of China under Grant no. 61300202, and in part by the Science Foundation of Shanghai under Grant no. 13ZR1452900.

References

[1] J. Goldberger, S. Gordon, and H. Greenspan, "Unsupervised image-set clustering using an information theoretic framework," IEEE Transactions on Image Processing, vol. 15, no. 2, pp. 449–458, 2006.

[2] T. Evgeniou, M. Pontil, C. Papageorgiou, and T. Poggio, "Image representations and feature selection for multimedia database search," IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 4, pp. 911–920, 2003.

[3] R. Ji, H. Yao, X. Sun, B. Zhong, and W. Gao, "Towards semantic embedding in visual vocabulary," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '10), pp. 918–925, June 2010.

[4] J. Fan, D. A. Keim, Y. Gao, H. Luo, and Z. Li, "JustClick: personalized image recommendation via exploratory search from large-scale Flickr images," IEEE Transactions on Circuits and Systems for Video Technology, vol. 19, no. 2, pp. 273–288, 2009.

[5] T. Gong, S. Li, and C. L. Tan, "A semantic similarity language model to improve automatic image annotation," in Proceedings of the 22nd International Conference on Tools with Artificial Intelligence (ICTAI '10), pp. 197–203, October 2010.

[6] C. Schmid and R. Mohr, "Local grayvalue invariants for image retrieval," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 5, pp. 530–535, 1997.

[7] M. Varma and A. Zisserman, "A statistical approach to texture classification from single images," International Journal of Computer Vision, vol. 62, no. 1-2, pp. 61–81, 2005.

[8] S. Belongie, J. Malik, and J. Puzicha, "Shape matching and object recognition using shape contexts," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 4, pp. 509–522, 2002.

[9] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), vol. 1, pp. 886–893, June 2005.

[10] D. Huang, M. Ardabilian, Y. Wang, and L. Chen, "Asymmetric 3D/2D face recognition based on LBP facial representation and canonical correlation analysis," in Proceedings of the 16th IEEE International Conference on Image Processing (ICIP '09), pp. 3325–3328, November 2009.

[11] L. Wang, Y. Zhang, and J. Feng, "On the Euclidean distance of images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 8, pp. 1334–1339, 2005.

[12] W. Jia, H. Zhang, X. He, and Q. Wu, "Gaussian weighted histogram intersection for license plate classification," in Proceedings of the 18th International Conference on Pattern Recognition (ICPR '06), pp. 574–577, August 2006.

[13] Y. Rubner, C. Tomasi, and L. J. Guibas, "A metric for distributions with applications to image databases," in Proceedings of the IEEE 6th International Conference on Computer Vision, pp. 59–66, January 1998.

[14] L. Wu, X.-S. Hua, N. Yu, W.-Y. Ma, and S. Li, "Flickr distance: a relationship measure for visual concepts," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 5, pp. 863–875, 2012.

[15] D. Cai, "An information-theoretic foundation for the measurement of discrimination information," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 9, pp. 1262–1273, 2010.

[16] P. van den Broek, "Using texts in science education: cognitive processes and knowledge representation," Science, vol. 328, no. 5977, pp. 453–456, 2010.

[17] C. Bizer, T. Heath, and T. Berners-Lee, "Linked data - the story so far," International Journal on Semantic Web and Information Systems, vol. 5, no. 3, pp. 1–22, 2009.

[18] H. Zhuge, "Communities and emerging semantics in semantic link network: discovery and learning," IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 6, pp. 785–799, 2009.

[19] H. Zhuge, "Semantic linking through spaces for cyber-physical-socio intelligence: a methodology," Artificial Intelligence, vol. 175, no. 5-6, pp. 988–1019, 2011.

[20] X. Luo, Z. Xu, J. Yu, and X. Chen, "Building association link network for semantic link on web resources," IEEE Transactions on Automation Science and Engineering, vol. 8, no. 3, pp. 482–494, 2011.

[21] S. A. Golder and B. A. Huberman, "Usage patterns of collaborative tagging systems," Journal of Information Science, vol. 32, no. 2, pp. 198–208, 2006.

[22] H. S. Al-Khalifa and H. C. Davis, "Measuring the semantic value of folksonomies," in Proceedings of the Innovations in Information Technology (IIT '06), pp. 1–5, November 2006.

[23] F. M. Suchanek, M. Vojnovic, and D. Gunawardena, "Social tags: meaning and suggestions," in Proceedings of the 17th ACM Conference on Information and Knowledge Management (CIKM '08), pp. 223–232, October 2008.

[24] H. Halpin, V. Robu, and H. Shepherd, "The complex dynamics of collaborative tagging," in Proceedings of the 16th International World Wide Web Conference (WWW '07), pp. 211–220, May 2007.

[25] C. Cattuto, C. Schmitz, A. Baldassarri et al., "Network properties of folksonomies," AI Communications, vol. 20, no. 4, pp. 245–262, 2007.

[26] R. Lambiotte and M. Ausloos, "Collaborative tagging as a tripartite network," in Computational Science, vol. 3393 of Lecture Notes in Computer Science, pp. 1114–1117, 2006.

[27] U. Maulik, S. Bandyopadhyay, and I. Saha, "Integrating clustering and supervised learning for categorical data analysis," IEEE Transactions on Systems, Man, and Cybernetics A, vol. 40, no. 4, pp. 664–675, 2010.

[28] D. Ramage, P. Heymann, C. D. Manning, and H. Garcia-Molina, "Clustering the tagged web," in Proceedings of the 2nd ACM International Conference on Web Search and Data Mining (WSDM '09), pp. 54–63, February 2009.

[29] D. Zhou, J. Bian, S. Zheng, H. Zha, and C. L. Giles, "Exploring social annotations for information retrieval," in Proceedings of the 17th International Conference on World Wide Web (WWW '08), pp. 715–724, April 2008.

[30] S. Xu, S. Bao, Y. Cao, and Y. Yu, "Using social annotations to improve language model for information retrieval," in Proceedings of the 16th ACM Conference on Information and Knowledge Management (CIKM '07), pp. 1003–1006, November 2007.

[31] S. Bao, G. Xue, X. Wu, Y. Yu, B. Fei, and Z. Su, "Optimizing web search using social annotations," in Proceedings of the 16th International World Wide Web Conference (WWW '07), pp. 501–510, May 2007.

[32] R. Schenkel, T. Crecelius, M. Kacimi et al., "Efficient top-k querying over social-tagging networks," in Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (ACM SIGIR '08), pp. 523–530, July 2008.

[33] N. Rasiwasia, P. J. Moreno, and N. Vasconcelos, "Bridging the gap: query by semantic example," IEEE Transactions on Multimedia, vol. 9, no. 5, pp. 923–938, 2007.

[34] R. Fergus, L. Fei-Fei, P. Perona, and A. Zisserman, "Learning object categories from Google's image search," in Proceedings of the 10th IEEE International Conference on Computer Vision (ICCV '05), vol. 2, pp. 1816–1823, October 2005.

[35] G. Wang, D. Hoiem, and D. Forsyth, "Learning image similarity from Flickr groups using fast kernel machines," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 11, pp. 2177–2188, 2012.

[36] G. Salton, A. Wong, and C. S. Yang, "A vector space model for automatic indexing," Communications of the ACM, vol. 18, no. 11, pp. 613–620, 1975.

[37] Z. Xu, X. Luo, J. Yu, and W. Xu, "Measuring semantic similarity between words by removing noise and redundancy in web snippets," Concurrency and Computation: Practice and Experience, vol. 23, no. 18, pp. 2496–2510, 2011.

[38] J. R. Firth, "A synopsis of linguistic theory 1930–1955," in Studies in Linguistic Analysis, Philological Society, Oxford, UK, 1957.

[39] M. Vojnovic, J. Cruise, D. Gunawardena, and P. Marbach, "Ranking and suggesting popular items," IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 8, pp. 1133–1146, 2009.

[40] H. Rubenstein and B. Goodenough, "Contextual correlates of synonymy," Communications of the ACM, vol. 8, no. 10, pp. 627–633, 1965.

[41] M. Steinbach, G. Karypis, and V. Kumar, "A comparison of document clustering techniques," in Proceedings of the KDD Workshop on Text Mining, 2000.


Page 9: Research Article Measuring Semantic Relatedness between ...downloads.hindawi.com/journals/tswj/2014/758089.pdf · as Linked Open Data (LOD) [ ]andSemanticLink Network (SLN) [ ], the

The Scientific World Journal 9

Table 3 The selected tags of group 2 from Flickr

Group 2 Tags Tags Tags Tags Tags

Louis Vuitton ldquoLouis VuittonrdquoldquoKeepallrdquo

ldquoLouis VuittonrdquoldquoAlmardquo

ldquoLouis VuittonrdquoldquoTivolirdquo

ldquoLouis VuittonrdquoldquoBolsasrdquo

ldquoLVrdquoldquoMulticolorerdquo

DiorldquoDIORrdquoldquolipstickrdquoldquomakeuprdquo

ldquoDiorrdquoldquoDiorskin NuderdquoldquoTan Sun Powderrdquo

ldquoDiorrdquoldquoMakeuprdquoldquoPaletterdquo

ldquoDiorrdquoldquoAddict 2rdquo

ldquoDiorrdquoldquoJadorerdquoldquoPerfumerdquo

Gucci ldquoGuccirdquoldquoLeather Beltsrdquo

ldquoGuccirdquoldquoTrainersrdquo

ldquoGuccirdquoldquoJolie Leopardrdquo

ldquoOrangerdquo

ldquoReplicardquoldquoGuccirdquo

ldquoHandbagsrdquo

ldquoGuccirdquoldquoCruiserdquo

CartierldquoCartierrdquoldquoPashardquo

ldquoChronographrdquo

ldquoCARTIERrdquoldquoLove Braceletrdquo

ldquoCartierrdquoldquoSantos Galbeerdquo

ldquoCalibrerdquoldquoCartierrdquo

ldquoCartier WatchrdquoldquoTank

Francaiserdquo

Chanel ldquoChanelrdquoldquoCoco Noirrdquo

ldquoChanelrdquoldquoChanel Rivardquo

ldquoChanel nail polishrdquo

ldquoCocoMademoisellerdquo

ldquoChanelrdquoldquoNo 5rdquo

ldquoChancerdquoldquoChanelrdquo

method and each class as if it were the desired set of imagesGenerally we would like to maximize the 119865-measure andPurity and minimize the Entropy of the clusters to achievea high-quality document clustering Moreover we comparethe clustering results between the proposedmethod using tagorder or not Figures 6 and 7 give the clustering results ofgroup 1 and group 2 data sets From Figures 6 and 7 we canconclude the following

(1) The proposed method performs better than cosinebased clusteringThis result can be obtained fromFig-ures 6 and 7 The three metrics including 119865-measurepurity and entropy of the proposedmethod are betterthan cosine based clustering This may be caused bythe inherent feature of the proposedmethodThepro-posed method is based on the semantic relatednessother than the cooccurrence of the cosine based clus-tering If the tags of two images are not overlappedthe cosine based clustering may be unavailable

(2) The scheme of using tag order is effective. This result can also be seen in Figures 6 and 7: the three metrics (F-measure, purity, and entropy) obtained using tag order are the best of the three settings. The position information reflects the importance of each tag. The proposed method emphasizes the tags with high positions, which raises the performance on image clustering.

(3) The proposed method is robust across different data sets. It performs well on both the group 1 and the group 2 data sets. It is worth noting that the gap between the proposed method and the cosine method is larger on group 2 than on group 1. The reason is that the semantic correlation within group 2 is stronger than that within group 1. In other words, the performance of the proposed method depends on the semantic correlation of the classes in a data set: the stronger the semantic correlation between classes, the better the proposed method performs.
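The three clustering metrics compared above can be illustrated with a minimal Python sketch. It assumes the commonly used size-weighted definitions of purity, entropy, and F-measure for a clustering against ground-truth classes. The function names, the toy labels, and the normalization of entropy by the logarithm of the number of classes are assumptions of this sketch and are not taken from the paper.

```python
import math
from collections import Counter


def contingency(labels, clusters):
    """Count images per (cluster, class) pair."""
    table = {}
    for cls, clu in zip(labels, clusters):
        table.setdefault(clu, Counter())[cls] += 1
    return table


def purity(labels, clusters):
    """Fraction of images that belong to the majority class of their cluster."""
    table = contingency(labels, clusters)
    return sum(max(counts.values()) for counts in table.values()) / len(labels)


def entropy(labels, clusters):
    """Size-weighted class entropy of the clusters, scaled to [0, 1] (assumed normalization)."""
    table = contingency(labels, clusters)
    n = len(labels)
    k = len(set(labels))
    if k < 2:
        return 0.0
    total = 0.0
    for counts in table.values():
        size = sum(counts.values())
        h = -sum((c / size) * math.log(c / size) for c in counts.values())
        total += (size / n) * h
    return total / math.log(k)


def f_measure(labels, clusters):
    """For each class take the best F1 over all clusters, then weight by class size."""
    table = contingency(labels, clusters)
    n = len(labels)
    score = 0.0
    for cls, cls_size in Counter(labels).items():
        best = 0.0
        for counts in table.values():
            hit = counts.get(cls, 0)
            if hit == 0:
                continue
            precision = hit / sum(counts.values())
            recall = hit / cls_size
            best = max(best, 2 * precision * recall / (precision + recall))
        score += (cls_size / n) * best
    return score


# Hypothetical toy data: four images, two classes.
truth = ["dior", "dior", "gucci", "gucci"]
perfect = [0, 0, 1, 1]   # every cluster is one class
mixed = [0, 1, 0, 1]     # every cluster is half and half
```

With these definitions a good clustering has purity and F-measure close to 1 and entropy close to 0, which is the direction of improvement described in the text.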

5.4. Evaluation on Image Searching. In this section, we evaluate the proposed method on a query-based image searching task. Five queries from group 2 are selected as the test set: "Louis Vuitton," "Gucci," "Chanel," "Cartier," and "Dior." Each query is searched in Flickr, and the top 50 images are obtained as the data set. Moreover, we remove the query from the tags of each image. For example, for the query "Cartier," the tag "Cartier" is removed from the top 50 images. The reason for this operation is that the proposed method is based on semantic relatedness rather than on co-occurrence. We choose cut-off point precision to evaluate the proposed method on image searching. The cut-off point precision (Pn) is the percentage of correct results among the top n returned results. We compute P1, P5, and P10 on the group 2 test set. Table 4 lists the comparison of the cut-off point precision between the proposed method and Flickr. From the experimental results, we can conclude the following.

(1) The proposed method performs better than Flickr. In Table 4, the P1, P5, and P10 of the proposed method are at least as high as those of Flickr for every query, and strictly higher for P5 and P10. The experimental results support the effectiveness of the proposed method on the image searching task.

(2) The proposed method can handle the relatedness searching problem. The proposed method can measure the semantic relatedness of two images robustly and correctly.

(3) The proposed method can support faceted exploration of image search results. Faceted exploration of search results is widely used in search interfaces for structured databases. Recently, faceted exploration has also appeared in online search engines in the form of search assistants. Because the proposed method can measure the semantic relatedness of two images, given a search query we can select the related images for faceted search.
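The evaluation protocol of this section can be sketched in a few lines of Python. The sketch assumes that Pn is computed as the number of correct images among the top n results divided by n, expressed as a percentage, and that the query tag is removed case-insensitively. The function names and the image identifiers are hypothetical.

```python
def strip_query_tag(tags, query):
    """Remove the query itself from an image's tag list (case-insensitive, an assumption)."""
    return [t for t in tags if t.lower() != query.lower()]


def precision_at(n, ranked_ids, relevant_ids):
    """Cut-off point precision Pn: percentage of correct results among the top n."""
    if n <= 0:
        raise ValueError("n must be positive")
    hits = sum(1 for image_id in ranked_ids[:n] if image_id in relevant_ids)
    return 100.0 * hits / n


# Hypothetical ranking of five returned images and the set judged correct.
ranked = ["img3", "img7", "img1", "img9", "img4"]
relevant = {"img3", "img1", "img4", "img9"}

p1 = precision_at(1, ranked, relevant)   # 100.0
p5 = precision_at(5, ranked, relevant)   # 80.0
```

Dividing by n rather than by the number of returned images means that a ranking shorter than n is penalized, which is one reasonable reading of the definition above.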


Table 4: The comparison of the cut-off point precision (%) between the proposed method and Flickr.

Cut-off point   Louis Vuitton   Gucci   Dior   Chanel   Cartier
P1              100             100     100    100      100
P1 (Flickr)     100             100     0      100      100
P5              100             100     100    100      100
P5 (Flickr)     80              60      60     60       80
P10             100             100     100    100      100
P10 (Flickr)    90              70      70     80       80
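As a quick summary of Table 4, the unweighted mean of each row over the five queries can be computed as follows. These means are plain arithmetic on the table entries and are not reported in the paper; the variable names are hypothetical.

```python
# Rows of Table 4, in the column order Louis Vuitton, Gucci, Dior, Chanel, Cartier.
table4 = {
    "P1": [100, 100, 100, 100, 100],
    "P1 (Flickr)": [100, 100, 0, 100, 100],
    "P5": [100, 100, 100, 100, 100],
    "P5 (Flickr)": [80, 60, 60, 60, 80],
    "P10": [100, 100, 100, 100, 100],
    "P10 (Flickr)": [90, 70, 70, 80, 80],
}

# Unweighted mean precision (%) per row.
means = {row: sum(values) / len(values) for row, values in table4.items()}

for row, mean in means.items():
    print(f"{row:<13} {mean:5.1f}")
```

Over these five queries the Flickr baseline averages 80, 68, and 78 for P1, P5, and P10, against 100 in every row for the proposed method.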

6. Conclusions

This paper discusses semantic relatedness measures, systematically puts forward a method to measure the semantic relatedness of two images based on their tags, and justifies its validity through experiments. The major contributions are summarized as follows.

(1) We propose a framework to measure the semantic relatedness between Flickr images using tags. Firstly, co-occurrence measures are used to compute the relatedness of tags between two images. Secondly, we transform the integration of tag relatedness into an assignment problem on a bipartite graph, which finds an appropriate matching for the semantic relatedness of images. Finally, a decline factor that considers the position information of tags is used in the proposed framework, which reduces the noise and redundancy in the social tags.

(2) A real data set of 1000 Flickr images in ten classes is used in our experiments. Two evaluation tasks, clustering and searching, are performed; the results show that the proposed method can measure the semantic relatedness between Flickr images accurately and robustly.

(3) We extend relatedness measures between concepts to the level of images. Since the association relation is a basic mechanism of the brain, the proposed relatedness measurement can facilitate related applications such as searching and recommendation.
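The three steps of the framework in contribution (1) can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: tag relatedness is approximated by the Jaccard co-occurrence of two tags over a small hypothetical corpus, the decline factor is taken to be 1/(1 + position), and the bipartite assignment is solved by brute force over all matchings, which is only practical for short tag lists (a Hungarian-algorithm solver would be the usual choice at scale). All function names and data are hypothetical.

```python
from itertools import permutations


def jaccard_relatedness(tag_a, tag_b, corpus):
    """Step 1 (assumed form): co-occurrence of two tags over a corpus of tag sets."""
    if tag_a == tag_b:
        return 1.0
    with_a = {i for i, tags in enumerate(corpus) if tag_a in tags}
    with_b = {i for i, tags in enumerate(corpus) if tag_b in tags}
    union = with_a | with_b
    return len(with_a & with_b) / len(union) if union else 0.0


def decline(position):
    """Step 3 (assumed form): tags in earlier positions receive larger weights."""
    return 1.0 / (1.0 + position)


def image_relatedness(tags_x, tags_y, corpus):
    """Step 2: best one-to-one matching between the two ordered tag lists."""
    short, long_ = sorted((tags_x, tags_y), key=len)
    if not short:
        return 0.0
    best = 0.0
    # Try every way of assigning each tag of the shorter list to a distinct
    # tag of the longer list and keep the matching with the largest weight.
    for assignment in permutations(range(len(long_)), len(short)):
        total = sum(
            decline(i) * decline(j) * jaccard_relatedness(short[i], long_[j], corpus)
            for i, j in enumerate(assignment)
        )
        best = max(best, total)
    # Normalize so that two identical tag lists score 1.
    norm = sum(decline(i) * decline(i) for i in range(len(short)))
    return best / norm


# Hypothetical corpus of tagged images.
corpus = [
    {"chanel", "perfume", "no5"},
    {"chanel", "perfume"},
    {"dior", "perfume", "jadore"},
    {"dior", "lipstick", "makeup"},
    {"gucci", "belt"},
]

related = image_relatedness(["chanel", "no5"], ["perfume"], corpus)
unrelated = image_relatedness(["gucci", "belt"], ["perfume"], corpus)
```

In this toy corpus the first pair of images shares no tag, so a cosine measure over tag vectors would return zero, yet the sketch assigns it a positive score because "chanel" and "perfume" co-occur in the corpus.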

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported in part by the National Science and Technology Major Project under Grant no. 2013ZX01033002-003, in part by the National High Technology Research and Development Program of China (863 Program) under Grant nos. 2013AA014601 and 2013AA014603, in part by the National Key Technology Support Program under Grant no. 2012BAH07B01, in part by the National Science Foundation of China under Grant no. 61300202, and in part by the Science Foundation of Shanghai under Grant no. 13ZR1452900.

Figure 5: The correlation of four selected functions. Bar values: Jaccard 0.346, Dice 0.395, Overlap 0.421, PMI 0.579.

Figure 6: The clustering results of group 1 data sets. Bar values (F-measure, Purity, Entropy): using tag order 0.912, 0.967, 0.011; not using tag order 0.857, 0.922, 0.018; cosine 0.732, 0.751, 0.056.

Figure 7: The clustering results of group 2 data sets. Bar values (F-measure, Purity, Entropy): using tag order 0.876, 0.927, 0.023; not using tag order 0.827, 0.852, 0.031; cosine 0.632, 0.655, 0.085.

References

[1] J Goldberger S Gordon and H Greenspan ldquoUnsupervisedimage-set clustering using an information theoretic frame-workrdquo IEEE Transactions on Image Processing vol 15 no 2 pp449ndash458 2006

[2] T Evgeniou M Pontil C Papageorgiou and T Poggio ldquoImagerepresentations and feature selection for multimedia databasesearchrdquo IEEE Transactions on Knowledge and Data Engineeringvol 15 no 4 pp 911ndash920 2003

The Scientific World Journal 11

[3] R Ji H Yao X Sun B Zhong andW Gao ldquoTowards semanticembedding in visual vocabularyrdquo in Proceedings of the IEEEComputer Society Conference on Computer Vision and PatternRecognition (CVPR rsquo10) pp 918ndash925 June 2010

[4] J Fan D A Keim Y Gao H Luo and Z Li ldquoJustClick per-sonalized image recommendation via exploratory search fromlarge-scale Flickr imagesrdquo IEEE Transactions on Circuits andSystems for Video Technology vol 19 no 2 pp 273ndash288 2009

[5] T Gong S Li and C L Tan ldquoA semantic similarity languagemodel to improve automatic image annotationrdquo in Proceedingsof the 22nd International Conference on Tools with ArtificialIntelligence (ICTAI rsquo10) pp 197ndash203 October 2010

[6] C Schmid and R Mohr ldquoLocal grayvalue invariants for imageretrievalrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 19 no 5 pp 530ndash535 1997

[7] M Varma and A Zisserman ldquoA statistical approach to textureclassification from single imagesrdquo International Journal of Com-puter Vision vol 62 no 1-2 pp 61ndash81 2005

[8] S Belongie JMalik and J Puzicha ldquoShapematching and objectrecognition using shape contextsrdquo IEEE Transactions on PatternAnalysis and Machine Intelligence vol 24 no 4 pp 509ndash5222002

[9] N Dalal and B Triggs ldquoHistograms of oriented gradients forhuman detectionrdquo in Proceedings of the IEEE Computer SocietyConference on Computer Vision and Pattern Recognition (CVPRrsquo05) vol 1 pp 886ndash893 June 2005

[10] D Huang M Ardabilian Y Wang and L Chen ldquoAsymmetric3D2D face recognition based on LBP facial representation andcanonical correlation analysisrdquo in Proceedings of the 16th IEEEInternational Conference on Image Processing (ICIP rsquo09) pp3325ndash3328 November 2009

[11] L Wang Y Zhang and J Feng ldquoOn the Euclidean distanceof imagesrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 27 no 8 pp 1334ndash1339 2005

[12] W Jia H Zhang X He and Q Wu ldquoGaussian weighted his-togram intersection for license plate classificationrdquo in Proceed-ings of the 18th International Conference on Pattern Recognition(ICPR rsquo06) pp 574ndash577 August 2006

[13] Y Rubner C Tomasi and L J Guibas ldquoA Metric for distribu-tions with applications to image databasesrdquo in Proceedings of theIEEE 6th International Conference on Computer Vision pp 59ndash66 January 1998

[14] L Wu X-S Hua N Yu W-Y Ma and S Li ldquoFlickr distancea relationship measure for visual conceptsrdquo IEEE Transactionson Pattern Analysis and Machine Intelligence vol 34 no 5 pp863ndash875 2012

[15] D Cai ldquoAn information-theoretic foundation for the mea-surement of discrimination informationrdquo IEEE Transactions onKnowledge and Data Engineering vol 22 no 9 pp 1262ndash12732010

[16] P van den Broek ldquoUsing texts in science education cognitiveprocesses and knowledge representationrdquo Science vol 328 no5977 pp 453ndash456 2010

[17] C Bizer T Heath and T Berners-Lee ldquoLinked datamdashthe storyso farrdquo International Journal on Semantic Web and InformationSystems vol 5 no 3 pp 1ndash22 2009

[18] H Zhuge ldquoCommunities and emerging semantics in semanticlink network discovery and learningrdquo IEEE Transactions onKnowledge and Data Engineering vol 21 no 6 pp 785ndash7992009

[19] H Zhuge ldquoSemantic linking through spaces for cyber-physical-socio intelligence a methodologyrdquo Artificial Intelligence vol175 no 5-6 pp 988ndash1019 2011

[20] X Luo Z Xu J Yu and X Chen ldquoBuilding association linknetwork for semantic link on web resourcesrdquo IEEE Transactionson Automation Science and Engineering vol 8 no 3 pp 482ndash494 2011

[21] S A Golder and B A Huberman ldquoUsage patterns of collabo-rative tagging systemsrdquo Journal of Information Science vol 32no 2 pp 198ndash208 2006

[22] H S Al-Khalifa and H C Davis ldquoMeasuring the semanticvalue of folksonomiesrdquo in Proceedings of the Innovations inInformation Technology (IIT rsquo06) pp 1ndash5 November 2006

[23] F M Suchanek M Vojnovic and D Gunawardena ldquoSocialtags meaning and suggestionsrdquo in Proceedings of the 17th ACMConference on Information and Knowledge Management (CIKMrsquo08) pp 223ndash232 October 2008

[24] H Halpin V Robu and H Shepherd ldquoThe complex dynamicsof collaborative taggingrdquo in Proceedings of the 16th InternationalWorld Wide Web Conference (WWW rsquo07) pp 211ndash220 May2007

[25] C Cattuto C Schmitz A Baldassarri et al ldquoNetwork prop-erties of folksonomiesrdquo AI Communications vol 20 no 4 pp245ndash262 2007

[26] R Lambiotte and M Ausloos ldquoCollaborative tagging as a tri-partite networkrdquo in Computational Science vol 3393 of LectureNotes in Computer Science pp 1114ndash1117 2006

[27] U Maulik S Bandyopadhyay and I Saha ldquoIntegrating cluster-ing and supervised learning for categorical data analysisrdquo IEEETransactions on Systems Man and Cybernetics A vol 40 no 4pp 664ndash675 2010

[28] D Ramage P Heymann C D Manning and H Garcia-Molina ldquoClustering the tagged webrdquo in Proceedings of the 2ndACM International Conference on Web Search and Data Mining(WSDM rsquo09) pp 54ndash63 February 2009

[29] D Zhou J Bian S Zheng H Zha and C L G C Lee GilesldquoExploring social annotations for information retrievalrdquo inProceedings of the 17th International Conference on World WideWeb (WWW rsquo08) pp 715ndash724 April 2008

[30] S Xu S Bad Y Cao and Y Yu ldquoUsing social annotations toimprove language model for information retrievalrdquo in Proceed-ings of the 16th ACM Conference on Information and KnowledgeManagement (CIKM rsquo07) pp 1003ndash1006 November 2007

[31] S Bao G Xue X Wu Y Yu B Fei and Z Su ldquoOptimizingweb search using social annotationsrdquo in Proceedings of the 16thInternationalWorldWideWeb Conference (WWW rsquo07) pp 501ndash510 May 2007

[32] R Schenkel T Crecelius M Kacimi et al ldquoEfficient top-k que-rying over social-tagging networksrdquo in Proceedings of the 31stAnnual International ACM SIGIR Conference on Research andDevelopment in Information Retrieval (ACM SIGIR rsquo08) pp523ndash530 July 2008

[33] N Rasiwasia P J Moreno and N Vasconcelos ldquoBridging thegap query by semantic examplerdquo IEEE Transactions on Multi-media vol 9 no 5 pp 923ndash938 2007

[34] R Fergus L Fei-Fei P Perona and A Zisserman ldquoLearningobject categories from Googlersquos image searchrdquo in Proceedingsof the 10th IEEE International Conference on Computer Vision(ICCV rsquo05) vol 2 pp 1816ndash1823 October 2005

12 The Scientific World Journal

[35] GWang D Hoiem and D Forsyth ldquoLearning image similarityfrom flickr groups using fast kernel machinesrdquo IEEE Transac-tions on Pattern Analysis and Machine Intelligence vol 34 no11 pp 2177ndash2188 2012

[36] G Salton A Wong and C S Yang ldquoA vector space model forautomatic indexingrdquo Communications of the ACM vol 18 no11 pp 613ndash620 1975

[37] Z Xu X Luo J Yu andW Xu ldquoMeasuring semantic similaritybetween words by removing noise and redundancy in websnippetsrdquo Concurrency Computation Practice and Experiencevol 23 no 18 pp 2496ndash2510 2011

[38] R Firth ldquoA synopsis of linguistic theory 1930ndash1955rdquo in Studiesin Linguistic Analysis Philological Society Oxford UK 1957

[39] M Vojnovic J Cruise D Gunawardena and P MarbachldquoRanking and suggesting popular itemsrdquo IEEE Transactions onKnowledge and Data Engineering vol 21 no 8 pp 1133ndash11462009

[40] H Rubenstein and B Goodenough ldquoContextual correlates ofsynonymyrdquo Communications of the ACM vol 8 no 10 pp 627ndash633 1965

[41] M Steinbach G Karypis and V Kumar ldquoA comparison of doc-ument clustering techniquesrdquo in Proceedings of the KDDWork-shop on Text Mining 2000


Page 10: Research Article Measuring Semantic Relatedness between ...downloads.hindawi.com/journals/tswj/2014/758089.pdf · as Linked Open Data (LOD) [ ]andSemanticLink Network (SLN) [ ], the

10 The Scientific World Journal

Table 4 The comparison of the cut-off point precision between theproposed method and Flickr

Cut-off point Louis Vuitton Gucci Dior Chanel Cartier1198751 100 100 100 100 1001198751 (Flickr) 100 100 0 100 1001198755 100 100 100 100 1001198755 (Flickr) 80 60 60 60 8011987510 100 100 100 100 10011987510 (Flickr) 90 70 70 80 80

6 Conclusions

This paper mainly discusses the semantic relatedness mea-sures systematically puts forward a method to measure thesemantic relatedness of two images based on their tags andjustifies its validity through the experiments The major con-tributions are summarized as follows

(1) We propose a framework to measure semantic relat-edness between Flickr images using tags Firstlythe cooccurrence measures are used to compute therelatedness of tags between two images Secondlywe transform the tags relatedness integration to theassignment in bipartite graph problem which canfind an appropriate matching to the semantic related-ness of images Finally a decline factor consideringthe position information of tags is used in the pro-posed framework which reduces the noise andredundancy in the social tags

(2) A real data set including 1000 images from Flickr withten classes is used in our experiments Two evalua-tion methods including clustering and searching areperformed which shows that the proposed methodcan measure the semantic relatedness between Flickrimages accurately and robustly

(3) We extend the relatedness measures between con-cepts to the level of images Since the association rela-tion is the basic mechanism of brain The proposedrelatedness measurement can facilitate related appli-cations such as searching and recommendation

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work was supported in part by the National Science andTechnologyMajor Project under Grant no 2013ZX01033002-003 in part by the National High Technology Research andDevelopment Program of China (863 Program) underGrant nos 2013AA014601 and 2013AA014603 in part by

03460395 0421

0579

0

01

02

03

04

05

06

07

Jaccard

Cor

rela

tion

Dice Overlap PMI

Figure 5 The correlation of four selected functions

0912 0967

0011

0857 0922

0018

0732 0751

00560

02

04

06

08

1

F-measure Purity Entropy

Group 1

Using tag orderNot usingCosine

Cor

rela

tion

Figure 6 The clustering results of group 1 data sets

0876 0927

0023

0827 0852

0031

0632 0655

00850

010203040506070809

1

F-measure Purity Entropy

Group 2

Cor

rela

tion

Using tag orderNot usingCosine

Figure 7 The clustering results of group 2 data sets

National Key Technology Support Program under Grant no2012BAH07B01 in part by the National Science Foundationof China underGrant no 61300202 and in part by the ScienceFoundation of Shanghai under Grant no 13ZR1452900

References

[1] J Goldberger S Gordon and H Greenspan ldquoUnsupervisedimage-set clustering using an information theoretic frame-workrdquo IEEE Transactions on Image Processing vol 15 no 2 pp449ndash458 2006

[2] T Evgeniou M Pontil C Papageorgiou and T Poggio ldquoImagerepresentations and feature selection for multimedia databasesearchrdquo IEEE Transactions on Knowledge and Data Engineeringvol 15 no 4 pp 911ndash920 2003

The Scientific World Journal 11

[3] R Ji H Yao X Sun B Zhong andW Gao ldquoTowards semanticembedding in visual vocabularyrdquo in Proceedings of the IEEEComputer Society Conference on Computer Vision and PatternRecognition (CVPR rsquo10) pp 918ndash925 June 2010

[4] J Fan D A Keim Y Gao H Luo and Z Li ldquoJustClick per-sonalized image recommendation via exploratory search fromlarge-scale Flickr imagesrdquo IEEE Transactions on Circuits andSystems for Video Technology vol 19 no 2 pp 273ndash288 2009

[5] T Gong S Li and C L Tan ldquoA semantic similarity languagemodel to improve automatic image annotationrdquo in Proceedingsof the 22nd International Conference on Tools with ArtificialIntelligence (ICTAI rsquo10) pp 197ndash203 October 2010

[6] C Schmid and R Mohr ldquoLocal grayvalue invariants for imageretrievalrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 19 no 5 pp 530ndash535 1997

[7] M Varma and A Zisserman ldquoA statistical approach to textureclassification from single imagesrdquo International Journal of Com-puter Vision vol 62 no 1-2 pp 61ndash81 2005

[8] S Belongie JMalik and J Puzicha ldquoShapematching and objectrecognition using shape contextsrdquo IEEE Transactions on PatternAnalysis and Machine Intelligence vol 24 no 4 pp 509ndash5222002

[9] N Dalal and B Triggs ldquoHistograms of oriented gradients forhuman detectionrdquo in Proceedings of the IEEE Computer SocietyConference on Computer Vision and Pattern Recognition (CVPRrsquo05) vol 1 pp 886ndash893 June 2005

[10] D Huang M Ardabilian Y Wang and L Chen ldquoAsymmetric3D2D face recognition based on LBP facial representation andcanonical correlation analysisrdquo in Proceedings of the 16th IEEEInternational Conference on Image Processing (ICIP rsquo09) pp3325ndash3328 November 2009

[11] L Wang Y Zhang and J Feng ldquoOn the Euclidean distanceof imagesrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 27 no 8 pp 1334ndash1339 2005

[12] W Jia H Zhang X He and Q Wu ldquoGaussian weighted his-togram intersection for license plate classificationrdquo in Proceed-ings of the 18th International Conference on Pattern Recognition(ICPR rsquo06) pp 574ndash577 August 2006

[13] Y Rubner C Tomasi and L J Guibas ldquoA Metric for distribu-tions with applications to image databasesrdquo in Proceedings of theIEEE 6th International Conference on Computer Vision pp 59ndash66 January 1998

[14] L Wu X-S Hua N Yu W-Y Ma and S Li ldquoFlickr distancea relationship measure for visual conceptsrdquo IEEE Transactionson Pattern Analysis and Machine Intelligence vol 34 no 5 pp863ndash875 2012

[15] D Cai ldquoAn information-theoretic foundation for the mea-surement of discrimination informationrdquo IEEE Transactions onKnowledge and Data Engineering vol 22 no 9 pp 1262ndash12732010

[16] P van den Broek ldquoUsing texts in science education cognitiveprocesses and knowledge representationrdquo Science vol 328 no5977 pp 453ndash456 2010

[17] C Bizer T Heath and T Berners-Lee ldquoLinked datamdashthe storyso farrdquo International Journal on Semantic Web and InformationSystems vol 5 no 3 pp 1ndash22 2009

[18] H Zhuge ldquoCommunities and emerging semantics in semanticlink network discovery and learningrdquo IEEE Transactions onKnowledge and Data Engineering vol 21 no 6 pp 785ndash7992009

[19] H Zhuge ldquoSemantic linking through spaces for cyber-physical-socio intelligence a methodologyrdquo Artificial Intelligence vol175 no 5-6 pp 988ndash1019 2011

[20] X Luo Z Xu J Yu and X Chen ldquoBuilding association linknetwork for semantic link on web resourcesrdquo IEEE Transactionson Automation Science and Engineering vol 8 no 3 pp 482ndash494 2011

[21] S A Golder and B A Huberman ldquoUsage patterns of collabo-rative tagging systemsrdquo Journal of Information Science vol 32no 2 pp 198ndash208 2006

[22] H S Al-Khalifa and H C Davis ldquoMeasuring the semanticvalue of folksonomiesrdquo in Proceedings of the Innovations inInformation Technology (IIT rsquo06) pp 1ndash5 November 2006

[23] F M Suchanek M Vojnovic and D Gunawardena ldquoSocialtags meaning and suggestionsrdquo in Proceedings of the 17th ACMConference on Information and Knowledge Management (CIKMrsquo08) pp 223ndash232 October 2008

[24] H Halpin V Robu and H Shepherd ldquoThe complex dynamicsof collaborative taggingrdquo in Proceedings of the 16th InternationalWorld Wide Web Conference (WWW rsquo07) pp 211ndash220 May2007

[25] C Cattuto C Schmitz A Baldassarri et al ldquoNetwork prop-erties of folksonomiesrdquo AI Communications vol 20 no 4 pp245ndash262 2007

[26] R Lambiotte and M Ausloos ldquoCollaborative tagging as a tri-partite networkrdquo in Computational Science vol 3393 of LectureNotes in Computer Science pp 1114ndash1117 2006

[27] U Maulik S Bandyopadhyay and I Saha ldquoIntegrating cluster-ing and supervised learning for categorical data analysisrdquo IEEETransactions on Systems Man and Cybernetics A vol 40 no 4pp 664ndash675 2010

[28] D Ramage P Heymann C D Manning and H Garcia-Molina ldquoClustering the tagged webrdquo in Proceedings of the 2ndACM International Conference on Web Search and Data Mining(WSDM rsquo09) pp 54ndash63 February 2009

[29] D Zhou J Bian S Zheng H Zha and C L G C Lee GilesldquoExploring social annotations for information retrievalrdquo inProceedings of the 17th International Conference on World WideWeb (WWW rsquo08) pp 715ndash724 April 2008

[30] S Xu S Bad Y Cao and Y Yu ldquoUsing social annotations toimprove language model for information retrievalrdquo in Proceed-ings of the 16th ACM Conference on Information and KnowledgeManagement (CIKM rsquo07) pp 1003ndash1006 November 2007

[31] S Bao G Xue X Wu Y Yu B Fei and Z Su ldquoOptimizingweb search using social annotationsrdquo in Proceedings of the 16thInternationalWorldWideWeb Conference (WWW rsquo07) pp 501ndash510 May 2007

[32] R Schenkel T Crecelius M Kacimi et al ldquoEfficient top-k que-rying over social-tagging networksrdquo in Proceedings of the 31stAnnual International ACM SIGIR Conference on Research andDevelopment in Information Retrieval (ACM SIGIR rsquo08) pp523ndash530 July 2008

[33] N Rasiwasia P J Moreno and N Vasconcelos ldquoBridging thegap query by semantic examplerdquo IEEE Transactions on Multi-media vol 9 no 5 pp 923ndash938 2007

[34] R Fergus L Fei-Fei P Perona and A Zisserman ldquoLearningobject categories from Googlersquos image searchrdquo in Proceedingsof the 10th IEEE International Conference on Computer Vision(ICCV rsquo05) vol 2 pp 1816ndash1823 October 2005

12 The Scientific World Journal

[35] GWang D Hoiem and D Forsyth ldquoLearning image similarityfrom flickr groups using fast kernel machinesrdquo IEEE Transac-tions on Pattern Analysis and Machine Intelligence vol 34 no11 pp 2177ndash2188 2012

[36] G Salton A Wong and C S Yang ldquoA vector space model forautomatic indexingrdquo Communications of the ACM vol 18 no11 pp 613ndash620 1975

[37] Z Xu X Luo J Yu andW Xu ldquoMeasuring semantic similaritybetween words by removing noise and redundancy in websnippetsrdquo Concurrency Computation Practice and Experiencevol 23 no 18 pp 2496ndash2510 2011

[38] R Firth ldquoA synopsis of linguistic theory 1930ndash1955rdquo in Studiesin Linguistic Analysis Philological Society Oxford UK 1957

[39] M Vojnovic J Cruise D Gunawardena and P MarbachldquoRanking and suggesting popular itemsrdquo IEEE Transactions onKnowledge and Data Engineering vol 21 no 8 pp 1133ndash11462009

[40] H Rubenstein and B Goodenough ldquoContextual correlates ofsynonymyrdquo Communications of the ACM vol 8 no 10 pp 627ndash633 1965

[41] M Steinbach G Karypis and V Kumar ldquoA comparison of doc-ument clustering techniquesrdquo in Proceedings of the KDDWork-shop on Text Mining 2000

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 11: Research Article Measuring Semantic Relatedness between ...downloads.hindawi.com/journals/tswj/2014/758089.pdf · as Linked Open Data (LOD) [ ]andSemanticLink Network (SLN) [ ], the

The Scientific World Journal 11

[3] R Ji H Yao X Sun B Zhong andW Gao ldquoTowards semanticembedding in visual vocabularyrdquo in Proceedings of the IEEEComputer Society Conference on Computer Vision and PatternRecognition (CVPR rsquo10) pp 918ndash925 June 2010

[4] J Fan D A Keim Y Gao H Luo and Z Li ldquoJustClick per-sonalized image recommendation via exploratory search fromlarge-scale Flickr imagesrdquo IEEE Transactions on Circuits andSystems for Video Technology vol 19 no 2 pp 273ndash288 2009

[5] T Gong S Li and C L Tan ldquoA semantic similarity languagemodel to improve automatic image annotationrdquo in Proceedingsof the 22nd International Conference on Tools with ArtificialIntelligence (ICTAI rsquo10) pp 197ndash203 October 2010

[6] C Schmid and R Mohr ldquoLocal grayvalue invariants for imageretrievalrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 19 no 5 pp 530ndash535 1997

[7] M Varma and A Zisserman ldquoA statistical approach to textureclassification from single imagesrdquo International Journal of Com-puter Vision vol 62 no 1-2 pp 61ndash81 2005

[8] S Belongie JMalik and J Puzicha ldquoShapematching and objectrecognition using shape contextsrdquo IEEE Transactions on PatternAnalysis and Machine Intelligence vol 24 no 4 pp 509ndash5222002

[9] N Dalal and B Triggs ldquoHistograms of oriented gradients forhuman detectionrdquo in Proceedings of the IEEE Computer SocietyConference on Computer Vision and Pattern Recognition (CVPRrsquo05) vol 1 pp 886ndash893 June 2005

[10] D Huang M Ardabilian Y Wang and L Chen ldquoAsymmetric3D2D face recognition based on LBP facial representation andcanonical correlation analysisrdquo in Proceedings of the 16th IEEEInternational Conference on Image Processing (ICIP rsquo09) pp3325ndash3328 November 2009

[11] L Wang Y Zhang and J Feng ldquoOn the Euclidean distanceof imagesrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 27 no 8 pp 1334ndash1339 2005

[12] W Jia H Zhang X He and Q Wu ldquoGaussian weighted his-togram intersection for license plate classificationrdquo in Proceed-ings of the 18th International Conference on Pattern Recognition(ICPR rsquo06) pp 574ndash577 August 2006

[13] Y Rubner C Tomasi and L J Guibas ldquoA Metric for distribu-tions with applications to image databasesrdquo in Proceedings of theIEEE 6th International Conference on Computer Vision pp 59ndash66 January 1998

[14] L Wu X-S Hua N Yu W-Y Ma and S Li ldquoFlickr distancea relationship measure for visual conceptsrdquo IEEE Transactionson Pattern Analysis and Machine Intelligence vol 34 no 5 pp863ndash875 2012

[15] D. Cai, "An information-theoretic foundation for the measurement of discrimination information," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 9, pp. 1262–1273, 2010.

[16] P. van den Broek, "Using texts in science education: cognitive processes and knowledge representation," Science, vol. 328, no. 5977, pp. 453–456, 2010.

[17] C. Bizer, T. Heath, and T. Berners-Lee, "Linked data - the story so far," International Journal on Semantic Web and Information Systems, vol. 5, no. 3, pp. 1–22, 2009.

[18] H. Zhuge, "Communities and emerging semantics in semantic link network: discovery and learning," IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 6, pp. 785–799, 2009.

[19] H. Zhuge, "Semantic linking through spaces for cyber-physical-socio intelligence: a methodology," Artificial Intelligence, vol. 175, no. 5-6, pp. 988–1019, 2011.

[20] X. Luo, Z. Xu, J. Yu, and X. Chen, "Building association link network for semantic link on web resources," IEEE Transactions on Automation Science and Engineering, vol. 8, no. 3, pp. 482–494, 2011.

[21] S. A. Golder and B. A. Huberman, "Usage patterns of collaborative tagging systems," Journal of Information Science, vol. 32, no. 2, pp. 198–208, 2006.

[22] H. S. Al-Khalifa and H. C. Davis, "Measuring the semantic value of folksonomies," in Proceedings of the Innovations in Information Technology (IIT '06), pp. 1–5, November 2006.

[23] F. M. Suchanek, M. Vojnovic, and D. Gunawardena, "Social tags: meaning and suggestions," in Proceedings of the 17th ACM Conference on Information and Knowledge Management (CIKM '08), pp. 223–232, October 2008.

[24] H. Halpin, V. Robu, and H. Shepherd, "The complex dynamics of collaborative tagging," in Proceedings of the 16th International World Wide Web Conference (WWW '07), pp. 211–220, May 2007.

[25] C. Cattuto, C. Schmitz, A. Baldassarri, et al., "Network properties of folksonomies," AI Communications, vol. 20, no. 4, pp. 245–262, 2007.

[26] R. Lambiotte and M. Ausloos, "Collaborative tagging as a tripartite network," in Computational Science, vol. 3393 of Lecture Notes in Computer Science, pp. 1114–1117, 2006.

[27] U. Maulik, S. Bandyopadhyay, and I. Saha, "Integrating clustering and supervised learning for categorical data analysis," IEEE Transactions on Systems, Man, and Cybernetics A, vol. 40, no. 4, pp. 664–675, 2010.

[28] D. Ramage, P. Heymann, C. D. Manning, and H. Garcia-Molina, "Clustering the tagged web," in Proceedings of the 2nd ACM International Conference on Web Search and Data Mining (WSDM '09), pp. 54–63, February 2009.

[29] D. Zhou, J. Bian, S. Zheng, H. Zha, and C. L. Giles, "Exploring social annotations for information retrieval," in Proceedings of the 17th International Conference on World Wide Web (WWW '08), pp. 715–724, April 2008.

[30] S. Xu, S. Bao, Y. Cao, and Y. Yu, "Using social annotations to improve language model for information retrieval," in Proceedings of the 16th ACM Conference on Information and Knowledge Management (CIKM '07), pp. 1003–1006, November 2007.

[31] S. Bao, G. Xue, X. Wu, Y. Yu, B. Fei, and Z. Su, "Optimizing web search using social annotations," in Proceedings of the 16th International World Wide Web Conference (WWW '07), pp. 501–510, May 2007.

[32] R. Schenkel, T. Crecelius, M. Kacimi, et al., "Efficient top-k querying over social-tagging networks," in Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (ACM SIGIR '08), pp. 523–530, July 2008.

[33] N. Rasiwasia, P. J. Moreno, and N. Vasconcelos, "Bridging the gap: query by semantic example," IEEE Transactions on Multimedia, vol. 9, no. 5, pp. 923–938, 2007.

[34] R. Fergus, L. Fei-Fei, P. Perona, and A. Zisserman, "Learning object categories from Google's image search," in Proceedings of the 10th IEEE International Conference on Computer Vision (ICCV '05), vol. 2, pp. 1816–1823, October 2005.

[35] G. Wang, D. Hoiem, and D. Forsyth, "Learning image similarity from Flickr groups using fast kernel machines," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 11, pp. 2177–2188, 2012.

[36] G. Salton, A. Wong, and C. S. Yang, "A vector space model for automatic indexing," Communications of the ACM, vol. 18, no. 11, pp. 613–620, 1975.

[37] Z. Xu, X. Luo, J. Yu, and W. Xu, "Measuring semantic similarity between words by removing noise and redundancy in web snippets," Concurrency Computation Practice and Experience, vol. 23, no. 18, pp. 2496–2510, 2011.

[38] R. Firth, "A synopsis of linguistic theory 1930–1955," in Studies in Linguistic Analysis, Philological Society, Oxford, UK, 1957.

[39] M. Vojnovic, J. Cruise, D. Gunawardena, and P. Marbach, "Ranking and suggesting popular items," IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 8, pp. 1133–1146, 2009.

[40] H. Rubenstein and B. Goodenough, "Contextual correlates of synonymy," Communications of the ACM, vol. 8, no. 10, pp. 627–633, 1965.

[41] M. Steinbach, G. Karypis, and V. Kumar, "A comparison of document clustering techniques," in Proceedings of the KDD Workshop on Text Mining, 2000.


12 The Scientific World Journal

[35] GWang D Hoiem and D Forsyth ldquoLearning image similarityfrom flickr groups using fast kernel machinesrdquo IEEE Transac-tions on Pattern Analysis and Machine Intelligence vol 34 no11 pp 2177ndash2188 2012

[36] G Salton A Wong and C S Yang ldquoA vector space model forautomatic indexingrdquo Communications of the ACM vol 18 no11 pp 613ndash620 1975

[37] Z Xu X Luo J Yu andW Xu ldquoMeasuring semantic similaritybetween words by removing noise and redundancy in websnippetsrdquo Concurrency Computation Practice and Experiencevol 23 no 18 pp 2496ndash2510 2011

[38] R Firth ldquoA synopsis of linguistic theory 1930ndash1955rdquo in Studiesin Linguistic Analysis Philological Society Oxford UK 1957

[39] M Vojnovic J Cruise D Gunawardena and P MarbachldquoRanking and suggesting popular itemsrdquo IEEE Transactions onKnowledge and Data Engineering vol 21 no 8 pp 1133ndash11462009

[40] H Rubenstein and B Goodenough ldquoContextual correlates ofsynonymyrdquo Communications of the ACM vol 8 no 10 pp 627ndash633 1965

[41] M Steinbach G Karypis and V Kumar ldquoA comparison of doc-ument clustering techniquesrdquo in Proceedings of the KDDWork-shop on Text Mining 2000
