Tag Ranking, presented by Jie Xiao, Dept. of Computer Science, Univ. of Texas at San Antonio
Outline
Problem
Probabilistic tag relevance estimation
Random walk tag relevance refinement
Experiment
Conclusion
Problem
There are millions of social images on the Internet, which makes them very attractive for research purposes.
The tags associated with these images are not ordered by relevance.
Tag relevance
There are two types of relevance to be considered.
The relevance between a tag and an image
The relevance between two tags for the same image.
Probabilistic Tag Relevance Estimation
Similarity between a tag and an image
x : an image
t : tag i associated with image x
P(t|x) : the probability of tag t given image x
P(t) : the prior probability of tag t occurring in the dataset
After applying Bayes’ rule, we can derive that
$P(t \mid x) = \frac{p(x \mid t)\,P(t)}{p(x)}$
Probabilistic Tag Relevance Estimation (Cont.)
Since the goal is to rank the tags of an individual image, and p(x) is identical for all of those tags, we can drop p(x) and rank by
$P(t \mid x) \propto p(x \mid t)\,P(t)$
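A minimal sketch of this ranking step, assuming the likelihoods p(x|t) and priors P(t) for one image have already been estimated. The function name rank_tags and the tag values used with it are hypothetical. Because p(x) is a common positive factor for every tag of x, omitting it cannot change the order.

```python
def rank_tags(likelihood, prior):
    """Rank the tags of one image by p(x|t) * P(t).

    likelihood: dict tag -> p(x|t); prior: dict tag -> P(t).
    p(x) is omitted because it is the same for every tag of x.
    """
    scores = {t: likelihood[t] * prior[t] for t in likelihood}
    return sorted(scores, key=scores.get, reverse=True)
```

For example, with hypothetical values p(x|dog) = 0.8, P(dog) = 0.2 and p(x|park) = 0.3, P(park) = 0.4, "dog" scores 0.16 and "park" scores 0.12, so "dog" is ranked first.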
Density Estimation
Let (x1, x2, …, xn) be an iid sample drawn from some distribution with an unknown density ƒ.
Two types of methods to describe the density:
Histogram
Kernel density estimator
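The two estimators can be contrasted on a 1-D sample. This is a minimal sketch; the bin width, the Gaussian kernel, and the bandwidth are illustrative assumptions, not values from the talk. The histogram counts samples in a fixed bin, while the kernel density estimator averages a smooth bump centered on every sample.

```python
import math

def histogram_density(samples, x, bin_width):
    """Histogram estimate: fraction of samples in x's bin, divided by the bin width."""
    bin_index = math.floor(x / bin_width)
    count = sum(1 for s in samples if math.floor(s / bin_width) == bin_index)
    return count / (len(samples) * bin_width)

def gaussian_kde(samples, x, bandwidth):
    """Kernel density estimate: average of normalized Gaussian bumps centered on each sample."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
```

The histogram estimate jumps at bin edges, whereas the kernel estimate is smooth in x and still integrates to 1, which is why a kernel estimator suits densities over image features.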
Probabilistic Tag Relevance Estimation (Cont.)
Kernel Density Estimation (KDE) is adopted to estimate the probability density function p(x|t).
X_i : the image set containing tag t_i
x_k : the k-th nearest neighbor image of x in image set X_i
K : the density kernel function used to estimate the probability
|X_i| : cardinality of X_i
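A minimal sketch of the kernel estimate, assuming the standard form p(x|t_i) = (1/|X_i|) Σ_{x_k ∈ X_i} K(x − x_k) and assuming an unnormalized Gaussian kernel on feature vectors; the kernel choice, sigma, and the optional k cutoff are illustrative assumptions. Dropping the kernel's normalizing constant scales every tag's score equally, so the ranking is unaffected.

```python
import math

def sq_dist(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def gaussian_kernel(u, v, sigma):
    """Unnormalized Gaussian kernel exp(-||u - v||^2 / sigma^2) (assumed kernel choice)."""
    return math.exp(-sq_dist(u, v) / sigma ** 2)

def tag_likelihood(x, tag_images, sigma=1.0, k=None):
    """Estimate p(x|t_i) up to a constant from X_i, the feature vectors of images tagged t_i.

    If k is given, only the k nearest members of X_i contribute (distant members
    carry near-zero kernel weight); the normalizer is still |X_i|.
    """
    members = sorted(tag_images, key=lambda xk: sq_dist(x, xk))
    if k is not None:
        members = members[:k]
    return sum(gaussian_kernel(x, xk, sigma) for xk in members) / len(tag_images)
```

An image whose features lie close to many images carrying tag t_i receives a high p(x|t_i), which is the intuition behind using visual neighbors to judge tag relevance.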
Relevance between tags
t_i : tag i associated with image x
t_j : tag j associated with image x
X_i : the image set containing tag t_i
X_j : the image set containing tag t_j
N : the number of top nearest neighbors of image x that are used
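One plausible reading of the visual similarity between two tags, written as a sketch: take the top N nearest neighbors of x in X_i and in X_j, then average a kernel over all cross pairs, s_v(t_i, t_j) = (1/N²) Σ_y Σ_z K(y − z). This formula, the Gaussian kernel, and the parameter values are assumptions, since the slide's equation is not reproduced here.

```python
import math

def sq_dist(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def nearest(x, images, n):
    """Top-n nearest neighbors of x among `images`."""
    return sorted(images, key=lambda y: sq_dist(x, y))[:n]

def visual_tag_similarity(x, images_i, images_j, n=2, sigma=1.0):
    """Average Gaussian kernel between x's top-n neighbors in X_i and in X_j (assumed form)."""
    near_i = nearest(x, images_i, n)
    near_j = nearest(x, images_j, n)
    total = sum(math.exp(-sq_dist(y, z) / sigma ** 2) for y in near_i for z in near_j)
    # Divide by the actual pair count so tag sets smaller than n are handled.
    return total / (len(near_i) * len(near_j))
```

Two tags whose images near x look alike score close to 1; tags whose nearby images are visually far apart score close to 0.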
Relevance between tags (Cont.)
Co-occurrence similarity between tags
f(t_i) : the # of images containing tag t_i
f(t_i, t_j) : the # of images containing both tag t_i and tag t_j
G : the total # of images in Flickr
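A sketch of how these counts can be turned into a similarity, assuming a normalized-Google-distance-style measure, d(t_i, t_j) = [max(log f(t_i), log f(t_j)) − log f(t_i, t_j)] / [log G − min(log f(t_i), log f(t_j))], mapped to a similarity by exp(−d). This is an assumed form; the slide's own formula is not shown, and the counts in the test are hypothetical.

```python
import math

def cooccurrence_similarity(f_i, f_j, f_ij, G):
    """exp(-d), where d is a normalized-Google-distance-style score on Flickr counts (assumed form)."""
    if f_ij == 0:
        return 0.0  # tags never co-occur: treat them as infinitely distant
    log_i, log_j, log_ij = math.log(f_i), math.log(f_j), math.log(f_ij)
    d = (max(log_i, log_j) - log_ij) / (math.log(G) - min(log_i, log_j))
    return math.exp(-d)
```

Tags that always appear together get d = 0 and similarity 1; the more their image sets diverge relative to the collection size G, the closer the similarity falls toward 0.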
Random walk over tag graph
P : the n × n transition matrix
p_ij : the probability of the transition from node i to node j
r_k(j) : relevance score of node j at iteration k
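A minimal sketch of the refinement loop, assuming the common random-walk-with-restart update r_k(j) = α Σ_i r_{k−1}(i) p_ij + (1 − α) v_j, where v_j is the initial (probabilistic) relevance of tag j and P is the row-normalized tag-similarity matrix. The damping factor α = 0.5, the tolerance, and the helper names are illustrative assumptions.

```python
def row_normalize(S):
    """Turn a nonnegative tag-similarity matrix into a row-stochastic transition matrix P."""
    P = []
    for row in S:
        total = sum(row)
        P.append([s / total for s in row] if total > 0 else [1.0 / len(row)] * len(row))
    return P

def random_walk(P, v, alpha=0.5, tol=1e-10, max_iter=1000):
    """Iterate r_k(j) = alpha * sum_i r_{k-1}(i) * p_ij + (1 - alpha) * v_j until it stops changing."""
    n = len(v)
    r = list(v)
    for _ in range(max_iter):
        new_r = [alpha * sum(r[i] * P[i][j] for i in range(n)) + (1 - alpha) * v[j]
                 for j in range(n)]
        if max(abs(a - b) for a, b in zip(new_r, r)) < tol:
            return new_r
        r = new_r
    return r
```

Because α < 1 and each row of P sums to 1, the iteration converges; a tag that is similar to many other relevant tags gains score. With S = [[0, 1, 1], [1, 0, 0], [1, 0, 0]] and v = [0.2, 0.5, 0.3], the fixed point is [0.4, 0.35, 0.25], so the first tag overtakes the second.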
Experiments
Dataset: 50,000 images crawled from Flickr
Popular tags:
Raw tags: more than 100,000 unique tags
Filtered tags: 13,330 unique tags
Performance Metric
Normalized Discounted Cumulative Gain (NDCG)
r(i) : the relevance level of the i-th tag in the ranked list
Z_n : a normalization constant chosen so that the optimal ranking's NDCG score is 1
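A sketch of the metric, assuming the widely used form N_n = Z_n Σ_{i=1}^{n} (2^{r(i)} − 1) / log(1 + i). Here Z_n is computed as one over the DCG of the ideal (descending) ordering, which makes the optimal ranking score exactly 1. The natural logarithm is used; any base gives the same NDCG because the base cancels in the ratio. The relevance levels in the test are hypothetical.

```python
import math

def dcg(relevances, n):
    """Sum of (2^r(i) - 1) / log(1 + i) over the first n ranked tags, with i starting at 1."""
    return sum((2 ** r - 1) / math.log(1 + i) for i, r in enumerate(relevances[:n], start=1))

def ndcg(relevances, n):
    """NDCG@n: DCG of the given ranking divided by DCG of the ideal ranking (Z_n = 1 / ideal DCG)."""
    ideal = dcg(sorted(relevances, reverse=True), n)
    return dcg(relevances, n) / ideal if ideal > 0 else 0.0
```

For instance, relevance levels [3, 2, 1, 0] in ranked order give NDCG 1, while the reversed order [0, 1, 2, 3] scores below 1 because the most relevant tag is pushed to the bottom.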
Conclusion
Estimate the tag-image relevance by kernel density estimation.
Estimate the tag-tag relevance by visual similarity and tag co-occurrence.
A random-walk-based approach is used to refine the tag ranking.