Learning Term-weighting Functions for Similarity Measures
Scott Wen-tau Yih, Microsoft Research
Applications of Similarity Measures
Query Suggestion
Query: mariners
How similar are they?
- mariners vs. seattle mariners
- mariners vs. 1st mariner bank
Applications of Similarity Measures
Ad Relevance
Query: movie theater tickets
Similarity Measures based on TFIDF Vectors
Digital Camera Review: The new flagship of Canon’s S-series, PowerShot S80 digital camera, incorporates 8 megapixels for shooting still images and a movie mode that records an impressive 1024 x 768 pixels.
vp = { digital: 1.35, camera: 0.89, review: 0.32, … }
tw(“review”, Dp) = tf(“review”, Dp) · idf(“review”)
Sim(Dp, Dq) = fsim(vp, vq), where fsim could be cosine, overlap, Jaccard, etc.
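A minimal Python sketch of the TFIDF vector construction and cosine similarity described above; the toy document-frequency table is illustrative, not from the talk.

```python
import math
from collections import Counter

def tfidf_vector(doc_tokens, df, n_docs):
    """Build a TFIDF term vector: tw(t, Dp) = tf(t, Dp) * log(N / df(t))."""
    tf = Counter(doc_tokens)
    return {t: tf[t] * math.log(n_docs / df[t]) for t in tf if t in df}

def cosine(vp, vq):
    """fsim(vp, vq) = (vp . vq) / (||vp|| * ||vq||), on sparse dict vectors."""
    dot = sum(w * vq.get(t, 0.0) for t, w in vp.items())
    norm_p = math.sqrt(sum(w * w for w in vp.values()))
    norm_q = math.sqrt(sum(w * w for w in vq.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0
```

Any of the other set-based measures (overlap, Jaccard) could be swapped in for `cosine` without changing the vector construction.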
Vector-based Similarity Measures: Pros & Cons
Advantages
- Simple & efficient
- Concise representation
- Effective in many applications
Issues
- Not trivial to adapt to the target domain: lots of variations of TFIDF formulas
- Not clear how to incorporate other information, e.g., term position, query-log frequency, etc.
Approach: Learn Term-weighting Functions
TWEAK – Term-weighting Learning Framework
Instead of a fixed TFIDF formula, learn the term-weighting functions
- Preserves the engineering advantages of vector-based similarity measures
- Able to incorporate other term information and fine-tune the similarity measure
- Flexible in choosing various loss functions to match the true objectives of the target applications
Outline
Introduction
Problem Statement & Model
- Formal definition
- Loss functions
Experiments
- Query suggestion
- Ad page relevance
Conclusions
Vector-based Similarity Measures: Formal Definition
Compute the similarity between Dp and Dq
Vocabulary: V = {t1, t2, …, tn}
Term-vector: vp = (sp1, sp2, …, spn)
Term-weighting score: spi ≡ tw(ti, Dp)
Similarity: Sim(Dp, Dq) = fsim(vp, vq)
TFIDF Cosine Similarity
Use the same fsim(·, ·) (i.e., cosine); linear term-weighting function
fsim(vp, vq) = (vp · vq) / (‖vp‖ · ‖vq‖)
TFIDF: tw(ti, Dp) ≡ tf(ti, Dp) · log(N / df(ti))
Learned: twλ(ti, Dp) ≡ Σj λj · φj(ti, Dp)
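The learned linear term-weighting function twλ can be sketched as a weighted sum of per-term feature functions. The two feature functions below (`phi_tf`, `phi_cap`) are hypothetical examples; the talk's actual feature set is listed later.

```python
import math

def tw_lambda(term, doc, lambdas, features):
    # tw_lambda(ti, Dp) = sum_j lambda_j * phi_j(ti, Dp)
    return sum(lam * phi(term, doc) for lam, phi in zip(lambdas, features))

# Hypothetical feature functions phi_j (illustrative only):
phi_tf  = lambda t, d: math.log(1 + d.count(t))          # sublinear term frequency
phi_cap = lambda t, d: 1.0 if t[:1].isupper() else 0.0   # capitalization indicator
```

With a suitable idf-style feature and fixed weights, this family recovers the standard TFIDF weighting as a special case; learning λ is what replaces the fixed formula.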
Learning Similarity Metric
Training examples: document pairs (y1, (Dp1, Dq1)), …, (ym, (Dpm, Dqm))
Loss functions
- Sum-of-squares error: Lsse(λ) = (1/2) Σk=1..m (yk − fsim(vpk, vqk))²
- Log-loss: Llog(λ) = −Σk=1..m [ yk log fsim(vpk, vqk) + (1 − yk) log(1 − fsim(vpk, vqk)) ]
- Smoothing: add (α/2) ‖λ‖²
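The two loss functions above, written directly over the labels and similarity scores (the scores would come from fsim under the current λ):

```python
import math

def sse_loss(labels, scores):
    # L_sse = 1/2 * sum_k (y_k - fsim_k)^2
    return 0.5 * sum((y - s) ** 2 for y, s in zip(labels, scores))

def log_loss(labels, scores, lambdas=(), alpha=0.0):
    # L_log = -sum_k [y_k log s_k + (1 - y_k) log(1 - s_k)] + (alpha/2) * ||lambda||^2
    data = -sum(y * math.log(s) + (1 - y) * math.log(1 - s)
                for y, s in zip(labels, scores))
    return data + 0.5 * alpha * sum(l * l for l in lambdas)
```

Log-loss requires the similarity scores to lie in (0, 1), which holds for cosine over non-negative term weights.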
Learning Preference Ordering
Training examples: pairs of document pairs (y1, (xa1, xb1)), …, (ym, (xam, xbm)), where xak = (Dpak, Dqak), xbk = (Dpbk, Dqbk)
Δk = fsim(vpak, vqak) − fsim(vpbk, vqbk)
LogExpLoss [Dekel et al. NIPS-03]: upper bounds the pairwise accuracy
L(λ) = Σk=1..m log(1 + exp(−yk Δk − (1 − yk)(−Δk)))
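The LogExpLoss above can be computed directly from the labels and score differences Δk:

```python
import math

def logexp_loss(labels, deltas):
    # delta_k = fsim(pair a_k) - fsim(pair b_k); y_k = 1 if pair a_k should rank higher
    # L = sum_k log(1 + exp(-y_k * delta_k - (1 - y_k) * (-delta_k)))
    return sum(math.log(1 + math.exp(-y * d - (1 - y) * (-d)))
               for y, d in zip(labels, deltas))
```

A correctly ordered pair with a large margin contributes almost nothing; a misordered pair contributes roughly its margin of error, which is what makes the loss an upper bound on pairwise misordering.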
Outline
Introduction
Problem Definition & Model
- Term-weighting functions
- Objective functions
Experiments
- Query suggestion
- Ad page relevance
Conclusions
Experiment – Query Suggestion
Data: query suggestion dataset [Metzler et al. ’07; Yih & Meek ’07]
|Q| = 122, |(Q, S)| = 4852; {Excellent, Good} vs. {Fair, Bad}
Query                   Suggestion              Label
shell oil credit card   shell gas cards         Excellent
shell oil credit card   texaco credit card      Fair
tarrant county college  fresno city college     Bad
tarrant county college  dallas county schools   Good
Term Vector Construction and Features
Query expansion of x using a search engine
- Issue the query x to a search engine
- Concatenate top-n search result snippets (titles and summaries of the top-n returned documents)
Features (of each term w.r.t. the document)
- Term frequency, capitalization, location
- Document frequency, query-log frequency
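A sketch of per-term feature extraction along the lines listed above. The talk names only the feature types; the concrete definitions below (log-scaled counts, position-based location score) are assumptions for illustration.

```python
import math

def term_features(term, tokens, df, qlog_freq, n_docs):
    """Hypothetical features of one term w.r.t. an expanded query document:
    term frequency, capitalization, location, document frequency, query-log frequency."""
    first = tokens.index(term) if term in tokens else len(tokens)
    return {
        "tf":  tokens.count(term),                           # term frequency
        "cap": float(term[:1].isupper()),                    # capitalization
        "loc": 1.0 - first / max(len(tokens), 1),            # earlier position scores higher
        "df":  math.log(n_docs / df.get(term, n_docs)),      # idf-style document frequency
        "qlf": math.log(1 + qlog_freq.get(term.lower(), 0)), # query-log frequency
    }
```

These feature values are what the φj functions feed into the learned twλ.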
Results – Query Suggestion
[Bar chart comparing methods; reported values include 0.782 and 0.597]
10-fold CV; smoothing parameter selected on dev set
Experiment – Ad Page Relevance
Data: a random sample of queries and ad landing pages collected during 2008
Collected 13,341 query/page pairs with reliable labels (8,309 relevant; 5,032 irrelevant)
Apply the same query expansion on queries
Additional HTML features: hypertext, URL, title, meta-keywords, meta-description
Results – Ad Page Relevance
Preference order learning on different feature sets

Features    AUC
TFIDF       0.794
TF&DF       0.806
Plaintext   0.832
HTML        0.855

[ROC curves for the different feature sets]
Related Work
“Siamese” neural network framework
- Vectors of the objects being compared are generated by two-layer neural networks
- Applications: fingerprint matching, face matching
- TWEAK can be viewed as a single-layer neural network with many (vocabulary-size) output nodes
Learning the term-weighting scores directly [Bilenko & Mooney ’03]
- May work for limited vocabulary sizes
Learning to combine multiple similarity measures [Yih & Meek ’07]
- Features of each pair: similarity scores from different measures
- Complementary to TWEAK
Future Work – Other Applications
Near-duplicate detection
- Existing methods (e.g., shingles, I-Match) create hash codes of n-grams in a document as fingerprints and detect duplicates when identical fingerprints are found
- Learn which fingerprints are important
Paraphrase recognition
- Vector-based similarity for surface matching
- Deep NLP analysis may be needed and encoded as features for sentence pairs
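The shingle fingerprinting mentioned under near-duplicate detection can be sketched as follows. The Jaccard threshold is an illustrative choice; classic shingling/I-Match variants instead declare a duplicate on identical fingerprints.

```python
import hashlib

def shingles(tokens, n=3):
    """Hash each n-gram (shingle) of the token sequence into a fingerprint."""
    return {hashlib.md5(" ".join(tokens[i:i + n]).encode()).hexdigest()
            for i in range(len(tokens) - n + 1)}

def near_duplicate(a, b, threshold=0.8):
    """Flag near-duplicates when enough fingerprints overlap (Jaccard on shingle sets)."""
    sa, sb = shingles(a), shingles(b)
    if not sa or not sb:
        return False
    return len(sa & sb) / len(sa | sb) >= threshold
```

Learning which fingerprints matter, as proposed above, would replace the uniform set overlap with a weighted one.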
Future Work – Model Improvement
Learn additional weights on terms
- Create an indicator feature for each term
- Create a two-layer neural network, where each term is a node; learn the weight of each term as well
A joint model for term-weighting learning and similarity-function (e.g., kernel) learning
- The final similarity function combines multiple similarity functions and incorporates pair-level features
- The vector construction and term-weighting scores are trained using TWEAK
Conclusions
TWEAK: a term-weighting learning framework for improving vector-based similarity measures
- Given labels of text pairs, learns the term-weighting function
- A principled way to incorporate more information and adapt to target applications
- Can replace existing TFIDF methods directly
- Flexible in using various loss functions
- Potential for more applications and model enhancements