Personalized Web Search using Clickthrough History
U. Rohini, 200407019, Language Technologies Research Center (LTRC), International Institute of Information Technology (IIIT), Hyderabad, India.
![Page 1: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/1.jpg)
Personalized Web Search using Clickthrough History
U. Rohini
[email protected] (research.iiit.ac.in)
Language Technologies Research Center (LTRC)
International Institute of Information Technology (IIIT)
Hyderabad, India
![Page 2: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/2.jpg)
Outline of the talk
- Introduction
  - Current Search Engines – Problems
  - Motivation
  - Background
  - Problem Description
  - Solution Outline
  - Contributions
- Review of Personalized Search
- I Search: A suite of approaches for Personalized Web Search
  - Personalized Search using user Relevance Feedback: Statistical Language modeling based approaches
    - Simple N-gram based methods
    - Noisy Channel based method
  - Personalized Search using user Relevance Feedback: Machine Learning based approach
    - Ranking SVM based method
  - Personalization without Relevance Feedback: Simple Statistical Language modeling based method
- Experiments
  - Query Log Study
  - Simulated Feedback
- Conclusions and Future Directions
![Page 4: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/4.jpg)
Introduction
- Current Web search engines
  - Provide users with documents “relevant” to their information need
- Issues
  - Information overload
    - Must cater to hundreds of millions of users
    - Terabytes of data
  - Poor description of information need
    - Short queries: difficult to understand
    - Word ambiguities
  - Users only see the top few results
  - Relevance
    - Subjective: depends on the user
- One size fits all?
![Page 5: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/5.jpg)
Motivation
- Search is not a solved problem!
- Poorly described information need
  - Java – (Java island / Java programming language)
  - Jaguar – (cat / car)
  - Lemur – (animal / Lemur toolkit)
  - SBH – (State Bank of Hyderabad / Syracuse Behavioral Healthcare)
- Given prior information
  - I am into biology – best guess for Jaguar?
  - Past queries {information retrieval, language modeling} – best guess for Lemur?
![Page 6: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/6.jpg)
Background
Prior Information – user feedback

| Context | Short term | Long term |
| --- | --- | --- |
| Implicit | Immediately clicked/printed/saved document | Past query log |
| Explicit | Document marked relevant just before | Hobbies, interests |
![Page 7: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/7.jpg)
Problem Description
- Personalized Search
  - Customize search results according to each individual user
- Personalized Search – Issues
  - What to use to personalize?
  - How to personalize?
  - When not to personalize?
  - How to know personalization helped?
![Page 8: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/8.jpg)
Problem Statement
- Problem: How to personalize?
- Our direction
  - Use past search history
  - Long-term learning
- Sub-problems: broken down into 2 sub-problems
  1. How to model and represent past search contexts
  2. How to use it to improve search results
![Page 9: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/9.jpg)
Solution Outline
1. How to model and represent past search contexts (User Profile Learning)
   - Past search history from the user over a period of time – query logs
   - User contexts – triples: {user, query, {relevant documents}}
   - Apply an appropriate method, learn from user contexts, build a model – the user profile
2. How to use it to improve search results (Reranking)
   - Get initial search results
   - Take the top few documents, re-score them using the user profile and sort again
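
The reranking step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the system described in the talk: `rerank`, `toy_profile_score`, `top_k` and the profile weights are all hypothetical, and the toy scorer simply sums per-term profile weights.

```python
from typing import Callable, Dict, List

def rerank(results: List[str],
           profile_score: Callable[[str], float],
           top_k: int = 10) -> List[str]:
    """Re-score the top_k initial results with a user-profile scorer and re-sort them.

    Results ranked below top_k keep their original order.
    """
    head, tail = results[:top_k], results[top_k:]
    # sorted() is stable, so ties keep the search engine's original order.
    head = sorted(head, key=profile_score, reverse=True)
    return head + tail

# Hypothetical user profile: term -> weight learned from past search contexts.
profile: Dict[str, float] = {"retrieval": 0.4, "toolkit": 0.3, "language": 0.2}

def toy_profile_score(doc: str) -> float:
    # Sum of the profile weights of the words in the document (illustrative only).
    return sum(profile.get(w, 0.0) for w in doc.lower().split())

initial = ["lemur animal encyclopedia",
           "lemur toolkit for language modeling and information retrieval",
           "lemur zoo tickets"]
reranked = rerank(initial, toy_profile_score, top_k=2)
print(reranked)
```

Only the top few results are touched, which keeps the cost of personalization independent of the size of the full result list.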
![Page 10: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/10.jpg)
Contributions
- I Search: A suite of approaches for Personalized Web Search
  - Proposed personalized search approaches
  - Baseline
    - Basic retrieval methods
- Automatic Evaluation
  - Analysis of query log
  - Creating simulated feedback
![Page 12: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/12.jpg)
Review of Personalized Search
- Personalized Search
  - Query logs
  - Machine learning
  - Language modeling
  - Community based
  - Others
![Page 14: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/14.jpg)
I Search: A suite of approaches for Personalized Search
Suite of Approaches
- Statistical Language modeling based approaches
  - Simple N-gram based methods
  - Noisy Channel Model based method
- Machine learning based approach
  - Ranking SVM based method
- Personalization without relevance feedback
  - Simple N-gram based method
![Page 16: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/16.jpg)
Statistical Language Modeling based Approaches: Introduction
- Statistical language modeling: the task of estimating a probability distribution that captures statistical regularities of natural language
- Applied to a number of problems – Speech, Machine Translation, IR, Summarization
![Page 17: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/17.jpg)
Statistical Language Modeling based Approaches: Background
[Diagram: User Information need, Ideal Document, Query Formulation Model, Query (example: “Lemur”)]
- Given a query, which is most likely to be the Ideal Document?
- In spite of the progress, not much work has been done to capture, model and integrate user context!
![Page 18: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/18.jpg)
Motivation for our approach
User Past Search Contexts
- Information retrieval
- Information retrieval (IR) is the science of searching for information in documents, searching for documents themselves, searching for metadata which ...
Ideal document
- Encyclopedia gives a brief description of the physical traits of this animal.
- The Lemur toolkit for language modeling and information retrieval is documented and made available for download.
![Page 19: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/19.jpg)
Statistical Language Modeling based Approaches: Overview
- From user contexts, capture statistical properties of texts
- Use the same to improve search results
- Different contexts
  - Unigrams and bigrams
    - Simple N-gram based approaches
  - Relationship between query and document words
    - Noisy Channel based approach
![Page 21: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/21.jpg)
N-gram based Approaches: Motivation
Past Search Contexts
- Information retrieval
- Information retrieval (IR) is the science of searching for information in documents, searching for documents themselves, searching for metadata which ...
Ideal document
- Lemur - Encyclopedia gives a brief description of the physical traits of this animal.
- The Lemur toolkit for language modeling and information retrieval is documented and made available for download.
Unigrams
- Information
- Retrieval
- Documents
- …
Bigrams
- Information retrieval
- Searching documents
- Information documents
- …
![Page 22: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/22.jpg)
Sample user profile
![Page 23: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/23.jpg)
Learning user profile
- Given past search history $H_u = \{(q_1, rf_1), (q_2, rf_2), \ldots, (q_n, rf_n)\}$
- $rf_{all}$ = concatenation of all $rf$
- For each unigram $w_i$
- User profile
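
The per-unigram formula on this slide is not part of the transcript. As a minimal sketch, assuming a plain maximum-likelihood estimate (relative frequency of each unigram in $rf_{all}$), the profile could be built as below. The function name and the whitespace tokenizer are hypothetical, and the talk may use a different, smoothed estimate.

```python
from collections import Counter
from typing import Dict, List, Tuple

def learn_unigram_profile(history: List[Tuple[str, str]]) -> Dict[str, float]:
    """history is H_u: a list of (query, relevance-feedback text) pairs."""
    # rf_all: concatenation of all relevance-feedback texts.
    rf_all = " ".join(rf for _query, rf in history)
    tokens = rf_all.lower().split()
    counts = Counter(tokens)
    total = len(tokens)
    # Assumption: relative frequency is used as the probability of each unigram.
    return {w: c / total for w, c in counts.items()} if total else {}

history = [("lemur", "lemur toolkit information retrieval"),
           ("language modeling", "language modeling for information retrieval")]
unigram_profile = learn_unigram_profile(history)
print(unigram_profile["retrieval"])
```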
![Page 24: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/24.jpg)
Reranking
Recall, in general LM for IR
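
The formula shown on the slide is not part of the transcript. The standard query-likelihood formulation that language-modeling approaches to IR usually start from is reproduced here as an assumption about what is being recalled:

```latex
P(D \mid Q) \;\propto\; P(Q \mid D)\,P(D), \qquad
P(Q \mid D) \;=\; \prod_{q_i \in Q} P(q_i \mid D)
```

Here $Q$ is the query, $q_i$ its terms, and $P(q_i \mid D)$ the (typically smoothed) probability of a term under the language model of document $D$.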
Our Approach
![Page 26: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/26.jpg)
Noisy Channel based Approach
- Documents and queries are in different information spaces
  - Queries – short, concise
  - Documents – more descriptive
- Most methods for retrieval or personalized web search do not model this
- We capture the relationship between query and document words
![Page 27: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/27.jpg)
Noisy Channel based approach: Motivation
[Diagram: Ideal Document, Query Generation Process (Noisy Channel), Retrieval]
![Page 28: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/28.jpg)
Similar to Statistical Machine Translation
- Given an English sentence, translate into French
- Given a query, retrieve documents closer to the ideal document
[Diagram: Noisy Channel 1: French Sentence, English Sentence, P(e/f); Noisy Channel 2: Ideal Document, Query, P(q/w)]
![Page 29: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/29.jpg)
Learning user profile
- User profile: Translation Model
  - Triples: (qw, dw, p(qw/dw))
- Use Statistical Machine Translation methods
- Learning the user profile: training a translation model
- In SMT: training a translation model
  - From parallel texts
  - Using the EM algorithm
![Page 30: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/30.jpg)
Learning User profile
- Extracting parallel texts
  - From queries and corresponding snippets from clicked documents
- Training a translation model
  - GIZA++ – an open source toolkit widely used for training translation models in Statistical Machine Translation research

U. Rohini, Vamshi Ambati, and Vasudeva Varma. Statistical machine translation models for personalized search. Technical report, International Institute of Information Technology, 2007.
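
To illustrate what training a translation model from (query, snippet) pairs with EM involves, here is a small IBM Model 1 style sketch that estimates the triples (qw, dw, p(qw/dw)). It is not GIZA++ and not necessarily the configuration used in the talk: there is no NULL word, no smoothing, and `train_model1` and the toy pairs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[List[str], List[str]]  # (query words, snippet words)

def train_model1(pairs: List[Pair], iterations: int = 10) -> Dict[Tuple[str, str], float]:
    """Estimate t[(qw, dw)] ~ p(qw/dw) with IBM Model 1 style EM (no NULL word)."""
    q_vocab = {qw for q, _ in pairs for qw in q}
    # Uniform initialisation over the query vocabulary.
    t: Dict[Tuple[str, str], float] = defaultdict(lambda: 1.0 / len(q_vocab))
    for _ in range(iterations):
        count = defaultdict(float)   # expected count of (qw, dw) alignments
        total = defaultdict(float)   # expected count of dw being aligned to anything
        for q, d in pairs:
            for qw in q:
                # E-step: spread qw over the snippet words that may have generated it.
                norm = sum(t[(qw, dw)] for dw in d)
                for dw in d:
                    frac = t[(qw, dw)] / norm
                    count[(qw, dw)] += frac
                    total[dw] += frac
        # M-step: renormalise so that p(qw/dw) sums to 1 over qw for each dw.
        for (qw, dw), c in count.items():
            t[(qw, dw)] = c / total[dw]
    return dict(t)

# Toy parallel text: queries paired with snippets of clicked documents.
pairs = [(["lemur", "download"], ["toolkit", "software"]),
         (["lemur"], ["toolkit"])]
table = train_model1(pairs)
print(table[("lemur", "toolkit")], table[("download", "software")])
```

Because "lemur" co-occurs with "toolkit" in both pairs, EM shifts probability mass toward p(lemur/toolkit), and "download" is left to be explained by "software".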
![Page 31: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/31.jpg)
Sample user profile
![Page 32: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/32.jpg)
Reranking
Recall, in general LM for IR
Noisy Channel based approach
Query: lemur
Documents (labelled D1 and D4 on the slide)
- Lemur - Encyclopedia gives a brief description of the physical traits of this animal. (Lemur encyclopedia … brief …)
- The Lemur toolkit for language modeling and information retrieval is documented and made available for download. (Lemur toolkit … information retrieval …)
P(retrieval/lemur)
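
The scoring formula itself is not in the transcript. A common translation-model form of query likelihood, assumed here for illustration, scores a document by letting every document word "translate" into each query word: P(Q/D) = Π_q Σ_w t(q/w) · P(w/D). The function name, the floor value and the table entries below are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

def noisy_channel_score(query: List[str], doc: List[str],
                        t: Dict[Tuple[str, str], float],
                        floor: float = 1e-6) -> float:
    """log P(Q/D) = sum over q of log( sum over w of t(q/w) * P(w/D) )."""
    doc_counts = Counter(doc)
    doc_len = len(doc)
    log_score = 0.0
    for q in query:
        # P(w/D) is the relative frequency of w in the document.
        p = sum(t.get((q, w), 0.0) * c / doc_len for w, c in doc_counts.items())
        # Floor avoids log(0) for query words no document word translates to.
        log_score += math.log(max(p, floor))
    return log_score

# Illustrative translation table for a user whose history is about IR.
t = {("lemur", "lemur"): 0.5, ("lemur", "retrieval"): 0.3, ("lemur", "toolkit"): 0.2}
encyclopedia_doc = "lemur encyclopedia brief description animal".split()
toolkit_doc = "lemur toolkit information retrieval download".split()
score_encyclopedia = noisy_channel_score(["lemur"], encyclopedia_doc, t)
score_toolkit = noisy_channel_score(["lemur"], toolkit_doc, t)
print(score_toolkit > score_encyclopedia)
```

With this table, the toolkit page outranks the encyclopedia page for the query "lemur" because words such as "retrieval" and "toolkit" also contribute probability mass to the query word.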
![Page 34: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/34.jpg)
Machine Learning based Approaches: Introduction
- Most machine learning for IR – binary classification problem – “relevant” and “non-relevant”
- Clickthrough data
  - A click is not an absolute relevance judgment but a relative one
    - i.e., assuming clicked = relevant and unclicked = irrelevant is wrong
  - Clicks are biased
  - Partial relative relevance – clicked documents are more relevant than the unclicked documents
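
Read literally, the last bullet turns one result list plus its clicks into pairwise preferences. A minimal sketch of that reading is below; `preference_pairs` is a hypothetical helper, and stricter strategies (for example, comparing a click only with unclicked results ranked above it) are also used in the clickthrough literature.

```python
from typing import List, Set, Tuple

def preference_pairs(ranking: List[str], clicked: Set[str]) -> List[Tuple[str, str]]:
    """Return (preferred, less_preferred) pairs: every clicked doc over every unclicked doc."""
    return [(c, u)
            for c in ranking if c in clicked
            for u in ranking if u not in clicked]

click_pairs = preference_pairs(["d1", "d2", "d3"], clicked={"d2"})
print(click_pairs)  # [('d2', 'd1'), ('d2', 'd3')]
```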
![Page 35: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/35.jpg)
Background
- Ranking SVM
  - A variation of SVM
  - Learns from partial relevance data
  - Learning is similar to classification SVM
![Page 36: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/36.jpg)
Ranking SVM based method
- Use Ranking SVMs for learning the user profile
- Experimented with
  - Different features
    - Unigram, bigram
  - Different feature weights
    - Boolean, Term Frequency, Normalized Term Frequency
![Page 37: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/37.jpg)
Learning user profile
- User profile: a weight vector
- Learning: training an SVM model
- Steps
  - Extracting features
  - Computing feature weights
  - Training SVM

1. Uppuluri R, Ambati V. Improving web search results using collaborative filtering. In Proceedings of the 3rd International Workshop on Web Personalization (ITWP), held in conjunction with AAAI 2006, 2006.
2. U. Rohini and Vasudeva Varma. A novel approach for re-ranking of search results using collaborative filtering. In Proceedings of International Conference on Computing: Theory and Applications (ICCTA’07), pages 491–495, Kolkata, India, March 2007.
![Page 38: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/38.jpg)
Extracting Features
- Features: unigram, bigram
- Given past search history $H_u = \{(q_1, rf_1), (q_2, rf_2), \ldots, (q_n, rf_n)\}$
- $rf_{all}$ = concatenation of all $rf$
- Remove stop words from $rf_{all}$
- Extract all unigrams (or bigrams) from $rf_{all}$
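
These extraction steps can be sketched as follows. The stop-word list, the whitespace tokenizer and the function name are assumptions made for illustration. Bigrams are formed here from adjacent tokens after stop-word removal, which may differ from the definition used in the talk.

```python
from typing import List, Tuple

# Tiny illustrative stop-word list (an assumption; the slides do not name one).
STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "the", "to"}

def extract_ngrams(history: List[Tuple[str, str]], n: int = 1) -> List[Tuple[str, ...]]:
    """Concatenate all rf texts, drop stop words, return distinct n-grams in first-seen order."""
    rf_all = " ".join(rf for _q, rf in history)
    tokens = [w for w in rf_all.lower().split() if w not in STOP_WORDS]
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return list(dict.fromkeys(grams))  # de-duplicate while keeping order

history = [("lemur", "The Lemur toolkit for information retrieval")]
unigram_features = extract_ngrams(history, n=1)
bigram_features = extract_ngrams(history, n=2)
print(unigram_features)
print(bigram_features)
```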
![Page 39: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/39.jpg)
Computing Feature Weights
In each relevant document ($d_i$), compute weights of features:
- Boolean weighting
  - 1 or 0
- Term frequency weighting
  - tfw – number of times it occurs in $d_i$
- Normalized term frequency weighting
  - tfw / |$d_i$| |Q|
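
The three weighting schemes can be sketched as below. The normalized variant reads the slide's expression as tfw / (|d_i| · |Q|), which is an assumption since the grouping is ambiguous in the transcript. `feature_weights`, the scheme names and `query_len` are hypothetical.

```python
from collections import Counter
from typing import Dict, List

def feature_weights(features: List[str], doc: List[str],
                    scheme: str = "boolean", query_len: int = 1) -> Dict[str, float]:
    """Weight each feature for one relevant document d_i under the chosen scheme."""
    tf = Counter(doc)
    weights: Dict[str, float] = {}
    for f in features:
        if scheme == "boolean":
            weights[f] = 1.0 if tf[f] > 0 else 0.0
        elif scheme == "tf":
            weights[f] = float(tf[f])
        elif scheme == "ntf":
            # Assumption: normalise by document length times query length.
            weights[f] = tf[f] / (len(doc) * query_len) if doc else 0.0
        else:
            raise ValueError(f"unknown scheme: {scheme}")
    return weights

doc = "lemur toolkit lemur retrieval".split()
features = ["lemur", "animal"]
print(feature_weights(features, doc, "boolean"))
print(feature_weights(features, doc, "tf"))
print(feature_weights(features, doc, "ntf", query_len=2))
```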
![Page 40: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/40.jpg)
Training SVM
- Each relevant document – represented as a string of features and corresponding weights
- We used SVMlight for training
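
SVMlight reads one example per line; in ranking mode each line has the form `<target> qid:<qid> <feature>:<value> ...`, with integer feature ids in increasing order. The sketch below formats one document that way. The feature-id mapping, the target values and the helper name are assumptions, not taken from the talk.

```python
from typing import Dict

def to_svmlight_line(target: int, qid: int, weights: Dict[str, float],
                     feature_ids: Dict[str, int]) -> str:
    """Format one document as '<target> qid:<qid> <id>:<value> ...' with ids ascending."""
    # Zero-valued features are omitted, as is usual for sparse input.
    pairs = sorted((feature_ids[f], v) for f, v in weights.items() if v != 0)
    body = " ".join(f"{i}:{v:g}" for i, v in pairs)
    return f"{target} qid:{qid} {body}".rstrip()

feature_ids = {"lemur": 1, "toolkit": 2, "retrieval": 3}
line = to_svmlight_line(2, 1, {"retrieval": 0.25, "lemur": 0.5, "toolkit": 0.0}, feature_ids)
print(line)  # 2 qid:1 1:0.5 3:0.25
```

Within one `qid`, a higher target value marks a document that should be ranked above documents with lower values.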
![Page 41: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/41.jpg)
Sample Training
Sample User Profile
![Page 42: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/42.jpg)
Reranking
- $\mathrm{Sim}(Q, D) = W \cdot \Phi(Q, D)$
  - $W$ – weight vector/user profile
  - $\Phi(Q, D)$ – vector of terms and their weights
    - Measure of similarity between Q and D
    - Each term – term in the query
    - Term weight – product of weights in the query and the document (boolean, term frequency, normalized term frequency)
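
A minimal sketch of this scoring rule, using term-frequency weights for both query and document; `phi`, `sim` and the example weight vector are hypothetical names and values.

```python
from collections import Counter
from typing import Dict, List

def phi(query: List[str], doc: List[str]) -> Dict[str, float]:
    """Phi(Q, D): one entry per query term; weight = query weight * document weight (TF here)."""
    q_tf, d_tf = Counter(query), Counter(doc)
    return {term: float(q_tf[term] * d_tf[term]) for term in q_tf}

def sim(w: Dict[str, float], query: List[str], doc: List[str]) -> float:
    """Sim(Q, D) = W . Phi(Q, D), computed as a sparse dot product."""
    return sum(w.get(term, 0.0) * value for term, value in phi(query, doc).items())

w = {"lemur": 0.5, "toolkit": 1.0}   # hypothetical learned weight vector (user profile)
score = sim(w, ["lemur", "toolkit"], "the lemur toolkit lemur".split())
print(score)  # 2.0
```

Documents in the initial result list would then be sorted by this score in descending order.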
![Page 44: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/44.jpg)
Personalized Search without Relevance Feedback: Introduction
• Can personalization be done without relevance feedback about which documents are relevant?
• How informative are the queries posed by users?
• Is the information contained in the queries enough to personalize?
![Page 45: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/45.jpg)
Approach
• Past queries of the user are available
• Make effective use of past queries
• Simple N-gram based approach
![Page 46: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/46.jpg)
Learning user profile
• Given the past search history $H_u = \{q_1, q_2, \ldots, q_n\}$
• $q_{concat}$: concatenation of all queries
• For each unigram $w_i$
• User profile
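The transcript does not preserve the formula that followed "For each unigram", so the sketch below assumes the simplest reading: the profile is the relative frequency of each unigram in the concatenation of the user's past queries. The function name `learn_query_profile` and the lowercase, whitespace tokenization are assumptions for illustration.

```python
from collections import Counter

def learn_query_profile(history):
    """Unigram profile from a list of past query strings (H_u).

    Assumption: maximum-likelihood estimate, count(w) / total words in q_concat.
    """
    q_concat = " ".join(history).lower().split()   # concatenation of all queries
    counts = Counter(q_concat)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}
```

For example, `learn_query_profile(["tricare", "military rental benefits", "tricare online"])` gives "tricare" a probability of 2/6 and every other word 1/6.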
![Page 47: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/47.jpg)
Sample user profile
![Page 48: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/48.jpg)
Reranking
• In general, LM for IR
• Our approach
U. Rohini, Vamshi Ambati, and Vasudeva Varma. Personalized search without relevance feedback. Technical report, International Institute of Information Technology, 2007
![Page 49: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/49.jpg)
![Page 50: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/50.jpg)
Experiments: Introduction, Problems
• Aim: to see how the approaches perform by comparing them with a baseline
• Problems
  • No standard evaluation framework
  • Data
    • Lack of standardization
    • Comparison with previous work is difficult
    • Difficult to repeat previously conducted experiments
    • Difficult to share results and observations
    • Effort to collect data is repeated over and over
    • Identified as a problem and a need for standardization (Allan et al. 2003)
  • Lack of standard personalized search baselines
    • In our work, we used a variation of the Rocchio algorithm
  • Metrics
![Page 51: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/51.jpg)
Experiments: Data
• Clickthrough data from a popular search engine
• Data collected from 250k million users over 3 months in 2006
• Each record consists of (anonymous id, query, timestamp, position of the click, domain name of the clicked URL)
![Page 52: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/52.jpg)
Experiments: Sample Data

| AnonID | Query | QueryTime | Position | url |
| --- | --- | --- | --- | --- |
| 2722 | charles drew | 2006-03-01 18:00:07 | 10 | http://www.cdhcmedical.com |
| 2722 | military rental benefits | 2006-03-10 09:32:38 | 4 | http://www.valoans.com |
| 2722 | tricare | 2006-03-16 19:07:38 | 2 | http://www.tricareonline.com |
| 142 | rentdirect.com | 2006-03-01 07:17:12 | | |
| 142 | westchester.gov | 2006-03-20 03:55:57 | 1 | http://www.westchestergov.com |
| 142 | vera.org | 2006-04-08 08:38:42 | 1 | http://www.vera.org |
| 142 | broadway.vera.org | 2006-04-08 08:39:30 | | |
![Page 53: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/53.jpg)
Issues with the query log data
• Web search engines
  • Search engine indices change over time
  • However, the top 10 results are mostly the same
• Implicit feedback is only partial relevance feedback
  • 90% of the users click only the top 10 results
  • 95% only the top 5 results
• The log contains only the domain name of the clicked URLs
![Page 54: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/54.jpg)
Extracting Data Set
• Conditions
  • A query should have at least 1 click
  • The user should exhibit long-term behaviour (pose queries over the 3 months and exhibit similar interests)
• Assumptions
  • Each anonymous id corresponds to one user
  • Use the domain name of the clicked URL when comparing
• Final data set
  • How to split the data for training (learning the user profile) and testing? Temporally
  • Training data is used for learning the user profile, testing data for testing
  • First 2 months for training, third month for testing
  • 17 users
  • 51.88 average queries in the train set and 12.64 average queries in the test set
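The temporal split described above can be sketched as below. The record layout `(query, timestamp, clicked_domain)`, the function name `temporal_split`, and the choice of cutoff are assumptions for illustration; the slide only states that the first two months are used for training and the third for testing.

```python
from datetime import datetime

def temporal_split(records, cutoff):
    """Split one user's (query, timestamp, clicked_domain) records at a cutoff time.

    Records without a click are dropped, following the condition that a
    query should have at least one click.
    """
    train, test = [], []
    for query, ts, domain in records:
        if domain is None:
            continue
        when = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        (train if when < cutoff else test).append((query, ts, domain))
    return train, test
```

With the sample rows for user 142 and a hypothetical cutoff of `datetime(2006, 4, 1)`, the click on westchester.gov lands in the training set, the click on vera.org in the test set, and the unclicked rentdirect.com query is discarded.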
![Page 55: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/55.jpg)
Baseline
• Variation of the Rocchio algorithm (Rocchio 1971)
• Learning the profile
  • User profile: vector of words and weights
  • For each query
    • For each clicked document, collect the corresponding snippet from the search engine
  • Concatenate all such snippets for all queries
  • Compute the frequency distribution of words
• Reranking
  • $\mathrm{Sim}(Q, D) = \left( \frac{tf_q}{|Q|} + \frac{tf_{rup}}{|RUP|} \right) \cdot \frac{tf_D}{|D|}$
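A rough sketch of this baseline follows. The slide gives the score per term; summing it over the terms of the document is an assumption here, as are the names `build_rup` and `baseline_sim` and the whitespace tokenization.

```python
from collections import Counter

def build_rup(snippets):
    """Concatenate clicked-result snippets into one word-frequency profile (RUP)."""
    return Counter(" ".join(snippets).lower().split())

def baseline_sim(query_terms, rup, doc_terms):
    """Sum over document terms of (tf_q/|Q| + tf_rup/|RUP|) * tf_D/|D|."""
    q, d = Counter(query_terms), Counter(doc_terms)
    q_len, rup_len, d_len = len(query_terms), sum(rup.values()), len(doc_terms)
    if d_len == 0:
        return 0.0
    score = 0.0
    for term, tf_d in d.items():
        q_part = q[term] / q_len if q_len else 0.0
        rup_part = rup[term] / rup_len if rup_len else 0.0
        score += (q_part + rup_part) * tf_d / d_len
    return score
```

The query and the profile enter additively, so a document term that is absent from the query can still contribute through the profile.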
![Page 56: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/56.jpg)
Metrics
• MRR: Mean Reciprocal Rank

$$\mathrm{MRR}(Q, D, u) = \frac{\sum_{q \in Q} rr(q, R_{Q,D,u})}{|Q|}$$

• $rr(q, R_{Q,D,u})$: reciprocal of the position of the first relevant document, and 0 if there is no relevant result in the top $N$ (= 10)
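The metric can be computed as in the sketch below, assuming each test query comes with a ranked list of URLs and the set of URLs the user clicked (treated as relevant). The function names and the data layout are illustrative.

```python
def reciprocal_rank(ranked_urls, relevant_urls, top_n=10):
    """1 / position of the first relevant result within the top N, else 0."""
    for position, url in enumerate(ranked_urls[:top_n], start=1):
        if url in relevant_urls:
            return 1.0 / position
    return 0.0

def mean_reciprocal_rank(runs, top_n=10):
    """runs: list of (ranked_urls, relevant_urls) pairs, one per test query."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel, top_n) for r, rel in runs) / len(runs)
```

For three queries whose first relevant results sit at position 2, nowhere, and position 1, the reciprocal ranks are 0.5, 0 and 1, so the MRR is 0.5.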
![Page 57: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/57.jpg)
Set up
[Diagram labels: Test data (query + clicked URLs); Query; Top m URLs; Reranker; Reranked results; Clicked URLs; Compare top n URLs; Results (MRR, P@n)]
1. Rerank the top M (= 10) results (clickthrough data)
2. First get the results from Google, ignoring the ranks given by Google (similar to Tan, Shen & Zhai 2006)
3. Rescore the results appropriately
4. Sort in descending order and return
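The reranking steps of this set-up can be sketched generically, with the scoring method passed in as a function. The name `rerank_top_m` and the `score(query, doc)` signature are assumptions; any of the methods from the talk could play the role of `score`.

```python
def rerank_top_m(query, results, score, m=10):
    """Rescore the top-m results and return them in descending score order."""
    top = results[:m]
    # The engine's rank is not used as a score; the original index only
    # breaks ties so the output is deterministic.
    scored = [(score(query, doc), i, doc) for i, doc in enumerate(top)]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [doc for _, _, doc in scored]
```

The reranked list can then be compared with the clicked URLs of the test query to compute MRR.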
![Page 58: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/58.jpg)
Results: Simple N-gram based Methods

| Method | MRR | Improvement (%) |
| --- | --- | --- |
| Baseline | 0.305 | |
| unigrams | 0.332 | 8.85 |
| bigrams | 0.338 | 11.18 |
![Page 59: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/59.jpg)
Noisy Channel Based Method
• Experiment 1
  • Comparison with baseline
• Experiment 2
  • Different methods of extracting parallel texts
• Experiment 3
  • Different training schemes
    • Different contexts for training
    • Different training models
![Page 60: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/60.jpg)
Experiment 1
Comparison with baseline

| Method | MRR | Improvement (%) |
| --- | --- | --- |
| Baseline | 0.305 | |
| Noisy Channel | 0.339 | 11.51 |
![Page 61: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/61.jpg)
Experiment 2
Extracting Parallel Texts: Comparison of methods
• NS1
  • Synthetic queries: No
  • Parallel texts: {Queries || Corr. Rel. Docs}
• NS2
  • Synthetic queries: Yes (trigrams from snippets of each rel. doc)
  • Parallel texts: {Queries || Corr. Rel. Docs} ∪ {Synthetic Queries || Corr. Rel. Docs}
• NS3
  • Synthetic queries: Yes (trigrams from snippets of each rel. doc + document title)
  • Parallel texts: {Queries || Corr. Rel. Docs} ∪ {Synthetic Queries || Corr. Rel. Docs}
• NS4
  • Synthetic queries: Yes (trigrams from snippets of each rel. doc)
  • Parallel texts: {Queries || Corr. Rel. Docs} ∪ {Synthetic Queries || Corr. Rel. Docs} ∪ {Queries || Document Titles}
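As an illustration of the NS2 scheme, the sketch below builds parallel text pairs from clicked results, adding every word trigram of a snippet as a synthetic query for that snippet. Representing each relevant document by its snippet, and the names `trigrams` and `parallel_texts_ns2`, are assumptions made for the example.

```python
def trigrams(text):
    """All word trigrams of a text, as space-joined lowercase strings."""
    words = text.lower().split()
    return [" ".join(words[i:i + 3]) for i in range(len(words) - 2)]

def parallel_texts_ns2(clicks):
    """clicks: list of (query, snippet) pairs for relevant (clicked) results.

    Returns (source, target) pairs: the real queries paired with their
    snippets, plus synthetic queries (snippet trigrams) paired with the
    snippet they were extracted from.
    """
    pairs = [(query, snippet) for query, snippet in clicks]
    for _, snippet in clicks:
        pairs.extend((synthetic, snippet) for synthetic in trigrams(snippet))
    return pairs
```

The resulting pairs would then be handed to a translation-model trainer; that step is outside this sketch.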
![Page 62: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/62.jpg)
Results

| Method | MRR | Improvement (%) |
| --- | --- | --- |
| Baseline | 0.305 | |
| NS1 | 0.339 | 11.51 |
| NS2 | 0.378 | 24.34 |
| NS3 | 0.386 | 26.97 |
| NS4 | 0.374 | 23.02 |

• NS1: Query || Snippets of relevant documents
• NS2: Query || Snippets of relevant documents + Synthetic query || Snippets
• NS3: Query || Snippets of relevant documents + Document title || Snippets + Synthetic query || Snippets
• NS4: Query || Snippets of relevant documents + Synthetic query || Snippets + Query || Document title
![Page 63: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/63.jpg)
Experiment 3
• Different training schemes
  • Different contexts for training
    • Snippet vs. Document
  • Different training models
![Page 64: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/64.jpg)
Data and Set up
• Data
  • Explicit feedback data collected from 7 users
  • For each query, each user examined the top 10 documents and identified the relevant documents
  • Collected the top 10 results for all queries: 3469 documents in total
• Set up
  • Created a Lucene index over the 3469 documents
  • For reranking, first retrieve the results using Lucene and then rerank them using the noisy channel approach
  • We perform 10-fold cross validation
![Page 65: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/65.jpg)
Results

| | IBM Model1, Document Train | IBM Model1, Snippet Train | GIZA++, Document Train | GIZA++, Snippet Train |
| --- | --- | --- | --- | --- |
| Document Test | 0.2062 | 0.2333 | 0.1799 | 0.2075 |
| Snippet Test | 0.2028 | 0.2488 | 0.1834 | 0.2034 |
![Page 66: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/66.jpg)
Results
• I: Document training and document testing
• II: Document training and snippet testing
• III: Snippet training and document testing
• IV: Snippet training and snippet testing
![Page 67: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/67.jpg)
Results: SVM

| Method | MRR | Improvement (%) |
| --- | --- | --- |
| Baseline | 0.305 | |
| SVM1 | 0.290 | -4.61 |
| SVM2 | 0.334 | 9.689 |
| SVM3 | 0.369 | 21.38 |
| SVM4 | 0.304 | 0 |
| SVM5 | 0.359 | 18.09 |

• SVM1: unigram, binary
• SVM2: unigram, term frequency
• SVM3: unigram, normalized term frequency
• SVM4: bigram, normalized term frequency
• SVM5: unigrams + bigrams, normalized term frequency
![Page 68: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/68.jpg)
Results: Personalization without Relevance Feedback

| Method | MRR | Improvement (%) |
| --- | --- | --- |
| Baseline | 0.305 | |
| LM | 0.332 | 8.85 |
| PWRF | 0.350 | 15.131 |
| PWRF+smoothing | 0.370 | 21.31 |

• PWRF: personalization without relevance feedback, using only the profile learnt from the queries
• PWRF+smoothing: the probabilities from the user profile are smoothed using a large query language model obtained from all the queries of all the users in collection 01 of the clickthrough data
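The slide does not say which smoothing method was used. As one common possibility, the sketch below uses linear interpolation between the user's query profile and a background query language model; the function name `smoothed_prob` and the interpolation weight `lam` are assumptions, not values from the talk.

```python
def smoothed_prob(word, user_profile, global_lm, lam=0.7):
    """Interpolated probability: lam * P_user(w) + (1 - lam) * P_global(w).

    user_profile and global_lm are dicts mapping words to probabilities.
    """
    return lam * user_profile.get(word, 0.0) + (1.0 - lam) * global_lm.get(word, 0.0)
```

The point of the interpolation is that a word the user has never typed still gets non-zero probability if it appears in the background model, which addresses the sparsity of a profile built from a few dozen queries.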
![Page 69: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/69.jpg)
Experiments: Summary
• Language Modeling: best results!
  • Interesting framework for personalized search
  • Simple N-gram based approaches also worked well
  • Noisy channel model worked best
    • Extracting synthetic queries helped
    • Different training schemes
      • IBM Model1 vs. GIZA++
      • Snippet vs. Document
• Machine Learning: competitive results
  • Different features and weights
• Without Relevance Feedback: very encouraging results
  • Simple approach worked well
  • Sparsity: query log was useful
![Page 70: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/70.jpg)
![Page 71: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/71.jpg)
Query Log Study: Introduction
• Large interest in finding patterns and computing statistics from query logs
• Previous work
  • Patterns and statistics of queries: common queries, avg. no. of words, avg. no. of queries per session, etc.
• Little work on analyzing click behaviour of users
  • Granka et al.: eye tracking study
![Page 72: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/72.jpg)
Query Log Study: Our Analysis
• Analyzing clicking behaviour of users
  • Study whether there is any general pattern in clicking behaviour
• Aim to answer the following
  • Expt 1: Do all users view results from top to bottom?
  • Expt 2: Do all users view the same number of results?
![Page 73: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/73.jpg)
Query Log Study: Observations
• Expt 1: Do all users view results from top to bottom?
  • Yes, for 90% of the queries
  • Why is this important?
• Expt 2: How many top results does the user view? => Deepest click made by users
  • Statistical analysis showed that the deepest clicks made by a sample of users follow a Zipf distribution (power law)
  • Many users view only the top 5 (about 90/95%), few users view the top 10, far fewer view the top 20, and so on
  • Why is this important?
![Page 74: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/74.jpg)
![Page 75: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/75.jpg)
Simulated Feedback: Introduction
• Relevance feedback: types, problems
  • Explicit
    • Difficult to collect
  • Implicit
    • Clickthrough data from search engines not available
• Repeatability of experiments is a problem!
  • Web has dynamic data collections: feedback collected becomes stale
  • Privacy
![Page 76: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/76.jpg)
Simulated Feedback: Motivation
• Simulated feedback: similar to explicit and implicit feedback
• Potential area: outcome useful for web search and personalization
• Easy to create
• Customizable
• Large amounts can be created
• Repeatable
• Testing specific domains
![Page 77: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/77.jpg)
Simulated Feedback Creation
[Diagram labels: Parameters; Simulator (User Creator, Web Search Behaviour Simulator); Simulated Feedback]
Web Search Behaviour Simulator
• Step 1: Formulate the query
• Step 2: Pose it to a search engine
• Step 3: Look at the results returned by the search engine
• Step 4: Possibly click one or more results

Simulated Feedback

| User Id | Query | Simulated Click |
| --- | --- | --- |
| 1 | Lemur | www.en.wikipedia.org/wiki/Lemur |
| | | www.thewildones.org/Animals/lemur.html |
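The four simulator steps can be sketched as a single function. Everything concrete here is hypothetical: the slide does not describe how clicks are generated, so the sketch takes the search backend (`search`) and the click model (`click_prob`, a function of rank) as parameters supplied by the caller.

```python
import random

def simulate_session(user_id, query, search, click_prob, rng=None):
    """Simulate one search interaction and return a feedback record.

    search(query) returns a ranked list of URLs; click_prob(rank) returns
    the probability that the simulated user clicks the result at that rank.
    """
    rng = rng or random.Random()
    results = search(query)            # steps 2-3: pose the query, view results
    clicks = [url for rank, url in enumerate(results, start=1)
              if rng.random() < click_prob(rank)]   # step 4: possibly click
    return {"user_id": user_id, "query": query, "clicks": clicks}
```

Passing a seeded `random.Random` makes a run repeatable, which is one of the motivations listed for simulated feedback.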
![Page 78: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/78.jpg)
![Page 79: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/79.jpg)
Conclusions
• Statistical Language Modeling based approaches
• Machine learning based approach
• Personalized search without relevance feedback
• Performed evaluation using query log data
• Query log analysis and simulated feedback
![Page 80: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/80.jpg)
Future Directions
• Recommending documents
  • Extend to exploit repetition in queries and clickthroughs
• Language Modeling based approaches
  • Capture richer context
    • N-gram based method: trigrams, etc.
    • Noisy channel based method: bigrams
• Machine learning based approaches
  • Can learn non-text patterns or behaviour
• Personalized summarization
• Simulating user behaviour
![Page 81: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/81.jpg)
Thank you
![Page 82: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/82.jpg)
Simple N-gram based approaches
• N-gram: general term for a sequence of N adjacent words
  • 1-gram: unigram, 2-gram: bigram
• Capture statistical properties of text
  • Single words (unigrams)
  • Two adjacent words (bigrams)
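A short sketch of extracting unigrams and bigrams from a query, assuming lowercase whitespace tokenization; the function name `ngrams` is illustrative.

```python
def ngrams(text, n):
    """All runs of n adjacent words in a text, as tuples."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
```

For the sample query "military rental benefits", `ngrams(..., 1)` yields three unigrams and `ngrams(..., 2)` yields the bigrams ("military", "rental") and ("rental", "benefits"). A one-word query has no bigrams.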
![Page 83: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/83.jpg)
Query Log Study: Introduction
• Query logs
  • Large interest in finding patterns and computing statistics from query logs
• Previous work
  • Patterns and statistics on queries
    • Common queries, avg. no. of words, avg. no. of queries per session, etc.
• Little work on analyzing click behaviour of users
  • Granka et al.: eye tracking study
![Page 84: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/84.jpg)
Query Log Study: Our Analysis
• Focus on analyzing clicking behaviour of users
• Study whether there is any general pattern in clicking behaviour
• Aim to answer the following
  • Do all users view results from top to bottom? (Expt 1)
  • Do all users view the same number of results? (Expt 2)
![Page 85: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/85.jpg)
Query log Data
• Clickthrough data from a popular search engine
• Data collected from 250k million users over 3 months in 2006
• Each record consists of (anonymous id, query, timestamp, position of the click, domain name of the clicked URL)
![Page 86: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/86.jpg)
Sample Data

| AnonID | Query | QueryTime | Position | url |
| --- | --- | --- | --- | --- |
| 2722 | charles drew | 2006-03-01 18:00:07 | 10 | http://www.cdhcmedical.com |
| 2722 | military rental benefits | 2006-03-10 09:32:38 | 4 | http://www.valoans.com |
| 2722 | tricare | 2006-03-16 19:07:38 | 2 | http://www.tricareonline.com |
| 142 | rentdirect.com | 2006-03-01 07:17:12 | | |
| 142 | westchester.gov | 2006-03-20 03:55:57 | 1 | http://www.westchestergov.com |
| 142 | vera.org | 2006-04-08 08:38:42 | 1 | http://www.vera.org |
| 142 | broadway.vera.org | 2006-04-08 08:39:30 | | |
![Page 87: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/87.jpg)
Experiment 1
• Do all users view results from top to bottom?
• Position: the position of the search result in the search engine's result list
• For each query
  • Arrange the clicks by time of click
  • If all the positions are in ascending order, the user views results from top to bottom
  • Otherwise, the query is said to be an anomaly
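The check described above can be sketched as follows, assuming the clicks of one query are given as `(timestamp, position)` pairs with timestamps in the log's `YYYY-MM-DD HH:MM:SS` format, which sorts chronologically as plain text. The function names are illustrative, and treating a repeated click on the same position as still "ascending" is an assumption.

```python
def is_top_to_bottom(clicks):
    """True if click positions never decrease when ordered by click time."""
    positions = [pos for _, pos in sorted(clicks)]
    return all(a <= b for a, b in zip(positions, positions[1:]))

def anomaly_rate(queries):
    """Fraction of queries whose clicks are not in top-to-bottom order."""
    if not queries:
        return 0.0
    return sum(not is_top_to_bottom(c) for c in queries) / len(queries)
```

A query with a single click is trivially top-to-bottom, so anomalies can only arise from queries with at least two clicks.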
![Page 88: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/88.jpg)
![Page 89: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/89.jpg)
Observations
• For 90% of the queries, users always go from top to bottom!
• For the remaining 10% of queries
  • The user clicks at least one lower result before clicking a higher result
  • User not happy with the search engine ranking
  • Not a behaviour of the user: 50% of users exhibit it
  • Certain queries are "hard"?
![Page 90: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/90.jpg)
Experiment 2
How many top results does the user view?
Intuition
  Typically users don't view all the results
  Only the top few: how many?
  Depends on the user?
Goal: to see how deep a user goes when viewing results
![Page 91: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/91.jpg)
Patience: how many results a user views
1. For each query, the deepest click. Maximum over all queries
2. For each query, the average click. Maximum over all queries
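The two patience measures can be sketched as below. This is a hedged illustration under stated assumptions: the input layout and the name `patience_measures` are hypothetical, and "click" is read as the rank position of the clicked result.

```python
from collections import defaultdict


def patience_measures(clicks):
    """Compute both patience measures for every user.

    clicks: iterable of (user_id, query, position) tuples.
    Returns {user_id: (max_deepest_click, max_average_click)}.
    """
    per_query = defaultdict(list)
    for user_id, query, position in clicks:
        per_query[(user_id, query)].append(position)

    deepest = defaultdict(int)
    average = defaultdict(float)
    for (user_id, _query), positions in per_query.items():
        # Measure 1: deepest click of the query, maximised over the user's queries.
        deepest[user_id] = max(deepest[user_id], max(positions))
        # Measure 2: average click position of the query, maximised over queries.
        mean_position = sum(positions) / len(positions)
        average[user_id] = max(average[user_id], mean_position)

    return {user_id: (deepest[user_id], average[user_id]) for user_id in deepest}
```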
![Page 92: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/92.jpg)
For each query, the deepest click. Maximum over all queries
![Page 93: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/93.jpg)
For each query, the average click. Maximum over all queries
![Page 94: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/94.jpg)
Observations
Statistical analysis shows that they follow a Zipf distribution (power law)
Many users (about 90-95%) view only the top 5, few users view the top 10, far fewer view the top 20, and so on
The patience of a group of users can be characterized using Zipf's law or a power law
![Page 95: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/95.jpg)
Simulated Feedback
Relevance Feedback
  Explicit
    Difficult to collect
  Implicit
    Clickthrough data from search engines not available
Repeatability of experiments is a problem
  Web: dynamic data collections; feedback collected becomes stale
![Page 96: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/96.jpg)
Simulated Feedback
Simulated feedback: drawing an analogy from explicit and implicit feedback
Potential area: outcome useful for web search and personalization
Easy to create
Customizable
Large amounts can be created
Repeatable
Testing specific domains
![Page 97: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/97.jpg)
Creating Simulated Feedback
Creating a simulated user
Simulating user web search behaviour
U. Rohini, Vamshi Ambati, and Vasudeva Varma. Creating simulated feedback. Technical report, International Institute of Information Technology, 2007.
![Page 98: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/98.jpg)
Creating Simulated User
User-specific parameters (unique id etc.)
Web search specific parameters
  Patience (from query log analysis)
  Threshold
Others can be interests (user profile/model), browsing history etc.
We considered Patience and threshold in this work
![Page 99: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/99.jpg)
Patience
Picked from a power law distribution.
Many users view the top 5, fewer view the top 10, far fewer view the top 20, and so on
![Page 100: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/100.jpg)
Relevance Threshold
Depends on the query and the user
For some queries, very high relevance is needed
We compute it according to the query for each user
![Page 101: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/101.jpg)
Simulating user web search behaviour
Formulate a web search process
  Step 1: Create the query
  Step 2: Pose it to a search engine
  Step 3: Look at the results returned by the search engine
  Step 4: Possibly click one or more results
  Step 5: Reformulate if unsatisfied
Simulate the search process for the created user
We consider only Steps 1 to 4 in our approach
![Page 102: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/102.jpg)
Simulating Step 1: Formulating the query
Can be very complex
We take a simple and practical approach
As of now, the queries are assumed to be given to the system
![Page 103: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/103.jpg)
Simulating Step 2: Searching the Search Engine
Given a search engine
Pose the query from Step 1 to the search engine
Get the search results.
![Page 104: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/104.jpg)
Simulating Step 3: Looking at the Search Results
Simulation of this step can be done in a number of ways
  E.g. random, top to bottom, bottom to top, etc.
We consider
  Sequential from top to bottom until patience is zero
  For each document, perform clicks as in Step 4
  (motivated by Radlinski et al., Granka et al.)
![Page 105: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/105.jpg)
Simulating Step 4: Clicking the results
Crucial step of our simulation
The user clicks a result if
  The snippet shown by the search engine appears to be relevant to the user
  The result below it is not more relevant than it (motivated by Radlinski et al., Granka et al.)
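Steps 3 and 4 together can be sketched as a single pass over the ranked results. This is an illustrative sketch only: the slides do not say how snippet relevance is scored, so the scores are taken as given inputs, and the function name `simulate_clicks` and its parameters are hypothetical.

```python
def simulate_clicks(relevance, patience, threshold):
    """Return the 1-based positions a simulated user clicks.

    relevance: snippet relevance scores in ranked order (assumed precomputed).
    patience:  how many results the user looks at, scanning from the top.
    threshold: minimum relevance the user requires before clicking.
    """
    clicked = []
    # Step 3: look at results sequentially from the top until patience runs out.
    for index, score in enumerate(relevance[:patience]):
        has_below = index + 1 < len(relevance)
        below_is_better = has_below and relevance[index + 1] > score
        # Step 4: click when the snippet looks relevant enough and the result
        # directly below it is not more relevant.
        if score > threshold and not below_is_better:
            clicked.append(index + 1)
    return clicked
```

For scores `[0.9, 0.2, 0.5, 0.8, 0.7]`, a patience of 4 and a threshold of 0.4, the sketch clicks positions 1 and 4: position 3 passes the threshold but is skipped because the result below it scores higher.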
![Page 106: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/106.jpg)
Simulated Feedback Creation
[Diagram labels: User Creator, Web Search Behaviour Simulator, Simulator, Search Engine, Parameters, Search results, Simulated Feedback]
![Page 107: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/107.jpg)
Evaluation: Problems
Is simulated feedback relevant?
How different is it from randomly created feedback?
Evaluation
  No standard methods to evaluate
  No metrics to quantify success
  How and what to compare?
![Page 108: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/108.jpg)
Experiments
Experiment 1
  Comparison with implicit feedback from query log data
Experiment 2
  Comparison with baselines
Experiment 3
  Comparison with explicit feedback
![Page 109: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/109.jpg)
Experimental Setup
Creating the simulated user
  Randomly assign a unique id
  Patience
    Draw randomly from a power law distribution: 1 to 25
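Drawing a patience value from a power law restricted to the range 1 to 25 can be sketched as below. The exponent `alpha` is an assumed illustrative value, since the slides do not report the fitted one, and `sample_patience` is a hypothetical helper name.

```python
import random


def sample_patience(rng, alpha=2.0, low=1, high=25):
    """Draw one patience value from a truncated discrete power law.

    The probability of value k is proportional to k ** -alpha for low <= k <= high.
    alpha is an assumption chosen for illustration, not a value from the slides.
    """
    values = list(range(low, high + 1))
    weights = [value ** -alpha for value in values]
    return rng.choices(values, weights=weights, k=1)[0]


# One simulated user: a unique id plus a patience value.
rng = random.Random(42)
user = {"id": 1, "patience": sample_patience(rng)}
```

Passing a seeded `random.Random` keeps the simulated users repeatable, which matches the repeatability goal stated for simulated feedback.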
![Page 110: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/110.jpg)
Experimental Setup
Simulating the web search process
Pick a user from the query log and gather all queries posed by that user.
Simulate the web search process for each query in succession
  Step 1: Formulating a query
    Pick each query in succession from the gathered queries
  Step 2: Searching the search engine
    Pose the query to a search engine and gather the results
  Step 3: Looking at the results
  Step 4: Clicking one or more
![Page 111: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/111.jpg)
Sample Data Created
![Page 112: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/112.jpg)
Experiment 1
Comparison with clickthroughs from the query log
For each query: Relevance Document Pool (RDP)
  All clicked documents for the query from all the users in the query log
Average Accuracy = 60.04%
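The slides do not define how accuracy is computed, so the sketch below shows only one plausible reading: the share of simulated clicks for a query that fall inside that query's RDP, averaged over queries. The function names and input layouts are assumptions, and the sketch is not claimed to reproduce the reported figure.

```python
from collections import defaultdict


def build_rdp(log_clicks):
    """Build the Relevance Document Pool: every URL clicked for a query by any user.

    log_clicks: iterable of (query, clicked_url) pairs from the query log.
    """
    rdp = defaultdict(set)
    for query, url in log_clicks:
        rdp[query].add(url)
    return rdp


def average_accuracy(simulated_clicks, rdp):
    """Average, over queries, of the share of simulated clicks found in the RDP.

    simulated_clicks: {query: [clicked_url, ...]} produced by the simulator.
    Queries without any simulated click are skipped (an assumption).
    """
    scores = []
    for query, urls in simulated_clicks.items():
        if not urls:
            continue
        pool = rdp.get(query, set())
        hits = sum(1 for url in urls if url in pool)
        scores.append(hits / len(urls))
    return sum(scores) / len(scores) if scores else 0.0
```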
![Page 113: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/113.jpg)
Experiment 2
Random Navigation
Power law Navigation
Random click
![Page 114: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/114.jpg)
Creating user

| Creating user | Random | Power law | Random Click | Proposed |
|---|---|---|---|---|
| Unique ID | Chosen random unique | Chosen random unique | Chosen random unique | Chosen random unique |
| Patience | - | - | - | From power law |
![Page 115: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/115.jpg)
Creating Web Search Process

| Step | Random Navigation | Power law | Random Click | Proposed |
|---|---|---|---|---|
| Step 1. Formulate Query | Given | Given | Given | Given |
| Step 2. Search | Pose to a search engine and get search results | Pose to a search engine and get search results | Pose to a search engine and get search results | Pose to a search engine and get search results |
| Step 3. Look Results | Completely random | Power law | From top to bottom | From top to bottom |
| Step 4. Click Results | Random | Random | Random | Relevance > threshold && more relevant than below document |
![Page 116: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/116.jpg)
Results
![Page 117: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/117.jpg)
Experiment 3
Comparison with explicit feedback
4 judges
Select a small subset of the data created
  25 users
  1 query per user: 25 queries in total
We consider the query and the simulated feedback created for this query
![Page 118: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/118.jpg)
Each judge is given an evaluation form
Evaluation form
  Details about the judge
  A table containing the query and the corresponding simulated click URLs
  For each simulated click: judge feedback
    Boolean feedback: 1 or 0
![Page 119: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/119.jpg)
Results
Judge Accuracy = 66.02%
Correlation between the judges = 0.859
![Page 120: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/120.jpg)
Discussion
6% increase in accuracy over the comparison with the query log
Match problems
  Search engine index changes: relevance feedback becomes stale!
  Too few relevant documents in the RDP
    "qualcom.com": only one document in the RDP. Focussed query, only one user posed it
Focussed query vs. general query
  "qualcomm.com": only one query, one user posed it
  "lottery": 58 users, 24 unique click URLs
![Page 121: Personalized Web Search using Clickthrough History](https://reader035.vdocument.in/reader035/viewer/2022062723/56813f64550346895daa370f/html5/thumbnails/121.jpg)
Reranking
In general: LM for IR
Noisy Channel based approach
Lemur - Encyclopedia gives a brief description of the physical traits of this animal.
The Lemur toolkit for language modeling and information retrieval is documented and made available for download.
lemur
Lemur encyclopedia … brief …
Lemur toolkit … information retrieval …