Intent Mining from Search Results
Jan Pedersen
Outline
• Intro to Web Search
  – Free text queries
  – Architecture
  – Why it works
• Result Set Mining
  – Disambiguation
  – Correction
  – Amplification
The Worst Interface (ca 1990)
The Search Interface (ca 2010)
Search wasn’t always like this
ttl/(tennis and (racquet or racket))
isd/1/8/2002 and motorcycle
in/newmar-julie
Source: USPTO
Salton’s Contribution
Source: cs.cornell.edu
• Free text queries
• Approximate matching
• Relevance ranking
• Exploit redundancy
• Meta data
• Scored-OR
Life of a query
Gerry Salton
(Scored-OR 10, ([(“Gerry” or “Gerald”),0.3], [“Salton”,0.7]))
Index
• Separation between user query and backend query
• Relevance scoring and ranking
• Query-in-context summaries
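The Scored-OR expression above can be read as: return up to 10 documents that match at least one clause, ranked by the summed weights of the clauses they match. The following is a minimal sketch under that reading, not the engine's actual operator; the toy corpus `DOCS`, the function `scored_or`, and the additive scoring rule are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy corpus: document id -> set of lowercase terms.
DOCS: Dict[str, Set[str]] = {
    "d1": {"gerald", "salton", "cornell", "retrieval"},
    "d2": {"gerry", "adams"},
    "d3": {"salton", "sea", "california"},
    "d4": {"tennis", "racquet"},
}

# A clause is (set of alternative terms, weight).
Clause = Tuple[Set[str], float]


def scored_or(k: int, clauses: List[Clause],
              docs: Dict[str, Set[str]]) -> List[Tuple[str, float]]:
    """Return up to k docs matching any clause, best score first.

    Assumed scoring rule: a document's score is the sum of the weights
    of the clauses for which it contains at least one alternative.
    """
    scored = []
    for doc_id, terms in docs.items():
        score = sum(weight for alts, weight in clauses if alts & terms)
        if score > 0:
            scored.append((doc_id, score))
    # Highest score first; ties broken by doc id for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


# Backend form of the user query "Gerry Salton".
results = scored_or(
    10,
    [({"gerry", "gerald"}, 0.3), ({"salton"}, 0.7)],
    DOCS,
)
print(results)
```

Because matching is approximate, a document that mentions only "Salton" still ranks, just below one that matches both clauses.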
Why Does it Work?
Semantic Meta-Data
Segment                      Tail   Overall
All Queries                  100%   100%
Word Count > 4               41%    20%
Misspelled                   21%    11%
Perfect Matches Popularity   28%    54%
Partial Matches Popularity   45%    28%
No Matches Popularity        9%     7%
RESULT SET MINING
Query Expansion
• [Gerry Salton] → [Gerry Salton Cornell]
• Disambiguation via Expansion
• Pseudo Relevance Feedback (Evans)
Life of a query (2)
Gerry Salton
(Scored-OR 10, ([(“Gerry” or “Gerald”),0.3], [“Salton”,0.7]))
Index
Gerry Salton → Gerry Salton Cornell
• Result Set Analysis
• Automated Query expansion
• Reranking
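Pseudo relevance feedback can be sketched as follows: treat the top results of the original query as if they were relevant, find the term that recurs most often across them, and append it to the query. This is a simplified illustration, not the method used in the talk; the snippets, the stopword list, and the `expand_query` helper are hypothetical.

```python
from collections import Counter
from typing import List

# Small illustrative stopword list (an assumption, not a standard one).
STOPWORDS = {"the", "of", "a", "and", "at", "in", "was", "for"}


def expand_query(query: str, top_snippets: List[str], n_terms: int = 1) -> str:
    """Append the n_terms most widespread non-query terms from the top results."""
    query_terms = set(query.lower().split())
    counts: Counter = Counter()
    for snippet in top_snippets:
        # Count each term once per snippet, i.e. document frequency
        # within the result set rather than raw term frequency.
        terms = set(snippet.lower().split())
        counts.update(
            t for t in terms if t not in STOPWORDS and t not in query_terms
        )
    # Most frequent first; alphabetical tie-break keeps output deterministic.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    extra = [term for term, _ in ranked[:n_terms]]
    return " ".join(query.split() + extra)


# Hypothetical snippets standing in for the top results of [Gerry Salton].
snippets = [
    "Gerry Salton was a professor of computer science at Cornell",
    "Salton led the SMART information retrieval system at Cornell",
    "Gerard Salton Cornell University vector space model",
]
expanded = expand_query("Gerry Salton", snippets)
print(expanded)
```

The expanded query can then be reissued against the index, and its results used to rerank or replace the original result set.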
Spelling Correction
• Session Log Mining
• Multiple queries with Blending
• Behavioral feedback loop
Blend(Scored-AND(200, “britinay”, “spares”), Scored-AND(200, “britney”, “spears”))
Scored-AND(200, OR(“britinay”, “britney”), OR(“spares”, “spears”))
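The two forms above differ in where the correction is applied: the first runs the original and the corrected query separately and blends the two result lists, while the second folds the alternatives into a single backend query. A rough sketch of both follows. The `CORRECTIONS` table stands in for whatever session log mining would produce, and `rewrite_scored_and` and `blend` are hypothetical helpers; the output uses straight quotes rather than the slide's typographic ones.

```python
from typing import Dict, List, Tuple

# Hypothetical correction table, e.g. as might be mined from session logs.
CORRECTIONS: Dict[str, str] = {"britinay": "britney", "spares": "spears"}


def or_group(term: str, corrections: Dict[str, str]) -> str:
    """Render a term, widened to OR(term, correction) when one is known."""
    alt = corrections.get(term)
    if alt is None or alt == term:
        return f'"{term}"'
    return f'OR("{term}", "{alt}")'


def rewrite_scored_and(k: int, terms: List[str],
                       corrections: Dict[str, str]) -> str:
    """Build a single backend query string with per-term alternatives."""
    groups = ", ".join(or_group(t, corrections) for t in terms)
    return f"Scored-AND({k}, {groups})"


Ranked = List[Tuple[str, float]]


def blend(*ranked_lists: Ranked, k: int) -> Ranked:
    """Merge ranked lists, keeping each doc's best score (assumed rule)."""
    best: Dict[str, float] = {}
    for ranked in ranked_lists:
        for doc_id, score in ranked:
            best[doc_id] = max(score, best.get(doc_id, float("-inf")))
    return sorted(best.items(), key=lambda pair: (-pair[1], pair[0]))[:k]


backend_query = rewrite_scored_and(200, ["britinay", "spares"], CORRECTIONS)
print(backend_query)
```

Blending assumes the scores of the two result lists are comparable; in practice they would likely need calibration first.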
Web Search
Gerry Salton
• Speller
• Synonyms
Index
First Stage reRanking: 100K
(Scored-AND 200, “Gerry”, “Salton”)
Index (shown five times in the diagram): 100B
Local
News
Second Stage reRanking: 5K
Third Stage reRanking: 50
• Query Understanding
• Federation
• ReRanking and Blending
• Entity Detection
• Grouping
• Summarization
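The staged reranking on this slide (100K, then 5K, then 50 candidates) follows a cascade pattern: each stage scores the survivors of the previous one with a more expensive function and keeps fewer of them. A minimal sketch of that pattern is shown here with toy sizes; the scorers `term_overlap` and `overlap_density` and the `cascade` function are illustrative assumptions, not the rankers used in production.

```python
from typing import Callable, List, Sequence, Tuple

# A scorer maps (query, document text) to a relevance score.
Scorer = Callable[[str, str], float]


def term_overlap(query: str, doc: str) -> float:
    """Cheap first-stage score: number of query terms present in the doc."""
    return float(len(set(query.split()) & set(doc.split())))


def overlap_density(query: str, doc: str) -> float:
    """Toy later-stage score: overlap normalised by document length."""
    doc_terms = doc.split()
    return term_overlap(query, doc) / len(doc_terms) if doc_terms else 0.0


def cascade(query: str, candidates: Sequence[str],
            stages: Sequence[Tuple[Scorer, int]]) -> List[str]:
    """Apply each (scorer, keep) stage in order, truncating after each."""
    survivors = list(candidates)
    for scorer, keep in stages:
        # sorted() is stable, so ties keep their previous relative order.
        survivors = sorted(
            survivors, key=lambda d: scorer(query, d), reverse=True
        )[:keep]
    return survivors


docs = [
    "gerry salton cornell",
    "salton sea",
    "gerry adams",
    "tennis racquet",
    "gerry salton",
]
# Toy stage sizes; the slide's pipeline keeps 100K, 5K, then 50.
top = cascade("gerry salton", docs, [(term_overlap, 3), (overlap_density, 2)])
print(top)
```

The point of the design is cost control: the expensive scorer only ever sees the small set that the cheap scorer lets through.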
Post Result Triggering
• Alternative to Answer Blending
• Structured Data integration
• Off-page data joins
Grouping
• Reranked Results
• Compressed Presentation
• Coherently grouped
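One way to picture grouping is as a pass over the reranked list that buckets results by a label and orders the buckets by their best-ranked member. The sketch below assumes each result already carries such a label (for example, from entity detection); the labels, titles, and `group_results` helper are hypothetical.

```python
from typing import Dict, List, Tuple


def group_results(ranked: List[Tuple[str, str]]) -> List[Tuple[str, List[str]]]:
    """Group (title, label) pairs by label, preserving rank order.

    Groups appear in the order of their highest-ranked member, and
    titles keep their original relative order inside each group.
    """
    groups: Dict[str, List[str]] = {}
    for title, label in ranked:
        # dicts preserve insertion order, so the first time a label is
        # seen fixes that group's position in the output.
        groups.setdefault(label, []).append(title)
    return list(groups.items())


# Hypothetical reranked results with pre-assigned intent labels.
ranked_results = [
    ("Gerard Salton biography", "person"),
    ("Salton Sea visitor guide", "place"),
    ("Salton and the SMART system", "person"),
    ("Salton Sea water levels", "place"),
]
grouped = group_results(ranked_results)
for label, titles in grouped:
    print(label, titles)
```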
Summary
• Web Queries are not User Intent
  – Suffer from ambiguity and errors
• Intent can be mined from results
  – Query Correction
  – Disambiguation
  – Grouping and Organization