Clustering Personalized Web Search Results Xuehua Shen and Hong Cheng

Post on 05-Jan-2016

Page 1: Clustering Personalized Web Search Results

Clustering Personalized Web Search Results

Xuehua Shen and Hong Cheng

Page 2: Clustering Personalized Web Search Results

Introduction

• Search engine’s objectives
  – Rank the most relevant search results at the top
    • Effectiveness
    • PageRank / HITS
  – Group and present different categories of search results
    • Global view
    • Clustering

Page 3: Clustering Personalized Web Search Results

Clustering Personalized Search Results

• Study the clustering problem in the UCAIR framework

• Personalized search ranks or reranks the search results based on the user’s implicit feedback
• This raises interesting problems
  – Efficient and effective clustering and presentation
  – Dynamically updating the clustering results based on personalization

Page 4: Clustering Personalized Web Search Results

Goal

• Effective
  – Cluster user search results into meaningful groups
  – Present in a clear format
  – Provide users with the main themes of the search results
• Efficient
  – Implement efficient clustering algorithms
• Dynamic
  – Dynamically maintain the clustering results based on personalized ranking and reranking

Page 5: Clustering Personalized Web Search Results

Progress

• Implemented two clustering algorithms
  – K-Medoids
  – Hierarchical clustering
• Presentation
  – Replace Google ads with clustering results
  – Present ranked results together with clustering results
  – Two presentation strategies
    • Most centrally located document in each cluster
    • Most frequent terms in each cluster
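The second presentation strategy (most frequent terms in each cluster) can be sketched as below. This is a minimal illustration, not the authors' implementation: the function name `top_terms`, whitespace tokenization, and the sample snippets are all assumptions.

```python
from collections import Counter


def top_terms(cluster_docs, k=3):
    """Return the k most frequent terms across the documents of one cluster."""
    counts = Counter()
    for doc in cluster_docs:
        # Assumption: lowercase whitespace tokens stand in for real term extraction.
        counts.update(doc.lower().split())
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical snippets belonging to a single cluster.
cluster = ["jaguar car dealer", "jaguar car price", "used jaguar car"]
print(top_terms(cluster, k=2))  # ['car', 'jaguar']
```

A real system would likely remove stop words and the query terms themselves before counting.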

Page 6: Clustering Personalized Web Search Results

Partial Results

• K-Medoids
  – Select the most centrally located documents as cluster centers
  – Present the centroid documents as each cluster’s representative
  – Efficiency not so good
    • Other processing time: 490 + 100 + 1562 = 2152 ms
    • Cluster search results time: 2844 ms
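The K-Medoids idea of using the most centrally located document as each cluster's center can be sketched as follows. This is a hedged illustration under several assumptions: it uses the simple alternating assign/update variant rather than a PAM-style swap search, Jaccard distance over token sets as the document distance, and the first k documents as seeds. None of these choices is stated in the slides.

```python
def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two token sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def k_medoids(docs, k, max_iter=20):
    """Alternating k-medoids; returns (medoid indices, cluster label per document)."""
    sets = [set(d.lower().split()) for d in docs]
    n = len(sets)
    dist = [[jaccard_distance(a, b) for b in sets] for a in sets]
    medoids = list(range(k))  # assumption: the first k documents seed the clusters

    def assign(meds):
        # Each document joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][meds[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster became empty
                new_medoids.append(medoids[c])
                continue
            # The new medoid minimizes the total distance to its cluster members.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)


# Hypothetical result snippets for an ambiguous query.
docs = [
    "jaguar car dealer price",
    "apple fruit pie recipe",
    "jaguar car engine review",
    "apple fruit juice recipe",
    "jaguar car price review",
]
medoids, labels = k_medoids(docs, k=2)
print(medoids, labels)  # [4, 1] [0, 1, 0, 1, 0]
```

The returned medoid indices point at real documents, so they can be shown directly as each cluster's representative.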

Page 7: Clustering Personalized Web Search Results

Partial Results (II)

• Hierarchical clustering
  – Merge similar documents in a pair-wise manner
  – Use weighted average term vectors to represent cluster centers
  – Present centroid term vectors as virtual documents (output Top-K terms)
  – Efficiency better than K-Medoids
    • Other processing time: 200 + 110 + 831 = 1141 ms
    • Cluster search results time: 661 ms
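The pair-wise merging with weighted average term vectors can be sketched as below. This is an illustration under assumptions, not the authors' code: raw term counts serve as term vectors, cosine similarity picks the pair to merge, and the names `agglomerate` and `top_k_terms` are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse term vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def agglomerate(docs, k):
    """Merge the most similar pair of clusters until k clusters remain."""
    # Each cluster is (centroid term vector, member document indices).
    clusters = [(dict(Counter(d.lower().split())), [i]) for i, d in enumerate(docs)]
    while len(clusters) > k:
        # Quadratic scan for the most similar pair of centroids.
        a, b = max(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda p: cosine(clusters[p[0]][0], clusters[p[1]][0]),
        )
        (va, ma), (vb, mb) = clusters[a], clusters[b]
        total = len(ma) + len(mb)
        # Size-weighted average of the two centroid vectors.
        merged = {
            t: (va.get(t, 0.0) * len(ma) + vb.get(t, 0.0) * len(mb)) / total
            for t in set(va) | set(vb)
        }
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append((merged, sorted(ma + mb)))
    return clusters


def top_k_terms(centroid, k):
    """Present a centroid as a 'virtual document' made of its k heaviest terms."""
    return [t for t, _ in sorted(centroid.items(), key=lambda x: (-x[1], x[0]))[:k]]


docs = [
    "jaguar car dealer price",
    "apple fruit pie recipe",
    "jaguar car engine review",
    "apple fruit juice recipe",
    "jaguar car price review",
]
for centroid, members in agglomerate(docs, k=2):
    print(members, top_k_terms(centroid, 2))
```

Because each centroid is an averaged vector rather than an actual document, the Top-K terms are the natural way to show it to the user.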

Page 8: Clustering Personalized Web Search Results

Efficiency Analysis

• K-Medoids
  – $O(k(n-k)^2)$ for each iteration, where n is # of documents and k is # of clusters
  – Need multiple iterations for convergence
• Hierarchical clustering
  – $O(n^2)$ for each iteration
  – Need n-k iterations
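Multiplying the per-iteration costs above by the number of iterations gives the total cost of each method. In this sketch, $t$ is an assumed symbol for the number of K-Medoids iterations until convergence; it does not appear in the slides.

```latex
\begin{align*}
T_{\text{K-Medoids}}    &= t \cdot O\bigl(k(n-k)^2\bigr) = O\bigl(t\,k\,(n-k)^2\bigr) \\
T_{\text{hierarchical}} &= (n-k) \cdot O(n^2) = O\bigl(n^2 (n-k)\bigr)
\end{align*}
```

These bounds alone do not decide which method is faster for a given n, since $t$ and the constant factors also matter.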

Page 9: Clustering Personalized Web Search Results

Lessons Learned

• Clustering takes longer as more search results accumulate (when we click “Next”)
• Top-K frequent terms in each cluster sometimes do not make sense
  – Combine additional information besides term frequency
• Results are re-clustered every time the search results are reranked
  – Incremental update of clustering results is desired!
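One possible way to combine additional information with term frequency is to down-weight terms that occur in every cluster, in the spirit of TF-IDF. The sketch below is only an assumption about what such a combination could look like: the slides do not say which extra information the authors had in mind, and `distinctive_terms` is a hypothetical name.

```python
import math
from collections import Counter


def distinctive_terms(clusters, k=2):
    """Rank each cluster's terms by frequency times inverse cluster frequency."""
    counts = [Counter(w for doc in docs for w in doc.lower().split()) for docs in clusters]
    n = len(clusters)
    # Number of clusters in which each term appears at least once.
    cluster_freq = Counter(t for c in counts for t in c)
    labels = []
    for c in counts:
        # A term present in every cluster gets log(1) = 0 and drops to the bottom.
        scored = {t: f * math.log(n / cluster_freq[t]) for t, f in c.items()}
        ranked = sorted(scored.items(), key=lambda x: (-x[1], x[0]))
        labels.append([t for t, _ in ranked[:k]])
    return labels


# Hypothetical clusters for the query "jaguar".
clusters = [
    ["jaguar car price", "jaguar car dealer"],
    ["jaguar cat habitat", "jaguar cat zoo"],
]
print(distinctive_terms(clusters))  # [['car', 'dealer'], ['cat', 'habitat']]
```

Here the query term "jaguar" is frequent in both clusters, so plain frequency would rank it near the top while this weighting pushes it out of the labels.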

Page 10: Clustering Personalized Web Search Results

Remaining

• Implementation
  – KMeans
  – MMR
  – Frequent word sets
• Effective presentation study
  – Based on user feedback
  – Literature survey
• Dynamic maintenance of clustering based on search result ranking and reranking
  – Drill down in a particular cluster
  – Update overall clustering organization
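For the MMR item, assuming it refers to Maximal Marginal Relevance, a greedy selection can be sketched as below. How the authors intend to use MMR is not stated in the slides; the function name, the trade-off parameter `lam`, and the sample scores are assumptions made for illustration.

```python
def mmr_select(relevance, similarity, k, lam=0.7):
    """Greedily pick k items, trading relevance against redundancy."""
    selected = []
    candidates = list(range(len(relevance)))
    while candidates and len(selected) < k:

        def score(i):
            # Redundancy is the highest similarity to anything already picked.
            redundancy = max((similarity[i][j] for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical scores: documents 0 and 1 are near-duplicates, document 2 is different.
relevance = [0.9, 0.85, 0.5]
similarity = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
print(mmr_select(relevance, similarity, k=2, lam=0.5))  # [0, 2]
```

With `lam=0.5`, the near-duplicate document 1 is skipped in favor of the less relevant but more novel document 2.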

Page 11: Clustering Personalized Web Search Results

Feedback

• Which way to present clustering results is more meaningful?
  – Based on central documents
  – Based on term vectors
  – More options?
• Any other clustering algorithms to achieve effectiveness and efficiency?
• Any other presentation strategy besides “rank list + cluster center”?
