Post on 18-Jul-2015
Supporting Exploration and Serendipity in Information Retrieval
Nattiya Kanhabua
Department of Computer and Information Science, Norwegian University of Science and Technology
24 February 2012
Nattiya Kanhabua 2 Trial lecture
Motivation
• Typical search engines – Lookup-based paradigm – Known-item search
[Diagram: a query against a Document Index built from the World Wide Web returns a list of results]
Does this paradigm satisfy all types of information needs?
Beyond the lookup-based paradigm
Two tasks when searching for the unknown:
1. Exploratory search – Users perform information seeking
• E.g., collection browsing or visualization – Human-computer interaction
2. Serendipitous IR – Systems predict/suggest interesting information
• E.g., recommender systems – Asynchronous manner
Exploratory search
• Information-seeking task [Marchionini 2006, White 2006a] – Seeking the unknown, or an open-ended problem – Complex information needs – No prior knowledge of the contents
[Diagram: a query against the Document Index returns results whose relevance is uncertain]
Exploratory search activities
G. Marchionini. Exploratory search: From finding to understanding. Communications of the ACM, 49(4), pp. 41–46, 2006.
Features of exploratory search
• Query (re)formulation in real time
• Exploiting search context
• Facet-based and metadata result filtering
• Learning and understanding support
• Result visualization
Query (re)formulation
• Help users formulate information needs at an early stage [Manning 2008]
• Query suggestion – Supported by major search engines – Based on query-log analysis
• Query-by-example – Search using example documents
Leveraging search context
• Effective systems must adapt to contextual constraints [Ingwersen 2005] – Time, place, history of interaction, task at hand, etc.
• Types of context 1. Explicitly provided feedback
• E.g., selecting relevant documents 2. Implicitly obtained user information
• E.g., mining users' interaction behaviors [Dumais 2004, Kelly 2004]
Facet-based result filtering
• Facets are properties of a document [Tunkelang 2009] – Usually obtained from metadata
• Faceted search provides the ability to: – Explore results via properties – Expand or refine the search
• No metadata? – Categorization – Clustering
Result visualization
• Provide overviews of the collection and search results – To support understanding and analysis
• Applications – ManyEyes [Viégas 2007] – Stuff I've Seen [Dumais 2003] – TimeExplorer [Matthews 2010]
Support learning and understanding
• Provide facilities for deriving meaning from search results
• Examples – Wikify!: linking documents to encyclopedic knowledge [Mihalcea 2007] – Learning to link with Wikipedia [Milne 2008] – Generating links to background knowledge [He 2011]
Evaluation of exploratory search
• Evaluation metrics for exploratory search [White 2006b]
1. Engagement and enjoyment • The degree to which users are engaged and enjoying the experience
2. Information novelty • The amount of new information encountered
3. Task success
4. Task time • Time spent to reach a state of task completeness
5. Learning and cognition • The number of topics covered and the number of insights users acquire
Future direction
• Collaborative and social search – Support task division and knowledge sharing – Allow the team to move rapidly toward the task – Surface already-encountered information
Serendipitous IR
• Serendipity [Andel 1994] – The act of encountering relevant information unexpectedly
• Task: predict and suggest relevant information – E.g., recommender systems
Recommender systems
• Motivation [Adomavicius 2005, Jannach 2010] – Ease information overload – Business intelligence
• Increase the number of products sold • Sell products from the long tail • Improve users' experience
• Real-world applications – Books: Amazon.com – Movies: Netflix, IMDb – News: Yahoo, New York Times – Video & music: YouTube, Last.fm
Problem statements
• Given: – Set of items (e.g., products, movies, or news) – User information (e.g., ratings or user preferences)
• Goal: – Predict a relevance score for each item – Recommend the top-k items by score
[Diagram: the recommender system scores the item collection, e.g., I1: 0.8, I2: 0.6, I3: 0.5; a non-personalized recommendation uses only the items, while a personalized recommendation also uses user information]
Personalized recommendation
• Two main approaches – Content-based – Collaborative filtering
[Diagram: both approaches score the item collection (e.g., I1: 0.8, I2: 0.6, I3: 0.5) from user information; content-based recommendation additionally draws on product features (title, genre, actor, …), while collaborative filtering draws on community data]
Content-based recommendation
• Basic idea – Give me "more like this" – Exploit item descriptions (contents) and user preferences
• No rating data is needed
• Approach 1. Represent information as a bag of words 2. Compute the similarity between the preferences and an unseen item, e.g., with the Dice coefficient or cosine similarity [Manning 2008]
User profiles (liked items):

Title | Genre | Director | Writer | Stars
The Twilight Saga: Eclipse | Adventure, Drama, Fantasy | David Slade | Melissa Rosenberg, Stephenie Meyer | Kristen Stewart, Robert Pattinson
Harry Potter and the Deathly Hallows: Part 1 | Adventure, Drama, Fantasy | David Yates | Steve Kloves, J.K. Rowling | Daniel Radcliffe, Emma Watson

Unseen item:

Title | Genre | Director | Writer | Stars
The Lord of the Rings: The Return of the King | Action, Adventure, Drama | Peter Jackson | J.R.R. Tolkien, Fran Walsh | Elijah Wood, Viggo Mortensen
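The two tables above make a small worked example. The following sketch is my own illustration, not code from the lecture; the token normalization (e.g., writing "J.K." as "JK") is a simplification. It builds a bag-of-words profile from the two liked movies' metadata and scores the unseen item with cosine similarity:

```python
from collections import Counter
from math import sqrt

def bag_of_words(*fields):
    """Lowercase the metadata fields, split on commas/whitespace, count tokens."""
    tokens = []
    for field in fields:
        tokens += field.lower().replace(",", " ").split()
    return Counter(tokens)

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)     # Counter returns 0 for missing tokens
    norm = lambda c: sqrt(sum(v * v for v in c.values()))
    return dot / (norm(a) * norm(b))

# User profile: combined metadata of the two liked movies (from the table above)
profile = bag_of_words(
    "Adventure, Drama, Fantasy", "David Slade",
    "Melissa Rosenberg, Stephenie Meyer", "Kristen Stewart, Robert Pattinson",
    "Adventure, Drama, Fantasy", "David Yates",
    "Steve Kloves, JK Rowling", "Daniel Radcliffe, Emma Watson",
)

# Unseen item: The Lord of the Rings: The Return of the King
item = bag_of_words(
    "Action, Adventure, Drama", "Peter Jackson",
    "JRR Tolkien, Fran Walsh", "Elijah Wood, Viggo Mortensen",
)

# The shared genres (adventure, drama) give a small positive similarity
print(round(cosine(profile, item), 2))  # 0.19
```

Ranking every unseen item by this score yields the content-based recommendation list; no ratings from other users are needed.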
Collaborative filtering (CF)
• Basic idea [Balabanovic 1997] – Give me "popular items among my friends" – Users with similar tastes in the past tend to have similar tastes in the future
• Basic approach – Use a matrix of user-item ratings, explicit or implicit (implicit signals include clicks, page views, and time spent on a page) – Predict a rating for each unseen item
User-based nearest-neighbor CF
• Given: the active user and a matrix of user-item ratings
• Goal: predict a rating for an unseen item by
1. Finding a set of users (neighbors) with similar ratings
2. Estimating John's rating of Item5 from the neighbors' ratings
3. Repeating for all unseen items and recommending the top-N items

Item1 Item2 Item3 Item4 Item5
John 5 3 4 4 ?
User1 3 1 2 3 3
User2 4 3 4 3 5
User3 1 5 5 2 1
Find neighbors
• Measure user similarity, e.g., with the Pearson correlation:

sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2} \sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}}

– a, b: users
– r_{a,p}: rating of user a for item p; \bar{r}_a, \bar{r}_b: the users' average ratings
– P: set of items rated by both a and b
Item1 Item2 Item3 Item4 Item5
John 5 3 4 4 ?
User1 3 1 2 3 3 (sim = 0.85)
User2 4 3 4 3 5 (sim = 0.70)
User3 1 5 5 2 1 (sim = −0.79)
Estimate a rating
• Prediction function – Combine the neighbors' rating deviations from their averages – Weight each neighbor by its user similarity:

pred(a, p) = \bar{r}_a + \frac{\sum_{b \in N} sim(a, b)\,(r_{b,p} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)}

Item1 Item2 Item3 Item4 Item5
John 5 3 4 4 4.87 (predicted)
User1 3 1 2 3 3 (sim = 0.85)
User2 4 3 4 3 5 (sim = 0.70)
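The numbers on these slides can be reproduced in a few lines. The sketch below is my own code, not from the lecture, and the averaging conventions are inferred: Pearson is computed over the co-rated items, while the prediction uses each neighbor's mean over all of that neighbor's ratings. With those conventions the similarities come out to 0.85, 0.71 (which the slide shows as 0.70), and −0.79, and the prediction to 4.87:

```python
from math import sqrt

# Ratings matrix from the slide; John's rating of Item5 is the one to predict
ratings = {
    "John":  {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}

def pearson(a, b):
    """Pearson correlation over P, the items rated by both a and b."""
    P = ratings[a].keys() & ratings[b].keys()
    ma = sum(ratings[a][p] for p in P) / len(P)
    mb = sum(ratings[b][p] for p in P) / len(P)
    num = sum((ratings[a][p] - ma) * (ratings[b][p] - mb) for p in P)
    den = sqrt(sum((ratings[a][p] - ma) ** 2 for p in P)) * \
          sqrt(sum((ratings[b][p] - mb) ** 2 for p in P))
    return num / den

def predict(a, item):
    """Similarity-weighted deviation from each positive neighbor's mean rating."""
    ma = sum(ratings[a].values()) / len(ratings[a])
    nbrs = [(b, pearson(a, b)) for b in ratings if b != a and item in ratings[b]]
    nbrs = [(b, s) for b, s in nbrs if s > 0]    # drop negatively correlated users
    dev = sum(s * (ratings[b][item] - sum(ratings[b].values()) / len(ratings[b]))
              for b, s in nbrs)
    return ma + dev / sum(s for _, s in nbrs)

print(round(pearson("John", "User1"), 2))   # 0.85
print(round(pearson("John", "User2"), 2))   # 0.71 (the slide rounds to 0.70)
print(round(pearson("John", "User3"), 2))   # -0.79
print(round(predict("John", "Item5"), 2))   # 4.87
```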
Item-based nearest-neighbor CF
• Basic idea – Use the similarity between items (instead of users) – Item-item similarities can be computed offline
• Example – Look for items that are similar to Item5 (its neighbors) – Predict the rating of Item5 from John's ratings of those neighbors

Item1 Item2 Item3 Item4 Item5
John 5 3 4 4 ?
User1 3 1 2 3 3
User2 4 3 4 3 5
User3 1 5 5 2 1
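The item-based variant can be sketched on the same matrix. This is my own illustration; plain cosine between item rating columns is one common similarity choice (adjusted cosine, which subtracts user means, is another), and using the two nearest items is an arbitrary cutoff:

```python
from math import sqrt

# Rating columns per item over the three users who rated Item5 (User1-User3);
# John's row is kept separately since his Item5 rating is the unknown
columns = {
    "Item1": [3, 4, 1], "Item2": [1, 3, 5], "Item3": [2, 4, 5],
    "Item4": [3, 3, 2], "Item5": [3, 5, 1],
}
john = {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4}

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y)))

# Item-item similarities to Item5; these depend only on the ratings matrix,
# so they can be computed offline
sims = {i: cosine(columns[i], columns["Item5"]) for i in john}

# Predict John's rating of Item5 from his ratings of the two nearest items
top2 = sorted(sims, key=sims.get, reverse=True)[:2]
pred = sum(sims[i] * john[i] for i in top2) / sum(sims[i] for i in top2)
print(top2, round(pred, 2))   # ['Item1', 'Item4'] 4.51
```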
Problems of CF
• Sparse data – Users rate only a few items
• Cold start – No ratings for new users or new items
• Scaling problem – Millions of users and thousands of items – m = #users, n = #items – User-based CF
• Space complexity O(m²) when similarities are pre-computed • Time complexity for computing Pearson: O(m²n)
– Item-based CF • Space complexity is reduced to O(n²)
Possible solutions
• How to solve the sparse-data problem? – Ask users to rate a seed set of items – Use other methods in the beginning
• E.g., content-based or non-personalized recommendation
• How to solve the scaling problem? – Apply dimensionality reduction
• E.g., matrix factorization
Matrix factorization
• Basic idea [Koren 2008] – Determine latent factors from the ratings
• E.g., types of movies (drama or action) – Recommend items based on the determined factors
• Approach – Apply dimensionality reduction
• E.g., singular value decomposition (SVD) [Deerwester 1990]
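A minimal numpy sketch of the SVD route on the running ratings matrix. This is my own illustration: imputing the missing cell with John's mean is one simple convention among several, and keeping k = 2 latent factors is an arbitrary choice:

```python
import numpy as np

# Running example matrix; John's unknown Item5 is imputed with his mean (4.0)
R = np.array([
    [5, 3, 4, 4, 4.0],   # John (last entry imputed)
    [3, 1, 2, 3, 3],     # User1
    [4, 3, 4, 3, 5],     # User2
    [1, 5, 5, 2, 1],     # User3
], dtype=float)

# Full SVD, then keep only the k strongest latent factors
U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# By Eckart-Young, the truncation error (Frobenius norm) equals the
# norm of the discarded singular values
err = np.linalg.norm(R - R_hat)
print(round(err, 4), round(float(R_hat[0, 4]), 2))
```

R_hat[0, 4] is the smoothed estimate of John's Item5 rating. In large systems the factors are instead fitted by regularized optimization over the observed entries only, rather than by imputing and running a full SVD.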
Hybrid recommendation
• Basic idea – Different approaches have their own shortcomings – Hybrid: combine different approaches
• Approach 1. Pipelined hybridization
• Use content-based prediction to fill up matrix entries, then apply CF [Melville 2002]
2. Parallel hybridization • Feature combination: ratings, user preferences, and constraints
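A parallel hybrid can be sketched as a weighted score combination (the pipelined variant would instead use the content-based model to fill missing matrix entries before running CF). Every number in this snippet is invented for illustration:

```python
# Parallel hybridization: combine per-item scores from two recommenders with
# a weighted sum. The scores and the 0.6/0.4 weights below are hypothetical;
# real systems tune such weights on held-out data.
content_scores = {"I1": 0.8, "I2": 0.6, "I3": 0.5}   # from a content-based model
cf_scores      = {"I1": 0.5, "I2": 0.9, "I3": 0.4}   # from collaborative filtering

def hybrid(w_content=0.6, w_cf=0.4):
    return {i: w_content * content_scores[i] + w_cf * cf_scores[i]
            for i in content_scores}

ranked = sorted(hybrid().items(), key=lambda kv: kv[1], reverse=True)
print(ranked)   # I2 overtakes I1 once the CF signal is mixed in
```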
Future directions
• Temporal dynamics of recommender systems – Items have short lifetimes, i.e., a dynamic set of items – User behavior depends on moods or time periods – Attention to breaking news stories decays over time – Challenge: how to capture/model temporal dynamics?
• TimeSVD++ [Koren 2009] • Tensor factorization [Xiong 2010]
• Temporal diversity [Lathia 2010]
Future directions (cont.)
• Group recommendations [McCarthy 2006] – Recommendations for a group of users or friends – Challenge: how to model group preferences?
• Context-aware recommendations [Adomavicius 2011] – Context, e.g., demographics, interests, time, place, mood, weather, and so on – Challenge: how to combine different kinds of context?
Conclusions
1. Exploratory search – Users perform information seeking
• E.g., collection browsing or visualization – Human-computer interaction
2. Serendipitous IR – Systems predict/suggest interesting information
• E.g., recommender systems – Asynchronous manner
References
• [Dumais 2003] S. T. Dumais, E. Cutrell, J. J. Cadiz, G. Jancke, R. Sarin and D. C. Robbins. Stuff I've Seen: A system for personal information retrieval and re-use. In Proceedings of SIGIR, pp. 72-79, 2003.
• [Dumais 2004] S. T. Dumais, E. Cutrell, R. Sarin and E. Horvitz. Implicit queries (IQ) for contextualized search. In Proceedings of SIGIR, p. 594, 2004.
• [Ingwersen 2005] P. Ingwersen and K. Järvelin. The Turn: Integration of Information Seeking and Retrieval in Context. The Information Retrieval Series, Springer-Verlag, New York, 2005.
• [He 2011] J. He, M. de Rijke, M. Sevenster, R. C. van Ommering and Y. Qian. Generating links to background knowledge: a case study using narrative radiology reports. In Proceedings of CIKM, pp. 1867-1876, 2011.
• [Kelly 2004] D. Kelly and N. J. Belkin. Display time as implicit feedback: understanding task effects. In Proceedings of SIGIR, pp. 377-384, 2004.
• [Manning 2008] C. D. Manning, P. Raghavan and H. Schütze. Introduction to Information Retrieval. Cambridge University Press, New York, NY, USA, 2008.
• [Matthews 2010] M. Matthews, P. Tolchinsky, P. Mika, R. Blanco and H. Zaragoza. Searching through time in the New York Times. In HCIR Workshop, 2010.
• [Marchionini 2006] G. Marchionini. Exploratory search: From finding to understanding. Communications of the ACM, 49(4), pp. 41-46, 2006.
• [Mihalcea 2007] R. Mihalcea and A. Csomai. Wikify!: linking documents to encyclopedic knowledge. In Proceedings of CIKM, pp. 233-242, 2007.
• [Milne 2008] D. Milne and I. H. Witten. Learning to link with Wikipedia. In Proceedings of CIKM, pp. 509-518, 2008.
• [Tunkelang 2009] D. Tunkelang. Faceted Search. Morgan & Claypool Publishers, 2009.
• [Viégas 2007] F. B. Viégas, M. Wattenberg, F. van Ham, J. Kriss and M. M. McKeon. Many Eyes: A site for visualization at internet scale. IEEE Transactions on Visualization and Computer Graphics, 13(6), pp. 1121-1128, 2007.
• [White 2006a] R. W. White, B. Kules, S. M. Drucker and m. c. schraefel. Supporting exploratory search: Introduction to special section. Communications of the ACM, 49(4), pp. 36-39, 2006.
• [White 2006b] R. W. White, G. Muresan and G. Marchionini. Report on ACM SIGIR 2006 workshop on evaluating exploratory search systems. SIGIR Forum, 40(2), pp. 52-60, 2006.
• [White 2009] R. W. White and R. A. Roth. Exploratory Search: Beyond the Query-Response Paradigm. Morgan & Claypool Publishers, 2009.
References (cont.)
• [Agarwal 2010] D. Agarwal and B. C. Chen. Recommender Systems Tutorial. In ACM SIGKDD, 2010.
• [Adomavicius 2005] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Trans. Knowl. Data Eng., 17(6), pp. 734-749, 2005.
• [Adomavicius 2011] G. Adomavicius and A. Tuzhilin. Context-Aware Recommender Systems. In Recommender Systems Handbook, pp. 217-253, 2011.
• [Andel 1994] P. V. Andel. Anatomy of the Unsought Finding. Serendipity: Origin, history, domains, traditions, appearances, patterns and programmability. The British Journal for the Philosophy of Science, 45(2), pp. 631-648, 1994.
• [Balabanovic 1997] M. Balabanovic and Y. Shoham. Content-based, collaborative recommendation. Communications of the ACM, 40(3), pp. 66-72, 1997.
• [Deerwester 1990] S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas and R. A. Harshman. Indexing by Latent Semantic Analysis. JASIS, 41(6), pp. 391-407, 1990.
• [Jannach 2010] D. Jannach, M. Zanker, A. Felfernig and G. Friedrich. Recommender Systems: An Introduction. Cambridge University Press, 2010.
• [Koren 2008] Y. Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In Proceedings of KDD, pp. 426-434, 2008.
• [Koren 2009] Y. Koren. Collaborative filtering with temporal dynamics. In Proceedings of KDD, pp. 447-456, 2009.
• [Lathia 2010] N. Lathia, S. Hailes, L. Capra and X. Amatriain. Temporal Diversity in Recommender Systems. In Proceedings of SIGIR, pp. 210-217, 2010.
• [McCarthy 2006] K. McCarthy, M. Salamó, L. Coyle, L. McGinty, B. Smyth and P. Nixon. Group recommender systems: a critiquing based approach. In Proceedings of IUI, pp. 267-269, 2006.
• [Melville 2002] P. Melville, R. J. Mooney and R. Nagarajan. Content-Boosted Collaborative Filtering for Improved Recommendations. In Proceedings of AAAI, pp. 187-192, 2002.
• [Xiong 2010] L. Xiong, X. Chen, T. K. Huang, J. G. Schneider and J. G. Carbonell. Temporal Collaborative Filtering with Bayesian Probabilistic Tensor Factorization. In Proceedings of SDM, pp. 211-222, 2010.