Humane Information Seeking: Going Beyond the IR Way
![Page 1: HUMANE INFORMATION SEEKING: Going beyond the IR Way](https://reader034.vdocument.in/reader034/viewer/2022051317/56816100550346895dd040c9/html5/thumbnails/1.jpg)
1
HUMANE INFORMATION SEEKING: GOING BEYOND THE IR WAY
JIN YOUNG KIM @ SNU DCC
![Page 2: HUMANE INFORMATION SEEKING: Going beyond the IR Way](https://reader034.vdocument.in/reader034/viewer/2022051317/56816100550346895dd040c9/html5/thumbnails/2.jpg)
2
Jin Young Kim
• Graduate of SNU EE / Business
• 5th-year Ph.D. student in UMass Computer Science
• Starting as an Applied Researcher at Microsoft Bing
![Page 3: HUMANE INFORMATION SEEKING: Going beyond the IR Way](https://reader034.vdocument.in/reader034/viewer/2022051317/56816100550346895dd040c9/html5/thumbnails/3.jpg)
3
Today’s Agenda
• A brief introduction to IR as a research area
• An example of how we design a retrieval model
• Other research projects and recent trends in IR
![Page 4: HUMANE INFORMATION SEEKING: Going beyond the IR Way](https://reader034.vdocument.in/reader034/viewer/2022051317/56816100550346895dd040c9/html5/thumbnails/4.jpg)
4
BACKGROUND
An Information Retrieval Primer
![Page 5: HUMANE INFORMATION SEEKING: Going beyond the IR Way](https://reader034.vdocument.in/reader034/viewer/2022051317/56816100550346895dd040c9/html5/thumbnails/5.jpg)
5
Information Retrieval?
• The study of how an automated system can enable its users to access, interact with, and make sense of information.
[Diagram: User, Query, and Document, connected by the actions Issue, Surface, and Visit]
![Page 6: HUMANE INFORMATION SEEKING: Going beyond the IR Way](https://reader034.vdocument.in/reader034/viewer/2022051317/56816100550346895dd040c9/html5/thumbnails/6.jpg)
6
IR Research in Context
• Situated between human interface and system / analytics research
  • Aims at satisfying user’s information needs
  • Based on large-scale system infrastructure & analytics
• Need for convergence research!
[Diagram: Information Retrieval in relation to Large-scale System Infra., Large-scale (Text) Analytics, and End-user Interface (UX / HCI / InfoViz)]
![Page 7: HUMANE INFORMATION SEEKING: Going beyond the IR Way](https://reader034.vdocument.in/reader034/viewer/2022051317/56816100550346895dd040c9/html5/thumbnails/7.jpg)
7
Major Problems in IR
• Matching
  • (Keyword) Search: query – document
  • Personalized Search: (user+query) – document
  • Contextual Advertising: (user+context) – advertisement
• Quality
  • Authority / Spam / Freshness
  • Various ways to capture them
• Relevance Scoring
  • Combination of matching and quality features
  • Evaluation is critical for optimal performance
![Page 8: HUMANE INFORMATION SEEKING: Going beyond the IR Way](https://reader034.vdocument.in/reader034/viewer/2022051317/56816100550346895dd040c9/html5/thumbnails/8.jpg)
8
HUMANE INFORMATION RETRIEVAL
Going Beyond the IR Way
![Page 9: HUMANE INFORMATION SEEKING: Going beyond the IR Way](https://reader034.vdocument.in/reader034/viewer/2022051317/56816100550346895dd040c9/html5/thumbnails/9.jpg)
9
You need the freedom of expression.
You need someone who understands.
Information seeking requires communication.
![Page 10: HUMANE INFORMATION SEEKING: Going beyond the IR Way](https://reader034.vdocument.in/reader034/viewer/2022051317/56816100550346895dd040c9/html5/thumbnails/10.jpg)
10
Information Seeking circa 2012
Search engines accept keywords only.
Search engines don’t understand you.
![Page 11: HUMANE INFORMATION SEEKING: Going beyond the IR Way](https://reader034.vdocument.in/reader034/viewer/2022051317/56816100550346895dd040c9/html5/thumbnails/11.jpg)
11
Toward Humane Information Seeking
• Rich User Interactions: Search, Browsing, Filtering
• Rich User Modeling: Profile, Context, Behavior
![Page 12: HUMANE INFORMATION SEEKING: Going beyond the IR Way](https://reader034.vdocument.in/reader034/viewer/2022051317/56816100550346895dd040c9/html5/thumbnails/12.jpg)
12
The IR Way / The HCIR Way: from Query to Session
[Diagram: USER and SYSTEM exchange a sequence of Action / Response pairs. Labels under Rich User Interaction: Filtering / Browsing, Relevance Feedback, …; Filtering Conditions, Related Items, …. Labels under Rich User Modeling: Interaction History, User Model (Profile, Context, Behavior).]
HCIR = HCI + IR
![Page 13: HUMANE INFORMATION SEEKING: Going beyond the IR Way](https://reader034.vdocument.in/reader034/viewer/2022051317/56816100550346895dd040c9/html5/thumbnails/13.jpg)
13
The Rest of the Talk…
Personal Search
• Improving search and browsing for known-item finding
• Evaluating interactions combining search and browsing
Web Search
• User modeling based on reading level and topic
• Providing non-intrusive recommendations for browsing
Book Search
• Analyzing interactions combining search and filtering
![Page 14: HUMANE INFORMATION SEEKING: Going beyond the IR Way](https://reader034.vdocument.in/reader034/viewer/2022051317/56816100550346895dd040c9/html5/thumbnails/14.jpg)
14
PERSONAL SEARCH
Retrieval and Evaluation Techniques for Personal Information [Thesis]
![Page 15: HUMANE INFORMATION SEEKING: Going beyond the IR Way](https://reader034.vdocument.in/reader034/viewer/2022051317/56816100550346895dd040c9/html5/thumbnails/15.jpg)
15
Example: Desktop Search
Ranking using Multiple Document Types for Desktop Search [SIGIR10]
Example: Search over Social Media
Evaluating Search in Personal Social Media Collections [WSDM12]
![Page 16: HUMANE INFORMATION SEEKING: Going beyond the IR Way](https://reader034.vdocument.in/reader034/viewer/2022051317/56816100550346895dd040c9/html5/thumbnails/16.jpg)
Structured Document Retrieval: Background
• Field Operator / Advanced Search Interface
• User’s search terms are found in multiple fields
16
Understanding Re-finding Behavior in Naturalistic Email Interaction Logs. Elsweiler, D., Harvey, M., Hacker, M. [SIGIR'11]
![Page 17: HUMANE INFORMATION SEEKING: Going beyond the IR Way](https://reader034.vdocument.in/reader034/viewer/2022051317/56816100550346895dd040c9/html5/thumbnails/17.jpg)
17
Structured Document Retrieval: Models
• Document-based Retrieval Model
  • Score each document as a whole
• Field-based Retrieval Model
  • Combine evidence from each field
[Diagram: two panels, Document-based Scoring and Field-based Scoring, each matching query terms q1 q2 ... qm against a document with fields f1, f2, ..., fn; the field-based panel attaches field weights w1, w2, ..., wn]
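The contrast between the two models can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the implementation behind the slides: the function names, the uniform-background smoothing, the `vocab_size` value, and the example field weights are all hypothetical.

```python
from collections import Counter

def term_prob(term, tokens, vocab_size, lam=0.1):
    """P(term | text), mixed with a uniform background so it is never zero.
    The smoothing scheme is an assumption chosen for brevity."""
    ml = Counter(tokens)[term] / len(tokens) if tokens else 0.0
    return (1 - lam) * ml + lam / vocab_size

def document_score(query, doc, vocab_size):
    """Document-based scoring: pool every field into one bag of words."""
    tokens = [t for field_tokens in doc.values() for t in field_tokens]
    score = 1.0
    for q in query:
        score *= term_prob(q, tokens, vocab_size)
    return score

def field_score(query, doc, weights, vocab_size):
    """Field-based scoring: for each query term, sum the weighted evidence
    from each field, then multiply across query terms."""
    score = 1.0
    for q in query:
        score *= sum(w * term_prob(q, doc.get(f, []), vocab_size)
                     for f, w in weights.items())
    return score

# Hypothetical email with three fields.
email = {
    "to": ["james"],
    "subject": ["registration", "deadline"],
    "content": ["please", "register", "before", "friday", "thanks"],
}
fixed_weights = {"to": 0.6, "subject": 0.3, "content": 0.1}  # illustrative values

doc_based = document_score(["james"], email, vocab_size=1000)
field_based = field_score(["james"], email, fixed_weights, vocab_size=1000)
```

With these made-up weights, a match in the short `to` field counts for more under field-based scoring than it does once the fields are pooled.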
![Page 18: HUMANE INFORMATION SEEKING: Going beyond the IR Way](https://reader034.vdocument.in/reader034/viewer/2022051317/56816100550346895dd040c9/html5/thumbnails/18.jpg)
18
Improved Matching for Email Search / Structured Documents [CIKM09, ECIR09,12]
• Field Relevance
  • Different fields are important for different query terms
  • ‘james’ is relevant when it occurs in <to>
  • ‘registration’ is relevant when it occurs in <subject>
![Page 19: HUMANE INFORMATION SEEKING: Going beyond the IR Way](https://reader034.vdocument.in/reader034/viewer/2022051317/56816100550346895dd040c9/html5/thumbnails/19.jpg)
19
Estimating the Field Relevance
• If User Provides Feedback
  • Relevant document provides sufficient information
• If No Feedback is Available
  • Combine field-level term statistics from multiple sources
[Diagram: field-level term distributions over content, title, and from/to; Collection + Top-k Docs ≅ Relevant Docs]
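As a rough sketch of the no-feedback case, the snippet below mixes field-level term statistics from the whole collection with those from the top-k retrieved documents and normalizes over fields. The linear interpolation, the `alpha` parameter, and the function names are assumptions made for illustration; the cited papers may combine the sources differently.

```python
from collections import Counter

def field_term_probs(term, docs, fields):
    """P(term | field), estimated by counting the term in each field over docs."""
    probs = {}
    for f in fields:
        tokens = [t for d in docs for t in d.get(f, [])]
        probs[f] = Counter(tokens)[term] / len(tokens) if tokens else 0.0
    return probs

def estimate_field_relevance(term, collection, top_k, fields, alpha=0.5):
    """Approximate P(field | term) from two sources (hypothetical mixing rule)."""
    coll = field_term_probs(term, collection, fields)
    top = field_term_probs(term, top_k, fields)
    mixed = {f: alpha * coll[f] + (1 - alpha) * top[f] for f in fields}
    total = sum(mixed.values())
    if total == 0:
        # Term never seen: fall back to a uniform distribution over fields.
        return {f: 1 / len(fields) for f in fields}
    return {f: v / total for f, v in mixed.items()}

# Toy collection of two emails; the first one stands in for the top-k results.
collection = [
    {"to": ["james"], "subject": ["registration"], "content": ["hello", "world"]},
    {"to": ["anna"], "subject": ["lunch"], "content": ["james", "says", "hi", "there"]},
]
fields = ["to", "subject", "content"]
relevance = estimate_field_relevance("james", collection, collection[:1], fields)
```

On this toy data the estimate puts most of the mass for 'james' on the `to` field, which matches the intuition in the field relevance examples.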
![Page 20: HUMANE INFORMATION SEEKING: Going beyond the IR Way](https://reader034.vdocument.in/reader034/viewer/2022051317/56816100550346895dd040c9/html5/thumbnails/20.jpg)
20
Retrieval Using the Field Relevance
• Comparison with Previous Work
• Ranking in the Field Relevance Model
[Diagram: two panels over query terms q1 q2 ... qm and fields f1, f2, ..., fn. One panel uses the same field weights w1, w2, ..., wn for every query term; the other uses per-term field weights P(F1|q1), P(F2|q1), ..., P(Fn|qm). Labels: Per-term Field Weight, Per-term Field Score, sum, multiply.]
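One plausible reading of the "sum" and "multiply" labels is a sum over fields followed by a product over query terms. The formulas below are a sketch of that reading, not a transcription from the cited papers. The symbol $P(q_i \mid f_j, d)$ for the per-term field score and the name $\mathrm{score}(q, d)$ are notation assumed here.

```latex
\begin{align*}
\text{Fixed field weights:} \quad
  & \mathrm{score}(q, d) = \prod_{i=1}^{m} \sum_{j=1}^{n} w_j \, P(q_i \mid f_j, d) \\
\text{Per-term field weights:} \quad
  & \mathrm{score}(q, d) = \prod_{i=1}^{m} \sum_{j=1}^{n} P(F_j \mid q_i) \, P(q_i \mid f_j, d)
\end{align*}
```

The only difference between the two lines is the weight: $w_j$ is shared by all query terms, while $P(F_j \mid q_i)$ changes with each term $q_i$.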
![Page 21: HUMANE INFORMATION SEEKING: Going beyond the IR Way](https://reader034.vdocument.in/reader034/viewer/2022051317/56816100550346895dd040c9/html5/thumbnails/21.jpg)
21
Evaluating the Field Relevance Model
• Retrieval Effectiveness (Metric: Mean Reciprocal Rank)

| Collection | DQL | BM25F | MFLM | FRM-C | FRM-T | FRM-R |
|---|---|---|---|---|---|---|
| TREC | 54.2% | 59.7% | 60.1% | 62.4% | 66.8% | 79.4% |
| IMDB | 40.8% | 52.4% | 61.2% | 63.7% | 65.7% | 70.4% |
| Monster | 42.9% | 27.9% | 46.0% | 54.2% | 55.8% | 71.6% |

[Bar chart: Mean Reciprocal Rank by method (DQL, BM25F, MFLM, FRM-C, FRM-T, FRM-R) for TREC, IMDB, and Monster; methods are grouped under Fixed Field Weights and Per-term Field Weights]
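For reference, Mean Reciprocal Rank in a known-item setting can be computed as in the sketch below. The function name and the toy rankings are hypothetical and are not the evaluation code behind the table.

```python
def mean_reciprocal_rank(rankings, targets):
    """Average of 1/rank of the target document per query (0 if it is missing)."""
    total = 0.0
    for ranking, target in zip(rankings, targets):
        if target in ranking:
            total += 1.0 / (ranking.index(target) + 1)
    return total / len(rankings)

# Three toy queries: target at rank 1, at rank 2, and not retrieved.
rankings = [["a", "b", "c"], ["x", "y", "z"], ["p", "q"]]
targets = ["a", "y", "missing"]
mrr = mean_reciprocal_rank(rankings, targets)  # (1 + 1/2 + 0) / 3 = 0.5
```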
![Page 22: HUMANE INFORMATION SEEKING: Going beyond the IR Way](https://reader034.vdocument.in/reader034/viewer/2022051317/56816100550346895dd040c9/html5/thumbnails/22.jpg)
Evaluation Challenges for Personal Search
• Evaluation of Personal Search
  • Each work is based on its own user study
  • No comparative evaluation has been performed yet
• Solution: Simulated Collections
  • Crawl CS department webpages, docs and calendars
  • Recruit department people for user study
• Collecting User Logs
  • DocTrack: a human-computation search game
  • Probabilistic User Model: a method for user simulation
22
[CIKM09,SIGIR10,CIKM11]
![Page 23: HUMANE INFORMATION SEEKING: Going beyond the IR Way](https://reader034.vdocument.in/reader034/viewer/2022051317/56816100550346895dd040c9/html5/thumbnails/23.jpg)
23
DocTrack Game
![Page 24: HUMANE INFORMATION SEEKING: Going beyond the IR Way](https://reader034.vdocument.in/reader034/viewer/2022051317/56816100550346895dd040c9/html5/thumbnails/24.jpg)
24
Summary so far…
• Query Modeling for Structured Documents
  • Using the estimated field relevance improves the retrieval
  • User’s feedback can help personalize the field relevance
• Evaluation Challenges in Personal Search
  • Simulation of the search task using game-like structures
  • Related work: ‘Find It If You Can’ [SIGIR11]
![Page 25: HUMANE INFORMATION SEEKING: Going beyond the IR Way](https://reader034.vdocument.in/reader034/viewer/2022051317/56816100550346895dd040c9/html5/thumbnails/25.jpg)
25
WEB SEARCH
Characterizing Web Content, User Interests, and Search Behavior by Reading Level and Topic
[WSDM12]
![Page 26: HUMANE INFORMATION SEEKING: Going beyond the IR Way](https://reader034.vdocument.in/reader034/viewer/2022051317/56816100550346895dd040c9/html5/thumbnails/26.jpg)
Reading level distribution varies across major topical categories
![Page 27: HUMANE INFORMATION SEEKING: Going beyond the IR Way](https://reader034.vdocument.in/reader034/viewer/2022051317/56816100550346895dd040c9/html5/thumbnails/27.jpg)
User Modeling by Reading Level and Topic
• Reading Level and Topic
  • Reading Level: proficiency (comprehensibility)
  • Topic: topical areas of interests
• Profile Construction
  [Diagram: per-document distributions P(R|d) and P(T|d) are combined into the user profile P(R,T|u)]
• Profile Applications
  • Improving personalized search ranking
  • Enabling expert content recommendation
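One simple way to picture profile construction is to average, over the documents a user visited, the product of each document's reading-level and topic distributions. The sketch below assumes exactly that: the within-document independence of R and T, the uniform averaging, and the name `build_profile` are illustrative assumptions, not the method from the paper.

```python
from collections import defaultdict

def build_profile(docs):
    """Estimate P(R, T | u) from (P(R|d), P(T|d)) pairs for visited documents."""
    profile = defaultdict(float)
    for p_r, p_t in docs:
        for r, pr in p_r.items():
            for t, pt in p_t.items():
                # Assumes R and T are independent within a single document.
                profile[(r, t)] += pr * pt / len(docs)
    return dict(profile)

# Two hypothetical visited pages with made-up distributions.
visited = [
    ({"low": 0.8, "high": 0.2}, {"sports": 1.0}),
    ({"low": 0.4, "high": 0.6}, {"sports": 0.5, "science": 0.5}),
]
user_profile = build_profile(visited)
```

The result is a joint distribution over (reading level, topic) pairs that sums to one whenever each input distribution does.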
![Page 28: HUMANE INFORMATION SEEKING: Going beyond the IR Way](https://reader034.vdocument.in/reader034/viewer/2022051317/56816100550346895dd040c9/html5/thumbnails/28.jpg)
Profile matching can predict user’s preference over search results
• Metric
  • % of user’s preferences predicted by profile matching
  • Profile matching measured in KL-Divergence of RT profiles
• Results
  • By the degree of focus in user profile
  • By the distance metric between user and website

| User Group | #Clicks | KL_R(u,s) | KL_T(u,s) | KL_RT(u,s) |
|---|---|---|---|---|
| ↑ Focused | 5,960 | 59.23% | 60.79% | 65.27% |
| | 147,195 | 52.25% | 54.20% | 54.41% |
| ↓ Diverse | 197,733 | 52.75% | 53.36% | 53.63% |
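Profile matching by KL-divergence can be sketched as follows: given a user profile and the profiles of two candidate sites, predict that the user prefers the site whose profile is closer. The direction KL(user || site), the `eps` floor for unseen keys, and the function names are assumptions made for this illustration.

```python
import math

def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) over dictionary-valued distributions.
    Keys missing from q are floored at eps (a crude stand-in for smoothing)."""
    total = 0.0
    for key, pk in p.items():
        if pk > 0:
            total += pk * math.log(pk / max(q.get(key, 0.0), eps))
    return total

def predict_preference(user, site_a, site_b):
    """Return 'a' if site_a's profile is at least as close to the user's as site_b's."""
    return "a" if kl_divergence(user, site_a) <= kl_divergence(user, site_b) else "b"

# Hypothetical reading-level profiles.
user = {"low": 0.9, "high": 0.1}
site_a = {"low": 0.8, "high": 0.2}
site_b = {"low": 0.1, "high": 0.9}
choice = predict_preference(user, site_a, site_b)
```

The reported metric would then be the fraction of observed click preferences for which such a prediction agrees with the user's actual choice.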
![Page 29: HUMANE INFORMATION SEEKING: Going beyond the IR Way](https://reader034.vdocument.in/reader034/viewer/2022051317/56816100550346895dd040c9/html5/thumbnails/29.jpg)
Comparing Expert vs. Non-expert URLs
• Expert vs. Non-expert URLs taken from [White’09]
[Figure annotations: Higher Reading Level; Lower Topic Diversity]
![Page 30: HUMANE INFORMATION SEEKING: Going beyond the IR Way](https://reader034.vdocument.in/reader034/viewer/2022051317/56816100550346895dd040c9/html5/thumbnails/30.jpg)
30
Enabling Browsing for Web Search
• SurfCanyon®
• Recommend results based on clicks
Initial results indicate that recommendations are useful for the shopping domain.
[Work-in-progress]
![Page 31: HUMANE INFORMATION SEEKING: Going beyond the IR Way](https://reader034.vdocument.in/reader034/viewer/2022051317/56816100550346895dd040c9/html5/thumbnails/31.jpg)
31
BOOK SEARCH
Understanding Book Search Behavior on the Web
[Submitted to SIGIR12]
![Page 32: HUMANE INFORMATION SEEKING: Going beyond the IR Way](https://reader034.vdocument.in/reader034/viewer/2022051317/56816100550346895dd040c9/html5/thumbnails/32.jpg)
32
Understanding Book Search on the Web
• OpenLibrary
  • User-contributed online digital library
  • DataSet: 8M records from web server log
![Page 33: HUMANE INFORMATION SEEKING: Going beyond the IR Way](https://reader034.vdocument.in/reader034/viewer/2022051317/56816100550346895dd040c9/html5/thumbnails/33.jpg)
33
Comparison of Navigational Behavior
• Users entering directly show different behaviors from users entering via web search engines
[Figure: two panels, Users entering the site directly and Users entering via Google]
![Page 34: HUMANE INFORMATION SEEKING: Going beyond the IR Way](https://reader034.vdocument.in/reader034/viewer/2022051317/56816100550346895dd040c9/html5/thumbnails/34.jpg)
34
Comparison of Search Behavior
Rich interaction reduces the query lengths
Filtering induces more interactions than search
![Page 35: HUMANE INFORMATION SEEKING: Going beyond the IR Way](https://reader034.vdocument.in/reader034/viewer/2022051317/56816100550346895dd040c9/html5/thumbnails/35.jpg)
35
LOOKING ONWARD
![Page 36: HUMANE INFORMATION SEEKING: Going beyond the IR Way](https://reader034.vdocument.in/reader034/viewer/2022051317/56816100550346895dd040c9/html5/thumbnails/36.jpg)
36
Where’s the Future? – Social Search
• The New Bing Sidebar makes search a social activity.
![Page 37: HUMANE INFORMATION SEEKING: Going beyond the IR Way](https://reader034.vdocument.in/reader034/viewer/2022051317/56816100550346895dd040c9/html5/thumbnails/37.jpg)
37
Where’s the Future? – Semantic Search
• The New Google serves ‘knowledge’ as well as docs.
![Page 38: HUMANE INFORMATION SEEKING: Going beyond the IR Way](https://reader034.vdocument.in/reader034/viewer/2022051317/56816100550346895dd040c9/html5/thumbnails/38.jpg)
38
Where’s the Future? – Siri-like Agent
• The New Google serves ‘knowledge’ as well as docs.
![Page 39: HUMANE INFORMATION SEEKING: Going beyond the IR Way](https://reader034.vdocument.in/reader034/viewer/2022051317/56816100550346895dd040c9/html5/thumbnails/39.jpg)
39
An Exciting Future is Awaiting Us!
• Recommended Readings in IR:
  • http://www.cs.rmit.edu.au/swirl12
Any Questions?
![Page 40: HUMANE INFORMATION SEEKING: Going beyond the IR Way](https://reader034.vdocument.in/reader034/viewer/2022051317/56816100550346895dd040c9/html5/thumbnails/40.jpg)
Selected Publications
• Structured Document Retrieval
  • A Probabilistic Retrieval Model for Semi-structured Data [ECIR09]
  • A Field Relevance Model for Structured Document Retrieval [ECIR11]
• Personal Search
  • Retrieval Experiments using Pseudo-Desktop Collections [CIKM09]
  • Ranking using Multiple Document Types in Desktop Search [SIGIR10]
  • Building a Semantic Representation for Personal Information [CIKM10]
  • Evaluating an Associative Browsing Model for Personal Info. [CIKM11]
  • Evaluating Search in Personal Social Media Collections [WSDM12]
• Web / Book Search
  • Characterizing Web Content, User Interests, and Search Behavior by Reading Level and Topic [WSDM12]
  • Understanding Book Search Behavior on the Web [In submission to SIGIR12]
40
More at @lifidea, or cs.umass.edu/~jykim
![Page 41: HUMANE INFORMATION SEEKING: Going beyond the IR Way](https://reader034.vdocument.in/reader034/viewer/2022051317/56816100550346895dd040c9/html5/thumbnails/41.jpg)
41
My Self-tracking Efforts
• Life-optimization Project (2002~2006)
• LiFiDeA Project (2011-2012)
![Page 42: HUMANE INFORMATION SEEKING: Going beyond the IR Way](https://reader034.vdocument.in/reader034/viewer/2022051317/56816100550346895dd040c9/html5/thumbnails/42.jpg)
42
OPTIONAL SLIDES
![Page 43: HUMANE INFORMATION SEEKING: Going beyond the IR Way](https://reader034.vdocument.in/reader034/viewer/2022051317/56816100550346895dd040c9/html5/thumbnails/43.jpg)
43
The Great Divide: IR vs. HCI
IR
• Query / Document
• Relevant Results
• Ranking / Suggestions
• Feature Engineering
• Batch Evaluation (TREC)
• SIGIR / CIKM / WSDM
HCI
• User / System
• User Value / Satisfaction
• Interface / Visualization
• Human-centered Design
• User Study
• CHI / UIST / CSCW
Can we learn from each other?
![Page 44: HUMANE INFORMATION SEEKING: Going beyond the IR Way](https://reader034.vdocument.in/reader034/viewer/2022051317/56816100550346895dd040c9/html5/thumbnails/44.jpg)
44
The Great Divide: IR vs. RecSys
IR
• Query / Document
• Reactive (given query)
• SIGIR / CIKM / WSDM
RecSys
• User / Item
• Proactive (push item)
• RecSys / KDD / UMAP
![Page 45: HUMANE INFORMATION SEEKING: Going beyond the IR Way](https://reader034.vdocument.in/reader034/viewer/2022051317/56816100550346895dd040c9/html5/thumbnails/45.jpg)
45
The Great Divide: IR in CS vs. LIS
IR in CS
• Focus on ranking & relevance optimization
• Batch & quantitative evaluation
• SIGIR / CIKM / WSDM
• UMass / CMU / Glasgow
IR in LIS
• Focus on behavioral study & understanding
• User study & qualitative evaluation
• ASIS&T / JCDL
• UNC / Rutgers / UW
![Page 46: HUMANE INFORMATION SEEKING: Going beyond the IR Way](https://reader034.vdocument.in/reader034/viewer/2022051317/56816100550346895dd040c9/html5/thumbnails/46.jpg)
46
Problems & Techniques in IR
• What
  • Data: Format (documents, records and linked data) / Size / Dynamics (static, dynamic, streaming)
  • User & Domain: End User (web and library), Business User (legal, medical and patent), System Component (e.g., IBM Watson)
  • Needs: Known-item vs. Exploratory Search, Recommendation
• How
  • System: Indexing and Retrieval (Platforms for Big Data Handling)
  • Analytics: Feature Extraction, Retrieval Model Tuning & Evaluation
  • Presentation: User Interface, Information Visualization
![Page 47: HUMANE INFORMATION SEEKING: Going beyond the IR Way](https://reader034.vdocument.in/reader034/viewer/2022051317/56816100550346895dd040c9/html5/thumbnails/47.jpg)
47
More about the Matching Problem
• Finding Representations
  • Term vector vs. Term distribution
  • Topical category, Reading level, …
• Estimating Representations
  • By counting terms
  • Using automatic classifiers
• Calculating Matching Scores
  • Cosine similarity vs. KL-divergence
  • Combining multiple reps.
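The three steps above can be sketched for the simplest case: a term-vector representation estimated by counting terms, matched with cosine similarity. The whitespace tokenization and the function names are assumptions made for this example.

```python
import math
from collections import Counter

def term_vector(text):
    """Represent text as raw term counts (estimated by counting terms)."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

query_vec = term_vector("email registration")
doc_vec = term_vector("Registration deadline for the email course")
match = cosine_similarity(query_vec, doc_vec)
```

A term-distribution representation would instead normalize the counts into probabilities and compare them with a measure such as KL-divergence.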