![Page 2: Challenges in Professional Search - Hogeschool Leiden · source: IDC. Web Search Engines Great atanswering simple user questions. Web Search •Find one(or a few) good web‐pages](https://reader033.vdocument.in/reader033/viewer/2022043022/5f3dd1bfee41c55c230d39f2/html5/thumbnails/2.jpg)
Who am I?
• Assistant Professor at Institute of Informatics (UvA)
• Director of Data Science MSc program (UvA, VU, ADS)
• Before that:
– Google Research & University of Sheffield
• My background:
– Computer Science (PhD and MSc, Northeastern Univ.)
– Joint degree in Informatics & Economics (BS)
• My expertise:
– Information Retrieval, Text Mining, and Natural Language Understanding
![Page 3: Challenges in Professional Search - Hogeschool Leiden · source: IDC. Web Search Engines Great atanswering simple user questions. Web Search •Find one(or a few) good web‐pages](https://reader033.vdocument.in/reader033/viewer/2022043022/5f3dd1bfee41c55c230d39f2/html5/thumbnails/3.jpg)
Professional Search
“… employees spend 1.8 hours every
day— 9.3 hours per week, on average—
searching and gathering information.” –
source: McKinsey
“the knowledge worker spends about
2.5 hours per day, or roughly 30% of the
workday, searching for information” –
source: IDC
![Page 4: Challenges in Professional Search - Hogeschool Leiden · source: IDC. Web Search Engines Great atanswering simple user questions. Web Search •Find one(or a few) good web‐pages](https://reader033.vdocument.in/reader033/viewer/2022043022/5f3dd1bfee41c55c230d39f2/html5/thumbnails/4.jpg)
Web Search Engines
Great at answering simple user questions
![Page 5: Challenges in Professional Search - Hogeschool Leiden · source: IDC. Web Search Engines Great atanswering simple user questions. Web Search •Find one(or a few) good web‐pages](https://reader033.vdocument.in/reader033/viewer/2022043022/5f3dd1bfee41c55c230d39f2/html5/thumbnails/5.jpg)
Web Search
• Find one (or a few) good web‐pages
• High redundancy in information on the web
• High redundancy in user signals
– E.g. clicks on documents, query re‐writes
![Page 6: Challenges in Professional Search - Hogeschool Leiden · source: IDC. Web Search Engines Great atanswering simple user questions. Web Search •Find one(or a few) good web‐pages](https://reader033.vdocument.in/reader033/viewer/2022043022/5f3dd1bfee41c55c230d39f2/html5/thumbnails/6.jpg)
Professional Search
• (Often) exploratory search
• Users do not know exactly what they are looking for or …
• … how to phrase their request (query)
![Page 7: Challenges in Professional Search - Hogeschool Leiden · source: IDC. Web Search Engines Great atanswering simple user questions. Web Search •Find one(or a few) good web‐pages](https://reader033.vdocument.in/reader033/viewer/2022043022/5f3dd1bfee41c55c230d39f2/html5/thumbnails/7.jpg)
Professional Search
• Total‐recall search
• Users need to find (nearly) everything about a topic X
• Exhaustive research: X = me, my PhD topic, Ebola
• Investigation: X = somebody or something or some activity
• Systematic review: X = studies measuring a particular effect
• Patent search: X = prior art
![Page 8: Challenges in Professional Search - Hogeschool Leiden · source: IDC. Web Search Engines Great atanswering simple user questions. Web Search •Find one(or a few) good web‐pages](https://reader033.vdocument.in/reader033/viewer/2022043022/5f3dd1bfee41c55c230d39f2/html5/thumbnails/8.jpg)
Professional Search
• There is no (simple) single query
A sample MEDLINE query
1. exp vitamin A/
2. vitamin A.mp
3. retinol.mp
4. exp dietary supplements/
5. or/1-4
6. exp pneumonia/
7. pneumonia$.mp
8. exp pneumonia, bacterial/
9. exp pneumonia, lipid/
10. exp pneumonia, mycoplasma/
...
14. exp pneumonia, viral/
15. exp respiratory tract infections/
16. acute adj respiratory.mp
17. respiratory adj infection.mp
18. respiratory adj disease.mp
19. or/6-18
20. 5 and 19
Main Question: Is adjunctive vitamin A effective in children diagnosed with non‐measles pneumonia?
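The numbered query above combines earlier lines by reference: `or/1-4` is the union of lines 1 to 4, and `5 and 19` is the intersection of two such unions. A minimal sketch of that set logic follows. The document ids and per-line hits are invented for illustration (they are not MEDLINE results), and `or_lines` is a hypothetical helper, not part of any search product.

```python
from typing import Dict, Set

# Hypothetical hits per search line; ids are made up for illustration only.
line_hits: Dict[int, Set[str]] = {
    1: {"d1", "d2"},  # exp vitamin A/
    2: {"d2", "d3"},  # vitamin A.mp
    3: {"d4"},        # retinol.mp
    4: {"d5"},        # exp dietary supplements/
    6: {"d2", "d6"},  # exp pneumonia/
    7: {"d4", "d7"},  # pneumonia$.mp
}


def or_lines(hits: Dict[int, Set[str]], first: int, last: int) -> Set[str]:
    """Union of the result sets of lines first..last, like `or/1-4`."""
    combined: Set[str] = set()
    for number in range(first, last + 1):
        combined |= hits.get(number, set())
    return combined


intervention = or_lines(line_hits, 1, 4)  # line 5: or/1-4
condition = or_lines(line_hits, 6, 7)     # stands in for line 19: or/6-18
final = intervention & condition          # line 20: 5 and 19
print(sorted(final))
```

Each block of synonyms widens recall with OR, and the final AND restricts the result to documents that mention both the intervention and the condition.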
![Page 9: Challenges in Professional Search - Hogeschool Leiden · source: IDC. Web Search Engines Great atanswering simple user questions. Web Search •Find one(or a few) good web‐pages](https://reader033.vdocument.in/reader033/viewer/2022043022/5f3dd1bfee41c55c230d39f2/html5/thumbnails/9.jpg)
Modern Search Engines
[Diagram labels: Crawling; Pre‐processing & Indexing; Query understanding; Logging; Quality; Freshness; Spamminess; Clicks; Profiles; Ranking Algorithm; Content]
![Page 10: Challenges in Professional Search - Hogeschool Leiden · source: IDC. Web Search Engines Great atanswering simple user questions. Web Search •Find one(or a few) good web‐pages](https://reader033.vdocument.in/reader033/viewer/2022043022/5f3dd1bfee41c55c230d39f2/html5/thumbnails/10.jpg)
Modern Search Engines
![Page 11: Challenges in Professional Search - Hogeschool Leiden · source: IDC. Web Search Engines Great atanswering simple user questions. Web Search •Find one(or a few) good web‐pages](https://reader033.vdocument.in/reader033/viewer/2022043022/5f3dd1bfee41c55c230d39f2/html5/thumbnails/11.jpg)
Batch Learning
• Requires labeling data (query – document pairs)
• Time‐consuming and boring
![Page 12: Challenges in Professional Search - Hogeschool Leiden · source: IDC. Web Search Engines Great atanswering simple user questions. Web Search •Find one(or a few) good web‐pages](https://reader033.vdocument.in/reader033/viewer/2022043022/5f3dd1bfee41c55c230d39f2/html5/thumbnails/12.jpg)
Batch Learning
• Leads to a static, one‐size‐fits‐all search engine
![Page 13: Challenges in Professional Search - Hogeschool Leiden · source: IDC. Web Search Engines Great atanswering simple user questions. Web Search •Find one(or a few) good web‐pages](https://reader033.vdocument.in/reader033/viewer/2022043022/5f3dd1bfee41c55c230d39f2/html5/thumbnails/13.jpg)
User Feedback
• Leads to a static, one‐size‐fits‐all search engine
[Figure: a result list with user feedback marks (✔ ✘ ✘ ✘)]
![Page 14: Challenges in Professional Search - Hogeschool Leiden · source: IDC. Web Search Engines Great atanswering simple user questions. Web Search •Find one(or a few) good web‐pages](https://reader033.vdocument.in/reader033/viewer/2022043022/5f3dd1bfee41c55c230d39f2/html5/thumbnails/14.jpg)
TREC Total Recall
Objective:
1. Find documents containing nearly all relevant information …
2. … while uncovering [relatively] few documents
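The two objectives pull against each other: recall should be high while the number of uncovered documents stays low. A minimal sketch of how that trade-off could be measured is below. The function names and the toy review order are assumptions for illustration and do not reproduce the track's official evaluation code.

```python
from typing import List, Optional, Set


def recall_at_effort(reviewed: List[str], relevant: Set[str], effort: int) -> float:
    """Fraction of all relevant documents found within the first `effort` reviews."""
    if not relevant:
        return 0.0
    found = set(reviewed[:effort]) & relevant
    return len(found) / len(relevant)


def effort_for_recall(reviewed: List[str], relevant: Set[str], target: float) -> Optional[int]:
    """Smallest number of uncovered documents that reaches `target` recall, or None."""
    for k in range(1, len(reviewed) + 1):
        if recall_at_effort(reviewed, relevant, k) >= target:
            return k
    return None


# Toy review order: documents in the order the system uncovered them.
review_order = ["a", "x", "b", "c", "y", "d"]
relevant_docs = {"a", "b", "c", "d"}
print(recall_at_effort(review_order, relevant_docs, 3))
print(effort_for_recall(review_order, relevant_docs, 0.75))
```

A system that reaches the same recall with a smaller effort value is better under this objective.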
![Page 15: Challenges in Professional Search - Hogeschool Leiden · source: IDC. Web Search Engines Great atanswering simple user questions. Web Search •Find one(or a few) good web‐pages](https://reader033.vdocument.in/reader033/viewer/2022043022/5f3dd1bfee41c55c230d39f2/html5/thumbnails/15.jpg)
TREC Total Recall: Participation
[Diagram labels: query; search algorithm; document collection; results; human assessor]
![Page 16: Challenges in Professional Search - Hogeschool Leiden · source: IDC. Web Search Engines Great atanswering simple user questions. Web Search •Find one(or a few) good web‐pages](https://reader033.vdocument.in/reader033/viewer/2022043022/5f3dd1bfee41c55c230d39f2/html5/thumbnails/16.jpg)
TREC Total Recall: Participation
• Play‐at‐home
– Data collection and queries available via the Internet
– Automated assessor accessed via the Internet
• Play‐in‐sandbox
– Submit a virtual appliance that runs isolated from the Internet
– Downloads corpus and topic from the intranet
– “Uncover” documents one at a time via the intranet
![Page 17: Challenges in Professional Search - Hogeschool Leiden · source: IDC. Web Search Engines Great atanswering simple user questions. Web Search •Find one(or a few) good web‐pages](https://reader033.vdocument.in/reader033/viewer/2022043022/5f3dd1bfee41c55c230d39f2/html5/thumbnails/17.jpg)
TREC Total Recall
![Page 18: Challenges in Professional Search - Hogeschool Leiden · source: IDC. Web Search Engines Great atanswering simple user questions. Web Search •Find one(or a few) good web‐pages](https://reader033.vdocument.in/reader033/viewer/2022043022/5f3dd1bfee41c55c230d39f2/html5/thumbnails/18.jpg)
User‐in‐the‐loop strategies
• Extreme relevance feedback
• Batch learning
– Uncover training set; rank
• Online learning [UvA/HvA submission]
![Page 19: Challenges in Professional Search - Hogeschool Leiden · source: IDC. Web Search Engines Great atanswering simple user questions. Web Search •Find one(or a few) good web‐pages](https://reader033.vdocument.in/reader033/viewer/2022043022/5f3dd1bfee41c55c230d39f2/html5/thumbnails/19.jpg)
Online Learning
• Learn‐as‐you‐go
• Requires user feedback (implicit or explicit)
• Serves the user and builds a training collection at the same time
[Diagram: the user sends a query to the search algorithm and receives documents; the user examines a document and generates feedback, which is returned to the search algorithm]
![Page 20: Challenges in Professional Search - Hogeschool Leiden · source: IDC. Web Search Engines Great atanswering simple user questions. Web Search •Find one(or a few) good web‐pages](https://reader033.vdocument.in/reader033/viewer/2022043022/5f3dd1bfee41c55c230d39f2/html5/thumbnails/20.jpg)
Online Learning
• The collection contains feedback (e.g. labels) only on items you show to the user
![Page 21: Challenges in Professional Search - Hogeschool Leiden · source: IDC. Web Search Engines Great atanswering simple user questions. Web Search •Find one(or a few) good web‐pages](https://reader033.vdocument.in/reader033/viewer/2022043022/5f3dd1bfee41c55c230d39f2/html5/thumbnails/21.jpg)
Exploitation: Baseline Model
1. Run ad hoc search to construct a synthetic training dataset
• Unsupervised method – no training data needed
2. Train a classifier
3. Predict relevance for the remaining collection
4. Select a few highest‐scoring documents for review.
5. Review the documents, coding each as “relevant” or “not relevant.”
6. Add the documents to the training set.
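The six steps above can be sketched as a loop that repeats steps 2 to 6. This is only an illustration under stated assumptions: term overlap stands in for the ad hoc search, a Rocchio-style linear model stands in for the classifier (the slides do not say which classifier was used), and `review` is a hypothetical callback that simulates the human assessor.

```python
from collections import Counter
from typing import Callable, Dict, List


def tokens(text: str) -> List[str]:
    return text.lower().split()


def train(pos: List[str], neg: List[str]) -> Dict[str, float]:
    """Rocchio-style model: mean term frequency in positives minus negatives."""
    weights: Dict[str, float] = {}
    for docs, sign in ((pos, 1.0), (neg, -1.0)):
        for text in docs:
            for term, tf in Counter(tokens(text)).items():
                weights[term] = weights.get(term, 0.0) + sign * tf / len(docs)
    return weights


def score(weights: Dict[str, float], text: str) -> float:
    return sum(weights.get(term, 0.0) for term in tokens(text))


def baseline_loop(collection: Dict[str, str], query: str,
                  review: Callable[[str], bool],
                  seed: int = 1, batch: int = 1, rounds: int = 3) -> Dict[str, bool]:
    q = set(tokens(query))
    # Step 1: unsupervised ad hoc search (term overlap) yields pseudo-labels.
    ranked = sorted(collection, key=lambda d: (-len(q & set(tokens(collection[d]))), d))
    pos = [collection[d] for d in ranked[:seed]]   # assumed relevant
    neg = [collection[d] for d in ranked[-seed:]]  # assumed not relevant
    reviewed: Dict[str, bool] = {}
    for _ in range(rounds):
        weights = train(pos, neg)                                  # step 2
        remaining = [d for d in collection if d not in reviewed]
        if not remaining:
            break
        remaining.sort(key=lambda d: (-score(weights, collection[d]), d))  # step 3
        for doc_id in remaining[:batch]:                           # step 4
            reviewed[doc_id] = review(doc_id)                      # step 5
            (pos if reviewed[doc_id] else neg).append(collection[doc_id])  # step 6
    return reviewed


docs = {
    "d1": "vitamin a pneumonia children trial",
    "d2": "vitamin a supplements for pneumonia",
    "d3": "football match results",
    "d4": "retinol treatment pneumonia infants",
    "d5": "stock market news",
}
truly_relevant = {"d1", "d2", "d4"}
labels = baseline_loop(docs, "vitamin a pneumonia", lambda d: d in truly_relevant)
print(labels)
```

On this toy collection the loop uncovers the three relevant documents in its first three reviews; on real data the synthetic seed labels would be noisier and more rounds would be needed.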
![Page 22: Challenges in Professional Search - Hogeschool Leiden · source: IDC. Web Search Engines Great atanswering simple user questions. Web Search •Find one(or a few) good web‐pages](https://reader033.vdocument.in/reader033/viewer/2022043022/5f3dd1bfee41c55c230d39f2/html5/thumbnails/22.jpg)
Exploitation: Baseline Model
![Page 23: Challenges in Professional Search - Hogeschool Leiden · source: IDC. Web Search Engines Great atanswering simple user questions. Web Search •Find one(or a few) good web‐pages](https://reader033.vdocument.in/reader033/viewer/2022043022/5f3dd1bfee41c55c230d39f2/html5/thumbnails/23.jpg)
Exploitation: Baseline Model
![Page 24: Challenges in Professional Search - Hogeschool Leiden · source: IDC. Web Search Engines Great atanswering simple user questions. Web Search •Find one(or a few) good web‐pages](https://reader033.vdocument.in/reader033/viewer/2022043022/5f3dd1bfee41c55c230d39f2/html5/thumbnails/24.jpg)
Exploitation: Baseline Model
![Page 25: Challenges in Professional Search - Hogeschool Leiden · source: IDC. Web Search Engines Great atanswering simple user questions. Web Search •Find one(or a few) good web‐pages](https://reader033.vdocument.in/reader033/viewer/2022043022/5f3dd1bfee41c55c230d39f2/html5/thumbnails/25.jpg)
Exploitation: Baseline Model
Uncover the most relevant document to present
![Page 26: Challenges in Professional Search - Hogeschool Leiden · source: IDC. Web Search Engines Great atanswering simple user questions. Web Search •Find one(or a few) good web‐pages](https://reader033.vdocument.in/reader033/viewer/2022043022/5f3dd1bfee41c55c230d39f2/html5/thumbnails/26.jpg)
Exploitation: Active Learning
![Page 27: Challenges in Professional Search - Hogeschool Leiden · source: IDC. Web Search Engines Great atanswering simple user questions. Web Search •Find one(or a few) good web‐pages](https://reader033.vdocument.in/reader033/viewer/2022043022/5f3dd1bfee41c55c230d39f2/html5/thumbnails/27.jpg)
Exploitation: Active Learning
![Page 28: Challenges in Professional Search - Hogeschool Leiden · source: IDC. Web Search Engines Great atanswering simple user questions. Web Search •Find one(or a few) good web‐pages](https://reader033.vdocument.in/reader033/viewer/2022043022/5f3dd1bfee41c55c230d39f2/html5/thumbnails/28.jpg)
Exploitation: Active Learning
![Page 29: Challenges in Professional Search - Hogeschool Leiden · source: IDC. Web Search Engines Great atanswering simple user questions. Web Search •Find one(or a few) good web‐pages](https://reader033.vdocument.in/reader033/viewer/2022043022/5f3dd1bfee41c55c230d39f2/html5/thumbnails/29.jpg)
Exploitation: Active Learning
Uncover the most informative document to present
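The difference between the two selection rules can be shown on a handful of classifier scores. The sketch below assumes the scores are predicted probabilities of relevance and uses uncertainty sampling (score closest to 0.5) as the notion of "most informative"; the slides do not state which active-learning criterion was actually used, and the scores are invented.

```python
from typing import Dict


def most_relevant(scores: Dict[str, float]) -> str:
    """Baseline rule: uncover the document with the highest predicted relevance."""
    return max(sorted(scores), key=lambda d: scores[d])


def most_informative(scores: Dict[str, float]) -> str:
    """Uncertainty sampling: uncover the document the classifier is least sure about."""
    return min(sorted(scores), key=lambda d: abs(scores[d] - 0.5))


# Hypothetical predicted probabilities of relevance.
predicted = {"d1": 0.95, "d2": 0.52, "d3": 0.10}
print(most_relevant(predicted))
print(most_informative(predicted))
```

The first rule serves the user immediately; the second asks for the label that is expected to improve the classifier the most.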
![Page 30: Challenges in Professional Search - Hogeschool Leiden · source: IDC. Web Search Engines Great atanswering simple user questions. Web Search •Find one(or a few) good web‐pages](https://reader033.vdocument.in/reader033/viewer/2022043022/5f3dd1bfee41c55c230d39f2/html5/thumbnails/30.jpg)
Exploration: Hierarchical Clustering
![Page 31: Challenges in Professional Search - Hogeschool Leiden · source: IDC. Web Search Engines Great atanswering simple user questions. Web Search •Find one(or a few) good web‐pages](https://reader033.vdocument.in/reader033/viewer/2022043022/5f3dd1bfee41c55c230d39f2/html5/thumbnails/31.jpg)
Exploration: Hierarchical Clustering
![Page 32: Challenges in Professional Search - Hogeschool Leiden · source: IDC. Web Search Engines Great atanswering simple user questions. Web Search •Find one(or a few) good web‐pages](https://reader033.vdocument.in/reader033/viewer/2022043022/5f3dd1bfee41c55c230d39f2/html5/thumbnails/32.jpg)
Exploration: Hierarchical Clustering
![Page 33: Challenges in Professional Search - Hogeschool Leiden · source: IDC. Web Search Engines Great atanswering simple user questions. Web Search •Find one(or a few) good web‐pages](https://reader033.vdocument.in/reader033/viewer/2022043022/5f3dd1bfee41c55c230d39f2/html5/thumbnails/33.jpg)
Reinforcement Learning
Balances:
1. Exploitation – uncover the most relevant document
2. Exploitation – uncover the most informative document
3. Exploration – uncover documents from different regions
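One simple way to balance the three options is to treat each as an arm of a multi-armed bandit and reward an arm whenever the document it uncovers turns out to be relevant. The sketch below uses an epsilon-greedy policy purely as an illustration; the slides do not name the reinforcement learning method, and the strategy names and hit rates are hypothetical.

```python
import random
from typing import Dict, List

STRATEGIES = ["most_relevant", "most_informative", "different_region"]


class EpsilonGreedy:
    """Pick the best-performing strategy most of the time, a random one otherwise."""

    def __init__(self, arms: List[str], epsilon: float = 0.1, seed: int = 0) -> None:
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.pulls: Dict[str, int] = {a: 0 for a in self.arms}
        self.value: Dict[str, float] = {a: 0.0 for a in self.arms}

    def choose(self) -> str:
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)  # try a random strategy
        return max(self.arms, key=lambda a: self.value[a])  # use the best so far

    def update(self, arm: str, reward: float) -> None:
        self.pulls[arm] += 1
        # Incremental mean of the rewards observed for this arm.
        self.value[arm] += (reward - self.value[arm]) / self.pulls[arm]


# Simulation with made-up probabilities that each strategy uncovers a relevant document.
hypothetical_hit_rate = {"most_relevant": 0.6, "most_informative": 0.4, "different_region": 0.2}
policy = EpsilonGreedy(STRATEGIES, epsilon=0.2, seed=42)
simulator = random.Random(1)
for _ in range(200):
    arm = policy.choose()
    reward = 1.0 if simulator.random() < hypothetical_hit_rate[arm] else 0.0
    policy.update(arm, reward)
print(policy.pulls)
```

In a real system the hit rates are unknown and change as relevant documents are exhausted, which is why the policy keeps sampling all three strategies.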
![Page 34: Challenges in Professional Search - Hogeschool Leiden · source: IDC. Web Search Engines Great atanswering simple user questions. Web Search •Find one(or a few) good web‐pages](https://reader033.vdocument.in/reader033/viewer/2022043022/5f3dd1bfee41c55c230d39f2/html5/thumbnails/34.jpg)
User Feedback
• Leads to a static, one‐size‐fits‐all search engine
[Figure: a result list with user feedback marks (✔ ✘ ✘ ✘)]
![Page 35: Challenges in Professional Search - Hogeschool Leiden · source: IDC. Web Search Engines Great atanswering simple user questions. Web Search •Find one(or a few) good web‐pages](https://reader033.vdocument.in/reader033/viewer/2022043022/5f3dd1bfee41c55c230d39f2/html5/thumbnails/35.jpg)
Session Personalization
![Page 36: Challenges in Professional Search - Hogeschool Leiden · source: IDC. Web Search Engines Great atanswering simple user questions. Web Search •Find one(or a few) good web‐pages](https://reader033.vdocument.in/reader033/viewer/2022043022/5f3dd1bfee41c55c230d39f2/html5/thumbnails/36.jpg)
TREC Session
Objective:
• Improve retrieval performance for a given query by using the session prior to this query
![Page 37: Challenges in Professional Search - Hogeschool Leiden · source: IDC. Web Search Engines Great atanswering simple user questions. Web Search •Find one(or a few) good web‐pages](https://reader033.vdocument.in/reader033/viewer/2022043022/5f3dd1bfee41c55c230d39f2/html5/thumbnails/37.jpg)
TREC Session: Test Collection
![Page 38: Challenges in Professional Search - Hogeschool Leiden · source: IDC. Web Search Engines Great atanswering simple user questions. Web Search •Find one(or a few) good web‐pages](https://reader033.vdocument.in/reader033/viewer/2022043022/5f3dd1bfee41c55c230d39f2/html5/thumbnails/38.jpg)
Query change
• Changes in the query
– Adding a term
– Removing a term
– Keeping a term
• Correlations between Δ(query) and feedback
• Task stage
– Sub‐tasks
• User stage
– Struggling
– Exploring
– Exploiting
[Diagram: the task “Travel to Beijing” with sub‐tasks (flight tickets, hotel room, map, conference, POI), annotated with three layers: Task/Subtasks, Exploit/Explore/Struggle, and Query Changes]
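The three kinds of query change listed on this slide (adding, removing, and keeping a term) can be computed from consecutive queries in a session with plain set operations. This is a minimal sketch assuming whitespace tokenization; the session queries are hypothetical, modeled on the “Travel to Beijing” example.

```python
from typing import Dict, Set


def query_change(previous: str, current: str) -> Dict[str, Set[str]]:
    """Terms added, removed, and kept between two consecutive queries."""
    prev_terms = set(previous.lower().split())
    curr_terms = set(current.lower().split())
    return {
        "added": curr_terms - prev_terms,
        "removed": prev_terms - curr_terms,
        "kept": prev_terms & curr_terms,
    }


# Hypothetical session for the travel task.
session = ["travel to beijing", "beijing flight tickets", "beijing hotel room"]
changes = [query_change(a, b) for a, b in zip(session, session[1:])]
for change in changes:
    print(change)
```

A term that is kept across several queries ("beijing" here) hints at the overall task, while the added and removed terms mark the switch from one sub-task to the next.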
![Page 39: Challenges in Professional Search - Hogeschool Leiden · source: IDC. Web Search Engines Great atanswering simple user questions. Web Search •Find one(or a few) good web‐pages](https://reader033.vdocument.in/reader033/viewer/2022043022/5f3dd1bfee41c55c230d39f2/html5/thumbnails/39.jpg)
Dialog Systems
• Fully conversational system
• Search algorithm asking questions to the user
![Page 40: Challenges in Professional Search - Hogeschool Leiden · source: IDC. Web Search Engines Great atanswering simple user questions. Web Search •Find one(or a few) good web‐pages](https://reader033.vdocument.in/reader033/viewer/2022043022/5f3dd1bfee41c55c230d39f2/html5/thumbnails/40.jpg)
Conclusions
• Professional Search
– Exploratory
– Complex
– Recall‐oriented
• Fully conversational systems
– Receive feedback: documents, query rewrites
– Explicitly ask for feedback