Search Quality at LinkedIn
![Page 1: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/1.jpg)
Recruiting Solutions
Abhimanyu Lad (Abhi), Senior Software Engineer
Satya Kanduri (Satya), Senior Software Engineer
Search Quality at LinkedIn
![Page 2: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/2.jpg)
tag: skill OR title
related skills: search, ranking, …
tag: company
id: 1337
industry: internet
verticals: people, jobs
intent: exploratory
![Page 3: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/3.jpg)
SEARCH USE CASES
How do people use LinkedIn’s search?
![Page 4: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/4.jpg)
PEOPLE SEARCH
Search for people by name
![Page 5: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/5.jpg)
PEOPLE SEARCH
Search for people by other attributes
![Page 6: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/6.jpg)
EXPLORATORY PEOPLE SEARCH
![Page 7: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/7.jpg)
JOB SEARCH
![Page 8: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/8.jpg)
COMPANY SEARCH
![Page 9: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/9.jpg)
AND MUCH MORE…
![Page 10: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/10.jpg)
OUR GOAL
Universal Search
– Single search box
High Recall
– Spelling correction, synonym expansion, …
High Precision
– Entity-oriented search: match things, not strings
![Page 11: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/11.jpg)
QUERY UNDERSTANDING PIPELINE
![Page 12: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/12.jpg)
QUERY UNDERSTANDING PIPELINE
Raw query → Spellcheck → Query Tagging → Vertical Intent Prediction → Query Expansion → Structured query + Annotations
![Page 13: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/13.jpg)
![Page 14: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/14.jpg)
SPELLING CORRECTION
Fix obvious typos
Help users spell names
![Page 15: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/15.jpg)
SPELLING OUT THE DETAILS
Data sources: PEOPLE NAMES, COMPANIES, TITLES, PAST QUERIES
N-grams: marissa => ma ar ri is ss sa
Metaphone: mark/marc => MRK
Co-occurrence counts: marissa:mayer = 1000
Example query: marisa meyer yahoo
Candidates per token: marissa / marisa, meyer / mayer, yahoo
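To make the slide concrete, here is a minimal sketch of candidate generation with character bigrams and candidate selection with co-occurrence counts. It is not LinkedIn's implementation: the function names, the Jaccard threshold, the tiny vocabulary, and every count except `marissa:mayer = 1000` are assumptions for illustration, and the Metaphone signal is left out.

```python
from collections import defaultdict
from itertools import product

def bigrams(word):
    """Character bigrams, e.g. 'marissa' -> ma ar ri is ss sa."""
    return {word[i:i + 2] for i in range(len(word) - 1)}

def build_index(vocabulary):
    """Inverted index from bigram to the vocabulary words containing it."""
    index = defaultdict(set)
    for word in vocabulary:
        for gram in bigrams(word):
            index[gram].add(word)
    return index

def candidates(token, index, min_overlap=0.3):
    """The token itself plus vocabulary words with a similar bigram set (Jaccard)."""
    grams = bigrams(token)
    nearby = set()
    for gram in grams:
        nearby |= index.get(gram, set())
    result = {token}
    for word in nearby:
        word_grams = bigrams(word)
        if len(grams & word_grams) / len(grams | word_grams) >= min_overlap:
            result.add(word)
    return sorted(result)

def best_correction(query, index, cooccurrence):
    """Pick the candidate combination whose adjacent words co-occur most often."""
    options = [candidates(token, index) for token in query.split()]
    def score(words):
        return sum(cooccurrence.get(pair, 0) for pair in zip(words, words[1:]))
    return " ".join(max(product(*options), key=score))

# Hypothetical data; only marissa:mayer = 1000 comes from the slide.
index = build_index(["marissa", "mayer", "meyer", "yahoo"])
cooccurrence = {("marissa", "mayer"): 1000, ("mayer", "yahoo"): 800, ("marissa", "meyer"): 3}
print(best_correction("marisa meyer yahoo", index, cooccurrence))  # marissa mayer yahoo
```

Enumerating every combination is fine for short queries; a longer query would call for a dynamic program over the candidate lattice instead.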
![Page 16: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/16.jpg)
SPELLING OUT THE DETAILS
PROBLEM: Corpus as well as query logs contain many spelling errors
Certain spelling errors are quite frequent
While genuine words (especially names) might be infrequent
![Page 17: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/17.jpg)
SPELLING OUT THE DETAILS
PROBLEM: Corpus as well as query logs contain many spelling errors
SOLUTION: Use query chains to infer correct spelling
[product manger] → [product manager] → CLICK
[marissa mayer] → CLICK
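One way to read "query chains" is as consecutive queries in a session where the first gets no click and a near-identical rewrite does. The sketch below counts such pairs. The session format, the `max_distance` threshold, and the function names are assumptions, not details from the talk.

```python
from collections import Counter

def edit_distance(a, b):
    """Levenshtein distance via the standard dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]

def mine_corrections(sessions, max_distance=2):
    """Count (misspelled, corrected) pairs: an unclicked query followed
    directly by a clicked query that is close in spelling."""
    counts = Counter()
    for session in sessions:
        for (q1, clicked1), (q2, clicked2) in zip(session, session[1:]):
            if (not clicked1 and clicked2 and q1 != q2
                    and edit_distance(q1, q2) <= max_distance):
                counts[(q1, q2)] += 1
    return counts

# Each session is a list of (query, was_clicked) events in time order.
sessions = [
    [("product manger", False), ("product manager", True)],
    [("marissa mayer", True)],
]
print(mine_corrections(sessions))
```

A query that is clicked without any rewrite, like the second session, produces no pair, but could be counted separately as evidence that its spelling is correct.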
![Page 18: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/18.jpg)
![Page 19: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/19.jpg)
QUERY TAGGING: IDENTIFYING ENTITIES IN THE QUERY
TITLE CO GEO
TITLE-237: software engineer, software developer, programmer, …
CO-1441: Google Inc.; Industry: Internet
GEO-7583: Country: US; Lat: 42.3482 N, Long: 75.1890 W
(RECOGNIZED TAGS: NAME, TITLE, COMPANY, SCHOOL, GEO, SKILL)
![Page 20: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/20.jpg)
QUERY TAGGING: IDENTIFYING ENTITIES IN THE QUERY
TITLE CO GEO
MORE PRECISE MATCHING WITH DOCUMENTS
![Page 21: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/21.jpg)
ENTITY-BASED FILTERING
BEFORE
![Page 22: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/22.jpg)
ENTITY-BASED FILTERING
BEFORE
AFTER
![Page 23: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/23.jpg)
ENTITY-BASED FILTERING
BEFORE
![Page 24: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/24.jpg)
ENTITY-BASED FILTERING
BEFORE
AFTER
![Page 25: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/25.jpg)
ENTITY-BASED SUGGESTIONS
![Page 26: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/26.jpg)
ENTITY-BASED SUGGESTIONS
![Page 27: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/27.jpg)
QUERY TAGGING: SEQUENTIAL MODEL
TRAINING
EMISSION PROBABILITIES (Learned from user profiles)
TRANSITION PROBABILITIES (Learned from query logs)
![Page 28: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/28.jpg)
QUERY TAGGING: SEQUENTIAL MODEL
INFERENCE
Given a query, find the most likely sequence of tags
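For a sequential model with emission and transition probabilities, finding the most likely tag sequence is commonly done with the Viterbi algorithm. The slides do not name the decoder, so treat this as an illustrative sketch: the first-order hidden Markov model form, the probability tables, and the `floor` value for unseen events are all assumptions.

```python
import math

def viterbi(tokens, tags, start_p, trans_p, emit_p, floor=1e-6):
    """Most likely tag sequence for a non-empty token list under a
    first-order hidden Markov model. Unseen events get probability `floor`."""
    def lp(table, key):
        return math.log(table.get(key, floor))

    # Each column maps tag -> (best log-probability ending in tag, previous tag).
    columns = [{tag: (lp(start_p, tag) + lp(emit_p, (tag, tokens[0])), None)
                for tag in tags}]
    for token in tokens[1:]:
        last = columns[-1]
        column = {}
        for tag in tags:
            prev = max(tags, key=lambda p: last[p][0] + lp(trans_p, (p, tag)))
            score = last[prev][0] + lp(trans_p, (prev, tag)) + lp(emit_p, (tag, token))
            column[tag] = (score, prev)
        columns.append(column)

    # Follow the back-pointers from the best final tag.
    tag = max(tags, key=lambda t: columns[-1][t][0])
    path = [tag]
    for column in reversed(columns[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]

# Hypothetical probabilities for a three-token query.
tags = ["TITLE", "COMPANY"]
start_p = {"TITLE": 0.6, "COMPANY": 0.4}
trans_p = {("TITLE", "TITLE"): 0.5, ("TITLE", "COMPANY"): 0.5,
           ("COMPANY", "TITLE"): 0.6, ("COMPANY", "COMPANY"): 0.4}
emit_p = {("TITLE", "software"): 0.3, ("TITLE", "engineer"): 0.4,
          ("COMPANY", "software"): 0.05, ("COMPANY", "google"): 0.5}
print(viterbi(["software", "engineer", "google"], tags, start_p, trans_p, emit_p))
# ['TITLE', 'TITLE', 'COMPANY']
```

Working in log space avoids underflow on longer queries, and the cost is linear in query length and quadratic in the number of tags.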
![Page 29: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/29.jpg)
![Page 30: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/30.jpg)
VERTICAL INTENT PREDICTION
JOBS
PEOPLE
COMPANIES
(Probability distribution over verticals)
![Page 31: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/31.jpg)
VERTICAL INTENT PREDICTION: SIGNALS
1. Past query counts in each vertical + Query tags
2. Personalization: User’s search history
Figure labels: (TAG:COMPANY) with [Company], [Employees], [Jobs]; (TAG:NAME) with [Name Search]
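As a rough illustration of turning these signals into a probability distribution over verticals, the sketch below blends smoothed global counts for a query with smoothed counts from the searcher's own history. The linear blend, `user_weight`, and the add-one smoothing are assumptions chosen for simplicity; the slides do not say how the signals are combined, and query tags are omitted here.

```python
def vertical_intent(query_counts, user_counts, verticals,
                    user_weight=0.3, smoothing=1.0):
    """Blend global and per-user counts into a distribution over verticals."""
    def normalize(counts):
        total = sum(counts.get(v, 0) + smoothing for v in verticals)
        return {v: (counts.get(v, 0) + smoothing) / total for v in verticals}

    global_p = normalize(query_counts)  # past query counts in each vertical
    user_p = normalize(user_counts)     # this user's search history
    return {v: (1 - user_weight) * global_p[v] + user_weight * user_p[v]
            for v in verticals}

verticals = ["people", "jobs", "companies"]
# Hypothetical counts of past clicks per vertical for one query.
query_counts = {"people": 80, "jobs": 15, "companies": 5}
# Hypothetical history of a user who mostly searches jobs.
user_counts = {"jobs": 20}
print(vertical_intent(query_counts, user_counts, verticals))
```

With these numbers the user's history raises the probability of the jobs vertical without overriding the global preference for people results.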
![Page 32: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/32.jpg)
![Page 33: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/33.jpg)
QUERY EXPANSION
GOAL: Improve recall through synonym expansion
![Page 34: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/34.jpg)
QUERY EXPANSION: NAME SYNONYMS
![Page 35: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/35.jpg)
QUERY EXPANSION: JOB TITLE SYNONYMS
![Page 36: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/36.jpg)
QUERY EXPANSION: SIGNALS
Trained using query chains:
[jon] → [jonathan] → CLICK
[programmer] → [developer] → CLICK
[software engineer] → [software developer] → CLICK
Symmetric but not transitive!
[francis] ⇔ [frank]
[franklin] ⇔ [frank]
[francis] ≠ [franklin]
Context based!
[software engineer] => [software developer]
[civil engineer] ≠ [civil developer]
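The two constraints on this slide can be captured by a small data structure: store each synonym pair in both directions, never compute a transitive closure, and key entries on whole phrases so that context is part of the lookup. This is a hypothetical sketch; `SynonymTable` and its methods are illustrative names, not an API from the talk.

```python
from collections import defaultdict

class SynonymTable:
    """Symmetric synonym pairs with no transitive closure."""

    def __init__(self):
        self._pairs = defaultdict(set)

    def add(self, a, b):
        # Symmetric: a expands to b and b expands to a.
        self._pairs[a].add(b)
        self._pairs[b].add(a)

    def expand(self, phrase):
        """The phrase plus its direct synonyms only. Lookup is by whole
        phrase, so 'engineer' in another context is not rewritten."""
        return [phrase] + sorted(self._pairs.get(phrase, ()))

table = SynonymTable()
table.add("francis", "frank")
table.add("franklin", "frank")
table.add("software engineer", "software developer")

print(table.expand("frank"))           # ['frank', 'francis', 'franklin']
print(table.expand("francis"))         # ['francis', 'frank']
print(table.expand("civil engineer"))  # ['civil engineer']
```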
![Page 37: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/37.jpg)
![Page 38: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/38.jpg)
QUERY UNDERSTANDING: SUMMARY
High degree of structure in queries as well as corpus (user profiles, job postings, companies, …)
Query understanding lets us balance recall and precision by supporting entity-oriented search
Query tagging and query log analysis play a big role in query understanding
![Page 39: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/39.jpg)
RANKING
![Page 40: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/40.jpg)
WHAT’S IN A NAME QUERY?
![Page 41: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/41.jpg)
BUT NAMES CAN BE AMBIGUOUS
kevin scott
≠
![Page 42: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/42.jpg)
SEARCHING FOR A COMPANY’S EMPLOYEES
![Page 43: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/43.jpg)
SEARCHING FOR PEOPLE WITH A SKILL
![Page 44: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/44.jpg)
RANKING IS COMPLICATED
Seemingly similar queries require dissimilar scoring functions
Personalization matters
– Multiple dimensions to personalize on
– Dimensions vary with query class
![Page 45: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/45.jpg)
Model
![Page 46: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/46.jpg)
TRAINING
Documents for training
Features
Human evaluation
Labels
Machine learning model
![Page 47: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/47.jpg)
![Page 48: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/48.jpg)
ASSESSING RELEVANCE
![Page 49: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/49.jpg)
RELEVANCE DEPENDS ON WHO’S SEARCHING
What if the searcher is a job seeker?
Or a recruiter?
Or…
![Page 50: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/50.jpg)
THE QUERY IS NOT ENOUGH
![Page 51: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/51.jpg)
WE NEED USER FEATURES
Non-personalized relevance model: score = f(Document | Query)
Personalized relevance model: score = f(Document | Query, User)
![Page 52: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/52.jpg)
COLLECTING RELEVANCE JUDGMENTS WON’T SCALE
![Page 53: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/53.jpg)
TRAINING
Documents for training
Features
Human evaluation
Search logs
Labels
Machine learning model
![Page 54: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/54.jpg)
CLICKS AS TRAINING DATA
Approach: Clicked = Relevant, Not-Clicked = Not Relevant
![Page 55: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/55.jpg)
![Page 56: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/56.jpg)
![Page 57: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/57.jpg)
CLICKS AS TRAINING DATA
Approach: Clicked = Relevant, Not-Clicked = Not Relevant
User eye scan direction
Unfairly penalized? Good results not seen are marked Not Relevant.
![Page 58: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/58.jpg)
CLICKS AS TRAINING DATA
Approach: Clicked = Relevant, Skipped = Not Relevant
• Only penalize results that the user has seen but ignored
![Page 59: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/59.jpg)
CLICKS AS TRAINING DATA
Approach: Clicked = Relevant, Skipped = Not Relevant
• Only penalize results that the user has seen but ignored
• Risks inverting model by overweighting low-ranked results
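A minimal sketch of the "Clicked = Relevant, Skipped = Not Relevant" labeling follows. It assumes the user scans results top-down and has seen everything up to the lowest click; that viewing model, and the function name, are assumptions rather than details given in the slides.

```python
def label_from_clicks(ranked_ids, clicked_ids):
    """Label clicked results 1 and skipped results 0.

    A result counts as skipped only if it is ranked above the lowest
    click and was not clicked. Results below the lowest click are treated
    as unseen and get no label, so they are not penalized."""
    clicked = set(clicked_ids)
    positions = [i for i, doc in enumerate(ranked_ids) if doc in clicked]
    if not positions:
        return {}  # no click, so nothing can be inferred about what was seen
    lowest_click = max(positions)
    return {doc: int(doc in clicked) for doc in ranked_ids[:lowest_click + 1]}

print(label_from_clicks(["a", "b", "c", "d", "e"], ["c"]))
# {'a': 0, 'b': 0, 'c': 1}
```

Every negative example sits above a positive one, which illustrates the inversion risk: a model trained only on such labels sees top-ranked results mostly as negatives.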
![Page 60: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/60.jpg)
FAIR PAIRS
[Radlinski and Joachims, AAAI’06]
• Fair Pairs: Randomize, Clicked = R, Skipped = NR
![Page 61: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/61.jpg)
FAIR PAIRS
[Radlinski and Joachims, AAAI’06]
Flipped
• Fair Pairs: Randomize, Clicked = R, Skipped = NR
![Page 62: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/62.jpg)
FAIR PAIRS
[Radlinski and Joachims, AAAI’06]
Flipped
• Fair Pairs: Randomize, Clicked = R, Skipped = NR
• Great at dealing with position bias
• Does not invert models
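The sketch below is a simplified reading of the FairPairs idea: group adjacent results into pairs, flip each pair at random before display, and record a preference when the lower result of a displayed pair is clicked and the upper one is skipped. The function names and data shapes are assumptions; consult the cited paper for the exact procedure and its guarantees.

```python
import random

def fair_pairs_present(ranking, rng):
    """Pair up adjacent results (random starting offset) and flip each pair
    with probability 0.5. Returns the presented list and the pairs as
    displayed, each as (top, bottom)."""
    offset = rng.choice([0, 1])
    presented = list(ranking[:offset])
    pairs = []
    i = offset
    while i + 1 < len(ranking):
        top, bottom = ranking[i], ranking[i + 1]
        if rng.random() < 0.5:
            top, bottom = bottom, top  # flipped
        presented += [top, bottom]
        pairs.append((top, bottom))
        i += 2
    presented += ranking[i:]  # leftover unpaired result, if any
    return presented, pairs

def fair_pairs_preferences(pairs, clicked_ids):
    """Clicked bottom result with a skipped top result yields a
    (relevant, not_relevant) preference."""
    clicked = set(clicked_ids)
    return [(bottom, top) for top, bottom in pairs
            if bottom in clicked and top not in clicked]

presented, pairs = fair_pairs_present(list("abcdef"), random.Random(0))
print(presented, pairs)
print(fair_pairs_preferences([("a", "b"), ("c", "d")], ["b", "c"]))  # [('b', 'a')]
```

Because each result in a pair is equally likely to appear on top, position bias affects both members of a pair equally on average.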
![Page 63: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/63.jpg)
EASY NEGATIVES
Page 1
Page 99
• Assumption: A decent current model would push out bad results to the very end.
• Easy Negatives: Some of the results at the end are picked up as negative examples
![Page 64: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/64.jpg)
EASY NEGATIVES
• Use strategies that sample across the feature space
• Searches with fewer results preferred
• Always sample from a given page, say page 10
2 pages vs. 90+ pages
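As a small illustration of the "always sample from a given page" strategy, the sketch below draws negative examples from one fixed page of a paginated result list. The page layout, `page_number`, and `k` are hypothetical parameters.

```python
import random

def sample_easy_negatives(result_pages, page_number, k, rng):
    """Pick up to k results from one fixed page (1-indexed) as negatives.
    Searches with fewer pages than page_number contribute nothing."""
    if page_number > len(result_pages):
        return []
    page = result_pages[page_number - 1]
    return rng.sample(page, min(k, len(page)))

# Hypothetical search with 12 pages of 10 results each.
pages = [[f"p{p}_r{r}" for r in range(10)] for p in range(1, 13)]
print(sample_easy_negatives(pages, page_number=10, k=3, rng=random.Random(0)))
```

Fixing the page keeps the sampled negatives at a comparable depth across searches, instead of mixing the tail of a 2-page search with the tail of a 90-page one.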
![Page 65: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/65.jpg)
PUTTING IT ALL TOGETHER
Human evaluation is not practical for personalized searches
Learn from user behavior– Multiple heuristics depending on the need– Different pros and cons
![Page 66: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/66.jpg)
EFFICIENCY VS EXPRESSIVENESS
Build tree with logistic regression leaves.
By restricting decision nodes to (Query, User) segments, only one regression model needs to be evaluated for each document.
Tree figure: decision node X2 = ? with branches X2 = 0 and X2 = 1; decision node X4 = ? with branches X4 = 0 and X4 = 1
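The efficiency argument can be sketched as follows: because the decision nodes test only query and user features, the leaf is selected once per search, and each document is then scored by a single logistic regression. The segment names, feature names, and weights below are hypothetical placeholders, not the talk's actual model.

```python
import math

# Hypothetical per-segment weights; a real system would learn these.
LEAF_MODELS = {
    "name":  {"bias": -1.0, "name_match": 3.0, "network_distance": -0.8},
    "skill": {"bias": -0.5, "skill_match": 2.0, "network_distance": -0.2},
    "other": {"bias": 0.0, "text_match": 1.5},
}

def select_leaf(query_features):
    """Walk the decision nodes. They depend only on (query, user) features,
    so this runs once per search rather than once per document."""
    if query_features.get("is_name_query"):
        return LEAF_MODELS["name"]
    if query_features.get("is_skill_query"):
        return LEAF_MODELS["skill"]
    return LEAF_MODELS["other"]

def score(weights, doc_features):
    """Logistic regression score in (0, 1) for one document."""
    z = weights["bias"] + sum(w * doc_features.get(name, 0.0)
                              for name, w in weights.items() if name != "bias")
    return 1.0 / (1.0 + math.exp(-z))

def rank(query_features, docs):
    weights = select_leaf(query_features)  # chosen once, reused for every doc
    return sorted(docs, key=lambda d: score(weights, d["features"]), reverse=True)

docs = [
    {"id": "x", "features": {"name_match": 0.2, "network_distance": 1}},
    {"id": "y", "features": {"name_match": 1.0, "network_distance": 2}},
]
print([d["id"] for d in rank({"is_name_query": True}, docs)])  # ['y', 'x']
```

A tree that branched on document features would instead need a traversal for every document, which is the cost this restriction avoids.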
![Page 67: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/67.jpg)
SCORING
New document → Features → Machine learning model → score (repeated for each new document)
Scores → Ordered list
![Page 68: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/68.jpg)
A SIMPLIFIED EXAMPLE
Decision nodes: Name Query? (Yes / No), Skill Query? (Yes / No)
![Page 69: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/69.jpg)
TEST, TEST, TEST
Interleaving [Radlinski et al., CIKM 2008]
Model 1: a b c d g h
Model 2: b e a f g h
Interleaved: a b c e d f
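Team-draft interleaving is one of the methods studied in the cited paper; the slide does not say which variant was used, so the sketch below is an assumption on that point. In each round a coin flip decides which model picks first, and each model contributes its highest-ranked result not yet shown. Clicks are then credited to the model that contributed the clicked result.

```python
import random

def team_draft_interleave(ranking_a, ranking_b, length, rng):
    """Return the interleaved list and a map from result to contributing team."""
    interleaved, team = [], {}

    def pick(ranking, name):
        # Add this ranker's best result that is not already shown.
        for doc in ranking:
            if doc not in team:
                team[doc] = name
                interleaved.append(doc)
                return

    while len(interleaved) < length:
        before = len(interleaved)
        order = [(ranking_a, "A"), (ranking_b, "B")]
        if rng.random() < 0.5:
            order.reverse()  # coin flip: who picks first this round
        for ranking, name in order:
            if len(interleaved) < length:
                pick(ranking, name)
        if len(interleaved) == before:
            break  # both rankings exhausted
    return interleaved, team

def credit(team, clicked_ids):
    """Clicks per team; the model with more credited clicks wins."""
    wins = {"A": 0, "B": 0}
    for doc in clicked_ids:
        if doc in team:
            wins[team[doc]] += 1
    return wins

model_1 = list("abcdgh")
model_2 = list("beafgh")
interleaved, team = team_draft_interleave(model_1, model_2, 6, random.Random(1))
print(interleaved, credit(team, interleaved[:1]))
```

If Model 1 happens to pick first in every round, these two rankings produce a b c e d f, which matches the interleaved list shown on the slide.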
![Page 70: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/70.jpg)
SUMMARY
Query understanding leverages the rich structure of LinkedIn’s content and information needs.
Query tagging and rewriting allow us to deliver precision and recall.
For ranking, personalization is both the biggest challenge and the core of our solution.
Segmenting relevance models by query type helps us efficiently address the diversity of search needs.
![Page 71: Search Quality at LinkedIn](https://reader033.vdocument.in/reader033/viewer/2022061105/540dee998d7f728d7e8b4b72/html5/thumbnails/71.jpg)
Abhimanyu Lad: https://linkedin.com/in/abhilad
Satya Kanduri: https://linkedin.com/in/skanduri