Search at LinkedIn by Sriram Sankar and Kumaresh Pattabiraman
TRANSCRIPT
![Page 1: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/1.jpg)
Recruiting Solutions
Search at LinkedIn
Sriram Sankar, Principal Staff Engineer
Kumaresh Pattabiraman, Senior Product Manager
![Page 2: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/2.jpg)
https://www.youtube.com/watch?v=obCHKPYHuhA
![Page 3: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/3.jpg)
Search at LinkedIn
Personalized professional search
Part of a bigger product experience
But a really big part of it
![Page 4: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/4.jpg)
Some history . . .
![Page 5: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/5.jpg)
Approach to Search
Off the shelf components (Lucene)
Extended to address Lucene limitations (Sensei, Bobo, Zoie, Content Store)
Specialized verticals (Cleo, Krati)
Stack adopted for other purposes (recommendations, newsfeed, ads, analytics, etc.)
![Page 6: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/6.jpg)
Lucene
An open source API that supports search functionality:
– Add new documents to index
– Delete documents from the index
– Construct queries
– Search the index using the query
– Score the retrieved documents
![Page 7: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/7.jpg)
The Search Index
Inverted Index: Mapping from (search) terms to the list of documents they appear in
Forward Index: Mapping from documents to metadata about them
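
As a rough illustration of the two mappings, the sketch below builds both from two toy documents. The document texts, the `build_indexes` helper, and the whitespace tokenizer are assumptions made for this example; it is not a description of how Lucene lays out its index.

```python
from collections import defaultdict

def build_indexes(docs):
    """Build a toy inverted index and forward index from {doc_id: text}."""
    inverted = defaultdict(list)   # term -> sorted list of doc ids (a posting list)
    forward = {}                   # doc id -> metadata about that document
    for doc_id in sorted(docs):
        terms = docs[doc_id].lower().split()
        forward[doc_id] = {"length": len(terms), "text": docs[doc_id]}
        for term in sorted(set(terms)):
            inverted[term].append(doc_id)
    return dict(inverted), forward

# Hypothetical documents, chosen only to mirror the names used in the talk.
docs = {
    1: "Kumaresh works at LinkedIn",
    2: "Sriram works at LinkedIn",
}
inverted, forward = build_indexes(docs)
print(inverted["linkedin"])   # posting list for the term "linkedin"
print(forward[2]["length"])   # metadata lookup for document 2
```

Lookups by term go through `inverted`; lookups by document go through `forward`.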
![Page 8: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/8.jpg)
[Figure: two example documents, numbered 1 and 2. One contains the terms Kumaresh and LinkedIn, the other Sriram and LinkedIn. The inverted index maps the terms Kumaresh, Sriram, and LinkedIn to document numbers; the forward index maps document numbers to the documents.]
![Page 9: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/9.jpg)
The Search Index
The lists are called posting lists
Up to hundreds of millions of posting lists
Up to hundreds of millions of documents
Posting lists may contain as few as a single hit and as many as tens of millions of hits
Terms can be
– words in the document
– inferred attributes about the document
![Page 10: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/10.jpg)
Lucene Queries
"Sriram Sankar"
Sriram Kumaresh
+Sriram +LinkedIn
+Kumaresh connection:418001
+Kumaresh industry:software connection:418001^4
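
In Lucene's classic query syntax, a `+` prefix marks a term as required, unprefixed terms are optional, and `^4` is a boost. A minimal sketch of how required and optional terms could map onto posting-list operations follows; the posting lists and the helper names `intersect` and `union` are hypothetical, and real Lucene evaluation is considerably more involved.

```python
def intersect(a, b):
    """Intersect two sorted posting lists with a linear merge."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def union(a, b):
    """Union of two posting lists, returned in sorted order."""
    return sorted(set(a) | set(b))

# Hypothetical posting lists (term -> sorted doc ids).
postings = {
    "sriram": [2, 5, 9],
    "kumaresh": [1, 5],
    "linkedin": [1, 2, 5, 7],
}

required_both = intersect(postings["sriram"], postings["linkedin"])  # +Sriram +LinkedIn
either = union(postings["sriram"], postings["kumaresh"])             # Sriram Kumaresh
print(required_both, either)
```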
![Page 11: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/11.jpg)
Lucene Scoring
As documents are added to the index, Lucene maintains some metadata on the terms (e.g., term position, tf/idf)
Lucene accepts scoring information via query modifications, boosts, etc.
Lucene assigns a score to each retrieved document using this information
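
To make the role of tf/idf and boosts concrete, here is a textbook tf-idf sketch. It is not Lucene's actual scoring formula, which adds normalization and other factors; the function name `tf_idf_score`, the toy document, and the document frequencies are assumptions for illustration.

```python
import math

def tf_idf_score(query_terms, doc_terms, doc_freq, num_docs, boosts=None):
    """Sum of boost * tf * idf over the query terms that occur in the document."""
    boosts = boosts or {}
    score = 0.0
    for term in query_terms:
        tf = doc_terms.count(term)              # term frequency in this document
        if tf == 0 or doc_freq.get(term, 0) == 0:
            continue
        idf = math.log(num_docs / doc_freq[term])  # rarer terms weigh more
        score += boosts.get(term, 1.0) * tf * idf
    return score

doc = ["sriram", "works", "at", "linkedin"]
doc_freq = {"sriram": 1, "linkedin": 2, "works": 2, "at": 2}  # hypothetical counts
query = ["sriram", "linkedin"]

plain = tf_idf_score(query, doc, doc_freq, num_docs=4)
boosted = tf_idf_score(query, doc, doc_freq, num_docs=4, boosts={"sriram": 4.0})
print(plain, boosted)
```

The boost plays the part of a `^4` modifier in the query: it scales one term's contribution without changing which documents match.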
![Page 12: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/12.jpg)
Sensei
Layer over Lucene that provides:
– Sharding
– Cluster management
– Enhanced query language
![Page 13: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/13.jpg)
![Page 14: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/14.jpg)
Sensei BQL
SELECT *
FROM cars
WHERE price > 2000.00
USING RELEVANCE MODEL my_model (favoriteColor:"black", favoriteTag:"cool")
  DEFINED AS (String favoriteColor, String favoriteTag)
  BEGIN
    float boost = 1.0;
    if (tags.contains(favoriteTag))
      boost += 0.5;
    if (color.equals(favoriteColor))
      boost += 1.2;
    return _INNER_SCORE * boost;
  END
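
The relevance model in this query multiplies the inner score by a boost that grows when the document matches the caller's preferences. The same logic can be sketched in Python, assuming the color comparison is meant to be against the `favoriteColor` parameter; the `car` record and the `inner_score` value are hypothetical stand-ins for a stored document and the engine's base score.

```python
def my_model(doc, inner_score, favorite_color="black", favorite_tag="cool"):
    """Python rendering of the boost logic in the BQL relevance model."""
    boost = 1.0
    if favorite_tag in doc["tags"]:
        boost += 0.5
    if doc["color"] == favorite_color:
        boost += 1.2
    return inner_score * boost

# Hypothetical document that passes the WHERE price > 2000.00 filter.
car = {"color": "black", "tags": ["cool", "hybrid"], "price": 2500.0}
score = my_model(car, inner_score=2.0)
print(score)
```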
![Page 15: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/15.jpg)
Live Updates – Zoie and Content Store
The index reader has to be reopened before earlier live updates are visible
The only way to perform a live update is to replace the entire document – which requires access to the unchanged attributes also
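
Because an update has to replace the whole document, the indexer needs the unchanged fields from somewhere. A minimal sketch of that idea, with plain dictionaries standing in for a content store and a Lucene index (all names and field values here are hypothetical):

```python
content_store = {}   # doc id -> full set of fields (stand-in for a content store)
index = {}           # term -> set of doc ids (stand-in for a Lucene index)

def index_document(doc_id, fields):
    """Index every whitespace-separated token of every field value."""
    for value in fields.values():
        for term in str(value).lower().split():
            index.setdefault(term, set()).add(doc_id)

def delete_document(doc_id):
    """Remove the document from every posting set."""
    for ids in index.values():
        ids.discard(doc_id)

def live_update(doc_id, changed_fields):
    """Merge changed fields with the stored copy, then replace the document."""
    full = {**content_store.get(doc_id, {}), **changed_fields}
    content_store[doc_id] = full
    delete_document(doc_id)
    index_document(doc_id, full)

live_update(7, {"name": "Sriram Sankar", "company": "LinkedIn"})
live_update(7, {"company": "Acme"})   # only the changed field is supplied
print(sorted(term for term, ids in index.items() if 7 in ids))
```

The second call supplies one field, yet the name terms survive because the full document is rebuilt from the store before reindexing.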
![Page 16: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/16.jpg)
Zoie
![Page 17: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/17.jpg)
Search Content Store
[Diagram: Activity Feeds, Search Content Store, Lucene Index, with Inserts and Deletes]
![Page 18: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/18.jpg)
Faceting
![Page 19: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/19.jpg)
Bobo
![Page 20: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/20.jpg)
Typeahead (Instant Search)
Results as you type
Conventional wisdom: Inverted indices cannot support typeahead
Cleo, Krati
![Page 21: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/21.jpg)
Fast forward to last year – and growing pains . . .
![Page 22: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/22.jpg)
Scalability
Rebuilding index from scratch extremely difficult
Not possible to use complex algorithms during indexing
Live updates at document granularity
Inflexible scoring – both at Lucene and Sensei levels
![Page 23: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/23.jpg)
Fragmentation
Too many open source components glued together with primary developers spread across many companies
Different instantiations starting to diverge to deal with their specific growing pains – so diverging stacks and distracted engineers
![Page 24: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/24.jpg)
Our new search stack . . .
Two verticals already in production
![Page 25: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/25.jpg)
Life of a Query
[Diagram: User Query, Query Rewriter/Planner, Search Shards, Results Merging, Search Results]
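
The components named on this slide suggest a scatter-gather flow: rewrite the query, send it to every shard, and merge the per-shard results. The sketch below is one plausible reading under that assumption; the shard contents, the precomputed scores, and the functions `rewrite`, `search_shard`, and `search` are all hypothetical.

```python
import heapq

def rewrite(user_query):
    """Stand-in for the query rewriter/planner: just normalizes the text."""
    return user_query.strip().lower()

def search_shard(shard, rewritten_query, k):
    """Return this shard's top-k (score, doc_id) pairs for the query."""
    hits = [(score, doc_id)
            for doc_id, (text, score) in shard.items()
            if rewritten_query in text]
    return heapq.nlargest(k, hits)

def search(user_query, shards, k=3):
    q = rewrite(user_query)
    partial = [search_shard(shard, q, k) for shard in shards]             # scatter
    return heapq.nlargest(k, (hit for hits in partial for hit in hits))  # gather and merge

# Hypothetical shards: doc id -> (text, precomputed score).
shard_a = {1: ("sriram at linkedin", 0.9), 2: ("kumaresh at linkedin", 0.4)}
shard_b = {3: ("linkedin search", 0.7), 4: ("lucene", 0.8)}
results = search(" LinkedIn ", [shard_a, shard_b], k=2)
print(results)
```

Each shard only returns its own top k, so the merge step handles at most k results per shard regardless of index size.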
![Page 26: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/26.jpg)
Life of a Query – Within A Search Shard
[Diagram: Rewritten Query, INDEX, Retrieve a Document, Score the Document, Top Results, Top Results From Shard]
![Page 27: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/27.jpg)
Life of a Query – Within A Rewriter
[Diagram: Query, Rewriter State, Rewriter Module (x3), Data Model (x3), Rewritten Query]
![Page 28: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/28.jpg)
Life of Data - Offline
[Diagram: Raw Data, Derived Data, Data Model (x5), INDEX]
![Page 29: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/29.jpg)
Benefits of New Stack
A complete search engine
Frequent reindexing possible (a full reset)
Resharding becomes easy
Clear separation of infrastructure and relevance functions
A single stack with a single identity!
![Page 30: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/30.jpg)
Early Termination
We order documents in the index based on a static rank – from most important to least important
An offline relevance algorithm assigns a static rank to each document on which the sorting is performed
This allows retrieval to be early-terminated (assuming a strong correlation between static rank and importance of result for a specific query)
Happens to work well with personalized search also
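
A minimal sketch of early termination, assuming document ids are assigned in static-rank order (id 0 is the most important document), so every posting list is already sorted from most to least important. The `retrieve` function and the posting lists are hypothetical.

```python
def retrieve(posting_lists, limit):
    """Intersect posting lists ordered by static rank, stopping after `limit` hits."""
    shortest = min(posting_lists, key=len)
    others = [set(p) for p in posting_lists if p is not shortest]
    hits = []
    for doc_id in shortest:                 # ascending id == descending static rank
        if all(doc_id in other for other in others):
            hits.append(doc_id)
            if len(hits) == limit:
                break                       # early termination: skip the tail
    return hits

# Hypothetical posting lists for two required terms.
software = [0, 1, 3, 4, 6, 8]
engineer = [1, 2, 3, 6, 7, 8]
top_two = retrieve([software, engineer], limit=2)
print(top_two)
```

The loop never looks at the lower-ranked tail of the lists once enough hits are found, which is only safe to the extent that static rank correlates with query-specific importance.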
![Page 31: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/31.jpg)
New Strategy for Live Updates
Lucene segments are “document-partitioned”
We have enhanced Lucene with “term-partitioned” segments
We use 3 term-partitioned segments:
– Base index (never changed)
– Live update buffer
– Snapshot index
Fault tolerant, and performant
No more content store!
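
The slide does not spell out how the three segments interact, so the following is only a sketch under stated assumptions: document ids are shared across segments, a term's posting list is the merge of its entries in all three segments, and the live update buffer is periodically folded into the snapshot. The class `TermPartitionedIndex` and its methods are hypothetical, and deletes are not modeled.

```python
class TermPartitionedIndex:
    """Toy model of a base index, a snapshot index, and a live update buffer."""

    def __init__(self, base):
        self.base = base        # built offline; never modified by this class
        self.snapshot = {}      # term -> set of doc ids, written at flush time
        self.live = {}          # term -> set of doc ids, most recent updates

    def add_live(self, term, doc_id):
        """Record a live update for a single term of a document."""
        self.live.setdefault(term, set()).add(doc_id)

    def flush(self):
        """Fold the live buffer into the snapshot and start a fresh buffer."""
        for term, ids in self.live.items():
            self.snapshot.setdefault(term, set()).update(ids)
        self.live = {}

    def postings(self, term):
        """Merge the term's entries from all three segments."""
        ids = set(self.base.get(term, ()))
        ids |= self.snapshot.get(term, set())
        ids |= self.live.get(term, set())
        return sorted(ids)

base = {"linkedin": [1, 2]}
idx = TermPartitionedIndex(base)
idx.add_live("linkedin", 5)
before_flush = idx.postings("linkedin")
idx.flush()
idx.add_live("linkedin", 3)
after_flush = idx.postings("linkedin")
print(before_flush, after_flush)
```

In this reading, an update touches only the terms that changed, which is why the rest of the document no longer has to be fetched from a separate store.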
![Page 32: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/32.jpg)
[Diagram: Base Index, Snapshot Index, Live Update Buffer]
![Page 33: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/33.jpg)
Data Distribution
BitTorrent-based data distribution framework
More details at a later time
![Page 34: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/34.jpg)
Relevance
Offline analysis – resulting in a better index and data models
Query rewriting – for better and more accurate recall
Scoring – to fine tune each of the retrieved results
Reranking – selection of top results for overall result set quality
Blending – to combine results from multiple verticals
![Page 35: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/35.jpg)
Machine Learned Scorers
Goal: To automatically build a function whose arguments are interesting features of the query and the document
Input to the machine learning system is a set of training data that describes how the function should behave on various combinations of feature values
The function takes the form of standard templates – a linear formula is commonly used (due to simplicity)
![Page 36: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/36.jpg)
Linear Regression on a Single Feature
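
For a single feature, the linear formula can be fitted in closed form by ordinary least squares. The sketch below shows that standard technique; the `fit_line` helper and the training points are made up for illustration and say nothing about the features or training data actually used.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data: feature value -> desired score.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```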
![Page 37: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/37.jpg)
LinkedIn Scorer: Different Linear Models for Different Intents
Relevance models incorporate user features:
score = P (Document | Query, User)
Tree with linear regression leaves
[Diagram: decision tree with tests “X2 = ?” (branches X2 = 0 and X2 = 1) and “X10 < 0.1234 ?” (branches Yes and No)]
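
A tree with linear regression leaves routes each query/document pair to one linear model based on a few feature tests. The sketch below assumes a particular tree shape (the X10 test sits under the X2 = 1 branch) and uses made-up weights and a made-up feature `x1`; none of these come from the slide beyond the two tests themselves.

```python
def linear(weights, bias):
    """Build a linear model: bias + sum(weight * feature value)."""
    return lambda x: bias + sum(w * x[name] for name, w in weights.items())

# Hypothetical leaf models; weights and the feature name "x1" are illustrative.
leaf_a = linear({"x1": 2.0}, 0.0)
leaf_b = linear({"x1": 1.0}, 0.5)
leaf_c = linear({"x1": 0.5}, 2.0)

def score(x):
    """Pick a leaf by walking the tree, then apply that leaf's linear model."""
    if x["x2"] == 0:
        return leaf_a(x)
    if x["x10"] < 0.1234:
        return leaf_b(x)
    return leaf_c(x)

s_a = score({"x1": 1.0, "x2": 0, "x10": 0.5})
s_b = score({"x1": 1.0, "x2": 1, "x10": 0.05})
s_c = score({"x1": 1.0, "x2": 1, "x10": 0.5})
print(s_a, s_b, s_c)
```

The tree chooses which linear model applies, so different intents can weight the same features differently.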
![Page 38: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/38.jpg)
Going Forward
Further standardize infrastructure for relevance components
Scatter-gather
Java GC issues
Extend infrastructure to browser/device
Reintegrate diverging stacks
![Page 39: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/39.jpg)
Product Overview
![Page 40: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/40.jpg)
LinkedIn’s Vision
“Create economic opportunity for every member of the global workforce”
![Page 41: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/41.jpg)
The Economic Graph
![Page 42: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/42.jpg)
Search is core to the economic graph vision
![Page 43: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/43.jpg)
Who uses search?
Casual User – LI as professional identity
Job Seeker – LI as a way to get the day job
Outbound professional (Recruiter / Sales) – LI as day job
![Page 44: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/44.jpg)
Casual User
Name Search
Topic Search
![Page 45: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/45.jpg)
Instant: Name Search
Search all members by name or approximate name
![Page 46: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/46.jpg)
Unified Search: Topic Search
One federated search result page with all relevant entities about the topic
![Page 47: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/47.jpg)
Outbound professional
Exploratory people search
![Page 48: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/48.jpg)
Instant: Search Suggestions
Entity-aware suggestions for companies, skills & titles
![Page 49: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/49.jpg)
Instant: Just one keystroke
From name search to exploratory search
![Page 50: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/50.jpg)
People Search
Explore using facets and advanced search fields
![Page 51: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/51.jpg)
People Search
Leverage the network through shared connections
![Page 52: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/52.jpg)
Recruiter & Sales Navigator
Products powered by search
![Page 53: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/53.jpg)
Job Seeker
Job Search
![Page 54: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/54.jpg)
Instant: Search Suggestions
Entity-aware suggestions for companies, skills & titles
![Page 55: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/55.jpg)
Job Search
Explore using facets and advanced search fields
![Page 56: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/56.jpg)
Job Search
Leverage the network through relationship to job poster or connections in the company
![Page 57: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/57.jpg)
Other Search Users include…
Students – University Search
Information Seekers / Researchers – Content Search
Advertisers / Content Marketers – Company & Group Search
![Page 58: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/58.jpg)
Bringing it all together
300 Million+ members
Search the economic graph of:
– 300M profiles
– 3B Endorsements
– 300K jobs
– 3M Companies
– 2M Groups
– 25K Schools
– 100M+ pieces of professional content
One index
One unified search stack
Users
Product
Platform
![Page 59: Search at Linkedin by Sriram Sankar and Kumaresh Pattabiraman](https://reader035.vdocument.in/reader035/viewer/2022062216/55d4f9ecbb61eb6e1f8b4731/html5/thumbnails/59.jpg)