Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Ganesh Venkataraman, Abhi Lad, Lin Guo, Shakti Sinha (LinkedIn)

Upload: abhimanyu-lad

Post on 15-Apr-2017

Page 1: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Fast, Lenient, and Accurate
Building Personalized Instant Search Experience at LinkedIn

Ganesh Venkataraman, Abhi Lad, Lin Guo, Shakti Sinha
LinkedIn

Page 2: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Agenda

● LinkedIn
● LinkedIn Search
  ○ Navigational vs Exploratory searches
  ○ Typeahead vs SERP
● Big picture and problem statement
● Instant search – Search-as-you-type
  ○ Query autocomplete
  ○ Entity-aware suggestions
  ○ Instant results
● Conclusions & Future work

Page 3: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

LinkedIn – Professional Identity

Page 4: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

LinkedIn – Professional Graph

Page 5: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

LinkedIn – Jobs

Page 6: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

LinkedIn – And much more...

Companies

Skills

Professional Content

Page 7: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

LinkedIn – Massive Scale

Page 8: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

LinkedIn Search

Page 9: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Navigational Search

Looking for someone specific by name.

Query has a single correct result.

Page 10: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Exploratory Search

Finding people that match a given set of criteria.

Multiple results match the user’s query.

Page 11: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Instant Search – Search-as-you-type

Satisfy navigational searches: Show instant search results.

Help frame exploratory searches: Complete the user’s query and show search suggestions.

Page 12: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Big Picture

[Diagram. Components: Partial query, Instant results, Autocomplete, Search suggestions, Manually entered query, Query tagger, Full-text search, Search results]

Page 13: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Big Picture

[Diagram repeated from Page 12]

Focus today:
● Autocomplete
● Search suggestions
● Instant results

Page 14: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Problem Statement

[Diagram and "Focus today" list repeated from Page 13]

How can we build an instant search experience that scales to 450+ million members and is fast, lenient, and accurate?

● Instant search = Query autocomplete + search suggestions + instant results
● Fast = Search-as-you-type latencies
● Lenient = Handle spelling errors and common variations
● Accurate = Highly relevant and personalized results

Page 15: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Query Tagging

PERSON

TITLE(ID=126)

COMPANY(ID=1337)

Entity types identified: Person name, job title, company, school, skills, locations.

Key part of query processing!
Impacts: autocomplete, spelling correction, search suggestions, query rewriting, ranking.

Sequential prediction model (CRF – Conditional Random Fields)

Training data:
● Standardized dictionaries (people names, companies, schools, titles, skills, locations)
● Query logs
● Clickthrough (CTR) data
● Crowdsourced labels

Page 16: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Query Autocomplete

● Fast
● Relevant and contextual
● Resilient to spelling errors

Page 17: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Query Autocomplete – Offline processing

Query logs:
linkedin software engineer
software engineer
big data
data scientist
data engineer
expert systems
..

Entities: [linkedin] [software engineer]

Index: FST – Finite State Transducers

Compact + fast retrieval + fuzzy match (via Levenshtein Automata)
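The fuzzy-match behaviour can be illustrated with a minimal sketch. It is not an FST or a Levenshtein automaton: it scans every logged query and compares the typed text against each query's prefixes, which shows the matching rule but not the compactness or speed the slide refers to. The function names and the `max_edits` parameter are assumptions for illustration.

```python
QUERY_LOG = [
    "linkedin software engineer", "software engineer", "big data",
    "data scientist", "data engineer", "expert systems",
]

def prefix_edit_distance(query: str, candidate: str) -> int:
    """Smallest Levenshtein distance between `query` and any prefix of `candidate`."""
    prev = list(range(len(candidate) + 1))  # distance from "" to candidate[:j]
    for i, qc in enumerate(query, start=1):
        cur = [i]  # distance from query[:i] to ""
        for j, cc in enumerate(candidate, start=1):
            cur.append(min(prev[j] + 1,                # drop qc from the query
                           cur[j - 1] + 1,             # add cc to the query
                           prev[j - 1] + (qc != cc)))  # keep or substitute
        prev = cur
    return min(prev)  # best match against any prefix of the candidate

def fuzzy_complete(query: str, log: list[str], max_edits: int = 1) -> list[str]:
    """Brute-force stand-in for an FST lookup with a Levenshtein automaton."""
    return [c for c in log if prefix_edit_distance(query, c) <= max_edits]

print(fuzzy_complete("sofware", QUERY_LOG))  # ['software engineer']
```

A real index would intersect the automaton with the FST so that only matching paths are visited instead of every logged query.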

Page 18: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Query Autocomplete – Online processing

Two step process:

1. Retrieval (Candidate generation)

User’s query: [big data e]

Candidates = C(big data e) ∪ C(data e) ∪ C(e)
= big data engineer, big data expert systems, big data entry, ...
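A small sketch of this candidate generation, under two assumptions: C(x) is modelled as a plain prefix scan over the logged queries, and each completion of a shorter suffix is joined back onto the words typed before it (the slide's examples all keep "big data" in front). The names `completions` and `candidates` are illustrative only.

```python
QUERY_LOG = [
    "linkedin software engineer", "software engineer", "big data",
    "data scientist", "data engineer", "expert systems",
]

def completions(prefix: str, log: list[str]) -> list[str]:
    """C(prefix): logged queries that start with the given text."""
    return [q for q in log if q.startswith(prefix)]

def candidates(partial_query: str, log: list[str]) -> list[str]:
    """Union of completions over every word-level suffix of the partial query."""
    words = partial_query.split()
    found = []
    for start in range(len(words)):
        head = " ".join(words[:start])    # words already typed, kept as-is
        suffix = " ".join(words[start:])  # the part looked up in the log
        for completion in completions(suffix, log):
            found.append(f"{head} {completion}".strip())
    return list(dict.fromkeys(found))     # de-duplicate, keep first-seen order

print(candidates("big data e", QUERY_LOG))
# ['big data engineer', 'big data expert systems']
```

With only the six queries shown on the slide, "big data entry" does not appear because nothing in this toy log ends in "entry".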

Page 19: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Query Autocomplete – Online processing

Two step process:

2. Scoring (Ranking)

User’s query: [big data e]
Candidate completions: “big data engineer”, “big data expert”, “big data entry”

Score(“big data engineer”):

P(s1, s2, s3, …) ≈ P(s1)·P(s2|s1)·P(s3|s2)·… // Bigram language model

Use entities : P([engineer] | [big data])
Fall back to words : P(engineer | data)·P(data | big)
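The scoring rule can be sketched as follows. All counts are made up for illustration, the probabilities are plain maximum-likelihood ratios of counts, and the fallback rule (use the word-level model only when the entity-level score is zero) is an assumption about how the two levels might be combined.

```python
from collections import Counter

# Hypothetical counts, as if gathered from query logs.
ENTITY_UNIGRAMS = Counter({"big data": 40, "engineer": 50, "entry": 10})
ENTITY_BIGRAMS = Counter({("big data", "engineer"): 20})
WORD_UNIGRAMS = Counter({"big": 50, "data": 100, "engineer": 40, "entry": 10})
WORD_BIGRAMS = Counter({("big", "data"): 40, ("data", "engineer"): 20,
                        ("data", "entry"): 5})

def bigram_score(tokens, unigrams, bigrams):
    """P(s1) * P(s2|s1) * P(s3|s2) * ... estimated from raw counts."""
    score = unigrams[tokens[0]] / sum(unigrams.values())
    for prev, cur in zip(tokens, tokens[1:]):
        if unigrams[prev] == 0:
            return 0.0  # unseen context: no estimate available
        score *= bigrams[(prev, cur)] / unigrams[prev]
    return score

def score(entity_tokens, word_tokens):
    """Prefer entity-level statistics; fall back to words if they are missing."""
    entity_level = bigram_score(entity_tokens, ENTITY_UNIGRAMS, ENTITY_BIGRAMS)
    if entity_level > 0:
        return entity_level
    return bigram_score(word_tokens, WORD_UNIGRAMS, WORD_BIGRAMS)

# Entity level: P([big data]) * P([engineer] | [big data]) = 0.4 * 0.5
print(score(["big data", "engineer"], ["big", "data", "engineer"]))
# Word-level fallback: P(big) * P(data | big) * P(entry | data)
print(score(["big data", "entry"], ["big", "data", "entry"]))
```

Scores from the two levels are not on the same scale, so a production ranker would need smoothing or calibration before comparing them directly.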

Page 20: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Query Suggestions – Autocomplete + query tagger

“linke” ⇒ “Linkedin” ⇒ COMPANY

“had” ⇒ “Hadoop” ⇒ SKILL
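As a rough sketch of chaining the two steps, the snippet below completes the prefix and then looks up the entity type. A small dictionary stands in for the query tagger (the deck describes a CRF model, which is not reproduced here), and `suggest` is a hypothetical name.

```python
# Stand-in for the query tagger: entity name -> entity type.
ENTITY_TYPES = {"linkedin": "COMPANY", "hadoop": "SKILL"}

def suggest(prefix: str):
    """Complete the prefix to a known entity, then report its type."""
    for entity, entity_type in ENTITY_TYPES.items():
        if entity.startswith(prefix.lower()):
            return entity, entity_type
    return None  # nothing to suggest for this prefix

print(suggest("linke"))  # ('linkedin', 'COMPANY')
print(suggest("had"))    # ('hadoop', 'SKILL')
```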

Page 21: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Instant Results

● Fast retrieval over 450+ million members
● Highly personalized
● Balance personalization & popularity
● Resilient to spelling variations

Page 22: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Instant Results – Indexing

● Prefix-based tokenization:
DOCID 4
NAME: richard
PREFIX: r, ri, ric, rich, richa, ...
NAME: branson
PREFIX: b, br, bra, bran, brans, ...

● Inverted Index (maps each token to the list of docs that contain that token, i.e. posting lists):
NAME:richard => [1, 4, 10, 15, …] // Everyone named “richard”
PREFIX:ri => [1, 2, 4, 7, 10, 15, …] // Everyone whose name starts with “ri”
…

● Retrieval approach
User’s query – richard b
Rewritten query – +NAME:richard +PREFIX:b
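The indexing and retrieval steps on this slide can be sketched in a few lines. The member names other than DOCID 4 are invented so that the posting lists match the ones shown, and treating every query word except the last as a complete name is an assumption about the rewrite.

```python
from collections import defaultdict

# Hypothetical members; doc 4 is the "richard branson" example from the slide.
DOCS = {1: "Richard Burton", 2: "Rita Ortiz", 4: "Richard Branson",
        7: "Rishi Patel", 10: "Richard Smith", 15: "Richard Lee"}

def tokenize(name: str) -> set[str]:
    """Emit a NAME token per word plus a PREFIX token per word prefix."""
    tokens = set()
    for word in name.lower().split():
        tokens.add(f"NAME:{word}")
        for end in range(1, len(word) + 1):
            tokens.add(f"PREFIX:{word[:end]}")
    return tokens

def build_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Inverted index: token -> sorted posting list of doc ids."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for token in tokenize(docs[doc_id]):
            index[token].append(doc_id)
    return dict(index)

def rewrite(query: str) -> list[str]:
    """Complete words become NAME terms; the word being typed becomes a PREFIX term."""
    words = query.lower().split()
    if not words:
        return []
    return [f"NAME:{w}" for w in words[:-1]] + [f"PREFIX:{words[-1]}"]

def search(query: str, index: dict[str, list[int]]) -> list[int]:
    """All rewritten terms are required (+), so intersect their posting lists."""
    postings = [set(index.get(term, [])) for term in rewrite(query)]
    return sorted(set.intersection(*postings)) if postings else []

INDEX = build_index(DOCS)
print(INDEX["PREFIX:ri"])          # [1, 2, 4, 7, 10, 15]
print(search("richard b", INDEX))  # [1, 4]
```

Storing every prefix trades index size for lookup speed: a prefix query becomes an exact term lookup rather than a scan.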

Page 23: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Instant Results – Indexing

● Connections Index:
DOCID 4
CONN: 1, 10, 15

● Inverted Index
CONN:4 => [1, 10, 15] // Everyone connected to Richard Branson
CONN:1 => [4, ...]
CONN:10 => [4, ...]
...

● Retrieval approach
User’s query – richard b
Rewritten query – +NAME:richard +PREFIX:b +CONN:1
(Everyone named richard b… and connected to User:1)
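A minimal sketch of the connections index, assuming each document stores the ids of its connections in a CONN field. The connection data and the list of name matches are illustrative values, not taken from a real index.

```python
from collections import defaultdict

# Hypothetical CONN fields: doc id -> ids of that member's connections.
CONNECTIONS = {1: [4], 4: [1, 10, 15], 10: [4], 15: [4]}

def build_conn_index(connections: dict[int, list[int]]) -> dict[str, list[int]]:
    """Posting list for CONN:x holds every doc that lists x as a connection."""
    index = defaultdict(list)
    for doc_id in sorted(connections):
        for other in connections[doc_id]:
            index[f"CONN:{other}"].append(doc_id)
    return dict(index)

CONN_INDEX = build_conn_index(CONNECTIONS)
print(CONN_INDEX["CONN:4"])  # [1, 10, 15]

# Assume docs 1 and 4 already matched +NAME:richard +PREFIX:b.
name_matches = [1, 4]
connected = set(CONN_INDEX["CONN:1"])                 # connections of searcher 1
print([d for d in name_matches if d in connected])    # [4]
```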

Page 24: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Instant Results – Indexing

Early Termination

Problem: A query like [PREFIX:ri] might retrieve too many candidate documents.

How can we retrieve the most promising documents first so that we don’t need to score all of them?

Static Rank: Order documents based on their prior (query independent) likelihood of relevance:

A combination of:
● Profile views
● Spam and security related scores
● Editorial rules (Celebrities, influencers, …)

numToScore: The number of documents to retrieve and score for any query
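Early termination can be sketched by keeping each posting list in static-rank order and reading only its first numToScore entries. The static rank values below are invented, and how the real system combines profile views, spam scores and editorial rules into one number is not shown in the deck.

```python
from itertools import islice

# Hypothetical static rank per doc (higher means more likely relevant a priori).
STATIC_RANK = {1: 0.2, 2: 0.9, 4: 0.99, 7: 0.1, 10: 0.5, 15: 0.3}

def order_by_static_rank(posting: list[int], static_rank: dict[int, float]) -> list[int]:
    """Done at indexing time: most promising documents come first."""
    return sorted(posting, key=lambda doc_id: static_rank[doc_id], reverse=True)

def retrieve(posting: list[int], num_to_score: int) -> list[int]:
    """Done at query time: stop after num_to_score documents."""
    return list(islice(posting, num_to_score))

prefix_ri = order_by_static_rank([1, 2, 4, 7, 10, 15], STATIC_RANK)
print(prefix_ri)               # [4, 2, 10, 15, 1, 7]
print(retrieve(prefix_ri, 3))  # [4, 2, 10]
```

Because the ordering is query independent, it can be computed once offline and every query pays only for the documents it actually scores.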

Page 25: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Instant Results – Retrieval

Balancing Popularity and Personalization

Query: richard b…

Are you looking for Richard Branson, or a colleague named Richard Burton?

(Assume searcher’s ID = 1)

Rewritten Query:

● +NAME:richard +PREFIX:b +CONN:1 // Too restrictive. Only finds the searcher’s connections.

● +NAME:richard +PREFIX:b ?CONN:1[50%] // Try to retrieve 50% of the results from the searcher’s connections

Custom search operator: “Weighted OR”
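The deck does not spell out how the Weighted OR operator is implemented, so the following is only one possible reading: reserve a share of the numToScore budget for documents that match the optional clause, then fill the rest in static-rank order. The doc ids and the `weighted_or` signature are hypothetical.

```python
def weighted_or(required: list[int], optional: set[int],
                num_to_score: int, fraction: float) -> list[int]:
    """Pick up to num_to_score docs, aiming for `fraction` of them from `optional`.

    `required` holds docs matching the mandatory (+) terms, in static-rank order.
    `optional` holds docs matching the optional (?) clause, e.g. CONN:1.
    """
    quota = int(num_to_score * fraction)
    preferred = [d for d in required if d in optional][:quota]
    chosen = set(preferred)
    remaining = num_to_score - len(preferred)
    rest = [d for d in required if d not in chosen][:remaining]
    return preferred + rest  # final order is decided later by the scorer

# Docs matching +NAME:richard +PREFIX:b, most popular first (doc 4 leads).
matches = [4, 20, 21, 30, 22, 31]
searcher_connections = {30, 31}
print(weighted_or(matches, searcher_connections, 4, 0.5))  # [30, 31, 4, 20]
```

With a 50% quota, popular members such as doc 4 and the searcher's own connections both reach the scoring stage, which is the balance the slide asks for.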

Page 26: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Instant Results – Spelling Variations

weiner ⇔ wiener

catherine ⇔ kathryn

dipak ⇔ deepak

Page 27: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Instant Results – Spelling Variations

Name Clusters

Offline process to cluster together similar sounding or similarly spelt names.

Two step process:

1. Coarse clustering (optimized for broad coverage)
Normalization: repeated chars, accented chars, common phonetic variations (c ⇔ k, ph ⇔ f)
Combination of edit distance & double metaphone (sound)
E.g. (dipak = deepak), (wiener = weiner), (catherine = kathryn), (jeff = joff)

2. Fine-grained clustering (optimized for precision)
Split up clusters based on more sophisticated rules
Position and character-aware edit distance
Query reformulation data (q1 → q2 → click)
E.g. (jeff ≠ joff)
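The coarse clustering step can be approximated with the normalization rules listed above plus plain edit distance. This sketch leaves out double metaphone, so sound-alike pairs that are far apart in spelling (catherine and kathryn) would not be merged here. The greedy grouping and the `max_distance` threshold are assumptions.

```python
import re
import unicodedata

def normalize(name: str) -> str:
    """Strip accents, apply ph->f and c->k, collapse repeated characters."""
    name = unicodedata.normalize("NFKD", name.lower())
    name = "".join(ch for ch in name if not unicodedata.combining(ch))
    name = name.replace("ph", "f").replace("c", "k")
    return re.sub(r"(.)\1+", r"\1", name)

def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def coarse_clusters(names: list[str], max_distance: int = 2) -> list[list[str]]:
    """Greedy grouping: join the first cluster whose representative is close enough."""
    clusters = []  # pairs of (normalized representative, member names)
    for name in names:
        key = normalize(name)
        for representative, members in clusters:
            if edit_distance(key, representative) <= max_distance:
                members.append(name)
                break
        else:
            clusters.append((key, [name]))
    return [members for _, members in clusters]

print(coarse_clusters(["dipak", "deepak", "wiener", "weiner", "jeff", "joff"]))
# [['dipak', 'deepak'], ['wiener', 'weiner'], ['jeff', 'joff']]
```

As on the slide, this broad pass still groups jeff with joff; separating them is the job of the fine-grained step.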

Page 28: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Instant Results – Spelling Variations

Indexing time:
NAME: kathryn
CLUSTER: katharine

Query time:
Potential queries:
katherine
kathryn
katharine
catharine

Rewritten queries:
?NAME:katherine ?CLUSTER:katharine
?NAME:kathryn ?CLUSTER:katharine
?NAME:katharine ?CLUSTER:katharine
?NAME:catharine ?CLUSTER:katharine

Either match original query term or match the name cluster
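A sketch of how the cluster id might be used on both sides, assuming a precomputed name-to-cluster table (here a hand-written dictionary containing only the slide's four spellings). The helper names are hypothetical.

```python
# Hypothetical output of the offline clustering: name -> cluster id.
CLUSTER_OF = {"katherine": "katharine", "kathryn": "katharine",
              "katharine": "katharine", "catharine": "katharine"}

def index_tokens(name: str) -> list[str]:
    """Indexing time: store the name and, if known, its cluster id."""
    tokens = [f"NAME:{name}"]
    if name in CLUSTER_OF:
        tokens.append(f"CLUSTER:{CLUSTER_OF[name]}")
    return tokens

def rewrite(term: str) -> list[str]:
    """Query time: optional (?) clauses for the typed name and its cluster."""
    clauses = [f"?NAME:{term}"]
    if term in CLUSTER_OF:
        clauses.append(f"?CLUSTER:{CLUSTER_OF[term]}")
    return clauses

def matches(term: str, doc_name: str) -> bool:
    """A doc matches if it satisfies either optional clause."""
    doc = set(index_tokens(doc_name))
    return any(clause[1:] in doc for clause in rewrite(term))

print(rewrite("catharine"))             # ['?NAME:catharine', '?CLUSTER:katharine']
print(matches("catharine", "kathryn"))  # True
print(matches("richard", "kathryn"))    # False
```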

Page 29: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Instant Results – Scoring

Learning to Rank (Machine-learned ranking)

Training data
● Click data from previous typeahead sessions
● <searcher, query, doc> ⇒ positive/negative

Clicked result treated as positive.

All other shown results treated as negative.

Since this is navigational search, we assume there’s only 1 correct result => low presentation bias.

Features / signals
● Textual match against various fields
● Network distance, number of shared connections
● Global popularity
● Compound features
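The labeling rule can be written down directly. The session record layout below is an assumption made for illustration, and feature extraction and model training are out of scope for this sketch.

```python
def training_examples(session: dict) -> list[tuple]:
    """Turn one typeahead session into <searcher, query, doc, label> rows.

    The clicked result is labeled 1 and every other shown result 0.
    Sessions without a click carry no signal and produce no rows.
    """
    if session.get("clicked") is None:
        return []
    return [(session["searcher"], session["query"], doc,
             int(doc == session["clicked"]))
            for doc in session["shown"]]

# Hypothetical logged session: searcher 1 typed "richard b" and clicked doc 30.
session = {"searcher": 1, "query": "richard b", "shown": [4, 30, 31], "clicked": 30}
for row in training_examples(session):
    print(row)
# (1, 'richard b', 4, 0)
# (1, 'richard b', 30, 1)
# (1, 'richard b', 31, 0)
```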

Page 30: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Conclusions

● Instant search experience
  ○ Directly satisfy navigational search uses in typeahead via Instant Results
  ○ Help the user frame exploratory search queries via Query Autocomplete & Search Suggestions
● Combination of techniques
  ○ Query tagger for entity extraction – “Things not Strings”
  ○ FST-based query completion
  ○ Inverted index-based instant results + Early termination + Weighted OR
  ○ Name clusters for fuzzy name matching

Page 31: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Future Work

● Personalized query completions
  ○ m ⇒ machine learning
  ○ m ⇒ machinist
● Multi-entity query suggestions
  ○ Now : [linkedin] ⇒ “Find people who work at LinkedIn”
  ○ Future : [linkedin data scientist] ⇒ “Find data scientists at LinkedIn”
● Better blending
  ○ Autocomplete + query suggestions + instant results
  ○ Query features – what does the query mean?
  ○ Results features – what results come back from each system?

Page 32: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

Thank You!

Page 33: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

LinkedIn – The Economic Graph

Page 34: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

LinkedIn Search – SERP (Jobs)

Page 35: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

LinkedIn Search – Typeahead

Page 36: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

LinkedIn Search – SERP