query understanding at linkedin [talk at facebook]

Query Understanding and

Search Assistance @ LinkedIn

Abhi Lad (Engineering Lead, Search Quality)

Outline

● Search at LinkedIn

● Goal of search

● Search assistance / Guided search

● Query understanding & rewriting

Search at LinkedIn

Universal search box

Search at LinkedIn

Navigational People search

Search at LinkedIn

Exploratory People search

FACETS

Search at LinkedIn

Exploratory People search

Search at LinkedIn

Job Search

Search at LinkedIn

Federated Search

PEOPLE

Goal of Search

Help users find who or what they are looking for

with minimal effort

1. Help users frame “good” queries

2. Understand the user’s underlying intent / information need

3. Rewrite the query to ensure good result set

4. Rank the results based on the user and the query

5. Provide good result attribution: snippets, highlighting

6. Propose next actions to refine results

Search Assistance

● Query Assistance: [Pre-retrieval] Help users frame their queries easily

○ Autocomplete, search suggestions in typeahead, spellcheck, ...

● Guided Search: [Post-retrieval] Guide users through their search process

○ Facet suggestions, related searches, ...

Search Assistance

(Especially useful for exploratory queries)

Autocomplete & Search Suggestions

Query autocomplete

Search suggestions

Query autocomplete => Entity detection => Search suggestions

Autocomplete system:

● Based on query logs

● Index and retrieve using Lucene FST

● Can complete the last part of the query (even if the entire query was previously unseen)

(Do not index people names)
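A minimal sketch of completing the last part of a query from logged tokens. The deck uses a Lucene FST for this; a sorted list with binary search stands in for it here, and the token counts and the `complete_prefix` name are invented for illustration.

```python
import bisect

# Hypothetical token and entity frequencies mined from query logs.
TOKEN_COUNTS = {"san francisco": 900, "san diego": 400, "sandisk": 150, "sales": 700}
SORTED_TOKENS = sorted(TOKEN_COUNTS)

def complete_prefix(prefix, limit=3):
    """Return logged completions starting with `prefix`, most frequent first."""
    # Binary search finds the first entry that could match the prefix.
    start = bisect.bisect_left(SORTED_TOKENS, prefix)
    matches = []
    for token in SORTED_TOKENS[start:]:
        if not token.startswith(prefix):
            break  # sorted order: no later entry can match
        matches.append(token)
    return sorted(matches, key=lambda t: -TOKEN_COUNTS[t])[:limit]

print(complete_prefix("san"))
```

Only the prefix after the last recognized entity is looked up, which is why a query that was never seen as a whole can still be completed.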

Autocomplete & Search Suggestions

Autocomplete

Use query logs to index unigrams (tokens), bigrams, and entities (companies, titles, skills, locations)

● Compute co-occurrence statistics

● Build FST for efficient “prefix => entity” retrieval

Query: [senior digital product manager sa|n francisco]

Score based on entity co-occurrence using last entity in the query (product manager):

● P(san francisco | product manager)

● P(san diego | product manager)

● P(sandisk | product manager)

Fall back to bigram co-occurrence:

● P(francisco | san) x P(san | manager)
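The scoring above can be sketched as follows. All counts are made up for illustration, and the helper names (`conditional`, `score_entity`, `score_bigram_fallback`) are assumptions, not names from the talk. Probabilities are plain relative frequencies with no smoothing.

```python
from collections import Counter

# Hypothetical co-occurrence counts from query logs.
ENTITY_PAIRS = Counter({
    ("product manager", "san francisco"): 60,
    ("product manager", "san diego"): 25,
    ("product manager", "sandisk"): 15,
})
BIGRAMS = Counter({
    ("manager", "san"): 40, ("manager", "sap"): 10,
    ("san", "francisco"): 70, ("san", "diego"): 30,
})

def conditional(counts, given, event):
    """Estimate P(event | given) as count(given, event) / count(given, *)."""
    total = sum(c for (g, _), c in counts.items() if g == given)
    return counts[(given, event)] / total if total else 0.0

def score_entity(candidate, last_entity):
    """P(candidate | last entity in the query)."""
    return conditional(ENTITY_PAIRS, last_entity, candidate)

def score_bigram_fallback(candidate, last_token):
    """P(w2 | w1) x P(w1 | last token) for a two-token candidate."""
    w1, w2 = candidate.split()
    return conditional(BIGRAMS, w1, w2) * conditional(BIGRAMS, last_token, w1)

candidates = ["sandisk", "san diego", "san francisco"]
ranked = sorted(candidates, key=lambda c: score_entity(c, "product manager"), reverse=True)
print(ranked)
print(score_bigram_fallback("san francisco", "manager"))
```

The entity-level score is tried first because it conditions on the whole entity (product manager); the bigram product is the fallback when no entity statistics exist.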

Autocomplete & Search Suggestions

Autocomplete

● Personalization

○ [ma]

■ machinist

■ manager

■ machine learning?

● Implicit spelling correction

○ [macine lear] => machine learning

● Use similar entities to complete previously unseen queries

○ [software engineer] ⇔ [software developer]

○ Complete [hadoop software de|veloper] based on [hadoop software engineer]
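A rough sketch of the last bullet: completing an unseen query by swapping a similar entity into queries that were seen. The `SIMILAR` map, the `SEEN_QUERIES` set and the function name are hypothetical, and only one direction of the equivalence is stored to keep the example short.

```python
# Hypothetical data: entity equivalences and queries present in the logs.
SIMILAR = {"software developer": "software engineer"}
SEEN_QUERIES = {"hadoop software engineer", "java software engineer"}

def complete_unseen(partial):
    """Complete `partial` using seen queries with a similar entity swapped in."""
    completions = []
    for variant, canonical in SIMILAR.items():
        for seen in sorted(SEEN_QUERIES):
            if canonical in seen:
                candidate = seen.replace(canonical, variant)
                # Keep only candidates that extend what the user typed
                # and that the logs could not have offered directly.
                if candidate.startswith(partial) and candidate not in SEEN_QUERIES:
                    completions.append(candidate)
    return completions

print(complete_unseen("hadoop software de"))
```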

Autocomplete & Search Suggestions

Search Suggestions

● Personalization

○ [hadoop]

■ “People with hadoop skills”

■ “Jobs requiring hadoop skills”

● Suggestions with multiple entities

○ [hadoop engineer san francisco]

■ “Hadoop engineer jobs in San Francisco”
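One way to picture how detected entities turn into suggestions is a small set of templates. This is a sketch only: the tag names, the templates and `search_suggestions` are assumptions, and the personalized ordering mentioned above is left out.

```python
def search_suggestions(entities):
    """Build suggestion strings from tagged entities.

    `entities` maps a hypothetical tag (SKILL, TITLE, LOCATION) to its text.
    """
    skill = entities.get("SKILL")
    title = entities.get("TITLE")
    location = entities.get("LOCATION")
    suggestions = []
    if skill and not title:
        # A bare skill can lead to either vertical.
        suggestions.append(f"People with {skill} skills")
        suggestions.append(f"Jobs requiring {skill} skills")
    if title:
        phrase = f"{skill} {title}" if skill else title
        text = f"{phrase.capitalize()} jobs"
        if location:
            text += f" in {location.title()}"
        suggestions.append(text)
    return suggestions

print(search_suggestions({"SKILL": "hadoop"}))
print(search_suggestions({"SKILL": "hadoop", "TITLE": "engineer", "LOCATION": "san francisco"}))
```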

Spellcheck

● Fix obvious typos

● Help users spell names

Spellcheck

People names

Companies

Titles

Past queries

Spellcheck

PROBLEM: User profiles as well as query logs contain many spelling errors

(Frequency alone is not helpful due to the long-tail distribution of entities)

Spellcheck

SOLUTION: Use query chains and click data to infer correct spelling
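A minimal sketch of the idea, assuming a session log of (query, reformulated query, clicked) triples. The log rows, the `min_support` threshold and the function name are illustrative assumptions; the talk does not give the actual procedure.

```python
from collections import Counter, defaultdict

# Hypothetical query chains: (first query, rewrite typed next, result clicked?)
QUERY_CHAINS = [
    ("macine learning", "machine learning", True),
    ("macine learning", "machine learning", True),
    ("macine learning", "marine learning", False),
    ("sreeram", "sriram", True),
]

def learn_corrections(chains, min_support=2):
    """Map a query to the rewrite that most often led to a click."""
    votes = defaultdict(Counter)
    for original, rewrite, clicked in chains:
        # A rewrite followed by a click is evidence that it was the intended spelling.
        if clicked and original != rewrite:
            votes[original][rewrite] += 1
    corrections = {}
    for original, counter in votes.items():
        rewrite, support = counter.most_common(1)[0]
        if support >= min_support:
            corrections[original] = rewrite
    return corrections

print(learn_corrections(QUERY_CHAINS))
```

The click acts as the signal that raw frequency cannot provide: a rare but correct spelling still collects votes when users reformulate to it and then click.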

Spellcheck

● Better error model

○ Improved metaphone (version 3)

○ Platform aware: Keyboard edit distance on mobile

● Machine-learned model

● Support for partial queries

○ Spellcheck-as-you-type for “Instant” search
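To illustrate what a keyboard-aware edit distance might look like: a Levenshtein distance in which substituting a neighboring key costs less than substituting a distant one. The QWERTY grid, the 0.5 cost and the function names are assumptions for this sketch, and the physical stagger between keyboard rows is ignored.

```python
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
KEY_POS = {ch: (r, c) for r, row in enumerate(ROWS) for c, ch in enumerate(row)}

def substitution_cost(a, b):
    """0 for identical keys, 0.5 for neighboring keys, 1 otherwise."""
    if a == b:
        return 0.0
    if a in KEY_POS and b in KEY_POS:
        (r1, c1), (r2, c2) = KEY_POS[a], KEY_POS[b]
        if abs(r1 - r2) <= 1 and abs(c1 - c2) <= 1:
            return 0.5
    return 1.0

def keyboard_edit_distance(s, t):
    """Levenshtein distance with keyboard-weighted substitutions."""
    prev = [float(j) for j in range(len(t) + 1)]
    for i, a in enumerate(s, start=1):
        curr = [float(i)]
        for j, b in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1.0,                          # delete a
                curr[j - 1] + 1.0,                      # insert b
                prev[j - 1] + substitution_cost(a, b),  # substitute a -> b
            ))
        prev = curr
    return prev[-1]

print(keyboard_edit_distance("jobs", "jobd"))  # s and d are neighbors
print(keyboard_edit_distance("jobs", "jobp"))  # s and p are far apart
```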

Facet Suggestions

● Query awareness

○ For TITLE queries, suggest seniority facet

○ Don’t suggest facets for name queries

○ Don’t suggest redundant/conflicting facets (location facet when query has location)

● User awareness

○ User profile: Users often restrict search results to their own location, industry, seniority

○ User behavior: Recruiters often restrict to particular industry, location

● Document set awareness

○ Ensure minimum number of results

○ Bias towards higher-quality results (people, jobs, …)
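The three kinds of awareness can be combined as simple rules. The sketch below is an assumption about how such rules might be wired together; the facet names, the `min_results` default and `suggest_facets` are hypothetical.

```python
def suggest_facets(query_tags, user_location, facet_counts, min_results=10):
    """Pick facets to suggest.

    query_tags:    set of tag names found in the query, e.g. {"TITLE"}
    user_location: the searcher's own location, or None
    facet_counts:  {facet: number of results left if the facet is applied}
    """
    # Query awareness: no facets for name queries.
    if "NAME" in query_tags:
        return []
    candidates = []
    if "TITLE" in query_tags:
        candidates.append("seniority")
    # User awareness, minus redundant facets: skip location if the query has one.
    if "LOCATION" not in query_tags and user_location:
        candidates.append(f"location:{user_location}")
    # Document set awareness: a facet must leave enough results.
    return [f for f in candidates if facet_counts.get(f, 0) >= min_results]

print(suggest_facets({"TITLE"}, "sf", {"seniority": 120, "location:sf": 40}))
```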

Query Understanding and

Rewriting

Query Understanding

Query Tagging

(Recognized entities: Names, titles, companies, schools, locations, skills)

Query Tagger

Sequential model trained on the following data:

● Emission probabilities (dictionary)

○ Profiles – Names, Titles, Schools, Locations

○ Standardized data – Companies, Skills

● Transition probabilities

○ Query logs

○ Tags for query tokens inferred based on result clicks
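As an illustration of the transition side, tag-to-tag probabilities can be estimated by counting adjacent tags in click-labeled queries. The tag sequences below are invented, `<s>` is an assumed start symbol, and no smoothing is applied.

```python
from collections import Counter, defaultdict

# Hypothetical tag sequences inferred from result clicks.
TAGGED_QUERIES = [
    ["TITLE", "COMPANY"],
    ["TITLE", "LOCATION"],
    ["TITLE", "COMPANY", "LOCATION"],
    ["NAME", "COMPANY"],
]

def transition_probabilities(sequences):
    """Return {previous tag: {next tag: P(next | previous)}}."""
    counts = defaultdict(Counter)
    for tags in sequences:
        # Pair each tag with its predecessor; "<s>" marks the query start.
        for prev, curr in zip(["<s>"] + tags, tags):
            counts[prev][curr] += 1
    return {
        prev: {tag: c / sum(nxt.values()) for tag, c in nxt.items()}
        for prev, nxt in counts.items()
    }

probs = transition_probabilities(TAGGED_QUERIES)
print(probs["<s>"])
print(probs["TITLE"])
```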

Query Tagger

Prediction:

1. Segmentation: Maximum likelihood using unigram/bigram counts

[data scientist] [linkedin] [mountain view]

2. Sequence labeling: Viterbi decoding

[TITLE] [COMPANY] [LOCATION]

3. Entity linking: Dictionary

[TITLE ID=435] [COMPANY ID=1337] [LOCATION ID=us:ca:mountain_view]
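Steps 2 and 3 can be sketched as below, taking the segmentation from step 1 as given. The emission and transition numbers are invented for illustration, `FLOOR` is an assumed back-off for unseen segments, and only the entity IDs come from the example above.

```python
import math

TAGS = ["TITLE", "COMPANY", "LOCATION"]
# Hypothetical model parameters (not from the talk).
EMISSION = {
    "TITLE": {"data scientist": 0.9},
    "COMPANY": {"linkedin": 0.8, "mountain view": 0.05},
    "LOCATION": {"mountain view": 0.9},
}
TRANSITION = {
    "<s>": {"TITLE": 0.6, "COMPANY": 0.3, "LOCATION": 0.1},
    "TITLE": {"TITLE": 0.1, "COMPANY": 0.5, "LOCATION": 0.4},
    "COMPANY": {"TITLE": 0.2, "COMPANY": 0.1, "LOCATION": 0.7},
    "LOCATION": {"TITLE": 0.4, "COMPANY": 0.4, "LOCATION": 0.2},
}
ENTITY_IDS = {
    ("TITLE", "data scientist"): 435,
    ("COMPANY", "linkedin"): 1337,
    ("LOCATION", "mountain view"): "us:ca:mountain_view",
}
FLOOR = 1e-6  # assumed probability for segments missing from the dictionary

def viterbi(segments):
    """Return the most likely tag sequence for pre-segmented query parts."""
    if not segments:
        return []
    def emit(tag, seg):
        return math.log(EMISSION[tag].get(seg, FLOOR))
    # best[tag] = (log-probability, path) of the best sequence ending in tag
    best = {t: (math.log(TRANSITION["<s>"][t]) + emit(t, segments[0]), [t]) for t in TAGS}
    for seg in segments[1:]:
        step = {}
        for t in TAGS:
            score, path = max(
                ((best[p][0] + math.log(TRANSITION[p][t]), best[p][1]) for p in TAGS),
                key=lambda item: item[0],
            )
            step[t] = (score + emit(t, seg), path + [t])
        best = step
    return max(best.values(), key=lambda item: item[0])[1]

def link(segments, tags):
    """Dictionary lookup of (tag, segment) pairs; None when no entity is known."""
    return [(tag, ENTITY_IDS.get((tag, seg))) for seg, tag in zip(segments, tags)]

segments = ["data scientist", "linkedin", "mountain view"]
tags = viterbi(segments)
print(tags)
print(link(segments, tags))
```

Note how the transition scores settle the ambiguous segment: “mountain view” has a small COMPANY emission in this toy dictionary, but LOCATION after COMPANY is the more likely sequence.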

Query Tagging

● Query tags used for ranking model selection

○ Name query => NAME MODEL

○ Title query, Skill query => TITLE MODEL

○ ...

● More precise matching with documents

[software engineer google new york]

is rewritten to

[TITLE:(software engineer) COMPANY:(google) GEO:(new york)]
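A small sketch of the rewrite step, assuming the tagger has already produced (tag, text) pairs. The tag-to-field mapping is an assumption based on the example above (LOCATION is written to a GEO field), and untagged text is passed through as plain keywords.

```python
# Hypothetical mapping from tagger output to index field names.
FIELD_FOR_TAG = {"TITLE": "TITLE", "COMPANY": "COMPANY", "LOCATION": "GEO"}

def rewrite(tagged_segments):
    """Turn [(tag, text), ...] into a fielded query string."""
    clauses = []
    for tag, text in tagged_segments:
        field = FIELD_FOR_TAG.get(tag)
        # Segments without a known field stay as free-text keywords.
        clauses.append(f"{field}:({text})" if field else text)
    return " ".join(clauses)

print(rewrite([
    ("TITLE", "software engineer"),
    ("COMPANY", "google"),
    ("LOCATION", "new york"),
]))
```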

Using query tags:

Entity-based filtering

BEFORE

Escape hatch

Query Expansion

Name synonyms

Job Title synonyms

Query Expansion

● Titles

○ Query reformulations

■ [programmer] => [software engineer] => CLICK

■ [lawyer] => [attorney] => CLICK

■ [attorney] => [legal counsel] => CLICK

● Names

○ Query reformulations

○ Dictionaries

■ bob == robert

■ beth == elizabeth

■ ...

Name Clustering

Name spelling variants

Two-step clustering:

1. Coarse clustering – metaphone

2. Finer clustering – edit distance, hand-written rules…

Each name is assigned to a cluster

NC_SRIRAM = {sriram, sreeram, sriraam, shriram, …}
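The two steps can be sketched as follows. Metaphone is not in the Python standard library, so a crude phonetic key (`coarse_key`) stands in for it; that key, the distance threshold of 2 and the made-up name "sarum" are all assumptions, and the hand-written rules are omitted.

```python
import re
from collections import defaultdict

def coarse_key(name):
    """Crude stand-in for metaphone: fold "sh", drop vowels, collapse repeats."""
    s = name.lower().replace("sh", "s")
    s = s[0] + re.sub(r"[aeiou]", "", s[1:])
    return re.sub(r"(.)\1+", r"\1", s)

def edit_distance(s, t):
    """Plain Levenshtein distance."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        curr = [i]
        for j, b in enumerate(t, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = curr
    return prev[-1]

def cluster_names(names, max_dist=2):
    """Step 1: bucket by phonetic key. Step 2: split buckets by edit distance."""
    buckets = defaultdict(list)
    for name in names:
        buckets[coarse_key(name)].append(name)
    clusters = {}
    for members in buckets.values():
        representatives = []
        for name in members:
            for rep in representatives:
                if edit_distance(name, rep) <= max_dist:
                    clusters["NC_" + rep.upper()].append(name)
                    break
            else:
                # No close representative: the name starts its own cluster.
                representatives.append(name)
                clusters["NC_" + name.upper()] = [name]
    return clusters

print(cluster_names(["sriram", "sreeram", "sriraam", "shriram", "sarum"]))
```

At query time, a name can then be expanded to every member of its cluster, or matched on the cluster ID directly.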

Summary

● Search assistance and guided search are critical for ensuring search success

○ Good query => good results

● High degree of structure in queries and documents (profiles, jobs, …)

○ Query understanding and document understanding are crucial

○ “Things not Strings” => entity-based retrieval

● Query understanding and rewriting play an important role in result set quality

○ A good initial set of documents simplifies the ranker’s job

○ Good result set => accurate facet counts

○ Allows for sorting options other than relevance (recency, number of connections, …)

Thank You!
