Named Entity Recognition in Query
Jiafeng Guo¹, Gu Xu², Xueqi Cheng¹, Hang Li²
¹ Institute of Computing Technology, CAS, China
² Microsoft Research Asia, China
Outline
• Problem Definition
• Potential Applications
• Challenges
• Our Approach
• Experimental Results
• Summary
Problem Definition
Named Entity Recognition in Query (NERQ): identify named entities in a query and assign them to predefined categories with probabilities.
Example class probabilities:
• "Harry Potter": Movie 0.5, Book 0.4, Game 0.1
• "Harry Potter Walkthrough": Movie 0.0, Book 0.0, Game 1.0
NERQ in Searching Structured Data
[Figure: the unstructured query "harry potter walkthrough" passes through the NERQ module and is dispatched to structured databases (Movies, Games, Books), such as instant answers, the local search index, and advertisements.]
• Smarter dispatch: this query prefers results from the "Games" database.
• Better ranking: "harry potter" should be used as the key to match records in the database, and the matches are further ranked by "walkthrough".
NERQ in Web Search
Search results can be improved if we know that the query "21 movie" indicates that the searcher wants the movie named "21".
Challenges
• NER (Named Entity Recognition)
  – Applied to well-formed documents (e.g. news articles)
  – Usually a supervised learning method based on a set of features
    • Context feature: whether "Mr." occurs before the word
    • Content feature: whether the first letter of the word is capitalized
• NERQ
  – Queries are short (2-3 words on average)
    • Fewer context features
  – Queries are not well-formed (typos, lowercase, …)
    • Fewer content features
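The contrast between documents and queries can be illustrated with a small sketch. The function name `token_features` and the two features are hypothetical, modeled only on the examples named on this slide; they are not the feature set of any particular NER system. The sketch suggests why both feature types tend to carry no signal on a short, lowercased query.

```python
def token_features(tokens, i):
    """Return two classic NER features for tokens[i].

    Hypothetical illustration: the features mirror the slide's examples only.
    """
    prev = tokens[i - 1] if i > 0 else None
    return {
        # Context feature: does the honorific "Mr." precede the word?
        "prev_is_mr": prev == "Mr.",
        # Content feature: is the first letter of the word capitalized?
        "is_capitalized": tokens[i][:1].isupper(),
    }


news = "Yesterday Mr. Potter arrived".split()
query = "harry potter walkthrough".split()

news_feats = token_features(news, 2)    # "Potter" in a well-formed sentence
query_feats = token_features(query, 1)  # "potter" in a lowercased query
print(news_feats)
print(query_feats)
```

In the news sentence both features fire for "Potter"; in the query neither does, so a feature-based tagger has little to work with.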
Our Approach to NERQ
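• The goal of NERQ becomes finding the best triple (e, t, c)* for query q satisfying

$$(e, t, c)^* = \arg\max_{(e,t,c)} p(q, e, t, c) = \arg\max_{(e,t,c) \in G(q)} p(e, t, c) = \arg\max_{(e,t,c) \in G(q)} p(e)\, p(c \mid e)\, p(t \mid c)$$

where G(q) is the set of triples (e, t, c) that can generate q.
• Example: Harry Potter Walkthrough
  – e = "Harry Potter" (named entity)
  – t = "# Walkthrough" (context)
  – c = "Game" (class)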
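The search for the best triple can be sketched as a brute-force enumeration. This is a minimal sketch, assuming a query contains a single named entity so that the candidate set G(q) consists of every contiguous word span as the entity, the remainder (with the span replaced by "#") as the context, and every class. The function names and all probability values are invented for illustration and are not taken from the slides.

```python
def candidate_triples(query, classes):
    """Enumerate G(q): every (entity, context, class) that can generate the query."""
    words = query.split()
    for start in range(len(words)):
        for end in range(start + 1, len(words) + 1):
            entity = " ".join(words[start:end])
            # The context is the query with the entity span replaced by "#".
            context = " ".join(words[:start] + ["#"] + words[end:])
            for c in classes:
                yield entity, context, c


def best_triple(query, p_e, p_c_given_e, p_t_given_c, classes):
    """Return (triple, score) maximizing p(e) * p(c|e) * p(t|c).

    Unknown entities, classes, or contexts get probability 0; the triple is
    None if every candidate scores 0.
    """
    best, best_score = None, 0.0
    for e, t, c in candidate_triples(query, classes):
        score = (p_e.get(e, 0.0)
                 * p_c_given_e.get(e, {}).get(c, 0.0)
                 * p_t_given_c.get(c, {}).get(t, 0.0))
        if score > best_score:
            best, best_score = (e, t, c), score
    return best, best_score


# Toy probabilities, invented for illustration only.
classes = ["Movie", "Book", "Game"]
p_e = {"harry potter": 0.01}
p_c_given_e = {"harry potter": {"Movie": 0.5, "Book": 0.4, "Game": 0.1}}
p_t_given_c = {
    "Movie": {"#": 0.3, "# walkthrough": 0.0},
    "Book": {"#": 0.3, "# walkthrough": 0.0},
    "Game": {"#": 0.2, "# walkthrough": 0.2},
}

triple, score = best_triple("harry potter walkthrough", p_e, p_c_given_e,
                            p_t_given_c, classes)
print(triple, score)
```

With these toy numbers the context "# walkthrough" only has mass under "Game", so the Game triple wins even though "Movie" has the highest prior p(c|e).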
Training With Topic Model
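• Ideal training data T = {(e_i, t_i, c_i)}

$$\max \prod_i p(e_i, t_i, c_i)$$

• Real training data T = {(e_i, t_i, *)}
  – Queries are ambiguous (e.g. "harry potter", "harry potter review")
  – Training data are relatively scarce

$$\max \prod_i \sum_c p(e_i, t_i, c) = \max \prod_i \sum_c p(e_i)\, p(c \mid e_i)\, p(t_i \mid c) = \max \prod_e \prod_{i:\, e_i = e} \sum_c p(e)\, p(c \mid e)\, p(t_i \mid c)$$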
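Because the class is unobserved in the real training data, the objective sums over all classes for each (entity, context) pair. A minimal sketch of that marginal likelihood follows, computed in log space (maximizing the product is equivalent to maximizing the sum of logs). The function name and the toy probability tables are assumptions for illustration.

```python
import math


def log_likelihood(pairs, p_e, p_c_given_e, p_t_given_c):
    """Sum over (e_i, t_i) pairs of log sum_c p(e_i) * p(c|e_i) * p(t_i|c)."""
    total = 0.0
    for e, t in pairs:
        # Marginalize out the hidden class c.
        marginal = sum(
            p_e[e] * p_c * p_t_given_c[c].get(t, 0.0)
            for c, p_c in p_c_given_e[e].items()
        )
        total += math.log(marginal)
    return total


# Toy data, invented for illustration: one entity seen with two contexts.
pairs = [("harry potter", "# walkthrough"), ("harry potter", "# book price")]
p_e = {"harry potter": 1.0}
p_c_given_e = {"harry potter": {"Game": 0.5, "Book": 0.5}}
p_t_given_c = {
    "Game": {"# walkthrough": 1.0},
    "Book": {"# book price": 1.0},
}

ll = log_likelihood(pairs, p_e, p_c_given_e, p_t_given_c)
print(ll)
```

Each pair here has marginal probability 0.5, so the total is 2 · log 0.5. A training procedure would adjust the three tables to increase this value.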
Training With Topic Model (cont.)
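$$\max \prod_e \prod_{i:\, e_i = e} \sum_c p(e)\, p(c \mid e)\, p(t_i \mid c)$$

[Figure: topic-model view of the training data. Named entities e (harry potter, kung fu panda, iron man, …) are linked to contexts t (# wallpapers, # movies, # walkthrough, # book price, …) through classes c (Movie, Game, Book, …), which play the role of topics.]
Note: # is a placeholder for the named entity; here # stands for "harry potter".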
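The contexts of an entity can be collected from a query log by replacing the entity with the placeholder "#". The sketch below is one plausible way to do this, not the authors' pipeline: the function name `context_document`, the whole-word matching rule, the lowercasing, and the sample queries are all assumptions.

```python
from collections import Counter


def context_document(entity, queries):
    """Count the contexts in which `entity` appears across `queries`.

    A query contributes one context if it contains the entity as a
    contiguous whole-word phrase; the context is the query with that
    phrase replaced by "#".
    """
    entity_words = entity.lower().split()
    n = len(entity_words)
    contexts = Counter()
    for query in queries:
        words = query.lower().split()
        for start in range(len(words) - n + 1):
            if words[start:start + n] == entity_words:
                context = " ".join(words[:start] + ["#"] + words[start + n:])
                contexts[context] += 1
                break  # use only the first match in each query
    return contexts


# Hypothetical query log.
log = [
    "harry potter walkthrough",
    "harry potter wallpapers",
    "Harry Potter walkthrough",
    "iron man movies",
]
doc = context_document("harry potter", log)
print(doc)
```

The resulting bag of contexts can then be treated like a bag of words in a topic model, with one such "document" per entity.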
Weakly Supervised Topic Model
• Introducing supervision
  – Supervision is always better
  – Aligns implicit topics with explicit classes
• Weak supervision
  – Label named entities rather than queries (document class labels)
  – Multiple class labels per entity (binary indicator)
[Figure: "Kung Fu Panda" is labeled with classes among Movie, Game, and Book, but its distribution over those classes is unknown (??).]
Weakly Supervised LDA (WS-LDA)
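• LDA + soft constraints (w.r.t. supervision)

$$O(w \mid y, \Theta) = \underbrace{\log p(w \mid \Theta)}_{\text{LDA probability}} + \underbrace{\lambda\, C(y, \Theta)}_{\text{soft constraints}}$$

• Soft constraints

$$C(y, \Theta) = \sum_i y_i\, \bar{z}_i$$

  – $\bar{z}_i$: document probability on the i-th class
  – $y_i$: document binary label on the i-th class (e.g. y = (1, 1, 0, 0) over four topics)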
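The effect of the soft constraint can be seen numerically. The sketch below only evaluates the objective for given inputs; it does not perform LDA inference. The value of log p(w | Θ), the weight `lam`, and the two topic distributions are hypothetical numbers chosen for illustration.

```python
def soft_constraint(y, z_bar):
    """C(y, Theta) = sum_i y_i * z_bar_i.

    y      binary labels, one per class
    z_bar  document probability on each class (should sum to 1)
    """
    if len(y) != len(z_bar):
        raise ValueError("y and z_bar must have one entry per class")
    return sum(y_i * z_i for y_i, z_i in zip(y, z_bar))


def ws_lda_objective(log_p_w, y, z_bar, lam):
    """log p(w | Theta) + lambda * C(y, Theta) for a single document."""
    return log_p_w + lam * soft_constraint(y, z_bar)


y = [1, 1, 0, 0]                    # document labeled with the first two classes
aligned = [0.5, 0.5, 0.0, 0.0]      # topic mass sits on the labeled classes
misaligned = [0.0, 0.0, 0.5, 0.5]   # topic mass sits on unlabeled classes

c_aligned = soft_constraint(y, aligned)
c_misaligned = soft_constraint(y, misaligned)
objective = ws_lda_objective(-10.0, y, aligned, lam=2.0)
print(c_aligned, c_misaligned, objective)
```

The constraint term is largest when the document's topic mass falls entirely on its labeled classes, which is how the labels pull the implicit topics toward the explicit classes without forbidding other assignments outright.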
System Flow Chart
• Offline
  – Input: a set of named entities with labels (seeds)
  – Create a "context" document for each seed and train WS-LDA; output: contexts with p(t|c)
  – Find new named entities using the obtained contexts, and estimate p(c|e) using WS-LDA, and p(e); output: entities with p(e) and p(c|e)
• Online
  – Input: query
  – Evaluate each possible triple (e, t, c)
  – Output: results
Experimental Results
• Data set
  – Query log data
    • Over 6 billion queries and 930 million unique queries
    • About 12 million unique queries
  – Seed named entities
    • 180 named entities labeled with four classes
    • 120 named entities are used for training and 60 for testing
Experimental Results (cont.)
• NERQ Precision
Experimental Results (cont.)
• Named entity retrieval and ranking
  – Class distribution
    • Aggregation of seed context distributions (Pasca, WWW07)
    • p(t|c) from the WS-LDA model
  – q(t|e) as the entity distribution
  – Jensen-Shannon similarity between p(t|c) and q(t|e)
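Ranking candidate entities requires comparing two context distributions. The sketch below computes the Jensen-Shannon divergence with base-2 logarithms, which lies in [0, 1]. The slides do not say how the divergence is turned into a similarity, so `js_similarity = 1 - divergence` is an assumption, as are the function names and the toy distributions.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; p and q share keys, and q[k] > 0 wherever p[k] > 0."""
    return sum(p[k] * math.log2(p[k] / q[k]) for k in p if p[k] > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two distributions given as dicts."""
    keys = set(p) | set(q)
    p_full = {k: p.get(k, 0.0) for k in keys}
    q_full = {k: q.get(k, 0.0) for k in keys}
    # The mixture m is positive wherever either input is positive.
    m = {k: 0.5 * (p_full[k] + q_full[k]) for k in keys}
    return 0.5 * kl_divergence(p_full, m) + 0.5 * kl_divergence(q_full, m)


def js_similarity(p, q):
    """Assumed conversion: 1 minus the base-2 JS divergence."""
    return 1.0 - js_divergence(p, q)


# Toy context distributions, invented for illustration.
p_t_given_game = {"# walkthrough": 0.6, "# cheats": 0.4}
q_t_given_entity = {"# walkthrough": 0.5, "# book price": 0.5}

sim = js_similarity(p_t_given_game, q_t_given_entity)
print(sim)
```

Identical distributions give a similarity of 1 and distributions with disjoint support give 0, so entities whose contexts resemble a class's contexts rank higher for that class.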
Experimental Results (cont.)
• Comparison with LDA
  – Class likelihood of e:

$$\sum_{i=1}^{K} y_i\, p(c_i \mid e)$$
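The class likelihood is the probability mass that a model places on an entity's labeled classes. A minimal sketch, with a hypothetical function name and toy values:

```python
def class_likelihood(y, p_c_given_e):
    """Sum over the K classes of y_i * p(c_i | e)."""
    if len(y) != len(p_c_given_e):
        raise ValueError("need one label and one probability per class")
    return sum(y_i * p_i for y_i, p_i in zip(y, p_c_given_e))


# Toy example: an entity labeled with classes 1 and 3 out of K = 4.
y = [1, 0, 1, 0]
p_c_given_e = [0.5, 0.25, 0.25, 0.0]
likelihood = class_likelihood(y, p_c_given_e)
print(likelihood)
```

A value near 1 means the learned p(c|e) agrees with the human labels for that entity.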
Summary
• We first proposed the problem of named entity recognition in query.
• We formalized the problem as a probabilistic problem that can be solved with a topic model.
• We devised weakly supervised LDA to incorporate human supervision into training.
• The experimental results indicate that the proposed approach can accurately perform NERQ, and outperforms other baseline methods.
THANKS!