![Page 1: Named Entity Recognition at Scale with Deep Learning](https://reader030.vdocument.in/reader030/viewer/2022041219/625141fa99234b49d669c078/html5/thumbnails/1.jpg)
Named Entity Recognition at Scale with Deep Learning
Sijun He, @SijunHe, #TwitterCortex at #ODSCWest
![Page 2: Named Entity Recognition at Scale with Deep Learning](https://reader030.vdocument.in/reader030/viewer/2022041219/625141fa99234b49d669c078/html5/thumbnails/2.jpg)
Introduction
Sijun He, @SijunHe
ML Engineer II, Twitter Cortex
![Page 3: Named Entity Recognition at Scale with Deep Learning](https://reader030.vdocument.in/reader030/viewer/2022041219/625141fa99234b49d669c078/html5/thumbnails/3.jpg)
Agenda
1. NER on Tweets
2. Data
3. Model
4. Confidence Estimation
5. System Overview
![Page 4: Named Entity Recognition at Scale with Deep Learning](https://reader030.vdocument.in/reader030/viewer/2022041219/625141fa99234b49d669c078/html5/thumbnails/4.jpg)
![Page 5: Named Entity Recognition at Scale with Deep Learning](https://reader030.vdocument.in/reader030/viewer/2022041219/625141fa99234b49d669c078/html5/thumbnails/5.jpg)
Named Entity Recognition (NER) on Tweets
Entity types: Person, Location, Organization, Product, Other
![Page 6: Named Entity Recognition at Scale with Deep Learning](https://reader030.vdocument.in/reader030/viewer/2022041219/625141fa99234b49d669c078/html5/thumbnails/6.jpg)
Application of NER: Trends
![Page 7: Named Entity Recognition at Scale with Deep Learning](https://reader030.vdocument.in/reader030/viewer/2022041219/625141fa99234b49d669c078/html5/thumbnails/7.jpg)
Application of NER: Events Detection
[Fedoryszak et al., 2019]
![Page 8: Named Entity Recognition at Scale with Deep Learning](https://reader030.vdocument.in/reader030/viewer/2022041219/625141fa99234b49d669c078/html5/thumbnails/8.jpg)
Application of NER: User Interest
Last Engagements
Twitter (9), US (9), China (7), HK (7), Google (3), LinkedIn (3), Stanford CoreNLP (2), Jeremy Lin (2), Manchester United (1)
Entity types: Person, Location, Organization, Product, Other
![Page 9: Named Entity Recognition at Scale with Deep Learning](https://reader030.vdocument.in/reader030/viewer/2022041219/625141fa99234b49d669c078/html5/thumbnails/9.jpg)
Why in-house NER?
● Strategic: Gauge of information extraction and content understanding at Twitter
● Unique linguistic features of tweets
  ○ Limited context due to brevity
  ○ Abbreviation
  ○ Typos
  ○ Informal language
  ○ Temporality
  ○ ...
● Cost of 3rd party Cloud API at production volume
![Page 10: Named Entity Recognition at Scale with Deep Learning](https://reader030.vdocument.in/reader030/viewer/2022041219/625141fa99234b49d669c078/html5/thumbnails/10.jpg)
Example of NER on Tweet
Google Natural Language API
Our Model
SpaCy (Open-source)
![Page 11: Named Entity Recognition at Scale with Deep Learning](https://reader030.vdocument.in/reader030/viewer/2022041219/625141fa99234b49d669c078/html5/thumbnails/11.jpg)
![Page 12: Named Entity Recognition at Scale with Deep Learning](https://reader030.vdocument.in/reader030/viewer/2022041219/625141fa99234b49d669c078/html5/thumbnails/12.jpg)
Generating Training Data
Data Cleaning
● Process character labels into token labels to train NER model
● Regular removal of deleted tweets (GDPR)
Sampling
● Stratified sampling based on tweet engagement
● Long period of time to capture temporal signal
Labeling
● Character-based labeling on crowdsourced labeling platform
  ○ Person
  ○ Location
  ○ Organization
  ○ Product
  ○ Other
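The data cleaning step above turns character-level labels into token-level labels. A minimal sketch of that conversion follows. The function name `char_spans_to_bio`, the `(start, end, label)` span format, and the whitespace tokenizer are all assumptions made for illustration, not details from the talk.

```python
import re

def char_spans_to_bio(text, spans):
    """Convert character-level entity spans into per-token BIO tags.

    spans: list of (start, end, label) tuples, end exclusive (assumed format).
    Tokens are whitespace-delimited, which is a simplifying assumption.
    """
    tokens, tags = [], []
    for match in re.finditer(r"\S+", text):
        tok_start, tok_end = match.span()
        tag = "O"
        for start, end, label in spans:
            # A token belongs to a span if the two character ranges overlap.
            if tok_start < end and tok_end > start:
                # The token covering the span start gets B-, later ones get I-.
                tag = ("B-" if tok_start <= start else "I-") + label
                break
        tokens.append(match.group())
        tags.append(tag)
    return tokens, tags

tokens, tags = char_spans_to_bio(
    "John lives in San Jose", [(0, 4, "Per"), (14, 22, "Loc")]
)
print(list(zip(tokens, tags)))
```

A production tokenizer for tweets would also need to handle hashtags, mentions, and punctuation attached to words, which this sketch ignores.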
![Page 13: Named Entity Recognition at Scale with Deep Learning](https://reader030.vdocument.in/reader030/viewer/2022041219/625141fa99234b49d669c078/html5/thumbnails/13.jpg)
![Page 14: Named Entity Recognition at Scale with Deep Learning](https://reader030.vdocument.in/reader030/viewer/2022041219/625141fa99234b49d669c078/html5/thumbnails/14.jpg)
NER Model Setup
John lives in San Jose
B-Per O O B-Loc I-Loc
Model
B - Beginning token of an entity
I - Inside token of an entity
O - Not an entity
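To make the B/I/O scheme concrete, here is a small sketch that groups a tagged token sequence back into entities. The helper name `bio_to_entities` is hypothetical, and treating a stray `I-` tag as the start of a new entity is one possible convention, not something the slides specify.

```python
def bio_to_entities(tokens, tags):
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs."""
    entities = []
    current_tokens, current_type = [], None
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != current_type):
            # Start a new entity. An I- tag without a matching open entity
            # is treated as a start (an assumed convention).
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], label
        elif prefix == "I":
            current_tokens.append(token)
        else:
            # "O" closes any open entity.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        entities.append((" ".join(current_tokens), current_type))
    return entities

entities = bio_to_entities(
    "John lives in San Jose".split(),
    ["B-Per", "O", "O", "B-Loc", "I-Loc"],
)
print(entities)  # [('John', 'Per'), ('San Jose', 'Loc')]
```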
![Page 15: Named Entity Recognition at Scale with Deep Learning](https://reader030.vdocument.in/reader030/viewer/2022041219/625141fa99234b49d669c078/html5/thumbnails/15.jpg)
Model Architectures
Conditional Random Field
[Lafferty et al., 2001]
Deep Learning Architectures
[Li et al., 2018]
Fine-tuned Language Models
[Devlin et al., 2019]
![Page 16: Named Entity Recognition at Scale with Deep Learning](https://reader030.vdocument.in/reader030/viewer/2022041219/625141fa99234b49d669c078/html5/thumbnails/16.jpg)
Conditional Random Field (CRF)
[Lafferty et al., 2001]
Observed State: John lives in San Jose .
Hidden State: B-Per O O B-Loc I-Loc O
● Discriminative analog to Hidden Markov Model (HMM)
● Models local context with transition matrix
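For reference, a common way to write a linear-chain CRF over a tag sequence $y$ given tokens $x$ is shown here. The symbols $E_t$ (per-token tag score) and $A$ (transition matrix) are notation chosen for this note, not taken from the slides; [Lafferty et al., 2001] state the model in terms of general feature functions.

```latex
p(y \mid x) = \frac{\exp\big(s(x, y)\big)}{Z(x)},
\qquad
s(x, y) = \sum_{t=1}^{T} E_t(y_t) + \sum_{t=2}^{T} A_{y_{t-1},\, y_t},
\qquad
Z(x) = \sum_{y'} \exp\big(s(x, y')\big)
```

The normalizer $Z(x)$ sums over every possible tag sequence, which is what makes the model discriminative over whole sequences rather than over individual tokens.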
![Page 17: Named Entity Recognition at Scale with Deep Learning](https://reader030.vdocument.in/reader030/viewer/2022041219/625141fa99234b49d669c078/html5/thumbnails/17.jpg)
CRF Transition Matrix
[Figure: CRF transition matrix; axes: From (tag), To (tag)]
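One way to read a transition matrix: each cell scores moving from one tag to the next, and under the BIO scheme some moves are structurally invalid (for example `O` followed by `I-Loc`), so a trained CRF would typically assign them low scores. The sketch below enumerates which transitions are valid. The tag set and the function name `allowed_transition` are illustrative assumptions.

```python
def allowed_transition(prev_tag, next_tag):
    """Return True if next_tag may follow prev_tag under the BIO scheme."""
    if not next_tag.startswith("I-"):
        return True  # O and B-* can follow any tag
    entity_type = next_tag[2:]
    # I-X is only valid right after B-X or I-X of the same type.
    return prev_tag in ("B-" + entity_type, "I-" + entity_type)

tags = ["O", "B-Per", "I-Per", "B-Loc", "I-Loc"]
mask = {(p, n): allowed_transition(p, n) for p in tags for n in tags}

# Print the From/To grid: 1 = valid transition, 0 = invalid.
for prev_tag in tags:
    print(prev_tag.ljust(6), [int(mask[(prev_tag, n)]) for n in tags])
```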
![Page 18: Named Entity Recognition at Scale with Deep Learning](https://reader030.vdocument.in/reader030/viewer/2022041219/625141fa99234b49d669c078/html5/thumbnails/18.jpg)
Deep Learning Architectures
[Li et al., 2018]
Input Layer: Word Embedding, Character Embedding, Hand-crafted Features...
Context Layer: CNN, RNN, LSTM, Transformer, Attention...
Decode Layer: MLP+Softmax, CRF...
![Page 19: Named Entity Recognition at Scale with Deep Learning](https://reader030.vdocument.in/reader030/viewer/2022041219/625141fa99234b49d669c078/html5/thumbnails/19.jpg)
Char-BiLSTM-CRF
Input Layer: Word Representation, Character Representation, Other Features
Context Layer: Bidirectional LSTM
Decode Layer: CRF
![Page 20: Named Entity Recognition at Scale with Deep Learning](https://reader030.vdocument.in/reader030/viewer/2022041219/625141fa99234b49d669c078/html5/thumbnails/20.jpg)
Character Representations
[Li et al., 2018]
![Page 21: Named Entity Recognition at Scale with Deep Learning](https://reader030.vdocument.in/reader030/viewer/2022041219/625141fa99234b49d669c078/html5/thumbnails/21.jpg)
Decoder
[Li et al., 2018]
![Page 22: Named Entity Recognition at Scale with Deep Learning](https://reader030.vdocument.in/reader030/viewer/2022041219/625141fa99234b49d669c078/html5/thumbnails/22.jpg)
Fine-tuning Pre-trained LM (e.g. BERT)
Fine-tuning
[Devlin et al., 2019]
![Page 23: Named Entity Recognition at Scale with Deep Learning](https://reader030.vdocument.in/reader030/viewer/2022041219/625141fa99234b49d669c078/html5/thumbnails/23.jpg)
Performance on CoNLL 2003
Source: nlp-progress

| Model Type | Performance (F1) |
| --- | --- |
| CRF | ~0.85 |
| BiLSTM-CRF | ~0.92 |
| BERT large | ~0.93 |
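CoNLL 2003 scores are entity-level F1: a predicted entity counts as correct only if both its boundaries and its type match the gold annotation exactly. A minimal sketch of that metric follows, assuming entities are represented as `(start_token, end_token, type)` tuples; the representation and the function name are illustrative choices.

```python
def entity_f1(gold, predicted):
    """Entity-level F1 with exact match on boundaries and type."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = {(0, 1, "Per"), (3, 5, "Loc")}
pred = {(0, 1, "Per"), (3, 4, "Loc")}  # second entity has a wrong boundary
score = entity_f1(gold, pred)
print(score)  # 0.5
```

The partially correct location counts as both a false positive and a false negative, which is why boundary errors are costly under this metric.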
![Page 24: Named Entity Recognition at Scale with Deep Learning](https://reader030.vdocument.in/reader030/viewer/2022041219/625141fa99234b49d669c078/html5/thumbnails/24.jpg)
![Page 25: Named Entity Recognition at Scale with Deep Learning](https://reader030.vdocument.in/reader030/viewer/2022041219/625141fa99234b49d669c078/html5/thumbnails/25.jpg)
Confidence Estimation
Input: Sijun He is in San Jose, CA
NER Model
Tokens: Sijun He is in San Jose , CA
Tags: B-Per I-Per O O B-Loc I-Loc I-Loc I-Loc
Output:
● Sijun He (Person, 0.99)
● San Jose, CA (Location, 0.97)
![Page 26: Named Entity Recognition at Scale with Deep Learning](https://reader030.vdocument.in/reader030/viewer/2022041219/625141fa99234b49d669c078/html5/thumbnails/26.jpg)
Confidence Estimation
0.9 0.6 B-Loc I-Loc
San Jose is in California
NER Model
● Softmax decoder computes token confidence
● CRF decoder only computes the confidence for the whole sentence
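With a softmax decoder, each token gets its own probability distribution over tags, so the probability of the chosen tag can serve as a token-level confidence. A small sketch follows, with made-up scores and hypothetical function names.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def token_confidences(token_scores, tag_names):
    """Return (best_tag, probability) for each token's score vector."""
    results = []
    for scores in token_scores:
        probs = softmax(scores)
        best = max(range(len(probs)), key=probs.__getitem__)
        results.append((tag_names[best], probs[best]))
    return results

tag_names = ["B-Loc", "I-Loc", "O"]
# Illustrative raw scores for the tokens "San" and "Jose" (not real model output).
token_scores = [[3.0, 0.5, 0.2], [0.4, 1.5, 0.9]]
for tag, confidence in token_confidences(token_scores, tag_names):
    print(tag, round(confidence, 2))
```

These per-token numbers still have to be combined into one entity-level score, and a CRF decoder does not expose them directly.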
![Page 27: Named Entity Recognition at Scale with Deep Learning](https://reader030.vdocument.in/reader030/viewer/2022041219/625141fa99234b49d669c078/html5/thumbnails/27.jpg)
Confidence Estimation with CRF
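[Culotta et al., 2004]
Constrained Forward-Backward Algorithm
[Figure: two tag lattices (columns B, I, O) over the tokens "Jane Doe went to Paris ."; panels: Total Likelihood, Constrained Total Likelihood]
Entity: Jane Doe
Constraints: (Jane, B), (Doe, I)
● Total Likelihood: find the total likelihood of all possible sequences, a.k.a. the normalizer
● Constrained Total Likelihood: the total likelihood of all sequences that satisfy the constraints
● Compute the marginal probability of the entity as the ratio of the constrained total likelihood to the total likelihood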
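The ratio described on this slide can be sketched with the forward pass alone: run it once over all sequences to get the normalizer, once more with the constrained positions pinned to their tags, and divide. The code below is a minimal pure-Python sketch in the spirit of [Culotta et al., 2004], not the production implementation. The function names, the `{position: tag_index}` constraint format, and the example scores are assumptions.

```python
import math

NEG_INF = float("-inf")

def logsumexp(values):
    highest = max(values)
    if highest == NEG_INF:
        return NEG_INF
    return highest + math.log(sum(math.exp(v - highest) for v in values))

def log_total_score(emissions, transitions, constraints=None):
    """Forward algorithm: log of the summed exp-scores of all tag sequences.

    emissions[t][j]: score of tag j at position t.
    transitions[i][j]: score of moving from tag i to tag j.
    constraints: optional {position: tag_index}; sequences violating a
    constraint are excluded from the sum.
    """
    constraints = constraints or {}
    num_tags = len(transitions)

    def allowed(t, j):
        return constraints.get(t, j) == j

    alpha = [emissions[0][j] if allowed(0, j) else NEG_INF for j in range(num_tags)]
    for t in range(1, len(emissions)):
        alpha = [
            logsumexp([alpha[i] + transitions[i][j] for i in range(num_tags)])
            + emissions[t][j]
            if allowed(t, j) else NEG_INF
            for j in range(num_tags)
        ]
    return logsumexp(alpha)

def entity_confidence(emissions, transitions, constraints):
    """Constrained total likelihood divided by the total likelihood."""
    return math.exp(
        log_total_score(emissions, transitions, constraints)
        - log_total_score(emissions, transitions)
    )

B, I, O = 0, 1, 2
# Illustrative scores for "Jane Doe went to Paris ." (not real model output).
emissions = [[2.0, 0.0, 0.5], [0.1, 2.0, 0.3], [0.0, 0.0, 2.0],
             [0.0, 0.0, 2.0], [2.0, 0.0, 0.5], [0.0, 0.0, 2.0]]
transitions = [[0.0, 1.0, 0.5], [0.0, 0.5, 0.5], [0.5, -2.0, 0.5]]
confidence = entity_confidence(emissions, transitions, {0: B, 1: I})
print(confidence)
```

To pin down the entity's right boundary as well, one could additionally forbid the `I` tag at the position after the entity; this sketch only supports the equality constraints listed on the slide.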
![Page 28: Named Entity Recognition at Scale with Deep Learning](https://reader030.vdocument.in/reader030/viewer/2022041219/625141fa99234b49d669c078/html5/thumbnails/28.jpg)
![Page 29: Named Entity Recognition at Scale with Deep Learning](https://reader030.vdocument.in/reader030/viewer/2022041219/625141fa99234b49d669c078/html5/thumbnails/29.jpg)
System Overview
Model Endpoint Proxy
English NER
Spanish NER
Japanese NER
...
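The proxy in this diagram sits in front of one model per language. A minimal sketch of that routing idea follows, assuming requests carry a language code and that unsupported languages yield no entities. The class name, the language codes, and the stand-in model functions are all hypothetical.

```python
class ModelEndpointProxy:
    """Routes each request to the NER model registered for its language."""

    def __init__(self, models):
        # models: {language_code: callable(text) -> list of entities}
        self._models = dict(models)

    def predict(self, text, language):
        model = self._models.get(language)
        if model is None:
            return []  # assumed behaviour for unsupported languages
        return model(text)

# Stand-in models for illustration only.
proxy = ModelEndpointProxy({
    "en": lambda text: [("San Jose", "Location")],
    "es": lambda text: [("Madrid", "Location")],
})
print(proxy.predict("Sijun He is in San Jose", "en"))
print(proxy.predict("unsupported text", "xx"))
```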
![Page 30: Named Entity Recognition at Scale with Deep Learning](https://reader030.vdocument.in/reader030/viewer/2022041219/625141fa99234b49d669c078/html5/thumbnails/30.jpg)
System Overview
Components: Tweet Creation, Model Endpoint, Cache, Scribe, HDFS, Online Clients, Offline Clients
Flows: Put, Read, Cache miss
System Read RPS: 120k rps
Model Inference RPS: 10k rps
Model Latency p99: 20 ms
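One plausible reading of this diagram is a read-through cache: results are put into the cache when a tweet is created, online clients read from the cache, and a cache miss falls through to the model endpoint. The sketch below illustrates that pattern only. The class `NerCache`, its methods, and the in-memory dictionary are assumptions and do not describe the actual system.

```python
class NerCache:
    """In-memory stand-in for a read-through cache in front of a model."""

    def __init__(self, model_endpoint):
        self._model_endpoint = model_endpoint  # callable(text) -> entities
        self._store = {}
        self.model_calls = 0

    def put(self, tweet_id, entities):
        """Store precomputed entities, e.g. at tweet creation time."""
        self._store[tweet_id] = entities

    def read(self, tweet_id, text):
        """Serve from cache; on a miss, run the model and cache the result."""
        if tweet_id in self._store:
            return self._store[tweet_id]
        self.model_calls += 1
        entities = self._model_endpoint(text)
        self._store[tweet_id] = entities
        return entities

cache = NerCache(lambda text: [("San Jose", "Location")])
cache.read(1, "Sijun He is in San Jose")  # miss: calls the model
cache.read(1, "Sijun He is in San Jose")  # hit: served from the cache
print(cache.model_calls)  # 1
```

A design like this would keep most reads away from the model, which is consistent with the read rate on the slide being much higher than the model inference rate.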
![Page 31: Named Entity Recognition at Scale with Deep Learning](https://reader030.vdocument.in/reader030/viewer/2022041219/625141fa99234b49d669c078/html5/thumbnails/31.jpg)
Named Entities in External Articles
● External articles are one of the core pieces of public conversation on Twitter
● Run NER on each article's title and short snippet
● Significant upside in entity signal coverage
No Named Entities in Tweet
Named Entities in the Linked Article:
● Brunswick, GA
● Detroit
● Lions
● Georgia
Entity types: Person, Location, Organization, Product, Other
![Page 32: Named Entity Recognition at Scale with Deep Learning](https://reader030.vdocument.in/reader030/viewer/2022041219/625141fa99234b49d669c078/html5/thumbnails/32.jpg)
Future Work
● Language-specific Model Architecture
● Multilingual Model
● Active Learning for Data Efficiency
![Page 33: Named Entity Recognition at Scale with Deep Learning](https://reader030.vdocument.in/reader030/viewer/2022041219/625141fa99234b49d669c078/html5/thumbnails/33.jpg)
Reference
● Mateusz Fedoryszak, Brent Frederick, Vijay Rajaram and Changtao Zhong, Real-time Event Detection on Social Data Streams, KDD 2019
● John Lafferty, Andrew McCallum and Fernando C.N. Pereira, Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, ICML 2001
● Jing Li, Aixin Sun, Jianglei Han and Chenliang Li, A Survey on Deep Learning for Named Entity Recognition
● Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, NAACL-HLT 2019
● NLP Progress
● Aron Culotta and Andrew McCallum, Confidence Estimation for Information Extraction, HLT-NAACL 2004