Wei Shen†, Jianyong Wang†, Ping Luo‡, Min Wang‡
†Tsinghua University, Beijing, China
‡HP Labs China, Beijing, China
WWW 2012
Presented by Tom Chao Zhou
July 17, 2012
Outline
  Motivation
  Problem Definition
  Previous Methods
  LINDEN Framework
  Experiments
  Conclusion
Motivation
  Many large-scale knowledge bases have emerged
    DBpedia, YAGO, Freebase (www.freebase.com), etc.
  As the world evolves
    New facts come into existence
    They are digitally expressed on the Web
  Maintaining and growing the existing knowledge bases
    Integrating the extracted facts with the knowledge base
  Challenges
    Name variations
      “National Basketball Association” and “NBA”
      “New York City” and “Big Apple”
    Entity ambiguity
      “Michael Jordan” may refer to the NBA player, the Berkeley professor, … …
Problem Definition
  Entity linking task
    Input:
      A textual named entity mention m, already recognized in the unstructured text
    Output:
      The corresponding real-world entity e in the knowledge base
      If the matching entity e for entity mention m does not exist in the knowledge base, we should return NIL for m
Entity linking task
  Example text: German Chancellor Angela Merkel and her husband Joachim Sauer went to Ulm, Germany.
  Figure 1: An example of YAGO (the figure also shows a mention linked to NIL)
  Source: From Information to Knowledge: Harvesting Entities and Relationships from Web Sources. PODS’10.
Previous Methods
  Essential step of entity linking
    Define a similarity measure between the text around the entity mention and the document associated with the entity
  Bag-of-words model
    Represent the context as a term vector
    Measure the co-occurrence statistics of terms
    Cannot capture the semantic knowledge
  Example
    Text: Michael Jordan wins NBA championship.
    The bag-of-words model cannot work well!
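To make the limitation concrete, here is a minimal sketch that scores the example sentence against two candidate descriptions using cosine similarity over raw term counts. The two description strings are invented for illustration and are not taken from Wikipedia or from the slides.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Lowercase, strip simple punctuation, and count terms."""
    return Counter(text.lower().replace(".", " ").replace(",", " ").split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

context = bag_of_words("Michael Jordan wins NBA championship.")

# Hypothetical one-line descriptions of the two candidate entities.
player = bag_of_words("Michael Jordan is a former basketball player for the Chicago Bulls.")
professor = bag_of_words("Michael Jordan is a professor of machine learning at Berkeley.")

score_player = cosine(context, player)
score_professor = cosine(context, professor)
```

In this toy setup the only shared terms are the two name tokens, so "NBA" and "championship" contribute nothing. The shorter professor description even scores slightly higher than the basketball one, purely because of its length.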
LINDEN Framework
  Candidate Entity Generation
    For each named entity mention m
    Retrieve the set of candidate entities E_m
  Named Entity Disambiguation
    For each candidate entity e ∈ E_m
    Define a scoring measure
    Give a rank to E_m
  Unlinkable Mention Prediction
    For each e_top which has the highest score in E_m
    Validate whether the entity e_top is the target entity for mention m
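The three modules can be read as one small pipeline. The sketch below is only an illustration of that control flow: the function name `link_mention`, the toy dictionary, the toy scores, and the use of `None` to stand in for NIL are all assumptions, not part of LINDEN itself.

```python
from typing import Callable, Dict, List, Optional

def link_mention(
    mention: str,
    dictionary: Dict[str, List[str]],
    score: Callable[[str, str], float],
    threshold: float,
) -> Optional[str]:
    """Return the linked entity for `mention`, or None (standing in for NIL)."""
    # Step 1: candidate entity generation.
    candidates = dictionary.get(mention, [])
    if not candidates:
        return None
    # Step 2: named entity disambiguation (rank by score, best first).
    ranked = sorted(candidates, key=lambda e: score(mention, e), reverse=True)
    e_top = ranked[0]
    # Step 3: unlinkable mention prediction.
    if score(mention, e_top) < threshold:
        return None
    return e_top

# Toy inputs, invented for illustration.
toy_dictionary = {"Michael Jordan": ["Michael J. Jordan", "Michael I. Jordan"]}
toy_scores = {"Michael J. Jordan": 0.9, "Michael I. Jordan": 0.3}

def toy_score(mention: str, entity: str) -> float:
    return toy_scores[entity]

linked = link_mention("Michael Jordan", toy_dictionary, toy_score, threshold=0.5)
unknown = link_mention("Some Unknown Name", toy_dictionary, toy_score, threshold=0.5)
```

Passing the scoring function in as a parameter keeps the pipeline independent of how the features are combined.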
Candidate Entity Generation
  Intuitively, the candidates in E_m should have the name of the surface form of m.
  We build a dictionary that contains a vast amount of information about the surface forms of entities
    Such as name variations, abbreviations, confusable names, spelling variations, nicknames, etc.
  Leverage the four structures of Wikipedia
    Entity pages
    Redirect pages
    Disambiguation pages
    Hyperlinks in Wikipedia articles
Candidate Entity Generation (Cont’d)
  For each mention m
    Search it in the field of surface forms
    If a hit is found, we add all target entities of that surface form m to the set of candidate entities E_m
  Table 1: An example of the dictionary
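The lookup step can be sketched as a plain dictionary access. The contents of Table 1 did not survive in this transcript, so the entries and link counts below are invented, and the names `SURFACE_FORMS` and `generate_candidates` are hypothetical.

```python
# Hypothetical dictionary: surface form -> {target entity: link count}.
# Entries and counts are invented; they do not reproduce Table 1.
SURFACE_FORMS = {
    "Michael Jordan": {"Michael J. Jordan": 90, "Michael I. Jordan": 10},
    "NBA": {"National Basketball Association": 560, "Nepal Basketball Association": 3},
    "Big Apple": {"New York City": 12},
}

def generate_candidates(mention, dictionary=SURFACE_FORMS):
    """Return the candidate set E_m for a mention (empty if no hit is found)."""
    return set(dictionary.get(mention, {}))

candidates = generate_candidates("NBA")
no_hit = generate_candidates("Ulm Minster")
```

An empty result is meaningful here: a mention with no candidates is later treated as unlinkable.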
Named Entity Disambiguation
  Goal:
    Give a rank to candidate entities according to their scores
  Define four features
    Feature 1: Link probability
      Based on the count information in the dictionary
    Semantic network based features
    Feature 2: Semantic associativity
      Based on the Wikipedia hyperlink structure
    Feature 3: Semantic similarity
      Derived from the taxonomy of YAGO
    Feature 4: Global coherence
      Global document-level topical coherence among entities
Link Probability
  Feature 1: link probability LP(e|m) for candidate entity e
  $$LP(e \mid m) = \frac{count_m(e)}{\sum_{e_i \in E_m} count_m(e_i)}$$
  where count_m(e) is the number of links which point to entity e and have the surface form m
  Table 1: An example of the dictionary (annotated with LP values 0.81 and 0.05)
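As a small worked example, link probability is the share of links with a given surface form that point to each candidate. The counts below are invented for illustration and are not the values from Table 1.

```python
def link_probability(counts):
    """Map each candidate entity to LP(e|m), given count_m(e) for every e in E_m."""
    total = sum(counts.values())
    if total == 0:
        return {entity: 0.0 for entity in counts}
    return {entity: count / total for entity, count in counts.items()}

# Hypothetical link counts for the surface form "Michael Jordan".
counts = {"Michael J. Jordan": 90, "Michael I. Jordan": 10}
lp = link_probability(counts)
```

Because the values are normalised over E_m, they sum to one across the candidates of a mention.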
Semantic Network Construction
  Recognize all the Wikipedia concepts Γ_d in the document d
    The open source toolkit Wikipedia-Miner¹
  Example: The Chicago Bulls’ player Michael Jordan won his first NBA championship in 1991.
    Set of entity mentions: {Michael Jordan, NBA}
    Candidate entities:
      Michael Jordan: {Michael J. Jordan, Michael I. Jordan}
      NBA: {National Basketball Association, Nepal Basketball Association}
    Γ_d: {NBA All-Star Game, David Joel Stern, Charlotte Bobcats, Chicago Bulls}
  Hyperlink structure of Wikipedia articles
  Taxonomy of concepts in YAGO
  ¹ http://wikipedia-miner.sourceforge.net/index.htm
  Figure 2: An example of the constructed semantic network
Semantic Associativity
  Feature 2: semantic associativity SA(e) for each candidate entity e
  Figure 2: An example of the constructed semantic network
Semantic Associativity (Cont’d)
  Given two Wikipedia concepts e1 and e2
    Compute the semantic associativity between them with the Wikipedia Link-based Measure (WLM) [1]
    E1 and E2 are the sets of Wikipedia concepts that hyperlink to e1 and e2 respectively, and W is the set of all concepts in Wikipedia
  [1] D. Milne and I. H. Witten. An effective, low-cost measure of semantic relatedness obtained from Wikipedia links. In Proceedings of WIKIAI, 2008.
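The formula on this slide was an image and is not preserved in the transcript. The sketch below uses the commonly cited "one minus normalised distance" form of the Milne and Witten measure, with E1, E2 and W as defined above. The function name, the zero value for disjoint in-link sets, and the clamping at zero are assumptions.

```python
import math

def wlm_associativity(inlinks_1, inlinks_2, total_concepts):
    """Link-based relatedness of two concepts in '1 - distance' form.

    inlinks_1, inlinks_2: sets of concepts that hyperlink to e1 and e2 (E1, E2).
    total_concepts: |W|, the number of concepts in Wikipedia.
    """
    common = inlinks_1 & inlinks_2
    if not common:
        return 0.0  # assumption: no shared in-links means no associativity
    larger = max(len(inlinks_1), len(inlinks_2))
    smaller = min(len(inlinks_1), len(inlinks_2))
    denominator = math.log(total_concepts) - math.log(smaller)
    if denominator == 0:
        return 1.0  # degenerate case: both concepts are linked from every concept
    distance = (math.log(larger) - math.log(len(common))) / denominator
    return max(0.0, 1.0 - distance)  # assumption: clamp negative values to zero

# Toy in-link sets, invented for illustration.
value = wlm_associativity({"a", "b", "c", "d"}, {"c", "d", "e"}, total_concepts=1000)
```

Two concepts with identical in-link sets get the maximum value, and the value drops as the overlap shrinks relative to the larger set.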
Semantic Similarity
  Feature 3: semantic similarity SS(e) for each candidate entity e
    Θ_k is the set of k context concepts in Γ_d which have the highest semantic similarity with entity e
  Figure 2: An example of the constructed semantic network (with k=2)
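The SS(e) formula itself is not preserved in this transcript. One plausible reading of the Θ_k definition is an average over the k most similar context concepts, which is what this sketch assumes. The pairwise similarity values are invented, and `semantic_similarity_feature` is a hypothetical name.

```python
def semantic_similarity_feature(entity, context_concepts, pair_similarity, k):
    """Average similarity between `entity` and its k most similar context concepts."""
    scores = sorted(
        (pair_similarity(entity, concept) for concept in context_concepts),
        reverse=True,
    )
    top_k = scores[:k]  # similarities to the concepts in Theta_k
    return sum(top_k) / len(top_k) if top_k else 0.0

# Toy similarities between one candidate and the context concepts of the example.
toy_values = {
    "NBA All-Star Game": 0.6,
    "David Joel Stern": 0.2,
    "Charlotte Bobcats": 0.4,
    "Chicago Bulls": 0.8,
}

def toy_similarity(entity, concept):
    return toy_values[concept]

ss = semantic_similarity_feature("Michael J. Jordan", list(toy_values), toy_similarity, k=2)
```

With k=2, only the two strongest context concepts (here the values 0.8 and 0.6) influence the feature, so weakly related concepts in the document do not dilute it.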
Semantic Similarity (Cont’d)
  Given two Wikipedia concepts e1 and e2
    Assume the sets of their super classes are Φ_e1 and Φ_e2
    For each class C1 in the set Φ_e1
      Assign a target class ε(C1) in the other set Φ_e2 as
      $$\varepsilon(C_1) = \arg\max_{C_2 \in \Phi_{e_2}} sim(C_1, C_2)$$
      where sim(C1, C2) is the semantic similarity between two classes C1 and C2
  To compute sim(C1, C2)
    Adopt the information-theoretic approach introduced in [2]
    $$sim(C_1, C_2) = \frac{2 \log P(C_0)}{\log P(C_1) + \log P(C_2)}$$
    where C0 is the lowest common ancestor node for class nodes C1 and C2 in the hierarchy, and P(C) is the probability that a randomly selected object belongs to the subtree with the root of C in the taxonomy.
  [2] D. Lin. An information-theoretic definition of similarity. In Proceedings of ICML, pages 296–304, 1998.
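A minimal sketch of Lin's measure on a toy taxonomy may help. The class names, the tree shape, and the instance counts are invented, and the taxonomy is assumed to be a tree with a single root, which is a simplification of YAGO.

```python
import math

# Toy taxonomy (child -> parent) with invented instance counts per subtree.
PARENT = {"athlete": "person", "scientist": "person", "person": "entity", "city": "entity"}
SUBTREE_COUNT = {"entity": 100, "person": 60, "athlete": 20, "scientist": 10, "city": 40}

def ancestors(cls):
    """Return cls followed by all of its ancestors up to the root."""
    chain = [cls]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def lowest_common_ancestor(c1, c2):
    """First ancestor of c2 (starting from c2 itself) that is also an ancestor of c1."""
    seen = set(ancestors(c1))
    return next(c for c in ancestors(c2) if c in seen)

def prob(cls):
    """P(C): chance that a random object falls in the subtree rooted at C."""
    return SUBTREE_COUNT[cls] / SUBTREE_COUNT["entity"]

def lin_similarity(c1, c2):
    """sim(C1, C2) = 2 log P(C0) / (log P(C1) + log P(C2))."""
    denominator = math.log(prob(c1)) + math.log(prob(c2))
    if denominator == 0:
        return 1.0  # both classes are the root
    c0 = lowest_common_ancestor(c1, c2)
    return 2 * math.log(prob(c0)) / denominator

siblings = lin_similarity("athlete", "scientist")
```

Classes that only meet at the root get a similarity of zero, since log P(root) = 0, while a class compared with itself gets one.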
Semantic Similarity (Cont’d)
  Calculate the semantic similarity from one set of classes Φ_e1 to another set of classes Φ_e2
  Define the semantic similarity between Wikipedia concepts e1 and e2
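Both formulas on this slide were images and are not preserved. The sketch below shows one plausible reading: the directed similarity averages each class's similarity to its best match in the other set, and the concept-level similarity is the mean of the two directions. The averaging, the symmetric mean, the function names, and the toy values are all assumptions.

```python
def directed_set_similarity(classes_1, classes_2, sim):
    """Average, over classes in classes_1, of the similarity to the best match in classes_2."""
    if not classes_1 or not classes_2:
        return 0.0
    best_matches = [max(sim(c1, c2) for c2 in classes_2) for c1 in classes_1]
    return sum(best_matches) / len(classes_1)

def concept_similarity(classes_1, classes_2, sim):
    """Symmetric similarity: mean of the two directed similarities."""
    forward = directed_set_similarity(classes_1, classes_2, sim)
    backward = directed_set_similarity(classes_2, classes_1, sim)
    return (forward + backward) / 2

# Toy class-level similarities, invented for illustration.
TOY_SIM = {
    frozenset(["athlete", "player"]): 0.9,
    frozenset(["athlete", "team"]): 0.3,
    frozenset(["person", "player"]): 0.5,
    frozenset(["person", "team"]): 0.1,
}

def toy_sim(c1, c2):
    return 1.0 if c1 == c2 else TOY_SIM.get(frozenset([c1, c2]), 0.0)

forward = directed_set_similarity(["athlete", "person"], ["player", "team"], toy_sim)
backward = directed_set_similarity(["player", "team"], ["athlete", "person"], toy_sim)
both = concept_similarity(["athlete", "person"], ["player", "team"], toy_sim)
```

The two directions generally differ, which is why a symmetric combination is needed before the value can serve as a similarity between e1 and e2.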
Global Coherence
  Feature 4: global coherence GC(e) for each candidate entity e
    Measured as the average semantic associativity of candidate entity e to the mapping entities of the other mentions
    Here e_m’ is the mapping entity of mention m’
    Substitute the most likely assigned entity for the mapping entity in Formula 9
    The most likely assigned entity e’_m’ for mention m’ is defined as the candidate entity which has the maximum link probability in E_m’
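The description above can be sketched as follows. Formula 9 is not preserved in this transcript, so the code follows only the verbal definition: average the associativity between a candidate and the maximum-link-probability entity of every other mention. The function names and all numbers are hypothetical.

```python
def most_likely_entity(link_probs):
    """Pick the candidate with the maximum link probability for one mention."""
    return max(link_probs, key=link_probs.get)

def global_coherence(entity, mention, link_probs_by_mention, associativity):
    """Average associativity between `entity` and the most likely entities of other mentions."""
    others = [
        most_likely_entity(probs)
        for other_mention, probs in link_probs_by_mention.items()
        if other_mention != mention and probs
    ]
    if not others:
        return 0.0
    return sum(associativity(entity, other) for other in others) / len(others)

# Toy link probabilities and associativity values, invented for illustration.
link_probs_by_mention = {
    "Michael Jordan": {"Michael J. Jordan": 0.8, "Michael I. Jordan": 0.2},
    "NBA": {"National Basketball Association": 0.95, "Nepal Basketball Association": 0.05},
}
toy_assoc = {
    ("Michael J. Jordan", "National Basketball Association"): 0.7,
    ("Michael I. Jordan", "National Basketball Association"): 0.1,
}

def toy_associativity(e1, e2):
    return toy_assoc.get((e1, e2), 0.0)

gc_player = global_coherence("Michael J. Jordan", "Michael Jordan", link_probs_by_mention, toy_associativity)
gc_professor = global_coherence("Michael I. Jordan", "Michael Jordan", link_probs_by_mention, toy_associativity)
```

Using the most likely entity of each other mention avoids having to resolve all mentions jointly before the feature can be computed.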
Global Coherence (Cont’d)
  Figure 2: An example of the constructed semantic network
Candidates Ranking
  Generate a feature vector F_m(e) for each e ∈ E_m
  Calculate Score_m(e) for each candidate e
  $$Score_m(e) = \mathbf{w} \cdot F_m(e)$$
    where w is the weight vector which gives different weights for each feature element in F_m(e)
  Rank the candidates and pick the top candidate as the predicted mapping entity for mention m
  To learn w, we use a max-margin technique based on the training data set
    Assume Score_m(e∗) is larger than any other Score_m(e) with a margin
    We minimize the objective over ξ_m ≥ 0
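The scoring and the margin idea can be sketched as below. The margin constraint and the objective are not preserved in this transcript, so the unit margin and the slack definition are assumptions, as is the reading of e∗ as the correct (gold) entity from the training data. No optimiser is shown, and the feature values are invented.

```python
def score(weights, features):
    """Score_m(e) as the dot product of the weight vector and the feature vector."""
    return sum(w * f for w, f in zip(weights, features))

def rank_candidates(weights, features_by_entity):
    """Return candidate entities sorted by score, best first."""
    return sorted(
        features_by_entity,
        key=lambda e: score(weights, features_by_entity[e]),
        reverse=True,
    )

def hinge_slack(weights, features_by_entity, gold_entity, margin=1.0):
    """Smallest slack xi_m >= 0 so that the gold entity beats every rival by margin - xi_m."""
    gold = score(weights, features_by_entity[gold_entity])
    rivals = [score(weights, f) for e, f in features_by_entity.items() if e != gold_entity]
    if not rivals:
        return 0.0
    return max(0.0, margin - (gold - max(rivals)))

# Toy feature vectors in the order (LP, SA, SS, GC); all values are invented.
features_by_entity = {
    "Michael J. Jordan": [0.8, 0.6, 0.7, 0.7],
    "Michael I. Jordan": [0.2, 0.1, 0.2, 0.1],
}
weights = [1.0, 1.0, 1.0, 1.0]

ranking = rank_candidates(weights, features_by_entity)
slack_ok = hinge_slack(weights, features_by_entity, "Michael J. Jordan")
slack_bad = hinge_slack(weights, features_by_entity, "Michael I. Jordan")
```

A zero slack means the current weights already separate the gold entity by the full margin, while a positive slack is the amount a max-margin learner would try to reduce.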
Unlinkable Mention Prediction
  Predict mention m as an unlinkable mention
    If the size of E_m generated in the Candidate Entity Generation module is equal to zero
    If Score_m(e_top) is smaller than the learned threshold τ
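The two conditions translate directly into a small predicate. The slides do not say how τ is learned. Purely as an assumption, the sketch below picks τ from a list of candidate values by accuracy on labelled examples. The function names and the data are invented.

```python
def is_unlinkable(scores, threshold):
    """True if the mention has no candidates or its top score falls below the threshold."""
    return not scores or max(scores.values()) < threshold

def pick_threshold(examples, candidate_thresholds):
    """Choose the threshold that classifies the most labelled examples correctly.

    examples: list of (scores_by_entity, gold_is_unlinkable) pairs.
    Ties are resolved in favour of the earliest candidate threshold.
    """
    def accuracy(tau):
        correct = sum(is_unlinkable(scores, tau) == gold for scores, gold in examples)
        return correct / len(examples)
    return max(candidate_thresholds, key=accuracy)

# Toy labelled mentions: candidate scores plus whether the gold answer is NIL.
examples = [
    ({"A": 2.8, "B": 0.6}, False),
    ({"C": 0.4}, True),
    ({}, True),
    ({"D": 1.9}, False),
]
tau = pick_threshold(examples, [0.0, 0.5, 1.0, 2.0, 3.0])
```

Whatever procedure is used in practice, the threshold trades off wrongly linking mentions that have no entry in the knowledge base against wrongly returning NIL for mentions that do.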
Experiment Setup
  Data sets
    CZ data set: newswire data used by Cucerzan [3]
    TAC-KBP2009 data set: used in the Knowledge Base Population (KBP) track at the Text Analysis Conference (TAC) 2009
  Parameter learning
    10-fold cross validation
  [3] S. Cucerzan. Large-Scale Named Entity Disambiguation Based on Wikipedia Data. In Proceedings of EMNLP-CoNLL, pages 708–716, 2007.
Results over the CZ data set
Results on the TAC-KBP2009 data set
Conclusion
  LINDEN
    A novel framework to link named entities in text with YAGO
    Leverages the rich semantic knowledge derived from Wikipedia and the taxonomy of YAGO
    Significantly outperforms the state-of-the-art methods in terms of accuracy
Thanks!
  Q&A