RELATION EXTRACTION, SYMBOLIC SEMANTICS, DISTRIBUTIONAL SEMANTICS
Heng Ji (jih@rpi.edu)
Oct 13, 2015
Acknowledgement: distributional semantics slides from Omer Levy, Yoav Goldberg and Ido Dagan
Outline
• Task Definition
• Symbolic Semantics
  • Basic Features
  • World Knowledge
  • Learning Models
• Distributional Semantics
Relation Extraction: Task

relation: a semantic relationship between two entities

ACE relation type      Example
Agent-Artifact         Rubin Military Design, the makers of the Kursk
Discourse              each of whom
Employment/Membership  Mr. Smith, a senior programmer at Microsoft
Place-Affiliation      Salzburg Red Cross officials
Person-Social          relatives of the dead
Physical               a town some 50 miles south of Salzburg
Other-Affiliation      Republican senators
Test Sample
Train Sample
Train Sample
Train Sample
Train Sample
Train Sample
K=3
A Simple Baseline with K-Nearest-Neighbor (KNN)
Test Sample
Train Sample: Employment
Train Sample: Physical
Train Sample: Employment
Train Sample: Employment
Train Sample: Physical
Distance metric:
1. If the heads of the mentions don’t match: +8
2. If the entity types of the heads of the mentions don’t match: +20
3. If the intervening words don’t match: +10
the president of the United States
the previous president of the United States
the secretary of NIST
US forces in Bahrain Connecticut’s governor
his ranch in Texas
Relation Extraction with KNN
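The KNN baseline with the rule-based distance metric above can be sketched in a few lines. The penalty weights (+8 / +20 / +10) follow the slide; the feature encoding and the sample data are illustrative assumptions, not the actual system.

```python
# A minimal sketch of the KNN relation-extraction baseline.
from collections import Counter

def distance(a, b):
    """Rule-based distance between two relation mentions."""
    d = 0
    if a["heads"] != b["heads"]:
        d += 8    # heads of the mentions don't match
    if a["types"] != b["types"]:
        d += 20   # entity types of the heads don't match
    if a["between"] != b["between"]:
        d += 10   # intervening words don't match
    return d

def knn_classify(test, train, k=3):
    """Label the test sample by majority vote of its k nearest neighbors."""
    nearest = sorted(train, key=lambda t: distance(test, t))[:k]
    return Counter(t["label"] for t in nearest).most_common(1)[0][0]

train = [
    {"heads": ("president", "States"), "types": ("PER", "GPE"),
     "between": ("of", "the"), "label": "Employment"},
    {"heads": ("president", "States"), "types": ("PER", "GPE"),
     "between": ("of", "the"), "label": "Employment"},
    {"heads": ("ranch", "Texas"), "types": ("FAC", "GPE"),
     "between": ("in",), "label": "Physical"},
]
test = {"heads": ("president", "States"), "types": ("PER", "GPE"),
        "between": ("of", "the")}
print(knn_classify(test, train, k=3))  # Employment
```

With k=3 all three training samples vote, and the two zero-distance Employment samples win the majority.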
Typical Relation Extraction Features

Lexical    Heads of the mentions and their context words, POS tags
Entity     Entity and mention type of the heads of the mentions;
           entity positional structure; entity context
Syntactic  Chunking (premodifier, possessive, preposition, formulaic);
           the sequence of the heads of the constituents and chunks
           between the two mentions; the syntactic relation path
           between the two mentions; dependent words of the mentions
Semantic   Gazetteers; synonyms in WordNet; name gazetteers;
           personal relative trigger word list
Wikipedia  Whether the head extent of a mention is found (via simple
           string matching) in the predicted Wikipedia article of
           another mention

References: Kambhatla, 2004; Zhou et al., 2005; Jiang and Zhai, 2007; Chan and Roth, 2010, 2011
Using Background Knowledge (Chan and Roth, 2010)
• Features employed are usually restricted to being defined on the various representations of the target sentences
• Humans rely on background knowledge to recognize relations
• Overall aim of this work:
  • Propose methods of using knowledge or resources that exist beyond the sentence
    • Wikipedia, word clusters, hierarchy of relations, entity type constraints, coreference
  • Use them as additional features, or within the Constraint Conditional Model (CCM) framework with Integer Linear Programming (ILP)
David Cone, a Kansas City native, was originally signed by the Royals and broke into the majors with the team
Using Background Knowledge
Using Background Knowledge

David Brian Cone (born January 2, 1963) is a former Major League Baseball pitcher. He compiled an 8–3 postseason record over 21 postseason starts and was a part of five World Series championship teams (1992 with the Toronto Blue Jays and 1996, 1998, 1999 & 2000 with the New York Yankees). He had a career postseason ERA of 3.80. He is the subject of the book A Pitcher's Story: Innings With David Cone by Roger Angell. Fans of David are known as "Cone-Heads." Cone lives in Stamford, Connecticut, and is formerly a color commentator for the Yankees on the YES Network. [1]
Partly because of the resulting lack of leadership, after the 1994 season the Royals decided to reduce payroll by trading pitcher David Cone and outfielder Brian McRae, then continued their salary dump in the 1995 season. In fact, the team payroll, which was always among the league's highest, was sliced in half from $40.5 million in 1994 (fourth-highest in the major leagues) to $18.5 million in 1996 (second-lowest in the major leagues)
Using Background Knowledge
fine-grained prediction probabilities:
  Employment:Staff      0.20
  Employment:Executive  0.15
  Personal:Family       0.10
  Personal:Business     0.10
  Affiliation:Citizen   0.20
  Affiliation:Based-in  0.25
Using Background Knowledge
fine-grained                   coarse-grained
  Employment:Staff      0.20     Employment   0.35
  Employment:Executive  0.15
  Personal:Family       0.10     Personal     0.40
  Personal:Business     0.10
  Affiliation:Citizen   0.20     Affiliation  0.25
  Affiliation:Based-in  0.25
Using Background Knowledge
fine-grained                   coarse-grained
  Employment:Staff      0.20     Employment   0.35
  Employment:Executive  0.15
  Personal:Family       0.10     Personal     0.40
  Personal:Business     0.10
  Affiliation:Citizen   0.20     Affiliation  0.25
  Affiliation:Based-in  0.25

best joint assignment: Employment:Staff + Employment = 0.20 + 0.35 = 0.55
Knowledge1: Wikipedia1 (as additional feature)
• We use a Wikifier system (Ratinov et al., 2010) which performs context-sensitive mapping of mentions to Wikipedia pages
• Introduce a new feature based on:

    w1(mi, mj) = 1 if A(mi) mentions mj, or A(mj) mentions mi; 0 otherwise

  where A(m) denotes the predicted Wikipedia article of mention m
• Introduce a further feature by combining the above with the coarse-grained entity types of mi, mj
Knowledge1: Wikipedia2 (as additional feature)
• Given mi,mj, we use a Parent-Child system (Do and Roth, 2010) to predict whether they have a parent-child relation
• Introduce a new feature based on:

    w2(mi, mj) = 1 if parent-child(mi, mj); 0 otherwise

• Combine the above with the coarse-grained entity types of mi, mj
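The two Wikipedia-based binary features can be sketched as follows. The `wiki` and `pc` stand-ins are toy stubs for the Wikifier (Ratinov et al., 2010) and Parent-Child (Do and Roth, 2010) systems, and the article snippets are illustrative assumptions.

```python
# A minimal sketch of the Wikipedia features w1 and w2.
def w1(mi, mj, wiki_article):
    """1 if either mention appears in the other's predicted Wikipedia article."""
    return int(mj in wiki_article(mi) or mi in wiki_article(mj))

def w2(mi, mj, parent_child):
    """1 if the Parent-Child system predicts a parent-child relation."""
    return int(parent_child(mi, mj))

# Toy stand-ins for the two external systems:
articles = {"David Cone": "David Brian Cone ... signed by the Royals ...",
            "Royals": "The Kansas City Royals are a baseball team ..."}
wiki = lambda m: articles.get(m, "")
pc = lambda a, b: (a, b) in {("Kansas City", "Kansas City Royals")}

print(w1("David Cone", "Royals", wiki))  # 1
print(w2("David Cone", "Royals", pc))    # 0
```

In a real system these features would also be conjoined with the coarse-grained entity types of mi and mj, as the slides describe.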
Knowledge2: Word Class Information(as additional feature)
• Supervised systems face an issue of data sparseness (of lexical features)
• Use class information of words to support better generalization, instantiated in this work as word clusters
  • Automatically generated from unlabeled text using the algorithm of Brown et al. (1992)

[Figure: a binary Brown-cluster tree over the words apple, pear, Apple, IBM, bought, run, of, in; each word’s cluster is the bit string of 0/1 branching decisions on the path from the root, e.g. IBM is reached via 011]
Knowledge2: Word Class Information
• All lexical features consisting of a single word are duplicated with the word’s corresponding bit-string representation

[Figure: the Brown-cluster tree again, showing the bit-string prefixes (e.g. 00, 01, 10, 11) that accompany individual words]
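Duplicating lexical features with Brown-cluster bit strings can be sketched as below. The `CLUSTERS` map is a toy assumption; real bit strings come from running the Brown et al. (1992) algorithm on unlabeled text.

```python
# A sketch of bit-string-duplicated lexical features.
CLUSTERS = {"apple": "000", "pear": "001", "Apple": "010", "IBM": "011"}

def lexical_features(word, prefix_lengths=(2, 3)):
    """Emit the word feature plus bit-string-prefix copies of it."""
    feats = [f"word={word}"]
    bits = CLUSTERS.get(word)
    if bits:
        # Prefixes of the bit string give coarser-to-finer word classes.
        feats += [f"cluster{p}={bits[:p]}" for p in prefix_lengths]
    return feats

print(lexical_features("IBM"))   # ['word=IBM', 'cluster2=01', 'cluster3=011']
print(lexical_features("Apple")) # ['word=Apple', 'cluster2=01', 'cluster3=010']
```

Note how "Apple" and "IBM" share the prefix feature `cluster2=01` even though the word features differ, which is exactly the generalization the slides motivate.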
Constraint Conditional Models (CCMs) (Roth and Yih, 2007; Chang et al., 2008)

Schematically, inference maximizes:

    argmax_y  w · φ(x, y)  −  Σ_k ρ_k · d(y, 1_{C_k(x)})

• w · φ(x, y): weight vector for the “local” models, a collection of classifiers
• ρ_k: penalty for violating constraint C_k
• d(y, 1_{C_k(x)}): how far y is from a “legal” assignment
Constraint Conditional Models (CCMs)(Roth and Yih, 2007; Chang et al., 2008)
• Wikipedia
• word clusters
• hierarchy of relations
• entity type constraints
• coreference
• Key steps:
  • Write down a linear objective function
  • Write down constraints as linear inequalities
  • Solve using integer linear programming (ILP) packages
Constraint Conditional Models (CCMs)(Roth and Yih, 2007; Chang et al., 2008)
Knowledge3: Relations between our target relations
[Figure: a two-level hierarchy of relation labels: coarse-grained types such as personal and employment, with fine-grained children (family and biz under personal; staff and executive under employment)]
Knowledge3: Hierarchy of Relations
[Figure: the same hierarchy, with a coarse-grained classifier predicting over the top-level labels and a fine-grained classifier predicting over the leaves]
Knowledge3: Hierarchy of Relations
[Figure: for a mention pair (mi, mj), the coarse-grained and fine-grained classifiers each make a prediction over their level of the hierarchy]
Knowledge3: Hierarchy of Relations. Write down a linear objective function:

    max  Σ_{rc ∈ Rc} p(rc) · x_{R,rc}  +  Σ_{rf ∈ Rf} p(rf) · y_{R,rf}

• p(rc), p(rf): the coarse-grained and fine-grained prediction probabilities
• x_{R,rc}, y_{R,rf}: binary indicator variables; setting an indicator variable to 1 encodes a relation assignment
• Rc, Rf: the sets of coarse-grained and fine-grained relation labels
Knowledge3: Hierarchy of Relations Write down constraints
• If a relation R is assigned a coarse-grained label rc, then we must also assign to R a fine-grained relation rf which is a child of rc.
• (Capturing the inverse relationship) If we assign rf to R, then we must also assign to R the parent of rf, which is a corresponding coarse-grained label
    x_{R,rc} ≤ y_{R,rf1} + y_{R,rf2} + … + y_{R,rfn}      (rf1 … rfn: the fine-grained children of rc)

    y_{R,rf} ≤ x_{R,parent(rf)}
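With this few labels, the joint objective and hierarchy constraints can be checked by brute force rather than by an ILP solver; the sketch below enumerates only consistent (coarse, fine) pairs, which trivially satisfies both constraints. The probabilities reuse the running example; the code is illustrative, not the paper's implementation.

```python
# Brute-force joint decoding under the relation hierarchy.
HIERARCHY = {"Employment": ["Employment:Staff", "Employment:Executive"],
             "Personal": ["Personal:Family", "Personal:Business"],
             "Affiliation": ["Affiliation:Citizen", "Affiliation:Based-in"]}

p_coarse = {"Employment": 0.35, "Personal": 0.40, "Affiliation": 0.25}
p_fine = {"Employment:Staff": 0.20, "Employment:Executive": 0.15,
          "Personal:Family": 0.10, "Personal:Business": 0.10,
          "Affiliation:Citizen": 0.20, "Affiliation:Based-in": 0.25}

def joint_decode():
    """Maximize p(rc) + p(rf) subject to rf being a child of rc."""
    pairs = ((rc, rf) for rc, children in HIERARCHY.items() for rf in children)
    return max(pairs, key=lambda a: p_coarse[a[0]] + p_fine[a[1]])

print(joint_decode())  # ('Employment', 'Employment:Staff'), score 0.55
```

Note the joint answer (Employment:Staff, score 0.55) differs from both independent argmaxes (fine-grained alone would pick Affiliation:Based-in; coarse-grained alone would pick Personal), which is the point of the joint inference.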
Knowledge4: Entity Type Constraints(Roth and Yih, 2004, 2007)
• Entity types are useful for constraining the possible labels that a relation R can assume
Candidate labels for the relation between mi and mj:
Employment:Staff
Employment:Executive
Personal:Family
Personal:Business
Affiliation:Citizen
Affiliation:Based-in
Employment:Staff      (per, org)
Employment:Executive  (per, org)
Personal:Family       (per, per)
Personal:Business     (per, per)
Affiliation:Citizen   (per, gpe)
Affiliation:Based-in  (org, gpe)

Knowledge4: Entity Type Constraints (Roth and Yih, 2004, 2007)
• We gather entity type constraints from the ACE-2004 documentation and impose them on the coarse-grained relations
• By improving the coarse-grained predictions and combining them with the hierarchical constraints defined earlier, the improvements propagate to the fine-grained predictions
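Filtering candidate relation labels by entity types can be sketched as a simple lookup. The constraint table is abbreviated from the slide (per = person, org = organization, gpe = geo-political entity) and is an illustrative assumption about the exact ACE-2004 constraints.

```python
# A sketch of entity-type constraints on relation labels.
ALLOWED = {"Employment:Staff": ("per", "org"),
           "Employment:Executive": ("per", "org"),
           "Personal:Family": ("per", "per"),
           "Personal:Business": ("per", "per"),
           "Affiliation:Citizen": ("per", "gpe"),
           "Affiliation:Based-in": ("org", "gpe")}

def candidate_labels(type_i, type_j):
    """Relation labels compatible with the entity types of (mi, mj)."""
    return [r for r, types in ALLOWED.items() if types == (type_i, type_j)]

print(candidate_labels("per", "gpe"))  # ['Affiliation:Citizen']
print(candidate_labels("per", "org")) # ['Employment:Staff', 'Employment:Executive']
```

In the CCM setting the same table would be written as linear inequalities forcing the indicator variables of incompatible labels to 0.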
Knowledge5: Coreference
• In this work, we assume that we are given the coreference information, which is available from the ACE annotation.
• Constraint: if mi and mj corefer, then no relation label (Employment:Staff, Employment:Executive, Personal:Family, Personal:Business, Affiliation:Citizen, Affiliation:Based-in) may hold between them; the assignment is forced to null
Experiment Results

F1% improvement from using each knowledge source:

            All nwire   10% of nwire
BasicRE     50.5%       31.0%
• Consider different levels of syntactic information
  • Deep processing of text produces structural but less reliable results
  • Simple surface information is less structural, but more reliable
• Generalization of feature-based solutions
  • A kernel (kernel function) defines a similarity metric Ψ(x, y) on objects
  • No need for enumeration of features
• Efficient extension of normal features into high-order spaces
  • Possible to solve linearly non-separable problems in a higher-order space
• Nice combination properties
  • Closed under linear combination
  • Closed under polynomial extension
  • Closed under direct sum/product on different domains
• References: Zelenko et al., 2002, 2003; Culotta and Sorensen, 2004; Bunescu and Mooney, 2005; Zhao and Grishman, 2005; Che et al., 2005; Zhang et al., 2006; Qian et al., 2007; Zhou et al., 2007; Khayyamian et al., 2009; Reichartz et al., 2009
Most Successful Learning Methods: Kernel-based
Kernel Examples for Relation Extraction (Zhao and Grishman, 2005)

K_T is a token kernel defined as:
    K_T(T1, T2) = I(T1.word, T2.word) + I(T1.pos, T2.pos) + I(T1.base, T2.base)
where I(a, b) is 1 if a and b match and 0 otherwise.

1) Argument kernel:
    Φ1(R1, R2) = Σ_{i=1,2} K_E(R1.arg_i, R2.arg_i)
    where K_E(E1, E2) = K_T(E1.tk, E2.tk) + I(E1.type, E2.type) + I(E1.subtype, E2.subtype) + I(E1.role, E2.role)

2) Local dependency kernel:
    Φ2(R1, R2) = Σ_{i=1,2} K_D(R1.arg_i.dseq, R2.arg_i.dseq)
    where K_D(dseq, dseq′) = Σ_{0 ≤ i < dseq.len} Σ_{0 ≤ j < dseq′.len} I(arc_i.label, arc′_j.label) · K_T(arc_i.dw, arc′_j.dw)

3) Path kernel:
    Φ3(R1, R2) = K_path(R1.path, R2.path)
    where K_path(path, path′) = Σ_{0 ≤ i < path.len} Σ_{0 ≤ j < path′.len} I(arc_i.label, arc′_j.label) · K_T(arc_i.dw, arc′_j.dw)

Composite kernels combine these, e.g.:
    Φ(R1, R2) = (Φ1(R1, R2) + Φ2(R1, R2))² / 4
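The token and argument kernels can be made concrete with a toy rendering; tokens and arguments here are plain dicts, and the whole sketch is an illustration of the notation, not the Zhao and Grishman (2005) implementation.

```python
# A toy rendering of the token kernel K_T and the argument kernel Phi1.
def I(a, b):
    """Exact-match indicator."""
    return 1 if a == b else 0

def K_T(t1, t2):
    """Token kernel: count of matching word / POS / base-form attributes."""
    return (I(t1["word"], t2["word"]) + I(t1["pos"], t2["pos"])
            + I(t1["base"], t2["base"]))

def K_E(e1, e2):
    """Entity kernel: token kernel on heads plus type/subtype/role matches."""
    return (K_T(e1["tk"], e2["tk"]) + I(e1["type"], e2["type"])
            + I(e1["subtype"], e2["subtype"]) + I(e1["role"], e2["role"]))

def phi1(r1, r2):
    """Argument kernel: sum entity-kernel similarity over both arguments."""
    return sum(K_E(a1, a2) for a1, a2 in zip(r1["args"], r2["args"]))

tok = lambda w, p: {"word": w, "pos": p, "base": w.lower()}
arg = lambda w, p, t: {"tk": tok(w, p), "type": t, "subtype": "-", "role": "-"}

r1 = {"args": [arg("president", "NN", "PER"), arg("States", "NNP", "GPE")]}
r2 = {"args": [arg("secretary", "NN", "PER"), arg("NIST", "NNP", "ORG")]}
print(phi1(r1, r2))  # 7
```

The two relation instances get a nonzero score from shared POS tags, entity types, and placeholder subtypes/roles even though no words match, which is the generalization kernels buy over exact feature matching.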
Bootstrapping for Relation Extraction

Pipeline: Initial Seed Tuples → Occurrences of Seed Tuples → Generate Extraction Patterns → Generate New Seed Tuples → Augment Table

Initial seed tuples:
  ORGANIZATION   LOCATION
  MICROSOFT      REDMOND
  IBM            ARMONK
  BOEING         SEATTLE
  INTEL          SANTA CLARA

Occurrences of seed tuples:
  Computer servers at Microsoft’s headquarters in Redmond…
  In mid-afternoon trading, shares of Redmond-based Microsoft fell…
  The Armonk-based IBM introduced a new line…
  The combined company will operate from Boeing’s headquarters in Seattle.
  Intel, Santa Clara, cut prices of its Pentium processor.
Learned patterns:
• <STRING1>’s headquarters in <STRING2>
• <STRING2>-based <STRING1>
• <STRING1>, <STRING2>

Pipeline: Initial Seed Tuples → Occurrences of Seed Tuples → Generate Extraction Patterns → Generate New Seed Tuples → Augment Table
Bootstrapping for Relation Extraction (Cont’)
Pipeline: Initial Seed Tuples → Occurrences of Seed Tuples → Generate Extraction Patterns → Generate New Seed Tuples → Augment Table

Generate new seed tuples; start a new iteration:

  ORGANIZATION   LOCATION
  AG EDWARDS     ST LUIS
  157TH STREET   MANHATTAN
  7TH LEVEL      RICHARDSON
  3COM CORP      SANTA CLARA
  3DO            REDWOOD CITY
  JELLIES        APPLE
  MACWEEK        SAN FRANCISCO
Bootstrapping for Relation Extraction (Cont’)
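One bootstrapping iteration can be sketched as below. The toy corpus, the capitalized-word slot heuristic, and the middle-context patterns are all illustrative assumptions; real systems such as Snowball use richer pattern representations and confidence scores.

```python
# A minimal sketch of one bootstrapping iteration:
# seeds -> occurrences -> patterns -> new tuples.
import re

corpus = ["Computer servers at Microsoft's headquarters in Redmond.",
          "The Armonk-based IBM introduced a new line.",
          "The combined company will operate from Boeing's headquarters in Seattle."]

seeds = {("Microsoft", "Redmond"), ("IBM", "Armonk")}

def induce_patterns(seeds, corpus):
    """Keep the text between the two seed mentions as a surface pattern."""
    patterns = set()
    for org, loc in seeds:
        for sent in corpus:
            i, j = sent.find(org), sent.find(loc)
            if i >= 0 and j >= 0:
                if i < j:
                    patterns.add(("ORG-LOC", sent[i + len(org):j]))
                else:
                    patterns.add(("LOC-ORG", sent[j + len(loc):i]))
    return patterns

def apply_patterns(patterns, corpus):
    """Slot capitalized words into each pattern to harvest tuples."""
    tuples = set()
    for order, middle in patterns:
        regex = r"([A-Z]\w+)" + re.escape(middle) + r"([A-Z]\w+)"
        for sent in corpus:
            for m in re.finditer(regex, sent):
                a, b = m.groups()
                tuples.add((a, b) if order == "ORG-LOC" else (b, a))
    return tuples

pats = induce_patterns(seeds, corpus)
print(sorted(apply_patterns(pats, corpus)))
# [('Boeing', 'Seattle'), ('IBM', 'Armonk'), ('Microsoft', 'Redmond')]
```

The pattern "'s headquarters in" learned from the Microsoft seed fires on the Boeing sentence and yields the new tuple (Boeing, Seattle); adding such noisy tuples to the seed set (cf. JELLIES/APPLE above) is also how semantic drift creeps in.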
Outline
• Task Definition
• Symbolic Semantics
  • Basic Features
  • World Knowledge
  • Learning Models
• Distributional Semantics
Word Similarity & Relatedness
• How similar is pizza to pasta?
• How related is pizza to Italy?
• Representing words as vectors allows easy computation of similarity
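The vector view of similarity can be sketched with cosine similarity over toy 3-dimensional count vectors; the vectors and their dimensions are illustrative assumptions, with real vectors coming from corpus counts or embeddings.

```python
# Cosine similarity between word vectors.
import math

def cosine(u, v):
    """Cosine similarity between two word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

vectors = {"pizza": [10.0, 2.0, 8.0],  # toy counts, e.g. (eat, country, cheese)
           "pasta": [9.0, 1.0, 7.0],
           "Italy": [2.0, 9.0, 3.0]}

print(round(cosine(vectors["pizza"], vectors["pasta"]), 3))  # high: similar
print(round(cosine(vectors["pizza"], vectors["Italy"]), 3))  # lower: related
```

Similar words (pizza, pasta) end up with nearly parallel vectors, while merely related words (pizza, Italy) share fewer contexts and score lower.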
Approaches for Representing Words

Distributional Semantics (Count)
• Used since the 90’s
• Sparse word-context PMI/PPMI matrix
• Decomposed with SVD

Word Embeddings (Predict)
• Inspired by deep learning
• word2vec (Mikolov et al., 2013)
• GloVe (Pennington et al., 2014)

Underlying Theory: The Distributional Hypothesis (Harris, ’54; Firth, ’57): “Similar words occur in similar contexts”
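The count-based pipeline (counts, then PPMI, then SVD) can be sketched end-to-end. The four-sentence corpus, the window size of 1, and the 2-dimensional truncation are toy assumptions; only numpy is required.

```python
# Count -> PPMI -> SVD, the classic distributional pipeline.
import numpy as np

corpus = [["pizza", "with", "cheese"], ["pasta", "with", "cheese"],
          ["pizza", "from", "Italy"], ["rome", "in", "Italy"]]
vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# Word-context counts with a window of 1 on each side.
C = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in (i - 1, i + 1):
            if 0 <= j < len(sent):
                C[idx[w], idx[sent[j]]] += 1

# PPMI: positive pointwise mutual information.
total = C.sum()
pw = C.sum(axis=1, keepdims=True) / total
pc = C.sum(axis=0, keepdims=True) / total
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log((C / total) / (pw * pc))
ppmi = np.where(np.isfinite(pmi), np.maximum(pmi, 0), 0.0)

# Dense low-dimensional vectors from truncated SVD.
U, S, Vt = np.linalg.svd(ppmi)
vecs = U[:, :2] * S[:2]
print(vecs.shape)  # (8, 2)
```

Each row of `vecs` is a dense 2-dimensional word vector; Levy and Goldberg's NIPS 2014 result referenced below shows that skip-gram with negative sampling implicitly factorizes a closely related shifted-PMI matrix.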
Approaches for Representing Words

Both approaches:
• Rely on the same linguistic theory
• Use the same data
• Are mathematically related
  • “Neural Word Embedding as Implicit Matrix Factorization” (NIPS 2014)
• How come word embeddings are so much better?
  • “Don’t Count, Predict!” (Baroni et al., ACL 2014)
• More than meets the eye…
What’s really improving performance?

The Contributions of Word Embeddings

Novel Algorithms (objective + training method)
• Skip-Grams + Negative Sampling
• CBOW + Hierarchical Softmax
• Noise Contrastive Estimation
• GloVe
• …

New Hyperparameters (preprocessing, smoothing, etc.)
• Subsampling
• Dynamic Context Windows
• Context Distribution Smoothing
• Adding Context Vectors
• …
Our Contributions
1) Identifying the existence of new hyperparameters
• Not always mentioned in papers
2) Adapting the hyperparameters across algorithms
• Must understand the mathematical relation between algorithms
3) Comparing algorithms across all hyperparameter settings
• Over 5,000 experiments
Background
What is word2vec?
How is it related to PMI?
What is word2vec?
• word2vec is not a single algorithm
• It is a software package for representing words as vectors, containing:
• Two distinct models: CBoW and Skip-Gram (SG)
• Various training methods: Negative Sampling (NS) and Hierarchical Softmax
• A rich preprocessing pipeline: Dynamic Context Windows, Subsampling, Deleting Rare Words
Demo
http://rare-technologies.com/word2vec-tutorial/#app
Skip-Grams with Negative Sampling (SGNS)
“word2vec Explained…” (Goldberg & Levy, arXiv 2014)
Marco saw a furry little wampimuk hiding in the tree.

(data)
words      contexts
wampimuk   furry
wampimuk   little
wampimuk   hiding
wampimuk   in
…          …
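The (word, context) pairs above come from pairing each word with its nearby neighbors. A minimal sketch (the fixed window of 2 is an assumption for this example; word2vec samples the window size dynamically):

```python
def skipgram_pairs(tokens, window=2):
    """Pair each word with every context word within `window` positions."""
    pairs = []
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((word, tokens[j]))
    return pairs

sentence = "Marco saw a furry little wampimuk hiding in the tree".split()
pairs = skipgram_pairs(sentence)
# includes ('wampimuk', 'furry'), ('wampimuk', 'little'),
#          ('wampimuk', 'hiding'), ('wampimuk', 'in')
```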
Skip-Grams with Negative Sampling (SGNS)
“word2vec Explained…” (Goldberg & Levy, arXiv 2014)
• SGNS finds a vector w for each word in our vocabulary V_W
• Each such vector has d latent dimensions (e.g. d = 100)
• Effectively, it learns a matrix W of size |V_W| × d whose rows represent the words
• Key point: it also learns a similar auxiliary matrix C of context vectors
• In fact, each word has two embeddings: W_wampimuk ≠ C_wampimuk
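The two embedding tables can be pictured with a toy initialization (the uniform init scale here is an assumption for illustration, not word2vec's exact scheme):

```python
import random

def init_embeddings(vocab, d=100, seed=0):
    """SGNS maintains two |V| x d tables: word vectors W and context vectors C.
    Every word therefore gets two distinct embeddings."""
    rng = random.Random(seed)
    W = {w: [rng.uniform(-0.5, 0.5) for _ in range(d)] for w in vocab}
    C = {w: [rng.uniform(-0.5, 0.5) for _ in range(d)] for w in vocab}
    return W, C
```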
Skip-Grams with Negative Sampling (SGNS)
“word2vec Explained…” (Goldberg & Levy, arXiv 2014)
• Maximize σ(w · c): (w, c) was observed in the data

words      contexts
wampimuk   furry
wampimuk   little
wampimuk   hiding
wampimuk   in

• Minimize σ(w · c′): (w, c′) was hallucinated (c′ sampled at random)

words      contexts
wampimuk   Australia
wampimuk   cyber
wampimuk   the
wampimuk   1985
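The maximize/minimize objective above can be written as one per-pair loss. A sketch that scores a positive pair against sampled negatives (plain lists stand in for the learned vectors):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sgns_loss(w, c_pos, c_negs):
    """Negative of: log sigma(w.c) + sum over negatives of log sigma(-w.c')."""
    loss = -math.log(sigmoid(dot(w, c_pos)))   # observed pair: push w.c up
    for c in c_negs:
        loss -= math.log(sigmoid(-dot(w, c)))  # hallucinated pair: push w.c down
    return loss
```

Training by gradient descent on this loss pulls word vectors toward their observed contexts and away from randomly sampled ones.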
Skip-Grams with Negative Sampling (SGNS)
• “Negative Sampling”
• SGNS samples contexts at random as negative examples
• “Random” = the unigram distribution: P(c) = count(c) / |corpus|
• Spoiler: changing this distribution has a significant effect
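Drawing negatives from a unigram distribution can be sketched as follows; `alpha = 1.0` is the plain unigram case, and lowering it (word2vec uses 0.75) is exactly the distribution change hinted at above:

```python
import random

def make_negative_sampler(counts, alpha=1.0, seed=0):
    """Return a sampler drawing contexts with probability ~ count(c) ** alpha."""
    words = list(counts)
    weights = [counts[w] ** alpha for w in words]
    rng = random.Random(seed)
    return lambda k: rng.choices(words, weights=weights, k=k)
```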
What is SGNS learning?
“Neural Word Embeddings as Implicit Matrix Factorization” (Levy & Goldberg, NIPS 2014)
• Take SGNS’s embedding matrices (W, of size |V_W| × d, and C, of size |V_C| × d)
• Multiply them: W · Cᵀ
• What do you get?
What is SGNS learning?
“Neural Word Embeddings as Implicit Matrix Factorization” (Levy & Goldberg, NIPS 2014)
• A |V_W| × |V_C| matrix: W · Cᵀ = ?
• Each cell describes the relation between a specific word-context pair
What is SGNS learning?
“Neural Word Embeddings as Implicit Matrix Factorization” (Levy & Goldberg, NIPS 2014)
• We prove that, for a large enough dimension d and enough iterations:
W · Cᵀ = M^PMI − log k
• We get the word-context PMI matrix, shifted by the global constant log k (k = number of negative samples)
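The shifted PMI matrix can be computed directly from the same (word, context) pairs SGNS trains on. A sketch over raw co-occurrence counts (k is the number of negative samples):

```python
import math
from collections import Counter

def shifted_pmi(pairs, k=5):
    """M[w, c] = PMI(w, c) - log k, the matrix SGNS implicitly factorizes."""
    wc = Counter(pairs)                   # joint counts #(w, c)
    wn = Counter(w for w, _ in pairs)     # marginal word counts
    cn = Counter(c for _, c in pairs)     # marginal context counts
    total = len(pairs)
    return {(w, c): math.log(n * total / (wn[w] * cn[c])) - math.log(k)
            for (w, c), n in wc.items()}
```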
What is SGNS learning?
• SGNS is doing something very similar to the older approaches
• SGNS is factorizing the traditional word-context PMI matrix
• So does SVD!
• GloVe factorizes a similar word-context matrix
But embeddings are still better, right?
• Plenty of evidence that embeddings outperform traditional methods
• “Don’t Count, Predict!” (Baroni et al., ACL 2014)
• GloVe (Pennington et al., EMNLP 2014)
• How does this fit with our story?
The Big Impact of “Small” Hyperparameters
• word2vec & GloVe are more than just algorithms…
• They introduce new hyperparameters
• These may seem minor, but they make a big difference in practice
Identifying New Hyperparameters
New Hyperparameters
• Preprocessing (word2vec): Dynamic Context Windows, Subsampling, Deleting Rare Words
• Postprocessing (GloVe): Adding Context Vectors
• Association Metric (SGNS): Shifted PMI, Context Distribution Smoothing
Dynamic Context Windows
Marco saw a furry little wampimuk hiding in the tree.
Weight of a context word at distance d within a window of size L:
• word2vec: (L − d + 1) / L (window size sampled dynamically)
• GloVe: 1 / d (harmonic weighting)
• Aggressive: a steeper decay, e.g. 2^(1 − d)
The Word-Space Model (Sahlgren, 2006)
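The weighting schemes named above can be sketched as functions of distance. Note the word2vec weight (L − d + 1)/L is the probability a dynamically sampled window covers distance d, and GloVe's 1/d is its harmonic scheme; the steeper "aggressive" geometric decay is an illustrative assumption:

```python
def window_weights(L, scheme):
    """Weight given to a context word at distance d = 1..L from the target."""
    if scheme == "word2vec":    # dynamic window: sample size uniformly from 1..L
        return [(L - d + 1) / L for d in range(1, L + 1)]
    if scheme == "glove":       # harmonic weighting
        return [1.0 / d for d in range(1, L + 1)]
    if scheme == "aggressive":  # illustrative geometric decay (assumption)
        return [2.0 ** (1 - d) for d in range(1, L + 1)]
    raise ValueError(scheme)
```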
Adding Context Vectors
• SGNS creates word vectors w
• SGNS also creates auxiliary context vectors c
• So do GloVe and SVD
• Instead of just w, represent a word as: w + c
• Introduced by Pennington et al. (2014)
• Only applied to GloVe
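This postprocessing step is a single element-wise sum over the two embedding tables; a minimal sketch:

```python
def add_context_vectors(W, C):
    """Represent each word by w + c (element-wise) instead of w alone."""
    return {word: [wi + ci for wi, ci in zip(vec, C[word])]
            for word, vec in W.items()}
```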
Adapting Hyperparameters across Algorithms
Context Distribution Smoothing
• SGNS samples negative contexts c′ ∼ P to form negative examples
• Our analysis assumes P is the unigram distribution
• In practice, it’s a smoothed unigram distribution: P^0.75(c) ∝ count(c)^0.75
• This little change makes a big difference
Context Distribution Smoothing
• We can adapt context distribution smoothing to PMI!
• Replace P(c) with P^0.75(c):
PMI^0.75(w, c) = log [ P(w, c) / (P(w) · P^0.75(c)) ]
• Consistently improves PMI on every task
• Always use Context Distribution Smoothing!
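The smoothed association metric can be sketched directly from counts; `alpha = 1.0` recovers plain PMI, while `alpha = 0.75` applies the recommended smoothing (which dampens the PMI of rare contexts):

```python
import math
from collections import Counter

def cds_pmi(pairs, alpha=0.75):
    """PMI with context distribution smoothing:
    P(c) is replaced by count(c)**alpha / sum over c' of count(c')**alpha."""
    wc = Counter(pairs)
    wn = Counter(w for w, _ in pairs)
    cn = Counter(c for _, c in pairs)
    total = len(pairs)
    z = sum(n ** alpha for n in cn.values())
    return {(w, c): math.log((n / total) / ((wn[w] / total) * (cn[c] ** alpha / z)))
            for (w, c), n in wc.items()}
```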
Comparing Algorithms
Controlled Experiments
• Prior art was unaware of these hyperparameters
• Essentially, comparing “apples to oranges”
• We allow every algorithm to use every hyperparameter*
* If transferable
Systematic Experiments
• 9 hyperparameters (6 new)
• 4 word representation algorithms: PPMI (Sparse & Explicit), SVD(PPMI), SGNS, GloVe
• 8 benchmarks: 6 word similarity tasks, 2 analogy tasks
• 5,632 experiments
Hyperparameter Settings
Classic Vanilla Setting (commonly used for distributional baselines):
• Preprocessing: <None>
• Postprocessing: <None>
• Association Metric: Vanilla PMI/PPMI
Recommended word2vec Setting (tuned for SGNS):
• Preprocessing: Dynamic Context Window, Subsampling
• Postprocessing: <None>
• Association Metric: Shifted PMI/PPMI, Context Distribution Smoothing
Experiments: Prior Art
WordSim-353 Relatedness (Spearman’s correlation):
Setting      PPMI (Sparse Vectors)   SGNS (Embeddings)
Vanilla      0.54                    0.587
word2vec     0.688                   0.623
Experiments: Hyperparameter Tuning
WordSim-353 Relatedness (Spearman’s correlation):
Setting      PPMI (Sparse Vectors)   SGNS (Embeddings)
Vanilla      0.54                    0.587
word2vec     0.688                   0.623
Optimal      0.697                   0.681
(the two “Optimal” results use different settings)
Overall Results
• Hyperparameters often have stronger effects than algorithms
• Hyperparameters often have stronger effects than more data
• Prior superiority claims were not accurate
Re-evaluating Prior Claims
Don’t Count, Predict! (Baroni et al., 2014)
• “word2vec is better than count-based methods”
• Hyperparameter settings account for most of the reported gaps
• Embeddings do not really outperform count-based methods*
* Except for one task…
GloVe (Pennington et al., 2014)
• “GloVe is better than word2vec”
• Hyperparameter settings account for most of the reported gaps
• Adding context vectors was applied only to GloVe
• Different preprocessing
• We observed the opposite: SGNS outperformed GloVe on every task
• Our largest corpus: 10 billion tokens
• Perhaps larger corpora behave differently?
Linguistic Regularities in Sparse and Explicit Word Representations (Levy and Goldberg, 2014)
• “PPMI vectors perform on par with SGNS on analogy tasks”
• Holds for semantic analogies
• Does not hold for syntactic analogies (MSR dataset)
• Hyperparameter settings account for most of the reported gaps
• Different context type for PPMI vectors
• Syntactic analogies: there is a real gap in favor of SGNS
Conclusions
Conclusions: Distributional Similarity
The Contributions of Word Embeddings:
• Novel Algorithms
• New Hyperparameters
What’s really improving performance?
• Hyperparameters (mostly)
• The algorithms are an improvement
• SGNS is robust & efficient
Conclusions: Methodology
• Look for hyperparameters
• Adapt hyperparameters across different algorithms
• For good results: tune hyperparameters
• For good science: tune baselines’ hyperparameters
Thank you :)