Natural Language Inference in the Real World
Ellie Pavlick
University of Pennsylvania
Natural Language Inference
There is no man pointing at a car.
A man is pointing at a silver sedan.
[Figure 1: derivation of "A record date". Lexical entries: A := NP/N: λq.λp.(⟨x | ⟩ ; q@x ; p@x); record := N/N: λp.λx.(⟨y | record(y), nn(y,x)⟩ ; p@x); date := N: λx.date(x). Function application [fa] and merge-reduction [merge] yield N: record date := λx.⟨y | record(y), nn(y,x), date(x)⟩ and finally NP: A record date := λp.(⟨x, y | record(y), nn(y,x), date(x)⟩ ; p@x).]
Figure 1: Derivation with λ-DRSs, including β-conversion, for "A record date". Combinatory rules are indicated by solid lines, semantic rules by dotted lines.
DRS composed by the α-operator. The merge conjoins two DRSs into a larger DRS; semantically, the merge is interpreted as (dynamic) logical conjunction. Merge-reduction is the process of eliminating the merge operation by forming a new DRS resulting from the union of the domains and conditions of the argument DRSs of a merge, respectively (obeying certain constraints). Figure 1 illustrates the syntax-semantics interface (and merge-reduction) for a derivation of a simple noun phrase.
Boxer adopts Van der Sandt's view of presupposition as anaphora (Van der Sandt, 1992), in which presuppositional expressions are either resolved to previously established discourse entities or accommodated on a suitable level of discourse. Van der Sandt's proposal is cast in DRT, and is therefore relatively easy to integrate into Boxer's semantic formalism. The α-operator indicates information that has to be resolved in the context, and is lexically introduced by anaphoric or presuppositional expressions. A DRS constructed with α resembles the proto-DRS of Van der Sandt's theory of presupposition (Van der Sandt, 1992), although they are syntactically defined in a slightly different way to overcome problems with free and bound variables, following Bos (2003). Note that the difference between anaphora and presupposition collapses in Van der Sandt's theory.
The types are the ingredients of a typed lambda calculus that is employed to construct DRSs in a bottom-up, compositional way. The language of lambda-
Bos (2008)
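The merge-reduction described above is easy to state operationally. Below is a minimal sketch, assuming a DRS is modeled as a pair of a referent set (domain) and a condition set; the class and function names are illustrative, not Boxer's actual implementation, and the constraint checks Bos mentions are omitted.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class DRS:
    domain: frozenset = field(default_factory=frozenset)      # discourse referents
    conditions: frozenset = field(default_factory=frozenset)  # conditions on them

def merge(b1: DRS, b2: DRS) -> DRS:
    # Merge-reduction: union the domains and union the conditions.
    # (Constraints on free/bound variables are omitted in this sketch.)
    return DRS(b1.domain | b2.domain, b1.conditions | b2.conditions)

# "A record date", following Figure 1: the determiner introduces x,
# the noun compound introduces y with its conditions.
np_drs = merge(DRS(frozenset({"x"})),
               DRS(frozenset({"y"}), frozenset({"record(y)", "nn(y,x)", "date(x)"})))
print(sorted(np_drs.domain), sorted(np_drs.conditions))
# ['x', 'y'] ['date(x)', 'nn(y,x)', 'record(y)']
```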
[Fig. 6 schematic: the LAP (tokenizer, tagger, NER, parser, coreference resolution) feeds the EDA, which generates parse-tree derivations (derived trees, derivation steps from T to H), searches the tree space for good candidates, and classifies them; syntactic and lexical knowledge COMPONENTS supply entailment rules; input is the initial parse tree of T&H, output the entailment decision.]
Fig. 6. System decomposition for the transformation-based BIUTEE system
7 System Mapping
In this section, we show the practical usability of our architecture. We illustrate how three existing RTE systems can be mapped concretely onto our architecture: BIUTEE, a transformation-based system; EDITS, an alignment-based system; and TIE, a multi-stage classification system, which represent different approaches to RTE. We also discuss mapping for formal reasoning-based systems.
7.1 BIUTEE
BIUTEE (Stern and Dagan, 2011) is a system for recognizing textual entailment based on the transformation-based approach outlined in Section 2.2, developed at Bar-Ilan University (BIU).12 The system derives the Hypothesis (H) from the Text (T) with a sequence of rewrite steps. Since BIUTEE represents H and T as dependency trees, these rewrite steps can either involve just lexical knowledge (node rewrites) or lexico-syntactic knowledge (subtree rewrites). Additional operations (e.g., the insertion of syntactic structure or modification of POS types) ensure the existence of a derivation for any T-H pair. The likelihood that a derivation preserves entailment is finally calculated by a confidence model.
Figure 6 shows the decomposition of BIUTEE in terms of the EXCITEMENT architecture. The LAP provides all required annotations, including parse trees. The EDA encapsulates the "top level" of the algorithm. It consists of three parts: a generation part where rewrite steps are performed, i.e., entailed trees are generated; a search part which selects good candidates among the generated derivations; and a classifier. The generation part relies on entailment rules which are stored modularly in (lexico-)syntactic knowledge components and lexical knowledge components. The
12 BIUTEE 2.4.1 is downloadable from http://www.cs.biu.ac.il/~nlp/downloads/biutee/
Padó et al. (2015)
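To make the generate-search-classify loop above concrete, here is a minimal sketch of a transformation-based derivation search in the spirit of BIUTEE. Trees are simplified to token tuples and entailment rules to (lhs, rhs) rewrites; all names are illustrative assumptions, not the real BIUTEE API, and the confidence model is replaced by a simple overlap heuristic.

```python
def apply_rule(tokens, rule):
    lhs, rhs = rule
    return tuple(rhs if t == lhs else t for t in tokens)

def derive(t, h, rules, beam=10, max_steps=3):
    """Search for a sequence of rewrite steps turning T into H."""
    frontier = [(t, [])]  # (current "tree", derivation so far)
    for _ in range(max_steps):
        candidates = [(apply_rule(tokens, r), deriv + [r])
                      for tokens, deriv in frontier for r in rules]
        # "good candidates": keep derivations whose trees overlap most with H;
        # BIUTEE instead scores derivations with a learned confidence model.
        candidates.sort(key=lambda c: -len(set(c[0]) & set(h)))
        frontier = candidates[:beam]
        for tokens, deriv in frontier:
            if tokens == h:
                return deriv
    return None

t = tuple("a man is pointing at a silver sedan".split())
h = tuple("a man is pointing at a silver car".split())
print(derive(t, h, rules=[("sedan", "car"), ("man", "person")]))
# [('sedan', 'car')]
```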
[Figure 1 schematic: token inputs x1–x9 with LSTM cell states c1–c9 and output vectors h1–h9 over the pair "A wedding party taking pictures :: Someone got married" (Premise :: Hypothesis); panels (A) Conditional Encoding, (B) Attention, (C) Word-by-word Attention.]
Figure 1: Recognizing textual entailment using (A) conditional encoding via two LSTMs, one over the premise and one over the hypothesis conditioned on the representation of the premise (c5), (B) attention only based on the last output vector (h9) or (C) word-by-word attention based on all output vectors of the hypothesis (h7, h8 and h9).
state is initialized with the last cell state of the previous LSTM (c5 in the example), i.e. it is conditioned on the representation that the first LSTM built for the premise (A). We use word2vec vectors (Mikolov et al., 2013) as word representations, which we do not optimize during training. Out-of-vocabulary words in the training set are randomly initialized by sampling values uniformly from (−0.05, 0.05) and optimized during training.1 Out-of-vocabulary words encountered at inference time on the validation and test corpus are set to fixed random vectors. By not tuning representations of words for which we have word2vec vectors, we ensure that at inference time their representation stays close to unseen similar words for which we have word2vec embeddings. We use a linear layer to project word vectors to the dimensionality of the hidden size of the LSTM, yielding input vectors x_i. Finally, for classification we use a softmax layer over the output of a non-linear projection of the last output vector (h9 in the example) into the target space of the three classes (ENTAILMENT, NEUTRAL or CONTRADICTION), and train using the cross-entropy loss.
2.3 ATTENTION
Attentive neural networks have recently demonstrated success in a wide range of tasks ranging from handwriting synthesis (Graves, 2013), digit classification (Mnih et al., 2014), machine translation (Bahdanau et al., 2015), image captioning (Xu et al., 2015), speech recognition (Chorowski et al., 2015) and sentence summarization (Rush et al., 2015), to geometric reasoning (Vinyals et al., 2015). The idea is to allow the model to attend over past output vectors (see Figure 1 B), thereby mitigating the LSTM's cell state bottleneck. More precisely, an LSTM with attention for RTE does not need to capture the whole semantics of the premise in its cell state. Instead, it is sufficient to output vectors while reading the premise and accumulating a representation in the cell state that informs the second LSTM which of the output vectors of the premise it needs to attend over to determine the RTE class.
Let Y ∈ R^{k×L} be a matrix consisting of output vectors [h_1 ⋯ h_L] that the first LSTM produced when reading the L words of the premise, where k is a hyperparameter denoting the size of embeddings and hidden layers. Furthermore, let e_L ∈ R^L be a vector of 1s and h_N be the last output vector after the premise and hypothesis were processed by the two LSTMs respectively. The attention mechanism will produce a vector α of attention weights and a weighted representation r of the premise via

M = tanh(W^y Y + W^h h_N ⊗ e_L)    M ∈ R^{k×L}    (7)
α = softmax(w^T M)                 α ∈ R^L        (8)
r = Y α^T                          r ∈ R^k        (9)
1 We found 12.1k words in SNLI for which we could not obtain word2vec embeddings, resulting in 3.65M tunable parameters.
Rocktäschel et al. (2016)
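Equations (7)–(9) are straightforward to implement. The following NumPy sketch mirrors them directly; the parameter names W_y, W_h and w follow the paper, while the random toy inputs are, of course, hypothetical.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def premise_attention(Y, h_N, W_y, W_h, w):
    """Eqs. (7)-(9): Y is (k, L), h_N is (k,), W_y and W_h are (k, k), w is (k,)."""
    k, L = Y.shape
    # Eq. (7): the outer product h_N ⊗ e_L tiles h_N across all L premise positions.
    M = np.tanh(W_y @ Y + np.outer(W_h @ h_N, np.ones(L)))
    alpha = softmax(w @ M)   # Eq. (8): attention weights over premise words, shape (L,)
    r = Y @ alpha            # Eq. (9): attention-weighted premise representation, shape (k,)
    return alpha, r

rng = np.random.default_rng(0)
k, L = 4, 6
alpha, r = premise_attention(rng.normal(size=(k, L)), rng.normal(size=k),
                             rng.normal(size=(k, k)), rng.normal(size=(k, k)),
                             rng.normal(size=k))
assert np.isclose(alpha.sum(), 1.0) and r.shape == (k,)
```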
Natural Language Inference
Natural Language Inference. (MacCartney PhD Thesis 2009)
No man is pointing at a car.
A man is pointing at a silver sedan.
A man is pointing at a silver car.   SUB(sedan, car)   ⊏   yes
A man is pointing at a car.          DEL(silver)       ⊏   yes
No man is pointing at a car.         INS(no)           ^   no
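A minimal sketch of how MacCartney-style edit-sequence inference composes these per-edit relations into a sentence-level decision. The join table below covers only the three compositions needed for this example (MacCartney's full join table covers all 7×7 relation pairs), so treat the fallback to independence as a simplifying assumption.

```python
# Relation symbols follow MacCartney's natural logic.
EQ, FWD, NEG, ALT, IND = "≡", "⊏", "^", "|", "#"

JOIN = {
    (EQ, FWD): FWD,   # ≡ then ⊏ gives ⊏
    (FWD, FWD): FWD,  # ⊏ then ⊏ gives ⊏
    (FWD, NEG): ALT,  # ⊏ then ^ gives | (alternation, i.e. contradiction)
}

def compose(edit_relations):
    """Join per-edit entailment relations left to right, starting from ≡."""
    rel = EQ
    for r in edit_relations:
        rel = JOIN.get((rel, r), IND)  # unhandled joins default to independence
    return rel

# Premise: "A man is pointing at a silver sedan."
# Hypothesis: "No man is pointing at a car."
edits = [FWD,  # SUB(sedan, car): sedan ⊏ car
         FWD,  # DEL(silver): dropping a restrictive modifier
         NEG]  # INS(no): negating the sentence
print(compose(edits))  # "|" -> the hypothesis contradicts the premise: answer "no"
```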
Natural Language Inference
A man is pointing at a silver sedan.
Jimmy Dean refused to dance without pants.
Natural Language Inference. (MacCartney PhD Thesis 2009)
Natural Language Inference
A man is pointing at a silver sedan.
Last December they had argued that the council had failed to consider possible environmental effects of contaminated land at the site.
Natural Language Inference
Last December they had argued that the council had failed to consider possible environmental effects of contaminated land at the site.
Question Answering
Did the council consider the environmental effects?
Yes/No
Natural Language Inference
Summarization
They argued that the council didn’t consider environmental effects.
Last December they had argued that the council had failed to consider possible environmental effects of contaminated land at the site.
Natural Language Inference
Dialogue
Remove from calendar?
I decided I don’t want to go to that party on Saturday.
Wish List
• Models of language that are descriptive rather than prescriptive, based on judgements by real people rather than linguists
• Methods that are derived from data, rather than reliant on lexicons or ontologies
• Natural language itself as a logical form, rather than requiring an intermediate representation
Wish List
• We want to get out of the "lab" and model language that people actually use
• We want models of language that are descriptive rather than prescriptive, based on judgements by real people rather than linguists
• We want methods that are derived from data, rather than reliant on lexicons or ontologies
Wish List
• We want to get out of the "lab" and model language that people actually use
• We want rules for logical composition that are descriptive rather than prescriptive, based on judgements by real people
• We want methods that are derived from data, rather than reliant on lexicons or ontologies
Last December they had argued that the council had failed to consider possible effects of contaminated land at the site.
The council considered environmental consequences.

lexical semantics: SUB(effects, consequences)
modifiers: INS(environmental)
(non-subsective) modifiers: DEL(possible)
predicates: SUB(consider, consider)
"higher order" predicates: DEL(fail to)
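For illustration, the atomic edits above can be applied mechanically to a tokenized sentence. This sketch ignores phrase boundaries, POS and syntax, which real systems handle over aligned parse trees; the helper names and the tense-normalizing SUB at the end are hypothetical.

```python
def SUB(tokens, old, new):
    return [new if t == old else t for t in tokens]

def DEL(tokens, phrase):
    """Drop every occurrence of a (possibly multi-word) phrase."""
    out, skip, i = [], phrase.split(), 0
    while i < len(tokens):
        if tokens[i:i + len(skip)] == skip:
            i += len(skip)
        else:
            out.append(tokens[i]); i += 1
    return out

def INS(tokens, index, word):
    return tokens[:index] + [word] + tokens[index:]

s = "the council failed to consider possible effects".split()
s = SUB(s, "effects", "consequences")   # lexical semantics
s = INS(s, len(s) - 1, "environmental") # modifiers
s = DEL(s, "possible")                  # (non-subsective) modifiers
s = DEL(s, "failed to")                 # "higher order" predicates
s = SUB(s, "consider", "considered")    # predicates (tense normalization)
print(" ".join(s))
# the council considered environmental consequences
```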
Lexical Semantics

WordNet
[WordNet noun hierarchy fragment: entity > physical entity > person > male child > boy > Cub Scout; person > female child > girl; entity > abstraction > right > legal right; also cabotage, body of water.]
a boy is a male child
a Cub Scout is a boy
a girl is NOT a boy

Bilingual Pivoting (PPDB)
[Aligned bitext fragments used for pivoting: "ahogados a la playa ..." / "get washed up on beaches ..."; "... fünf Landwirte , weil ..." / "... 5 farmers were in Ireland ..."; "oder wurden , gefoltert" / "or have been , tortured"; "festgenommen" / "thrown into jail"; "festgenommen" / "imprisoned".]

Vector Space Models (word2vec)
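Bilingual pivoting, the technique behind PPDB, scores two English phrases as paraphrases when they share foreign translations: p(e2 | e1) = Σ_f p(e2 | f) p(f | e1). A toy sketch follows, with invented translation probabilities (the German pivot "festgenommen" echoes the example above); only the summation over pivots is faithful to the method.

```python
from collections import defaultdict

# Toy translation tables: p(f | e) and p(e | f). All numbers are invented.
p_f_given_e = {"thrown into jail": {"festgenommen": 0.6, "eingesperrt": 0.4}}
p_e_given_f = {"festgenommen": {"imprisoned": 0.5, "arrested": 0.3, "thrown into jail": 0.2},
               "eingesperrt": {"imprisoned": 0.7, "locked up": 0.3}}

def paraphrase_probs(e1):
    """p(e2 | e1) = sum over foreign pivot phrases f of p(e2 | f) * p(f | e1)."""
    probs = defaultdict(float)
    for f, p_f in p_f_given_e.get(e1, {}).items():
        for e2, p_e2 in p_e_given_f.get(f, {}).items():
            if e2 != e1:
                probs[e2] += p_e2 * p_f
    return dict(probs)

print(paraphrase_probs("thrown into jail"))
# {'imprisoned': 0.58, 'arrested': 0.18, 'locked up': 0.12}
```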
Lexical Semantics
[PPDB paraphrase candidates for "boy" (one slide-wide table): kid, girl, doggy, mother, sons, dude, pup, mommy, guys, fella, ah, father, males, laddie, infant, student, son, gentleman, pooch, person, child, male, yarn, type, boyfriend, youth, sonny, cheeky, hans, juvenile, childhood, buster, lad, toddler, husband, guy, puppy, servant, offspring, wraps, fellow, grandson, wee, gus, mate, colt, idiot, bollocks, little, darling, partner, teenager, bro, teen, toy, baby, bloke, junior, old-timer, waiter, boss, calf, men, kiddo, bit, protege, dog, apprentice, daughter, sage, buddy, brat, foal, kitty, friend, lapdog, bearer, bloodhound, does, children, shorty, homie, pops, bachelor, cub, youngster, soldier, chum, wolf, brother, sweetheart, blood, honey]

boy ⊏ kid    boy ⊏ person
boy ∣ girl    boy ∣ puppy
boy # toddler    boy # idiot

The Paraphrase Database
~100M word and phrase pairs
Natural Logic
[Figure residue: example word pairs spanning diverse relations — couch/sofa, crow/bird, unable/able, feline/carnivore/canine, hippo/food — motivating a relation inventory rich enough to capture these diverse relations.]
CHAPTER 5. ENTAILMENT RELATIONS (MacCartney, 2009)
[Figure 5.2 shows the 16 elementary set relations R0000–R1111 as Johnston diagrams.]
Figure 5.2: The 16 elementary set relations, represented by Johnston diagrams. Each box represents the universe U, and the two circles within the box represent the sets x and y. A region is white if it is empty, and shaded if it is non-empty. Thus in the diagram labeled R1101, only the region x − y is empty, indicating that ∅ ⊂ x ⊂ y ⊂ U.
…equivalence class in which only partition 10 is empty.) These equivalence classes are depicted graphically in figure 5.2.
In fact, each of these equivalence classes is a set relation, that is, a set of ordered pairs of sets. We will refer to these 16 set relations as the elementary set relations, and we will denote this set of 16 relations by R. By construction, the relations in R are both mutually exhaustive (every ordered pair of sets belongs to some relation in R) and mutually exclusive (no ordered pair of sets belongs to two different relations in R). Thus, every ordered pair of sets can be assigned to exactly one relation in R.
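The assignment of an ordered pair of sets to one of the 16 elementary relations can be computed directly from the emptiness pattern of the four partitions. A sketch, assuming the bit order (partition 00, 01, 10, 11) that makes the R1101 example in the caption come out right; the function name is illustrative.

```python
def elementary_relation(x, y, U):
    """Label (x, y) in universe U with its elementary set relation R_abcd.

    Each element of U falls into one partition, indexed by (in x?, in y?);
    the label records non-emptiness (1) or emptiness (0) of partitions
    00, 01, 10, 11 in turn (an assumed ordering consistent with the text's
    example: R1101 has only partition 10, i.e. x - y, empty).
    """
    partitions = [
        U - (x | y),  # 00: in neither set
        y - x,        # 01: in y only
        x - y,        # 10: in x only
        x & y,        # 11: in both
    ]
    return "R" + "".join("1" if p else "0" for p in partitions)

U = set(range(5))
print(elementary_relation({0, 1}, {0, 1, 2}, U))  # R1101: forward entailment
print(elementary_relation({0, 1}, {0, 1}, U))     # R1001: equivalence
print(elementary_relation({0, 1}, {2, 3}, U))     # R1110: alternation
```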
equivalence · forward entailment · reverse entailment · negation · alternation · independence
synonym · hypernymy (e.g. Thai ⊏ Asian) · hyponymy · antonym · shared hypernym · no path
Figure 2: Mappings of set-theoretic entailment relations onto the WordNet hierarchy. In the Venn diagrams, reproduced from (MacCartney, 2009), the left circle represents states in which x is true and the right circle states in which y is true. The shaded area represents all possible true states. E.g. when x ≡ y (equivalence), in every true state, either both x and y are true, or neither is.
3 Entailment Relations
We use the relations from Bill MacCartney's thesis on natural language inference as the basis for our categorization of relations (MacCartney, 2009). MacCartney's work focused on integrating the semantic properties previously employed by systems for question answering (Harabagiu and Hickl, 2006) and RTE (Bar-Haim et al., 2007) within the formal theory of natural logic (Lakoff, 1972). As a result, he provides a simple framework which models lexical entailment in 7 "basic entailment relationships":
Equivalence (≡): if X is true then Y is true, and if Y is true then X is true.
Forward entailment (⊏): if X is true then Y is true, but if Y is true then X may or may not be true.
Reverse entailment (⊐): if Y is true then X is true, but if X is true then Y may or may not be true.
Negation (^): if X is true then Y is false, and if Y is false then X is true; either X or Y must be true.
Alternation (|): if X is true then Y is false, but if Y is false then X may or may not be true.
Cover (⌣): if X is true then Y may or may not be true, and if Y is true then X may or may not be true; either X or Y must be true. We omit this relation, since its applicability to RTE is not clear.
Independence (#): if X is true then Y may or may not be true, and if Y is true then X may or may not be true.
4 Extracting entailment relations from WordNet
4.1 Mapping onto Natural Logic relations
We would like to train a model to automatically distinguish between the relationships described above. In order to gather labelled training data, we first look to the information available in the existing WordNet hierarchy. For roughly 2.5 million (60%) of the noun pairs in PPDB, both nouns appear in WordNet (although not necessarily in the same synset). We use the rules in Table 2 (shown graphically in Figure 2) to map a pair of nodes in the WordNet noun hierarchy onto one of the basic entailment relations described in section 3.
Other Relatedness In addition to MacCartney's relations, we define a sixth catch-all category for terms which are flagged as related by WordNet but whose relation is not built into the hierarchical structure. These noun pairs do not meet the criteria of the basic entailment relations but carry more information than do truly independent terms. We combine holonymy (part/whole relationships), attributes (adjectives closely tied to a specific noun), and derivationally related terms as "other" relations.
4.2 Shortcomings of WordNet labeling
Our definitions give the desired results for the ≡, ⊐, ⊏, and "other" categories. Table 1 shows some examples of nouns in PPDB which were assigned to each of these labels. However, whereas MacCartney's alternation is strictly contradictory (X → ¬Y), co-hyponyms of a common parent in WordNet do not necessarily have this property. Some examples behave well (e.g. lunch and dinner share the parent meal), but others are better labelled as ⊏ or # (e.g. boy and guy share the parent man, and brother and worker share the parent person). As a result, the WordNet alternation provides no information in terms of entailment, behaving instead like MacCartney's independence (i.e. "if X is true then Y may or may not be true").
Adding Semantics to Data-Driven Paraphrasing. (Pavlick et al. ACL 2015)
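A rough sketch of the kind of WordNet-to-natural-logic mapping described in Section 4.1, using NLTK's WordNet interface. This approximates Pavlick et al.'s Table 2 at the word level rather than the synset level, and the co-hyponym test only checks direct shared hypernyms; both are simplifying assumptions.

```python
from nltk.corpus import wordnet as wn  # requires: nltk.download("wordnet")

def natural_logic_relation(x, y):
    """Approximately map a WordNet noun pair to a MacCartney relation."""
    sx, sy = set(wn.synsets(x, "n")), set(wn.synsets(y, "n"))
    if sx & sy:
        return "≡"   # shared synset: synonyms
    hypers = lambda s: s.hypernyms()
    if any(t in s.closure(hypers) for s in sx for t in sy):
        return "⊏"   # y is a hypernym of x
    if any(t in s.closure(hypers) for s in sy for t in sx):
        return "⊐"   # y is a hyponym of x
    x_antonyms = {a.name() for s in sx for l in s.lemmas() for a in l.antonyms()}
    if any(l.name() in x_antonyms for s in sy for l in s.lemmas()):
        return "^"   # antonyms
    if any(set(s1.hypernyms()) & set(s2.hypernyms()) for s1 in sx for s2 in sy):
        return "|"   # shared hypernym: the problematic alternation of Section 4.2
    return "#"       # no path: independence

print(natural_logic_relation("boy", "person"))  # ⊏ (boy < male child < ... < person)
```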
Natural Logic
to capture these diverse relations.
couch !
sofa
crow
bird
unableable
feline
carni vore
canine
hippo food
CHAPTER 5. ENTAILMENT RELATIONS 71
R0000 R0001 R0010 R0011
R0100 R0101 R0111
R1000 R1010 R1011
R1100 R1101 R1110 R1111
R0110
R1001
Figure 5.2: The 16 elementary set relations, represented by Johnston diagrams. Eachbox represents the universe U , and the two circles within the box represent the setsx and y. A region is white if it is empty, and shaded if it is non-empty. Thus in thediagram labeled R1101, only the region x� y is empty, indicating that � � x � y � U .
equivalence class in which only partition 10 is empty.) These equivalence classes aredepicted graphically in figure 5.2.
In fact, each of these equivalence classes is a set relation, that is, a set of orderedpairs of sets. We will refer to these 16 set relations as the elementary set relations,and we will denote this set of 16 relations by R. By construction, the relations in R
are both mutually exhaustive (every ordered pair of sets belongs to some relation inR) and mutually exclusive (no ordered pair of sets belongs to two different relationsin R). Thus, every ordered pair of sets can be assigned to exactly one relation in R.
equivalence
CHAPTER 5. ENTAILMENT RELATIONS 71
R0000 R0001 R0010 R0011
R0100 R0101 R0111
R1000 R1010 R1011
R1100 R1101 R1110 R1111
R0110
R1001
Figure 5.2: The 16 elementary set relations, represented by Johnston diagrams. Eachbox represents the universe U , and the two circles within the box represent the setsx and y. A region is white if it is empty, and shaded if it is non-empty. Thus in thediagram labeled R1101, only the region x� y is empty, indicating that � � x � y � U .
equivalence class in which only partition 10 is empty.) These equivalence classes aredepicted graphically in figure 5.2.
In fact, each of these equivalence classes is a set relation, that is, a set of orderedpairs of sets. We will refer to these 16 set relations as the elementary set relations,and we will denote this set of 16 relations by R. By construction, the relations in R
are both mutually exhaustive (every ordered pair of sets belongs to some relation inR) and mutually exclusive (no ordered pair of sets belongs to two different relationsin R). Thus, every ordered pair of sets can be assigned to exactly one relation in R.
CHAPTER 5. ENTAILMENT RELATIONS 71
R0000 R0001 R0010 R0011
R0100 R0101 R0111
R1000 R1010 R1011
R1100 R1101 R1110 R1111
R0110
R1001
Figure 5.2: The 16 elementary set relations, represented by Johnston diagrams. Eachbox represents the universe U , and the two circles within the box represent the setsx and y. A region is white if it is empty, and shaded if it is non-empty. Thus in thediagram labeled R1101, only the region x� y is empty, indicating that � � x � y � U .
equivalence class in which only partition 10 is empty.) These equivalence classes aredepicted graphically in figure 5.2.
In fact, each of these equivalence classes is a set relation, that is, a set of orderedpairs of sets. We will refer to these 16 set relations as the elementary set relations,and we will denote this set of 16 relations by R. By construction, the relations in R
are both mutually exhaustive (every ordered pair of sets belongs to some relation inR) and mutually exclusive (no ordered pair of sets belongs to two different relationsin R). Thus, every ordered pair of sets can be assigned to exactly one relation in R.
CHAPTER 5. ENTAILMENT RELATIONS 71
R0000 R0001 R0010 R0011
R0100 R0101 R0111
R1000 R1010 R1011
R1100 R1101 R1110 R1111
R0110
R1001
Figure 5.2: The 16 elementary set relations, represented by Johnston diagrams. Eachbox represents the universe U , and the two circles within the box represent the setsx and y. A region is white if it is empty, and shaded if it is non-empty. Thus in thediagram labeled R1101, only the region x� y is empty, indicating that � � x � y � U .
equivalence class in which only partition 10 is empty.) These equivalence classes aredepicted graphically in figure 5.2.
In fact, each of these equivalence classes is a set relation, that is, a set of orderedpairs of sets. We will refer to these 16 set relations as the elementary set relations,and we will denote this set of 16 relations by R. By construction, the relations in R
are both mutually exhaustive (every ordered pair of sets belongs to some relation inR) and mutually exclusive (no ordered pair of sets belongs to two different relationsin R). Thus, every ordered pair of sets can be assigned to exactly one relation in R.
CHAPTER 5. ENTAILMENT RELATIONS 71
R0000 R0001 R0010 R0011
R0100 R0101 R0111
R1000 R1010 R1011
R1100 R1101 R1110 R1111
R0110
R1001
Figure 5.2: The 16 elementary set relations, represented by Johnston diagrams. Eachbox represents the universe U , and the two circles within the box represent the setsx and y. A region is white if it is empty, and shaded if it is non-empty. Thus in thediagram labeled R1101, only the region x� y is empty, indicating that � � x � y � U .
equivalence class in which only partition 10 is empty.) These equivalence classes aredepicted graphically in figure 5.2.
In fact, each of these equivalence classes is a set relation, that is, a set of orderedpairs of sets. We will refer to these 16 set relations as the elementary set relations,and we will denote this set of 16 relations by R. By construction, the relations in R
are both mutually exhaustive (every ordered pair of sets belongs to some relation inR) and mutually exclusive (no ordered pair of sets belongs to two different relationsin R). Thus, every ordered pair of sets can be assigned to exactly one relation in R.
CHAPTER 5. ENTAILMENT RELATIONS 71
R0000 R0001 R0010 R0011
R0100 R0101 R0111
R1000 R1010 R1011
R1100 R1101 R1110 R1111
R0110
R1001
Figure 5.2: The 16 elementary set relations, represented by Johnston diagrams. Eachbox represents the universe U , and the two circles within the box represent the setsx and y. A region is white if it is empty, and shaded if it is non-empty. Thus in thediagram labeled R1101, only the region x� y is empty, indicating that � � x � y � U .
equivalence class in which only partition 10 is empty.) These equivalence classes aredepicted graphically in figure 5.2.
In fact, each of these equivalence classes is a set relation, that is, a set of orderedpairs of sets. We will refer to these 16 set relations as the elementary set relations,and we will denote this set of 16 relations by R. By construction, the relations in R
are both mutually exhaustive (every ordered pair of sets belongs to some relation inR) and mutually exclusive (no ordered pair of sets belongs to two different relationsin R). Thus, every ordered pair of sets can be assigned to exactly one relation in R.
forward entailment
reverse entailment
negation
alternation
independence no pathhypernymy
hyponymy shared hypernym
antonymsynonym
Thai
Asian
Figure 2: Mappings of set-theorhetic entailment relationsonto the WordNet hierarchy. In the Venn diagrams, re-produced from (MacCartney, 2009), the left circle repre-sents states in which x is true and the right circle states inwhich y is true. The shaded area represents all possibletrue states. E.g. when x ⌘ y (equivalence), in every truestate, either both x and y are true, or neither is.
3 Entailment Relations
We use the relations from Bill MacCartney’s the-sis on natural language inference as the basis forour categorization of relations (MacCartney, 2009).MacCartney’s work focused on integrating the se-mantic properties previously employed by systemsfor question answering (Harabagiu and Hickl, 2006)and RTE (Bar-Haim et al., 2007) within the formaltheory of natural logic (Lakoff, 1972). As a result,he provides a simple framework which models lexi-cal entailment in 7 “basic entailment relationships”:
Equivalence (⌘): if X is true then Y is true, andif Y is true then X is true.
Forward entailment (@): if X is true then Y istrue, but if Y is true then X may or may not be true.
Reverse entailment (A): if Y is true then X istrue, but if X is true then Y may or may not be true.
Negation (^): if X is true then Y is false, and ifY is false then X is true; either X or Y must be true.
Alternation (|): if X is true then Y is false, but ifY is false then X may or may not be true.
Cover (^): if X is true then Y may or may not betrue, and if Y is true then X may or may not be true;
either X or Y must be true. We omit this relation,since its applicability to RTE is not clear.
Independence (#): if X is true then Y may ormay not be true, and if Y is true then X may or maynot be true.
4 Extracting entailment relations fromWordNet
4.1 Mapping onto Natural Logic relations
We would like to train a model to automatically dis-tinguish between the relationships described above.In order to gather labelled training data, we first lookto the information available in the existing Word-Net hierarchy. For roughly 2.5 million (60%) of thenoun pairs in PPDB, both nouns appear in WordNet(although not necessarily in the same synset). Weuse the rules in Table 2 (shown graphically in Fig-ure 2) to map a pair of nodes in the WordNet nounhierarchy onto one of the basic entailment relationsdescribed in section 3.
Other Relatedness In addition to MacCartney’srelations, we define a sixth catch-all category forterms which are flagged as related by WordNet butwhose relation is not built into the hierarchical struc-ture. These noun pairs do not meet the criteria of thebasic entailment relations but carry more informa-tion than do truly independent terms. We combineholonymy (part/whole relationships), attributes (ad-jectives closely tied to a specific noun), and deriva-tionally related terms as “other” relations.
4.2 Shortcomings of WordNet labeling
Our definitions give the desired results for the ⌘,A, @, and “other” categories. Table 1 shows someexamples of nouns in PPDB which were assignedto each of these labels. However, whereas Mac-Cartney’s alternation is strictly contradictory (X !¬Y ), co-hyponyms of a common parent in WordNetdo not necessarily have this property. Some exam-ples behave well (e.g. lunch and dinner share theparent meal), but others are better labelled as @ or #(e.g. boy and guy share the parent man and brotherand worker share the parent person). As a result,the WordNet alternation provides no information interms of entailment, behaving instead like MacCart-ney’s independence (i.e. “if X is true then Y may or
Equivalence
Adding Semantics to Data-Driven Paraphrasing. (Pavlick et al. ACL 2015)
Natural Logic
to capture these diverse relations.
couch !
sofa
crow
bird
unableable
feline
carni vore
canine
hippo food
CHAPTER 5. ENTAILMENT RELATIONS 71
R0000 R0001 R0010 R0011
R0100 R0101 R0111
R1000 R1010 R1011
R1100 R1101 R1110 R1111
R0110
R1001
Figure 5.2: The 16 elementary set relations, represented by Johnston diagrams. Eachbox represents the universe U , and the two circles within the box represent the setsx and y. A region is white if it is empty, and shaded if it is non-empty. Thus in thediagram labeled R1101, only the region x� y is empty, indicating that � � x � y � U .
equivalence class in which only partition 10 is empty.) These equivalence classes aredepicted graphically in figure 5.2.
In fact, each of these equivalence classes is a set relation, that is, a set of orderedpairs of sets. We will refer to these 16 set relations as the elementary set relations,and we will denote this set of 16 relations by R. By construction, the relations in R
are both mutually exhaustive (every ordered pair of sets belongs to some relation inR) and mutually exclusive (no ordered pair of sets belongs to two different relationsin R). Thus, every ordered pair of sets can be assigned to exactly one relation in R.
equivalence
CHAPTER 5. ENTAILMENT RELATIONS 71
R0000 R0001 R0010 R0011
R0100 R0101 R0111
R1000 R1010 R1011
R1100 R1101 R1110 R1111
R0110
R1001
Figure 5.2: The 16 elementary set relations, represented by Johnston diagrams. Eachbox represents the universe U , and the two circles within the box represent the setsx and y. A region is white if it is empty, and shaded if it is non-empty. Thus in thediagram labeled R1101, only the region x� y is empty, indicating that � � x � y � U .
equivalence class in which only partition 10 is empty.) These equivalence classes aredepicted graphically in figure 5.2.
In fact, each of these equivalence classes is a set relation, that is, a set of orderedpairs of sets. We will refer to these 16 set relations as the elementary set relations,and we will denote this set of 16 relations by R. By construction, the relations in R
are both mutually exhaustive (every ordered pair of sets belongs to some relation inR) and mutually exclusive (no ordered pair of sets belongs to two different relationsin R). Thus, every ordered pair of sets can be assigned to exactly one relation in R.
CHAPTER 5. ENTAILMENT RELATIONS 71
R0000 R0001 R0010 R0011
R0100 R0101 R0111
R1000 R1010 R1011
R1100 R1101 R1110 R1111
R0110
R1001
Figure 5.2: The 16 elementary set relations, represented by Johnston diagrams. Eachbox represents the universe U , and the two circles within the box represent the setsx and y. A region is white if it is empty, and shaded if it is non-empty. Thus in thediagram labeled R1101, only the region x� y is empty, indicating that � � x � y � U .
equivalence class in which only partition 10 is empty.) These equivalence classes aredepicted graphically in figure 5.2.
In fact, each of these equivalence classes is a set relation, that is, a set of orderedpairs of sets. We will refer to these 16 set relations as the elementary set relations,and we will denote this set of 16 relations by R. By construction, the relations in R
are both mutually exhaustive (every ordered pair of sets belongs to some relation inR) and mutually exclusive (no ordered pair of sets belongs to two different relationsin R). Thus, every ordered pair of sets can be assigned to exactly one relation in R.
CHAPTER 5. ENTAILMENT RELATIONS 71
R0000 R0001 R0010 R0011
R0100 R0101 R0111
R1000 R1010 R1011
R1100 R1101 R1110 R1111
R0110
R1001
Figure 5.2: The 16 elementary set relations, represented by Johnston diagrams. Eachbox represents the universe U , and the two circles within the box represent the setsx and y. A region is white if it is empty, and shaded if it is non-empty. Thus in thediagram labeled R1101, only the region x� y is empty, indicating that � � x � y � U .
equivalence class in which only partition 10 is empty.) These equivalence classes aredepicted graphically in figure 5.2.
Figure 2: Mappings of set-theoretic entailment relations onto the WordNet hierarchy: synonym → equivalence, hyponymy → forward entailment, hypernymy → reverse entailment, antonym → negation, shared hypernym → alternation, no path → independence (e.g. Thai/Asian as a hyponym/hypernym pair). In the Venn diagrams, reproduced from (MacCartney, 2009), the left circle represents states in which x is true and the right circle states in which y is true. The shaded area represents all possible true states. E.g. when x ≡ y (equivalence), in every true state, either both x and y are true, or neither is.
3 Entailment Relations
We use the relations from Bill MacCartney's thesis on natural language inference as the basis for our categorization of relations (MacCartney, 2009). MacCartney's work focused on integrating the semantic properties previously employed by systems for question answering (Harabagiu and Hickl, 2006) and RTE (Bar-Haim et al., 2007) within the formal theory of natural logic (Lakoff, 1972). As a result, he provides a simple framework which models lexical entailment in 7 "basic entailment relationships" (a sketch of these definitions as set tests follows the list):
Equivalence (≡): if X is true then Y is true, and if Y is true then X is true.

Forward entailment (⊏): if X is true then Y is true, but if Y is true then X may or may not be true.

Reverse entailment (⊐): if Y is true then X is true, but if X is true then Y may or may not be true.

Negation (^): if X is true then Y is false, and if Y is false then X is true; either X or Y must be true.

Alternation (|): if X is true then Y is false, but if Y is false then X may or may not be true.

Cover (⌣): if X is true then Y may or may not be true, and if Y is true then X may or may not be true; either X or Y must be true. We omit this relation, since its applicability to RTE is not clear.

Independence (#): if X is true then Y may or may not be true, and if Y is true then X may or may not be true.
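Read as sets of possible states (the Venn-diagram view in Figure 2), the seven definitions reduce to set tests. A minimal sketch, assuming x and y are non-empty proper subsets of the universe of states (MacCartney's non-vacuity assumption); the function name is mine:

```python
def basic_entailment_relation(x: set, y: set, universe: set) -> str:
    """Classify MacCartney's basic entailment relation between X and Y,
    viewed as the sets of states in which each is true."""
    if x == y:
        return "equivalence (≡)"
    if x < y:                           # every X-state is a Y-state, not vice versa
        return "forward entailment (⊏)"
    if x > y:
        return "reverse entailment (⊐)"
    disjoint = not (x & y)              # X and Y are never true together
    exhaustive = (x | y) == universe    # either X or Y must be true
    if disjoint and exhaustive:
        return "negation (^)"
    if disjoint:
        return "alternation (|)"
    if exhaustive:
        return "cover (⌣)"
    return "independence (#)"

states = set(range(10))
assert basic_entailment_relation({0, 1}, {0, 1, 2}, states) == "forward entailment (⊏)"
assert basic_entailment_relation({0, 1}, states - {0, 1}, states) == "negation (^)"
```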
4 Extracting entailment relations from WordNet
4.1 Mapping onto Natural Logic relations
We would like to train a model to automatically distinguish between the relationships described above. In order to gather labelled training data, we first look to the information available in the existing WordNet hierarchy. For roughly 2.5 million (60%) of the noun pairs in PPDB, both nouns appear in WordNet (although not necessarily in the same synset). We use the rules in Table 2 (shown graphically in Figure 2) to map a pair of nodes in the WordNet noun hierarchy onto one of the basic entailment relations described in section 3.
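A rough sketch of such a mapping using NLTK's WordNet interface; the specific rules below are an approximation of the idea, not the paper's exact Table 2:

```python
from nltk.corpus import wordnet as wn  # requires nltk.download("wordnet")

def wordnet_relation(noun_x: str, noun_y: str) -> str:
    """Roughly map a WordNet noun pair onto a basic entailment relation."""
    sx = wn.synsets(noun_x, pos=wn.NOUN)
    sy = wn.synsets(noun_y, pos=wn.NOUN)
    if not sx or not sy:
        return "independence (#)"            # no path
    if set(sx) & set(sy):
        return "equivalence (≡)"             # synonyms: share a synset
    hypers = lambda s: set(s.closure(lambda t: t.hypernyms()))
    if any(b in hypers(a) for a in sx for b in sy):
        return "forward entailment (⊏)"      # x is a hyponym of y
    if any(a in hypers(b) for a in sx for b in sy):
        return "reverse entailment (⊐)"      # x is a hypernym of y
    antonyms = {ant.synset() for s in sx for l in s.lemmas() for ant in l.antonyms()}
    if antonyms & set(sy):
        return "negation (^)"                # antonyms
    if any(set(a.hypernyms()) & set(b.hypernyms()) for a in sx for b in sy):
        return "alternation (|)"             # co-hyponyms of a shared parent
    return "independence (#)"

print(wordnet_relation("crow", "bird"))      # expected: forward entailment (⊏)
```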
Other Relatedness In addition to MacCartney's relations, we define a sixth catch-all category for terms which are flagged as related by WordNet but whose relation is not built into the hierarchical structure. These noun pairs do not meet the criteria of the basic entailment relations but carry more information than do truly independent terms. We combine holonymy (part/whole relationships), attributes (adjectives closely tied to a specific noun), and derivationally related terms as "other" relations.
4.2 Shortcomings of WordNet labeling
Our definitions give the desired results for the ≡, ⊐, ⊏, and "other" categories. Table 1 shows some examples of nouns in PPDB which were assigned to each of these labels. However, whereas MacCartney's alternation is strictly contradictory (X → ¬Y), co-hyponyms of a common parent in WordNet do not necessarily have this property. Some examples behave well (e.g. lunch and dinner share the parent meal), but others are better labelled as ⊏ or # (e.g. boy and guy share the parent man, and brother and worker share the parent person). As a result, the WordNet alternation provides no information in terms of entailment, behaving instead like MacCartney's independence (i.e. "if X is true then Y may or may not be true").
Natural Logic to capture these diverse relations. Slide examples by category: Equivalence (couch ≡ sofa); Entailment (crow/bird, feline/carnivore); Exclusion (able/unable, feline/canine); Independent (hippo/food).

Adding Semantics to Data-Driven Paraphrasing. (Pavlick et al., ACL 2015)
Monolingual Features: symmetric and asymmetric similarities based on dependency context, e.g. cosine: 0.431, lin: 0.544, balprec: 0.396, weeds: 0.289, … Each phrase (e.g. kid, little girl) is represented as a binary vector over dependency contexts such as nsubj-draw, lost-dep, nsubj-vomit, poss-shoe, dep-brother, nsubjpass-buckled, nsubj-yell, rcmod-tired, …

Discovery of Inference Rules from Text. (Lin and Pantel, SIGKDD 2001)
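A minimal sketch of the two kinds of monolingual scores named on the slide: a symmetric cosine and an asymmetric, Weeds-precision-style inclusion score over sparse dependency-context vectors (the context counts below are invented for illustration):

```python
import math
from collections import Counter

# Toy dependency-context count vectors for two phrases.
kid         = Counter({"nsubj-draw": 3, "poss-shoe": 1, "nsubj-yell": 2, "dep-brother": 1})
little_girl = Counter({"nsubj-draw": 2, "poss-shoe": 2, "rcmod-tired": 1, "nsubj-vomit": 1})

def cosine(u: Counter, v: Counter) -> float:
    """Symmetric similarity: angle between the two context vectors."""
    dot = sum(u[f] * v[f] for f in u if f in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

def weeds_precision(u: Counter, v: Counter) -> float:
    """Asymmetric: how much of u's context weight is covered by v's contexts?
    High values suggest u's contexts are included in v's (u ⊏ v)."""
    total = sum(u.values())
    shared = sum(w for f, w in u.items() if f in v)
    return shared / total if total else 0.0

print(cosine(kid, little_girl), weeds_precision(little_girl, kid))
```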
Monolingual Features: lexico-syntactic patterns. Example sentence: "…if this can happen to my little girl, it can happen to other kids…" (connected by advcl and nmod dependency arcs), yielding path features such as:

[X]<-nmod-[happen]-advcl->[happen]-nmod->[Y]:1
[X]<-conj-[Y]:1
[X]<-pobj-[as]<-prep-[know]<-rcmod-[Y]:1

Automatic acquisition of hyponyms from large text corpora. (Hearst, COLING 1992)
Semantic taxonomy induction from heterogenous evidence. (Snow et al., ACL 2006)
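For intuition, the classic surface form of such patterns is Hearst's: fixed lexico-syntactic templates that signal hyponymy. A toy regex sketch (illustrative only; the slide's dependency-path features generalize the same idea):

```python
import re

# (pattern, which capture group is the hyponym)
HEARST_PATTERNS = [
    (r"(\w+) such as (\w+)", "second"),      # "fruit such as apples"
    (r"(\w+) , including (\w+)", "second"),  # "metals , including copper"
    (r"(\w+) and other (\w+)", "first"),     # "dogs and other animals"
]

def extract_pairs(text: str):
    """Return (hyponym, hypernym) pairs matched by the templates."""
    pairs = []
    for pattern, hypo_slot in HEARST_PATTERNS:
        for a, b in re.findall(pattern, text):
            hypo, hyper = (b, a) if hypo_slot == "second" else (a, b)
            pairs.append((hypo, hyper))
    return pairs

print(extract_pairs("They grow fruit such as apples and keep dogs and other animals ."))
# [('apples', 'fruit'), ('dogs', 'animals')]
```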
Bilingual Features: phrases that share foreign translations in parallel corpora are candidate paraphrases. Slide examples: "ahogados a la playa …" / "get washed up on beaches …"; "… fünf Landwirte , weil …" / "… 5 farmers were in Ireland …"; "oder wurden , gefoltert" / "or have been , tortured"; "festgenommen" / "thrown into jail"; "festgenommen" / "imprisoned".

Paraphrasing with bilingual parallel corpora. (Bannard and Callison-Burch, ACL 2005)
PPDB: The paraphrase database. (Ganitkevitch et al., NAACL 2013)
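The bilingual signal comes from the pivot method: the probability that e2 paraphrases e1 is estimated by marginalizing over their shared foreign translations f, p(e2 | e1) = Σ_f p(e2 | f) p(f | e1). A toy sketch (the German pivots echo the slide; the probabilities are invented for illustration):

```python
from collections import defaultdict

# Toy translation tables: p(f | e1) and p(e2 | f).
p_f_given_e = {"thrown into jail": {"festgenommen": 0.6, "inhaftiert": 0.4}}
p_e_given_f = {
    "festgenommen": {"imprisoned": 0.5, "arrested": 0.3, "thrown into jail": 0.2},
    "inhaftiert":   {"imprisoned": 0.7, "jailed": 0.3},
}

def paraphrase_prob(e1: str) -> dict:
    """Pivot paraphrase scores: sum over shared foreign translations."""
    scores = defaultdict(float)
    for f, p_f in p_f_given_e.get(e1, {}).items():
        for e2, p_e2 in p_e_given_f.get(f, {}).items():
            if e2 != e1:
                scores[e2] += p_f * p_e2
    return dict(scores)

print(paraphrase_prob("thrown into jail"))
# {'imprisoned': 0.58, 'arrested': 0.18, 'jailed': 0.12}
```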
Figure 4: Confusion matrices for classifiers trained using only monolingual features (distributional and path), only bilingual features (paraphrase and translation), and all features. True labels are shown along rows, predicted along columns. The matrix is normalized along rows, so that the predictions for each (true) class sum to 100%.

Predicted label (monolingual features only):
      ≡    ⊐    ¬    #    ~
≡    58%  20%   4%  15%   3%
⊐    20%  51%   3%  18%   7%
¬    26%  14%  37%  17%   6%
#     8%  13%   2%  71%   6%
~    15%  21%   5%  36%  23%

Predicted label (bilingual features only):
      ≡    ⊐    ¬    #    ~
≡    62%  21%   5%   4%   8%
⊐    27%   5%   7%   7%  54%
¬     6%  14%  30%  36%  14%
#     1%   7%   6%  78%   8%
~     8%  19%   9%  30%  35%

Predicted label (all features):
      ≡    ⊐    ¬    #    ~
≡    83%  10%   0%   2%   4%
⊐     6%  76%   2%   7%   8%
¬     2%   8%  73%  13%   3%
#     1%   4%   2%  88%   6%
~     5%  10%   3%  18%  64%

Underlying raw counts (all features; rows = true label, columns = predicted indep. / equiv. / entail. / excl. / other, last column = row total):
equiv.     9   368    46    1    19  | 443
entail.   97    83  1004   29   108  | 1321
excl.     49     9    29  275    13  | 375
indep.  1730    15    82   35   114  | 1976
other    169    48    97   33   609  | 956

Adding Semantics to Data-Driven Paraphrasing. (Pavlick et al., ACL 2015)
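As a sanity check on the reconstruction above, row-normalizing the recovered all-features counts reproduces the percentages in the third matrix. A small sketch of the normalization the caption describes:

```python
# Each true-label row is divided by its row total, so that the
# predictions for each (true) class sum to 100%.
labels = ["indep.", "equiv.", "entail.", "excl.", "other"]
counts = {
    "equiv.":  [9, 368, 46, 1, 19],
    "entail.": [97, 83, 1004, 29, 108],
    "excl.":   [49, 9, 29, 275, 13],
    "indep.":  [1730, 15, 82, 35, 114],
    "other":   [169, 48, 97, 33, 609],
}

for true_label, row in counts.items():
    total = sum(row)
    pcts = [round(100 * c / total) for c in row]
    print(true_label, dict(zip(labels, pcts)))
# e.g. the "equiv." row normalizes to 2/83/10/0/4, matching the all-features matrix.
```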
Table 5: F1 measure (×100) achieved by the entailment classifier using 10-fold cross validation on the training data, and the change in F1 (ΔF1) when excluding each feature group (Lexical, Distributional, Path, Paraphrase, Translation, WordNet):

      All   Lex.  Dist.  Path  Para.  Tran.   WN
#      79   -2.0   -0.2  -1.2   -1.7   -0.2  -0.1
≡      57   -3.5   +0.2  -0.7   -2.4   -3.7  +0.5
⊐      68   -4.6   -0.3  -0.8   -0.8   -0.7  -1.6
¬      49   -4.0   -0.8  -2.9   +0.3   -0.0  -2.2
~      51   -4.9   -0.5  -0.7   -1.2   -0.9  -0.3
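A sketch of the Table 5 protocol under stated assumptions: per-class F1 from 10-fold cross-validation predictions, with scikit-learn and a logistic-regression stand-in for the paper's classifier (whose exact model the slide does not specify); the feature matrix X and label vector y are assumed given.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_predict

def per_class_f1(X, y, classes):
    """Per-class F1 (x100) from 10-fold cross-validation predictions."""
    clf = LogisticRegression(max_iter=1000)   # stand-in model, not the paper's
    preds = cross_val_predict(clf, X, y, cv=10)
    scores = f1_score(y, preds, labels=classes, average=None)
    return {c: round(100 * s) for c, s in zip(classes, scores)}

# Usage: per_class_f1(X, y, ["#", "≡", "⊐", "¬", "~"]) yields the "All" column.
```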
Table 3 shines some light onto the differences between monolingual and bilingual similarities. While the monolingual asymmetric metrics are good for identifying ⊐ pairs, the symmetric metrics consistently identify ¬ pairs; none of the monolingual scores we explored were effective in making the subtle distinction between ≡ pairs and the other types of paraphrase. In contrast, the bilingual similarity metric is fairly precise for identifying ≡ pairs, but provides less information for distinguishing between types of non-equivalent paraphrase. These differences are further exhibited in the confusion matrices shown in Figure 4; when the classifier is trained using only monolingual features, it misclassifies 26% of ¬ pairs as ≡, whereas the bilingual features make this error only 6% of the time. On the other hand, the bilingual features completely fail to predict the ⊐ class, calling over 80% of such pairs ≡ or ~.
7 Evaluation
7.1 Intrinsic Evaluation
We test the performance of our classifier intrinsically, through its ability to reproduce the human labels for the phrase pairs from the SICK test sentences. Table 7 shows the precision and recall achieved by the classifier for each of our 5 entailment classes. The classifier is able to achieve an overall 79% accuracy, reaching >70% precision while maintaining good levels of recall on all classes.
Table 6: Example misclassifications from some of the most frequent and most interesting error categories.

True  Pred.    N   Example misclassifications
~     #      169   boy/little, an empty/the air
#     ~      114   little/toy, color/hair
⊐     ~      108   drink/juice, ocean/surf
⊐     #       97   in front of/the face of, vehicle/horse
⊐     ≡       83   cat/kitten, pavement/sidewalk
≡     ⊐       46   big/grand, a girl/a young lady
⊐     ¬       29   kid/teenager, no small/a large
¬     ⊐       29   old man/young man, a car/a window
#     ≡       15   a person/one, a crowd/a large
≡     #        9   he is/man is, photo/still
≡     ¬        1   girl is/she is
Figure 4 shows the classifier's confusion matrix and Table 6 shows some examples of common and interesting error cases. The majority of errors (26%) come from confusing the ~ class with the # class. This mistake is not too concerning from an RTE perspective since ~ can be treated as a special case of # (Section 5). There are very few cases in which the classifier makes extreme errors, e.g. confusing ≡ with ¬ or with #; some interesting examples of such errors arise when the phrases contain pronouns (e.g. girl ≡ she) or when the relation uses a highly infrequent word sense (e.g. photo ≡ still).
7.2 The Nutcracker RTE System
To further test our classifier, we evaluate the usefulness of the automatic entailment predictions in a downstream RTE task. We run our experiments using Nutcracker, a state-of-the-art RTE system based on formal semantics (Bjerva et al., 2014). In the SemEval 2014 RTE challenge, this system
equiv.entail. excl. other indep.Table 1
mono Predicted label (using monolingual features)
Predicted label (using bilingual features)
Predicted label (using all features)
ind syn hyp exl oth ≣ ⊐ ¬ # ~ ≣ ⊐ ¬ # ~ ≣ ⊐ ¬ # ~syn 1 3 1 0 0 4 ≣ 58% 20% 4% 15% 3% 62% 21% 5% 4% 8% 83% 10% 0% 2% 4%
hyp 2 3 7 0 1 13 ⊐ 20% 51% 3% 18% 7% 27% 5% 7% 7% 54% 6% 76% 2% 7% 8%
exl 1 1 1 1 0 4 ¬ 26% 14% 37% 17% 6% 6% 14% 30% 36% 14% 2% 8% 73% 13% 3%
ind 14 2 2 0 1 20 # 8% 13% 2% 71% 6% 1% 7% 6% 78% 8% 1% 4% 2% 88% 6%
oth 3 1 2 0 2 10 ~ 15% 21% 5% 36% 23% 8% 19% 9% 30% 35% 5% 10% 3% 18% 64%
bi
ind syn hyp exl oth
syn 0 3 1 0 0 4
hyp 2 7 1 2 13 24
exl 1 0 1 1 1 4
ind 15 0 1 1 2 20
oth 3 1 2 1 3 10
both
ind syn hyp exl oth
syn 9 368 46 1 19 443
hyp 97 83 1004 29 108 1321
exl 49 9 29 275 13 375
ind 1730 15 82 35 114 1976
oth 169 48 97 33 609 956
True
labe
l
Figure 4: Confusion matrices for classifier trained using only monolingual features (distributional and path) versus bilingualfeatures (paraphrase and translation). True labels are shown along rows, predicted along columns. The matrix is normalizedalong rows, so that the predictions for each (true) class sum to 100%.
� F1 when excludingAll Lex. Dist. Path Para. Tran. WN
# 79 -2.0 -0.2 -1.2 -1.7 -0.2 -0.1⌘ 57 -3.5 +0.2 -0.7 -2.4 -3.7 +0.5A 68 -4.6 -0.3 -0.8 -0.8 -0.7 -1.6¬ 49 -4.0 -0.8 -2.9 +0.3 -0.0 -2.2⇠ 51 -4.9 -0.5 -0.7 -1.2 -0.9 -0.3
Table 5: F1 measure (⇥100) achieved by entailment classifierusing 10-fold cross validation on the training data.
Table 3 shines some light onto the differencesbetween monolingual and bilingual similarities.While the monolingual asymmetric metrics aregood for identifying A pairs, the symmetric met-rics consistently identify ¬ pairs; none of themonolingual scores we explored were effectivein making the subtle distinction between ⌘ pairsand the other types of paraphrase. In contrast,the bilingual similarity metric is fairly precisefor identifying ⌘ pairs, but provides less infor-mation for distinguishing between types of non-equivalent paraphrase. These differences are fur-ther exhibited in the confusion matrices shown inFigure 4; when the classifier is trained using onlymonolingual features, it misclassifies 26% of ¬pairs as ⌘, whereas the bilingual features makethis error only 6% of the time. On the other hand,the bilingual features completely fail to predict theA class, calling over 80% of such pairs ⌘ or ⇠.
7 Evaluation
7.1 Intrinsic Evaluation
We test the performance of our classifier intrinsi-cally, through its ability to reproduce the humanlabels for the phrase pairs from the SICK test sen-tences. Table 7 shows the precision and recallachieved by the classifier for each of our 5 en-tailment classes. The classifier is able to achieve
an overall 79% accuracy, reaching >70% preci-sion while maintaining good levels of recall on allclasses.
True Pred. N Example misclassifications⇠ # 169 boy/little, an empy/the air# ⇠ 114 little/toy, color/hairA ⇠ 108 drink/juice, ocean/surfA # 97 in front of/the face of, vehicle/horseA ⌘ 83 cat/kitten, pavement/sidewalk⌘ A 46 big/grand, a girl/a young ladyA ¬ 29 kid/teenager, no small/a large¬ A 29 old man/young man, a car/a window# ⌘ 15 a person/one, a crowd/a large⌘ # 9 he is/man is, photo/still⌘ ¬ 1 girl is/she is
Table 6: Example misclassifications from some of the mostfrequent and most interesting error categories.
Figure 4 shows the classifier’s confusion ma-trix and Table 6 shows some examples of commonand interesting error cases. The majority of errors(26%) come from confusing the ⇠ class with the# class. This mistake is not too concerning froman RTE perspective since ⇠ can be treated as aspecial case of # (Section 5). There are very fewcases in which the classifier makes extreme errors,e.g. confusing ⌘ with ¬ or with #; some interest-ing examples of such errors arise when the phrasescontain pronouns (e.g. girl ⌘ she) or when therelation uses a highly infrequent word sense (e.g.photo ⌘ still).
7.2 The Nutcracker RTE System
To further test our classifier, we evaluate the use-fulness of the automatic entailment predictions ina downstream RTE task. We run our experimentsusing Nutcracker, a state-of-the-art RTE systembased on formal semantics (Bjerva et al., 2014).In the SemEval 2014 RTE challenge, this system
equiv.
entail.
excl.
other
indep.
Monolingual features
only
Bilingual features
only
Adding Semantics to Data-Driven Paraphrasing. (Pavlick et al. ACL 2015)
Table 1
mono Predicted label (using monolingual features)
Predicted label (using bilingual features)
Predicted label (using all features)
ind syn hyp exl oth ≣ ⊐ ¬ # ~ ≣ ⊐ ¬ # ~ ≣ ⊐ ¬ # ~syn 1 3 1 0 0 4 ≣ 58% 20% 4% 15% 3% 62% 21% 5% 4% 8% 83% 10% 0% 2% 4%
hyp 2 3 7 0 1 13 ⊐ 20% 51% 3% 18% 7% 27% 5% 7% 7% 54% 6% 76% 2% 7% 8%
exl 1 1 1 1 0 4 ¬ 26% 14% 37% 17% 6% 6% 14% 30% 36% 14% 2% 8% 73% 13% 3%
ind 14 2 2 0 1 20 # 8% 13% 2% 71% 6% 1% 7% 6% 78% 8% 1% 4% 2% 88% 6%
oth 3 1 2 0 2 10 ~ 15% 21% 5% 36% 23% 8% 19% 9% 30% 35% 5% 10% 3% 18% 64%
bi
ind syn hyp exl oth
syn 0 3 1 0 0 4
hyp 2 7 1 2 13 24
exl 1 0 1 1 1 4
ind 15 0 1 1 2 20
oth 3 1 2 1 3 10
both
ind syn hyp exl oth
syn 9 368 46 1 19 443
hyp 97 83 1004 29 108 1321
exl 49 9 29 275 13 375
ind 1730 15 82 35 114 1976
oth 169 48 97 33 609 956
True
labe
l
Figure 4: Confusion matrices for classifier trained using only monolingual features (distributional and path) versus bilingualfeatures (paraphrase and translation). True labels are shown along rows, predicted along columns. The matrix is normalizedalong rows, so that the predictions for each (true) class sum to 100%.
� F1 when excludingAll Lex. Dist. Path Para. Tran. WN
# 79 -2.0 -0.2 -1.2 -1.7 -0.2 -0.1⌘ 57 -3.5 +0.2 -0.7 -2.4 -3.7 +0.5A 68 -4.6 -0.3 -0.8 -0.8 -0.7 -1.6¬ 49 -4.0 -0.8 -2.9 +0.3 -0.0 -2.2⇠ 51 -4.9 -0.5 -0.7 -1.2 -0.9 -0.3
Table 5: F1 measure (⇥100) achieved by entailment classifierusing 10-fold cross validation on the training data.
Table 3 shines some light onto the differencesbetween monolingual and bilingual similarities.While the monolingual asymmetric metrics aregood for identifying A pairs, the symmetric met-rics consistently identify ¬ pairs; none of themonolingual scores we explored were effectivein making the subtle distinction between ⌘ pairsand the other types of paraphrase. In contrast,the bilingual similarity metric is fairly precisefor identifying ⌘ pairs, but provides less infor-mation for distinguishing between types of non-equivalent paraphrase. These differences are fur-ther exhibited in the confusion matrices shown inFigure 4; when the classifier is trained using onlymonolingual features, it misclassifies 26% of ¬pairs as ⌘, whereas the bilingual features makethis error only 6% of the time. On the other hand,the bilingual features completely fail to predict theA class, calling over 80% of such pairs ⌘ or ⇠.
7 Evaluation
7.1 Intrinsic Evaluation
We test the performance of our classifier intrinsi-cally, through its ability to reproduce the humanlabels for the phrase pairs from the SICK test sen-tences. Table 7 shows the precision and recallachieved by the classifier for each of our 5 en-tailment classes. The classifier is able to achieve
an overall 79% accuracy, reaching >70% preci-sion while maintaining good levels of recall on allclasses.
True Pred. N Example misclassifications⇠ # 169 boy/little, an empy/the air# ⇠ 114 little/toy, color/hairA ⇠ 108 drink/juice, ocean/surfA # 97 in front of/the face of, vehicle/horseA ⌘ 83 cat/kitten, pavement/sidewalk⌘ A 46 big/grand, a girl/a young ladyA ¬ 29 kid/teenager, no small/a large¬ A 29 old man/young man, a car/a window# ⌘ 15 a person/one, a crowd/a large⌘ # 9 he is/man is, photo/still⌘ ¬ 1 girl is/she is
Table 6: Example misclassifications from some of the mostfrequent and most interesting error categories.
Figure 4 shows the classifier’s confusion ma-trix and Table 6 shows some examples of commonand interesting error cases. The majority of errors(26%) come from confusing the ⇠ class with the# class. This mistake is not too concerning froman RTE perspective since ⇠ can be treated as aspecial case of # (Section 5). There are very fewcases in which the classifier makes extreme errors,e.g. confusing ⌘ with ¬ or with #; some interest-ing examples of such errors arise when the phrasescontain pronouns (e.g. girl ⌘ she) or when therelation uses a highly infrequent word sense (e.g.photo ⌘ still).
7.2 The Nutcracker RTE System
To further test our classifier, we evaluate the use-fulness of the automatic entailment predictions ina downstream RTE task. We run our experimentsusing Nutcracker, a state-of-the-art RTE systembased on formal semantics (Bjerva et al., 2014).In the SemEval 2014 RTE challenge, this system
equiv.
entail.
excl.
other
indep.
equiv.entail. excl. other indep.Table 1
mono Predicted label (using monolingual features)
Predicted label (using bilingual features)
Predicted label (using all features)
ind syn hyp exl oth ≣ ⊐ ¬ # ~ ≣ ⊐ ¬ # ~ ≣ ⊐ ¬ # ~syn 1 3 1 0 0 4 ≣ 58% 20% 4% 15% 3% 62% 21% 5% 4% 8% 83% 10% 0% 2% 4%
hyp 2 3 7 0 1 13 ⊐ 20% 51% 3% 18% 7% 27% 5% 7% 7% 54% 6% 76% 2% 7% 8%
exl 1 1 1 1 0 4 ¬ 26% 14% 37% 17% 6% 6% 14% 30% 36% 14% 2% 8% 73% 13% 3%
ind 14 2 2 0 1 20 # 8% 13% 2% 71% 6% 1% 7% 6% 78% 8% 1% 4% 2% 88% 6%
oth 3 1 2 0 2 10 ~ 15% 21% 5% 36% 23% 8% 19% 9% 30% 35% 5% 10% 3% 18% 64%
bi
ind syn hyp exl oth
syn 0 3 1 0 0 4
hyp 2 7 1 2 13 24
exl 1 0 1 1 1 4
ind 15 0 1 1 2 20
oth 3 1 2 1 3 10
both
ind syn hyp exl oth
syn 9 368 46 1 19 443
hyp 97 83 1004 29 108 1321
exl 49 9 29 275 13 375
ind 1730 15 82 35 114 1976
oth 169 48 97 33 609 956
True
labe
l
Figure 4: Confusion matrices for classifier trained using only monolingual features (distributional and path) versus bilingualfeatures (paraphrase and translation). True labels are shown along rows, predicted along columns. The matrix is normalizedalong rows, so that the predictions for each (true) class sum to 100%.
� F1 when excludingAll Lex. Dist. Path Para. Tran. WN
# 79 -2.0 -0.2 -1.2 -1.7 -0.2 -0.1⌘ 57 -3.5 +0.2 -0.7 -2.4 -3.7 +0.5A 68 -4.6 -0.3 -0.8 -0.8 -0.7 -1.6¬ 49 -4.0 -0.8 -2.9 +0.3 -0.0 -2.2⇠ 51 -4.9 -0.5 -0.7 -1.2 -0.9 -0.3
Table 5: F1 measure (⇥100) achieved by entailment classifierusing 10-fold cross validation on the training data.
Table 3 shines some light onto the differencesbetween monolingual and bilingual similarities.While the monolingual asymmetric metrics aregood for identifying A pairs, the symmetric met-rics consistently identify ¬ pairs; none of themonolingual scores we explored were effectivein making the subtle distinction between ⌘ pairsand the other types of paraphrase. In contrast,the bilingual similarity metric is fairly precisefor identifying ⌘ pairs, but provides less infor-mation for distinguishing between types of non-equivalent paraphrase. These differences are fur-ther exhibited in the confusion matrices shown inFigure 4; when the classifier is trained using onlymonolingual features, it misclassifies 26% of ¬pairs as ⌘, whereas the bilingual features makethis error only 6% of the time. On the other hand,the bilingual features completely fail to predict theA class, calling over 80% of such pairs ⌘ or ⇠.
7 Evaluation
7.1 Intrinsic Evaluation
We test the performance of our classifier intrinsi-cally, through its ability to reproduce the humanlabels for the phrase pairs from the SICK test sen-tences. Table 7 shows the precision and recallachieved by the classifier for each of our 5 en-tailment classes. The classifier is able to achieve
an overall 79% accuracy, reaching >70% preci-sion while maintaining good levels of recall on allclasses.
True Pred. N Example misclassifications⇠ # 169 boy/little, an empy/the air# ⇠ 114 little/toy, color/hairA ⇠ 108 drink/juice, ocean/surfA # 97 in front of/the face of, vehicle/horseA ⌘ 83 cat/kitten, pavement/sidewalk⌘ A 46 big/grand, a girl/a young ladyA ¬ 29 kid/teenager, no small/a large¬ A 29 old man/young man, a car/a window# ⌘ 15 a person/one, a crowd/a large⌘ # 9 he is/man is, photo/still⌘ ¬ 1 girl is/she is
Table 6: Example misclassifications from some of the mostfrequent and most interesting error categories.
Figure 4 shows the classifier’s confusion ma-trix and Table 6 shows some examples of commonand interesting error cases. The majority of errors(26%) come from confusing the ⇠ class with the# class. This mistake is not too concerning froman RTE perspective since ⇠ can be treated as aspecial case of # (Section 5). There are very fewcases in which the classifier makes extreme errors,e.g. confusing ⌘ with ¬ or with #; some interest-ing examples of such errors arise when the phrasescontain pronouns (e.g. girl ⌘ she) or when therelation uses a highly infrequent word sense (e.g.photo ⌘ still).
7.2 The Nutcracker RTE System
To further test our classifier, we evaluate the use-fulness of the automatic entailment predictions ina downstream RTE task. We run our experimentsusing Nutcracker, a state-of-the-art RTE systembased on formal semantics (Bjerva et al., 2014).In the SemEval 2014 RTE challenge, this system
equiv.
entail.
excl.
other
indep.
Monolingual features
only
Bilingual features
only
Adding Semantics to Data-Driven Paraphrasing. (Pavlick et al. ACL 2015)
Table 1
mono Predicted label (using monolingual features)
Predicted label (using bilingual features)
Predicted label (using all features)
ind syn hyp exl oth ≣ ⊐ ¬ # ~ ≣ ⊐ ¬ # ~ ≣ ⊐ ¬ # ~syn 1 3 1 0 0 4 ≣ 58% 20% 4% 15% 3% 62% 21% 5% 4% 8% 83% 10% 0% 2% 4%
hyp 2 3 7 0 1 13 ⊐ 20% 51% 3% 18% 7% 27% 5% 7% 7% 54% 6% 76% 2% 7% 8%
exl 1 1 1 1 0 4 ¬ 26% 14% 37% 17% 6% 6% 14% 30% 36% 14% 2% 8% 73% 13% 3%
ind 14 2 2 0 1 20 # 8% 13% 2% 71% 6% 1% 7% 6% 78% 8% 1% 4% 2% 88% 6%
oth 3 1 2 0 2 10 ~ 15% 21% 5% 36% 23% 8% 19% 9% 30% 35% 5% 10% 3% 18% 64%
bi
ind syn hyp exl oth
syn 0 3 1 0 0 4
hyp 2 7 1 2 13 24
exl 1 0 1 1 1 4
ind 15 0 1 1 2 20
oth 3 1 2 1 3 10
both
ind syn hyp exl oth
syn 9 368 46 1 19 443
hyp 97 83 1004 29 108 1321
exl 49 9 29 275 13 375
ind 1730 15 82 35 114 1976
oth 169 48 97 33 609 956
True
labe
l
Figure 4: Confusion matrices for classifier trained using only monolingual features (distributional and path) versus bilingualfeatures (paraphrase and translation). True labels are shown along rows, predicted along columns. The matrix is normalizedalong rows, so that the predictions for each (true) class sum to 100%.
� F1 when excludingAll Lex. Dist. Path Para. Tran. WN
# 79 -2.0 -0.2 -1.2 -1.7 -0.2 -0.1⌘ 57 -3.5 +0.2 -0.7 -2.4 -3.7 +0.5A 68 -4.6 -0.3 -0.8 -0.8 -0.7 -1.6¬ 49 -4.0 -0.8 -2.9 +0.3 -0.0 -2.2⇠ 51 -4.9 -0.5 -0.7 -1.2 -0.9 -0.3
Table 5: F1 measure (⇥100) achieved by entailment classifierusing 10-fold cross validation on the training data.
Table 3 shines some light onto the differencesbetween monolingual and bilingual similarities.While the monolingual asymmetric metrics aregood for identifying A pairs, the symmetric met-rics consistently identify ¬ pairs; none of themonolingual scores we explored were effectivein making the subtle distinction between ⌘ pairsand the other types of paraphrase. In contrast,the bilingual similarity metric is fairly precisefor identifying ⌘ pairs, but provides less infor-mation for distinguishing between types of non-equivalent paraphrase. These differences are fur-ther exhibited in the confusion matrices shown inFigure 4; when the classifier is trained using onlymonolingual features, it misclassifies 26% of ¬pairs as ⌘, whereas the bilingual features makethis error only 6% of the time. On the other hand,the bilingual features completely fail to predict theA class, calling over 80% of such pairs ⌘ or ⇠.
7 Evaluation
7.1 Intrinsic Evaluation
We test the performance of our classifier intrinsi-cally, through its ability to reproduce the humanlabels for the phrase pairs from the SICK test sen-tences. Table 7 shows the precision and recallachieved by the classifier for each of our 5 en-tailment classes. The classifier is able to achieve
an overall 79% accuracy, reaching >70% preci-sion while maintaining good levels of recall on allclasses.
True Pred. N Example misclassifications⇠ # 169 boy/little, an empy/the air# ⇠ 114 little/toy, color/hairA ⇠ 108 drink/juice, ocean/surfA # 97 in front of/the face of, vehicle/horseA ⌘ 83 cat/kitten, pavement/sidewalk⌘ A 46 big/grand, a girl/a young ladyA ¬ 29 kid/teenager, no small/a large¬ A 29 old man/young man, a car/a window# ⌘ 15 a person/one, a crowd/a large⌘ # 9 he is/man is, photo/still⌘ ¬ 1 girl is/she is
Table 6: Example misclassifications from some of the mostfrequent and most interesting error categories.
Figure 4 shows the classifier’s confusion ma-trix and Table 6 shows some examples of commonand interesting error cases. The majority of errors(26%) come from confusing the ⇠ class with the# class. This mistake is not too concerning froman RTE perspective since ⇠ can be treated as aspecial case of # (Section 5). There are very fewcases in which the classifier makes extreme errors,e.g. confusing ⌘ with ¬ or with #; some interest-ing examples of such errors arise when the phrasescontain pronouns (e.g. girl ⌘ she) or when therelation uses a highly infrequent word sense (e.g.photo ⌘ still).
7.2 The Nutcracker RTE System
To further test our classifier, we evaluate the use-fulness of the automatic entailment predictions ina downstream RTE task. We run our experimentsusing Nutcracker, a state-of-the-art RTE systembased on formal semantics (Bjerva et al., 2014).In the SemEval 2014 RTE challenge, this system
equiv.
entail.
excl.
other
indep.
equiv.entail. excl. other indep.Table 1
mono Predicted label (using monolingual features)
Predicted label (using bilingual features)
Predicted label (using all features)
ind syn hyp exl oth ≣ ⊐ ¬ # ~ ≣ ⊐ ¬ # ~ ≣ ⊐ ¬ # ~syn 1 3 1 0 0 4 ≣ 58% 20% 4% 15% 3% 62% 21% 5% 4% 8% 83% 10% 0% 2% 4%
hyp 2 3 7 0 1 13 ⊐ 20% 51% 3% 18% 7% 27% 5% 7% 7% 54% 6% 76% 2% 7% 8%
exl 1 1 1 1 0 4 ¬ 26% 14% 37% 17% 6% 6% 14% 30% 36% 14% 2% 8% 73% 13% 3%
ind 14 2 2 0 1 20 # 8% 13% 2% 71% 6% 1% 7% 6% 78% 8% 1% 4% 2% 88% 6%
oth 3 1 2 0 2 10 ~ 15% 21% 5% 36% 23% 8% 19% 9% 30% 35% 5% 10% 3% 18% 64%
bi
ind syn hyp exl oth
syn 0 3 1 0 0 4
hyp 2 7 1 2 13 24
exl 1 0 1 1 1 4
ind 15 0 1 1 2 20
oth 3 1 2 1 3 10
both
ind syn hyp exl oth
syn 9 368 46 1 19 443
hyp 97 83 1004 29 108 1321
exl 49 9 29 275 13 375
ind 1730 15 82 35 114 1976
oth 169 48 97 33 609 956
True
labe
l
Figure 4: Confusion matrices for classifier trained using only monolingual features (distributional and path) versus bilingualfeatures (paraphrase and translation). True labels are shown along rows, predicted along columns. The matrix is normalizedalong rows, so that the predictions for each (true) class sum to 100%.
� F1 when excludingAll Lex. Dist. Path Para. Tran. WN
# 79 -2.0 -0.2 -1.2 -1.7 -0.2 -0.1⌘ 57 -3.5 +0.2 -0.7 -2.4 -3.7 +0.5A 68 -4.6 -0.3 -0.8 -0.8 -0.7 -1.6¬ 49 -4.0 -0.8 -2.9 +0.3 -0.0 -2.2⇠ 51 -4.9 -0.5 -0.7 -1.2 -0.9 -0.3
Table 5: F1 measure (⇥100) achieved by entailment classifierusing 10-fold cross validation on the training data.
Table 3 shines some light onto the differencesbetween monolingual and bilingual similarities.While the monolingual asymmetric metrics aregood for identifying A pairs, the symmetric met-rics consistently identify ¬ pairs; none of themonolingual scores we explored were effectivein making the subtle distinction between ⌘ pairsand the other types of paraphrase. In contrast,the bilingual similarity metric is fairly precisefor identifying ⌘ pairs, but provides less infor-mation for distinguishing between types of non-equivalent paraphrase. These differences are fur-ther exhibited in the confusion matrices shown inFigure 4; when the classifier is trained using onlymonolingual features, it misclassifies 26% of ¬pairs as ⌘, whereas the bilingual features makethis error only 6% of the time. On the other hand,the bilingual features completely fail to predict theA class, calling over 80% of such pairs ⌘ or ⇠.
7 Evaluation
7.1 Intrinsic Evaluation
We test the performance of our classifier intrinsi-cally, through its ability to reproduce the humanlabels for the phrase pairs from the SICK test sen-tences. Table 7 shows the precision and recallachieved by the classifier for each of our 5 en-tailment classes. The classifier is able to achieve
an overall 79% accuracy, reaching >70% preci-sion while maintaining good levels of recall on allclasses.
True Pred. N Example misclassifications⇠ # 169 boy/little, an empy/the air# ⇠ 114 little/toy, color/hairA ⇠ 108 drink/juice, ocean/surfA # 97 in front of/the face of, vehicle/horseA ⌘ 83 cat/kitten, pavement/sidewalk⌘ A 46 big/grand, a girl/a young ladyA ¬ 29 kid/teenager, no small/a large¬ A 29 old man/young man, a car/a window# ⌘ 15 a person/one, a crowd/a large⌘ # 9 he is/man is, photo/still⌘ ¬ 1 girl is/she is
Table 6: Example misclassifications from some of the mostfrequent and most interesting error categories.
Figure 4 shows the classifier’s confusion ma-trix and Table 6 shows some examples of commonand interesting error cases. The majority of errors(26%) come from confusing the ⇠ class with the# class. This mistake is not too concerning froman RTE perspective since ⇠ can be treated as aspecial case of # (Section 5). There are very fewcases in which the classifier makes extreme errors,e.g. confusing ⌘ with ¬ or with #; some interest-ing examples of such errors arise when the phrasescontain pronouns (e.g. girl ⌘ she) or when therelation uses a highly infrequent word sense (e.g.photo ⌘ still).
7.2 The Nutcracker RTE System
To further test our classifier, we evaluate the use-fulness of the automatic entailment predictions ina downstream RTE task. We run our experimentsusing Nutcracker, a state-of-the-art RTE systembased on formal semantics (Bjerva et al., 2014).In the SemEval 2014 RTE challenge, this system
equiv.
entail.
excl.
other
indep.
Monolingual features
only
Bilingual features
only
Adding Semantics to Data-Driven Paraphrasing. (Pavlick et al. ACL 2015)
Table 1
mono Predicted label (using monolingual features)
Predicted label (using bilingual features)
Predicted label (using all features)
ind syn hyp exl oth ≣ ⊐ ¬ # ~ ≣ ⊐ ¬ # ~ ≣ ⊐ ¬ # ~syn 1 3 1 0 0 4 ≣ 58% 20% 4% 15% 3% 62% 21% 5% 4% 8% 83% 10% 0% 2% 4%
hyp 2 3 7 0 1 13 ⊐ 20% 51% 3% 18% 7% 27% 5% 7% 7% 54% 6% 76% 2% 7% 8%
exl 1 1 1 1 0 4 ¬ 26% 14% 37% 17% 6% 6% 14% 30% 36% 14% 2% 8% 73% 13% 3%
ind 14 2 2 0 1 20 # 8% 13% 2% 71% 6% 1% 7% 6% 78% 8% 1% 4% 2% 88% 6%
oth 3 1 2 0 2 10 ~ 15% 21% 5% 36% 23% 8% 19% 9% 30% 35% 5% 10% 3% 18% 64%
bi
ind syn hyp exl oth
syn 0 3 1 0 0 4
hyp 2 7 1 2 13 24
exl 1 0 1 1 1 4
ind 15 0 1 1 2 20
oth 3 1 2 1 3 10
both
ind syn hyp exl oth
syn 9 368 46 1 19 443
hyp 97 83 1004 29 108 1321
exl 49 9 29 275 13 375
ind 1730 15 82 35 114 1976
oth 169 48 97 33 609 956
True
labe
l
Figure 4: Confusion matrices for classifier trained using only monolingual features (distributional and path) versus bilingualfeatures (paraphrase and translation). True labels are shown along rows, predicted along columns. The matrix is normalizedalong rows, so that the predictions for each (true) class sum to 100%.
� F1 when excludingAll Lex. Dist. Path Para. Tran. WN
# 79 -2.0 -0.2 -1.2 -1.7 -0.2 -0.1⌘ 57 -3.5 +0.2 -0.7 -2.4 -3.7 +0.5A 68 -4.6 -0.3 -0.8 -0.8 -0.7 -1.6¬ 49 -4.0 -0.8 -2.9 +0.3 -0.0 -2.2⇠ 51 -4.9 -0.5 -0.7 -1.2 -0.9 -0.3
Table 5: F1 measure (⇥100) achieved by entailment classifierusing 10-fold cross validation on the training data.
Table 3 shines some light onto the differencesbetween monolingual and bilingual similarities.While the monolingual asymmetric metrics aregood for identifying A pairs, the symmetric met-rics consistently identify ¬ pairs; none of themonolingual scores we explored were effectivein making the subtle distinction between ⌘ pairsand the other types of paraphrase. In contrast,the bilingual similarity metric is fairly precisefor identifying ⌘ pairs, but provides less infor-mation for distinguishing between types of non-equivalent paraphrase. These differences are fur-ther exhibited in the confusion matrices shown inFigure 4; when the classifier is trained using onlymonolingual features, it misclassifies 26% of ¬pairs as ⌘, whereas the bilingual features makethis error only 6% of the time. On the other hand,the bilingual features completely fail to predict theA class, calling over 80% of such pairs ⌘ or ⇠.
7 Evaluation
7.1 Intrinsic Evaluation
We test the performance of our classifier intrinsi-cally, through its ability to reproduce the humanlabels for the phrase pairs from the SICK test sen-tences. Table 7 shows the precision and recallachieved by the classifier for each of our 5 en-tailment classes. The classifier is able to achieve
an overall 79% accuracy, reaching >70% preci-sion while maintaining good levels of recall on allclasses.
True Pred. N Example misclassifications⇠ # 169 boy/little, an empy/the air# ⇠ 114 little/toy, color/hairA ⇠ 108 drink/juice, ocean/surfA # 97 in front of/the face of, vehicle/horseA ⌘ 83 cat/kitten, pavement/sidewalk⌘ A 46 big/grand, a girl/a young ladyA ¬ 29 kid/teenager, no small/a large¬ A 29 old man/young man, a car/a window# ⌘ 15 a person/one, a crowd/a large⌘ # 9 he is/man is, photo/still⌘ ¬ 1 girl is/she is
Table 6: Example misclassifications from some of the mostfrequent and most interesting error categories.
Figure 4 shows the classifier’s confusion ma-trix and Table 6 shows some examples of commonand interesting error cases. The majority of errors(26%) come from confusing the ⇠ class with the# class. This mistake is not too concerning froman RTE perspective since ⇠ can be treated as aspecial case of # (Section 5). There are very fewcases in which the classifier makes extreme errors,e.g. confusing ⌘ with ¬ or with #; some interest-ing examples of such errors arise when the phrasescontain pronouns (e.g. girl ⌘ she) or when therelation uses a highly infrequent word sense (e.g.photo ⌘ still).
7.2 The Nutcracker RTE System
To further test our classifier, we evaluate the use-fulness of the automatic entailment predictions ina downstream RTE task. We run our experimentsusing Nutcracker, a state-of-the-art RTE systembased on formal semantics (Bjerva et al., 2014).In the SemEval 2014 RTE challenge, this system
Figure 4 (data): confusion matrices; rows = true label, columns = predicted label, in the order ≡, ⊐, ¬, #, ~ (equiv., entail., excl., indep., other).

Predicted label (using monolingual features):
≡   58%  20%   4%  15%   3%
⊐   20%  51%   3%  18%   7%
¬   26%  14%  37%  17%   6%
#    8%  13%   2%  71%   6%
~   15%  21%   5%  36%  23%

Predicted label (using bilingual features):
≡   62%  21%   5%   4%   8%
⊐   27%   5%   7%   7%  54%
¬    6%  14%  30%  36%  14%
#    1%   7%   6%  78%   8%
~    8%  19%   9%  30%  35%

Predicted label (using all features):
≡   83%  10%   0%   2%   4%
⊐    6%  76%   2%   7%   8%
¬    2%   8%  73%  13%   3%
#    1%   4%   2%  88%   6%
~    5%  10%   3%  18%  64%
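The row-normalized percentages above follow a standard construction. A minimal sketch of how such a matrix is computed; the label names and toy data are illustrative, not the paper's actual pipeline:

```python
from collections import Counter

LABELS = ["equiv", "entail", "excl", "indep", "other"]

def confusion_matrix(true_labels, pred_labels):
    """Count (true, predicted) label pairs over two parallel lists."""
    counts = Counter(zip(true_labels, pred_labels))
    return [[counts[(t, p)] for p in LABELS] for t in LABELS]

def row_normalize(matrix):
    """Turn counts into percentages so each true-label row sums to ~100%."""
    normalized = []
    for row in matrix:
        total = sum(row) or 1  # guard against empty classes
        normalized.append([round(100 * c / total) for c in row])
    return normalized

# Tiny usage example with made-up labels:
y_true = ["equiv", "equiv", "excl", "indep"]
y_pred = ["equiv", "entail", "equiv", "indep"]
for row in row_normalize(confusion_matrix(y_true, y_pred)):
    print(row)
```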
Adding Semantics to Data-Driven Paraphrasing. (Pavlick et al. ACL 2015)
Predicting entailment relations
Equivalent             Entailment         Exclusion            Other              Independent
look at/watch          little girl/girl   close/open           swim/water         girl/play
a person/someone       kuwait/country     minimal/significant  husband/marry      found/party
clean/cleanse          tower/building     boy/young girl       oil/oil price      man/talk
distant/remote         sneaker/footwear   nobody/someone       country/patriotic  profit/year
phone/telephone        heroin/drug        blue/green           drive/vehicle      holiday/series
last autumn/last fall  typhoon/storm      france/germany       playing/toy        city/south
Adding Semantics to Data-Driven Paraphrasing. (Pavlick et al. ACL 2015)
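To make the setup concrete, here is a minimal sketch of a 5-way classifier over phrase-pair features. The feature names and values are invented placeholders for the monolingual and bilingual signals discussed above, and scikit-learn merely stands in for whatever learner was actually used:

```python
from sklearn.linear_model import LogisticRegression

CLASSES = ["Equivalent", "Entailment", "Exclusion", "Other", "Independent"]

# Each row: [distributional_sim, path_sim, paraphrase_score, translation_score]
# (toy values for one example pair per class)
X_train = [[0.9, 0.8, 0.95, 0.9],   # look at / watch
           [0.7, 0.9, 0.60, 0.4],   # little girl / girl
           [0.6, 0.2, 0.10, 0.1],   # close / open
           [0.3, 0.1, 0.05, 0.2],   # swim / water
           [0.1, 0.0, 0.01, 0.0]]   # girl / play
y_train = CLASSES

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(clf.predict([[0.85, 0.7, 0.9, 0.8]]))  # likely "Equivalent" under this toy model
```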
Last December they had argued that the council had failed to consider possible effects of contaminated land at the site.
The council considered environmental consequences.
modifiers
INS(environmental)
Denotational Semantics
consequences
environmental consequences
irreversible environmental consequences
Does environmental consequence entail consequence?
Does consequence entail environmental consequence?
Natural Language Inference
Natural Language Inference. (MacCartney PhD Thesis 2009)
No man is pointing at a car.
A man is pointing at a silver sedan.
A man is pointing at a silver car.
A man is pointing at a car.
SUB(sedan, car) ⊏
DEL(silver) ⊏
yes
yes
From image descriptions to visual denotations. (Young et al. TACL 2014)
Entailment above the word level in distributional semantics. (Baroni et al. EACL 2012)
BIUTEE: A Modular Open-Source System for Recognizing Textual Entailment (Stern and Dagan. ACL 2012)
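A minimal sketch of the MacCartney-style edit composition behind the slide above: each atomic edit projects a basic entailment relation, and relations compose via a join table. Only the handful of cases needed for this example are filled in, and the helper names are ours, not NatLog's actual API:

```python
EQ, FWD, REV, IND = "≡", "⊏", "⊐", "#"

JOIN = {
    (EQ, FWD): FWD,   # starting from equivalence, a ⊏ edit yields ⊏
    (FWD, FWD): FWD,  # ⊏ composed with ⊏ stays ⊏
    (FWD, REV): IND,  # mixed directions weaken toward independence
}

def project(edit):
    """Relation introduced by a single edit on the premise."""
    kind, _ = edit
    if kind == "SUB":  # substitute a word with a hypernym, e.g. sedan -> car
        return FWD
    if kind == "DEL":  # delete a restrictive modifier, e.g. silver
        return FWD
    return IND

def compose(edits):
    rel = EQ
    for edit in edits:
        rel = JOIN.get((rel, project(edit)), IND)
    return rel

# "A man is pointing at a silver sedan." -> "A man is pointing at a car."
print(compose([("SUB", ("sedan", "car")), ("DEL", "silver")]))  # ⊏
```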
Denotational Semantics
Does consequence entail environmental consequence? ✘
The policy will have consequences.
Does consequence entail environmental consequence? ✔
The contaminated land will have consequences.
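A toy illustration of denotational entailment in the spirit of the visual denotations of Young et al. (2014): a phrase denotes the set of situations (here, arbitrary ids) it truthfully describes, and x entails y exactly when every situation described by x is also described by y. The sets below are invented for the sketch:

```python
denotation = {
    "consequences":               {1, 2, 3, 4, 5},
    "environmental consequences": {2, 3},
}

def entails(x, y):
    return denotation[x] <= denotation[y]  # subset test on denotations

print(entails("environmental consequences", "consequences"))  # True
print(entails("consequences", "environmental consequences"))  # False in general
```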
Natural Logic Entailment Relations
N does not entail AN:
- Forward Entailment (AN ⊏ N), e.g. brown dog
- Alternation (N | AN), e.g. former senator
- Independent (N # AN), e.g. alleged criminal
N does entail AN:
- Equivalence (AN ≣ N), e.g. entire world
- Reverse Entailment (AN ⊐ N), e.g. possible winner
Compositional Entailment in Adjective Nouns. (Pavlick and Callison-Burch ACL 2016)
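A sketch mapping set relations between AN and N denotations to the natural logic labels on the slide; the toy sets below are stand-ins for real denotations:

```python
def natlog_relation(an, n):
    """Classify the relation between two denotation sets."""
    if an == n:
        return "Equivalence (AN ≣ N)"
    if an < n:
        return "Forward Entailment (AN ⊏ N)"
    if an > n:
        return "Reverse Entailment (AN ⊐ N)"
    if not (an & n):
        return "Alternation (N | AN)"
    return "Independent (N # AN)"

dogs, brown_dogs = {1, 2, 3}, {1, 2}
print(natlog_relation(brown_dogs, dogs))      # Forward Entailment: brown dog ⊏ dog
print(natlog_relation({1, 2}, {1, 2}))        # Equivalence: entire world ≣ world
print(natlog_relation({7, 8}, {1, 2, 3}))     # Alternation: former senator | senator
```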
Experimental Design
Premise: Wellers hopes the financial system will be fully operational by 2015.
Hypothesis: Wellers hopes the system will be fully operational by 2015.
Contradiction / Entailment / Can't Tell
Premise: Wellers hopes the system will be fully operational by 2015.
Hypothesis: Wellers hopes the financial system will be fully operational by 2015.
Contradiction / Entailment / Can't Tell
~200 human annotators
~5,000 sentences
4 genres
Compositional Entailment in Adjective Nouns. (Pavlick and Callison-Burch ACL 2016)
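A sketch of how the two annotation directions above could be generated from a sentence containing an adjective-noun pair; the function and its string manipulation are hypothetical simplifications of the actual item construction:

```python
def make_items(sentence_with_an, adjective):
    """Return (premise, hypothesis) pairs for the two annotation directions."""
    without = sentence_with_an.replace(adjective + " ", "", 1)
    return {
        "deletion":  (sentence_with_an, without),  # does the AN sentence entail the N sentence?
        "insertion": (without, sentence_with_an),  # does the N sentence entail the AN sentence?
    }

items = make_items(
    "Wellers hopes the financial system will be fully operational by 2015 .",
    "financial",
)
for direction, (premise, hypothesis) in items.items():
    print(direction, "|", premise, "=>", hypothesis)
```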
Empirical Analysis
Distribution of adjective-noun entailment relations by genre (labels: AN < N, AN = N, AN # N, AN > N, AN | N, None). The restrictive reading (AN < N) is the largest share in every genre, but by varying margins: Image Captions 85% (remaining shares 8%, 3%, 3%), Literature 61% (29%, 5%, 4%), News 57% (26%, 7%, 7%), Forums 47% (37%, 7%, 7%).
Up to 53% error rate by assuming all adjectives are restrictive: in Forums only 47% of pairs are AN < N, so treating every adjective as restrictive can mislabel the remaining 53%.
Compositional Entailment in Adjective Nouns. (Pavlick and Callison-Burch ACL 2016)
When does N entail AN?
Sometimes, it is a property of the adjective and noun… (probably sandy / probably not sandy)
…or even just the adjective.
Other times, it is a property of the context + world knowledge.
Equivalence (AN ≣ N):
The [deadly] attack killed at least 12 civilians.
The [entire] bill is now subject to approval by the parliament.
Alternation (N | AN):
Red numbers spelled out their [perfect] record: 9-2.
Schilling even stayed busy after serving Epstein turkey at his [former] home on Thursday.
Compositional Entailment in Adjective Nouns. (Pavlick and Callison-Burch ACL 2016)
When does N NOT entail AN?
Undefined Relations: No N is an AN, but every AN is an N.
Bush travels Monday to Michigan to make remarks on the [Japanese] economy.
Compositional Entailment in Adjective Nouns. (Pavlick and Callison-Burch ACL 2016)
When should N NOT entail AN?
A dictionary of nonsubsective adjectives. (Nayak et al. Tech Report 2014)
fake, former, artificial, counterfeit, possible, probable, unlikely, likely, …
Last December they had argued that the council had failed to consider possible effects of contaminated land at the site.
The council considered environmental consequences.
(non-subsective) modifiers
DEL(possible)
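A sketch of the move this slide suggests: checking a modifier against a list of non-subsective adjectives before licensing a DEL(modifier) inference. The set below is a small fragment of the kind of dictionary Nayak et al. (2014) provide, and the function is ours, not part of any released tool:

```python
NON_SUBSECTIVE = {"fake", "former", "artificial", "counterfeit",
                  "possible", "probable", "unlikely", "likely"}

def deletion_is_safe(adjective):
    """DEL(A) on 'A N' is only guaranteed truth-preserving for subsective A."""
    return adjective.lower() not in NON_SUBSECTIVE

print(deletion_is_safe("silver"))    # True: silver sedan entails sedan
print(deletion_is_safe("possible"))  # False: possible effects need not be effects
```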
Privative (e.g. fake) | Plain Non-subsective (e.g. alleged) | Subsective (e.g. red)
Figure 1: Three main classes of adjectives. If their entailment behavior is consistent with their theoretical definitions, we would expect our annotations (Section 3) to produce the insertion (blue) and deletion (red) patterns shown by the bar graphs. Bars (left to right) represent CONTRADICTION, UNKNOWN, and ENTAILMENT.
While these generalizations are intuitive, there is little experimental evidence to support them. In this paper, we collect human judgements of the validity of inferences following from the insertion and deletion of various classes of adjectives and analyze the results. Our findings suggest that, in practice, most sentences involving non-subsective ANs can be safely generalized to statements about the N. That is, non-subsective adjectives often behave like normal, subsective adjectives. On further analysis, we reveal that, when adjectives do behave non-subsectively, they often exhibit asymmetric entailment behavior in which insertion leads to contradictions (ID ⇒ ¬ fake ID) but deletion leads to entailments (fake ID ⇒ ID). We present anecdotal evidence for how the entailment associated with inserting/deleting a non-subsective adjective depends on the salient properties of the noun phrase under discussion, rather than on the adjective itself.
2 Background and Related Work
Classes of Adjectives. Adjectives are commonly classified taxonomically as either subsective or non-subsective (Kamp and Partee, 1995). Subsective adjectives are adjectives which pick out a subset of the set denoted by the unmodified noun; that is, AN ⊆ N.¹ For non-subsective adjectives, in contrast, the AN cannot be guaranteed to be a subset of N. For example, clever is subsective, and so a clever thief is always a thief. However, alleged is non-subsective, so there are many possible worlds in which an alleged thief is not in fact a thief. Of course, there may also be many possible worlds in which the alleged thief is a thief, but the word alleged, being non-subsective, does not guarantee this to hold.

¹We use the notation N and AN to refer both to the natural language expression itself (e.g. red car) as well as its denotation, e.g. {x | x is a red car}.
Non-subsective adjectives can be further divided into two classes: privative and plain. Sets denoted by privative ANs are completely disjoint from the set denoted by the head N (AN ∩ N = ∅), and this mutual exclusivity is encoded in the meaning of the A itself. For example, fake is considered to be a quintessential privative adjective since, given the usual definition of fake, a fake ID can not actually be an ID. For plain non-subsective adjectives, there may be worlds in which the AN is an N, and worlds in which the AN is not an N: neither inference is guaranteed by the meaning of the A. As mentioned above, alleged is quintessentially plain non-subsective since, for example, an alleged thief may or may not be an actual thief. In short, we can summarize the classes of adjectives in the following way: subsective adjectives entail the nouns they modify, privative adjectives contradict the nouns they modify, and plain non-subsective adjectives are compatible with (but do not entail) the nouns they modify. Figure 1 depicts these distinctions.
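Restated in set notation (a restatement of the paragraph above, using AN and N for the denotations as in the footnote):

```latex
\begin{align*}
\text{subsective } A\colon           &\quad AN \subseteq N \\
\text{privative } A\colon            &\quad AN \cap N = \emptyset \\
\text{plain non-subsective } A\colon &\quad \text{neither } AN \subseteq N \text{ nor } AN \cap N = \emptyset \text{ is guaranteed}
\end{align*}
```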
While the hierarchical classification of adjectives described above is widely accepted and often applied in NLP tasks (Amoia and Gardent, 2006; Amoia and Gardent, 2007; Boleda et al., 2012; McCrae et al., 2014), it is not undisputed. Some linguists take the position that in fact privative ad-
predicted: She was the expected winner. → She was the winner.
So-Called Non-Subsective Adjectives. (Pavlick and Callison-Burch *SEM 2016, Best Paper!)
(a) Privative (b) Plain Non-Sub. (c) Subsective
Figure 2: Observed entailment judgements for insertion (blue) and deletion (red) of adjectives. Compare to expected distributions in Figure 1.
to receive labels of UNKNOWN in both directions. We expect the subsective adjectives to receive labels of ENTAILMENT in the deletion direction (red car ⇒ car) and labels of UNKNOWN in the insertion direction (car ⇏ red car). Figure 1 depicts these expected distributions.
Observations. The observed entailment patterns for insertion and deletion of non-subsective adjectives are shown in Figure 2. Our control sample of subsective adjectives (Figure 2c) largely produced the expected results, with 96% of deletions producing ENTAILMENTs and 73% of insertions producing UNKNOWNs.³ The entailment patterns produced by the non-subsective adjectives, however, did not match our predictions. The plain non-subsective adjectives (e.g. alleged) behave nearly identically to how we expect regular, subsective adjectives to behave (Figure 2b). That is, in 80% of cases, deleting the plain non-subsective adjective was judged to produce ENTAILMENT, rather than the expected UNKNOWN. The examples in Table 2 shed some light onto why this is the case. Often, the differences between N and AN are not relevant to the main point of the utterance. For example, while an expected surge in unemployment is not a surge in unemployment, a policy that deals with an expected surge deals with a surge.
The privative adjectives (e.g. fake) also fail to match the predicted distribution. While insertions often produce the expected CONTRADICTIONs, deletions produce a surprising number of ENTAILMENTs (Figure 2a). Such a pattern does not fit into any of the adjective classes from Figure 1. While some ANs (e.g. counterfeit money) behave in the prototypically privative way, others
³A full discussion of the 27% of insertions that deviated from the expected behavior is given in Pavlick and Callison-Burch (2016).
(1) Swiss officials on Friday said they've launched an investigation into Urs Tinner's alleged role.
(2) To deal with an expected surge in unemployment, the plan includes a huge temporary jobs program.
(3) They kept it close for a half and had a theoretical chance come the third quarter.

Table 2: Contrary to expectations, the deletion of plain non-subsective adjectives often preserves the (plausible) truth in a model. E.g. alleged role ⇏ role, but investigation into alleged role ⇒ investigation into role.
(e.g. mythical beast) have the property in which N ⇒ ¬AN, but AN ⇒ N (Figure 3). Table 3 provides some telling examples of how this AN ⇒ N inference, in the case of privative adjectives, often depends less on the adjective itself, and more on properties of the modified noun that are at issue in the given context. For example, in Table 3 Example 2(a), a mock debate probably contains enough of the relevant properties (namely, arguments) that it can entail debate, while in Example 2(b), a mock execution lacks the single most important property (the death of the executee) and so cannot entail execution. (Note that, from Example 3(b), it appears the jury is still out on whether leaps in artificial intelligence entail leaps in intelligence...)
5 Discussion
The results presented suggest a few important patterns for NLP systems. First, that while a non-subsective AN might not be an instance of the N (taxonomically speaking), statements that are true of an AN are often true of the N as well. This is relevant for IE and QA systems, and is likely to become more important as NLP systems focus more on "micro reading" tasks (Nakashole and Mitchell, 2014), where facts must be inferred from single documents or sentences, rather than by exploiting
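The asymmetric pattern described above can be stated as a crude decision rule. A sketch, where the adjective list is a tiny fragment and the rule deliberately ignores the context-dependence the paper emphasizes:

```python
PRIVATIVE = {"fake", "counterfeit", "mythical"}

def predict(direction, adjective):
    """Label an insertion/deletion inference for a non-subsective adjective."""
    if direction == "deletion":          # "A N" sentence => "N" sentence
        return "ENTAILMENT"              # the majority outcome observed above
    if adjective.lower() in PRIVATIVE:   # insertion of a privative adjective
        return "CONTRADICTION"           # ID => not a fake ID
    return "UNKNOWN"                     # plain non-subsective insertion

print(predict("deletion", "alleged"))  # ENTAILMENT: the alleged role case in Table 2
print(predict("insertion", "fake"))    # CONTRADICTION
```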
Actually behave like normal, subsective adjectives (e.g. red).
observed: She was the expected winner. → She was the winner.
So-Called Non-Subsective Adjectives. (Pavlick and Callison-Burch *SEM 2016, Best Paper!)
To deal with an expected surge in unemployment, the plan includes a huge temporary jobs program.
She was carrying a fake gun. → She was carrying a gun.
Don’t behave symmetrically.
The 27-year-old Gazan seeks an id to get through security checkpoints and find work in Cairo.
Does he seek a fake id? ✘
ANN
Privative (e.g. fake)
Plain Non-subsective (e.g. alleged)
ANN AN
Subsective (e.g. red)
Figure 1: Three main classes of adjectives. If their entailment behavior is consistent with their theoreticaldefinitions, we would expect our annotations (Section 3) to produce the insertion (blue) and deletion(red) patterns shown by the bar graphs. Bars (left to right) represent CONTRADICTION, UNKNOWN, andENTAILMENT
While these generalizations are intuitive, thereis little experimental evidence to support them.In this paper, we collect human judgements ofthe validity of inferences following from the in-sertion and deletion of various classes of adjec-tives and analyze the results. Our findings suggestthat, in practice, most sentences involving non-subsective ANs can be safely generalized to state-ments about the N. That is, non-subsective adjec-tives often behave like normal, subsective adjec-tives. On further analysis, we reveal that, whenadjectives do behave non-subsectively, they oftenexhibit asymmetric entailment behavior in whichinsertion leads to contradictions (ID ) ¬ fake ID)but deletion leads to entailments (fake ID ) ID).We present anecdotal evidence for how the en-tailment associated with inserting/deleting a non-subsective adjective depends on the salient prop-erties of the noun phrase under discussion, ratherthan on the adjective itself.
2 Background and Related Work
Classes of Adjectives. Adjectives are com-monly classified taxonomically as either subsec-tive or non-subsective (Kamp and Partee, 1995).Subsective adjectives are adjectives which pickout a subset of the set denoted by the unmodifiednoun; that is, AN ⇢ N1. For non-subsective adjec-tives, in contrast, the AN cannot be guaranteed tobe a subset of N. For example, clever is subsective,and so a clever thief is always a thief. However,
1We use the notation N and AN to refer both the the nat-ural language expression itself (e.g. red car) as well as itsdenotation, e.g. {x|x is a red car}.
alleged is non-subsective, so there are many pos-sible worlds in which an alleged thief is not in facta thief. Of course, there may also be many possi-ble worlds in which the alleged thief is a thief, butthe word alleged, being non-subsective, does notguarantee this to hold.
Non-subsective adjectives can be further di-vided into two classes: privative and plain. Setsdenoted by privative ANs are completely disjointfrom the set denoted by the head N (AN \ N =;), and this mutual exclusivity is encoded in themeaning of the A itself. For example, fake is con-sidered to be a quintessential privative adjectivesince, given the usual definition of fake, a fake IDcan not actually be an ID. For plain non-subsectiveadjectives, there may be worlds in which the ANis and N, and worlds in which the AN is not an N:neither inference is guaranteed by the meaning ofthe A. As mentioned above, alleged is quintessen-tially plain non-subsective since, for example, analleged thief may or may not be an actual thief.In short, we can summarize the classes of adjec-tives in the following way: subsective adjectivesentail the nouns they modify, privative adjectivescontradict the nouns they modify, and plain non-subsective adjectives are compatible with (but donot entail) the nouns they modify. Figure 1 depictsthese distinctions.
While the hierarchical classification of adjectives described above is widely accepted and often applied in NLP tasks (Amoia and Gardent, 2006; Amoia and Gardent, 2007; Boleda et al., 2012; McCrae et al., 2014), it is not undisputed. Some linguists take the position that in fact privative adjectives…
Figure 2: Observed entailment judgements for insertion (blue) and deletion (red) of adjectives, for (a) privative, (b) plain non-subsective, and (c) subsective classes. Compare to expected distributions in Figure 1.
to receive labels of UNKNOWN in both directions. We expect the subsective adjectives to receive labels of ENTAILMENT in the deletion direction (red car ⇒ car) and labels of UNKNOWN in the insertion direction (car ⇏ red car). Figure 1 depicts these expected distributions.
Observations. The observed entailment patterns for insertion and deletion of non-subsective adjectives are shown in Figure 2. Our control sample of subsective adjectives (Figure 2c) largely produced the expected results, with 96% of deletions producing ENTAILMENTs and 73% of insertions producing UNKNOWNs.³ The entailment patterns produced by the non-subsective adjectives, however, did not match our predictions. The plain non-subsective adjectives (e.g. alleged) behave nearly identically to how we expect regular, subsective adjectives to behave (Figure 2b). That is, in 80% of cases, deleting the plain non-subsective adjective was judged to produce ENTAILMENT, rather than the expected UNKNOWN. The examples in Table 2 shed some light onto why this is the case. Often, the differences between N and AN are not relevant to the main point of the utterance. For example, while an expected surge in unemployment is not a surge in unemployment, a policy that deals with an expected surge deals with a surge.
The privative adjectives (e.g. fake) also fail to match the predicted distribution. While insertions often produce the expected CONTRADICTIONs, deletions produce a surprising number of ENTAILMENTs (Figure 2a). Such a pattern does not fit into any of the adjective classes from Figure 1. While some ANs (e.g. counterfeit money) behave in the prototypically privative way, others
³ A full discussion of the 27% of insertions that deviated from the expected behavior is given in Pavlick and Callison-Burch (2016).
(1) Swiss officials on Friday said they've launched an investigation into Urs Tinner's alleged role.
(2) To deal with an expected surge in unemployment, the plan includes a huge temporary jobs program.
(3) They kept it close for a half and had a theoretical chance come the third quarter.
Table 2: Contrary to expectations, the deletion of plain non-subsective adjectives often preserves the (plausible) truth in a model. E.g. alleged role ⇏ role, but investigation into alleged role ⇒ investigation into role.
(e.g. mythical beast) have the property in which N ⇒ ¬AN, but AN ⇒ N (Figure 3). Table 3 provides some telling examples of how this AN ⇒ N inference, in the case of privative adjectives, often depends less on the adjective itself, and more on properties of the modified noun that are at issue in the given context. For example, in Table 3 Example 2(a), a mock debate probably contains enough of the relevant properties (namely, arguments) that it can entail debate, while in Example 2(b), a mock execution lacks the single most important property (the death of the executee) and so cannot entail execution. (Note that, from Example 3(b), it appears the jury is still out on whether leaps in artificial intelligence entail leaps in intelligence...)
5 Discussion
The results presented suggest a few important patterns for NLP systems. First, that while a non-subsective AN might not be an instance of the N (taxonomically speaking), statements that are true of an AN are often true of the N as well. This is relevant for IE and QA systems, and is likely to become more important as NLP systems focus more on "micro reading" tasks (Nakashole and Mitchell, 2014), where facts must be inferred from single documents or sentences, rather than by exploiting…
So-Called Non-Subsective Adjectives. (Pavlick and Callison-Burch *SEM 2016, Best Paper!)
The 27-year-old Gazan seeks a fake id to get through security checkpoints and find work in Cairo.
Does he seek an id? ✔
Compositional Entailment in Adjective Nouns. (Pavlick and Callison-Burch ACL 2016)
Does N entail AN?
Bush travels Monday to Michigan to make remarks on the economy.
Bush travels Monday to Michigan to make remarks on the American economy. ✔
Bush travels Monday to Michigan to make remarks on the Japanese economy. ✘
5,378 naturally-occurring sentences
4,868 train / 510 test
2,990 unique ANs
Disjoint ANs in train/test
Does N entail AN?
[Results bar chart, reconstructed] Accuracy (%):
Human: 93.3
Most Freq. Class: 85.3
MFC by Adj.: 92.2
BOW: 86, BOV: 86.6 (pretty strong baselines)
MaxEnt+LR: 85.3 (supervised model using involved NLP pipeline; Magnini et al. 2014)
LSTM: 86.6 (DNN model; Bowman et al. 2015)
LSTM Trans.: 86.8 (DNN model with transfer learning; Bowman et al. 2015)
Compositional Entailment in Adjective Nouns. (Pavlick and Callison-Burch ACL 2016)
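The strength of the MFC-by-adjective baseline (92.2%) is easy to reproduce in spirit: memorize each adjective's most frequent entailment label in training, and back off to the global majority class for unseen adjectives. A minimal sketch, with hypothetical label names (the paper's labels are fine-grained entailment relations):

```python
from collections import Counter, defaultdict

def train_mfc_by_adj(examples):
    """examples: iterable of (adjective, label) pairs from the training split.
    Returns a predictor mapping an adjective to its most frequent training
    label, backing off to the global majority class for unseen adjectives."""
    by_adj = defaultdict(Counter)
    overall = Counter()
    for adj, label in examples:
        by_adj[adj][label] += 1
        overall[label] += 1
    global_mfc = overall.most_common(1)[0][0]

    def predict(adj):
        counts = by_adj.get(adj)
        return counts.most_common(1)[0][0] if counts else global_mfc

    return predict

# Hypothetical usage; the label strings here are placeholders.
predict = train_mfc_by_adj([("red", "FORWARD"), ("red", "FORWARD"),
                            ("fake", "INDEPENDENT"), ("alleged", "FORWARD")])
print(predict("red"), predict("unseen"))  # FORWARD FORWARD
```

This works as well as it does because the train/test split keeps ANs disjoint but not adjectives, so most test adjectives have been seen with other nouns in training.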
Summary
• Humans are flexible with their language; they don't abide by hard-and-fast logical rules of composition and entailment.
• We can acquire lexical entailments at scale…
• …but lexical entailments are not enough. We need composition in order to model unseen phrases and full sentences.
• Inference involving composition is too complex to capture using simple heuristics, and requires models to incorporate context and common sense when performing reasoning.
Current Work
Last December they had argued that the council had failed to consider possible effects of contaminated land at the site.
The council considered environmental consequences.
predicates: SUB(consider, consider)
Current Work-in-progress.
Projection through Predicates
Input → Output:
in France ⊏ in Europe (Forward)
in Europe ⊐ in France (Reverse)
in France | in Germany (Alternation)
in France # in the city (Independent)

Projection through example predicates:
born(in France) ⊏ born(in Europe)
born(in Europe) ⊐ born(in France)
born(in France) | born(in Germany)
born(in France) # born(in the city)
has traveled(in France) # has traveled(in Germany)
is banned(in France) ⊐ is banned(in Europe)
Current Work-in-progress.
Projection through Predicates
Predicate classes, with example verbs:
Identity: is a, lives in, was born in, is married to
Non-Exclusive, Identity: has a, considers, has visited, likes
Non-Exclusive, Negation: lacks a, avoids, banned, dislikes
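A minimal sketch of how this projection behavior might be encoded, using MacCartney-style relation symbols. The identity row matches the born(...) examples above; cells not attested on the slides (e.g. how the negation class treats alternation) are our assumptions:

```python
# Sketch of relation projection through predicates. The 'identity' row
# follows the born(...) examples; the other rows extrapolate from the
# has-traveled / is-banned examples, so treat unattested cells as assumptions.

FORWARD, REVERSE, ALTERNATION, INDEPENDENT = "⊏", "⊐", "|", "#"

PROJECTION = {
    "identity": {FORWARD: FORWARD, REVERSE: REVERSE,
                 ALTERNATION: ALTERNATION, INDEPENDENT: INDEPENDENT},
    "non_exclusive_identity": {FORWARD: FORWARD, REVERSE: REVERSE,
                               ALTERNATION: INDEPENDENT, INDEPENDENT: INDEPENDENT},
    "non_exclusive_negation": {FORWARD: REVERSE, REVERSE: FORWARD,
                               ALTERNATION: INDEPENDENT, INDEPENDENT: INDEPENDENT},
}

def project(predicate_class: str, arg_relation: str) -> str:
    """Relation between P(x) and P(y), given the relation between x and y."""
    return PROJECTION[predicate_class][arg_relation]

# born(in France) vs born(in Europe): identity predicates preserve ⊏
assert project("identity", FORWARD) == FORWARD
# is banned(in France) vs is banned(in Europe): negation-like predicates flip ⊏ to ⊐
assert project("non_exclusive_negation", FORWARD) == REVERSE
# has traveled(in France) vs has traveled(in Germany): alternation weakens to #
assert project("non_exclusive_identity", ALTERNATION) == INDEPENDENT
```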
Last December they had argued that the council had failed to consider possible effects of contaminated land at the site.
The council considered environmental consequences.
"higher order" predicates: DEL(fail to)
Implicative Verbs
Implicative Verbs. (Karttunen Language 1971)
She managed to fix the bug. Did she fix the bug? ✔
She wanted to fix the bug. Did she fix the bug? ✘

A tense diagnostic: an implicative's complement must share the matrix verb's time reference, while a non-implicative's need not.
She managed to fix the bug last night. ✔
She managed to fix the bug tomorrow. ✘
She wanted to fix the bug last night. ✔
She wanted to fix the bug tomorrow. ✔

venture to, forget to, manage to, bother to, happen to, get to, decide to, dare to, try to, agree to, promise to, want to, intend to, plan to, hope to

Gallager chose to accept a full scholarship to play football for Temple University.
Wilkins was allowed to leave in 1987 to join French outfit Paris Saint-Germain.

Annotation task: She managed to fix the bug. → She fixed the bug.
Contradiction / Entailment / Can't Tell

Tense Manages to Predict Implicatives. (Pavlick and Callison-Burch EMNLP 2016)
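Karttunen-style signatures make the implicative/non-implicative distinction operational: each verb licenses an inference about its complement under a positive and a negated matrix clause. A minimal sketch; the manage to and fail to signatures are from Karttunen (1971), and any other assignment here is illustrative:

```python
# Karttunen-style implicative signatures: what the matrix verb licenses about
# its complement when the sentence is positive vs. negated. '+' = complement
# true, '-' = complement false, None = can't tell. manage/fail are from
# Karttunen (1971); treat other entries as illustrative assumptions.

SIGNATURES = {
    "manage to": ("+", "-"),   # managed to fix => fixed; didn't manage => didn't fix
    "fail to":   ("-", "+"),   # failed to consider => didn't consider
    "want to":   (None, None), # non-implicative: wanting to fix says nothing
    "hope to":   (None, None),
}

def complement_inference(verb: str, negated: bool):
    """Return '+', '-', or None for the truth of the complement clause."""
    pos, neg = SIGNATURES.get(verb, (None, None))
    return neg if negated else pos

# "The council failed to consider X" => "The council considered X" is false,
# so DEL(fail to) yields a contradiction rather than an entailment.
assert complement_inference("fail to", negated=False) == "-"
assert complement_inference("manage to", negated=False) == "+"
assert complement_inference("want to", negated=False) is None
```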
Implicative Verbs
(0.14) UFJ wants to merge with Mitsubishi, a combination that'd surpass Citigroup as the world's biggest bank. ⇏ The merger of Japanese Banks creates the world's biggest bank.
(0.55) After graduating, Gallager chose to accept a full scholarship to play football for Temple University. ⇒ Gallager attended Temple University.
(0.68) Wilkins was allowed to leave in 1987 to join French outfit Paris Saint-Germain. ⇒ Wilkins departed Milan in 1987.
Table 3: Examples from the RTE3 dataset (Giampiccolo et al., 2007) which require recognizing implicative behavior, even in verbs that are not implicative by definition. The tendency of certain verbs (e.g. be allowed) to behave as de facto implicatives is captured surprisingly well by the tense agreement score (shown in parentheses).
the main verb of a sentence, the truth of the complement could only be inferred 30% of the time. When a verb with high tense agreement appeared as the main verb, the truth of the complement could be inferred 91% of the time. This difference is significant at p < 0.01. That is, tense agreement provides a strong signal for identifying non-implicative verbs, and thus can help systems avoid false-positive entailment judgements, e.g. incorrectly inferring that wanting to merge ⇒ merging.
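As a rough illustration (not the paper's exact feature set), a tense agreement score can be approximated by checking, over corpus instances of "V to X", whether the complement's time reference matches the matrix verb's tense:

```python
def tense_agreement(instances):
    """Rough sketch of a tense agreement score (the paper's exact features
    differ). instances: (matrix_tense, complement_time) pairs, e.g.
    ("past", "past") from 'managed to fix the bug last night'. The score is
    the fraction of instances whose complement time matches the matrix tense.
    Implicatives like 'manage to' should score high, since 'managed to fix
    the bug tomorrow' is infelicitous, while 'want to' allows mismatches."""
    if not instances:
        return 0.0
    agree = sum(1 for matrix, comp in instances if matrix == comp)
    return agree / len(instances)

# Hypothetical counts for two verbs:
print(tense_agreement([("past", "past")] * 9 + [("past", "future")]))      # 0.9, implicative-like
print(tense_agreement([("past", "past")] * 5 + [("past", "future")] * 5))  # 0.5, non-implicative-like
```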
Figure 1: Whether or not the complement is entailed for main verbs with varying levels of tense agreement. Verbs with high tense agreement yield more definitive judgments (true/false). Each bar represents aggregated judgements over approx. 20 verbs.
Interestingly, tense agreement accurately models verbs that are not implicative by definition, but which nonetheless tend to behave implicatively in practice. For example, our method finds high tense agreement for choose to and be allowed to, which are often used to communicate, albeit indirectly, that their complements did in fact happen. To convince ourselves that treating such verbs as implicatives makes sense in practice, we manually look through the RTE3 dataset (Giampiccolo et al., 2007) for examples containing high-scoring verbs according to our method. Table 3 shows some example inferences that hinge precisely on recognizing these types of de facto implicatives.
5 Discussion and Related Work
Language understanding tasks such as RTE (Clark et al., 2007; MacCartney, 2009) and bias detection (Recasens et al., 2013) have been shown to require knowledge of implicative verbs, but such knowledge has previously come from manually-built word lists rather than from data. Nairn et al. (2006) and Martin et al. (2009) describe automatic systems to handle implicatives, but require hand-crafted rules for each unique verb that is handled. The tense agreement method we present offers a starting point for acquiring such rules from data, and is well-suited for incorporating into statistical systems. The clear next step is to explore similar data-driven means for learning the specific behaviors of individual implicative verbs, which has been well-studied from a theoretical perspective (Karttunen, 1971; Nairn et al., 2006; Amaral et al., 2012; Karttunen, 2012). Another interesting extension concerns the role of tense in word representations. While currently, tense is rarely built directly into distributional representations of words (Mikolov et al., 2013; Pennington et al., 2014), our results suggest it may offer important insights into the semantics of individual words. We leave this question as a direction for future work.
6 Conclusion
Differentiating between implicative and non-implicative verbs is important for discriminating inferences that can and cannot be made in natural language. We have presented a data-driven method that captures the implicative tendencies of verbs by exploiting the tense relationship between the verb and its complement clauses. This method effectively separates known implicatives from known non-implicatives, and, more importantly, provides good predictive signal in an entailment recognition task.
Tense Manages to Predict Implicatives. (Pavlick and Callison-Burch EMNLP 2016)
High tense agreement strongly predicts whether the truth of the complement can be inferred.
Tense Manages to Predict Implicatives. (Pavlick and Callison-Burch EMNLP 2016)
Discussion
• Humans are flexible with their language. Computers need to be flexible too.
• What aspects of meaning do we expect our semantic representations to have built in? What do we expect to handle in context, at runtime?
• What types of semantic tasks do we need to optimize for explicitly? Shouldn't some things come "for free" when we train for harder tasks?
Bibliography
Bannard and Callison-Burch. Paraphrasing with bilingual parallel corpora. ACL 2005.
Bos. Wide-coverage semantic analysis with boxer. STEP 2008.
Bowman et al. A large annotated corpus for learning natural language inference. EMNLP 2015.
Ganitkevitch et al. PPDB: The paraphrase database. NAACL 2013.
Hearst. Automatic acquisition of hyponyms from large text corpora. COLING 1992.
Karttunen. Implicative Verbs. Language 1971.
Lin and Pantel. Discovery of Inference Rules from Text. SIGKDD 2001.
MacCartney. Natural Language Inference. PhD Thesis 2009.
Padó et al. Design and realization of a modular architecture for textual entailment. NLE 2014.
Pavlick and Callison-Burch. Most babies are little and most problems are huge: Compositional Entailment in Adjective-Nouns. ACL 2016.
Pavlick and Callison-Burch. So-Called Non-Subsective Adjectives. *SEM 2016.
Pavlick and Callison-Burch. Tense Manages to Predict Implicative Behavior in Verbs. EMNLP 2016.
Pavlick et al. Adding Semantics to Data-Driven Paraphrasing. ACL 2015.
Pavlick et al. PPDB 2.0: Better paraphrase ranking, fine-grained entailment relations, word embeddings, and style classification. ACL 2015.
Pavlick et al. Domain-Specific Paraphrase Extraction. ACL 2015.
Rocktäschel et al. Reasoning about Entailment with Neural Attention. ICLR 2016.
Snow et al. Semantic taxonomy induction from heterogenous evidence. ACL 2006.
Thank you!