Mono- and bilingual modeling of selectional preferences
Sebastian Padó
Institute for Computational Linguistics
Heidelberg University
(joint work with Katrin Erk, Ulrike Pado, Yves Peirsman)
Some context
• Computational lexical semantics: modeling the meaning of words and phrases
• Distributional approach
  • Observe the usage of words in corpora
  • Robustness: Broad coverage, manageable complexity
  • Flexibility: Corpus choice determines model
Knowledge
Corpus
Structure
Methods: Distributional semantics
Phenomena: Semantic relations in bilingual dictionaries
Application: Predictions of plausibility judgments
Plausibility of Verb-Relation-Argument-Triples
Verb   Relation   Argument   Plausibility
eat    subject    customer   6.9
eat    object     customer   1.5
eat    subject    apple      1.0
eat    object     apple      6.4
• Central aspect of language
• Selectional preferences [Katz & Fodor 1963, Wilks 1975]
• Generalization of lexical similarity
• Incremental language processing [McRae & Matsuki 2009]
• Disambiguation [Toutanova et al. 2005], applicability of inference rules [Pantel et al. 2007], SRL [Gildea & Jurafsky 2002]
Modelling Plausibility
• Approximating plausibility by frequency
• Two lexical variables: Frequency of most triples is zero
• Implausibility or sparse data?
  • Generalization based on an ontology (WordNet) [Resnik 1996]
  • Generalization based on vector space [Erk, Padó, and Padó 2010]
English corpus:
(eat, obj, apple)      100
(eat, obj, hat)        1
(eat, obj, telephone)  0
(eat, obj, caviar)     0
(eat, obj, apple): highly plausible
(eat, obj, hat): somewhat plausible
(eat, obj, telephone): ?
(eat, obj, caviar): ?
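The sparse-data problem above can be made concrete with a minimal sketch. It uses the toy counts from the slide and assumes that "plausibility by frequency" means relative frequency among the seen arguments of a verb-relation pair; the function name `frequency_plausibility` is hypothetical.

```python
from collections import Counter

# Toy counts from the slide; a real model would count triples in a parsed corpus.
triple_counts = Counter({
    ("eat", "obj", "apple"): 100,
    ("eat", "obj", "hat"): 1,
})

def frequency_plausibility(verb, relation, argument):
    """Relative frequency of the argument among all seen arguments of (verb, relation).

    Assumption: relative frequency is only one possible way to turn counts into a score.
    """
    total = sum(count for (v, r, _), count in triple_counts.items()
                if (v, r) == (verb, relation))
    if total == 0:
        return 0.0
    # Counter returns 0 for unseen triples without raising KeyError.
    return triple_counts[(verb, relation, argument)] / total

for arg in ["apple", "hat", "telephone", "caviar"]:
    print(arg, frequency_plausibility("eat", "obj", arg))
```

Both unseen arguments receive the same score of zero, so counts alone cannot separate the implausible "telephone" from the merely unobserved "caviar".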
Semantic Spaces
• Characterization of word meaning through a profile over occurrence contexts [Salton, Wong, and Yang 1974, Landauer & Dumais 1997, Schütze 1998]
• Geometrically: Vector in high-dimensional space
• High vector similarity implies high semantic similarity
• Nearest neighbors = synonyms
            cultiver  rouler
mandarine   5         1
clémentine  4         1
voiture     1         20
[Figure (Fr): mandarine, clémentine, and voiture plotted as vectors in a space with the axes cultiver and rouler]
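As a minimal sketch of how nearest neighbors fall out of such a space, the following uses the three toy vectors from the table. Cosine is assumed as the similarity measure (the slide does not name one), and `nearest_neighbor` is a hypothetical helper.

```python
import math

# Co-occurrence counts from the slide: dimensions are (cultiver, rouler).
vectors = {
    "mandarine": (5, 1),
    "clémentine": (4, 1),
    "voiture": (1, 20),
}

def cosine(u, v):
    """Cosine of the angle between two vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def nearest_neighbor(word):
    """Return the other word whose vector is most similar to that of `word`."""
    others = (w for w in vectors if w != word)
    return max(others, key=lambda w: cosine(vectors[word], vectors[w]))

print(nearest_neighbor("mandarine"))
```

With these counts, "mandarine" and "clémentine" point in nearly the same direction, while "voiture" lies far from both.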
Similarity-based generalization [Pado, Pado & Erk 2010]
• Plausibility is average vector space similarity to seen arguments
• (v, r, a): verb – relation – argument head word triple
• seenargs: set of argument head words seen in the corpus
• wt: weight function
• Z: normalization constant
• sim: semantic (vector space) similarity
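The formula these definitions belong to is not preserved in this text. One reading consistent with them is Pl(v, r, a) = (1/Z) · Σ wt(a') · sim(a, a'), summed over all a' in seenargs(v, r). The sketch below implements that reading under stated assumptions: cosine stands in for sim, corpus frequency for wt, and the sum of weights for Z. The vectors and counts are hypothetical toy values.

```python
import math

def cosine(u, v):
    """Cosine similarity; stands in for the unspecified `sim` (assumption)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

# Hypothetical two-dimensional context vectors, for illustration only.
vectors = {
    "apple": (5, 1),
    "orange": (4, 1),
    "caviar": (3, 1),
    "telephone": (1, 20),
}

# Hypothetical seen objects of "eat" with their corpus frequencies (used as wt).
seen_objects_of_eat = {"apple": 100, "orange": 40}

def plausibility(candidate, seen_args, vectors):
    """Weighted average similarity of `candidate` to the seen arguments.

    Assumptions: wt(a') is the frequency of a', and Z is the sum of all weights.
    """
    z = sum(seen_args.values())
    weighted = sum(weight * cosine(vectors[candidate], vectors[arg])
                   for arg, weight in seen_args.items())
    return weighted / z

for candidate in ["caviar", "telephone"]:
    print(candidate, round(plausibility(candidate, seen_objects_of_eat, vectors), 3))
```

Unlike raw counts, this score separates two unseen arguments: a candidate close to the seen objects scores high, one far from them scores low.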
Geometrical interpretation
[Figure: points in the vector space labeled Peter, husband, child, orange, apple, breakfast, caviar, and telephone, with the regions "Seen subjects of “eat”" and "Seen objects of “eat”"]
Evaluation
• Triples with human plausibility ratings [McRae et al. 1996]
• Evaluation: Correlation of model predictions with human judgments
  • Spearman’s ρ = 1: perfect correlation; ρ = 0: no correlation
• Result: At 98% coverage, the vector space model almost attains the quality of the “deep” model
Model                                Coverage  Spearman’s ρ
Resnik 1996 [ontology-based]         100%      0.123 n.s.
EPP [vector space-based]             98%       0.325 ***
U. Pado et al. 2006 [“deep” model]   78%       0.415 ***
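For readers unfamiliar with the evaluation measure, here is a minimal sketch of Spearman’s ρ as the Pearson correlation of ranks, with tied values given their average rank. The human ratings are the four example ratings from the earlier slide; the model scores are hypothetical and only illustrate the computation.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the group of equal values starting at i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Pearson correlation of the rank vectors (undefined if one side is constant)."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

human_ratings = [6.9, 1.5, 1.0, 6.4]  # ratings from the example table
model_scores = [0.8, 0.3, 0.1, 0.6]   # hypothetical model predictions

print(spearman_rho(human_ratings, model_scores))
```

Because ρ depends only on rank order, a model does not need to reproduce the rating scale, only the relative ordering of the triples.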
From one to many languages…
• Vector space model reduces the need for language resources to predict plausibility judgments
  • No ontologies
• Still necessary: Observations of triples, target words
  • Large, accurately parsed corpus
  • Problematic for basically all languages except English
• Can we extend our strategy to new languages?
Resnik [Brockmann & Lapata 2002]: TIGER + GermaNet, ρ = .37
EPP [Pado & Peirsman 2010]: HGC, ρ = .33
Predicting plausibility for new languages
• Transfer with a bilingual lexicon [Koehn and Knight 2002]
  • Cross-lingual knowledge transfer
• Print dictionaries are problematic
  • Instead: acquire from distributional data
cultiver – grow
pomme – apple
(cultiver, Obj, pomme) → English model (English corpus) → (grow, obj, apple): highly plausible
Bilingual semantic space
• Joint semantic space for words from both languages [Rapp 1995, Fung & McKeown 1997]
• Dimensions are bilingual word pairs, can be bootstrapped
• Frequencies observable from comparable corpora
• Nearest neighbors: Cross-lingual synonyms ⟷ Translations
            (cultiver, grow)  (rouler, drive)
mandarine   5                 1
mandarin    4                 2
car         1                 20
[Figure (Fr/E): mandarine, mandarin, and car plotted as vectors in a space with the axes cultiver/grow and rouler/drive]
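A minimal sketch of lexicon extraction from such a joint space, using the toy counts from the table: each French word is mapped to its nearest English neighbors. Cosine is again an assumed similarity measure, and `cross_lingual_neighbors` is a hypothetical helper.

```python
import math

# Counts from the slide; dimensions are the word pairs (cultiver, grow) and (rouler, drive).
french = {"mandarine": (5, 1)}
english = {"mandarin": (4, 2), "car": (1, 20)}

def cosine(u, v):
    """Cosine similarity (assumed; the slide does not name the measure)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def cross_lingual_neighbors(word, n=1):
    """Return the n English words closest to the French `word`, best first."""
    source = french[word]
    ranked = sorted(english, key=lambda w: cosine(source, english[w]), reverse=True)
    return ranked[:n]

print(cross_lingual_neighbors("mandarine"))
print(cross_lingual_neighbors("mandarine", n=2))
```

The ranking depends only on context profiles, so the top neighbor is a translation candidate rather than a guaranteed translation.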
Nearest neighbors in bilingual space
• Similar usages / context profiles do not necessarily indicate synonymy
        (cultiver, grow)  (rouler, drive)
pear    5                 1
pomme   4                 2
car     1                 20
[Figure (Fr/E): pear, pomme, and car plotted as vectors in a space with the axes cultiver/grow and rouler/drive]
• Bilingual case: Peirsman & Pado (2011)
  • Lexicon extraction for EN/DE and EN/NL
Evaluation against Gold Standard
•Evaluation of nearest cross-lingual neighbors against a translators’ dictionary
Analysis of 200 noun pairs (EN-DE)
Meta-Relation               Relation      Frequency  Example
Synonymy (50%)                            99         Verhältnis - relationship
Semantic similarity (16%)   Antonymy      1          Inneres - exterior
                            Co-Hyponymy   15         Straßenbahn - bus
                            Hyponymy      3          Kunstwerk - painting
                            Hypernymy     15         Dramatiker - poet
Semantic relatedness (19%)                39         Kapitel - essay
Errors (14%)                              28         DDR-Zeit – trainee
Similarity by relation
How to proceed?
• Classical reaction: Focus on cross-lingual synonyms
  • Aggressive filtering of nearest-neighbor lists
  • Risk: Sparse data issues
• Our hypothesis (preliminary version):
  • Non-synonymous pairs still provide information about bilingual similarity
  • Should be exploited for cross-lingual knowledge transfer
  • Experimental validation: Vary number of synonyms, observe effect on cross-lingual knowledge transfer
Varying the number of neighbors
• Nearest neighbors: 50% are synonyms
• Further neighbors: quick decline to 10% synonyms
Experimental setup
rouler – drive
bagnole – jalopy, banger, car
(bagnole, subj, rouler) → English model (English corpus)
Consider plausibilities for:
(jalopy, subj, drive)
(banger, subj, drive)
(car, subj, drive)
Details
• Model:
  • English model: trained on BNC as before
  • Bilingual lexicon extracted from BNC and Stuttgarter Nachrichtenkorpus HGC as comparable corpora
  • Prediction based on n nearest English neighbours for German argument
• Evaluation:
  • 90 German (v,r,a) triples with human plausibility ratings [Brockmann & Lapata 2003]
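The prediction step can be sketched as follows, using the "bagnole" example from the experimental setup. The slides do not say how the scores of the n neighbours are combined, so averaging is an assumption here. The English plausibility values, the lexicon dictionaries, and the function name `transfer_plausibility` are all hypothetical.

```python
# Hypothetical English plausibility scores, standing in for the trained English model.
english_scores = {
    ("drive", "subj", "jalopy"): 0.2,
    ("drive", "subj", "banger"): 0.1,
    ("drive", "subj", "car"): 0.9,
}

def english_model(verb, relation, argument):
    """Look up a plausibility score; unseen triples default to 0.0 in this toy model."""
    return english_scores.get((verb, relation, argument), 0.0)

# Induced bilingual lexicon: one translation per verb, a ranked neighbour list per argument.
verb_lexicon = {"rouler": "drive"}
argument_neighbors = {"bagnole": ["jalopy", "banger", "car"]}

def transfer_plausibility(verb, relation, argument, n=1):
    """Score a source-language triple via its n nearest English argument neighbours.

    Assumption: the n neighbour scores are combined by a plain average.
    """
    english_verb = verb_lexicon[verb]
    candidates = argument_neighbors[argument][:n]
    scores = [english_model(english_verb, relation, a) for a in candidates]
    return sum(scores) / len(scores)

for n in (1, 2, 3):
    print(n, transfer_plausibility("rouler", "subj", "bagnole", n=n))
```

In this toy setting the nearest neighbour is a rare word with a weak score, and adding further neighbours brings in the frequent "car", which illustrates why a longer neighbour list can help even when its members are not exact translations.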
Results – EN-DE
                         1 NN   2 NN   3 NN   4 NN   5 NN
Translated English EPP   0.34   0.41   0.44   0.46   0.40
Model                               Resources                      Spearman’s ρ
Resnik [Brockmann & Lapata 2002]    TIGER corpus, German WordNet   .37
EPP German [Pado & Peirsman 2010]   HGC corpus parsed with PCFG    .33
• Result: Transfer model significantly better than monolingual model, but only if non-synonymous neighbors are included
Results: Details
                           1 NN   2 NN   3 NN   4 NN   5 NN
English EPP (all)          0.34   0.41   0.44   0.46   0.40
English EPP (subjects)     0.53   0.51   0.56   0.56   0.55
English EPP (objects)      0.58   0.61   0.61   0.64   0.58
English EPP (pp objects)   0.33   0.45   0.45   0.46   0.42
Sources of the positive effect
• Non-synonyms are in fact informative for plausibility translation
• Semantically similar verbs: eat – munch – feast
  • Similar events, similar arguments [Fillmore et al. 2003, Levin 1993]
• Semantically related verbs: peel – cook – eat
  • Schemas/narrative chains: shared participants [Schank & Abelson 1977, Chambers & Jurafsky 2009]
Our hypothesis with qualifications
• Using non-synonymous translation pairs is helpful
  1. if transferred knowledge is lexical
    • Many infrequently observed datapoints
  2. if knowledge is stable across semantically related/similar word pairs
    • Counterexample: polarity/sentiment judgments
    • food – feast – grub
    • Parallel experiment: best results for single nearest neighbor
Summary
• Plausibility can be modeled with fairly shallow methods
  • Seen head words plus generalization in vector space
  • Precondition: accurately parsed corpus
• If unavailable: Transfer from better-endowed language
  • Translation through automatically induced lexicons
• Transfer of knowledge about certain phenomena can benefit from non-synonymous translations
  • Corresponding to monolingual results from QA [Harabagiu et al. 2000], paraphrases [Lin & Pantel 2001], entailment [Dagan et al. 2006], …