Modelling Human Thematic Fit Judgments
IGK Colloquium, 3/2/2005
Ulrike Padó
Overview
• (Very) quick introduction to my framework
• Testing the Semantic Module
  - Different input corpora
  - Smoothing
• Comparing the Semantic Module to standard selectional preference methods
Modelling Semantic Processing
• General idea: Build a probabilistic, large-scale, broad-coverage model of syntactic and semantic sentence processing
Semantic Processing
• Assign thematic roles on the basis of co-occurrence statistics from semantically annotated corpora
• Corpus-based frequency estimates of:
  - Semantic Subcategorisation (probability of seeing the role with the verb)
  - Selectional Preferences (probability of seeing the argument head in a role given the verb frame)
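The two frequency estimates above can be sketched as relative frequencies over annotated triples. This is a minimal illustration, not the module's implementation: the toy triples and the function names are hypothetical, and conditioning on the verb and role alone (rather than the full verb frame) is a simplifying assumption.

```python
from collections import Counter

# Hypothetical toy annotations: (verb, role, argument head) triples.
triples = [
    ("arrest", "Agent", "cop"),
    ("arrest", "Agent", "cop"),
    ("arrest", "Patient", "crook"),
    ("arrest", "Patient", "cop"),
]

verb_counts = Counter(verb for verb, _, _ in triples)
verb_role_counts = Counter((verb, role) for verb, role, _ in triples)
triple_counts = Counter(triples)

def p_role_given_verb(role, verb):
    """Semantic subcategorisation: relative frequency of the role with the verb."""
    if verb_counts[verb] == 0:
        return None  # verb unseen in training data: no estimate
    return verb_role_counts[(verb, role)] / verb_counts[verb]

def p_head_given_verb_role(head, verb, role):
    """Selectional preference: relative frequency of the head in this role.

    Assumption: conditions on (verb, role) only, not on a full verb frame.
    """
    denom = verb_role_counts[(verb, role)]
    if denom == 0:
        return None  # role never seen with this verb: no estimate
    return triple_counts[(verb, role, head)] / denom
```

Returning None for unseen events keeps "no estimate" distinct from a true zero probability, which matters once coverage is reported separately from accuracy.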
Testing the Semantic Module
• Evaluate just thematic fit of verbs and argument phrases
• Evaluation:
  1. Correlate predictions with human judgments
  2. Role labelling (prefer correct role)
• Try:
  - Different input corpora
  - Smoothing
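The first evaluation step, correlating model predictions with human judgments, might look like the sketch below. The slides do not name the correlation coefficient, so the use of Spearman's rank correlation here is an assumption, and all function names are illustrative.

```python
def average_ranks(values):
    """Assign 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def spearman(model_scores, human_ratings):
    """Spearman's rho (assumed measure): Pearson correlation of the ranks."""
    return pearson(average_ranks(model_scores), average_ranks(human_ratings))
```

Only data points for which the model produces a prediction would enter the correlation, which is why the number of covered data points is reported alongside each result.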
Training Data
Frequency counts from
• PropBank (ca. 3000 verb types)
  - Very specific domain
  - Relatively flat, syntax-based annotation
• FrameNet (ca. 1500 verb types)
  - Deep semantic annotation: frames code situations, group verbs that describe similar events and their arguments
  - Extracted from balanced corpus
  - Skewed sample through frame-wise annotation
Development/Test Data
• Development: 60 verb-argument pairs from McRae et al. (1998)
  - Two judgments for each data point: Agent/Patient
  - Used to determine optimal parameters of clustering (number of clusters, smoothing)
• Test: 50 verb-argument pairs, 100 data points
Sparse Data
• Raw frequencies are sparse:
  - 1 (Dev) / 2 (Test) pairs seen in PropBank
  - 0 (Dev) / 2 (Test) pairs seen in FrameNet
• Use semantic classes as level of abstraction: class-based smoothing
Smoothing
Reconstruct probabilities for unseen data
• Smoothing by verb and noun classes
  - Count class members instead of word tokens
• Compare two alternatives:
  - Hand-constructed classes
  - Induced verb classes (clustering)
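Counting class members instead of word tokens can be illustrated as follows. This is a sketch under stated assumptions: the class inventories, the toy triples, and the function name are hypothetical, and the estimate returned is a class-level probability (noun class given verb class and role), not a probability for the individual word.

```python
from collections import Counter

# Hypothetical class assignments (hand-built or induced).
verb_class = {"arrest": "catch", "capture": "catch"}
noun_class = {"cop": "person", "crook": "person",
              "thief": "person", "car": "artifact"}

# Hypothetical training triples: (verb, role, argument head).
triples = [
    ("capture", "Patient", "thief"),
    ("capture", "Agent", "cop"),
    ("arrest", "Patient", "car"),
]

class_counts = Counter()    # (verb class, role, noun class) -> count
context_counts = Counter()  # (verb class, role) -> count
for verb, role, head in triples:
    vc, nc = verb_class[verb], noun_class[head]
    class_counts[(vc, role, nc)] += 1
    context_counts[(vc, role)] += 1

def smoothed_p(head, verb, role):
    """Class-level estimate for a (possibly unseen) verb-argument pair."""
    vc, nc = verb_class.get(verb), noun_class.get(head)
    denom = context_counts[(vc, role)]
    if vc is None or nc is None or denom == 0:
        return None  # word outside the class inventory, or context unseen
    return class_counts[(vc, role, nc)] / denom
```

Here the pair ("arrest", "crook") never occurs as tokens, yet it receives an estimate because "capture" and "thief" fall into the same classes.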
Hand-constructed Verb and Noun classes
• WordNet: Use top-level ontology and synsets as noun classes
• VerbNet: Use top-level classes for verbs
• Presumably correct and reliable
• Result: No significant correlations with human data for either training corpus
Induced Verb Classes
• Automatically cluster verbs
  - Group by similarities of argument heads, paths from argument to verb, frame, role labels
  - Determine optimal number of clusters and parameters of the clustering algorithm on the development set
Induced Classes, PB/FN
Data points covered / correlation / significance

                 PB        FN
Raw data         2, -/-    2, -/-
All arguments    59, ns    12, corr. = 0.55, p < 0.05
Just NPs         48, ns    16, corr. = 0.56, p < 0.05
Results
• Hand-built classes do not work (with this amount of data)
• Module achieves reliable correlations with FN data: important result for the overall feasibility of my model
Adding Noun Classes (PB/FN)
Data points covered / correlation / significance

Raw data (PB)                 2, -/-
Raw data (FN)                 2, -/-
PB, all args, noun classes    4, corr. = 1, p < 0.01
FN, just NPs, noun classes    18, corr. = 0.63, p < 0.01
Results
• Hand-built classes do not work (with this amount of data)
• Module achieves reliable correlations with FN data
• Adding noun classes helps a little more
Comparison with Selectional Preference Methods
• Have established that our system reliably predicts human data
• How do we do in comparison to standard computational linguistics methods?
Selectional Preference Methods
• Clark & Weir (2002)
  - Add data points by finding the topmost class in WN that still reliably mirrors the target word frequency
• Resnik (1996)
  - Quantify contribution of WN class n to the overall preference strength of the verb
• Both rely on WN noun classes, no verb class smoothing
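Resnik's measure can be sketched as below: the preference strength of a verb is the relative entropy between the class distribution given the verb and the prior class distribution, and the selectional association of a class is its share of that strength. The probability tables passed in are assumed to be estimated elsewhere (for example from WN class counts); the function name and the example numbers are illustrative only.

```python
import math

def selectional_association(p_class_given_verb, p_class):
    """Return (preference strength, {class: selectional association}).

    p_class_given_verb: dict mapping class -> P(class | verb)
    p_class:            dict mapping class -> prior P(class)
    """
    contributions = {}
    for cls, p_cv in p_class_given_verb.items():
        # A class with zero conditional probability contributes nothing.
        contributions[cls] = (
            p_cv * math.log(p_cv / p_class[cls]) if p_cv > 0 else 0.0
        )
    strength = sum(contributions.values())
    if strength == 0:
        # The verb does not shift the class distribution: no preference.
        return 0.0, {cls: 0.0 for cls in contributions}
    return strength, {cls: v / strength for cls, v in contributions.items()}

# Hypothetical distributions for one verb and role.
prior = {"person": 0.5, "artifact": 0.5}
given_verb = {"person": 0.9, "artifact": 0.1}
strength, assoc = selectional_association(given_verb, prior)
```

Classes that the verb favours over the prior receive positive association, disfavoured classes receive negative association, and the associations sum to one whenever the strength is non-zero.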
Selectional Preference Methods (PB/FN)
                     Data points covered / significance    Labelling (Cov/Acc)
Sem. Module 1        18, corr. = 0.63, p < 0.01            38% / 47.4%
Sem. Module 2        16, corr. = 0.56, p < 0.05            30% / 60%
Clark & Weir (PB)    72, ns                                84% / 50%
Clark & Weir (FN)    23, ns                                36% / 50%
Resnik (PB)          75, ns                                74% / 48.6%
Resnik (FN)          46, ns                                50% / 48%
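The labelling columns (Cov/Acc) can be read with the sketch below. The slides do not define the two measures, so this assumes coverage is the share of test pairs for which a model proposes any role, and accuracy is the share of correct roles among the covered pairs. The reported figures are at least consistent with that reading: 38% of 50 pairs is 19, and 9/19 is about 47.4%. The function name and the example labels are hypothetical.

```python
def labelling_scores(predictions, gold):
    """Return (coverage, accuracy) for role labelling.

    predictions: predicted role per item, or None if the model abstains
    gold:        correct role per item
    Assumption: accuracy is computed over covered items only.
    """
    covered = [(p, g) for p, g in zip(predictions, gold) if p is not None]
    coverage = len(covered) / len(gold) if gold else 0.0
    accuracy = (
        sum(p == g for p, g in covered) / len(covered) if covered else 0.0
    )
    return coverage, accuracy

# Hypothetical example: one abstention, two of three covered items correct.
cov, acc = labelling_scores(
    ["Agent", None, "Patient", "Agent"],
    ["Agent", "Patient", "Agent", "Agent"],
)
```

With two candidate roles (Agent/Patient), an accuracy near 50% on covered items is what random choice would give, which puts the accuracy figures in the table into perspective.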
Results
• Too little input data
  - No results for selectional preference models
  - Small coverage for Semantic Module
• Semantic Module manages to make predictions all the same
  - Relies on verb clusters: verbs are less sparse than nouns in small corpora
• Annotate larger corpus with FN roles
Annotating the BNC
• Annotate large, balanced corpus: BNC
  - More data points for verbs covered in FN
  - More verb coverage (though purely syntactic annotation for unknown verbs)
• Results:
  - Annotation relatively sensible and reliable for non-FN verbs
  - Frame-wise annotation in FN causes problems for FN verbs