princeton 11/06/03 1 penn putting meaning into your trees martha palmer university of pennsylvania...
TRANSCRIPT
Princeton 11/06/03 1
Penn
Putting Meaning Into Your Trees
Martha PalmerUniversity of Pennsylvania
Princeton Cognitive Science Laboratory
November 6, 2003
Princeton 11/06/03 2
PennOutline Introduction Background: WordNet, Levin classes, VerbNet Proposition Bank – capturing shallow
semantics Mapping PropBank to VerbNet Mapping PropBank to WordNet
Princeton 11/06/03 3
PennWord sense in Machine Translation
Different syntactic framesJohn left the room Juan saiu do quarto. (Portuguese)John left the book on the table. Juan deizou o livro na mesa.
Same syntactic frame?John left a fortune. Juan deixou uma fortuna.
Princeton 11/06/03 4
PennAsk Jeeves – A Q/A, IR ex. What do you call a successful movie? Tips on Being a Successful Movie Vampire ... I shall call
the police. Successful Casting Call & Shoot for ``Clash of
Empires'' ... thank everyone for their participation in the making of yesterday's movie.
Demme's casting is also highly entertaining, although I wouldn't go so far as to call it successful. This movie's resemblance to its predecessor is pretty vague...
VHS Movies: Successful Cold Call Selling: Over 100 New Ideas, Scripts, and Examples from the Nation's Foremost Sales Trainer.
Blockbuster
Princeton 11/06/03 5
PennAsk Jeeves – filtering w/ POS tagWhat do you call a successful movie? Tips on Being a Successful Movie Vampire ... I shall call
the police. Successful Casting Call & Shoot for ``Clash of
Empires'' ... thank everyone for their participation in the making of yesterday's movie.
Demme's casting is also highly entertaining, although I wouldn't go so far as to call it successful. This movie's resemblance to its predecessor is pretty vague...
VHS Movies: Successful Cold Call Selling: Over 100 New Ideas, Scripts, and Examples from the Nation's Foremost Sales Trainer.
Princeton 11/06/03 6
PennFiltering out “call the police”
call(you,movie,what)
call(you,police)
Syntax
Princeton 11/06/03 7
PennEnglish lexical resource is required
That provides sets of possible syntactic frames for verbs.
And provides clear, replicable sense distinctions.
AskJeeves: Who do you call for a good electronic lexical database for English?
Princeton 11/06/03 8
PennWordNet – call, 28 senses1. name, call -- (assign a specified, proper name to; "They named their son David"; …) -> LABEL2. call, telephone, call up, phone, ring -- (get or try to get into
communication (with someone) by telephone; "I tried to call you all night"; …)
->TELECOMMUNICATE3. call -- (ascribe a quality to or give a name of a common
noun that reflects a quality; "He called me a bastard"; …)
-> LABEL4. call, send for -- (order, request, or command to come; "She was called into the director's office"; "Call the police!")
-> ORDER
Princeton 11/06/03 9
PennWordNet – Princeton (Miller 1985, Fellbaum 1998)
On-line lexical reference (dictionary) Nouns, verbs, adjectives, and adverbs grouped into
synonym sets Other relations include hypernyms (ISA), antonyms,
meronyms
Limitations as a computational lexicon Contains little syntactic information No explicit predicate argument structures No systematic extension of basic senses Sense distinctions are very fine-grained, ITA 73% No hierarchical entries
Princeton 11/06/03 10
PennLevin classes (Levin, 1993)
3100 verbs, 47 top level classes, 193 second and third level
Each class has a syntactic signature based on alternations. John broke the jar. / The jar broke. / Jars break easily.
John cut the bread. / *The bread cut. / Bread cuts easily.
John hit the wall. / *The wall hit. / *Walls hit easily.
Princeton 11/06/03 11
PennLevin classes (Levin, 1993)
Verb class hierarchy: 3100 verbs, 47 top level classes, 193
Each class has a syntactic signature based on alternations. John broke the jar. / The jar broke. / Jars break easily. change-of-state
John cut the bread. / *The bread cut. / Bread cuts easily. change-of-state, recognizable action, sharp instrument
John hit the wall. / *The wall hit. / *Walls hit easily. contact, exertion of force
Princeton 11/06/03 12
Penn
Princeton 11/06/03 13
PennConfusions in Levin classes? Not semantically homogenous
{braid, clip, file, powder, pluck, etc...} Multiple class listings
homonymy or polysemy? Conflicting alternations?
Carry verbs disallow the Conative, (*she carried at the ball), but include {push,pull,shove,kick,draw,yank,tug}also in Push/pull class, does take the Conative
(she kicked at the ball)
Princeton 11/06/03 14
PennIntersective Levin Classes
“at” ¬CH-LOC“across the room”
CH-LOC
“apart” CH-STATE
Dang, Kipper & Palmer, ACL98
Princeton 11/06/03 15
PennIntersective Levin Classes More syntactically and semantically coherent
sets of syntactic patternsexplicit semantic componentsrelations between senses
VERBNETwww.cis.upenn.edu/verbnet
Dang, Kipper & Palmer, IJCAI00, Coling00
Princeton 11/06/03 16
PennVerbNet – Karin Kipper
Class entries:Capture generalizations about verb behaviorOrganized hierarchicallyMembers have common semantic elements,
semantic roles and syntactic frames Verb entries:
Refer to a set of classes (different senses)each class member linked to WN synset(s)
(not all WN senses are covered)
Princeton 11/06/03 17
PennSemantic role labels:
Christiane broke the LCD projector.
break (agent(Christiane), patient(LCD-projector))
cause(agent(Christiane), broken(LCD-projector))
agent(A) -> intentional(A), sentient(A), causer(A), affector(A)
patient(P) -> affected(P), change(P),…
Princeton 11/06/03 18
PennHand built resources vs. Real data
VerbNet is based on linguistic theory – how useful is it?
How well does it correspond to syntactic variations found in naturally occurring text?
PropBank
Princeton 11/06/03 19
PennProposition Bank:From Sentences to Propositions
Powell met Zhu Rongji
Proposition: meet(Powell, Zhu Rongji)Powell met with Zhu Rongji
Powell and Zhu Rongji met
Powell and Zhu Rongji had a meeting
. . .When Powell met Zhu Rongji on Thursday they discussed the return of the spy plane.
meet(Powell, Zhu) discuss([Powell, Zhu], return(X, plane))
debate
consult
joinwrestle
battle
meet(Somebody1, Somebody2)
Princeton 11/06/03 20
PennCapturing semantic roles*
George broke [ ARG1 the laser pointer.]
[ARG1 The windows] were broken by the hurricane.
[ARG1 The vase] broke into pieces when it toppled over.
SUBJ
SUBJ
SUBJ
*See also Framenet, http://www.icsi.berkeley.edu/~framenet/
Princeton 11/06/03 21
PennEnglish lexical resource is required
That provides sets of possible syntactic frames for verbs with semantic role labels.
And provides clear, replicable sense distinctions.
Princeton 11/06/03 22
PennA TreeBanked Sentence
Analysts
S
NP-SBJ
VP
have VP
been VP
expectingNP
a GM-Jaguar pact
NP
that
SBAR
WHNP-1
*T*-1
S
NP-SBJVP
wouldVP
give
the US car maker
NP
NP
an eventual 30% stake
NP
the British company
NP
PP-LOC
in
(S (NP-SBJ Analysts) (VP have (VP been (VP expecting
(NP (NP a GM-Jaguar pact) (SBAR (WHNP-1 that)
(S (NP-SBJ *T*-1) (VP would
(VP give (NP the U.S. car maker)
(NP (NP an eventual (ADJP 30 %) stake) (PP-LOC in (NP the British company))))))))))))
Analysts have been expecting a GM-Jaguar pact that would give the U.S. car maker an eventual 30% stake in the British company.
Princeton 11/06/03 23
PennThe same sentence, PropBanked
Analysts
have been expecting
a GM-Jaguar pact
Arg0 Arg1
(S Arg0 (NP-SBJ Analysts) (VP have (VP been (VP expecting
Arg1 (NP (NP a GM-Jaguar pact) (SBAR (WHNP-1 that)
(S Arg0 (NP-SBJ *T*-1) (VP would
(VP give Arg2 (NP the U.S. car maker)
Arg1 (NP (NP an eventual (ADJP 30 %) stake)
(PP-LOC in (NP the British company))))))))))))
that would give
*T*-1
the US car maker
an eventual 30% stake in the British company
Arg0
Arg2
Arg1
expect(Analysts, GM-J pact)give(GM-J pact, US car maker, 30% stake)
Princeton 11/06/03 24
PennFrames File Example: expectRoles: Arg0: expecter Arg1: thing expected
Example: Transitive, active:
Portfolio managers expect further declines in interest rates.
Arg0: Portfolio managers REL: expect Arg1: further declines in interest rates
Princeton 11/06/03 25
PennFrames File example: giveRoles: Arg0: giver Arg1: thing given Arg2: entity given to
Example: double object The executives gave the chefs a standing ovation. Arg0: The executives REL: gave Arg2: the chefs Arg1: a standing ovation
Princeton 11/06/03 26
PennWord Senses in PropBank Orders to ignore word sense not feasible for 700+
verbs Mary left the room Mary left her daughter-in-law her pearls in her will
Frameset leave.01 "move away from":Arg0: entity leavingArg1: place left
Frameset leave.02 "give":Arg0: giver Arg1: thing givenArg2: beneficiary
How do these relate to traditional word senses in VerbNet and WordNet?
Princeton 11/06/03 27
PennAnnotation procedure
PTB II - Extraction of all sentences with given verb Create Frame File for that verb Paul Kingsbury
(3100+ lemmas, 4400 framesets,118K predicates) Over 300 created automatically via VerbNet
First pass: Automatic tagging (Joseph Rosenzweig) http://www.cis.upenn.edu/~josephr/TIDES/index.html#lexicon
Second pass: Double blind hand correction
Paul Kingsbury Tagging tool highlights discrepancies Scott Cotton Third pass: Solomonization (adjudication)
Betsy Klipple, Olga Babko-Malaya
Princeton 11/06/03 28
PennTrends in Argument Numbering Arg0 = agent Arg1 = direct object / theme / patient Arg2 = indirect object / benefactive /
instrument / attribute / end state Arg3 = start point / benefactive / instrument /
attribute Arg4 = end point Per word vs frame level – more general?
Princeton 11/06/03 29
Penn
Additional tags (arguments or adjuncts?)
Variety of ArgM’s (Arg#>4): TMP - when? LOC - where at? DIR - where to? MNR - how? PRP -why? REC - himself, themselves, each other PRD -this argument refers to or modifies
another ADV –others
Princeton 11/06/03 30
Penn
Function tags for Chinese (arguments or adjuncts?)
Variety of ArgM’s (Arg#>4): TMP - when? LOC - where at? DIR - where to? MNR - how? PRP -why? TPC – topic PRD -this argument refers to or modifies another ADV –others CND – conditional DGR – degree FRQ - frequency
Princeton 11/06/03 31
Penn
Additional function tags for Chinese for phrasal verbs-
Correspond to groups of “prepositions”
AS AT
INTO
ONTO
TO
TOWARDS
Princeton 11/06/03 32
PennInflection Verbs also marked for tense/aspect
Passive/Active Perfect/Progressive Third singular (is has does was) Present/Past/Future Infinitives/Participles/Gerunds/Finites
Modals and negations marked as ArgMs
Princeton 11/06/03 33
PennFrames: Multiple Framesets Framesets are not necessarily consistent between
different senses of the same verb
Framesets are consistent between different verbs that share similar argument structures, (like FrameNet)
Out of the 787 most frequent verbs:
1 FrameNet – 521 2 FrameNet – 169 3+ FrameNet 97 (includes light verbs)
Princeton 11/06/03 34
PennErgative/Unaccusative Verbs
Roles (no ARG0 for unaccusative verbs)Arg1 = Logical subject, patient, thing rising
Arg2 = EXT, amount risen
Arg3* = start point
Arg4 = end point
Sales rose 4% to $3.28 billion from $3.16 billion.
The Nasdaq composite index added 1.01 to 456.6 on paltry volume.
Princeton 11/06/03 35
PennActual data for leave http://www.cs.rochester.edu/~gildea/PropBank/Sort/Leave .01 “move away from” Arg0 rel Arg1 Arg3Leave .02 “give” Arg0 rel Arg1 Arg2
sub-ARG0 obj-ARG1 44 sub-ARG0 20 sub-ARG0 NP-ARG1-with obj-ARG2 17 sub-ARG0 sub-ARG2 ADJP-ARG3-PRD 10 sub-ARG0 sub-ARG1 ADJP-ARG3-PRD 6 sub-ARG0 sub-ARG1 VP-ARG3-PRD 5 NP-ARG1-with obj-ARG2 4 obj-ARG1 3 sub-ARG0 sub-ARG2 VP-ARG3-PRD 3
Princeton 11/06/03 36
PennPropBank/FrameNet
Buy
Arg0: buyer
Arg1: goods
Arg2: seller
Arg3: rate
Arg4: payment
Sell
Arg0: seller
Arg1: goods
Arg2: buyer
Arg3: rate
Arg4: payment
More generic, more neutral – maps readily to VN,TR Rambow, et al, PMLB03
Princeton 11/06/03 37
PennAnnotator accuracy – ITA 84%
1000 10000 100000 10000000.86
0.87
0.88
0.89
0.9
0.91
0.92
0.93
0.94
0.95
0.96hertlerb
forbesk
solaman2istreit
wiarmstr kingsbur
ksledge
nryant
malayaojaywang
delilkanpteppercotter
Annotator Accuracy-primary labels only
# of annotations (log scale)
accu
racy
Princeton 11/06/03 38
PennEnglish lexical resource is required
That provides sets of possible syntactic
frames for verbs with semantic role labels?
And provides clear, replicable sense distinctions.
Princeton 11/06/03 39
PennEnglish lexical resource is required
That provides sets of possible syntactic frames for verbs with semantic role labels
that can be automatically assigned accurately to new text?
And provides clear, replicable sense distinctions.
Princeton 11/06/03 40
Penn
Automatic Labelling of Semantic Relations• Stochastic Model• Features:
PredicatePhrase TypeParse Tree PathPosition (Before/after predicate)Voice (active/passive)Head Word
Gildea & Jurafsky, CL02, Gildea & Palmer, ACL02
Princeton 11/06/03 41
PennSemantic Role Labelling Accuracy-Known Boundaries
79.673.682.0Automatic parses
83.177.0Gold St. parses
PropBank
≥ 10 instancesPropBankFramene
t
≥ 10 inst
•Accuracy of semantic role prediction for known boundaries--the system is given the constituents to classify.•FrameNet examples (training/test) are handpicked to be unambiguous.• Lower performance with unknown boundaries.• Higher performance with traces. • Almost evens out.
Princeton 11/06/03 42
PennAdditional Automatic Role Labelers
Performance improved from 77% to 88% Colorado (Gold Standard parses, < 10 instances) Same features plus
Named Entity tags Head word POS For unseen verbs – backoff to automatic verb clusters
SVM’s Role or not role For each likely role, for each Arg#, Arg# or not No overlapping role labels allowed
Pradhan, et. al., ICDM03, Sardeneau, et. al, ACL03,Chen & Rambow, EMNLP03, Gildea & Hockemaier, EMNLP03
Princeton 11/06/03 43
PennWord Senses in PropBank Orders to ignore word sense not feasible for 700+
verbs Mary left the room Mary left her daughter-in-law her pearls in her will
Frameset leave.01 "move away from":Arg0: entity leavingArg1: place left
Frameset leave.02 "give":Arg0: giver Arg1: thing givenArg2: beneficiary
How do these relate to traditional word senses in VerbNet and WordNet?
Princeton 11/06/03 44
PennMapping from PropBank to VerbNet
Frameset id = leave.02
Sense =
give
VerbNet class =
future-having 13.3
Arg0 Giver Agent
Arg1 Thing given
Theme
Arg2 Benefactive
Recipient
Princeton 11/06/03 45
PennMapping from PB to VerbNet
Princeton 11/06/03 46
PennMapping from PropBank to VerbNet
Overlap with PropBank framesets 50,000 PropBank instances < 50% VN entries, > 85% VN classes
Results MATCH - 78.63%. (80.90% relaxed) (VerbNet isn’t just linguistic theory!)
Benefits Thematic role labels and semantic predicates Can extend PropBank coverage with VerbNet classes WordNet sense tags
Kingsbury & Kipper, NAACL03, Text Meaning Workshophttp://www.cs.rochester.edu/~gildea/VerbNet/
Princeton 11/06/03 47
PennWordNet as a WSD sense inventory
Senses unnecessarily fine-grained?
Word Sense Disambiguation bakeoffsSenseval1 – Hector, ITA = 95.5%Senseval2 – WordNet 1.7, ITA verbs = 71%Groupings of Senseval2 verbs, ITA =82%
Used syntactic and semantic criteria
Princeton 11/06/03 48
PennGroupings Methodology(w/ Dang and Fellbaum) Double blind groupings, adjudication Syntactic Criteria (VerbNet was useful)
Distinct subcategorization frames call him a bastard call him a taxi
Recognizable alternations – regular sense extensions: play an instrument play a song play a melody on an instrument
SIGLEX01, SIGLEX02, JNLE04
Princeton 11/06/03 49
PennGroupings Methodology (cont.)
Semantic Criteria Differences in semantic classes of arguments
Abstract/concrete, human/animal, animate/inanimate, different instrument types,…
Differences in the number and type of arguments Often reflected in subcategorization frames John left the room. I left my pearls to my daughter-in-law in my will.
Differences in entailments Change of prior entity or creation of a new entity?
Differences in types of events Abstract/concrete/mental/emotional/….
Specialized subject domains
Princeton 11/06/03 50
PennResults – averaged over 28 verbs
Total
WN polysemy 16.28
Group polysemy 8.07
ITA-fine 71%
ITA-group 82%
MX-fine 60.2%
MX-group 69%
MX – Maximum Entropy WSD, p(sense|context)Features: topic, syntactic constituents, semantic classes +2.5%, +1.5 to +5%, +6%
Dang and Palmer, Siglex02,Dang et al,Coling02
Princeton 11/06/03 51
PennGrouping improved ITA and Maxent WSD
Call: 31% of errors due to confusion between senses within same group 1:name, call -- (assign a specified, proper name to; They named
their son David)call -- (ascribe a quality to or give a name of a common noun
that reflects a quality; He called me a bastard)call -- (consider or regard as being;I would not call her beautiful)
75% with training and testing on grouped senses vs.43% with training and testing on fine-grained senses
Princeton 11/06/03 52
PennWordNet: - call, 28 senses, groups
WN5, WN16,WN12 WN15 WN26
WN3 WN19 WN4 WN 7 WN8 WN9
WN1 WN22
WN20 WN25
WN18 WN27
WN2 WN 13 WN6 WN23
WN28
WN17 , WN 11 WN10, WN14, WN21, WN24,
Loud cry
Label
Phone/radio
Bird or animal cry
Request
Call a loan/bond
Visit
Challenge
Bid
Princeton 11/06/03 53
PennWordNet: - call, 28 senses, groups
WN5, WN16,WN12 WN15 WN26
WN3 WN19 WN4 WN 7 WN8 WN9
WN1 WN22
WN20 WN25
WN18 WN27
WN2 WN 13 WN6 WN23
WN28
WN17 , WN 11 WN10, WN14, WN21, WN24,
Loud cry
Label
Phone/radio
Bird or animal cry
Request
Call a loan/bond
Visit
Challenge
Bid
Princeton 11/06/03 54
PennOverlap between Groups and Framesets – 95%
WN1 WN2 WN3 WN4
WN6 WN7 WN8 WN5 WN 9 WN10
WN11 WN12 WN13 WN 14
WN19 WN20
Frameset1
Frameset2
developPalmer, Dang & Fellbaum, NLE 2004
Princeton 11/06/03 55
PennSense Hierarchy PropBank Framesets – coarse grained distinctions
Sense Groups (Senseval-2) intermediate level (includes Levin classes) – 95% overlap
WordNet – fine grained distinctions
Princeton 11/06/03 56
PennEnglish lexical resource is available
That provides sets of possible syntactic
frames for verbs with semantic role labels that can be automatically assigned accurately to new text.
And provides clear, replicable sense distinctions.
Princeton 11/06/03 57
PennSummary of English PropBankPaul Kingsbury, Olga Babko-Malaya, Scott Cotton
Genre Words Frames Files Frameset
Tags
Released
Wall Street Journal*
(financial subcorpus)
300K < 2000 400 July, 02
Wall Street Journal*
(Penn TreeBank II)
1000K < 4000 700 Dec, 03?
(March, 03)
English Translation of
Chinese TreeBank *
ITIC funding
100K <1500 July, 04
Sinorama, English corpus
NSF-ITR funding
150K <2000 July, 05
English half of DLI
Military Corpus
ARL funding
50K < 1000 July, 05
Princeton 11/06/03 58
PennA Chinese Treebank Sentence
国会 /Congress 最近 /recently 通过 /pass 了 /ASP 银行法 /banking law
“The Congress passed the banking law recently.”
(IP (NP-SBJ (NN 国会 /Congress))
(VP (ADVP (ADV 最近 /recently))
(VP (VV 通过 /pass)
(AS 了 /ASP)
(NP-OBJ (NN 银行法 /banking law)))))
Princeton 11/06/03 59
PennThe Same Sentence, PropBanked
通过 (f2) (pass)
arg0 argM arg1
国会 最近 银行法 (law)
(congress)
(IP (NP-SBJ arg0 (NN 国会 ))
(VP argM (ADVP (ADV 最近 ))
(VP f2 (VV 通过 )
(AS 了 )
arg1 (NP-OBJ (NN 银行法 )))))
Princeton 11/06/03 60
PennChinese PropBank Status - (w/ Bert Xue and Scott Cotton)
Create Frame File for that verb - Similar alternations – causative/inchoative,
unexpressed object 5000 lemmas, 2000 DONE, (hired Jiang)
First pass: Automatic tagging 2000 DONE Subcat frame matcher (Xue & Kulick, MT03)
Second pass: Double blind hand correction In progress (includes frameset tagging), 600 DONE Ported RATS to CATS, in use since May
Third pass: Solomonization (adjudication)
Princeton 11/06/03 61
PennSummary of Chinese PropBankNianwen Xue, Meiyu Chang, Zhiyi, Ping
Genre Words Frames Files Frameset
Tags
Released
Xinhua News
DOD funding
250K 4867 200 July, 04
Sinorama
NSF-ITR funding
150K < 4000 July, 05
Princeton 11/06/03 62
PennA Korean Treebank Sentence
(S (NP-SBJ 그 /NPN+ 은 /PAU) (VP (S-COMP (NP-SBJ 르노 /NPR+ 이 /PCA) (VP (VP (NP-ADV 3/NNU
월 /NNX+ 말 /NNX+ 까지 /PAU) (VP (NP-OBJ 인수 /NNC+ 제의 /NNC 시한 /NNC+ 을 /PCA) 갖 /VV+ 고 /ECS)) 있 /VX+ 다 /EFN+ 고 /PAD) 덧붙이 /VV+ 었 /EPF+ 다 /EFN) ./SFN)
그는 르노가 3 월말까지 인수제의 시한을 갖고 있다고 덧붙였다 .
He added that Renault has a deadline until the end of March for a merger proposal.
Princeton 11/06/03 63
PennThe same sentence, PropBanked
덧붙이었다
그는 갖고 있다
르노가 인수제의 시한을
덧붙이다 ( 그는 , 르노가 3 월말까지 인수제의 시한을 갖고 있다 ) (add) (he) (Renaut has a deadline until the end of March for a merger proposal)
갖다 ( 르노가 , 3 월말까지 , 인수제의 시한을 ) (has) (Renaut) (until the end of March) (a deadline for a merger proposal)
Arg0 Arg2
Arg0 Arg1
(S Arg0 (NP-SBJ 그 /NPN+ 은 /PAU) (VP Arg2 (S-COMP ( Arg0 NP-SBJ 르노 /NPR+ 이 /PCA) (VP (VP ( ArgM NP-ADV 3/NNU 월 /NNX+ 말 /NNX+ 까지 /PAU) (VP ( Arg1 NP-OBJ 인수 /NNC+ 제의 /NNC 시한 /NNC+ 을 /PCA) 갖 /VV+ 고 /ECS)) 있 /VX+ 다 /EFN+ 고 /PAD) 덧붙이 /VV+ 었 /EPF+ 다 /EFN) ./SFN)
3 월말까지
ArgM
Princeton 11/06/03 64
PennKorean PropBank funded by ARO - CORK
50K words Virginia Corpus of Korean Treebank New semantic augmentations
Predicate-argument relations for predicatesLabel arguments: Arg0, Arg1, Arg2 …
Korean lexical resource – Frames filesUse “semantic role glosses” unique to each predicateCreate manually for 900 predicatesRefer to the hand-corrected DSynt files from Virginia
Extend to newswire domain – 130K words
Princeton 11/06/03 65
PennPropBank II Nominalizations NYU Lexical Frames DONE Event Variables, (including temporals and
locatives) More fine-grained sense tagging
Tagging nominalizations w/ WordNet senseSelected verbs and nouns
Nominal Coreference not names
Clausal Discourse connectives – selected subset
Princeton 11/06/03 66
PennPropBank I
Also, [Arg0substantially lower Dutch corporate tax rates] helped [Arg1[Arg0 the company] keep [Arg1 its tax outlay] [Arg3-PRD flat] [ArgM-ADV relative to earnings growth]].
relative to earnings…
flatits tax outlaythe company
keep
the company keep its tax outlay flat
tax rateshelp
ArgM-ADVArg3-PRD
Arg1Arg0REL
Event variables;
ID#h23
k16
nominal reference;sense tags;
help2,5 tax rate1
keep1 company1
discourse connectives
{ }
I
Princeton 11/06/03 67
Penn
Summary of Multilingual TreeBanks, PropBanks
Parallel Corpora
Text Treebank PropBank I
Prop II
Chinese Treebank
Chinese 500K
English 400K
Chinese 500K
English 100K
Chinese 500K
English 350K
Ch 100K
En 100K
Arabic
Treebank
Arabic 500K
English 500K
Arabic 500K
English ?
?
?
Korean
Treebank
Korean 180K
English 50K
Korean 180K
English 50K
Korean 180K
English 50K
Princeton 11/06/03 68
PennLevin class: escape-51.1-1 WordNet Senses: WN 1, 5, 8
Thematic Roles: Location[+concrete] Theme[+concrete]
Frames with Semantics Basic Intransitive
"The convict escaped" motion(during(E),Theme) direction(during(E),Prep,Theme, ~Location)
Intransitive (+ path PP) "The convict escaped from the prison"
Locative Preposition Drop "The convict escaped the prison"
Princeton 11/06/03 69
PennLevin class: future_having-13.3 WordNet Senses: WN 2,10,13
Thematic Roles: Agent[+animate OR +organization] Recipient[+animate OR +organization] Theme[]
Frames with Semantics Dative
"I promised somebody my time" Agent V Recipient Theme has_possession(start(E),Agent,Theme) future_possession(end(E),Recipient,Theme) cause(Agent,E)
Transitive (+ Recipient PP) "We offered our paycheck to her" Agent V Theme Prep(to) Recipient )
Transitive (Theme Object) "I promised my house (to somebody)" Agent V Theme
Princeton 11/06/03 70
PennActual data for leave http://www.cs.rochester.edu/~gildea/PropBank/Sort/Leave .01 “move away from” Arg0 rel Arg1 Arg3Leave .02 “give” Arg0 rel Arg1 Arg2
sub-ARG0 obj-ARG1 44 sub-ARG0 20 sub-ARG0 NP-ARG1-with obj-ARG2 17 sub-ARG0 sub-ARG2 ADJP-ARG3-PRD 10 sub-ARG0 sub-ARG1 ADJP-ARG3-PRD 6 sub-ARG0 sub-ARG1 VP-ARG3-PRD 5 NP-ARG1-with obj-ARG2 4 obj-ARG1 3 sub-ARG0 sub-ARG2 VP-ARG3-PRD 3
Princeton 11/06/03 71
Penn
SENSEVAL – Word Sense Disambiguation Evaluation
SENSEVAL1
1998
SENSEVAL2
2001
Languages
Systems
3
24
12
90
Eng. Lexical Sample
Verbs/Poly/Instances
Yes
13/12/215
Yes
29/16/110
Sense Inventory Hector,
95.5%
WordNet, 73+%
NLE99, CHUM01, NLE02, NLE03
DARPA style bakeoff: training data, testing data, scoring algorithm.
Princeton 11/06/03 72
PennMaximum Entropy WSDHoa Dang, best performer on Verbs
Maximum entropy framework, p(sense|context) Contextual Linguistic Features
Topical feature for W: keywords (determined automatically)
Local syntactic features for W: presence of subject, complements, passive? words in subject, complement positions, particles,
preps, etc.Local semantic features for W:
Semantic class info from WordNet (synsets, etc.) Named Entity tag (PERSON, LOCATION,..) for
proper Ns words within +/- 2 word window