![Page 1: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/1.jpg)
Exploring PropBanks for English and Hindi
Ashwini Vaidya
Dept of Linguistics
University of Colorado, Boulder
![Page 2: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/2.jpg)
Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
– Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
– The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
![Page 3: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/3.jpg)
Question Answering
• Who created the first effective polio vaccine?
– Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
– The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
![Page 4: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/4.jpg)
Question Answering
• Who created the first effective polio vaccine?
– [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
– [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
![Page 5: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/5.jpg)
Question Answering
• Who created the first effective polio vaccine?
– [Becton Dickinsonagent] created the [first disposable syringetheme] for use with the mass administration of the first effective polio vaccine
– [The first effective polio vaccinetheme] was created in 1952 by [Jonas Salkagent] at the University of Pittsburgh
![Page 6: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/6.jpg)
Question Answering
• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer
![Page 7: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/7.jpg)
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
![Page 8: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/8.jpg)
Semantic information
• Semantic information about verbs and participants expressed through semantic roles
• Agent, Experiencer, Theme, Result etc.
• However, difficult to have a standard set of thematic roles
![Page 9: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/9.jpg)
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general purpose Semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
![Page 10: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/10.jpg)
Outline
• English PropBank• Background
• Annotation
• Frame files & Tagset
• Hindi PropBank development• Adapting Frame files
• Light verbs
• Mapping from dependency labels
![Page 11: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/11.jpg)
Proposition Bank
• The first (English) PropBank was created on a 1 million syntactically parsed Wall Street Journal corpus
• PropBank annotation has also been done on different genres e.g. web text, biomedical text
• Arabic, Chinese & Hindi PropBanks have been created
![Page 12: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/12.jpg)
English PropBank
• English PropBank envisioned as the next level of Penn Treebank (Kingsbury & Palmer, 2003)
• Added a layer of predicate-argument information to the Penn Treebank
• Broad in its coverage- covering every instance of a verb and its semantic arguments in the corpus
• Amenable to collecting representative statistics
![Page 13: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/13.jpg)
English PropBank Annotation
• Two steps are involved in annotation
– Choose a sense ID for the predicate
– Annotate the arguments of that predicate with semantic roles
• This requires two components: frame files and PropBank tagset
![Page 14: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/14.jpg)
PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
![Page 15: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/15.jpg)
An example
• John rings the bell
ring.01 Make sound of bell
Arg0 Causer of ringing
Arg1 Thing rung
Arg2 Ring for
![Page 16: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/16.jpg)
An example
• John rings the bell
• Tall aspen trees ring the lake
ring.01 Make sound of bell
Arg0 Causer of ringing
Arg1 Thing rung
Arg2 Ring for
ring.02 To surround
Arg1 Surrounding entity
Arg2 Surrounded entity
![Page 17: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/17.jpg)
An example
• [John] rings [the bell]
• [Tall aspen trees] ring [the lake]
ring.01 Make sound of bell
Arg0 Causer of ringing
Arg1 Thing rung
Arg2 Ring forring.02 To surround
Arg1 Surrounding entity
Arg2 Surrounded entity
Ring.01
Ring.02
![Page 18: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/18.jpg)
An example
• [JohnARG0] rings [the bellARG1]
• [Tall aspen treesARG1] ring [the lakeARG2]
ring.01 Make sound of bell
Arg0 Causer of ringing
Arg1 Thing rung
Arg2 Ring forring.02 To surround
Arg1 Surrounding entity
Arg2 Surrounded entity
Ring.01
Ring.02
![Page 19: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/19.jpg)
Frame files
• The Penn Treebank had about 3185 unique lemmas (Palmer, Gildea, Kingsbury, 2005)
• Most frequently occurring verb: say
• Small number of verbs had several framesets e.g. go, come, take, make
• Most others had only one frameset per file
![Page 20: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/20.jpg)
PropBank annotation pane in Jubilee
![Page 21: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/21.jpg)
English PropBank Tagset
• Numbered arguments Arg0, Arg1, and so on until Arg4
• Modifiers with function tags e.g. ArgM-LOC (location) , ArgM-TMP (time), ArgM-PRP (purpose)
• Modifiers give additional information about when, where or how the event occurred
![Page 22: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/22.jpg)
PropBank tagset
Numbered Argument
Description
Arg0 Agent, causer, experiencer
Arg1 Theme, patient
Arg2 Instrument, benefactive, attribute
Arg3 starting point, benefactive, attribute
Arg4 ending point
• Correspond to the valency requirements of the verb• Or, those that occur with high frequency with that verb
![Page 23: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/23.jpg)
PropBank tagset
• 15 modifier labels for English PropBank
• [HeArg0] studied [economic growthArg1] [in IndiaArgM-LOC]
Modifier Description
ArgM-LOC Location
ArgM-TMP Time
ArgM-GOL Goal
ArgM-MNR Manner
ArgM-CAU Cause
ArgM-ADV Adverbial
![Page 24: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/24.jpg)
PropBank tagset
• Verb specific and more generalized
• Arg0 and Arg1 correspond to Dowty’s Proto Roles
• Leverage the commonalities among semantic roles
• Agents, causers, experiencers – Arg0
• Undergoers, patients, themes- Arg1
![Page 25: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/25.jpg)
PropBank tagset
• While annotating Arg0 and Arg1:
– Unaccusative verbs take Arg1 as their subject argument
• [The windowArg1] broke
– Unergatives will take Arg0
• [JohnArg0] sang
• Distinction is also made between internally caused events (blush: Arg0) & externally caused events (redden: Arg1)
![Page 26: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/26.jpg)
PropBank tagset
• How might these map to the more familiar thematic roles?
• Yi, Loper & Palmer (2007) describe such a mapping to VerbNet roles
![Page 27: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/27.jpg)
•More frequent Arg0 and Arg1 (85%) are learnt more easily by automatic systems•Arg2 is less frequent, maps to more than one thematic role•Arg3-5 are even more infrequent
![Page 28: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/28.jpg)
Using PropBank
• As a computational resource– Train semantic role labellers (Pradhan et al, 2005)– Question answering systems (with FrameNet)– Project semantic roles onto a parallel corpus in
another language (Pado & Lapata, 2005)
• For linguists, to study various phenomena related to predicate-argument structure
![Page 29: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/29.jpg)
Outline
• English PropBank• Background
• Annotation
• Frame files & Tagset
• Hindi PropBank development• Adapting Frame files
• Light verbs
• Mapping from dependency labels
![Page 30: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/30.jpg)
Developing PropBank for Hindi-Urdu
• Hindi-Urdu PropBank is part of a project to develop a Multi-layered and multi-representational treebank for Hindi-Urdu
– Hindi Dependency Treebank
– Hindi PropBank
– Hindi Phrase Structure Treebank
• Ongoing project at CU-Boulder
![Page 31: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/31.jpg)
Hindi-Urdu PropBank
• Corpus of 400,000 words for Hindi
• Smaller corpus of 150,000 words for Urdu
• Hindi corpus consists of newswire text from ‘Amar Ujala’
• So far..
– 220 verb frames
– ~100K words annotated
![Page 32: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/32.jpg)
Developing Hindi PropBank
• Making a PropBank resource for a new language
– Linguistic differences
• Capturing relevant language-specific phenomena
– Annotation practices
• Maintain similar annotation practices
– Consistency across PropBanks
![Page 33: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/33.jpg)
Developing Hindi PropBank
• PropBank annotation for English, Chinese & Arabic was done on top of phrase structure trees
• Hindi PropBank is annotated on dependency trees
![Page 34: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/34.jpg)
Dependency tree
• Represent relations that hold between constituents (chunks)
• Karaka labels show the relations between head verb and its dependents
दि येgave
k1k2
पैसेmoney
औरत कोwoman dat
राम नेRaam erg
k4
![Page 35: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/35.jpg)
Hindi PropBank
• There are three components to the annotation
• Hindi Frame file creation
• Insertion of empty categories
• Semantic role labelling
![Page 36: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/36.jpg)
Hindi PropBank
• There are three components to the annotation
• Hindi Frame file creation
• Insertion of empty categories
• Semantic role labelling
• Both frame creation and labelling require new strategies for Hindi
![Page 37: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/37.jpg)
Hindi PropBank
• Hindi frame files were adapted to include
– Morphological causatives
– Unaccusative verbs
– Experiencers
• Additionally, changes had to be made to analyze the large number (nearly 40%) of light verbs
![Page 38: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/38.jpg)
Causatives
• Verbs that are related via morphological derivation are grouped together as individual predicates in the same frame file.
• E.g. Cornerstone’s multi-lemma mode is used for the verbs खा (KA ‘eat’, खखऱा (KilA ‘feed’ and खखऱवा (KilvA‘cause to feed’)
![Page 39: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/39.jpg)
Causatives
raam neArg0 khaanaArg1 khaayaaRam erg food eat-perf
‘Ram ate the food’
mohan neArg0 raam koArg2 khaanaArg1 khilaayaaMohan erg Ram dat food eat-caus-perf
‘Mohan made Ram eat the food’
sita neArgC mohan seArgA raam koArg2 khaanaArg1 khilvaayaaSita erg Mohan instr Ram acc food eat-ind.caus-caus-perf
‘Sita, through Mohan made Ram eat the food ’
Roleset id: KA.01 to eat
Arg0 eater
Arg1 the thing being eaten
Roleset id: KilA.01 to feed
Arg0 feeder
Arg2 eater
Arg1 the thing being eaten
Roleset id: KilvA.01 to causeto be fed
ArgC Causer of feeding
ArgA feeder
Arg2 Eater
Arg1 the thing eaten
![Page 40: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/40.jpg)
Unaccusatives
• PropBank needs to distinguish proto agents and proto patients
• Sudha danced –– Unergative, animate agentive arguments- Arg0
• The door opened – Unaccusative, non animate patient arguments- Arg1
• For English, these distinctions were available in VerbNet
• For Hindi, various diagnostic tests are applied to make this distinction
![Page 41: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/41.jpg)
Yes
No- Second Stage
Ergative. E.g. naac, dauRa, bEtha?
First stage
Yes.Unaccus-ative. E.g. khulnaa, barasnaa
No-Third stage
Yes.Unacc-usative. Eliminate bEtha
No. Take the majority vote on the tests.
cognate object & ergative case tests
Applicable to verbs undergoing transitivity alternation: eliminate those that take (mostly) animate subjects
For non-alternating verbs and others that remain, the verb will be unaccusative if:• Impersonal passives are not possible• use of ‘huaa’ (past participial relative) is possible• the inanimate subject appears without overt genitive
Unaccusativity diagnostics (5)
![Page 42: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/42.jpg)
Unaccusative verb : महक mahak ‘to smell good’
Arg1 entity that smells good
• impersonal passive test• past participial relative test
Unaccusativity
• This information is then captured in the frame
![Page 43: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/43.jpg)
Experiencers
• Arg0 includes agents, causers, experiencers• In Hindi, the experiencer subjects occur with dikhnaa (to
glimpse), milnaa (to find), suujhnaa (to be struck with) etc.
• Typically marked with dative case– Mujh-ko chaand dikhaa
I. Dat moon glimpse.pf
(I glimpsed the moon)
• PropBank labels these as Arg0, with an additional descriptor field ‘experiencer’– Internally caused events
![Page 44: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/44.jpg)
Complex predicates
• A large number of complex predicates in Hindi
– Noun-verb complex predicates
• vichaar karna; think do (to think)
• dhyaan denaa; attention give (to pay attention)
– Adjectives can also occur
• accha lagnaa: good seem (to like)
– the predicating element is not the verb alone, but forms a composite with the noun/adj
![Page 45: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/45.jpg)
Complex predicates
• There are also a number of verb-verb complex predicates
• Combine with the bare form of the verb to convey different meanings
• ro paRaa; cry lie- burst out crying
• Add some aspectual meaning to the verb
• As these occur with only a certain set of verbs, we find them automatically, based on part of speech tag
![Page 46: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/46.jpg)
Complex predicates
• However, the noun-verb complex predicates are more pervasive
• They occur with a large number of nouns
• Frame files for the nominal predicate need to be created
• E.g. in a sample of 90K words, the light verb kar; ‘do’ occurred 1738 times with 298 different nominals
![Page 47: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/47.jpg)
Complex predicates
• Annotation strategy
• Additional resources (frame files)
• Cluster the large number of nominals?
![Page 48: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/48.jpg)
Light verb annotation
• A convention for the annotation of light verbs has been adopted across Hindi, Arabic, Chinese and English PropBanks (Hwang et. al. 2010)
• Annotation is carried out in three passes– Manual identification of light verb– Annotation of arguments based on frame file– Deterministic merging of the light verb and ‘true’
predicate
• In Hindi, this process may be simplified because of the dependency label ‘pof’ that identifies a light verb
![Page 49: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/49.jpg)
राम ने पैसे चोरी कीएRaam erg Money theft Do.perf
‘Ram stole the money’
1. Identify the N+V sequences that are complex predicates. 2. Annotate predicating expression with ARG-PRX.
REL: karARG-PRX: corii
3. Annotate the arguments and modifiers of the complex predicate with a nominal predicate frame
4. Automatically merge the nominal with the light verb.REL: corii_kiyeArg0: raamArg1: paese
![Page 50: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/50.jpg)
Hindi PropBank tagset
24 labels
Label Description
ARG0 Agent, causer, experiencer
ARG1 Patient, theme, undergoer
ARG2 Beneficiary
ARG3 Instrument
50
![Page 51: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/51.jpg)
Annotating Hindi PropBank
Label Description
ARG0 Agent, causer, experiencer
ARG1 Patient, theme, undergoer
ARG2 Beneficiary
ARG3 Instrument
ARG2-ATRARG2-LOC
attribute ARG2-GOLARG2-SOU
goal
location source
Function tags
51
![Page 52: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/52.jpg)
Annotating Hindi PropBank
Label Description
ARG0 Agent, causer, experiencer
ARG1 Patient, theme, undergoer
ARG2 Beneficiary
ARG3 Instrument
ARG2-ATRARG2-LOC
attribute ARG2-GOLARG2-SOU
goal
location source
ARGC causer
ARGA secondary causer
Function tags
Causative
52
![Page 53: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/53.jpg)
Annotating Hindi PropBank
Label Description
ARG0 Agent, causer, experiencer
ARG1 Patient, theme, undergoer
ARG2 Beneficiary
ARG3 Instrument
ARG2-ATRARG2-LOC
attribute ARG2-GOLARG2-SOU
goal
location source
ARGC causer
ARGA secondary causer
ARGM-VLV Verb-verb construction
ARGM-PRX Noun-verb construction
Function tags
Causative
Complex predicate
53
![Page 54: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/54.jpg)
Annotating Hindi PropBank
Label Description Label Description
ARGM-ADV adverb ARGM-CAU cause
ARGM-DIR direction ARGM-DIS discourse
ARGM-EXT extent ARGM-LOC location
ARGM-MNR manner ARGM-MNS means
ARGM-MOD modal ARGM-NEG negation
ARGM-PRP purpose ARGM-TMP time
•Other modifier labels
54
![Page 55: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/55.jpg)
Hindi PropBank annotation
Annotation pane Frameset file display
![Page 56: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/56.jpg)
Annotation practice
• Need to maintain good annotation practices
• Current practices- double blind, followed by adjudication
• Inter-annotator agreement measures the consistency in the annotation task
• English PropBank had high inter annotator agreement K=0.91
![Page 57: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/57.jpg)
Hindi PropBank annotation
• Improve consistency & annotation speed
• PropBank annotation on dependency trees has some advantages
• Hindi Treebank uses a large set of dependency labels that have rich semantic information
दि येgave
Arg0k1
Arg1k2
Arg2k4
पैसेmoney
औरत कोwoman dat
राम नेRaam erg
![Page 58: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/58.jpg)
Deriving PropBank labels from dependencies
• We can derive Hindi PropBank labels from Dependency labels
• Mapping will reduce annotation effort, improve speed
• The dependency tagset has labels that are in some ways fairly similar to PropBank
– Verb specific labels k1- 5
– Verb modifier labels k7p, k7t, rh etc.
![Page 59: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/59.jpg)
Label comparison
•Using linguistic intuition, we can compare HDT labels with the numbered arguments in HPB
59
![Page 60: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/60.jpg)
Label comparison
• Similarly, linguistic intuition gives us the mapping from HDT for HPB modifiers
60
![Page 61: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/61.jpg)
Label comparison
• These mappings are included in the PB frame files, for example, the verb ‘A: to come’
• Only for numbered arguments
• Basis for the linguistically motivated rules
Roleset Usage Rule
A.01 To come (path) k1 Arg1k2p Arg2-GOL
A.03 to arrive k1 Arg0k2p Arg2-GOLk5 Arg2-SOU
61
![Page 62: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/62.jpg)
Automatic mapping of DT to PB
• A rule based, probabilistic system for automatic mapping
• We use two kinds of resources:
– Annotated corpus [Treebank+ PropBank]
• 32,300 tokens, 2005 predicates
– Frame files with mapping rules
62
![Page 63: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/63.jpg)
Argument classification
• We use three kinds of rules to carry out automatic mapping– Deterministic rules
• Dependency label mapped directly onto PropBank
– Empirically derived rules• Using corpus statistics associated with dependency &
PropBank labels
– Linguistically motivated rules• Derived from linguistic intuition & captured in frame
files
63
![Page 64: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/64.jpg)
Example of the rules
Features Count PropBank labels
xe.01_active_k1(give)
32 Arg0:0.93
Arg1: 0.03
Arg2:0.03
xe.01_active_k2 65 Arg1:0.95
Arg2:0.01
Arg0:0.01
xe.01_active_k4 34 Arg2:0.94
Arg0:0.02
•Associate the probability of each label in combination with a particular feature tuple•We use only 3: roleset ID, voice, dependency label•For the verb give, we get the correct mapping to the Hindi labels
![Page 65: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/65.jpg)
Evaluation
• 32,300 tokens of annotated data
• Ten-fold cross validation for evaluation
65
![Page 66: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/66.jpg)
Results
PRECISION RECALL F1 SCORE
Empirically derived rules
90.59 47.92 62.69
Linguistically motivated rules
89.80 55.28 68.44
66
![Page 67: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/67.jpg)
Results
PRECISION RECALL F1 SCORE
Empirically derived rules
90.59 47.92 62.69
Linguistically motivated rules
89.80 55.28 68.44
Numbered Argument Accuracy
PRECISION RECALL F-SCORE
Empiricallyderived rules
93.63 58.76 72.21
Linguistically motivated
rules
91.87 72.36 80.96
67
![Page 68: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/68.jpg)
Evaluation
• Linguistically motivated rules improve the recall with a slight drop in the precision
• With the most frequent PropBank labels, empirically derived rules perform well
• More data should improve the performance for modifier arguments
68
![Page 69: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/69.jpg)
Evaluation
• Annotation practices also affect the mapping
– PB labels are coarse-grained: E.g. ArgM-ADV maps to four different dependency labels
– PB are fine-grained: E.g. ‘means’ and ‘causes’ are distinguished (ArgM-MNS, ArgM-CAU)but are lumped together under a single label in dependency treebank
69
![Page 70: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/70.jpg)
Evaluation
• Our goal is to find a useful set of mappings to use at a pre-annotation stage
• We expect that with more PropBanked data, empirically derived rules will perform better
• Using additional resources such as Hindi WordNet can help to make up for the lack of training data
70
![Page 71: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/71.jpg)
Summary
• Semantic role labels
• English PropBank
– Frame files
– PropBank tagset
• Development of Hindi PropBank
– Linguistic issues
– Experiments
![Page 72: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University](https://reader030.vdocument.in/reader030/viewer/2022040504/5e35a7452a290009a73dc282/html5/thumbnails/72.jpg)
Questions?