AnCora-Nom: A Spanish Lexicon of Deverbal Nominalizations Aina Peris and Mariona Taulé CBA 2010: Corpus-Based Approaches to Paraphrasing and Nominalization Barcelona, 1-2 December 2010


Page 1: AnCora -Nom:  A Spanish Lexicon of  Deverbal  Nominalizations

AnCora-Nom: A Spanish Lexicon of Deverbal Nominalizations

Aina Peris and Mariona Taulé

CBA 2010: Corpus-Based Approaches to Paraphrasing and Nominalization
Barcelona, 1-2 December 2010

Page 2: AnCora -Nom:  A Spanish Lexicon of  Deverbal  Nominalizations

Introduction

Related Work

Methodology

AnCora-Nom

Conclusions and Future Work

CBA 2010

AnCora-Nom: A Spanish Lexicon of Deverbal Nominalizations

Page 3: AnCora -Nom:  A Spanish Lexicon of  Deverbal  Nominalizations

Introduction

Deverbal nominalizations contain rich semantic information

La solución pasaba por [abaratar el despido].
‘The solution was to make dismissal cheaper.’

La solución consistía en [el abaratamiento del despido].
‘The solution consisted of the cost reduction of dismissal.’

Page 4: AnCora -Nom:  A Spanish Lexicon of  Deverbal  Nominalizations

Introduction

New lexical resource: AnCora-Nom

◦ Deverbal nominalizations from AnCora-Es: 1,655 lemmas

◦ Semantic information: argument structure; denotation type (event, result, and underspecified)

◦ Morphosyntactic information: specifiers; plurality; constituents

Page 5: AnCora -Nom:  A Spanish Lexicon of  Deverbal  Nominalizations

Introduction

AnCora-Nom: 1,655 lexical entries linked to those of the corresponding verbs.

NLP approach: helpful for information extraction tasks.

Linguistic approach: an excellent resource for studying the argument realization of both nouns and verbs.


Page 7: AnCora -Nom:  A Spanish Lexicon of  Deverbal  Nominalizations

Related Work

NOMLEX (Macleod et al., 1998)

NOMLEX-PLUS (Meyers et al., 2004)

Berkeley FrameNet Project (Ruppenhofer et al., 2006)

◦ Spanish FrameNet (Subirats, 2009)

The Essex Database of Russian Verbs and their Nominalizations (Spencer and Zaretskaya, 1999)

NOMAGE lexicon (Balvet et al., 2010)

Manually built vs. automatically obtained


Page 9: AnCora -Nom:  A Spanish Lexicon of  Deverbal  Nominalizations

Methodology

AnCora-Nom represents in the lexical entry the information coded in AnCora-Es

AnCora-Es: 500,000-word Spanish corpus, 23,000 deverbal nominalization tokens

AnCora-Nom: 1,655 nominalization types and lexical entries

Page 10: AnCora -Nom:  A Spanish Lexicon of  Deverbal  Nominalizations

Methodology

Annotation of deverbal nominalizations in AnCora-Es

<sn>
<grup.nom gen="f" num="s">
<n gen="f" lem="aceptación" denotationtype="result" num="s" originlexicalid="verb.aceptar.1.default" pos="ncfs000" postype="common" sense="16:00235235" wd="aceptación"/>
<sp arg="arg0" func="cn" tem="agt">
<prep>
<s lem="de" pos="sps00" postype="preposition" wd="de"/>
<sn>
<spec gen="m" num="p">
<d gen="c" lem="el" num="p" pos="dn0cp0" postype="article" wd="los"/>
<grup.nom gen="m" num="p">
<n gen="m" lem="demás" num="p" pos="ncmp000" postype="common" sense="16:10919146" wd="demás"/>
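As a rough illustration of how such corpus annotation could be read, the sketch below collects every noun that carries an originlexicalid attribute. The fragment is a simplified, well-formed version of the one on the slide: the closing tags and the exact nesting are assumptions, since the slide truncates the tree, and the function name nominalization_tokens is hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified version of the slide's fragment. Closing tags and nesting are
# assumptions added so that the excerpt parses; they are not on the slide.
FRAGMENT = """
<sn>
  <grup.nom gen="f" num="s">
    <n gen="f" lem="aceptación" denotationtype="result" num="s"
       originlexicalid="verb.aceptar.1.default" pos="ncfs000"
       postype="common" sense="16:00235235" wd="aceptación"/>
    <sp arg="arg0" func="cn" tem="agt">
      <prep><s lem="de" pos="sps00" postype="preposition" wd="de"/></prep>
      <sn>
        <spec gen="m" num="p">
          <d gen="c" lem="el" num="p" pos="dn0cp0" postype="article" wd="los"/>
        </spec>
        <grup.nom gen="m" num="p">
          <n gen="m" lem="demás" num="p" pos="ncmp000" postype="common"
             sense="16:10919146" wd="demás"/>
        </grup.nom>
      </sn>
    </sp>
  </grup.nom>
</sn>
"""


def nominalization_tokens(xml_text):
    """Return one dict per <n> element that links back to a verb entry."""
    root = ET.fromstring(xml_text)
    tokens = []
    for noun in root.iter("n"):
        # Only deverbal nominalizations carry originlexicalid in this excerpt.
        if "originlexicalid" in noun.attrib:
            tokens.append({
                "lemma": noun.get("lem"),
                "denotation": noun.get("denotationtype"),
                "verb_frame": noun.get("originlexicalid"),
            })
    return tokens


tokens = nominalization_tokens(FRAGMENT)
print(tokens)
```

Only aceptación is returned here; demás is skipped because it has no link to a verb entry.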

Page 11: AnCora -Nom:  A Spanish Lexicon of  Deverbal  Nominalizations

Methodology

Extraction process

Consult all the occurrences of each lemma

Establish the different nominal senses

Extract the features associated with each sense

Lexicalized constructions

Same denotation and same verb sense

AnCora-Nom lexical entry
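The extraction steps can be sketched roughly as follows. This is only an illustration, not the authors' tool: it assumes that "same denotation and same verb sense" and membership in a lexicalized construction act as the criteria for grouping occurrences into one nominal sense, and the record fields (lexicalized_as, specifier) and the function build_entries are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical occurrence records; the field names are illustrative only.
occurrences = [
    {"lemma": "aceptación", "denotation": "result",
     "verb_sense": "verb.aceptar.1", "lexicalized_as": None,
     "specifier": "void"},
    {"lemma": "aceptación", "denotation": "result",
     "verb_sense": "verb.aceptar.1", "lexicalized_as": None,
     "specifier": "article"},
    {"lemma": "aceptación", "denotation": "event",
     "verb_sense": "verb.aceptar.1", "lexicalized_as": None,
     "specifier": "article"},
    {"lemma": "golpe", "denotation": "result",
     "verb_sense": "verb.golpear.1", "lexicalized_as": "golpe_de_estado",
     "specifier": "indefinite"},
]


def build_entries(occs):
    """Group occurrences by lemma, then by sense key, counting specifiers."""
    entries = defaultdict(lambda: defaultdict(Counter))
    for occ in occs:
        # Assumed sense key: denotation, verb sense, lexicalized construction.
        key = (occ["denotation"], occ["verb_sense"], occ["lexicalized_as"])
        # Feature extraction is reduced here to a specifier frequency count.
        entries[occ["lemma"]][key][occ["specifier"]] += 1
    return entries


entries = build_entries(occurrences)
for lemma, senses in entries.items():
    for key, specifiers in senses.items():
        print(lemma, key, dict(specifiers))
```

Each (lemma, sense key) pair stands in for one sense of a lexical entry, and the counters play the role of the frequency attributes seen in the lexicon.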


Page 13: AnCora -Nom:  A Spanish Lexicon of  Deverbal  Nominalizations

AnCora-Nom

Lexical Entries   1,655
Nominal Senses    3,094
Nominal Frames    3,204

Page 14: AnCora -Nom:  A Spanish Lexicon of  Deverbal  Nominalizations

AnCora-Nom

Aceptación ‘acceptance’ lexical entry

<lexentry lemma="aceptación" lng="es" origin="deverbal" type="noun">
  <sense cousin="no" denotation="result" id="1" lexicalized="no" originlemma="aceptar" originlink="verb.aceptar.1" wordnetsynset="16:00117820+16:10039397">
    <frame appearsinplural="no" type="default">
      <argument argument="arg0" thematicrole="agt">
        <constituent frequency="1" preposition="de" type="sp"/>
        <constituent frequency="1" type="s.a"/>
      </argument>
      <specifiers>
        <constituent frequency="1" postype="article" type="determiner"/>
        <constituent frequency="1" type="void"/>
      </specifiers>
      <examples>
        <example file="CESS-CAST-P/141_19981202.tbf.xml" nodepath="4.5.3.2.1.0" sentencenodepath="4">
          Para el realizador y guionista , el protagonista masculino , Stéphane , " es muy interesante porque Ø encarna la tolerancia , aceptación de los demás .
        </example>
  <sense cousin="no" denotation="event" id="2" lexicalized="no" originlemma="aceptar" originlink="verb.aceptar.1" wordnetsynset="16:00117820"> …
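A minimal sketch of how an entry of this shape might be read with Python's standard library follows. The XML below is a well-formed reconstruction of the slide's excerpt: the closing tags are assumptions, the example sentence is left out, and the second sense is left empty because the slide elides its content. The function name summarize is hypothetical.

```python
import xml.etree.ElementTree as ET

# Reconstruction of the slide's entry. Closing tags are assumed, and sense 2
# is empty because its content is elided on the slide.
ENTRY = """
<lexentry lemma="aceptación" lng="es" origin="deverbal" type="noun">
  <sense cousin="no" denotation="result" id="1" lexicalized="no"
         originlemma="aceptar" originlink="verb.aceptar.1"
         wordnetsynset="16:00117820+16:10039397">
    <frame appearsinplural="no" type="default">
      <argument argument="arg0" thematicrole="agt">
        <constituent frequency="1" preposition="de" type="sp"/>
        <constituent frequency="1" type="s.a"/>
      </argument>
      <specifiers>
        <constituent frequency="1" postype="article" type="determiner"/>
        <constituent frequency="1" type="void"/>
      </specifiers>
    </frame>
  </sense>
  <sense cousin="no" denotation="event" id="2" lexicalized="no"
         originlemma="aceptar" originlink="verb.aceptar.1"
         wordnetsynset="16:00117820"/>
</lexentry>
"""


def summarize(entry_xml):
    """List (lemma, sense id, denotation, verb link, argument labels)."""
    root = ET.fromstring(entry_xml)
    rows = []
    for sense in root.iter("sense"):
        # Collect the argument positions realized in any frame of this sense.
        args = [a.get("argument") for a in sense.iter("argument")]
        rows.append((root.get("lemma"), sense.get("id"),
                     sense.get("denotation"), sense.get("originlink"), args))
    return rows


rows = summarize(ENTRY)
for row in rows:
    print(row)
```

The originlink value is what ties each nominal sense back to the corresponding verb entry.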

Page 15: AnCora -Nom:  A Spanish Lexicon of  Deverbal  Nominalizations

Sense Attributes

<sense cousin="no" denotation="result" id="1" lexicalized="no" originlemma="aceptar" originlink="verb.aceptar.1" wordnetsynset="16:00117820+16:10039397">

Cousin attribute

◦ “no”: morphological relationship from verb to noun
aceptar > aceptación, ‘accept’ > ‘acceptance’

◦ “yes”: morphological relationship from noun to verb
revolución > revolucionar, ‘revolution’ > ‘revolutionize’

◦ “yes”: semantic relationship between noun and verb
escarnio – mofarse, ‘mocking’ – ‘make fun’

Page 16: AnCora -Nom:  A Spanish Lexicon of  Deverbal  Nominalizations

Sense Attributes

<sense cousin="no" denotation="result" id="1" lexicalized="no" originlemma="aceptar" originlink="verb.aceptar.1" wordnetsynset="16:00117820+16:10039397">

Denotation attribute

◦ Event
[Su Poss-arg1-pat persecución <denotationtype="event">] es uno de esos atractivos marginales que aún le quedan a la Vuelta.
‘[His persecution] is one of those marginal appeals that ‘la Vuelta’ still has.’

◦ Result
El tema de conversación era [la actuación <denotationtype="result"> policial AP-arg0-agt].
‘The topic of discussion was [the police acting].’

◦ Underspecified
Se espera [la llegada <denotationtype="underspecified"> de más de 450 observadores extranjeros]NP.
‘[The arrival of more than 450 foreign observers] is expected.’

Page 17: AnCora -Nom:  A Spanish Lexicon of  Deverbal  Nominalizations

Sense Attributes

<sense cousin="no" denotation="result" id="1" lexicalized="no" originlemma="aceptar" originlink="verb.aceptar.1" wordnetsynset="16:00117820+16:10039397">

Lexicalized attribute

◦ “no”: the nominalization does not take part in a lexicalized construction

◦ “yes”: the nominalization takes part in a lexicalized construction

Alternative_lemma attribute: golpe_de_estado ‘coup d’état’

Lexicalizationtype attribute: “nominal”, “verbal”, “adjectival”, “adverbial”, “prepositional” and “conjunctive”.

Page 18: AnCora -Nom:  A Spanish Lexicon of  Deverbal  Nominalizations

Lexicalized sense golpe de estado ‘coup d’état’

<sense alternativelemma="golpe_de_estado" cousin="no" denotation="result" id="4" lexicalizationtype="nominal" lexicalized="yes" originlemma="golpear" originlink="verb.golpear.1" wordnetsynset="16:00629246">
  <frame appearsinplural="no" type="default">
    <argument argument="arg0" thematicrole="agt">
      <constituent frequency="1" type="s.a"/>
      <constituent frequency="1" preposition="de" type="sp"/>
    </argument>
    <argument argument="argL">
      <constituent frequency="6" preposition="de" type="sp"/>
    </argument>
    <argument argument="argM" thematicrole="fin">
      <constituent frequency="1" preposition="en_favor_de" type="sp"/>
    <specifiers>
      <constituent frequency="3" postype="indefinite" type="determiner"/>
      <constituent frequency="2" type="void"/>
      <constituent frequency="1" postype="demonstrative" type="determiner"/>
    <reference_modifiers>
      <constituent frequency="1" type="s.a"/>
    </reference_modifiers>
    <examples>
      <example file="3LB-CAST/111_C-6.tbf.xml" nodepath="1.2.1.1" sentencenodepath="1">El empresario fiyiano George_Speight encabeza este golpe de Estado en_favor_de la comunidad de nativos fiyianos, como ha definido su acción . </example>
      <example file="CESS-CAST-AA/8907_20000114.tbf.xml" nodepath="0.0.1.4.3.1.1.0" sentencenodepath="0">El ex presidente de Costa_de_Marfil Henri_Konan_Bédié , destituido el pasado 24_de_diciembre por un violento golpe de Estado militar, reclamó hoy en París la celebración de elecciones " libres y transparentes " en su país antes de Junio próximo . </example>
    </examples>

Page 19: AnCora -Nom:  A Spanish Lexicon of  Deverbal  Nominalizations

Frame Attributes

<frame appearsinplural="no" type="default">
  <argument argument="arg0" thematicrole="agt">
    <constituent frequency="1" type="s.a"/>
    <constituent frequency="1" preposition="de" type="sp"/>
  </argument>
  <argument argument="argL">
    <constituent frequency="6" preposition="de" type="sp"/>
  </argument>
  <argument argument="argM" thematicrole="fin">
    <constituent frequency="1" preposition="en_favor_de" type="sp"/>
  <reference_modifiers>
    <constituent frequency="1" type="s.a"/>
  </reference_modifiers>
  <specifiers>
    <constituent frequency="3" postype="indefinite" type="determiner"/>
    <constituent frequency="2" type="void"/>
    <constituent frequency="1" postype="demonstrative" type="determiner"/>

Frame type values other than "default": "passive", "causative", "locative", "benefactive"

Arguments:
• Argument position
• Thematic role
• Constituents and frequency

Specifiers:
• "article"
• "indefinite"
• "demonstrative"
• "exclamative"
• "numeral"
• "interrogative"
• "possessive"
• "ordinal"
• "void"
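To make the frame attributes concrete, here is a small sketch that lists, for each argument position, its constituent realizations ordered by frequency, and tallies the specifier types. The frame is the golpe de estado frame from the slides with the missing closing tags filled in as an assumption; realizations and specifier_counts are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# The frame shown on the slide, with closing tags added (an assumption)
# so that it can be parsed.
FRAME = """
<frame appearsinplural="no" type="default">
  <argument argument="arg0" thematicrole="agt">
    <constituent frequency="1" type="s.a"/>
    <constituent frequency="1" preposition="de" type="sp"/>
  </argument>
  <argument argument="argL">
    <constituent frequency="6" preposition="de" type="sp"/>
  </argument>
  <argument argument="argM" thematicrole="fin">
    <constituent frequency="1" preposition="en_favor_de" type="sp"/>
  </argument>
  <reference_modifiers>
    <constituent frequency="1" type="s.a"/>
  </reference_modifiers>
  <specifiers>
    <constituent frequency="3" postype="indefinite" type="determiner"/>
    <constituent frequency="2" type="void"/>
    <constituent frequency="1" postype="demonstrative" type="determiner"/>
  </specifiers>
</frame>
"""


def realizations(frame_xml):
    """Map each argument position to (type, preposition, frequency) tuples."""
    frame = ET.fromstring(frame_xml)
    result = {}
    for arg in frame.findall("argument"):
        options = [(c.get("type"), c.get("preposition"),
                    int(c.get("frequency")))
                   for c in arg.findall("constituent")]
        # Most frequent realization first; ties keep document order.
        result[arg.get("argument")] = sorted(options, key=lambda o: -o[2])
    return result


def specifier_counts(frame_xml):
    """Map each specifier kind (postype, or type if absent) to its frequency."""
    frame = ET.fromstring(frame_xml)
    return {c.get("postype", c.get("type")): int(c.get("frequency"))
            for c in frame.findall("specifiers/constituent")}


print(realizations(FRAME))
print(specifier_counts(FRAME))
```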

Page 20: AnCora -Nom:  A Spanish Lexicon of  Deverbal  Nominalizations

Quantitative Data

Table 1: Distribution of nominal senses

Denotation       Lexicalized   Non-lexicalized   Total
Event            0             631               631
Result           115           1,771             1,886
Underspecified   2             490               492
None             85            0                 85
Total            202           2,892             3,094

Table 2: Distribution of senses taking into account argument realization

Arguments     Event   Result   Underspecified
0             48      499      22
1             336     603      312
2             168     340      111
More than 2   79      444      47
Total         631     1,886    492
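As a quick consistency check on the two tables, the following sketch recomputes the row and column totals from the individual cells and derives the share of result senses. The figures are copied from the slide; the variable names are illustrative.

```python
# Table 1: denotation -> (lexicalized, non-lexicalized), as on the slide.
table1 = {
    "event": (0, 631),
    "result": (115, 1771),
    "underspecified": (2, 490),
    "none": (85, 0),
}

# Table 2: number of arguments -> (event, result, underspecified).
table2 = {
    "0": (48, 499, 22),
    "1": (336, 603, 312),
    "2": (168, 340, 111),
    "more than 2": (79, 444, 47),
}

# Row totals per denotation and column totals per lexicalization status.
row_totals = {name: sum(cells) for name, cells in table1.items()}
col_totals = tuple(sum(col) for col in zip(*table1.values()))
grand_total = sum(row_totals.values())

# Column totals of Table 2 should match the Table 1 row totals.
table2_totals = tuple(sum(col) for col in zip(*table2.values()))

# Percentage of all nominal senses that denote a result.
share_result = round(100 * row_totals["result"] / grand_total, 1)

print(row_totals, col_totals, grand_total, table2_totals, share_result)
```

The recomputed totals agree with the Total row and column of both tables, and the Table 2 column totals match the Event, Result, and Underspecified totals of Table 1.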


Page 22: AnCora -Nom:  A Spanish Lexicon of  Deverbal  Nominalizations

Conclusions

AnCora-Nom: New lexical resource

1,655 lexical entries of Spanish deverbal nominalizations

Developed from the information encoded in the AnCora-Es corpus.

Linked to the AnCora-Es corpus and to the AnCora-Verb Spanish lexicon

Page 23: AnCora -Nom:  A Spanish Lexicon of  Deverbal  Nominalizations

Current and Future Work

AnCora-Nom as input to the ADN-classifier

Enlarge this lexicon with deadjectival nominalizations and relational nouns

Build a Catalan deverbal nominalization lexicon

Page 24: AnCora -Nom:  A Spanish Lexicon of  Deverbal  Nominalizations

Thank you

Any questions?