Issues of Valency in Prague Dependency Treebank: Creating Valency Lexicon of Verbs

Issues of Valency in Prague Dependency Treebank: Creating Valency Lexicon of Verbs Markéta Lopatková Center for Computational Linguistics MFF UK, Prague CIL XVII, Prague, July 26, 2003 1


Issues of Valency in Prague Dependency Treebank:

Creating Valency Lexicon of Verbs

Markéta Lopatková

Center for Computational Linguistics

MFF UK, Prague

CIL XVII, Prague, July 26, 2003 1


Motivation

‘traditional’ linguistics
  source of data for linguistic research
  verification of theoretical criteria set up

natural language processing
  lemmatization
  morphological tagging
  syntactic analysis
  word sense disambiguation
  ‘semantic analysis’
  machine translation
  building other resources

language acquisition


Syntactic vs. semantic approach I.

‘Levin Verb Classes’ (Levin, 1993)
  hypothesis: syntactic features of verbs are semantically determined
  method: from syntactic behavior to semantic classes
  ‘alternation’ ~ a change in the realization of the argument structure of a verb
  ‘conative alternation’
    Edith cuts the bread. / Edith cuts at the bread.
  classes = verbs which undergo certain types of alternations
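The idea that a class is the set of verbs sharing the same alternations can be sketched in a few lines of Python. This is only an illustration: apart from "cut" taking part in the conative alternation (from the slide), the verbs and alternation labels below are assumptions modelled on the usual textbook reading of Levin (1993), not data from the talk.

```python
from collections import defaultdict

# Illustrative profiles: verb -> set of alternations it undergoes.
# Only "cut" + "conative" is taken from the slide; the rest is assumed.
ALTERNATIONS = {
    "cut": {"conative", "body_part_ascension", "middle"},
    "hack": {"conative", "body_part_ascension", "middle"},
    "hit": {"conative", "body_part_ascension"},
    "kick": {"conative", "body_part_ascension"},
    "touch": {"body_part_ascension"},
    "break": {"middle"},
}


def group_by_alternations(profiles):
    """Group verbs whose alternation sets are identical."""
    classes = defaultdict(list)
    for verb, alternations in profiles.items():
        # frozenset makes the alternation set usable as a dictionary key
        classes[frozenset(alternations)].append(verb)
    return {key: sorted(verbs) for key, verbs in classes.items()}


classes = group_by_alternations(ALTERNATIONS)
for alternations, verbs in classes.items():
    print(sorted(alternations), "->", verbs)
```

Verbs with identical alternation sets end up in the same group, which is the sense of "class" used on the slide.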


Syntactic vs. semantic approach II.

PropBank (Palmer et al., 2001)
  ‘layer of semantic annotation’ in Penn Treebank
  argument structure for verbs
    arguments: Arg0, ... Arg5
    modifiers: ArgM (LOC, TMP, EXT, PRP, ADV)

He was drawing diagrams and sketches for his patron.
  Arg0: he
  Rel: drawing
  Arg1: diagrams and sketches
  Arg2-for: his patron
He keeps something in the fridge.
  Arg0: he
  Rel: keeps
  Arg1: something
  Arg2-in: the fridge
(also Hajičová, Kučerová, 2002)
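A minimal sketch of how such a labeled proposition could be held in code and split into numbered arguments and modifiers. The function name `split_roles` and the `ArgM-TMP` style of writing modifier labels are assumptions of this sketch; the slide only lists the modifier types.

```python
import re

# Numbered arguments, optionally marked with a preposition (e.g. "Arg2-for").
ARG_LABEL = re.compile(r"^Arg(?P<num>[0-5])(?:-(?P<prep>\w+))?$")
# Modifiers; the "ArgM-XXX" spelling is an assumption of this sketch.
MOD_LABEL = re.compile(r"^ArgM-(?P<kind>LOC|TMP|EXT|PRP|ADV)$")


def split_roles(annotation):
    """Split a {label: text} mapping into relation, arguments and modifiers."""
    relation = annotation["Rel"]
    arguments, modifiers = {}, {}
    for label, text in annotation.items():
        if label == "Rel":
            continue
        arg = ARG_LABEL.match(label)
        mod = MOD_LABEL.match(label)
        if arg:
            arguments[int(arg["num"])] = (text, arg["prep"])
        elif mod:
            modifiers[mod["kind"]] = text
        else:
            raise ValueError(f"unknown label: {label}")
    return relation, arguments, modifiers


drawing = {
    "Arg0": "he",
    "Rel": "drawing",
    "Arg1": "diagrams and sketches",
    "Arg2-for": "his patron",
}
relation, arguments, modifiers = split_roles(drawing)
print(relation, arguments, modifiers)
```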


Syntactic vs. semantic approach III.

FrameNet (Fillmore, 2002)
  it groups lexical items with parallel semantic characterization
  the structure and particular components correspond to ‘semantic roles’ of the common semantic frame
  verbs, nouns, adjectives, prepositions

‘Communication’: ‘Speaker’ ‘Message’ ‘Addressee’ ‘Topic’ ‘Medium’
  Tom communicates with Kim about the festival.
  Tom communicates with Kim by letter.
  Tom communicates the message to me.

‘Reciprocality’: ‘Protagonists’ ‘Prot-1’ ‘Prot-2’
  Tom fought with Kim.
  Tom and Kim fought.


Syntactic vs. semantic approach IV.

LCS Database (Lexical Conceptual Structure) (Dorr, 2001)
  semantic representation
  semantic structure + semantic content

verb cut down
  lexical item:
    (act_on loc (* thing 1) (* thing 2)
      ((* [on] 23) loc (*head*) (thing 24))
      (cut+ingly 26) (down+/m))
  sentence: United States cut down (the) quota.
    (act_on loc (us+) (quota+)
      ((* [on] 23) loc (*head*) (thing 24))
      (cut+ingly 26) (down+/m))

logic arguments (ag, exp, th, src, goal, info, perc, loc, poss, time, prop)
logic modifiers (mod-poss, ben, instr, purp, mod-loc, manner, mod-prop)
  cut down: _ag_th,mod-loc(on)
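The bracketed LCS for the 'cut down' sentence is an ordinary parenthesized expression, so it can be read into nested lists with a very small parser. This is a generic sketch, not the LCS Database's own tooling; the function names are hypothetical, and malformed input (for example a missing closing parenthesis) is not handled gracefully.

```python
def tokenize(text):
    """Split on whitespace, treating each parenthesis as its own token."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Parse one expression from the front of the token list (consumed in place)."""
    token = tokens.pop(0)
    if token == "(":
        node = []
        while tokens[0] != ")":
            node.append(parse(tokens))
        tokens.pop(0)  # drop the closing parenthesis
        return node
    if token == ")":
        raise ValueError("unexpected ')'")
    return token  # atoms such as "act_on", "[on]" or "*head*" stay strings


LCS = (
    "(act_on loc (us+) (quota+) "
    "((* [on] 23) loc (*head*) (thing 24)) "
    "(cut+ingly 26) (down+/m))"
)
tree = parse(tokenize(LCS))
print(tree[0], tree[2], tree[3])
```

In the resulting tree the first element is the primitive (`act_on`), followed by its field and the filled argument positions.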


Prague Dependency TreeBank

based on Functional Generative Description (FGD) (Sgall et al., 1986)
  dependency-oriented
  stratificational
  level of underlying representation (‘tectogrammatical level’) (described in Hajičová et al., 2000)
  valency theory (esp. Panevová, 1994)


Valency in FGD I.

complementations:
  inner participants vs. free modifications
  obligatory vs. optional
valency frame:
  [table: inner participants / free modifications (rows) by obligatory / optional (columns)]

Matka.ACT předělala dětem.ADDR loutku.PAT z Kašpárka.ORIG na čerta.EFF. (Panevová)
[Mother re-made a puppet for children from a Punch to an imp.]
V Praze.LOC se sejdeme na Hlavním nádraží.LOC u pokladen.LOC. (Panevová)
[In Prague we will meet at Main Station near a booking-office.]

Valency in FGD II.

a ‘middle position’:
  syntactic criteria are used for the identification of Actor and Patient (Actor is the first inner participant, the second is always a Patient)
  other inner participants (Addressee, Origin and Effect) as well as free modifications are determined in accordance with semantic considerations

concept of ‘shifting’ (Panevová, 1974-75)
  [diagram: Actor, Patient, Addressee, Origin, Effect]

Kniha.ACT vyšla. (Panevová)
[The book came out.]
Chlapec.ACT vyrostl v muže.PAT. (Panevová)
[A boy grew up into a man.]
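A rough sketch of the positional part of this rule, under loudly stated assumptions: the lowercase cognitive role names are hypothetical labels, the input list is assumed to be ordered with the subject-like participant first, and for three or more participants the sketch simply maps each role to its functor without modelling any further shifting.

```python
# Direct role-to-functor mapping used when there are three or more
# inner participants. The role names are hypothetical labels.
SEMANTIC_FUNCTOR = {
    "actor": "ACT",
    "patient": "PAT",
    "addressee": "ADDR",
    "origin": "ORIG",
    "effect": "EFF",
}


def assign_functors(cognitive_roles):
    """Label inner participants; one or two participants are labeled by position."""
    if len(cognitive_roles) == 1:
        return ["ACT"]  # the only inner participant is the Actor
    if len(cognitive_roles) == 2:
        return ["ACT", "PAT"]  # the second one is always a Patient
    return [SEMANTIC_FUNCTOR[role] for role in cognitive_roles]


print(assign_functors(["theme"]))            # Kniha vyšla.
print(assign_functors(["theme", "effect"]))  # Chlapec vyrostl v muže.
print(assign_functors(["actor", "addressee", "patient"]))
```

The second call shows the shift: a participant that is cognitively closer to an Effect is still labeled PAT because it is the second inner participant.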


Valency in FGD III.

valency of autosemantic words
  verbs (Panevová, from the seventies)
    5 inner participants - Actor, Patient, Addressee, Origin, Effect
    approx. 45 free modifications
    ‘shifting of cognitive roles’ for inner participants
  nouns (esp. Panevová, 2000, Řezníčková, manuscript)
    verbal complementations
    specific nominal complementations - Identity, Partitive, Appurtenance, Restrictive and Descriptive Attribute
  adjectives (Panevová, 1998)
    verbal complementations
    specific adjectival complementations


Valency structure on TR level of PDT

the core of annotation on the tectogrammatical level
problem of consistency
valency lexicon
  verbs
    two branches:
      lists of verbs with their complementations being created and used by annotators (PDT-VALLEX)
      complex valency lexicon (VALLEX)
  nouns
    the theoretical aspects and methodology are currently being refined (Řezníčková, manuscript)
    lists of nouns with their complementations
  adjectives
    lists of adjectives with their complementations


Valency lexicon of verbs – PDT-VALLEX

lists being created and used by annotators
valency frames of verbs in their particular meanings, as they appear during annotation; the lexeme as a whole is not analyzed
the information specifying elements of frames:
  ‘functor’ - i.e. name of complementation
  type - obligatory / optional
  possible morphemic form(s)
  example(s)
it serves for consistency of annotation
approx. 4 700 verbs with 7 150 valency frames (i.e. 1.5 frames per verb)

dát [to give] ... ACT(1;obl) ADDR(3;obl) PAT(4;obl)
dát někomu knihu [to give sb a book]
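The frame notation used in the example for dát, FUNCTOR(form;type), is regular enough to parse mechanically. The sketch below is not the PDT-VALLEX file format, only a reading of the notation as printed on the slide; the names `Slot` and `parse_frame` are hypothetical, and the comma as a separator for several alternative forms is an assumption.

```python
import re
from typing import NamedTuple


class Slot(NamedTuple):
    functor: str      # name of the complementation, e.g. "ACT"
    forms: tuple      # possible morphemic forms, e.g. ("1",) or ("s+7",)
    obligatory: bool  # True for "obl", False for "opt"


SLOT = re.compile(r"(?P<functor>[A-Z]+)\((?P<forms>[^;()]*);(?P<type>obl|opt)\)")


def parse_frame(text):
    """Turn 'ACT(1;obl) ADDR(3;obl) ...' into a list of Slot records."""
    slots = []
    for match in SLOT.finditer(text):
        # Assumption: alternative forms, if any, are separated by commas.
        forms = tuple(f.strip() for f in match["forms"].split(",") if f.strip())
        slots.append(Slot(match["functor"], forms, match["type"] == "obl"))
    return slots


frame = parse_frame("ACT(1;obl) ADDR(3;obl) PAT(4;obl)")
for slot in frame:
    print(slot)
```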


Valency lexicon of verbs – VALLEX

complex information on the whole verb lexeme in all its meanings (Lopatková, Žabokrtský, 2002)
the information on particular valency frames, corresponding to its meanings (described with gloss(es) and example(s))
the information specifying elements of frames:
  ‘functor’ - i.e. name of complementation
  type - obligatory / optional
  possible morphemic form(s)

  mluvit [to speak] ... ACT(1;obl) ADDR(s+7;obl) PAT(o+6;opt)
  mluvila s ním o dětech [she spoke with him about the children]
additional syntactic information
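The morphemic forms in the mluvit example can be read as the traditional Czech case numbers, optionally preceded by a preposition (s+7 = preposition s with the instrumental, as in s ním; o+6 = preposition o with the locative, as in o dětech). A small decoder under that reading; the function name `describe_form` is hypothetical and the sketch covers only the "case" and "preposition+case" shapes that appear on the slides.

```python
# Traditional Czech case numbering.
CASES = {
    1: "nominative",
    2: "genitive",
    3: "dative",
    4: "accusative",
    5: "vocative",
    6: "locative",
    7: "instrumental",
}


def describe_form(form):
    """Decode '1' or 's+7' into a readable description."""
    preposition, plus, case = form.rpartition("+")
    name = CASES[int(case)]
    # rpartition returns an empty separator when there is no "+" in the form
    return f"{preposition} + {name}" if plus else name


for form in ("1", "s+7", "o+6"):
    print(form, "=", describe_form(form))
```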


Valency lexicon of verbs – VALLEX II.

additional syntactic information for particular valency frames:
  reflexivity (in progress)
  reciprocity
  control
  aspect and aspectual counterparts
  possible diatheses, passivization (future plans)
  primary / secondary / idiomatic usage
  syntactic/semantic class (in progress)
  pointers to Czech EuroWordNet (in progress)
  frequency of a particular frame in samples of ČNK (60 occurrences of each verb lexeme)


Valency lexicon of verbs – VALLEX III.

current state:
  1 400 verbs with 3 860 frames (i.e. 2.7 frames per verb)
  verbs chosen according to their frequency in Czech National Corpus and PDT
  about 85% coverage on ‘running text’ in PDT
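The frames-per-verb averages quoted on the slides can be recomputed directly from the counts given for the two lexicons. A short check, using only the numbers from the slides; the variable names are illustrative.

```python
def frames_per_verb(frames, verbs):
    """Average number of valency frames per verb."""
    return frames / verbs


# Counts as given on the PDT-VALLEX and VALLEX slides.
pdt_vallex = frames_per_verb(7150, 4700)
vallex = frames_per_verb(3860, 1400)

print(f"PDT-VALLEX: {pdt_vallex:.2f} frames per verb")
print(f"VALLEX:     {vallex:.2f} frames per verb")
```

The higher average for VALLEX is what one would expect if the whole lexeme is analyzed, rather than only the meanings met during annotation.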

open questions

  enriched valency frame
  syntactic-semantic classes
  alternative frames
  frozen collocations


Valency lexicon of verbs

Why two branches?
  PDT-VALLEX ~ ‘extensive’
    necessary for annotation
    ‘recall’ improves relatively quickly
  VALLEX ~ ‘intensive’
    the whole lexeme is analyzed en bloc
    adequate and consistent description
    ‘precision’ improves
the two branches are supposed to be merged
  PDT-VALLEX ~ valuable source for VALLEX


Enriched valency frames I.

inner participants
  each inner participant can occur only once (with single occurrence of a verb)
  combination of inner participants must be listed for a particular verb
  morphemic form is predicted by the governing verb
  concept of ‘shifting’ is applied

free modifications
  each free modification can be repeated
  syntactically, they can modify any verb (often only semantic restrictions are present)
  they have typical semantics
  they do not undergo the ‘shifting’
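The "only once" criterion for inner participants, as opposed to repeatable free modifications, can be checked mechanically over the functors attached to one verb occurrence. A minimal sketch; the function name is hypothetical, and any functor outside the five inner participants is simply treated as repeatable.

```python
from collections import Counter

# The five inner participants of FGD.
INNER_PARTICIPANTS = {"ACT", "PAT", "ADDR", "ORIG", "EFF"}


def repeated_inner_participants(functors):
    """Return the inner participants that occur more than once with one verb."""
    counts = Counter(functors)
    return sorted(
        functor
        for functor, count in counts.items()
        if functor in INNER_PARTICIPANTS and count > 1
    )


# Matka.ACT předělala dětem.ADDR loutku.PAT z Kašpárka.ORIG na čerta.EFF.
print(repeated_inner_participants(["ACT", "ADDR", "PAT", "ORIG", "EFF"]))
# V Praze.LOC se sejdeme na Hlavním nádraží.LOC u pokladen.LOC.
print(repeated_inner_participants(["LOC", "LOC", "LOC"]))
# A made-up annotation with a doubled Patient, which the criterion rules out.
print(repeated_inner_participants(["ACT", "PAT", "PAT"]))
```

An empty result means the occurrence respects the criterion; three LOC modifications are fine, a doubled PAT is not.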


Enriched valency frames II.

quasi-valency complementations (also Panevová, 2003)
  each quasi-valency complementation can occur only once (with any occurrence of a verb)
  each quasi-valency complementation is characteristic for a limited list of verbs
  morphemic form is predicted by the governing verb
  they have typical semantics
  they do not undergo the shifting

Obstacle
  uhodit hlavou o větev.OBST [to bump one's head against a bough]
  zavadit o stůl.OBST [to brush against a table]
Difference
  prodloužit o hodinu.DIFF [to prolong by one hour]
Mediator
  vzít někoho za ruku.MDT [to take sb by his/her hand]


Enriched valency frames III.

typical modifications
  optional free modifications ‘commonly’ used with a verb
  usually modify group of verbs with similar meaning
  morphemic form
    prototypical for some modifications
      e.g. Dative case or prep. group pro [for]+Acc for Benefactor
    determined by the typical semantics of the modifying members
      e.g. prep. groups na [on]+Loc and v [in]+Loc typically specify Location

‘verbs of motion’ – typically modified by Direction modification (provided that Direction is not obligatory)
  jít do kina / přes les / jít z domova [to go to cinema / through the wood / from home]

‘verbs of exchange’ – typically modified by modification of Recompense
  dát / dostat / získat / kupovat / brát něco.PAT za něco.RCMP [to give / get / obtain / buy / take something for something]


Exploitation of the valency lexicon

reaching the consistency of assigning the valency structure (PDT-VALLEX)
automatic syntactic analysis (‘shallow parsing’)
‘tectogrammatical parser’
  automatic system for creating an underlying representation of Czech sentences
source data for building the valency lexicon of nouns


Resources

theoretical articles on valency (Panevová)
The Manual for Tectogrammatical Tagging of the Prague Dependency Treebank (Hajičová et al., 2000)
lists of particular valency frames created by annotators
electronic valency dictionary of surface realizations of verbal modifiers (FI MU Brno, Pala, Ševeček, 1997)
printed dictionaries
  Slovesa pro praxi (SPP, 1997), valency specification of 767 most frequent verbs
  Slovník spisovného jazyka českého (SSJČ, 1964)
  Slovník spisovné češtiny pro školu a veřejnost (SSČ, 1978)
  Slovník českých synonym (SČS, 1994)
  Slovník české frazeologie a idiomatiky (SČFI, 1983)
Czech National Corpus (ČNK)
EuroWordNet, Czech WordNet


References I.

Dorr, B.J. (2001) LCS Verb Database, Online Software Database of Conceptual Structures and Documentations, UCMP.

Fillmore, Ch. (2002) FrameNet and the Linking between Semantic and Syntactic Relations. In: COLING 2002, Proceedings, pp. xxviii-xxxvi.

Hajičová, E. et al. (2000) A Manual for Tectogrammatical Tagging of the Prague Dependency Treebank. UFAL/CKL Technical Report TR-2000-09.

Hajičová, E., Kučerová, I. (2002) Argument/Valency Structure in PropBank, LCS Database and Prague Dependency Treebank. In: LREC 2002, Proceedings, pp. 846-851.

Levin, B. (1993) English Verb Classes and Alternations: A Preliminary Investigation. Chicago: University of Chicago.

Lopatková, M. et al. (2002) Tektogramaticky anotovaný valenční slovník českých sloves. UFAL/CKL Technical Report TR-2002-15.

Lopatková, M., Žabokrtský, Z. (2002) Valency Dictionary of Czech Verbs. In: LREC 2002, Proceedings, pp. 949-956.

Lopatková, M. (2003) Valency in the Prague Dependency Treebank: Building the Valency Lexicon. PBML 79. (in press)


References II.

Pala, K., Ševeček, P. (1997) Valence českých sloves. In: Sborník prací FFUB, Brno.

Palmer, M. et al. (2001) Automatic Predicate Argument Analysis of the Penn TreeBank. In: HLT 2001, Proceedings, San Francisco: Morgan Kaufmann.

Panevová, J. (1974-75) On Verbal Frames in Functional Generative Description. Part I, PBML 22, pp. 3-40, Part II, PBML 23, pp. 17-52.

Panevová, J. (1994) Valency Frames and the Meaning of the Sentence. In: Luelsdorff (ed.) The Prague School of Structural and Functional Linguistics, John Benjamins, pp. 223-243.

Panevová, J. (1998) Ještě k teorii valence. Slovo a slovesnost 59, pp. 1-14.

Panevová, J. (2000) Poznámky k valenci podstatných jmen. Čeština - univerzália a specifika 2, Masarykova Univerzita, Brno, pp. 173-180.

Panevová, J. (2003) Some Issues of Syntax and Semantics of Verbal Modifications. In: Proceedings of MTT 2003, Paris. (in press)

Sgall, P. et al. (1986) The Meaning of the Sentence in Its Semantic and Pragmatic Aspects. Dordrecht: Reidel, Prague: Academia.
