The SALSA experience: semantic role annotation
Katrin Erk
University of Texas at Austin
Semantic role annotation in SALSA
SALSA: The Saarbrücken Lexical Semantics Annotation and Analysis project
  Manual annotation of the German TIGER corpus with lexical semantic information
  Basis: the Berkeley FrameNet database
  Verbs annotated with their frame (~ sense), plus semantic roles
TIGER corpus: 1.5 million words / 80K sentences of German newspaper text (Frankfurter Rundschau)
  Stuttgart/Potsdam/Saarbrücken
  Phrase types and grammatical functions
Annotation Scheme
(They didn't want to pay for the move back because the employee had quit.)
Semantics:
  Independent frames
  Trees of depth one
  One edge points to the target, the others to frame elements
  Semantic roles point to syntactic constituents
TIGER syntax:
  Node labels: constituents
  Edge labels: grammatical functions
  Crossing edges
  POS tags
Experiences with the semantic role annotation in SALSA
Frame (~ sense) assignment more difficult than role assignment
Multiple tags possible, at frame level and at role level
Limited compositionality phenomena, each with a separate annotation format in SALSA:
  Light verbs, metaphor, idioms
  Distinction often difficult: metaphor vs. idiom, bleaching
  If I did this again: one format, multiple tags possible
Annotation beyond the sentence boundary
  Message role in Communication frames
Annotation below the word boundary: German noun compounds
  Mietrechtsdiskussion: discussion of tenant law
Encoding semantic role annotation: TIGER XML as a great basis
TIGER XML:
  Each constituent is an XML element with a globally unique ID
  Syntactic edges explicitly encoded: <edge> elements link two nodes, referring to their IDs
  Models discontinuous constituents
Salsa/Tiger XML:
  Semantic annotation by adding a modular <sem> block to the XML structure of a sentence
  Semantics points to syntactic constituents using their IDs
  Annotation beyond the sentence boundary possible: globally unique syntactic IDs
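The ID-based linking between the <sem> block and the syntax can be sketched in a few lines of Python. This is only an illustration: the XML fragment is hand-written and simplified, its element and attribute names (`t`, `nt`, `edge`, `idref`, `frame`, `target`, `fe`, `fenode`) are assumptions modelled on the description above rather than a copy of the actual schema, and the German sentence, frame name and role name are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified fragment (assumption: not taken from the corpus).
SENTENCE = """
<s id="s1">
  <graph root="s1_500">
    <terminals>
      <t id="s1_1" word="Der" pos="ART"/>
      <t id="s1_2" word="Angestellte" pos="NN"/>
      <t id="s1_3" word="kündigte" pos="VVFIN"/>
    </terminals>
    <nonterminals>
      <nt id="s1_501" cat="NP">
        <edge label="NK" idref="s1_1"/>
        <edge label="NK" idref="s1_2"/>
      </nt>
      <nt id="s1_500" cat="S">
        <edge label="SB" idref="s1_501"/>
        <edge label="HD" idref="s1_3"/>
      </nt>
    </nonterminals>
  </graph>
  <sem>
    <frames>
      <frame id="s1_f1" name="Quitting">
        <target><fenode idref="s1_3"/></target>
        <fe id="s1_f1_e1" name="Employee"><fenode idref="s1_501"/></fe>
      </frame>
    </frames>
  </sem>
</s>
"""


def index_nodes(sentence):
    """Map every syntactic node ID (terminal or nonterminal) to its element."""
    return {el.get("id"): el for el in sentence.iter() if el.tag in ("t", "nt")}


def words_under(node_id, nodes):
    """Collect the words dominated by a node by following <edge> idrefs.

    Words come out in edge order, which need not be surface order
    for discontinuous constituents.
    """
    node = nodes[node_id]
    if node.tag == "t":
        return [node.get("word")]
    words = []
    for edge in node.findall("edge"):
        words.extend(words_under(edge.get("idref"), nodes))
    return words


def read_frames(sentence):
    """Return (frame name, target words, {role name: words}) for each frame."""
    nodes = index_nodes(sentence)
    result = []
    for frame in sentence.iterfind("sem/frames/frame"):
        target = []
        for fenode in frame.iterfind("target/fenode"):
            target.extend(words_under(fenode.get("idref"), nodes))
        roles = {}
        for fe in frame.iterfind("fe"):
            span = []
            for fenode in fe.iterfind("fenode"):
                span.extend(words_under(fenode.get("idref"), nodes))
            roles[fe.get("name")] = span
        result.append((frame.get("name"), target, roles))
    return result


print(read_frames(ET.fromstring(SENTENCE.strip())))
```

Because the semantic layer only stores IDs, it can be added or removed without touching the syntactic annotation, and a role could in principle point to a node ID from another sentence.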
Extracting a lexicon: need for a deeper, richer syntax
Extracting the syntax/semantics mapping:
  Needs to identify the grammatical functions filled by semantic roles
Problems:
  Constituent structure rather than dependencies: subjects hard to retrieve
  TIGER does not mark voice
  Shallow format for PPs: determining heads is hard
  Coordination is a pain
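A minimal sketch of the mapping step, under strong simplifying assumptions: the sentence graph and the frame are hand-built Python dictionaries (hypothetical, not corpus data), each node has exactly one parent, and a role's grammatical function is read off the label of the edge leading into its constituent, but only when that constituent is a sister of the target.

```python
# Hypothetical sentence graph: parent ID -> list of (edge label, child ID).
EDGES = {
    "s1_500": [("SB", "s1_501"), ("HD", "s1_3")],
    "s1_501": [("NK", "s1_1"), ("NK", "s1_2")],
}

# Hypothetical frame: role name -> ID of the constituent the role points to.
FRAME = {
    "name": "Quitting",
    "target": "s1_3",
    "roles": {"Employee": "s1_501"},
}


def incoming_edges(edges):
    """Invert parent -> child edges into child ID -> (parent ID, edge label)."""
    incoming = {}
    for parent, children in edges.items():
        for label, child in children:
            incoming[child] = (parent, label)
    return incoming


def role_functions(frame, edges):
    """Map each role to the edge label of its filler, or None if unclear.

    The label is only trusted when the filler hangs under the same parent
    as the target; anything else needs deeper syntactic analysis.
    """
    incoming = incoming_edges(edges)
    target_parent = incoming.get(frame["target"], (None, None))[0]
    mapping = {}
    for role, node_id in frame["roles"].items():
        parent, label = incoming.get(node_id, (None, None))
        is_sister = parent is not None and parent == target_parent
        mapping[role] = label if is_sister else None
    return mapping


print(role_functions(FRAME, EDGES))
```

Roles whose filler is not a sister of the target come back as None here. Those are the cases where the problems listed above show up, and since voice is not marked, even an SB label does not by itself say which role the subject fills.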