Prague Arabic Dependency Treebank: Development in Data and Tools
Jan Hajič, Otakar Smrž, Petr Zemánek, Jan Šnaidauf, Emanuel Beška
Faculty of Mathematics and Physics, Faculty of Philosophy and Arts
Charles University in Prague
September 23, 2004 Prague Arabic Dependency Treebank: Development in Data and Tools
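
Project Release – PADT 1.0
December 2004, Linguistic Data Consortium
Size: 148 000 at the morphological level, 113 500 at the syntactic level

| Source | Syntax | Morpho | Provider | Corpus |
|---|---|---|---|---|
| AFP | 13 000 | N/A | France Presse | Penn ATB 1 |
| UMH | 38 500 | N/A | Ummah Press | Penn ATB 2 |
| XIN | 13 500 | N/A | Xinhua News | Arabic Gigaword |
| ALH | 10 000 | 73 500 | Al-Hayat News | Arabic Gigaword |
| ANN | 12 500 | 25 500 | An-Nahar News | Arabic Gigaword |
| XIA | 26 500 | 49 500 | Xinhua News | Arabic Gigaword |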
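The per-source figures can be checked against the release totals with a little arithmetic. This is a minimal sketch, assuming the first numeric column holds the syntactically annotated counts and the second the morphologically annotated ones, with N/A treated as "not counted"; `ROWS` and `column_total` are hypothetical names.

```python
# Per-source counts from the PADT 1.0 release table.
# Assumption: first number = syntax, second = morphology; None stands for N/A.
ROWS = {
    "AFP": (13_000, None),
    "UMH": (38_500, None),
    "XIN": (13_500, None),
    "ALH": (10_000, 73_500),
    "ANN": (12_500, 25_500),
    "XIA": (26_500, 49_500),
}

def column_total(index: int) -> int:
    """Sum one numeric column, skipping N/A cells."""
    return sum(row[index] for row in ROWS.values() if row[index] is not None)

syntax_total = column_total(0)
morpho_total = column_total(1)

# The release quotes 113 500 (syntax) and 148 000 (morpho). The row figures
# appear to be rounded to the nearest 500, so small deviations are expected.
print(syntax_total, morpho_total)  # 114000 148500
```

Both sums land 500 above the quoted totals, which is consistent with the column reading assumed here; the opposite reading (114 000 vs. 148 000) would not be.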

Open-Source Tools
TrEd Tree Editor: multi-purpose annotation environment; suite of programming utilities
Netgraph Search Engine: server/client system architecture; easy-to-learn query language
Encode::Arabic Perl Module: extension for processing Arabic script (ArabTeX, Buckwalter, Unicode, …)

PADT Functional Views
Functional Generative Description: a theory of linguistic meaning and its expression, also underlying the Prague Dependency Treebank for Czech
Independence of representation levels:
  Tectogrammatical – linguistic meaning
  Analytical – surface dependency syntax
  Morphological – categories and lexical units
Abstraction of the relations across levels:
  Strict distinction between form and function
  Different units of description on each level

Functional Morphology
Provides the syntactic levels with their abstract language, not just the letters in the tokens
Revives the multiple senses of categories
Completeness of generation
Strict modeling of grammatical control
MorphoTrees: ‘human tagging’
Successful prototype of a feature-based tagger

Syntactic Levels of Description
Analytical level
  Pragmatically motivated, close to surface syntax
  Every single token resulting from the morphological level forms one node
  Tree-like dependency structure for every sentence
Tectogrammatical level
  Linguistic (literal) meaning, deep relations, topic-focus articulation (TFA)
  Initial structures transformed from the analytical level (AL)
  Nodes for autosemantic words only
  Decisive role of valency frames
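A minimal Python sketch of the two levels, under loudly stated assumptions: on the analytical level each token (clitics such as -hu included) is one node that points to its governor by index; the edges for the verbal example of the "Predication Types in Trees" slide are a hypothetical reading of that diagram, not treebank data; and the "autosemantic skeleton" merely drops Aux* nodes, a drastic simplification of how initial tectogrammatical structures are derived. `Node`, `is_tree` and `autosemantic_skeleton` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Node:
    form: str   # token as produced by the morphological level
    afun: str   # analytical function label, e.g. "Pred", "Sb", "AuxP"
    head: int   # 1-based index of the governing node; 0 = sentence root

# Hypothetical reading of the verbal example; edges are an assumption.
SENTENCE = [
    Node("dAma", "Pred", 0),
    Node("iqtirAHu", "Sb", 1),
    Node("-hu", "Atr", 2),
    Node("al-EamalIyata", "Obj", 2),
    Node("EalA", "AuxP", 2),
    Node("zumalA'i", "Obj", 5),
    Node("-hi", "Atr", 6),
    Node("sAEatayni", "Adv", 1),
]

def is_tree(nodes: list[Node]) -> bool:
    """Check that every node reaches the root (head 0) without a cycle."""
    for start in range(1, len(nodes) + 1):
        seen, current = set(), start
        while current != 0:
            if current in seen or not 1 <= current <= len(nodes):
                return False
            seen.add(current)
            current = nodes[current - 1].head
    return True

def autosemantic_skeleton(nodes: list[Node]) -> dict[str, str]:
    """Drop Aux* nodes and attach their dependents to the nearest kept ancestor.

    A deliberately simplified stand-in for deriving initial tectogrammatical
    structures; maps each kept form to the form of its new governor.
    """
    def kept_ancestor(index: int) -> int:
        while index != 0 and nodes[index - 1].afun.startswith("Aux"):
            index = nodes[index - 1].head
        return index

    skeleton = {}
    for node in nodes:
        if node.afun.startswith("Aux"):
            continue
        parent = kept_ancestor(node.head)
        skeleton[node.form] = nodes[parent - 1].form if parent else "<root>"
    return skeleton

print(is_tree(SENTENCE))                            # True
print(autosemantic_skeleton(SENTENCE)["zumalA'i"])  # iqtirAHu
```

Removing the preposition EalA lifts its noun to the preposition's own governor, which is the kind of restructuring the move from surface to deep relations implies.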

Logic of Analytical Trees
Concepts of dependency and valency
Reduction: the sentence must remain grammatically correct if leaves (terminal nodes) are chopped off
Trees: clause components, clauses, sentences, paragraphs, etc.; subtrees of clauses are exchangeable for non-clauses
Nodes: words, tokenized parts of words and punctuation marks, each marked by its function
Edges: syntactic relations between a governing node and a dependent node/subtree
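The reduction test can be mechanized as far as generating candidates: chop off one leaf at a time and list what remains. A sketch under assumptions: a tree is a list of (form, head) pairs with 1-based heads and 0 for the root; the edges for the verbal example are a hypothetical reading of the "Predication Types in Trees" diagram; `leaves` and `single_leaf_reductions` are hypothetical helpers. Judging whether each candidate is still grammatical remains the annotator's task.

```python
# Hypothetical analytical tree: (form, head) pairs, head = 1-based index, 0 = root.
# Edges are an assumed reading of the verbal example, not treebank data.
TREE = [
    ("dAma", 0), ("iqtirAHu", 1), ("-hu", 2), ("al-EamalIyata", 2),
    ("EalA", 2), ("zumalA'i", 5), ("-hi", 6), ("sAEatayni", 1),
]

def leaves(tree):
    """Indices (1-based) of nodes that govern no other node."""
    governors = {head for _, head in tree}
    return [i for i in range(1, len(tree) + 1) if i not in governors]

def single_leaf_reductions(tree):
    """For each leaf, return (removed form, token sequence left without it)."""
    result = []
    for leaf in leaves(tree):
        rest = [form for i, (form, _) in enumerate(tree, start=1) if i != leaf]
        result.append((tree[leaf - 1][0], " ".join(rest)))
    return result

for removed, remaining in single_leaf_reductions(TREE):
    print(f"without {removed}: {remaining}")
```

Only terminal nodes are offered for removal, so governors such as EalA or iqtirAHu can never be dropped while they still have dependents.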

Some Syntax Issues of Arabic
Non-verbal predication of several types
Subordinate non-verbal clauses / modification
Verb-like behavior of many nominal forms
Mostly VSO in verbal sentences, but…
  vice versa in non-verbal clauses
  different, depending on contextual boundness
Compound verbs, fixed composite prepositions
Grammatical co-reference, accusative of the inner object, complex referencing, etc.

Problem I: Predication
Head node of the tree: PREDICATE
Why? It has a steady role in the sentence and cannot be omitted
Verbal predicate: I-go to school
Non-verbal predicate:
  Nominal: The-house a-big (= the house is big)
  Existential: There a-city (= there is a city)
  Prepositional, possessive: For him a-house (= he has a house)
  Prepositional, adverbial: The-mosque in the-city (= …is…)
  Conjunctional: The-problem that (= …is that)

Predication Types in Trees
Verbal: dAma [Pred] lasted; iqtirAHu [Sb] proposal; -hu [Atr] his; al-EamalIyata [Obj] the-operation [acc.]; EalA [AuxP] on; zumalA’i [Obj] colleagues; -hi [Atr] his; sAEatayni [Adv] two-hours [acc.] (verb-like behavior: object of a noun?)
Nominal: al-baytu [Sb] the-house [nom.]; kabIrun [Pnom] a-big [nom.]
Existential: vam~ata [PredE] there-is; madInatun [Sb] a-city [nom.]
Prepositional (possessive): la- [PredP] for; -hu [Obj] him; baytun [Sb] a-house [nom.]
Prepositional (adverbial, locative): fI [PredP] in; al-madInati [Adv] the-city [gen.]; al-jAmiEu [Sb] the-mosque [nom.]
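The rule "head node of the tree: PREDICATE" can be checked mechanically once a tree is encoded. A minimal sketch, assuming the predicate node heads each of the two non-verbal examples below and the other nodes depend on it directly; the edges, `PREDICATE_AFUNS` and `head_function` are hypothetical, with the labels Pred, PredE and PredP taken from the trees above.

```python
# Hypothetical encodings of two non-verbal predication examples:
# (form, afun, head) with 1-based heads and 0 for the root.
# Assumption: the predicate node heads the tree; other nodes hang on it.
EXISTENTIAL = [("vam~ata", "PredE", 0), ("madInatun", "Sb", 1)]
POSSESSIVE = [("la-", "PredP", 0), ("-hu", "Obj", 1), ("baytun", "Sb", 1)]

# Predicate labels that appear in the example trees.
PREDICATE_AFUNS = {"Pred", "PredE", "PredP"}

def head_function(tree):
    """Return the analytical function of the single node attached to the root."""
    roots = [afun for _, afun, head in tree if head == 0]
    if len(roots) != 1:
        raise ValueError(f"expected one head node, found {len(roots)}")
    return roots[0]

for name, tree in [("existential", EXISTENTIAL), ("possessive", POSSESSIVE)]:
    afun = head_function(tree)
    print(name, afun, afun in PREDICATE_AFUNS)
```

Keeping distinct labels for the predicate types (PredE, PredP) lets a query or a consistency check tell them apart while still treating them all as clause heads.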

Problem II: Clauses & Co-reference
Recursiveness: a subordinate clause is contained as a subtree in place of a simple element
  The head node of the clause gets the same function as that element
  Problem: are non-verbal structures clauses or not?
  Compound verbs (mA zAla etc.) are treated in the same way
Grammatical co-reference: a personal pronoun formally required by another element
  The pronoun must be marked to be treated as such
  The target of reference is unambiguously identifiable
  Often in subordinate clauses, mostly attributive
  Ex.: He-wrote a-book number its-pages hundred (= he wrote a book whose pages number a hundred)

Clauses & Co-reference in Trees
(Tree diagrams. Constructions annotated on the slide: attributive clause with a prepositional (adverbial) predicate; objective clause with a verbal predicate; compound verb, formed as main verb and its complement; referencing pronoun as attribute in the clause; attributive clause with a nominal predicate; referencing pronoun as adverbial in the clause.)
Nodes shown: naHwu [Sb] grammar [nom.]; jumalan [Sb] sentences [acc.]; fI [Atr_PredP] in; kataba [Pred] he-wrote; SafHatin [Atr] pages [gen.]; kitAban [Obj] a-book; mi’atu [Sb] hundred [nom.]; zAlat [Pred] she-stopped; tuHis~u [Atv] she-feels; anna [AuxC] that; -hA [Atr_Ref] their; -hA [Obj] her; wADiHun [Atr_Pnom] clear [nom.]; tuEjibu [Obj_Pred] they-impress; al-rajulu [Sb] the-man [nom.]; zaynabu [Sb] Zaynab; mA [AuxM] not; -hi [Adv_Ref] it
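The compound labels in these trees (Atr_PredP, Obj_Pred, Atr_Pnom, Atr_Ref, Adv_Ref) pack two pieces of information into one string. A minimal Python sketch of taking them apart, based on an assumed reading of the slide captions: a `_Ref` suffix marks a referencing pronoun whose own function is the prefix, while any other suffix names the predicate type of a clause head and the prefix is the function the whole clause fills. `Label` and `split_label` are hypothetical names, not part of any PADT tool.

```python
from typing import NamedTuple

class Label(NamedTuple):
    function: str       # role in the governing structure (Atr, Obj, Adv, ...)
    clause_head: str    # predicate type if the node heads a subordinate clause
    is_reference: bool  # True for a grammatically co-referring pronoun

def split_label(afun: str) -> Label:
    """Split a compound label such as 'Obj_Pred' or 'Atr_Ref' (sketch)."""
    function, _, suffix = afun.partition("_")
    if suffix == "Ref":
        # Referencing pronoun: the prefix is the pronoun's own function.
        return Label(function, "", True)
    # Clause head (or plain label when there is no suffix).
    return Label(function, suffix, False)

for afun in ["Atr_PredP", "Obj_Pred", "Atr_Pnom", "Atr_Ref", "Adv_Ref", "Sb"]:
    print(afun, split_label(afun))
```

Splitting on the first underscore keeps plain labels such as Sb intact, with an empty `clause_head` and `is_reference` set to False.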

Future Prospects
Implementation of Functional Morphology
Tectogrammatical annotation
Lexicons of valency frames
Re-training the feature-based tagger on MorphoTrees
Machine learning on the treebank data for various purposes
Thank you
Questions welcome!
http://ckl.mff.cuni.cz/padt/