PDT: The Tools
![Page 1: PDT: The Tools](https://reader036.vdocument.in/reader036/viewer/2022062308/56815286550346895dc0ae2c/html5/thumbnails/1.jpg)
March 5, 2008 Companions Semantic Representation and Dialog Interfacing Workshop - Morphology and Surface Syntax
PDT: The Tools
Jan Hajič
Institute of Formal and Applied Linguistics
School of Computer Science
Faculty of Mathematics and Physics
Charles University, Prague
Czech Republic
![Page 2: PDT: The Tools](https://reader036.vdocument.in/reader036/viewer/2022062308/56815286550346895dc0ae2c/html5/thumbnails/2.jpg)
Tectogrammatical Annotation Tools
- Manual annotation
  - Speech Reconstruction: MEd
  - Morphology (linear structure annotation): LAW
  - Special graphical tool (TrEd)
    - Customizable graphical tree editor
- Viewing and Searching
  - TrEd, Netgraph (linear structure: also Bonito/Manatee)
- Automatic annotation
  - (ASR, Segmentation), Morphology, Tagging, Parsing, Deep parsing, Co-reference, WSD, …
- Generation
  - Jan Ptacek’s generation tools (rule-based, so far)
![Page 3: PDT: The Tools](https://reader036.vdocument.in/reader036/viewer/2022062308/56815286550346895dc0ae2c/html5/thumbnails/3.jpg)
Manual annotation
- Speech reconstruction
  - MEd
    - z-layer, w-layer, m-layer
    - Audio: annotators can listen
- Morphology
  - LAW: new version for fast morphological disambiguation
- Syntax (analytical, tectogrammatical)
  - TrEd
![Page 4: PDT: The Tools](https://reader036.vdocument.in/reader036/viewer/2022062308/56815286550346895dc0ae2c/html5/thumbnails/4.jpg)
MEd: speech reconstruction viewer / annotation tool
m-layer (annotation)
w-layer
z-layer
audio
![Page 5: PDT: The Tools](https://reader036.vdocument.in/reader036/viewer/2022062308/56815286550346895dc0ae2c/html5/thumbnails/5.jpg)
The Morphological Annotation Tool (LAW)
Java-based
Dictionary access
XML-aware
PML: m-layer
![Page 6: PDT: The Tools](https://reader036.vdocument.in/reader036/viewer/2022062308/56815286550346895dc0ae2c/html5/thumbnails/6.jpg)
TrEd: Manual Annotation Tool
- Perl/PerlTk based, platform-independent
  - Linux, Windows 95/98/2000, Solaris, ...
- Perl as the “macro” language
  - “unlimited” online processing capability
- Flexibility for interactive checking
  - split screen, graphical “diff” function
- Customization, printing, “plugins”, ...
- [Automatic processing: btred, no GUI]
- [Fast search (parallel processing): btred/ntred]
![Page 7: PDT: The Tools](https://reader036.vdocument.in/reader036/viewer/2022062308/56815286550346895dc0ae2c/html5/thumbnails/7.jpg)
The “TrEd” Tree Editor Graphical tool
TrEd Main screen:
Original sentence: [This year’s flu season is still quiet in Europe.]
Editing window customization
Run a macro
Multiwindow editing/compare
![Page 8: PDT: The Tools](https://reader036.vdocument.in/reader036/viewer/2022062308/56815286550346895dc0ae2c/html5/thumbnails/8.jpg)
TrEd
![Page 9: PDT: The Tools](https://reader036.vdocument.in/reader036/viewer/2022062308/56815286550346895dc0ae2c/html5/thumbnails/9.jpg)
Valency Lexicon in TrEd
to write sth (about sth)
![Page 10: PDT: The Tools](https://reader036.vdocument.in/reader036/viewer/2022062308/56815286550346895dc0ae2c/html5/thumbnails/10.jpg)
Searching the treebank
- TrEd (obviously)
  - Programming possible (Perl)
  - Fast search (parallelization)
- Netgraph
  - Linguist-user-friendly
  - Easy to write queries
  - Not as flexible
  - Java
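
To make the idea of a programmatic treebank search concrete, here is a minimal Python sketch. It is not the TrEd, btred/ntred, or Netgraph API: the `Node` class, the `search` and `search_all` helpers, and the toy analysis of the example sentence are all assumptions made for illustration. Because each tree is searched independently, the work can be spread over several workers, which is the general idea behind parallel search.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical dependency-tree node (not the PML data model)."""
    form: str
    afun: str
    children: list = field(default_factory=list)


def walk(node, parent=None):
    """Yield (node, parent) pairs in depth-first order."""
    yield node, parent
    for child in node.children:
        yield from walk(child, node)


def search(tree, child_test, parent_test):
    """Return forms of nodes that satisfy child_test under a parent satisfying parent_test."""
    return [n.form for n, p in walk(tree)
            if p is not None and child_test(n) and parent_test(p)]


def search_all(trees, child_test, parent_test, workers=4):
    """Search every tree independently; trees do not share state."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_tree = pool.map(lambda t: search(t, child_test, parent_test), trees)
        return [hit for hits in per_tree for hit in hits]


# Toy, simplified analysis of the example sentence (illustrative only).
tree = Node("is", "Pred", [
    Node("season", "Sb", [Node("flu", "Atr")]),
    Node("quiet", "Pnom", [Node("still", "Adv")]),
])

# Query: attributes (Atr) that hang directly on a subject (Sb).
hits = search_all([tree, tree],
                  lambda n: n.afun == "Atr",
                  lambda p: p.afun == "Sb")
```

A query here is just a pair of Python predicates. That is flexible but requires programming, which matches the trade-off the slide draws between TrEd-style scripting and Netgraph-style graphical queries.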
![Page 11: PDT: The Tools](https://reader036.vdocument.in/reader036/viewer/2022062308/56815286550346895dc0ae2c/html5/thumbnails/11.jpg)
Netgraph
Query
![Page 12: PDT: The Tools](https://reader036.vdocument.in/reader036/viewer/2022062308/56815286550346895dc0ae2c/html5/thumbnails/12.jpg)
Netgraph
Search results
![Page 13: PDT: The Tools](https://reader036.vdocument.in/reader036/viewer/2022062308/56815286550346895dc0ae2c/html5/thumbnails/13.jpg)
Automatic annotation
- Morphological analysis
- Tagging
- Parsing (surface)
- Tectogrammatical (deep) parsing
  - Tectogrammatical structure
  - Co-reference
  - Grammatemes
- Generation
![Page 14: PDT: The Tools](https://reader036.vdocument.in/reader036/viewer/2022062308/56815286550346895dc0ae2c/html5/thumbnails/14.jpg)
Morphological dictionary
- Czech
  - UFAL-developed
  - C implementation
  - 800k lemmas
- English
  - Open source
  - Amorph-generated from data
    - From WSJ
![Page 15: PDT: The Tools](https://reader036.vdocument.in/reader036/viewer/2022062308/56815286550346895dc0ae2c/html5/thumbnails/15.jpg)
Tagging
- Czech
  - 10+ taggers
  - Best: “MORCE”
    - Averaged perceptron + unsupervised + rules, > 96%
  - Testing on spoken (ASR) input
- English
  - Off-the-shelf (97%)
  - (…will retrain MORCE on WSJ/PTB)
  - NB: within parsers (mostly)
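
The averaged perceptron named above can be sketched in a few lines of Python. This is a generic textbook version, not MORCE itself: the feature templates, the class and function names, and the two toy training sentences are assumptions for illustration, and the unsupervised and rule-based components mentioned on the slide are not modelled.

```python
from collections import defaultdict


def features(words, i, prev_tag):
    """Hypothetical feature templates: word form, 3-letter suffix, previous tag, bias."""
    w = words[i].lower()
    return [f"w={w}", f"suf3={w[-3:]}", f"prev={prev_tag}", "bias"]


class AveragedPerceptron:
    def __init__(self, tags):
        self.tags = sorted(tags)
        self.w = defaultdict(float)       # current weights, keyed by (feature, tag)
        self.totals = defaultdict(float)  # running sum of each weight over all steps
        self.stamps = defaultdict(int)    # step at which each weight last changed
        self.step = 0

    def predict(self, feats):
        scores = {t: sum(self.w.get((f, t), 0.0) for f in feats) for t in self.tags}
        return max(self.tags, key=lambda t: (scores[t], t))

    def _bump(self, key, delta):
        # Lazily add the contribution of the steps during which the weight was unchanged.
        self.totals[key] += (self.step - self.stamps[key]) * self.w[key]
        self.stamps[key] = self.step
        self.w[key] += delta

    def update(self, feats, gold, guess):
        self.step += 1
        if gold == guess:
            return
        for f in feats:
            self._bump((f, gold), 1.0)
            self._bump((f, guess), -1.0)

    def average(self):
        """Replace each weight by its average over all training steps."""
        for key, weight in list(self.w.items()):
            total = self.totals[key] + (self.step - self.stamps[key]) * weight
            self.w[key] = total / self.step if self.step else weight


def train(model, sentences, epochs=5):
    for _ in range(epochs):
        for words, tags in sentences:
            prev = "<s>"
            for i, gold in enumerate(tags):
                feats = features(words, i, prev)
                model.update(feats, gold, model.predict(feats))
                prev = gold  # simplification: condition on the gold previous tag
    model.average()


def tag(model, words):
    prev, out = "<s>", []
    for i in range(len(words)):
        prev = model.predict(features(words, i, prev))
        out.append(prev)
    return out


# Toy data, invented for this sketch.
data = [("the dog barks".split(), ["DET", "NOUN", "VERB"]),
        ("a cat sleeps".split(), ["DET", "NOUN", "VERB"])]
model = AveragedPerceptron({"DET", "NOUN", "VERB"})
train(model, data)
```

Averaging the weights over all update steps, rather than keeping only the final weights, is what distinguishes this from the plain perceptron and usually makes the tagger less sensitive to the order of the training data.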
![Page 16: PDT: The Tools](https://reader036.vdocument.in/reader036/viewer/2022062308/56815286550346895dc0ae2c/html5/thumbnails/16.jpg)
Parsing
- Czech
  - McDonald et al.
    - MST + MIRA, 85-86% dep. accuracy
  - Labeling (afun)
    - C 5.0 or within parser, also ~ 85% accuracy
- English
  - Collins / Charniak
  - NB: Phrase-based
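
For readers unfamiliar with how dependency accuracy figures like those above are typically computed, the following Python sketch scores a predicted analysis against a gold one. It assumes the usual definitions (share of tokens with the correct head, and share with both correct head and correct label); the slides do not state the exact evaluation protocol, and the function name and toy data are hypothetical.

```python
def attachment_scores(gold, pred):
    """Return (unlabeled, labeled) attachment accuracy.

    gold, pred: one (head_index, label) pair per token; head 0 is the artificial root.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty sentence")
    n = len(gold)
    heads_ok = sum(g[0] == p[0] for g, p in zip(gold, pred))
    both_ok = sum(g == p for g, p in zip(gold, pred))
    return heads_ok / n, both_ok / n


# Toy four-token sentence, invented for this sketch.
gold = [(2, "Atr"), (3, "Sb"), (0, "Pred"), (3, "Adv")]
pred = [(2, "Atr"), (3, "Obj"), (0, "Pred"), (2, "Adv")]
uas, las = attachment_scores(gold, pred)
```

In this toy case three of four heads are right and two of four tokens have both head and label right, so the two scores differ. This is why head attachment and afun labeling are reported as separate numbers.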
![Page 17: PDT: The Tools](https://reader036.vdocument.in/reader036/viewer/2022062308/56815286550346895dc0ae2c/html5/thumbnails/17.jpg)
Tectogrammatical parsing
- Czech: TrEd-implemented, 4-step process
  - Starts from analytic layer
- English
  - Rule-based so far
    - Too little data annotated
    - Annotation underway currently
  - Starts from classical Collins/Charniak WSJ-type output
![Page 18: PDT: The Tools](https://reader036.vdocument.in/reader036/viewer/2022062308/56815286550346895dc0ae2c/html5/thumbnails/18.jpg)
Tectogrammatical parsing - accuracy
- Newest results: 4 phases
- Transformation-based learning
  - FnTBL
- Largely language-independent
- Coreference: >90%

| Attribute | m- and a-layer: manual | m- and a-layer: auto |
|---|---|---|
| structure | 89.3% | 76.4% |
| functor | 85.5% | 77.4% |
| val_frame.rf | 92.3% | 90.9% |
| t_lemma | 93.5% | 90.9% |
| nodetype | 94.5% | 92.6% |
| gram/sempos | 93.8% | 91.5% |
| a/lex.rf | 96.5% | 95.1% |
| a/aux.rf | 94.3% | 90.3% |
| is_member | 94.3% | 89.5% |
| is_generated | 96.6% | 95.2% |
| deepord | 68.0% | 66.7% |
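
Transformation-based learning, the technique mentioned above, can be illustrated with a heavily simplified Python sketch. It is not fnTBL and not the four-phase system from the slide: the single rule template (feature value, old label, new label), the function names, and the toy data mapping analytical functions to functors are all assumptions for illustration.

```python
from collections import Counter


def apply_rule(rule, items, labels):
    """Relabel every item whose feature and current label match the rule."""
    feat, old, new = rule
    return [new if (f == feat and lab == old) else lab
            for f, lab in zip(items, labels)]


def learn_rules(items, gold, baseline, max_rules=10):
    """Greedily pick the rule with the best net error reduction, apply it, repeat."""
    current = list(baseline)
    rules = []
    for _ in range(max_rules):
        gain = Counter()
        # Each error proposes the rule that would fix it.
        for feat, cur, g in zip(items, current, gold):
            if cur != g:
                gain[(feat, cur, g)] += 1
        # Each currently correct item penalises the rules that would break it.
        for feat, cur, g in zip(items, current, gold):
            if cur == g:
                for rule in list(gain):
                    if rule[0] == feat and rule[1] == cur:
                        gain[rule] -= 1
        if not gain:
            break
        best, score = max(gain.items(), key=lambda kv: (kv[1], kv[0]))
        if score <= 0:
            break
        rules.append(best)
        current = apply_rule(best, items, current)
    return rules


# Toy data, invented for this sketch: one feature per node and a gold label.
items = ["Sb", "Obj", "Sb", "Adv", "Obj", "Sb"]
gold = ["ACT", "PAT", "ACT", "LOC", "PAT", "PAT"]
baseline = ["PAT"] * len(items)  # start by assigning one label everywhere

rules = learn_rules(items, gold, baseline)

labels = baseline
for rule in rules:
    labels = apply_rule(rule, items, labels)
correct = sum(a == b for a, b in zip(labels, gold))
```

The learned model is an ordered list of rewrite rules applied on top of a simple baseline. Real systems use richer, contextual rule templates, but the greedy select-and-apply loop is the same.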
![Page 19: PDT: The Tools](https://reader036.vdocument.in/reader036/viewer/2022062308/56815286550346895dc0ae2c/html5/thumbnails/19.jpg)
Word sense disambiguation
- For words with valency frames
  - All verbs
  - Some nouns, adjectives
- Valency frame ~ meaning (sense)
- Jiri Semecky’s work
  - Accuracy on PDT: 70%+
- Portable to English
  - No results yet
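
Treating a valency frame as a word sense turns disambiguation into picking one frame per occurrence of a lemma. The Python sketch below shows only the simplest possible reference point, a most-frequent-frame baseline. It is not the method behind the accuracy quoted above, and the function names, frame identifiers, and data are invented for illustration.

```python
from collections import Counter, defaultdict


def train_baseline(examples):
    """examples: iterable of (lemma, frame_id). Returns lemma -> most frequent frame."""
    counts = defaultdict(Counter)
    for lemma, frame in examples:
        counts[lemma][frame] += 1
    return {lemma: c.most_common(1)[0][0] for lemma, c in counts.items()}


def accuracy(model, examples):
    """Share of occurrences whose predicted frame equals the annotated one."""
    examples = list(examples)
    correct = sum(model.get(lemma) == frame for lemma, frame in examples)
    return correct / len(examples)


# Hypothetical frame identifiers; real lexicon entries look different.
train_data = [("write", "f1"), ("write", "f1"), ("write", "f2"), ("run", "f3")]
test_data = [("write", "f1"), ("write", "f2"), ("run", "f3"), ("read", "f9")]

baseline = train_baseline(train_data)
score = accuracy(baseline, test_data)
```

A baseline like this ignores context entirely, so any system that looks at the sentence around the word should be compared against it.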
![Page 20: PDT: The Tools](https://reader036.vdocument.in/reader036/viewer/2022062308/56815286550346895dc0ae2c/html5/thumbnails/20.jpg)
Generation
- From TR to text
- Jan Ptacek’s work (cf. review meeting)
- Rule-based
- Czech: completed
  - Integrated with TTS (UWB)
- English: before completion of first version
- Results
  - No metrics yet, subjectively very good
![Page 21: PDT: The Tools](https://reader036.vdocument.in/reader036/viewer/2022062308/56815286550346895dc0ae2c/html5/thumbnails/21.jpg)
Some (more) pointers
- http://ufal.mff.cuni.cz/pdt2.0
  - Current version of PDT, all three levels, 1.9/1.5/0.8 Mw
- http://ufal.mff.cuni.cz/REST/CAC/CAC.html
  - The Czech Academic Corpus, v 1.0
- http://www.ldc.upenn.edu
  - LDC2001T10 (PDT v1.0), LDC2004T23 (PADT 1.0), LDC2004T25 (PCEDT 1.0)
- http://www.clsp.jhu.edu: Workshop 2002
  - Using TL for MT Generation