A Unified Database of Dependency Treebanks: Integrating, Quantifying & Evaluating Dependency Data
![Page 1: A Unified Database of Dependency Treebanks Integrating, Quantifying & Evaluating Dependency Data](https://reader035.vdocument.in/reader035/viewer/2022062315/5681538b550346895dc18cad/html5/thumbnails/1.jpg)
Olga Pustylnikov, Alexander Mehler
Bielefeld University
A Unified Database of Dependency Treebanks
Integrating, Quantifying & Evaluating Dependency Data
![Page 2: A Unified Database of Dependency Treebanks Integrating, Quantifying & Evaluating Dependency Data](https://reader035.vdocument.in/reader035/viewer/2022062315/5681538b550346895dc18cad/html5/thumbnails/2.jpg)
SFB 673
Motivation
Exploring similarities among languages by means of syntactic treebanks
We collected a database covering 11 languages
Treebanks have been developed separately by different research projects
Quantitative investigations across these treebanks -> the need for a unified format
![Page 3: A Unified Database of Dependency Treebanks Integrating, Quantifying & Evaluating Dependency Data](https://reader035.vdocument.in/reader035/viewer/2022062315/5681538b550346895dc18cad/html5/thumbnails/3.jpg)
Motivation
Demands on the unified format of treebanks
(+) generic: able to represent as many treebanks as possible
(+) extensible to new treebanks
(+) complete: preserving all corpus-specific information
(+) transferable to other kinds of corpora
(–) complex: exhibiting minimal complexity
-> graph representations
![Page 4: A Unified Database of Dependency Treebanks Integrating, Quantifying & Evaluating Dependency Data](https://reader035.vdocument.in/reader035/viewer/2022062315/5681538b550346895dc18cad/html5/thumbnails/4.jpg)
Motivation
GXL (Holt et al., 2006)
Graph eXchange Language (GXL) is a graph model that represents corpora in terms of graphs.
GXL can be applied to any kind of corpus (see e.g. Mehler and Gleim (2005), Ferrer i Cancho et al. (2007), Pustylnikov and Mehler (2008)).
[Diagram labels: XML, GXL, Wiki, Multimodal Data, Treebanks, Tools; Treebanks -> eGXL]
![Page 5: A Unified Database of Dependency Treebanks Integrating, Quantifying & Evaluating Dependency Data](https://reader035.vdocument.in/reader035/viewer/2022062315/5681538b550346895dc18cad/html5/thumbnails/5.jpg)
Agenda
1. eGXL
2. Data
3. Complexity Evaluation
4. Application
5. Conclusion
![Page 6: A Unified Database of Dependency Treebanks Integrating, Quantifying & Evaluating Dependency Data](https://reader035.vdocument.in/reader035/viewer/2022062315/5681538b550346895dc18cad/html5/thumbnails/6.jpg)
eGXL
2-level data model: a Types graph and a Sentences graph, linked via IDREF
<graph id="Types">
  <node id="POS" />
  <node id="t245" name="VERB" />
  ...
</graph>
<graph id="Sentences">
  <graph id="g8">
    <node id="s8_1" form="Detta" pos="t151" />
    <node id="s8_2" form="vill" pos="t245" />
    ...
    <rel>
      <relend direction="in" target="s8_2" />
      <relend direction="out" target="s8_1" />
    </rel>
    ...
  </graph>
</graph>
![Page 7: A Unified Database of Dependency Treebanks Integrating, Quantifying & Evaluating Dependency Data](https://reader035.vdocument.in/reader035/viewer/2022062315/5681538b550346895dc18cad/html5/thumbnails/7.jpg)
The eGXL Sentences-graph
[Dependency tree of the example sentence, with "vill" at the top; remaining tokens: Detta, bestämt, jag, bemöta, .]
<graph id="Sentences">
  <graph id="g8">
    <node id="s8_1" form="Detta" pos="t151" />
    <node id="s8_2" form="vill" pos="t245" />
    ...
    <rel>
      <relend direction="in" target="s8_2" />
      <relend direction="out" target="s8_1" />
    </rel>
    ...
  </graph>
</graph>
Annotations:
node: each token of a treebank
form: word form
pos: an IDREF to the POS-node of the Types-graph
rel: a (syntactic) relation
relend direction="in": from (e.g. a head verb)
relend direction="out": to (e.g. a dependent argument)
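The annotations above can be made concrete with a minimal Python sketch that reads the fragment shown on the slide, resolves each `pos` IDREF against the Types graph, and lists the head-to-dependent relations. It rests on assumptions: the `<egxl>` wrapper element and the function name `read_sentences` are hypothetical, the elided parts of the fragment are simply left out, and the slide does not say how eGXL files are actually processed.

```python
import xml.etree.ElementTree as ET

# Sample built from the slide's fragment. The <egxl> wrapper element is a
# hypothetical addition so that both graphs share a single XML root.
EGXL_SAMPLE = """
<egxl>
  <graph id="Types">
    <node id="POS" />
    <node id="t245" name="VERB" />
  </graph>
  <graph id="Sentences">
    <graph id="g8">
      <node id="s8_1" form="Detta" pos="t151" />
      <node id="s8_2" form="vill" pos="t245" />
      <rel>
        <relend direction="in" target="s8_2" />
        <relend direction="out" target="s8_1" />
      </rel>
    </graph>
  </graph>
</egxl>
"""


def read_sentences(xml_text):
    """Map each sentence id to (head form, head POS, dependent form, dependent POS) tuples."""
    root = ET.fromstring(xml_text)
    types = root.find("graph[@id='Types']")
    pos_names = {n.get("id"): n.get("name") for n in types.findall("node")}

    def describe(node):
        pos_id = node.get("pos")
        # Fall back to the raw IDREF when the Types graph gives no name for it.
        return node.get("form"), pos_names.get(pos_id) or pos_id

    result = {}
    for sentence in root.find("graph[@id='Sentences']").findall("graph"):
        tokens = {n.get("id"): n for n in sentence.findall("node")}
        edges = []
        for rel in sentence.findall("rel"):
            # direction="in" marks the 'from' end, direction="out" the 'to' end.
            ends = {e.get("direction"): e.get("target") for e in rel.findall("relend")}
            edges.append(describe(tokens[ends["in"]]) + describe(tokens[ends["out"]]))
        result[sentence.get("id")] = edges
    return result


for sentence_id, edges in read_sentences(EGXL_SAMPLE).items():
    for head, head_pos, dep, dep_pos in edges:
        print(f"{sentence_id}: {head} ({head_pos}) -> {dep} ({dep_pos})")
```

The tag behind `t151` is not given on the slide, so the sketch prints the unresolved IDREF instead of guessing a name.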
![Page 8: A Unified Database of Dependency Treebanks Integrating, Quantifying & Evaluating Dependency Data](https://reader035.vdocument.in/reader035/viewer/2022062315/5681538b550346895dc18cad/html5/thumbnails/8.jpg)
![Page 9: A Unified Database of Dependency Treebanks Integrating, Quantifying & Evaluating Dependency Data](https://reader035.vdocument.in/reader035/viewer/2022062315/5681538b550346895dc18cad/html5/thumbnails/9.jpg)
11 Dependency Treebanks
7 different formats
![Page 10: A Unified Database of Dependency Treebanks Integrating, Quantifying & Evaluating Dependency Data](https://reader035.vdocument.in/reader035/viewer/2022062315/5681538b550346895dc18cad/html5/thumbnails/10.jpg)
Input vs. Output Formats
Examples from Dutch, Swedish, Italian treebanks
![Page 11: A Unified Database of Dependency Treebanks Integrating, Quantifying & Evaluating Dependency Data](https://reader035.vdocument.in/reader035/viewer/2022062315/5681538b550346895dc18cad/html5/thumbnails/11.jpg)
Unification is possible…
… due to the separation of the core from the secondary parts
<graph id="Types">
  <node id="POS" />
  <node id="t245" name="VERB" />
  ...
</graph>
<graph id="Sentences">
  <graph id="g8">
    <node id="s8_1" form="Detta" pos="t151" />
    <node id="s8_2" form="vill" pos="t245" />
    ...
    <rel>
      <relend direction="in" target="s8_2" />
      <relend direction="out" target="s8_1" />
    </rel>
    ...
  </graph>
</graph>
diversity
commonality
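As an illustration of this separation, the following Python sketch writes treebank-specific input into the two graphs. It is a sketch under stated assumptions, not the authors' converter: the input is reduced to hypothetical (form, POS tag, head index) triples, the `<egxl>` wrapper element, the function name `to_egxl` and the ID scheme (`t1`, `s8_1`) are invented for the example, and the tag `UNKNOWN` stands in for the tag of "Detta", which the slides do not give.

```python
import xml.etree.ElementTree as ET


def to_egxl(sentences):
    """Build a Types graph and a Sentences graph from generic triples.

    `sentences` maps a sentence number to a list of (form, pos_tag, head)
    triples, where `head` is the 1-based position of the head token and 0
    marks a token without a head. This input shape is an assumption.
    """
    root = ET.Element("egxl")  # hypothetical wrapper element
    types = ET.SubElement(root, "graph", id="Types")
    ET.SubElement(types, "node", id="POS")
    sents = ET.SubElement(root, "graph", id="Sentences")

    pos_ids = {}  # treebank-specific tag -> id of its node in the Types graph
    for number, tokens in sentences.items():
        graph = ET.SubElement(sents, "graph", id=f"g{number}")
        for position, (form, tag, _head) in enumerate(tokens, start=1):
            if tag not in pos_ids:
                pos_ids[tag] = f"t{len(pos_ids) + 1}"
                ET.SubElement(types, "node", id=pos_ids[tag], name=tag)
            ET.SubElement(graph, "node", id=f"s{number}_{position}",
                          form=form, pos=pos_ids[tag])
        for position, (_form, _tag, head) in enumerate(tokens, start=1):
            if head == 0:
                continue  # no incoming relation for this token
            rel = ET.SubElement(graph, "rel")
            ET.SubElement(rel, "relend", direction="in",
                          target=f"s{number}_{head}")
            ET.SubElement(rel, "relend", direction="out",
                          target=f"s{number}_{position}")
    return root


# "UNKNOWN" is a placeholder tag, not taken from any treebank.
doc = to_egxl({8: [("Detta", "UNKNOWN", 2), ("vill", "VERB", 0)]})
print(ET.tostring(doc, encoding="unicode"))
```

In this sketch the tag inventory ends up only in the Types graph, while the Sentences graph keeps the same node and rel layout whatever the source treebank looks like.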
![Page 12: A Unified Database of Dependency Treebanks Integrating, Quantifying & Evaluating Dependency Data](https://reader035.vdocument.in/reader035/viewer/2022062315/5681538b550346895dc18cad/html5/thumbnails/12.jpg)
The TreebankWiki
http://ariadne.coli.uni-bielefeld.de/wikis/treebankwiki/
![Page 13: A Unified Database of Dependency Treebanks Integrating, Quantifying & Evaluating Dependency Data](https://reader035.vdocument.in/reader035/viewer/2022062315/5681538b550346895dc18cad/html5/thumbnails/13.jpg)
![Page 14: A Unified Database of Dependency Treebanks Integrating, Quantifying & Evaluating Dependency Data](https://reader035.vdocument.in/reader035/viewer/2022062315/5681538b550346895dc18cad/html5/thumbnails/14.jpg)
Complexity of eGXL
Logical Scaling Factor (LSF): the number of logical elements (e.g. XML elements) required to represent a treebank unit (e.g. a word form, a POS tag)
[Chart: eGXL vs. other formats, for node and rel]
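One possible reading of this measure, sketched in Python: count the XML elements each unit occupies in the eGXL fragment from the earlier slides and average per unit type. The averaging, the function name `elements_per_unit` and the choice of `node` and `rel` as units are assumptions made for illustration; the slides do not spell out how the LSF was computed.

```python
import xml.etree.ElementTree as ET

# Sentence fragment from the slides, with the elided parts left out.
SENTENCE = """
<graph id="g8">
  <node id="s8_1" form="Detta" pos="t151" />
  <node id="s8_2" form="vill" pos="t245" />
  <rel>
    <relend direction="in" target="s8_2" />
    <relend direction="out" target="s8_1" />
  </rel>
</graph>
"""


def elements_per_unit(xml_text, unit_tags=("node", "rel")):
    """Average number of XML elements used per unit, for each unit tag."""
    graph = ET.fromstring(xml_text)
    factors = {}
    for tag in unit_tags:
        units = graph.findall(tag)
        if units:
            # iter() yields the unit element itself plus all its descendants.
            total = sum(1 for unit in units for _ in unit.iter())
            factors[tag] = total / len(units)
    return factors


print(elements_per_unit(SENTENCE))  # {'node': 1.0, 'rel': 3.0}
```

Under this reading a token costs one element, while a relation costs three: the `rel` element plus its two `relend` children.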
![Page 15: A Unified Database of Dependency Treebanks Integrating, Quantifying & Evaluating Dependency Data](https://reader035.vdocument.in/reader035/viewer/2022062315/5681538b550346895dc18cad/html5/thumbnails/15.jpg)
![Page 16: A Unified Database of Dependency Treebanks Integrating, Quantifying & Evaluating Dependency Data](https://reader035.vdocument.in/reader035/viewer/2022062315/5681538b550346895dc18cad/html5/thumbnails/16.jpg)
DTDB
![Page 17: A Unified Database of Dependency Treebanks Integrating, Quantifying & Evaluating Dependency Data](https://reader035.vdocument.in/reader035/viewer/2022062315/5681538b550346895dc18cad/html5/thumbnails/17.jpg)
![Page 18: A Unified Database of Dependency Treebanks Integrating, Quantifying & Evaluating Dependency Data](https://reader035.vdocument.in/reader035/viewer/2022062315/5681538b550346895dc18cad/html5/thumbnails/18.jpg)
Conclusions
a database covering 11 languages
eGXL: a generic XML graph model adapted to syntactic treebanks
use of treebanks within a single application (Ariadne)
Thank you for your attention!