[dcsb] dr francesco mambrini (chs, usa/dai, germany), "treebanking in the world of thucydides....
TRANSCRIPT
What digital corpora for Ancient History? Linguistic Annotation of Thucydides 1.98-118
Treebanking in the World of Thucydides: Linguistic annotation for the Hellespont Project
Francesco Mambrini
Center For Hellenic Studies
Deutsches Archäologisches Institut
November 20, 2012
Hellespont Project
Outline
1 What digital corpora for Ancient History?
    The questions at hand
    Data-driven approaches
2 Linguistic Annotation of Thucydides 1.98-118
    The Hellespont Project
    Examples
A web of knowledge
Figure: A simplified model
Interconnectedness: the problem
The multivalent nature of historical thought [...] eludes the keyword-indexed approach to the Web today on offer through Google and other search engines. Though we can summon up an exhaustive list of Web resources that contain the words “Gallipoli” and “sources”, today’s Web cannot effectively respond to a basic historical question such as, “which sources attest the Gallipoli Campaign of World War I?”
B. Robertson
CIDOC Conceptual Reference Model
Objects represented as being part of events
Figure: by Doerr and Stead 2009
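To make the event-centred idea concrete, here is a minimal sketch in Python of how Robertson's question ("which sources attest the Gallipoli Campaign?") could be answered once sources and objects are linked to events. The property labels (P7, P11, P70) and the class label E5 follow CIDOC CRM naming; every identifier (event:..., source:..., actor:..., place:...) is a hypothetical placeholder, not data from the project.

```python
from collections import defaultdict

# Hypothetical identifiers; only the class/property labels follow CIDOC CRM naming.
TRIPLES = [
    ("event:gallipoli_campaign", "rdf:type", "E5_Event"),
    ("event:gallipoli_campaign", "P7_took_place_at", "place:gallipoli"),
    ("event:gallipoli_campaign", "P11_had_participant", "actor:some_army"),
    ("source:diary_a", "P70_documents", "event:gallipoli_campaign"),
    ("source:report_b", "P70_documents", "event:gallipoli_campaign"),
    ("source:letter_c", "P70_documents", "event:other_battle"),
]


def build_index(triples):
    """Index subjects by (property, object) so lookups start from the event."""
    index = defaultdict(set)
    for subject, prop, obj in triples:
        index[(prop, obj)].add(subject)
    return index


def sources_attesting(event_id, triples=TRIPLES):
    """Return, in sorted order, every source that documents the given event."""
    return sorted(build_index(triples)[("P70_documents", event_id)])


print(sources_attesting("event:gallipoli_campaign"))
```

The query is keyed on the event rather than on a word in the text, which is exactly what a keyword index cannot offer.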
One more problem! Know what our sources are!
Big and complex works; e.g. Thucydides:
    6,126 sentences, 167,512 words
    ca. 30 years of war, + 50 years in digression, references that go back to before the Trojan War!
Unstructured natural language
Written in Ancient Greek
Controversial (interpretation and textual reconstruction)
Literary work (= shaped by discursive and ideological strategies)
Ontologiemodellierung für die Erforschung von Ritualstrukturen (ontology modelling for research on ritual structures; SFB 619, Heidelberg)
Figure: Event extraction from texts
NLP Pipeline
NLP Process Ancient Greek?
Chunking
Lemmatization
POS-tagging
Syntactic parsing
Word-sense disambiguation
Co-reference resolution
Semantic role annotation
Using and Enhancing the available resources: The Ancient Greek Dependency Treebank
AGDT: treebank with word-by-word morphological and dependency-based syntactic description
a step forward: semantic information
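As an illustration of what "word-by-word morphological and dependency-based" description looks like in practice, the sketch below reads one sentence in an AGDT-style XML layout and walks its dependency links. The attribute names (form, lemma, postag, head, relation) reflect the treebank's XML layout as I understand it; the three-word sentence is a made-up example, not taken from Thucydides or from the treebank, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up three-word sentence in an AGDT-style layout (assumption: attribute
# names form/lemma/postag/head/relation; head="0" marks the root of the tree).
SAMPLE = """
<treebank>
  <sentence id="1">
    <word id="1" form="ὁ" lemma="ὁ" postag="l-s---mn-" head="2" relation="ATR"/>
    <word id="2" form="ἄνθρωπος" lemma="ἄνθρωπος" postag="n-s---mn-" head="3" relation="SBJ"/>
    <word id="3" form="γράφει" lemma="γράφω" postag="v3spia---" head="0" relation="PRED"/>
  </sentence>
</treebank>
"""


def parse_sentence(sentence_el):
    """Turn the <word> elements of one sentence into plain dictionaries."""
    return [dict(word.attrib) for word in sentence_el.findall("word")]


def root_of(words):
    """Return the word attached to the artificial root (head == "0")."""
    return next(w for w in words if w["head"] == "0")


def dependents(words, head_id, relation=None):
    """Return the words governed by head_id, optionally filtered by relation."""
    return [
        w for w in words
        if w["head"] == head_id and (relation is None or w["relation"] == relation)
    ]


sentence = ET.fromstring(SAMPLE.strip()).find("sentence")
words = parse_sentence(sentence)
predicate = root_of(words)
subjects = dependents(words, predicate["id"], relation="SBJ")
print(predicate["form"], [s["form"] for s in subjects])
```

The morphological tag and the syntactic link sit on the same word record, which is what allows later layers (such as semantic information) to be attached to the same nodes.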
A syntactic tree: Thuc. 1.89.1
A case study: Athens, 479-431 BCE
Goal: Connecting textual and archaeological sources in the Perseus DL and Arachne via CIDOC-CRM
Steps:
    Enriching the text of one source (Thucydides) with linguistic and historical information
    Identifying and marking events in the text
        manually
        data-driven approach
    Integrating secondary literature (through data mining algorithms)
Toward a 3-level scenario: Morphology and Syntax
Toward a 3-level scenario: + semantic and pragmatic information
With tectogrammatical annotation:
Our text is:
1 easier to browse for content-related search (easier to use in digital environments)
2 more informative on historically relevant questions
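A rough sketch of what content-related search could look like once semantic roles are available: the query asks "who did what to whom" instead of matching word forms. The functor labels ACT (actor) and PAT (patient) are the ones used in Prague-style tectogrammatical annotation; the three records, their ids, and the find_events helper are hand-written placeholders for illustration, not output of the Hellespont annotation.

```python
# Hand-written placeholder records (assumption): one predicate lemma per event,
# with arguments keyed by tectogrammatical functor (ACT = actor, PAT = patient).
EVENTS = [
    {"id": "s1", "predicate": "αἱρέω", "args": {"ACT": "Ἀθηναῖος", "PAT": "Ἠιών"}},
    {"id": "s2", "predicate": "πολιορκέω", "args": {"ACT": "Ἀθηναῖος", "PAT": "Νάξιος"}},
    {"id": "s3", "predicate": "ἀφίστημι", "args": {"ACT": "Νάξιος"}},
]


def find_events(events, predicate=None, **roles):
    """Return ids of events whose predicate and role fillers match the query."""
    hits = []
    for event in events:
        if predicate is not None and event["predicate"] != predicate:
            continue
        # Every requested functor must be present and filled by the given lemma.
        if all(event["args"].get(functor) == lemma for functor, lemma in roles.items()):
            hits.append(event["id"])
    return hits


print(find_events(EVENTS, ACT="Ἀθηναῖος"))
print(find_events(EVENTS, PAT="Νάξιος"))
```

Because the role label stays the same whether the Greek clause is active or passive, a search of this kind is not tied to the surface syntax of the sentence.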
Conclusions
1 Currently, our literary sources are not structured for semantic, event-based queries
2 NLP processes for event extraction are not yet capable of handling raw Ancient Greek texts
3 NLP tools and techniques are adaptable to the task:
    they provide standards
    they help and speed up manual annotation
    (incidentally) they add a lot of information on linguistic aspects of the documentary sources