A Linked Coptic Dictionary Online
Amir Zeldes Georgetown University amir.zeldes@georgetown.edu
LaTeCH @COLING 2018 Santa Fe, NM, August 25, 2018
Frank Feder Akademie der Wissenschaften zu Göttingen frank.feder@mail.uni-goettingen.de
Maxim Kupreyev Berlin-Brandenburgische Akademie der Wissenschaften maxim.kupreyev@bbaw.de
Emma Manning Georgetown University esm76@georgetown.edu
Caroline T. Schroeder University of the Pacific cschroeder@pacific.edu
What and who? ⲟⲩ ̂ⲁⲩⲱ ⲛⲓⲙ
Part Ⲓ
LaTeCH-CLfL , Santa Fe, August 25, 2018 A Linked Coptic Dictionary Online 2/25
Who are we?
CDO is the result of work in KELLIA – a bilateral German/American collaboration with members from:
Akademie der Wissenschaften zu Göttingen
Berlin-Brandenburgische Akademie der Wissenschaften
Georg-August-Universität Göttingen
Georgetown University
University of the Pacific
Funded by the German Research Foundation (DFG) and National Endowment for the Humanities (NEH)
“Digital Coptic”
What is Coptic?
Last stage of the Ancient Egyptian language (starting in the 2nd century)
Mediterranean in 1st millennium
Hellenistic period
Unique language
Longest continuous documentation
Strong contact variety (with Greek)
Religious significance
Early Christianity
Rise of monasticism
Gnosticism
...
Coptic dialects
Goals
Provide a comprehensive online dictionary for Coptic
Match structure to existing Egyptological resources
Bi-directional linking:
Corpora to lexicon: click on words to search
Lexicon to corpora:
Link to attestations
Get quantitative information
Internal and external cross-references
Related work
Main reference work: Crum’s dictionary (1939)
Scans: http://coptot.manuscriptroom.com/crum-coptic-dictionary
ASCII-searchable headword index: http://marcion.sourceforge.net/
Related work
The database behind CDO is an expansion to the existing schema of Thesaurus Linguae Aegyptiae (TLA, http://aaew.bbaw.de/tla/, Feder 2016)
Headword structure follows TLA, inventory of entries follows primarily W.E. Crum’s (1939) A Coptic Dictionary (native Egyptian vocabulary)
Greek loan word component contributed by the DDGLC project (Leipzig/FU Berlin, http://research.uni-leipzig.de/ddglc/, Almond et al. 2013)
Related work
Increasingly, digital corpora of Coptic are available (Schroeder & Zeldes 2016, Behlmer & Feder 2017)
Scriptorium data freely available & searchable using ANNIS at https://corpling.uis.georgetown.edu/annis/scriptorium/
Richly annotated
Lemmatized data for lexical frequencies
Multiple linkable views of attestations
Views: linguistic analysis, normalized edition, diplomatic transcription
What kind of texts does this material come from?
Sermons: Shenoute's Abraham our Father
"As for us, brethren, let us live by the truth so that we are upstanding in all our works, and so that the prophets, apostles and all the saints might dwell among us, ..."
Letters: Besa Letter to Aphthonia
".. you sent to your father and mother: 'they fought with me,' or 'they abused me'. It's you lying, as they didn't fight with you nor did they abuse you… "
Apophthegmata Patrum (sayings of the desert fathers)
"They said about the blessed Sarah the virgin that she spent sixty years living at the top of the river and she never set foot outside to see the river."
Bible, saints' lives, letters, documentary material… all freely available and searchable
Would you like to read these in Coptic?
How? ⲛ̄ⲁϣ ⲛ̄ϩⲉ
Part ⲒⲒ
Coptic grammar
Agglutinative language (or even polysynthetic, Loprieno 1995:51):
ⲁ-ϥ-ⲧⲣⲉ-ⲩ-ⲛⲁⲩ ⲉⲃⲟⲗ
a-f-tre-u-nau ebol
PST-3sgm-CAUS-3pl-see out
‘he made them have sight’
Incorporation:
ϩⲱⲧⲃ hōtb ‘kill’
ϩⲉⲧⲃ-ⲯⲩⲭⲏ hetb-psychē ‘soul-kill’
Phrasal verbs! -> Multiword Expressions
Coptic grammar
Texts originally written without spaces
Later conventions separate ‘words’
Coptic grammar
Three levels of segmentation in text: (cf. Layton 2014)
ϩ︤ⲛ︥|ⲛⲉϥ|ϩⲃⲏⲩⲉ ⲙ̄|ⲙ︤ⲛ︦ⲧ︥-ⲣⲉϥ-ϩⲉⲧⲃ̄-ⲯⲩⲭⲏ
hn̩|nef|hbēwe m̩|mn̩t-ref-hetb̩-psukhē
in|his|deeds of|ness-er-kill-soul
Translation: ‘in his deeds of soul-killing’
4 morphemes
3 words
2 bound groups
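The three levels can be made concrete with a minimal sketch. It assumes the delimiters used in the example above (space between bound groups, `|` between words, `-` between morphemes) and reads the three counts as: morphemes in the last word, words in the first bound group, and bound groups overall. The function name `segment` is hypothetical and not part of any Coptic Scriptorium tool.

```python
def segment(text):
    """Split a delimited line into bound groups > words > morphemes.

    Assumed delimiters (taken from the example notation only):
    space = bound group boundary, '|' = word boundary, '-' = morpheme boundary.
    """
    return [
        [word.split("-") for word in group.split("|")]
        for group in text.split()
    ]


# English gloss line of the example, used here to keep the sketch ASCII-only.
gloss = "in|his|deeds of|ness-er-kill-soul"
groups = segment(gloss)

print(len(groups))         # bound groups in the line
print(len(groups[0]))      # words in the first bound group
print(len(groups[1][-1]))  # morphemes in the last word
```

The same function works on the Coptic and transliteration lines, since they share the delimiters.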
CDO – XML representation
Use TEI dictionary module, adapted by Feder & Kupreyev
Standardized serialization of ISO 24613 LMF (Lexical Markup Framework, see Romary 2015)
Dictionary structured around <entry> elements
supply multiple <form> sub-elements
multiple orthographic <orth> elements possible
multiple senses
grammatical, etymological and bibliographic information
Schematic tree
Unique ‘lemma’ form
oRef element for multi-word entries
Reserved subtypes for loan word etymologies
Inventories for dialects and POS
Trilingual definitions (DE, EN, FR)
Example
<entry n="4" type="compound" xml:id="C6397">
  <form type="lemma" xml:id="CF14818">
    <orth>ϯⲧⲉϩⲓⲏ</orth><oRef>ϯ ⲧⲉ ϩⲓⲏ</oRef>
  </form>
  <gramGrp>
    <pos>Vb.</pos>
    <subc>Kompositverb</subc>
  </gramGrp>
  <xr type="cf">
    <ref target="#ϯ">geben, zahlen, verkaufen</ref>
  </xr>
  <sense>
    <cit type="translation" xml:lang="de">
      <quote>Gelegenheit geben, erlauben</quote>
    </cit>
    <cit type="translation" xml:lang="en">
      <quote>give road, means</quote>
    </cit>
    <cit type="translation" xml:lang="fr">
      <quote>permettre, accorder</quote>
    </cit>
    <cit>
      <bibl>CD 646a-b; KoptHWb 356; ChLCS 88b</bibl>
    </cit>
  </sense>
</entry>
~11,900 entries
Consistency enforced via XSD schema
https://github.com/KELLIA/dictionary
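As a rough illustration of how such an entry could be read programmatically, the sketch below parses a shortened copy of the example with Python's standard `xml.etree.ElementTree`. It assumes the entry has no default namespace, as in the snippet on the slide; the files in the repository may declare one, in which case the paths would need a namespace prefix. The function `read_entry` is a hypothetical helper, not part of the CDO code base.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example entry (assumption: no default TEI namespace).
ENTRY_XML = """
<entry n="4" type="compound" xml:id="C6397">
  <form type="lemma" xml:id="CF14818">
    <orth>ϯⲧⲉϩⲓⲏ</orth><oRef>ϯ ⲧⲉ ϩⲓⲏ</oRef>
  </form>
  <gramGrp><pos>Vb.</pos><subc>Kompositverb</subc></gramGrp>
  <sense>
    <cit type="translation" xml:lang="de"><quote>Gelegenheit geben, erlauben</quote></cit>
    <cit type="translation" xml:lang="en"><quote>give road, means</quote></cit>
    <cit type="translation" xml:lang="fr"><quote>permettre, accorder</quote></cit>
  </sense>
</entry>
"""

# The xml: prefix is predefined; ElementTree exposes it under this namespace URI.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"


def read_entry(entry):
    """Collect the lemma form, part of speech and translations of one <entry>."""
    form = entry.find("form[@type='lemma']")
    translations = {
        cit.get(XML_NS + "lang"): cit.findtext("quote")
        for cit in entry.iterfind("sense/cit[@type='translation']")
    }
    return {
        "id": entry.get(XML_NS + "id"),
        "type": entry.get("type"),
        "orth": form.findtext("orth"),
        "oref": form.findtext("oRef"),
        "pos": entry.findtext("gramGrp/pos"),
        "translations": translations,
    }


entry = read_entry(ET.fromstring(ENTRY_XML))
print(entry["orth"], entry["pos"], entry["translations"]["en"])
```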
Web interface
In practice, users interact with the dictionary through a browser
XML files compiled to SQL database (work by E. Manning)
Entries linked to Coptic Scriptorium’s ANNIS search engine (Krause & Zeldes 2016)
Automatic generation of ANNIS Query Language (AQL) to find attestations -> generating correct query non-trivial!
SQL database enriched with frequency information via ANNIS REST API -> continuous update of frequency info
Scriptorium NLP tools (Zeldes & Schroeder 2016) resolve inflected forms on the fly
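The compile-and-enrich step might look roughly like the sketch below, which uses an in-memory SQLite database. The table layout (word, pos, lemma, freq) follows the columns shown on the corpus linkup slides; the table name, the helper `build_db`, and the `CORPUS_COUNTS` dictionary are assumptions for illustration. In particular, the dictionary stands in for counts that the real system obtains from the ANNIS REST API, which is not called here.

```python
import sqlite3

# Rows use the columns shown on the corpus linkup slides: word, pos, lemma.
ENTRIES = [
    ("ⲕⲱⲧⲉ", "V", "ⲕⲱⲧⲉ"),
    ("ⲕⲱⲧⲱⲛ", "N", "ⲕⲱⲧⲱⲛ"),
    ("ⲕⲱⲧ", "V", "ⲕⲱⲧ"),
    ("ⲕⲱⲱⲃⲉ", "V", "ⲕⲱⲱⲃⲉ"),
]

# Stand-in for corpus counts (values copied from the slides, not fetched live).
CORPUS_COUNTS = {"ⲕⲱⲧⲉ": 195, "ⲕⲱⲧ": 118, "ⲕⲱⲱⲃⲉ": 2}


def build_db(entries, counts):
    """Load entries into SQLite, then enrich them with lemma frequencies."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entries (word TEXT, pos TEXT, lemma TEXT, freq INTEGER DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO entries (word, pos, lemma) VALUES (?, ?, ?)", entries
    )
    # Lemmas without a corpus count keep the default frequency of 0.
    conn.executemany(
        "UPDATE entries SET freq = ? WHERE lemma = ?",
        [(n, lemma) for lemma, n in counts.items()],
    )
    conn.commit()
    return conn


conn = build_db(ENTRIES, CORPUS_COUNTS)
rows = conn.execute("SELECT word, freq FROM entries ORDER BY freq DESC").fetchall()
for row in rows:
    print(row)
```

Re-running the update step with fresh counts is one simple way to keep frequency information current as the corpora grow.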
Quick search: automatic language recognition, regex
Virtual keyboard
Dialect sigla from XML (partial coverage)
Mapping to POS tags used by NLP tools
Entry navigation
Related (multiword expression) entry in lexicon
Link to scanned online edition of Crum’s dictionary
Corpus frequencies
Corpus search
Corpus linkup – CDO->CS (lemma)
word pos lemma freq rank
ⲕⲱⲧⲉ V ⲕⲱⲧⲉ 195 3.96
ⲕⲱⲧⲱⲛ N ⲕⲱⲧⲱⲛ 0 0
ⲕⲱⲧ V ⲕⲱⲧ 118 2.4
ⲕⲱⲫⲟⲛ N ⲕⲱⲫⲟⲛ 0 0
ⲕⲱⲱⲃⲉ V ⲕⲱⲱⲃⲉ 2 0.04
…
AQL: lemma="ⲕⲱⲧ"
Corpus linkup – CDO->CS (MWE)
<oRef> ⲣ ⲡ ⲕⲉ- </oRef>
AQL: norm="ⲣ" . norm="ⲡ" . norm="ⲕⲉ"
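A minimal sketch of how the two query shapes shown so far could be generated. The function names are hypothetical, and stripping a trailing hyphen from an `<oRef>` part is an assumption inferred from the example (ⲕⲉ- in the entry, ⲕⲉ in the query); the actual CDO code may differ.

```python
def aql_for_lemma(lemma):
    """Single-word entry: search the lemma annotation directly."""
    return f'lemma="{lemma}"'


def aql_for_mwe(oref):
    """Multiword entry: one norm node per <oRef> part, chained with '.'.

    Assumption: a trailing hyphen marks a bound form and is dropped
    before the part is used as a search value.
    """
    parts = [part.rstrip("-") for part in oref.split()]
    return " . ".join(f'norm="{part}"' for part in parts)


print(aql_for_lemma("ⲕⲱⲧ"))
print(aql_for_mwe(" ⲣ ⲡ ⲕⲉ- "))
```

For the two inputs above, the output matches the queries shown on the lemma and MWE slides.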
Corpus linkup – CDO->CS (unknown)
AQL: norm_group= /.*ϫⲉⲙⲡⲓⲣⲏϯ.*/ | orig_group= /.*ϫⲉⲙⲡⲓⲣⲏϯ.*/
ϫⲉⲙ(ⲡⲓ)ⲣⲏϯ ‘find means’
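For entries with no matching lemma, the fallback query above could be built along these lines. This is a sketch under two assumptions: parentheses in a headword mark an optional element and are simply dropped (as the relation between ϫⲉⲙ(ⲡⲓ)ⲣⲏϯ and the query string suggests), and the form contains no characters that are special in regular expressions. The function name is hypothetical.

```python
def aql_fallback(headword):
    """Substring search over both bound-group layers for an unlemmatized form.

    Assumptions: parentheses are dropped so the full form is searched, and
    the headword contains no regex metacharacters that would need escaping.
    """
    form = headword.replace("(", "").replace(")", "")
    pattern = f"/.*{form}.*/"
    return f"norm_group={pattern} | orig_group={pattern}"


print(aql_fallback("ϫⲉⲙ(ⲡⲓ)ⲣⲏϯ"))
```

Searching whole bound groups with a wildcard pattern trades precision for recall, which seems a reasonable choice when no lemma annotation is available to anchor the query.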
Corpus linkup – CS->CDO (lemma only)
lemma links
Evaluation
Lexicon coverage based on 80K running lexical items in Scriptorium corpora (~75K excl. punctuation, lacunae)
Main issues:
Productive morphology -> rely on NLP tools?
Foreign vocabulary -> cover using DDGLC data
              tokens                      types
              covered   total    %        covered   total   %
all lemmas    65,347    74,744   87.43    1,079     2,602   41.47
names ok      66,273    74,744   88.67    1,264     2,602   48.58
foreign ok    71,763    74,744   96.01    1,976     2,602   75.94
both ok       72,689    74,744   97.25    1,991     2,602   76.52
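The percentages follow directly from the covered and total counts. The short sketch below recomputes them from the figures in the table; the variable and function names are illustrative only.

```python
# (covered tokens, covered types) per condition, copied from the table.
ROWS = {
    "all lemmas": (65347, 1079),
    "names ok": (66273, 1264),
    "foreign ok": (71763, 1976),
    "both ok": (72689, 1991),
}
TOTAL_TOKENS, TOTAL_TYPES = 74744, 2602


def coverage(covered, total):
    """Percentage of items covered, rounded to two decimals."""
    return round(100 * covered / total, 2)


results = {
    label: (coverage(tok, TOTAL_TOKENS), coverage(typ, TOTAL_TYPES))
    for label, (tok, typ) in ROWS.items()
}
for label, (token_pct, type_pct) in results.items():
    print(f"{label:<11} tokens {token_pct:5.2f}%  types {type_pct:5.2f}%")
```

The gap between token and type coverage reflects that uncovered items are mostly rare: many distinct types are missing, but they account for few running tokens.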
Outlook - WIP
Current work on integrating DDGLC Greek vocabulary
coming very soon: ~3900 new entries!!
Corpus links:
Add MWE links using new annotation layer in corpora
Add links to sub-word morphological analysis for productive derivations
Exploratory tools for frequency and vocabulary lists
Distribution and similarity metrics
Ⲙⲓⲱⲧ︤ⲛ︥ ⲧⲱⲛⲟⲩ!
mjo:tn̩ to:nu
well-being+your.PL greatly
This work was supported by joint funding from the National Endowment for the Humanities Office of Digital Humanities and a bilateral NEH (HG-229371) / DFG project (273503199).
We also thank the Berlin-Brandenburg Academy of Sciences (BBAW), the Göttingen Academy of Sciences and Humanities, and in particular the Digital Edition of the Coptic Old Testament Project, as well as the DDGLC project at Leipzig and FU Berlin for their contributions to the lexicon.
We thank Sonja Dahlgren, Julien Delhez, Lena Krastel, Tonio Sebastian Richter and Anne Sörgel, who contributed to compiling the lexical data, and Mitchell Abrams for contributions to the Web interface.