1
The Digital du Cange:
Moldy Old Tomes Make an Internet Comeback
Andrew Gollan and Ross Scaife
Modern and Classical Languages, Literatures, and Cultures
February 23, 2005
3
Major Periods of Latin Literature
Archaic: down to c. 100 BCE
Classical: c. 100 BCE to c. 200 CE
Late Antique: c. 200 CE to 500 CE
Medieval: c. 500 CE to c. 1400 CE
Renaissance (Humanistic): c. 1400 CE to c. 1700
Neolatin: c. 1700 CE to present
4
Some Lexica for Archaic and Classical Latin
Thesaurus linguae latinae (TLL). 1900+
Forcellini. Totius latinitatis lexicon. 1755-1887
Lewis and Short. 1879
Oxford Latin Dictionary. 1968-1982
5
Some Lexica for Late Antique and Medieval Latin
Souter: A glossary of Later Latin to 600 A.D. 1964
C. du Fresne, seigneur Du Cange. Glossarium ad scriptores mediae et infimae latinitatis, 10 vols. 1678-1887
J.F. Niermeyer. Mediae latinitatis lexicon minus.... 1964-76
6
Some Lexica for Humanistic and Neo-Latin
Egger (Vatican): Lexicon recentis Latinitatis: a dictionary of contemporary Latin. 1992
Hoven. Lexique de la prose latine de la renaissance. 1994
7
Problems with the status quo
Physical access extremely limited: rare and expensive resources
Thesaurus Linguae Latinae (TLL)
  moving at a snail’s pace
  inadequate application of computers
Copyright restrictions vs. open access
Desirability of corpus-based approach
8
Latin on line so far
What about all the databases Young Library subscribes to?
David Packard’s PHI: more promising
Perseus Latin corpus: available but limited
Lewis and Short
  Available in TEI-XML from Perseus
  Limited electronic extensions in place
9
Our goals and principles
Incremental approach adding value at each level
Push the envelope technically in how one goes about lexicography
Network effect: leverage the distributed community
Open Access licensing of all data (Creative Commons)
10
Where we are now: Stage I
Indexed page images (scans, OCR, and CGI)
Complete transcriptions with simple markup
Gradually more elaborate markup of original lexica
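
To make "indexed page images" concrete, here is a minimal sketch of how a headword lookup over scanned pages might work. It assumes an index that records the first headword on each scanned page; the index contents, file names, and the function `page_for` are all hypothetical, and plain string order is only an approximation of how a printed lexicon actually alphabetizes.

```python
from bisect import bisect_right

# Hypothetical index: first headword on each scanned page, in page order.
# File names and headwords are invented for illustration.
PAGE_INDEX = [
    ("a", "p0001.png"),
    ("abbas", "p0002.png"),
    ("acceptor", "p0003.png"),
    ("baro", "p0004.png"),
]


def page_for(headword: str) -> str:
    """Return the scan whose headword range should contain `headword`."""
    firsts = [first for first, _ in PAGE_INDEX]
    # Rightmost page whose first headword sorts at or before the query.
    pos = bisect_right(firsts, headword.lower()) - 1
    return PAGE_INDEX[max(pos, 0)][1]


print(page_for("abbatia"))
```

A CGI script could then serve the returned image, so the scans become usable before any full transcription exists.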
11
What we want to do: Stage II
Merge the parallel lemmata in the separate lexica: this is the hard part!
  Concatenate lemmata
  Map common citations
  Apply WordNet or similar approach to group semantically related definitions
Create interface that allows humans to clean up the resulting mess
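
The first two merging steps (concatenating lemmata and mapping common citations) can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's method: the entry tuples, lexicon names, citations, and definitions are placeholders, and the spelling normalization (folding case, j/i, and v/u) is one simple guess at how parallel lemmata could be matched. The semantic grouping step is not attempted here.

```python
from collections import defaultdict

# Placeholder entries: (lexicon, lemma, definition, citations).
# All values are invented for illustration.
ENTRIES = [
    ("Lexicon A", "Juvenis", "young person (placeholder)",
     ["Auctor X 1.2", "Auctor Y 3.4"]),
    ("Lexicon B", "iuvenis", "a youth (placeholder)", ["Auctor X 1.2"]),
    ("Lexicon B", "abbas", "abbot (placeholder)", ["Auctor Z 5.6"]),
]


def normalize(lemma: str) -> str:
    """Fold case and j/i, v/u variants so parallel lemmata share one key."""
    return lemma.strip().lower().replace("j", "i").replace("v", "u")


def merge(entries):
    """Concatenate entries per normalized lemma and record who cites what."""
    merged = defaultdict(
        lambda: {"senses": [], "citations": defaultdict(set)}
    )
    for lexicon, lemma, definition, citations in entries:
        slot = merged[normalize(lemma)]
        slot["senses"].append((lexicon, definition))
        for cit in citations:
            slot["citations"][cit].add(lexicon)
    return merged


def shared_citations(slot):
    """Citations that appear under this lemma in more than one lexicon."""
    return sorted(c for c, lexica in slot["citations"].items()
                  if len(lexica) > 1)


merged = merge(ENTRIES)
print(shared_citations(merged["iuuenis"]))
```

Citations shared across lexica are a useful signal that two definitions describe the same sense, which is where the human clean-up interface would take over.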
12
What we want to do: Stage III
Implement methods for collaboration on extending this resource
Multiple perspectives: from a single source, generate thesauri, lexica for particular authors or genres or locales or periods
Die happy.
13
UK: a good locale for this work
Strong interest in post-Classical Latin among faculty and graduate students
History of projects in humanities computing related to classical languages and cultures
14
A propitious moment…
Computational humanists standardizing on XML (mostly TEI) markup of texts
Powerful tools emerging for working with XML, e.g. editors and XML-aware databases
Also: important recent advances in computational linguistics (machine learning systems)
OCR getting much better