CHLT Integration
Integration in two directions:
Interoperability with the indexing structures of the Perseus Digital Library
Integration of parsers into the indexing module of the search and visualization tool
Integration with the Structure of the Perseus Digital Library
The Perseus text display system transforms XML and legacy SGML files, tagged according to an arbitrary DTD, and creates a consistent set of core data files that can be read by any application:
Sentences
Chunks
Lemmatized
Inflected
Catalog of works (PTEXT DB)
Morphological databases
Short definitions
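The step that turns tagged source files into sentence and chunk surrogates can be sketched in Python. This is a minimal sketch, not the PDL implementation: it assumes well-formed, un-namespaced XML (legacy SGML would first need conversion, since ElementTree parses only XML) in which each chunk is a non-nested `<div>` with an `n` attribute. The naive punctuation-based sentence splitter, the `Sentence` record, and the `chunk.ordinal` sentence-ID scheme are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Sentence:
    chunk_id: str     # hypothetical: taken from the chunk's n="..." attribute
    sentence_id: str  # hypothetical scheme: "<chunk_id>.<ordinal>"
    text: str


# Naive splitter (an assumption): break after ".", ";", "?" or "!" plus whitespace.
# ";" also covers the Greek question mark when it is encoded as a semicolon.
_SENTENCE_END = re.compile(r"(?<=[.;?!])\s+")


def extract_sentences(xml_text: str, chunk_tag: str = "div") -> list[Sentence]:
    """Split each (non-nested) chunk element into sentence records."""
    root = ET.fromstring(xml_text)
    records: list[Sentence] = []
    for chunk in root.iter(chunk_tag):
        chunk_id = chunk.get("n", "?")
        # itertext() flattens inline markup; split/join normalizes whitespace
        text = " ".join("".join(chunk.itertext()).split())
        for i, part in enumerate(_SENTENCE_END.split(text), start=1):
            if part:
                records.append(Sentence(chunk_id, f"{chunk_id}.{i}", part))
    return records
```

Because each record carries both its chunk ID and its own sentence ID, later tools can cite or link a sentence without reopening the source file.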
File Locations
The surrogate files are written to a location that is associated with the unique ID assigned to the document in the PDL.
Each chunk or sentence also has a unique identifier.
These two pieces of information can be used:
To generate URLs that give access to the full text in the DL
To generate human-readable citations of the sentences according to scholarly conventions
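One way the document ID and the chunk identifier could be combined is sketched below. Everything concrete here is a hypothetical placeholder rather than an actual PDL convention: the storage root, the one-directory-per-document layout, the `example.org` base URL, its query parameters, and the citation format.

```python
from pathlib import PurePosixPath
from urllib.parse import urlencode

# Hypothetical placeholders, not the real PDL locations.
SURROGATE_ROOT = PurePosixPath("/data/surrogates")
TEXT_BASE_URL = "https://example.org/pdl/text"


def surrogate_path(doc_id: str, kind: str) -> PurePosixPath:
    """Where one surrogate file (e.g. kind='sentences') for a document lives."""
    # One directory per document, named after its unique ID;
    # ':' is replaced so the ID is safe as a directory name.
    return SURROGATE_ROOT / doc_id.replace(":", "_") / f"{kind}.tsv"


def full_text_url(doc_id: str, chunk_id: str) -> str:
    """URL pointing at one chunk of the full text (query layout is assumed)."""
    return f"{TEXT_BASE_URL}?{urlencode({'doc': doc_id, 'chunk': chunk_id})}"


def citation(author: str, work: str, chunk_ref: str) -> str:
    """Citation in the common 'Author Work book.line' style."""
    return f"{author} {work} {chunk_ref}"
```

For example, `citation("Verg.", "A.", "1.1")` returns `"Verg. A. 1.1"`. The point of the design is that no tool needs to store paths or URLs: all three are derived from the two identifiers.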
WP2 Integration: Word Profile Tool
The Word Profile tool reads the lemmatized files to acquire a complete list of the words in the IGL corpus.
All frequency counts, display sentences, human-readable citations, and links to full text are based on surrogate files generated by the PDL.
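A minimal sketch of how frequency counts and display-sentence lists might be derived from a lemmatized surrogate file. The tab-separated layout (sentence ID, inflected form, lemma) is an assumption for illustration; the slides do not describe the actual PDL file format.

```python
import csv
import io
from collections import Counter


def word_profile(lemmatized_tsv: str) -> tuple[Counter, dict[str, list[str]]]:
    """Lemma frequencies plus, per lemma, the sentences to display and cite.

    Assumed (hypothetical) layout: one token per line,
    sentence_id <TAB> inflected form <TAB> lemma.
    """
    counts: Counter = Counter()
    sentences: dict[str, list[str]] = {}
    reader = csv.reader(io.StringIO(lemmatized_tsv), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:  # tolerate blank lines
            continue
        sentence_id, _form, lemma = row
        counts[lemma] += 1
        seen = sentences.setdefault(lemma, [])
        if sentence_id not in seen:  # keep each sentence once, in file order
            seen.append(sentence_id)
    return counts, sentences
```

The sentence IDs returned per lemma are exactly what the citation and URL helpers of the previous slide would need, so the profile never has to touch the original XML.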
WP2 Integration: Multi-Lingual IR Tool
The author and language selection routines in the MLIR tool are dynamically generated from the PDL metadata catalog.
A database of translation equivalents is created directly from the SGML/XML source and saved as a core data file that is available to other applications in the system.
The Translation Equivalence Program works with any TEI-conformant dictionary. The dictionary selection screen updates dynamically.
The translated query is handed off to the current PDL search engine and to the visualization tool through documented APIs.
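A sketch of how a translation-equivalents table might be built from a TEI dictionary and then used to translate an English query. It assumes un-namespaced XML in which each `<entry>` has a `<form><orth>` headword and marks English equivalents with `<cit type="translation"><quote>`, one TEI P5 pattern; real dictionaries, and TEI P5 files using the TEI namespace, may need different element paths. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_equivalents(tei_xml: str) -> dict[str, set[str]]:
    """Index each English equivalent to the headwords it translates."""
    index: defaultdict[str, set[str]] = defaultdict(set)
    root = ET.fromstring(tei_xml)
    for entry in root.iter("entry"):
        orth = entry.find("form/orth")
        if orth is None or not orth.text:
            continue  # skip entries without a usable headword
        headword = orth.text.strip()
        for quote in entry.iterfind(".//cit[@type='translation']/quote"):
            if quote.text:
                index[quote.text.strip().lower()].add(headword)
    return dict(index)


def translate_query(query: str, equivalents: dict[str, set[str]]) -> list[str]:
    """Replace each English query word with every matching headword."""
    terms: list[str] = []
    for word in query.lower().split():
        terms.extend(sorted(equivalents.get(word, set())))
    return terms
```

Because the table depends only on the TEI markup, pointing `build_equivalents` at a different conformant dictionary is enough to support another language, which is consistent with a dictionary selection screen that updates dynamically.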
WP4 Integration: Old Norse Text and Parser
Middleware translates the Old Norse parser output into the format used by the PDL.
ISO language tags in the texts tell the system to use Old Norse morphology and to link to the Old Norse lexicon.
The PDL short definition program automatically extracts information from Zoega.
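The middleware and the language-tag dispatch might look like this in outline. The parser output format (`form<TAB>lemma<TAB>dot-separated tags`), the feature abbreviations, the `Analysis` record, and the resource names are hypothetical; only `non`, the ISO 639-2 code for Old Norse, and `la` for Latin are real language codes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    """Hypothetical PDL-side morphology record."""
    form: str
    lemma: str
    lang: str
    features: tuple[str, ...]


# Hypothetical mapping from parser abbreviations to PDL feature names.
FEATURE_MAP = {"n": "noun", "v": "verb", "m": "masc", "f": "fem", "nom": "nom", "sg": "sg"}


def convert_parser_line(line: str, lang: str = "non") -> Analysis:
    """Translate one line of (assumed) parser output: form<TAB>lemma<TAB>tags."""
    form, lemma, tags = line.rstrip("\n").split("\t")
    # unknown abbreviations pass through unchanged
    features = tuple(FEATURE_MAP.get(t, t) for t in tags.split("."))
    return Analysis(form, lemma, lang, features)


# Language tag -> resources; 'non' is the ISO 639-2 code for Old Norse.
# The resource names on the right are hypothetical identifiers.
RESOURCES = {
    "non": {"morphology": "old-norse-parser", "lexicon": "zoega"},
    "la": {"morphology": "latin-parser", "lexicon": "latin-lexicon"},
}


def resources_for(xml_lang: str) -> dict[str, str]:
    """Choose morphology and lexicon from a text's xml:lang value."""
    primary = xml_lang.split("-")[0].lower()  # ignore subtags: 'la-x-neolat' -> 'la'
    return RESOURCES[primary]
```

Keeping the translation in a separate layer means the Old Norse parser itself does not need to know anything about PDL formats.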
WP4 & 6: Corpus Integration
TEI makes corpus integration easy: the Old Norse texts and lexica and the Neo-Latin texts are tagged according to TEI standards.
Documentation of tagging conventions.
Parser Integration with WP1
Similar middleware can link LemLat to the PDL.
The WP1 visualization tool also includes a parsing/stemming step. This program is designed to work generally with many systems, not simply those created by the PDL.
Source code for LemLat and the Old Norse parser, so that the search/visualization tool can be used to search Old Norse and Latin texts that are not part of the PDL
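Because the visualization tool's parsing/stemming step is meant to work with many systems, one plausible design is a small adapter interface that LemLat, the Old Norse parser, or a fallback stemmer could each implement. The sketch below assumes such an interface (`Lemmatizer.lemmas`); all class names are hypothetical, and the dictionary-backed adapter only stands in for a real parser, whose interface the slides do not describe.

```python
from collections import defaultdict
from typing import Protocol


class Lemmatizer(Protocol):
    """Hypothetical interface every parser or stemmer adapter provides."""

    def lemmas(self, token: str) -> list[str]: ...


class TableLemmatizer:
    """Dictionary-backed stand-in for a real parser such as LemLat."""

    def __init__(self, table: dict[str, list[str]]):
        self.table = table

    def lemmas(self, token: str) -> list[str]:
        # unknown forms fall back to the lowercased surface form
        return self.table.get(token.lower(), [token.lower()])


class SuffixStemmer:
    """Fallback for languages without a parser: strip the longest known suffix."""

    def __init__(self, suffixes: list[str], min_stem: int = 2):
        self.suffixes = sorted(suffixes, key=len, reverse=True)
        self.min_stem = min_stem

    def lemmas(self, token: str) -> list[str]:
        word = token.lower()
        for suffix in self.suffixes:
            if word.endswith(suffix) and len(word) - len(suffix) >= self.min_stem:
                return [word[: -len(suffix)]]
        return [word]


def build_index(sentences: dict[str, str], lemmatizer: Lemmatizer) -> dict[str, set[str]]:
    """Map each lemma or stem to the IDs of the sentences containing it."""
    index: defaultdict[str, set[str]] = defaultdict(set)
    for sentence_id, text in sentences.items():
        for token in text.split():
            for lemma in lemmatizer.lemmas(token.strip(".,;:!?")):
                index[lemma].add(sentence_id)
    return dict(index)
```

Swapping `TableLemmatizer` for `SuffixStemmer` (or for a real parser adapter) requires no change to `build_index`, which is what lets the same indexing step serve texts outside the PDL.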
Next Steps:
Implementation of parser integration with WP1
Seamless integration of the MLIR tool and production deployment
Improved documentation of tags required for OAI linking