ENABLER/ELSNET Workshop, 28-29 August 2003
An Ontology-Based Knowledge Portal for Language Technology
Hans Uszkoreit, Brigitte Jörg, Gregor Erbach
Project COLLATE
Theme: Computational Linguistics and Language Technology for Real World Applications
Partners: DFKI Saarbrücken, Saarland University
Support: A grant by the German Federal Ministry for Education and Research for RTD strengthening the position of Saarbrücken as a Competence Center for Language Technology
PIs: Hans Uszkoreit, Manfred Pinkal and Wolfgang Wahlster
Duration: Spring 2001 - end of 2003
Information Service about Language Technology
www.lt-world.org
Ontology-based
XML Import and Export Formats
Visual and Structural Design
Information Center: LT World
Objectives
distributed information service
  combines and offers for each aspect of LT the best contents available
  exploits hypermedia technology for including useful contents
  is flexible and scalable enough to support the evolution of the discipline
  exhibits a structure that is transparent for both experts and visitors from outside the field
  increasingly utilizes language and knowledge technologies for improved management and presentation of the information
  is open for exchange of data with other information services
  potential for interoperability with future knowledge services
  is suited for the sophisticated metadata schemes of the envisaged semantic web
LT World - Levels and Tasks
[Diagram]
Structures: underlying logical structure, data maintenance structure, presentational structure
Levels: Conceptual Level, Specification Level, Technical Realization Level, Content Level
Cell labels: ontology specifications; concrete architecture; XML specifications; DBs, XML pages, HTML pages; generic design; CI; actual design of pages; selection of sources; organization of collection/production; content in DBs, documents, links; presented contents
User View: Four Top Level Areas
Information and Knowledge
Players and Teams
Resources and Results
Communication/Interaction
Information and Knowledge
Basic knowledge about all areas of LT
source: Survey of the State of the Art in Human Language Technology (1997, new edition in preparation)
Pointers to specialized knowledge (links to literature, projects, systems, products, people, resources, standards, ...)
source: link collection by DFKI
Glossary of the field
source: DFKI with input from HLT Survey
Players and Teams
DB with all researchers in LT
names, affiliations, links to homepages
number of entries: 2235
DB of projects
number of entries: 659
DB of research organisations, companies, funding agencies
number of entries: 1561
Resources and Results
DB of prototypes, research systems and products
source: ACL Software Registry (operated by DFKI)
Links to resource initiatives: ELRA, LDC,
For resources link to search service of OLAC
Communication/Interaction
News about technologies, people, products, centers, etc.
source: collection by DFKI and contributions by users
number of entries: 370
List of Events: Conferences, Workshops, Summer Schools, etc.
source: collection by DFKI and contributions by users
number of entries: 251
Links to Topic-Centered Mailing Lists
source: collection of existing lists
Usage of LT World
Month   Unique visitors   Number of visits   Pages    Hits     Bandwidth
Jan     795               1502               15185    33635    171.58 MB
Feb     808               1443               12127    28622    140.89 MB
Mar     1036              1780               15751    41622    199.38 MB
Apr     989               1778               17994    47452    231.71 MB
May     1006              1922               16143    44624    180.93 MB
Jun     944               1963               18458    42912    237.69 MB
Jul     912               2103               16066    41712    208.18 MB
…       --                --                 --       --       --
Total   6496              12499              111745   280600   1.34 GB
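As a rough reading of the usage table, the following sketch derives pages per visit for each listed month and sums the columns. The figures are copied from the table; the names USAGE, pages_per_visit and column_sums are illustrative only. The elided row is not included, so the sums come out slightly below the Total row.

```python
# Monthly figures copied from the usage table:
# (unique visitors, number of visits, pages, hits)
USAGE = {
    "Jan": (795, 1502, 15185, 33635),
    "Feb": (808, 1443, 12127, 28622),
    "Mar": (1036, 1780, 15751, 41622),
    "Apr": (989, 1778, 17994, 47452),
    "May": (1006, 1922, 16143, 44624),
    "Jun": (944, 1963, 18458, 42912),
    "Jul": (912, 2103, 16066, 41712),
}


def pages_per_visit(usage):
    """Return the average number of pages viewed per visit for each month."""
    return {
        month: round(pages / visits, 2)
        for month, (_visitors, visits, pages, _hits) in usage.items()
    }


def column_sums(usage):
    """Sum each column over the listed months (the elided row is not included)."""
    return tuple(sum(column) for column in zip(*usage.values()))


if __name__ == "__main__":
    for month, ratio in pages_per_visit(USAGE).items():
        print(month, ratio)
    print("sums over listed months:", column_sums(USAGE))
```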
Systematics of the Discipline
Mature scientific or engineering disciplines have developed a systematics of the subject
Younger disciplines have outgrown their first systematics
LT or CL does not yet have a systematics or a classification scheme
Logical Structuring: Two Options
Tree-Structured Classification
Libraries
Encyclopedias and Handbooks
Multidimensional Structuring
Multiple-Inheritance Hierarchies
And-Or Hierarchies
Means for Ordering
Terminology
Thesaurus
Classification vs. Systematics
Taxonomy = Classification + Nomenclature
Ontology
  formal ontology
  relational ontology
Our Setup
Immediately visible structure: easy and transparent
Some multidimensional structuring through chapter structure of the Survey
For internal storage and DB search: complex multidimensional structure
Underlying systematics: multilayered and multidimensional ontology
Ontologies
Theoretical Ontologies
  Epistemological reasons
  Phenomenological systematics
Practical Ontologies
  Support of processes
  Data Maintenance
  Information Services
Systematics/Ontologies
Generic Core: Dublin Core
Special Ontologies underlying exchange formats for special information types such as
  OLAC (for linguistic resources)
  BibTeX (for scientific literature)
  Languages (for language codes)
Generic ontologies for the scientific discipline and technology sector
General Multidimensional Classification for CL and LT
Science: Actor, Subject, NewKnowledge, (Scientific) Means
Research: Actors, Subject, ResearchGoals, Means
ResearchProject: Actors, Subject, ResearchGoals, Means, Duration
Applied Science: Actor, Subject, NewKnowledge, Means, Applications
Applied Research: Actors, Subject, ResearchGoals, Methods, Applications
Applied ResearchProject: Actors, Subject, ResearchGoals, Methods, Duration, Applications
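The slot lists above suggest a multiple-inheritance hierarchy: an applied research project carries both the Duration slot of a research project and the Applications slot of applied research. A minimal Python sketch of the Research branch follows. The class and field names are illustrative, not taken from the LT World ontology, and the sketch assumes that "Methods" fills the same slot as "Means".

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class Research:
    # Slots shared by all research concepts on the slide.
    actors: List[str] = field(default_factory=list)
    subject: str = ""
    research_goals: List[str] = field(default_factory=list)
    means: List[str] = field(default_factory=list)  # assumption: also covers "Methods"


@dataclass
class ResearchProject(Research):
    duration: str = ""  # added by the project dimension


@dataclass
class AppliedResearch(Research):
    applications: List[str] = field(default_factory=list)  # added by the applied dimension


@dataclass
class AppliedResearchProject(ResearchProject, AppliedResearch):
    """Inherits Duration from one parent and Applications from the other."""


def slot_names(cls):
    """Return the names of all slots a class defines or inherits."""
    return [f.name for f in fields(cls)]


if __name__ == "__main__":
    print(slot_names(AppliedResearchProject))
```

A tree-structured classification would force the class under a single parent, so one of the two slots would have to be redeclared by hand.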
Funded Research Project
Name: Acronym, Full Name
Actors: Organizations, PI, Other Roles, Researchers
Subject: Discipline/Area
Objectives: Goals, Means, Program
Duration: StartDate, EndDate
Funding: Agency, Program, Funding Number
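To illustrate how such a slot structure supports data maintenance, the sketch below fills a record with the facts stated on the Project COLLATE slide and lists the slots that remain empty. The field names and the missing_slots helper are hypothetical, and the "Other Roles", "Means" and objectives "Program" slots are left out for brevity.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class FundedResearchProject:
    # Hypothetical field names loosely following the slots on the slide.
    acronym: str
    full_name: Optional[str] = None
    organizations: List[str] = field(default_factory=list)
    pis: List[str] = field(default_factory=list)
    researchers: List[str] = field(default_factory=list)
    subject: Optional[str] = None
    goals: Optional[str] = None
    start_date: Optional[str] = None
    end_date: Optional[str] = None
    funding_agency: Optional[str] = None
    funding_program: Optional[str] = None
    funding_number: Optional[str] = None


def missing_slots(project):
    """Return the names of slots that have no value yet."""
    return [name for name, value in asdict(project).items() if value in (None, [], "")]


# Values taken from the Project COLLATE slide; everything else stays empty.
collate = FundedResearchProject(
    acronym="COLLATE",
    organizations=["DFKI Saarbrücken", "Saarland University"],
    pis=["Hans Uszkoreit", "Manfred Pinkal", "Wolfgang Wahlster"],
    subject="Computational Linguistics and Language Technology for Real World Applications",
    start_date="Spring 2001",
    end_date="end of 2003",
    funding_agency="German Federal Ministry for Education and Research",
)

if __name__ == "__main__":
    print(missing_slots(collate))
```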
[Diagram; node labels: Search, Science, Production, Technology, Education, ExtraScientific Purpose, Research, Scientific Education, Applied Research, ExtraScientific Purpose, Technical Product]
Multidimensional Classification for CL and LT
Dimensions
Generic:
  Type of Resource (web page, metaindex, publication, person, product, patent, project, ...)
  People
  Geolocation
  Date/Comments
Discipline-Specific (not all may apply for a given resource):
  Application (grammar checking, text translation, IR)
  Linguality (monolingual, bilingual, multilingual, translingual, language-independent)
  Languages/Language Pairs (Romanian, Thai, <en-fr>, ...)
  Technologies (HMM, FSA, EBT, linear programming, ...)
  Linguistic Area (morphology, syntax, pragmatics, ...)
  Linguistic Approach (Two-Level Morphology, systemic functional g., DRT)
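A multidimensional classification of this kind lends itself to faceted search: each resource carries a set of values per dimension, and a query constrains any subset of dimensions. The sketch below is an illustration only. The Babel values come from the RDF export shown elsewhere in this talk, the second entry is an invented placeholder, and the dimension keys and the search function are assumptions rather than the LT World implementation.

```python
from typing import Dict, List, Set

Resource = Dict[str, Set[str]]

RESOURCES: Dict[str, Resource] = {
    # Values for Babel follow the RDF export of the Babel system.
    "Babel": {
        "type": {"system"},
        "linguality": {"monolingual"},
        "languages": {"German"},
        "linguistic_area": {"syntax"},
        "linguistic_approach": {"HPSG"},
    },
    # Hypothetical entry, invented for illustration only.
    "hypothetical-mt-demo": {
        "type": {"system"},
        "application": {"text translation"},
        "linguality": {"bilingual"},
        "languages": {"en-fr"},
    },
}


def search(resources: Dict[str, Resource], **facets: str) -> List[str]:
    """Return names of resources matching every requested dimension value.

    A resource that lacks a dimension simply does not match a query on it,
    which reflects that not all dimensions apply to every resource.
    """
    hits = []
    for name, dimensions in resources.items():
        if all(value in dimensions.get(dim, set()) for dim, value in facets.items()):
            hits.append(name)
    return sorted(hits)


if __name__ == "__main__":
    print(search(RESOURCES, type="system"))
    print(search(RESOURCES, linguistic_area="syntax"))
```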
Excerpt from the Ontology
[Diagram; node labels:]
Dublin Core
OLAC
Languages
LT World
Language Technology
Technology
BibTeX
Information & Knowledge
Teams & Players
Systems & Resources
Communication & Events
Publications
Area Nodes
Example of the shallow hierarchy for technologies
Text Technologies
  ...
  Text Summarization
  ...
  Information Extraction
    • Named Entity Recognition
    • Terminology Extraction
    • Relation Extraction
    • Answer Extraction
  ...
  Text Generation
  ...
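The shallow hierarchy above can be held as a nested mapping, which makes it easy to recover the path from the top-level area to any node. The sketch below contains only the nodes named on the slide (the elided ones are omitted), and the names AREAS and path_to are illustrative.

```python
# Only the nodes named on the slide; elided siblings are left out.
AREAS = {
    "Text Technologies": {
        "Text Summarization": {},
        "Information Extraction": {
            "Named Entity Recognition": {},
            "Terminology Extraction": {},
            "Relation Extraction": {},
            "Answer Extraction": {},
        },
        "Text Generation": {},
    }
}


def path_to(tree, target, prefix=()):
    """Depth-first search returning the path of area names down to target."""
    for name, children in tree.items():
        path = prefix + (name,)
        if name == target:
            return path
        found = path_to(children, target, path)
        if found is not None:
            return found
    return None


if __name__ == "__main__":
    print(" > ".join(path_to(AREAS, "Relation Extraction")))
```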
Main Info for Each Subject Area
Name
Acronyms, aka's, Term Translations
Short Definition
Explanation
Topic Websites
R&D
Prototypes/Products
Projects
People
Literature
Ontology Modelling and Interchange Formats
Ontologies maintained with Protégé 2000
Ontology Modelling with Protégé
Export / Interchange Formats
Protégé: Class View
Protégé: Slot View
Protégé: Form View (Input-Configuration)
Protégé: Instance View (Input-Interface)
Protégé: RDF-Export
Instance of the Babel system
[Slide annotations: Attributes, Relations]
<LT:System rdf:about="&LT;LT_00398"
    LT:applications="Structure Building"
    LT:dc.coverage="66123 Saarbruecken"
    LT:dc.identifier="http://www.dfki.de/~stefan/Babel/e_babel.html"
    LT:lt.linguality="monolingual"
    LT:lt.linguistic_approach="HPSG"
    LT:lt.linguistic_area="syntax"
    LT:olac.type.functionality="Written Language"
    LT:olac.type.linguistic="HPSG"
    LT:resource.contact="[email protected]"
    LT:resource.homepage_url="http://www.dfki.de/~stefan/Babel/e_babel.html"
    LT:resource.name="Babel"
    LT:resource.type="system"
    LT:technological_method="Written Language"
    LT:type="system"
    rdfs:label="Babel">
  <LT:resource.description>Babel is a Prolog System with Web-Interface in Perl and Java. Its main purpose is the test of an HPSG grammar for German.</LT:resource.description>
  <LT:dc.language rdf:resource="&LT;English"/>
  <LT:lt.languages rdf:resource="&LT;German"/>
  <LT:dc.creator rdf:resource="&LT;LT_00399"/>
  <LT:developed-by rdf:resource="&LT;LT_00399"/>
  <LT:dc.rights rdf:resource="&LT;ont_051002_00178"/>
  <LT:developed-by rdf:resource="&LT;ont_051002_00209"/>
  <LT:olac.format.os>Windows 95</LT:olac.format.os>
  <LT:olac.format.os>Windows NT</LT:olac.format.os>
</LT:System>
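To show how such an export could be consumed, the sketch below parses a shortened, self-contained version of the Babel instance with Python's standard xml.etree.ElementTree and separates literal-valued slots (attributes) from slots that point at other instances (relations). The namespace URI for the LT prefix is not shown on the slide, so the one used here is a placeholder, and the entity references are written out in full; the function names are illustrative.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI for the LT prefix; the real one is not on the slide.
LT_NS = "http://example.org/lt-world#"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"

# Shortened version of the exported instance, with entity references expanded.
RDF_XML = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:rdfs="{RDFS_NS}" xmlns:LT="{LT_NS}">
  <LT:System rdf:about="{LT_NS}LT_00398"
      LT:lt.linguality="monolingual"
      LT:lt.linguistic_approach="HPSG"
      LT:lt.linguistic_area="syntax"
      LT:resource.name="Babel"
      rdfs:label="Babel">
    <LT:lt.languages rdf:resource="{LT_NS}German"/>
    <LT:dc.creator rdf:resource="{LT_NS}LT_00399"/>
    <LT:olac.format.os>Windows 95</LT:olac.format.os>
    <LT:olac.format.os>Windows NT</LT:olac.format.os>
  </LT:System>
</rdf:RDF>"""


def local_name(qualified):
    """Strip the '{namespace}' part that ElementTree puts in front of names."""
    return qualified.split("}", 1)[-1]


def split_instance(xml_text):
    """Return (attributes, relations) for the first LT:System in the document."""
    root = ET.fromstring(xml_text)
    system = root.find(f"{{{LT_NS}}}System")
    attributes, relations = {}, []
    # Literal slots written as XML attributes in the LT namespace.
    for name, value in system.attrib.items():
        if name.startswith(f"{{{LT_NS}}}"):
            attributes.setdefault(local_name(name), []).append(value)
    for child in system:
        target = child.get(f"{{{RDF_NS}}}resource")
        if target is not None:
            # Slot pointing at another instance: a relation.
            relations.append((local_name(child.tag), target.rsplit("#", 1)[-1]))
        else:
            # Literal slot written as an element; may occur several times.
            attributes.setdefault(local_name(child.tag), []).append((child.text or "").strip())
    return attributes, relations


if __name__ == "__main__":
    attrs, rels = split_instance(RDF_XML)
    print(attrs)
    print(rels)
```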
Organizational Issues
Division of Labour
In the beginning all contents and references were collected and maintained by DFKI
Input of the authors/area specialists of the Survey for distributed authoring and content maintenance
Input from the LT community via HTML forms and XML import format
News and conferences maintained and updated by DFKI
Relationships to External Resources
Included but autonomous resources: ACL NL Software Registry, Language Technology Survey
Systematically cross-linked and cross-searchable resources: all OLAC resources, such as LDC, SIL, ACL SR, and OLAC Home
Systematically cross-linked resources: HLT Central, ELSNET, EACL, ACL NLP Universe
Linked resources: all other resources relevant for LT