Post on 18-Dec-2015
Patrizia Paggio
Center for Sprogteknologi
A Modular and Scalable Environment for the Semantic Web
IT-højskolen 2
Goals
To develop an innovative methodological environment and software that will enable content providers to build a knowledge grid in which the content of Web pages can be managed in a modular and scalable way, and in which queries posed in natural language can extract relevant content from the grid based on the underlying ontologies
Testbed: a demonstrator to search university sites
Domain areas involved
Semantic Web
Ontology mapping
Knowledge management
Topic maps
Text and data mining
Intelligent agents
NL-based, intelligent content search
MOSES consortium
FINSA, Italian software company (agent technology, requirements engineering)
Mondeca, French software company (Knowledge management and semantic markup, graph theory)
Parabots, Dutch software company (text and data mining)
Rome III Univ. (user partner, graph theory)
Rome II Univ. (language technology, machine learning)
CST (language technology, content-based search)
MOSES consortium
(Diagram of partner roles: Users, Technology suppliers, Co-ordination, Research)
Planning
The semantic web
The present web is a collection of texts for humans to inspect and use.
On the semantic web, texts are structured (marked up) so that programs (agents) can manipulate them.
The semantic structure refers to common repositories e.g. ontologies.
A scenario
A student/researcher looking for information on university courses or research activities in Europe.
“I need a list of institutes offering post-graduate courses in computational linguistics including corpus linguistics where the teaching language is ...”
“Which Danish university offers Danish language courses for foreign students?”
Our vision
Content of web pages is structured according to relevant templates and ontologies (the project will create those relevant for the domain)
Help is provided by the system to find the templates that best match the pages to be marked up
Search is based not on the words in the text, but on the semantic templates
A linguistic agent processes the results to generate relevant answers
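The slides do not say how the system finds the templates that best match a page. The sketch below assumes one simple option: each template is described by a set of cue terms, and templates are ranked by the fraction of their cue terms that occur on the page. `TEMPLATES`, its term sets and `suggest_templates` are hypothetical names chosen for illustration.

```python
import re

# Hypothetical templates, each described by cue terms for its slots (assumption).
TEMPLATES = {
    "course": {"course", "credits", "semester", "teacher", "syllabus"},
    "research-group": {"project", "publications", "members", "research"},
}


def suggest_templates(page_text, templates):
    """Rank templates by the fraction of their cue terms found in the page."""
    words = set(re.findall(r"\w+", page_text.lower()))
    scores = {
        name: len(words & cues) / len(cues) for name, cues in templates.items()
    }
    # Best-matching template first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


page = "Course in corpus linguistics, 5 credits, autumn semester. Teacher: to be announced."
print(suggest_templates(page, TEMPLATES))
# → [('course', 0.8), ('research-group', 0.0)]
```

A richer matcher would also look at page structure, but cue-term coverage is enough to show the idea of ranking candidate templates for the person doing the markup.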
Our vision
“I need a list of institutes offering ...”
“The following institutes offer post-graduate courses in computational linguistics including corpus linguistics: ...”
“Which Danish university offers...”
“The University of Copenhagen offers Danish language courses for foreign students”
Main work packages
1. Requirements and domain analysis
2. Architecture design
3. Semantic structure and tools
4. Implementation of agents
5. Content-based engine
6. Test and validation
Query analyser
Investigate methods and develop tools to analyse user queries and convert them into semantic descriptions.
Based on a realistic corpus of questions/queries.
Use of shallow linguistic analysis.
Attention to specific linguistic items, e.g. interrogative pronouns.
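A minimal sketch of shallow query analysis, assuming a small hand-made lexicon: cue words such as interrogative pronouns fix the query type, and known words are mapped to ontology concepts. The cue table, the concept lexicon and `analyse_query` are hypothetical illustrations, not the project's actual analyser.

```python
import re

# Hypothetical cue words that fix the query type (e.g. interrogative pronouns).
QUERY_TYPE_CUES = {"which": "select", "who": "select", "list": "list-all"}

# Hypothetical lexicon from surface words to ontology concepts.
CONCEPT_LEXICON = {
    "university": "institution",
    "institutes": "institution",
    "courses": "course",
    "danish": "language:danish",
}


def analyse_query(text):
    """Shallow analysis: tokenise, detect the query type, collect known concepts."""
    tokens = re.findall(r"\w+", text.lower())
    query_type = next(
        (QUERY_TYPE_CUES[t] for t in tokens if t in QUERY_TYPE_CUES), "unknown"
    )
    # dict.fromkeys drops duplicate concepts while keeping first-seen order.
    concepts = list(
        dict.fromkeys(CONCEPT_LEXICON[t] for t in tokens if t in CONCEPT_LEXICON)
    )
    return {"type": query_type, "concepts": concepts}


print(analyse_query("Which Danish university offers Danish language courses for foreign students?"))
# → {'type': 'select', 'concepts': ['language:danish', 'institution', 'course']}
```

No parsing is involved: the analysis only looks at individual tokens, which is what makes it shallow and robust to ungrammatical queries.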
Multilingual search as ontological mapping
(Diagram: da_query_1 is analysed against da_ontology into da_analysed_query_1, which is used for search; mapping it onto it_ontology yields it_analysed_query_1, which is also used for search)
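The mapping step can be pictured as a lookup from concepts in the Danish ontology to concepts in the Italian one. This sketch assumes that an analysed query is a dictionary from role types to concepts, that the mapping is a flat dictionary, and that concepts without an entry (such as `italian`) are shared between the ontologies. `DA_TO_IT` and `map_query` are hypothetical; the pair `lektor` to `prof-associato` follows the later example in these slides, while `prof-ordinario` is invented.

```python
# Hypothetical concept mapping from the Danish to the Italian ontology.
DA_TO_IT = {"lektor": "prof-associato", "professor": "prof-ordinario"}


def map_query(analysed_query, mapping):
    """Replace each concept by its mapped equivalent; unmapped concepts pass through."""
    return {role: mapping.get(concept, concept) for role, concept in analysed_query.items()}


da_analysed_query_1 = {"instructor": "lektor", "subject": "italian", "time": "autumn-2003"}
it_analysed_query_1 = map_query(da_analysed_query_1, DA_TO_IT)
print(it_analysed_query_1)
# → {'instructor': 'prof-associato', 'subject': 'italian', 'time': 'autumn-2003'}
```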
Topic maps
Topics and associations
(Diagram: associations A1 and A2 connecting topics T1 to T6 through roles R1 to R7)
Topics and associations
Association Example
(Diagram: association A connecting the topics BV, CS and EM through the roles R1, R2 and R3)
“Bernard Vatant is the instructor of a tutorial about Content Structure Engineering held at Extreme Markup Languages 2002”
This association represents an assertion about three topics:
One person: “Bernard Vatant”
One space-time event: “Extreme Markup Languages 2002”
One concept: “Content Structure Engineering”
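One way to hold such an association in a program, assuming plain Python data classes: a topic has a name and a kind, and an association binds role types to topics. The class names, the association type `tutorial` and the role names `instructor`, `subject` and `event` are assumptions for illustration; the diagram itself only labels the roles R1 to R3, and this is not the data model of the Topic Maps standard.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Topic:
    name: str
    kind: str  # e.g. "person", "event", "concept"


@dataclass(frozen=True)
class Association:
    assoc_type: str
    roles: Tuple[Tuple[str, Topic], ...]  # (role type, topic playing that role)

    def player(self, role_type: str) -> Optional[Topic]:
        """Return the topic that plays the given role, or None."""
        for role, topic in self.roles:
            if role == role_type:
                return topic
        return None


# The slide's example; association type and role names are assumptions.
tutorial = Association(
    assoc_type="tutorial",
    roles=(
        ("instructor", Topic("Bernard Vatant", "person")),
        ("subject", Topic("Content Structure Engineering", "concept")),
        ("event", Topic("Extreme Markup Languages 2002", "event")),
    ),
)
print(tutorial.player("instructor").name)  # → Bernard Vatant
```

Keeping roles explicit means the same topic can appear in other associations under different roles, which is what lets a query ask "who plays the instructor role" rather than search for words.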
Example
“List alle lektorerne i italiensk i efteråret 2003”
(List all associate professors in Italian in autumn 2003)
list-all(x) [lektor(x), subject(italian), time(autumn-2003)]
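To show what evaluating this representation might mean, here is a toy evaluator over a hypothetical fact base. The course associations, the `IS_A` table, the instructor identifiers and `list_all` are all invented for illustration; role names follow the role types listed on the next slide.

```python
# Hypothetical toy fact base: course associations binding role types to values.
COURSE_ASSOCS = [
    {"instructor": "teacher_1", "subject": "italian", "time": "autumn-2003"},
    {"instructor": "teacher_2", "subject": "italian", "time": "spring-2004"},
    {"instructor": "teacher_3", "subject": "italian", "time": "autumn-2003"},
]
# Hypothetical classification of each instructor.
IS_A = {"teacher_1": "lektor", "teacher_2": "lektor", "teacher_3": "professor"}


def list_all(category, constraints, assocs, is_a):
    """Evaluate list-all(x) [category(x), role(value), ...] over the associations."""
    return sorted(
        {
            a["instructor"]
            for a in assocs
            if is_a.get(a["instructor"]) == category
            and all(a.get(role) == value for role, value in constraints.items())
        }
    )


print(list_all("lektor", {"subject": "italian", "time": "autumn-2003"}, COURSE_ASSOCS, IS_A))
# → ['teacher_1']
```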
Example, cont.
At = course-assoc
Rt1 = instructor
Rt2 = subject
Rt3 = institution
Rt4 = time
(Diagram: the role type instructor is specialised in the Italian ontology as professore ordinario, professore associato and ricercatore, and in the Danish ontology as professor, lektor and UA)
Example, cont.
list-all(x) [lektor(x), subject(italian), time(autumn-2003)]
is rewritten as
list-all(x) [instructor(x), subject(italian), time(autumn-2003)] OR list-all(x) [prof-associato(x), subject(italian), time(autumn-2003)]
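The rewriting above can be sketched as two substitutions on the head predicate: generalise it to its parent role type, and replace it with its equivalent concept in the other language. `PARENT`, `EQUIVALENT`, `expand` and `render` are hypothetical names, and the tables contain only the concepts that appear in the example.

```python
# Hypothetical ontology fragments containing only the concepts in the example.
PARENT = {"lektor": "instructor", "prof-associato": "instructor"}  # concept -> role type
EQUIVALENT = {"lektor": "prof-associato"}  # Danish concept -> Italian concept


def expand(query):
    """Return alternatives: head predicate generalised, then translated."""
    head, constraints = query
    alternatives = []
    if head in PARENT:
        alternatives.append((PARENT[head], constraints))
    if head in EQUIVALENT:
        alternatives.append((EQUIVALENT[head], constraints))
    return alternatives


def render(query):
    """Write a query in the slide notation list-all(x) [head(x), role(value), ...]."""
    head, constraints = query
    args = ", ".join(f"{role}({value})" for role, value in constraints)
    return f"list-all(x) [{head}(x), {args}]"


query = ("lektor", (("subject", "italian"), ("time", "autumn-2003")))
# Prints the two alternatives joined by OR, in the notation used on the slide.
print(" OR ".join(render(q) for q in expand(query)))
```

Constraints are kept as an ordered tuple of pairs so that the rendered text preserves the order used on the slide.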
Answer generation - example
“kurser i datalingvistik” (courses in computational linguistics)
...educational programme in computational linguistics, Göteborg University. A Swedish program offering bachelor's and master's degrees
...Lund University’s curriculum 2001-2002. Computational linguistics deal with automatic analysis of texts and other linguistic material...
(Result of a Google search: texts are not tagged with concepts! Bold face added to relevant information)
Answer generation, cont.
“I have found the following courses:” ...educational programme in computational linguistics, Göteborg University. A Swedish program offering bachelor's and master's degrees
...Lund University’s curriculum 2001-2002. Computational linguistics deal with automatic analysis of texts and other linguistic material...
The introductory sentence should be in the language of the query!
Answer generation, cont.
I have found the following courses:
Göteborg University, bachelor’s and master’s degrees + link
Lund University + link
Introductory sentence and relevant concepts (bachelor’s and master’s degrees) should be in the language of the query!
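A minimal sketch of language-dependent answer generation, assuming fixed phrase tables: the introductory sentence and the concept labels are chosen by the query language, while institution names are kept as they are. The phrase tables and their Danish wording, the concept key `bachelor-master`, the example.org URLs and `generate_answer` are assumptions for illustration.

```python
# Hypothetical phrase tables indexed by query language.
INTRO = {
    "en": "I have found the following courses:",
    "da": "Jeg har fundet følgende kurser:",
}
CONCEPT_LABELS = {
    ("bachelor-master", "en"): "bachelor's and master's degrees",
    ("bachelor-master", "da"): "bachelor- og kandidatuddannelser",
}


def generate_answer(results, lang):
    """Intro sentence and concept labels follow the query language; names stay as-is."""
    lines = [INTRO[lang]]
    for r in results:
        labels = [CONCEPT_LABELS[(c, lang)] for c in r["concepts"]]
        lines.append(", ".join([r["institution"], *labels]) + f" <{r['url']}>")
    return "\n".join(lines)


results = [
    {"institution": "Göteborg University", "concepts": ["bachelor-master"], "url": "https://example.org/gu"},
    {"institution": "Lund University", "concepts": [], "url": "https://example.org/lu"},
]
print(generate_answer(results, "da"))
```

Because answers are built from concepts rather than copied text snippets, the same search result can be verbalised in whichever language the query was asked in.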
More information
MOSES web site coming up soon
Link from www.cst.dk
THANK YOU