Patrizia Paggio Center for Sprogteknologi A Modular and Scalable Environment for the Semantic WEB

Post on 18-Dec-2015


Page 1: Patrizia Paggio Center for Sprogteknologi A Modular and Scalable Environment for the Semantic WEB

Patrizia Paggio

Center for Sprogteknologi

A Modular and Scalable Environment for the Semantic WEB

Page 2

IT-højskolen 2

Goals

To develop an innovative methodological environment and software that will enable content providers to build a knowledge grid in which the content of web pages can be managed in a modular and scalable way, and in which queries can be posed in natural language to extract relevant content from the grid based on the underlying ontologies.

Testbed: a demonstrator to search university sites

Page 3

Domain areas involved

Semantic Web

Ontology mapping

Knowledge management

Topic maps

Text and data mining

Intelligent agents

Page 4

NL-based, intelligent content search

Page 5

MOSES consortium

FINSA, Italian software company (agent technology, requirements engineering)

Mondeca, French software company (knowledge management and semantic markup, graph theory)

Parabots, Dutch software company (text and data mining)

Rome III Univ. (user partner, graph theory)

Rome II Univ. (language technology, machine learning)

CST (language technology, content-based search)

Page 6

MOSES consortium

[Figure: consortium roles: Users, Technology suppliers, Research, Co-ordination]

Page 7

Planning

Page 8

The semantic web

The present web is a collection of texts for humans to inspect and use.

On the semantic web, texts are structured (marked up) so that programs (agents) can manipulate them.

The semantic structure refers to common repositories, e.g. ontologies.

Page 9

A scenario

A student/researcher looking for information on university courses or research activities in Europe.

“I need a list of institutes offering post-graduate courses in computational linguistics including corpus linguistics where the teaching language is ...”

“Which Danish university offers Danish language courses for foreign students?”

Page 10

Our vision

Content of web pages is structured according to relevant templates and ontologies (the project will create those relevant for the domain)

Help is provided by the system to find the templates that best match the pages to be marked up

Search is based not on the words in the text, but on the semantic templates

A linguistic agent processes the results to generate relevant answers
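The difference between word-based and template-based search can be sketched as follows. This is only an illustration: the `CourseTemplate` slots, the example pages and the URLs are hypothetical and are not taken from the project's actual templates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CourseTemplate:
    """Hypothetical semantic template filled in for one marked-up web page."""
    url: str
    institution: str
    country: str
    subject: str
    level: str
    audience: str


# Toy knowledge grid: every page is represented by its filled-in template.
PAGES = [
    CourseTemplate("https://example.org/a", "University A", "Denmark",
                   "danish-language", "any", "foreign-students"),
    CourseTemplate("https://example.org/b", "University B", "Sweden",
                   "computational-linguistics", "post-graduate", "all"),
    CourseTemplate("https://example.org/c", "University C", "Denmark",
                   "computational-linguistics", "post-graduate", "all"),
]


def search(pages, **constraints):
    """Return the pages whose template slots equal every given constraint."""
    return [
        page for page in pages
        if all(getattr(page, slot) == value for slot, value in constraints.items())
    ]


hits = search(PAGES, country="Denmark", subject="danish-language",
              audience="foreign-students")
print([page.institution for page in hits])
```

The query is matched against slot values, not against the words on the page, so a page is found even if its text never uses the query's wording.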

Page 11

Our vision

“I need a list of institutes offering ...”

“The following institutes offer post-graduate courses in computational linguistics including corpus linguistics: ...”

“Which Danish university offers...”

“The University of Copenhagen offers Danish language courses for foreign students”

Page 12

Main work packages

1. Requirements and domain analysis

2. Architecture design

3. Semantic structure and tools

4. Implementation of agents

5. Content-based engine

6. Test and validation

Page 13

Query analyser

Investigate methods and develop tools to analyse user queries and convert them into semantic descriptions.

Based on a realistic corpus of questions/queries.

Use of shallow linguistic analysis.

Specific linguistic items, e.g. interrogative pronouns.
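A minimal sketch of what such a shallow analysis might look like: the query is tokenised, an interrogative word decides the query type, and a small lexicon maps content words to concepts. The word lists, the concept names and the output format are assumptions made for the example, not the project's actual resources.

```python
import re

# Hypothetical lexicons; a real analyser would derive these from a query corpus.
QUESTION_WORDS = {"which": "list-all", "who": "list-all", "list": "list-all"}
CONCEPT_LEXICON = {
    "university": "institution",
    "universities": "institution",
    "course": "course",
    "courses": "course",
    "danish": "danish",
    "students": "student",
}


def analyse(query: str) -> dict:
    """Convert a query into a flat semantic description (type plus concepts)."""
    tokens = re.findall(r"[a-zæøå]+", query.lower())
    # The first interrogative (or imperative) word found sets the query type.
    query_type = next(
        (QUESTION_WORDS[t] for t in tokens if t in QUESTION_WORDS), "unknown"
    )
    concepts = []
    for token in tokens:
        concept = CONCEPT_LEXICON.get(token)
        if concept and concept not in concepts:  # keep order, drop repeats
            concepts.append(concept)
    return {"type": query_type, "concepts": concepts}


print(analyse("Which Danish university offers Danish language courses "
              "for foreign students?"))
```

No parsing is involved: words outside the two lexicons are simply ignored, which is what makes the analysis shallow.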

Page 14

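Multilingual search as ontological mapping

[Figure: da_query_1 goes through analysis against da_ontology to give da_analysed_query_1, which is used for search; a mapping from da_ontology to it_ontology gives it_analysed_query_1, which is also used for search.]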
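The idea can be sketched in a few lines: the analysed Danish query is a set of concepts, the mapping rewrites it into concepts of the Italian ontology, and each version is searched in its own collection. The concept labels, the mapping table and the two toy indexes are invented for the example; only the names `da_analysed_query_1` and `it_analysed_query_1` come from the slide.

```python
# Hypothetical concept-to-concept mapping between the two ontologies.
DA_TO_IT = {
    "lektor": "professore-associato",
    "kursus": "corso",
    "italiensk": "italiano",
}

# Toy indexes: a set of concepts points to the pages marked up with it.
DA_INDEX = {frozenset({"kursus", "italiensk"}): ["da_page_1"]}
IT_INDEX = {frozenset({"corso", "italiano"}): ["it_page_1", "it_page_2"]}


def map_query(da_concepts, mapping):
    """Rewrite Danish concepts as Italian ones; unmapped concepts are dropped."""
    return frozenset(mapping[c] for c in da_concepts if c in mapping)


def search(index, concepts):
    """Exact match on the concept set (a deliberate simplification)."""
    return index.get(concepts, [])


da_analysed_query_1 = frozenset({"kursus", "italiensk"})
it_analysed_query_1 = map_query(da_analysed_query_1, DA_TO_IT)

results = search(DA_INDEX, da_analysed_query_1) + search(IT_INDEX, it_analysed_query_1)
print(results)
```

The query text itself is never translated; only its semantic description is mapped.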

Page 15

Topic maps

Topics and associations

[Figure: associations A1 and A2 linking topics T1 to T6 through roles R1 to R7]

Page 16

Association example

“Bernard Vatant is instructor of a tutorial about Content Structure Engineering held at Extreme Markup Languages 2002”

[Figure: association A linking topics BV, CS and EM through roles R1, R2 and R3]

This association represents an assertion about three topics:

One person: “Bernard Vatant”

One space-time event: “Extreme Markup Languages 2002”

One concept: “Content Structure Engineering”
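As a data structure, an association of this kind can be sketched as a set of role-to-topic links. The class and field names below are illustrative and do not follow any particular topic map API; the slide does not name the roles R1 to R3, so they are kept as opaque identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topic:
    identifier: str
    name: str
    kind: str  # e.g. person, concept, space-time event


@dataclass
class Association:
    identifier: str
    members: dict  # role identifier -> Topic playing that role


bv = Topic("BV", "Bernard Vatant", "person")
cs = Topic("CS", "Content Structure Engineering", "concept")
em = Topic("EM", "Extreme Markup Languages 2002", "space-time event")

# One association, three topics, each attached through its own role.
tutorial = Association("A", {"R1": bv, "R2": cs, "R3": em})


def topics_of_kind(association, kind):
    """Names of the topics of a given kind taking part in the association."""
    return [t.name for t in association.members.values() if t.kind == kind]


print(topics_of_kind(tutorial, "person"))
```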

Page 17

Example

“List alle lektorerne i italiensk i efteråret 2003”

(List all associate professors in Italian in autumn 2003)

list-all(x) [lektor(x), subject(italian), time(autumn-2003)]

Page 18

Example, cont.

At = course-assoc

Rt1 = instructor

Rt2 = subject

Rt3 = institution

Rt4 = time

[Figure: two instructor hierarchies. Italian: instructor, with subtypes professore ordinario, professore associato and ricercatore. Danish: instructor, with subtypes professor, lektor and UA.]

Page 19

Example, cont.

list-all(x) [lektor(x), subject(italian), time(autumn-2003)]

list-all(x) [instructor(x), subject(italian), time(autumn-2003)]

OR

list-all(x) [prof-associato(x), subject(italian), time(autumn-2003)]
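The rewriting step can be sketched as follows, assuming two lookup tables: one giving the more general concept in the hierarchy and one giving a cross-language equivalent. Both tables are assumptions made for illustration; in particular, treating lektor and prof-associato as equivalent is a reading of the slide, not a documented mapping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    operator: str
    predicate: str
    constraints: tuple  # pairs such as ("subject", "italian")

    def __str__(self):
        parts = [f"{self.predicate}(x)"]
        parts += [f"{name}({value})" for name, value in self.constraints]
        return f"{self.operator}(x) [{', '.join(parts)}]"


# Hypothetical ontology fragments.
PARENT = {"lektor": "instructor", "professor": "instructor",
          "prof-associato": "instructor"}
EQUIVALENT = {"lektor": "prof-associato"}


def expand(query):
    """Return alternative queries: generalised, then mapped across languages."""
    alternatives = []
    for table in (PARENT, EQUIVALENT):
        if query.predicate in table:
            alternatives.append(
                Query(query.operator, table[query.predicate], query.constraints)
            )
    return alternatives


original = Query("list-all", "lektor",
                 (("subject", "italian"), ("time", "autumn-2003")))
print(original)
for alternative in expand(original):
    print(alternative)
```

Only the predicate changes; the subject and time constraints are carried over unchanged into each alternative.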

Page 20

Answer generation - example

“kurser i datalingvistik” (courses in computational linguistics)

...educational programme in computational linguistics, Göteborg University. A Swedish program offering bachelor's and master's degrees

...Lund University’s curriculum 2001-2002. Computational linguistics deal with automatic analysis of texts and other linguistic material...

(Result of a Google search: texts are not tagged with concepts! Bold face added to relevant information.)

Page 21

Answer generation, cont.

“I have found the following courses:”

...educational programme in computational linguistics, Göteborg University. A Swedish program offering bachelor's and master's degrees

...Lund University’s curriculum 2001-2002. Computational linguistics deal with automatic analysis of texts and other linguistic material...

The introductory sentence should be in the language of the query!

Page 22

Answer generation, cont.

I have found the following courses:

Göteborg University, bachelor’s and master’s degrees + link

Lund University + link

Introductory sentence and relevant concepts (bachelor’s and master’s degrees) should be in the language of the query!
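A small sketch of this kind of answer generation: the introductory sentence and the concept labels are looked up in the language of the query, while institution names and links are passed through unchanged. The Danish wording, the concept keys and the URLs are assumptions supplied for the example.

```python
# Hypothetical language resources keyed by language code.
INTRO = {
    "en": "I have found the following courses:",
    "da": "Jeg har fundet følgende kurser:",
}
CONCEPT_LABELS = {
    "bachelor": {"en": "bachelor's degree", "da": "bacheloruddannelse"},
    "master": {"en": "master's degree", "da": "kandidatuddannelse"},
}

# Search hits carry language-neutral concepts rather than page text.
HITS = [
    {"institution": "Göteborg University", "concepts": ["bachelor", "master"],
     "url": "https://example.org/goteborg"},
    {"institution": "Lund University", "concepts": [],
     "url": "https://example.org/lund"},
]


def generate_answer(hits, language):
    """Build an answer whose fixed text and concepts use the query language."""
    lines = [INTRO[language]]
    for hit in hits:
        labels = [CONCEPT_LABELS[c][language] for c in hit["concepts"]]
        detail = ", ".join([hit["institution"]] + labels)
        lines.append(f"{detail} ({hit['url']})")
    return "\n".join(lines)


print(generate_answer(HITS, "da"))
```

This only works because the hits are tagged with concepts; with untagged text, as in the Google result, there is nothing language-neutral to translate.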

Page 23

More information

MOSES web site coming up soon

Link from www.cst.dk

THANK YOU