Porting the QALL-ME framework to Romanian



DESCRIPTION

Invited talk at Processing ROmanian in Multilingual, Interoperational and Scalable Environments (PROMISE 2010) on how to port the QALL-ME framework to a new language

TRANSCRIPT

Page 1: Porting the QALL-ME framework to Romanian

Porting the QALL-ME framework to Romanian

Constantin Orasan

Research Group in Computational Linguistics
Research Institute in Information and Language Processing

University of Wolverhampton
http://www.wlv.ac.uk/~in6093/

29th March 2010

Page 3: Porting the QALL-ME framework to Romanian

Structure of the presentation

1 Introduction

2 The QALL-ME project

3 Multilingual information access in QALL-ME

4 Conclusions

Page 4: Porting the QALL-ME framework to Romanian

Need to access information

• as the Internet has developed, more and more information has become available

• this information is in many languages

• fields of computational linguistics such as automatic summarisation, question answering and text mining can help people deal with this information

Page 6: Porting the QALL-ME framework to Romanian

Question answering (QA)

• Question answering aims to identify the answer to a question in a large collection of documents

• the information provided by QA is more focused than that returned by information retrieval

• the output can be the exact answer or a text snippet which contains the answer

• the field took off with the introduction of the QA track at TREC, and cross-lingual QA with the introduction of CLEF

Page 7: Porting the QALL-ME framework to Romanian

Types of QA systems

• open-domain QA systems: can answer any question from any collection
  + can potentially answer any question
  - very low accuracy (especially in cross-lingual settings)

• canned QA systems: rely on a very large repository of questions for which the answer is known
  + very little processing necessary
  - limited to the answers in the database

• closed-domain QA systems: are built for very specific domains and exploit expert knowledge in them
  + very high accuracy
  - can require extensive language processing and are limited to one domain

Page 10: Porting the QALL-ME framework to Romanian

Purpose of the presentation

• briefly present the QALL-ME project

• show how it was adapted to answer questions in Romanian about movies

Page 13: Porting the QALL-ME framework to Romanian

The QALL-ME project

• QALL-ME = Question Answering Learning technologies in a multiLingual and Multimodal Environment

• EU-funded project, part of FP6

• 7 partners:
  • FBK-irst, Italy
  • University of Wolverhampton, UK
  • University of Alicante, Spain
  • DFKI, Germany
  • Comdata, Italy
  • UbiEST, Italy
  • WayCom, Italy

• Web page: http://qallme.fbk.eu

Page 14: Porting the QALL-ME framework to Romanian

The QALL-ME project

• aimed at establishing a shared infrastructure for multilingual and multimodal QA in the domain of tourism

• In the QALL-ME system:
  • users ask natural language questions in several languages (in both text and speech) using a variety of input devices (e.g. mobile phones), and
  • the system returns a list of specific answers presented in the most appropriate modality, ranging from short texts to maps, videos and pictures.

Page 15: Porting the QALL-ME framework to Romanian

[Figure: QALL-ME architecture. Components shown: QALL-ME central QA planner; English, German, Italian and Spanish answer extractors; service provider; question type ontology; answer type ontology; dialog models; local information sources; semantic representation; speech recognizers.]

Page 16: Porting the QALL-ME framework to Romanian

Main outputs of the project

• an ontology for the domain of tourism

• an entailment-based QA framework

• the QALL-ME benchmark

• an entailment framework

(all accessible from the project’s web page: http://qallme.fbk.eu)

Page 17: Porting the QALL-ME framework to Romanian

The ontology

• A domain-specific ontology for the tourism domain was developed and shared among all the partners.

• The ontology serves as:
  • a bridge between different languages
  • a communication language between the different components of the system

• The ontology was linked to domain-independent ontologies such as MultiWordNet and SUMO

• For more information see (Ou et al., 2008)

Page 18: Porting the QALL-ME framework to Romanian

Design of the ontology

• Analysis of data from content providers

• Analysis of user requirements

• Inspired by similar ontologies:
  • Harmonise and eTourism: focus on static information (e.g. accommodation and events/activities)
  • similar to eTourism in that it is written in OWL rather than RDFS
  • but with wider coverage

• Introspection

Page 19: Porting the QALL-ME framework to Romanian

The ontology

• Main classes: Country, Destination, Site (i.e. Accommodation, Attraction, Gastro, and Infrastructure), Transportation, EventContent and Event

• Element classes: Facility, Room, PersonOrganization, Language, and Currency

• Attribute classes: Contact, Location, Period and Price.

• Element and attribute classes cannot exist independently and have to be attached to other main or element classes

Page 20: Porting the QALL-ME framework to Romanian

[Figure: fragment of the QALL-ME ontology for the cinema domain. Classes shown: Event, EventContent, MovieShow, Movie, Cinema, CinemaRoom, SiteFacility, RoomFacility, TicketPrice, SitePrice, Currency, Period, DateTimePeriod, DatePeriod, TimePeriod, Contact, PostalAddress, GPSCoordinate, DirectionLocation, Director, Star, Producer and Writer. Relations shown: subClassOf, isInSite, hasEventContent, hasPeriod, hasPrice, hasCurrency, hasDatePeriod, hasTimePeriod, hasContact, hasPostalAddress, hasGPSCoordinate, hasSiteFacility, hasRoom, hasRoomFacility, hasDirector, hasStar, hasProducer and hasWriter. Attributes shown: name, description, genre, certificate, synopsis, priceType, priceValue, startDate, endDate, startTime and endTime.]

Page 21: Porting the QALL-ME framework to Romanian

The ontology

• Encoded using OWL DL, since it has more expressive power than OWL Lite and more efficient reasoning support than OWL Full

• Used Protege-OWL as the editor and RacerPro7 as the reasoner

• The ontology contains:
  • 122 classes (concepts)
  • 55 datatype properties
  • 52 object properties, which indicate the relationships among the 122 classes
  • 15 top-level classes

• The class hierarchy has a maximum depth of 4.

Page 22: Porting the QALL-ME framework to Romanian

The QALL-ME framework

• is an architecture skeleton for multilingual QA systems for closed domains

• designed in such a way that it allows fast development of closed-domain QA systems

• freely available from http://qallme.sourceforge.net/

• is based on a Service Oriented Architecture (SOA) which is realised using web services

• relies on textual entailment recognisers

Page 23: Porting the QALL-ME framework to Romanian

Web services

1 Context providers: used to anchor questions in space and time

2 Annotators: three types of annotators are currently available:

• named entity annotators, which identify names of cinemas, movies, persons, etc.

• term annotators, which identify hotel facilities, movie genres and other domain-specific terminology

• temporal annotators, which recognise and normalise temporal expressions in user questions

3 Entailment engine: determines whether a user question entails a retrieval procedure

4 Query generator: relies on an entailment engine to generate a query that extracts the answer

5 Answer pool: retrieves the answers from a database
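The way these services could be chained by a central planner can be sketched as below. This is a minimal illustration, not the framework's actual API: the names Context, Services and answer are hypothetical, and each service is modelled as a plain Python callable rather than a web service.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable


@dataclass
class Context:
    """Where and when the question was asked (hypothetical structure)."""
    location: str
    time: datetime


@dataclass
class Services:
    """One callable per service; real deployments would call web services."""
    context_provider: Callable[[], Context]
    annotators: list  # each maps (question, context) -> {slot: value}
    query_generator: Callable[[str, dict], str]  # uses the entailment engine internally
    answer_pool: Callable[[str], list]


def answer(question: str, services: Services) -> list:
    """Run the services in the order listed on the slide."""
    context = services.context_provider()
    slots = {}
    for annotate in services.annotators:
        # Named entity, term and temporal annotators each contribute slots.
        slots.update(annotate(question, context))
    query = services.query_generator(question, slots)
    return services.answer_pool(query)
```

Keeping every step behind a callable mirrors the service-oriented design: a component for another language can be swapped in without touching the planner.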

Page 24: Porting the QALL-ME framework to Romanian

Context providers

• are used to anchor a question in space and time

• return the current position and time

• used by the presentation module when maps are displayed

• used in temporal processing to normalise temporal expressions

• determine which services are used in a cross-lingual scenario

• the context can be static or obtained from a mobile phone

Page 25: Porting the QALL-ME framework to Romanian

Named entity and term annotators

• named entity recogniser = identifies names of hotels, movies, persons, etc.

• term annotator = identifies domain-specific terms such as hotel facilities, movie genres, etc.

• the entities and terms are known, so the task is reduced to a database lookup

• Gazetteers are the main source for determining the entities

• The annotation module needs to determine the canonical form of an entity

• matching uses character-based similarity, a modified TF*IDF and a greedy algorithm

• overlapping annotations are not allowed, and there are few ambiguities
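A gazetteer lookup of this kind can be sketched as follows. This is only an illustration under stated assumptions: the gazetteer entries, the threshold and the function names are hypothetical, the character-based similarity is taken from Python's difflib rather than from the framework, and the modified TF*IDF weighting is left out.

```python
from difflib import SequenceMatcher

# Hypothetical gazetteer: canonical entity name -> entity type.
GAZETTEER = {
    "Becoming Jane": "MOVIE",
    "Saw III": "MOVIE",
    "Cineworld Wolverhampton": "CINEMA",
    "Wolverhampton": "DESTINATION",
}


def similarity(a: str, b: str) -> float:
    """Character-based similarity in [0, 1], ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def annotate(question: str, threshold: float = 0.85) -> list:
    """Return non-overlapping (start, end, canonical_name, type) token spans."""
    tokens = question.rstrip("?").split()
    candidates = []
    for name, etype in GAZETTEER.items():
        n = len(name.split())
        # Compare the entry against every window of the same token length.
        for start in range(len(tokens) - n + 1):
            span = " ".join(tokens[start:start + n])
            score = similarity(span, name)
            if score >= threshold:
                candidates.append((score, n, start, start + n, name, etype))
    # Greedy selection: best score first, longer spans break ties,
    # and a span is dropped if it overlaps one that was already chosen.
    chosen, used = [], set()
    for score, n, start, end, name, etype in sorted(
        candidates, key=lambda c: (-c[0], -c[1])
    ):
        if used.isdisjoint(range(start, end)):
            chosen.append((start, end, name, etype))
            used.update(range(start, end))
    return sorted(chosen)
```

For the misspelt question "Where can I see becaming Jane in Wolverhampton?" the sketch maps "becaming Jane" to the canonical form "Becoming Jane" and keeps "Wolverhampton" as a destination.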

Page 26: Porting the QALL-ME framework to Romanian

Named entity and term annotators

• Annotates both standard and non-standard entities: cinema, movie, location, genre, certificate

• Needs to deal with noisy input:
  • misspelt words, input from ASR engines and SMS input, e.g. becaming Jane, becoming Jade
  • free word order (Will Smith / Smith, Will)
  • equivalent strings (saw III / three / 3; Smith, Will / Smith, W.)

• Needs to deal with questions in mixed languages

• Needs to deal with ambiguous entities
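Two of the cases above, free word order and equivalent numerals, can be handled by mapping strings to a common form before lookup. The sketch below is an assumption about how this might be done, not the framework's method. The function name and the small numeral tables are hypothetical, and initials such as "Smith, W." are not covered.

```python
import re

# Small illustrative tables; a real system would need fuller coverage.
NUMBER_WORDS = {"one": "1", "two": "2", "three": "3", "four": "4", "five": "5"}
ROMAN_NUMERALS = {"i": "1", "ii": "2", "iii": "3", "iv": "4", "v": "5"}


def normalise(name: str) -> str:
    """Map equivalent surface forms of an entity name to one string."""
    name = name.strip().lower()
    # Free word order: "Smith, Will" -> "will smith".
    if name.count(",") == 1:
        last, first = (part.strip() for part in name.split(","))
        name = f"{first} {last}"
    tokens = re.findall(r"\w+", name)
    # Only a trailing numeral is rewritten, so a title that starts
    # with "I" is left alone.
    if len(tokens) > 1:
        last_token = tokens[-1]
        tokens[-1] = NUMBER_WORDS.get(
            last_token, ROMAN_NUMERALS.get(last_token, last_token)
        )
    return " ".join(tokens)
```

With this normalisation "Saw III", "saw three" and "saw 3" all become "saw 3", and "Smith, Will" becomes "will smith".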

Page 27: Porting the QALL-ME framework to Romanian

Temporal annotator

• questions from the domain of tourism contain a large number of temporal expressions

• we use a simplified version of the tagger implemented by Puscasu (2004)

• the simplification was done to reduce the processing time (Varga, Puscasu, and Orasan, 2009)

• identifies both self-contained temporal expressions (TEs) and indexical/under-specified TEs

• uses the TIMEX2 standard

• the output is used by the TIMEX2SPARQL service to restrict the extracted answers
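The difference between self-contained and indexical TEs can be illustrated with a toy normaliser. This is a sketch only and is unrelated to the actual tagger: the function name and the handful of covered expressions are hypothetical, and the "TNI" (night) suffix is assumed to follow the TIMEX2 convention for parts of the day.

```python
import re
from datetime import date, datetime, timedelta
from typing import Optional

MONTHS = [
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
]


def normalise_te(expression: str, now: datetime) -> Optional[str]:
    """Return a TIMEX2-style value for a few expressions, or None."""
    expr = expression.strip().lower()
    today = now.date()
    # Indexical TEs need the context time to be resolved.
    if expr == "today":
        return today.isoformat()
    if expr == "tomorrow":
        return (today + timedelta(days=1)).isoformat()
    if expr == "tonight":
        return today.isoformat() + "TNI"
    # Self-contained TEs such as "2 April 2010" need no context.
    match = re.fullmatch(r"(\d{1,2}) ([a-z]+) (\d{4})", expr)
    if match and match.group(2) in MONTHS:
        month = MONTHS.index(match.group(2)) + 1
        return date(int(match.group(3)), month, int(match.group(1))).isoformat()
    return None
```

The context time passed as now is what a context provider would supply, so the same word "tonight" yields a different value on each day.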

Page 28: Porting the QALL-ME framework to Romanian

Entailment engine

• closed-domain QA systems often transform a question into a Prolog fact or an SQL query

• this solution often works only partially because of language variability

• in QALL-ME this problem is addressed using textual entailment

• the entailment engine determines whether two questions entail the same meaning, in which case they share the same retrieval procedure:
  • T is the input question
  • H is a textual pattern stored in a repository
  • textual patterns have associated SPARQL retrieval procedures

• the similarity between the two sentences is calculated to determine whether an entailment relation holds between them
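A similarity-based entailment check between T and a set of patterns H might look like the sketch below. The slides do not say which similarity measure is used, so everything here is an assumption made for illustration: the word-overlap (Jaccard) score, the crude plural stripping, the threshold, and the requirement that every slot of H also occurs in T.

```python
import re

SLOT = re.compile(r"\[([A-Z]+)\]")


def slots(text: str) -> set:
    """Slot types mentioned in a question or pattern, e.g. {'TIMEX'}."""
    return set(SLOT.findall(text))


def words(text: str) -> set:
    """Lower-cased words outside slots, with a crude plural strip."""
    found = re.findall(r"[a-z]+", SLOT.sub(" ", text).lower())
    return {w[:-1] if len(w) > 3 and w.endswith("s") else w for w in found}


def similarity(t: str, h: str) -> float:
    """Jaccard overlap between the word sets of T and H."""
    wt, wh = words(t), words(h)
    return len(wt & wh) / len(wt | wh) if wt | wh else 0.0


def best_pattern(t: str, patterns: list, threshold: float = 0.3):
    """Return the pattern H judged to be entailed by T, or None."""
    scored = [
        (similarity(t, h), h)
        for h in patterns
        if slots(h) <= slots(t)  # every slot of H must be fillable from T
    ]
    scored = [(score, h) for score, h in scored if score >= threshold]
    return max(scored)[1] if scored else None
```

The slot condition matters in this sketch: by word overlap alone, "What movie can I see [TIMEX] in [DESTINATION]?" is closer to "Where can I see [MOVIE] [TIMEX]?" than to the intended pattern, but that pattern needs a [MOVIE] slot which the question cannot fill.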

Page 29: Porting the QALL-ME framework to Romanian

Query generation service

• produces a SPARQL query that can be used to answer the question

• has a list of question templates with their associated SPARQL queries

• relies on the entailment engine to determine which of the question patterns entail the same meaning as the user question

• fills in the slots of the question patterns
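Slot filling over a template store can be sketched as below. The SPARQL text is illustrative only and is not one of the project's retrieval procedures: the qmo: prefix, its namespace URI and the isInDestination property are invented, while the remaining names are borrowed from the labels in the ontology figure. The function names are hypothetical as well.

```python
import re

# Hypothetical template store: question pattern -> SPARQL with slot markers.
TEMPLATES = {
    "What movies are on in [DESTINATION] [TIMEX]?": """
PREFIX qmo: <http://example.org/qallme#>
SELECT ?movieName WHERE {
  ?show a qmo:MovieShow ;
        qmo:hasEventContent ?movie ;
        qmo:isInSite ?cinema ;
        qmo:hasPeriod ?period .
  ?movie qmo:name ?movieName .
  ?cinema qmo:isInDestination [DESTINATION] .
  ?period qmo:startDate [TIMEX] .
}
""",
}


def sparql_literal(value: str) -> str:
    """Quote a value as a SPARQL string, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def generate_query(pattern: str, slot_values: dict) -> str:
    """Fill the slots of the template associated with an entailed pattern."""
    query = TEMPLATES[pattern]
    for slot, value in slot_values.items():
        query = query.replace(f"[{slot}]", sparql_literal(value))
    missing = re.findall(r"\[[A-Z]+\]", query)
    if missing:
        raise ValueError(f"unfilled slots: {missing}")
    return query
```

The slot values would come from the annotators, for example {"DESTINATION": "Wolverhampton", "TIMEX": "2010-03-29"}.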

Page 31: Porting the QALL-ME framework to Romanian

Example

User question (T): What movie can I see tonight in Wolverhampton? → What movie can I see [TIMEX] in [DESTINATION]?

List of patterns (H):

• Who is the director of [MOVIE]?

• Where can I see [MOVIE] [TIMEX]?

• What movies are on in [DESTINATION] [TIMEX]?

• What is the address of [CINEMA]?

• . . .

Select the retrieval procedure associated with the question: What movies are on in Wolverhampton tonight?
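The first step of the example, rewriting the user question so that annotated mentions become slots, can be sketched as follows. The function name and the use of character offsets are assumptions made for illustration.

```python
def abstract(question: str, annotations: list) -> str:
    """Replace annotated spans with slot markers.

    annotations holds (start, end, slot) character offsets, as an
    annotator might return them. Spans are replaced from right to left
    so that earlier offsets stay valid.
    """
    for start, end, slot in sorted(annotations, reverse=True):
        question = question[:start] + f"[{slot}]" + question[end:]
    return question


question = "What movie can I see tonight in Wolverhampton?"
timex_start = question.index("tonight")
place_start = question.index("Wolverhampton")
abstracted = abstract(
    question,
    [
        (timex_start, timex_start + len("tonight"), "TIMEX"),
        (place_start, place_start + len("Wolverhampton"), "DESTINATION"),
    ],
)
```

Here abstracted holds the slotted form of T shown on the slide, which is then compared with the patterns H.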

Page 32: Porting the QALL-ME framework to Romanian

Answer Pool service

• takes the SPARQL query produced by the query generator and extracts the answer

• SPARQL is a query language for accessing RDF graphs, developed by the W3C RDF Data Access Working Group

• SPARQL provides interoperability between languages

Page 34: Porting the QALL-ME framework to Romanian

Cross-lingual QA

• the QALL-ME tourism prototype is designed to allow both monolingual and cross-lingual QA

• relevant web services are activated depending on the source and target language

• user scenario: a Romanian tourist in the UK who wants to find out more about the movies in Wolverhampton

Page 36: Porting the QALL-ME framework to Romanian

Prototype for Romanian

• we wanted to find out how long it takes to develop a demo for Romanian

• components had to be adapted:
  • named entity and term annotators had to be trained on a different list of entities
  • a simple temporal annotator was implemented on the basis of the English one
  • the language-independent similarity-based entailment engine was used
  • the question patterns were translated into Romanian
  • the answer pool did not require any change

• the whole process took under one week
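Why translating the question patterns is enough for the query side can be sketched as below: patterns in each language point to the same language-independent retrieval procedure. The data layout, the procedure identifiers, the elided query strings and the Romanian wordings are all illustrative assumptions, not the translations used in the demo.

```python
# Language-independent retrieval procedures (SPARQL bodies elided here).
RETRIEVAL_PROCEDURES = {
    "movies_in_destination": "SELECT ?movieName WHERE { ... }",
    "director_of_movie": "SELECT ?directorName WHERE { ... }",
}

# Per-language question patterns mapped to procedure identifiers.
# The Romanian patterns are hypothetical translations.
PATTERNS = {
    "en": {
        "What movies are on in [DESTINATION] [TIMEX]?": "movies_in_destination",
        "Who is the director of [MOVIE]?": "director_of_movie",
    },
    "ro": {
        "Ce filme rulează în [DESTINATION] [TIMEX]?": "movies_in_destination",
        "Cine este regizorul filmului [MOVIE]?": "director_of_movie",
    },
}


def retrieval_procedure(language: str, pattern: str) -> str:
    """Look up the shared retrieval procedure for a pattern in any language."""
    return RETRIEVAL_PROCEDURES[PATTERNS[language][pattern]]
```

Adding a language in this sketch only adds an entry to PATTERNS. The retrieval procedures and the answer pool stay untouched, which is consistent with the answer pool needing no change.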

Page 37: Porting the QALL-ME framework to Romanian

Romanian demo

http://qallme.wlv.ac.uk:8080/QALL-ME-web-demo/index.jsp

Page 39: Porting the QALL-ME framework to Romanian

Conclusions

• multilinguality is a very important issue for the QALL-ME project

• the ontology constitutes the bridge between languages

• the QALL-ME framework can be used to quickly develop prototypes for other languages

Page 40: Porting the QALL-ME framework to Romanian

Thank you!

Page 41: Porting the QALL-ME framework to Romanian

References

Page 42: Porting the QALL-ME framework to Romanian

Ou, Shiyan, Viktor Pekar, Constantin Orasan, Christian Spurk, and Matteo Negri. 2008. Development and alignment of a domain-specific ontology for question answering. In European Language Resources Association (ELRA), editor, Proceedings of the Sixth International Language Resources and Evaluation (LREC’08), Marrakech, Morocco, May 28 – 30.

Puscasu, Georgiana. 2004. A framework for temporal resolution. In Proceedings of the 4th Conference on Language Resources and Evaluation (LREC 2004), Lisbon, Portugal, May 26 – 28.

Varga, Andrea, Georgiana Puscasu, and Constantin Orasan. 2009. Identification of temporal expressions in the domain of tourism. In Knowledge Engineering: Principles and Techniques, volume 1, pages 29 – 32, Cluj-Napoca, Romania, July 2 – 4.