Implementation of a QA system in a real context Carlos Amaral (Priberam, Portugal) Dominique Laurent (Synapse Développement, France) Workshop TellMeMore, November 24, 2006, C.Amaral, D.Laurent


Implementation of a QA system in a real context

Carlos Amaral (Priberam, Portugal)

Dominique Laurent (Synapse Développement, France)

Workshop TellMeMore, November 24, 2006, C.Amaral, D.Laurent


1. The Question-Answering system

What is a QA system?

• A system that extracts one or several answers to a request (a question) from a corpus

• The problem of determining « the type of the question »

• One answer or several, possibly a list drawn from one or several documents, or a Yes/No answer…

• Over a corpus in one or several languages…
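As a rough illustration of what « the type of the question » can mean, the sketch below maps a question to an expected answer type using a handful of hand-written English cues. The type names and the rules are assumptions made for this example; they are not the ones used by the system presented here.

```python
import re

# Hypothetical cue table: each regular expression maps to an assumed answer type.
# Rules are tried in order, so more specific cues come before general ones.
QUESTION_TYPE_RULES = [
    (r"^(is|are|was|were|do|does|did|can|could)\b", "YES_NO"),
    (r"^who\b", "PERSON"),
    (r"^where\b", "LOCATION"),
    (r"^when\b", "DATE"),
    (r"^how (many|much)\b", "QUANTITY"),
    (r"^(which|what|name|list)\b.*\b(are|were)\b", "LIST"),
    (r"^what is\b", "DEFINITION"),
]


def question_type(question: str) -> str:
    """Return the assumed answer type for a question, or "OTHER"."""
    normalized = question.strip().lower()
    for pattern, answer_type in QUESTION_TYPE_RULES:
        if re.search(pattern, normalized):
            return answer_type
    return "OTHER"
```

With these rules, `question_type("Who is the author of Hamlet ?")` yields `"PERSON"`, which tells the rest of a pipeline to look for a person name rather than a date or a place.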


1.1. QA and Language Processing

• A QA system appears to be a language processing (LP) application « par excellence »

• However, certain systems are based solely on pattern matching (cf. Soubotine & Soubotine, TREC 2003)

• These systems seem to have reached their limits

• And, although they can process everything that is factual, complex questions/queries are far beyond their capabilities

• The best systems validated at TREC and CLEF are based on automated language processing


1.2. OUR QA SYSTEM

• First developed (1999 - 2001) within a French innovation project (Anvar)

• Then (end 2001 - end 2003) within the European project TRUST (FP5)

• Currently (2005/06) within the European project M-CAST (FP6)

• Main features: targets B2B and B2C, multilingual, NLP-based and NLP-intensive.


A modular design

[Diagram showing six language modules (French, Italian, Portuguese, Polish, English, Czech), an indexation engine, a text extraction engine, the documents, the index and the visualization of results.]
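One way to picture such a modular design is a language-independent engine into which language modules are plugged. The sketch below is only an assumption about how this could look: the `Engine` class, the `register` and `analyze` methods and the whitespace tokenizer standing in for a real language module are all hypothetical.

```python
from typing import Callable, Dict, List

# A "language module" is modelled here as a function from text to tokens.
# Real modules would perform full linguistic analysis.
LanguageModule = Callable[[str], List[str]]


class Engine:
    """Language-independent engine that delegates analysis to plug-in modules."""

    def __init__(self) -> None:
        self._modules: Dict[str, LanguageModule] = {}

    def register(self, language: str, module: LanguageModule) -> None:
        self._modules[language] = module

    def languages(self) -> List[str]:
        return sorted(self._modules)

    def analyze(self, language: str, text: str) -> List[str]:
        if language not in self._modules:
            raise KeyError(f"no language module registered for {language!r}")
        return self._modules[language](text)


def simple_tokenizer(text: str) -> List[str]:
    # Placeholder analysis: lowercase and split on whitespace.
    return text.lower().split()


engine = Engine()
# ISO 639-1 codes for French, Italian, Portuguese, Polish, English and Czech.
for code in ("fr", "it", "pt", "pl", "en", "cs"):
    engine.register(code, simple_tokenizer)
```

Adding a language then amounts to registering one more module; the indexing and extraction code stays unchanged.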


1.3. Evaluations of the QA system

• Professional benchmarking contests and campaigns such as EQueR (2004) and CLEF (2005 & 2006)

• Evaluations of the French, English, Portuguese and Spanish language modules, in monolingual and multilingual settings


CLEF 2005

[Bar chart, vertical axis from 0% to 70%]

• French monolingual: 64%

• English - French: 39.5%

• Portuguese - French: 36.5%

• Italian - French: 25.5%

• Portuguese monolingual: 64.5%


CLEF 2006

[Bar chart, vertical axis from 0% to 80%. Categories: French monolingual, English1 - French, Portuguese - French, English2 - French, Portuguese monolingual, Spanish monolingual, Portuguese - Spanish, Spanish - Portuguese. Value labels on the chart (not matched to categories): 68%, 67%, 36%, 44.5%, 47.5%, 32.5%, 52.5%, 33.5%.]


• In CLEF 2005 and CLEF 2006, the best monolingual engines were our systems for Portuguese and French, and the best multilingual systems were our systems for English-French, Portuguese-French, Spanish-Portuguese and Portuguese-Spanish.

• Synapse Développement and Priberam are now partners in the Quaero project.


2. Implementation in M-CAST Project

• Tests carried out on books from the Czech National Library and the Torun library in Poland

• Processes several million digitized documents

• Manages metadata and UDC classification

• Accommodates questions and answers in English, French, Italian, Portuguese, Polish and Czech

• Implemented on both library portals


2.1. Adaptation to Digital Library Resources

• Scanned texts: poor quality
  -> Spell checker to improve the quality of documents

• One book, lots of pages:
  -> Management of multi-part documents during semantic analysis
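To give an idea of how a spell checker can repair scanned text, here is a minimal sketch that replaces unknown words with their closest dictionary entry, using Python's standard `difflib` module. The five-word lexicon, the similarity cutoff and the function names are assumptions for illustration; the spell checker actually used in the project is not described in these slides.

```python
import difflib

# Tiny stand-in lexicon (assumption); a real spell checker uses a full dictionary.
LEXICON = ["library", "question", "answer", "document", "national"]


def correct_word(word: str, lexicon=LEXICON, cutoff: float = 0.8) -> str:
    """Return the closest lexicon entry, or the word unchanged if none is close."""
    if word.lower() in lexicon:
        return word
    matches = difflib.get_close_matches(word.lower(), lexicon, n=1, cutoff=cutoff)
    # Note: a corrected word comes back in the lexicon's lowercase form.
    return matches[0] if matches else word


def correct_text(text: str) -> str:
    """Correct each whitespace-separated word independently."""
    return " ".join(correct_word(w) for w in text.split())
```

A typical OCR confusion such as the digit `1` for the letter `l` is handled this way: `correct_word("1ibrary")` returns `"library"`, while a word with no close neighbour in the lexicon is left untouched.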


2.2. Integration of Dublin Core document attributes

• Storage of Dublin Core attributes as metadata

• QA: Who is the author of Hamlet?
  – Adaptation of the system to search in metadata
  – Use of those metadata as filters
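The two uses of metadata listed above can be sketched as follows: a question such as "Who is the author of Hamlet ?" is answered from the Dublin Core `creator` element, and the same elements serve as filters on the document set. The sample catalogue and the function names are hypothetical; only the element names `title`, `creator` and `language` come from the Dublin Core element set.

```python
from typing import Dict, List, Optional

# Illustrative records keyed by Dublin Core element names.
CATALOGUE: List[Dict[str, str]] = [
    {"title": "Hamlet", "creator": "William Shakespeare", "language": "en"},
    {"title": "Pan Tadeusz", "creator": "Adam Mickiewicz", "language": "pl"},
]


def author_of(title: str, records=CATALOGUE) -> Optional[str]:
    """Answer an author question from the "creator" element."""
    for record in records:
        if record["title"].lower() == title.lower():
            return record["creator"]
    return None


def filter_records(records, **criteria) -> list:
    """Keep only records whose metadata equal every given criterion."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]
```

For instance, `filter_records(CATALOGUE, language="pl")` restricts a search to Polish documents before any text analysis takes place.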


2.3. Universal Decimal Classification

• Storage of UDC codes for each document

• Search through UDC codes

• Filtering through UDC codes

• Semantic disambiguation through UDC codes
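Because UDC codes are hierarchical, filtering and disambiguation can both be approximated with prefix tests on the code. The sketch below assumes that simplification; the sample documents, the sense table for the word "mouse" and the function names are made up for illustration. In the UDC main tables, codes beginning with 59 belong to Zoology, 004 to Computer science and 82 to Literature.

```python
from typing import Dict, List

# Illustrative documents with assumed UDC codes.
DOCUMENTS: List[Dict[str, str]] = [
    {"title": "Field mice of Central Europe", "udc": "599.32"},
    {"title": "Pointing devices and the mouse", "udc": "004.35"},
    {"title": "Hamlet", "udc": "821.111"},
]

# Hypothetical sense inventory: each sense of an ambiguous word is tied
# to the UDC prefix of the subject area where that sense is expected.
SENSES: Dict[str, Dict[str, str]] = {
    "mouse": {"59": "animal", "004": "computer device"},
}


def filter_by_udc(documents, prefix: str) -> list:
    """Keep documents classified under the given UDC prefix."""
    return [d for d in documents if d["udc"].startswith(prefix)]


def disambiguate(word: str, udc: str) -> str:
    """Pick the sense whose UDC prefix matches the document's code."""
    for prefix, sense in SENSES.get(word, {}).items():
        if udc.startswith(prefix):
            return sense
    return "unknown"
```

Here the word "mouse" found in a document classified 004.35 is read as a computer device, whereas the same word in a document classified 599.32 is read as an animal.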


Technical architecture

[Diagram. Components: Java Portal, ATL Web Service, Indexer, Searcher, Linguistic processor, Language Modules, Document Parser, Language Detector, Semantic Index, Indexed documents. Platform labels: IIS, Axis / Apache. Message labels: Index documents, Query, Result list, SOAP XML. Data labels: Document, Indexed Document, Question, Answer.]
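The diagram mentions a Java portal, a web service and SOAP XML. Assuming the portal sends each question to the web service as a SOAP message, a request could be built and read back as in the sketch below. The `Query` operation, its `question` and `language` fields and the `urn:example:qa-service` namespace are hypothetical; only the SOAP 1.1 envelope namespace is standard.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:qa-service"  # hypothetical service namespace


def build_query_envelope(question: str, language: str) -> bytes:
    """Wrap a question in a SOAP 1.1 envelope (operation name is assumed)."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    query = ET.SubElement(body, f"{{{SERVICE_NS}}}Query")
    ET.SubElement(query, f"{{{SERVICE_NS}}}question").text = question
    ET.SubElement(query, f"{{{SERVICE_NS}}}language").text = language
    return ET.tostring(envelope, encoding="utf-8")


def read_question(envelope_xml: bytes) -> str:
    """Extract the question text from an envelope built as above."""
    root = ET.fromstring(envelope_xml)
    path = f"{{{SOAP_NS}}}Body/{{{SERVICE_NS}}}Query/{{{SERVICE_NS}}}question"
    node = root.find(path)
    return node.text if node is not None else ""
```

Using plain XML over HTTP in this way lets a Java client and a web service written in another language exchange questions and result lists without sharing any code.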


END of Presentation

I would appreciate your questions!

Thank you - Merci !