AQUAINT: Building an Initial Cross-lingual Question Answering System: English Question -> Chinese Collection

Ralph Weischedel, Ana Licuanan, Jinxi Xu, 6 October 2004


Page 1: AQUAINT

Building an Initial Cross-lingual Question Answering System: English Question -> Chinese Collection

Ralph Weischedel, Ana Licuanan, Jinxi Xu

6 October 2004

Page 2: Phase 2 Objectives

End-to-End System; Multi-lingual Data

• Find appropriate information in a second language

• Be organized to maximize performance

– Analyze, then translate?

– Translate, then analyze?

• Focus on complex questions (e.g., definitional & biographical questions), rather than on factoid questions

• Determine whether two statements across languages convey

– The same information,

– Inconsistent information, or

– Novel/complementary information

(To be addressed later)

Page 3: Approach

• Trained, language-independent algorithms for core NLP problems, e.g.,

– passage retrieval,

– name tagging,

– parsing and

– co-reference

• Plug-and-play architecture for alternative MT systems for question & document translation

• Controlled experiments to measure and optimize QA performance

– BBN’s AQUA system for English as monolingual baseline
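One way to picture the plug-and-play idea is a common interface that any MT backend satisfies, so that backends can be swapped in controlled experiments. The sketch below is only an illustration: the names Translator, IdentityTranslator, GlossaryTranslator and translate_documents are hypothetical and do not come from the AQUA system. The single glossary entry reuses a gloss shown later in the deck.

```python
from typing import Protocol


class Translator(Protocol):
    """Hypothetical interface: any MT backend exposing translate()."""

    def translate(self, text: str) -> str: ...


class IdentityTranslator:
    """Stand-in backend that returns its input unchanged."""

    def translate(self, text: str) -> str:
        return text


class GlossaryTranslator:
    """Stand-in backend that substitutes whole strings from a tiny glossary."""

    def __init__(self, glossary: dict[str, str]) -> None:
        self.glossary = glossary

    def translate(self, text: str) -> str:
        return self.glossary.get(text, text)


def translate_documents(docs: list[str], mt: Translator) -> list[str]:
    """Translate every document with whichever backend was plugged in."""
    return [mt.translate(d) for d in docs]


docs = ["安理会 主席"]
glossary_mt = GlossaryTranslator({"安理会 主席": "Chairman of the UN Security Council"})
print(translate_documents(docs, IdentityTranslator()))
print(translate_documents(docs, glossary_mt))
```

Because the calling code depends only on translate(), comparing two MT systems reduces to passing a different object.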

Page 4: Mono-Lingual System 12/2003

[Figure: system diagram. Recoverable labels: Question, Question Classification, Question Profile, Document Retrieval, Linguistic Processing & Extraction of Kernel Facts, Kernel Fact Ranking, Redundancy Removal, List of Responses, Proposition Finding, Co-reference, Relation Extraction, Name Annotation, Name Tagging, Treebank Parsing, Linguistically Motivated, Components of SERIF, Hand-crafted Patterns, Surface Structure Matching, Background Model]

Page 5: AQUA Cross-Lingual

• Architecture today implemented for English questions against Chinese data bases via analysis in Chinese

• Expansion later for

– Arabic documents

– Merging of answers from English, Arabic, and Chinese sources

[Figure: architecture diagram. Recoverable labels: (Later), English & Arabic, Database, Answer Generation, Transliteration, User Interface, Document Processing, SERIF, Machine Translation, Chinese Text, Translated English Text, Chinese Extraction Output, Analysis during Indexing, Responding to Questions]

Page 6: Outline

• Transliterating names from English to foreign language (part of question analysis)

• Performance of Chinese analysis components

• Machine translation currently used

• Example output

Page 7: Problems with Dialects in Transliteration

• Examples

– George Bush: 乔治 布什 (PRC), 乔治 布希 (Taiwan)

– Blair: 布莱尔 (PRC), 贝理雅 (Hong Kong)

• Even within a single dialect, there can be multiple transliterations in use

• Currently we use the PRC style of transliteration
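The regional variants above can be held in a small lookup table. This is a minimal sketch: the table layout, the region keys, and the function names are assumptions made for illustration, and only the four Chinese strings come from the slide. It defaults to the PRC style and can also return every listed variant, for example to expand a query across regions.

```python
from typing import List, Optional

# Variants copied from the slide; the table layout and region keys are hypothetical.
VARIANTS = {
    "George Bush": {"PRC": "乔治 布什", "Taiwan": "乔治 布希"},
    "Blair": {"PRC": "布莱尔", "Hong Kong": "贝理雅"},
}


def lookup_variant(name: str, region: str = "PRC") -> Optional[str]:
    """Return the listed variant for one region, or None if there is none."""
    return VARIANTS.get(name, {}).get(region)


def all_variants(name: str) -> List[str]:
    """Return every listed variant of a name, in table order."""
    return list(VARIANTS.get(name, {}).values())


print(lookup_variant("Blair"))
print(all_variants("George Bush"))
```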

Page 8: Transliteration Algorithm

• Given an English name E, the algorithm (Al-Onaizan, 2002) finds the Chinese name C that maximizes P(E|C) * P(C)

– English name E is segmented into phonemes (character sequences)

– Probabilities of phoneme mappings P(E|C) are learned from human-transliterated names

– Language model probability P(C) is compiled from a Chinese corpus

• Transliteration Training Data

– Person proper names

– Mandarin training data

– ~500k name pairs taken from Chinese - English Name Entity Lists (LDC2003E01 v1.beta)
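The search described on this slide can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the mapping table holds only the six segment probabilities shown on the examples slide, lm_prob is a placeholder that returns 1.0 because the deck gives no values for P(C), and the code enumerates every segmentation exhaustively where a real decoder would use dynamic programming over a learned model.

```python
from functools import lru_cache

# P(English segment | Chinese unit), copied from the examples slide.
CHANNEL = {
    "al": {"奥尔": 0.1648},
    "b": {"布": 0.5292},
    "righ": {"赖": 0.0113},
    "t": {"特": 0.5657},
    "po": {"鲍": 0.0069},
    "well": {"威尔": 0.0351},
}
MAX_SEG = max(len(seg) for seg in CHANNEL)


def lm_prob(chinese_units):
    """Placeholder for P(C); a real system would query a Chinese language model."""
    return 1.0


def transliterate(english):
    """Return (score, Chinese units) maximizing P(E|C) * P(C), or None."""
    english = english.lower()

    @lru_cache(maxsize=None)
    def candidates(start):
        # All (channel probability, units) pairs that cover english[start:].
        if start == len(english):
            return ((1.0, ()),)
        found = []
        for end in range(start + 1, min(len(english), start + MAX_SEG) + 1):
            for unit, p in CHANNEL.get(english[start:end], {}).items():
                for rest_p, rest_units in candidates(end):
                    found.append((p * rest_p, (unit,) + rest_units))
        return tuple(found)

    scored = [(p * lm_prob(units), units) for p, units in candidates(0)]
    return max(scored) if scored else None


score, units = transliterate("Powell")
print("".join(units), f"{score:.2e}")
```

With a real language model plugged into lm_prob, candidates that are equally plausible under the mapping table would be separated by how natural the Chinese character sequence is.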

Page 9: Statistical Transliteration: Examples

• Albright: 奥尔布赖特 5.5 * 10^-4

– 奥尔 :al 0.1648

– 布 :b 0.5292

– 赖 :righ 0.0113

– 特 :t 0.5657

• Powell: 鲍威尔 2.4 * 10^-4

– 鲍 :po 0.0069

– 威尔 :well 0.0351
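The score listed next to each name appears to be the product of its segment mapping probabilities. The check below multiplies the values from this slide; the reading that the score is this product alone, with no P(C) factor, is an inference from the numbers and is not stated in the deck.

```python
import math

# Segment probabilities copied from the slide.
albright = [0.1648, 0.5292, 0.0113, 0.5657]
powell = [0.0069, 0.0351]

print(f"Albright: {math.prod(albright):.2e}")  # 5.57e-04
print(f"Powell:   {math.prod(powell):.2e}")    # 2.42e-04
```

Both products agree with the listed 5.5 * 10^-4 and 2.4 * 10^-4 if the slide truncates to two significant digits.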

Page 10: Current Chinese Component Performance

Component        Test Set                  Recall  Precision  F/Value
Names            ACE evaluation TDT4 data  80%     77%        78.26 F
Descriptions     ACE evaluation TDT4 data  60%     76%        66.82 F
Entity Mentions  ACE evaluation TDT4 data  -       -          78.7 (value)
Entities         ACE evaluation TDT4 data  -       -          72.4 (value)
Parsing          Chinese Treebank          82.8%   81.3%      82.04 F

Represents state-of-the-art performance
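The F figures in the table are consistent with the balanced F-measure, the harmonic mean of recall and precision. That reading is an assumption, since the slide does not define F. The sketch below reproduces the parsing row exactly. The names and descriptions rows come out at about 78.47 and 67.06 from the whole-percent figures, slightly above the listed 78.26 and 66.82, presumably because the listed recall and precision are rounded.

```python
def f_measure(recall, precision):
    """Balanced F-measure: harmonic mean of recall and precision."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


print(round(f_measure(82.8, 81.3), 2))  # parsing row: 82.04
print(round(f_measure(80, 77), 2))      # names row from rounded inputs: 78.47
```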

Page 11: Machine Translation

• Statistical MT learns to translate new text based on existing text translated by humans

• Model of translation trained by GIZA++

– Freely available at www.informatik.rwthaachen.de/Colleagues/och/software/GIZA++.html

• Language Model trained using CMU Language Modelling Toolkit v2

• Translation was done by USC/ISI’s ReWrite decoder, version 1.0.0a

– Downloaded from http://www.isi.edu/licensed-sw/rewrite-decoder/
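To illustrate how a translation model can be learned from human-translated text, the sketch below runs expectation-maximization for IBM Model 1, the simplest of the word-alignment models that GIZA++ trains. It is not GIZA++ itself and omits the NULL word for brevity. The two sentence pairs are toy data invented for this example.

```python
from collections import defaultdict


def ibm_model1(pairs, iterations=10):
    """Estimate t(e | f) with EM for IBM Model 1 (no NULL word, for brevity)."""
    e_vocab = {e for es, _ in pairs for e in es}
    # Uniform initialization over the English vocabulary.
    t = defaultdict(lambda: 1.0 / len(e_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of each (e, f) link
        total = defaultdict(float)  # expected count of f taking part in a link
        for es, fs in pairs:
            for e in es:
                norm = sum(t[(e, f)] for f in fs)
                for f in fs:
                    frac = t[(e, f)] / norm
                    count[(e, f)] += frac
                    total[f] += frac
        # Re-estimate t(e | f) from the expected counts.
        for (e, f), c in count.items():
            t[(e, f)] = c / total[f]
    return t


# Toy parallel data (hypothetical): English words paired with Chinese words.
pairs = [
    (["president", "proposed"], ["总统", "建议"]),
    (["president"], ["总统"]),
]
t = ibm_model1(pairs)
print(round(t[("president", "总统")], 3))
print(round(t[("proposed", "建议")], 3))
```

The second sentence pair anchors "president" to 总统, and over the iterations that pushes "proposed" toward 建议 in the first pair.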

Page 12: MT Training Data

• Translation

– ~315k sentence pairs (~11m Chinese characters)

– Corpora: MTC-1 (Multiple Translation Chinese Corpus), Chinese-English Lexicon, Chinese Treebank, Hong Kong News, Hong Kong Hansards (proceedings of the Legislative Council of the HKSAR)

• Language Model

– Trigram language model

– ~60m English words

– Corpora: TDT-4 (English portion), North American News Text Corpus
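A trigram language model scores each word given the two words before it. The sketch below uses unsmoothed maximum-likelihood counts over a two-sentence toy corpus invented for this example. It does not use the CMU toolkit, and a model trained on ~60m words would need smoothing for unseen trigrams.

```python
from collections import Counter


def train_trigram(sentences):
    """Count trigrams and their two-word histories over padded sentences."""
    tri, hist = Counter(), Counter()
    for sent in sentences:
        words = ["<s>", "<s>"] + sent.split() + ["</s>"]
        for i in range(2, len(words)):
            tri[tuple(words[i - 2:i + 1])] += 1
            hist[tuple(words[i - 2:i])] += 1
    return tri, hist


def trigram_prob(tri, hist, w1, w2, w3):
    """Unsmoothed maximum-likelihood estimate of P(w3 | w1, w2)."""
    h = hist[(w1, w2)]
    return tri[(w1, w2, w3)] / h if h else 0.0


# Toy corpus (hypothetical).
corpus = [
    "he proposed a plan",
    "he proposed a meeting",
]
tri, hist = train_trigram(corpus)
print(trigram_prob(tri, hist, "proposed", "a", "plan"))  # 0.5
```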

Page 13: Steps to Improving MT Model

• Using GIZA++ & ReWrite

– Take advantage of full UN Parallel Corpus

– Tune training and decoding parameters

• Consider other MT systems

Page 14: Example Answer Nuggets

• Who is Colin Powell?

– Nugget from Copulas and Appositives: 前任 美国 参谋长 连席 会议 主席 鲍威尔 “former Chairman of the Joint Chiefs of Staff”

• Who is Kofi Annan?

– Nugget from Propositions: 他 建议 由 两 族 轮流 选派 总统 , 即 希腊族人 每 担任 两 任 总统 , 土耳其族人 担任 一任 总统 。 “He proposed that Greece and Turkey alternately hold the presidency (of Cyprus)”

– Nugget from Relations: 安理会 主席 “Chairman of the UN Security Council”

Page 15: User Interface

Page 16

Former Chairman of the Joint Chiefs (app)

Soon-to-be secretary of state, retired general Powell (app)

A candidate acceptable to Republicans and Democrats (copula)

The most likely candidate (copula)

National Security Advisor to President Reagan (prop)

General Powell will become the first black to be Secretary of State in US History (prop)

Powell served as commander of US forces in South Korea from 1973 to 1974. (prop)

Page 17

U.N. Secretary-General (app)

Anan is in Jerusalem for a diplomatic mission … (app)

U.N. Secretary-General (app)

Became the first U.N. Secretary-General to make a statement at the refugee meeting. (prop)

Proposed that Greece and Turkey alternately hold the presidency of Cyprus (prop)

Page 18: Concluding Comments

• Initial question answering from Chinese corpus implemented

• Opportunities for improvement in all components, including

– Transliteration

– Machine translation

– Passage retrieval

– Answer finding and generation

• Positive experience in transitioning English AQUA to

– AQUAINT testbed at MITRE

– Fairfield experiment

• Baseline participant in relationship pilot study

– No work on answering relationship questions

• Proposed pilot evaluation in spring 2005

• First step toward full goal of answer merging across

– English

– Arabic

– Chinese