Advanced Techniques for Answer Extraction and Formulation

Language Computer Corporation, www.languagecomputer.com, Dallas, Texas. PI: Dan Moldovan [email protected]

Page 1: Advanced Techniques for Answer Extraction and Formulation

Advanced Techniques for Answer Extraction and Formulation

Language Computer Corporation
www.languagecomputer.com
Dallas, Texas
PI: Dan Moldovan
[email protected]

Page 2: Advanced Techniques for Answer Extraction and Formulation

Tasks

Task 1. QA System Taxonomy
Task 2. Answer fusion
Task 3. Develop methods for on-line ontology construction
Task 4. Develop an inference engine capable of providing answer justification
Task 5. Formulate concise and coherent answers
Task 6. Explore new QA System Architectures

Page 3: Advanced Techniques for Answer Extraction and Formulation

Performance Analysis

Serial System Architecture

Question
M1: Keyword pre-processing (split/bind/spell)
M2: Construction of question representation
M3: Derivation of expected answer type
M4: Keyword selection
M5: Keyword expansion
M6: Actual retrieval of documents and passages
M7: Passage post-filtering
M8: Identification of candidate answers
M9: Answer ranking
M10: Answer formulation
Answer

Page 4: Advanced Techniques for Answer Extraction and Formulation

Performance Analysis

Distribution of Errors

Module  Module definition                                       Errors (%)
M1      Keyword pre-processing (split/bind/spell check)         1.9
M2      Construction of internal question representation        5.2
M3      Derivation of expected answer type                      36.4
M4      Keyword selection (incorrectly added or excluded)       8.9
M5      Keyword expansion desirable but missing                 25.7
M6      Actual retrieval (limit on passage number or size)      1.6
M7      Passage post-filtering (incorrectly discarded)          1.6
M8      Identification of candidate answers                     8.0
M9      Answer ranking                                          6.3
M10     Answer formulation                                      4.4
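
The error shares in the table above can be checked and ranked with a few lines of Python. This is only a sketch: the dictionary restates the percentages from the table, and the names `ERRORS` and `rank_modules` are illustrative, not part of the original system.

```python
# Error share (%) per module, copied from the table above.
ERRORS = {
    "M1": 1.9, "M2": 5.2, "M3": 36.4, "M4": 8.9, "M5": 25.7,
    "M6": 1.6, "M7": 1.6, "M8": 8.0, "M9": 6.3, "M10": 4.4,
}


def rank_modules(errors):
    """Return (module, share) pairs sorted from largest to smallest share."""
    return sorted(errors.items(), key=lambda kv: kv[1], reverse=True)


total = round(sum(ERRORS.values()), 1)
ranked = rank_modules(ERRORS)
top_two_share = round(ranked[0][1] + ranked[1][1], 1)

print("total:", total)
print("two largest sources:", ranked[:2], "combined:", top_two_share)
```

Ranking the shares this way makes it easy to see which modules dominate the error budget: answer-type derivation (M3) and keyword expansion (M5).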

Page 5: Advanced Techniques for Answer Extraction and Formulation

Performance Analysis

Impact of System Parameters

[Figure: Precision (MRR), axis range 0.34 to 0.43, plotted against Nd = 20, 50, 200, with one curve each for Np = 50, Np = 200 and Np = 500.]

Nd: maximum number of documents retrieved
Np: maximum number of passages processed
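
Precision in these slides is reported as mean reciprocal rank (MRR). The sketch below shows one common way to compute it, assuming each question contributes 1/rank of its first correct answer and 0 when no correct answer is returned. The cutoff of five answers per question is an assumption borrowed from the TREC QA evaluations, not something stated on the slide, and the function name is illustrative.

```python
def mean_reciprocal_rank(first_correct_ranks, cutoff=5):
    """Average of 1/rank over all questions.

    first_correct_ranks holds, per question, the 1-based rank of the first
    correct answer, or None if no correct answer was returned.
    Ranks beyond the (assumed) cutoff score 0.
    """
    if not first_correct_ranks:
        return 0.0
    total = 0.0
    for rank in first_correct_ranks:
        if rank is not None and rank <= cutoff:
            total += 1.0 / rank
    return total / len(first_correct_ranks)


# Five hypothetical questions: correct at rank 1, 2, never, 5, and 7 (past the cutoff).
example_ranks = [1, 2, None, 5, 7]
print(mean_reciprocal_rank(example_ranks))
```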

Page 6: Advanced Techniques for Answer Extraction and Formulation

Performance Analysis

Impact of System Parameters

[Figure: Precision (MRR) and Time (sec) plotted against Sp (nr. extra lines).]

Sp values: ±3, ±6, ±10, ±20, ±40
Precision (MRR) data labels: 0.4, 0.411, 0.421, 0.401, 0.387
Time (sec) data labels: 32, 43, 59, 110, 265

Sp: size of retrieved passage

Page 7: Advanced Techniques for Answer Extraction and Formulation

Performance Analysis

Architecture with Feedbacks

Question
M1+M2+M3+M4
M5 + lexico-semantic alternations
M6
M7+M8
Logic Proving
M9+M10
Answer

Feedback loops: Loop 1, Loop 2, Loop 3

Page 8: Advanced Techniques for Answer Extraction and Formulation

Performance Analysis

Impact of System Parameters

Feedback added               Precision (MRR)   Incremental enhancement
none                         0.421 = b         0%
Passage retrieval (loop 1)   0.468 = b1        b + 11%
Lexico-semantic (loop 2)     0.542 = b2        b1 + 15%
Proving (loop 3)             0.572 = b3        b2 + 5%
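
The incremental enhancements in the feedback table are relative gains over the previous row. A short sketch of that arithmetic follows, using only the MRR values from the table; the names are illustrative. The slide's whole-number percentages correspond to these gains with the fractional part dropped.

```python
# MRR after each feedback loop is added, copied from the table.
MRR = [("none", 0.421), ("loop 1", 0.468), ("loop 2", 0.542), ("loop 3", 0.572)]


def relative_gains(series):
    """Percent gain of each entry over the entry before it."""
    gains = []
    for (_, previous), (name, current) in zip(series, series[1:]):
        gains.append((name, 100.0 * (current - previous) / previous))
    return gains


for name, gain in relative_gains(MRR):
    print(f"{name}: +{gain:.1f}%")

# Gain of the full three-loop architecture over the serial baseline b.
overall = 100.0 * (MRR[-1][1] - MRR[0][1]) / MRR[0][1]
print(f"overall: +{overall:.1f}%")
```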

Page 9: Advanced Techniques for Answer Extraction and Formulation

On-line Ontology Construction

Discover Concepts
Step 1: Pick a set of related seed concepts
Step 2: Form a corpus of N sentences that contain at least one of the seeds
Step 3: Parse the sentences in the corpus and extract the NPs that contain the seeds
Step 4: Apply filtering procedures that accept or reject new concepts
Step 5: Form an ontology: classify new concepts using subsumption
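
Steps 1 to 4 can be sketched in Python as follows. This is a toy approximation under stated assumptions, not the system's implementation: there is no real parser (the NP is approximated by the few words just before the seed), the stop list `REJECT_MODIFIERS` is hypothetical, and the sample sentences are shortened or adapted from the examples in this deck.

```python
import re

SEEDS = ["terrorist group"]  # Step 1: seed concepts

SENTENCES = [
    "All the suicide terrorist groups have support infrastructures in Europe.",
    "The weather was mild in Dallas.",
    "Many writers were attacked by islamist fundamentalist terrorist groups.",
]

# Hypothetical stop list used by the Step 4 filter.
REJECT_MODIFIERS = {"the", "all", "a", "an", "some", "other", "by", "many"}


def build_corpus(sentences, seeds):
    """Step 2: keep sentences that mention at least one seed."""
    return [s for s in sentences if any(seed in s.lower() for seed in seeds)]


def extract_candidates(sentence, seed, max_modifiers=3):
    """Step 3 (approximation): collect up to max_modifiers words before the seed."""
    words = re.findall(r"[a-z\-]+", sentence.lower())
    seed_words = seed.split()
    n = len(seed_words)
    found = []
    for i in range(len(words) - n + 1):
        window = words[i:i + n]
        # Crude plural handling: "groups" matches the seed head "group".
        if window[:-1] == seed_words[:-1] and window[-1].rstrip("s") == seed_words[-1]:
            found.append((words[max(0, i - max_modifiers):i], seed))
    return found


def filter_concept(modifiers, seed):
    """Step 4: keep modifiers adjacent to the seed, stopping at a rejected word."""
    kept = []
    for word in reversed(modifiers):
        if word in REJECT_MODIFIERS:
            break
        kept.insert(0, word)
    return " ".join(kept + [seed]) if kept else None


def discover_concepts(sentences, seeds):
    concepts = set()
    for sentence in build_corpus(sentences, seeds):
        for seed in seeds:
            for modifiers, matched_seed in extract_candidates(sentence, seed):
                concept = filter_concept(modifiers, matched_seed)
                if concept:
                    concepts.add(concept)
    return sorted(concepts)


print(discover_concepts(SENTENCES, SEEDS))
```

A window of preceding words is a weak stand-in for a parse, so a real pipeline would replace `extract_candidates` with NP chunks from a syntactic parser and keep the filtering step as the place where concepts are accepted or rejected.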

Page 10: Advanced Techniques for Answer Extraction and Formulation

On-line Ontology Construction

Discover Semantic Relations
Step 1: Select the semantic relation R
Step 2: Pick pairs of concepts among which R holds
Step 3: Form a corpus such that each sentence contains one pair of concepts
Step 4: Extract lexico-syntactic patterns between concepts: Ci P Cj
Step 5: Apply semantic constraints determined a priori and decide whether or not the pattern Ci P Cj is a semantic relation R
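
Steps 2 to 4 can be illustrated with a small Python sketch that collects the text P found between the two concepts of each known pair. The concept pairs and sentences below are hypothetical examples chosen for the cause relation, and the function names are illustrative. Step 5, the semantic-constraint check, is not covered here.

```python
import re
from collections import Counter

# Step 2: concept pairs for which the relation R (here: cause) is assumed to hold.
PAIRS = [("smoking", "cancer"), ("stress", "headache")]

# Step 3: toy corpus in which each sentence contains one of the pairs.
CORPUS = [
    "Smoking causes cancer in many patients.",
    "Stress leads to headache and fatigue.",
    "Doctors agree that smoking causes cancer.",
]


def extract_pattern(sentence, ci, cj):
    """Step 4: return the words P found between Ci and Cj, or None."""
    match = re.search(
        rf"\b{re.escape(ci)}\b\s+(.+?)\s+\b{re.escape(cj)}\b",
        sentence,
        flags=re.IGNORECASE,
    )
    return match.group(1).lower() if match else None


def collect_patterns(corpus, pairs):
    """Count how often each pattern P occurs across the corpus."""
    counts = Counter()
    for sentence in corpus:
        for ci, cj in pairs:
            pattern = extract_pattern(sentence, ci, cj)
            if pattern:
                counts[pattern] += 1
    return counts


print(collect_patterns(CORPUS, PAIRS).most_common())
```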

Page 11: Advanced Techniques for Answer Extraction and Formulation

Extracting Concepts

Methods:
1. From NPs that contain the seed.

Many of his fellow writer friends have been assassinated by islamist fundamentalist terrorist groups during the same years, in the nineties.

All the suicide terrorist groups have support infrastructures in Europe and in North America.

[Diagram: “is a” links to the seed concept "terrorist group".]

Page 12: Advanced Techniques for Answer Extraction and Formulation

Extracting Concepts (cont.)

2. From lexico-syntactic patterns containing the seed.
2.1 Via subsumption

Some domestic U.S. terrorist groups, including the Aryan Nation and the Phineas Priesthood, and some militia members are also religiously motivated in addition to being driven by a hatred of the federal government.

Terrorist groups including bin Laden's, Hamas, Hizbollah, etc. in concert with Sudan, Iran and Iraq, form alliance, to be called "Jerusalem Foundation", to coordinate global activities.

Religiously motivated terrorist groups, such as Usama bin Ladin's group, al-Qaida, which is believed to have bombed the U.S. Embassies in Africa, represent a growing trend toward hatred of the United States.

[Diagram: “is a” links to the seed concept "terrorist group".]
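
The subsumption examples rely on cue phrases such as "including" and "such as" after the seed. A minimal Python sketch of that idea is shown below. It assumes the list of names ends at the next comma or period and is joined by "and", which is a simplification; the test sentences are shortened versions of the slide examples, and the function name is illustrative.

```python
import re

SEED = "terrorist groups"


def hyponyms_from_pattern(sentence, seed=SEED):
    """Return names listed after '<seed> including ...' or '<seed>, such as ...'.

    Simplification: the list is taken to run up to the next comma or period.
    """
    match = re.search(
        rf"{re.escape(seed)}\s*,?\s*(?:including|such as)\s+([^,.]+)",
        sentence,
        flags=re.IGNORECASE,
    )
    if not match:
        return []
    items = re.split(r"\s+and\s+", match.group(1).strip())
    # Drop a leading article so "the Aryan Nation" becomes "Aryan Nation".
    return [re.sub(r"^the\s+", "", item, flags=re.IGNORECASE) for item in items]


def is_a_links(sentence, seed=SEED):
    """Build (concept, 'is a', seed) triples from one sentence."""
    return [(name, "is a", seed) for name in hyponyms_from_pattern(sentence, seed)]


s1 = ("Some domestic U.S. terrorist groups, including the Aryan Nation and "
      "the Phineas Priesthood, are also religiously motivated.")
s2 = "Religiously motivated terrorist groups, such as al-Qaida, represent a growing trend."

print(is_a_links(s1))
print(is_a_links(s2))
```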

Page 13: Advanced Techniques for Answer Extraction and Formulation

Extracting Concepts (cont.)

2.2 Via lexical parallelism

During the same period, Erbakan and Refah leaders pledged their support for Hamas and other fundamentalist terrorist groups seeking to halt the Middle East peace process and to overthrow Egypt's secular government.

[Diagram: “is a” links to the seed concept "terrorist group".]

Page 14: Advanced Techniques for Answer Extraction and Formulation

Power Ontology Tool

Page 15: Advanced Techniques for Answer Extraction and Formulation

Ontology Snapshot

Concepts shown in the snapshot:
- terrorist group
- fundamentalist terrorist group
- Islamic terrorist group
- islamist fundamentalist terrorist group
- national Islamic terrorist group
- Palestinian Islamic terrorist group
- American terrorist group
- Hamas
- Hizbollah

Number of concepts automatically identified: 107
Number of concepts rejected interactively: 25
Number of concepts collected and classified: 107 - 25 = 82
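
One simple way to classify multi-word concepts by subsumption is to attach each concept to the longest other concept that forms its word-level suffix, so that "islamist fundamentalist terrorist group" falls under "fundamentalist terrorist group". The sketch below illustrates that heuristic with the concept names from the snapshot. It is an assumption made for illustration, not a description of how the Power Ontology Tool works, and it cannot place named instances such as Hamas, which carry no head noun.

```python
CONCEPTS = [
    "terrorist group",
    "fundamentalist terrorist group",
    "Islamic terrorist group",
    "islamist fundamentalist terrorist group",
    "national Islamic terrorist group",
    "Palestinian Islamic terrorist group",
    "American terrorist group",
]


def parent_of(concept, concepts):
    """Return the longest other concept that is a word-level suffix of concept."""
    words = concept.lower().split()
    best = None
    for other in concepts:
        other_words = other.lower().split()
        is_shorter = len(other_words) < len(words)
        if is_shorter and words[-len(other_words):] == other_words:
            if best is None or len(other_words) > len(best.split()):
                best = other
    return best


def build_hierarchy(concepts):
    """Map every concept to its parent (None for the root)."""
    return {concept: parent_of(concept, concepts) for concept in concepts}


for child, parent in build_hierarchy(CONCEPTS).items():
    print(f"{child} -> {parent}")
```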

Page 16: Advanced Techniques for Answer Extraction and Formulation

Overall Results
Building a Corpus from the Web

Seed                         Total time (1)  # hits from SE (2)  Sent. ret. (3)  Base NPs (4)  Collected concepts (5)
Asian Countries              33 min          112756              876             467           34
cosmographers                8 min           210                 116             65            17
Eastern European countries   26 min          24523               468             245           15
Explosives                   53 min          168793              1408            840           57
Grenades                     47 min          77137               1334            866           92
Microsoft products           25 min          72951               762             454           48
Operating systems            39 min          1240119             839             529           58
Search engines               63 min          1869723             2000            871           60
Sports cars                  39 min          62311               493             221           40
Terrorist groups             54 min          22683               855             560           82

(1) Total time
(2) Number of hits returned by search engine
(3) Number of sentences retained
(4) Number of base NPs containing the seed identified in documents (including duplicates)
(5) Number of collected concepts

Page 17: Advanced Techniques for Answer Extraction and Formulation

Semantic Constraints Imposed by Causation

Greenspan makes a recession

Greenspan makes a mistake

Page 18: Advanced Techniques for Answer Extraction and Formulation

Semantic Constraints Imposed by Causation

Focus on < NP1 verb NP2 >

NP1: a hyponym of causal agent
verb: senses of verbs that mean causation
NP2: a hyponym of a causation class
- Human action
- Phenomenon
- State
- Psychological feature
- Event
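
The three constraints on < NP1 verb NP2 > can be expressed as a simple check. In the Python sketch below, the lexicon is a hypothetical toy stand-in for a lexical resource such as WordNet: the noun classes are assumptions made for illustration, and the verb sense numbers follow the Greenspan examples in this deck (make v#5 glossed as "cause to, do, make", make v#1 as "make, do"). Word-sense disambiguation is assumed to have been done already.

```python
# Causation classes allowed for NP2, as listed on the slide.
CAUSATION_CLASSES = {
    "human action", "phenomenon", "state", "psychological feature", "event",
}

# Hypothetical: verb senses treated as causal (sense number from the deck's example).
CAUSAL_VERB_SENSES = {("make", 5)}

# Hypothetical toy noun lexicon; a real system would derive these from hypernym chains.
NOUN_CLASS = {
    "Greenspan": "causal agent",
    "recession": "state",
    "mistake": "human action",
}


def is_causation(np1, verb, sense, np2):
    """Apply the three constraints on <NP1 verb NP2>."""
    return (
        NOUN_CLASS.get(np1) == "causal agent"
        and (verb, sense) in CAUSAL_VERB_SENSES
        and NOUN_CLASS.get(np2) in CAUSATION_CLASSES
    )


print(is_causation("Greenspan", "make", 5, "recession"))
print(is_causation("Greenspan", "make", 1, "mistake"))
```

In this toy setup the two Greenspan sentences differ only in the verb sense, which is why the second one is rejected even though its NP2 belongs to an allowed class.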

Page 19: Advanced Techniques for Answer Extraction and Formulation

Semantic Constraints Imposed by Causation

Greenspan makes a recession
causal agent   make v#5 (cause to, do, make)   state

Greenspan makes a mistake
causal agent   make v#1 (make, do)   state

Page 20: Advanced Techniques for Answer Extraction and Formulation

Answer Fusion

Study answer fusion at various levels of complexity
- Questions asking simple facts
    What countries import sugar from Cuba?
- Questions that require on-line ontology development
    What software products does Microsoft sell?
    What causes asthma?
    What are the effects of alcohol on the brain?
- Speculative questions about future events
    Where will Al Qaeda strike next?

Page 21: Advanced Techniques for Answer Extraction and Formulation

Answer Fusion

Answers are extracted by building an ontology on-line

Cause/effect ontology
Q: What causes hypertension?

hypertension, high blood pressure

Diagram nodes (in extraction order):
overwork, virus, fat, overindulgence, obesity, TV watching, environmental factors, exhaustion, chronic fatigue syndrome, alcohol, alcohol, dehydration, laxative abuse, bacteria, atherosclerosis, caffeine, food poisoning, viruses, alcohol, Salmonella, anger, high salt intake, smoking

Page 22: Advanced Techniques for Answer Extraction and Formulation

Answer Fusion

Cause/effect ontology
Q: What are the effects of stress?

stress, tension

Diagram nodes (in extraction order):
hair loss, absenteeism, gastrointestinal tract disorders, illness, nerve damage, headache, physical problems, hyperactive behavior, reading inability, drug abuse, substance abuse, money spending, depression, homelessness, suicide attempt, weight loss, fatigue, reduced resistance to disease

Page 23: Advanced Techniques for Answer Extraction and Formulation

Answer Fusion

Part-whole meronomy ontology
< NP1 have NP2 >   car has clutch
< NP2’s NP1 >      John’s hand
< NP1 of NP2 >     leg of a table

Q: What does the AH-64A Apache helicopter consist of?

AH-64A Apache helicopter
- Hellfire air-to-surface missile
- millimeter wave seeker
- 70mm Folding Fin Aerial rocket
- 30mm Cannon
- camera
- Armaments
- General Electric 1700-GE engine
- 4-rail launchers
- Four-bladed main rotor
- Anti-tank laser guided missile
- Longbow millimeter wave fire control radar
- integrated radar frequency interferometer
- Rotating turret
- Tandem cockpit
- Kevlar seats
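
The three part-whole patterns can be tried out with regular expressions. The Python sketch below is a simplification that handles only single-word noun phrases with an optional article, and each rule returns a (whole, part) pair. The regexes and names are illustrative assumptions, not the patterns used by the original system.

```python
import re

# Each entry: (compiled pattern, function turning a match into (whole, part)).
PATTERNS = [
    # < NP1 have NP2 >, e.g. "car has clutch": NP1 is the whole.
    (re.compile(r"^(\w+) (?:has|have) (?:an? |the )?(\w+)$"),
     lambda m: (m.group(1), m.group(2))),
    # < NP2's NP1 >, e.g. "John's hand": the possessor is the whole.
    (re.compile(r"^(\w+)[’']s (\w+)$"),
     lambda m: (m.group(1), m.group(2))),
    # < NP1 of NP2 >, e.g. "leg of a table": NP2 is the whole.
    (re.compile(r"^(\w+) of (?:an? |the )?(\w+)$"),
     lambda m: (m.group(2), m.group(1))),
]


def part_whole(phrase):
    """Return (whole, part) if the phrase matches one of the patterns, else None."""
    for regex, to_pair in PATTERNS:
        match = regex.match(phrase)
        if match:
            return to_pair(match)
    return None


for phrase in ["car has clutch", "John's hand", "leg of a table"]:
    print(phrase, "->", part_whole(phrase))
```

Surface patterns like these over-generate ("John has time" is not a part-whole relation), so they would still need a semantic filter before their output is added to a meronymy ontology.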

Page 24: Advanced Techniques for Answer Extraction and Formulation

Answer Fusion

Questions with multiple ontologies
Q: What terrorist groups are in Asia?
- Build an ontology for terrorist groups
- Build an ontology for Asian countries
- Generate specific queries with combinations between two ontologies

[Diagram: the "terrorist groups" ontology combined with the "Asian countries" ontology.]
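
Generating queries from combinations of two ontologies amounts to taking a Cartesian product of their concepts. A minimal Python sketch follows. The group names come from the ontology snapshot earlier in the deck, while the country names and the quoted-phrase query format are hypothetical placeholders.

```python
from itertools import product

# Sample concepts; real lists would come from the two on-line ontologies.
TERRORIST_GROUPS = ["Hamas", "Hizbollah"]
ASIAN_COUNTRIES = ["Japan", "India"]  # hypothetical sample entries


def generate_queries(groups, countries):
    """Pair every concept from one ontology with every concept from the other."""
    return [f'"{group}" "{country}"' for group, country in product(groups, countries)]


queries = generate_queries(TERRORIST_GROUPS, ASIAN_COUNTRIES)
for query in queries:
    print(query)
```

The number of queries grows as the product of the two ontology sizes, so with the 82 terrorist-group concepts and 34 Asian-country concepts reported in the results table a full cross product would already yield 2788 queries.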

Page 25: Advanced Techniques for Answer Extraction and Formulation

Thank you!

Papers:
- Moldovan, Pasca, Surdeanu, Harabagiu, “Performance Issues and Error Analysis in an Open-Domain QA System”, ACL 2002.
- Girju, Moldovan, “Mining Answers for Causation Questions”, AAAI Spring Symposium 2002.
- Moldovan, Novischi, “Lexical Chains for Question Answering”, COLING 2002.