Lymba (QA System)
Fabian Schillinger
15.01.2014
University of Freiburg
Chair of Algorithms and Data Structures
Seminar Information Extraction
Lymba – Fabian Schillinger 2/26
TREC 2007
● Question answering track
● Blog data & newswire documents
● Factoid questions
  "How many calories are there in a Big Mac?"
● List questions
  "List the names of chewing gums."
● "Other" questions
  interesting facts about some target
PowerAnswer 4 Overview
● Distributed, strategy-based QA system
● A strategy is a collection of the components
  ● Question Processing (QP)
  ● Passage Retrieval (PR)
  ● Answer Processing (AP)
PowerAnswer 4 Overview
● Question Processing (QP)
  ● determines temporal constraints
  ● detects the expected answer type
  ● selects the keywords used in retrieving relevant passages
  ● decides which question class to use
PowerAnswer 4 Overview
● Passage Retrieval (PR)
  ● ranks the passages retrieved by the IR system
PowerAnswer 4 Overview
● Answer Processing (AP)
  ● extracts and scores the candidate answers
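The idea of a strategy as a collection of QP, PR and AP components can be sketched roughly as follows. This is only an illustration, not PowerAnswer's actual design: the `Strategy` class, the toy components, the stopword list and the sample passages are all made up for this example.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ProcessedQuestion:
    text: str
    answer_type: str      # expected answer type, e.g. "NUMBER"
    keywords: List[str]   # keywords used for passage retrieval


@dataclass
class Strategy:
    """Hypothetical container: one implementation of each component."""
    question_processing: Callable[[str], ProcessedQuestion]
    passage_retrieval: Callable[[ProcessedQuestion, List[str]], List[str]]
    answer_processing: Callable[[ProcessedQuestion, List[str]], List[str]]

    def answer(self, question: str, corpus: List[str]) -> List[str]:
        pq = self.question_processing(question)
        passages = self.passage_retrieval(pq, corpus)
        return self.answer_processing(pq, passages)


# Assumed stopword list, far smaller than anything a real system would use.
STOPWORDS = {"how", "many", "are", "there", "in", "a", "the", "what", "is"}


def toy_qp(question: str) -> ProcessedQuestion:
    words = re.findall(r"\w+", question.lower())
    keywords = [w for w in words if w not in STOPWORDS]
    answer_type = "NUMBER" if question.lower().startswith("how many") else "OTHER"
    return ProcessedQuestion(question, answer_type, keywords)


def toy_pr(pq: ProcessedQuestion, corpus: List[str]) -> List[str]:
    def overlap(passage: str) -> int:
        tokens = set(re.findall(r"\w+", passage.lower()))
        return sum(1 for k in pq.keywords if k in tokens)

    # Rank passages by keyword overlap and drop those with no overlap.
    ranked = sorted(corpus, key=overlap, reverse=True)
    return [p for p in ranked if overlap(p) > 0]


def toy_ap(pq: ProcessedQuestion, passages: List[str]) -> List[str]:
    # For NUMBER questions, extract digit sequences as candidate answers.
    if pq.answer_type == "NUMBER":
        return [m for p in passages for m in re.findall(r"\d+", p)]
    return passages


strategy = Strategy(toy_qp, toy_pr, toy_ap)
corpus = [
    "A Big Mac has 550 calories.",          # made-up passage for illustration
    "Chewing gum comes in several flavours.",
]
answers = strategy.answer("How many calories are there in a Big Mac?", corpus)
print(answers)
```

Swapping in a different function for any one of the three slots yields a different strategy without touching the other two.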
Question Answering over Blog Data
● 175 GB of blog data
● with surrounding HTML/XML
● parsed to identify unique content
● language detection to remove non-English text, spam and empty entries
● 13.1 GB of data (92.5% reduction)
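The stated reduction can be checked from the two sizes on the slide. The helper name `reduction_percent` is made up for this check.

```python
def reduction_percent(original_gb: float, remaining_gb: float) -> float:
    """Percentage of the original volume that was removed."""
    return (1 - remaining_gb / original_gb) * 100


# 175 GB of raw blog data, 13.1 GB after cleaning (figures from the slide).
print(round(reduction_percent(175, 13.1), 1))  # 92.5
```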
Temporal Event Processing
● Concept Tagger Module
  ● Detects events in question or candidate passage
  ● Labels them

Class        Event   Question
Occurrence   marry   Who is he planning to marry?
State        held    In what city were … held?
Aspectual    begin   On what date did the court begin?
Temporal Event Processing
● Concept Tagger Module
  ● Detects events in question or candidate passage
  ● Labels them
  ● Identifies temporal expressions
    – Absolute dates
    – Times
    – Durations

Type            Expression     Question
Absolute Date   2004           What company acquired IMG in 2004?
Duration        Three months   In three months following ...
Sets            Each year      How many grants … each year?
Temporal Event Processing
● Concept Tagger Module
  ● Uses a set of rules working on the full parse tree of the text
  ● All temporal expressions are normalized

Q249.5: How many grants does the Fulbright Program award each year?
P: The program named after the former Senator J. William Fulbright awards approximately 4,500 new grants annually.
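Normalization is what lets "each year" in the question match "annually" in the passage. A minimal sketch, assuming surface-level regular expressions instead of the parse-tree rules the system uses; the `("SET", "P1Y")` tuples borrow ISO 8601 duration notation and are an assumed output format, not the Concept Tagger's.

```python
import re

# Assumed mini-lexicon: surface forms mapped to a normalized (type, value) pair.
SET_PATTERNS = [
    (re.compile(r"\b(?:each|every) year\b|\bannually\b|\byearly\b", re.I), ("SET", "P1Y")),
    (re.compile(r"\b(?:each|every) month\b|\bmonthly\b", re.I), ("SET", "P1M")),
]
# Four-digit years from 1000 to 2099, treated as absolute dates.
YEAR = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")


def normalize_temporal(text: str):
    """Return normalized temporal expressions found in the text."""
    found = []
    for pattern, normalized in SET_PATTERNS:
        if pattern.search(text):
            found.append(normalized)
    for match in YEAR.finditer(text):
        found.append(("DATE", match.group(1)))
    return found


question = "How many grants does the Fulbright Program award each year?"
passage = ("The program named after the former Senator J. William Fulbright "
           "awards approximately 4,500 new grants annually.")

# Both sides normalize to the same value, so they can be matched.
print(normalize_temporal(question))
print(normalize_temporal(passage))
```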
ROSE – a new NER System
● Uses 3-step process:
  ● Pass text through pattern-based grammar system
  ● Pass grammar-annotated data through ML system
  ● Perform partial matching on the text
Answer Likelihood for Factoid Answer Selection
● Goal
  ● Group questions from previous TRECs into classes
  ● Build language model for each class on features extracted from question and answer
Answer Likelihood for Factoid Answer Selection
● Three methods
  ● Generate REGEX-style paraphrases and group them together by paraphrase identifiers
  ● Use hierarchical clustering based on
    – Expected answer type
    – Most relevant keywords
    – Named entity types
  ● Group by answer type
Answer Likelihood for Factoid Answer Selection
● For the questions in each class and their correctly judged answers, the following features were extracted:
  ● Stemmed keywords
  ● Morphological alternations for keywords
  ● Named entity tags
Answer Likelihood for Factoid Answer Selection
● Implemented in two stages:
  ● During question processing
  ● During answer processing
● Use score of answer likelihood to re-rank candidates
● Best results observed when grouping the questions by answer type
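Re-ranking by answer likelihood could look roughly like the sketch below. The slides do not say which kind of language model was used, so the add-one-smoothed unigram model, the feature strings and the training data here are all assumptions made for illustration.

```python
import math
from collections import Counter


class ClassLanguageModel:
    """Unigram model over features, one instance per question class (assumed design)."""

    def __init__(self, training_features):
        # training_features: one feature list per (question, correct answer) pair
        self.counts = Counter(f for feats in training_features for f in feats)
        self.total = sum(self.counts.values())
        self.vocab_size = len(self.counts) + 1  # +1 reserves mass for unseen features

    def log_likelihood(self, features):
        """Mean add-one-smoothed log-probability of the features."""
        if not features:
            return float("-inf")
        total = sum(
            math.log((self.counts[f] + 1) / (self.total + self.vocab_size))
            for f in features
        )
        return total / len(features)


def rerank(candidates, model):
    """Sort (answer, features) pairs by descending answer likelihood."""
    return sorted(candidates, key=lambda c: model.log_likelihood(c[1]), reverse=True)


# Made-up training features for a hypothetical NUMBER answer-type class.
number_model = ClassLanguageModel([
    ["calori", "NE:NUMBER"],
    ["grant", "NE:NUMBER"],
    ["year", "NE:NUMBER"],
])

candidates = [
    ("Fulbright", ["grant", "NE:PERSON"]),
    ("4,500", ["grant", "NE:NUMBER"]),
]
reranked = rerank(candidates, number_model)
print([answer for answer, _ in reranked])
```

A candidate carrying the named entity tag that dominates its class scores higher, which mirrors the grouping by answer type.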
List Questions
● Strategy
  ● Try to maximize recall by returning as many answers as possible during passage retrieval using
    – Lexico-semantic alternations
    – Relaxing the query to include
      ● Target keywords
      ● Most relevant keywords from primary question text
  ● But how to filter answers?
    – Use lists from Wikipedia
    – Integrate COGEX
List Questions
● External data for specialized answer types
  ● Bots for Amazon.com, imdb.com if question was in domain of books, songs or movies
  ● Bot for Wikipedia
    – Google "I'm feeling lucky" to locate relevant articles
List Questions
● COGEX for list questions
  ● Potential candidates come from passage retrieval
  ● Each candidate is hypothesized to be an answer
  ● COGEX checks if the assertion is entailed by the corresponding candidate answer passage
  ● Only candidates with an entailment score over some threshold are returned as valid answers
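The filtering step can be sketched as below. COGEX is a logic prover; the token-overlap score here is only a crude stand-in for its entailment score, and the template, the threshold of 0.75 and the example passages are assumptions for illustration.

```python
import re


def tokens(text: str) -> set:
    return set(re.findall(r"\w+", text.lower()))


def toy_entailment_score(hypothesis: str, passage: str) -> float:
    """Stand-in score: fraction of hypothesis tokens found in the passage."""
    hyp = tokens(hypothesis)
    if not hyp:
        return 0.0
    return len(hyp & tokens(passage)) / len(hyp)


def filter_candidates(template: str, candidates, threshold: float = 0.75):
    """Keep candidates whose hypothesis scores over the threshold."""
    kept = []
    for answer, passage in candidates:
        # Hypothesize that the candidate is an answer to the list question.
        hypothesis = template.format(answer=answer)
        if toy_entailment_score(hypothesis, passage) > threshold:
            kept.append(answer)
    return kept


# Candidates as (answer, passage) pairs; passages are illustrative only.
candidates = [
    ("Trident", "Trident is a sugar-free chewing gum."),
    ("Oreo", "Oreo is a sandwich cookie."),
]
valid = filter_candidates("{answer} is a chewing gum", candidates)
print(valid)
```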
"Other" Questions
● The challenge is the selection of interesting and novel nuggets from a large corpus
● Definition pattern matching module
  ● List of over 200 positive and negative pre-computed patterns

Extended by
● Hierarchy of nugget patterns and automatically derived generic answer patterns
"Other" Questions
● Nugget hierarchy based on question classes from previous TREC question sets
● 35 target classes
  – Animal, actor, musician, literature...
● Each class is associated with a set of minimal information
  – Person pattern: full name, birth, death, place of birth, residence, occupation, etc.
  – Event: begin time, end time, duration, location, participants, etc.
"Other" Questions
● Example patterns:
  ● _nationality _profession _var
    German chancellor Angela Merkel
  ● _var ( _nationality _profession
    Angela Merkel (Germany's chancellor
  ● _nationality _JJ _profession _var
    Germany's first female chancellor Angela Merkel
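One way to read these patterns is as templates whose slots expand to sub-expressions. The sketch below compiles them to regular expressions; the slot lexicons (`SLOTS`) are tiny made-up stand-ins for whatever resources the real module uses, and `_var` is approximated as a run of capitalized words.

```python
import re

# Hypothetical slot definitions; real lexicons would be far larger.
SLOTS = {
    "_nationality": r"\b(?P<nationality>Germany's|German|France's|French)",
    "_profession": r"\b(?P<profession>chancellor|president|actor)\b",
    "_JJ": r"\b(?P<adjectives>\w+(?: \w+)*?)",
    "_var": r"\b(?P<var>[A-Z]\w+(?: [A-Z]\w+)*)",
}

PATTERNS = [
    "_nationality _profession _var",
    "_var ( _nationality _profession",
    "_nationality _JJ _profession _var",
]


def compile_pattern(pattern: str):
    """Expand slot names; any other token (such as '(') is matched literally."""
    parts = [SLOTS.get(tok, re.escape(tok)) for tok in pattern.split()]
    return re.compile(r"\s*".join(parts))


COMPILED = [(p, compile_pattern(p)) for p in PATTERNS]


def extract(text: str):
    """Return the first matching pattern and its slot values, or None."""
    for pattern, regex in COMPILED:
        match = regex.search(text)
        if match:
            return {"pattern": pattern, **match.groupdict()}
    return None


for example in [
    "German chancellor Angela Merkel",
    "Angela Merkel (Germany's chancellor",
    "Germany's first female chancellor Angela Merkel",
]:
    print(extract(example))
```

In each of the three slide examples the `_var` slot captures the target, with the remaining slots supplying the nugget content.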
Results
● Factoid answer selection

Run Tag     Submitter                       Accuracy
LymbaPA07   Lymba Corporation               0.706
LCCFerret   Language Computer Corporation   0.494
lsv2007c    Saarland University             0.289
Results
● List questions

Run Tag     Submitter                                       F-Score
LymbaPA07   Lymba Corporation                               0.479
LCCFerret   Language Computer Corporation                   0.324
ILQUA1      State University of New York (SUNY) at Albany   0.147
Results
● "Other" questions

Run Tag     Submitter                       F-Score (beta=3)
FDUQAT16B   Fudan University                0.329
lsv2007c    Saarland University             0.299
QASCU2      Concordia University            0.281
LymbaPA07   Lymba Corporation               0.281
LCCFerret   Language Computer Corporation   0.261
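For reference, the general F-measure with beta=3 weights recall more heavily than precision. The sketch below shows only the standard formula; how precision and recall are computed over nuggets is not covered by the slides, and the sample values are arbitrary.

```python
def f_beta(precision: float, recall: float, beta: float = 3.0) -> float:
    """F = (1 + beta^2) * P * R / (beta^2 * P + R); 0.0 if both are zero."""
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# With beta=3, high recall is rewarded much more than high precision.
print(f_beta(0.2, 0.8) > f_beta(0.8, 0.2))  # True
```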
Thank you for your attention.
Any questions?