

Question Answering System

A Seminar by JISH MON P.A, 2K9640

February, 2013

Department of Computer Science and Engineering, Govt. College of Engineering, Kannur


Certificate

This is to certify that this thesis entitled QUESTION ANSWERING SYSTEM submitted herewith is an authentic record of the thesis work done by Jish Mon P.A under our guidance in partial fulfilment of the requirements for the award of Bachelor of Technology in Computer Science from the University of Kannur during the academic year 2013.

Dr. K. Najeeb
Head of the Department
Dept. of Computer Science & Engg.
Govt. College of Engineering, Kannur

Ms. Sruthi Tharol
Faculty in charge
Guest Asst. Professor
Dept. of Computer Science & Engg.
Govt. College of Engineering, Kannur


Acknowledgement

I would like to express my sincere gratitude to all those who have helped me directly or indirectly during my seminar work. I express my sincere thanks to our Principal Dr. Ravindranath and to Dr. Najeeb K, HOD of the Computer Science department, for their overwhelming support.

I express my gratitude to all the teachers of the CSE department for their guidance, inspiration and encouragement throughout the seminar work. I also acknowledge our lab assistants, who rendered assistance at all times.

I express my deep sense of gratitude and sincere thanks to my parents, colleagues and all well-wishers who have directly or indirectly contributed a lot to the seminar presentation.

Above all, I thank the Almighty, for without His blessings this seminar would not have been successful.


Abstract

A question answering system is one of the emerging information retrieval systems available on the World Wide Web, and it is becoming popular day by day as a way to get succinct, relevant answers to users' questions. Validating the correctness of an answer is an important issue in the field of question answering. Several heuristics have been applied to the answer-validation task and tested against some popular open-domain question answering systems on the World Wide Web, over a collection of 500 questions collected from standard sources such as TREC, the World Book, and the World Factbook.


List of Figures

3.1 QA Architecture


Contents

1 INTRODUCTION

2 Natural language processing

3 Question Answering System (QAS)
  3.1 QA
    3.1.1 History
    3.1.2 QA model
    3.1.3 Question answering methods
    3.1.4 Issues in QA
  3.2 QA Architecture
    3.2.1 Question manipulation and classification
    3.2.2 Matching
    3.2.3 Answer selection
    3.2.4 Proximity scoring

4 CONCLUSION


1

INTRODUCTION

Question Answering (QA) is a computer science discipline within the fields of information retrieval and natural language processing (NLP), which is concerned with building systems that automatically answer questions posed by humans in a natural language. A QA implementation, usually a computer program, may construct its answers by querying a structured database of knowledge or information, usually a knowledge base. More commonly, QA systems can pull answers from an unstructured collection of natural language documents. Some examples of natural language document collections used for QA systems include:

• a local collection of reference texts

• internal organization documents and web pages

• compiled newswire reports

• a set of Wikipedia pages

• a subset of World Wide Web pages

QA research attempts to deal with a wide range of question types including fact, list, definition, how, why, hypothetical, semantically constrained, and cross-lingual questions. Closed-domain question answering deals with questions under a specific domain (for example, medicine or automotive maintenance), and can be seen as an easier task because NLP systems can exploit domain-specific knowledge frequently formalized in ontologies. Alternatively, closed-domain might refer to a situation where only a limited type of question is accepted, such as questions asking for descriptive rather than procedural information. Open-domain question answering deals with questions about nearly anything, and can only rely on general ontologies and world knowledge. On the other hand, these systems usually have much more data available from which to extract the answer.


2

Natural language processing

Natural language processing (NLP) is a field of computer science, artificial intelligence, and linguistics concerned with the interactions between computers and human (natural) languages. As such, NLP is related to the area of human-computer interaction. Many challenges in NLP involve natural language understanding, that is, enabling computers to derive meaning from human or natural language input.

Modern NLP algorithms are based on machine learning, especially statistical machine learning. The paradigm of machine learning is different from that of most prior attempts at language processing. Prior implementations of language-processing tasks typically involved the direct hand-coding of large sets of rules. The machine-learning paradigm calls instead for using general learning algorithms, often (although not always) grounded in statistical inference, to automatically learn such rules through the analysis of large corpora of typical real-world examples. A corpus (plural, corpora) is a set of documents (or sometimes, individual sentences) that have been hand-annotated with the correct values to be learned.

Many different classes of machine learning algorithms have been applied to NLP tasks. These algorithms take as input a large set of features that are generated from the input data. Some of the earliest-used algorithms, such as decision trees, produced systems of hard if-then rules similar to the systems of hand-written rules that were then common. Increasingly, however, research has focused on statistical models, which make soft, probabilistic decisions based on attaching real-valued weights to each input feature. Such models have the advantage that they can express the relative certainty of many different possible answers rather than only one, producing more reliable results when such a model is included as a component of a larger system. (A small sketch of this statistical approach appears after the list below.) Systems based on machine-learning algorithms have many advantages over hand-produced rules:

• The learning procedures used during machine learning automatically focus on the most common cases, whereas when writing rules by hand it is often not obvious at all where the effort should be directed.

• Automatic learning procedures can make use of statistical inference algorithms to produce models that are robust to unfamiliar input (e.g. containing words or structures that have not been seen before) and to erroneous input (e.g. with misspelled words or words accidentally omitted). Generally, handling such input gracefully with hand-written rules, or more generally, creating systems of hand-written rules that make soft decisions, is extremely difficult, error-prone and time-consuming.

• Systems based on automatically learning the rules can be made more accurate simply by supplying more input data. However, systems based on hand-written rules can only be made more accurate by increasing the complexity of the rules, which is a much more difficult task. In particular, there is a limit to the complexity of systems based on hand-crafted rules, beyond which the systems become more and more unmanageable. However, creating more data to input to machine-learning systems simply requires a corresponding increase in the number of man-hours worked, generally without significant increases in the complexity of the annotation process.
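
The following is a minimal sketch of the statistical approach, assuming scikit-learn is available; the library choice, the tiny training set and its labels are illustrative assumptions, not part of the original report. A hand-written rule would hard-code "questions starting with who ask for a person"; the learned model instead attaches real-valued weights to word features and returns a probability for each answer type.

    # A minimal sketch of a statistical question classifier (assumes
    # scikit-learn; the training sentences and labels are invented).
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # A hand-annotated "corpus": questions labelled with an answer type.
    sentences = [
        "Who invented the telephone?",
        "Who is the president of France?",
        "When did World War II end?",
        "When was the telephone invented?",
    ]
    labels = ["PERSON", "PERSON", "DATE", "DATE"]

    # Features (word counts) are generated from the input data, and the
    # model attaches a real-valued weight to each feature.
    model = make_pipeline(CountVectorizer(), LogisticRegression())
    model.fit(sentences, labels)

    # A soft, probabilistic decision: relative certainty in each class
    # rather than a single hard if-then answer.
    print(model.classes_)
    print(model.predict_proba(["When did Alexander Graham Bell die?"]))

Because the decision is probabilistic, a larger system embedding this component can weigh its confidence against other evidence instead of being forced to accept a single hard answer.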


3

Question Answering System (QAS)

3.1 QA

3.1.1 History

Two early QA systems were BASEBALL and LUNAR. BASEBALL answered questions about the US baseball league over a period of one year. LUNAR, in turn, answered questions about the geological analysis of rocks returned by the Apollo moon missions. Both QA systems were very effective in their chosen domains. In fact, LUNAR was demonstrated at a lunar science convention in 1971, and it was able to answer 90 percent of the questions in its domain posed by people untrained on the system. Further restricted-domain QA systems were developed in the following years. The common feature of all these systems is that they had a core database or knowledge system that was hand-written by experts of the chosen domain. The language abilities of BASEBALL and LUNAR used techniques similar to ELIZA and DOCTOR, the first chatterbot programs. SHRDLU was a highly successful question-answering program developed by Terry Winograd in the late 60s and early 70s. It simulated the operation of a robot in a toy world (the "blocks world"), and it offered the possibility of asking the robot questions about the state of the world. Again, the strength of this system was the choice of a very specific domain and a very simple world with rules of physics that were easy to encode in a computer program. In the 1970s, knowledge bases were developed that targeted narrower domains of knowledge. The QA systems developed to interface with these expert systems produced more repeatable and valid responses to questions within an area of knowledge. These expert systems closely resembled modern QA systems except in their internal architecture. Expert systems rely heavily on expert-constructed and organized knowledge bases, whereas many modern QA systems rely on statistical processing of a large, unstructured, natural language text corpus.

The 1970s and 1980s saw the development of comprehensive theories in computational linguistics, which led to the development of ambitious projects in text comprehension and question answering. One example of such a system was the Unix Consultant (UC), developed by Robert Wilensky at U.C. Berkeley in the late 1980s. The system answered questions pertaining to the Unix operating system. It had a comprehensive hand-crafted knowledge base of its domain, and it aimed at phrasing the answer to accommodate various types of users. Another project was LILOG, a text-understanding system that operated on the domain of tourism information in a German city. The systems developed in the UC and LILOG projects never went past the stage of simple demonstrations, but they helped the development of theories on computational linguistics and reasoning. Recently, specialized natural language QA systems have been developed, such as EAGLi for health and life scientists.

3.1.2 QA model

Most modern QA systems use natural language text documents as their underlying knowledge source. Natural language processing techniques are used both to process the question and to index or process the text corpus from which answers are extracted. An increasing number of QA systems use the World Wide Web as their corpus of text and knowledge. However, many of these tools do not produce a human-like answer, but rather employ "shallow" methods (keyword-based techniques, templates, and so on) to produce a list of documents or document excerpts with the probable answer highlighted.

In an alternative QA implementation, human users assemble knowledge in a structured database, called a knowledge base, similar to those employed in the expert systems of the 1970s. It is also possible to employ a combination of structured databases and natural language text documents in a hybrid QA system. Such a hybrid system may employ data mining algorithms to populate a structured knowledge base that is also populated and edited by human contributors. An example hybrid QA system is the Wolfram Alpha QA system, which employs natural language processing to transform human questions into a form that is processed by a curated knowledge base.

Current QA systems typically include a question classifier module that determines the type of question and the type of answer. After the question is analysed, the system typically uses several modules that apply increasingly complex NLP techniques on a gradually reduced amount of text. Thus, a document retrieval module uses search engines to identify the documents or paragraphs in the document set that are likely to contain the answer. Subsequently, a filter preselects small text fragments that contain strings of the same type as the expected answer. For example, if the question is "Who invented Penicillin?", the filter returns text that contains names of people. Finally, an answer extraction module looks for further clues in the text to determine whether the answer candidate can indeed answer the question.
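
The following toy Python sketch walks through this classify-retrieve-filter-extract flow. Everything in it (the three documents, the regular expressions, the type names) is an invented illustration of the idea; a real system would use a trained question classifier, a proper search engine, and named-entity recognition.

    # A toy, pure-Python illustration of the pipeline described above:
    # classify the question, retrieve likely documents, filter candidate
    # strings of the expected answer type, then extract an answer.
    import re

    DOCS = [
        "Penicillin was discovered by Alexander Fleming in 1928.",
        "The telephone was patented by Alexander Graham Bell.",
        "Paris is the capital of France.",
    ]

    def classify(question):
        # Question classifier: map the question word to an answer type.
        q = question.lower()
        if q.startswith("who"):
            return "PERSON"
        if q.startswith("when"):
            return "DATE"
        return "OTHER"

    def retrieve(question):
        # Document retrieval: rank documents by shared question terms.
        terms = set(re.findall(r"\w+", question.lower()))
        def overlap(d):
            return len(terms & set(re.findall(r"\w+", d.lower())))
        return sorted(DOCS, key=overlap, reverse=True)

    def filter_candidates(doc, answer_type):
        # Filter: keep only strings of the same type as the answer.
        if answer_type == "PERSON":
            return re.findall(r"(?:[A-Z][a-z]+ )+[A-Z][a-z]+", doc)
        if answer_type == "DATE":
            return re.findall(r"\b\d{4}\b", doc)
        return []

    question = "Who invented Penicillin?"
    answer_type = classify(question)
    for doc in retrieve(question):
        candidates = filter_candidates(doc, answer_type)
        if candidates:
            print(candidates[0])   # -> Alexander Fleming
            break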


3.1.3 Question answering methods

QA is very dependent on a good search corpus, for without documents containing the answer there is little any QA system can do. It thus makes sense that larger collection sizes generally lend well to better QA performance, unless the question domain is orthogonal to the collection. The notion of data redundancy in massive collections, such as the web, means that nuggets of information are likely to be phrased in many different ways in differing contexts and documents, leading to two benefits:

1. By having the right information appear in many forms, the burden on the QA system to perform complex NLP techniques to understand the text is lessened.

2. Correct answers can be filtered from false positives by relying on the correct answer to appear more often in the documents than instances of incorrect ones (see the counting sketch just after this list).
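
A minimal sketch of the second benefit, in Python: once candidate answers have been pulled from several snippets, simple vote counting lets the recurring answer outrank one-off false positives. The candidate strings below are invented for illustration.

    # Redundancy-based filtering: the answer that recurs across
    # retrieved snippets outvotes one-off false positives.
    from collections import Counter

    candidates = [
        "Alexander Fleming",   # snippet 1
        "Alexander Fleming",   # snippet 2
        "Howard Florey",       # snippet 3 (a plausible false positive)
        "Alexander Fleming",   # snippet 4
    ]

    answer, count = Counter(candidates).most_common(1)[0]
    print(answer, count)   # -> Alexander Fleming 3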

Question answering relies heavily on reasoning, and a number of question answering systems have been designed in Prolog.

3.1.4 Issues in QA

In 2002 a group of researchers wrote a roadmap of research in question answering. The following issues were identified.

• Question classes: Different types of questions (e.g., "What is the capital of Liechtenstein?" vs. "Why does a rainbow form?" vs. "Did Marilyn Monroe and Cary Grant ever appear in a movie together?") require the use of different strategies to find the answer. Question classes are arranged hierarchically in taxonomies.

• Question processing: The same information request can be expressed in various ways, some interrogative ("Who is the King of Lesotho?") and some assertive ("Tell me the name of the King of Lesotho."). A semantic model of question understanding and processing would recognize equivalent questions, regardless of how they are presented. This model would enable the translation of a complex question into a series of simpler questions, and would identify ambiguities and treat them in context or by interactive clarification.

• Context and QA: Questions are usually asked within a context, and answers are provided within that specific context. The context can be used to clarify a question, resolve ambiguities or keep track of an investigation performed through a series of questions. (For example, the question "Why did Joe Biden visit Iraq in January 2010?" might be asking why Vice President Biden visited and not President Obama, why he went to Iraq and not Afghanistan or some other country, why he went in January 2010 and not before or after, or what Biden was hoping to accomplish with his visit. If the question is one of a series of related questions, the previous questions and their answers might shed light on the questioner's intent.)

• Data sources for QA: Before a question can be answered, it must be known what knowledge sources are available and relevant. If the answer to a question is not present in the data sources, then no matter how well the question processing, information retrieval and answer extraction are performed, a correct result will not be obtained.

• Answer extraction: Answer extraction depends on the complexity of the question, on the answer type provided by question processing, on the actual data where the answer is searched, on the search method, and on the question focus and context.

• Answer formulation: The result of a QA system should be presented in a way as natural as possible. In some cases, simple extraction is sufficient. For example, when the question classification indicates that the answer type is a name (of a person, organization, shop or disease, etc.), a quantity (monetary value, length, size, distance, etc.) or a date (e.g. the answer to the question "On what day did Christmas fall in 1989?"), the extraction of a single datum is sufficient. For other cases, the presentation of the answer may require the use of fusion techniques that combine partial answers from multiple documents.

• Real-time question answering: There is a need for QA systems that are capable of extracting answers from large data sets in several seconds, regardless of the complexity of the question, the size and multitude of the data sources, or the ambiguity of the question.

• Multilingual (or cross-lingual) question answering: The ability to answer a question posed in one language using an answer corpus in another language (or even several). This allows users to consult information that they cannot use directly.

• Interactive QA: It is often the case that the information need is not well captured by a QA system, as the question processing part may fail to classify the question properly, or the information needed for extracting and generating the answer is not easily retrieved. In such cases, the questioner might want not only to reformulate the question, but to have a dialogue with the system. (For example, the system might ask for a clarification of what sense a word is being used in, or what type of information is being asked for.)

• Advanced reasoning for QA: More sophisticated questioners expect answers that are outside the scope of written texts or structured databases. To upgrade a QA system with such capabilities, it would be necessary to integrate reasoning components operating on a variety of knowledge bases, encoding world knowledge and common-sense reasoning mechanisms, as well as knowledge specific to a variety of domains.

• Information clustering for QA: Information clustering for question answering systems is a new trend intended to increase the accuracy of question answering systems through search space reduction. In recent years this has been widely researched through the development of question answering systems that support information clustering in their basic flow of processing.

• User profiling for QA: The user profile captures data about the questioner, comprising context data, domain of interest, reasoning schemes frequently used by the questioner, common ground established within different dialogues between the system and the user, and so forth. The profile may be represented as a predefined template, where each template slot represents a different profile feature. Profile templates may be nested one within another.

3.2 QA Architecture

The architecture of the system is shown below (Figure 3.1). The question is analysed by NLPWin to produce a logical form, and in addition a set of query terms is extracted from it. The query terms will normally contain all of the words of the question less the question word itself (what, who, how, etc.) and a few other stop words. The query terms are used by the Okapi IR engine with BM25 weighting to produce a list of documents. The documents are then segmented into sentences. This stage uses NLPWin, although without using its detailed linguistic analysis capabilities. The resulting list of sentences is ordered by the number of terms from the question they contain, and processed again by NLPWin, this time producing the full linguistic analysis. A cutoff on the number of sentences is used to control the processing time, since a full NLP analysis can be quite time-consuming. The resulting logical forms are compared with the question's logical form to produce a ranked list of answers with scores.

Figure 3.1: QA Architecture
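
The sketch below is a schematic Python stand-in for the first two stages: extracting query terms and ordering candidate sentences. NLPWin and the Okapi BM25 engine are proprietary components, so plain keyword overlap is used in their place, and the stop-word list shown is a minimal invented one.

    # Schematic stand-ins for term extraction and sentence ranking.
    import re

    STOP_WORDS = {"what", "who", "how", "when", "where", "why",
                  "is", "are", "was", "the", "of", "a", "did", "does"}

    def query_terms(question):
        # All words of the question less the question word itself and
        # a few other stop words.
        words = re.findall(r"\w+", question.lower())
        return {w for w in words if w not in STOP_WORDS}

    def rank_sentences(sentences, terms, cutoff=10):
        # Order sentences by the number of question terms they contain,
        # cutting the list off to bound the cost of full NLP analysis.
        def overlap(s):
            return len(terms & set(re.findall(r"\w+", s.lower())))
        return sorted(sentences, key=overlap, reverse=True)[:cutoff]

    print(query_terms("Who invented the paper clip?"))
    # -> {'invented', 'paper', 'clip'} (set order may vary)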

3.2.1 Question manipulation and classification

The aim of the question manipulation stage is to simplify the logical form of questions in order to make it easier to classify them, and to label certain terms in the question as formal and hence not expected to match a term in a candidate answer. The majority of the manipulations look for a specific question word attached to a specific relation. For example, a question of the form "Who is X" receives a logical form in which X has an Equiv relation to a node for "who". In such cases, we simply delete the relation and "who", and add an annotation to the top node of X which indicates that we are looking for an answer to a Who question over objects with the property of being X. Similar principles apply to many of the question types. The relation may be other than Equiv; for example, in "where" and "when" questions, the relations Locn and Time are used. A second case which occurs frequently is logical forms in which the topmost node is "be", usually with a single child, or with one child which is a Wh-word and one which is a content node. In such cases, we remove the "be" node, and in the latter case move the Wh-word's properties to the other child.

There are some common subjects for "what" questions, such as "what country...", "what year...". In these cases, we remove the whole what-phrase and mark the remaining top-level node with a special property to indicate that the question should be answered with a restriction as to the answer type. This is only done when the subject corresponds to a property which NLPWin marks in the LF, such as Cntry for country. NLPWin derives this information from its lexical resources. There are a number of other question manipulations on broadly similar principles. After the manipulation, we then assign each question to a category, using the question word (often now discarded and encoded as a property) and the structural configuration. An example of a distinction made using the structure comes with "who" questions, where we distinguish questions asking about identity, as in "Who is the leader of India?", from questions about a role of a predicate, as in "Who invented the paper clip?". A full list of the question types appears in the appendix. A few questions are left as having Unknown type, and questions with an incomplete parse are assigned the type Bad.
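
A toy Python sketch of the "Who is X" manipulation follows. NLPWin's actual logical-form structures are not public, so the Node class and the relation and annotation names below are invented stand-ins for the idea.

    # Invented stand-in for a logical-form node and the Who manipulation.
    class Node:
        def __init__(self, word):
            self.word = word
            self.relations = {}      # relation name -> child Node
            self.annotations = set()

    def manipulate_who(node):
        # Delete the Equiv relation and the "who" node, and annotate the
        # top node of X to record that a Who question is asked about it.
        child = node.relations.get("Equiv")
        if child is not None and child.word == "who":
            del node.relations["Equiv"]
            node.annotations.add("WhoQuestion")
        return node

    # Logical form for "Who is the leader of India?": the content node
    # has an Equiv relation to a node for "who".
    leader = Node("leader")
    leader.relations["Equiv"] = Node("who")
    manipulate_who(leader)
    print(leader.annotations)   # -> {'WhoQuestion'}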


3.2.2 Matching

Matching proceeds by selecting and scoring possible answers guided by the question type, and then by extracting the phrase to return as the result. Answer selection is the most complex part of the matching process, and we return to it in a moment. The result of answer selection is a node in the logical form of the answer sentence and a score. To extract the answer, we look up the syntactic node associated with the LF node, and take the portion of the original sentence which led to it. This process is imperfect, and was intended as a quick way of recovering the answer. It tends to give phrases which span more words than necessary. For example, the LF node may describe an entity, but the corresponding syntactic node is a prepositional phrase, as the preposition is absorbed into the structure of the LF, resulting in an answer such as "by X" or "to X" rather than simply "X". If the resulting phrase is longer than the maximum allowed width (50 bytes or 250 bytes), then words are removed from the ends of the phrase until it is short enough. By preference, words which appeared in the question are removed over ones which were not; otherwise the process alternates removing words from the left and right hand ends of the phrase.
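
The trimming step lends itself to a short sketch. The function below is a hedged reconstruction from the description above, not the original code: it drops end words that appeared in the question first, and otherwise alternates between the two ends until the phrase fits the byte limit.

    # Reconstructed answer-trimming step (an approximation, not the
    # original implementation).
    def trim_answer(words, question_terms, max_bytes=50):
        from_left = True
        while len(" ".join(words).encode("utf-8")) > max_bytes and len(words) > 1:
            if words[0].lower() in question_terms:
                words = words[1:]        # prefer dropping question words
            elif words[-1].lower() in question_terms:
                words = words[:-1]
            elif from_left:
                words = words[1:]        # otherwise alternate ends
                from_left = False
            else:
                words = words[:-1]
                from_left = True
        return " ".join(words)

    print(trim_answer("the paper clip was invented by Johan Vaaler".split(),
                      {"paper", "clip", "invented"}, max_bytes=20))
    # -> invented by Johan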

3.2.3 Answer selection

Answer selection is the heart of the matching algorithm. The rules used in the TREC-9 test are rather ad hoc; some of them are reasonably well principled, while others are hacks which seemed to work more often than any alternative. The principles we use to identify candidate answer nodes include the following:

• Node properties: Node properties are used when answer nodes usually have clear LF properties, but where the relationship with the query terms can vary. Who, HowMany and HowMuch questions are good examples, although we will see later that there is a risk involved in treating Who questions this way. The node properties are flags assigned by NLPWin, usually using information stored in the lexicon. Node properties are used in three stages: firstly, we look for nodes which have one or more of a set of required properties; then we remove any which have certain properties which might indicate we have made the wrong choice as a result of over-generalisation; and finally, we look for preference properties whose omission indicates that the score assigned to the answer should be lowered. For example, in the case of Who questions, the only required property is PrprN (proper name); nodes are removed if they have properties such as Tme (time), Titl (title) and Cntry (country); and the score is lowered if the node does not have one of the properties Anim (animate), Humn (human) or Nme (name). (A small sketch of this three-stage check follows the list.)

• Relation targets: Some answers can be found by looking for nodes which are the target of a given relation type, using proximity to determine whether the node is likely to be related to the question terms. Examples are Where and When questions, answers to which are often found as the target of Locn and Time relations. For When questions in particular, the answer time expression may appear on a different argument of a verb to the question term itself, or on a modifier of the question term.

• Node-to-node relations: Node-to-node relations come closest to really using the structure of the LF. The idea here is to look for a node which lies at one end of a relation, the other end of which is a question term. The case where this is used most extensively is in questions of the form "What is X".

• Combinations: Some of the question types use more than one of these techniques, and select the one which gave the best score. An example is WhoRole questions (which ask who performed a particular role of an action), which look for words with the same properties as Who questions, and also look for entities in a particular role of a verb, as for WhRole and WhatRole questions (node-to-node relation type of answers).
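
As noted under the first bullet, here is a small Python sketch of the three-stage property check for Who questions, using the property flags named above (PrprN, Tme, Titl, Cntry, Anim, Humn, Nme); the candidate representation and the penalty factor are invented for illustration.

    # Three-stage property check for Who questions (penalty is invented).
    REQUIRED   = {"PrprN"}                 # stage 1: must be present
    DISALLOWED = {"Tme", "Titl", "Cntry"}  # stage 2: causes removal
    PREFERRED  = {"Anim", "Humn", "Nme"}   # stage 3: lowers score if absent

    def score_who_candidate(properties, base_score=1.0):
        if not properties & REQUIRED:
            return None                    # not a proper name at all
        if properties & DISALLOWED:
            return None                    # likely an over-generalisation
        if not properties & PREFERRED:
            base_score *= 0.5              # invented penalty factor
        return base_score

    print(score_who_candidate({"PrprN", "Humn"}))   # -> 1.0
    print(score_who_candidate({"PrprN", "Cntry"}))  # -> None
    print(score_who_candidate({"PrprN"}))           # -> 0.5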

3.2.4 Proximity scoring

To assign a score to the nodes identified in answer selection, we use a simple measure based on how close the candidate answer is to significant terms from the question. The proximity measure marks each term in an answer sentence which matches a term from the question, and then sees how far this term is from the candidate answer, measured as the number of relations that have to be traversed in the logical form. The idea of proximity is to provide an approximation to matching the LFs: if an answer is closely related to the matched question terms, it receives a high proximity score, whereas if the relation is indirect, the score is lower. There is little linguistic basis for this approach, and the idea was really to obtain a baseline for performance based on a simple and easily implemented technique within the timescales of the TREC-9 exercise. The overall proximity is calculated by summing these distances for each of the question terms, taking the reciprocal, and weighting it by the logarithm of the total number of matched question terms plus one. The latter factor is simply a way of taking into account what proportion of the question terms were matched. The logarithm is used just to weaken the factor; although this is ad hoc, it seems to give better performance than using just the proportion of the query terms or no factor at all. An obvious enhancement to this process might be to weight question terms by importance, for example giving lower weight to question terms which are more deeply buried in the logical form.
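
The scoring rule transcribes directly into code. The sketch below assumes the per-term distances (relation hops in the logical form) have already been computed; only the combination step described above is shown.

    # Proximity score: log(matched terms + 1) / sum of distances.
    import math

    def proximity_score(distances):
        # distances: number of logical-form relations traversed from the
        # candidate answer to each matched question term.
        if not distances:
            return 0.0
        return math.log(len(distances) + 1) / sum(distances)

    # Three matched terms at 1, 2 and 1 hops: log(4) / 4, about 0.35.
    print(proximity_score([1, 2, 1]))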


4

CONCLUSION

QA systems have been extended in recent years to encompass additional domains of knowledge. For example, systems have been developed to automatically answer temporal and geospatial questions, questions of definition and terminology, biographical questions, multilingual questions, and questions about the content of audio, images, and video, as well as to support knowledge representation and reasoning, social media analysis, and sentiment analysis. Question answering provides effective approaches that automatically answer questions posed by humans in a natural language. Due to its simple architecture and adaptive nature, it is likely to be seen much more in the future.
