Exploiting Distributional Semantic Models in Question Answering

DESCRIPTION
This paper investigates the role of Distributional Semantic Models (DSMs) in Question Answering (QA), and specifically in a QA system called QuestionCube. QuestionCube is a framework for QA that combines several techniques to retrieve passages containing the exact answers to natural language questions. It exploits Information Retrieval models to seek candidate answers and Natural Language Processing algorithms for the analysis of questions and candidate answers, both in English and Italian. The data source for the answers is an unstructured text document collection stored in search indices. In this paper we propose to exploit DSMs in the QuestionCube framework. In DSMs words are represented as mathematical points in a geometric space, also known as a semantic space; words are similar if they are close in that space. Our idea is that DSM approaches can help to compute the relatedness between users' questions and candidate answers by exploiting paradigmatic relations between words. Results of an experimental evaluation carried out on the CLEF 2010 QA dataset prove the effectiveness of the proposed approach.

TRANSCRIPT
Exploiting Distributional Semantic Models in Question Answering
Piero Molino¹,², Pierpaolo Basile¹,², Annalina Caputo¹, Pasquale Lops¹, Giovanni Semeraro¹
{[email protected]}
¹ Dept. of Computer Science, University of Bari Aldo Moro
² QuestionCube s.r.l.
6th IEEE International Conference on Semantic Computing, September 19-21, 2012, Palermo, Italy
Background
A bottle of Tesgüino is on the table.
Everyone likes Tesgüino.
Tesgüino makes you drunk.
We make Tesgüino out of corn.
It is a corn beer: the meaning emerges from the contexts in which the word appears
Background
"You shall know a word by the company it keeps!" (J.R. Firth, 1957)
The meaning of a word is determined by its usage
Background
Distributional Semantic Models (DSMs)
– represent words as points in a geometric space
– do not require specific text operations
– corpus/language independent
Widely used in IR and Computational Linguistics
– semantic text similarity, synonym detection, query expansion, …
Not directly used in Question Answering (QA)
Our idea
Using DSMs in a QA system
– QuestionCube: a framework for QA
Exploit DSMs
– compare the semantic similarity between the user's question and each candidate answer

Q: Which beverages contain alcohol?
A: Tesgüino makes you drunk.
Outline
• Introduction to the QA problem
• Distributional Semantic Models
– semantic composition
• DSMs into QuestionCube
• Evaluation and results
• Final remarks
QA vs. IR
                         Question Answering                  Information Retrieval
Information need         Natural language question           Keywords
Result                   Factoid answer or short passages    Whole documents
Information acquisition  Reading answers                     Searching manually inside documents
DISTRIBUTIONAL SEMANTIC MODELS
Distributional Semantic Models…
Defined as: <T, C, R, W, M, d, S>
– T: target elements (words)
– C: context
– R: relation between T and C
– W: weighting schema
– M: geometric space T×C
– d: space reduction M -> M’
– S: similarity function defined in M’
…Distributional Semantic Models
Term-term co-occurrence matrix: each cell contains the number of co-occurrences of two terms within a fixed distance

           dog   cat   computer   animal   mouse
dog         0     4       0         2        1
cat         4     0       0         3        5
computer    0     0       0         0        3
animal      2     3       0         0        2
mouse       1     5       3         2        0
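To illustrate how such a matrix is built, here is a minimal Python sketch that counts co-occurrences within a fixed window (the evaluation later uses a co-occurrence distance of 4); the function name and the toy corpus are ours, not part of QuestionCube.

```python
# Minimal sketch: build term-term co-occurrence counts with a fixed window.
from collections import defaultdict

def cooccurrence_matrix(sentences, window=4):
    """Count co-occurrences of word pairs within `window` positions."""
    counts = defaultdict(int)
    for tokens in sentences:
        for i, target in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[(target, tokens[j])] += 1
    return counts

corpus = [["dog", "chases", "cat"], ["cat", "chases", "mouse"]]
m = cooccurrence_matrix(corpus, window=4)
print(m[("cat", "chases")])  # 2
```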
…Distributional Semantic Models
TTM: term-term co-occurrence matrix
Latent Semantic Analysis (LSA): relies on the Singular Value Decomposition (SVD) of the co-occurrence matrix
Random Indexing (RI): based on Random Projection
Latent Semantic Analysis over Random Indexing (LSARI)
Latent Semantic Analysis
Given the TTM matrix M:
– apply the SVD decomposition M = UΣVᵀ
  • Σ is the diagonal matrix of singular values
– keep the k largest singular values of Σ and approximate the matrix M
  • M* = U*Σ*V*ᵀ
– SVD
  • discovers high-order relations between terms
  • reduces the sparseness of the original matrix
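A minimal sketch of the LSA step using NumPy's SVD; the choice of k and the toy matrix (the co-occurrence matrix from the earlier slide) are illustrative only.

```python
# Minimal sketch of LSA: truncated SVD of the co-occurrence matrix M.
import numpy as np

def lsa(M, k):
    """Rank-k approximation M* = U_k Sigma_k V_k^T."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :k], np.diag(s[:k]), Vt[:k, :]

M = np.array([[0, 4, 0, 2, 1],
              [4, 0, 0, 3, 5],
              [0, 0, 0, 0, 3],
              [2, 3, 0, 0, 2],
              [1, 5, 3, 2, 0]], dtype=float)
U_k, S_k, Vt_k = lsa(M, k=2)
term_vectors = U_k @ S_k   # rows are the reduced term representations
```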
Random Indexing
1. Generate and assign a context vector to each context element (e.g. document, passage, term, …)
2. The term vector is the sum of the context vectors of the contexts in which the term occurs
– the context vectors can be weighted by a score (e.g. term frequency, PMI, …)
Context Vector
• sparse
• high dimensional
• ternary {-1, 0, +1}
• small number of randomly distributed non-zero elements
0 0 0 0 0 0 0 -1 0 0 0 0 1 0 0 -1 0 1 0 0 0 0 1 0 0 0 0 -1
Random Indexing (formal)
B_{n,k} = A_{n,m} · R_{m,k}, with k << m

B nearly preserves the distances between points (Johnson–Lindenstrauss lemma): if d is the distance between two points in A, their distance in B is d_r = c·d for some constant c.

RI is a locality-sensitive hashing method that approximates the cosine distance between vectors.
Random Indexing (example)
"John eats a red apple"

CV_john → (0, 0, 0, 0, 0, 0, 1, 0, -1, 0)
CV_eat  → (1, 0, 0, 0, -1, 0, 0, 0, 0, 0)
CV_red  → (0, 0, 0, 1, 0, 0, 0, -1, 0, 0)

TV_apple = CV_john + CV_eat + CV_red = (1, 0, 0, 1, -1, 0, 1, -1, -1, 0)
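A minimal sketch of Random Indexing as described above: sparse ternary context vectors, term vectors accumulated as sums. The dimensionality, number of non-zero entries, seed, and function names are illustrative, and no weighting (term frequency, PMI) is applied.

```python
# Minimal sketch of Random Indexing with sparse ternary context vectors.
import numpy as np

rng = np.random.default_rng(42)

def context_vector(dim=1000, nonzero=10):
    """Sparse ternary random vector: `nonzero` randomly placed +/-1 entries."""
    v = np.zeros(dim)
    idx = rng.choice(dim, size=nonzero, replace=False)
    v[idx] = rng.choice([-1.0, 1.0], size=nonzero)
    return v

def build_term_vectors(sentences, dim=1000, window=4):
    """Term vector = sum of the context vectors of nearby terms."""
    cvs, tvs = {}, {}
    for tokens in sentences:
        for i, target in enumerate(tokens):
            tv = tvs.setdefault(target, np.zeros(dim))
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    ctx = tokens[j]
                    if ctx not in cvs:
                        cvs[ctx] = context_vector(dim)
                    tv += cvs[ctx]
    return tvs

tvs = build_term_vectors([["john", "eats", "a", "red", "apple"]])
```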
Latent Semantic Analysis over Random Indexing
1. Reduce the dimensions of the co-occurrence matrix using RI
2. Perform LSA over the RI matrix (LSARI)
– reduces LSA computation time: the RI matrix has far fewer dimensions than the full co-occurrence matrix
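A minimal sketch of the LSARI idea: project the co-occurrence matrix down with a random ternary matrix first, then run the now cheaper SVD on the reduced matrix. The dimensions, the density of the projection matrix, and the dense representation are illustrative simplifications.

```python
# Minimal sketch of LSA over RI: random projection first, then SVD.
import numpy as np

rng = np.random.default_rng(0)

def lsari(M, k_ri=100, k_lsa=20):
    n_terms, n_contexts = M.shape
    # Random ternary projection matrix R (sparse in practice, dense here)
    R = rng.choice([-1.0, 0.0, 1.0], size=(n_contexts, k_ri),
                   p=[0.05, 0.9, 0.05])
    B = M @ R  # RI step: B is n_terms x k_ri, with k_ri << n_contexts
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    return U[:, :k_lsa] @ np.diag(s[:k_lsa])  # reduced term vectors
```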
QUESTIONCUBE FRAMEWORK
QuestionCube framework…
Natural Language pipeline for Italian and English
– PoS-tagging, lemmatization, Named Entity Recognition, Phrase Chunking, Dependency Parsing, Word Sense Disambiguation (WSD)
IR model based on classical VSM or BM25
– several query expansion strategies
Passage retrieval
…QuestionCube framework
[Architecture figure: the question goes through question analysis, then question expansion and categorization; document retrieval over the search index produces candidate passages; the passage filter chain (term filter, n-gram filter, density filter, sequence filter, score normalization) produces the ranked list of passages.]
DSMS INTO QUESTIONCUBE
Integration
[Architecture figure: the same pipeline as above, with a new filter based on DSMs, the distributional filter, added to the passage filter chain.]
Distributional filter…
Compute the similarity between the user’s question and each candidate answer
Both the user's question and the candidate answer are made up of more than one term
– a method to compose word vectors is necessary
…Distributional filter…
[Figure: the question and the candidate answers are fed to the distributional filter, which uses a DSM and a compositional approach to re-rank the list of candidate answers.]
…Distributional filter (compositional approach)
Addition (+): pointwise sum of vector components
– represents complex structures by summing the vectors of the words that compose them

Given q = q1 q2 … qn and a = a1 a2 … am:
– question: q = q1 + q2 + … + qn
– candidate answer: a = a1 + a2 + … + am
– similarity: sim(q, a) = (q · a) / (‖q‖ ‖a‖)
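A minimal sketch of the additive composition and cosine scoring described above, used to re-rank candidate answers. `term_vectors` stands for the output of any of the DSMs discussed earlier, and the whitespace tokenization is a deliberate simplification of QuestionCube's NLP pipeline.

```python
# Minimal sketch of the distributional filter with additive composition.
import numpy as np

def compose(tokens, term_vectors, dim):
    """Vector for a text = pointwise sum of its word vectors."""
    v = np.zeros(dim)
    for t in tokens:
        if t in term_vectors:
            v += term_vectors[t]
    return v

def rerank(question, answers, term_vectors, dim):
    """Sort candidate answers by cosine similarity to the question."""
    q = compose(question.lower().split(), term_vectors, dim)
    def cosine(answer):
        v = compose(answer.lower().split(), term_vectors, dim)
        denom = np.linalg.norm(q) * np.linalg.norm(v)
        return (q @ v) / denom if denom else 0.0
    return sorted(answers, key=cosine, reverse=True)

# e.g. rerank("which beverages contain alcohol",
#             ["tesgüino makes you drunk", "the table is red"], tvs, 1000)
```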
EVALUATION AND RESULTS
Evaluation…
Dataset: CLEF 2010 QA Competition
– 10,700 documents from European Union legislation and European Parliament transcriptions
– 200 questions
– Italian and English

DSMs setup
– vector dimension: 1,000 (TTM/LSA/RI/LSARI)
– vocabulary: 50,000 most frequent words
– co-occurrence distance: 4
...Evaluation
1. Effectiveness of DSMs in QuestionCube
2. Comparison among the different DSMs adopted
Metrics
– a@n: accuracy taking into account only the first n answers
– MRR: Mean Reciprocal Rank, based on the rank of the correct answer

MRR = (1/N) · Σ_{i=1}^{N} 1/rank_i
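A minimal sketch of the two metrics, assuming that for each question we know the rank of the correct answer (None when it is not retrieved):

```python
# Minimal sketch of a@n and MRR over per-question ranks.
def accuracy_at_n(ranks, n):
    """Fraction of questions answered within the first n results."""
    return sum(1 for r in ranks if r is not None and r <= n) / len(ranks)

def mrr(ranks):
    """Mean Reciprocal Rank; missing answers contribute 0."""
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)

ranks = [1, 3, None, 2]          # rank of the correct answer per question
print(accuracy_at_n(ranks, 5))   # 0.75
print(mrr(ranks))                # (1 + 1/3 + 1/2) / 4 ≈ 0.458
```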
Evaluation (alone scenario)
Only the distributional filter is adopted: no other filters in the pipeline.

[Pipeline figure: the standard filters (term, n-gram, density, sequence, score normalization) are switched off; passages are scored by the distributional filter alone.]
Evaluation (combined)
The distributional filter is combined with the other filters
– scores are merged using the CombSum function
– baseline: the full pipeline with the distributional filter removed

[Pipeline figure: the distributional filter is added alongside the standard filter chain.]
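A minimal sketch of CombSum fusion: the final score of a passage is the sum of its scores across filters. The slides only name CombSum; the min-max normalization shown here, standing in for the pipeline's score-normalization step, is our assumption.

```python
# Minimal sketch of CombSum score fusion across filters.
def normalize(scores):
    """Min-max normalize a {passage_id: score} dict (assumed scheme)."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {p: (s - lo) / span for p, s in scores.items()}

def comb_sum(filter_scores):
    """filter_scores: list of {passage_id: score} dicts, one per filter."""
    fused = {}
    for scores in map(normalize, filter_scores):
        for p, s in scores.items():
            fused[p] = fused.get(p, 0.0) + s
    return sorted(fused, key=fused.get, reverse=True)
```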
Results (English)

Run                   a@1    a@5    a@10   a@30   MRR
alone     TTM        .060   .145   .215   .345   .107
          RI         .180   .370   .425   .535   .267
          LSA        .205   .415   .490   .600   .300
          LSARI      .190   .405   .490   .620   .295
combined  baseline*  .445   .635   .690   .780   .549
          TTM        .535   .715   .775   .810   .614¹
          RI         .550   .730   .785   .870   .637¹,²
          LSA        .560   .725   .790   .855   .637¹
          LSARI      .555   .730   .790   .870   .634¹

¹ significant wrt. baseline   ² significant wrt. TTM   * without distributional filter
…Results (Italian)

Run                   a@1    a@5    a@10   a@30   MRR
alone     TTM        .060   .140   .175   .280   .097
          RI         .175   .305   .385   .465   .241
          LSA        .155   .315   .390   .480   .229
          LSARI      .180   .335   .400   .500   .254
combined  baseline*  .365   .530   .630   .715   .441
          TTM        .405   .565   .645   .740   .539¹
          RI         .465   .645   .720   .785   .555¹
          LSA        .470   .645   .690   .785   .551¹
          LSARI      .480   .635   .690   .785   .557¹,²

¹ significant wrt. baseline   ² significant wrt. TTM   * without distributional filter
Final remarks…
Alone: all the proposed DSMs outperform the TTM
– in particular LSA and LSARI
Combined: all the combinations outperform the baseline
– English: +16% MRR (RI/LSA); Italian: +26% MRR (LSARI)
Only a slight difference in performance between LSA and LSARI
…Final remarks
The distributional filter shows a higher improvement in the alone scenario than in the combined scenario
– find more effective ways to combine filters (e.g. a learning-to-rank approach)
Exploit more complex semantic composition strategies
Thank you for your attention!
Questions?<[email protected]>