Optimization of a Finite State Space for Information Retrieval Qiang Huang School of Computing, Robert Gordon University

Post on 02-Jan-2016

Optimization of a Finite State Space for Information Retrieval

Qiang Huang

School of Computing, Robert Gordon University

Outline

- Our previous research

- Challenges and our idea

- Work plan

Concept Review

Query: glasgow, monday, weather, temperature, …

Documents: (figure omitted)

Concept Review

Model of information retrieval:

Document Model: P(Doc_Model) = P(w|D)

Query Model: P(Query_Model) = P(w|Q)

Language Model:

Constructing a language model means estimating the probability of the observations in a probabilistic space using some measure.
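
The slides do not say which estimator is used, so the following is only an illustrative sketch: it assumes a maximum-likelihood (relative frequency) estimate of P(w|D) and P(w|Q). The function name `unigram_model` and the sample texts are hypothetical.

```python
from collections import Counter


def unigram_model(text):
    """Maximum-likelihood unigram model: P(w|text) = count(w) / total tokens.

    Assumption: whitespace tokenisation and no smoothing, for illustration only.
    """
    tokens = text.lower().split()
    total = len(tokens)
    if total == 0:
        return {}
    counts = Counter(tokens)
    return {word: count / total for word, count in counts.items()}


# Hypothetical document and query texts.
doc_model = unigram_model("glasgow weather monday glasgow temperature")
query_model = unigram_model("glasgow monday temperature")
```

Here `doc_model` plays the role of P(w|D) and `query_model` the role of P(w|Q); each is a probability distribution over the words that occur in its text.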

Concept Review

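Query Model and Document Model (documents D1 … DN):

| Terms | Query Model | D1 | D2 | D3 | … | DN |
|---|---|---|---|---|---|---|
| a | 0 | 0 | 0.01 | 0 | … | 0 |
| aa | 0 | 0.01 | 0.05 | 0 | … | 0.05 |
| glasgow | 0.25 | 0.02 | 0.5 | 0.3 | … | 0.3 |
| monday | 0.25 | 0.22 | 0.4 | 0.2 | … | 0.2 |
| temperature | 0.4 | 0.21 | 0.01 | 0.4 | … | 0.4 |
| weather | 0.1 | 0.5 | 0.01 | 0.1 | … | 0.05 |
| zoo | 0 | 0.04 | 0.02 | 0 | … | 0 |
| … | … | … | … | … | … | … |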

Concept Review

Probability based Measures:

Kullback-Leibler (KL) Divergence:

$$P(\text{Doc\_Model} \,\|\, \text{Query\_Model}) = \sum_{w} P(w|D) \log \frac{P(w|D)}{P(w|Q)}$$

Vector based Measures:

Euclidean Distance

Cosine value of the angle between two vectors, query and document.
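
The three measures can be compared on the toy distributions from the Concept Review table. This is a minimal sketch, not the authors' implementation: the small additive smoothing constant `eps` is an assumption introduced here so that the KL divergence stays finite when the query model assigns zero probability to a term; the function names are hypothetical.

```python
import math

vocab = ["a", "aa", "glasgow", "monday", "temperature", "weather", "zoo"]
# Values copied from the Query Model, D1 and D3 columns of the table.
query = dict(zip(vocab, [0, 0, 0.25, 0.25, 0.4, 0.1, 0]))
d1 = dict(zip(vocab, [0, 0.01, 0.02, 0.22, 0.21, 0.5, 0.04]))
d3 = dict(zip(vocab, [0, 0, 0.3, 0.2, 0.4, 0.1, 0]))


def kl_divergence(p_doc, p_query, eps=1e-6):
    """Sum over w of P(w|D) * log(P(w|D) / P(w|Q)), with additive smoothing.

    Assumption: eps is added to every probability and the result renormalised.
    """
    def smooth(model):
        raw = [model.get(w, 0.0) + eps for w in vocab]
        z = sum(raw)
        return [x / z for x in raw]

    d, q = smooth(p_doc), smooth(p_query)
    return sum(pd * math.log(pd / pq) for pd, pq in zip(d, q))


def euclidean_distance(p_doc, p_query):
    return math.sqrt(sum((p_doc[w] - p_query[w]) ** 2 for w in vocab))


def cosine_similarity(p_doc, p_query):
    dot = sum(p_doc[w] * p_query[w] for w in vocab)
    norm_d = math.sqrt(sum(p_doc[w] ** 2 for w in vocab))
    norm_q = math.sqrt(sum(p_query[w] ** 2 for w in vocab))
    return dot / (norm_d * norm_q) if norm_d and norm_q else 0.0
```

On these numbers all three measures agree that D3 is closer to the query than D1: its KL divergence and Euclidean distance are smaller and its cosine is larger.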

Model Optimization

(Figure showing the query terms glasgow, monday and temperature and a word wi.)

Model Optimization

(Figure showing the query terms glasgow, monday and temperature and a word wi, labelled with P(wi|glasgow), P(wi|monday) and P(wi|temperature).)

Framework of Theory

$P(w|Q)$

Problem:

- What information can be used to build the state space?

Words, query terms (single terms and combinations), parts of documents, whole documents, the intra-relationships between query terms, and other context information such as the user’s search history.

$$P(w|Q) = \sum_{j} P(w|Q_j)\,P(Q_j)$$

Example:

Q = {glasgow, monday, temperature}

Q’ = {{glasgow}, {monday}, {temperature}, {glasgow monday}, {glasgow temperature}, {monday temperature}, {glasgow monday temperature}}
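
The decomposition of the query into subsets, and the mixture of the per-subset models weighted by P(Qj), can be sketched as follows. The numbers are purely hypothetical: a uniform P(Qj) and a toy P(w|Qj) that spreads its mass evenly over the subset's own terms. In the talk these quantities are estimated by the algorithms listed on the next slide; the function names here are illustrative only.

```python
from itertools import combinations


def nonempty_subsets(terms):
    """All non-empty subsets of the query terms (the state space Q')."""
    return [frozenset(c)
            for r in range(1, len(terms) + 1)
            for c in combinations(terms, r)]


def mix_query_model(p_w_given_subset, p_subset):
    """P(w|Q) as the sum over j of P(w|Q_j) * P(Q_j)."""
    mixed = {}
    for q_j, weight in p_subset.items():
        for word, prob in p_w_given_subset[q_j].items():
            mixed[word] = mixed.get(word, 0.0) + prob * weight
    return mixed


query_terms = ["glasgow", "monday", "temperature"]
subsets = nonempty_subsets(query_terms)

# Hypothetical inputs: uniform P(Q_j) and a toy P(w|Q_j).
p_subset = {q_j: 1.0 / len(subsets) for q_j in subsets}
p_w_given_subset = {q_j: {w: 1.0 / len(q_j) for w in q_j} for q_j in subsets}

p_w_given_q = mix_query_model(p_w_given_subset, p_subset)
```

For three query terms the state space has seven subsets, matching the example above, and the mixed model is again a probability distribution.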

Framework of Theory

Algorithms:

- Association Rule: used to estimate P(w | Qj). Song, D., Huang, Q., Rüger, S., Bruza, P., Facilitating Query Decomposition in Query Language Modeling by Association Rule Mining Using Multiple Sliding Windows. The 30th European Conference on Information Retrieval (ECIR’2008).

- The Aspect Model: used to estimate P(w | Qj) and P(Qj). Huang, Q., Song, D., Rüger, S., Bruza, P., Learning and Optimization of an Aspect Hidden Markov Model for Query Language Model Generation. The 1st International Conference on the Theory of Information Retrieval (ICTIR’2007).

- Markov Chain: P(wi) = Σwj P(wi | wj) · P(wj). Hoenkamp, E., Bruza, P., Huang, Q. and Song, D., The Asymptotic Behavior of a Limited Dependencies Language Model. Dutch-Belgian information retrieval (DIR’2008), the Netherlands, 2008.

- The Hidden Markov Model: used to estimate P(w | Qj) and P(Qj) while taking into account the intra-relations between the subsets of query Q. Huang, Q., Song, D., A Latent Variable Model for Query Expansion Using the Hidden Markov Model. The 17th Conference on Information and Knowledge Management (CIKM’2008).
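
The Markov chain relation P(wi) = Σwj P(wi | wj) · P(wj) can be read as a fixed-point equation. A minimal sketch, assuming the word distribution is obtained by applying the relation repeatedly until it stops changing (the cited paper may proceed differently); the two-word transition table is a hypothetical toy example.

```python
def stationary_distribution(transition, start, tol=1e-10, max_iter=1000):
    """Iterate P(wi) = sum over wj of P(wi|wj) * P(wj) until convergence.

    transition[wj][wi] holds P(wi | wj); each row must sum to 1.
    """
    p = dict(start)
    for _ in range(max_iter):
        nxt = {wi: 0.0 for wi in p}
        for wj, p_wj in p.items():
            for wi, p_wi_given_wj in transition[wj].items():
                nxt[wi] += p_wi_given_wj * p_wj
        if max(abs(nxt[w] - p[w]) for w in p) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical transition probabilities between two words.
transition = {
    "weather": {"weather": 0.9, "temperature": 0.1},
    "temperature": {"weather": 0.5, "temperature": 0.5},
}
start = {"weather": 0.5, "temperature": 0.5}
p_word = stationary_distribution(transition, start)
```

For this toy chain the iteration settles at P(weather) = 5/6 and P(temperature) = 1/6, which can be checked by substituting the values back into the relation.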

Evaluation

Data sets:

- .GOV2 (~22 million documents, ~500 GB)

- WT10G (~1.6 million documents, ~10 GB)

- TREC Discs 1-5 (~2 million documents, ~9 GB)

Performance:

Our methods significantly outperform a number of state-of-the-art models, such as the Relevance Model (from the University of Massachusetts).

Challenges

- No general way to model the relationships between dynamic contexts

- Lack of mechanisms in classical IR for integrating and mapping between representations of multimedia and structured documents and their respective contexts

Why use Quantum Theory (QT)?

QT provides a unified framework for different types of mechanisms:

- geometrical representation of information as vectors in Hilbert space

- measurement of observables via subspace projection operators

- ability for logical reasoning through lattice structures

- modeling the change of states via evolution operators
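
As a small illustration of measurement via subspace projection: if a document is a vector in the term space and a context is a subspace spanned by orthonormal basis vectors, the squared length of the projection of the normalised vector onto that subspace can be read as a probability. This is a generic sketch under those assumptions, not a model taken from the slides; the function name and the vectors are hypothetical.

```python
def projection_probability(state, basis):
    """Squared norm of the projection of the normalised state onto span(basis).

    Assumption: the basis vectors are orthonormal.
    """
    norm_sq = sum(abs(x) ** 2 for x in state)
    total = 0.0
    for b in basis:
        # Inner product <b|state>; conjugate() also works for real numbers.
        amplitude = sum(bi.conjugate() * x for bi, x in zip(b, state))
        total += abs(amplitude) ** 2
    return total / norm_sq


# Hypothetical axes: (glasgow, monday, temperature).
document = [1.0, 1.0, 0.0]
glasgow_subspace = [[1.0, 0.0, 0.0]]
weather_subspace = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

p_glasgow = projection_probability(document, glasgow_subspace)
p_weather = projection_probability(document, weather_subspace)
```

Enlarging the subspace can only keep or increase the probability, which mirrors the ordering of subspaces in the lattice structure mentioned above.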

Issues

- How to represent the search state?

- How to develop operational methods for measuring observables and modelling context?

- How to use the interaction and evolution of contexts?

How to do (PFSA)

Probabilistic finite state automaton (PFSA): a tuple $A = (Q, q_0, Q_f, \Sigma, (U_w)_{w \in \Sigma})$, where

- $Q = \{Q'_1, \ldots, Q'_n\}$ is a finite set of states

- $q_0 \in Q$ is the initial state

- $Q_f \subseteq Q$ is the set of final states

- $\Sigma$ is a finite alphabet

- $U_w$ is an $n \times n$ stochastic matrix: $(U_w)_{i,j}$ is the probability of going from state $i$ to state $j$ when $w$ is an input letter

- a probability distribution of a letter $w$ over $\Sigma$
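
A minimal sketch of how such an automaton assigns a probability to an input string: start with all probability mass on the initial state, multiply the state distribution by the stochastic matrix of each letter in turn, and sum the mass that ends up in the final states. The two-state automaton, its matrices and the function name are hypothetical, and the distribution over letters from the last bullet is not used here.

```python
def pfsa_accept_probability(word, matrices, initial, finals):
    """Probability of ending in a final state after reading `word`.

    matrices[w][i][j] is the probability of moving from state i to state j
    on input letter w (each row sums to 1).
    """
    n = len(next(iter(matrices.values())))
    dist = [0.0] * n
    dist[initial] = 1.0
    for letter in word:
        u = matrices[letter]
        # Row vector times matrix: new_dist[j] = sum_i dist[i] * u[i][j].
        dist = [sum(dist[i] * u[i][j] for i in range(n)) for j in range(n)]
    return sum(dist[j] for j in finals)


# Hypothetical two-state automaton over the alphabet {a, b}.
matrices = {
    "a": [[0.5, 0.5], [0.0, 1.0]],
    "b": [[1.0, 0.0], [0.3, 0.7]],
}
p_aa = pfsa_accept_probability("aa", matrices, initial=0, finals={1})
p_ab = pfsa_accept_probability("ab", matrices, initial=0, finals={1})
```

Reading "aa" leaves probability 0.75 in the final state, while "ab" leaves 0.35, because the letter b moves some mass back to state 0.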

How to do (QFSA)

Quantum finite state automaton (QFSA): a quantum analogue of the probabilistic automaton

- An N-state qubit $|\psi\rangle \in \mathbb{CP}^N$, which is an element of N-dimensional complex projective space.

- $U_w$ is an $N \times N$ unitary matrix for letter $w$.

- $\Sigma$ is a finite alphabet, $w \in \Sigma$.

- $\Pr(w)$, the probability of the state machine accepting a given finite input string $w = (w_0, w_1, \ldots, w_k)$, is $\Pr(w) = \| P\, U_{w_k} \cdots U_{w_1} U_{w_0} |\psi\rangle \|^2$, where $\|\cdot\|$ is the $L^2$ norm and $P$ is an $N \times N$ projection matrix.
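
The acceptance probability can be sketched directly from its definition: apply the unitary of each input letter to the state vector in reading order, project the result with P, and take the squared norm. The two-dimensional example below (a Hadamard-like matrix for the letter "h" and a swap matrix for "x") is a hypothetical choice for illustration and is not taken from the slides.

```python
import math


def mat_vec(matrix, vector):
    """Matrix times column vector."""
    return [sum(row[j] * vector[j] for j in range(len(vector)))
            for row in matrix]


def qfsa_accept_probability(word, unitaries, projector, psi):
    """Squared norm of P * U_{w_k} ... U_{w_1} U_{w_0} applied to psi."""
    state = list(psi)
    for letter in word:  # the first letter's unitary is applied first
        state = mat_vec(unitaries[letter], state)
    projected = mat_vec(projector, state)
    return sum(abs(amplitude) ** 2 for amplitude in projected)


s = 1.0 / math.sqrt(2.0)
unitaries = {
    "h": [[s, s], [s, -s]],          # Hadamard-like rotation
    "x": [[0.0, 1.0], [1.0, 0.0]],   # swaps the two basis states
}
projector = [[0.0, 0.0], [0.0, 1.0]]  # projects onto the second basis state
psi = [1.0, 0.0]                      # start in the first basis state

p_h = qfsa_accept_probability("h", unitaries, projector, psi)
p_hh = qfsa_accept_probability("hh", unitaries, projector, psi)
p_x = qfsa_accept_probability("x", unitaries, projector, psi)
```

Unlike the probabilistic case, amplitudes can cancel: reading "h" once gives probability 0.5, but reading "hh" returns the state to where it started and the acceptance probability drops to 0.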

How I will do it (Methods)

Method 1:

- search states: queries, documents, and different types of context

- subspace model: the rich lattice structure of the quantum theory (QT) space

Method 2:

- context operator: the required basis vectors, the relative importance with respect to the context, and the projection of the document onto the subspace

Method 3:

- evolution of states: the use of unitary operators in QT and the dynamics of search involving changing information

Plan