
Tuning Before Feedback: Combining Ranking Discovery and Blind Feedback for Robust Retrieval*

Weiguo Fan, Ming Luo, Li Wang, Wensi Xi, and Edward A. Fox
Digital Library Research Laboratory, Virginia Tech

*This research is supported by the National Science Foundation under Grant Numbers IIS-0325579, DUE-0136690 and DUE-0333531

Outline

Introduction
Research Questions
Approach: Ranking Tuning + Blind Feedback
Experiment Results
Conclusions

Introduction

Ranking functions play an important role in IR performance.
Blind feedback (pseudo-relevance feedback) has been found very useful for ad hoc retrieval.
Why not combine ranking function optimization with blind feedback to improve robustness?

Research Questions

Does blind feedback work even better on fine-tuned ranking functions than on traditional ranking functions such as Okapi BM25?
Does the type of query (very short vs. very long) have any impact on the combination approach?
Can the discovered ranking function, combined with blind feedback, extrapolate well to new, unseen queries?

Our Approach

Use ARRANGER, a Genetic Programming-based discovery engine, to perform ranking function tuning [Fan 2003tkde, Fan 2004ip&m, Fan 2004jasist].
Combine ranking tuning and blind feedback.
Test on different types of queries.

Ranking Function (RF) Discovery Problem

Ranking 1 (Rel.: 1 = relevant, 0 = not relevant)
Order  Doc.  Rel.
1      A     1
2      D     1
3      F     1
4      G     1
5      B     0
6      C     0
7      E     0

Ranking 2
Order  Doc.  Rel.
1      A     1
2      B     0
3      C     0
4      D     1
5      E     0
6      F     1
7      G     1

[Figure: Feedback / training data (input) → Ranking Function Discovery → ranking function f (output).]
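One way to make the discovery problem concrete is to score each ranking with a fitness measure. The sketch below assumes average precision (AP) as that measure; this is our illustrative choice, since the slides do not name ARRANGER's fitness function at this point. `average_precision` is a hypothetical helper, and the two relevance lists are copied from the example rankings A, D, F, G, B, C, E and A, B, C, D, E, F, G.

```python
def average_precision(relevance: list[int]) -> float:
    """AP of one ranked list; relevance[i] is 1 if the document at rank i+1 is relevant."""
    total_relevant = sum(relevance)
    if total_relevant == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant document
    return precision_sum / total_relevant


# Relevance judgments in ranked order, copied from the two example rankings.
ranking_adfgbce = [1, 1, 1, 1, 0, 0, 0]  # order A, D, F, G, B, C, E
ranking_abcdefg = [1, 0, 0, 1, 0, 1, 1]  # order A, B, C, D, E, F, G

print(f"{average_precision(ranking_adfgbce):.4f}")  # 1.0000
print(f"{average_precision(ranking_abcdefg):.4f}")  # 0.6429
```

A ranking function that puts all four relevant documents first reaches the maximum AP of 1.0; a discovery engine searches for functions that push rankings toward that ideal.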

Ranking Function Optimization

“Ranking function tuning is an art!” (Paul Kantor)
Why not adaptively discover the RF by Genetic Programming?
Huge search space
Discrete objective function
Modeling advantage

What is GP?

A problem-solving system designed on the principles of evolution and heredity.
Widely used for structure discovery, functional form discovery, and other data mining and optimization tasks.

Genetic Algorithms/Programming

Representation:
GA: vectors of bit strings or real numbers
GP: complex data structures such as trees and arrays
Genetic transformations: reproduction, crossover, mutation
IR applications: [Gordon’88, ’91], [Chen’98a, ’98b], [Pathak’00], etc.

Essential GP Components

Terminals: Leaf nodes in the tree structure (e.g., x, y).
Functions: Non-leaf nodes used to combine the leaf nodes; commonly numerical operations such as +, -, *, /, log, sqrt.
Fitness function: The objective function GP aims to optimize.
Reproduction: A genetic operator that copies the individuals with the best fitness values directly into the next generation's population without going through crossover.
Crossover: A genetic operator aimed at improving both the diversity and the genetic fitness of the population. See details on the next slide.
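To make the components concrete, here is a minimal sketch of how a GP individual could be encoded as a tree: terminals are leaf names such as tf, df and N, and functions are the numerical operators listed above. The tuple encoding, the `FUNCTIONS` table, and the "protected" versions of /, log and sqrt (a common GP convention that avoids runtime errors) are all assumptions, not ARRANGER's documented internals.

```python
import math

# Hypothetical encoding: a tree is a terminal name (str) or a tuple
# (function_name, child, ...). Terminals are leaf nodes; functions are non-leaf nodes.
FUNCTIONS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    # "Protected" operators (an assumed GP convention) never raise on bad inputs.
    "/": lambda a, b: a / b if b != 0 else 1.0,
    "log": lambda a: math.log(abs(a)) if a != 0 else 0.0,
    "sqrt": lambda a: math.sqrt(abs(a)),
}


def evaluate(tree, terminals: dict[str, float]) -> float:
    """Recursively evaluate a GP tree, looking up leaf values in `terminals`."""
    if isinstance(tree, str):
        return terminals[tree]
    name, *children = tree
    return FUNCTIONS[name](*(evaluate(child, terminals) for child in children))


# tf*(N/df) and tf*(tf+df), two candidate ranking functions written as trees.
rf_a = ("*", "tf", ("/", "N", "df"))
rf_b = ("*", "tf", ("+", "tf", "df"))
values = {"tf": 3.0, "df": 10.0, "N": 1000.0}  # illustrative term/document statistics
print(evaluate(rf_a, values))  # 300.0
print(evaluate(rf_b, values))  # 39.0
```

The fitness function then only needs a way to call `evaluate` on every (term, document) pair and score the resulting ranking.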

Example of Crossover in GP

[Figure: Parent1 and Parent2 (generation N) exchange subtrees via crossover to produce Child1 and Child2 (generation N+1). Expressions shown: tf*(tf+df), tf*(N/df), N/df+df, (tf*df)+df.]
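Crossover can be sketched as swapping one subtree between two parents. In GP the crossover points are normally picked at random; in this sketch they are passed in explicitly as index paths (1 = left child, 2 = right child) so the result is reproducible. The helpers `get_subtree`, `replace_subtree`, `crossover` and `to_infix` are hypothetical; the parents are the expressions tf*(tf+df) and tf*(N/df).

```python
def get_subtree(tree, path):
    """Follow an index path (1 = left child, 2 = right child) down a tuple tree."""
    for index in path:
        tree = tree[index]
    return tree


def replace_subtree(tree, path, new_subtree):
    """Return a copy of `tree` with the node at `path` replaced by `new_subtree`."""
    if not path:
        return new_subtree
    parts = list(tree)
    parts[path[0]] = replace_subtree(tree[path[0]], path[1:], new_subtree)
    return tuple(parts)


def crossover(parent1, parent2, path1, path2):
    """Swap the subtree at path1 in parent1 with the subtree at path2 in parent2."""
    sub1, sub2 = get_subtree(parent1, path1), get_subtree(parent2, path2)
    return replace_subtree(parent1, path1, sub2), replace_subtree(parent2, path2, sub1)


def to_infix(tree):
    """Render a binary tuple tree as a fully parenthesized expression."""
    if isinstance(tree, str):
        return tree
    op, left, right = tree
    return f"({to_infix(left)}{op}{to_infix(right)})"


parent1 = ("*", "tf", ("+", "tf", "df"))  # tf*(tf+df)
parent2 = ("*", "tf", ("/", "N", "df"))   # tf*(N/df)
# Crossover points: the first tf inside (tf+df), and the whole N/df subtree.
child1, child2 = crossover(parent1, parent2, (2, 1), (2,))
print(to_infix(child1))  # (tf*((N/df)+df))
print(to_infix(child2))  # (tf*tf)
```

Because the trees are immutable tuples, the parents survive unchanged, which is what reproduction needs when the fittest individuals are copied into the next generation.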

The ARRANGER Engine

1. Split the training data into training and validation sets.
2. Generate an initial population of random “ranking functions”.
3. Evaluate the fitness of each “ranking function” in the population and record the 10 best ones.
4. If the stopping criterion is not met, generate the next generation of the population by genetic transformation and go to Step 3.
5. Validate the recorded best “ranking functions” and select the best one as the RF.


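[Figure: ARRANGER flowchart: Start → Initialize Population → Evaluate Fitness → Stop? If not, Apply Crossover and evaluate again; if so, Validate and Output → End. Alongside, a population of individuals numbered 1 to 50 with example fitness values such as 0.4, 0.3, 0.4, 0.8, 0.3, 0.4.]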
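The five steps can be sketched as a compact GP loop. Everything concrete here is an assumption for illustration: the toy data, the 70/30 train/validation split, mean average precision as fitness, a fixed number of generations as the stopping criterion, elitist reproduction of the top 5, and a depth cap on children. Mutation is omitted to keep the sketch short. It mirrors the structure of the steps, not ARRANGER's actual code.

```python
import random

# Toy GP in the spirit of the five steps; all settings below are assumptions.
TERMINALS = ["tf", "df", "N"]
OPS = {"+": lambda x, y: x + y, "-": lambda x, y: x - y, "*": lambda x, y: x * y,
       "/": lambda x, y: x / y if y != 0 else 1.0}  # protected division
MAX_DEPTH = 6

def random_tree(rng, max_depth=3):
    if max_depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    return (rng.choice(list(OPS)), random_tree(rng, max_depth - 1), random_tree(rng, max_depth - 1))

def evaluate(tree, env):
    if isinstance(tree, str):
        return env[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, env), evaluate(right, env))

def tree_depth(tree):
    return 0 if isinstance(tree, str) else 1 + max(tree_depth(tree[1]), tree_depth(tree[2]))

def paths(tree, prefix=()):
    yield prefix  # index path to every node (1 = left child, 2 = right child)
    if not isinstance(tree, str):
        yield from paths(tree[1], prefix + (1,))
        yield from paths(tree[2], prefix + (2,))

def get_sub(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def put_sub(tree, path, new):
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = put_sub(tree[path[0]], path[1:], new)
    return tuple(parts)

def crossover(rng, p1, p2):
    a, b = rng.choice(list(paths(p1))), rng.choice(list(paths(p2)))
    kids = (put_sub(p1, a, get_sub(p2, b)), put_sub(p2, b, get_sub(p1, a)))
    # Keep a parent instead of a child that exceeds the (assumed) depth cap.
    return [k if tree_depth(k) <= MAX_DEPTH else p for k, p in zip(kids, (p1, p2))]

def average_precision(tree, docs):
    ranked = sorted(docs, key=lambda d: evaluate(tree, d[0]), reverse=True)
    hits, total = 0, 0.0
    for rank, (_, rel) in enumerate(ranked, start=1):
        if rel:
            hits += 1
            total += hits / rank
    n_rel = sum(rel for _, rel in docs)
    return total / n_rel if n_rel else 0.0

def fitness(tree, queries):  # assumed fitness: mean average precision
    return sum(average_precision(tree, q) for q in queries) / len(queries)

def arranger_sketch(queries, rng, pop_size=50, generations=15, n_best=10, n_elite=5):
    cut = int(0.7 * len(queries))
    train, valid = queries[:cut], queries[cut:]                    # step 1
    population = [random_tree(rng) for _ in range(pop_size)]       # step 2
    best = []
    for _ in range(generations):                                   # step 4: fixed budget
        scored = sorted(((fitness(t, train), t) for t in population),
                        key=lambda s: s[0], reverse=True)          # step 3
        best = sorted(best + scored[:n_best], key=lambda s: s[0], reverse=True)[:n_best]
        next_gen = [t for _, t in scored[:n_elite]]                # reproduction
        parents = [t for _, t in scored[: pop_size // 2]]
        while len(next_gen) < pop_size:
            next_gen.extend(crossover(rng, rng.choice(parents), rng.choice(parents)))
        population = next_gen[:pop_size]
    return max(((fitness(t, valid), t) for _, t in best), key=lambda s: s[0])  # step 5

def toy_queries(rng, n_queries=10, n_docs=12):
    # Hypothetical data: relevant documents tend to have a higher tf.
    queries = []
    for _ in range(n_queries):
        docs = []
        for _ in range(n_docs):
            rel = int(rng.random() < 0.3)
            tf = rng.randint(3, 8) if rel else rng.randint(0, 4)
            docs.append(({"tf": float(tf), "df": float(rng.randint(1, 50)), "N": 1000.0}, rel))
        queries.append(docs)
    return queries

rng = random.Random(0)
valid_map, best_rf = arranger_sketch(toy_queries(rng), rng)
print(round(valid_map, 3), best_rf)
```

Because the toy data are random, the discovered function depends on the seed; the point is the control flow (evaluate, keep the 10 best, reproduce and cross over, validate), not any particular formula.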


Blind Feedback

Automatically adds more terms to a user’s query to improve search engine performance, assuming that the top-ranked documents are relevant.
Some examples:
Rocchio (performed best in our experiments)
Dec-Hi
Kullback-Leibler Divergence (KLD)
Chi-Square
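A minimal sketch of Rocchio-style blind feedback, assuming raw term counts as weights and a positive-only update (blind feedback has no known non-relevant documents). `rocchio_expand` and its parameters `k`, `n_terms`, `alpha` and `beta` are hypothetical; the slides state only that Rocchio performed best in the authors' experiments, not which variant or settings were used.

```python
from collections import Counter, defaultdict


def rocchio_expand(query_terms, ranked_docs, k=10, n_terms=5, alpha=1.0, beta=0.75):
    """Positive-only Rocchio blind feedback over raw term counts.

    ranked_docs holds the first-pass results (each a list of terms) in rank order;
    the top-k of them are *assumed* relevant, which is what makes the feedback blind.
    """
    weights = defaultdict(float)
    for term, count in Counter(query_terms).items():
        weights[term] += alpha * count                      # alpha * original query vector
    top_docs = ranked_docs[:k]
    for doc in top_docs:
        for term, count in Counter(doc).items():
            weights[term] += beta * count / len(top_docs)   # beta * centroid of top-k docs
    originals = list(dict.fromkeys(query_terms))
    new_terms = sorted((t for t in weights if t not in originals),
                       key=lambda t: (-weights[t], t))[:n_terms]
    return {t: weights[t] for t in originals + new_terms}


query = ["robust", "retrieval"]
first_pass = [
    ["robust", "retrieval", "feedback", "feedback"],
    ["retrieval", "ranking", "feedback"],
    ["cooking", "recipes"],
]
expanded = rocchio_expand(query, first_pass, k=2, n_terms=2)
print(sorted(expanded.items()))
# [('feedback', 1.125), ('ranking', 0.375), ('retrieval', 1.75), ('robust', 1.375)]
```

Real systems would typically use tf-idf style document vectors rather than raw counts; the update rule stays the same.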

An Integrated Model

[Figure: Multiple user queries with relevance information → Ranking Tuning → New Ranking Function. User Queries → New Ranking Function + Blind Feedback → New Search Results.]
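The two-stage flow can be sketched as follows: a ranking function obtained offline by tuning drives a first pass, the top-k documents are assumed relevant, the query is expanded with their most frequent new terms, and the same function re-ranks. The stand-in function tf*log(N/df) and the helpers `search` and `two_stage_search` are hypothetical; in the deck, the tuned function would come from ARRANGER.

```python
import math
from collections import Counter


def search(docs, query_weights, rf):
    """Score each doc as sum(weight * rf(tf, df, N)) over matching query terms.

    Returns ids of docs with a positive score, best first.
    """
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    scored = []
    for doc_id, doc in enumerate(docs):
        tf = Counter(doc)
        s = sum(w * rf(tf[t], df[t], n) for t, w in query_weights.items() if tf[t] > 0)
        if s > 0:
            scored.append((doc_id, s))
    return [doc_id for doc_id, _ in sorted(scored, key=lambda p: p[1], reverse=True)]


def two_stage_search(docs, query_terms, rf, k=2, n_terms=1, beta=0.5):
    """Stage 1: rank with a tuned RF. Stage 2: blind feedback, then re-rank with the same RF."""
    weights = Counter(query_terms)
    first_pass = search(docs, weights, rf)
    pseudo_relevant = first_pass[:k]                        # top-k assumed relevant
    new_counts = Counter(t for i in pseudo_relevant for t in docs[i] if t not in weights)
    for term, _ in sorted(new_counts.items(), key=lambda p: (-p[1], p[0]))[:n_terms]:
        weights[term] += beta                               # add expansion term
    return first_pass, search(docs, weights, rf)


def stand_in_rf(tf, df, n):
    # Hypothetical stand-in for an ARRANGER-discovered ranking function.
    return tf * math.log(n / df)


docs = [
    ["genetic", "programming", "ranking", "ranking"],
    ["ranking", "function", "discovery", "genetic"],
    ["blind", "feedback", "genetic"],
    ["cooking", "recipes"],
]
before, after = two_stage_search(docs, ["ranking"], stand_in_rf)
print(before, after)  # [0, 1] [0, 1, 2]
```

Here the expansion adds "genetic", so the third document, which lacks the original query term, is retrieved in the second pass; swapping in a better-tuned `rf` changes both passes, which is the point of tuning before feedback.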

Experiment Setting

Data: 2003 Robust Track data (from TREC 6, 7, 8)
Training queries: 150 old queries from TREC 6, 7, 8
Test queries: 50 very hard queries + 50 new queries

Results on 150 Training Queries

Run                          Desc              Short
Okapi without BF (baseline)  0.1880            0.2194
Okapi with BF                0.2076 (+10.4%)   0.2385 (+8.7%)
RF 1 without BF              0.2173 (+15.6%)   0.2394 (+9.1%)
RF 1 with BF                 0.2422 (+28.8%)   0.2661 (+21.3%)

BF = blind feedback; percentages are relative improvements over the Okapi baseline in the same column.
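The percentages are relative improvements over the Okapi baseline in each column. A small sketch that recomputes them from the reported scores (the helper `relative_gain` is ours, for illustration):

```python
baseline = {"Desc": 0.1880, "Short": 0.2194}  # Okapi without BF
runs = {
    "Okapi with BF": {"Desc": 0.2076, "Short": 0.2385},
    "RF 1 without BF": {"Desc": 0.2173, "Short": 0.2394},
    "RF 1 with BF": {"Desc": 0.2422, "Short": 0.2661},
}


def relative_gain(score, base):
    """Percentage improvement of `score` over `base`."""
    return 100.0 * (score - base) / base


for name, scores in runs.items():
    gains = {col: relative_gain(scores[col], baseline[col]) for col in baseline}
    print(f"{name}: Desc {gains['Desc']:+.1f}%, Short {gains['Short']:+.1f}%")
# Okapi with BF: Desc +10.4%, Short +8.7%
# RF 1 without BF: Desc +15.6%, Short +9.1%
# RF 1 with BF: Desc +28.8%, Short +21.3%
```

The combined run's gain (+28.8% on Desc) exceeds the sum of the two single-technique gains (+10.4% and +15.6%), which is the effect the first research question asks about.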

Results on Test Queries (1)

Results on Test Queries (2)

Conclusions

Blind feedback works well with GP-trained ranking functions on the training queries.
The discovered ranking function combined with blind feedback also works on new queries.
The two-stage model responds differently to Desc queries (slightly better) and long queries.

Thank You!

Q&A?
