cross-lingual query suggestion using query logs of different languages

Page 1: Cross-Lingual Query Suggestion  Using  Query Logs  of Different Languages


Cross-Lingual Query Suggestion Using

Query Logs of Different Languages

SIGIR 07


Abstract

• Query suggestion
  – To suggest relevant queries for a given query
  – To help users better specify their information needs
• Cross-Lingual Query Suggestion (CLQS):
  – For a query in one language, we suggest similar or relevant queries in other languages.
    • cross-lingual keyword bidding (Search Engine)
    • cross-language information retrieval (CLIR)


Introduction

• CLQS vs. Cross-Lingual Query Expansion
  – CLQS suggests full queries formulated by users in another language.
• The users of search engines
  – similar interests in the same period of time
  – queries on similar topics in different languages
• Key point
  – How to learn a similarity measure between two queries
  – MLQS: Term Co-Occurrence based MI and χ²


Estimating Cross-Lingual Query Similarity

• Discriminative Model for Estimating Cross-Lingual Query Similarity
• Monolingual Query Similarity Measure Based on Click-through Information
• Features Used for Learning Cross-Lingual Query Similarity Measure
  – Bilingual Dictionary
  – Parallel Corpora
  – Online Mining for Related Queries
  – Monolingual Query Suggestion
• Estimating Cross-lingual Query Similarity


Discriminative Model for Estimating Cross-Lingual Query Similarity – 1/2

– qf : a source language query

– qe : a target language query

– simML : Monolingual query similarity

– simCL : Cross-lingual query similarity

– Tqf : translation of qf in the target language
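The formula that these definitions belong to did not survive in the transcript. A plausible reconstruction, assuming the cross-lingual similarity is defined as the monolingual similarity between the translation of qf and the target query qe, is:

```latex
\mathrm{sim}_{CL}(q_f, q_e) = \mathrm{sim}_{ML}(T_{q_f}, q_e)
```

Under this reading, the right-hand side is the quantity the discriminative model is trained to predict.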


Discriminative Model for Estimating Cross-Lingual Query Similarity – 2/2

• Learning: LIBSVM regression algorithm
  – f : feature functions
  – φ : mapping feature space onto kernel space
  – w : weight vector in the kernel space
  – relevant vs. irrelevant
  – strongly relevant, weakly relevant or irrelevant
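The three symbols listed for the learner fit the usual support vector regression form. As a sketch, assuming the mapping onto kernel space is written φ (the symbol itself is missing from the transcript), the learned similarity would read:

```latex
\mathrm{sim}_{CL}(q_f, q_e) = w \cdot \phi\big(f(q_f, q_e)\big)
```

Because the output is a real value, graded judgments (strongly relevant, weakly relevant, irrelevant) can be expressed, not only a binary decision.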




Monolingual Query Similarity Measure Based on Click-through Information

• Click-through information in query logs [26]
• KN(x) : number of keywords in a query x
• RD(x) : number of clicked URLs for a query x
• α = 0.4, β = 0.6
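The similarity formula itself is not in the transcript. The sketch below assumes it is a weighted sum of two normalized overlaps: common keywords divided by the larger KN, and common clicked URLs divided by the larger RD, with the keyword term weighted 0.4 and the click term 0.6. The function name and argument layout are illustrative only.

```python
def monolingual_similarity(p_keywords, q_keywords, p_urls, q_urls,
                           alpha=0.4, beta=0.6):
    """Weighted keyword/click-through similarity between queries p and q.

    Assumed form (not confirmed by the slide):
        alpha * |common keywords| / max(KN(p), KN(q))
      + beta  * |common clicked URLs| / max(RD(p), RD(q))
    """
    p_kw, q_kw = set(p_keywords), set(q_keywords)
    p_rd, q_rd = set(p_urls), set(q_urls)

    # Guard against empty sets so the division is always defined.
    kw_denominator = max(len(p_kw), len(q_kw))
    rd_denominator = max(len(p_rd), len(q_rd))
    keyword_part = len(p_kw & q_kw) / kw_denominator if kw_denominator else 0.0
    click_part = len(p_rd & q_rd) / rd_denominator if rd_denominator else 0.0

    return alpha * keyword_part + beta * click_part


# Example: two of three keywords shared, one of two clicked URLs shared.
example = monolingual_similarity(
    ["cheap", "flights", "paris"], ["flights", "paris"],
    ["u1", "u2"], ["u2", "u3"],
)
```

Weighting clicks above keywords means two queries with different wording but the same clicked pages can still score as similar.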




1. Bilingual Dictionary – 1/2

– 120,000 unique entries (built in-house)
– Given an input query qf={wf1,wf2,…,wfn} (in source language)
– By bilingual dictionary D: D(wfi)={ti1,ti2,…,tim}
– C(x,y) is the number of queries in the log containing both x and y.
– C(x) is the number of queries in the log containing x.
– N is the total number of queries in the log.
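The counts C(x,y), C(x) and N are the ingredients of a co-occurrence statistic, but the scoring formula is missing from the transcript. The sketch below assumes mutual information with probabilities estimated as C/N, summed over unordered pairs of translated terms. The function names and the dictionary layout of the counts are hypothetical.

```python
import math
from itertools import combinations


def mutual_information(c_xy, c_x, c_y, n):
    """MI(x, y) = P(x, y) * log(P(x, y) / (P(x) * P(y))), with P = C / N."""
    if c_xy == 0 or c_x == 0 or c_y == 0:
        return 0.0  # no co-occurrence evidence
    p_xy, p_x, p_y = c_xy / n, c_x / n, c_y / n
    return p_xy * math.log(p_xy / (p_x * p_y))


def cohesion_score(translation, pair_counts, term_counts, n):
    """Sum MI over all unordered term pairs of one candidate translation.

    pair_counts maps frozenset({x, y}) -> C(x, y); term_counts maps x -> C(x).
    """
    total = 0.0
    for x, y in combinations(translation, 2):
        c_xy = pair_counts.get(frozenset((x, y)), 0)
        total += mutual_information(
            c_xy, term_counts.get(x, 0), term_counts.get(y, 0), n
        )
    return total
```

A translation whose terms often appear together in target-language queries gets a higher score, which is how the log helps disambiguate dictionary entries.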


1. Bilingual Dictionary – 2/2

– The set of top-4 query translations is denoted as S(Tqf)
– For each T ∈ S(Tqf)
  • Retrieve all queries containing T in the target language and assign Sdict(T) as their value
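The two steps on this slide, keeping the four best translations and then pulling matching queries from the target-language log, can be sketched as follows. The scoring function is passed in as a parameter. "Containing T" is read here as containing every term of T, and a query matched by several translations keeps the highest score. Both readings are assumptions, as are all the names.

```python
from itertools import product


def top_translations(term_translations, score, k=4):
    """Pick one dictionary translation per source term; keep the k best combinations."""
    candidates = product(*term_translations)
    return sorted(candidates, key=score, reverse=True)[:k]


def candidate_queries(translations, score, target_log):
    """Give each logged target query the score of the translation it contains."""
    features = {}
    for translation in translations:
        s = score(translation)
        needed = set(translation)
        for query in target_log:
            if needed <= set(query.split()):
                # Assumption: keep the best score if several translations match.
                features[query] = max(s, features.get(query, float("-inf")))
    return features
```

Enumerating the full product grows quickly with query length; for the short queries typical of web logs this is usually manageable.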


2. Parallel Corpora

– Given a pair of queries
  • qf : in the source language
  • qe : in the target language
– Bi-Directional Translation Score
  • IBM model 1 & GIZA++ tool
  • P(yj|xi) is the word-to-word translation probability
– The top 10 queries {qe} with the highest bi-directional translation score with qf are retrieved from the query log
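The score formula is not preserved in the transcript. The sketch below uses the textbook IBM Model 1 sentence probability (with a NULL source word) in both directions and combines them by a geometric mean; the combination rule is an assumption. The translation tables stand in for word-to-word probabilities P(yj|xi) such as GIZA++ would estimate, and the names are hypothetical.

```python
import math

NULL = "<null>"  # hypothetical token for the empty source word


def ibm1_probability(target, source, t_table):
    """p(target | source) under IBM Model 1.

    t_table maps (target_word, source_word) -> P(target_word | source_word).
    """
    source_with_null = [NULL] + list(source)
    # Uniform alignment prior: 1 / (|source| + 1) ** |target|
    prob = 1.0 / (len(source_with_null) ** len(target))
    for y in target:
        prob *= sum(t_table.get((y, x), 0.0) for x in source_with_null)
    return prob


def bidirectional_score(qf, qe, t_f_given_e, t_e_given_f):
    """Geometric mean of p(qf | qe) and p(qe | qf) (assumed combination)."""
    forward = ibm1_probability(qf, qe, t_f_given_e)
    backward = ibm1_probability(qe, qf, t_e_given_f)
    return math.sqrt(forward * backward)
```

Scoring in both directions penalizes pairs where one query explains the other well but not the reverse, for example when one query is much longer.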


3. Online Mining for Related Queries – 1/3

• Out-of-vocabulary (OOV) terms are a major knowledge bottleneck for query translation and CLIR
• Assumption:
  – If a query in the target language co-occurs with the source query in many web pages, they are probably semantically related
  – But this introduces noise


3. Online Mining for Related Queries – 2/3

– Frequency in the Snippets
  • For example:
    – Given a query q = abc in the source language
    – By dictionary: a={a1,a2,a3}, b={b1,b2} and c={c1}
    – Web query: q ∧ (a1 ∨ a2 ∨ a3) ∧ (b1 ∨ b2) ∧ (c1) in the target language
    – 700 snippets are retrieved; the 10 most frequent target queries are kept
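The example above can be sketched in two small steps: composing the boolean web query from the dictionary translations, then counting which known target queries show up most often in the returned snippets. The `AND`/`OR` syntax, the substring matching, and the function names are illustrative assumptions; a real search engine's query syntax may differ.

```python
from collections import Counter


def build_web_query(source_query, term_translations):
    """Compose: q AND (a1 OR a2 ...) AND (b1 OR b2 ...) AND ..."""
    groups = ["(" + " OR ".join(alternatives) + ")"
              for alternatives in term_translations]
    return " AND ".join([source_query] + groups)


def most_frequent_targets(snippets, target_queries, top_n=10):
    """Rank target queries by the number of snippets that mention them."""
    counts = Counter()
    for snippet in snippets:
        text = snippet.lower()
        for query in target_queries:
            if query.lower() in text:  # naive substring match
                counts[query] += 1
    return [query for query, _ in counts.most_common(top_n)]


web_query = build_web_query("abc", [["a1", "a2", "a3"], ["b1", "b2"], ["c1"]])
```

Requiring the source query together with at least one translation per term biases the results toward pages that mix both languages, where translation pairs are more likely to co-occur.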


3. Online Mining for Related Queries – 3/3

– Each query qe mined from the web is associated with a feature, the Co-Occurrence Double-Check (CODC) measure, with value SCODC(qf,qe)


4. Monolingual Query Suggestion

• Q0 : candidate queries (in target language)

– For each target query qe,

• SQML(qe) : monolingual source query
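The definition of SQML(qe) is missing from the transcript. One reading, assumed here, is that the monolingual source query of qe is the candidate in Q0 most similar to it under the monolingual similarity measure, and that this similarity becomes the feature value. The keyword-overlap function below is only a stand-in for that measure.

```python
def monolingual_source_query(qe, q0, sim_ml):
    """Return the candidate in q0 most similar to qe, plus that similarity."""
    best = max(q0, key=lambda candidate: sim_ml(qe, candidate))
    return best, sim_ml(qe, best)


def keyword_overlap(p, q):
    """Hypothetical stand-in for simML: shared keywords over the longer query."""
    a, b = set(p.split()), set(q.split())
    return len(a & b) / max(len(a), len(b))


source, score = monolingual_source_query(
    "cheap flights paris", ["cheap flights", "hotel paris"], keyword_overlap
)
```

This lets target queries that were not reached by the dictionary, the parallel corpus, or web mining enter the candidate set through their similarity to a query that was.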




Estimating Cross-lingual Query Similarity

• Four categories of features are used to learn the cross-lingual query similarity.
• Cross-lingual query similarity score
  – Learning: LIBSVM regression algorithm
    • f : feature functions
    • φ : mapping feature space onto kernel space
    • w : weight vector in the kernel space


Performance Evaluation – Log Data

• Data Resources:
  – MSN Search Engine
• French (source language) vs. English (target language)
  – A one-month English query log
  – 7 million unique English queries
  – Occurrence frequency more than 5
• 5,000 French queries
  – 4,171 queries have their translations in the English queries
  – 70% for training (learning the LIBSVM weights)
  – 10% development data
  – 20% testing
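As a rough illustration of the 70/10/20 partition, the sketch below shuffles the query list with a fixed seed and slices it. The slide does not say how the split was drawn; random shuffling, the seed, and the function name are assumptions.

```python
import random


def split_queries(queries, train=0.7, dev=0.1, seed=0):
    """Shuffle and split into training, development and test portions."""
    shuffled = list(queries)
    random.Random(seed).shuffle(shuffled)  # fixed seed for repeatability
    n_train = int(len(shuffled) * train)
    n_dev = int(len(shuffled) * dev)
    train_set = shuffled[:n_train]
    dev_set = shuffled[n_train:n_train + n_dev]
    test_set = shuffled[n_train + n_dev:]  # remainder, about 20%
    return train_set, dev_set, test_set


# Placeholder integer IDs standing in for the 4,171 French queries.
train_set, dev_set, test_set = split_queries(range(4171))
```

With 4,171 queries this gives 2,919 for training, 417 for development and 835 for testing; the test share takes the rounding remainder.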


Performance Evaluation - CLIR

• Data Resources:
  – TREC6 CLIR data (AP88-90 newswire, 750MB)
  – 25 short French-English query pairs (CL1-CL25)
    • average length of 3.3 words
    • matches the web query logs used for training CLQS

[Figure: diagram labelled Source Language (qf), CLQS, Target Language ({qe}), BM25, CLIR]


• CLQS


• CLIR


Conclusion

• Cross-lingual query suggestion

• Query Logs

• French to English

• TREC6 French to English CLIR task
  – CLQS demonstrates high quality