A Statistical Comparison of Tag and Query Logs

Mark J. Carman, Robert Gwadera, Fabio Crestani, and Mark Baillie
SIGIR 2009

June 4, 2010
Hyunwoo Kim



Page 2

Contents
– Introduction
– Building a Dataset
– Are the Distributions Similar?
– Investigating Website Content
– Conclusion


Page 3

Introduction


Page 4

Introduction

Questions
1. Are queries and tags similar across URLs?
2. Can tag data be used to approximate user queries to a search engine?
3. Can query logs be used to suggest new tags for a particular webpage?
4. For what types of websites is the correlation between the term distributions for queries and tags the highest?
5. Which of the two distributions, tags or queries, is more closely related to the content of the clicked websites?


Page 5

Building a Dataset

AOL query log
– Sizable
– Recent (2006)
– English queries
– Available to academic researchers
– 657,426 users
– Covers 3 months, March to May 2006

Delicious tags
– Collaborative tagging system

Final dataset: 4145 complete URLs
– Google query, stemming, pruning
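The slide lists stemming and pruning without saying how either was done. A minimal sketch of turning one URL's queries (or tags) into a normalised term distribution, assuming lower-cased alphanumeric tokens, a stemmer (here a hypothetical `crude_stem` standing in for something like the Porter stemmer), and pruning interpreted as dropping terms seen fewer than `min_count` times. The threshold and both function names are illustrative, not taken from the paper.

```python
import re
from collections import Counter


def crude_stem(term):
    """Hypothetical stand-in for a real stemmer (e.g. Porter's):
    strips only a couple of common English suffixes."""
    for suffix in ("ing", "s"):
        # Keep at least three characters of stem so short words survive.
        if term.endswith(suffix) and len(term) > len(suffix) + 2:
            return term[: -len(suffix)]
    return term


def term_distribution(texts, min_count=2, stem=crude_stem):
    """Build a normalised term-frequency distribution from a list of
    queries or tags for one URL, pruning terms seen fewer than
    `min_count` times (an assumed reading of "pruning")."""
    counts = Counter()
    for text in texts:
        # Tokenise into lower-cased alphanumeric runs, then stem each token.
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            counts[stem(token)] += 1
    kept = {term: c for term, c in counts.items() if c >= min_count}
    total = sum(kept.values())
    # Return probabilities summing to 1, or an empty dict if nothing survives.
    return {term: c / total for term, c in kept.items()} if total else {}
```

For example, `term_distribution(["cheap flights", "flight deals", "cheap flight"], min_count=2)` merges "flights" and "flight" into one stem and prunes "deal", giving `{"cheap": 0.4, "flight": 0.6}`.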


Page 6

Are the Distributions Similar?

http://www.nytimes.com



Page 7

Are the Distributions Similar?

Kullback-Leibler divergence
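For reference, the standard definition is D_KL(P || Q) = Σ_t P(t) log(P(t) / Q(t)). It is asymmetric, and it becomes infinite whenever Q gives zero probability to a term that P uses, which is common when a URL's query terms are compared with its tags. A minimal sketch, assuming term distributions are dicts from term to probability and using additive smoothing with a hypothetical `epsilon`; the smoothing scheme is an assumption, not necessarily what the authors did.

```python
import math


def kl_divergence(p, q, epsilon=1e-6):
    """D_KL(p || q) = sum_t p(t) * log(p(t) / q(t)) in nats, over the
    union vocabulary of p and q.

    Assumed smoothing: every term in the union vocabulary gets `epsilon`
    added to its probability on both sides before renormalising, so the
    result stays finite even when the two vocabularies differ."""
    vocab = set(p) | set(q)
    p_s = {t: p.get(t, 0.0) + epsilon for t in vocab}
    q_s = {t: q.get(t, 0.0) + epsilon for t in vocab}
    zp, zq = sum(p_s.values()), sum(q_s.values())
    total = 0.0
    for t in vocab:
        pt, qt = p_s[t] / zp, q_s[t] / zq  # renormalised probabilities
        total += pt * math.log(pt / qt)
    return total
```

For `a = {"x": 0.9, "y": 0.1}` and `b = {"x": 0.5, "y": 0.5}`, `kl_divergence(a, b)` is about 0.368 nats while `kl_divergence(b, a)` is about 0.511. That asymmetry is one reason to prefer a symmetric measure such as the Jensen-Shannon divergence.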


Page 8

Are the Distributions Similar?

Jensen-Shannon divergence
– Symmetric measure

Overlap coefficient
– Vq: query logs
– Vr: tags
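A minimal sketch of both measures, assuming the same dict-of-probabilities representation for term distributions and plain sets of terms for the vocabularies Vq and Vr. The Jensen-Shannon divergence averages the KL divergence of each distribution from their midpoint M = (P + Q) / 2, so it is symmetric and always finite; with base-2 logarithms it lies between 0 and 1. The slide gives only the symbols for the overlap coefficient, so the standard form |Vq ∩ Vr| / min(|Vq|, |Vr|) used here is an assumption.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits:
    JS(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m), with m = (p + q) / 2.
    m is positive wherever p or q is, so no smoothing is needed."""
    vocab = set(p) | set(q)
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in vocab}

    def kl_to_m(d):
        # Zero-probability terms contribute nothing (0 * log 0 = 0).
        return sum(d[t] * math.log2(d[t] / m[t]) for t in d if d[t] > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


def overlap_coefficient(v_q, v_r):
    """Assumed standard form |Vq ∩ Vr| / min(|Vq|, |Vr|), where v_q is the
    query-log vocabulary and v_r the tag vocabulary of one URL."""
    v_q, v_r = set(v_q), set(v_r)
    if not v_q or not v_r:
        return 0.0  # overlap is undefined for an empty vocabulary; report 0
    return len(v_q & v_r) / min(len(v_q), len(v_r))
```

Two distributions with no terms in common give `js_divergence(...) == 1.0`, and `overlap_coefficient({"news", "times", "nyc"}, {"news", "times"})` is 1.0 because the smaller vocabulary is fully contained in the larger one.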


Page 9

Are the Distributions Similar?


Page 10

Are the Distributions Similar?

Open Directory Project
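The Open Directory Project assigns URLs to topic categories, which is what research question 4 (which types of websites show the highest query/tag correlation) requires. A minimal sketch, assuming a URL-to-category mapping is already available and using Pearson correlation over the union vocabulary as the per-URL score. The slide does not say which correlation measure or aggregation was used, so both are assumptions, and the function names are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean


def pearson(p, q):
    """Pearson correlation between two term distributions (dicts of
    term -> probability), over the union vocabulary with missing terms
    filled in as 0."""
    vocab = sorted(set(p) | set(q))
    if not vocab:
        return 0.0
    xs = [p.get(t, 0.0) for t in vocab]
    ys = [q.get(t, 0.0) for t in vocab]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    # A constant vector has no defined correlation; report 0 in that case.
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def mean_score_by_category(scores, categories):
    """Average per-URL scores within each topic category.

    scores: URL -> score, e.g. pearson(query_dist, tag_dist) for that URL.
    categories: URL -> category label (e.g. an ODP top-level category).
    URLs without a category are skipped."""
    groups = defaultdict(list)
    for url, score in scores.items():
        category = categories.get(url)
        if category is not None:
            groups[category].append(score)
    return {category: mean(values) for category, values in groups.items()}
```

A category whose mean score stands well apart from the others would point to an answer for question 4; the conclusion slide reports that the similarity does not depend on the topic area.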


Page 11

Are the Distributions Similar?


Page 12

Are the Distributions Similar?


Page 13

Are the Distributions Similar?


Page 14

Are the Distributions Similar?


Page 15

Are the Distributions Similar?


Page 16

Are the Distributions Similar?


Page 17

Investigating Website Content


Page 18

Investigating Website Content


Page 19

Conclusion

Similarity between query terms and tags
– Vocabularies contain a large amount of overlap
– Term frequency distributions are correlated
– Similarity is not dependent on the topic area

Queries are more similar to content than tags are
Queries and tags are more similar to one another than to content

Future work
– Models for automatically removing noise from the tag and query logs
– Techniques for predicting useful tags from query distributions
– Techniques for the effective use of tag data to improve different forms of Web search


Page 20

Thank you