Intelligent Database Systems Lab. Presenter: Chuang, Kai-Ting. Authors: Rafael Odon de Alencar, Clodoveu Augusto Davis Jr., Marcos André Gonçalves. 2010, ACM. Geographical classification of documents using evidence from Wikipedia

TRANSCRIPT

Page 1:

Intelligent Database Systems Lab

Presenter: Chuang, Kai-Ting

Authors: Rafael Odon de Alencar, Clodoveu Augusto Davis Jr., Marcos André Gonçalves

2010, ACM

Geographical classification of documents using evidence from Wikipedia

Page 2:

Outline
• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments

Page 3:

Motivation
• Geography-related terms are often used in Web search queries.

Page 4:

Objectives
• It is important to recognize the association of documents with places in order to respond adequately to such queries.

Page 5:

Methodology
• This paper presents a technique for classifying documents according to their association with places, based on the occurrence of terms that coincide with Wikipedia entry titles.
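The slides do not give the details of the classifier, so the following is only a minimal sketch of the general idea: scan a document for word sequences that coincide with Wikipedia entry titles and use the matches as evidence for a place. The `TITLE_TO_PLACE` table, the function names, and the majority-vote decision are all illustrative assumptions, not the authors' method or data.

```python
import re

# Hypothetical mapping from Wikipedia entry titles to places.
# The titles and place labels are illustrative, not data from the paper.
TITLE_TO_PLACE = {
    "eiffel tower": "Paris",
    "louvre": "Paris",
    "copacabana": "Rio de Janeiro",
    "sugarloaf mountain": "Rio de Janeiro",
}

# Longest title, in words; bounds the n-gram length we need to check.
MAX_TITLE_WORDS = max(len(title.split()) for title in TITLE_TO_PLACE)


def tokenize(text):
    """Lowercase the text and split it into ASCII alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def count_title_matches(text):
    """Count, per place, the word n-grams in text that equal an entry title."""
    tokens = tokenize(text)
    counts = {}
    for n in range(1, MAX_TITLE_WORDS + 1):
        for i in range(len(tokens) - n + 1):
            candidate = " ".join(tokens[i:i + n])
            place = TITLE_TO_PLACE.get(candidate)
            if place is not None:
                counts[place] = counts.get(place, 0) + 1
    return counts


def classify(text):
    """Return the place with the most title matches, or None if there are none."""
    counts = count_title_matches(text)
    if not counts:
        return None
    return max(counts, key=counts.get)
```

For example, `classify("We visited the Louvre and the Eiffel Tower.")` finds two titles that both point to the same place. A real system would need a far larger title list and a way to handle ambiguous titles, which this sketch ignores.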

Page 6:

Methodology

Page 7:

Methodology

Page 8:

Experiments

Page 9:

Experiments

Page 10:

Experiments

Page 11:

Experiments
• We defined 100 place names to be removed from the documents.
• 10-fold cross-validation was used.
• Impact on precision:
– Wikipedia model: a loss of more than 30%.
– TF-IDF bag-of-words model: a loss of about 6%.
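The two mechanical steps of this experiment, removing a fixed list of place names from the documents and splitting the collection into 10 folds, can be sketched as follows. This is a hedged illustration only: the function names are hypothetical, the folds are contiguous and unshuffled for simplicity, and nothing here reproduces the authors' actual setup or data.

```python
import re


def remove_place_names(text, place_names):
    """Delete every whole-word, case-insensitive occurrence of the given names."""
    for name in place_names:
        pattern = r"\b" + re.escape(name) + r"\b"
        text = re.sub(pattern, " ", text, flags=re.IGNORECASE)
    # Collapse the whitespace left behind by the deletions.
    return " ".join(text.split())


def k_fold_indices(n_items, k=10):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    # The first (n_items % k) folds receive one extra item.
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_items))
        yield train, test
        start += size
```

Each document index appears in exactly one test fold, so every document is classified once by a model that did not see it during training. In practice the documents would usually be shuffled before splitting; that step is omitted here.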

Page 12:

Conclusions
• Experiments showed that a high level of precision can be achieved with this approach.

Page 13:

Comments
• Advantages
– The approach is helpful.
• Applications
– Geographic information retrieval.