Captain Nemo: a Metasearch Engine with Personalized Hierarchical Search Space (http://www.dblab.ntua.gr/~stef/nemo) Stefanos Souldatos, Theodore Dalamagas, Timos Sellis (National Technical University of Athens, Greece)

INTRODUCTION

Metasearching

[Figure: a metasearch engine sends the user's query to Search Engine 1, Search Engine 2 and Search Engine 3 and combines their results.]

Metasearch engines can reach a large part of the web, since they combine the results of several search engines.
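A minimal Python sketch of the merging step behind this claim, assuming each engine returns a plain list of URLs; `merge_results` and the engine result lists are hypothetical. It takes the union of the result lists, keeps the first-seen order and records which engines returned each page, so the merged list covers more pages than any single engine.

```python
def merge_results(results_by_engine: dict[str, list[str]]) -> dict[str, list[str]]:
    """Union of all result lists: URL -> engines that returned it, in first-seen order."""
    merged: dict[str, list[str]] = {}
    for engine, urls in results_by_engine.items():
        for url in urls:
            engines = merged.setdefault(url, [])
            if engine not in engines:  # ignore duplicates within one engine's list
                engines.append(engine)
    return merged


if __name__ == "__main__":
    results = {  # hypothetical result lists
        "Search Engine 1": ["a.example", "b.example", "c.example"],
        "Search Engine 2": ["b.example", "d.example"],
        "Search Engine 3": ["e.example", "a.example"],
    }
    merged = merge_results(results)
    print(len(merged))  # 5 distinct pages, more than any single engine returned
```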

Personalization

Personalization is an emerging need on the Web.

Personalization in Metasearching

Personalization can be applied in all 3 stages of metasearching: Result Retrieval, Result Presentation and Result Administration.

Result Retrieval: Personal Retrieval Model (search engines, number of pages, timeout)

Result Presentation: Personal Presentation Style (grouping, ranking, appearance)

Result Administration: Thematic Classification of Results (k-Nearest Neighbor, Support Vector Machines, Naive Bayes, Neural Networks, Decision Trees, Regression Models)

Hierarchical Classification

[Figure: the same thematic categories, each described by a few keywords, organized in a Flat Model and in a Hierarchical Model.]

Flat Model: all categories are placed at a single level under ROOT.

Hierarchical Model:
ROOT
  ART (fine arts)
    CINEMA (movie, film, actor)
    PAINTING (painter, canvas, gallery)
  SPORTS (athlete, score, referee)
    BASKETBALL (basket, nba, game)
    FOOTBALL (ground, ball, match)
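A minimal sketch of how a hierarchical model can be used for classification, assuming each category is described only by the keywords shown above and that a result is routed top-down: at each level it goes to the child whose keywords (together with those of its descendants) share the most terms with the result text. The slides do not specify this procedure; `Category`, `vocabulary` and `classify_top_down` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    name: str
    keywords: set[str]
    children: list["Category"] = field(default_factory=list)


def vocabulary(cat: Category) -> set[str]:
    """Keywords of a category together with those of all its descendants."""
    vocab = set(cat.keywords)
    for child in cat.children:
        vocab |= vocabulary(child)
    return vocab


def classify_top_down(text: str, root: Category) -> list[str]:
    """Descend from ROOT, choosing at each level the child sharing most terms with the text."""
    terms = set(text.lower().split())
    path: list[str] = []
    node = root
    while node.children:
        scored = [(len(terms & vocabulary(c)), c) for c in node.children]
        best_score, best = max(scored, key=lambda s: s[0])
        if best_score == 0:
            break  # no child matches: stop at the current level
        path.append(best.name)
        node = best
    return path


ROOT = Category("ROOT", set(), [
    Category("ART", {"fine", "arts"}, [
        Category("CINEMA", {"movie", "film", "actor"}),
        Category("PAINTING", {"painter", "canvas", "gallery"}),
    ]),
    Category("SPORTS", {"athlete", "score", "referee"}, [
        Category("BASKETBALL", {"basket", "nba", "game"}),
        Category("FOOTBALL", {"ground", "ball", "match"}),
    ]),
])

if __name__ == "__main__":
    print(classify_top_down("Michael Jordan NBA game highlights", ROOT))  # ['SPORTS', 'BASKETBALL']
```

Folding the descendants' keywords into each inner node lets a text that only mentions leaf-level terms (such as "nba") still be routed through SPORTS.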

RELATED WORK

Personalization in Metasearch Engines

Result Retrieval: the user defines the search engines to be used (Search, Ixquick, Infogrid, Mamma, Profusion, WebCrawler, Query Server).

Result Retrieval: the user defines the timeout option, i.e. the maximum time to wait for search results (Infogrid, Mamma, Profusion, Query Server).

Result Retrieval: the user defines the number of pages to be retrieved by each search engine (Profusion, Query Server).

Result Presentation: results can be grouped by the search engine that retrieved them (Dogpile, WebCrawler, MetaCrawler).

Result Administration:

Northern Light organizes search results into dynamic custom folders.

Inquirus2 recognises thematic categories and improves queries towards a category.

Buntine et al. (2004) describe a topic-based open source search engine.

CAPTAIN NEMO

Personal Retrieval Model

Search Engine      Number of Results   Search Engine Timeout   Search Engine Weight
Search Engine 1    20                  6                       7
Search Engine 2    30                  8                       10
Search Engine 3    10                  4                       5
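The retrieval settings above can be sketched in Python as follows. This is an illustration, not Captain Nemo's code: `EngineSettings`, `retrieve` and `fake_engine` are hypothetical, timeouts are assumed to be in seconds, and the demo scales them down to fractions of a second. All engines are queried in parallel, and an engine that misses its own timeout is left out.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from dataclasses import dataclass
from typing import Callable


@dataclass
class EngineSettings:
    name: str
    num_results: int  # "Number of Results"
    timeout: float    # "Search Engine Timeout" (assumed to be seconds)
    weight: float     # "Search Engine Weight", used later when ranking
    search: Callable[[str, int], list[str]]  # hypothetical engine wrapper


def retrieve(query: str, engines: list[EngineSettings]) -> dict[str, list[str]]:
    """Query all engines in parallel and keep only those that answer within their timeout."""
    pool = ThreadPoolExecutor(max_workers=len(engines))
    start = time.monotonic()
    futures = [(e, pool.submit(e.search, query, e.num_results)) for e in engines]
    results: dict[str, list[str]] = {}
    for engine, future in futures:
        remaining = engine.timeout - (time.monotonic() - start)
        try:
            results[engine.name] = future.result(timeout=max(0.0, remaining))[: engine.num_results]
        except FutureTimeout:
            pass  # too slow: this engine's results are ignored
    pool.shutdown(wait=False, cancel_futures=True)  # do not wait for slow engines
    return results


def fake_engine(delay: float) -> Callable[[str, int], list[str]]:
    """Hypothetical stand-in for a search engine that answers after `delay` seconds."""
    def search(query: str, n: int) -> list[str]:
        time.sleep(delay)
        return [f"http://example.com/{query}/{i}" for i in range(n)]
    return search


if __name__ == "__main__":
    engines = [  # numbers of results and weights from the table; timeouts scaled down
        EngineSettings("Search Engine 1", 20, 0.6, 7, fake_engine(0.05)),
        EngineSettings("Search Engine 2", 30, 0.8, 10, fake_engine(0.05)),
        EngineSettings("Search Engine 3", 10, 0.4, 5, fake_engine(1.0)),
    ]
    for name, urls in retrieve("nemo", engines).items():
        print(name, len(urls))
```

Because the engines run in parallel, the total wait is bounded by the largest timeout rather than by the sum of all timeouts.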

Personal Presentation Style

Result Grouping: merged in a single list, grouped by search engine, or grouped by relevant topic of interest.

Result Content: title; title and URL; or title, URL and description.
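A small sketch of the two presentation settings, assuming each result already carries the engine that returned it and the topic proposed by the classifier; `Result`, `Grouping`, `group_results`, `CONTENT_OPTIONS` and `render` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class Grouping(Enum):
    MERGED = "merged in a single list"
    BY_ENGINE = "grouped by search engine"
    BY_TOPIC = "grouped by relevant topic of interest"


@dataclass
class Result:
    title: str
    url: str
    description: str
    engine: str  # search engine that retrieved the result
    topic: str   # thematic category proposed for the result (assumed available)


def group_results(results: list[Result], grouping: Grouping) -> dict[str, list[Result]]:
    """Arrange results according to the user's Result Grouping setting."""
    if grouping is Grouping.MERGED:
        return {"All results": list(results)}
    groups: dict[str, list[Result]] = {}
    for r in results:
        key = r.engine if grouping is Grouping.BY_ENGINE else r.topic
        groups.setdefault(key, []).append(r)
    return groups


# Result Content options: title; title and URL; title, URL and description
CONTENT_OPTIONS = [("title",), ("title", "url"), ("title", "url", "description")]


def render(r: Result, fields: tuple[str, ...]) -> str:
    """Show only the fields selected by the user's Result Content setting."""
    return " | ".join(getattr(r, f) for f in fields)


if __name__ == "__main__":
    results = [  # hypothetical results
        Result("Jordan stats", "http://a.example", "NBA career", "SE1", "BASKETBALL"),
        Result("Space Jam", "http://b.example", "1996 film", "SE2", "CINEMA"),
        Result("Jordan bio", "http://c.example", "Biography", "SE2", "BASKETBALL"),
    ]
    for topic, items in group_results(results, Grouping.BY_TOPIC).items():
        print(topic, [render(r, CONTENT_OPTIONS[1]) for r in items])
```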

Look ‘n’ Feel: color themes (XSL stylesheets), page layout and font size.

Topics of Personal Interest

Administration of topics of personal interest

The user defines a hierarchy of topics of personal interest (i.e. thematic categories).

Each thematic category has a name and a description of 10-20 words.

The system offers an environment for administering the thematic categories and their content.
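A possible data structure for the topic hierarchy, assuming topics are identified by name within their parent and that the 10-20 word limit is checked when a topic is created; `Topic`, `add_child`, `save` and `path` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Topic:
    name: str
    description: str
    parent: Topic | None = field(default=None, repr=False, compare=False)
    children: dict[str, Topic] = field(default_factory=dict)
    saved_urls: list[str] = field(default_factory=list)

    def add_child(self, name: str, description: str) -> Topic:
        """Create a sub-topic; its description must have 10-20 words."""
        words = len(description.split())
        if not 10 <= words <= 20:
            raise ValueError(f"{name!r}: description has {words} words, expected 10-20")
        if name in self.children:
            raise ValueError(f"{name!r} already exists under {self.name!r}")
        child = Topic(name, description, parent=self)
        self.children[name] = child
        return child

    def save(self, url: str) -> None:
        """Store a search result in this topic, ignoring duplicates."""
        if url not in self.saved_urls:
            self.saved_urls.append(url)

    def path(self) -> list[str]:
        """Names from the root down to this topic."""
        names: list[str] = []
        node: Topic | None = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names[::-1]


if __name__ == "__main__":
    root = Topic("ROOT", "")
    sports = root.add_child("SPORTS", "athletes scores referees teams leagues matches players coaches tournaments and championships")
    basketball = sports.add_child("BASKETBALL", "basket nba game court players points rebounds assists playoffs and finals")
    basketball.save("http://a.example/jordan")
    print(basketball.path())  # ['ROOT', 'SPORTS', 'BASKETBALL']
```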

Hierarchical classification of results

The system proposes the most appropriate thematic category for each result, using a Nearest Neighbor classifier.

The user can save each result in the proposed category or in another one.
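The slides only state that a Nearest Neighbor classifier is used. The sketch below assumes a 1-nearest-neighbour rule over bag-of-words vectors compared with cosine similarity, where the labelled examples are the category descriptions (results already saved in a category could be added as further examples); `tf_vector`, `cosine` and `propose_category` are hypothetical names.

```python
from __future__ import annotations

import math
from collections import Counter


def tf_vector(text: str) -> Counter[str]:
    """Bag-of-words term frequencies of a text."""
    return Counter(text.lower().split())


def cosine(a: Counter[str], b: Counter[str]) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def propose_category(result_text: str, examples: list[tuple[str, str]]) -> str | None:
    """1-nearest neighbour: category of the most similar (category, text) example, or None if nothing overlaps."""
    query = tf_vector(result_text)
    best_category, best_sim = None, 0.0
    for category, text in examples:
        sim = cosine(query, tf_vector(text))
        if sim > best_sim:
            best_category, best_sim = category, sim
    return best_category


if __name__ == "__main__":
    examples = [  # keyword descriptions from the example hierarchy
        ("ART", "fine arts"),
        ("CINEMA", "movie film actor"),
        ("PAINTING", "painter canvas gallery"),
        ("SPORTS", "athlete score referee"),
        ("BASKETBALL", "basket nba game"),
        ("FOOTBALL", "ground ball match"),
    ]
    print(propose_category("Michael Jordan NBA game highlights", examples))  # BASKETBALL
```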

Classification Example

[Figure: the topic hierarchy ROOT → ART (fine arts) → CINEMA, PAINTING; ROOT → SPORTS (athlete, score, referee) → BASKETBALL, FOOTBALL.]

Query: “Michael Jordan”

Results in user’s topics of interest: the figure annotates the categories with result counts of 3, 82 and 3.

METASEARCH RANKING

Two Ranking Approaches

Using Initial Scores of Search Engines

Not Using Initial Scores of Search Engines

Using Initial Scores

Rasolofo et al. (2001) argue that the initial scores of the search engines can be exploited.

Normalization is required in order to achieve a common measure of comparison.

A weight factor incorporates the reliability of each search engine: search engines that return more Web pages receive a higher weight. This rests on the assumption that the number of relevant Web pages an engine retrieves is proportional to the total number of Web pages it returns as relevant.
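A minimal sketch of score-based fusion along these lines, not the exact method of Rasolofo et al.: scores are min-max normalized per engine, each engine's weight is assumed to be the number of pages it returned divided by the average over all engines, and the weighted scores of a document returned by several engines are summed. `min_max` and `fuse_with_scores` are hypothetical names.

```python
from collections import defaultdict


def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one engine's scores to [0, 1] so that engines share a common scale."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse_with_scores(
    engine_scores: dict[str, dict[str, float]],
    pages_returned: dict[str, int],
) -> list[tuple[str, float]]:
    """Weight normalized scores by the relative number of pages each engine returned, then sum per document."""
    mean_pages = sum(pages_returned.values()) / len(pages_returned)
    fused: defaultdict[str, float] = defaultdict(float)
    for engine, scores in engine_scores.items():
        weight = pages_returned[engine] / mean_pages  # engines returning more pages weigh more
        for doc, s in min_max(scores).items():
            fused[doc] += weight * s
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    scores = {"SE1": {"a": 10.0, "b": 5.0, "c": 0.0}, "SE2": {"b": 0.9, "d": 0.1}}
    pages = {"SE1": 3000, "SE2": 1000}  # hypothetical page counts
    print(fuse_with_scores(scores, pages))  # [('a', 1.5), ('b', 1.25), ('c', 0.0), ('d', 0.0)]
```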

Not Using Initial Scores

The scores of different search engines are neither compatible nor comparable, even when normalized.

Towell et al. (1995) note that the same document receives different scores in different search engines.

Gravano and Papakonstantinou (1998) point out that the comparison is not feasible even among engines using the same ranking algorithm.

Dumais (1994) concludes that scores depend on the document collection used by a search engine.

Aslam and Montague (2001)

Bayes-fuse uses probability theory to calculate the probability that a result is relevant to a query.

Borda-fuse is based on democratic voting. Each search engine gives votes to the results it returns (N votes to the first result, N-1 to the second, etc.). The metasearch engine gathers the votes, and the final ranking is determined by the sum of the votes each result receives.
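A minimal Borda-fuse sketch, assuming N is the length of the longest result list among the engines and, as a simplification, that an engine gives no votes to documents it did not return; `borda_fuse` is a hypothetical name.

```python
from collections import defaultdict


def borda_fuse(rankings: dict[str, list[str]]) -> list[tuple[str, int]]:
    """Sum Borda votes: N for an engine's 1st result, N-1 for its 2nd, and so on."""
    n = max(len(ranking) for ranking in rankings.values())  # N: longest result list
    votes: defaultdict[str, int] = defaultdict(int)
    for ranking in rankings.values():
        for position, doc in enumerate(ranking, start=1):
            votes[doc] += n - position + 1
    return sorted(votes.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    rankings = {  # hypothetical result lists
        "SE1": ["a", "b", "c"],
        "SE2": ["b", "a"],
        "SE3": ["b", "c", "d"],
    }
    print(borda_fuse(rankings))  # [('b', 8), ('a', 5), ('c', 3), ('d', 1)]
```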

Weighted Borda-fuse: a weighted variant of Borda-fuse in which search engines are not treated equally; the votes of each search engine are multiplied by a weight that reflects its reliability.

Weighted Borda-Fuse

$$V(r_{i,j}) = w_j \cdot \left( \max_k(r_k) - i + 1 \right)$$

V(r_{i,j}): votes of the i-th result of search engine j
w_j: weight of search engine j (set by user)
max_k(r_k): largest number of results returned by any search engine (r_k: number of results of search engine k)

Example (max_k(r_k) = 5):

Search Engine   Weight    Votes before weighting   Weighted votes
SE1             w1 = 7    5 4 3 2                  35 28 21 14
SE2             w2 = 10   5 4 3                    50 40 30
SE3             w3 = 5    5 4 3 2 1                25 20 15 10 5
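The formula and the example can be checked with a short Python sketch; `weighted_votes` and `weighted_borda_fuse` are hypothetical names, and summing the weighted votes per document to produce the final ranking follows the Borda-fuse description above.

```python
from collections import defaultdict


def weighted_votes(num_results: dict[str, int], weights: dict[str, float]) -> dict[str, list[float]]:
    """V(r_ij) = w_j * (max_k(r_k) - i + 1) for every result position i of every engine j."""
    max_r = max(num_results.values())
    return {
        j: [weights[j] * (max_r - i + 1) for i in range(1, r_j + 1)]
        for j, r_j in num_results.items()
    }


def weighted_borda_fuse(rankings: dict[str, list[str]], weights: dict[str, float]) -> list[tuple[str, float]]:
    """Sum each document's weighted votes over all engines and rank by the total."""
    votes_per_engine = weighted_votes({j: len(r) for j, r in rankings.items()}, weights)
    totals: defaultdict[str, float] = defaultdict(float)
    for j, ranking in rankings.items():
        for doc, v in zip(ranking, votes_per_engine[j]):
            totals[doc] += v
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Example from the slide: SE1, SE2, SE3 return 4, 3 and 5 results
    print(weighted_votes({"SE1": 4, "SE2": 3, "SE3": 5}, {"SE1": 7, "SE2": 10, "SE3": 5}))
    # {'SE1': [35, 28, 21, 14], 'SE2': [50, 40, 30], 'SE3': [25, 20, 15, 10, 5]}
```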

Captain Nemo

http://www.dblab.ntua.gr/~stef/nemo

Links

Introduction

Related work

Captain Nemo

Metasearch Ranking