1 Efficient IR-Style Keyword Search over Relational Databases 12 December 2005 Seminar on Databases and the Internet Databases and the Internet The Hebrew University of Jerusalem, Winter 2006


Efficient IR-Style Keyword Search

over Relational Databases

12 December 2005

Seminar on Databases and the Internet, The Hebrew University of Jerusalem, Winter 2006

Efficient IR-Style Keyword Search over Relational Databases2SDBI 05’

Introduction

This presentation is based mainly on the work of Hristidis, Gravano, and Papakonstantinou.

The work presents several efficient algorithms for information-retrieval (IR) style keyword search, based on the DISCOVER architecture.


Contents

Introduction

Goal and Motivation

Framework and examples

Architecture

Algorithms

Experimental Results

Criticism and Conclusion


Goal and Motivation

We present a detailed framework and methods for IR-style keyword search over relational databases.

What is information-retrieval (IR) keyword search in general?

Mainly, it’s this…


Goal and Motivation

…But not always:

SELECT * FROM Complaints C
WHERE CONTAINS(C.comment, 'disk crash', 1) > 0
ORDER BY SCORE(1) DESC

(An Oracle Text query: it returns the complaints whose comment matches 'disk crash', ranked by relevance score.)

prodID | custID | date | comment
p121 | c3232 | 6-30-2002 | “Disk crashed after one week of moderate use on an IBM Netvista X41”
p131 | c3131 | 7-3-2002 | “lower-end IBM Netvista caught fire, starting apparently with disk Crash”


Goal and Motivation

Current status:

• RDBMSs (such as Oracle) provide querying capabilities for text attributes, provided that the exact column is specified.

• Only AND semantics are used.

• Ranking functions are limited.

• Known query-processing strategies are inefficient (and sometimes even infeasible).


Goal and Motivation

In particular, we’d like:

• Efficient ways to generate the “top-k” results according to some form of ranking.

• The use of both AND and OR semantics (not just the default AND) when retrieving results.

• To assemble keyword occurrences from multiple attributes, perhaps in “unforeseen” ways, without needing to specify columns.


Goal and Motivation

We would like to apply the same (or similar) methods and rules that apply in this world:

• Prioritizing: the k best results first

• Efficient searching

• Use of AND and OR semantics


Goal and Motivation

Why should we care?

• Keyword queries require little or no knowledge of the database semantics.

• Ranking results correctly (and returning only relevant tuples) is, of course, highly desirable.

• An efficient implementation should cut query time to a fraction of that of a naïve implementation.


Framework

Example schema:

Customers (custId, name, occupation)

Complaints (prodId, custId, date, comment)

Products (prodId, manufacturer, model)

Query Model:

• A database with n relations R1, …, Rn.

• Relations may have primary-key to foreign-key constraints.

• The schema graph G is a directed graph with an edge (i, j) for each primary-key to foreign-key relationship between Ri and Rj. In the example schema, Complaints is linked to Products (through prodId) and to Customers (through custId).
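As an illustration, the schema graph G for the example schema can be built from a list of foreign-key declarations. This is a minimal Python sketch; `FOREIGN_KEYS` and `schema_graph` are hypothetical names, and it assumes each edge points from the referencing relation (the one holding the foreign key) to the referenced one.

```python
from collections import defaultdict

# Hypothetical FK declarations for the example schema:
# (referencing relation, foreign-key attribute, referenced relation)
FOREIGN_KEYS = [
    ("Complaints", "prodId", "Products"),
    ("Complaints", "custId", "Customers"),
]


def schema_graph(foreign_keys):
    """Build G as an adjacency map with one directed edge per FK relationship."""
    graph = defaultdict(set)
    for referencing, _attr, referenced in foreign_keys:
        graph[referencing].add(referenced)
    return dict(graph)


G = schema_graph(FOREIGN_KEYS)
print(sorted(G["Complaints"]))  # ['Customers', 'Products']
```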


Framework

A possible instance of this schema:

Complaints
tupleID | prodID | custID | date | comment
c1 | p121 | c3232 | 6-30-2002 | “Disk crashed after one week of moderate use on an IBM Netvista X41”
c2 | p131 | c3131 | 7-3-2002 | “lower-end IBM Netvista caught fire, starting apparently with disk”
c3 | p131 | c3143 | 8-3-2002 | “IBM Netvista unstable with Maxtor HD”

Products
tupleID | prodID | manufacturer | model
p1 | p121 | “Maxtor” | “D540X”
p2 | p131 | “IBM” | “Netvista”
p3 | p141 | “Tripplite” | “Smart 700VA”

Customers
tupleID | custID | name | occupation
u1 | c3232 | “John Smith” | “Software engineer”
u2 | c3131 | “John L.” | “Architect”
u3 | c3143 | “Jack M.” | “student”


Framework

Joining trees of tuples:

• Given a schema graph G for a database, a joining tree of tuples T is a tree of tuples in which each edge (ti, tj), with ti ∈ Ri and tj ∈ Rj, satisfies two properties:

(1) (Ri, Rj) ∈ G (the schema graph described above)

(2) ti ⋈ tj ∈ Ri ⋈ Rj

• The size of a joining tree, size(T), is the number of tuples in T.
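A sketch that checks the two joining-tree properties mechanically, using the c2, p2 and u2 tuples from the example instance. Assumptions that are not part of the original framework: tuples are dicts tagged with their relation name, the join condition is equality on the shared key attribute, and `JOIN_ATTR`, `edge_ok`, `is_connected` and `is_joining_tree` are hypothetical helpers.

```python
# Hypothetical: schema-graph edges of the example, with the attribute each join uses.
JOIN_ATTR = {
    ("Complaints", "Products"): "prodId",
    ("Complaints", "Customers"): "custId",
}


def edge_ok(ti, tj):
    """Check properties (1) and (2) for one tree edge (ti, tj)."""
    ri, rj = ti["rel"], tj["rel"]
    # (1) the two relations must be adjacent in G (in either direction)
    attr = JOIN_ATTR.get((ri, rj)) or JOIN_ATTR.get((rj, ri))
    if attr is None:
        return False
    # (2) the two tuples must actually join on that attribute
    return ti[attr] == tj[attr]


def is_connected(ids, edges):
    adj = {i: set() for i in ids}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    start = next(iter(ids))
    seen, stack = {start}, [start]
    while stack:
        for n in adj[stack.pop()]:
            if n not in seen:
                seen.add(n)
                stack.append(n)
    return len(seen) == len(ids)


def is_joining_tree(tuples, edges):
    """tuples: tuple id -> tuple (dict); edges: list of (id, id) pairs."""
    # n tuples, n - 1 edges and connected => the edges form a tree
    if len(edges) != len(tuples) - 1 or not is_connected(tuples, edges):
        return False
    return all(edge_ok(tuples[a], tuples[b]) for a, b in edges)


tree = {
    "c2": {"rel": "Complaints", "prodId": "p131", "custId": "c3131"},
    "p2": {"rel": "Products", "prodId": "p131"},
    "u2": {"rel": "Customers", "custId": "c3131"},
}
edges = [("c2", "p2"), ("c2", "u2")]
print(is_joining_tree(tree, edges), len(tree))  # True 3  (size(T) = 3)
```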


Framework

A joining tree of tuples for our example, p2 ⋈ c2 ⋈ u2 (size 3):

Complaints: c2 | p131 | c3131 | 7-3-2002 | “lower-end IBM Netvista caught fire, starting apparently with disk”

Products: p2 | p131 | “IBM” | “Netvista”

Customers: u2 | c3131 | “John L.” | “Architect”

Here c2 joins p2 on prodID p131 and u2 on custID c3131.


Framework

“Top-k” keyword query

• A “top-k” keyword query is a list of keywords Q = {w1, …, wm}. Its result is the list of the k joining trees of tuples T with the highest Score(T, Q), such that:

(1) each tree T in the result is minimal: it cannot have a zero-scored leaf.

(2) no tuple appears more than once in a joining tree of tuples.
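The two result conditions can be checked per candidate tree. A minimal sketch, assuming a tree is given as its tuple ids, its edges, and a per-tuple score map; the scores below are illustrative (1 for a tuple mentioning Maxtor or Netvista, 0 otherwise), not the ranking function's real output, and `is_valid_result` is a hypothetical name.

```python
from collections import Counter


def is_valid_result(tuple_ids, edges, score):
    """tuple_ids: tuples in the tree (repeats allowed, so they can be detected);
    edges: (id, id) pairs; score: tuple id -> Score(t, Q)."""
    # (2) no tuple may appear more than once
    if len(set(tuple_ids)) != len(tuple_ids):
        return False
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    # leaves have degree <= 1 (a single-tuple tree is its own leaf)
    leaves = [t for t in tuple_ids if degree[t] <= 1]
    # (1) minimality: no leaf may have a zero score
    return all(score[t] > 0 for t in leaves)


# Illustrative scores: 1 if the tuple mentions Maxtor or Netvista, else 0
score = {"c1": 1, "c3": 1, "p1": 1, "p2": 1, "u1": 0}
print(is_valid_result(["p1", "c1"], [("p1", "c1")], score))                      # True
print(is_valid_result(["u1", "c1", "p1"], [("u1", "c1"), ("c1", "p1")], score))  # False
```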


Framework

For example, the query Q = {Netvista, Maxtor} should yield the following results: c3 by itself (its comment, “IBM Netvista unstable with Maxtor HD”, contains both keywords)



Framework

And the following: p2 ⋈ c3 (joined on prodID p131)



Framework

And the following: p1 ⋈ c1 (joined on prodID p121)



Framework

Score(ai, Q)

• A method to evaluate the relevance of a tree of tuples. It builds on a single-attribute (ai) IR-style relevance scoring function:

$$\mathrm{Score}(a_i, Q) = \sum_{w \in Q \cap a_i} \frac{1 + \ln\left(1 + \ln(tf)\right)}{(1 - s) + s \cdot \frac{dl}{avdl}} \cdot \ln\frac{N + 1}{df}$$

where:

tf - term frequency of w (w ∈ Q) in ai

N - number of tuples in ai’s relation

df - number of tuples in ai’s relation that contain the word w

dl, avdl - size of the attribute value, and the average attribute value size

s - a constant
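The legend above matches a pivoted-normalization weighting. A sketch, assuming the per-word weight (1 + ln(1 + ln tf)) / ((1 − s) + s · dl/avdl) · ln((N + 1)/df), whitespace tokenization, case-insensitive matching, and s = 0.2 as a default value; `attribute_score` is a hypothetical name, and the numbers it yields are not the ones used in the slides’ worked example.

```python
import math


def attribute_score(text, query, N, df, avdl, s=0.2):
    """Score(ai, Q) for one attribute value.
    query: lowercase keywords; N: tuples in the relation;
    df: keyword -> number of tuples containing it; avdl: average length (words);
    s: the constant (0.2 is an assumed default)."""
    words = text.lower().split()
    dl = len(words)
    total = 0.0
    for w in query:
        tf = words.count(w)
        if tf == 0:
            continue  # only words in Q ∩ ai contribute
        tf_part = 1 + math.log(1 + math.log(tf))
        norm = (1 - s) + s * dl / avdl
        total += tf_part / norm * math.log((N + 1) / df[w])
    return total


comments = [
    "Disk crashed after one week of moderate use on an IBM Netvista X41",
    "lower-end IBM Netvista caught fire, starting apparently with disk",
    "IBM Netvista unstable with Maxtor HD",
]
QUERY = {"maxtor", "netvista"}
N = len(comments)
df = {w: sum(w in c.lower().split() for c in comments) for w in QUERY}
avdl = sum(len(c.split()) for c in comments) / N
scores = [attribute_score(c, QUERY, N, df, avdl) for c in comments]
print([round(x, 2) for x in scores])  # c3, which has both keywords, scores highest
```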


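Framework

Combined Score(T, Q)

• Another function combines the single-attribute scores into a final score for the tree. One candidate divides the total score of T’s attributes by the tree’s size:

$$\mathrm{Score}(T, Q) = \frac{\sum_{a_i \in A} \mathrm{Score}(a_i, Q)}{\mathrm{size}(T)}$$

where A is the set of text attribute values of the tuples in T. This is the function that the numbers in the worked Sparse example follow.

• Such functions are only optional candidates.

• The framework can handle many combining functions, as long as they satisfy the tuple monotonicity property:

• if T and T′ are joining trees of the same CN and every tuple of T′ scores no higher than its counterpart in T, then Score(T′, Q) ≤ Score(T, Q).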
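A sketch of the sum-over-size combining function, checked against the tuple scores that appear in the worked Sparse example. The name `combine` is hypothetical; treating free tuples as contributing 0 while still counting toward size(T) is an assumption consistent with that example's arithmetic.

```python
def combine(tuple_scores, size):
    """Score(T, Q) = sum of the tuple scores in T / size(T).
    Free (keyword-less) tuples contribute 0 but still count toward size(T)."""
    return sum(tuple_scores) / size


# Trees from the worked Sparse example (tuple scores as given on the slides)
print(round(combine([1.33, 1.0], 2), 3))        # 1.165  (C3P2, shown as 1.17)
print(round(combine([0.33, 1.0], 2), 3))        # 0.665  (C1P1, shown as 0.67)
print(round(combine([1.33, 0.0, 1.33], 3), 2))  # 0.89   (bound for C ⋈ P{} ⋈ C)

# Tuple monotonicity: lowering a tuple's score never raises the tree's score
assert combine([0.33, 1.0], 2) <= combine([1.33, 1.0], 2)
```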


Framework

Candidate Networks (CN)

• A CN can be thought of as a join expression that involves tuple sets plus, possibly, “base” (free) relations, which contain no occurrences of the query keywords but help to connect relations that do.

• Example, for Q = {IBM, Architect}: ProductsQ ⋈ Complaints{} ⋈ CustomersQ, where ProductsQ holds p2 (p131, “IBM”, “netvista”), CustomersQ holds u2 (c3131, “John L.”, “Architect”), and the free tuple set Complaints{} connects them.


Framework

For example, all the candidate networks (with scores) for Q = {Maxtor, Netvista}:

[Figure: the candidate networks for this query; P = Products, C = Complaints, U = Customers.]


Architecture

• What follows is a quick overview of the system architecture needed to implement top-k keyword queries efficiently.

• The description relies heavily on the DISCOVER architecture, but it is not specific to any particular OS or RDBMS.


Architecture

• The architecture consists of:

– an IR Engine

– a CN generator

– an Execution Engine

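[Figure: system architecture. The user’s keywords go to the IR Engine, which uses the IR index to produce tuple sets; the Candidate Network Generator combines the tuple sets with the database schema into candidate networks; the Execution engine evaluates them as parameterized SQL queries over the database.]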


Architecture

IR Engine

• Modern RDBMSs include IR-style text-indexing functionality (e.g., Oracle Text).

• It is useful to think of the IR engine as an indexer that gives a score > 0 to tuples containing occurrences of the keywords.



Architecture

IR Engine

• The proposed architecture exploits this functionality: upon arrival of a query Q, it generates for each relation R the tuple set RQ = { t ∈ R | Score(t, Q) > 0 }.

• The tuple sets are then sorted by decreasing score and passed on to the next module.
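A sketch of tuple-set generation, RQ = { t ∈ R | Score(t, Q) > 0 }, sorted by decreasing score. The real architecture delegates scoring to the RDBMS’s text index; here `keyword_count` is a hypothetical stand-in that simply counts query-keyword occurrences, so its scores differ from the ones used later in the slides. `tuple_sets` is likewise a hypothetical name.

```python
# Hypothetical stand-in for the IR engine: counts query-keyword occurrences.
QUERY = {"maxtor", "netvista"}


def keyword_count(text_values):
    words = " ".join(text_values).lower().split()
    return sum(1 for w in words if w in QUERY)


def tuple_sets(relations, score):
    """For each relation R, build RQ = {t in R | score(t) > 0}, sorted by
    decreasing score; relations with an empty tuple set are omitted."""
    result = {}
    for name, tuples in relations.items():
        rq = [(score(values), tid) for tid, values in tuples.items()]
        rq = sorted((entry for entry in rq if entry[0] > 0), reverse=True)
        if rq:
            result[name] = rq
    return result


relations = {
    "Complaints": {
        "c1": ["Disk crashed after one week of moderate use on an IBM Netvista X41"],
        "c2": ["lower-end IBM Netvista caught fire, starting apparently with disk"],
        "c3": ["IBM Netvista unstable with Maxtor HD"],
    },
    "Products": {"p1": ["Maxtor", "D540X"], "p2": ["IBM", "Netvista"],
                 "p3": ["Tripplite", "Smart 700VA"]},
    "Customers": {"u1": ["John Smith", "Software engineer"],
                  "u2": ["John L.", "Architect"], "u3": ["Jack M.", "student"]},
}
ts = tuple_sets(relations, keyword_count)
print(ts["Complaints"])   # [(2, 'c3'), (1, 'c2'), (1, 'c1')]
print(ts["Products"])     # [(1, 'p2'), (1, 'p1')]
print("Customers" in ts)  # False
```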



Architecture

CN Generator

• Receives the non-empty tuple sets (such as CQ and PQ) and the schema graph.

• Attempts to join those sets, possibly through “base” (free) relations such as U{}, and generates the Candidate Networks (CNs).



Architecture

CN Generator

• Also receives a parameter M that bounds the maximum number of tuple sets (free or non-free) participating in a CN.

• Why is this bound needed?


The number of CNs might be exponential in the query size!


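Architecture

CN Generator

The generated CNs MUST satisfy:

• No leaf of a CN is a free tuple set (such as P{}).

• No R ⋈ S ⋈ R construct exists in which S has a foreign key to R: a tuple of S references a single tuple of R, so both R tuples would be the same tuple, and a tree of tuples cannot include duplicate tuples!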
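The CN constraints can be expressed as a validity check over a small tree of tuple sets. A sketch, assuming a CN is given as (relation, is_free) nodes plus index edges, and reading the RSR rule as forbidding an S node with two R neighbours when S holds a foreign key to R; `REFERENCES` and `is_valid_cn` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical: which relations each relation references through a foreign key.
REFERENCES = {"Complaints": {"Products", "Customers"}}


def is_valid_cn(nodes, edges, M):
    """nodes: list of (relation, is_free); edges: (i, j) pairs of node indices."""
    if len(nodes) > M:
        return False  # too many tuple sets (free or non-free)
    neighbours = defaultdict(list)
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    for i, (rel, free) in enumerate(nodes):
        # Rule 1: a free tuple set may not be a leaf (or stand alone)
        if free and len(neighbours[i]) <= 1:
            return False
        # Rule 2 (no R-S-R): a tuple of S references one tuple of each R,
        # so two R neighbours of an S node would be the same tuple
        counts = Counter(nodes[j][0] for j in neighbours[i])
        if any(counts[r] > 1 for r in REFERENCES.get(rel, ())):
            return False
    return True


C, P = ("Complaints", False), ("Products", False)
C_free, P_free = ("Complaints", True), ("Products", True)
path = [(0, 1), (1, 2)]
print(is_valid_cn([C, P_free, C], path, M=3))   # True:  CQ ⋈ P{} ⋈ CQ
print(is_valid_cn([P, C_free, P], path, M=3))   # False: PQ ⋈ C{} ⋈ PQ is R-S-R
print(is_valid_cn([C, P_free], [(0, 1)], M=3))  # False: free leaf P{}
```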



Architecture

Execution Engine

• This is the module that actually contacts the RDBMS query engine to generate the top-k results.

• This is our focus, as it is the hardest part to implement efficiently!



Sparse algorithm example

Recall the example database (Complaints, Products and Customers) from before, with the query Q = {Maxtor, Netvista}.


Architecture - demonstration

The user submits the keywords {Maxtor, Netvista}. They flow through the IR Engine (producing tuple sets), the Candidate Network Generator (producing candidate networks) and the Execution engine (issuing parameterized SQL queries to the database).


We now turn our attention to how the Execution engine does its part.


First of all, what do we have so far?

• An architecture that constructs Candidate Networks from keyword queries, using “black box” functions of modern RDBMSs and some given score functions.

• A notion of what must be done to produce the keyword query results.

So, how would you do it?


Naïve algorithm

• The naïve approach: simply issue one SQL query per CN, retrieving its top-k results.

• The results of all the queries are then merged by score to find the overall top-k.

• Main problem: runtime.

• What characteristic(s) can we use to make the algorithm more efficient?
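A minimal sketch of the naïve strategy. Assumption: each CN’s SQL query is simulated by a precomputed list of (joining tree, score) pairs, with values following the tuple scores of the worked Sparse example; `naive_top_k` is a hypothetical name. Every CN is evaluated in full before the merge, which is exactly the cost the later algorithms try to avoid.

```python
import heapq
from operator import itemgetter


def naive_top_k(cn_results, k):
    """cn_results: CN name -> list of (tree, score), standing in for the
    result of one SQL query per CN. Every CN is fully evaluated."""
    by_score = itemgetter(1)
    # top-k of each CN's query ...
    candidates = [r for trees in cn_results.values()
                  for r in heapq.nlargest(k, trees, key=by_score)]
    # ... merged into the overall top-k
    return heapq.nlargest(k, candidates, key=by_score)


cn_results = {
    "CQ": [("C3", 1.33), ("C1", 0.33), ("C2", 0.33)],
    "PQ": [("P1", 1.0), ("P2", 1.0)],
    "CQ⋈PQ": [("C3P2", 1.165), ("C1P1", 0.665), ("C2P2", 0.665)],
}
print(naive_top_k(cn_results, k=2))  # [('C3', 1.33), ('C3P2', 1.165)]
```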


Naïve algorithm is too slow

• Remember that the IR Engine returns tuple sets ranked in DESCENDING order with respect to the Score() function.

• So, applying Combine to the top score of each tuple set in CNi gives (by tuple monotonicity) an upper bound on the score of any joining tree CNi can produce: its maximum possible score, MPSi.

• We can use this bound to disregard “unfruitful” CNs!
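Computing MPSi needs only the head of each sorted tuple set. A sketch, assuming Combine is the sum of scores divided by size(T) and that a free tuple set is marked with `None` and contributes no score; `mps` is a hypothetical name.

```python
def combine(scores, size):
    return sum(scores) / size  # assumed Combine: sum of scores / size(T)


def mps(cn, tuple_sets):
    """cn: list of tuple-set names, None marking a free tuple set.
    tuple_sets: name -> scores already sorted in descending order."""
    # The first (largest) score of each sorted tuple set is the best it can
    # contribute; free tuple sets add nothing but still count toward size(T).
    best = [tuple_sets[name][0] for name in cn if name is not None]
    return combine(best, len(cn))


tuple_sets = {"CQ": [1.33, 0.33, 0.33], "PQ": [1.0, 1.0]}
print(round(mps(["CQ", "PQ"], tuple_sets), 3))        # 1.165
print(round(mps(["CQ", None, "CQ"], tuple_sets), 2))  # 0.89
```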


Sparse Algorithm

• For every CNi, compute MPSi.

• If MPSi does not exceed the lowest score among the best k results found so far, DISCARD CNi.

• Otherwise, evaluate CNi (join its tuples) as usual.

• As a further optimization, CNs are evaluated in ASCENDING SIZE order: smaller CNs are evaluated first, so “heavy” CNs may be discarded after only a cheap MPS computation.
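A minimal sketch of Sparse that replays the worked example that follows (top-2 for Q = {Maxtor, Netvista}). Assumptions: Combine is the sum of scores divided by size(T), the tuple scores are the values shown on the slides, and CN evaluation is simulated in memory by hypothetical helpers (`eval_single`, `eval_cq_pq`, `eval_cq_free_cq`) instead of parameterized SQL queries.

```python
import heapq

# Worked-example data: tuple scores as given on the slides.
CQ = {"C3": 1.33, "C1": 0.33, "C2": 0.33}  # ComplaintsQ
PQ = {"P1": 1.0, "P2": 1.0}                # ProductsQ
COMPLAINT = {"C1": ("p121", "c3232"), "C2": ("p131", "c3131"),
             "C3": ("p131", "c3143")}      # (prodID, custID)
PRODUCT = {"P1": "p121", "P2": "p131"}     # prodID


def combine(scores, size):
    return sum(scores) / size  # free tuple sets add 0 but count toward size


def eval_single(ts):
    return list(ts.items())


def eval_cq_pq():
    return [(c + p, combine([cs, ps], 2))
            for c, cs in CQ.items() for p, ps in PQ.items()
            if COMPLAINT[c][0] == PRODUCT[p]]


def eval_cq_free_cq(key):
    """key 0: two complaints sharing a product (P{}), 1: sharing a customer (U{})."""
    ids = sorted(CQ)
    return [(a + "-" + b, combine([CQ[a], CQ[b]], 3))
            for i, a in enumerate(ids) for b in ids[i + 1:]
            if COMPLAINT[a][key] == COMPLAINT[b][key]]


# (name, size, best score of each non-free tuple set, evaluator)
CNS = [
    ("CQ", 1, [max(CQ.values())], lambda: eval_single(CQ)),
    ("PQ", 1, [max(PQ.values())], lambda: eval_single(PQ)),
    ("CQ⋈PQ", 2, [max(CQ.values()), max(PQ.values())], eval_cq_pq),
    ("CQ⋈P{}⋈CQ", 3, [max(CQ.values())] * 2, lambda: eval_cq_free_cq(0)),
    ("CQ⋈U{}⋈CQ", 3, [max(CQ.values())] * 2, lambda: eval_cq_free_cq(1)),
]


def sparse(cns, k):
    top = []        # min-heap of (score, tree): the current best k results
    evaluated = []  # CNs that were actually joined
    for name, size, best, evaluate in sorted(cns, key=lambda cn: cn[1]):
        if len(top) == k and combine(best, size) <= top[0][0]:
            continue  # MPS cannot beat the k-th best result: discard this CN
        evaluated.append(name)
        for tree, score in evaluate():
            if len(top) < k:
                heapq.heappush(top, (score, tree))
            elif score > top[0][0]:
                heapq.heapreplace(top, (score, tree))
    return sorted(top, reverse=True), evaluated


results, evaluated = sparse(CNS, k=2)
print([tree for _, tree in results])  # ['C3', 'C3P2']
print(evaluated)                      # ['CQ', 'PQ', 'CQ⋈PQ']
```

Only CQ, PQ and CQ ⋈ PQ are joined; both size-3 CNs are discarded by their MPS (about 0.89, below the queue’s lowest score of about 1.17), matching the walkthrough.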


Sparse algorithm example

Remember the example database (Complaints, Products and Customers) and the query Q = {Maxtor, Netvista}?


Sparse algorithm example

• Suppose we want to find the top-2 results for the query Q = {Maxtor, Netvista} on our example database.

• With M = 3, the CN generator supplies our execution engine with candidate networks including CQ, PQ, CQ ⋈ PQ, CQ ⋈ P{} ⋈ CQ and CQ ⋈ U{} ⋈ CQ.

• We start off with CQ; let’s take a look:


Sparse algorithm example

CQ (ComplaintsQ) consists of all three Complaints tuples, with different scores, of course:

C3 (“IBM Netvista unstable with Maxtor HD”): score 1.33

C2 (“lower-end IBM Netvista caught fire, starting apparently with disk”): score 0.33

C1 (“Disk crashed after one week of moderate use on an IBM Netvista X41”): score 0.33


Sparse algorithm example

• We start off with CQ. There is no need to calculate MPS(CQ), but we do it anyway!

• We already know everything: we got these exact scores from the IR engine!

• We now turn to examine the CN PQ ...

CQ: C3 = 1.33, C1 = 0.33, C2 = 0.33; MPS(CQ) = 1.33

Top-2 results queue: C3 = 1.33, C1 = 0.33


Sparse algorithm example

These are the relevant tuples that PQ (ProductsQ) consists of:

P1 (p121, “Maxtor”, “D540X”): score 1

P2 (p131, “IBM”, “Netvista”): score 1

(p3, “Tripplite” “Smart 700VA”, contains neither keyword, so it is not in PQ.)


Sparse algorithm example

• Let’s look at how the algorithm handles PQ:

• We calculate MPS(PQ) = 1, which exceeds the lowest score in the queue (0.33), so PQ might still yield a result that belongs in the top-k queue.

• We now turn to examine the CN CQ ⋈ PQ ...

PQ: P1 = 1, P2 = 1; MPS(PQ) = 1

Top-2 results queue: C3 = 1.33, P1 = 1 (P1 replaces C1)


Sparse algorithm example

These are the joins of CQ ⋈ PQ (ComplaintsQ ⋈ ProductsQ, on prodID):

C3P2 (c3 and p2, prodID p131): score 1.17

C2P2 (c2 and p2, prodID p131): score 0.67

C1P1 (c1 and p1, prodID p121): score 0.67

Each score is (complaint score + product score) / 2, e.g. (0.33 + 1) / 2 ≈ 0.67.


Sparse algorithm example

• Now we turn to examine CQ ⋈ PQ ...

• We calculate MPS(CQ ⋈ PQ) = (1 + 1.33) / 2 ≈ 1.17, which exceeds the lowest score in the queue (1), so it might still yield a result!

CQ ⋈ PQ: C3P2 = 1.17, C1P1 = 0.67, C2P2 = 0.67; MPS(CQ ⋈ PQ) = 1.17

Top-2 results queue: C3 = 1.33, C3P2 = 1.17 (C3P2 replaces P1)


Sparse algorithm example

• Now we turn to examine CQ ⋈ P{} ⋈ CQ ...

• We calculate MPS(CQ ⋈ P{} ⋈ CQ) = (1.33 + 1.33) / 3 ≈ 0.89 (the free tuple set P{} adds no score but counts toward the size). This does not exceed the lowest score in the queue (1.17), so we don’t need to evaluate this CN! The same goes for CQ ⋈ U{} ⋈ CQ.

• We’re finished! We return {C3, C3P2} as the results.

MPS(CQ ⋈ P{} ⋈ CQ) = MPS(CQ ⋈ U{} ⋈ CQ) ≈ 0.89: no need to evaluate!


Sparse is nice, but…

• What if there are many possible answers, some of them requiring multiple joins (keywords “hiding” in multiple relations)?

• In that case, the Sparse algorithm becomes (almost) as inefficient as the Naïve algorithm; the problem is especially acute for AND queries.

• What plan should we devise now?

• We need to make better use of our architecture!


The Single-pipelined algorithm

• The Single-Pipelined algorithm is essentially what we’d like to happen in the case of a SINGLE CN.

• It DOES NOT solve the whole problem,

• but…

it’s a great building block for the more sophisticated General-pipelined algorithm!

The Single-pipelined algorithm

• Input: a Candidate Network and the non-empty tuple sets TS1…TSk that participate in it.

• Recall that TSi holds the tuples of relation Ri that match the query keywords, already sorted in descending order of their score.

• Output: a stream of joining trees of tuples, in descending SCORE order.

The Single-pipelined algorithm

• We keep track of the prefix S(TSi) already retrieved from every tuple set TSi.

• In each iteration, we retrieve the next tuple t from some TSi and try to join it with the tuples already retrieved from all the other tuple sets, creating new joining trees.

• Every joining tree of tuples T that we find is added to the queue of results.

• Anyone see a problem here?
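The naive iteration described above can be sketched in Python as follows. `new_joining_trees` and the `joins` predicate are hypothetical names: the predicate stands in for the CN's join conditions, and tuples are plain ids.

```python
from itertools import product


def new_joining_trees(i, t, prefixes, joins):
    """Naive pipelined step: pair the tuple t just read from TS_i with every
    combination of tuples already read from the other tuple sets.

    prefixes[j] is S(TS_j); `joins(tree)` is a caller-supplied predicate that
    stands in for the CN's join conditions.
    """
    pools = [[t] if j == i else prefixes[j] for j in range(len(prefixes))]
    return [tree for tree in product(*pools) if joins(tree)]


# Toy run: A1 and B1 were read earlier; now C1 arrives from TS3 (index 2).
prefixes = [["A1"], ["B1"], []]
joinable = {("A1", "B1", "C1")}
trees = new_joining_trees(2, "C1", prefixes, lambda tree: tree in joinable)
prefixes[2].append("C1")  # C1 becomes part of S(TS3)
print(trees)  # [('A1', 'B1', 'C1')]
```

If any other prefix is still empty, `product` yields nothing, so no tree is formed. Nothing in this step says when a queued tree is safe to output.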

The Single-pipelined algorithm

• Yup, we’re back to the Naïve algorithm, aren’t we?

• Well, not quite!

• To guarantee that a result we’ve produced is in the top-k, we need a method similar to the MPS.

• MPFSi, the Maximum Possible Future Score, will be our upper bound on the score of any yet “unseen” result that uses a not-yet-retrieved tuple of TSi.

The Single-pipelined algorithm

• We would like to use the status of each prefix S(TSi) to bound the maximum score that a yet-unretrieved tuple of TSi can yield:

MPFSi = max { Score(T, Q) | T ∈ TS1 ⋈ … ⋈ TSi−1 ⋈ (TSi − S(TSi)) ⋈ TSi+1 ⋈ … ⋈ TSn }

• This is expensive!

• Instead we compute a cheaper overestimate, MPFS’i: the score of the next tuple of TSi, combined with the top-ranked tuple of every other tuple set.
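Here is a short Python sketch of MPFS’i, assuming (as the worked example does) that scores combine by summing and dividing by the CN size. `mpfs_prime` and its arguments are illustrative names; exact fractions avoid rounding noise.

```python
from fractions import Fraction


def mpfs_prime(score_lists, retrieved, i, cn_size):
    """Cheap overestimate MPFS'_i for one CN.

    score_lists[j]: scores of TS_j in descending order.
    retrieved[j]:   |S(TS_j)|, how many tuples of TS_j were already read.
    """
    if retrieved[i] == len(score_lists[i]):
        return Fraction(0)  # TS_i is exhausted: no unseen tuple left
    next_score = score_lists[i][retrieved[i]]
    best_of_others = sum(s[0] for j, s in enumerate(score_lists) if j != i)
    return Fraction(next_score + best_of_others, cn_size)


# Scores of the example below: TS1 = A1..A3, TS2 = B1..B3, TS3 = C1..C3; CN size 6.
SCORES = [[3, 2, 1], [9, 3, 1], [7, 3, 2]]
print(mpfs_prime(SCORES, [0, 0, 0], 0, 6))  # nothing read yet: 19/6
print(mpfs_prime(SCORES, [1, 0, 0], 0, 6))  # after reading A1: 3
```

Only the next tuple of TSi and the first tuple of every other tuple set are touched, which is what makes MPFS’i cheap compared with evaluating the join in the exact MPFSi.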

The Single-pipelined algorithm

Some free relations: R{}, Q{}, P{}

Suppose the algorithm receives a CN with 3 tuple sets and three free tuple sets that connect them:

TS1: A1 = 3, A2 = 2, A3 = 1
TS2: B1 = 9, B2 = 3, B3 = 1
TS3: C1 = 7, C2 = 3, C3 = 2

S(TS1) = S(TS2) = S(TS3) = ∅
MPFS’1 = ?, MPFS’2 = ?, MPFS’3 = ?, MPFS’all = ∅
Output: empty

The Single-pipelined algorithm

We want the algorithm to output the BEST-6 results!

S(TS1) = S(TS2) = S(TS3) = ∅
MPFS’1 = ?, MPFS’2 = ?, MPFS’3 = ?, MPFS’all = ∅
Output: empty

The Single-pipelined algorithm

First, we calculate MPFS’i, which is the same for every TS at the beginning: each one adds the score of the next tuple of TSi to the top scores of the other two tuple sets and divides by the CN size, 6.

MPFS’1 = (3 + 9 + 7) / 6 ≈ 3.16, and likewise MPFS’2 = MPFS’3 = 3.16

S(TS1) = S(TS2) = S(TS3) = ∅
MPFS’all = ∅
Output: empty

The Single-pipelined algorithm

Then we compute MPFS’all as the maximum of the MPFS’i:

S(TS1) = S(TS2) = S(TS3) = ∅
MPFS’1 = MPFS’2 = MPFS’3 = 3.16, so MPFS’all = 3.16
Output: empty

The Single-pipelined algorithm

Now we advance one of the prefixes, say S(TS1), retrieving A1, and have to update MPFS’1!

S(TS1) = {A1}, S(TS2) = S(TS3) = ∅
MPFS’1 = (2 + 9 + 7) / 6 = 3, MPFS’2 = 3.16, MPFS’3 = 3.16, MPFS’all = 3.16
Output: empty

The Single-pipelined algorithm

We try to join A1 with the tuples in the other prefixes S(TSi), but there aren’t any.

S(TS1) = {A1}, S(TS2) = S(TS3) = ∅
MPFS’1 = 3, MPFS’2 = 3.16, MPFS’3 = 3.16, MPFS’all = 3.16
Output: empty

The Single-pipelined algorithm

We advance S(TS2), retrieving B1; again we get no join results, since S(TS3) is still empty. Now the MPFS’s are:

S(TS1) = {A1}, S(TS2) = {B1}, S(TS3) = ∅
MPFS’1 = 3, MPFS’2 = (3 + 3 + 7) / 6 ≈ 2.16, MPFS’3 = 3.16, MPFS’all = 3.16
Output: empty

The Single-pipelined algorithm

We advance S(TS3), retrieving C1, and this time we manage to join C1 ⇝ B1 ⇜ A1. (We don’t forget to update MPFS’3!)

S(TS1) = {A1}, S(TS2) = {B1}, S(TS3) = {C1}
MPFS’1 = 3, MPFS’2 = 2.16, MPFS’3 = (3 + 9 + 3) / 6 = 2.5, MPFS’all = 3.16
Output: empty

The Single-pipelined algorithm

The SCORE of C1 ⇝ B1 ⇜ A1 is (7 + 9 + 3) / 6 ≈ 3.16 = MPFS’all, so we output it!

S(TS1) = {A1}, S(TS2) = {B1}, S(TS3) = {C1}
MPFS’1 = 3, MPFS’2 = 2.16, MPFS’3 = 2.5, MPFS’all = 3.16
Output: C1 ⇝ B1 ⇜ A1 (3.16)

The Single-pipelined algorithm

But now MPFS’all should decrease! Remember, it is equal to max{MPFS’i}…

S(TS1) = {A1}, S(TS2) = {B1}, S(TS3) = {C1}
MPFS’1 = 3, MPFS’2 = 2.16, MPFS’3 = 2.5, MPFS’all = 3
Output: C1 ⇝ B1 ⇜ A1 (3.16)
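The output rule from these slides can be sketched as a priority queue of pending joining trees: every tree whose score is at least MPFS’all is released, best first. This is a Python sketch; `release` is a hypothetical helper and a label such as "C1-B1-A1" stands for C1 ⇝ B1 ⇜ A1.

```python
import heapq
from fractions import Fraction


def release(pending, mpfs_values):
    """Pop, best first, every queued tree whose score is at least
    MPFS'_all = max_i MPFS'_i. `pending` is a heap of (-score, tree)."""
    mpfs_all = max(mpfs_values)
    released = []
    while pending and -pending[0][0] >= mpfs_all:
        neg_score, tree = heapq.heappop(pending)
        released.append((tree, -neg_score))
    return released


# State right after C1 was read: one queued tree, and
# MPFS'_1 = 3, MPFS'_2 = 13/6, MPFS'_3 = 5/2, so MPFS'_all = 3.
pending = [(-Fraction(19, 6), "C1-B1-A1")]
out = release(pending, [Fraction(3), Fraction(13, 6), Fraction(5, 2)])
print(out)
```

Scores are negated because `heapq` is a min-heap; the comparison uses "at least" because a tree whose score equals MPFS’all can no longer be beaten by any unseen tree.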

The Single-pipelined algorithm

Now, we turn to advance S(TS1) again…

S(TS1) = {A1}, S(TS2) = {B1}, S(TS3) = {C1}
MPFS’1 = 3, MPFS’2 = 2.16, MPFS’3 = 2.5, MPFS’all = 3
Output: C1 ⇝ B1 ⇜ A1 (3.16)

The Single-pipelined algorithm

We advance S(TS1) again, retrieving A2. We have no luck joining A2, but we update MPFS’1 and MPFS’all…

S(TS1) = {A1, A2}, S(TS2) = {B1}, S(TS3) = {C1}
MPFS’1 = (1 + 9 + 7) / 6 = 2.83, MPFS’2 = 2.16, MPFS’3 = 2.5, MPFS’all = 2.83
Output: C1 ⇝ B1 ⇜ A1 (3.16)

The Single-pipelined algorithm

Next we advance S(TS2), retrieving B2, and try to join it with the tuples in the other prefixes S(TSi). We succeed! We find C1 ⇝ B2 ⇜ A2 with score (7 + 3 + 2) / 6 = 2 < MPFS’all, so we keep C1 ⇝ B2 ⇜ A2 in a queue for later output!

S(TS1) = {A1, A2}, S(TS2) = {B1, B2}, S(TS3) = {C1}
MPFS’1 = 2.83, MPFS’2 = (3 + 1 + 7) / 6 ≈ 1.83, MPFS’3 = 2.5, MPFS’all = 2.83
Output: C1 ⇝ B1 ⇜ A1 (3.16)

The Single-pipelined algorithm

We advance S(TS3), retrieving C2, and try to join it with the tuples in the other prefixes. We succeed again! We find C2 ⇝ B1 ⇜ A1 with score (3 + 9 + 3) / 6 = 2.5 < MPFS’all, so we keep C2 ⇝ B1 ⇜ A1 in a queue for later output!

S(TS1) = {A1, A2}, S(TS2) = {B1, B2}, S(TS3) = {C1, C2}
MPFS’1 = 2.83, MPFS’2 = 1.83, MPFS’3 = (3 + 9 + 2) / 6 = 2.33, MPFS’all = 2.83
Output: C1 ⇝ B1 ⇜ A1 (3.16)

The Single-pipelined algorithm

We go back to S(TS1)…

S(TS1) = {A1, A2}, S(TS2) = {B1, B2}, S(TS3) = {C1, C2}
MPFS’1 = 2.83, MPFS’2 = 1.83, MPFS’3 = 2.33, MPFS’all = 2.83
Output: C1 ⇝ B1 ⇜ A1 (3.16)

The Single-pipelined algorithm

We advance S(TS1), retrieving its last tuple A3, and manage to find two joins. The first is C1 ⇝ B1 ⇜ A3, with score (7 + 9 + 1) / 6 = 2.83, which we output!

S(TS1) = {A1, A2, A3}, S(TS2) = {B1, B2}, S(TS3) = {C1, C2}
MPFS’1 = 0 (TS1 is exhausted), MPFS’2 = 1.83, MPFS’3 = 2.33, MPFS’all = 2.83
Output: C1 ⇝ B1 ⇜ A1 (3.16), C1 ⇝ B1 ⇜ A3 (2.83)

The Single-pipelined algorithm

The second join is C2 ⇝ B1 ⇜ A3, with score (3 + 9 + 1) / 6 ≈ 2.16, which we can’t output yet. We keep C2 ⇝ B1 ⇜ A3 in a queue for later output!

S(TS1) = {A1, A2, A3}, S(TS2) = {B1, B2}, S(TS3) = {C1, C2}
MPFS’1 = 0, MPFS’2 = 1.83, MPFS’3 = 2.33, MPFS’all = 2.83
Output: C1 ⇝ B1 ⇜ A1 (3.16), C1 ⇝ B1 ⇜ A3 (2.83)

The Single-pipelined algorithm

Now MPFS’all updates to 2.33, but we already have a result from before that can be output: C2 ⇝ B1 ⇜ A1 (SCORE = 2.5)! Remember C2 ⇝ B1 ⇜ A1? It’s now output!

MPFS’1 = 0, MPFS’2 = 1.83, MPFS’3 = 2.33, MPFS’all = 2.33
Output: C1 ⇝ B1 ⇜ A1 (3.16), C1 ⇝ B1 ⇜ A3 (2.83), C2 ⇝ B1 ⇜ A1 (2.5)

The Single-pipelined algorithm

And what about C2 ⇝ B1 ⇜ A3? Its score, 2.16, is still below MPFS’all = 2.33, so it has to wait in the queue a little longer.

The Single-pipelined algorithm

Now, let’s advance S(TS3).

CAN ANYONE GUESS WHY NOT S(TS2)?

The Single-pipelined algorithm

Now, let’s advance S(TS3): it has the biggest MPFS’i (2.33, against 1.83 for TS2), so it is the most likely to yield results!

The Single-pipelined algorithm

We advance S(TS3), retrieving its last tuple C3, and manage to find a join: C3 ⇝ B1 ⇜ A2, with score (2 + 9 + 2) / 6 ≈ 2.16, which we can’t output yet.

S(TS1) = {A1, A2, A3}, S(TS2) = {B1, B2}, S(TS3) = {C1, C2, C3}
MPFS’1 = 0, MPFS’2 = 1.83, MPFS’3 = 0, MPFS’all = 2.33
Output: C1 ⇝ B1 ⇜ A1 (3.16), C1 ⇝ B1 ⇜ A3 (2.83), C2 ⇝ B1 ⇜ A1 (2.5)

The Single-pipelined algorithm

But with MPFS’3 = 0, we have to update MPFS’all, which drops to 1.83. So it turns out we can output C3 ⇝ B1 ⇜ A2 after all, together with C2 ⇝ B1 ⇜ A3 from the queue (both have score 2.16)!

MPFS’1 = 0, MPFS’2 = 1.83, MPFS’3 = 0, MPFS’all = 1.83
Output: C1 ⇝ B1 ⇜ A1 (3.16), C1 ⇝ B1 ⇜ A3 (2.83), C2 ⇝ B1 ⇜ A1 (2.5), C2 ⇝ B1 ⇜ A3 (2.16), C3 ⇝ B1 ⇜ A2 (2.16)

The Single-pipelined algorithm

Also, remember C1 ⇝ B2 ⇜ A2 with SCORE = 2? Since 2 ≥ MPFS’all = 1.83, its time has come to be output too!

Output: C1 ⇝ B1 ⇜ A1 (3.16), C1 ⇝ B1 ⇜ A3 (2.83), C2 ⇝ B1 ⇜ A1 (2.5), C2 ⇝ B1 ⇜ A3 (2.16), C3 ⇝ B1 ⇜ A2 (2.16), C1 ⇝ B2 ⇜ A2 (2)

The Single-pipelined algorithm

That’s it, we’re done: we have our BEST-6 results!

Phew….
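To check the walkthrough end to end, here is a compact Python simulation under stated assumptions. The slides never show the free relations, so `JOINS` simply lists the six joining trees the example finds; scores combine as (sum of tuple scores) / 6; and the next tuple set to read is the one with the largest MPFS’ (lowest index on ties), so the order of reads differs from the slides, which mix round-robin and largest-MPFS’ choices. Exact fractions are used so that ties such as 13/6 compare correctly. All names are hypothetical.

```python
import heapq
from fractions import Fraction
from itertools import product

# Tuple sets from the example: (tuple id, score), sorted by descending score.
TUPLE_SETS = [
    [("A1", 3), ("A2", 2), ("A3", 1)],  # TS1
    [("B1", 9), ("B2", 3), ("B3", 1)],  # TS2
    [("C1", 7), ("C2", 3), ("C3", 2)],  # TS3
]
CN_SIZE = 6  # 3 non-free + 3 free tuple sets
SCORE = {tid: s for ts in TUPLE_SETS for tid, s in ts}

# Assumption: the free relations are not shown, so we list the joining
# trees (A, B, C) the walkthrough finds and treat all others as non-joining.
JOINS = {("A1", "B1", "C1"), ("A2", "B2", "C1"), ("A1", "B1", "C2"),
         ("A3", "B1", "C1"), ("A3", "B1", "C2"), ("A2", "B1", "C3")}


def mpfs_prime(i, retrieved):
    """MPFS'_i: next unseen score of TS_i plus the top score of every other TS."""
    if retrieved[i] == len(TUPLE_SETS[i]):
        return Fraction(0)  # TS_i is exhausted
    total = TUPLE_SETS[i][retrieved[i]][1]
    total += sum(ts[0][1] for j, ts in enumerate(TUPLE_SETS) if j != i)
    return Fraction(total, CN_SIZE)


def single_pipelined(k):
    retrieved = [0] * len(TUPLE_SETS)     # |S(TS_i)|
    prefixes = [[] for _ in TUPLE_SETS]   # S(TS_i)
    pending, output = [], []              # pending: heap of (-score, tree name)
    while len(output) < k:
        bounds = [mpfs_prime(i, retrieved) for i in range(len(TUPLE_SETS))]
        mpfs_all = max(bounds)
        # Release queued trees that no unseen tree can beat.
        while pending and len(output) < k and -pending[0][0] >= mpfs_all:
            neg_score, name = heapq.heappop(pending)
            output.append((name, -neg_score))
        exhausted = all(r == len(ts) for r, ts in zip(retrieved, TUPLE_SETS))
        if len(output) == k or exhausted:
            break
        i = bounds.index(mpfs_all)        # TS with the largest MPFS'
        tid = TUPLE_SETS[i][retrieved[i]][0]
        retrieved[i] += 1
        pools = [[tid] if j == i else prefixes[j] for j in range(len(TUPLE_SETS))]
        for tree in product(*pools):
            if tree in JOINS:
                score = Fraction(sum(SCORE[t] for t in tree), CN_SIZE)
                heapq.heappush(pending, (-score, "-".join(reversed(tree))))
        prefixes[i].append(tid)
    return output, retrieved


if __name__ == "__main__":
    results, retrieved = single_pipelined(k=6)
    for name, score in results:
        print(f"{name}: {float(score):.2f}")
    print("tuples read per tuple set:", retrieved)
```

With these assumptions the sketch returns the same six joining trees, in the same order, as the walkthrough, and it never reads B3.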

In the common case…

• This algorithm outputs the best results of the specific CN quickly,

• and saves time by not touching non-promising TSs!

• In our example it didn’t really happen (only the last tuple of TS2, B3, was never retrieved)…

The General-pipelined algorithm

• As mentioned before, the Single-pipelined algorithm (which operates on a SINGLE CN) does not solve the whole problem.

• However, running it concurrently over all the CNs might!

• This is exactly the idea behind the General-pipelined algorithm:

The General-pipelined algorithm

• The General-pipelined algorithm evaluates all the CNs concurrently, using a priority-preemptive, round-robin protocol.

• What’s the priority of each CNi? Its MPFS’, i.e. the largest MPFS’ over its tuple sets!

• Also, a result is only output once its score is at least GMPFS’, the maximum MPFS’ over all the CNs.
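A rough Python sketch of the General-pipelined loop follows. Each CN is represented by a hypothetical `ToyCN` stand-in that already knows its result scores, so its bound is exact; a real CN would compute its MPFS’ from tuple-set prefixes as in the Single-pipelined example. Ties in priority go to the first CN in the list rather than being rotated round-robin.

```python
import heapq


class ToyCN:
    """Hypothetical stand-in for one CN driven by the Single-pipelined algorithm.

    advance() reveals the CN's next result; bound() plays the role of the CN's
    MPFS' (the best score it could still produce).
    """

    def __init__(self, name, result_scores):
        self.name = name
        self.scores = sorted(result_scores, reverse=True)
        self.pos = 0

    def exhausted(self):
        return self.pos >= len(self.scores)

    def bound(self):
        return self.scores[self.pos]

    def advance(self):
        score = self.scores[self.pos]
        self.pos += 1
        return [(score, f"{self.name}/r{self.pos}")]


def general_pipelined(cns, k):
    pending = []  # max-heap via negated scores: results not yet output
    output = []
    while len(output) < k:
        live = [cn for cn in cns if not cn.exhausted()]
        # GMPFS': the largest bound over all CNs that can still produce results.
        gmpfs = max((cn.bound() for cn in live), default=float("-inf"))
        while pending and len(output) < k and -pending[0][0] >= gmpfs:
            neg_score, tree = heapq.heappop(pending)
            output.append((tree, -neg_score))
        if len(output) == k or not live:
            break
        # Preemptive priority: the CN with the highest bound gets the next step.
        cn = max(live, key=lambda c: c.bound())
        for score, tree in cn.advance():
            heapq.heappush(pending, (-score, tree))
    return output


cns = [ToyCN("CN1", [3.0, 1.0]), ToyCN("CN2", [2.5, 2.0]), ToyCN("CN3", [0.5])]
top = general_pipelined(cns, k=3)
print(top)
print("CN3 steps:", cns[2].pos)
```

CN3 is never advanced: its bound (0.5) stays below the scores of the results already found, so it gets no execution time before the top 3 are settled.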

The General-pipelined algorithm

Figure: CNs (CN1, CN2, CN3, CN5) wait in a CN queue ordered by ascending MPFS and are fed to the execution engine; the results it produces (Tuple, Score: B1C3 4.22, C1A2 7, A1 3) wait in a queue of future(?) results before being output to the user.

The Hybrid algorithm

• The Hybrid algorithm simply combines the strengths of the two most successful algorithms.

• It estimates the number of results the query will produce.

• If it expects “few” results, it runs the Sparse algorithm.

• Otherwise, it runs the General-pipelined algorithm!
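The dispatch itself is tiny; the hard part is the estimate. In this Python sketch both the estimate and the threshold for “few” are supplied by the caller, because the slides do not say how they are obtained; `hybrid` and its parameters are hypothetical names.

```python
from typing import Callable, List, Tuple

Result = Tuple[str, float]


def hybrid(estimated_results: float, few_threshold: float,
           run_sparse: Callable[[], List[Result]],
           run_general_pipelined: Callable[[], List[Result]]) -> List[Result]:
    """Run Sparse when few results are expected, General-pipelined otherwise.

    Both the estimate and the threshold for "few" come from the caller.
    """
    if estimated_results <= few_threshold:
        return run_sparse()
    return run_general_pipelined()


print(hybrid(3, 10, lambda: [("via Sparse", 1.0)], lambda: [("via Pipelined", 1.0)]))
```

Passing the two algorithms as callables keeps the sketch independent of how either one is implemented.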

Contents

Introduction

Goal and Motivation

Framework and examples

Architecture

Algorithms

Experimental Results

Criticism and Conclusion

Runtime

• All the algorithms were run through a series of runtime tests.

• The tests used the DBLP data set translated into relations (Conferences, Papers, Citations…). Each test varies one parameter (e.g. query size) while the others stay constant.

• Separate tests were run for AND and OR semantics.

• Some tests also use two modified algorithms:

– SASymmetric: Single-pipelined with round-robin

– GASymmetric: General-pipelined with round-robin
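The difference between the pipelined algorithms and their round-robin variants lies only in which tuple set is read next. This Python sketch assumes, as the walkthrough’s “it has the biggest MPFS’i” step suggests, that the non-symmetric versions read from the tuple set with the largest MPFS’; `next_tuple_set` is a hypothetical helper and indices are 0-based.

```python
def next_tuple_set(mpfs, exhausted, last, policy):
    """Pick the (0-based) tuple set to read next, or None if all are exhausted.

    policy == "max":         largest MPFS' first (lowest index on ties).
    policy == "round_robin": cycle through the tuple sets, starting after `last`.
    """
    n = len(mpfs)
    live = [i for i in range(n) if not exhausted[i]]
    if not live:
        return None
    if policy == "max":
        return max(live, key=lambda i: (mpfs[i], -i))
    if policy == "round_robin":
        for step in range(1, n + 1):
            i = (last + step) % n
            if not exhausted[i]:
                return i
    raise ValueError(f"unknown policy: {policy}")


# Walkthrough state before C3 is read: TS1 exhausted, MPFS' = 0, 11/6, 14/6,
# and the last tuple read came from TS1 (index 0).
mpfs, exhausted = [0.0, 11 / 6, 14 / 6], [True, False, False]
print(next_tuple_set(mpfs, exhausted, last=0, policy="max"))          # 2 (TS3)
print(next_tuple_set(mpfs, exhausted, last=0, policy="round_robin"))  # 1 (TS2)
```

In this state the two policies disagree: the largest-MPFS’ rule goes to TS3, while round-robin would spend a read on TS2 even though its bound is lower.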

Maximal CN size (OR)

• This test varies M, the maximal CN size.

Maximal CN size (AND)

• Clearly, larger values of M have a greater impact on AND queries (why?).

Number of keywords (OR)

Number of keywords (AND)

Contents

Introduction

Goal and Motivation

Framework and examples

Architecture

Algorithms

Experimental Results

Criticism and Conclusion

Criticism

• Runtime is not clearly stated in the article (for a reason!)

• Heavily affected by query size! For |Q| > 4, most queries take a long time!

• The same goes for M > 6…

• The system is a bit “platform-dependent”… prone to future RDBMS policy changes…

Conclusion

• Today we’ve discussed a method for IR-style keyword search over relational databases:

– Motivations for such searches

– An architecture that can achieve this goal

– Several algorithms, of varying efficiency, that produce the results

– Experimental results that allow a better evaluation of runtime

Thank You!

…Questions?

Phew!...

DISCOVER – original Architecture
