Query Processing in a Mediator System for Data and Multimedia

Claudio Gennaro, ISDSI 2009

D. Beneventano (1), C. Gennaro (2), M. Mordacchini (2), R. Carlos Nana Mbinkeu (1)

(1) DII - Università di Modena e Reggio Emilia, via Vignolese 905, Modena, Italy
(2) ISTI-CNR, via Moruzzi 1, Pisa, Italy

Outline

1. Motivation
2. The system and scenario overview
3. Querying an ontology of data and multimedia sources
   • Mapping
   • Query unfolding for multimedia conditions
   • Ranking
4. Conclusion and future work

Motivation

• We proposed a method for building a populated domain ontology representative of a set of web data sources.

• The method exploits the capabilities of a mediator system (MOMIS) to create an integrated view of a set of data sources, i.e. a domain ontology schema, and a set of annotations linking data to the integrated view.

• We extend that approach with multimedia sources, thus obtaining a methodology for building and querying an ontology representing data and multimedia sources.

1. There are several use cases where applications interact with ontologies of data and multimedia sources.

2. Multimedia and data sources are usually represented with different models. No standard for representing data and multimedia sources at the same time has been adopted by large communities.

3. Different languages and interfaces have been developed for querying “traditional” and “multimedia” data sources. The former rely on expressive languages that allow selection clauses to be expressed; the latter typically implement similarity search techniques for retrieving multimedia documents similar to the ones provided by the user.

Managing a Semantic Peer: MOMIS + MILOS

NeP4B Semantic Peer

A NeP4B Semantic Peer provides unified access to different data sources referring to the same domain by means of a Semantic Peer Data Ontology (SPDO) of the data, i.e. a common representation of all the data sources belonging to the peer.

MOMIS (Mediator envirOnment for Multiple Information Sources) is a framework for information extraction and integration of heterogeneous structured and semistructured data sources.

MILOS is a general-purpose Multimedia Content Management System:
• it manages and serves any multimedia document;
• it manages any metadata of documents.

Data and Multimedia Sources (DMSs)

• A Data and Multimedia Source (DMS) is an object-oriented database of metadata objects describing a collection of multimedia documents (such as images, videos, etc.), represented with a schema defined in ODLI3.

• In general, the DMS schema includes a set of standard attributes, declared using standard predefined ODLI3 types such as string, double and integer, which support the selection predicates typical of structured and semi-structured data, such as =, < and >.

• In addition, the DMS schema includes a set of multimedia attributes, declared by means of special predefined ODLI3 classes, which support similarity-based searches (full-text search, image similarity, geographical search, etc.).

A sample scenario

CITY
  Name: string
  Zip: string
  Country: string
  Surface: number
  Population: number
  Photo: Image
  Description: Text
  Location: GeoCoord

HOTEL
  Name: string
  Telephone: string
  Fax: string
  Address: string
  www: string
  Room_num: number
  Price: number
  City: string [FK]
  Stars: number
  Free_wifi: boolean
  Photo: Image
  Description: Text

RESTAURANT
  Name: string
  Address: string
  City: string [FK]
  Email: string
  Telephone: string
  Holiday: string
  Web-site: string
  Speciality: string
  Rank: number
  Price_avg: number

EVENT
  Name: string
  Category: string
  Details: string
  Price: number
  City: string [FK]
  Poster: Image

Querying DMSs

• A DMS Mi can be queried using an extension of the standard SQL-like SELECT syntax. The WHERE clause is a conjunction of predicates on single standard attributes of Mi, as in the following:

    SELECT Mi.Ak, …, Mi.Sl, …
    FROM Mi
    WHERE Mi.Ax op1 val1
    AND Mi.Ay op2 val2
    ...
    ORDER BY Mi.Sw(Q1), Mi.Sz(Q2), …
    LIMIT K

• In practice, ORDER BY + LIMIT K specifies a top-k similarity query.
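As a rough illustration of what such a query asks a DMS to do, the sketch below evaluates it over an in-memory list: it keeps the records that satisfy every WHERE predicate on standard attributes, orders them by a similarity function standing in for the ORDER BY term, and returns the first K records with no scores attached. The function name `top_k_query`, the record layout, the similarity function and the data are all hypothetical; a real DMS would rely on its own indexes and similarity measures.

```python
import operator
from typing import Any, Callable

# Comparison operators allowed in WHERE predicates on standard attributes.
OPS: dict[str, Callable[[Any, Any], bool]] = {
    "=": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
}

Record = dict[str, Any]


def top_k_query(
    records: list[Record],
    select: list[str],
    where: list[tuple[str, str, Any]],
    similarity: Callable[[Record], float],
    k: int,
) -> list[Record]:
    """Toy evaluation of SELECT ... WHERE ... ORDER BY <similarity> LIMIT k.

    `where` is a conjunction of (attribute, operator, value) predicates on
    standard attributes; `similarity` returns a higher value for records closer
    to the query object. Projected records come back best first, with no score.
    """
    selected = [
        r for r in records
        if all(OPS[op](r[attr], val) for attr, op, val in where)
    ]
    ranked = sorted(selected, key=similarity, reverse=True)
    return [{a: r[a] for a in select} for r in ranked[:k]]


# Invented records, for illustration only.
cities = [
    {"Name": "Napoli", "Country": "Italy", "Population": 950_000},
    {"Name": "Salerno", "Country": "Italy", "Population": 130_000},
    {"Name": "Nice", "Country": "France", "Population": 340_000},
]
# Stand-in similarity: pretend smaller cities are "closer" to the query object.
print(top_k_query(cities, ["Name"], [("Country", "=", "Italy")],
                  similarity=lambda r: -r["Population"], k=1))
# [{'Name': 'Salerno'}]
```

Keeping scores out of the return value mirrors the black-box view of a DMS adopted in this approach.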

Querying DMSs

    interface city() {
        // standard attributes
        attribute string Name;
        attribute string Zip;
        attribute string Country;
        attribute integer Surface;
        attribute integer Population;
        // similarity attributes
        attribute Image Photo;
        attribute Text Description;
        attribute GeoCoord GeoPosition;
    }

    // query example
    SELECT Name
    FROM city
    WHERE Country = "Italy"
    ORDER BY Photo("http://www.flickr.com/32e324e.jpg"),
             GeoPosition(40.25, 14.32),
             Description("sea mozzarella pizza")
    LIMIT 100

This query searches among all Italian cities for those that best match the example image and the textual description, and that are as close as possible to the geographic point 40.25N, 14.32E.
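How a DMS orders results by geographic proximity is not specified here. One plausible choice, assumed purely for illustration, is great-circle distance computed with the haversine formula on a spherical Earth, with the nearest points ranked first. The helper names and the approximate city coordinates below are illustrative and not part of any DMS API.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def rank_by_distance(places: dict[str, tuple[float, float]],
                     query: tuple[float, float]) -> list[str]:
    """Place names ordered from nearest to farthest from the query point."""
    return sorted(places, key=lambda name: haversine_km(*places[name], *query))


# Approximate (lat, lon) coordinates, for illustration only.
cities = {
    "Milano": (45.46, 9.19),
    "Napoli": (40.85, 14.27),
    "Salerno": (40.68, 14.77),
    "Roma": (41.90, 12.50),
}
print(rank_by_distance(cities, (40.25, 14.32)))
# ['Salerno', 'Napoli', 'Roma', 'Milano']
```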

DMS: Assumptions

• Since we would like to build a general-purpose framework, we make the following assumptions:
– the way in which the returned objects are ordered is not known (black box);
– the DMS does not return scores indicating the relevance of the returned objects with respect to the query;
– if no ORDER BY clause is specified, the DMS returns the records in random order.

Representing the SPDO

• We build a conceptualization of a set of DMSs, composed of global classes, global attributes, and mappings between the SPDO and the DMS schemata.

Mapping

• The mapping is defined in a semi-automatic way, as follows:

– A Mapping Table (MT) is specified for each global class G; its columns represent the n local classes M1, …, Mn belonging to G, and its rows represent the h global attributes of G. Multimedia attributes can be mapped only onto global multimedia attributes of the same type.

– Join Conditions are defined between pairs of local classes belonging to G and allow the system to identify instances of the same real-world object in different sources.
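A mapping table can be sketched as a nested dictionary from each global attribute to the local attribute that each local class provides, together with a check for the typing rule above: a multimedia attribute may only be mapped onto a global multimedia attribute of the same type. The `MappingTable` class, the type names and the entries (a subset of the Hotel example) are hypothetical and are not MOMIS's internal representation.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Multimedia types of the sample scenario; any other type is a standard type.
MULTIMEDIA_TYPES = {"Image", "Text", "GeoCoord"}


@dataclass
class MappingTable:
    """Mapping table of one global class: rows are global attributes,
    columns are local classes, cells are local attribute names."""
    global_types: dict[str, str]             # global attribute -> type
    local_types: dict[str, dict[str, str]]   # local class -> {attribute: type}
    rows: dict[str, dict[str, str]] = field(default_factory=dict)

    def map(self, global_attr: str, local_class: str, local_attr: str) -> None:
        g_type = self.global_types[global_attr]
        l_type = self.local_types[local_class][local_attr]
        multimedia = g_type in MULTIMEDIA_TYPES or l_type in MULTIMEDIA_TYPES
        if multimedia and g_type != l_type:
            raise ValueError(f"{local_class}.{local_attr} ({l_type}) cannot be "
                             f"mapped onto {global_attr} ({g_type})")
        self.rows.setdefault(global_attr, {})[local_class] = local_attr

    def local_attr(self, global_attr: str, local_class: str) -> str | None:
        """Local attribute for a global one, or None if the class has none."""
        return self.rows.get(global_attr, {}).get(local_class)


mt = MappingTable(
    global_types={"name": "string", "stars": "number", "photo": "Image"},
    local_types={
        "resort": {"Name": "string", "Stars": "number", "Photo": "Image"},
        "hotel": {"denomination": "string", "img": "Image", "commentary": "Text"},
    },
)
mt.map("name", "resort", "Name")
mt.map("name", "hotel", "denomination")
mt.map("stars", "resort", "Stars")
mt.map("photo", "resort", "Photo")
mt.map("photo", "hotel", "img")
print(mt.local_attr("stars", "hotel"))  # None: hotel has no stars attribute
try:
    mt.map("photo", "hotel", "commentary")  # Text onto Image: rejected
except ValueError as err:
    print(err)
```

Empty cells are represented simply by absent keys, which is what lets a later step decide that a condition on stars has no counterpart in hotel.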

Example of mapping

Hotel (global) | resort | hotel
name (join) | Name | denomination
telephone | Telephone | tel
fax | Fax | fax
www | Web-site | www
room_num | Room_num | rooms
price (RF) | Price_avg | mean_price
city | City | location
stars | Stars | -
free_wifi | - | free_wifi
photo | Photo | img
description | Description | commentary

Mapping

– Resolution Functions are introduced to resolve conflicts between local attribute values associated with the same real-world object. In our framework we consider and implement some of these resolution functions, in particular PREFERRED, which takes the value from a preferred source, and RANDOM, which takes a random value.

– For multimedia attributes, we introduce a new resolution function, MOST_SIMILAR, which returns the multimedia object most similar to the one expressed in the query (if any).
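The three resolution functions can be sketched over the candidate values that different local sources supply for one attribute of the same real-world object. Because DMSs return no scores, this version of MOST_SIMILAR assumes that each candidate carries the rank its object obtained in its source's answer list and keeps the best-ranked value, in line with the rank-based similarity used elsewhere in this approach. The `Candidate` type, the function signatures and the example file names are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Candidate:
    source: str                  # local source that supplied the value
    value: Any                   # local attribute value, e.g. an image URL
    rank: Optional[int] = None   # 1-based position in that source's answer


def preferred(cands: list[Candidate], order: list[str]) -> Any:
    """PREFERRED: the value from the most preferred source that has one."""
    by_source = {c.source: c.value for c in cands if c.value is not None}
    for source in order:
        if source in by_source:
            return by_source[source]
    return None


def random_value(cands: list[Candidate], rng: random.Random) -> Any:
    """RANDOM: any one of the available values."""
    values = [c.value for c in cands if c.value is not None]
    return rng.choice(values) if values else None


def most_similar(cands: list[Candidate]) -> Any:
    """MOST_SIMILAR, rank-based: the value whose object was ranked best."""
    ranked = [c for c in cands if c.value is not None and c.rank is not None]
    return min(ranked, key=lambda c: c.rank).value if ranked else None


photos = [Candidate("resort", "resort/42.jpg", rank=3),
          Candidate("hotel", "hotel/7.jpg", rank=1)]
print(preferred(photos, order=["resort", "hotel"]))  # resort/42.jpg
print(most_similar(photos))                          # hotel/7.jpg
print(random_value(photos, random.Random(0)))        # either URL
```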

Query the SPDO

• Given a global class G with m attributes, of which k are multimedia attributes, denoted by G.S1, …, G.Sk (such as photo and description in the class Hotel), and h are standard attributes, denoted by G.A1, …, G.Ah, a query on G (global query) is a conjunctive query, expressed in a simple abstract SQL-like syntax as:

    SELECT G.Al, …, G.Sj
    FROM G
    WHERE G.Ax op1 val1
    AND G.Ay op2 val2
    ...
    ORDER BY G.Sw(Q1), …, G.Sz(Q2)
    LIMIT K

Query unfolding

• To answer a global query on G, the query must be rewritten as an equivalent set of queries (local queries) expressed on the local classes L(G) belonging to G.

• The query rewriting is performed by means of query unfolding, which consists of the following four steps:
  1. Computation of local query conditions
  2. Computation of residual conditions
  3. Fusion of local answers
  4. Application of the residual conditions
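Steps 1 and 2 can be sketched on a subset of the Hotel mapping. A predicate on an attribute without a resolution function is rewritten with local attribute names for every local class that maps it; a predicate on an attribute reconciled by a resolution function (here price) is not pushed down and is kept as a residual condition for the global level. Treating "non-homogeneous" as "reconciled by a resolution function" is an assumption of this sketch, and the dictionary layout and `unfold_conditions` are hypothetical simplifications of the unfolding described in these slides.

```python
from typing import Any

Predicate = tuple[str, str, Any]  # (attribute, operator, value)

# Subset of the Hotel mapping: global attribute -> {local class: local attribute}.
MAPPING: dict[str, dict[str, str]] = {
    "name": {"resort": "Name", "hotel": "denomination"},
    "city": {"resort": "City", "hotel": "location"},
    "stars": {"resort": "Stars"},
    "free_wifi": {"hotel": "free_wifi"},
    "price": {"resort": "Price_avg", "hotel": "mean_price"},
}
# Global attributes reconciled by a resolution function (assumed non-homogeneous).
WITH_RESOLUTION = {"price"}


def unfold_conditions(
    preds: list[Predicate],
) -> tuple[dict[str, list[Predicate]], list[Predicate]]:
    """Steps 1-2: per-class local conditions and global residual conditions."""
    local: dict[str, list[Predicate]] = {c: [] for m in MAPPING.values() for c in m}
    residual: list[Predicate] = []
    for attr, op, val in preds:
        if attr in WITH_RESOLUTION:
            residual.append((attr, op, val))  # evaluated after fusion
            continue
        for cls, local_attr in MAPPING.get(attr, {}).items():
            local[cls].append((local_attr, op, val))  # rewritten for this class
    return local, residual


local, residual = unfold_conditions(
    [("city", "LIKE", "%Modena%"), ("stars", "=", 3), ("price", "<", 200)]
)
print(local)
# {'resort': [('City', 'LIKE', '%Modena%'), ('Stars', '=', 3)],
#  'hotel': [('location', 'LIKE', '%Modena%')]}
print(residual)  # [('price', '<', 200)]
```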

Query Fusion: Ranking

• Why?
– Modern multimedia content managers typically return multimedia objects (i.e., those retrieved by similarity) in decreasing order of relevance, so that the “best” answers are at the top;

– we want to preserve this knowledge at the global level;

– however, since we cannot exploit scores, we use the rank as an indicator of the relevance of each returned record.

Ranking the results

• Our problem falls into the category of partial rank aggregation problems, in which we merge top-k lists rather than fully ranked lists.

• We use a simple yet effective aggregation function for ordinal ranks, the median function:
– the score of an object is its median position across all the returned lists.

• Fagin et al. showed that the median function is near-optimal, even for top-k or partial lists.

• The MEDRANK algorithm is based on median rank aggregation.
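Median rank aggregation itself is short to write down: each object's score is its median position over the returned lists, and objects are sorted by that score. The rankings are the R1, R2, R3 lists of the MedRank example. How to position an object that is missing from a top-k list is an assumption here (position k+1 in that list), as is breaking ties alphabetically; `median_rank` is a hypothetical name.

```python
from statistics import median


def median_rank(rankings: list[list[str]]) -> list[str]:
    """Order objects by their median (1-based) position over top-k lists.

    Assumption: an object missing from a list of length k is placed at
    position k + 1 in that list. Ties are broken alphabetically.
    """
    objects = {obj for ranking in rankings for obj in ranking}

    def score(obj: str) -> float:
        positions = [r.index(obj) + 1 if obj in r else len(r) + 1 for r in rankings]
        return median(positions)

    return sorted(objects, key=lambda obj: (score(obj), obj))


R1 = ["A", "B", "C", "D"]
R2 = ["B", "A", "D", "C"]
R3 = ["B", "C", "A", "D"]
print(median_rank([R1, R2, R3]))  # ['B', 'A', 'C', 'D']
```

This version needs every list in full before it can sort; the sequential MedRank procedure reaches the same order while reading only as deep as necessary.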

The MedRank algorithm

Pos | R1 | R2 | R3 | R
1   | A  | B  | B  |
2   | B  | A  | C  |
3   | C  | D  | A  |
4   | D  | C  | D  |

• Access the rankings sequentially: when an element has appeared in more than half of the rankings, output it in the aggregated ranking.

The MedRank algorithm

Depth 1 reads A (R1), B (R2) and B (R3): B has now appeared in 2 of the 3 rankings, more than half, so it is output first.

Pos | R1 | R2 | R3 | R
1   | A  | B  | B  | B
2   | B  | A  | C  |
3   | C  | D  | A  |
4   | D  | C  | D  |

The MedRank algorithm

Depth 2 reads B (R1), A (R2) and C (R3): A has now appeared in R1 and R2, so it is output second; C has appeared only once so far.

Pos | R1 | R2 | R3 | R
1   | A  | B  | B  | B
2   | B  | A  | C  | A
3   | C  | D  | A  |
4   | D  | C  | D  |

The MedRank algorithm

Depth 3 reads C (R1), D (R2) and A (R3): C has now appeared in R3 and R1, so it is output third.

Pos | R1 | R2 | R3 | R
1   | A  | B  | B  | B
2   | B  | A  | C  | A
3   | C  | D  | A  | C
4   | D  | C  | D  |

The MedRank algorithm

Depth 4 reads D (R1), C (R2) and D (R3): D has now appeared in R2 and R1, so it is output last. The aggregated ranking is R = B, A, C, D.

Pos | R1 | R2 | R3 | R
1   | A  | B  | B  | B
2   | B  | A  | C  | A
3   | C  | D  | A  | C
4   | D  | C  | D  | D
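The sequential procedure can be sketched as follows: read all rankings one depth at a time, count how many rankings each element has appeared in, and output an element as soon as its count exceeds half the number of rankings. When several elements cross the threshold at the same depth, this sketch outputs them in the order it reads them, and the optional `k` stops after k outputs; both choices are assumptions rather than details from the slides, and `medrank` is a hypothetical name.

```python
from collections import Counter
from typing import Optional


def medrank(rankings: list[list[str]], k: Optional[int] = None) -> list[str]:
    """Aggregate rankings by sequential access (MedRank-style).

    All rankings are read one depth at a time; an element is output as soon
    as it has appeared in more than half of the rankings. Stops after k
    outputs if k is given, otherwise when every ranking is exhausted.
    """
    threshold = len(rankings) / 2
    seen = Counter()          # element -> number of rankings it appeared in
    output: list[str] = []
    for depth in range(max(len(r) for r in rankings)):
        for ranking in rankings:
            if depth >= len(ranking):
                continue      # shorter (top-k) list already exhausted
            elem = ranking[depth]
            seen[elem] += 1
            if seen[elem] > threshold and elem not in output:
                output.append(elem)
                if k is not None and len(output) == k:
                    return output
    return output


R1 = ["A", "B", "C", "D"]
R2 = ["B", "A", "D", "C"]
R3 = ["B", "C", "A", "D"]
print(medrank([R1, R2, R3]))       # ['B', 'A', 'C', 'D']
print(medrank([R1, R2, R3], k=2))  # ['B', 'A']
```

With `k` set, the procedure can stop early without reading the lists to the end, which matters when each source only returns a top-k answer.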

Example

• We would like to find images of the “Arch of Triumph of Rome by night”.

• We assume two DMSs containing images of monuments around the world: DMS1, with geographic-coordinate search capabilities, and DMS2, with image similarity search capabilities.

Example

G (global) | DMS1 | DMS2
URL (join) | url | url_address
Subject | subject | type
Img | - | img
GeoCoord | GeoCoord | -

    SELECT …
    FROM DMS1
    WHERE subject = "Monument"
    ORDER BY GeoCoord(41°53'43.68"N, 12°28'56.34"E)
    LIMIT 5

[Figure: top-5 answers of DMS1, shown under the global ordering ORDER BY Image(URL), GeoCoord(41°53'43.68"N, 12°28'56.34"E); distance labels 1 km, 1 km, 1 km, 1 km and 2 km; figure caption “Roma. Palazzo della Civiltà del Lavoro. EUR”.]

Unfortunately, if I search only by geographic coordinates, giving the coordinates of Rome as input, I find many images of the Colosseum.

    SELECT …
    FROM DMS2
    WHERE type = "Monument"
    ORDER BY Img(URL)
    LIMIT 5

[Figure: top-5 answers of DMS2 ordered by Image(URL); figure caption “Roma. Palazzo della Civiltà del Lavoro. EUR”.]

And if I search only by similarity to an image of the “Arch of Triumph of Rome by night”, I find many images of the Arch of Triumph in Paris, which is very similar but more famous.

    SELECT …
    FROM WorldMonuments
    WHERE Subject = "Monument"
    ORDER BY Img(URL), GeoCoord(41°53'43.68"N, 12°28'56.34"E)
    LIMIT 5

[Figure: answers to the global query, in two panels labeled ORDER BY Image(URL), GeoCoord(41°53'43.68"N, 12°28'56.34"E) and ORDER BY Image(URL); distance labels 1 km, 1 km, 1 km, 1 km and 2 km; figure caption “Roma. Palazzo della Civiltà del Lavoro. EUR”.]

Conclusion and future work

• We presented a methodology, implemented in a tool, that allows a user to create and query an integrated view of data and multimedia sources.

• Future work will be devoted to experimenting with the tool in real scenarios. In particular, our tool will be used to integrate business catalogs related to the area of “tiles”.
– We think that such data may provide useful test cases because of the need to connect data about the features of the tiles with their images.

THE END

Building the Data Ontology: MOMIS

MOMIS* (Mediator envirOnment for Multiple Information Sources) is a framework for information extraction and integration of heterogeneous structured and semistructured data sources.

Semantic Integration of Information
• A common data model, ODLI3 (derived from ODL-ODMG and I3), mapped into the OLCD description logics

Tool-supported techniques to construct the Global Virtual View (GVV)
• Local sources wrapping
• Local schema annotation w.r.t. a common lexical ontology (WordNet)
• Semi-automatic discovery of relationships between local schemata
• Clustering techniques to build the GVV and the mappings between the GVV and the local schemata (Mapping Table)
• Automatic GVV annotation w.r.t. a common lexical ontology, and OWL exportation

Global Query Management
• Including services and multimedia data sources

25/03/2008 32

* References:

D. Beneventano, S. Bergamaschi, F. Guerra, M. Vincini, "Synthesizing an Integrated Ontology", IEEE Internet Computing Magazine, September-October 2003, pp. 42-51.

S. Bergamaschi, S. Castano, M. Vincini, "Semantic Integration of Semistructured and Structured Data Sources", SIGMOD Record, Special Issue on Semantic Interoperability in Global Information, Vol. 28, No. 1, March 1999.

MOMIS architecture

[Figure: MOMIS architecture. Local sources are wrapped into ODLI3 local schemata (local schema 1 to N) and annotated with synsets through manual and semi-automatic annotation; common thesaurus generation combines schema-derived, lexicon-derived, user-supplied and inferred relationships into the Common Thesaurus; GVV generation then produces the global classes and the mapping tables.]

Mapping definition in MOMIS

Mappings between a global class G of the GVV and its local classes are represented by a Mapping Table.

Global-as-View (GAV) mappings: for each global class G, a view VG over the local classes of G is defined by a Full-Join Merge operator:
• Outer Join: to include in the result all tuples of all local sources
• Merge: to perform data reconciliation (Resolution Functions)

Hotel (global) | L1.resort | L2.Hotel
name (join) | Name | denomination
city | City | location
stars | Stars | -
free_wifi | - | free_wifi
price | Price_avg | mean_price

Building the Mappings: an example

Mapping Table of the global class Hotel = {L1.resort, L2.hotel}

Hotel (global) | L1.resort | L2.hotel
name (join) | Name | denomination
city | City | location
stars | Stars | -
free_wifi | - | free_wifi
price | Price_avg | mean_price

FullJoinMerge:

    Select name, avg(T_L1.price_avg, T_L2.mean_price) as price, T_L1.Stars, …
    from T_L1 outer join T_L2
    on (T_L1.Name = T_L2.denomination)

• FullJoin: T_L1 outer join T_L2
• Join Conditions: on the join attribute, T_L1.Name = T_L2.denomination
• Resolution Functions: avg(L1, L2)
• Data Conversion Functions: DollarEuro(mean_price)
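The Full-Join Merge of this example can be sketched in plain Python: a full outer join of the two local answers on T_L1.Name = T_L2.denomination, followed by the avg resolution function for price. Several details are assumptions: `dollar_euro` uses a made-up placeholder rate and is applied to mean_price before averaging, avg ignores missing values, join values are unique within each source, and the hotel rows are invented. The last lines apply a residual condition (price < 200) only after resolution, as in the Full-Join Merge optimization example of these slides.

```python
from statistics import mean
from typing import Any, Optional

Row = dict[str, Any]


def dollar_euro(amount: Optional[float], rate: float = 0.9) -> Optional[float]:
    """Hypothetical stand-in for DollarEuro; the rate is a placeholder."""
    return None if amount is None else amount * rate


def avg(*values: Optional[float]) -> Optional[float]:
    """Resolution function: mean of the non-null local values."""
    present = [v for v in values if v is not None]
    return mean(present) if present else None


def merge(r1: Row, r2: Row) -> Row:
    """Build one global Hotel tuple from a resort row and/or a hotel row."""
    return {
        "name": r1.get("Name", r2.get("denomination")),
        "price": avg(r1.get("Price_avg"), dollar_euro(r2.get("mean_price"))),
        "stars": r1.get("Stars"),
        "free_wifi": r2.get("free_wifi"),
    }


def full_join_merge(l1: list[Row], l2: list[Row]) -> list[Row]:
    """T_L1 FULL OUTER JOIN T_L2 ON T_L1.Name = T_L2.denomination, then merge.
    Assumes the join attribute is unique within each source."""
    l2_by_name = {r["denomination"]: r for r in l2}
    l1_names = {r["Name"] for r in l1}
    matched_or_left = [merge(r1, l2_by_name.get(r1["Name"], {})) for r1 in l1]
    right_only = [merge({}, r2) for r2 in l2 if r2["denomination"] not in l1_names]
    return matched_or_left + right_only


resort = [{"Name": "Hotel A", "Price_avg": 180.0, "Stars": 4},
          {"Name": "Hotel B", "Price_avg": 90.0, "Stars": 3}]
hotel = [{"denomination": "Hotel A", "mean_price": 200.0, "free_wifi": True},
         {"denomination": "Hotel C", "mean_price": 250.0, "free_wifi": False}]
rows = full_join_merge(resort, hotel)
# Residual condition, checked only after the resolution function: price < 200
cheap = [r["name"] for r in rows if r["price"] is not None and r["price"] < 200]
print(cheap)  # ['Hotel A', 'Hotel B']
```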

Global Query Management

The querying problem: how can queries expressed on the global schema (GS), i.e. global queries, be answered?

In a virtual data integration system, data reside at the data sources, so query processing is based on query rewriting: a global query is rewritten as an equivalent set of queries expressed on the local schemata of the data sources (local queries).

GAV approach: query rewriting is performed by unfolding, i.e. by expanding a global query on G according to the view associated with G.

Query Optimization Techniques for the Full-Join Merge Operator

Motivation:

1. Full outer join queries are very expensive, especially in a distributed environment.

2. Only limited optimization is performed on full outer joins.

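An example of Full-Join Merge Optimization

Global query:
    SELECT * FROM G WHERE city LIKE "%Modena%" AND price < 200

Local queries:
    LQ1 = SELECT * FROM L1 WHERE City LIKE "%Modena%"
    LQ2 = SELECT * FROM L2 WHERE location LIKE "%Modena%"

Evaluation:
1. LQ1 FULL JOIN LQ2
2. Apply resolution functions: price = AVG()
3. Apply residual constraints: price < 200
4. Result

If the global query also contains AND stars = 4, the condition appears in a local query as well, and the FULL JOIN is reduced to a RIGHT JOIN; if it further contains AND free_wifi = true, the join is reduced to an INNER JOIN. This is possible because each of these attributes is provided by only one of the two sources, so tuples missing from that source cannot satisfy the condition.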

MILOS

[Figure: MILOS architecture, with the following components]
• Metadata Editor: Visual Basic (SOAP communication)
• Repository Metadata Integrator: access to documents, access to metadata, metadata independence (SOAP Web Service)
• Multimedia document server: allows homogeneous access to heterogeneous media (SOAP Web Service)
• XML Search Engine: structure search, fielded search, full-text search, multimedia search, schema independence, XQuery support (SOAP Web Service)
• Retrieval Interface: JSP (SOAP communication)

Metadata independence: the schema seen in the interface logic can be different from the one(s) used in the repository.

MILOS (2)

• The MILOS system is based on a three-tier distributed architecture:

– Client tier: the topmost level of the system. It contains the client applications that interact with MILOS and display results to users.

– Business logic: manages query processing by integrating and aligning information stored in the databases, and reconciles retrieved data by managing ranking.

– Data tier: composed of the Large Object Database, which physically stores the multimedia documents managed by the system, and the metadata database, where all metadata associated with the multimedia items are stored.

• Multimedia metadata are represented in the data tier in XML format. MILOS adopts a native XML database, which supports XML query language standards and offers advanced search and indexing functionality on arbitrary XML documents.

• The MILOS XML database provides full-text search, automatic classification, and feature similarity search functionalities.

• The Large Object Database allows MILOS clients to handle multimedia in a uniform way.

The MedRank algorithm

• When there are multiple multimedia attributes, unexpected side effects can affect the precision of the answer.

• Example:
– Suppose we have two image databases of monument images:
  • MS1: provides image similarity and geographic coordinates
  • MS2: provides only image similarity
– The query consists of a sample image and the coordinates of a point.

    SELECT …
    FROM WorldMonuments
    ORDER BY Image(URL), GeoCoord(41°53'43.68"N, 12°28'56.34"E)
    LIMIT 5

[Figure: answers in two panels labeled ORDER BY Image(URL), GeoCoord(41°53'43.68"N, 12°28'56.34"E) and ORDER BY Image(URL); distance labels 1 km, 1 km, 1 km, 1 km and 2 km; figure caption “Roma. Palazzo della Civiltà del Lavoro. EUR”.]

DMS: Assumptions

• The rationale of the DMS assumptions stated earlier is that we aim to work in a general environment with heterogeneous DMSs whose scoring functions we know nothing about.
– The motivation is that the final scores are often the result of combining the scores of each attribute. A scoring function is therefore usually defined as an aggregation over partial heterogeneous scores (e.g., the relevance for text-based IR with keyword queries, or similarity degrees for color and texture of images in a multimedia database).
– Even in the simpler case of a single multimedia attribute, scores become meaningless outside the context in which they are evaluated. For example, consider the TF*IDF scoring function used by ordinary text search engines: the score of a document depends on the collection statistics, and different search engines may use different scoring algorithms.

• However, treating a local DMS as a black box that returns no scores with its results does not presume that local DMSs do not internally use scoring functions to combine different multimedia attributes.
– Modern multimedia systems typically use fuzzy logic to aggregate the scores of different multimedia attributes, graded in the interval [0,1]. Classical examples of such functions are min and mean.
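To illustrate the fuzzy aggregation mentioned above, assume each multimedia attribute yields a grade in [0,1] for an object; min and the arithmetic mean then combine the grades into an overall score. The attribute names and grades are invented. With these values the two functions rank the same objects in opposite order, which illustrates why an internal score or ranking is hard to interpret without knowing the function that produced it.

```python
from statistics import mean

# Invented per-attribute grades in [0, 1] for two objects.
GRADES = {
    "obj1": {"image": 0.9, "geo": 0.3},
    "obj2": {"image": 0.6, "geo": 0.5},
}


def aggregate(grades: dict[str, float], how: str) -> float:
    """Combine the grades of one object with a fuzzy aggregation function."""
    values = list(grades.values())
    if how == "min":   # fuzzy AND: an object is only as good as its worst grade
        return min(values)
    if how == "mean":  # arithmetic mean of the grades
        return mean(values)
    raise ValueError(f"unknown aggregation function: {how}")


def rank(how: str) -> list[str]:
    """Objects ordered by aggregated score, best first."""
    return sorted(GRADES, key=lambda obj: aggregate(GRADES[obj], how), reverse=True)


print(rank("min"))   # ['obj2', 'obj1']
print(rank("mean"))  # ['obj1', 'obj2']
```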

Computation of Local Query conditions

• Each atomic predicate Pi and each similarity predicate in the global query is rewritten into the corresponding constraints supported by the local classes.

• For example, the constraint stars = 3 is translated into the constraint Stars = 3 for the local class resort, and is not translated into any constraint for the local class hotel, which does not map stars.

Computation of Residual Conditions

• Conditions on non-homogeneous standard attributes cannot be translated into local conditions: they are considered residual and have to be solved at the global level.

Computation of Residual Conditions

• For multimedia attributes we use MOST_SIMILAR. For example, suppose we are searching for images similar to one specified in the query by means of the ORDER BY clause. If we retrieve two or more multimedia objects (for the same real-world object), each with a corresponding image, the MOST_SIMILAR function simply selects the image that is most similar to the query image.

• However, since we do not know the scores, how do we evaluate similarity?

Computation of Residual Conditions

• Rank-based similarity:
– we simply use the rank of the objects in the returned lists as an indicator of the similarity between the attribute values belonging to the objects.

• This aspect is related to the problem of fusing the local answers.

Fusion of local answers

• For each local source involved in the global query, a local query is generated and executed on that source. The local answers are fused into the global answer on the basis of the mapping query qG defined for G, i.e. by using the full outer join-merge (FOJ) operation:
– Computation of the full outer join of the local answers (FOJ). The result of this operation is ordered on the basis of the multimedia attributes specified in the query; this aspect is examined in detail on the next slide.
– Application of the Resolution Functions: for each attribute GA of the global query, the related Resolution Function is applied to FOJ.

Ranking the results

• In principle, if we had ALL the (fused) records of the result set, we could exploit an optimal rank aggregation method based on a distance measure that quantifies the disagreements among different rankings.

• In this setting, the overall ranking is the one with minimum distance to the rankings obtained from the different sources.

• Several distance measures are available in the literature. However, the difficulty of distance-based rank aggregation depends on the choice of the distance measure and on its complexity, which can even be NP-hard in some cases (see the Kendall distance).

• Fortunately, our case falls into the category of partial rank aggregation problems, in which we measure the distance only between top-k lists rather than between fully ranked lists.
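For full rankings, the Kendall distance counts the object pairs that two rankings order differently, and the distance-based aggregate is the ranking with minimum total distance from the inputs. The sketch below finds that optimum by brute force over all permutations of the R1, R2, R3 example, which is only feasible for a handful of objects; with the Kendall distance, the general problem is the one described above as NP-hard. Function names are hypothetical.

```python
from itertools import combinations, permutations


def kendall_distance(r1: list[str], r2: list[str]) -> int:
    """Number of object pairs ordered differently by two full rankings."""
    pos1 = {obj: i for i, obj in enumerate(r1)}
    pos2 = {obj: i for i, obj in enumerate(r2)}
    return sum(
        1
        for a, b in combinations(r1, 2)
        if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0
    )


def total_distance(candidate: list[str], rankings: list[list[str]]) -> int:
    """Sum of Kendall distances from a candidate aggregate to all inputs."""
    return sum(kendall_distance(candidate, r) for r in rankings)


R1 = ["A", "B", "C", "D"]
R2 = ["B", "A", "D", "C"]
R3 = ["B", "C", "A", "D"]
rankings = [R1, R2, R3]

# Brute force over all 4! = 24 orderings; infeasible beyond small inputs.
best = list(min(permutations(R1), key=lambda p: total_distance(list(p), rankings)))
print(best, total_distance(best, rankings))  # ['B', 'A', 'C', 'D'] 3
```

On this small example the distance-optimal ranking coincides with the median-based MedRank result.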

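Example (1)

Pos | R1 | R2 | R3 | R
1   | A  | B  | B  | B
2   | B  | A  | C  | A
3   | C  | D  | A  | C
4   | D  | C  | D  | D

Sorted positions of each element in R1, R2 and R3:
A: (1, 2, 3)
B: (1, 1, 2)
C: (2, 3, 4)
D: (3, 4, 4)

The medians (A: 2, B: 1, C: 3, D: 4) give the aggregated ranking R = B, A, C, D.

(1) http://www.cs.helsinki.fi/u/tsaparas/InformationNetworks/lectures/lecture10.ppt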

Combining rankings

• In many cases the scores are not known
– e.g. meta-search engines: scores are proprietary information

• … or we do not know how they were obtained
– one search engine returns a score of 10, the other 100. What does this mean?

• … or the scores are incompatible
– apples and oranges: does it make sense to combine price with distance?

• In these cases we can only work with the rankings.
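Working only with rankings means reducing each source's answer to an order and discarding the score values. The engines, documents and score scales below are invented for illustration; once converted, the lists can be passed to a rank aggregation method such as the median.

```python
def to_ranking(scores: dict[str, float]) -> list[str]:
    """Reduce one source's scored answer to a ranking, best first."""
    return sorted(scores, key=lambda doc: scores[doc], reverse=True)


engine1 = {"d1": 7.5, "d2": 9.0, "d3": 2.0}     # invented scores, 0-10 scale
engine2 = {"d1": 80.0, "d2": 40.0, "d3": 95.0}  # invented scores, 0-100 scale
print(to_ranking(engine1))  # ['d2', 'd1', 'd3']
print(to_ranking(engine2))  # ['d3', 'd1', 'd2']
```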