Claudio Gennaro, ISDSI 2009
Query Processing in a Mediator System for Data and Multimedia
D. Beneventano (1), C. Gennaro (2), M. Mordacchini (2), R. Carlos Nana Mbinkeu (1)
(1) DII - Università di Modena e Reggio Emilia, via Vignolese 905, Modena, Italy
(2) ISTI – CNR, via Moruzzi 1, Pisa, Italy
Outline
1. Motivation
2. The system and scenario overview
3. Querying an ontology of data and multimedia sources
  • Mapping
  • Query unfolding for multimedia conditions
  • Ranking
4. Conclusion and future work
Motivation
• We proposed a method for building a populated domain ontology representative of a set of web data sources.
• The method exploits the capabilities of a mediator system (MOMIS) to create an integrated view of a set of data sources, i.e. a domain ontology schema, and a set of annotations linking data to the integrated view.
• We extend that approach to multimedia sources, thus obtaining a methodology for building and querying an ontology representing data and multimedia sources.
1. There are several use cases where applications interact with ontologies of data and multimedia sources.
2. Multimedia and data sources are usually represented with different models. No standard for representing data and multimedia sources at the same time has been adopted by large communities.
3. Different languages and interfaces have been developed for querying "traditional" and "multimedia" data sources. The former rely on expressive languages that allow selection clauses to be expressed; the latter typically implement similarity search techniques for retrieving multimedia documents similar to the ones provided by the user.
Managing a Semantic Peer: MOMIS + MILOS
NeP4B Semantic Peer
• Provides unified access to different data sources referring to the same domain by means of a Semantic Peer Data Ontology (SPDO), i.e. a common representation of all the data sources belonging to the peer.
• MOMIS (Mediator envirOnment for Multiple Information Sources) is a framework for information extraction and integration of heterogeneous, structured and semistructured data sources.
• MILOS is a general-purpose Multimedia Content Management System:
  • Manages and serves any multimedia documents
  • Manages any metadata of documents
Data and Multimedia Sources (DMSs)
• A Data and Multimedia Source (DMS) is an object-oriented database of metadata objects describing a collection of multimedia documents (such as images, videos, etc.), represented with a schema defined in ODLI3.
• The DMS schema includes, in general, a set of standard attributes declared using standard predefined ODLI3 types (string, double, integer, etc.), supporting the selection predicates typical of structured and semi-structured data, such as =, <, >, ...
• In addition, the schema includes a set of multimedia attributes: special attributes, declared by means of special predefined ODLI3 classes, which support similarity-based searches (full-text search, image similarity, geographical search, etc.).
A sample scenario
CITY
  Name: string
  Zip: string
  Country: string
  Surface: number
  Population: number
  Photo: Image
  Description: Text
  Location: GeoCoord
HOTEL
  Name: string
  Telephone: string
  Fax: string
  Address: string
  www: string
  Room_num: number
  Price: number
  City: string [FK]
  Stars: number
  Free_wifi: boolean
  Photo: Image
  Description: Text
RESTAURANT
  Name: string
  Address: string
  City: string [FK]
  Email: string
  Telephone: string
  Holiday: string
  Web-site: string
  Speciality: string
  Rank: number
  Price_avg: number
EVENT
  Name: string
  Category: string
  Details: string
  Price: number
  City: string [FK]
  Poster: Image
Querying DMSs
• A DMS Mi can be queried using an SQL-like SELECT statement extended with similarity predicates. The WHERE clause consists of a conjunctive combination of predicates on the single standard attributes of Mi, as in the following:
SELECT Mi.Ak,…, Mi.Sl,…
FROM Mi
WHERE Mi.Ax op1 val1
AND Mi.Ay op2 val2
...
ORDER BY Mi.Sw(Q1), Mi.Sz(Q2),…
LIMIT K
• ORDER BY together with LIMIT K specifies, in practice, a top-k similarity query.
Querying DMSs
interface city() {
  // standard attributes
  attribute string Name;
  attribute string Zip;
  attribute string Country;
  attribute integer Surface;
  attribute integer Population;
  // similarity attributes
  attribute Image Photo;
  attribute Text Description;
  attribute GeoCoord GeoPosition;
}
// query example
SELECT Name
FROM city
WHERE Country = "Italy"
ORDER BY Photo("http://www.flickr.com/32e324e.jpg"),
  GeoPosition(40.25, 14.32),
  Description("sea mozzarella pizza")
LIMIT 100
This query looks, among all Italian cities, for the ones that best match the example image and the textual description and that are as near as possible to the geographical point 40.25N, 14.32E.
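The shape of such a top-k query can be sketched in Python. This is only an illustrative helper, not part of MOMIS or MILOS: the class name DmsQuery and its fields are hypothetical, and values are assumed to be passed as already-quoted literals.

```python
from dataclasses import dataclass, field


@dataclass
class DmsQuery:
    """Hypothetical container for the parts of a DMS top-k query."""
    source: str
    select: list
    conditions: list = field(default_factory=list)  # (attribute, op, literal)
    similarity: list = field(default_factory=list)  # (attribute, query literal)
    limit: int = 10

    def to_sql(self) -> str:
        # SELECT and FROM are always present.
        lines = ["SELECT " + ", ".join(self.select), "FROM " + self.source]
        # Standard attributes: conjunctive selection predicates.
        if self.conditions:
            preds = [f"{attr} {op} {val}" for attr, op, val in self.conditions]
            lines.append("WHERE " + " AND ".join(preds))
        # Multimedia attributes: similarity predicates in ORDER BY.
        if self.similarity:
            sims = [f"{attr}({q})" for attr, q in self.similarity]
            lines.append("ORDER BY " + ", ".join(sims))
        lines.append(f"LIMIT {self.limit}")
        return "\n".join(lines)


query = DmsQuery(
    source="city",
    select=["Name"],
    conditions=[("Country", "=", '"Italy"')],
    similarity=[
        ("Photo", '"http://www.flickr.com/32e324e.jpg"'),
        ("GeoPosition", "40.25, 14.32"),
        ("Description", '"sea mozzarella pizza"'),
    ],
    limit=100,
)
print(query.to_sql())
```

The split between conditions and similarity mirrors the split between standard and multimedia attributes of the DMS schema.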
DMS: Assumptions
• Since we would like to build a general-purpose framework, we make the following assumptions:
  • The way in which the returned objects are ordered is not known (black box);
  • The DMS does not return scores indicating the relevance of the returned objects with respect to the query;
  • If no ORDER BY clause is specified, the DMS returns the records in random order.
Representing the SPDO
• We build a conceptualization of a set of DMSs, composed of global classes and global attributes, together with mappings between the SPDO and the DMS schemata.
Mapping
• The mapping query is defined in a semi-automatic way as follows:
  – A Mapping Table (MT) is specified for each global class G; its columns represent the n local classes M1,…,Mn belonging to G and its rows represent the h global attributes of G. Multimedia attributes can be mapped only onto global multimedia attributes of the same type.
  – Join Conditions are defined between pairs of local classes belonging to G and allow the system to identify instances of the same real-world object in different sources.
Example of mapping
Hotel (global)   resort        hotel
name (join)      Name          denomination
telephone        Telephone     tel
fax              Fax           fax
www              Web-site      www
room_num         Room_num      rooms
price (RF)       Price_avg     mean_price
city             City          location
stars            Stars         –
free_wifi        –             free_wifi
photo            Photo         img
description      Description   commentary
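As a rough illustration, the mapping table of the global class Hotel can be held in a plain Python dictionary. The variable and function names below are hypothetical; only the attribute names come from the slide, and None stands for the "–" cells.

```python
# Mapping table of the global class Hotel, transcribed from the slide.
# None marks a global attribute with no counterpart in that local class.
MAPPING_TABLE = {
    "name":        {"resort": "Name",        "hotel": "denomination"},
    "telephone":   {"resort": "Telephone",   "hotel": "tel"},
    "fax":         {"resort": "Fax",         "hotel": "fax"},
    "www":         {"resort": "Web-site",    "hotel": "www"},
    "room_num":    {"resort": "Room_num",    "hotel": "rooms"},
    "price":       {"resort": "Price_avg",   "hotel": "mean_price"},
    "city":        {"resort": "City",        "hotel": "location"},
    "stars":       {"resort": "Stars",       "hotel": None},
    "free_wifi":   {"resort": None,          "hotel": "free_wifi"},
    "photo":       {"resort": "Photo",       "hotel": "img"},
    "description": {"resort": "Description", "hotel": "commentary"},
}


def local_attribute(global_attr, local_class):
    """Return the local attribute mapped to a global attribute, or None."""
    return MAPPING_TABLE[global_attr][local_class]


def local_classes_for(global_attr):
    """Return the local classes that can answer for a global attribute."""
    return [cls for cls, attr in MAPPING_TABLE[global_attr].items()
            if attr is not None]


print(local_attribute("price", "hotel"))  # local name of the global price
print(local_classes_for("stars"))         # classes that expose stars
```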
Mapping
  – Resolution Functions are introduced to solve data conflicts among local attribute values associated with the same real-world object. Our framework implements some of these resolution functions, in particular the PREFERRED function, which takes the value of a preferred source, and the RANDOM function, which takes a random value.
  – For multimedia attributes, we introduce a new resolution function, called MOST_SIMILAR, which returns the multimedia objects most similar to the one expressed in the query (if any).
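A minimal Python sketch of the two standard resolution functions follows. The function names and the dictionary-of-sources input are assumptions made for illustration; in particular, the fallback used by preferred when the preferred source has no value is not stated on the slide.

```python
import random


def preferred(values_by_source, preferred_source):
    """PREFERRED: take the value coming from the preferred source."""
    value = values_by_source.get(preferred_source)
    if value is not None:
        return value
    # Assumption (not on the slide): fall back to the first source
    # that actually has a value.
    for other in values_by_source.values():
        if other is not None:
            return other
    return None


def random_value(values_by_source, rng=random):
    """RANDOM: take one of the available values at random."""
    candidates = [v for v in values_by_source.values() if v is not None]
    return rng.choice(candidates) if candidates else None


# Two local sources disagree on the price of the same hotel.
prices = {"resort": 120, "hotel": 100}
print(preferred(prices, "hotel"))
print(random_value(prices))
```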
Query the SPDO
• Given a global class G with m attributes, of which k are multimedia attributes, denoted by G.S1,…,G.Sk (such as photo and description in the class Hotel), and h are standard attributes, denoted by G.A1,…,G.Ah, a query on G (global query) is a conjunctive query, expressed in a simple abstract SQL-like syntax as:
SELECT G.Al,…,G.Sj
FROM G
WHERE G.Ax op1 val1
AND G.Ay op2 val2
...
ORDER BY G.Sw(Q1), …, G.Sz(Q2)
LIMIT K
Query unfolding
• To answer a global query on G, the query must be rewritten as an equivalent set of queries (local queries) expressed on the local classes L(G) belonging to G.
• The query rewriting is performed by means of query unfolding, which consists of the following four steps:
  1. Computation of Local Query conditions
  2. Computation of Residual Conditions
  3. Fusion of local answers
  4. Application of the Residual Condition
Query Fusion: Ranking
• Why?
  – Modern multimedia content managers (i.e., those supporting similarity search) typically return multimedia objects in decreasing order of relevance, so that the "best" answers are at the top;
  – We want to preserve this knowledge at the global level;
  – However, since we cannot exploit scores, we use the rank as an indicator of the relevance of each returned record.
Ranking the results
• Our problem falls into the category of partial rank aggregation problems, in which we merge top-k lists rather than fully ranked lists.
• We use a simple yet effective aggregation function for ordinal ranks, the median function:
  – The score of an object is its median position in all the returned lists.
• The median function was shown by Fagin et al. to be near-optimal, even for top-k or partial lists.
• The MEDRANK algorithm is based on median rank aggregation.
The MedRank algorithm
• Access the rankings sequentially
  – when an element has appeared in more than half of the rankings, output it in the aggregated ranking
Input rankings:
Position   R1   R2   R3
1          A    B    B
2          B    A    C
3          C    D    A
4          D    C    D
Aggregated ranking R, filled in as the lists are read:
after reading position 1: R = B
after reading position 2: R = B, A
after reading position 3: R = B, A, C
after reading position 4: R = B, A, C, D
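The rule stated on these slides can be sketched in a few lines of Python. This is an illustrative reading of median rank aggregation, not the authors' implementation: the function name medrank, the optional k parameter and the tie-breaking by list order are assumptions, and each input list is assumed to contain no duplicates.

```python
def medrank(rankings, k=None):
    """Aggregate ordered lists by reading them one position at a time.

    An element is emitted as soon as it has been seen in more than half
    of the lists. Elements that never reach that majority (possible with
    partial top-k lists) are not emitted.
    """
    m = len(rankings)
    seen = {}        # element -> number of lists in which it has appeared
    emitted = set()
    output = []
    depth = max((len(r) for r in rankings), default=0)
    for pos in range(depth):
        # Assumption: ties at the same depth are broken by list order.
        for ranking in rankings:
            if pos >= len(ranking):
                continue
            item = ranking[pos]
            seen[item] = seen.get(item, 0) + 1
            if 2 * seen[item] > m and item not in emitted:
                emitted.add(item)
                output.append(item)
                if k is not None and len(output) == k:
                    return output
    return output


R1 = ["A", "B", "C", "D"]
R2 = ["B", "A", "D", "C"]
R3 = ["B", "C", "A", "D"]
print(medrank([R1, R2, R3]))  # ['B', 'A', 'C', 'D']
```

With k set, the scan stops as soon as k elements have been emitted, so the lists do not need to be read to the end.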
Example
• We would like to find images of the "Arch of Triumph of Rome by night".
• We assume we have two DMSs containing images of monuments around the world: the first, DMS1, with geographical coordinate search capabilities, and the second, DMS2, with image similarity search capabilities.
Example
G            DMS1       DMS2
URL (join)   url        url_address
Subject      subject    type
Img          -          img
GeoCoord     GeoCoord   -
SELECT …
FROM DMS1
WHERE subject = "Monument"
ORDER BY GeoCoord(41°53'43.68"N, 12°28'56.34"E)
STOP AFTER 5
[Figure: the five returned images, at distances of 1 km, 1 km, 1 km, 1 km and 2 km from the query point; one visible caption reads "Roma. Palazzo della Civiltà del Lavoro. EUR".]
Unfortunately, if I search only by geographic coordinates, giving the coordinates of Rome as input, I find a lot of images of the Colosseum.
SELECT …
FROM DMS2
WHERE type = "Monument"
ORDER BY Img(URL)
STOP AFTER 5
[Figure: the five returned images; one visible caption reads "Roma. Palazzo della Civiltà del Lavoro. EUR".]
And if I search only by similarity to an image of the "Arch of Triumph of Rome by night", I find a lot of images of the Arch of Triumph of Paris, which is very similar but more famous.
SELECT …
FROM WorldMonuments
WHERE Subject = "Monument"
ORDER BY Img(URL), GeoCoord(41°53'43.68"N, 12°28'56.34"E)
STOP AFTER 5
[Figure: result panels labelled "ORDER BY Image(URL), GeoCoord(41°53'43.68"N, 12°28'56.34"E)" and "ORDER BY Image(URL)"; the images are annotated with distances of 1 km, 1 km, 1 km, 1 km and 2 km; one visible caption reads "Roma. Palazzo della Civiltà del Lavoro. EUR".]
Conclusion and future work
• We presented a methodology, implemented in a tool, that allows a user to create and query an integrated view of data and multimedia sources.
• Future work will be devoted to experimenting with the tool in real scenarios. In particular, the tool will be used to integrate business catalogs related to the area of "tiles".
  – We think that such data may provide useful test cases because of the need to connect data about the features of the tiles with their images.
THE END
Building the Data Ontology: MOMIS
MOMIS* (Mediator envirOnment for Multiple Information Sources) is a framework to perform information extraction and integration of heterogeneous, structured and semistructured data sources.
Semantic Integration of Information
• A common data model, ODLI3 (derived from ODL-ODMG and I3), mapped into OLCD description logics
Tool-supported techniques to construct the Global Virtual View (GVV)
• Local source wrapping
• Local schema annotation w.r.t. a common lexical ontology (WordNet)
• Semi-automatic discovery of relationships between local schemata
• Clustering techniques to build the GVV and the mappings between the GVV and the local schemata (Mapping Table)
• Automatic GVV annotation w.r.t. a common lexical ontology, and OWL export
Global Query Management
• Including services and multimedia data sources
* D. Beneventano, S. Bergamaschi, F. Guerra, M. Vincini: "Synthesizing an Integrated Ontology", IEEE Internet Computing Magazine, September-October 2003, 42-51.
* S. Bergamaschi, S. Castano, M. Vincini: "Semantic Integration of Semistructured and Structured Data Sources", SIGMOD Record Special Issue on Semantic Interoperability in Global Information, Vol. 28, No. 1, March 1999.
MOMIS architecture
[Figure: MOMIS architecture. Labels recoverable from the diagram: WRAPPING of the sources into ODLI3 LOCAL SCHEMA 1 … ODLI3 LOCAL SCHEMA N; MANUAL ANNOTATION and SEMI-AUTOMATIC ANNOTATION against synsets (SYNSET1, SYNSET2, SYNSET4, …); COMMON THESAURUS GENERATION from SCHEMA DERIVED, LEXICON DERIVED, USER SUPPLIED and INFERRED RELATIONSHIPS; GVV GENERATION producing GLOBAL CLASSES and MAPPING TABLES.]
Mapping definition in MOMIS
Mappings between a global class G of the GVV and its local classes are represented by a Mapping Table.
Global-as-View (GAV) mappings: for each global class G, a view VG over the local classes of G is defined by a Full-Join Merge operator:
• Outer Join: to include in the result all tuples of all local sources
• Merge: to perform data reconciliation (Resolution Functions)
Hotel (global)   L1.resort   L2.hotel
name (join)      Name        denomination
city             City        location
stars            Stars       –
free_wifi        –           free_wifi
price            Price_avg   mean_price
Building the Mappings: an example
Mapping Table of the global class Hotel = {L1.resort, L2.hotel}
Hotel (global)   L1.resort   L2.hotel
name (join)      Name        denomination
city             City        location
stars            Stars       –
free_wifi        –           free_wifi
price            Price_avg   mean_price
Data Conversion Functions: DollarEuro(mean_price)
Join Conditions (Join Attribute: name), giving the FullJoin:
  from T_L1 outer join T_L2
  on (T_L1.Name = T_L2.denomination)
Resolution Functions (price: avg(L1, L2)), giving the FullJoinMerge:
  Select name, avg(T_L1.price_avg, T_L2.mean_price) as price, T_L1.Stars, …
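The Full-Join Merge over the two local classes can be mimicked on in-memory rows, as in the following Python sketch. It is only an illustration under stated assumptions: rows are plain dictionaries, prices are assumed to be already converted to the same currency (the DollarEuro step is omitted), and taking the city from the resort row first is an arbitrary choice.

```python
def avg(values):
    """Resolution function: average of the values that are present."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


def full_join_merge(resorts, hotels):
    """Full outer join on the join attribute (name), then merge."""
    joined = {}
    for row in resorts:
        joined.setdefault(row["Name"], [None, None])[0] = row
    for row in hotels:
        joined.setdefault(row["denomination"], [None, None])[1] = row
    merged = []
    for name, (resort, hotel) in joined.items():
        resort, hotel = resort or {}, hotel or {}
        merged.append({
            "name": name,
            # Assumption: prefer the resort value when both are present.
            "city": resort.get("City", hotel.get("location")),
            "stars": resort.get("Stars"),         # only in L1.resort
            "free_wifi": hotel.get("free_wifi"),  # only in L2.hotel
            "price": avg([resort.get("Price_avg"), hotel.get("mean_price")]),
        })
    return merged


resorts = [{"Name": "Astoria", "City": "Modena", "Stars": 4, "Price_avg": 100}]
hotels = [
    {"denomination": "Astoria", "location": "Modena",
     "free_wifi": True, "mean_price": 120},
    {"denomination": "Bella", "location": "Pisa",
     "free_wifi": False, "mean_price": 80},
]
for row in full_join_merge(resorts, hotels):
    print(row)
```

A hotel present in only one source still appears in the result, with the attributes of the other source left empty, which is what the outer join is meant to guarantee.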
Global Query Management
The querying problem: how to answer queries expressed on the GS (global queries)?
In a Virtual Data Integration system, data reside at the data sources, so query processing is based on query rewriting: a global query is rewritten as an equivalent set of queries expressed on the local schemata of the data sources (local queries).
GAV approach: query rewriting is performed by unfolding, i.e. by expanding a global query on G according to the view associated with G.
Query Optimization Techniques for the Full-Join Merge Operator
Motivation:
1. Full outer join queries are very expensive, especially in a distributed environment
2. Only limited optimization is performed on full outer joins
An example of Full-Join Merge Optimization
Global query:
SELECT * FROM G WHERE city LIKE "%Modena%" AND price < 200
Local queries:
LQ1 = SELECT * FROM L1 WHERE City LIKE "%Modena%"
LQ2 = SELECT * FROM L2 WHERE location LIKE "%Modena%"
Plan:
1. LQ1 FULL JOIN LQ2
2. Apply resolution functions: price = AVG()
3. Apply residual constraints: price < 200
4. Result
[Figure annotations: the slide then adds "AND stars = 4" to the global query and to a local query, with the join relabelled RIGHT JOIN, and "AND free_wifi = true" to the global query and to a local query, with the join relabelled INNER JOIN.]
MILOS
• Metadata Editor: Visual Basic (SOAP communication)
• Retrieval Interface: JSP (SOAP communication)
• Repository Metadata Integrator (SOAP Web Service): access to documents, access to metadata, metadata independence
• Multimedia document server (SOAP Web Service): allows homogeneous access to heterogeneous media
• XML Search Engine (SOAP Web Service): structure search, fielded search, full-text search, multimedia search, schema independent, XQuery support
• Metadata independence: the schema seen in the interface logic can be different from the one(s) used in the repository
MILOS (2)
• The MILOS system is based on a three-tier distributed architecture:
  • Client tier: the topmost level of the system. It contains the client applications that interact with MILOS and that display results to user applications.
  • Business logic: manages query processing by integrating and aligning the information stored in the databases. It reconciles the retrieved data by managing ranking.
  • Data tier: composed of the Large Object Database, which physically stores the multimedia documents managed by the system, and the metadata database, where all metadata associated with the multimedia items are stored.
• Multimedia metadata are represented in the data tier in XML formats. MILOS adopts a native XML database, which supports XML query language standards and offers advanced search and indexing functionality on arbitrary XML documents.
• The MILOS XML database provides full-text search, automatic classification, and feature similarity search functionalities.
• The Large Object Database lets MILOS clients deal with multimedia in a uniform way.
The MedRank algorithm
• Whenever there are multiple multimedia attributes, strange side effects can affect the precision of the answer.
• Example:
  – Suppose we have two image databases consisting of monument images.
    • MS1: provides image similarity and geographic coordinates
    • MS2: provides only image similarity
  – The query consists of a sample image and the coordinates of a point
SELECT …
FROM WorldMonuments
ORDER BY Image(URL), GeoCoord(41°53'43.68"N, 12°28'56.34"E)
STOP AFTER 5
[Figure: result panels labelled "ORDER BY Image(URL), GeoCoord(41°53'43.68"N, 12°28'56.34"E)" and "ORDER BY Image(URL)"; the images are annotated with distances of 1 km, 1 km, 1 km, 1 km and 2 km; one visible caption reads "Roma. Palazzo della Civiltà del Lavoro. EUR".]
DMS: Assumptions
• The rationale for the above assumptions is that we aim to work in a general environment with heterogeneous DMSs whose scoring functions are unknown to us.
  – The motivation is that the final scores themselves are often the result of the contributions of the scores of each attribute. A scoring function is therefore usually defined as an aggregation over partial heterogeneous scores (e.g., the relevance for text-based IR with keyword queries, or similarity degrees for color and texture of images in a multimedia database).
  – Even in the simpler case of a single multimedia attribute, the scores become meaningless outside the context in which they are evaluated. As an example, consider the TF * IDF scoring function used by common text search engines: the score of a document depends on the collection statistics, and search engines may use different scoring algorithms.
• However, treating a local DMS as a black box that does not return any score associated with the result elements does not mean that local DMSs do not internally use scoring functions for combining different multimedia attributes.
  – Typically, modern multimedia systems use fuzzy logic to aggregate the scores of different multimedia attributes, which are graded in the interval [0,1]. Classical examples of these functions are the min and mean functions.
Computation of Local Query conditions
• Each atomic predicate Pi and each similarity predicate in the global query is rewritten into corresponding constraints supported by the local classes.
• For example, the constraint stars = 3 is translated into the constraint Stars = 3 for the local class resort, and is not translated into any constraint for the local class hotel.
Computation of Residual Conditions
• Conditions on non-homogeneous standard attributes cannot be translated into local conditions: they are considered residual and have to be solved at the global level.
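The split between local conditions and residual conditions can be sketched as follows. This is a simplified Python illustration, not the MOMIS rewriting algorithm: the names MAPPING, NON_HOMOGENEOUS and unfold_conditions are hypothetical, and price is assumed to be the only non-homogeneous attribute because it is the one carrying a resolution function in the Hotel example.

```python
# Excerpt of the Hotel mapping table; None means "not mapped".
MAPPING = {
    "city":  {"resort": "City",      "hotel": "location"},
    "stars": {"resort": "Stars",     "hotel": None},
    "price": {"resort": "Price_avg", "hotel": "mean_price"},
}
# Assumption: attributes whose values are reconciled by a resolution
# function are the non-homogeneous ones.
NON_HOMOGENEOUS = {"price"}


def unfold_conditions(conditions):
    """Split global conditions into per-class local ones and residuals."""
    local = {"resort": [], "hotel": []}
    residual = []
    for attr, op, value in conditions:
        if attr in NON_HOMOGENEOUS:
            # Cannot be pushed down: evaluate after fusion.
            residual.append((attr, op, value))
            continue
        for cls, local_attr in MAPPING[attr].items():
            if local_attr is not None:
                local[cls].append((local_attr, op, value))
    return local, residual


local, residual = unfold_conditions([
    ("city", "LIKE", "%Modena%"),
    ("stars", "=", 3),
    ("price", "<", 200),
])
print(local)
print(residual)
```

In this reading, stars = 3 reaches only resort, the city condition reaches both classes, and price < 200 is kept for the global level.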
Computation of Residual Conditions
• For multimedia attributes we use the MOST_SIMILAR function. For example, suppose we are searching for images similar to one specified in the query by means of the ORDER BY clause. If we retrieve two or more multimedia objects with one or more corresponding images, the MOST_SIMILAR function simply selects the image that is most similar to the query image.
• However, since we do not know the scores, how do we evaluate similarity?
Computation of Residual Conditions
• Rank-Based Similarity:
  • We simply exploit the rank of the objects in the returned list as an indicator of the similarity between the attribute values belonging to the objects.
  • This aspect is related to the problem of fusion.
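A rank-based MOST_SIMILAR can be sketched in Python as below. The function name, the (value, rank) input format and the tie-breaking rule are assumptions for illustration; the slides only state that the rank in the returned list is used as the indicator of similarity.

```python
def most_similar(candidates):
    """Pick the multimedia value whose object ranked best in its list.

    candidates: list of (value, rank) pairs, one per local source, where
    rank is the 1-based position of the object in that source's answer
    and value is None when the source has no such attribute.
    """
    ranked = [
        (rank, index, value)
        for index, (value, rank) in enumerate(candidates)
        if value is not None and rank is not None
    ]
    if not ranked:
        return None
    # Lowest rank wins; ties are broken by source order (an assumption).
    return min(ranked)[2]


# The same hotel was returned at position 7 by one source and at
# position 2 by another; the image of the better-ranked object is kept.
print(most_similar([("resort_photo.jpg", 7), ("hotel_img.jpg", 2)]))
```

Since no scores are available, positions coming from different sources are compared directly, which is a heuristic rather than a calibrated similarity measure.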
Fusion of local answers
• For each local source involved in the global query, a local query is generated and executed on that source. The local answers are fused into the global answer on the basis of the mapping query qG defined for G, i.e. by using the Full Outerjoin-merge (FOJ) operation:
  – Computation of the full outer join of the local answers (FOJ). The result of this operation is ordered on the basis of the multimedia attributes specified in the query; this aspect is examined in detail in the next slide.
  – Application of the Resolution Functions: for each attribute GA of the global query, the related Resolution Function is applied to FOJ.
Ranking the results
• In principle, if we had ALL the (fused) records of the result set, we could exploit an optimal rank aggregation method based on a distance measure that quantifies the disagreements among different rankings.
• In this respect, the overall ranking is the one that has minimum distance to the different rankings obtained from the different sources.
• Several distance measures are available in the literature. However, the difficulty of distance-based rank aggregation depends on the choice of the distance measure and on its corresponding complexity, which can even be NP-hard in some cases (see the Kendall distance).
• However, our case falls into the category of partial rank aggregation problems, in which we measure the distance between top-k lists only, rather than between fully ranked lists.
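To make the notion of distance between rankings concrete, the Kendall distance mentioned above counts the pairs of elements that two rankings order differently. The Python sketch below covers only the basic case of two full rankings of the same elements; the function name is hypothetical, and the adaptations needed for top-k lists are not shown.

```python
from itertools import combinations


def kendall_distance(r1, r2):
    """Count the element pairs that the two full rankings order differently."""
    pos1 = {item: i for i, item in enumerate(r1)}
    pos2 = {item: i for i, item in enumerate(r2)}
    if pos1.keys() != pos2.keys():
        raise ValueError("both rankings must contain the same elements")
    return sum(
        1
        for a, b in combinations(r1, 2)
        if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0
    )


R1 = ["A", "B", "C", "D"]
R2 = ["B", "A", "D", "C"]
print(kendall_distance(R1, R2))  # 2: the pairs (A, B) and (C, D) are swapped
```

Computing this distance for two given rankings is cheap; the hard part noted on the slide is finding the single ranking that minimizes the total distance to all the input rankings.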
Example (1)
Position   R1   R2   R3
1          A    B    B
2          B    A    C
3          C    D    A
4          D    C    D
Positions of each element in the three lists, sorted (the median is the middle value):
A: (1, 2, 3)
B: (1, 1, 2)
C: (2, 3, 4)
D: (3, 4, 4)
Aggregated ranking R:
1 B
2 A
3 C
4 D
(1) http://www.cs.helsinki.fi/u/tsaparas/InformationNetworks/lectures/lecture10.ppt
Combining rankings
• In many cases the scores are not known
  – e.g. meta-search engines: scores are proprietary information
• … or we do not know how they were obtained
  – one search engine returns score 10, the other 100. What does this mean?
• … or the scores are incompatible
  – apples and oranges: does it make sense to combine price with distance?
• In these cases we can only work with the rankings