Post on 19-Jan-2016

Page 1

The Mediation of Information using XML project

BY: Amir Atauna & Michael Brautbar

Page 2

What is a Mediator and Why is it Needed?

• There is a huge quantity of information on the web.
• Users want to find information on the web that is related to their problem.
• Problem: The information is distributed across many sources; each source provides a different interface and exports the data in a different format.

Page 3

• Mediator systems assist users by providing them with integrated views of the data they are interested in.

• Example: a Web-shopping mediator provides the Web value-shopper with a view that lists the lowest price for each product.

• The goal of MIX is to facilitate the development of such mediators.
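The Web-shopping example can be sketched in a few lines of Python. This is only an illustration of the mediator idea, not MIX code: the two shops, their data formats, and the function names `wrap_shop_a`, `wrap_shop_b` and `lowest_prices` are all assumptions made up for the sketch.

```python
import json
import xml.etree.ElementTree as ET

# Two hypothetical sources that export similar data in different formats.
SHOP_A_JSON = '[{"product": "camera", "price": 199.0}, {"product": "phone", "price": 350.0}]'
SHOP_B_XML = "<offers><offer name='camera' cost='189.5'/><offer name='phone' cost='360.0'/></offers>"

def wrap_shop_a(raw):
    """Wrapper for the JSON source: returns (product, price) pairs."""
    return [(item["product"], float(item["price"])) for item in json.loads(raw)]

def wrap_shop_b(raw):
    """Wrapper for the XML source: returns (product, price) pairs."""
    return [(o.get("name"), float(o.get("cost"))) for o in ET.fromstring(raw).iter("offer")]

def lowest_prices(*wrapped_sources):
    """Integrated view: the lowest price seen for each product."""
    best = {}
    for offers in wrapped_sources:
        for product, price in offers:
            if product not in best or price < best[product]:
                best[product] = price
    return best

view = lowest_prices(wrap_shop_a(SHOP_A_JSON), wrap_shop_b(SHOP_B_XML))
```

The user only sees `view`; the differences between the two source interfaces are hidden inside the wrappers.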

Page 4

Is the mediator concept new?

• No, the TSIMMIS mediator uses the semistructured model OEM (Object Exchange Model).
• Wrappers export the source data translated to OEM.
• The mediator exports an integrated view of the wrapper data based on a view definition provided by the administrator.

Page 5

• The view definition is expressed in the Mediator Specification Language (MSL).
• At runtime the mediator receives queries, which refer to the view objects and are expressed in MSL.
• First, the incoming query is combined with the view definition into a query which refers directly to source data.
• Then the optimizer finds a plan to execute the latter query by sending queries to the wrappers and combining their results in the mediator.

Page 6

• The wrappers translate the queries they receive into queries understood by the sources.
• The MSL specifications can be very “loose” about the amount of information they give on the structures they provide.
• This is a valuable feature when working with dynamic semistructured sources.
• There are two weak points:
- First, the user does not know the structure of the underlying data, and this impedes their efforts to formulate reasonable queries.

Page 7

- Second, the mediator may have incomplete or no information about the metadata and structure of each source, and this leads to a heavy loss of performance.

• MIX solves these problems with DTDs.

Page 8

The Philosophy of MIX: The Web as a Distributed Database

• The developers of this system strongly believe that the Web will emerge as a distributed database and that XML (or some extension/modification of XML) will be the data model of this huge database.
• The MIX mediator views XML as a database model and uses the mediator concept as known in the DB area.

Page 9
Page 10

• Sources will export an XML view of their data along with semantic descriptions of the content (source DTDs) and descriptions of the interfaces (XML queries) that may be used for accessing the data.
• Users and applications will then be able to query these view documents using some XML query language.
• The MIX mediator uses the source DTDs to assist the user in query formulation and the query processors in running queries more efficiently.

Page 11

• MIX’s query evaluation follows a lazy (on-demand) approach, i.e. XML queries (expressed in XMAS) are unfolded and rewritten at runtime.
• In the other approach, the eager (warehousing) approach, the data integration occurs in a separate materialization step, before the actual user queries.
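The difference between the two approaches can be illustrated with a small Python sketch. It is not how MIX is implemented (MIX unfolds and rewrites XMAS queries); the classes `Source`, `EagerMediator` and `LazyMediator` and the rows other than alpine are hypothetical and only show when the sources are contacted.

```python
class Source:
    """Hypothetical source that counts how often it is read."""
    def __init__(self, rows):
        self.rows = list(rows)
        self.reads = 0

    def fetch(self):
        self.reads += 1
        return list(self.rows)

def integrated_view(sources):
    """View definition used in this sketch: the union of all source rows."""
    return [row for s in sources for row in s.fetch()]

class EagerMediator:
    """Warehousing: materialize the view once, before any user query."""
    def __init__(self, sources):
        self.warehouse = integrated_view(sources)

    def query(self, predicate):
        return [r for r in self.warehouse if predicate(r)]

class LazyMediator:
    """On demand: contact the sources only when a query arrives."""
    def __init__(self, sources):
        self.sources = sources

    def query(self, predicate):
        return [r for r in integrated_view(self.sources) if predicate(r)]

src = Source([("alpine", 13238), ("la jolla", 42000)])   # second row is made up
eager = EagerMediator([src])          # reads the source immediately
reads_after_eager_setup = src.reads
src.rows.append(("del mar", 35000))   # the source changes afterwards (made-up row)
lazy = LazyMediator([src])            # reads nothing yet

def is_big(row):
    return row[1] > 30000

eager_answer = eager.query(is_big)    # answered from the stale warehouse
lazy_answer = lazy.query(is_big)      # answered from the current source
```

The eager mediator answers without touching the source again but misses the later change; the lazy mediator pays for a source access on every query and sees current data.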

Page 12

• Conventional data repositories are not expected to be converted to XML.
• Wrapper technologies allow us to logically view an information source (which may be a relational database, a collection of HTML pages, or even a legacy information system) as a large XML source.
• The wrappers are able to translate XMAS queries into queries or commands that the underlying source understands.
• They are also able to translate the result of the source into XML.
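As a minimal sketch of the last point, the following Python code exports the rows of a relational table as XML. It covers only the result-to-XML direction, not the translation of XMAS queries; the table layout and the helper name `rows_as_xml` are assumptions for illustration, and an in-memory SQLite database stands in for the relational source.

```python
import sqlite3
import xml.etree.ElementTree as ET

def rows_as_xml(conn, table, root_tag, row_tag):
    """Export every row of a relational table as an XML element tree."""
    cur = conn.execute(f"SELECT * FROM {table}")  # table name is trusted in this sketch
    columns = [d[0] for d in cur.description]
    root = ET.Element(root_tag)
    for row in cur:
        elem = ET.SubElement(root, row_tag)
        for col, value in zip(columns, row):
            ET.SubElement(elem, col).text = str(value)
    return root

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE neighborhood (zip TEXT, name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO neighborhood VALUES (?, ?, ?)",
    [("91901", "alpine", 13238), ("91903", "alpine", 4783)],
)
xml_view = rows_as_xml(conn, "neighborhood", "neighborhoods", "neighborhood")
xml_text = ET.tostring(xml_view, encoding="unicode")
```

The mediator can then treat `xml_text` like any other XML source, without knowing that the data lives in a relational table.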

Page 13

Creating Mediated Views Using the MIX mediator and Querying them with BBQ

• The XML documents have to be integrated.
• One goal of MIX is to develop integrated views quickly.
• For this the developers use XMAS as the view definition language.

Page 14
Page 15

• The BBQ (Blended Browsing and Querying) user interface enables users to formulate XMAS queries using a GUI reminiscent of query-by-example interfaces in relational databases.

Page 16

The MIX Architecture

Page 17

• The graphical user interface BBQ allows the construction of queries.
• In order to accomplish the integration, the MIX mediator comprises several modules.

- Its main inputs are XMAS queries generated by BBQ, and the mediator view definition (also in XMAS) for the integrated view.

- The resolution module resolves the user query with the mediator view definition, resulting in a set of unfolded XML queries that refer to the wrapper views.

Page 18

- The simplification module is used to further simplify the XML queries based on the underlying XML DTDs.

- The DTD inference module can be used to automatically derive view DTDs from source DTDs and queries, supporting the integration task of the mediation engineer (this is done off-line).

- The translation module maps the simplified queries into the XMAS algebra.

Page 19

- The optimization module can be used to further optimize the XMAS queries.

- The execution engine issues XMAS queries against the wrappers, and returns the requested XML data to the user, after integrating the retrieved data according to the mediator view.

The wrappers are used to export data in a uniform format to the mediator.

Page 20

The XMAS Language

The data model of the sources of the MIX mediator is valid XML documents.

We need a way to formulate queries that can relate to data in multiple XML documents.

An XML document may be as tightly structured as a relational database, or may have no structure at all.

Page 21

The XMAS Language Cont.

So we need a query language that is as strong as relational algebra.
Preferable features of the language:
- Simple formulation of queries
- Logically describes what we want to say

Page 22

Solution: XMAS

XMAS stands for XML Matching And Structuring language.

A declarative, high-level language.
Builds upon ideas of languages like XML-QL and MSL.

Page 23

General Structure Of An XMAS Query

CONSTRUCT head
WHERE body_1 IN source_1
  (AND | OR | NOT) body_2 IN source_2
  (AND | OR | NOT) body_3 IN source_3
  ...
  (AND | OR | NOT) body_n IN source_n
  (AND | OR) predicate

Page 24

Body (the “where” clause): specifies the data which is to be extracted from the XML sources.

Head (the “construct” clause): describes how the extracted data is arranged into a new answer XML document. In this part we may use the “collection” operator and the “ordering” operator (explained later on).

(Head and body roughly resemble the SELECT and WHERE clauses in SQL.)

Page 25

Predicate: defines conditions on the variables occurring in the sources.

Let's look at an example:

<!ELEMENT neighborhoods (neighborhood)*>
<!ELEMENT neighborhood (zip, name, type, population)>
<!ELEMENT zip (#PCDATA)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT type (#PCDATA)>
<!ELEMENT population (#PCDATA)>

Page 26

For Example We Can Have The Following XML Doc For That DTD

<neighborhoods>
  <neighborhood>
    <zip>91901</zip>
    <name>alpine</name>
    <type>rural/town</type>
    <population>13238</population>
  </neighborhood>
  <neighborhood>
    <zip>91903</zip>
    <name>alpine</name>
    <type>rural/town</type>
    <population>4783</population>
  </neighborhood>
  …

Page 27

Query Example

Suppose we want to retrieve the names of all “big” neighborhoods, say those where the population is greater than 30000.

In XMAS we can write the following query:

Page 28

CONSTRUCT
  <big_neighborhoods>
    <big_neighborhood>
      <name>$N</>
    </> {$N}
  </>
WHERE
  <neighborhoods>
    <neighborhood>
      <name>$N</>
      <population>$P</>
    </>
  </>
  IN "http://www.npaci.edu/DICE/MIX/tutorial/neighborhoods.xml"
  AND $P > 30000

Page 29

How Does It Work

Let's look at the body of the query above.

This tree pattern mimics the tree structure of the input XML document.

The variables $N and $P are used to “get a hold” of the data at the corresponding locations in the tree structure representing the input XML doc. In other words, the tree pattern specifies that:

the root element of the XML doc is of type neighborhoods

Page 30

Within neighborhoods there must be some neighborhood subelement, which itself contains name and population subelements.

In this way, the tree pattern specifies a list of pairs of variable bindings for $N and $P.

From this list we want to select only those which satisfy the condition $P > 30000.

To summarize, the body defines a list [(n_1; p_1); ...; (n_k; p_k)] of all variable bindings for ($N, $P) which match (or satisfy) the body.
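A rough Python analogue of this matching step is sketched below, using the standard xml.etree.ElementTree module instead of an XMAS engine. The function name `match_body` is an assumption, and the second neighborhood in the sample document is made up so that the predicate selects something.

```python
import xml.etree.ElementTree as ET

DOC = """
<neighborhoods>
  <neighborhood><zip>91901</zip><name>alpine</name>
    <type>rural/town</type><population>13238</population></neighborhood>
  <neighborhood><zip>92037</zip><name>la jolla</name>
    <type>urban</type><population>42000</population></neighborhood>
</neighborhoods>
"""

def match_body(xml_text):
    """Return all ($N, $P) bindings of the tree pattern in the body."""
    root = ET.fromstring(xml_text)
    if root.tag != "neighborhoods":        # the pattern fixes the root element type
        return []
    bindings = []
    for nb in root.findall("neighborhood"):
        name, pop = nb.find("name"), nb.find("population")
        if name is not None and pop is not None:
            bindings.append((name.text, int(pop.text)))
    return bindings

all_bindings = match_body(DOC)
# The predicate $P > 30000 filters the list of bindings.
big = [(n, p) for (n, p) in all_bindings if p > 30000]
```

`all_bindings` plays the role of the list of ($N, $P) pairs, and `big` is what remains after the predicate is applied.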

Page 31

The “head” consists of an XML tree pattern which contains some or all of the variables of the body.

In the example above, the head defines a root element big_neighborhoods with a big_neighborhood subelement, which in turn has a name subelement. The latter is used to hold the bindings for $N which have been obtained through the body.

Using {$N} expresses that we want to have only one big_neighborhoods element that has a number of big_neighborhood subelements (one for each name $N obtained from the body).
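The effect of the head can be imitated in Python as follows. This is a sketch of the construction step only, with a hypothetical helper `construct_answer` and made-up names as input; it is not the way the MIX engine builds its answers.

```python
import xml.etree.ElementTree as ET

def construct_answer(names):
    """Build one big_neighborhoods root with one child per binding of $N."""
    root = ET.Element("big_neighborhoods")      # created exactly once
    for n in names:                             # {$N}: one subelement per binding
        child = ET.SubElement(root, "big_neighborhood")
        ET.SubElement(child, "name").text = n
    return root

answer = construct_answer(["la jolla", "del mar"])   # made-up bindings for $N
answer_text = ET.tostring(answer, encoding="unicode")
```

The root is created outside the loop and the big_neighborhood elements inside it, which mirrors where {$N} is placed in the head.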

Page 32

The Collection Operator

Is used to collect all bindings of the subelement to be put under the parent element.
Has two kinds: implicit and explicit.
The usage for the explicit version is {$N}, where $N is a free variable at that level.
For an example of the explicit usage, consider the previous example.

Page 33

The Collection Operator Cont.

We create exactly one big_neighborhood element for each binding n_1, ..., n_k of $N (thereby binding the value of $N within the big_neighborhood element to one n_i), and all these elements are collected as subelements of the parent element.

Page 34

The Collection Operator Cont.

For elements in the head which do not have an explicit collection label, an implicit collection label may be used.

The implicit collection variables of an element E are those which are free in E.

The usage for the implicit version is [ ... ], where ‘[’ comes before the beginning of the section and ‘]’ at its end.

Page 35

The Collection Operator Cont.

For example, consider the following code:

<answer>
  [<a> $A
    [<b> $B
      [<c> $C </c>]
    </b>]
  </a>]
</answer>

The above corresponds to a nested loop structure.
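One possible reading of that nested loop structure is sketched below in Python. It assumes that each level groups by the distinct values of its variable; this grouping rule, the helper names `distinct` and `nested_answer`, and the sample bindings are assumptions for illustration rather than a statement of the exact XMAS semantics.

```python
import xml.etree.ElementTree as ET

def distinct(values):
    """Distinct values in first-seen order."""
    return list(dict.fromkeys(values))

def nested_answer(bindings):
    """bindings: list of (a, b, c) tuples for ($A, $B, $C)."""
    answer = ET.Element("answer")
    for a in distinct(t[0] for t in bindings):                   # loop over $A
        a_el = ET.SubElement(answer, "a")
        a_el.text = a
        for b in distinct(t[1] for t in bindings if t[0] == a):  # loop over $B per $A
            b_el = ET.SubElement(a_el, "b")
            b_el.text = b
            for c in distinct(t[2] for t in bindings
                              if t[0] == a and t[1] == b):       # loop over $C per ($A, $B)
                ET.SubElement(b_el, "c").text = c
    return answer

tree = nested_answer([("a1", "b1", "c1"), ("a1", "b1", "c2"), ("a2", "b2", "c3")])
text = ET.tostring(tree, encoding="unicode")
```

Each bracketed section of the head becomes one loop, and the loops nest in the same way the brackets do.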

Page 36

The Ordering Operator

All subelement bindings may be ordered by a given order.
If no order is specified, a default order is used (based on the order in which the data was found).

Example: consider the next DTD and the query given after it.

Page 37

<!ELEMENT home EMPTY>
<!ATTLIST home zip pcdata #REQUIRED
               pcdata #REQUIRED>

And the query is:

CONSTRUCT
  <answer>
    <homes> {$H} order by $H.Price </homes>
  </answer>
WHERE
  <home> $H </>
  IN "http://www.Mine.Xml"
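In Python terms, the ordering operator behaves like sorting the collected bindings by a key. The sketch below assumes the homes carry a `price` attribute (the name is missing from the DTD above, so this is a guess) and uses made-up values.

```python
# Hypothetical bindings for $H, in the order in which they were found.
homes = [
    {"zip": "91901", "price": 320000},
    {"zip": "92037", "price": 275000},
    {"zip": "91903", "price": 299000},
]

# No ordering given: keep the order in which the data was found.
default_order = list(homes)

# Explicit ordering, as in "order by $H.Price".
by_price = sorted(homes, key=lambda h: h["price"])
```

Since `sorted` is stable, homes with equal prices keep their default (found) order.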

Page 38

So, Mmm, Is XMAS So Powerful?

Home buyer's scenario: a user wants to buy a home and wants to make use of information available from the web to guide this decision. A possible query that the user may issue is:

"Find all houses with 3 bedrooms, 2 baths, interior area at least 1600 sq. ft., priced between $250k and $350k, in regions where the school rating is at least 70 (out of 100) and the crime rate is no more than 15 incidents per year. Group the answers by region and order them by price. For each home also show the nearby schools."

Page 39
Page 40

Strong As Relational Algebra

As mentioned before, one of the features of XMAS is that it is as expressive as relational algebra. Some examples:

Selection: selection on a variable is made in the ‘predicate’ part of the query.

Projection: write in the head just those variables that you want to project.

Page 41

A natural join can be obtained by equating variables in the body.

Cartesian product may also be expressed easily.

Page 42

CONSTRUCT
  <neighborhoods_med>
    <neighborhood_med> $N $S </> {$N, $S}
  </>
WHERE
  <neighborhoods> $N: <neighborhood> <zip>$Z</> </> </>
  IN "http://www.npaci.edu/DICE/MIX/tutorial/neighborhoods.xml"
  AND
  <schools> $S: <school> <zip>$Z1</> </> </>
  IN "http://www.npaci.edu/DICE/MIX/tutorial/schools.xml"
  AND $Z = $Z1

Cartesian product is easily expressed by removing the condition $Z = $Z1.
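The relation between the join and the Cartesian product can be checked with a short Python sketch. The records below are made-up stand-ins for the bindings of $N and $S; only the zip fields matter for the join.

```python
from itertools import product

# Hypothetical bindings for $N (neighborhoods) and $S (schools).
neighborhoods = [{"zip": "91901", "name": "alpine"}, {"zip": "92037", "name": "la jolla"}]
schools = [{"zip": "91901", "school": "school A"}, {"zip": "91901", "school": "school B"}]

# All ($N, $S) combinations: what remains when the condition $Z = $Z1 is dropped.
cartesian = list(product(neighborhoods, schools))

# Keep only the pairs whose zip codes agree: the join condition $Z = $Z1.
joined = [(n, s) for n, s in cartesian if n["zip"] == s["zip"]]
```

Here the product has four pairs and the join keeps the two pairs for zip 91901.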

Page 43

Merry XMAS

Page 44

DTD Inference

Page 45

The MIX mediator and the advantages of living with DTD-provided structure

• The MIX mediator employs DTDs to assist the user in information discovery and query formulation, and to allow the query processor to derive more efficient plans.
• The view DTD inference module derives the view DTD given the source DTDs and the view.

Page 46

• The view DTD is passed to the DTD-based query interface to enable query formulation.
• A DTD inference algorithm was developed for a limited class of XMAS queries/views:

- pick-element XMAS queries, i.e., queries whose SELECT clause has a single variable, called the pick-variable, that binds to elements, and whose WHERE clause consists of a single condition that is applied to only one source.

Page 47

• It is easy to compute a loose DTD for a view, but it is critical for the query interface and the query processor to get the one that describes the view as precisely as possible.

Page 48

• “Precise” view DTDs may also have applications other than ours; for example, they may be used in a toolkit for generating XSL style sheets for presentation of the view.
• A criterion for judging the precision of a view DTD is tightness.
• A DTD d1 is tighter than a DTD d2 if every document described by d1 is also described by d2.
• The tightness criterion can be a benchmark for other powerful view definition languages and view inference algorithms.

Page 49

• So the view DTD inference algorithm attempts to derive the tightest DTD that contains all the possible documents that may appear as the content of the view.
• Unfortunately, even the tightest view DTD describes structures that can never appear as the view’s content.
• For this reason the view DTD inference algorithm derives an extended form of DTDs, known as Specialized DTDs, that typically does not have non-tightness problems.

Page 50

Model and Query Language Framework

• The focus is on XML documents that meet the following requirements:

- The XML is always valid, i.e. it has a DTD.

- There are no attributes other than the ID attribute, and all elements have an ID attribute.

- There are no empty elements, but elements with empty content are allowed.

- Mixed-content elements, i.e. elements whose content mixes strings with elements, are not allowed.

Page 51

• Definition: Element - An element e is a triplet consisting of a name, name(e), a unique ID, and a content, content(e), which is a sequence of elements or a PCDATA value.
• Definition: A DTD is a set {<n : type(n)> | n is in N} where N is the set of names and type(n) is either a regular expression over N or PCDATA.
• L(r) is the regular language described by r.


• Definition: An element e satisfies a DTD D, e |= D, if the following conditions hold:

- name(e) is in N, where N is the set of element names

- if content(e) = e_1, e_2, ..., e_m then name(e_1) ... name(e_m) is in L(type(name(e))) and e_i |= D for 1 <= i <= m.

Else if content(e) is a string then type(name(e)) = PCDATA.
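The satisfaction test above can be sketched in Python. This is only an illustration under stated assumptions, not the MIX implementation: the `Element` class, the `satisfies` function and the encoding of a type as a Python regular expression in which every element name is followed by one space are all hypothetical choices made here.

```python
import re
from dataclasses import dataclass
from typing import Union

PCDATA = "#PCDATA"  # marker used here for string-typed elements


@dataclass
class Element:
    name: str
    id: str
    content: Union[str, list]  # a PCDATA string or a list of child Elements


def satisfies(e: Element, dtd: dict) -> bool:
    """Return True if e |= dtd, following the conditions of the definition."""
    if e.name not in dtd:                      # name(e) must be in N
        return False
    t = dtd[e.name]
    if isinstance(e.content, str):             # string content requires type PCDATA
        return t == PCDATA
    if t == PCDATA:                            # element content under a PCDATA type
        return False
    # Encode name(e_1) ... name(e_m) as "n1 n2 ... " (each name followed by a space)
    # and test membership in L(type(name(e))).
    word = "".join(child.name + " " for child in e.content)
    if re.fullmatch(t, word) is None:
        return False
    return all(satisfies(child, dtd) for child in e.content)  # e_i |= D for every i
```

With this encoding, the content model (name, advisor?, degree) would be written as "name (advisor )?degree ".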


Soundness & Tightness

• Definition: A view DTD D_V is sound if, given source DTDs D_1, D_2, ..., D_n and a view definition V, for every tuple (d_1, d_2, ..., d_n) of n documents such that d_1 |= D_1, d_2 |= D_2, ..., d_n |= D_n, the view document V(d_1, d_2, ..., d_n) |= D_V.
• Definition: A DTD D is tighter than a DTD D’ if every document satisfying D satisfies D’.
• A type <n : r> is tighter than a type <n : r’> if L(r) is contained in L(r’).
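The containment of L(r) in L(r’) can be probed with a small bounded check. This is a sketch under assumptions, not an exact decision procedure: the function name `tighter_up_to` is hypothetical, and types are written as Python regular expressions in which every element name is followed by one space. It only examines name sequences up to a chosen length, so it can refute tightness but cannot prove it.

```python
import re
from itertools import product


def tighter_up_to(r: str, r_prime: str, names: list, max_len: int = 4) -> bool:
    """Bounded check of L(r) being contained in L(r_prime).

    Every sequence of names of length <= max_len is tested. A False result
    comes with a real counterexample; a True result is only evidence, because
    longer sequences are never examined.
    """
    for length in range(max_len + 1):
        for seq in product(names, repeat=length):
            word = "".join(n + " " for n in seq)  # encoding: name followed by a space
            if re.fullmatch(r, word) and not re.fullmatch(r_prime, word):
                return False                       # in L(r) but not in L(r_prime)
    return True
```

For example, "name degree " passes the check against "name (advisor )?degree ", while the reverse direction fails on the sequence name, advisor, degree.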


• Definition: A DTD D_V is a tightest view DTD for given source DTDs D_1, D_2, ..., D_n and a view definition V if there is no other view DTD D_V’ such that D_V’ is tighter than D_V.


Structural Tightness

• In many practical cases even the tightest view DTDs describe view document structures that cannot be produced by the view.
• This information loss phenomenon is formalized by introducing the structural tightness property of view DTDs.


• Definition: A structural class of documents is a set of documents such that for every two documents d_1, d_2 in the class there is a mapping that maps:

- every string of d_1 onto a string of d_2 and vice versa.

- every id of d_1 onto an id of d_2 and vice versa

- if the mappings are applied to d_1, d_1 becomes identical to d_2 and vice versa
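One way to read this definition is that two documents belong to the same structural class when they differ only by a one-to-one renaming of strings and ids. The sketch below follows that reading. The tuple representation (name, id, content) and the function name `same_structural_class` are assumptions made for illustration.

```python
def same_structural_class(d1, d2) -> bool:
    """Check whether d1 and d2 differ only by a renaming of strings and ids.

    A document is a (name, id, content) tuple; content is either a string or
    a list of documents. The renamings must be one-to-one so that they also
    work in the opposite direction ("and vice versa").
    """
    str_map, id_map = {}, {}

    def bind(mapping, a, b):
        # Record a -> b; fail if a was already mapped to something else.
        return mapping.setdefault(a, b) == b

    def walk(a, b):
        (name_a, id_a, content_a), (name_b, id_b, content_b) = a, b
        if name_a != name_b or not bind(id_map, id_a, id_b):
            return False
        if isinstance(content_a, str) or isinstance(content_b, str):
            return (isinstance(content_a, str) and isinstance(content_b, str)
                    and bind(str_map, content_a, content_b))
        return (len(content_a) == len(content_b)
                and all(walk(x, y) for x, y in zip(content_a, content_b)))

    def one_to_one(mapping):
        return len(set(mapping.values())) == len(mapping)

    return walk(d1, d2) and one_to_one(str_map) and one_to_one(id_map)
```

Under this reading, a document whose name and supervisor strings are equal is not in the same class as one where they differ, because no single string mapping works in both directions.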


• Definition: A structural class of documents satisfies a DTD D if the documents of the class satisfy D.
• Definition: Given a set of source DTDs D_1, …, D_n and a view V, a DTD D_V is structurally tight if:

- it is the tightest DTD of the view given the source DTDs

- for every structural class S that satisfies D_V there is a view document I that satisfies D_V and there are also source documents I_1, …, I_n, satisfying D_1, …, D_n, and I = V(I_1, …, I_n).


Specialized DTDs

• Specialized DTDs resolve the inherent non-tightness problems of DTDs
• Query: Find all the “professor” and “grad” sub-elements of “department” with one journal publication.


How are specialized DTDs computed?

• The DTD tightening algorithm recursively “tightens” each type of the initial DTD by means of the type refinement algorithm.
• Definition: The type refinement refine(r,n) of a regular expression r given a name n is the regular expression r’ that describes all strings in L(r) that contain at least one instance of n.
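A refinement that matches this definition can be computed by structural recursion over the regular expression. The sketch below uses the textbook construction for "strings of L(r) that contain n", which is not necessarily the exact procedure used in MIX. The tuple-based syntax tree and the helper names `seq`, `alt`, `EMPTY` and `EPS` are assumptions made for illustration.

```python
EMPTY = ("empty",)   # the empty language
EPS = ("eps",)       # the empty sequence


def seq(a, b):
    # Concatenation, simplified when a side is the empty language or the empty sequence.
    if EMPTY in (a, b):
        return EMPTY
    if a == EPS:
        return b
    if b == EPS:
        return a
    return ("seq", a, b)


def alt(a, b):
    # Alternation, dropping branches that denote the empty language.
    if a == EMPTY:
        return b
    if b == EMPTY:
        return a
    return ("alt", a, b)


def refine(r, n):
    """Return an expression for the strings of L(r) that contain at least one n."""
    kind = r[0]
    if kind == "name":
        return r if r[1] == n else EMPTY
    if kind in ("eps", "empty"):
        return EMPTY
    if kind == "seq":    # n occurs in the left part or in the right part
        return alt(seq(refine(r[1], n), r[2]), seq(r[1], refine(r[2], n)))
    if kind == "alt":
        return alt(refine(r[1], n), refine(r[2], n))
    if kind == "star":   # at least one iteration must contain n
        return seq(r, seq(refine(r[1], n), r))
    raise ValueError(f"unknown node {kind!r}")
```

For the type (name, advisor?, degree), with advisor? written as an alternation of advisor and the empty sequence, refining on advisor yields (name, advisor, degree): the optional element becomes mandatory.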


Converting s-DTDs to DTDs

• First we obtain the images of all types of the s-DTDs.
• Then we merge all images that have the same name.
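The two steps can be sketched as follows. Several things here are assumptions rather than facts from the slides: specialized names are written with a "^tag" suffix (for example publication^1), the image of a name is obtained by dropping that suffix, and merging types with the same name is taken to mean their union (alternation). The function names `image` and `to_dtd` are hypothetical.

```python
import re


def image(text: str) -> str:
    # Drop specialization tags: "publication^1" becomes "publication".
    return re.sub(r"\^\w+", "", text)


def to_dtd(sdtd: dict) -> dict:
    """Collapse a specialized DTD {specialized name: type} into a plain DTD."""
    merged = {}
    for spec_name, spec_type in sdtd.items():
        alternatives = merged.setdefault(image(spec_name), [])
        t = image(spec_type)                 # step 1: image of the type
        if t not in alternatives:            # identical images need no union
            alternatives.append(t)
    # Step 2: images that share a name are merged by alternation.
    return {name: types[0] if len(types) == 1 else "(" + ")|(".join(types) + ")"
            for name, types in merged.items()}
```

The result is generally less tight than the s-DTD it came from, since the distinction between the specializations is lost.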


Schema Inference Algorithm

• Refinement

- Tightens individual types

• Specialization

- Uses the refinement algorithm and tightens the whole input document.

• Result List Type Inference

- Discovers the names and order of the types that appear in the result.


Future Work

• Powerful Query Languages

- group-by, nest, navigation using recursive paths in the vertical and horizontal direction, check order, manipulate order.

• More powerful/flexible schema descriptions

- XML-Data, DCDs, many academic proposals

• Conditions for existence of tight/tightest DTDs.
• Other quality metrics for a view DTD.


The BBQ application introduction

BBQ stands for “Blended Browsing and Querying” - a graphical user interface for browsing and querying XML data sources.

There are very few visual interfaces for querying and browsing semistructured data, and fewer for XML.


introduction cont.

BBQ supports query refinement by having query results be sources used in subsequent queries. Users can construct a query result document (essentially a virtual view) and that document becomes a first-class data source within BBQ, meaning it can be browsed, queried, or used to construct another query result document.


introduction cont.

This is quite useful if the user does not know in advance exactly what he is looking for.

The interface allows users to quickly create complex queries without writing XMAS syntax by hand.

BBQ displays the structure of multiple data sources using a paradigm that resembles drilling down in Windows’ directory structures.


[Figure: architecture diagram with the labels Data Source, XML Data Source, Computational Source, Wrapper, MIX Mediator, and Blended Browsing and Querying (BBQ) interface.]


The BBQ interface

BBQ, which is XML driven, uses a set of DTDs exported by the MIX mediator. They will be referred to from now on as base DTDs.

The BBQ interface consists of one main window and zero or more floating windows. The main window contains a toolbar, a split pane, and a message console, while the floating windows contain a toolbar and split pane only.


From now on we will use the following DTDs, which will represent the base DTDs.

<!DOCTYPE CSEStudents [
<!ELEMENT CSEStudents (CSEStudent)*>
<!ELEMENT CSEStudent (name, advisor?, degree)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT advisor (#PCDATA)>
<!ELEMENT degree (#PCDATA)>
]>


<!DOCTYPE Interns [
<!ELEMENT Interns (Intern)*>
<!ELEMENT Intern (name, supervisor, sponsor)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT supervisor (#PCDATA)>
]>


BBQ power: selecting and browsing XML source DTD and data

The DTDs are represented as trees in the obvious hierarchical manner: an element name is a parent node, and that element’s sub-elements are its children.

BBQ features special tree nodes to represent XML DTD's structural operators such as the choice and the seq(uence).


These special tree nodes give the user a more accurate view of the DTD's structure than other semistructured-data viewing systems, and they also facilitate more complex queries.

For example, a default order constraint is introduced, namely the one that corresponds to the order in which elements are listed on the screen.


XML data corresponding to a given DTD are represented as a directory tree.

The XML data is materialized on demand from the source.

The buttons labeled next and previous in the XML panel retrieve the next and previous n instances, respectively.


BBQ power cont. Creating XMAS Queries with BBQ

A query session is the set of events that occur while BBQ is connected to the mediator.

Each query session consists of one or more query cycles. A query cycle is the set of events that starts with the user constructing a query, and ends with the user browsing the query result.


The basic BBQ query cycle takes place in four steps:

First, constraints are set on the data sources.

Second, a tree representing the query result schema is created by dragging and dropping elements.

Third, the XMAS query is generated and submitted to the mediator.

Fourth, a DTD is generated for the query result and the query result schema and data are displayed.


First step: setting constraints

Constraints can be set on the leaf nodes of the DTD tree or XML tree. Constraints cannot be set on non-leaf nodes.

The operators are a basic set of comparators (’=’, ’<=’, ’>=’, ’<’, ’>’, ’substr’).
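As an illustration of how such a constraint could be evaluated against the text of a leaf node, here is a minimal Python sketch. It is not BBQ's code: the names `COMPARATORS` and `holds` are hypothetical, values are compared as plain strings (so ordering is lexicographic), and 'substr' is assumed to mean that the operand occurs inside the leaf value.

```python
import operator

# The comparators listed above, mapped to Python functions.
# Assumption: 'substr' holds when the operand occurs inside the leaf value.
COMPARATORS = {
    "=": operator.eq,
    "<=": operator.le,
    ">=": operator.ge,
    "<": operator.lt,
    ">": operator.gt,
    "substr": lambda value, operand: operand in value,
}


def holds(leaf_value: str, op: str, operand: str) -> bool:
    """Evaluate one constraint against the text of a leaf node."""
    return COMPARATORS[op](leaf_value, operand)
```

A constraint such as degree = PhD then corresponds to the call holds(text_of_degree, "=", "PhD"), where text_of_degree stands for the leaf's string content.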


Example

The user right-clicks the degree element and selects “View/Edit Constraint...” from the popup menu. This action brings up the “View/Edit Constraint” dialog box, where “=” is selected as the operator, and “PhD” is typed in as the operand. At this point, the user clicks “OK”.


Joins can take place within a data source or across data sources. Creating a join in BBQ is as simple as selecting one leaf element, and dragging and dropping it onto another leaf element.

Suppose the user is interested in CSEStudents who are also interns, and whose advisor is also their supervisor.
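To make the meaning of this join concrete, the sketch below computes it directly in Python with the standard xml.etree.ElementTree module. It is not XMAS and not what BBQ generates. The sample documents are made up for illustration (and omit ID attributes), and the function name `join_students_interns` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up sample data shaped like the CSEStudents and Interns DTDs.
students_xml = """<CSEStudents>
  <CSEStudent><name>Ann</name><advisor>Lee</advisor><degree>PhD</degree></CSEStudent>
  <CSEStudent><name>Bob</name><degree>MS</degree></CSEStudent>
</CSEStudents>"""

interns_xml = """<Interns>
  <Intern><name>Ann</name><supervisor>Lee</supervisor><sponsor>Acme</sponsor></Intern>
  <Intern><name>Bob</name><supervisor>Kim</supervisor><sponsor>Acme</sponsor></Intern>
</Interns>"""


def join_students_interns(students: ET.Element, interns: ET.Element) -> list:
    """Names of students who are also interns and whose advisor is their supervisor."""
    # Join keys taken from the intern side: (name, supervisor).
    keys = {(i.findtext("name"), i.findtext("supervisor")) for i in interns.iter("Intern")}
    return [s.findtext("name") for s in students.iter("CSEStudent")
            if s.find("advisor") is not None          # advisor is optional in the DTD
            and (s.findtext("name"), s.findtext("advisor")) in keys]


result = join_students_interns(ET.fromstring(students_xml), ET.fromstring(interns_xml))
```

Here two join conditions are combined: name = name and advisor = supervisor, which in BBQ would correspond to two drag-and-drop actions between leaf elements.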


Second: construct the head

Construct a tree that the answer document(s) must conform to, called the head or query result tree. The right panel of BBQ’s main window is where the head is built.

The head is composed of elements (and their sub-trees) dragged from source DTDs, and tags created on the spot with the “Create New Child” popup menu item.

Ordering and group-by operators are also used in the creation of the head.


Third and fourth steps:

BBQ converts the visual layout into the XMAS query language, contacts the MIX mediator and submits the query.

Finally, BBQ generates a DTD for the query result, and it is displayed with the corresponding data.


[Figure: diagram of the BBQ interface, the MIX mediator, a wrapper, and an OODB database, with the labels “Query in XMAS” and “XML result, DTD”.]


Important things to remember about the BBQ

Enables the query creator to construct queries in an easy and graphically oriented way.

Graphically supports all the features of the XMAS query language.

Supports blended browsing and querying.

Accurate representation of DTDs and XML data.


Allows graphical representation of the query result as well.

The DTD for the result XML page of the given query is created by the DTD-inference mechanism.

Because of that, we may treat the query result as any other XML source we use (so we may use this result as one of the sources used to build new queries).


This is usually the case when we want to get some information from the internet. We don’t know exactly what we are looking for, and the results of the first queries point us towards the goal of our search.



Selected bibliography

Enhancing Semistructured Data Mediators with Document Type Definitions, by Yannis Papakonstantinou, Pavel Velikhov

BBQ: A Visual Interface for Integrated Browsing and Querying of XML, by Kevin D. Munroe, Yannis Papakonstantinou

XML-Based Information Mediation with MIX, by Chaitanya Baru, Amarnath Gupta, Bertram Ludäscher

Introduction to XMAS, by the XMAS sub-group of MIX