Managing XML and Semistructured Data

Lecture 17: Publishing XML Data From Relations

Prof. Dan Suciu

Spring 2001

In this lecture
• XML Publishing Example
• XML Publishing Languages
• Virtual XML Publishing
• Materialized XML Publishing (next time)

Resources
• SilkRoute: Trading between relations and XML, by Fernandez, Suciu, Tan, in WWW9, 2000
• Efficient Evaluation of XML Middle-ware Queries, in SIGMOD'2001

XML Publishing

Today:
• Legacy data
  – fragmented into many flat relations
  – 3rd normal form
  – proprietary
• XML data
  – nested
  – un-normalized
  – public (450 schemas at www.biztalk.org)

XML Publishing: an Example

Legacy data in E/R:

[E/R diagram: entities Eu-Stores (euSid, name, country), US-Stores (usSid, name, url), and Products (pid, name, priceUSD); relationships Eu-Sales (date) between Eu-Stores and Products, and US-Sales (date, tax) between US-Stores and Products.]

XML Publishing: an Example

• XML view

  <allsales>
    <country>
      <name> France </name>
      <store>
        <name> Nicolas </name>
        <product>
          <name> Blanc de Blanc </name>
          <sold> 10/10/2000 </sold>
          <sold> 12/10/2000 </sold>
          …
        </product>
        <product> … </product>
        …
      </store>
      …
    </country>
    …
  </allsales>

• In summary: group by country, then store, then product
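The grouping in the XML view can be sketched in a few lines of Python. This is only an illustration, not how any of the systems in this lecture work: the function name `publish` and the sample rows are assumptions, and the input is taken to be a flat join result of (country, store, product, date sold).

```python
from itertools import groupby
from xml.sax.saxutils import escape

# Hypothetical flat join result: (country, store, product, date sold).
rows = [
    ("France", "Nicolas", "Blanc de Blanc", "10/10/2000"),
    ("France", "Nicolas", "Blanc de Blanc", "12/10/2000"),
    ("France", "Nicolas", "Saint Emilion", "11/10/2000"),
]

def publish(rows):
    """Nest flat rows as allsales / country / store / product / sold."""
    out = ["<allsales>"]
    rows = sorted(rows)  # groupby only merges adjacent rows with equal keys
    for country, by_country in groupby(rows, key=lambda r: r[0]):
        out.append(f"<country><name>{escape(country)}</name>")
        for store, by_store in groupby(by_country, key=lambda r: r[1]):
            out.append(f"<store><name>{escape(store)}</name>")
            for product, by_product in groupby(by_store, key=lambda r: r[2]):
                out.append(f"<product><name>{escape(product)}</name>")
                for row in by_product:
                    out.append(f"<sold>{escape(row[3])}</sold>")
                out.append("</product>")
            out.append("</store>")
        out.append("</country>")
    out.append("</allsales>")
    return "".join(out)

doc = publish(rows)
```

Sorting first is what makes the three nested group-by levels work: each level only has to compare a row with its predecessor.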

Output "schema":

  allsales: country*
  country:  name, store*
  store:    name, url?, product*
  product:  name, sold*
  sold:     date, tax?
  name, url, date, tax: PCDATA

XML Publishing

Need a language for specifying the relational-to-XML mapping

• SilkRoute:
  – a SQL/XML-QL blend
• IBM (formerly the XPERANTO project)
  – an extension of SQL
• SQL Server:
  – "FOR XML": an extension of SQL
  – XDRs

XML Publishing: SilkRoute

In SilkRoute [Fernandez, Suciu, Tan ’00]

  { FROM EuStores $S, EuSales $L, Products $P
    WHERE $S.euSid = $L.euSid AND $L.pid = $P.pid
    CONSTRUCT
      <allsales()>
        <country($S.country)>
          <name> $S.country </name>
          <store($S.euSid)>
            <name> $S.name </name>
            <product($P.pid)>
              <name> $P.name </name>
              <price> $P.priceUSD </price>
            </product>
          </store>
        </country>
      </allsales>
  } /* union … */

XML Publishing: SilkRoute

  … /* union */
  { FROM USStores $S, USSales $L, Products $P
    WHERE $S.usSid = $L.usSid AND $L.pid = $P.pid
    CONSTRUCT
      <allsales()>
        <country("USA")>
          <name> USA </name>
          <store($S.usSid)>
            <name> $S.name </name>
            <url> $S.url </url>
            <product($P.pid)>
              <name> $P.name </name>
              <price> $P.priceUSD </price>
              <tax> $L.tax </tax>
            </product>
          </store>
        </country>
      </allsales>
  }

Internal Representation

View Tree:

  allsales()
    country(c)
      name(c): c
      store(c,x)
        name(n): n
        url(c,x,u): u
        product(c,x,y)
          name(n): n
          sold(c,x,y,d)
            date(c,x,y,d): d
            Tax(c,x,y,d,t): t

Edges are labeled * (zero or more) or ? (optional).

Each node is defined by non-recursive datalog rules (SELECT DISTINCT …):

  allsales() :-
  country(c) :- EuStores(x,_,c), EuSales(x,y,_), Products(y,_,_)
  country("USA") :-
  store(c,x) :- EuStores(x,_,c), EuSales(x,y,_), Products(y,_,_)
  store(c,x) :- USStores(x,_,_), USSales(x,y,_), Products(y,_,_), c="USA"
  url(c,x,u) :- USStores(x,_,u), USSales(x,y,_), Products(y,_,_), c="USA"
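To make the datalog reading concrete, here is a small Python sketch that evaluates the two store(c,x) rules over in-memory sets. The sample tuples and the function name `store` are assumptions made for illustration; only the relation names and column positions come from the rules. Set semantics gives the SELECT DISTINCT behaviour for free.

```python
# Hypothetical sample instances; column order follows the datalog rules.
EuStores = {("e1", "Nicolas", "France")}                # (euSid, name, country)
USStores = {("u1", "WineShop", "http://example.com")}   # (usSid, name, url)
EuSales = {("e1", "p1", "10/10/2000"),
           ("e1", "p1", "12/10/2000")}                  # (euSid, pid, date)
USSales = {("u1", "p1", "11/10/2000")}                  # (usSid, pid, date)
Products = {("p1", "Blanc de Blanc", 23.99)}            # (pid, name, priceUSD)

def store():
    """store(c,x): union of the European rule and the US rule."""
    product_ids = {pid for (pid, _name, _price) in Products}
    eu = {(c, x)
          for (x, _name, c) in EuStores
          for (sid, y, _date) in EuSales
          if sid == x and y in product_ids}
    us = {("USA", x)
          for (x, _name, _url) in USStores
          for (sid, y, _date) in USSales
          if sid == x and y in product_ids}
    return eu | us  # a set, so duplicates from multiple sales disappear

stores = store()
```

Store e1 has two sales of the same product, yet it appears only once in the result, which is what the tree node needs: one store element per distinct (c, x).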


XML Publishing : IBM

• XPERANTO: Publishing Object-Relational Data as XML, Carey, Florescu, Ives, Lu, Shanmugasundaram, Shekita, Subramanian, WebDB’2000

• Efficiently Publishing Relational Data as XML Documents, Shanmugasundaram, Shekita, Barr, Carey, Lindsay, Pirahesh, Reinwald, VLDB’2000

XML Publishing: IBM

SQL + user-defined functions

  (Select S.name,
          STORE(S.euSid, S.name,
                (Select XMLAGG(PRODUCT(P.pid, P.name, P.priceUSD))
                 From EuSales L, Products P
                 Where S.euSid = L.euSid AND L.pid = P.pid))
   From EuStores S)
  Union All
  . . . . .

  Define XML Constructor STORE(storeID: integer, name: varchar(20), prodList: xml) AS {
    <store id=$storeID>
      <name> $name </name>
      $prodList
    </store>
  }

  Define XML Constructor PRODUCT( ...) AS { . . . }
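The division of labour between the constructor and XMLAGG can be mimicked in Python. This is a loose analogy under stated assumptions, not the IBM implementation: `xmlagg` and `store` are hypothetical helper names, and because the PRODUCT constructor is elided on the slide, the product fragments below are placeholder strings.

```python
from xml.sax.saxutils import escape, quoteattr

def xmlagg(fragments):
    """Concatenate the XML fragments of one group, as an aggregate would."""
    return "".join(fragments)

def store(store_id, name, prod_list):
    """Analogue of the STORE constructor: an id, a name, a product list."""
    return (f"<store id={quoteattr(str(store_id))}>"
            f"<name>{escape(name)}</name>{prod_list}</store>")

# Placeholder fragments standing in for PRODUCT(...) results (body elided in the source).
products = ["<product/>", "<product/>"]
doc = store(17, "Nicolas", xmlagg(products))
```

The constructor is an ordinary scalar function, while the nesting comes entirely from the aggregate over the correlated subquery.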


XML Publishing : SQL Server

Three modes

• RAW mode

• Auto Mode

• Explicit Mode

XML Publishing: SQL Server, RAW Mode

  Select S.euSid, P.name, P.price
  From Stores S, EuSales L, Products P
  Where S.euSid = L.euSid AND L.pid = P.pid
  For XML Raw

  <row euSid="SLKDJFS" name="Saint Emilion" price="23.99"/>
  <row euSid="DRJLKSD" name="Loire" price="12.99"/>
  . . . .

• flat XML
• default tag and attribute names
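RAW mode amounts to one element per result tuple. A minimal Python sketch of that mapping follows; `for_xml_raw` is a hypothetical name, and skipping None values is an assumption of this sketch rather than a statement about SQL Server.

```python
from xml.sax.saxutils import quoteattr

def for_xml_raw(columns, rows):
    """Emit one flat <row .../> per tuple, one attribute per column."""
    lines = []
    for row in rows:
        attrs = " ".join(
            f"{col}={quoteattr(str(val))}"
            for col, val in zip(columns, row)
            if val is not None  # assumption: NULLs produce no attribute
        )
        lines.append(f"<row {attrs}/>")
    return "\n".join(lines)

raw = for_xml_raw(
    ["euSid", "name", "price"],
    [("SLKDJFS", "Saint Emilion", "23.99"), ("DRJLKSD", "Loire", "12.99")],
)
```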

XML Publishing: SQL Server, Auto Mode

  Select S.euSid, P.name, P.price
  From Stores S, EuSales L, Products P
  Where S.euSid = L.euSid AND L.pid = P.pid
  For XML Auto

  <Stores euSid="SLKDJFS">
    <Products name="Saint Emilion" price="23.99"/>
    <Products name="Loire" price="12.99"/>
  </Stores>
  <Stores euSid="FGJISOD">
    . . . .
  </Stores>
  . . .

• nested XML
• default tag and attribute names
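The nesting in Auto mode can be approximated by opening a new Stores element whenever the store key changes between adjacent rows. The Python sketch below assumes rows arrive grouped by euSid; `for_xml_auto` is a hypothetical name, the third sample row is made up, and the real SQL Server heuristics are more general than this.

```python
from xml.sax.saxutils import quoteattr

def for_xml_auto(rows):
    """rows: (euSid, name, price) tuples, assumed grouped by euSid."""
    out, current = [], None
    for eu_sid, name, price in rows:
        if eu_sid != current:          # store key changed: start a new parent
            if current is not None:
                out.append("</Stores>")
            out.append(f"<Stores euSid={quoteattr(eu_sid)}>")
            current = eu_sid
        out.append(f"  <Products name={quoteattr(name)} "
                   f"price={quoteattr(str(price))}/>")
    if current is not None:
        out.append("</Stores>")
    return "\n".join(out)

auto = for_xml_auto([
    ("SLKDJFS", "Saint Emilion", "23.99"),
    ("SLKDJFS", "Loire", "12.99"),
    ("FGJISOD", "Databases", "49.99"),
])
```

Tag names come from the table names and attribute names from the column names, which is why the user has no control over them in this mode.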

XML Publishing: SQL Server, Explicit Mode

• Nested XML

• User defined tags and attributes

• Idea: write SQL queries with complex column names

• Ad-hoc, order dependent semantics

XML Publishing: SQL Server, Explicit Mode

  (Select 1 as Tag, null as Parent,
          S.euSid as [Store!1!id],
          S.name as [Store!1!name!element],
          null as [Product!2!name!element],
          null as [Product!2!price!element]
   From Stores S)
  Union All
  (Select 2 as Tag, 1 as Parent,
          S.euSid as [Store!1!id],
          null as [Store!1!name!element],
          P.name as [Product!2!name!element],
          P.price as [Product!2!price!element]
   From Stores S, EuSales L, Products P
   Where S.euSid = L.euSid AND L.pid = P.pid)
  Order By [Store!1!id]

XML Publishing: SQL Server, Explicit Mode

• All column names are legal SQL names
  – Special form: [tagname!k!something],
  – Or [tagname!k!something!element]
  – Or other variations...
• Hence everything is legal SQL
• But what does it mean ?
  – Construct the universal table first
  – Then process the table sequentially

XML Publishing: SQL Server, Explicit Mode

Universal table:

  Tag  Parent  Store!1!id  Store!1!name!element  Product!2!name!element  Product!2!price!element
  1            ABCDE       Nicolas
  2    1       ABCDE                             Saint Emilion           23.99
  2    1       ABCDE                             Loire                   12.99
  1            FDKLS       FNAC
  2    1       FDKLS                             Databases               49.99
  . . .

XML Publishing: SQL Server, Explicit Mode

Converting universal table to XML:

• scan each row sequentially
• let Tag=k
  – look up only columns with that tag
  – all are called [tagname!k!something] with the same tagname
    • What happens if one has a different tagname ?
  – Create an element called tagname
    • Output <tagname>
  – Columns become its children
    • Either subelements or attributes
• if Parent is specified, last element with that tag is the parent
• otherwise it is a root element
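The sequential scan described above can be sketched in Python. This is an illustrative reading of the slide, not SQL Server's implementation: `universal_table_to_xml` is a hypothetical name, column names are assumed to have three parts (attribute) or four parts ending in `element` (subelement), and all columns for one tag are assumed to share the same tagname.

```python
from xml.sax.saxutils import escape, quoteattr

def universal_table_to_xml(columns, rows):
    """columns: names after Tag/Parent; rows: (tag, parent, *values)."""
    specs = [c.split("!") for c in columns]
    out, stack = [], []  # stack holds (tag number, tagname) of open elements
    for tag, parent, *values in rows:
        # Close open elements until the parent is on top of the stack.
        # For a root row (parent is None) this closes everything.
        while stack and stack[-1][0] != parent:
            out.append(f"</{stack.pop()[1]}>")
        # Look up only the columns whose tag number matches this row.
        mine = [(s, v) for s, v in zip(specs, values) if int(s[1]) == tag]
        tagname = mine[0][0][0]
        attrs = "".join(f" {s[2]}={quoteattr(str(v))}"
                        for s, v in mine if len(s) == 3 and v is not None)
        kids = "".join(f"<{s[2]}>{escape(str(v))}</{s[2]}>"
                       for s, v in mine if len(s) == 4 and v is not None)
        out.append(f"<{tagname}{attrs}>{kids}")
        stack.append((tag, tagname))
    while stack:  # close whatever is still open at the end of the table
        out.append(f"</{stack.pop()[1]}>")
    return "".join(out)

columns = ["Store!1!id", "Store!1!name!element",
           "Product!2!name!element", "Product!2!price!element"]
rows = [
    (1, None, "ABCDE", "Nicolas", None, None),
    (2, 1, "ABCDE", None, "Saint Emilion", "23.99"),
    (2, 1, "ABCDE", None, "Loire", "12.99"),
    (1, None, "FDKLS", "FNAC", None, None),
    (2, 1, "FDKLS", None, "Databases", "49.99"),
]
result = universal_table_to_xml(columns, rows)
```

The stack makes the order dependence visible: a row attaches to whichever element with the Parent tag was opened most recently, so the Order By clause in the query decides the shape of the document.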

XML Publishing: SQL Server, Explicit Mode

  <Store id="ABCDE">
    <name> Nicolas </name>
    <Product>
      <name> Saint Emilion </name>
      <price> 23.99 </price>
    </Product>
    <Product>
      <name> Loire </name>
      <price> 12.99 </price>
    </Product>
  </Store>
  <Store id="FDKLS">
    <name> FNAC </name>
    . . .
  </Store>

XML Publishing: SQL Server, Explicit Mode

• Seems complex, but also powerful
• Can construct arbitrarily deeply nested hierarchies
  – How ?
• However, they are very, very limited
  – Why ?