Managing XML and Semistructured Data
Lecture 17: Publishing XML Data From Relations
Prof. Dan Suciu
Spring 2001
In this lecture
• XML Publishing Example
• XML Publishing Languages
• Virtual XML Publishing
• Materialized XML Publishing (next time)
Resources
• SilkRoute: Trading between Relations and XML, by Fernandez, Suciu, Tan, in WWW9, 2000
• Efficient Evaluation of XML Middle-ware Queries, in SIGMOD 2001
XML Publishing
Today:
• Legacy data
  – fragmented into many flat relations
  – 3rd normal form
  – proprietary
• XML data
  – nested
  – un-normalized
  – public (450 schemas at www.biztalk.org)
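XML Publishing: an Example
Legacy data in E/R:
• Entities: Eu-Stores (euSid, name, country), US-Stores (usSid, name, url), Products (pid, name, priceUSD)
• Relationships: Eu-Sales (date) connects Eu-Stores and Products; US-Sales (date, tax) connects US-Stores and Products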
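Read as relations, the diagram gives five tables. A minimal sketch of that schema in SQLite via Python's standard sqlite3 module, assuming the attribute lists above (they agree with the datalog rules later in this lecture); the column types, the function name `make_db`, and the sample rows are illustrative assumptions.

```python
import sqlite3

# Legacy schema read off the E/R diagram. Column types are assumptions.
SCHEMA = """
CREATE TABLE EuStores (euSid TEXT PRIMARY KEY, name TEXT, country TEXT);
CREATE TABLE USStores (usSid TEXT PRIMARY KEY, name TEXT, url TEXT);
CREATE TABLE Products (pid INTEGER PRIMARY KEY, name TEXT, priceUSD REAL);
-- Each relationship becomes a table with both keys plus its own attributes.
CREATE TABLE EuSales (euSid TEXT REFERENCES EuStores, pid INTEGER REFERENCES Products, date TEXT);
CREATE TABLE USSales (usSid TEXT REFERENCES USStores, pid INTEGER REFERENCES Products, date TEXT, tax REAL);
"""

def make_db():
    """Create an in-memory database with the schema and a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO EuStores VALUES ('E1', 'Nicolas', 'France')")
    conn.execute("INSERT INTO USStores VALUES ('U1', 'Wine Shop', 'http://shop.example')")
    conn.execute("INSERT INTO Products VALUES (1, 'Blanc de Blanc', 23.99)")
    conn.executemany("INSERT INTO EuSales VALUES ('E1', 1, ?)",
                     [("10/10/2000",), ("12/10/2000",)])
    conn.execute("INSERT INTO USSales VALUES ('U1', 1, '11/10/2000', 1.92)")
    return conn

if __name__ == "__main__":
    conn = make_db()
    print([r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")])
```

The relationship tables carry no key of their own; a sale is identified by the store, the product and the date.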
XML Publishing: an Example
• XML view:
<allsales>
  <country>
    <name> France </name>
    <store>
      <name> Nicolas </name>
      <product>
        <name> Blanc de Blanc </name>
        <sold> 10/10/2000 </sold>
        <sold> 12/10/2000 </sold>
        …
      </product>
      <product>…</product>
      …
    </store>
    …
  </country>
  …
</allsales>
• In summary: group by country, then by store, then by product
Output “schema”:
allsales
  country *
    name: PCDATA
    store *
      name: PCDATA
      url ?: PCDATA
      product *
        name: PCDATA
        sold *
          date: PCDATA
          tax ?: PCDATA
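Grouping by country, store, and product can be sketched as one sorted flat join followed by nested grouping, opening one element per distinct key at each level. This minimal Python sketch (sqlite3 and xml.etree.ElementTree from the standard library) covers only the European branch of the view, so url and tax are omitted; the table contents and function names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET
from itertools import groupby

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE EuStores (euSid TEXT, name TEXT, country TEXT);
        CREATE TABLE Products (pid INTEGER, name TEXT, priceUSD REAL);
        CREATE TABLE EuSales (euSid TEXT, pid INTEGER, date TEXT);
        INSERT INTO EuStores VALUES ('E1', 'Nicolas', 'France'), ('E2', 'Vinothek', 'Germany');
        INSERT INTO Products VALUES (1, 'Blanc de Blanc', 23.99), (2, 'Riesling', 12.99);
        INSERT INTO EuSales VALUES ('E1', 1, '10/10/2000'), ('E1', 1, '12/10/2000'),
                                   ('E2', 2, '11/10/2000');
    """)
    return conn

def publish(conn):
    # One flat join, sorted so that equal grouping keys are adjacent.
    rows = conn.execute("""
        SELECT S.country, S.euSid, S.name, P.pid, P.name, L.date
        FROM EuStores S, EuSales L, Products P
        WHERE S.euSid = L.euSid AND L.pid = P.pid
        ORDER BY S.country, S.euSid, P.pid, L.date""").fetchall()
    root = ET.Element("allsales")
    for country, c_rows in groupby(rows, key=lambda r: r[0]):
        c_el = ET.SubElement(root, "country")
        ET.SubElement(c_el, "name").text = country
        # Group stores by (euSid, name), products by (pid, name).
        for (_, s_name), s_rows in groupby(c_rows, key=lambda r: (r[1], r[2])):
            s_el = ET.SubElement(c_el, "store")
            ET.SubElement(s_el, "name").text = s_name
            for (_, p_name), p_rows in groupby(s_rows, key=lambda r: (r[3], r[4])):
                p_el = ET.SubElement(s_el, "product")
                ET.SubElement(p_el, "name").text = p_name
                for r in p_rows:                 # one <sold> per sale
                    ET.SubElement(p_el, "sold").text = r[5]
    return root

if __name__ == "__main__":
    print(ET.tostring(publish(make_db()), encoding="unicode"))
```

Sorting first matters: `itertools.groupby` only merges adjacent rows, so an unsorted join would split one store into several `<store>` elements.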
XML Publishing
Need a language for specifying the relational-to-XML mapping:
• SilkRoute:
  – a SQL/XML-QL blend
• IBM (formerly the XPERANTO project):
  – an extension of SQL
• SQL Server:
  – “FOR XML”, an extension of SQL
  – XDR schemas
XML Publishing: SilkRoute
In SilkRoute [Fernandez, Suciu, Tan ’00]:
{ FROM EuStores $S, EuSales $L, Products $P
  WHERE $S.euSid = $L.euSid AND $L.pid = $P.pid
  CONSTRUCT
    <allsales()>
      <country($S.country)>
        <name> $S.country </name>
        <store($S.euSid)>
          <name> $S.name </name>
          <product($P.pid)>
            <name> $P.name </name>
            <price> $P.priceUSD </price>
          </product>
        </store>
      </country>
    </allsales>
}
/* union ….. */
XML Publishing : SilkRoute
….
/* union */
{ FROM USStores $S, USSales $L, Products $P
  WHERE $S.usSid = $L.usSid AND $L.pid = $P.pid
  CONSTRUCT
    <allsales()>
      <country("USA")>
        <name> USA </name>
        <store($S.usSid)>
          <name> $S.name </name>
          <url> $S.url </url>
          <product($P.pid)>
            <name> $P.name </name>
            <price> $P.priceUSD </price>
            <tax> $L.tax </tax>
          </product>
        </store>
      </country>
    </allsales>
}
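In these queries, terms such as country($S.country) and store($S.euSid) act as Skolem functions: two constructs with the same function name and the same arguments denote the same element. That is how the union of the European and US branches merges into one tree instead of repeating elements. A pure-Python sketch of the merge, assuming each branch has already produced flat (country, store, product) tuples and, following the datalog rules of this lecture, passing the parent's arguments along (store(c,x), product(c,x,y)); `publish_union` and the data are hypothetical.

```python
import xml.etree.ElementTree as ET

def publish_union(branches):
    """Merge the flat rows of several query branches into one XML tree.

    Elements are memoized by (label, arguments), mimicking Skolem terms:
    the same term always denotes the same element.
    """
    root = ET.Element("allsales")
    memo = {}

    def skolem(label, args, parent):
        # Return the element for label(args), creating it on first use only.
        key = (label, args)
        if key not in memo:
            elem = ET.SubElement(parent, label)
            elem.set("key", ",".join(str(a) for a in args))  # show the arguments
            memo[key] = elem
        return memo[key]

    for rows in branches:
        for country, store, pid in rows:
            c = skolem("country", (country,), root)
            s = skolem("store", (country, store), c)
            skolem("product", (country, store, pid), s)
    return root

eu_rows = [("France", "E1", 1), ("France", "E1", 2), ("France", "E2", 1)]
us_rows = [("USA", "U1", 1), ("USA", "U1", 1)]   # repeated tuple, one element
tree = publish_union([eu_rows, us_rows])
```

Because the memo key includes the parent's arguments, product 1 sold by E1 and by E2 yields two distinct `<product>` elements, one under each store.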
Internal Representation
View Tree:
allsales()
  country(c) *
    name(c): PCDATA c
    store(c,x) *
      name(n): PCDATA n
      url(c,x,u) ?: PCDATA u
      product(c,x,y) *
        name(n): PCDATA n
        sold(c,x,y,d) *
          date(c,x,y,d): PCDATA d
          tax(c,x,y,d,t) ?: PCDATA t
Each node is defined by a non-recursive datalog rule (i.e., a SELECT DISTINCT … query):
allsales() :-
country(c) :- EuStores(x,_,c), EuSales(x,y,_), Products(y,_,_)
country("USA") :-
store(c,x) :- EuStores(x,_,c), EuSales(x,y,_), Products(y,_,_)
store(c,x) :- USStores(x,_,_), USSales(x,y,_), Products(y,_,_), c="USA"
url(c,x,u) :- USStores(x,_,u), USSales(x,y,_), Products(y,_,_)
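Each rule is a conjunctive query, so it maps to SELECT DISTINCT over a join, and rules with the same head (the two store(c,x) rules) combine with UNION, which keeps set semantics. A sketch of that translation for store(c,x) in SQLite via Python, assuming the column orders EuStores(euSid, name, country), USStores(usSid, name, url), Products(pid, name, priceUSD), EuSales(euSid, pid, date), USSales(usSid, pid, date, tax) implied by the rules; the data is made up.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE EuStores (euSid TEXT, name TEXT, country TEXT);
        CREATE TABLE USStores (usSid TEXT, name TEXT, url TEXT);
        CREATE TABLE Products (pid INTEGER, name TEXT, priceUSD REAL);
        CREATE TABLE EuSales (euSid TEXT, pid INTEGER, date TEXT);
        CREATE TABLE USSales (usSid TEXT, pid INTEGER, date TEXT, tax REAL);
        INSERT INTO EuStores VALUES ('E1', 'Nicolas', 'France'), ('E2', 'No Sales Yet', 'France');
        INSERT INTO USStores VALUES ('U1', 'Wine Shop', 'http://shop.example');
        INSERT INTO Products VALUES (1, 'Blanc de Blanc', 23.99);
        INSERT INTO EuSales VALUES ('E1', 1, '10/10/2000'), ('E1', 1, '12/10/2000');
        INSERT INTO USSales VALUES ('U1', 1, '11/10/2000', 1.92);
    """)
    return conn

# store(c,x) :- EuStores(x,_,c), EuSales(x,y,_), Products(y,_,_)
# store(c,x) :- USStores(x,_,_), USSales(x,y,_), Products(y,_,_), c="USA"
# Shared variables become join predicates; UNION merges the two rules
# and, like DISTINCT, removes duplicate tuples.
STORE_SQL = """
SELECT DISTINCT S.country AS c, S.euSid AS x
FROM EuStores S, EuSales L, Products P
WHERE S.euSid = L.euSid AND L.pid = P.pid
UNION
SELECT DISTINCT 'USA' AS c, S.usSid AS x
FROM USStores S, USSales L, Products P
WHERE S.usSid = L.usSid AND L.pid = P.pid
"""

if __name__ == "__main__":
    print(sorted(make_db().execute(STORE_SQL).fetchall()))
```

E2 has no sales, so it yields no store(c,x) tuple, and E1's two sales collapse into a single store("France", "E1").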
XML Publishing : IBM
• XPERANTO: Publishing Object-Relational Data as XML, Carey, Florescu, Ives, Lu, Shanmugasundaram, Shekita, Subramanian, WebDB’2000
• Efficiently Publishing Relational Data as XML Documents, Shanmugasundaram, Shekita, Barr, Carey, Lindsay, Pirahesh, Reinwald, VLDB’2000
XML Publishing : IBM
(Select S.name,
        STORE(S.euSid, S.name,
              (Select XMLAGG(PRODUCT(P.pid, P.name, P.priceUSD))
               From EuSales L, Products P
               Where S.euSid = L.euSid AND L.pid = P.pid))
 From EuStores S)
Union All
. . . . .
Define XML Constructor STORE(storeID: integer, name: varchar(20), prodList: xml) AS {
  <store id=$storeID>
    <name> $name </name>
    $prodList
  </store>
}
Define XML Constructor PRODUCT( ...) AS { . . .}
SQL + user-defined functions
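The "SQL + user-defined functions" idea can be imitated in SQLite, whose Python binding registers Python callables as scalar functions (`create_function`) and aggregates (`create_aggregate`). In this rough sketch STORE and PRODUCT return XML fragments as strings and XMLAGG concatenates them; the function bodies, the string-typed XML, and the data are assumptions for illustration, not the XPERANTO implementation.

```python
import sqlite3
from xml.sax.saxutils import escape, quoteattr

def product(pid, name, price):
    # PRODUCT(pid, name, priceUSD): build a <product> fragment as a string
    return (f"<product id={quoteattr(str(pid))}><name>{escape(name)}</name>"
            f"<price>{price}</price></product>")

def store(store_id, name, prod_list):
    # STORE(storeID, name, prodList); an empty subquery may yield NULL (None)
    # for a store with no sales, so fall back to an empty product list
    return (f"<store id={quoteattr(str(store_id))}><name>{escape(name)}</name>"
            f"{prod_list or ''}</store>")

class XmlAgg:
    """XMLAGG: concatenate the XML fragments of one group."""
    def __init__(self):
        self.parts = []

    def step(self, fragment):
        if fragment is not None:
            self.parts.append(fragment)

    def finalize(self):
        return "".join(self.parts)

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.create_function("PRODUCT", 3, product)
    conn.create_function("STORE", 3, store)
    conn.create_aggregate("XMLAGG", 1, XmlAgg)
    conn.executescript("""
        CREATE TABLE EuStores (euSid TEXT, name TEXT, country TEXT);
        CREATE TABLE Products (pid INTEGER, name TEXT, priceUSD REAL);
        CREATE TABLE EuSales (euSid TEXT, pid INTEGER, date TEXT);
        INSERT INTO EuStores VALUES ('E1', 'Nicolas', 'France'), ('E2', 'Idle Shop', 'France');
        INSERT INTO Products VALUES (1, 'Blanc de Blanc', 23.99), (2, 'Loire', 12.99);
        INSERT INTO EuSales VALUES ('E1', 1, '10/10/2000'), ('E1', 2, '12/10/2000');
    """)
    return conn

# The slide's first UNION ALL branch, unchanged apart from layout.
QUERY = """
SELECT S.name,
       STORE(S.euSid, S.name,
             (SELECT XMLAGG(PRODUCT(P.pid, P.name, P.priceUSD))
              FROM EuSales L, Products P
              WHERE S.euSid = L.euSid AND L.pid = P.pid))
FROM EuStores S
"""

if __name__ == "__main__":
    for name, xml in make_db().execute(QUERY):
        print(name, xml)
```

The correlated subquery does the nesting: it runs once per store and its aggregate produces that store's product list.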
XML Publishing : SQL Server
Three modes
• RAW mode
• Auto Mode
• Explicit Mode
XML Publishing : SQL Server, RAW Mode
Select S.euSid, P.name, P.price
From Stores S, EuSales L, Products P
Where S.euSid = L.euSid AND L.pid = P.pid
For XML Raw
<row euSid="SLKDJFS" name="Saint Emilion" price="23.99"/>
<row euSid="DRJLKSD" name="Loire" price="12.99"/>
. . . .
• flat XML
• default tag and attribute names
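The RAW-mode rule is small enough to emulate: each result row becomes one `<row>` element and each column an attribute named after the column (NULL columns are skipped, as FOR XML RAW does by default). A Python sketch over a sqlite3 cursor, for illustration only; `for_xml_raw` and the data are hypothetical, and the columns are aliased explicitly because SQLite does not guarantee result-column names otherwise.

```python
import sqlite3
import xml.etree.ElementTree as ET

def for_xml_raw(cursor):
    """Emulate FOR XML RAW: one <row/> per tuple, one attribute per column."""
    names = [d[0] for d in cursor.description]
    out = []
    for values in cursor:
        row = ET.Element("row")
        for name, value in zip(names, values):
            if value is not None:               # NULL columns are left out
                row.set(name, str(value))
        out.append(ET.tostring(row, encoding="unicode"))
    return "".join(out)                         # a fragment, not a document

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Stores (euSid TEXT, name TEXT);
        CREATE TABLE Products (pid INTEGER, name TEXT, price REAL);
        CREATE TABLE EuSales (euSid TEXT, pid INTEGER);
        INSERT INTO Stores VALUES ('SLKDJFS', 'Nicolas'), ('DRJLKSD', 'FNAC');
        INSERT INTO Products VALUES (1, 'Saint Emilion', 23.99), (2, 'Loire', 12.99);
        INSERT INTO EuSales VALUES ('SLKDJFS', 1), ('DRJLKSD', 2);
    """)
    return conn

QUERY = """
SELECT S.euSid AS euSid, P.name AS name, P.price AS price
FROM Stores S, EuSales L, Products P
WHERE S.euSid = L.euSid AND L.pid = P.pid
ORDER BY P.price DESC
"""

if __name__ == "__main__":
    print(for_xml_raw(make_db().execute(QUERY)))
```

Nothing in the rule looks at keys or joins, which is why the result is flat no matter how many tables the query touches.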
XML Publishing : SQL Server, Auto Mode
Select S.euSid, P.name, P.price
From Stores S, EuSales L, Products P
Where S.euSid = L.euSid AND L.pid = P.pid
For XML Auto
<Stores euSid="SLKDJFS">
  <Products name="Saint Emilion" price="23.99"/>
  <Products name="Loire" price="12.99"/>
</Stores>
<Stores euSid="FGJISOD"> . . . . </Stores>
. . .
• nested XML
• default tag and attribute names
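AUTO mode infers the nesting from the SELECT list: roughly, columns from the first table form an outer element, columns from the next table form child elements, and a new outer element starts when the outer values change between consecutive rows. A two-level Python emulation of that heuristic, with the element names passed in explicitly; it simplifies SQL Server's actual rules, and `for_xml_auto` and the data are hypothetical. Because only adjacent rows are compared, the query sorts on the outer key.

```python
import sqlite3
import xml.etree.ElementTree as ET

def for_xml_auto(cursor, outer_tag, outer_cols, inner_tag):
    """Two-level emulation of the FOR XML AUTO nesting heuristic.

    The first `outer_cols` columns describe the outer element, the rest the
    inner one. A new outer element is opened only when the outer values
    differ from those of the previous row.
    """
    names = [d[0] for d in cursor.description]
    outer_names, inner_names = names[:outer_cols], names[outer_cols:]
    roots, current, prev_key = [], None, None
    for values in cursor:
        key = values[:outer_cols]
        if current is None or key != prev_key:
            current = ET.Element(outer_tag, dict(zip(outer_names, map(str, key))))
            roots.append(current)
            prev_key = key
        ET.SubElement(current, inner_tag,
                      dict(zip(inner_names, map(str, values[outer_cols:]))))
    return "".join(ET.tostring(r, encoding="unicode") for r in roots)

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Stores (euSid TEXT, name TEXT);
        CREATE TABLE Products (pid INTEGER, name TEXT, price REAL);
        CREATE TABLE EuSales (euSid TEXT, pid INTEGER);
        INSERT INTO Stores VALUES ('SLKDJFS', 'Nicolas'), ('FGJISOD', 'FNAC');
        INSERT INTO Products VALUES (1, 'Saint Emilion', 23.99), (2, 'Loire', 12.99),
                                    (3, 'Databases', 49.99);
        INSERT INTO EuSales VALUES ('SLKDJFS', 1), ('SLKDJFS', 2), ('FGJISOD', 3);
    """)
    return conn

QUERY = """
SELECT S.euSid AS euSid, P.name AS name, P.price AS price
FROM Stores S, EuSales L, Products P
WHERE S.euSid = L.euSid AND L.pid = P.pid
ORDER BY S.euSid, P.name
"""

if __name__ == "__main__":
    print(for_xml_auto(make_db().execute(QUERY), "Stores", 1, "Products"))
```

Without the ORDER BY, rows of one store could be interleaved with another's, and the same store would then appear as several `<Stores>` elements.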
XML Publishing : SQL Server, Explicit Mode
• Nested XML
• User defined tags and attributes
• Idea: write SQL queries with complex column names
• Ad-hoc, order-dependent semantics
XML Publishing : SQL Server, Explicit Mode
(Select 1 as Tag, null as Parent,
        S.euSid as [Store!1!id],
        S.name as [Store!1!name!element],
        null as [Product!2!name!element],
        null as [Product!2!price!element]
 From Stores S)
Union All
(Select 2 as Tag, 1 as Parent,
        S.euSid as [Store!1!id],
        null as [Store!1!name!element],
        P.name as [Product!2!name!element],
        P.price as [Product!2!price!element]
 From Stores S, EuSales L, Products P
 Where S.euSid = L.euSid AND L.pid = P.pid)
Order By [Store!1!id], Tag
For XML Explicit
XML Publishing : SQL Server, Explicit Mode
• All column names are legal SQL names
  – special form: [tagname!k!something]
  – or [tagname!k!something!element]
  – or other variations...
• Hence everything is legal SQL
• But what does it mean?
  – Construct the universal table first
  – Then process the table sequentially
XML Publishing : SQL Server, Explicit Mode
Universal table:
Tag | Parent | Store!1!id | Store!1!name!element | Product!2!name!element | Product!2!price!element
1   | null   | ABCDE      | Nicolas              | null                   | null
2   | 1      | ABCDE      | null                 | Saint Emilion          | 23.99
2   | 1      | ABCDE      | null                 | Loire                  | 12.99
1   | null   | FDKLS      | FNAC                 | null                   | null
2   | 1      | FDKLS      | null                 | Databases              | 49.99
. . . . . . . . .
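The universal table is the query result before FOR XML EXPLICIT is applied. SQLite also accepts SQL Server-style [bracketed] identifiers, so the slide's column names can be kept as they are; SQLite does not allow parentheses around the members of a UNION ALL, so those are dropped. A sketch with Python's sqlite3, using the stores and products from the table above; product names are added as a last sort key only to make the output deterministic, and `universal_table` is a hypothetical name.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Stores (euSid TEXT, name TEXT);
        CREATE TABLE Products (pid INTEGER, name TEXT, price REAL);
        CREATE TABLE EuSales (euSid TEXT, pid INTEGER);
        INSERT INTO Stores VALUES ('ABCDE', 'Nicolas'), ('FDKLS', 'FNAC');
        INSERT INTO Products VALUES (1, 'Saint Emilion', 23.99), (2, 'Loire', 12.99),
                                    (3, 'Databases', 49.99);
        INSERT INTO EuSales VALUES ('ABCDE', 1), ('ABCDE', 2), ('FDKLS', 3);
    """)
    return conn

# Result column names come from the first SELECT of the UNION ALL.
UNIVERSAL_SQL = """
SELECT 1 AS Tag, NULL AS Parent,
       S.euSid AS [Store!1!id], S.name AS [Store!1!name!element],
       NULL AS [Product!2!name!element], NULL AS [Product!2!price!element]
FROM Stores S
UNION ALL
SELECT 2, 1, S.euSid, NULL, P.name, P.price
FROM Stores S, EuSales L, Products P
WHERE S.euSid = L.euSid AND L.pid = P.pid
ORDER BY [Store!1!id], Tag, [Product!2!name!element]
"""

def universal_table(conn):
    """Return (column names, rows); sorting by Tag puts parents first."""
    cur = conn.execute(UNIVERSAL_SQL)
    return [d[0] for d in cur.description], cur.fetchall()

if __name__ == "__main__":
    cols, rows = universal_table(make_db())
    print(cols)
    for row in rows:
        print(row)
```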
XML Publishing : SQL Server, Explicit Mode
Converting the universal table to XML:
• Scan the rows sequentially; for each row, let Tag = k
  – look up only the columns for tag k
  – all are named [tagname!k!something] with the same tagname
    • What happens if one has a different tagname?
  – create an element called tagname
• Output <tagname>
  – the columns become its children: either subelements or attributes
• If Parent is specified, the most recently output element with that tag is the parent
• Otherwise it is a root element
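The sequential conversion can be sketched in Python. This is a simplified reading of the rules above, with these assumptions: column names are parsed as tagname!k!field with an optional !element directive; plain columns become attributes and !element columns become subelements; NULL values are skipped; the parent is the most recently emitted element whose Tag equals the row's Parent. SQL Server's EXPLICIT mode supports more directives than this, and `explicit_to_xml` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

def explicit_to_xml(columns, rows):
    """Convert a universal table (Tag, Parent, data columns...) to XML text."""
    # "Store!1!name!element" -> (1, "Store", "name", "element")
    specs = []
    for col in columns[2:]:
        parts = col.split("!")
        directive = parts[3] if len(parts) > 3 else None
        specs.append((int(parts[1]), parts[0], parts[2], directive))
    tag_names = {k: name for k, name, _, _ in specs}

    roots = []
    last_by_tag = {}   # most recently emitted element for each Tag value
    for tag, parent, *values in rows:
        elem = ET.Element(tag_names[tag])
        for (k, _, field, directive), value in zip(specs, values):
            if k != tag or value is None:
                continue               # look only at this row's tag; skip NULLs
            if directive == "element":
                ET.SubElement(elem, field).text = str(value)
            else:
                elem.set(field, str(value))
        if parent is None:
            roots.append(elem)         # no Parent: a root element
        else:
            # KeyError if no element with that tag has been emitted yet,
            # i.e. the rows are in the wrong order.
            last_by_tag[parent].append(elem)
        last_by_tag[tag] = elem
    return "".join(ET.tostring(e, encoding="unicode") for e in roots)
```

Feeding a Tag 2 row before any Tag 1 row raises KeyError in this sketch, a small illustration of the order-dependent semantics.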
XML Publishing : SQL Server, Explicit Mode
<Store id="ABCDE">
  <name> Nicolas </name>
  <Product>
    <name> Saint Emilion </name>
    <price> 23.99 </price>
  </Product>
  <Product>
    <name> Loire </name>
    <price> 12.99 </price>
  </Product>
</Store>
<Store id="FDKLS">
  <name> FNAC </name>
  . . .
</Store>
XML Publishing : SQL Server, Explicit Mode
• Seems complex, but also powerful
• Can construct arbitrarily deeply nested hierarchies
  – How?
• However, they are very, very limited
  – Why?