Page 1: Data Exchange with  Data-Metadata Translations

Data Exchange with Data-Metadata Translations


Mauricio A. Hernández
IBM Almaden Research Center

Wang-Chiew Tan
UC Santa Cruz

Paolo Papotti
Università Roma Tre


August 24 -- Auckland, New Zealand

Page 2: Data Exchange with  Data-Metadata Translations


Data-Metadata Translations

• Data exchange scenarios may involve metadata transformations.

– E.g., Pivot/Unpivot in spreadsheets. [example from Miller98]

• Mapping systems support Data-to-Data transformations with fixed schemas.

• Goal: Extend mapping systems to support Data-Metadata Translations.

Page 3: Data Exchange with  Data-Metadata Translations


Mapping Systems

Source schema S

Target schema T

Declarative (internal) representation

Executable code (XSLT, XQuery, Java)

Data exchange

IBM Clio

Altova MapForce

BEA Aqualogic

Page 4: Data Exchange with  Data-Metadata Translations


1. Data and Metadata translations

– Data-to-Metadata

2. Generation Algorithms

– Mapping Generation

– Query Generation

3. Results & Discussion

– Experiments

– Related Work


Page 5: Data Exchange with  Data-Metadata Translations

• Mapping Generation Algorithm: [PVMHF 2002]

– Input: Source and Target schemas, and correspondences.

– Output: declarative schema mapping

• For example:


Source: Rcd
  Sales: SetOf Rcd
    country
    region
    style
    shipdate
    units
    price

Target: Rcd
  CountrySales: SetOf Rcd
    country
    Sales: SetOf Rcd
      style
      shipdate
      units
      id

for $s in Source.Sales
exists $t in Target.CountrySales, $c in $t.Sales
where $ = $ and $ = $ and $c.shipdate = $s.shipdate and $c.units = $s.units

Page 6: Data Exchange with  Data-Metadata Translations


• Query Generation into multiple query languages:

– Input: a data-to-data schema mapping

– Output: a query script (XQuery, XSLT, SQL, etc.)

for $s in Source.Sales
exists $t in Target.CountrySales, $c in $t.Sales
where $ = $ and $ = $ and $c.shipdate = $s.shipdate and $c.units = $s.units

for $x0 in $doc/Source/Sales return ( <CountrySales>
  <country> { $x0/country/text() } </country> …

Page 7: Data Exchange with  Data-Metadata Translations


“State-of-the-art” Metadata-to-Data

How can we transform the following source data into the corresponding target?

Source: Rcd
  Sales: SetOf Rcd
    month
    USA
    UK
    Italy

Target: Rcd
  Sales: SetOf Rcd
    month
    country
    units

Source.Sales
month  USA  UK   Italy
Jan    120  223  89
Feb    83   168  56

Target.Sales
month  country  units
Jan    USA      120
Jan    UK       223
Jan    Italy    89
Feb    USA      83
Feb    UK       168
Feb    Italy    56

Schema mapping m1

m1: for $s in Source.Sales
    exists $t in Target.Sales
    where $t.month = $s.month and $ = “USA” and $t.units = $s.USA


Page 8: Data Exchange with  Data-Metadata Translations


“State-of-the-art” Metadata-to-Data

Schema mapping m2

m2: for $s in Source.Sales
    exists $t in Target.Sales
    where $t.month = $s.month and $ = “UK” and $t.units = $s.UK


Page 9: Data Exchange with  Data-Metadata Translations


“State-of-the-art” Metadata-to-Data

Schema mapping m3

m3: for $s in Source.Sales
    exists $t in Target.Sales
    where $t.month = $s.month and $ = “Italy” and $t.units = $s.Italy


Page 10: Data Exchange with  Data-Metadata Translations


Metadata-to-Data: Our solution

Source: Rcd
  Sales: SetOf Rcd
    month
    USA
    UK
    Italy

Target: Rcd
  Sales: SetOf Rcd
    month
    country
    units

Placeholder: countries (label, value)

Callouts on the schemas and the mapping:

– Select the elements to group

– Copy elements’ labels

– Set of labels (strings)

– Dynamic selection of the source

– Is a label value

Source.Sales
month  USA  UK   Italy
Jan    120  223  89
Feb    83   168  56

Target.Sales
month  country  units
Jan    USA      120
Jan    UK       223
Jan    Italy    89
Feb    USA      83
Feb    UK       168
Feb    Italy    56

MetadatA-Data (MAD) mapping:

for $s in Source.Sales, $c in {“USA”, “UK”, “Italy”}
exists $t in Target.Sales
where $t.month = $s.month and $ = $c and $t.units = $s.($c)
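The MAD mapping on this slide can be read as a nested loop: one pass over the source tuples and, for each tuple, one pass over the set of labels. The following is a minimal Python sketch of that reading, assuming tuples are represented as dictionaries. The function name `unpivot` and its parameter names are illustrative and are not part of Clio.

```python
def unpivot(rows, labels, key="month", label_field="country", value_field="units"):
    """Turn each (source tuple, label) pair into one target tuple."""
    target = []
    for s in rows:                    # for $s in Source.Sales
        for c in labels:              # $c in {"USA", "UK", "Italy"}
            target.append({
                key: s[key],          # $t.month = $s.month
                label_field: c,       # the label itself is copied as a data value
                value_field: s[c],    # $t.units = $s.($c): dynamic field selection
            })
    return target


source_sales = [
    {"month": "Jan", "USA": 120, "UK": 223, "Italy": 89},
    {"month": "Feb", "USA": 83, "UK": 168, "Italy": 56},
]
target_sales = unpivot(source_sales, ["USA", "UK", "Italy"])
```

A single loop over the label set replaces the three fixed mappings m1, m2 and m3, so adding a country only changes the label set.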

Page 11: Data Exchange with  Data-Metadata Translations


Now we want to support the opposite operation [example from Miller98]

Source: Rcd
  StockTicker: SetOf Rcd
    time
    symbol
    price

Target: Rcd
  Stockquotes: SetOf Rcd
    time
    symbols (dynamic)
      label
      value

The target schema depends on the source data.

We define a target template: Nested Dynamic Output Schemas (ndos).

Run-time: The dynamic element defines the target instance and the target schema.


Page 12: Data Exchange with  Data-Metadata Translations

Data-to-Metadata: Heterogeneous records

Source: Rcd
  StockTicker: SetOf Rcd
    time
    symbol
    price

Target: Rcd
  Stockquotes: SetOf Rcd
    time
    symbols
      label
      value

Consider this mapping and this source instance:

Source Instance
  StockTicker (time: 0900, Symbol: MSFT, Price: 27.20)
  StockTicker (time: 0900, Symbol: IBM, Price: 120.00)
  StockTicker (time: 0905, Symbol: MSFT, Price: 27.30)

There are two possible interpretations for the target ndos:

First alternative: Heterogeneous target records

Computed Target Schema
  Target: Rcd
    Stockquotes: SetOf Rcd
      time
      symbols: Choice
        MSFT
        IBM

Computed Target Instance
  Stockquotes (time: 0900, MSFT: 27.20)
  Stockquotes (time: 0900, IBM: 120.00)
  Stockquotes (time: 0905, MSFT: 27.30)
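The heterogeneous interpretation can be sketched in a few lines of Python: each source tuple produces one target record whose field name is taken from a source value. This is an illustration under the assumption that records are dictionaries; `pivot_heterogeneous` is a hypothetical name, not a Clio function.

```python
def pivot_heterogeneous(rows, key="time", label_field="symbol", value_field="price"):
    """One target record per source tuple; the symbol value becomes a field name."""
    return [{key: r[key], r[label_field]: r[value_field]} for r in rows]


stock_ticker = [
    {"time": "0900", "symbol": "MSFT", "price": 27.20},
    {"time": "0900", "symbol": "IBM", "price": 120.00},
    {"time": "0905", "symbol": "MSFT", "price": 27.30},
]
stockquotes = pivot_heterogeneous(stock_ticker)
```

Records in the result do not share the same set of fields, which is why the computed target schema uses a Choice over the symbols.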

Page 13: Data Exchange with  Data-Metadata Translations

Data-to-Metadata: Homogeneous records

Source: Rcd
  StockTicker: SetOf Rcd
    time
    symbol
    price

Target: Rcd
  Stockquotes: SetOf Rcd
    time
    symbols
      label
      value

Consider this mapping and this source instance:

Source Instance
  StockTicker (time: 0900, Symbol: MSFT, Price: 27.20)
  StockTicker (time: 0900, Symbol: IBM, Price: 120.00)
  StockTicker (time: 0905, Symbol: MSFT, Price: 27.30)

There are two possible interpretations for the target ndos:

Second alternative: Homogeneous target records

Computed Target Schema
  Target: Rcd
    Stockquotes: SetOf Rcd
      time
      MSFT
      IBM

Computed Target Instance
  Stockquotes (time: 0900, MSFT: 27.20, IBM: null)
  Stockquotes (time: 0900, MSFT: null, IBM: 120.00)
  Stockquotes (time: 0905, MSFT: 27.30, IBM: null)
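The homogeneous interpretation needs the full set of labels before any record can be emitted, because every record must carry every label. A minimal Python sketch, assuming dictionary records and using `None` for null; `pivot_homogeneous` is a hypothetical name introduced for illustration.

```python
def pivot_homogeneous(rows, key="time", label_field="symbol", value_field="price"):
    """Pivot so that every target record carries every label, padded with None."""
    # First pass: collect every value that becomes a target field, in first-seen order.
    labels = list(dict.fromkeys(r[label_field] for r in rows))
    target = []
    for r in rows:
        record = {key: r[key]}
        for label in labels:
            # Only the tuple's own symbol gets a value; the other labels are null.
            record[label] = r[value_field] if r[label_field] == label else None
        target.append(record)
    return labels, target


stock_ticker = [
    {"time": "0900", "symbol": "MSFT", "price": 27.20},
    {"time": "0900", "symbol": "IBM", "price": 120.00},
    {"time": "0905", "symbol": "MSFT", "price": 27.30},
]
labels, stockquotes = pivot_homogeneous(stock_ticker)
```

The extra pass over the source is the price of homogeneity: the label set is only known once all source tuples have been seen.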

Page 14: Data Exchange with  Data-Metadata Translations


Data-to-Metadata: Homogeneous records

Source: Rcd
  StockTicker: SetOf Rcd
    time
    symbol
    price

Target: Rcd
  Stockquotes: SetOf Rcd
    time
    symbols
      label
      value

Natural solution for the Relational data model

  Stockquotes (time: 0900, MSFT: 27.20, IBM: null)
  Stockquotes (time: 0900, MSFT: null, IBM: 120.00)
  Stockquotes (time: 0905, MSFT: 27.30, IBM: null)

Homogeneity Constraint: “For every pair of tuples t1 and t2, if a is a label in t1, then a is a label in t2”

for $t1 in Target.Stockquotes, $t2 in Target.Stockquotes, $a in dom($t1)
exists $a’ in dom($t2)
where $a = $a’

Natural solution for semi-structured data models (XSD, DTD, JSON)

  Stockquotes (time: 0900, MSFT: 27.20)
  Stockquotes (time: 0900, IBM: 120.00)
  Stockquotes (time: 0905, MSFT: 27.30)
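The homogeneity constraint on this slide can be checked directly against an instance. The sketch below mirrors the for/exists/where structure of the constraint in Python, assuming records are dictionaries whose keys play the role of dom($t). The name `is_homogeneous` is illustrative.

```python
def is_homogeneous(records):
    """True when every label of any record is also a label of every other record."""
    return all(
        a in t2              # exists $a' in dom($t2) where $a = $a'
        for t1 in records    # for $t1 in Target.Stockquotes
        for t2 in records    # $t2 in Target.Stockquotes
        for a in t1          # $a in dom($t1)
    )


relational = [
    {"time": "0900", "MSFT": 27.20, "IBM": None},
    {"time": "0900", "MSFT": None, "IBM": 120.00},
    {"time": "0905", "MSFT": 27.30, "IBM": None},
]
semi_structured = [
    {"time": "0900", "MSFT": 27.20},
    {"time": "0900", "IBM": 120.00},
    {"time": "0905", "MSFT": 27.30},
]
relational_ok = is_homogeneous(relational)
semi_structured_ok = is_homogeneous(semi_structured)
```

A label bound to null still counts as present, which is what lets the relational instance satisfy the constraint.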

Page 15: Data Exchange with  Data-Metadata Translations


MAD Mapping Generation

Source: Rcd
  Sales: SetOf Rcd
    country
    region
    style
    shipdate
    units
    price

Target: Rcd
  ByShipdateCountry: SetOf Choice
    dates
      label1
      value1: Rcd
        countries
          label2
          value2: SetOf Rcd
            style
            units
            price

Source.Sales
country  region  style  shipdate  units  price
USA      East    Tee    12-07     11     1200
USA      East    Elec.  12-07     12     3600
USA      West    Tee    01-08     10     1600
UK       West    Tee    02-08     12     2000

<ByShipDateCountry>
  <12-07>
    <USA> <style>Tee</style><units>11</units><price>1200</price> </USA>
    <USA> <style>Elec.</style><units>12</units><price>3600</price> </USA>
  </12-07>
  <01-08>
    <USA> <style>Tee</style><units>10</units><price>1600</price> </USA>
  </01-08>
  <02-08>
    <UK> <style>Tee</style><units>12</units><price>2000</price> </UK>
  </02-08>
</ByShipDateCountry>
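The nested target on this slide uses two dynamic levels: ship dates become the outer labels and countries the inner ones. A rough Python sketch of the same grouping follows, assuming dictionaries stand in for the dynamic elements and a list stands in for the repeated country elements. The function name is hypothetical, and this is not the query Clio generates.

```python
def group_by_shipdate_country(rows):
    """Nest sales under two levels of data-derived labels: shipdate, then country."""
    result = {}
    for s in rows:
        by_country = result.setdefault(s["shipdate"], {})   # label1 comes from shipdate
        records = by_country.setdefault(s["country"], [])   # label2 comes from country
        records.append({"style": s["style"], "units": s["units"], "price": s["price"]})
    return result


sales = [
    {"country": "USA", "region": "East", "style": "Tee", "shipdate": "12-07", "units": 11, "price": 1200},
    {"country": "USA", "region": "East", "style": "Elec.", "shipdate": "12-07", "units": 12, "price": 3600},
    {"country": "USA", "region": "West", "style": "Tee", "shipdate": "01-08", "units": 10, "price": 1600},
    {"country": "UK", "region": "West", "style": "Tee", "shipdate": "02-08", "units": 12, "price": 2000},
]
by_shipdate_country = group_by_shipdate_country(sales)
```

The `region` field is dropped because the target schema has no place for it.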

Page 16: Data Exchange with  Data-Metadata Translations


MAD Mapping Generation

for $s in Source.Sales
exists $t in Target.ByShipdateCountry, $y in dates, $u in case $t of $y, $z in countries, $v in $u.($z)
where $y = $s.shipdate and $z = $ and $ = $ and $v.units = $s.units and $v.price = $s.price and $u.($z) = SK[$s.shipdate,$]

for $s in Source.Sales
exists $t in Target.ByShipdateCountry, $u in case $t of $s.shipdate, $v in $u.($
where $ = $ and $v.units = $s.units and $v.price = $s.price and $u.($ = SK[$s.shipdate,$]

This is what we get from Clio [PVMHF 02]

Three Steps:

1. Modify schemas with dynamic placeholders

2. Compile mappings

3. Simplify mapping

Page 17: Data Exchange with  Data-Metadata Translations



Query Generation: two-phase algorithm [PVMHF 02]

Phase 1: Q1 shreds the source instance I over relational views of the target schema

Phase 2: Q2 assembles the target instance J from the relational views
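The two phases can be illustrated with the Sales to CountrySales example from earlier in the talk. The Python sketch below is a simplification under stated assumptions: `country` is used as the key that links the two flat views (the slides do not show how the generated queries form this key), the target `id` field is omitted, and the names `shred` and `assemble` are illustrative.

```python
def shred(source_sales):
    """Phase 1 (Q1): one flat view per target set; the child view carries the parent key."""
    country_view = sorted({s["country"] for s in source_sales})
    sales_view = [(s["country"], s["style"], s["shipdate"], s["units"]) for s in source_sales]
    return country_view, sales_view


def assemble(country_view, sales_view):
    """Phase 2 (Q2): join the flat views on the parent key to rebuild the nesting."""
    return [
        {
            "country": country,
            "Sales": [
                {"style": style, "shipdate": shipdate, "units": units}
                for (parent, style, shipdate, units) in sales_view
                if parent == country
            ],
        }
        for country in country_view
    ]


source_sales = [
    {"country": "USA", "style": "Tee", "shipdate": "12-07", "units": 11},
    {"country": "USA", "style": "Elec.", "shipdate": "12-07", "units": 12},
    {"country": "UK", "style": "Tee", "shipdate": "02-08", "units": 12},
]
country_sales = assemble(*shred(source_sales))
```

Keeping the intermediate views flat is what lets the first phase be expressed as ordinary relational queries, with nesting deferred to the second phase.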


Page 18: Data Exchange with  Data-Metadata Translations


New Query Generation

Phase 1: Q1 shreds the source instance I over relational views of the target ndos

Phase 2: Q2 assembles the target instance J from the relational views

Q3 computes the target schema T

Q4 is the optional post-processing
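Because the target schema depends on the source data, it has to be computed alongside the instance. The slides do not show how Q3 works, so the following Python sketch only illustrates the idea: given the records produced for a dynamic element, collect the labels that appeared at run time. The function name, the `fixed_labels` parameter and the returned dictionary layout are all assumptions made for this example.

```python
def compute_target_schema(records, fixed_labels=("time",)):
    """Collect the labels contributed by the dynamic element, in first-seen order."""
    dynamic = []
    for record in records:
        for label in record:
            if label not in fixed_labels and label not in dynamic:
                dynamic.append(label)
    return {"fixed": list(fixed_labels), "dynamic": dynamic}


stockquotes = [
    {"time": "0900", "MSFT": 27.20},
    {"time": "0900", "IBM": 120.00},
    {"time": "0905", "MSFT": 27.30},
]
schema = compute_target_schema(stockquotes)
```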

Page 19: Data Exchange with  Data-Metadata Translations


MAD Clio vs. Commercial Tools

[Figure: time [s] versus number of distinct labels (0 to 600). Labels shown: Commercial Tool, Naive query.]

Page 20: Data Exchange with  Data-Metadata Translations

MAD Clio vs. Commercial Tools

[Figure: time [s] versus number of distinct labels (0 to 600). Labels shown: Naive query, Dynamic query, Static query, Optimized query, MAD Clio.]

48 source labels (10 MB): naïve 183 s, dynamic 14 s, optimized 10 s

Page 21: Data Exchange with  Data-Metadata Translations


MAD Clio Performance

12 target labels (10 MB): naïve 590 s, optimized 80 s [1 phase: 3 s]

Page 22: Data Exchange with  Data-Metadata Translations


Some Related Work

• Lots of related work in the relational setting:

– FIRA/FISQL [Wyss,Robertson 2005] has an excellent survey.

– SchemaSQL [Lakshmanan,Sadri,Subramanian 1996], FIRA/FISQL [Wyss,Robertson 2005]

  • Extensions to SQL to handle metadata as data

  • Only relational dynamic output schemas

  • Language and semantics, NO transformations from GUI

• In XML settings

– HepTox [BCHLP 2005], commercial mapping tools [Altova MapForce, MS, StylusStudio, BEA (Oracle) Aqualogic]

  • No dynamic elements in the target

Page 23: Data Exchange with  Data-Metadata Translations


MAD Clio

Source schema S

Target schema T

Declarative (internal) representation

Executable code (XSLT, XQuery, Java)

New construct to iterate over elements’ labels: placeholder

Target schema can be incomplete: nested dynamic output schema (ndos)

New constructs for the mapping language

New mapping & query generation algorithms

Including a query to generate the target schema.

Data exchange with data-metadata support: Data to Data is a special case

Page 24: Data Exchange with  Data-Metadata Translations


Thank you.



Page 25: Data Exchange with  Data-Metadata Translations


Metadata to Metadata: placeholder to dynamic element

Source: Rcd
  properties: SetOf Rcd
    @name
    @lang
    @date
    …
    @format
    pval

Placeholder: <<attrs>> (label, value)

Target: Rcd
  label1
  value1: SetOf Rcd
    @value
    label2
    value2

Source instance:

  ...<properties name=“price” lang=“en-us” date=“01-01-2008” ... > <pval>48.15</pval></properties> ...

Target instance:

  ...<price value=“48.15” lang=“en-us” date=“01-01-2008” ... /> ...

for $x1 in, $x2 in { @lang, @date, …, @format }
exists $y1 in Target.($x1.@name),
where $y1.@value = $x1.pval and $y1.($x2) = $x1.($x2)
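The metadata-to-metadata case can be tried out on the single element shown on this slide. The sketch below uses Python's standard `xml.etree.ElementTree` module. It assumes every attribute other than `name` is copied unchanged, which matches the example instance but goes beyond what the elided attribute list confirms. `property_to_element` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def property_to_element(prop):
    """Rename a <properties> element after its @name and move <pval> into @value."""
    attrs = {"value": prop.findtext("pval")}      # $y1.@value = $x1.pval
    for label, value in prop.attrib.items():      # iterate over the attribute labels
        if label != "name":
            attrs[label] = value                  # $y1.($x2) = $x1.($x2)
    return ET.Element(prop.attrib["name"], attrs)  # element name taken from $x1.@name


source = ET.fromstring(
    '<properties name="price" lang="en-us" date="01-01-2008"><pval>48.15</pval></properties>'
)
target = property_to_element(source)
```

Here a data value (`price`) becomes an element name while attribute names are carried over as names, so both schema levels are driven by the source.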


Top Related