Data Exchange with Data-Metadata Translations
MAD Algorithm
Paolo Papotti
Mauricio A. Hernández
Wang-Chiew Tan
Data Exchange
“Scientia potentia est”
• What is Data Exchange?
• The process of taking data built under a source schema and transforming it into data built under a target schema
• Data Exchange is the restructuring of data
Data Exchange – why?
1. Today, when companies merge, they also merge their information sources.
2. When several institutions work on a joint venture, a combined database is needed.
3. Refreshing and updating a database schema.
Few problems with data exchange
1. The labels and values in the source schema and the target schema could be very different.
2. Data could be kept in a plethora of ways. For instance, a car price could be stored in shekels or in U.S. dollars.
3. Data could be lost in the exchange process if the source schema and the target schema don't correspond well.
Data Exchange
In the past, data exchange was done manually, consuming many resources such as time and money.
Many researchers are working on ways of improving data exchange.
Antique Car Dealership
Schema Clunkers-R-Us, Clunker table:
Location     List-price  Automobile      Seniority  Agent-name
Belfast, NR  650000      Morris 8        2          Gerry Adams
Newry, NR    500000      Bentley Mark V  1          Martin McGuiness
Schema Buy-A-Wreck, Car AGENTS table:
Id  Name         Car model    Commission
48  Nigel Dodds  Vauxhall 14  0.03
66  Ian Paisley  Ford T       0.04
Schema Buy-A-Wreck, cars table:
Car Model     price    Agent-id
Vauxhall 14   360,000  48
Ford Model T  430,000  66
Matching Examples
Buy-A-Wreck "Name" matches Clunkers-R-Us "Agent-name":
Name         Agent-name
Nigel Dodds  Nigel Dodds
Ian Paisley  Ian Paisley
Buy-A-Wreck "Car model" matches Clunkers-R-Us "Automobile":
Car model    Automobile
Vauxhall 14  Vauxhall 14
Ford T       Ford T
Matching Examples
Schema Buy-A-Wreck, cars table:
Car type      price    Agent-id
Vauxhall 14   360,000  48
Ford Model T  430,000  66
Schema Buy-A-Wreck, Car AGENTS table:
Id  Commission
48  0.03
66  0.04
Schema Clunkers-R-Us:
List-price  Car model
370800      Vauxhall 14
447200      Ford Model T
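The numbers in this matching example are consistent with a query that joins the cars table with the agents table on the agent id and computes List-price as price × (1 + commission): 360,000 × 1.03 = 370,800 and 430,000 × 1.04 = 447,200. The slides do not state this formula, so the Python sketch below is only one reading of the example; the function name `to_clunkers` and the dictionary keys are illustrative.

```python
from decimal import Decimal

# Buy-A-Wreck source tables, copied from the matching example.
cars = [
    {"car_type": "Vauxhall 14", "price": Decimal("360000"), "agent_id": 48},
    {"car_type": "Ford Model T", "price": Decimal("430000"), "agent_id": 66},
]
agents = [
    {"id": 48, "commission": Decimal("0.03")},
    {"id": 66, "commission": Decimal("0.04")},
]


def to_clunkers(cars, agents):
    """Join cars with agents and derive a list price (assumed formula)."""
    commission_by_id = {a["id"]: a["commission"] for a in agents}
    rows = []
    for car in cars:
        rate = commission_by_id[car["agent_id"]]
        # Assumption: list price = price * (1 + commission).
        list_price = car["price"] * (1 + rate)
        rows.append({"list_price": int(list_price), "car_model": car["car_type"]})
    return rows


clunkers = to_clunkers(cars, agents)
```

Decimal arithmetic is used so that the commission rates do not pick up binary floating-point rounding.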
How do we match?
• Creating mappings:
1. Schema matching: find matches
2. Create query expressions: for automated data translation or exchange
Data Exchange
1. There may be no way to transform an instance given all of our constraints.
2. There may be numerous ways to transform the instance (possibly infinitely many).
3. We must identify and justify the solution best suited to our needs.
Source schema S, Target schema T
Data Exchange - Summary
To conclude:
1. Data exchange moves data from a source schema to a target schema.
2. It is a widely encountered problem in the computing world.
3. Some data exchange scenarios deal with metadata.
What is Metadata?
• Metadata: data about data.
• Metadata can come as: video, audio, image, text.
Why do we need Meta-Data?
Meta-Data helps us to understand data.
Can anyone tell what these numbers mean?
Jan  120  223  89
Feb  83   168  56
After adding Meta-Data…
Umbrella Sales
Month  USA  UK   Italy
Jan    120  223  89
Feb    83   168  56
Why do we need Meta-Data?
We all know this picture…
What is this picture all about?
Sir Edward Carson signing the Ulster Covenant
Wall Street, New York City, New York.
Data-Metadata Translations
• Data exchange scenarios may involve metadata transformations.
• Transforming the data in the Stock Ticker table into metadata in the Stock Quotes table is vital in the stock exchange world.
• Mapping systems (such as Clio) support Data-to-Data transformations with fixed schemas.
• Goal: extend mapping systems to support Data-Metadata Translations.
Data Exchange: Clio
• One tool developed for simple, graphical data exchange is “Clio”.
• Clio establishes correspondences between values in the source schema and the target schema.
• However, Clio does not handle data exchange scenarios that involve metadata.
• The solution involving metadata is based on Clio.
Clio interface
Metadata-to-Data
Source: Rcd
  Sales: SetOf Rcd
    month
    USA
    UK
    Italy
Target: Rcd
  Sales: SetOf Rcd
    month
    country
    units
How can we transform the following source data into the corresponding target?
Source.Sales
month  USA  UK   Italy
Jan    120  223  89
Feb    83   168  56
Target.Sales
month  country  units
Jan    USA      120
Jan    UK       223
Jan    Italy    89
Feb    USA      83
Feb    UK       168
Feb    Italy    56
Schema mapping m1 ("USA"):
m1: for $s in Source.Sales
    exists $t in Target.Sales
    where $t.month = $s.month and $t.country = "USA" and $t.units = $s.USA
Schema mapping m2 ("UK"):
m2: for $s in Source.Sales
    exists $t in Target.Sales
    where $t.month = $s.month and $t.country = "UK" and $t.units = $s.UK
Schema mapping m3 ("Italy"):
m3: for $s in Source.Sales
    exists $t in Target.Sales
    where $t.month = $s.month and $t.country = "Italy" and $t.units = $s.Italy
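To make the cost of this approach concrete, the three mappings m1, m2 and m3 can be sketched in Python as one function per country column. This is only an illustration of what the mappings express; the function names and the list-of-dictionaries representation are assumptions, not Clio's generated code.

```python
# Source.Sales instance from the slide: one column per country.
source_sales = [
    {"month": "Jan", "USA": 120, "UK": 223, "Italy": 89},
    {"month": "Feb", "USA": 83, "UK": 168, "Italy": 56},
]


def m1(rows):
    # $t.month = $s.month, $t.country = "USA", $t.units = $s.USA
    return [{"month": s["month"], "country": "USA", "units": s["USA"]} for s in rows]


def m2(rows):
    # $t.month = $s.month, $t.country = "UK", $t.units = $s.UK
    return [{"month": s["month"], "country": "UK", "units": s["UK"]} for s in rows]


def m3(rows):
    # $t.month = $s.month, $t.country = "Italy", $t.units = $s.Italy
    return [{"month": s["month"], "country": "Italy", "units": s["Italy"]} for s in rows]


# The target is a set of records, so the order of the union does not matter.
target_sales = m1(source_sales) + m2(source_sales) + m3(source_sales)
```

Each country label is hard-coded twice, once as a constant and once as a field name, so a fourth country column would need a fourth mapping.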
Metadata-to-Data: Our solution
Source: Rcd
  Sales: SetOf Rcd
    month
    USA
    UK
    Italy
    <<countries>> (placeholder: select the elements to group)
      label (copy elements' labels)
      value (copy elements' values)
Target: Rcd
  Sales: SetOf Rcd
    month
    country
    units
Source.Sales
Jan  120  223  89
Feb  83   168  56
Target.Sales
Jan  USA    120
Jan  UK     223
Jan  Italy  89
Feb  USA    83
Feb  UK     168
Feb  Italy  56
MetadatA-Data (MAD) mapping:
for $s in Source.Sales, $c in {"USA", "UK", "Italy"}
exists $t in Target.Sales
where $t.month = $s.month and $t.country = $c and $t.units = $s.($c)
• {"USA", "UK", "Italy"}: set of labels (strings)
• $c: is a label value
• $s.($c): dynamic selection of the source element
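The single MAD mapping can be read as a nested loop over source records and over the set of labels. The Python sketch below mirrors that reading, with `s[c]` playing the role of the dynamic selection `$s.($c)`; the function name `mad_mapping` and the data representation are assumptions made for illustration.

```python
source_sales = [
    {"month": "Jan", "USA": 120, "UK": 223, "Italy": 89},
    {"month": "Feb", "USA": 83, "UK": 168, "Italy": 56},
]
countries = ["USA", "UK", "Italy"]  # the set of labels that $c ranges over


def mad_mapping(rows, labels):
    target = []
    for s in rows:              # for $s in Source.Sales
        for c in labels:        # $c in {"USA", "UK", "Italy"}
            target.append({
                "month": s["month"],  # $t.month = $s.month
                "country": c,         # $t.country = $c (a label copied as data)
                "units": s[c],        # $t.units = $s.($c) (dynamic selection)
            })
    return target


target_sales = mad_mapping(source_sales, countries)
```

Supporting another country only changes the `countries` list; the mapping itself stays the same.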
Data-to-Metadata
Now we want to support the opposite operation.
Source: Rcd
  StockTicker: SetOf Rcd
    time
    symbol
    price
Target: Rcd
  Stockquotes: SetOf Rcd
    time
    symbols (dynamic element)
      label
      value
• The target schema depends on the source data.
• We define a target template: Nested Dynamic Output Schemas (ndos).
• Run-time: the dynamic element defines the target instance and the target schema.
Data-to-Metadata: Heterogeneous records
Consider this mapping and this source instance:
Source: Rcd
  StockTicker: SetOf Rcd
    time
    symbol
    price
Target: Rcd
  Stockquotes: SetOf Rcd
    time
    symbols
      label
      value
Source Instance:
StockTicker (time: 0900, Symbol: MSFT, Price: 27.20)
StockTicker (time: 0900, Symbol: IBM, Price: 120.00)
StockTicker (time: 0905, Symbol: MSFT, Price: 27.30)
There are two possible interpretations for the target ndos.
First alternative: Heterogeneous target records
Computed Target Schema:
Target: Rcd
  Stockquotes: SetOf Rcd
    time
    symbols: Choice
      MSFT
      IBM
Computed Target Instance:
Stockquotes (time: 0900, MSFT: 27.20)
Stockquotes (time: 0900, IBM: 120.00)
Stockquotes (time: 0905, MSFT: 27.30)
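Under the heterogeneous interpretation, each source record becomes one target record whose second label is taken from the source data. A minimal Python sketch of that idea follows; the function names are hypothetical, and the sketch assumes that no stock symbol is literally named "time".

```python
stock_ticker = [
    {"time": "0900", "symbol": "MSFT", "price": 27.20},
    {"time": "0900", "symbol": "IBM", "price": 120.00},
    {"time": "0905", "symbol": "MSFT", "price": 27.30},
]


def to_heterogeneous(rows):
    """One target record per source record; the symbol value becomes a label."""
    return [{"time": r["time"], r["symbol"]: r["price"]} for r in rows]


def computed_choice(records):
    """Labels contributed by the dynamic element, in order of first appearance."""
    labels = []
    for rec in records:
        for label in rec:
            if label != "time" and label not in labels:
                labels.append(label)
    return labels


stockquotes = to_heterogeneous(stock_ticker)
symbols_choice = computed_choice(stockquotes)
```

Here `symbols_choice` corresponds to the alternatives listed under `symbols: Choice` in the computed target schema, which is only known once the source instance has been read.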
Data-to-Metadata: Homogeneous records
Consider this mapping and this source instance (same source and target schemas as above):
Source Instance:
StockTicker (time: 0900, Symbol: MSFT, Price: 27.20)
StockTicker (time: 0900, Symbol: IBM, Price: 120.00)
StockTicker (time: 0905, Symbol: MSFT, Price: 27.30)
There are two possible interpretations for the target ndos.
Second alternative: Homogeneous target records
Computed Target Schema:
Target: Rcd
  Stockquotes: SetOf Rcd
    time
    MSFT
    IBM
Computed Target Instance:
Stockquotes (time: 0900, MSFT: 27.20, IBM: null)
Stockquotes (time: 0900, MSFT: null, IBM: 120.00)
Stockquotes (time: 0905, MSFT: 27.30, IBM: null)
Data-to-Metadata: Homogeneous records
The homogeneous approach is a MAD improvement.
Homogeneity Constraint: “For every pair of tuples t1 and t2, if a is a label in t1, then a is a label in t2”
Records that satisfy the constraint:
Stockquotes (time: 0900, MSFT: 27.20, IBM: null)
Stockquotes (time: 0900, MSFT: null, IBM: 120.00)
Stockquotes (time: 0905, MSFT: 27.30, IBM: null)
Records that do not satisfy it:
Stockquotes (time: 0900, MSFT: 27.20)
Stockquotes (time: 0900, IBM: 120.00)
Stockquotes (time: 0905, MSFT: 27.30)
Natural solution for semi-structured data models (XSD, DTD, JSON)
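The homogeneity constraint can be checked mechanically, and heterogeneous records can be padded with nulls until they satisfy it. The Python sketch below shows one way to do both, using `None` for null; the function names are illustrative and the padding strategy is an assumption that reproduces the three-record instance shown on the slide.

```python
def is_homogeneous(records):
    """True if every label of any record is also a label of every other record."""
    label_sets = [set(r) for r in records]
    return all(s == label_sets[0] for s in label_sets)


def make_homogeneous(records):
    """Pad each record with None (null) for the labels it lacks."""
    labels = []
    for rec in records:
        for label in rec:
            if label not in labels:
                labels.append(label)
    return [{label: rec.get(label) for label in labels} for rec in records]


heterogeneous = [
    {"time": "0900", "MSFT": 27.20},
    {"time": "0900", "IBM": 120.00},
    {"time": "0905", "MSFT": 27.30},
]
homogeneous = make_homogeneous(heterogeneous)
```

Because the constraint quantifies over every ordered pair of tuples, it is equivalent to all records having the same set of labels, which is what `is_homogeneous` tests.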
MAD Mapping
MetadatA-Data (MAD) mapping generation has three steps:
1. Preliminary mapping
How do we map the source schema to the target schema?
The preliminary mapping for <<D>> includes the metadata label and the value label of <<D>>.
Source: Rcd
  SalesByCountries: SetOf Rcd
    month
    USA
    UK
    Italy
    <<countries>>
      label
      value
Target: Rcd
  Sales: SetOf Rcd
    month
    country
    units
Preliminary Mapping:
{ $x1 in Source.SalesByCountries, $x2 in <<countries>>; $x3 := $x1.($x2) }
Label Value Transfer
Source.SalesByCountries
month  USA  UK   Italy
Jan    120  223  89
Feb    83   168  56
Target.Sales
month  country  units
Jan    USA      120
Jan    UK       223
Jan    Italy    89
Feb    USA      83
Feb    UK       168
Feb    Italy    56
MAD Mapping
2. Skeletons:
An n x m matrix of skeletons is constructed from the set of source preliminary mappings and the set of target preliminary mappings; each entry (i, j) is a potential mapping.
3. Creating MAD Mapping:
At this stage, the value correspondences are matched against the preliminary mappings in order to factor them into the appropriate skeletons.
Source.Sales.country → Target.CountrySales.country
The source side is matched against one or more source mappings; the target side is matched against one or more target mappings.
Source.SalesByCountries.<<countries>> → Target.Sales.country
Source.SalesByCountries.&<<countries>> → Target.Sales.units
MAD Mapping Generation Example
Source: Rcd
  SalesByCountry: SetOf Rcd
    month
    USA
    UK
    Italy
    <<countries>>
      label
      value
Target: Rcd
  Sales: SetOf Rcd
    month
    country
    units
Source: { $x1 in Source.SalesByCountry, $x2 in <<countries>>; $x3 := $x1.($x2) }
Target: { $y1 in Target.Sales }
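Steps 2 and 3 of the generation can be sketched roughly as follows, assuming a single preliminary mapping on each side as in this example. Each preliminary mapping is represented here simply by the set of schema paths it covers, which is a simplification made for this sketch rather than the algorithm's actual data structure. The names `S1` and `T1` are hypothetical, and the month correspondence is assumed from the mappings m1 to m3 rather than shown among the value correspondences on the slides.

```python
from itertools import product

# Schema paths covered by each preliminary mapping (an assumption of this sketch).
source_prelims = {
    "S1": {
        "Source.SalesByCountries.month",
        "Source.SalesByCountries.<<countries>>",   # label of the placeholder
        "Source.SalesByCountries.&<<countries>>",  # value of the placeholder
    },
}
target_prelims = {
    "T1": {"Target.Sales.month", "Target.Sales.country", "Target.Sales.units"},
}

# Step 2: an n x m matrix of skeletons, one entry per (source, target) pair.
skeletons = {pair: [] for pair in product(source_prelims, target_prelims)}

# Value correspondences. The last two appear on the slides; month -> month is
# assumed because mappings m1-m3 equate $t.month with $s.month.
correspondences = [
    ("Source.SalesByCountries.month", "Target.Sales.month"),
    ("Source.SalesByCountries.<<countries>>", "Target.Sales.country"),
    ("Source.SalesByCountries.&<<countries>>", "Target.Sales.units"),
]

# Step 3: factor each correspondence into every skeleton whose source and
# target preliminary mappings cover both of its end points.
for src_path, tgt_path in correspondences:
    for (s, t), matched in skeletons.items():
        if src_path in source_prelims[s] and tgt_path in target_prelims[t]:
            matched.append((src_path, tgt_path))

# Skeletons that received at least one correspondence are mapping candidates.
candidates = {pair: vcs for pair, vcs in skeletons.items() if vcs}
```

With one preliminary mapping per side the matrix has a single entry, and all three correspondences end up in that one skeleton.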
MAD vs Clio
Diagram: source schema S and target schema T; GUI; declarative (internal) representation; executable code (XSLT, XQuery, Java)
• New construct to iterate over elements' labels: placeholder
• Target schema can be incomplete: nested dynamic output schema (ndos)
• New mapping & query generation algorithms
• Data exchange with data-metadata support: Data-to-Data is a special case
Fin.