a survey of approaches to automatic schema matching
DESCRIPTION
A survey of approaches to automatic schema matching. Erhard Rahm, Universität für Informatik, Leipzig Philip A. Bernstein, Microsoft Research VLDB 2001. - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: A survey of approaches to automatic schema matching](https://reader035.vdocument.in/reader035/viewer/2022062305/5681664b550346895dd9c452/html5/thumbnails/1.jpg)
A survey of approaches to automatic schema matching
Erhard Rahm, Universität für Informatik, LeipzigPhilip A. Bernstein, Microsoft Research
VLDB 2001
![Page 2: A survey of approaches to automatic schema matching](https://reader035.vdocument.in/reader035/viewer/2022062305/5681664b550346895dd9c452/html5/thumbnails/2.jpg)
2
ProblemSchema matching: produce a mapping between elements of two schemas
such that the elements in the mapping correspond semantically to each other.
Cust
C#
Cname
FirstName
LastName
Customer
CustID
Company
Contact
Schema 1 Schema 2
A real-world problem:Schema integration, Data warehouses, E-commerce, Semantic query processing
![Page 3: A survey of approaches to automatic schema matching](https://reader035.vdocument.in/reader035/viewer/2022062305/5681664b550346895dd9c452/html5/thumbnails/3.jpg)
3
Problem (cont.)
The paper surveys approaches for automated schema matching and presents a taxonomy.
• Manual schema matching: tedious, time-consuming, error-prone and therefore expensive.
• Automated schema matching: the solution
![Page 4: A survey of approaches to automatic schema matching](https://reader035.vdocument.in/reader035/viewer/2022062305/5681664b550346895dd9c452/html5/thumbnails/4.jpg)
4
Overview• Problem and applications• Match operator• Classification
– Schema level matchers– Instance level matchers– Combining matchers
• Prototype implementations• Conclusion• Critique
![Page 5: A survey of approaches to automatic schema matching](https://reader035.vdocument.in/reader035/viewer/2022062305/5681664b550346895dd9c452/html5/thumbnails/5.jpg)
5
Match• Match is an abstract operator for implementing schema matching
– Input: two input schemas– Output: a set of mapping elements
• Match is based on heuristics that approximate what the user considers to be a good match
• Implementations of match produces ’match candidates’• Not possible to determine all matches automatically
![Page 6: A survey of approaches to automatic schema matching](https://reader035.vdocument.in/reader035/viewer/2022062305/5681664b550346895dd9c452/html5/thumbnails/6.jpg)
6
Match (cont.)
Match candidates
S1.C# = S2.CustID
S1.Cname = S2.Company
S1.Firstname = S2.Contact
Match
Schema 1
Schema 2
User acceptance
Matches
S1.C# = S2.CustID
S1.Cname = S2.Company
![Page 7: A survey of approaches to automatic schema matching](https://reader035.vdocument.in/reader035/viewer/2022062305/5681664b550346895dd9c452/html5/thumbnails/7.jpg)
7
Generic Match Architecture
![Page 8: A survey of approaches to automatic schema matching](https://reader035.vdocument.in/reader035/viewer/2022062305/5681664b550346895dd9c452/html5/thumbnails/8.jpg)
8
Classification
![Page 9: A survey of approaches to automatic schema matching](https://reader035.vdocument.in/reader035/viewer/2022062305/5681664b550346895dd9c452/html5/thumbnails/9.jpg)
9
Classification
![Page 10: A survey of approaches to automatic schema matching](https://reader035.vdocument.in/reader035/viewer/2022062305/5681664b550346895dd9c452/html5/thumbnails/10.jpg)
10
Schema-level matchers• Element-level
– Linguistic approaches:• Similarity of names, e.g. FirstName first_name• Equality of synonyms, e.g. car automobile• Equality of hypernyms, i.e. book publication, article publication• Description matching:
S1: empn // employee nameS2: name // name of employee
– Constraint-based approaches:• Data types, e.g. varchar text• Value ranges• Uniqueness
• Structural-level
![Page 11: A survey of approaches to automatic schema matching](https://reader035.vdocument.in/reader035/viewer/2022062305/5681664b550346895dd9c452/html5/thumbnails/11.jpg)
11
Classification
![Page 12: A survey of approaches to automatic schema matching](https://reader035.vdocument.in/reader035/viewer/2022062305/5681664b550346895dd9c452/html5/thumbnails/12.jpg)
12
Instance-level matchers• Linguistic characterization
– Keywords, frequencies of words, combinations, etc.
CName
Microsoft
Apple
Microsoft
Microsoft
Lenovo
Schema 1 Schema 2
Company
IBM
Microsoft
Apple
Microsoft
Apple
EmpName
Allan
Steve
Bob
Carol
match
![Page 13: A survey of approaches to automatic schema matching](https://reader035.vdocument.in/reader035/viewer/2022062305/5681664b550346895dd9c452/html5/thumbnails/13.jpg)
13
Instance-level matchers• Constraint-based characterization
– Character patterns and numerical value ranges
Price
$19.80
$136.25
$5.00
$64.36
Schema 1 Schema 2
Paid
$24.20
$32.54
$532.00
$33.33
match
![Page 14: A survey of approaches to automatic schema matching](https://reader035.vdocument.in/reader035/viewer/2022062305/5681664b550346895dd9c452/html5/thumbnails/14.jpg)
14
Classification
![Page 15: A survey of approaches to automatic schema matching](https://reader035.vdocument.in/reader035/viewer/2022062305/5681664b550346895dd9c452/html5/thumbnails/15.jpg)
15
Combining matchers• The best result is archived by combining multiple matchers• Two types:
– Hybrid matchers– Composite matchers
Hybrid matcher
DatatypesNames
Value ranges
Match candidates
...
...
...
![Page 16: A survey of approaches to automatic schema matching](https://reader035.vdocument.in/reader035/viewer/2022062305/5681664b550346895dd9c452/html5/thumbnails/16.jpg)
16
Combining matchers• The best result is archived by combining multiple matchers• Two types:
– Hybrid matchers– Composite matchers
Composite matcher
Match candidates
...
...
...
Name matcher
Datatypematcher
![Page 17: A survey of approaches to automatic schema matching](https://reader035.vdocument.in/reader035/viewer/2022062305/5681664b550346895dd9c452/html5/thumbnails/17.jpg)
17
Prototype implementations
![Page 18: A survey of approaches to automatic schema matching](https://reader035.vdocument.in/reader035/viewer/2022062305/5681664b550346895dd9c452/html5/thumbnails/18.jpg)
20
Example: SemInt• 15 contraint-based, 5 contant-based matching criteria• Each criteria is mapped to a range [0..1] for every element. Yields an N-
dimensional point for N matching criteria
1
1
0
0 Field length
Data type
C#CustID
CName
Company
![Page 19: A survey of approaches to automatic schema matching](https://reader035.vdocument.in/reader035/viewer/2022062305/5681664b550346895dd9c452/html5/thumbnails/19.jpg)
21
Conclusion• Proposes a taxonomy• Characterizes and compares previous implementations using this taxonomy• Useful for:
– Programmers who need to implement Match– Researchers looking to develop better algorithms
• Proposes subjects for further research:– Test of performance and accuracy of existing approaches– Better utilization of instance-level information
![Page 20: A survey of approaches to automatic schema matching](https://reader035.vdocument.in/reader035/viewer/2022062305/5681664b550346895dd9c452/html5/thumbnails/20.jpg)
22
CritiqueGood:• Provides a good overview of the subject, Fig. 2 and Table 5 in particular• Good at pointing out subjects that should be researched further• Taxonomy is easy to understand and is explained well
Could be improved:• Does not compared performance or correctness of implementations• No examples in the descripton of existing implementations• Lacking good examples of structural level matching• Relative performance of implementations are mentioned only once: ”Cupid
performed somewhat better overall”. Cupid is developed by the authors.
![Page 21: A survey of approaches to automatic schema matching](https://reader035.vdocument.in/reader035/viewer/2022062305/5681664b550346895dd9c452/html5/thumbnails/21.jpg)
Questions?Questions?
![Page 22: A survey of approaches to automatic schema matching](https://reader035.vdocument.in/reader035/viewer/2022062305/5681664b550346895dd9c452/html5/thumbnails/22.jpg)
Questions?