Post on 15-Jan-2016
Learning to Match Ontologies on the Semantic Web
AnHai Doan, Jayant Madhavan, Robin Dhamankar, Pedro Domingos, Alon Halevy
GLUE
Identifies Mappings between websites
Uses Machine Learning
Uses Common Sense Knowledge
Domain Constraints
Motivation
Data comes from different ontologies
Answers come from multiple web pages
Manual matching: very tedious, error-prone, not very scalable
Outline
Overview of GLUE
GLUE Architecture
Case Studies
CGLUE
Case Studies
Conclusion
Assessment
Overview
• Assumes 2 ontologies
• 1-1 matching
• Similarity between two concepts
• Computing joint distribution
• P(A,B), P(A,~B), P(~A,B), P(~A,~B)
• Machine learning
• Multistrategy learning
• Exploiting domain constraints
• Data instances
Overview
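[Figure: GLUE architecture. Taxonomy O1 and Taxonomy O2 feed the base learners L1 … Lk and the Meta Learner M, which produce joint distributions; the Similarity Estimator applies the similarity function to produce the similarity matrix; the Relaxation Labeler uses common knowledge and domain constraints to produce mappings for the taxonomies.]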
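Distribution Estimator
[Figure: Taxonomy O1 and Taxonomy O2 are fed to Base Learner L1 … Base Learner Lk and Meta Learner M, which output joint distributions.]
[Figure, three panels: (1) a taxonomy with nodes R, A, C, D, E, F and instances t1 … t7, grouped as t1,t2,t3,t4 and t5,t6,t7 to produce the trained learner L; (2) a second taxonomy with nodes G, B, H, I, J and instances s1 … s6, grouped as s1,s2,s3,s4 and s5,s6, with L applied to an instance such as s1; (3) the resulting instance groups s1,s3; s2,s4; s5; s6.]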
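The estimation step pictured on the Distribution Estimator slides can be sketched roughly as follows. This is a minimal sketch, not the authors' implementation: `train_keyword_learner` is a hypothetical stand-in for a real base learner, and the instance format (text plus a set of concept names) is an assumption. The idea is that a learner trained on one taxonomy's instances (A vs. not A) labels the other taxonomy's instances, the symmetric step does the same for B, and the four joint probabilities are then fractions of instances.

```python
from collections import Counter

def train_keyword_learner(positives, negatives):
    """Hypothetical stand-in for a base learner: predicts True when an
    instance shares more words with the positive than the negative examples."""
    pos_words = Counter(w for text in positives for w in text.split())
    neg_words = Counter(w for text in negatives for w in text.split())

    def predict(text):
        words = text.split()
        return sum(pos_words[w] for w in words) > sum(neg_words[w] for w in words)

    return predict

def joint_distribution(o1, o2, concept_a, concept_b):
    """o1, o2: lists of (text, set_of_concepts) pairs (assumed format).
    Returns estimates of P(A,B), P(A,~B), P(~A,B), P(~A,~B)."""
    # Train on O1 (A vs. not A); this learner will label O2's instances.
    in_a = train_keyword_learner(
        [t for t, cs in o1 if concept_a in cs],
        [t for t, cs in o1 if concept_a not in cs])
    # Symmetric step: train on O2 (B vs. not B) to label O1's instances.
    in_b = train_keyword_learner(
        [t for t, cs in o2 if concept_b in cs],
        [t for t, cs in o2 if concept_b not in cs])

    counts = Counter()
    for text, concepts in o1:   # membership in A is known, B is predicted
        counts[(concept_a in concepts, in_b(text))] += 1
    for text, concepts in o2:   # membership in B is known, A is predicted
        counts[(in_a(text), concept_b in concepts)] += 1

    total = len(o1) + len(o2)
    return {
        "P(A,B)": counts[(True, True)] / total,
        "P(A,~B)": counts[(True, False)] / total,
        "P(~A,B)": counts[(False, True)] / total,
        "P(~A,~B)": counts[(False, False)] / total,
    }
```

The four values always sum to 1 because every instance of either taxonomy lands in exactly one of the four groups.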
Multistrategy Learning
Base Learners
• Content Learner: Frequency, Naïve Bayes
• Name Learner: Full Name, Specific and Descriptive Element
• MetaLearner
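A content learner of the kind named on this slide might look like the following. This is a minimal sketch, assuming a multinomial Naïve Bayes model over word frequencies with add-one smoothing; the class name `NaiveBayesContentLearner` and its methods are hypothetical and not taken from GLUE.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesContentLearner:
    """Multinomial Naive Bayes over word frequencies (add-one smoothing)."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)   # label -> word -> count
        self.label_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict_proba(self, text):
        """Return {label: P(label | text)} for every label seen in training."""
        total = sum(self.label_counts.values())
        log_scores = {}
        for label, n in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)            # log prior
            for w in text.lower().split():
                score += math.log((counts[w] + 1) / denom)
            log_scores[label] = score
        # Normalize in log space for numerical stability.
        top = max(log_scores.values())
        exp = {label: math.exp(s - top) for label, s in log_scores.items()}
        z = sum(exp.values())
        return {label: v / z for label, v in exp.items()}
```

A name learner could reuse the same class by training on full element names instead of instance content.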
MetaLearner
Combines the base learners
Gives each learner a weight
User Input
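One simple way to combine base learners with per-learner weights is a weighted average of their predictions. The sketch below assumes the weights are supplied by the user, as the "User Input" bullet suggests; the function name and the dictionary layout are hypothetical.

```python
def meta_learner(predictions, weights):
    """Combine base-learner predictions by a weighted average.

    predictions: {learner_name: {label: probability}}
    weights:     {learner_name: weight}, assumed to be supplied by the user
    """
    total_weight = sum(weights[name] for name in predictions)
    combined = {}
    for name, dist in predictions.items():
        for label, p in dist.items():
            share = weights[name] / total_weight   # normalized learner weight
            combined[label] = combined.get(label, 0.0) + share * p
    return combined
```

With a content learner predicting 0.8 for A at weight 0.6 and a name learner predicting 0.3 for A at weight 0.4, the combined score for A is 0.6 * 0.8 + 0.4 * 0.3 = 0.6.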
[Figure: Similarity Estimator. Inputs: joint distributions, similarity function. Output: similarity matrix.]
Similarity Estimator
Applies a similarity function supplied by the user
Jaccard-sim
Outputs a matrix between concepts
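As an illustration of the Jaccard-sim bullet, the Jaccard coefficient can be written in terms of the joint distribution as P(A,B) / (P(A,B) + P(A,~B) + P(~A,B)), that is, the probability of the intersection over the probability of the union. The sketch below assumes the joint distribution is a dictionary keyed by those four terms; `joint_for` is a hypothetical callback that returns the joint distribution for a pair of concepts.

```python
def jaccard_sim(joint):
    """Jaccard coefficient from a joint distribution.
    Uses keys "P(A,B)", "P(A,~B)", "P(~A,B)"; P(~A,~B) is not needed."""
    union = joint["P(A,B)"] + joint["P(A,~B)"] + joint["P(~A,B)"]
    return joint["P(A,B)"] / union if union > 0 else 0.0

def similarity_matrix(concepts_1, concepts_2, joint_for):
    """Build {a: {b: similarity}} for every pair of concepts.
    joint_for(a, b) is a hypothetical callback returning the joint distribution."""
    return {a: {b: jaccard_sim(joint_for(a, b)) for b in concepts_2}
            for a in concepts_1}
```

The value is 0 when the two concepts share no instances and 1 when they cover exactly the same instances.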
Where are we?
Find Similarities
Compute Similarities
Satisfy Constraints
Relaxation Labeler
[Figure: Relaxation Labeler. Inputs: similarity matrix, common knowledge, domain constraints. Output: mappings for taxonomies.]
Constraints
Domain-Independent: general knowledge
Domain-Dependent: interaction between two nodes
Model each as a feature f()
Domain Independent
Relaxation Labeler
Searches for best mapping given constraints
Each node's label is influenced by its “neighborhood”
Performs local optimization
Local Optimization
1. Assigns initial labels
2. Performs optimization
3. Uses a formula to change a label
4. Repeats steps 2-3
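The four steps above can be sketched as a simple loop. This is a deliberately simplified illustration, not the formula used by GLUE: it keeps one hard label per node, and the update rule (initial score multiplied by a sigmoid of weighted constraint features) is an assumption made for this example. All function and parameter names are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def relaxation_labeling(nodes, labels, neighbors, initial, features, weights,
                        iterations=10):
    """Simplified local optimization loop.

    nodes:     nodes of taxonomy O1
    labels:    candidate labels (nodes of taxonomy O2)
    neighbors: {node: [neighboring nodes]}
    initial:   {node: {label: score}}, e.g. rows of the similarity matrix
    features:  functions f(node, label, neighbor_labels) -> float, one per constraint
    weights:   one weight per feature
    """
    # Step 1: assign initial labels (best-scoring label for each node).
    assignment = {n: max(labels, key=lambda l: initial[n][l]) for n in nodes}
    for _ in range(iterations):
        changed = False
        for n in nodes:
            # Steps 2-3: re-score each label given the node's neighborhood.
            neighbor_labels = [assignment[m] for m in neighbors.get(n, [])]

            def score(label):
                evidence = sum(w * f(n, label, neighbor_labels)
                               for f, w in zip(features, weights))
                return initial[n][label] * sigmoid(evidence)

            best = max(labels, key=score)
            if best != assignment[n]:
                assignment[n] = best
                changed = True
        if not changed:   # Step 4: repeat until no label changes.
            break
    return assignment
```

Because each update looks only at a node's neighbors, the loop performs local optimization and may settle on a mapping that is not globally best.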
Local Optimization
Node in taxonomy O1
Label in taxonomy O2
Everything we know
Other label assignments to all nodes besides X
Local Optimization
Where are we?
[Figure: GLUE architecture (Taxonomy O1, Taxonomy O2; base learners L1 … Lk; Meta Learner M; joint distributions; similarity function; Similarity Estimator; similarity matrix; common knowledge; domain constraints; Relaxation Labeler; mappings for taxonomies).]
Case Study
• University Catalogs
• Business Profiles
• For each one
• Entire set of data instances
• Cleaned it up
Results
Improvements
Insufficient Training Data
Local Optimization
Additional Base Learners
Ambiguous Best Match
CGLUE
Beam search
Uses structure and data
No relaxation labeling (no constraints)
CGLUE Case Study
Improvements
Incorporate Domain Constraints
Object Identification
Conclusion
Semantic Similarity
Multistrategy Learning
Relaxation Labeling
CGLUE
Assessment
Data Instances
Additional Sites?
CGLUE
Future Work