Integration of Overlay UM for Close Domains
Based on Domain Ontology Mapping
Sergey Sosnovsky
PAWS@SIS@PITT
L3S
• Learning Lab of Lower Saxony
• > 60 people (~5 professors, ~20 postdocs, ~35 PhD students)
• > 5 million annual budget
• Areas of research:
  – Technology Enhanced Learning
  – Semantic Web and Digital Libraries
  – Distributed Systems and Networks
• Coordinator of the PROLEARN, REWERSE and KnowledgeWeb networks of excellence
• KBS (Knowledge Based Systems) Lab
Outline
• Project Motivation (Simple Scenario)
• Addressed Problem and Chosen Solution
• GLUE O-Mapping Algorithm
• System Details:
  – Developed Ontologies
  – LO Repositories
  – Implementation
• Summary…
• Discussion
Scenario
[Figure: diagram with the labels "C", "Java", "UM of C knowledge" and "UM of Java knowledge"]
Problem-Solution
• Problem: “Cold-start”
• Source for solution: closeness of domains
• Solution:
  – Apply ontology mapping techniques to identify similar concepts in both domains
  – Using the mappings found, translate overlay models of the student’s knowledge
• Broader goals:
  – To verify the possibility of User Model mediation between related domains
  – To check how effective ontology mapping technologies could be for this
Ontology Mapping
• O-Mapping Approaches:
  – Using a Shared Ontology
  – Heuristics and Rule-Based
  – ML-Based
• GLUE Approach:
  – Joint Probability Distribution Estimation
  – Similarity Estimation
GLUE: Distribution Estimator (1)
• Input data:
  – Ontology $O_1$ with corresponding set of instances $U_1$
  – Ontology $O_2$ with corresponding set of instances $U_2$
• For every pair of concepts A from $O_1$ and B from $O_2$ the algorithm is as follows:
1. Partition $U_1$ into $U_1^{A}$ and $U_1^{\neg A}$
2. Partition $U_2$ into $U_2^{B}$ and $U_2^{\neg B}$
3. Train a classifier $C_1$ for instances of A using $U_1^{A}$ and $U_1^{\neg A}$ as positive and negative training sets
[Figure: $U_1$ partitioned into A and ¬A, $U_2$ partitioned into B and ¬B; classifier $C_1$]
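Steps 1-3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's Weka-based implementation: each instance is assumed to be a `(text, set_of_concept_labels)` pair, and a tiny hand-written Naive Bayes text classifier stands in for whatever learner is actually used. The names `partition` and `train_naive_bayes` are hypothetical.

```python
import math
from collections import Counter


def partition(instances, concept):
    """Steps 1-2: split instances into those labelled with `concept` and the rest."""
    pos = [text for text, labels in instances if concept in labels]
    neg = [text for text, labels in instances if concept not in labels]
    return pos, neg


def train_naive_bayes(pos, neg):
    """Step 3: return a function text -> bool trained on positive/negative texts."""
    counts = {True: Counter(), False: Counter()}
    for label, docs in ((True, pos), (False, neg)):
        for doc in docs:
            counts[label].update(doc.lower().split())
    vocab = set(counts[True]) | set(counts[False])
    totals = {label: sum(c.values()) for label, c in counts.items()}
    n_docs = len(pos) + len(neg)
    # Add-one smoothing on the class priors so an empty class never gives log(0).
    priors = {True: (len(pos) + 1) / (n_docs + 2),
              False: (len(neg) + 1) / (n_docs + 2)}

    def log_score(label, words):
        score = math.log(priors[label])
        for w in words:
            if w in vocab:  # words never seen in training are ignored
                score += math.log((counts[label][w] + 1)
                                  / (totals[label] + len(vocab)))
        return score

    def classify(text):
        words = text.lower().split()
        return log_score(True, words) > log_score(False, words)

    return classify


# Toy instance set U1 (invented for illustration only).
u1 = [
    ("for loop repeats a block", {"Loop"}),
    ("while loop checks a condition", {"Loop"}),
    ("int variable stores a number", {"Variable"}),
    ("declare a variable with a type", {"Variable"}),
]
u1_a, u1_not_a = partition(u1, "Loop")
c1 = train_naive_bayes(u1_a, u1_not_a)
```

The returned `c1` is a plain function, so it can be applied directly to the instances of the other ontology in the next step.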
GLUE: Distribution Estimator (2)
4. Apply the classifier trained in step 3 ($C_1$) to each instance in the set $U_2^{B}$. This partitions $U_2^{B}$ into two sets, $U_2^{A,B}$ and $U_2^{\neg A,B}$. Similarly for the set $U_2^{\neg B}$.
5. Repeat the two previous steps with the roles of $O_1$ and $O_2$ reversed to obtain $U_1^{A,B}$, $U_1^{\neg A,B}$, $U_1^{A,\neg B}$ and $U_1^{\neg A,\neg B}$.
6. Now we can compute joint probabilities for the concepts A and B, where $N(U)$ is the number of instances in $U$:
• $P(A,B) = [N(U_1^{A,B}) + N(U_2^{A,B})] / [N(U_1) + N(U_2)]$
• $P(A,\neg B) = [N(U_1^{A,\neg B}) + N(U_2^{A,\neg B})] / [N(U_1) + N(U_2)]$
• $P(\neg A,B) = [N(U_1^{\neg A,B}) + N(U_2^{\neg A,B})] / [N(U_1) + N(U_2)]$
[Figure: classifiers $C_1$ and $C_2$; $U_2$ partitioned into $U_2^{A,B}$, $U_2^{A,\neg B}$, $U_2^{\neg A,B}$ and $U_2^{\neg A,\neg B}$]
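Steps 4-6 can be sketched as a single counting pass. This is an illustration only: instances are assumed to be `(text, set_of_concept_labels)` pairs, the two classifiers are passed in as plain functions, and the function name `joint_distribution` is hypothetical. The sketch also returns P(¬A, ¬B), which the slide does not list, so that the four values sum to 1.

```python
def joint_distribution(u1, u2, concept_a, concept_b, classify_a, classify_b):
    """Estimate the joint distribution of concepts A (from O1) and B (from O2).

    Keys of the result are (in_A, in_B) pairs of booleans.
    """
    counts = {(True, True): 0, (True, False): 0,
              (False, True): 0, (False, False): 0}
    # In U1, membership in A is known from the labels; membership in B is
    # predicted by the classifier trained on U2.
    for text, labels in u1:
        counts[(concept_a in labels, bool(classify_b(text)))] += 1
    # In U2, membership in B is known; membership in A is predicted by the
    # classifier trained on U1.
    for text, labels in u2:
        counts[(bool(classify_a(text)), concept_b in labels)] += 1
    total = len(u1) + len(u2)
    return {key: n / total for key, n in counts.items()}


# Toy data and keyword-based stand-in classifiers (invented for illustration).
u1 = [("loop", {"Loop"}), ("variable", {"Variable"})]
u2 = [("loop", {"Iteration"}), ("pointer", {"Pointer"})]
looks_like_loop = lambda text: "loop" in text

dist = joint_distribution(u1, u2, "Loop", "Iteration",
                          looks_like_loop, looks_like_loop)
```

In the toy data every instance falls either in both concepts or in neither, so all of the probability mass lands on `(True, True)` and `(False, False)`.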
GLUE: Similarity Estimator
• $\text{Jaccard-sim}(A,B) = P(A \cap B) / P(A \cup B) = P(A,B) / [P(A,B) + P(A,\neg B) + P(\neg A,B)]$
• It takes the value 0 when A and B are disjoint, and 1 when they are the same concept
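The Jaccard formula translates directly into code. A minimal sketch, assuming the three joint probabilities have already been estimated; the function name is hypothetical, and returning 0.0 when neither concept has any instances is a choice made here, not something the slides specify.

```python
def jaccard_similarity(p_ab, p_a_not_b, p_not_a_b):
    """Jaccard-sim(A, B) = P(A,B) / [P(A,B) + P(A,not B) + P(not A,B)]."""
    denominator = p_ab + p_a_not_b + p_not_a_b
    if denominator == 0:
        # Neither concept has any instances; treated as "no similarity" here.
        return 0.0
    return p_ab / denominator


same_concept = jaccard_similarity(0.5, 0.0, 0.0)      # A and B always co-occur
disjoint = jaccard_similarity(0.0, 0.25, 0.25)        # A and B never co-occur
partial = jaccard_similarity(0.25, 0.125, 0.125)      # partial overlap
```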
Ontologies
• Java ontology:
  – Java ontology designed for Java Personal Reader: http://hoersaal.kbs.uni-hannover.de/rdf/java_ontology.rdf
  – Format: rdfs
  – # of classes: 544
  – Relations: rdfs:subClassOf
• C ontology:
  – Our C ontology (next version): http://www.sis.pitt.edu/~paws/ont/c_programming.rdf
  – Format: rdfs
  – # of classes: 546
  – Relations: cprog:isA, cprog:partOf (both are rdfs:subPropertyOf rdfs:subClassOf)
Repositories
• Java repository:
  – Sun Java Tutorial: http://java.sun.com/docs/books/tutorial/java/index.html
  – # of pages: 208
  – Repository description: rdf
  – Namespaces used:
    • lom="http://ltsc.ieee.org/2002/09/lom-base#"
    • lom_cls="http://www.imsproject.org/rdf/imsmd_classificationv1p2#"
    • dc="http://purl.org/dc/elements/1.1/"
    • vCard="http://www.w3.org/2001/vcard-rdf/3.0#"
• C repository:
  – Miles C Tutorial (processed by R2Net tool): http://www.sis.pitt.edu/~sergeys/_dev/c_tutorial_Miles/
  – # of pages: 117
  – Repository description: rdf
  – Namespaces used:
    • lom-edu="http://ltsc.ieee.org/2002/09/lom-educational#"
    • dc="http://purl.org/dc/elements/1.1/"
obsolete
System Implementation
• Tomcat servlet
• Four free third-party APIs are used:
  – Tidy for HTML parsing and text extraction
  – Apache Lucene for indexing of text documents
  – HP Jena for RDF processing and ontology inference
  – Weka for classification of concept instances
• Input:
  – Two domain ontologies
  – Two repository descriptions
• Output:
  – Mapping of the two ontologies (rdf)
Summary: Current State
• Ontologies for the Java and C programming languages have been developed.
• Two repositories of learning objects are described in terms of the corresponding ontologies: Sun Java Tutorial and Robert Miles’s C Tutorial.
• Manual mapping identified about 100 possible mapping cases, which is a good percentage for both ontologies. All possible semantic mapping situations occur (1:n, m:1, n:m mapping). To deal with such granularity discrepancies when calculating the actual knowledge values, we currently plan to use a weighted average function.
• The GLUE algorithm for ontology mapping, based on the joint probability distributions of concepts over their instances, is implemented.
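The weighted-average translation of an overlay model mentioned above could look roughly like this. It is a sketch under assumptions the slides do not confirm: knowledge levels are numbers in [0, 1], each target concept maps to a list of `(source_concept, weight)` pairs, and the weights could for instance be the similarity scores from the mapping. All concept names and the function name `translate_overlay` are invented for illustration.

```python
def translate_overlay(source_model, mappings):
    """Translate an overlay user model from the source to the target domain.

    source_model: dict of source concept -> knowledge level
    mappings:     dict of target concept -> list of (source concept, weight)
    """
    target_model = {}
    for target_concept, sources in mappings.items():
        # Use only mapped source concepts for which the student has a value.
        known = [(source_model[s], w) for s, w in sources if s in source_model]
        total_weight = sum(w for _, w in known)
        if total_weight > 0:
            target_model[target_concept] = (
                sum(level * w for level, w in known) / total_weight
            )
    return target_model


# Hypothetical C overlay model and hypothetical mappings into Java concepts.
c_model = {"c.for": 0.5, "c.while": 1.0, "c.pointer": 0.25}
mappings = {
    "java.loop": [("c.for", 0.5), ("c.while", 0.5)],   # n:1 situation
    "java.reference": [("c.pointer", 1.0)],            # 1:1 situation
    "java.interface": [("c.struct", 0.5)],             # no evidence available
}
java_model = translate_overlay(c_model, mappings)
```

Target concepts without any evidence in the source model are left out rather than set to zero, so the cold-start default of the target system still applies to them.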
Current Problems
• Performance
• Precision
• Interface development:
  – Mapping presentation to the user
  – Mapping navigation and manual update
  – 1:n, m:1, n:m mapping situations
• Mapping of scales
• Students – subjects for the experiment
Performance-precision trade-off is necessary
Performance Problem Solutions
• Change ML library:
  – SMILE instead of WEKA
• Feature selection:
  – Currently 3500 features for about 300 docs
• Modification of the GLUE algorithm:
  – Divide and conquer approach to reduce the dimensionality of the problem
• Another ML-based O-Mapping method:
  – e.g. based on support vector machines [Jason Chaffee, Susan Gauch. Personal Ontologies for Web Navigation. In Proceedings of the 9th International Conference on Information and Knowledge Management (CIKM), 2000, pp. 227-234.]
• Excluding Human feedback
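As one possible illustration of the feature selection item, the sketch below prunes terms by document frequency. The slides do not say which selection method is intended; the thresholds and the function name `select_features` are assumptions.

```python
from collections import Counter


def select_features(docs, min_df=2, max_df_ratio=0.8):
    """Keep terms that occur in at least `min_df` documents and in at most
    `max_df_ratio` of all documents."""
    document_frequency = Counter()
    for doc in docs:
        # set() so that each term is counted once per document
        document_frequency.update(set(doc.lower().split()))
    upper_limit = max_df_ratio * len(docs)
    return sorted(term for term, n in document_frequency.items()
                  if min_df <= n <= upper_limit)


# Toy documents (invented for illustration).
docs = [
    "the for loop",
    "the while loop",
    "the int variable",
    "the char variable",
    "the pointer",
]
features = select_features(docs)
```

Terms that appear in almost every document ("the") or in only one document carry little information for separating concepts, so dropping them shrinks the feature space before classification.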
Precision Problem Solutions
• Two sources of information not yet taken into account by the algorithm:
  – Structural dependencies
  – Naming of concepts
Scale Mapping Problem Solution
• If a scale ontology exists, we can use it. If not, we need to develop one.
Experiment
• Undergraduate course: INFSCI 1090 Object-Oriented Programming I
• ~40 students
• 22 students have C/C++ experience and no Java experience
• Only 4 students have taken our IS12 => we have UMs of their C knowledge
• All students took a pre-quiz on C assessing their knowledge of 56 C-concepts related to Java
• Every week students take a quiz on Java with 2-3 extra credit questions assessing Java concepts not covered yet in the class
Other Solutions to the Subject Problem
• Administer a Java test at the end of our IS-0012 courses.
• Send the Java test to students who have taken our IS-0012 course (C models exist) and ask them to participate voluntarily in the experiment.
• The mapping is bi-directional => we can work the other way around. If people are available who know Java but do not know C, we can assess their Java knowledge and then evaluate the knowledge transfer to the C domain.
Discussion: Other Open Problems
• General recommendations on close-domain ontology mapping:
  – When is it worthwhile?
  – What are the metrics of domain closeness?
  – How can we say that knowledge mediation is possible for two given domains?
  – It could be the percentage of related concepts.
  – It could be the closeness of the domains in a general hierarchy of domains based on some common-sense upper-level ontology (SUMO, DOLCE, CYC).
  – It could be some IR-based metric computed over related resources (for example, a C tutorial and a Java tutorial).
• Comparative evaluation of several different methods of UM mediation:
  – Ontology mapping
  – Collaborative
  – Stereotype-based clustering
  – Provided by expert
  – Self-estimation based on scrutable UM
• Architectural issues. Should the developed component act as part of a central ontology server, be a mediator in a decentralized world, or should every application have such facilities and perform its own mapping? Where should the mapping be stored?
Possible Ways of L3S-PAWS Collaboration
• Java Personal Reader as a component of ADAPT2 architecture
• Implementation of some components of ADAPT2 (KnowledgeSea, AnnotatED) as Adaptation Services for Java Personal Reader
• Mohamed’s Web-service based protocol for concurrent learning (WIDEIn and Java Personal Reader).
• Teresa’s publication recommender and COPe.