Ontology Generation and Applications
Dr. A.C.M. Fong, CEng
Professor of Computer Engineering
School of Computing and Mathematical Sciences
Faculty of Design and Creative Technologies
Auckland University of Technology
Contents
1. Introduction – Semantic Web and Ontology
2. Related Work – Ontology Generation
3. Toward Automated Ontology Generation
4. Fuzzy Ontology Generation Framework
5. Application 1 – Scholarly Info
6. Application 2 – Service Helpdesk
1. Introduction: Semantic Web
The Semantic Web rests on its ability to represent real-life domains accurately, so that programs can understand the environment in which they operate.
In summary, the Semantic Web provides the following benefits:
- It offers an expressive metadata model for representing data, so that data can be managed effectively.
- Programs can understand the semantic concepts described in the metadata used on the Semantic Web. Hence, knowledge carried on the Semantic Web can be shared and reused among different programs.
- Users can interact with programs using a semantic query language to specify their requests, thereby improving retrieval performance.
- The deductive mechanism used to derive new information from existing information can be described clearly, so that knowledge can be reasoned with efficiently.
1. Introduction: Semantic Web Architecture
1. Introduction: Semantic Web Architecture - Layers
Foundation Layer. The Semantic Web uses Uniform Resource Identifiers (URIs) to identify resources and Unicode to encode documents.
Schema Layer. This layer comprises the XML + NS (Namespace) + XML Schema layer and the RDF + RDF Schema layer.
This layer defines objects and classes, their relations and constraints. XML Schema (XMLS) and RDF Schema (RDFS), which are based on XML and RDF respectively, are used for these layers.
RDFS has been widely used to describe classes at the Schema Layer.
1. Introduction: Semantic Web Architecture - Layers
Ontology Layer. This layer provides constructs that use meta-information to represent domain knowledge.
In this layer, information is represented as ontology, which the Semantic Web adopts to define knowledge.
Logic Layer. This layer infers new knowledge from existing knowledge. It can be integrated with the Ontology Layer.
In this layer, concepts and relationships defined in lower layers are converted into Turing-complete logic languages in order to generate new knowledge.
1. Introduction: Semantic Web Architecture - Layers
Proof Layer. This layer provides a mechanism to check whether a statement is true or not.
Trust Layer. This layer provides a mechanism that resolves conflicts between knowledge carried by the Semantic Web to form the "Web of Trust".
Digital Signature Layer. This layer uses public key cryptography to secure documents.
1. Introduction: Ontology – Definition
Ontology has different definitions. A commonly cited one defines ontology as a formal, explicit specification of a shared conceptualization.
- Conceptualization refers to an abstract model of phenomena in the world, built by identifying the relevant concepts of those phenomena.
- Explicit means that the types of concepts used, and the constraints on their use, are explicitly defined.
- Formal means that the ontology should be machine readable.
- Shared means that the ontology should capture consensual knowledge accepted by the communities.
1. Introduction: Ontology Research
Ontology is regarded as a standard conceptual model for knowledge representation, especially on the Semantic Web.
The term ontology engineering has been proposed to denote ontology-related research in computer science.
Current issues of interest in ontology engineering include ontology generation, ontology mapping, ontology integration and ontology versioning.
This presentation focuses on ontology generation.
1. Introduction: Ontology Description Languages
An ontology is described using an ontology description language.
Ontology description languages are based on Web metadata description languages, which can be classified into the following three groups:
- HTML-based
- XML-based
- RDF-based
1. Introduction: HTML-based Ontology Description Languages
The tags supported by the traditional Web are sufficient to represent some semantic knowledge.
Simple HTML Ontology Extensions (SHOE) and Ontobroker have embedded additional tags into HTML to represent knowledge.
However, HTML does not support self-defined tags. Therefore, it is difficult to define ontology classes with an HTML-based approach.
Hence, XML-based ontology description languages have been proposed to overcome this limitation.
1. Introduction: XML-based Ontology Description Languages
These languages are usually based on XML Schema (XMLS) or Document Type Definition (DTD).
DTD allows users to define new markup types to describe information. Therefore, users can define ontology classes using DTD.
Moreover, XMLS supports the definition of relations between classes.
Thus, XMLS and DTD can be used directly to embed semantic information.
However, since XML provides only syntactic support for knowledge representation, XML-based ontology description languages face the following problems when representing knowledge:
1. Introduction: XML-based Ontology Description Languages
- XML lacks a mechanism to define relationships that are usually central in ontologies, such as is-a or element-of relationships.
- XML does not support any notion of inheritance, which is an important attribute in ontologies.
- In XML, concepts are defined through tags, which can contain either a string or a combination of other nested tags. Such a mechanism may not be sufficient for defining concepts in an ontology, which may require richer data structures.
- In XML, the order in which tags appear in a document must be defined in advance. In contrast, the ordering of attribute descriptions does not matter in an ontology.
1. Introduction: RDF-based Ontology Description Languages
RDF extends XML to become a standard for knowledge representation.
In addition, RDF Schema (RDFS) can be used to define classes and class hierarchies in a domain.
The standardization supported by RDF provides two important contributions:
- A standard set of modeling primitives (e.g. class, instance, etc.) and their relationships (e.g. subclass) is provided.
- A standardized syntax for writing ontologies is supported.
Popular RDF-based ontology description languages include DARPA Agent Markup Language (DAML), Ontology Inference Language (OIL), DAML+OIL and Web Ontology Language (OWL).
1. Introduction: DARPA Agent Markup Language
DAML, or DAML-ONT, extends RDFS to represent ontology using an object-oriented approach.
It embeds some object-oriented concepts to represent classes. Thus, the class representation of DAML-ONT is richer than that of RDF.
Example: a DAML-ONT representation of the class "Journal", which is a subclass of the class "Publication Medium" but is disjoint with the classes "Conference" and "Workshop" (i.e. an object that belongs to class "Journal" cannot belong to classes "Conference" or "Workshop").
<Class ID="Journal">
  <subClassOf resource="#Publication Medium"/>
  <disjointFrom resource="#Conference"/>
  <disjointFrom resource="#Workshop"/>
</Class>
1. Introduction: Ontology Inference Language
OIL extends RDFS to represent ontology. It is designed based on three criteria:
- Frame-based. It supports frames to define classes and properties of classes. Thus, class contents can be described more informatively (e.g. constraints can be used for class properties).
- Description Logic. It describes knowledge using logic rules. Thus, knowledge is represented mathematically and can be processed by programs.
- Uses Web Standard. It is based on XML and RDFS.
1. Introduction: Ontology Inference Language
<rdfs:Class rdf:ID="animal"/>
<rdfs:Class rdf:ID="plant">
  <rdfs:subClassOf>
    <oil:NOT>
      <oil:hasOperand rdf:resource="#animal"/>
    </oil:NOT>
  </rdfs:subClassOf>
</rdfs:Class>
<rdfs:Class rdf:ID="tree">
  <rdfs:subClassOf rdf:resource="#plant"/>
</rdfs:Class>
Class "animal" is defined first, followed by class "plant", which is defined with the operator "NOT" to state that it is strictly not identical with class "animal" (i.e. objects that belong to class "animal" cannot belong to class "plant" and vice versa).
Finally, class "tree" is defined as a subclass of "plant".
1. Introduction: DAML vs. OIL
Compared with DAML, OIL can represent class properties better, but DAML can represent class relationships more clearly.
Hence, they can be combined to form a better ontology description language: DAML+OIL.
- It defines class relationships based on DAML.
- Class properties are defined in a similar way to OIL.
Hence, DAML+OIL takes the advantages of both DAML and OIL.
1. Introduction: Web Ontology Language
OWL extends DAML+OIL to allow users to define various types of relationships between classes.
Properties can also be defined using additional constructs in OWL.
OWL has three sublanguages: OWL Lite, OWL DL and OWL Full.
1. Introduction: Web Ontology Language
Although these sublanguages share the same OWL syntax, they differ slightly in design and are aimed at different communities of implementers and users:
- OWL Lite primarily supports a classification hierarchy and simple constraints when designing classes.
- OWL DL includes all OWL language constructs, but they can be used only under certain restrictions (e.g. a class cannot be an instance of another class).
- OWL Full allows all OWL language constructs to be used without any restriction.
1. Introduction: Web Ontology Language
<rdf:RDF
  xmlns:owl="http://www.w3.org/2002/07/owl#"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:xsd="http://www.w3.org/2000/10/XMLSchema#"
  xmlns:daml="http://www.w3.org/2001/10/daml+oil#">
  <owl:Ontology rdf:about="Scholarly Information">
    <owl:versionInfo>v 1.0 2009-12-07 19:06:40</owl:versionInfo>
  </owl:Ontology>
  <owl:Class rdf:ID="Concept1">
    <owl:rdfLabel="Data Mining"/>
  </owl:Class>
  <owl:Class rdf:ID="Concept2">
    <owl:rdfLabel="Fuzzy Logic"/>
  </owl:Class>
  <owl:Class rdf:ID="Concept3">
    <owl:rdfLabel="Data Mining, Fuzzy Logic"/>
    <rdf:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="Concept1"/>
        <owl:onProperty rdf:resource="Concept2"/>
      </owl:Restriction>
    </rdf:subClassOf>
  </owl:Class>
</rdf:RDF>
The listing has three parts:
- Header information (the namespace declarations).
- The ontology name and version.
- 3 classes: Concept1 (labelled Data Mining), Concept2 (labelled Fuzzy Logic) and Concept3. Concept3 is a subclass of both Concept1 and Concept2.
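As a rough illustration of the same three classes, the sketch below emits them programmatically with Python's standard library. It is not the tool used in the presentation. It assumes the standard `rdfs:label` and `rdfs:subClassOf` elements instead of the slide's `owl:rdfLabel` notation, and the helper names `qname` and `add_class` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard RDF, RDFS and OWL namespace URIs.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def qname(prefix, local):
    """Return the {namespace}local form that ElementTree expects."""
    return "{%s}%s" % (NS[prefix], local)


def add_class(root, class_id, label, parents=()):
    """Append an owl:Class with a label and optional superclasses."""
    cls = ET.SubElement(root, qname("owl", "Class"),
                        {qname("rdf", "ID"): class_id})
    ET.SubElement(cls, qname("rdfs", "label")).text = label
    for parent in parents:
        ET.SubElement(cls, qname("rdfs", "subClassOf"),
                      {qname("rdf", "resource"): "#" + parent})
    return cls


root = ET.Element(qname("rdf", "RDF"))
add_class(root, "Concept1", "Data Mining")
add_class(root, "Concept2", "Fuzzy Logic")
# Concept3 is a subclass of both Concept1 and Concept2.
add_class(root, "Concept3", "Data Mining, Fuzzy Logic",
          parents=("Concept1", "Concept2"))

owl_xml = ET.tostring(root, encoding="unicode")
print(owl_xml)
```

Building the tree through an XML library rather than string concatenation keeps the output well-formed, which matters when other programs are expected to parse the ontology.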
2. Related Work: Ontology Generation
Ontology uses classes, which contain attributes, to represent concepts.
Ontology also supports taxonomy and non-taxonomy relations between classes.
Although editing tools such as Protege [1] and OilEd [2] have been developed to help users to create and edit ontology, it is a tedious task to manually derive ontology from data.
2. Related Work: Ontology Generation – Approaches
Ontology can be generated from various types of data, mostly textual.
Large corpora [3,4] are considered as good sources for mining knowledge for constructing ontology, since the information in the corpus is usually well annotated. Therefore, it can be easily processed by other programs.
Ontology can also be generated from a knowledge base of rules [5], which is represented as a tree with rules residing at tree nodes. Statistical approaches have been used to estimate the existence of relationships between entities involved in rules [6].
2. Related Work: Ontology Generation – Approaches
When knowledge is represented in semi-structured schemata such as XML and RDF, its contents can easily be parsed by programs; techniques have been proposed to generate ontology from semi-structured schemata based on Graph Theory [7] and statistical approaches [8].
Learning Source Description (LSD) [9] has been proposed to generate ontology from arbitrary formalisms of semi-structured schemata.
The Entity-Relationship model used in database schemata has also been adopted as an information source for generating ontology [10,11].
2. Related Work: Ontology Generation – Textual Data
For textual data, ontology concepts can be extracted efficiently using Natural Language Processing (NLP) techniques [12,13].
NLP is used to preprocess the textual data in order to extract significant keywords.
WordNet [14] can be used to improve the accuracy of ontology generated by NLP-based techniques.
However, NLP techniques have difficulty in finding semantic relationships among the keywords.
Data mining techniques can be combined with NLP to improve the efficiency of ontology generation. In Text-to-Onto [15], association rules are used to find associative relations between keywords, which are used to construct non-taxonomy relations for the ontology.
2. Related Work: Ontology Generation – Textual Data
Keywords' frequencies are often used in statistical approaches [16,17] to identify significant keywords that can be used to represent a certain concept.
Clustering techniques have also been applied to generate ontology from textual data [18].
Using significant keywords extracted from textual data, clustering techniques can cluster documents and interpret topics from the generated clusters.
2. Related Work: Ontology Generation – Clustering
Clustering can be used to mine hidden knowledge from data to construct an ontology. It can also be used to enrich existing ontology.
Traditional clustering techniques are useful for generating non-taxonomy relations for ontology.
In particular, conceptual clustering techniques are powerful clustering techniques that can conceptualize clusters and construct a concept hierarchy of clusters useful for generating taxonomy relations for ontology.
For example, an approach based on COBWEB [18] can generate taxonomy relations among concepts in a domain for ontology generation.
Mo'K [19] is a system that can obtain taxonomy relations from tagged text using conceptual clustering.
2. Related Work: Ontology Applications – Scholarly Info
In E-Scholar Knowledge Inference MOdel (ESKIMO) [20], knowledge on scholarly publications is represented as a simple ontology, known as OntoPortal, which is manually developed and maintained.
OntoPortal describes and provides links to other external research pages on the Web. Hypertext links between the web pages are also described in the OntoPortal ontology.
ESKIMO allows users to retrieve scholarly information from the constructed ontology by using queries represented as Prolog-like rules.
2. Related Work: Ontology Applications – Scholarly Info
In the Scholarly Ontology Project [21], a digital library Web server is constructed using Semantic Web technologies in order to support scholarly retrieval.
It is developed using a collaborative approach in which researchers submit their documents in a specifically structured format.
As such, the contents of the submitted documents can be further processed in the system and converted into scholarly ontology accordingly.
2. Related Work: Ontology Applications – Scholarly Info
In the Research in Semantic Scholarly Publishing (RSSP) project, scientific publications are collected from online archives such as the Open Archive Initiative (OAI) [22].
Information about the documents (e.g. their authors, titles, citations, publishers, etc.) is extracted, indexed and converted into ontology formalism.
DAML+OIL is used to annotate the ontology as Semantic Web pages to support scholarly retrieval.
2. Related Work: Summary
Many techniques exist to construct ontology from various data types and sources, mainly textual data.
Traditionally, NLP techniques are used to analyze textual data.
Recently, data mining techniques have been incorporated into NLP to further discover hidden knowledge from textual data.
Conceptual clustering is an advanced data mining technique that can organize data in a hierarchical conceptual structure.
Thus, conceptual clustering is a useful technique to discover knowledge for generating ontology from textual data.
3. Toward Automated Ontology Generation: Basics
- The initial focus is on scholarly information.
- Scholarly ontology is generated directly from explicit information on scientific publications (e.g. their titles, authors, citations, etc.).
- Other advanced scholarly knowledge, such as research experts and research areas, is usually inferred manually by human experts.
3. Toward Automated Ontology Generation: Basics
To construct scholarly ontology from a citation database, we use data mining techniques to discover hidden knowledge in the database.
The data mining techniques include Context-based Cluster Analysis (CCA) and Fuzzy Concept Hierarchy Generation (FCHG).
The discovered knowledge is then converted and integrated into the ontology formalism.
As such, apart from the explicit information available on scientific publications, the Scholarly Ontology can also support other useful scholarly retrieval functions such as research expert finding and trend detection.
3. Toward Automated Ontology Generation: Context-based Cluster Analysis
CCA is based on the Formal Concept Analysis (FCA) technique [23].
FCA provides a formal model, known as a formal context, to represent relations between objects and attributes in a data set.
We use formal contexts to represent the results of multiple clusterings.
Then, relations between the formal contexts are analyzed to find the relations between the corresponding clustering results.
3. Toward Automated Ontology Generation: Fuzzy Concept Hierarchy Generation
- A concept hierarchy is a data structure useful for knowledge representation.
- It is widely used in data mining applications.
- A concept hierarchy may need to be large to reflect the knowledge in a domain precisely.
- Manual construction may be difficult and tedious.
- Conceptual clustering is therefore needed.
3. Toward Automated Ontology Generation: Fuzzy Concept Hierarchy Generation
Many conceptual clustering techniques organize knowledge as a concept hierarchy. A hierarchy alone may not be sufficient for representing information in a real domain.
FCA, a data exploration technique, supports the concept lattice, which provides a more informative conceptual model for representing knowledge.
FCA-based conceptual clustering techniques are potentially useful for constructing the taxonomy knowledge of an ontology.
However, typical FCA-based conceptual clustering techniques do not support uncertainty information.
3. Toward Automated Ontology Generation: Fuzzy Concept Hierarchy Generation
- Traditional FCA-based conceptual clustering approaches cannot represent vague information, so fuzziness is needed.
- The L-fuzzy context uses linguistic variables to represent uncertainty in the context.
- However, it needs human interpretation to define the linguistic variables.
- The fuzzy concept lattice generated from an L-fuzzy context usually causes a combinatorial explosion of concepts (compared with a traditional concept lattice).
3. Toward Automated Ontology Generation: Fuzzy Concept Hierarchy Generation
We combine fuzzy logic and FCA as Fuzzy Formal Concept Analysis (FFCA).
- In FFCA, uncertainty information is represented directly by a real-valued membership value in the range [0,1].
- Linguistic variables are no longer needed.
- Compared with the fuzzy concept lattice generated from an L-fuzzy context, the fuzzy concept lattice generated using FFCA is simpler in terms of the number of formal concepts.
- It also supports a formal mechanism for calculating concept similarities.
Based on FFCA, we propose the Fuzzy Conceptual Clustering technique in FCHG to generate a fuzzy concept hierarchy.
4. Fuzzy Ontology Generation Framework: Fuzzy Ontology
The application of fuzzy logic offers a possible solution for dealing with uncertainty information.
Fuzzy ontology is generated and used in text retrieval and search engines, where membership values are used to evaluate the similarities between the concepts in a concept hierarchy.
Manual generation of fuzzy ontology from a predefined concept hierarchy is a difficult and tedious task that often requires expert interpretation.
4. Fuzzy Ontology Generation Framework: Introduction
An efficient method for generating the concept hierarchy and fuzzy ontology is highly desirable.
We propose a Fuzzy Ontology Generation Framework (FOGF) that can automate fuzzy ontology generation from uncertainty data based on Formal Concept Analysis (FCA) theory.
The generated fuzzy ontology is mapped to a semantic representation in OWL.
4. Fuzzy Ontology Generation Framework: Overview
- Fuzzy Formal Concept Analysis incorporates fuzzy logic into Formal Concept Analysis to represent vague information.
- Concept Hierarchy Generation clusters the fuzzy concept lattice generated by FFCA to construct a concept hierarchy in two steps: Fuzzy Conceptual Clustering and Hierarchical Relation Generation.
- Fuzzy Ontology Generation constructs fuzzy ontology from a fuzzy context using the concept hierarchy created by fuzzy conceptual clustering.
- Semantic Representation Conversion makes knowledge accessible and sharable in the Web environment, using OWL.
[Figure: framework overview. Uncertainty Information -> Fuzzy Formal Concept Analysis -> Fuzzy Concept Lattice -> Concept Hierarchy Generation -> Concept Hierarchy -> Fuzzy Ontology Generation -> Fuzzy Ontology -> Semantic Representation Conversion -> Semantic Web]
4. Fuzzy Ontology Generation Framework: Step 1 Fuzzy Formal Concept Analysis
Definition (Fuzzy Formal Context)
A fuzzy formal context is a triple $K = (G, M, I = \varphi(G \times M))$, where $G$ is a set of objects, $M$ is a set of attributes, and $I$ is a fuzzy set on the domain $G \times M$. Each relation $(g, m) \in I$ has a membership value $\mu(g, m)$ in $[0, 1]$.
4. Fuzzy Ontology Generation Framework: Step 1 Fuzzy Formal Concept Analysis
A fuzzy formal context can be represented as a cross-table (Table 1).
An α-cut can be set to eliminate relations with low membership values, e.g. α = 0.5 (Table 2).
The context has 3 objects representing 3 documents, D1, D2 and D3. It also has 3 attributes, “Data Mining”, “Clustering” and “Fuzzy Logic”, representing 3 research topics. The relationship between an object and an attribute is represented by a membership value in [0, 1].

Table 1. Fuzzy formal context.
      Data Mining   Clustering   Fuzzy Logic
D1    0.8           0.12         0.61
D2    0.9           0.85         0.13
D3    0.1           0.14         0.87

Table 2. Fuzzy formal context after the α-cut with α = 0.5.
      Data Mining   Clustering   Fuzzy Logic
D1    0.8           -            0.61
D2    0.9           0.85         -
D3    -             -            0.87
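The α-cut that takes Table 1 to Table 2 can be sketched in a few lines of Python. This is a minimal illustration, not the framework's implementation. The nested-dictionary layout and the function name `alpha_cut` are assumptions made for the example, and a relation is kept when its membership is at least α.

```python
# Fuzzy formal context from Table 1: object -> {attribute: membership}.
TABLE_1 = {
    "D1": {"Data Mining": 0.8, "Clustering": 0.12, "Fuzzy Logic": 0.61},
    "D2": {"Data Mining": 0.9, "Clustering": 0.85, "Fuzzy Logic": 0.13},
    "D3": {"Data Mining": 0.1, "Clustering": 0.14, "Fuzzy Logic": 0.87},
}


def alpha_cut(context, alpha):
    """Keep only the relations whose membership is at least alpha."""
    return {
        obj: {attr: mu for attr, mu in row.items() if mu >= alpha}
        for obj, row in context.items()
    }


table_2 = alpha_cut(TABLE_1, 0.5)
for obj, row in table_2.items():
    print(obj, row)
```

With α = 0.5 the surviving relations are the non-empty cells of Table 2.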
4. Fuzzy Ontology Generation Framework: Step 1 Fuzzy Formal Concept Analysis
Definition (Fuzzy Representation of Object)
Each object $O$ in a fuzzy formal context $K$ can be represented by a fuzzy set $\Phi(O)$ as $\Phi(O) = \{A_1(\mu_1), A_2(\mu_2), \ldots, A_m(\mu_m)\}$, where $\{A_1, A_2, \ldots, A_m\}$ is the set of attributes in $K$ and $\mu_i$ is the membership of $O$ with attribute $A_i$ in $K$. $\Phi(O)$ is called the fuzzy representation of $O$.
4. Fuzzy Ontology Generation Framework: Step 1 Fuzzy Formal Concept Analysis
Generally, we can consider the attributes of a formal concept as the description of the concept.
Thus, the relationship between an object and the concept should be the intersection of the relationships between the object and the attributes of the concept.
Since each relationship between the object and an attribute is represented as a membership value in the fuzzy formal context, the intersection of these membership values should be their minimum, hence…
4. Fuzzy Ontology Generation Framework: Step 1 Fuzzy Formal Concept Analysis
Definition (Fuzzy Formal Concept)
Given a fuzzy formal context $K = (G, M, I)$ and a confidence threshold $T$, we define $A^* = \{m \in M \mid \forall g \in A: \mu(g, m) \geq T\}$ for $A \subseteq G$ and $B^* = \{g \in G \mid \forall m \in B: \mu(g, m) \geq T\}$ for $B \subseteq M$. A fuzzy formal concept (or fuzzy concept) of a fuzzy formal context $(G, M, I)$ with a confidence threshold $T$ is a pair $(A_f = \varphi(A), B)$ where $A \subseteq G$, $B \subseteq M$, $A^* = B$ and $B^* = A$. Each object $g \in \varphi(A)$ has a membership $\mu_g$ defined as
$\mu_g = \min_{m \in B} \mu(g, m)$
where $\mu(g, m)$ is the membership value between object $g$ and attribute $m$ defined in $I$. If $B = \{\}$ then $\mu_g = 1$ for every $g$. $A$ and $B$ are the extent and intent of the formal concept $(\varphi(A), B)$ respectively.
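To make the definition concrete, the sketch below enumerates the fuzzy formal concepts of the Table 1 context with threshold T = 0.5. It is a brute-force illustration over all attribute subsets, which is only practical for tiny contexts, and it is not the algorithm used in the framework. The function names are hypothetical, and the threshold comparison is taken as "at least T".

```python
from itertools import combinations

CONTEXT = {
    "D1": {"Data Mining": 0.8, "Clustering": 0.12, "Fuzzy Logic": 0.61},
    "D2": {"Data Mining": 0.9, "Clustering": 0.85, "Fuzzy Logic": 0.13},
    "D3": {"Data Mining": 0.1, "Clustering": 0.14, "Fuzzy Logic": 0.87},
}
ATTRIBUTES = ["Data Mining", "Clustering", "Fuzzy Logic"]


def attributes_of(objects, threshold):
    """A*: attributes related to every object in A with membership >= T."""
    return frozenset(m for m in ATTRIBUTES
                     if all(CONTEXT[g][m] >= threshold for g in objects))


def objects_of(attributes, threshold):
    """B*: objects related to every attribute in B with membership >= T."""
    return frozenset(g for g in CONTEXT
                     if all(CONTEXT[g][m] >= threshold for m in attributes))


def fuzzy_concepts(threshold):
    """Map each intent B to its fuzzy extent {object: membership}."""
    concepts = {}
    for size in range(len(ATTRIBUTES) + 1):
        for subset in combinations(ATTRIBUTES, size):
            extent = objects_of(subset, threshold)
            intent = attributes_of(extent, threshold)  # closure of subset
            # Membership is the minimum over the intent, or 1 if B is empty.
            concepts[intent] = {
                g: min((CONTEXT[g][m] for m in intent), default=1.0)
                for g in sorted(extent)
            }
    return concepts


concepts = fuzzy_concepts(0.5)
for intent, extent in concepts.items():
    print(sorted(intent), extent)
```

Closing each attribute subset (taking B* and then its attributes again) guarantees that every stored pair satisfies A* = B and B* = A, and duplicates collapse because the intent is used as the dictionary key.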
4. Fuzzy Ontology Generation Framework: Step 1 Fuzzy Formal Concept Analysis
The version of FFCA presented in these definitions preserves the continuous membership values of objects, which are crucial for calculating concept similarities.
In a formal context, a concept can have many superconcepts and subconcepts. However, the similarities of a concept to its superconcepts and subconcepts are different.
With a fuzzy concept lattice, we can make use of fuzzy set theory to calculate the similarities between a concept and its subconcepts.
4. Fuzzy Ontology Generation Framework: Step 1 Fuzzy Formal Concept Analysis
Definition (Fuzzy Formal Concept Cardinality)
Since the fuzziness of a fuzzy formal concept is represented by the membership values of the objects of the concept, the cardinality of a fuzzy formal concept $K_f = (\varphi(A), B)$ is defined as $|K_f| = |\varphi(A)|$.
4. Fuzzy Ontology Generation Framework: Step 1 Fuzzy Formal Concept Analysis
Definition (Fuzzy Formal Concept Similarity)
The similarity of a fuzzy formal concept $K_{f1} = (\varphi(A_1), B_1)$ and its subconcept $K_{f2} = (\varphi(A_2), B_2)$ is defined as $E(K_{f1}, K_{f2}) = E(\varphi(A_1), \varphi(A_2))$.
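The slides do not spell out the fuzzy set similarity E. The sketch below assumes a common choice: the cardinality of the fuzzy intersection (minimum of memberships) divided by the cardinality of the fuzzy union (maximum of memberships), where cardinality is the sum of memberships. Under that assumption, the fuzzy extents derived from Table 1 give values close to the 0.5, 0.41 and 0.35 labels that appear on the fuzzy concept lattice figure. The function name `fuzzy_similarity` is hypothetical.

```python
def fuzzy_similarity(a, b):
    """Assumed measure: |a intersect b| / |a union b| on fuzzy sets."""
    objects = set(a) | set(b)
    intersection = sum(min(a.get(g, 0.0), b.get(g, 0.0)) for g in objects)
    union = sum(max(a.get(g, 0.0), b.get(g, 0.0)) for g in objects)
    return intersection / union if union else 0.0


# Fuzzy extents of four concepts from the Table 1 context (T = 0.5).
data_mining = {"D1": 0.8, "D2": 0.9}
fuzzy_logic = {"D1": 0.61, "D3": 0.87}
dm_clustering = {"D2": 0.85}   # intent {"Data Mining", "Clustering"}
dm_fuzzy = {"D1": 0.61}        # intent {"Data Mining", "Fuzzy Logic"}

sim_dm_clustering = fuzzy_similarity(data_mining, dm_clustering)
sim_fl_dm_fuzzy = fuzzy_similarity(fuzzy_logic, dm_fuzzy)
sim_dm_dm_fuzzy = fuzzy_similarity(data_mining, dm_fuzzy)
print(sim_dm_clustering, sim_fl_dm_fuzzy, sim_dm_dm_fuzzy)
```

The third value is about 0.359, which rounds to 0.36; the figure's 0.35 would be consistent with truncation rather than rounding.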
4. Fuzzy Ontology Generation Framework: Step 1 Fuzzy Formal Concept Analysis
Fig. 2. Traditional concept lattice generated from Table 1 without membership values. Concepts (extent, intent), with nodes labelled C1 to C4 in the figure:
  ({D1,D2,D3}, {})
  ({D1,D2}, {“Data Mining”})
  ({D1,D3}, {“Fuzzy Logic”})
  ({D2}, {“Data Mining”,“Clustering”})
  ({D1}, {“Data Mining”,“Fuzzy Logic”})
  ({}, {“Data Mining”,“Clustering”,“Fuzzy Logic”})
Fig. 3. Fuzzy concept lattice generated from the fuzzy formal context in Table 2 (similarities between concepts shown). Concepts (fuzzy extent, intent):
  ({D1,D2,D3}, {})
  ({D1(0.8),D2(0.9)}, {“Data Mining”})
  ({D1(0.61),D3(0.87)}, {“Fuzzy Logic”})
  ({D2(0.85)}, {“Data Mining”,“Clustering”})
  ({D1(0.61)}, {“Data Mining”,“Fuzzy Logic”})
  ({}, {“Data Mining”,“Clustering”,“Fuzzy Logic”})
Edge similarity labels shown in Fig. 3: 0.5, 0.41, 0.35 and 0.00.
4. Fuzzy Ontology Generation Framework: Step 2 Concept Hierarchy Generation
Concept Hierarchy Generation clusters the fuzzy concept lattice generated by FFCA to construct a concept hierarchy in two steps: Fuzzy Conceptual Clustering and Hierarchical Relation Generation
4. Fuzzy Ontology Generation Framework: Step 2 a) Fuzzy Conceptual Clustering
Compared to traditional clusters, the conceptual clusters generated have the following properties:
- Each conceptual cluster is considered a human-interpretable concept in the domain of the fuzzy concept lattice.
- Each conceptual cluster is a sublattice extracted from the fuzzy concept lattice.
- A formal concept must belong to at least one conceptual cluster and may belong to more than one, e.g. a scientific document can belong to more than one research area.
4. Fuzzy Ontology Generation Framework: Step 2 a) Fuzzy Conceptual Clustering
Conceptual clusters are generated based on the idea that if a formal concept A belongs to a conceptual cluster R, then its subconcept B also belongs to R if B is similar to A. We can use a similarity confidence threshold Ts to determine whether two concepts are similar or not.
4. Fuzzy Ontology Generation Framework: Step 2 a) Fuzzy Conceptual Clustering
Definition (Conceptual Cluster)
A conceptual cluster of a concept lattice $K$ with a similarity confidence threshold $T_s$ is a sublattice $S_K$ of $K$ which has the following properties:
- $S_K$ has a supremum concept $C_S$ that is not similar to any of its superconcepts.
- Any concept $C \neq C_S$ in $S_K$ must have at least one superconcept $C' \in S_K$ so that $E(C, C') > T_s$.
4. Fuzzy Ontology Generation Framework: Step 2 a) Fuzzy Conceptual Clustering
Fig. 5 shows the conceptual clusters generated from the fuzzy concept lattice given in Fig. 3 with similarity confidence threshold Ts = 0.5.
Fig. 5. Conceptual clusters CK1, CK2 and CK3 drawn over the fuzzy concept lattice of Fig. 3 (nodes C1 to C4; intents {}, {“Data Mining”}, {“Fuzzy Logic”}, {“Data Mining”,“Clustering”}, {“Data Mining”,“Fuzzy Logic”} and {“Data Mining”,“Clustering”,“Fuzzy Logic”}; edge similarities 0.5, 0.41, 0.35 and 0.00).
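The two properties in the conceptual cluster definition can be turned into a small grouping routine. The sketch below is only an illustration of that definition, not the clustering algorithm of the framework. It names concepts by their intents, omits the top and bottom concepts, and assumes which lattice edge carries each of the three non-zero similarity values. An illustrative threshold of 0.45 is used to stay clear of the boundary case E = Ts, since the definition states a strict inequality.

```python
# Assumed (superconcept, subconcept) edges and their similarities.
EDGES = {
    ("DM", "DM+Clustering"): 0.50,
    ("DM", "DM+FL"): 0.35,
    ("FL", "DM+FL"): 0.41,
}
CONCEPTS = ["DM", "FL", "DM+Clustering", "DM+FL"]


def conceptual_clusters(concepts, edges, ts):
    """Group each concept with the superconcepts it is similar to (E > ts)."""
    similar_children = {c: [] for c in concepts}
    has_similar_parent = set()
    for (sup, sub), similarity in edges.items():
        if similarity > ts:
            similar_children[sup].append(sub)
            has_similar_parent.add(sub)

    clusters = {}
    for root in concepts:
        if root in has_similar_parent:
            continue  # not a supremum: it is similar to a superconcept
        members, stack = [], [root]
        while stack:
            concept = stack.pop()
            if concept not in members:
                members.append(concept)
                stack.extend(similar_children[concept])
        clusters[root] = members
    return clusters


clusters = conceptual_clusters(CONCEPTS, EDGES, ts=0.45)
for supremum, members in clusters.items():
    print(supremum, members)
```

Each cluster is keyed by its supremum, and a concept reachable from two suprema through similar edges would appear in both clusters, which matches the property that a concept may belong to more than one cluster.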
4. Fuzzy Ontology Generation Framework: Step 2 b) Hierarchical Relation Generation
Fuzzy conceptual clustering generates a set of conceptual clusters $S_C$. To construct a concept hierarchy from the conceptual clusters, we need to find the hierarchical relations among the clusters.
We first define a concept hierarchy.
Definition (Concept Hierarchy)
A concept hierarchy is a poset (partially ordered set) $(H, \prec)$, where $H$ is a finite set of concepts and $\prec$ is a partial order on $H$.
4. Fuzzy Ontology Generation Framework: Step 2 b) Hierarchical Relation Generation
The definition of superconcept and subconcept relations on conceptual clusters ensures that each conceptual cluster has at least one superconcept, unless it corresponds to the root node of the generated concept hierarchy. However, we must prove that the relation is a partial order.
Definition (Subconcept and Superconcept on a Concept Hierarchy)
Let $C_1$ and $C_2$ be two conceptual clusters corresponding to two sublattices $L_1$ and $L_2$ of a fuzzy concept lattice $F(K)$. Let the fuzzy formal concept $I$ be the supremum of $L_1$, i.e. $I = \sup(L_1)$. $C_1$ is the subconcept of $C_2$, denoted as $C_1 \prec C_2$, if $I$ is the subconcept of some concept $C' \in L_2$, i.e. $I \leq C'$, where $\leq$ is the partial order defined on $F(K)$. Equivalently, $C_2$ is the superconcept of $C_1$.
4. Fuzzy Ontology Generation Framework: Step 2 b) Hierarchical Relation Generation
Figure 8(b) illustrates the hierarchical relations constructed from the conceptual clusters given in Figure 8(a). Each concept in the concept hierarchy is represented by a set of its attributes. The supremum and infimum of the lattice are considered as “Thing” and “Nothing” concepts, respectively.
Figure 8(a). Conceptual clusters. (Clusters CK1, CK2 and CK3 over the fuzzy concept lattice, with nodes C1 to C4 and edge similarities 0.5, 0.41, 0.35 and 0.00.)
Figure 8(b). Concept hierarchy. (Nodes: Thing, {“Fuzzy Logic”}, {“Data Mining”,“Clustering”}, {“Data Mining”,“Fuzzy Logic”}, Nothing.)
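The subconcept relation on clusters can be checked mechanically once each cluster is written as a list of concept intents. The sketch below is an illustration under assumptions. The cluster contents are read off the node labels of Figure 8(b), the cluster names are hypothetical, and the test "I is a subconcept of some concept in L2" is implemented through intent inclusion, since in a concept lattice a proper subconcept has a strictly larger intent.

```python
DM, CL, FL = "Data Mining", "Clustering", "Fuzzy Logic"

# Each cluster lists the intents of its member concepts, supremum first.
# The membership of each cluster is an assumption for illustration.
CLUSTERS = {
    "DM/Clustering": [frozenset({DM}), frozenset({DM, CL})],
    "FL": [frozenset({FL})],
    "DM+FL": [frozenset({DM, FL})],
}


def is_subconcept(intent_a, intent_b):
    """A is a proper subconcept of B when A's intent strictly contains B's."""
    return intent_a > intent_b


def is_subcluster(cluster_1, cluster_2):
    """C1 is below C2 if sup(L1) is a subconcept of some concept in L2."""
    supremum = cluster_1[0]
    return any(is_subconcept(supremum, member) for member in cluster_2)


relations = sorted(
    (a, b)
    for a in CLUSTERS
    for b in CLUSTERS
    if a != b and is_subcluster(CLUSTERS[a], CLUSTERS[b])
)
print(relations)
```

Under these assumptions the only hierarchical relations place the “Data Mining”, “Fuzzy Logic” cluster below both other clusters, which is the shape of the hierarchy between “Thing” and “Nothing” in Figure 8(b).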
4. Fuzzy Ontology Generation Framework: Step 3 Fuzzy Ontology Generation
This step constructs fuzzy ontology from a fuzzy context using the concept hierarchy created by fuzzy conceptual clustering.
It relies on the fact that both FCA and ontology support formal definitions of concepts.
However, a concept defined in FCA carries both extensional and intensional information in a balanced manner, whereas a concept in ontology emphasizes its intensional aspect.
To construct the fuzzy ontology, we need to convert both the intensional and the extensional information of FCA concepts into the corresponding classes and relations of the ontology.
Thus, we define the fuzzy ontology as follows…
4. Fuzzy Ontology Generation Framework: Step 3 Fuzzy Ontology Generation
Definition (Fuzzy Ontology)
A fuzzy ontology FO consists of 4 elements (C, AC, R, X), where:
- C is a set of concepts. Each concept ci in C represents a set of objects, or instances, of the same kind.
- AC represents a collection of attribute sets, one for each concept. Each object oij of a concept ci can be described by a set of attribute values denoted by AC(ci).
- R = (RT, RN) represents a set of relationships, which consists of 2 elements: RN is a set of non-taxonomy relationships and RT is a set of taxonomy relationships. Each relationship ri(cp,cq) in R represents a binary association between concepts cp and cq, and the instances of such a relationship are pairs of (cp,cq) concept objects.
- X is a set of axioms. Each axiom in X is a constraint on the attribute values of concepts and relationships, or a constraint on the relationships between concept objects.
Each attribute value of an object or relationship instance is associated with a fuzzy membership value in [0,1], indicating the degree of uncertainty of this attribute value or relationship.
4. Fuzzy Ontology Generation Framework Step 3 Fuzzy Ontology Generation
Example (Fuzzy Ontology). The Scholarly Ontology OS = (C, AC, R, X) is a fuzzy ontology whose components are as follows.
C = {“Document”, “Research Area”}
AC(“Document”) = {“Name”, “Author”, “Title”, “Keywords”, “Abstract”, “Body”, “Publisher”, “Publication Date”}
AC(“Research Area”) = {“Name”, “Keyword”}
RN = {belong-to(“Document”, “Research Area”), consist-of(“Research Area”, “Document”)}
RT = {superarea-of(“Research Area”, “Research Area”), subarea-of(“Research Area”, “Research Area”)}
X = {
Implies(Antecedent(consist-of(I-variable(x1) I-variable(x2))) Consequent(belong-to(I-variable(x2) I-variable(x1))))
Implies(Antecedent(belong-to(I-variable(x1) I-variable(x2))) Consequent(consist-of(I-variable(x2) I-variable(x1))))
Implies(Antecedent(superarea(I-variable(x1) I-variable(x2))) Consequent(subarea(I-variable(x2) I-variable(x1))))
Implies(Antecedent(subarea(I-variable(x1) I-variable(x2))) Consequent(superarea(I-variable(x2) I-variable(x1))))
}
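The definition and the scholarly example can be made concrete with a small data structure. The following is only a sketch: the class name FuzzyOntology, its field names, the method add_relation_instance and the sample membership degree are assumptions made for illustration, not part of the framework itself.

```python
from dataclasses import dataclass, field


@dataclass
class FuzzyOntology:
    """Illustrative container for FO = (C, AC, R, X)."""
    concepts: set        # C
    attributes: dict     # AC: concept -> set of attribute names
    taxonomy: set        # RT: (relation, concept, concept)
    non_taxonomy: set    # RN: (relation, concept, concept)
    axioms: list = field(default_factory=list)       # X, kept as plain strings here
    memberships: dict = field(default_factory=dict)  # (relation, obj, obj) -> degree

    def add_relation_instance(self, relation, source, target, degree):
        """Record one relationship instance with its fuzzy membership degree."""
        declared = {name for name, _, _ in self.taxonomy | self.non_taxonomy}
        if relation not in declared:
            raise KeyError(f"unknown relation: {relation}")
        if not 0.0 <= degree <= 1.0:
            raise ValueError("membership degree must lie in [0, 1]")
        self.memberships[(relation, source, target)] = degree


scholarly = FuzzyOntology(
    concepts={"Document", "Research Area"},
    attributes={
        "Document": {"Name", "Author", "Title", "Keywords", "Abstract",
                     "Body", "Publisher", "Publication Date"},
        "Research Area": {"Name", "Keyword"},
    },
    taxonomy={("superarea-of", "Research Area", "Research Area"),
              ("subarea-of", "Research Area", "Research Area")},
    non_taxonomy={("belong-to", "Document", "Research Area"),
                  ("consist-of", "Research Area", "Document")},
    axioms=["consist-of(x1, x2) implies belong-to(x2, x1)",
            "belong-to(x1, x2) implies consist-of(x2, x1)"],
)

# Hypothetical instance: a document belonging to a research area with degree 0.87.
scholarly.add_relation_instance("belong-to", "Document2", "Fuzzy Logic", 0.87)
```

Keeping the membership degree on each relationship instance, rather than on the relation itself, mirrors the definition's statement that every relationship instance carries its own value in [0,1].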
4. Fuzzy Ontology Generation Framework Step 3 Fuzzy Ontology Generation
Figure 9. Fuzzy ontology generation process.
[Diagram with the processing stages Class Mapping, Taxonomy Relation Generation, Non-Taxonomy Relation Generation and Instances Generation, and the data items Fuzzy Context, Concept Hierarchy, Ontology Extent and Intent Classes, Ontology Hierarchical Classes, Ontology Relation Classes and Fuzzy Ontology.]
4. Fuzzy Ontology Generation Framework Step 3 Fuzzy Ontology Generation
Class Mapping furnishes C = {E, I} in which E and I are classes corresponding to extent and intent of the fuzzy context. For example, the extent class mapped from the extent of the fuzzy context given in Table 1(b) can be labeled manually as Document. We can use appropriate names to represent keyword attributes and use them to label the intent class names as well. For example, the class Research Area can be used to label the initial intent class.
4. Fuzzy Ontology Generation Framework Step 3 Fuzzy Ontology Generation
Taxonomy Relation Generation furnishes RT = {superclass(I,I), subclass(I,I)}. Thus, the hierarchical relations between instances of intent classes are defined. Also, two rules are added to X accordingly:
superclass(X,Y) :- subclass(Y,X).
subclass(X,Y) :- superclass(Y,X).
4. Fuzzy Ontology Generation Framework Step 3 Fuzzy Ontology Generation
Non-taxonomy Relation Generation furnishes RN = {RIE(I,E), REI(E,I)}, in which REI is the relation between the extent class and the intent class, and RIE is the inverse of REI. However, we still need to label the non-taxonomy relations. For example, the relation between class Document and class Research Area can be labeled as belong-to, which implies that a document can belong to one or more research areas. Also, two rules are added to X accordingly:
REI(X,Y) :- RIE(Y,X).
RIE(X,Y) :- REI(Y,X).
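The inverse rules added to X can be applied mechanically to relationship instances. The sketch below assumes that facts are stored as a dictionary from (relation, x, y) to a membership value, and that a derived inverse fact inherits the membership value of the fact it comes from; neither assumption is stated on the slides, and the function name apply_inverse_rules is hypothetical.

```python
def apply_inverse_rules(facts, inverse_of):
    """Return the facts plus every fact derivable by one inverse rule.

    facts:      dict mapping (relation, x, y) -> membership value in [0, 1]
    inverse_of: dict mapping a relation name to the name of its inverse
    Assumption: the derived fact keeps the membership value of its source.
    """
    derived = dict(facts)
    for (relation, x, y), degree in facts.items():
        inverse = inverse_of.get(relation)
        if inverse is not None:
            # Do not overwrite a value that was asserted explicitly.
            derived.setdefault((inverse, y, x), degree)
    return derived


inverse_of = {"belong-to": "consist-of", "consist-of": "belong-to"}
facts = {("belong-to", "Document2", "Fuzzy Logic"): 0.87}  # illustrative fact
closed = apply_inverse_rules(facts, inverse_of)
print(closed[("consist-of", "Fuzzy Logic", "Document2")])
```

Because each rule is the inverse of the other, a single pass is enough; applying the function again adds nothing new.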
4. Fuzzy Ontology Generation Framework Step 3 Fuzzy Ontology Generation
Instances Generation generates the instance set I = {II, IE}, where II and IE are the instances of the intent class and the extent class, respectively.
Then, it furnishes membership values for the instances’ attributes and relationships.
4. Fuzzy Ontology Generation Framework Overview
[Diagram: Uncertainty Information → Fuzzy Formal Concept Analysis → Fuzzy Concept Lattice → Concept Hierarchy Generation → Concept Hierarchy → Fuzzy Ontology Generation → Fuzzy Ontology → Semantic Representation Conversion → Semantic Web]
4. Fuzzy Ontology Generation Framework Step 4 Semantic Representation Conversion
The generated fuzzy ontology provides a conceptual model of knowledge in the corresponding domain.
However, to make such knowledge accessible and sharable, we must convert it into a semantic representation that can be embedded into the contents of Web pages.
In the Semantic Web, an ontology description language such as OWL can be used to annotate the ontology.
Therefore, the generated fuzzy ontology can be automatically converted into the corresponding semantic representation in OWL, in which each class and instance is annotated as shown on the next slide…
4. Fuzzy Ontology Generation Framework Step 4 Semantic Representation Conversion
Ontology for the concept hierarchy represented by OWL
<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns="http://www.owl-ontologies.com/unnamed.owl#"
    xml:base="http://www.owl-ontologies.com/unnamed.owl">
  <owl:Ontology rdf:about=""/>
  <owl:Class rdf:ID="Concept_2"/>
  <owl:Class rdf:ID="Concept_1"/>
  <owl:Class rdf:ID="Concept_3">
    <rdfs:subClassOf rdf:resource="#Concept_1"/>
    <rdfs:subClassOf rdf:resource="#Concept_2"/>
  </owl:Class>
  <owl:DatatypeProperty rdf:ID="Data_Mining"/>
  <owl:DatatypeProperty rdf:ID="DataMining">
    <rdfs:domain rdf:resource="#Concept_1"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#float"/>
  </owl:DatatypeProperty>
  <owl:DatatypeProperty rdf:ID="FuzzyLogic">
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#float"/>
    <rdfs:domain rdf:resource="#Concept_2"/>
  </owl:DatatypeProperty>
  <Concept_2 rdf:ID="Document2">
    <FuzzyLogic rdf:datatype="http://www.w3.org/2001/XMLSchema#float">0.87</FuzzyLogic>
  </Concept_2>
</rdf:RDF>
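The conversion step can be sketched with the Python standard library. This is a minimal illustration, assuming that concepts become OWL classes, that keyword attributes become float-valued datatype properties, and that a membership value becomes the property value of an instance, as in the OWL listing above. The function to_owl, its parameters and the prefix ont are hypothetical names, not part of the framework.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
XSD_FLOAT = "http://www.w3.org/2001/XMLSchema#float"
ONT = "http://www.owl-ontologies.com/unnamed.owl#"  # ontology namespace from the listing

for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("owl", OWL), ("ont", ONT)):
    ET.register_namespace(prefix, uri)


def q(namespace, name):
    """Build a namespace-qualified name in ElementTree's {uri}name notation."""
    return "{" + namespace + "}" + name


def to_owl(classes, properties, instances):
    """Serialize classes, fuzzy datatype properties and instances as RDF/XML.

    classes:    dict class name -> list of superclass names
    properties: dict property name -> domain class name
    instances:  list of (instance id, class name, {property: membership})
    """
    root = ET.Element(q(RDF, "RDF"))
    for name, parents in classes.items():
        cls = ET.SubElement(root, q(OWL, "Class"), {q(RDF, "ID"): name})
        for parent in parents:
            ET.SubElement(cls, q(RDFS, "subClassOf"),
                          {q(RDF, "resource"): "#" + parent})
    for name, domain in properties.items():
        prop = ET.SubElement(root, q(OWL, "DatatypeProperty"), {q(RDF, "ID"): name})
        ET.SubElement(prop, q(RDFS, "domain"), {q(RDF, "resource"): "#" + domain})
        ET.SubElement(prop, q(RDFS, "range"), {q(RDF, "resource"): XSD_FLOAT})
    for inst_id, cls_name, memberships in instances:
        inst = ET.SubElement(root, q(ONT, cls_name), {q(RDF, "ID"): inst_id})
        for prop_name, degree in memberships.items():
            value = ET.SubElement(inst, q(ONT, prop_name),
                                  {q(RDF, "datatype"): XSD_FLOAT})
            value.text = str(degree)
    return ET.tostring(root, encoding="unicode")


owl_text = to_owl(
    classes={"Concept_1": [], "Concept_2": [], "Concept_3": ["Concept_1", "Concept_2"]},
    properties={"DataMining": "Concept_1", "FuzzyLogic": "Concept_2"},
    instances=[("Document2", "Concept_2", {"FuzzyLogic": 0.87})],
)
print(owl_text)
```

The sketch uses an explicit prefix for the ontology namespace instead of a default namespace, so the serialized element names differ cosmetically from the listing while describing the same classes, properties and instance.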
5. Scholarly Ontology Ontology Generation
Collected scientific documents on the research area “Information Retrieval” published in 1987-1997 from ISI.
Downloaded documents are preprocessed to extract related information such as the title, authors, citation keywords, and other citation information.
Extracted information is then stored in a citation database.
5. Scholarly Ontology Ontology Generation
First, we construct a fuzzy formal context Kf = {G,M,I}, with G as the set of documents and M as the set of citation keywords. The membership value of a document D on a citation keyword CK in Kf is computed as
$$\mu(D, CK) = \frac{n_1}{n_2}$$
where n1 is the number of documents that cite D and contain CK, and n2 is the number of documents that cite D.
This formula is based on the premise that the more frequently a keyword occurs in the citing papers, the more important the keyword is in the cited paper.
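The membership value n1/n2 is simple to compute once the citing documents of D are known. In the sketch below, each citing document is represented only by its set of citation keywords; the function name keyword_membership, the data layout and the sample keywords are assumptions for illustration, and returning 0 for an uncited document is a convention chosen here, not one given in the source.

```python
def keyword_membership(citing_keyword_sets, keyword):
    """Membership of document D on a citation keyword: n1 / n2.

    citing_keyword_sets: one set of citation keywords per document that cites D
    n2 is the number of citing documents; n1 counts those containing the keyword.
    """
    n2 = len(citing_keyword_sets)
    if n2 == 0:
        return 0.0  # convention assumed here for documents without citations
    n1 = sum(1 for keywords in citing_keyword_sets if keyword in keywords)
    return n1 / n2


# Hypothetical data: four documents cite D, three of them use "Data Mining".
citing = [
    {"Data Mining", "Clustering"},
    {"Data Mining"},
    {"Fuzzy Logic"},
    {"Data Mining", "Fuzzy Logic"},
]
print(keyword_membership(citing, "Data Mining"))  # 3 / 4 = 0.75
print(keyword_membership(citing, "Fuzzy Logic"))  # 2 / 4 = 0.5
```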
5. Scholarly Ontology Ontology Generation
Then, conceptual clustering is performed from the fuzzy formal context
Each generated conceptual cluster represents a research area
The generated conceptual clusters form a hierarchy of research areas of documents in the Citation Database, or Research Area Hierarchy
5. Scholarly Ontology Example of concept hierarchy generated
Each research area is represented by a set of most frequent keywords occurring in the documents that belong to that research area. In FFCA, sub-areas inherit keywords from their super-areas. Note that the inherited keywords are not shown in Figure 11 when labeling the concepts. Only keywords specific to the concepts are used for labeling.
Figure 11. [Concept hierarchy diagram; node labels:]
{"Semantic Similarity", "Knowledge Representation"}
{"User Interface", "Browsing"}
{"Clustering", "Neural Network"}
{"Information Retrieval", "Query Processing", "Searching"}
{"Recall", "Precision"}
{"User Satisfaction", "User Training", "User Study"}
{"Online Search", "Information Filtering"}
{"Retrieval Evaluation", "System Training"}
{"Data Indexing"}
{"Expert System"}
{"Text Retrieval"}
{"Data Mining"}
5. Scholarly Ontology Ontology Generation
The generated ontology contains scholarly information as a hierarchy of research areas as well as research areas for each document.
Taking advantage of the Semantic Web, such knowledge can be easily shared and reused by other systems for browsing or retrieval.
For example, we can use Protégé-2000 for browsing the scholarly ontology.
5. Scholarly Ontology Part of the generated concept hierarchy of research areas
Fig. 12
We use the keyword that has the highest membership value to label the research area. Nevertheless, users can browse more information about each research area.
5. Scholarly Ontology Performance Evaluation
Performance of the ontology generation is evaluated based on the generated Research Area Hierarchy.
Firstly, we measure the typical recall, precision and F-measure to evaluate the clustering results.
Secondly, we use the relaxation error and the corresponding cluster goodness measure to evaluate the goodness of the conceptual clusters generated. We also show whether the use of fuzzy membership instead of crisp value can help improve cluster goodness.
Finally, we use the Average Uninterpolated Precision (AUP), which is a typical measure for evaluating a hierarchical construct, to evaluate the goodness of the generated concept hierarchy.
5. Scholarly Ontology Performance Evaluation
Keyword attributes are descriptors for the generated clusters. If more keywords are extracted and used, are the cluster descriptors that are constructed more meaningful?
To verify this, we vary the number of keywords N extracted from documents from 2 to 10, and the similarity threshold Ts from 0.2 to 0.9, when performing conceptual clustering.
We have classified the documents downloaded from ISI into classes based on their research themes. These classes are used as a benchmark to evaluate the clustering results in terms of recall, precision and F-measure.
5. Scholarly Ontology Performance Evaluation - Precision
Ts=0.2 Ts=0.3 Ts=0.4 Ts=0.5 Ts=0.6 Ts=0.7 Ts=0.8 Ts=0.9
N=2 0.64 0.64 0.64 0.64 0.63 0.62 0.62 0.62
N=3 0.66 0.66 0.66 0.66 0.64 0.62 0.62 0.62
N=4 0.73 0.77 0.78 0.79 0.74 0.69 0.68 0.68
N=5 0.8 0.84 0.84 0.85 0.81 0.75 0.75 0.75
N=6 0.9 0.9 0.9 0.9 0.86 0.8 0.79 0.8
N=7 0.96 0.94 0.93 0.93 0.9 0.86 0.84 0.84
N=8 0.95 0.94 0.92 0.93 0.9 0.86 0.83 0.83
N=9 0.94 0.93 0.92 0.92 0.89 0.86 0.83 0.83
N=10 0.93 0.92 0.91 0.91 0.89 0.85 0.83 0.83
Table 6. Performance results using precision measurement.
Precision indicates the accuracy of the clustering results. Table 6 shows that when N is small, the precision is poor, which implies “noisy” data in the clusters.
The precision improves as the number of extracted keywords increases. However, this also causes the recall to decrease, as shown in Table 7.
5. Scholarly Ontology Performance Evaluation - Recall
Ts=0.2 Ts=0.3 Ts=0.4 Ts=0.5 Ts=0.6 Ts=0.7 Ts=0.8 Ts=0.9
N=2 0.99 0.99 0.99 0.99 0.99 0.98 0.98 0.98
N=3 0.99 0.99 0.99 0.99 0.98 0.98 0.97 0.97
N=4 0.98 0.98 0.97 0.97 0.94 0.95 0.94 0.94
N=5 0.89 0.87 0.87 0.88 0.87 0.89 0.89 0.89
N=6 0.8 0.81 0.83 0.83 0.83 0.85 0.85 0.85
N=7 0.81 0.8 0.82 0.82 0.83 0.84 0.86 0.86
N=8 0.79 0.79 0.81 0.82 0.82 0.84 0.85 0.85
N=9 0.76 0.77 0.8 0.8 0.81 0.83 0.84 0.84
N=10 0.73 0.75 0.78 0.78 0.79 0.81 0.83 0.83
Table 7. Performance results using recall measurement.
As the number of clusters gradually increases, the efficiency of the clustering results, measured here by recall, gradually decreases.
5. Scholarly Ontology Performance Evaluation - F-measure
Ts=0.2 Ts=0.3 Ts=0.4 Ts=0.5 Ts=0.6 Ts=0.7 Ts=0.8 Ts=0.9
N=2 0.78 0.78 0.78 0.78 0.77 0.76 0.76 0.76
N=3 0.79 0.79 0.79 0.79 0.77 0.76 0.76 0.76
N=4 0.83 0.86 0.86 0.87 0.82 0.79 0.78 0.78
N=5 0.84 0.85 0.85 0.86 0.83 0.81 0.81 0.81
N=6 0.85 0.85 0.86 0.86 0.84 0.82 0.82 0.82
N=7 0.88 0.86 0.87 0.87 0.86 0.85 0.85 0.85
N=8 0.86 0.86 0.86 0.87 0.85 0.85 0.84 0.84
N=9 0.84 0.84 0.86 0.86 0.85 0.84 0.83 0.83
N=10 0.81 0.82 0.84 0.84 0.83 0.83 0.83 0.83
Average 0.83 0.83 0.83 0.84 0.82 0.81 0.8 0.8
Table 8. Performance results using F-measure measurement.
When N is low, the F-measure is quite poor. Nevertheless, the F-measure is stable and good when a sufficient number of keywords are extracted. The results also show that the F-measure tends to have the best performance when Ts = 0.5.
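The entries of Table 8 can be cross-checked against Tables 6 and 7, assuming the usual balanced F-measure 2PR/(P+R). The short sketch below recomputes a few cells from the tabulated precision and recall; the function name f_measure and the choice of cells are illustrative only.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (N, Ts) -> (precision from Table 6, recall from Table 7, F-measure from Table 8)
cells = {
    ("N=2", "Ts=0.2"): (0.64, 0.99, 0.78),
    ("N=4", "Ts=0.5"): (0.79, 0.97, 0.87),
    ("N=7", "Ts=0.2"): (0.96, 0.81, 0.88),
    ("N=10", "Ts=0.9"): (0.83, 0.83, 0.83),
}

for cell, (p, r, tabulated) in cells.items():
    recomputed = round(f_measure(p, r), 2)
    print(cell, recomputed, "matches" if recomputed == tabulated else "differs")
```

For the four cells chosen, the recomputed values agree with Table 8 to two decimal places.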
5. Scholarly Ontology Performance Evaluation – Relaxation Error
Relaxation error implies dissimilarities of items in a cluster based on attributes’ values.
Since conceptual clustering techniques typically use a set of attributes for concept generation, relaxation error is quite commonly used for evaluating the goodness of conceptual clusters.
5. Scholarly Ontology Performance Evaluation – Relaxation Error
The relaxation error RE of a cluster C is defined as
$$RE(C) = \sum_{a \in A} \sum_{i=1}^{n} \sum_{j=1}^{n} P(x_i)\, P(x_j)\, d_a(x_i, x_j)$$
where A is the set of the attributes of items in C, n is the number of items in C, P(xi) is the probability of item xi occurring in C, and da(xi,xj) is the distance between xi and xj on attribute a.
The cluster goodness G of cluster C is defined as G(C) = 1 - RE(C).
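The relaxation error can be computed directly from its definition. The sketch below makes three assumptions that the slides do not state: items are dictionaries of attribute membership values in [0,1], the distance da is the absolute difference of those values (a missing attribute counts as 0), and every item is equally likely, so P(xi) = 1/n unless other probabilities are supplied.

```python
def relaxation_error(items, probabilities=None):
    """RE(C): sum over attributes a and item pairs (i, j) of P(xi) P(xj) d_a(xi, xj).

    items: list of dicts mapping attribute name -> membership value in [0, 1]
    Assumed distance: absolute difference of the two membership values.
    """
    if not items:
        return 0.0
    n = len(items)
    if probabilities is None:
        probabilities = [1.0 / n] * n  # assumed uniform occurrence probability
    attributes = set().union(*(item.keys() for item in items))
    total = 0.0
    for a in attributes:
        for i, xi in enumerate(items):
            for j, xj in enumerate(items):
                distance = abs(xi.get(a, 0.0) - xj.get(a, 0.0))
                total += probabilities[i] * probabilities[j] * distance
    return total


def cluster_goodness(items, probabilities=None):
    """G(C) = 1 - RE(C)."""
    return 1.0 - relaxation_error(items, probabilities)


identical = [{"Data Mining": 0.8}, {"Data Mining": 0.8}]
opposite = [{"Data Mining": 1.0}, {"Data Mining": 0.0}]
print(cluster_goodness(identical))  # 1.0: no dissimilarity inside the cluster
print(cluster_goodness(opposite))   # 0.5: two of the four ordered pairs differ by 1
```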
5. Scholarly Ontology Performance Evaluation – Relaxation Error
Comparison of FFCA and COBWEB while the number of extracted keywords is varied from 2 to 10
[Chart: Fuzzy FCA and COBWEB series for N=2 to N=10; vertical axis from 0 to 1]
We vary the number of keywords extracted to observe the effect of the keywords generated on cluster goodness. In addition, since COBWEB is considered one of the most popular techniques for conceptual clustering, we also apply COBWEB to the citation database to compare the performance. The results show that FFCA achieves better cluster goodness than COBWEB.
5. Scholarly Ontology Performance Evaluation – AUP
Average Uninterpolated Precision (AUP) is defined as the sum of the precision value at each point (or node) in a hierarchical structure where a relevant item appears, divided by the total number of relevant items
Typically, AUP implies the goodness of a concept hierarchical structure.
For evaluating AUP, we have manually classified the downloaded documents into classes based on their research themes.
For each class, we extract the 5 most frequent keywords from the documents in the class. Then, we use these keywords as inputs to form retrieval queries and evaluate the retrieval performance using AUP.
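The AUP definition can be illustrated on a ranked result list. The sketch below assumes that the points of the hierarchical structure have been flattened into an ordered list of relevance flags; this simplification, the function name and the sample ranking are assumptions made for illustration.

```python
def average_uninterpolated_precision(ranked_relevance, total_relevant=None):
    """Sum of precision at each position holding a relevant item, over all relevant items.

    ranked_relevance: list of booleans, True where the item at that rank is relevant
    total_relevant:   number of relevant items overall; defaults to those in the list
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this point
    if total_relevant is None:
        total_relevant = hits
    if total_relevant == 0:
        return 0.0
    return precision_sum / total_relevant


# Hypothetical ranking: relevant items at ranks 1 and 3.
ranking = [True, False, True, False]
print(average_uninterpolated_precision(ranking))  # (1/1 + 2/3) / 2
```

Passing total_relevant explicitly penalizes relevant items that were never retrieved, which matches the phrase "divided by the total number of relevant items".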
5. Scholarly Ontology Performance Evaluation – AUP
There are two ways to generate document keywords. The first is to use the set of keywords, known as attribute keywords, from each conceptual cluster as the document keywords. The second is to use the keywords from each document as the document keywords. Then, we vectorize the document keywords and the input query, and calculate the vectors’ distance for measuring the retrieval performance.
5. Scholarly Ontology Performance Evaluation – AUP
Two methods
1. AUP measured using attribute keywords: Hierarchical Average Uninterpolated Precision (AUP(H)), as each concept inherits attribute keywords from its superconcepts.
2. AUP measured using keywords from documents: Unconnected Average Uninterpolated Precision (AUP(U)).
5. Scholarly Ontology Performance Evaluation – AUP
Fig. 14 shows the results for AUP(H) and AUP(U) using different numbers of extracted keywords N.
[Fig. 14: chart of AUP (H) and AUP (U) for N=2 to N=10; vertical axis from 0 to 0.4]
It shows that as N gets larger, the performance on both AUP(H) and AUP(U) improves. In addition, performance on AUP(H) is generally better than on AUP(U), which suggests that the attribute keywords generated for the conceptual clusters are appropriate.
6. Semantic Helpdesk Application
Introduction
Developed in collaboration with a multinational company, the Semantic Help-Desk Environment comprises the Web Service Requester, Matchmaking Agent and Web Service Provider.
The focus is on the fuzzy ontology generation process that generates Machine Service Ontology from a customer service database.
This approach enables individual machine service knowledge to be shared over the Semantic Web. Thus, machine service knowledge from different machines or models provided by different manufacturers can be shared and integrated. This is important as many customers may have different types of machines and models from different manufacturers.
6. Semantic Helpdesk Application
Introduction - Web Service Requester
A kind of Web Service that enables access to customer support for machine services.
Instances of the Web Service Requester can be created from a Web Requester Server whose address is accessible to all users through the Web.
When encountering a problem, a user can use the Web to connect to the Web Requester Server in order to create an instance of the Web Service Requester.
The created instance runs as a web-based program. That is, it can use the Web to interact with the user and other programs.
6. Semantic Helpdesk Application
Introduction - Web Service Requester
Through the Web, the Web Service Requester instance provides an interface for the user to enter the problem being reported.
Through the interface, the user can specify the encountered fault as a textual string. The user is also required to enter the code of the machine model. The given information is used to form a profile for the Web Service Requester.
The profile is then sent as a request to the Matchmaking Agent to seek a potential Web Service Provider for solving the problem.
6. Semantic Helpdesk Application
Introduction - Web Service Provider
It offers its machine service support as a Web Service extended with ontology capabilities.
There are probably many instances of a Web Service Provider existing concurrently on the Internet.
An instance of the Web Service Provider can be considered as a program that can access the Machine Service Ontology to retrieve machine service knowledge for a given reported problem.
An instance of the Web Service Provider can interact with other programs. That is, it can be called by other programs and return the outputs to the calling programs.
Instances of the Web Service Provider must be registered with a specific agent known as the Matchmaking Agent that serves as a registry and look-up service.
6. Semantic Helpdesk Application
Introduction - Web Service Provider
Each instance of the Web Service Provider also provides a profile file that describes its parameters and capabilities. XML is used in most Web Services to represent the information contained in the profiles.
However, traditional XML lacks the capabilities of representing semantic information.
To overcome this problem, the Web Service Provider uses the ontology-based service description language OWL-S (formerly DAML-S) to describe the information in its profile. Hence, we describe the service as an OWL ontology, and its intentional information can be fully understood by other programs.
6. Semantic Helpdesk Application
Introduction - Matchmaking Agent
When the Matchmaking Agent receives machine service requests from the Web Service Requester, it locates the appropriate Web Services that can fulfill the request
6. Semantic Helpdesk Application
Overview
[Diagram components: two Customers, each with a Client Web Browser; the Internet; the Web Service Requester; the Matchmaking Agent; and two Manufacturers, each with a Web Service Provider, Machine Service Ontologies and Customer Service Databases.]
6. Semantic Helpdesk Application
Customer Service Database
The customer service database contains 9000 service records. Each record consists of fault-condition and checkpoint information.
Fault-condition contains the service engineer’s description of the machine fault. Checkpoint information indicates the suggested actions to be carried out to repair the machine, based on the fault-condition reported by the customer.
6. Semantic Helpdesk Application
Customer Service Database
Fault-condition: 3008 PCB CARRY MISS ERROR. PCB WAS NOT TRANSFERRED BY THE CARRIER DURING LOADING BUT STAYED AT THE DETECTION POSITION OF PCB DETECTION SENSOR 2.
Checkpoint group: AVF_CHK003
Priority  Checkpoint description                                                  Help file
1         CONFIRM WHETHER THE CARRY GUIDE PINS ARE IN LINE WITH PCB.              AVF_CHK007-1.GIF
2         CONFIRM WHETHER THE PCB IS IN CORRECT DIRECTION.                        AVF_CHK007-2.GIF
3         CONFIRM THE POSITION OF THE GUIDE LOWER LIMIT SENSOR. (I/O 0165)        AVF_CHK007-3.GIF
4         CONFIRM THE TIMING FOR PCB 2 DETECT SENSOR.                             AVF_CHK007-4.GIF
6. Semantic Helpdesk Application
Machine Service Ontology Generation
Apply FOGF to obtain Fuzzy Fault Concept Lattice → Fault Concept Hierarchy → Machine Service Ontology
Part of the Fault Concept Hierarchy of the machine model AV_2011 (node labels):
Any fault
{“Anvil”} {“Drive”} {“Cutter”} {“Component”}
{“Anvil”, “Joint”, “CannotEngage”}
{“Anvil”, “Shaky”, “Unit”}
{“Anvil”, “Drive”, “CannotOpen”, “Pitch”}
{“Cutter”, “Drive”, “CannotOpen”, “Axis”}
{“Cutter”, “Component”, “Cut”, “Insertion”}
{“Component”, “Float”, “PCB”}
6. Semantic Helpdesk Application
Machine Service Ontology Generation
The generation process creates classes, relations and instances for the service ontology.
The machine fault service knowledge stored in the Customer Service Database is known as non-taxonomy knowledge, whereas the machine fault hierarchy knowledge from the Fault Concept Hierarchy is called taxonomy knowledge. These two types of knowledge are combined to form the Machine Service Ontology.
6. Semantic Helpdesk Application
Machine Service Ontology in OWL
<rdf:RDF
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:xsd="http://www.w3.org/2000/10/XMLSchema#"
    xmlns:daml="http://www.w3.org/2001/10/daml+oil#">
  <owl:Ontology rdf:about="">
    <owl:versionInfo>v 1.0 2004-12-07 19:06:40</owl:versionInfo>
    <rdfs:label>Machine Service Ontology</rdfs:label>
  </owl:Ontology>
  <owl:Class rdf:ID="Machine"/>
  <owl:Class rdf:ID="Check_point"/>
  <owl:Class rdf:ID="Machine_Fault_Cluster"/>
  …
  <owl:Class rdf:ID="Machine_Fault_Cluster_1">
    <rdfs:label>Anvil</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Machine_Fault_Cluster"/>
    <owl:ObjectProperty rdf:ID="Anvil">
      <rdfs:range rdf:resource="http://www.w3.org/2000/10/XMLSchema#float"/>
    </owl:ObjectProperty>
  </owl:Class>
  <owl:Class rdf:ID="Machine_Fault_Cluster_2">
    <rdfs:label>Cutter</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Machine_Fault_Cluster"/>
    <owl:ObjectProperty rdf:ID="Cutter">
      <rdfs:range rdf:resource="http://www.w3.org/2000/10/XMLSchema#float"/>
    </owl:ObjectProperty>
  </owl:Class>
  <owl:Class rdf:ID="Machine_Fault_Cluster_3">
    <rdfs:label>Anvil_Cutter</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Machine_Fault_Cluster_1"/>
    <rdfs:subClassOf rdf:resource="#Machine_Fault_Cluster_2"/>
  </owl:Class>
  …
  <owl:Class rdf:ID="Machine_Fault">
    <owl:ObjectProperty rdf:ID="occur_on">
      <rdfs:domain rdf:resource="#Machine"/>
    </owl:ObjectProperty>
    <owl:ObjectProperty rdf:ID="inspect_to">
      <rdfs:domain rdf:resource="#Check_point"/>
    </owl:ObjectProperty>
    <owl:ObjectProperty rdf:ID="belong_to">
      <rdfs:domain rdf:resource="#Machine_Fault_Cluster"/>
    </owl:ObjectProperty>
  </owl:Class>
</rdf:RDF>
6. Semantic Helpdesk Application Experiments
Data stored in the database was divided into 10 subsets. Each subset was used in turn as a testing set while the others were used for generating the conceptual clusters.
Keywords in fault conditions in each testing set were extracted and fuzzified as testing fuzzy queries.
To verify whether fuzzy queries can improve the retrieval performance, the extracted keywords are also used without membership values, as crisp queries, for comparison.
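The experimental protocol corresponds to a 10-fold split of the service records. The sketch below assumes a round-robin assignment of records to subsets, since the slides do not say how the subsets were formed; the function name k_fold_splits and the integer stand-ins for records are illustrative.

```python
def k_fold_splits(records, k=10):
    """Yield (training records, testing records) once per subset.

    Assumption: records are assigned to the k subsets in round-robin order.
    """
    folds = [records[i::k] for i in range(k)]
    for test_index in range(k):
        testing = folds[test_index]
        training = [record
                    for index, fold in enumerate(folds) if index != test_index
                    for record in fold]
        yield training, testing


# Stand-in for the 9000 service records of the customer service database.
records = list(range(9000))
for training, testing in k_fold_splits(records, k=10):
    # Clusters would be generated from `training`; queries would come from `testing`.
    print(len(training), len(testing))
```

With 9000 records and 10 subsets, each round uses 8100 records for generating the clusters and 900 for testing.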
6. Semantic Helpdesk Application Experiments
Manually classified faults in each machine model into groups based on the machine components in which the fault occurred.
Retrieval accuracy is evaluated based on the number of the retrieved faults that are in the same classified group as the query.
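6. Semantic Helpdesk Application Performance Measures
Recall, Precision and F-measure
$$\text{recall} = \frac{\text{number of fault conditions retrieved and correct}}{\text{total number of fault conditions correct}}$$
$$\text{precision} = \frac{\text{number of fault conditions retrieved and correct}}{\text{total number of fault conditions retrieved}}$$
$$\text{F-measure} = \frac{2 \times \text{recall} \times \text{precision}}{\text{recall} + \text{precision}}$$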
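For a single query, the three measures can be computed from the set of retrieved fault conditions and the set of correct ones, where "correct" means belonging to the same manually classified group as the query. The function name retrieval_scores and the fault identifiers below are hypothetical.

```python
def retrieval_scores(retrieved, correct):
    """Return (recall, precision, F-measure) for one query.

    retrieved: identifiers of the fault conditions returned for the query
    correct:   identifiers of the fault conditions in the query's group
    """
    retrieved, correct = set(retrieved), set(correct)
    both = len(retrieved & correct)  # retrieved and correct
    recall = both / len(correct) if correct else 0.0
    precision = both / len(retrieved) if retrieved else 0.0
    if recall + precision == 0:
        return recall, precision, 0.0
    return recall, precision, 2 * recall * precision / (recall + precision)


# Hypothetical query: 4 faults retrieved, 3 faults in the correct group, 2 in common.
recall, precision, f = retrieval_scores(
    retrieved={"f1", "f2", "f3", "f4"},
    correct={"f1", "f2", "f5"},
)
print(recall, precision, f)  # 2/3, 1/2 and 4/7
```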
6. Semantic Helpdesk Application
Retrieval Performance
[Three charts: Recall, Precision and F-measure (vertical axes, 0 to 1) plotted against Confidence Threshold (horizontal axes, 0 to 0.9), each comparing Crisp Query and Fuzzy Query]
6. Semantic Helpdesk Application Performance Comparison
Retrieval accuracy compared with four other techniques:
Two variations of the k-nearest neighbor (kNN) technique. The first variation (kNN1) performs the retrieval based on the normalized Euclidean distance between vectors. The second (kNN2) makes use of the fuzzy-trigram technique to do so.
Two kinds of artificial neural networks (ANN): the supervised learning vector quantization (LVQ3) neural network and the unsupervised Self-Organizing Maps (SOM).
6. Semantic Helpdesk Application Performance Comparison
Retrieval Technique        Retrieval Accuracy
kNN1                       81.4%
kNN2                       77.6%
LVQ3                       93.2%
SOM                        90.3%
FFCA with Crisp Query      84.6%
FFCA with Fuzzy Query      93.0%
(Confidence Threshold = 0.2)
• FFCA with fuzzy query outperformed kNN.
• LVQ3 performed marginally better, but requires prior expert knowledge for training, which would be a problem when dealing with large amounts of uncertainty information.
• The proposed technique can generate a concept hierarchy from the clusters, which is important information for generating a corresponding meaningful ontology.
7. Summary
Proposed a framework for fuzzy ontology generation with uncertainty information
FOGF consists of the following steps:
Fuzzy Formal Concept Analysis
Fuzzy Conceptual Clustering
Fuzzy Ontology Generation
Semantic Representation Conversion
7. Summary
FOGF can represent uncertainty information and construct a concept hierarchy from the uncertainty information
Apart from constructing a scholarly ontology from a citation database, FOGF has also been used to generate a Machine Service Ontology for the Semantic Help-desk and a Reuters News Topic Themes Ontology.
Also, the scholarly ontology has been partially used to construct a Scholarly Semantic Web, a Semantic Web-based information retrieval system to support scholarly activities in the Semantic Web environment
References (Not intended to be exhaustive)
Ontology Editors
[1] http://protege.stanford.edu/
[2] S. Bechhofer, I. Horrocks, P. Patel-Schneider, and S. Tessaris, “A proposal for a description logic interface,” in Proceedings of the International Workshop on Description Logics, pp. 33-36, 1999.
Large corpora
[3] E. Morin, “Automatic acquisition of semantic relations between terms from technical corpora,” in Proceedings of the Fifth International Congress on Terminology and Knowledge Engineering (TKE-99), (Vienna, Austria), 1999.
[4] M. Hearst, “Automatic acquisition of hyponyms from large text corpora,” in Proceedings of the Fourteenth International Conference on Computational Linguistic, (France), 1992.
Knowledge base of rules
[5] P. Compton and A. Jansen, Knowledge Acquisition, ch. A Philosophical Basis for Knowledge Acquisition, pp. 241-257.
Statistical approaches
[6] H. Suryanto and P. Compton, “Discovery of ontologies from knowledge bases,” in Proceedings of The 5th International Conference on Knowledge Capture (Y. Gil, M. Musen, and J. Shavlik, eds.), (Victoria, Canada), pp. 171-178, 2001.
Semi-structured schemata based on Graphs
[7] A. Deitel, C. Faron, and R. Dieng, “Learning ontologies from RDF annotations,” in Proceedings of the IJCAI Workshop in Ontology Learning, (Seattle, USA), 2001.
Semi-structured schemata based on Statistics
[8] C. Papatheodorou, A. Vassiliou, and B. Simon, “Discovery of ontologies for learning resources using word-based clustering,” in Proceedings of ED-MEDIA 2002, (Denver, USA), 2002.
LSD
[9] A. Doan, P. Domingos, and A. Levy, “Learning source descriptions for data integration,” in Proceedings of the Third International Workshop on the Web and Databases, pp. 81-86, 2000.
Database schema
[10] P. Johannesson, “A method for transforming relational schemas into conceptual schemas,” in Proceedings of the 10th International Conference on Data Engineering (M. Rusinkiewicz, ed.), (Houston, USA), pp. 115-122, IEEE Press, 1994.
[11] D. Rubin, M. Hewett, D. Oliver, T. Klein, and R. Altman, “Automatic data acquisition into ontologies from pharmacogenetics relational data sources using declarative object definitions and XML,” in Proceedings of the Pacific Symposium on Biology (R.B. Altman, A. Dunker, L. Hunter, K. Lauderdale, and T. Klein, eds.), (Lihue, HI), 2002.
NLP
[12] D. Lonsdale, Y. Ding, D. Embley, and A. Melby, “Peppering knowledge sources with SALT; boosting conceptual content for ontology generation,” in Proceedings of the AAAI Workshop on Semantic Web Meets Language Resources, 2002.
[13] D. I. Moldovan and R. C. Girju, “An interactive tool for the rapid development of knowledge bases,” International Journal on Artificial Intelligence Tools (IJAIT), vol. 10, no. 1-2, 2001.
Wordnet
[14] http://wordnet.princeton.edu/wordnet/download/
Text-to-Onto
[15] A. Maedche and S. Staab, “Ontology learning for the Semantic Web,” IEEE Intelligent Systems, Special Issue on the Semantic Web, vol. 16, no. 2, 2001.
Keyword frequencies
[16] A. Faatz and R. Steinmetz, “Ontology enrichment with texts from the WWW,” in Proceedings of Semantic Web Mining 2nd Workshop at ECML/PKDD-2002, (Helsinki, Finland), 2002.
[17] R. Navigli, P. Velardi, and A. Gangemi, “Ontology learning and its application to automated terminology translation,” IEEE Intelligent Systems, vol. 18, no. 1, 2003.
Clustering / COBWEB
[18] P. Clerkin, P. Cunningham, and C. Hayes, “Ontology discovery for the Semantic Web using hierarchical clustering,” in Proceedings of Workshop at ECML/PKDD-2001, (Germany), 2001.
Mo'K
[19] G. Bisson and C. Nedellec, “Designing clustering methods for ontology building: The Mo'K workbench,” in Proceedings of the Workshop on Ontology Learning, 14th European Conference on Artificial Intelligence, ECAI'00 (S. Staab, A. Maedche, C. Nedellec, and P. Wiemer-Hastings, eds.), (Germany), 2000.
ESKIMO
[20] S. Kampa, T. Miles-Board, and L. Carr, “Hypertext in the Semantic Web,” The ACM Conference on Hypertext and Hypermedia, pp. 237-238, 2001.
Scholarly Ontology Project
[21] V. Uren, S. Shum, C. Mancini, and G. Li, “Modelling naturalistic argumentation in research literatures,” in Proceedings of the 4th Workshop on Computational Models of Natural Argument, (Valencia, Spain), 2004.
OAI
[22] http://www.openarchives.org/
FCA
[23] B. Ganter and R. Wille, Formal Concept Analysis: Mathematical Foundations.