
Ontology Generation and Applications

Dr. A.C.M. Fong, CEng

Professor of Computer Engineering

School of Computing and Mathematical Sciences, Faculty of Design and Creative Technologies

Auckland University of Technology

[email protected]


Contents

1. Introduction – Semantic Web and Ontology
2. Related Work – Ontology Generation
3. Toward Automated Ontology Generation
4. Fuzzy Ontology Generation Framework
5. Application 1 – Scholarly Info
6. Application 2 – Service Helpdesk


1. Introduction – Semantic Web

The Semantic Web rests on its ability to represent real-life domains accurately, so that programs can fully understand the environment in which they operate.

In summary, the Semantic Web provides the following benefits:

The Semantic Web (SWeb) offers an expressive metadata model to represent data, so that data can be managed effectively.

Programs can understand the semantic concepts described in the metadata used on the Semantic Web. Hence, knowledge carried on the Semantic Web can be shared and reused among different programs.

Users can interact with programs using a semantic query language to specify their requests, thereby improving retrieval performance.

The deductive mechanisms used to derive new information from existing information can be described clearly, so that knowledge can be reasoned with efficiently.


1. Introduction – Semantic Web Architecture


1. Introduction – Semantic Web Architecture: Layers

Foundation Layer. The Semantic Web uses Uniform Resource Identifiers (URIs) to identify resources and Unicode to encode documents.

Schema Layer. This layer comprises the XML + NS (Namespace) + XML Schema layer and the RDF + RDF Schema layer.

This layer defines objects and classes, their relations and constraints. XML Schema (XMLS) and RDF Schema (RDFS), which are based on XML and RDF respectively, are used for these sublayers.

RDFS has been widely used to describe classes at the Schema Layer.



Ontology Layer. This layer provides constructs for using meta-information to represent domain knowledge.

In this layer, information is represented as ontologies, which the Semantic Web adopts to define knowledge.

Logic Layer. This layer infers new knowledge from existing knowledge. It can be integrated with the Ontology Layer.

In this layer, the concepts and relationships defined in the lower layers are converted into Turing-complete logic languages in order to generate new knowledge.
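To make the Logic Layer concrete, here is a minimal Python sketch that derives new statements by forward chaining. It assumes a toy triple store held in a Python set and implements only one RDFS-style rule, the transitivity of subClassOf; the class name Resource is hypothetical, and real Semantic Web reasoners apply many more rules.

```python
# Minimal sketch (assumption): a toy triple store as a set of
# (subject, predicate, object) tuples and one inference rule,
# (A subClassOf B) and (B subClassOf C)  =>  (A subClassOf C).

Triple = tuple[str, str, str]

def infer_subclass_closure(triples: set[Triple]) -> set[Triple]:
    """Apply the transitivity rule repeatedly until no new triple appears."""
    inferred = set(triples)
    while True:
        sub = [(s, o) for (s, p, o) in inferred if p == "subClassOf"]
        new = {
            (a, "subClassOf", c)
            for (a, b) in sub
            for (b2, c) in sub
            if b == b2 and (a, "subClassOf", c) not in inferred
        }
        if not new:  # fixpoint reached: nothing more can be derived
            return inferred
        inferred |= new

# Hypothetical facts; "Resource" is an invented top class.
facts = {
    ("Journal", "subClassOf", "PublicationMedium"),
    ("PublicationMedium", "subClassOf", "Resource"),
}
closure = infer_subclass_closure(facts)
print(sorted(closure - facts))  # [('Journal', 'subClassOf', 'Resource')]
```

The derived triple was never asserted; it is exactly the kind of "new knowledge" the Logic Layer is meant to produce.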



Proof Layer. This layer provides a mechanism to check whether a statement is true or not.

Trust Layer. This layer provides a mechanism that resolves conflicts between pieces of knowledge carried by the Semantic Web, to form the "Web of Trust".

Digital Signature Layer. This layer uses public key cryptography to secure documents.


1. Introduction – Ontology: Definition

Ontology has been defined in different ways. A commonly cited definition describes an ontology as a formal, explicit specification of a shared conceptualization.

Conceptualization refers to an abstract model of phenomena in the world, built by identifying the relevant concepts of those phenomena.

Explicit means that the types of concepts used, and the constraints on their use, are explicitly defined.

Formal means that the ontology should be machine-readable. Shared means that it should capture consensual knowledge accepted by the community.


1. Introduction – Ontology Research

Ontology is regarded as a standard conceptual model for knowledge representation, especially on Semantic Web.

The term ontology engineering has been proposed to denote ontology-related research in computer science.

Current research issues in ontology engineering include ontology generation, ontology mapping, ontology integration and ontology versioning.

This presentation focuses on ontology generation.


1. Introduction – Ontology Description Languages

Ontology is described using an ontology description language.

Ontology description languages are based on Web metadata description languages, which can be classified into the following three groups: HTML-based, XML-based and RDF-based.


1. Introduction – HTML-based Ontology Description Languages

The tags supported by traditional Web are sufficient to represent some semantic knowledge.

Simple HTML Extension (SHOE) and Ontobroker have embedded additional tags into HTML to represent knowledge.

However, HTML does not support self-defined tags. Therefore, it is difficult to define ontology classes with an HTML-based approach.

Hence, XML-based ontology description languages have been proposed to overcome this limitation.


1. Introduction – XML-based Ontology Description Languages

These languages are usually based on XML Schema (XMLS) or Document Type Definition (DTD).

DTD allows users to define new markup types to describe information. Therefore, users can define ontology classes using DTD.

Moreover, XMLS supports the definition of relations between classes.

Thus, XMLS and DTD can be used directly to embed semantic information.

However, since XML provides only syntactic support for knowledge representation, XML-based ontology description languages face the following problems when representing knowledge:


1. Introduction – XML-based Ontology Description Languages

XML lacks a mechanism to define relationships that are central in ontologies, such as is-a or element-of relationships.

XML does not support any notion of inheritance, which is an important attribute of ontologies.

In XML, concepts are defined through tags, which can be either a string or a combination of other nested tags. Such a mechanism may not be sufficient for defining ontology concepts, which may require richer data structures.

In XML, the order in which tags appear in a document must be defined in advance. In contrast, the order of attribute descriptions does not matter in an ontology.


1. Introduction – RDF-based Ontology Description Languages

RDF extends XML to become a standard for knowledge representation.

In addition, RDF Schema (RDFS) can be used to define classes and class hierarchies in a domain.

The standardization supported by RDF provides two important contributions:

A standard set of modeling primitives (e.g. class, instance, etc.) and their relationships (e.g. subclass) are provided.

A standardized syntax for writing ontologies is supported.

Popular RDF-based ontology description languages include the DARPA Agent Markup Language (DAML), the Ontology Inference Layer (OIL), DAML+OIL and the Web Ontology Language (OWL).


1. Introduction – DARPA Agent Markup Language

DAML, or DAML-ONT, extends RDFS to represent ontologies using an object-oriented approach.

It embeds some object-oriented concepts to represent classes. Thus, the class representation of DAML-ONT is richer than that of RDF.

The following DAML-ONT example represents the class "Journal", which is a subclass of the class "Publication Medium" but is disjoint from the classes "Conference" and "Workshop" (i.e. an object that belongs to class "Journal" cannot belong to class "Conference" or "Workshop"):

<Class ID="Journal">
  <subClassOf resource="#Publication Medium"/>
  <disjointFrom resource="#Conference"/>
  <disjointFrom resource="#Workshop"/>
</Class>
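The disjointness axioms can be read operationally: an individual typed as both Journal and Conference (or Workshop) would be inconsistent. The Python sketch below encodes the example's axioms in plain dictionaries, a hypothetical representation rather than any DAML API, and follows only direct superclasses.

```python
# Hypothetical in-memory encoding of the DAML-ONT axioms for "Journal".
SUBCLASS_OF = {"Journal": {"Publication Medium"}}
DISJOINT = [
    frozenset({"Journal", "Conference"}),
    frozenset({"Journal", "Workshop"}),
]

def with_superclasses(types: set[str]) -> set[str]:
    """Add direct superclasses, so a Journal individual is also a Publication Medium."""
    expanded = set(types)
    for t in types:
        expanded |= SUBCLASS_OF.get(t, set())
    return expanded

def disjointness_violations(types: set[str]) -> list[tuple[str, ...]]:
    """Return each disjoint pair that an individual with these types would violate."""
    expanded = with_superclasses(types)
    return sorted(tuple(sorted(pair)) for pair in DISJOINT if pair <= expanded)

print(sorted(with_superclasses({"Journal"})))            # ['Journal', 'Publication Medium']
print(disjointness_violations({"Journal"}))              # []
print(disjointness_violations({"Journal", "Workshop"}))  # [('Journal', 'Workshop')]
```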


1. Introduction – Ontology Inference Layer

OIL extends RDFS to represent ontologies. It is designed around three criteria:

Frame-based. It supports frames to define classes and the properties of classes. Thus, class contents can be described more informatively (e.g. constraints can be placed on class properties).

Description Logic. It describes knowledge in a formal logic. Thus, knowledge is represented mathematically and can be processed by programs.

Web standards. It is based on XML and RDFS.


1. Introduction – Ontology Inference Layer

<rdfs:Class rdf:ID="animal"/>
<rdfs:Class rdf:ID="plant">
  <rdfs:subClassOf>
    <oil:NOT>
      <oil:hasOperand rdf:resource="#animal"/>
    </oil:NOT>
  </rdfs:subClassOf>
</rdfs:Class>
<rdfs:Class rdf:ID="tree">
  <rdfs:subClassOf rdf:resource="#plant"/>
</rdfs:Class>

Class "animal" is defined first, followed by class "plant", which is defined as a subclass of NOT "animal", i.e. of the complement of "animal". Hence, objects that belong to class "animal" cannot belong to class "plant", and vice versa.

Finally, class "tree" is defined as a subclass of "plant".


1. Introduction – DAML vs. OIL

Compared with DAML, OIL can represent class properties better, but DAML can represent class relationships more clearly.

Hence, the two can be combined to form a better ontology description language:

DAML+OIL. It defines class relationships based on DAML, and class properties in a similar way to OIL. Hence, DAML+OIL combines the advantages of both DAML and OIL.


1. Introduction – Web Ontology Language

OWL is extended from DAML+OIL to allow users to define various types of relationships between classes.

Properties can also be defined using additional constructs in OWL.

OWL has three sublanguages: OWL Lite, OWL DL and OWL Full.

Although these sublanguages share the same OWL syntax, they differ slightly in design, being aimed at different communities of implementers and users:

OWL Lite primarily supports a classification hierarchy and simple constraints when designing classes.

OWL DL includes all OWL language constructs, but they can be used only under certain restrictions (e.g. a class cannot be an instance of another class).

OWL Full allows all OWL language constructs to be used without any restriction.



<rdf:RDF
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:xsd="http://www.w3.org/2000/10/XMLSchema#"
    xmlns:daml="http://www.w3.org/2001/10/daml+oil#">
  <owl:Ontology rdf:about="Scholarly Information">
    <owl:versionInfo>v 1.0 2009-12-07 19:06:40</owl:versionInfo>
  </owl:Ontology>
  <owl:Class rdf:ID="Concept1">
    <rdfs:label>Data Mining</rdfs:label>
  </owl:Class>
  <owl:Class rdf:ID="Concept2">
    <rdfs:label>Fuzzy Logic</rdfs:label>
  </owl:Class>
  <owl:Class rdf:ID="Concept3">
    <rdfs:label>Data Mining, Fuzzy Logic</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Concept1"/>
    <rdfs:subClassOf rdf:resource="#Concept2"/>
  </owl:Class>
</rdf:RDF>

Header info: the namespace declarations.

Ontology name and version: the owl:Ontology element.

Three classes: Concept1 (labelled "Data Mining"), Concept2 (labelled "Fuzzy Logic") and Concept3. Concept3 is a subclass of both Concept1 and Concept2.


2. Related Work – Ontology Generation

Ontology uses classes, which contain attributes, to represent concepts.

Ontology also supports taxonomy and non-taxonomy relations between classes.

Although editing tools such as Protege [1] and OilEd [2] have been developed to help users create and edit ontologies, manually deriving an ontology from data is still a tedious task.


2. Related Work – Ontology Generation: Approaches

Ontology can be generated from various types of data, mostly textual.

Large corpora [3,4] are considered good sources for mining knowledge to construct ontologies, since the information in a corpus is usually well annotated and can therefore be easily processed by programs.

Ontology can also be generated from a knowledge base of rules [5], which is represented as a tree with rules residing at tree nodes. Statistical approaches have been used to estimate the existence of relationships between entities involved in rules [6].


2. Related Work – Ontology Generation: Approaches

When knowledge is represented in semi-structured schemata such as XML and RDF, its contents can easily be parsed by programs; techniques have been proposed to generate ontology from semi-structured schemata based on Graph Theory [7] and statistical approaches [8].

Learning Source Description (LSD) [9] has been proposed to generate ontologies from arbitrary formalisms of semi-structured schemata.

The Entity-Relationship model used in database schemas has also been adopted as an information source for generating ontologies [10,11].


2. Related Work – Ontology Generation: Textual Data

For textual data, ontology concepts can be extracted efficiently using Natural Language Processing (NLP) techniques [12,13].

NLP is used to preprocess the textual data in order to extract significant keywords.

WordNet [14] can be used to improve accuracy of ontology generated by NLP-based techniques.

However, the NLP techniques have difficulty in finding semantic relationships among the keywords.

Data mining techniques can be combined with NLP to improve the efficiency of ontology generation. In Text-to-Onto [15], association rules are used to find associative relations between keywords, which are then used to construct non-taxonomy relations for the ontology.
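As a rough illustration of how association rules surface associative keyword relations, the following sketch computes support and confidence for keyword pairs. The keyword sets and thresholds are hypothetical, and this is the generic support/confidence formulation restricted to pairs, not Text-to-Onto's implementation.

```python
from itertools import combinations

# Hypothetical keyword sets, e.g. produced by NLP preprocessing of four documents.
DOCS = [
    {"ontology", "semantic web", "owl"},
    {"ontology", "owl", "reasoning"},
    {"ontology", "semantic web"},
    {"clustering", "fuzzy logic"},
]

def support(docs, items):
    """Fraction of documents containing every keyword in items."""
    return sum(1 for d in docs if items <= d) / len(docs)

def keyword_rules(docs, min_support=0.5, min_confidence=0.6):
    """Return (x, y, support, confidence) for rules x -> y between keyword pairs."""
    vocab = sorted(set().union(*docs))
    rules = []
    for a, b in combinations(vocab, 2):
        pair_support = support(docs, {a, b})
        if pair_support < min_support:
            continue  # the pair co-occurs too rarely to be interesting
        for x, y in ((a, b), (b, a)):
            confidence = pair_support / support(docs, {x})
            if confidence >= min_confidence:
                rules.append((x, y, pair_support, round(confidence, 2)))
    return rules

for rule in keyword_rules(DOCS):
    print(rule)
# ('ontology', 'owl', 0.5, 0.67)
# ('owl', 'ontology', 0.5, 1.0)
# ('ontology', 'semantic web', 0.5, 0.67)
# ('semantic web', 'ontology', 0.5, 1.0)
```

Each surviving rule, such as ontology → owl, is only a candidate non-taxonomy relation between two keywords; it would still need a meaningful label before entering an ontology.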


2. Related Work – Ontology Generation: Textual Data

Keywords' frequencies are often used in statistical approaches [16,17] to identify significant keywords that can be used to represent a certain concept.

Clustering techniques have also been applied to generate ontology from textual data [18].

Using significant keywords extracted from textual data, clustering techniques can cluster documents and interpret topics from the generated clusters.
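Keyword frequencies are usually weighted before significant keywords are chosen. TF-IDF is one common frequency-based weighting and is used here only as an assumed example, since the statistical approaches in [16,17] are not detailed; the documents are hypothetical token lists.

```python
import math
from collections import Counter

# Hypothetical tokenized documents.
DOCS = [
    ["fuzzy", "ontology", "fuzzy", "concept"],
    ["ontology", "owl", "semantic"],
    ["clustering", "concept", "lattice", "concept"],
]

def tf_idf(docs):
    """One {term: weight} dict per document: term frequency times log(N / document frequency)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return scores

def top_keywords(docs, k=1):
    """The k highest-weighted terms per document (ties broken alphabetically)."""
    return [sorted(s, key=lambda t: (-s[t], t))[:k] for s in tf_idf(docs)]

print(top_keywords(DOCS))  # [['fuzzy'], ['owl'], ['clustering']]
```

Terms that occur in every document get weight 0, which is why a widespread word such as "ontology" rarely becomes a significant keyword under this weighting.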


2. Related Work – Ontology Generation: Clustering

Clustering can be used to mine hidden knowledge from data to construct an ontology. It can also be used to enrich existing ontology.

Traditional clustering techniques are useful for generating non-taxonomy relations for ontology.

In particular, conceptual clustering techniques are powerful clustering techniques that can conceptualize clusters and construct a concept hierarchy of clusters useful for generating taxonomy relations for ontology.

For example, an approach based on COBWEB [18] can generate taxonomy relations among the concepts of a domain for ontology generation.

Mo'K [19] is a system that can obtain taxonomy relations from tagged text using conceptual clustering.


2. Related Work – Ontology Applications: Scholarly Info

In E-Scholar Knowledge Inference MOdel (ESKIMO) [20], knowledge on scholarly publications is represented as a simple ontology, known as OntoPortal, which is manually developed and maintained.

OntoPortal describes and provides links to other external research pages on the Web. Hypertext links between the web pages are also described in the OntoPortal ontology.

ESKIMO allows users to retrieve scholarly information from the constructed ontology by using queries represented as Prolog-like rules.



In the Scholarly Ontology Project [21], a digital library Web server is constructed using Semantic Web technologies in order to support scholarly retrieval.

The server is developed using a collaborative approach in which researchers submit their documents in a specifically structured format.

As such, the contents of the submitted documents can be further processed in the system and converted into scholarly ontology accordingly.



In the Research in Semantic Scholarly Publishing (RSSP) project, scientific publications are collected from online archives such as the Open Archive Initiative (OAI) [22].

Information of the documents (e.g. their authors, titles, citations, publishers, etc.) is extracted, indexed and converted into ontology formalism.

DAML+OIL is used to annotate the ontology as Semantic Web pages to support scholarly retrieval.


2. Related Work – Summary

Many techniques have been proposed to construct ontologies from various data types and sources, mainly textual data.

Traditionally, NLP techniques are used to analyze textual data.

Recently, data mining techniques have been incorporated into NLP to further discover hidden knowledge from textual data.

Conceptual clustering is an advanced data mining technique that can organize data in a hierarchical conceptual structure.

Thus, conceptual clustering is a useful technique to discover knowledge for generating ontology from textual data.


3. Toward Automated Ontology Generation – Basics

Our initial focus is on scholarly information. Scholarly ontology is generated directly from explicit information on scientific publications (e.g. their titles, authors, citations, etc.).

More advanced scholarly knowledge, such as research experts and research areas, is usually inferred manually by human experts.


3. Toward Automated Ontology Generation – Basics

To construct scholarly ontology from citation database, we use data mining techniques to discover hidden knowledge in the database.

These data mining techniques include Context-based Cluster Analysis (CCA) and Fuzzy Concept Hierarchy Generation (FCHG).

The discovered knowledge is then converted and integrated into the ontology formalism.

As such, apart from the explicit information available on scientific publications, the Scholarly Ontology can also support other useful scholarly retrieval functions, such as finding research experts and detecting research trends.


3. Toward Automated Ontology Generation – Context-based Cluster Analysis

CCA is based on the Formal Concept Analysis (FCA) [23] technique.

FCA provides a formal model, known as a formal context, to represent the relations between objects and attributes in a data set.

We use formal contexts to represent the data of multiple clustering results.

Then, the relations between the formal contexts are analyzed to find the relations between the corresponding clustering results.
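The formal context and its two derivation operators can be sketched in a few lines of Python. This version is crisp (no memberships) and takes its data from the cross pattern of the α-cut context in Table 2 later in this section. Enumerating every attribute subset is exponential and only suits toy contexts; practical FCA tools use algorithms such as Ganter's NextClosure.

```python
from itertools import combinations

# Crisp formal context: the cross pattern of Table 2 with membership values dropped.
CONTEXT = {
    "D1": {"Data Mining", "Fuzzy Logic"},
    "D2": {"Data Mining", "Clustering"},
    "D3": {"Fuzzy Logic"},
}
ATTRIBUTES = frozenset({"Data Mining", "Clustering", "Fuzzy Logic"})

def intent_of(objects):
    """A': attributes shared by every object in A (all attributes when A is empty)."""
    shared = set(ATTRIBUTES)
    for g in objects:
        shared &= CONTEXT[g]
    return frozenset(shared)

def extent_of(attributes):
    """B': objects that have every attribute in B."""
    return frozenset(g for g, attrs in CONTEXT.items() if attributes <= attrs)

def formal_concepts():
    """Close every attribute subset B into the formal concept (B', B'')."""
    concepts = set()
    for r in range(len(ATTRIBUTES) + 1):
        for b in combinations(sorted(ATTRIBUTES), r):
            extent = extent_of(set(b))
            concepts.add((extent, intent_of(extent)))
    return concepts

for extent, intent in sorted(formal_concepts(), key=lambda c: (len(c[1]), sorted(c[1]))):
    print(sorted(extent), sorted(intent))
```

The six concepts printed correspond to the traditional concept lattice of the running example, from ({D1, D2, D3}, {}) down to ({}, all three attributes).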


3. Toward Automated Ontology Generation – Fuzzy Concept Hierarchy Generation

A concept hierarchy is a data structure useful for knowledge representation.

It is widely used in data mining applications. A concept hierarchy may need to be large to reflect the knowledge in a domain precisely, so its manual construction can be difficult and tedious. Hence, conceptual clustering is needed.



Many conceptual clustering techniques organize knowledge as a concept hierarchy. However, a concept hierarchy may not be sufficient for representing information in a real domain.

FCA, which is a data exploratory technique, supports the concept lattice, which provides a more informative conceptual model for representing knowledge.

FCA-based conceptual clustering techniques are potentially useful for constructing taxonomy knowledge of ontology.

However, the typical FCA-based conceptual clustering techniques do not support uncertainty information.



Traditional FCA-based conceptual clustering approaches cannot represent vague information, so fuzziness needs to be incorporated.

An L-fuzzy context uses linguistic variables to represent uncertainty in the context.

However, it needs human interpretation to define the linguistic variables.

Moreover, the fuzzy concept lattice generated from an L-fuzzy context usually suffers from a combinatorial explosion of concepts compared with the traditional concept lattice.



We combine fuzzy logic and FCA as Fuzzy Formal Concept Analysis (FFCA).

In FFCA, uncertainty information is directly represented by a real-valued membership in the range [0,1].

Linguistic variables are therefore no longer needed. Compared with the fuzzy concept lattice generated from an L-fuzzy context, the fuzzy concept lattice generated using FFCA is simpler in terms of the number of formal concepts.

It also supports a formal mechanism for calculating concept similarities.

Based on FFCA, we propose the Fuzzy Conceptual Clustering technique in FCHG to generate fuzzy concept hierarchy.


4. Fuzzy Ontology Generation Framework – Fuzzy Ontology

Application of fuzzy logic offers a possible solution for dealing with uncertainty information.

Fuzzy ontologies have been generated and used in text retrieval and search engines, where membership values are used to evaluate the similarities between the concepts in a concept hierarchy.

Manual generation of fuzzy ontology from a predefined concept hierarchy is a difficult and tedious task that often requires expert interpretation.


4. Fuzzy Ontology Generation Framework – Introduction

An efficient method for generating concept hierarchies and fuzzy ontologies is highly desirable.

We propose a Fuzzy Ontology Generation Framework (FOGF) that automates fuzzy ontology generation from uncertainty data based on Formal Concept Analysis (FCA) theory.

The generated fuzzy ontology is mapped to a semantic representation in OWL.


4. Fuzzy Ontology Generation Framework – Overview

Fuzzy Formal Concept Analysis (FFCA) incorporates fuzzy logic into Formal Concept Analysis to represent vague information.

Concept Hierarchy Generation clusters the fuzzy concept lattice generated by FFCA to construct a concept hierarchy in two steps: Fuzzy Conceptual Clustering and Hierarchical Relation Generation.

Fuzzy Ontology Generation constructs a fuzzy ontology from a fuzzy context using the concept hierarchy created by fuzzy conceptual clustering.

Semantic Representation Conversion makes the knowledge accessible and sharable in the Web environment, using OWL.

[Figure: FOGF overview. Uncertainty Information → Fuzzy Formal Concept Analysis → Fuzzy Concept Lattice → Concept Hierarchy Generation → Concept Hierarchy → Fuzzy Ontology Generation → Fuzzy Ontology → Semantic Representation Conversion → Semantic Web]


4. Fuzzy Ontology Generation Framework – Step 1: Fuzzy Formal Concept Analysis

Definition (Fuzzy Formal Context). A fuzzy formal context is a triple $K = (G, M, I = \varphi(G \times M))$, where $G$ is a set of objects, $M$ is a set of attributes, and $I$ is a fuzzy set on the domain $G \times M$. Each relation $(g, m) \in I$ has a membership value $\mu(g, m)$ in $[0, 1]$.



Fuzzy formal context can be represented as a cross-table (Table 1)

An α-cut can be set to eliminate relations with low membership values, e.g. α = 0.5 (Table 2)

The context has 3 objects representing 3 documents, D1, D2 and D3. It also has 3 attributes, “Data Mining”, “Clustering” and “Fuzzy Logic” representing 3 research topics. The relationship between an object and an attribute is represented by a membership value in [0, 1].

Table 1. Fuzzy formal context

      Data Mining   Clustering   Fuzzy Logic
D1    0.8           0.12         0.61
D2    0.9           0.85         0.13
D3    0.1           0.14         0.87

Table 2. Fuzzy formal context after the α-cut (α = 0.5)

      Data Mining   Clustering   Fuzzy Logic
D1    0.8           -            0.61
D2    0.9           0.85         -
D3    -             -            0.87
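The α-cut is a simple filter over the cross-table. The sketch below assumes that a relation survives when its membership is at least α (the slides only say that low values are eliminated) and reproduces Table 2 from Table 1 with α = 0.5. Each resulting row is a fuzzy set over the attributes, i.e. the fuzzy representation of that object in the sense of the next definition.

```python
# Table 1: membership values mu(g, m) of the fuzzy formal context.
TABLE_1 = {
    "D1": {"Data Mining": 0.8, "Clustering": 0.12, "Fuzzy Logic": 0.61},
    "D2": {"Data Mining": 0.9, "Clustering": 0.85, "Fuzzy Logic": 0.13},
    "D3": {"Data Mining": 0.1, "Clustering": 0.14, "Fuzzy Logic": 0.87},
}

def alpha_cut(context, alpha):
    """Keep only the relations whose membership is at least alpha (assumed >=)."""
    return {
        g: {m: mu for m, mu in row.items() if mu >= alpha}
        for g, row in context.items()
    }

table_2 = alpha_cut(TABLE_1, 0.5)
for g, row in table_2.items():
    print(g, row)
# D1 {'Data Mining': 0.8, 'Fuzzy Logic': 0.61}
# D2 {'Data Mining': 0.9, 'Clustering': 0.85}
# D3 {'Fuzzy Logic': 0.87}
```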



Definition (Fuzzy Representation of Object)

Each object $O$ in a fuzzy formal context $K$ can be represented by a fuzzy set $\Phi(O)$ as $\Phi(O) = \{A_1(\mu_1), A_2(\mu_2), \ldots, A_m(\mu_m)\}$, where $\{A_1, A_2, \ldots, A_m\}$ is the set of attributes in $K$ and $\mu_i$ is the membership of $O$ with attribute $A_i$ in $K$. $\Phi(O)$ is called the fuzzy representation of $O$.



Generally, we can consider the attributes of a formal concept as the description of the concept.

Thus, the relationship between an object and the concept should be the intersection of the relationships between the object and the attributes of the concept.

Since each relationship between the object and an attribute is represented as a membership value in the fuzzy formal context, this intersection is the minimum of those membership values, which leads to the following definition.



Definition (Fuzzy Formal Concept). Given a fuzzy formal context $K = (G, M, I)$ and a confidence threshold $T$, we define $A^* = \{m \in M \mid \forall g \in A: \mu(g, m) \ge T\}$ for $A \subseteq G$ and $B^* = \{g \in G \mid \forall m \in B: \mu(g, m) \ge T\}$ for $B \subseteq M$. A fuzzy formal concept (or fuzzy concept) of a fuzzy formal context $(G, M, I)$ with a confidence threshold $T$ is a pair $(A_f = \varphi(A), B)$, where $A \subseteq G$, $B \subseteq M$, $A^* = B$ and $B^* = A$. Each object $g \in \varphi(A)$ has a membership $\mu_g$ defined as

$$\mu_g = \min_{m \in B} \mu(g, m)$$

where $\mu(g, m)$ is the membership value between object $g$ and attribute $m$ defined in $I$. If $B = \{\}$, then $\mu_g = 1$ for every $g$. $A$ and $B$ are the extent and intent of the formal concept $(\varphi(A), B)$, respectively.
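The definition can be checked against the running example. The sketch below applies the threshold operators A* and B* to Table 1 with T = 0.5 and gives each object in an extent the minimum membership over the intent (1.0 when the intent is empty). Brute-force enumeration over attribute subsets is an illustrative choice for a three-attribute context, not the authors' algorithm.

```python
from itertools import combinations

# Membership values mu(g, m) from Table 1.
MU = {
    "D1": {"Data Mining": 0.8, "Clustering": 0.12, "Fuzzy Logic": 0.61},
    "D2": {"Data Mining": 0.9, "Clustering": 0.85, "Fuzzy Logic": 0.13},
    "D3": {"Data Mining": 0.1, "Clustering": 0.14, "Fuzzy Logic": 0.87},
}
ATTRS = ["Clustering", "Data Mining", "Fuzzy Logic"]
T = 0.5  # confidence threshold

def star_objects(objs):
    """A* = {m | mu(g, m) >= T for all g in A}."""
    return frozenset(m for m in ATTRS if all(MU[g][m] >= T for g in objs))

def star_attributes(attrs):
    """B* = {g | mu(g, m) >= T for all m in B}."""
    return frozenset(g for g in MU if all(MU[g][m] >= T for m in attrs))

def fuzzy_concepts():
    """Map each intent B to its fuzzy extent {g: mu_g}, mu_g = min over B (1.0 if B is empty)."""
    concepts = {}
    for r in range(len(ATTRS) + 1):
        for b in combinations(ATTRS, r):
            extent = star_attributes(b)
            intent = star_objects(extent)
            concepts[intent] = {
                g: min((MU[g][m] for m in intent), default=1.0) for g in sorted(extent)
            }
    return concepts

concepts = fuzzy_concepts()
print(concepts[frozenset({"Data Mining"})])                 # {'D1': 0.8, 'D2': 0.9}
print(concepts[frozenset({"Data Mining", "Clustering"})])   # {'D2': 0.85}
print(concepts[frozenset({"Data Mining", "Fuzzy Logic"})])  # {'D1': 0.61}
```

It yields six fuzzy concepts, including ({D2(0.85)}, {"Data Mining", "Clustering"}) and ({D1(0.61)}, {"Data Mining", "Fuzzy Logic"}), matching the fuzzy concept lattice drawn for Table 2.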



The version of FFCA presented in these definitions preserves the continuous membership values of objects, which is crucial for calculating concept similarities.

In a formal context, a concept can have many superconcepts and subconcepts. However, the similarities of a concept to its different superconcepts and subconcepts generally differ.

With a fuzzy concept lattice, we can use fuzzy set theory to calculate the similarities between a concept and its subconcepts.



Definition (Fuzzy Formal Concept Cardinality)

Since the fuzziness of a fuzzy formal concept is represented by the membership values of the objects of the concept, the cardinality of a fuzzy formal concept $K_f = (\varphi(A), B)$ is defined as $|K_f| = |\varphi(A)|$.



Definition (Fuzzy Formal Concept Similarity)

The similarity of a fuzzy formal concept $K_{f1} = (\varphi(A_1), B_1)$ and its subconcept $K_{f2} = (\varphi(A_2), B_2)$ is defined as $E(K_{f1}, K_{f2}) = E(\varphi(A_1), \varphi(A_2))$.
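The slides do not spell out the fuzzy set measures behind |φ(A)| and E. A common choice, assumed in the sketch below, is the sigma-count (sum of memberships) for cardinality and the fuzzy Jaccard ratio |X ∩ Y| / |X ∪ Y|, with min for intersection and max for union. Applied to the fuzzy extents of the running example, it gives values that agree with the similarity labels in the fuzzy concept lattice figure up to the last digit.

```python
# Fuzzy extents of four concepts of the Table 2 lattice (object -> membership).
C_DM = {"D1": 0.8, "D2": 0.9}      # intent {"Data Mining"}
C_FL = {"D1": 0.61, "D3": 0.87}    # intent {"Fuzzy Logic"}
C_DM_CL = {"D2": 0.85}             # intent {"Data Mining", "Clustering"}
C_DM_FL = {"D1": 0.61}             # intent {"Data Mining", "Fuzzy Logic"}

def cardinality(fs):
    """Sigma-count of a fuzzy set: the sum of its memberships (assumed meaning of |phi(A)|)."""
    return sum(fs.values())

def similarity(fs1, fs2):
    """Assumed E: |fs1 ∩ fs2| / |fs1 ∪ fs2|, using min for ∩ and max for ∪."""
    keys = fs1.keys() | fs2.keys()
    inter = sum(min(fs1.get(k, 0.0), fs2.get(k, 0.0)) for k in keys)
    union = sum(max(fs1.get(k, 0.0), fs2.get(k, 0.0)) for k in keys)
    return inter / union if union else 0.0

print(round(similarity(C_DM, C_DM_CL), 2))  # 0.5
print(round(similarity(C_FL, C_DM_FL), 2))  # 0.41
print(round(similarity(C_DM, C_DM_FL), 2))  # 0.36
```

The last ratio is 0.61 / 1.7 ≈ 0.3588, which rounds to 0.36; the figure's 0.35 suggests its values are truncated rather than rounded.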



[Fig. 2: Traditional concept lattice generated from Table 1 without membership values]

[Fig. 3: Fuzzy concept lattice generated from the fuzzy formal context in Table 2, with similarities between concepts shown]




4. Fuzzy Ontology Generation Framework – Step 2: Concept Hierarchy Generation

Concept Hierarchy Generation clusters the fuzzy concept lattice generated by FFCA to construct a concept hierarchy in two steps: Fuzzy Conceptual Clustering and Hierarchical Relation Generation


4. Fuzzy Ontology Generation Framework – Step 2(a): Fuzzy Conceptual Clustering

Compared to traditional clusters, the conceptual clusters generated have the following properties:

Each conceptual cluster is considered as a human interpretable concept in the domain of the fuzzy concept lattice

Each conceptual cluster is a sublattice extracted from the fuzzy concept lattice

A formal concept must belong to at least one conceptual cluster and may belong to more than one (e.g. a scientific document can belong to more than one research area).



Conceptual clusters are generated based on the idea that if a formal concept A belongs to a conceptual cluster R, then its subconcept B also belongs to R if B is similar to A. A similarity confidence threshold Ts is used to determine whether two concepts are similar.



Definition (Conceptual Cluster). A conceptual cluster of a concept lattice $K$ with a similarity confidence threshold $T_s$ is a sublattice $S_K$ of $K$ that has the following properties:

$S_K$ has a supremum concept $C_S$ that is not similar to any of its superconcepts.

Any concept $C \neq C_S$ in $S_K$ must have at least one superconcept $C' \in S_K$ such that $E(C, C') > T_s$.
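A minimal sketch of the clustering rule, assuming a hypothetical lattice fragment (concept names and similarity values are invented for illustration) given as lists of direct superconcepts with an E value on every edge. It treats a concept as a cluster supremum when its similarity to every direct superconcept is at most Ts, then grows each cluster downward through edges with E > Ts; it does not check that the result is a sublattice.

```python
# Hypothetical lattice fragment: concept -> direct superconcepts ("Top" plays the role of Thing).
PARENTS = {"A": ["Top"], "B": ["Top"], "A1": ["A"], "AB": ["A", "B"], "B1": ["B"]}
# Hypothetical similarities E(child, parent) on every lattice edge.
SIM = {("A", "Top"): 0.0, ("B", "Top"): 0.0, ("A1", "A"): 0.7,
       ("AB", "A"): 0.6, ("AB", "B"): 0.2, ("B1", "B"): 0.3}

def conceptual_clusters(parents, sim, ts):
    """Return {supremum: members}. A supremum is not similar (E <= ts) to any direct
    superconcept; a concept joins a cluster if it is similar (E > ts) to a member."""
    suprema = [c for c, ps in parents.items() if all(sim[(c, p)] <= ts for p in ps)]
    clusters = {}
    for s in suprema:
        members, changed = {s}, True
        while changed:  # grow the cluster downward until nothing new can be added
            changed = False
            for c, ps in parents.items():
                if c not in members and any(p in members and sim[(c, p)] > ts for p in ps):
                    members.add(c)
                    changed = True
        clusters[s] = members
    return clusters

for s, members in conceptual_clusters(PARENTS, SIM, ts=0.5).items():
    print(s, sorted(members))
# A ['A', 'A1', 'AB']
# B ['B']
# B1 ['B1']
```

Concept AB joins A's cluster through its 0.6 edge but not B's, while B1 starts its own cluster because its only edge (0.3) does not exceed the threshold.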



Fig. 5 shows the conceptual clusters generated from the fuzzy concept lattice given in Fig. 3 with similarity confidence threshold Ts = 0.5

[Fig. 5: Conceptual clusters CK1, CK2 and CK3 generated from the fuzzy concept lattice of Fig. 3 with Ts = 0.5]


4. Fuzzy Ontology Generation Framework – Step 2(b): Hierarchical Relation Generation

Fuzzy conceptual clustering generates a set of conceptual clusters $S_C$. To construct a concept hierarchy from the conceptual clusters, we need to find the hierarchical relations between the clusters.

We first define a concept hierarchy.

Definition (Concept Hierarchy). A concept hierarchy is a poset (partially ordered set) $(H, \le)$, where $H$ is a finite set of concepts and $\le$ is a partial order on $H$.



The definition of the superconcept and subconcept relations on conceptual clusters ensures that each conceptual cluster has at least one superconcept, unless it corresponds to the root node of the generated concept hierarchy. However, we must still prove that the relation is a partial order.

Definition (Subconcept and Superconcept on a Concept Hierarchy)

Let $C_1$ and $C_2$ be two conceptual clusters corresponding to two sublattices $L_1$ and $L_2$ of a fuzzy concept lattice $F(K)$. Let the fuzzy formal concept $I$ be the supremum of $L_1$, i.e. $I = \sup(L_1)$. $C_1$ is a subconcept of $C_2$, denoted $C_1 \le C_2$, if $I$ is a subconcept of some concept $C' \in L_2$, i.e. $I \le C'$, where $\le$ is the partial order defined on $F(K)$. Equivalently, $C_2$ is a superconcept of $C_1$.
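Given the clusters, the hierarchical relation can be computed by checking whether the supremum of one cluster lies below some concept of another cluster in the lattice order. The sketch uses a hypothetical lattice fragment and hypothetical clusters (not those of Fig. 5) and takes the lattice order as the reflexive-transitive closure of the direct-superconcept links.

```python
# Hypothetical lattice fragment: concept -> direct superconcepts ("Top" plays the role of Thing).
PARENTS = {"A": ["Top"], "B": ["Top"], "A1": ["A"], "AB": ["A", "B"], "B1": ["B"]}
# Hypothetical conceptual clusters: supremum -> member concepts.
CLUSTERS = {"A": {"A", "A1", "AB"}, "B": {"B"}, "B1": {"B1"}}

def ancestors(concept, parents):
    """All concepts above `concept` in the lattice order, including itself."""
    seen, stack = {concept}, [concept]
    while stack:
        for p in parents.get(stack.pop(), []):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def cluster_order(clusters, parents):
    """Pairs (s1, s2): cluster s1 is a subconcept of cluster s2, because
    sup(L1) = s1 lies below (or equals) some concept C' of cluster s2."""
    return sorted(
        (s1, s2)
        for s1 in clusters
        for s2, members in clusters.items()
        if s1 != s2 and ancestors(s1, parents) & members
    )

print(cluster_order(CLUSTERS, PARENTS))  # [('B1', 'B')]
```

Here cluster B1 is placed below cluster B, while A and B have no cluster above them and would hang directly under "Thing" in the generated hierarchy.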



Figure 8(b) illustrates the hierarchical relations constructed from the conceptual clusters given in Figure 8(a). Each concept in the concept hierarchy is represented by a set of its attributes. The supremum and infimum of the lattice are considered as “Thing” and “Nothing” concepts, respectively.

[Figure 8(a). Conceptual clusters.]

[Figure 8(b). Concept hierarchy.]








4. Fuzzy Ontology Generation Framework – Step 3: Fuzzy Ontology Generation

This step constructs fuzzy ontology from a fuzzy context using the concept hierarchy created by fuzzy conceptual clustering.

This is done based on the characteristic that both FCA and ontology support formal definitions of concepts.

However, a concept defined in FCA carries extensional and intensional information in a balanced manner, whereas a concept in an ontology emphasizes its intensional aspect.

To construct the fuzzy ontology, we need to convert both intensional and extensional information of FCA concepts into the corresponding classes and relations of the ontology.

Thus, we define the fuzzy ontology as follows…



Definition (Fuzzy Ontology). A fuzzy ontology FO consists of four elements (C, AC, R, X), where C is a set of concepts; AC represents a collection of attribute sets, one for each concept; and R = (RT, RN) represents a set of relationships consisting of two elements, where RT is a set of taxonomy relationships and RN is a set of non-taxonomy relationships. Each concept ci in C represents a set of objects, or instances, of the same kind. Each object oij of a concept ci can be described by a set of attribute values denoted by AC(ci). Each relationship ri(cp, cq) in R represents a binary association between concepts cp and cq, and the instances of such a relationship are pairs of (cp, cq) concept objects. Each attribute value of an object or relationship instance is associated with a fuzzy membership value in [0,1] indicating the degree of uncertainty of this attribute value or relationship. X is a set of axioms. Each axiom in X is a constraint on the attribute values of concepts and relationships, or a constraint on the relationships between concept objects.



Example (Fuzzy Ontology). The Scholarly Ontology OS = (C, AC, R, X) is a fuzzy ontology whose components are as follows:

C = {“Document”, “Research Area”}

AC(“Document”) = {“Name”, “Author”, “Title”, “Keywords”, “Abstract”, “Body”, “Publisher”, “Publication Date”}

AC(“Research Area”) = {“Name”, “Keyword”}

RN = {belong-to(“Document”, “Research Area”), consist-of(“Research Area”, “Document”)}

RT = {superarea-of(“Research Area”, “Research Area”), subarea-of(“Research Area”, “Research Area”)}

X = {
Implies(Antecedent(consist-of(I-variable(x1) I-variable(x2))) Consequent(belong-to(I-variable(x2) I-variable(x1))))
Implies(Antecedent(belong-to(I-variable(x1) I-variable(x2))) Consequent(consist-of(I-variable(x2) I-variable(x1))))
Implies(Antecedent(superarea-of(I-variable(x1) I-variable(x2))) Consequent(subarea-of(I-variable(x2) I-variable(x1))))
Implies(Antecedent(subarea-of(I-variable(x1) I-variable(x2))) Consequent(superarea-of(I-variable(x2) I-variable(x1))))
}
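One way to hold the schema level of OS in code is a small dataclass mirroring (C, AC, R, X). This is an assumption, not the framework's data model: relationships are (name, domain, range) triples, X is reduced to inverse-relation pairs (all four axioms of the example have that form), and instance-level membership values are left out.

```python
from dataclasses import dataclass, field

Relation = tuple[str, str, str]  # (name, domain concept, range concept)

@dataclass
class FuzzyOntology:
    """Schema-level sketch of FO = (C, AC, R = (RT, RN), X)."""
    concepts: set[str]                  # C
    attributes: dict[str, list[str]]    # AC: attribute names per concept
    taxonomy: set[Relation]             # RT
    non_taxonomy: set[Relation]         # RN
    inverses: dict[str, str] = field(default_factory=dict)  # X, limited to inverse pairs

    def signatures(self) -> dict[str, tuple[str, str]]:
        return {name: (dom, rng) for name, dom, rng in self.taxonomy | self.non_taxonomy}

    def undefined_concepts(self) -> set[str]:
        """Concepts used by some relationship but missing from C."""
        used = {c for dom, rng in self.signatures().values() for c in (dom, rng)}
        return used - self.concepts

    def inverse_signatures_ok(self) -> bool:
        """An axiom r(x1, x2) => inv(x2, x1) requires inv to have r's signature reversed."""
        sig = self.signatures()
        return all(sig[inv] == sig[r][::-1] for r, inv in self.inverses.items())

OS = FuzzyOntology(
    concepts={"Document", "Research Area"},
    attributes={
        "Document": ["Name", "Author", "Title", "Keywords", "Abstract",
                     "Body", "Publisher", "Publication Date"],
        "Research Area": ["Name", "Keyword"],
    },
    taxonomy={("superarea-of", "Research Area", "Research Area"),
              ("subarea-of", "Research Area", "Research Area")},
    non_taxonomy={("belong-to", "Document", "Research Area"),
                  ("consist-of", "Research Area", "Document")},
    inverses={"consist-of": "belong-to", "belong-to": "consist-of",
              "superarea-of": "subarea-of", "subarea-of": "superarea-of"},
)
print(OS.undefined_concepts(), OS.inverse_signatures_ok())  # set() True
```

The check confirms that each inverse axiom pairs relations whose signatures mirror each other, e.g. belong-to(Document, Research Area) and consist-of(Research Area, Document).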



[Figure 9. Fuzzy ontology generation process. Inputs: fuzzy context and concept hierarchy. Steps: Class Mapping, Taxonomy Relation Generation, Non-Taxonomy Relation Generation and Instances Generation, producing ontology extent and intent classes, ontology hierarchical classes, ontology relation classes and, finally, the fuzzy ontology.]



Class Mapping furnishes C = {E, I} in which E and I are classes corresponding to extent and intent of the fuzzy context. For example, the extent class mapped from the extent of the fuzzy context given in Table 1(b) can be labeled manually as Document. We can use appropriate names to represent keyword attributes and use them to label the intent class names as well. For example, the class Research Area can be used to label the initial intent class.



Taxonomy Relation Generation furnishes RT = {superclass(I,I), subclass(I,I)}. Thus, the hierarchical relations between instances of intent classes are defined. Also, two rules are added to X accordingly:

superclass(X,Y) :- subclass(Y,X).
subclass(X,Y) :- superclass(Y,X).



Non-taxonomy Relation Generation furnishes RN = {RIE(I,E), REI(E,I)}, in which REI is the relation between the extent class and the intent class, and RIE is the inverse of REI. However, we still need to label the non-taxonomy relation. For example, the relation between class Document and class Research Area can be labeled as belong-to, which implies that a document can belong to one or more research areas. Also, two rules are added to X accordingly:

REI(X,Y) :- RIE(Y,X).
RIE(X,Y) :- REI(Y,X).
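The inverse rules added to X can be applied mechanically. The sketch below is a minimal forward-chaining step written for illustration, not the framework's own reasoner: facts are stored as (relation, x, y) keys with a membership value, each rule is an (r1, r2) pair meaning r2(X, Y) :- r1(Y, X), and the derived fact is assumed to inherit the membership of the fact it came from. The document "doc-7" is hypothetical.

```python
def apply_inverse_rules(facts, inverse_pairs):
    """Apply rules of the form r2(X, Y) :- r1(Y, X).

    facts: dict mapping (relation, x, y) -> membership value.
    inverse_pairs: iterable of (r1, r2). For pure inverse rules one pass is
    enough, because applying an inverse twice returns the original fact.
    The derived fact copies the membership of its source fact (an assumption).
    """
    derived = dict(facts)
    for (rel, x, y), mu in facts.items():
        for r1, r2 in inverse_pairs:
            if rel == r1:
                derived.setdefault((r2, y, x), mu)
    return derived


rules = [("subclass", "superclass"), ("superclass", "subclass"),
         ("RIE", "REI"), ("REI", "RIE")]
facts = {
    ("subclass", "Query Processing", "Information Retrieval"): 1.0,
    ("REI", "doc-7", "Information Retrieval"): 0.6,   # hypothetical document instance
}
closed = apply_inverse_rules(facts, rules)
```

Here `closed` additionally contains superclass("Information Retrieval", "Query Processing") and RIE("Information Retrieval", "doc-7").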



Instances Generation generates the instance set I = {II, IE}, where II and IE are the instances of the intent and extent classes, respectively.

It then furnishes membership values for the instances’ attributes and relationships.
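Instances Generation can be illustrated with a small Python sketch. It assumes a hypothetical input shape: each research area (intent instance) comes with the documents in its extent and their membership values, for example taken from fuzzy formal concepts. Only relationship memberships are produced here; attribute memberships are omitted for brevity.

```python
def generate_instances(concept_extents, relation="belong-to"):
    """Create intent instances (II), extent instances (IE) and fuzzy relationship facts.

    concept_extents: {research-area label: {document: membership}}
    (a hypothetical input shape, e.g. extents of fuzzy formal concepts).
    """
    intent_instances = set(concept_extents)
    extent_instances = {doc for extent in concept_extents.values() for doc in extent}
    facts = {(relation, doc, area): mu
             for area, extent in concept_extents.items()
             for doc, mu in extent.items()}
    return intent_instances, extent_instances, facts


II, IE, facts = generate_instances({
    "Information Retrieval": {"doc-1": 0.9, "doc-2": 0.6},   # hypothetical memberships
    "Data Mining": {"doc-2": 0.7},
})
```

A document may appear in several extents, which yields several belong-to facts with different membership values, consistent with a document belonging to one or more research areas.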






4. Fuzzy Ontology Generation Framework Step 4 Semantic Representation Conversion

(Figure: Fuzzy Ontology → Conversion → Semantic Web)

The generated fuzzy ontology provides a conceptual model of knowledge in the corresponding domain.

However, to make such knowledge accessible and sharable, we must convert it into a semantic representation that can be embedded into the contents of Web pages.

In the Semantic Web, an ontology description language such as OWL can be used to annotate the ontology.

Therefore, the generated fuzzy ontology can be automatically converted into the corresponding semantic representation in OWL, in which each class and instance is annotated as shown on the next slide.



Ontology for the concept hierarchy represented by OWL:

<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns="http://www.owl-ontologies.com/unnamed.owl#"
    xml:base="http://www.owl-ontologies.com/unnamed.owl">
  <owl:Ontology rdf:about=""/>
  <owl:Class rdf:ID="Concept_2"/>
  <owl:Class rdf:ID="Concept_1"/>
  <owl:Class rdf:ID="Concept_3">
    <rdfs:subClassOf rdf:resource="#Concept_1"/>
    <rdfs:subClassOf rdf:resource="#Concept_2"/>
  </owl:Class>
  <owl:DatatypeProperty rdf:ID="Data_Mining"/>
  <owl:DatatypeProperty rdf:ID="DataMining">
    <rdfs:domain rdf:resource="#Concept_1"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#float"/>
  </owl:DatatypeProperty>
  <owl:DatatypeProperty rdf:ID="FuzzyLogic">
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#float"/>
    <rdfs:domain rdf:resource="#Concept_2"/>
  </owl:DatatypeProperty>
  <Concept_2 rdf:ID="Document2">
    <FuzzyLogic rdf:datatype="http://www.w3.org/2001/XMLSchema#float">0.87</FuzzyLogic>
  </Concept_2>
</rdf:RDF>
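The conversion step can be sketched with Python's standard `xml.etree.ElementTree`. This is a minimal illustration, not the framework's converter: it emits classes with `rdfs:subClassOf` links, float-valued keyword properties, and individuals carrying membership values in the same shape as the listing above. The input dictionaries and the `onto` prefix for the base namespace are assumptions, and the output omits the XML declaration and `xml:base` attribute.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
XSD_FLOAT = "http://www.w3.org/2001/XMLSchema#float"
BASE = "http://www.owl-ontologies.com/unnamed.owl#"

for prefix, uri in [("rdf", RDF), ("rdfs", RDFS), ("owl", OWL), ("onto", BASE)]:
    ET.register_namespace(prefix, uri)


def rdf(attr):
    """Qualified name of an rdf: attribute in ElementTree's {uri}local form."""
    return f"{{{RDF}}}{attr}"


def to_owl(parents, keyword_domains, individuals):
    """Serialize a concept hierarchy with fuzzy keyword properties as RDF/XML.

    parents: {concept: [superconcepts]}
    keyword_domains: {keyword property: concept it describes}
    individuals: {(individual, concept): {keyword property: membership}}
    """
    root = ET.Element(rdf("RDF"))
    ET.SubElement(root, f"{{{OWL}}}Ontology", {rdf("about"): ""})
    for concept, supers in parents.items():
        cls = ET.SubElement(root, f"{{{OWL}}}Class", {rdf("ID"): concept})
        for sup in supers:
            ET.SubElement(cls, f"{{{RDFS}}}subClassOf", {rdf("resource"): f"#{sup}"})
    for prop, domain in keyword_domains.items():
        p = ET.SubElement(root, f"{{{OWL}}}DatatypeProperty", {rdf("ID"): prop})
        ET.SubElement(p, f"{{{RDFS}}}domain", {rdf("resource"): f"#{domain}"})
        ET.SubElement(p, f"{{{RDFS}}}range", {rdf("resource"): XSD_FLOAT})
    for (individual, concept), values in individuals.items():
        ind = ET.SubElement(root, f"{{{BASE}}}{concept}", {rdf("ID"): individual})
        for prop, mu in values.items():
            value = ET.SubElement(ind, f"{{{BASE}}}{prop}", {rdf("datatype"): XSD_FLOAT})
            value.text = f"{mu:.2f}"
    return ET.tostring(root, encoding="unicode")


owl_text = to_owl(
    parents={"Concept_1": [], "Concept_2": [], "Concept_3": ["Concept_1", "Concept_2"]},
    keyword_domains={"DataMining": "Concept_1", "FuzzyLogic": "Concept_2"},
    individuals={("Document2", "Concept_2"): {"FuzzyLogic": 0.87}},
)
print(owl_text)
```

Building the tree with namespace-qualified names, rather than concatenating strings, guarantees that the output is well-formed XML.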


5. Scholarly Ontology – Ontology Generation

Collected scientific documents on the research area “Information Retrieval” published in 1987-1997 from ISI.

Downloaded documents are preprocessed to extract related information such as the title, authors, citation keywords, and other citation information.

The extracted information is then stored in a citation database.



First, we construct a fuzzy formal context Kf = {G, M, I}, with G as the set of documents, M as the set of citation keywords, and I as the fuzzy relation between them. The membership value of a document D on a citation keyword CK in Kf is computed as

$$\mu(D, CK) = \frac{n_1}{n_2}$$

where n1 is the number of documents that cite D and contain CK, and n2 is the number of documents that cite D.

This formula is based on the premise that the more frequently a keyword occurs in the papers citing a document, the more important that keyword is to the cited document.
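The membership formula can be computed directly from citation data. The sketch below assumes a hypothetical input shape, one (cited document IDs, keywords) pair per citing paper; n2 counts the papers that cite D and n1 counts those that also contain CK.

```python
from collections import defaultdict


def citation_memberships(citing_papers):
    """Compute mu(D, CK) = n1 / n2 for every cited document D and citation keyword CK.

    citing_papers: iterable of (cited_document_ids, keywords) pairs, one per
    citing paper (a hypothetical input shape).
    n2 counts the papers that cite D; n1 counts those that also contain CK.
    """
    n2 = defaultdict(int)
    n1 = defaultdict(int)
    for cited, keywords in citing_papers:
        keywords = set(keywords)
        for doc in set(cited):
            n2[doc] += 1
            for ck in keywords:
                n1[(doc, ck)] += 1
    return {(doc, ck): count / n2[doc] for (doc, ck), count in n1.items()}


papers = [
    ({"D1"}, {"retrieval", "indexing"}),
    ({"D1", "D2"}, {"retrieval"}),
    ({"D1"}, {"clustering"}),
]
mu = citation_memberships(papers)
# Two of the three papers citing D1 contain "retrieval", so mu[("D1", "retrieval")] is 2/3.
```

Pairs with n1 = 0 are simply absent from the result, which corresponds to a membership of 0 in the fuzzy context.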



Then, conceptual clustering is performed on the fuzzy formal context.

Each generated conceptual cluster represents a research area.

The generated conceptual clusters form a hierarchy of research areas for the documents in the citation database, called the Research Area Hierarchy.
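Two building blocks of this step can be sketched in Python: enumerating fuzzy formal concepts from a small fuzzy context, and comparing two concepts' extents. The slides do not spell out the operators, so this sketch makes assumptions: entries below a confidence threshold are treated as absent, an object's membership in a concept's extent is the minimum of its memberships over the intent attributes, and similarity is a fuzzy Jaccard ratio (sum of minima over sum of maxima) that a clustering step could compare against a threshold such as Ts. Brute-force enumeration is only practical for tiny contexts.

```python
from itertools import combinations


def fuzzy_concepts(context, threshold):
    """Enumerate the fuzzy formal concepts of a small fuzzy context by brute force.

    context: {object: {attribute: membership}}. Entries below `threshold` are
    treated as absent (an assumed confidence-threshold convention).
    Returns (extent, intent) pairs; the extent maps each object to the minimum
    of its memberships over the intent attributes.
    """
    attrs = sorted({m for row in context.values() for m in row})

    def has(obj, attr):
        return context[obj].get(attr, 0.0) >= threshold

    seen, concepts = set(), []
    for r in range(len(attrs) + 1):
        for subset in combinations(attrs, r):
            objs = [o for o in context if all(has(o, m) for m in subset)]
            if not objs:
                continue
            # Closure: every attribute shared by all objects in the extent.
            intent = frozenset(m for m in attrs if all(has(o, m) for o in objs))
            if intent in seen:
                continue
            seen.add(intent)
            extent = {o: min((context[o][m] for m in intent), default=1.0) for o in objs}
            concepts.append((extent, intent))
    return concepts


def concept_similarity(ext1, ext2):
    """Fuzzy Jaccard similarity of two extents: sum of minima over sum of maxima."""
    objs = set(ext1) | set(ext2)
    num = sum(min(ext1.get(o, 0.0), ext2.get(o, 0.0)) for o in objs)
    den = sum(max(ext1.get(o, 0.0), ext2.get(o, 0.0)) for o in objs)
    return num / den if den else 0.0


context = {"D1": {"IR": 0.8, "QP": 0.6}, "D2": {"IR": 0.7}, "D3": {"QP": 0.9}}
concepts = fuzzy_concepts(context, threshold=0.5)
ir = next(e for e, i in concepts if i == {"IR"})
qp = next(e for e, i in concepts if i == {"QP"})
sim = concept_similarity(ir, qp)   # 0.6 / 2.4
```

The documents and keyword memberships in `context` are hypothetical.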


5. Scholarly Ontology Example of concept hierarchy generated

Each research area is represented by a set of most frequent keywords occurring in the documents that belong to that research area. In FFCA, sub-areas inherit keywords from their super-areas. Note that the inherited keywords are not shown in Figure 11 when labeling the concepts. Only keywords specific to the concepts are used for labeling.

{"Semantic Similarity", "Knowledge Representation"}
{"User Interface", "Browsing"}
{"Clustering", "Neural Network"}
{"Information Retrieval", "Query Processing", "Searching"}
{"Recall", "Precision"}
{"User Satisfaction", "User Training", "User Study"}
{"Online Search", "Information Filtering"}
{"Retrieval Evaluation", "System Training"}
{"Data Indexing"}
{"Expert System"}
{"Text Retrieval"}
{"Data Mining"}

Figure 11
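Labeling concepts with only their specific keywords amounts to a set difference against the keywords inherited from superconcepts. The sketch below assumes each concept's intent is stored with its inherited keywords included; the two-level fragment and the concept names "IR" and "IR/Evaluation" are hypothetical, built from keyword sets shown in Figure 11.

```python
def specific_keywords(intents, parents):
    """Keep only the keywords a concept adds beyond those inherited from its superconcepts.

    intents: {concept: set of keywords, including inherited ones}
    parents: {concept: list of direct superconcepts}
    """
    labels = {}
    for concept, keywords in intents.items():
        inherited = set().union(*(intents[p] for p in parents.get(concept, [])))
        labels[concept] = set(keywords) - inherited
    return labels


# Hypothetical two-level fragment built from keyword sets in Figure 11.
intents = {
    "IR": {"Information Retrieval", "Query Processing", "Searching"},
    "IR/Evaluation": {"Information Retrieval", "Query Processing", "Searching",
                      "Recall", "Precision"},
}
labels = specific_keywords(intents, {"IR/Evaluation": ["IR"]})
```

Here the subconcept is labeled {"Recall", "Precision"} only, while the root keeps its full keyword set.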



The generated ontology contains scholarly information as a hierarchy of research areas as well as research areas for each document.

By taking advantage of the Semantic Web, such knowledge can easily be shared and reused by other systems for browsing or retrieval.

For example, we can use Protégé-2000 for browsing the scholarly ontology.


5. Scholarly Ontology Part of the generated concept hierarchy of research areas

Fig. 12

We use the keyword with the highest membership value to label each research area. However, users can still browse further information about each research area.
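The labeling rule is a single maximum over keyword memberships, and the remaining keywords can be offered for browsing in order. In this minimal sketch, ties are broken alphabetically (an assumption the slides do not address), and the membership values are hypothetical.

```python
def ranked_keywords(memberships):
    """Keywords of a research area by decreasing membership, ties broken alphabetically."""
    return sorted(memberships, key=lambda kw: (-memberships[kw], kw))


def area_label(memberships):
    """Label a research area with its highest-membership keyword."""
    return ranked_keywords(memberships)[0]


area = {"Recall": 0.72, "Precision": 0.85, "Retrieval Evaluation": 0.85}
```

With these values, `area_label(area)` returns "Precision", and `ranked_keywords(area)` lists the rest for browsing.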


5. Scholarly Ontology Performance Evaluation

The performance of ontology generation is evaluated on the generated Research Area Hierarchy.

First, we measure the standard recall, precision and F-measure to evaluate the clustering results.

Second, we use the relaxation error and the corresponding cluster goodness measure to evaluate the goodness of the generated conceptual clusters. We also examine whether using fuzzy membership values instead of crisp values improves cluster goodness.

Finally, we use the Average Uninterpolated Precision (AUP), a typical measure for evaluating a hierarchical construct, to evaluate the goodness of the generated concept hierarchy.


Keyword attributes serve as descriptors for the generated clusters. Does extracting and using more keywords produce more meaningful cluster descriptors?

To verify this, we vary the number of keywords N extracted from documents from 2 to 10, and the similarity threshold Ts from 0.2 to 0.9, when performing conceptual clustering.

We have classified the documents downloaded from ISI into classes based on their research themes. These classes are used as a benchmark to evaluate the clustering results in terms of recall, precision and F-measure.


5. Scholarly Ontology Performance Evaluation - Precision

N       Ts=0.2  Ts=0.3  Ts=0.4  Ts=0.5  Ts=0.6  Ts=0.7  Ts=0.8  Ts=0.9
N=2     0.64    0.64    0.64    0.64    0.63    0.62    0.62    0.62
N=3     0.66    0.66    0.66    0.66    0.64    0.62    0.62    0.62
N=4     0.73    0.77    0.78    0.79    0.74    0.69    0.68    0.68
N=5     0.80    0.84    0.84    0.85    0.81    0.75    0.75    0.75
N=6     0.90    0.90    0.90    0.90    0.86    0.80    0.79    0.80
N=7     0.96    0.94    0.93    0.93    0.90    0.86    0.84    0.84
N=8     0.95    0.94    0.92    0.93    0.90    0.86    0.83    0.83
N=9     0.94    0.93    0.92    0.92    0.89    0.86    0.83    0.83
N=10    0.93    0.92    0.91    0.91    0.89    0.85    0.83    0.83

Table 6. Performance results using precision measurement.

Precision reflects the accuracy of the clustering results. Table 6 shows that when N is small, precision is poor, which indicates that the clusters contain “noisy” data.

Precision improves as the number of extracted keywords increases. However, this also causes recall to decrease, as shown in Table 7.


5. Scholarly Ontology Performance Evaluation - Recall

N       Ts=0.2  Ts=0.3  Ts=0.4  Ts=0.5  Ts=0.6  Ts=0.7  Ts=0.8  Ts=0.9
N=2     0.99    0.99    0.99    0.99    0.99    0.98    0.98    0.98
N=3     0.99    0.99    0.99    0.99    0.98    0.98    0.97    0.97
N=4     0.98    0.98    0.97    0.97    0.94    0.95    0.94    0.94
N=5     0.89    0.87    0.87    0.88    0.87    0.89    0.89    0.89
N=6     0.80    0.81    0.83    0.83    0.83    0.85    0.85    0.85
N=7     0.81    0.80    0.82    0.82    0.83    0.84    0.86    0.86
N=8     0.79    0.79    0.81    0.82    0.82    0.84    0.85    0.85
N=9     0.76    0.77    0.80    0.80    0.81    0.83    0.84    0.84
N=10    0.73    0.75    0.78    0.78    0.79    0.81    0.83    0.83

Table 7. Performance results using recall measurement.

As the number of clusters gradually increases, recall gradually decreases.


5. Scholarly Ontology Performance Evaluation - F-measure

N       Ts=0.2  Ts=0.3  Ts=0.4  Ts=0.5  Ts=0.6  Ts=0.7  Ts=0.8  Ts=0.9
N=2     0.78    0.78    0.78    0.78    0.77    0.76    0.76    0.76
N=3     0.79    0.79    0.79    0.79    0.77    0.76    0.76    0.76
N=4     0.83    0.86    0.86    0.87    0.82    0.79    0.78    0.78
N=5     0.84    0.85    0.85    0.86    0.83    0.81    0.81    0.81
N=6     0.85    0.85    0.86    0.86    0.84    0.82    0.82    0.82
N=7     0.88    0.86    0.87    0.87    0.86    0.85    0.85    0.85
N=8     0.86    0.86    0.86    0.87    0.85    0.85    0.84    0.84
N=9     0.84    0.84    0.86    0.86    0.85    0.84    0.83    0.83
N=10    0.81    0.82    0.84    0.84    0.83    0.83    0.83    0.83
Average 0.83    0.83    0.83    0.84    0.82    0.81    0.80    0.80

Table 8. Performance results using F-measure measurement.
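As a quick consistency check, the F-measure is the harmonic mean of precision and recall, and for the cells sampled below, combining the Table 6 and Table 7 entries reproduces the Table 8 entry to two decimal places. The sketch only re-derives published numbers; it does not show how the per-cluster scores were aggregated.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


# (N, Ts): (precision from Table 6, recall from Table 7, F-measure from Table 8)
samples = {
    (2, 0.2): (0.64, 0.99, 0.78),
    (4, 0.5): (0.79, 0.97, 0.87),
    (7, 0.2): (0.96, 0.81, 0.88),
    (9, 0.4): (0.92, 0.80, 0.86),
}
for (n, ts), (p, r, f) in samples.items():
    print(f"N={n}, Ts={ts}: computed {f_measure(p, r):.2f}, Table 8 {f:.2f}")
```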

When N is low, the F-measure is quite poor. Nevertheless, the F-measure is stable and good when a sufficient number of keywords are extracted. The results also show that the F-measure tends to have the best performance when Ts = 0.5.


5. Scholarly Ontology Performance Evaluation – Relaxation Error

Relaxation error measures the dissimilarity of the items in a cluster based on their attribute values.

Since conceptual clustering techniques typically use a set of attributes for concept generation, relaxation error is commonly used to evaluate the goodness of conceptual clusters.



The relaxation error RE of a cluster C is defined as

$$RE(C) = \sum_{a \in A} \sum_{i=1}^{n} \sum_{j=1}^{n} P(x_i)\, P(x_j)\, d_a(x_i, x_j)$$

where A is the set of attributes of the items in C, n is the number of items in C, P(xi) is the probability of item xi occurring in C, and da(xi, xj) is the distance between xi and xj on attribute a.

The cluster goodness G of cluster C is defined as G(C) = 1 - RE(C).
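The relaxation error and cluster goodness can be computed directly from the definition. The slides do not fix P(xi) or da, so this sketch assumes a uniform P(xi) = 1/n and da equal to the absolute difference of the two items' membership values on attribute a. Under these assumptions RE can exceed 1 when there are many attributes, unless the distances are scaled (for example by 1/|A|).

```python
def relaxation_error(items, probs, distance, attributes):
    """RE(C): sum over attributes a and item pairs (i, j) of P(x_i) P(x_j) d_a(x_i, x_j)."""
    return sum(probs[i] * probs[j] * distance(a, xi, xj)
               for a in attributes
               for i, xi in enumerate(items)
               for j, xj in enumerate(items))


def membership_distance(attr, x, y):
    """Assumed d_a: absolute difference of the two items' membership values on attribute a."""
    return abs(x.get(attr, 0.0) - y.get(attr, 0.0))


def cluster_goodness(items, attributes):
    """G(C) = 1 - RE(C), assuming every item is equally likely (P(x_i) = 1/n)."""
    n = len(items)
    probs = [1.0 / n] * n
    return 1.0 - relaxation_error(items, probs, membership_distance, attributes)


cluster = [{"IR": 0.8, "QP": 0.6}, {"IR": 0.6, "QP": 0.6}]   # hypothetical memberships
print(cluster_goodness(cluster, ["IR", "QP"]))  # approximately 0.9
```

Replacing the fuzzy memberships with crisp 0/1 values in `cluster` is one way to reproduce the kind of fuzzy-versus-crisp comparison mentioned in the evaluation plan.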


5. Scholarly Ontology Performance Evaluation – Relaxation Error Comparison of FFCA and COBWEB while the number of extracted keywords is varied from 2 to 10

(Figure: Fuzzy FCA vs. COBWEB for N = 2 to 10, on a 0 to 1 scale.)

We vary the number of extracted keywords to observe its effect on cluster goodness. In addition, since COBWEB is one of the most popular techniques for conceptual clustering, we also apply COBWEB to the citation database for comparison. The results show that FFCA achieves better cluster goodness than COBWEB.


5. Scholarly Ontology Performance Evaluation – AUP

Average Uninterpolated Precision (AUP) is defined as the sum of the precision values at each point (or node) in a hierarchical structure where a relevant item appears, divided by the total number of relevant items.

AUP is typically used to indicate the goodness of a hierarchical concept structure.

For evaluating AUP, we manually classified the downloaded documents into classes based on their research themes.

For each class, we extract the 5 most frequent keywords from the documents in the class. We then use these keywords to form retrieval queries and evaluate the retrieval performance using AUP.
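The AUP definition can be applied to a retrieval result as follows. This sketch assumes that the points in the structure are visited in a fixed order and flattened into a ranked list (an assumption, since the slides define AUP over nodes of a hierarchy). The precision at each position holding a relevant item is summed and divided by the total number of relevant items, so relevant items that are never retrieved lower the score.

```python
def average_uninterpolated_precision(ranked, relevant):
    """Sum of precision@k at each rank k holding a relevant item, divided by |relevant|."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / k          # precision at this point
    return total / len(relevant)


# Hypothetical ranking: relevant d1 at rank 1, d3 at rank 3; d5 is never retrieved.
aup = average_uninterpolated_precision(["d1", "d2", "d3", "d4"], {"d1", "d3", "d5"})
```

In this example, the score is (1/1 + 2/3) / 3, roughly 0.56.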


There are two ways to generate document keywords. The first uses the set of keywords from each conceptual cluster, known as attribute keywords, as the document keywords. The second uses the keywords from each document as the document keywords. We then vectorize the document keywords and the input query, and calculate the distance between the vectors to measure retrieval performance.


Two methods:

1. AUP measured using attribute keywords: Hierarchical Average Uninterpolated Precision (AUP(H)), since each concept inherits attribute keywords from its superconcepts.

2. AUP measured using keywords from documents: Unconnected Average Uninterpolated Precision (AUP(U)).
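The two keyword sources can be contrasted in a short retrieval sketch. The slides only say that document keywords and queries are vectorized and compared by distance, so the following choices are assumptions: binary keyword vectors, Euclidean distance (`math.dist`, Python 3.8+), and, for AUP(H), attribute keywords collected from a concept and all its superconcepts. The concept names, documents, and keywords are hypothetical.

```python
import math


def keyword_vector(keywords, vocabulary):
    """Binary vector: 1.0 where the vocabulary term is among the keywords."""
    return [1.0 if term in keywords else 0.0 for term in vocabulary]


def rank_by_distance(query, doc_keywords, vocabulary):
    """Rank documents by ascending Euclidean distance to the query vector (ties by name)."""
    q = keyword_vector(query, vocabulary)

    def dist(doc):
        return math.dist(q, keyword_vector(doc_keywords[doc], vocabulary))

    return sorted(doc_keywords, key=lambda d: (dist(d), d))


def hierarchical_keywords(concept, attribute_keywords, parents):
    """Attribute keywords of a concept plus those inherited from all its superconcepts."""
    keywords = set(attribute_keywords[concept])
    for parent in parents.get(concept, []):
        keywords |= hierarchical_keywords(parent, attribute_keywords, parents)
    return keywords


attribute_keywords = {"root": {"retrieval"}, "eval": {"recall", "precision"}}
parents = {"eval": ["root"]}
docs = {"d1": {"recall", "precision"}, "d2": {"browsing"}}
vocabulary = sorted({"retrieval", "recall", "precision", "browsing"})
ranking = rank_by_distance({"recall", "precision"}, docs, vocabulary)
```

For AUP(H), each document's keywords would come from `hierarchical_keywords` of its concept; for AUP(U), from the document itself. The resulting rankings can then be scored with AUP.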


Fig. 14 shows the results for AUP(H) and AUP(U) using different numbers of extracted keywords N.

Fig. 14. AUP(H) and AUP(U) for N = 2 to 10.

The results show that as N gets larger, performance on both AUP(H) and AUP(U) improves. In addition, performance on AUP(H) is generally better than on AUP(U), which suggests that the attribute keywords generated for the conceptual clusters are appropriate.


6. Semantic Helpdesk Application

Introduction

Developed in collaboration with a multinational company, the Semantic Help-Desk Environment comprises the Web Service Requester, the Matchmaking Agent and the Web Service Provider.

The focus is on the fuzzy ontology generation process that generates Machine Service Ontology from a customer service database.

This approach enables individual machine service knowledge to be shared over the Semantic Web. Thus, machine service knowledge from different machines or models provided by different manufacturers can be shared and integrated. This is important as many customers may have different types of machines and models from different manufacturers.



Introduction - Web Service Requester

A kind of Web Service that enables access to customer support for machine services. Instances of the Web Service Requester can be created from a Web Requester Server whose address is accessible to all users through the Web.

When encountering a problem, a user can use the Web to connect to the Web Requester Server in order to create an instance of the Web Service Requester.

The created instance runs as a web-based program; that is, it can use the Web to interact with the user and other programs.



Through the Web, the Web Service Requester instance provides an interface for the user to report a problem.

Through the interface, the user specifies the encountered fault as a text string and must also enter the code of the machine model. This information is used to form a profile for the Web Service Requester.

The profile is then sent as a request to the Matchmaking Agent to find a suitable Web Service Provider for solving the problem.



Introduction - Web Service Provider

It offers its machine service support as a Web Service extended with ontology capabilities.

Many instances of a Web Service Provider may exist concurrently on the Internet.

An instance of the Web Service Provider can be considered as a program that can access the Machine Service Ontology to retrieve machine service knowledge for a given reported problem.

An instance of the Web Service Provider can interact with other programs. That is, it can be called by other programs and return the outputs to the calling programs.

Instances of the Web Service Provider must be registered with a specific agent known as the Matchmaking Agent that serves as a registry and look-up service.



Each instance of the Web Service Provider also provides a profile file that describes its parameters and capabilities. XML is used in most Web Services to represent the information contained in such profiles.

However, traditional XML lacks the capability to represent semantic information.

To overcome this problem, the Web Service Provider uses the ontology-based service description language OWL-S (formerly DAML-S) to describe the information in its profile. Because the service is described as an OWL ontology, its intended meaning can be understood by other programs.



Introduction - Matchmaking Agent

When the Matchmaking Agent receives machine service requests from the Web Service Requester, it locates the appropriate Web Services that can fulfill each request.
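The registry and look-up role of the Matchmaking Agent can be illustrated with a deliberately simplified sketch. Real profiles are OWL-S descriptions; here each provider profile is reduced to a hypothetical record with supported machine models and fault keywords, and a request (machine model code plus fault text) is matched by keyword overlap. The class names, fields, provider names, and scoring rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ProviderProfile:
    """Hypothetical, simplified stand-in for an OWL-S service profile."""
    name: str
    machine_models: set
    fault_keywords: set


@dataclass
class MatchmakingAgent:
    """Registry of provider profiles with a keyword-overlap look-up (an assumed rule)."""
    registry: list = field(default_factory=list)

    def register(self, profile):
        self.registry.append(profile)

    def match(self, machine_model, fault_text):
        """Providers supporting the model, ordered by keyword overlap with the fault text."""
        words = set(fault_text.upper().split())
        scored = [(len(words & p.fault_keywords), p.name)
                  for p in self.registry if machine_model in p.machine_models]
        scored.sort(key=lambda s: (-s[0], s[1]))
        return [name for score, name in scored if score > 0]


agent = MatchmakingAgent()
agent.register(ProviderProfile("provider-a", {"AV_2011"}, {"PCB", "CARRY", "ANVIL"}))
agent.register(ProviderProfile("provider-b", {"AV_2011"}, {"CUTTER"}))
agent.register(ProviderProfile("provider-c", {"OTHER_MODEL"}, {"PCB"}))
matches = agent.match("AV_2011", "PCB carry miss error")
```

Only "provider-a" is returned: "provider-b" supports the model but shares no fault keywords, and "provider-c" does not support the model.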



Overview

(Figure: customers' client Web browsers connect via the Internet to the Web Service Requester and the Matchmaking Agent; manufacturers' Web Service Providers draw on Machine Service Ontologies built from their Customer Service Databases.)



Customer Service Database

The customer service database contains 9000 service records. Each record consists of fault-condition and checkpoint information.

The fault-condition contains the service engineer’s description of the machine fault. The checkpoint information indicates the suggested actions to be carried out to repair the machine for the fault-condition reported by the customer.



Customer Service Database: example record

Fault-condition: 3008 PCB CARRY MISS ERROR. PCB WAS NOT TRANSFERRED BY THE CARRIER DURING LOADING BUT STAYED AT THE DETECTION POSITION OF PCB DETECTION SENSOR 2.

Checkpoint group: AVF_CHK003

Priority | Checkpoint description | Help file
1 | CONFIRM WHETHER THE CARRY GUIDE PINS ARE IN LINE WITH PCB. | AVF_CHK007-1.GIF
2 | CONFIRM WHETHER THE PCB IS IN CORRECT DIRECTION. | AVF_CHK007-2.GIF
3 | CONFIRM THE POSITION OF THE GUIDE LOWER LIMIT SENSOR. (I/O 0165) | AVF_CHK007-3.GIF
4 | CONFIRM THE TIMING FOR PCB 2 DETECT SENSOR. | AVF_CHK007-4.GIF



Machine Service Ontology Generation

Apply FOGF to obtain the Fuzzy Fault Concept Lattice → Fault Concept Hierarchy → Machine Service Ontology

Any fault
{"Anvil"}  {"Drive"}  {"Cutter"}  {"Component"}
{"Anvil", "Joint", "CannotEngage"}
{"Anvil", "Shaky", "Unit"}
{"Anvil", "Drive", "CannotOpen", "Pitch"}
{"Cutter", "Drive", "CannotOpen", "Axis"}
{"Cutter", "Component", "Cut", "Insertion"}
{"Component", "Float", "PCB"}

Part of the Fault Concept Hierarchy of the machine model AV_2011



The generation process creates classes, relations and instances for the service ontology. The machine fault service knowledge stored in the Customer Service Database is known as non-taxonomy knowledge, whereas the machine fault hierarchy knowledge from the Fault Concept Hierarchy is called taxonomy knowledge. These two types of knowledge are combined to form the Machine Service Ontology.
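Combining the two kinds of knowledge can be sketched as building one set of (subject, predicate, object) triples. Taxonomy links come from the fault hierarchy; non-taxonomy links come from service records, using the relation names occur_on, belong_to and inspect_to that appear in the OWL listing of the Machine Service Ontology. The record field names and the pairing of fault 3008 with a particular cluster and machine are hypothetical, and membership values are omitted for brevity.

```python
def build_service_triples(fault_hierarchy, service_records):
    """Combine taxonomy and non-taxonomy knowledge into (subject, predicate, object) triples.

    fault_hierarchy: {fault cluster: [parent clusters]}       (taxonomy knowledge)
    service_records: list of dicts with hypothetical keys 'fault', 'machine',
    'cluster' and 'checkpoints'                               (non-taxonomy knowledge)
    """
    triples = set()
    for cluster, parents in fault_hierarchy.items():
        for parent in parents:
            triples.add((cluster, "subClassOf", parent))
    for record in service_records:
        triples.add((record["fault"], "occur_on", record["machine"]))
        triples.add((record["fault"], "belong_to", record["cluster"]))
        for checkpoint in record["checkpoints"]:
            triples.add((record["fault"], "inspect_to", checkpoint))
    return triples


hierarchy = {"Machine_Fault_Cluster_3": ["Machine_Fault_Cluster_1", "Machine_Fault_Cluster_2"]}
records = [{
    "fault": "3008 PCB CARRY MISS ERROR",
    "machine": "AV_2011",                    # hypothetical pairing for illustration
    "cluster": "Machine_Fault_Cluster_3",    # hypothetical cluster assignment
    "checkpoints": ["AVF_CHK007-1", "AVF_CHK007-2"],
}]
triples = build_service_triples(hierarchy, records)
```

The triples map directly onto OWL classes, `rdfs:subClassOf` links and object-property assertions.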



Machine Service Ontology in OWL

<?xml version="1.0"?>
<rdf:RDF
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:daml="http://www.w3.org/2001/10/daml+oil#">
  <owl:Ontology rdf:about="">
    <owl:versionInfo>v 1.0 2004-12-07 19:06:40</owl:versionInfo>
    <rdfs:label>Machine Service Ontology</rdfs:label>
  </owl:Ontology>
  <owl:Class rdf:ID="Machine"/>
  <owl:Class rdf:ID="Check_point"/>
  <owl:Class rdf:ID="Machine_Fault_Cluster"/>
  <!-- … -->
  <owl:Class rdf:ID="Machine_Fault_Cluster_1">
    <rdfs:label>Anvil</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Machine_Fault_Cluster"/>
  </owl:Class>
  <owl:DatatypeProperty rdf:ID="Anvil">
    <rdfs:domain rdf:resource="#Machine_Fault_Cluster_1"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#float"/>
  </owl:DatatypeProperty>
  <owl:Class rdf:ID="Machine_Fault_Cluster_2">
    <rdfs:label>Cutter</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Machine_Fault_Cluster"/>
  </owl:Class>
  <owl:DatatypeProperty rdf:ID="Cutter">
    <rdfs:domain rdf:resource="#Machine_Fault_Cluster_2"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#float"/>
  </owl:DatatypeProperty>
  <owl:Class rdf:ID="Machine_Fault_Cluster_3">
    <rdfs:label>Anvil_Cutter</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Machine_Fault_Cluster_1"/>
    <rdfs:subClassOf rdf:resource="#Machine_Fault_Cluster_2"/>
  </owl:Class>
  <!-- … -->
  <owl:Class rdf:ID="Machine_Fault"/>
  <owl:ObjectProperty rdf:ID="occur_on">
    <rdfs:domain rdf:resource="#Machine_Fault"/>
    <rdfs:range rdf:resource="#Machine"/>
  </owl:ObjectProperty>
  <owl:ObjectProperty rdf:ID="inspect_to">
    <rdfs:domain rdf:resource="#Machine_Fault"/>
    <rdfs:range rdf:resource="#Check_point"/>
  </owl:ObjectProperty>
  <owl:ObjectProperty rdf:ID="belong_to">
    <rdfs:domain rdf:resource="#Machine_Fault"/>
    <rdfs:range rdf:resource="#Machine_Fault_Cluster"/>
  </owl:ObjectProperty>
</rdf:RDF>


6. Semantic Helpdesk Application Experiments

The data stored in the database was divided into 10 subsets. Each subset was used in turn as the testing set, while the others were used for conceptual clustering.

Keywords in the fault conditions of each testing set were extracted and fuzzified to form fuzzy test queries.

To verify whether fuzzy queries improve retrieval performance, the extracted keywords were also used without membership values, as crisp queries, for comparison.
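The experimental protocol can be sketched in Python. Two details are not given in the slides and are assumptions here: the 10 subsets are formed by interleaving records (fold i takes every tenth record starting at i), and a keyword's fuzzy membership is its frequency in the fault condition divided by the highest keyword frequency. A crisp query gives every extracted keyword membership 1.

```python
from collections import Counter


def ten_fold_splits(records, k=10):
    """Yield (training, testing) pairs; fold i holds every k-th record starting at i."""
    for i in range(k):
        testing = records[i::k]
        training = [r for j, r in enumerate(records) if j % k != i]
        yield training, testing


def crisp_query(keywords):
    """Crisp query: each extracted keyword with membership 1."""
    return {kw: 1.0 for kw in set(keywords)}


def fuzzy_query(keywords):
    """Fuzzy query: keyword frequency divided by the highest frequency (an assumed fuzzification)."""
    counts = Counter(keywords)
    top = max(counts.values(), default=1)
    return {kw: c / top for kw, c in counts.items()}


# Hypothetical keywords extracted from one fault condition.
q = fuzzy_query(["PCB", "PCB", "CARRY", "SENSOR", "PCB", "SENSOR"])
```

For this example, "PCB" gets membership 1.0, "SENSOR" 2/3 and "CARRY" 1/3, whereas the crisp query assigns 1.0 to all three.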


Faults in each machine model were manually classified into groups based on the machine component on which the fault occurred.

Retrieval accuracy is evaluated based on the number of retrieved faults that are in the same group as the query.


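6. Semantic Helpdesk Application Performance Measures

Recall, Precision and F-measure

$$\text{recall} = \frac{\text{number of fault conditions retrieved and correct}}{\text{total number of correct fault conditions}}$$

$$\text{precision} = \frac{\text{number of fault conditions retrieved and correct}}{\text{total number of fault conditions retrieved}}$$

$$F\text{-measure} = \frac{2 \times \text{recall} \times \text{precision}}{\text{recall} + \text{precision}}$$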
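For a single query, these measures can be computed from the set of retrieved fault conditions and the set of correct ones (those in the same group as the query). The fault identifiers in this example are hypothetical.

```python
def retrieval_scores(retrieved, correct):
    """Recall, precision and F-measure for one query from sets of fault conditions."""
    retrieved, correct = set(retrieved), set(correct)
    hits = len(retrieved & correct)                 # retrieved and correct
    recall = hits / len(correct) if correct else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    f = 2 * recall * precision / (recall + precision) if hits else 0.0
    return recall, precision, f


r, p, f = retrieval_scores({"f1", "f2", "f3", "f4"}, {"f1", "f2", "f5"})
```

Here 2 of the 4 retrieved faults are correct and 2 of the 3 correct faults are retrieved, so recall is 2/3, precision is 0.5, and the F-measure is 4/7.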



Retrieval Performance

(Figure: recall, precision and F-measure of crisp and fuzzy queries versus confidence threshold, 0 to 0.9.)


6. Semantic Helpdesk Application Performance Comparison

Retrieval accuracy is compared with four other techniques:

• Two variations of the k-nearest neighbor (kNN) technique. The first (kNN1) performs retrieval based on the normalized Euclidean distance between vectors; the second (kNN2) uses a fuzzy-trigram technique.

• Two kinds of artificial neural networks (ANN): the supervised learning vector quantization (LVQ3) neural network and the unsupervised Self-Organizing Map (SOM).
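The kNN1 baseline can be sketched as nearest-neighbor retrieval over keyword vectors. Here "normalized" is interpreted as scaling each vector to unit Euclidean length before computing distances. This interpretation and the toy vectors and labels are assumptions, not the configuration used in the experiments.

```python
import math


def normalize(vector):
    """Scale a vector to unit Euclidean length (assumed meaning of 'normalized')."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else list(vector)


def knn_retrieve(query, cases, k=3):
    """Return the labels of the k stored cases nearest to the query.

    cases: list of (keyword vector, label) pairs; distance is Euclidean
    between unit-normalized vectors.
    """
    q = normalize(query)
    ranked = sorted(cases, key=lambda case: math.dist(q, normalize(case[0])))
    return [label for _, label in ranked[:k]]


cases = [([2, 2, 0], "A"), ([0, 0, 1], "B"), ([1, 0, 0], "C")]
nearest = knn_retrieve([1, 1, 0], cases, k=2)
```

Normalizing first makes the distance depend on the direction of the keyword vectors rather than on their magnitude, so case "A" matches the query exactly.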


Retrieval Technique       Retrieval Accuracy
kNN1                      81.4%
kNN2                      77.6%
LVQ3                      93.2%
SOM                       90.3%
FFCA with Crisp Query     84.6%
FFCA with Fuzzy Query     93.0%
(Confidence Threshold = 0.2)

• FFCA with fuzzy query outperformed both kNN variants and SOM.
• LVQ3 performed marginally better, but it requires prior expert knowledge for training, which would be a problem when dealing with large amounts of uncertainty information.
• The proposed technique can also generate a concept hierarchy from the clusters, which is important information for generating a corresponding meaningful ontology.


7. Summary

Proposed a framework for fuzzy ontology generation with uncertainty information

FOGF consists of the following steps: Fuzzy Formal Concept Analysis, Fuzzy Conceptual Clustering, Fuzzy Ontology Generation, and Semantic Representation Conversion.


FOGF can represent uncertainty information and construct a concept hierarchy from it.

Apart from constructing the scholarly ontology from a citation database, FOGF has also been used to generate the Machine Service Ontology for the Semantic Help-Desk and a Reuters News Topic Themes Ontology.

The scholarly ontology has also been partially used to construct a Scholarly Semantic Web, a Semantic Web-based information retrieval system that supports scholarly activities in the Semantic Web environment.


References (not intended to be exhaustive)

Ontology editors
[1] http://protege.stanford.edu/
[2] S. Bechhofer, I. Horrocks, P. Patel-Schneider, and S. Tessaris, “A proposal for a description logic interface,” in Proceedings of the International Workshop on Description Logics, pp. 33-36, 1999.

Large corpora
[3] E. Morin, “Automatic acquisition of semantic relations between terms from technical corpora,” in Proceedings of the Fifth International Congress on Terminology and Knowledge Engineering (TKE-99), (Vienna, Austria), 1999.
[4] M. Hearst, “Automatic acquisition of hyponyms from large text corpora,” in Proceedings of the Fourteenth International Conference on Computational Linguistics, (France), 1992.

Knowledge base of rules
[5] P. Compton and A. Jansen, “A philosophical basis for knowledge acquisition,” Knowledge Acquisition, pp. 241-257.

Statistical approaches
[6] H. Suryanto and P. Compton, “Discovery of ontologies from knowledge bases,” in Proceedings of the 5th International Conference on Knowledge Capture (Y. Gil, M. Musen, and J. Shavlik, eds.), (Victoria, Canada), pp. 171-178, 2001.

Semi-structured schemata based on graphs
[7] A. Delteil, C. Faron, and R. Dieng, “Learning ontologies from RDF annotations,” in Proceedings of the IJCAI Workshop on Ontology Learning, (Seattle, USA), 2001.



Semi-structured schemata based on statistics
[8] C. Papatheodorou, A. Vassiliou, and B. Simon, “Discovery of ontologies for learning resources using word-based clustering,” in Proceedings of ED-MEDIA 2002, (Denver, USA), 2002.

LSD
[9] A. Doan, P. Domingos, and A. Levy, “Learning source descriptions for data integration,” in Proceedings of the Third International Workshop on the Web and Databases, pp. 81-86, 2000.

Database schema
[10] P. Johannesson, “A method for transforming relational schemas into conceptual schemas,” in Proceedings of the 10th International Conference on Data Engineering (M. Rusinkiewicz, ed.), (Houston, USA), pp. 115-122, IEEE Press, 1994.
[11] D. Rubin, M. Hewett, D. Oliver, T. Klein, and R. Altman, “Automatic data acquisition into ontologies from pharmacogenetics relational data sources using declarative object definitions and XML,” in Proceedings of the Pacific Symposium on Biocomputing (R. B. Altman, A. Dunker, L. Hunter, K. Lauderdale, and T. Klein, eds.), (Lihue, HI), 2002.

NLP
[12] D. Lonsdale, Y. Ding, D. Embley, and A. Melby, “Peppering knowledge sources with SALT: boosting conceptual content for ontology generation,” in Proceedings of the AAAI Workshop on Semantic Web Meets Language Resources, 2002.
[13] D. I. Moldovan and R. C. Girju, “An interactive tool for the rapid development of knowledge bases,” International Journal on Artificial Intelligence Tools (IJAIT), vol. 10, no. 1-2, 2001.



WordNet
[14] http://wordnet.princeton.edu/wordnet/download/

Text-to-Onto
[15] A. Maedche and S. Staab, “Ontology learning for the Semantic Web,” IEEE Intelligent Systems, Special Issue on the Semantic Web, vol. 16, no. 2, 2001.

Keyword frequencies
[16] A. Faatz and R. Steinmetz, “Ontology enrichment with texts from the WWW,” in Proceedings of Semantic Web Mining 2nd Workshop at ECML/PKDD-2002, (Helsinki, Finland), 2002.
[17] R. Navigli, P. Velardi, and A. Gangemi, “Ontology learning and its application to automated terminology translation,” IEEE Intelligent Systems, vol. 18, no. 1, 2003.

Clustering / COBWEB
[18] P. Clerkin, P. Cunningham, and C. Hayes, “Ontology discovery for the Semantic Web using hierarchical clustering,” in Proceedings of Workshop at ECML/PKDD-2001, (Germany), 2001.

Mo'K
[19] G. Bisson and C. Nedellec, “Designing clustering methods for ontology building: The Mo'K workbench,” in Proceedings of the Workshop on Ontology Learning, 14th European Conference on Artificial Intelligence, ECAI'00 (S. Staab, A. Maedche, C. Nedellec, and P. Wiemer-Hastings, eds.), (Germany), 2000.



ESKIMO
[20] S. Kampa, T. Miles-Board, and L. Carr, “Hypertext in the Semantic Web,” The ACM Conference on Hypertext and Hypermedia, pp. 237-238, 2001.

Scholarly Ontology Project
[21] V. Uren, S. Shum, C. Mancini, and G. Li, “Modelling naturalistic argumentation in research literatures,” in Proceedings of the 4th Workshop on Computational Models of Natural Argument, (Valencia, Spain), 2004.

OAI
[22] http://www.openarchives.org/

FCA
[23] B. Ganter and R. Wille, Formal Concept Analysis: Mathematical Foundations.
