
Using Graphical Ontologies for

Searching The (Semantic) Web

by

Leendert W. M. Wienhofen

University College of Østfold

August 2003

This Master's thesis was written as part of the Computer Science Master's programme at the University College of Østfold, Norway.
© 2003 Leendert W. M. Wienhofen
Disclaimer: All trademarks and copyrights mentioned in this thesis are the property of their respective owners.



Preface
The world of knowledge management opened up to me at the end of 1999, when I was looking for a graduation subject for my Bachelor of Engineering degree at Hogeschool Arnhem en Nijmegen, University of Professional Education, The Netherlands. I found a company (a medium-sized accountancy and advisory firm in the Netherlands) with an assignment that roughly said: “Find out what knowledge management is, how our company can benefit from it, and implement a pilot system.” After a short introduction to knowledge management from the IT manager of the company in question, I decided that I really wanted to do this. It turned out to be an assignment for two people, so I asked my classmate and good friend Marc Dorsman what he thought about it. He said yes almost immediately, and we started on the assignment in January 2000. Now we are getting to the relevant part: relevance. We learned a lot about knowledge management and how to implement a pilot for the firm, yet we encountered two main (related) problems: relevance and ambiguity. Information relevant for the IT department may turn out to be completely irrelevant for the accounting department, or the other way around. Also, words like “implementing” turned out to have different meanings for the different departments. We did not find a solution to these problems in the 5 months we had to complete the assignment, but my interest was definitely awakened. I moved to Norway and met someone (my current manager, Dr. Robert Engels) who introduced me to CognIT a.s, a company that carries the motto “Serving the Knowledge Company”. My interest was caught right away, and to my surprise CognIT could actually provide a solution to the main problems I had encountered in my Bachelor assignment! They manage to put words in context by using ontologies, something that was completely new to me at that time.
After explaining what I had been working on, I was offered a job, and I have been working for the R&D department of CognIT a.s, Norway since November 2000. While working with knowledge management and intelligent systems, I started thinking about new approaches to searching. I applied for the Master's degree programme at the University College of Østfold in Halden, Norway. The result of many late nights of reading and writing now lies in front of you; I wrote this thesis while employed by CognIT. I hope you find my ideas for searching the (Semantic) Web by using graphical ontologies interesting.



The thesis consists of the following chapters:

• Chapter 1: Introduction
This chapter briefly presents the background for the ideas in this thesis.

• Chapter 2: Information Retrieval
This chapter provides information about the research field of Information Retrieval (IR). It explains how to define quality for IR efforts by using precision, recall and the F-measure.

• Chapter 3: Search methods for the Internet
This chapter describes methods for searching the Internet with keyword-based search engines and with natural language. It also includes a thorough introduction to CORPORUM technology, which is mentioned quite often in this thesis. A running example about the kings of Norway is introduced in this chapter, and this example is used throughout the thesis in different forms.

• Chapter 4: The Semantic Web, an introduction
As the title already points out, this chapter gives an introduction to the Semantic Web. It explains how the Semantic Web is built up and which languages are used to store its information. In addition, the “kings of Norway” example is adapted to show how searches on the Semantic Web can be carried out.

• Chapter 5: The Graphical Ontology Designer Environment
This is the most important part of the thesis, where the ideas for the Graphical Ontology Designer Environment (GODE) are proposed. First, it describes what graphical ontologies look like, how they can be created and what the GODE could look like. Different application areas for different audiences are then presented. These range from a guided search, where plain text can be used to pre-form a graphical ontology, to advanced graphical ontologies with all sorts of relation types that expert users build from scratch. Finally, the chapter discusses whether these ideas will work, and whether building such graphical ontologies takes too much time compared with the expected results.

• Chapter 6: Using graphical ontologies as a plug-in for existing search technologies
The GODE does not have to be a stand-alone tool; it can also serve as a plug-in for existing search technologies. In this chapter we show some ideas on how to include the functionality of the GODE in existing tools like CORPORUM Knowledge Server. We also present how it may serve as a plug-in for RQL.

• Chapter 7: Conclusions and further work
This chapter addresses the findings presented in this thesis and gives directions for further work.

• Chapter 8: References
This chapter provides a list of the literature referred to in this thesis.

• Appendix A
CORPORUM Knowledge Server screenshots.

• Appendix B
RQL examples with answers.

• Appendix C
RDF and XML output from CORPORUM OntoExtract.



Acknowledgements
I would like to thank associate professor Roland Olsson, my supervisor at Østfold University College, for his interesting feedback throughout the writing of this thesis. I would also like to thank my colleagues at CognIT a.s, Norway for their support. They helped by reading, commenting, checking language, supplying software and taking part in interesting discussions, as well as through general inspiration and other help. In alphabetical order: Dr. Bernt A. Bremdal, Dr. Robert Engels, MSc Fred Johansen, Cand.Phil. Till C. Lech and MSc Christophe Spaggiari. Furthermore, I would like to thank one of my best friends, Nico van Rijn, who studies Business Communication at Nijmegen University, the Netherlands. He invested time in helping me express myself correctly in English. Last but not least, I would like to thank my girlfriend, who does not wish to have her name written here, for putting up with me in busy, sometimes stressful times.
Leendert W.M. Wienhofen
Oslo, Norway
August 2003



Abstract
The main focus of this thesis is on information retrieval by means of graphical ontologies. This is presented as a roadmap towards a new method of searching, based on aspects of currently available technology. All the technologies used as a basis for the roadmap are introduced in this thesis. First, a short introduction to the different techniques and the field of work is given. This is accompanied by examples of different ways of retrieving information from the Internet and the Semantic Web. The main part of the thesis presents ideas for a Graphical Ontology Designer Environment (GODE), along with ideas on how to make this technology as accessible as keyword search is today. GODE is an environment that will be suitable for searching the (Semantic) Web by means of graphical ontologies. Several difficulty levels are identified to make sure that everybody can benefit from this approach in different situations. Application areas are discussed for both the simple and the advanced version of the GODE. The thesis also discusses using this technology as a plug-in for existing information retrieval tools, for searching both the WWW and the Semantic Web. Finally, possible pitfalls that may prevent this new approach from becoming popular are discussed.



Contents
Preface
Acknowledgements
Abstract
1 Introduction
2 Information retrieval, a definition
   2.1 Defining quality for IR: Precision/Recall
   2.2 Summary
3 Search methods for the Internet
   3.1 Simple keyword search
   3.2 Advanced keyword search
   3.3 Natural language search
      3.3.1 AskJeeves
      3.3.2 CORPORUM™ technology
      3.3.3 CORPORUM™ Knowledge Server
   3.4 Summary
4 The Semantic Web, an introduction
   4.1 Ontologies
   4.2 RDF/RDFS
   4.3 Querying The Semantic Web
      4.3.1 RQL
   4.4 Summary
5 The Graphical Ontology Designer Environment
   5.1 Representing an ontology
      5.1.1 CORPORUM OntoExtract
      5.1.2 CCA viewer
   5.2 Existing Ontology development tools
   5.3 GODE GUI and functionality proposal
   5.4 Simple search
      5.4.1 Guided search
      5.4.2 Building your own graphical ontology
      5.4.3 Application area of simple graphical ontologies
      5.4.4 Intended audience for simple graphical search
   5.5 Advanced search
      5.5.1 Application area of advanced graphical ontologies
      5.5.2 Intended audience for advanced graphical search
   5.6 Possible traps
      5.6.1 Invested time vs. Relevance of results
   5.7 Summary
6 Using graphical ontologies as a plug-in for existing search technologies
   6.1 CORPORUM
   6.2 RQL
      6.2.1 Simple graphical search
      6.2.2 Advanced graphical search
   6.3 Summary
7 Conclusions and further work
8 References
Appendix A
Appendix B
Appendix C
   RDF output
   XML output



1 Introduction
Several approaches exist for finding information on the Internet, of which keyword-based search is the most widely used. Keyword-based search is very easy to understand and to use, yet it has one huge drawback: it returns many irrelevant results. Search-engine creators try a number of different approaches to limit the number of irrelevant hits. Google, for example, relies partly on the number of web sites that link to a web site containing the entered keywords. It thereby assumes that the people who link to a certain resource know what they are linking to. Other search engines seem to simply list every web site that contains the entered keywords, placing the web site with the most occurrences of these words on top. Google's approach seems to be much more effective, although it produces numerous irrelevant results as well. This is mainly due to the lack of semantics: Google does not know how words relate to each other, and therefore it cannot tell which word is relevant in which context. Adding semantics (or context, if you wish) to the search input would eliminate many irrelevant results. Tim Berners-Lee (sometimes called “the inventor of the WWW”) introduced the idea of a Semantic Web some years ago. In such a Web, all stored information is enriched with machine-understandable semantics¹. This thesis presents a roadmap for a technology (the GODE²) that makes it possible to search both the Internet and the Semantic Web by using graphical ontologies³. The main idea is that adding semantic information to a query makes the results more relevant, and hopefully only relevant results will be returned. This technology will use semantically related concepts, represented in an ontology, as a query. The greatest benefit will therefore be achieved when searching the Semantic Web, where information is stored together with machine-understandable semantics.
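The two ranking strategies contrasted above can be made concrete with a toy example. The sketch below is not Google's algorithm; real link analysis such as PageRank weighs links recursively. It assumes a hypothetical five-page mini-web and ranks the pages that match the keywords in one of two ways: by how often the keywords occur, or by how many other pages link to each page. All page names, texts and links are invented for illustration.

```python
"""Toy comparison of keyword-frequency ranking and in-link-count ranking.

Hypothetical data; a crude stand-in for link analysis, not any real engine.
"""
from collections import Counter

# Hypothetical mini-web: page id -> (page text, ids of pages it links to)
PAGES = {
    "spam": ("king norway king norway king norway buy now", []),
    "history": ("the kings of norway from 872 to the present", []),
    "blog": ("my trip to norway", ["history"]),
    "school": ("useful links about norway", ["history"]),
    "travel": ("visit norway", ["history", "spam"]),
}


def term_count(text: str, keywords: list[str]) -> int:
    """Total number of occurrences of the query keywords in the text."""
    words = text.lower().split()
    return sum(words.count(k.lower()) for k in keywords)


def in_link_counts(pages: dict) -> Counter:
    """Number of distinct other pages linking to each page."""
    counts: Counter = Counter()
    for source, (_, links) in pages.items():
        for target in set(links):
            if target != source:
                counts[target] += 1
    return counts


def rank(pages: dict, keywords: list[str], by_links: bool) -> list[str]:
    """Ids of pages containing at least one keyword, best first."""
    scores = {p: term_count(text, keywords) for p, (text, _) in pages.items()}
    hits = [p for p, s in scores.items() if s > 0]
    if by_links:
        links = in_link_counts(pages)
        # Primary key: number of in-links; ties broken by keyword frequency.
        return sorted(hits, key=lambda p: (links[p], scores[p]), reverse=True)
    return sorted(hits, key=lambda p: scores[p], reverse=True)


print(rank(PAGES, ["king", "norway"], by_links=False)[0])  # spam
print(rank(PAGES, ["king", "norway"], by_links=True)[0])   # history
```

The in-link heuristic pushes the keyword-stuffed page down. However, neither strategy knows that “kings” and “king” are related, or what a page is actually about, which is the lack of semantics described above.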
For readability reasons, we will address people as “he” rather than “he/she” when gender is not specified. The author works for the R&D department of CognIT a.s. Some focus is therefore placed on the technology and products of this company, especially since these were available during the writing process.

1 See chapter 4 for more information.
2 See chapter 5 for more information.
3 See paragraph 4.1 for an introduction to ontologies.



2 Information retrieval, a definition
The focus of this thesis is on how to retrieve relevant information from information sources by using graphical ontologies. This chapter presents the main ideas behind IR as well as a method for defining the quality of the retrieved information. Information Retrieval is defined as: “The study of systems for indexing, searching, and recalling data, particularly text or other unstructured forms.” [Wei97] People store vast amounts of information, be it in books, in databases, or on the Internet. How information is stored is not really the main issue; finding this information when it is needed, however, is. Why bother to store a piece of information if you cannot find it again? To reduce the problem of not being able to find a certain resource when it is needed, several classification methods have been established. In a library, for example, the books are neatly categorized according to their contents: philosophy, medicine, geography, romance, crime, etc. Also, when unsure where to find a book on a certain topic, one can ask a librarian for help and be presented with some suggestions. This of course requires the librarian to know the books available in the library and their contents, which is a daunting task even for smaller libraries. On the Internet, information is scattered around, mostly unorganised and unsorted. Also, no librarian is available to assist you in your search for information. Or is there? Search engines can of course help you a bit in your quest for information. Their success rate varies, however, because a search engine cannot take part in your search and ask for more detailed information, as a librarian can. You have to rely on typing the correct keywords in order to be pointed towards the piece of information you are looking for. The same scenario applies in organisations (with an Intranet).
Suppose you cannot find references to a project very similar to the one you are planning to launch. Much time can then be lost simply because the same work is done twice. Re-inventing the wheel is one of the issues that can be prevented with proper Knowledge Management, which includes good methods for IR. Since most company information is nowadays stored digitally, we must be able to access this information to avoid wasting resources on double work. Effective IR can reduce this problem, although eliminating it is virtually impossible with current technology.



The idea behind IR is really very simple: we have a set of documents and a need for information, which may or may not be present in this document set. By “asking” the document set a question, we are presented with a result set from which we have to pick the relevant documents. In case of an empty result set, either the question we asked was not good enough, or the information we were looking for does not exist. The latter is a satisfying answer, since one then knows he is not re-inventing the wheel. Yet, checking whether this really is the case will require more queries. Picking the relevant document is also not as easy as it looks. One would have to read through the entire document to make sure that it really is relevant. Reading just the summary is quicker, but the information that matters to the searcher may not be part of the summary. This is a time-consuming task for which most people lack either the time or the inclination. IR has not yet come so far that computers can read and understand documents. Some programs can perhaps extract the most important parts of a document. It will probably take many years, however, before software understands text as well as humans do (including humour and irony). Relevance is a central issue in IR, since people look for answers (in the form of documents) that are relevant to their questions. When retrieving information, we wish to retrieve as few irrelevant and as many relevant documents as possible. One way of defining the quality of an IR system is the precision and recall method (see paragraph 2.1).
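The “ask the document set a question” step above is commonly implemented on top of an inverted index, which maps each word to the documents that contain it. The following minimal sketch assumes a hypothetical three-document collection and a query semantics in which every query word must occur. It shows that an empty result set alone cannot tell us whether the question was poor or the information simply does not exist.

```python
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: lower-cased word -> ids of the documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def ask(index: dict[str, set[str]], question: str) -> set[str]:
    """Result set: ids of documents that contain every word of the question."""
    postings = [index.get(word, set()) for word in question.lower().split()]
    return set.intersection(*postings) if postings else set()


# Hypothetical intranet documents.
DOCS = {
    "d1": "project plan for the new intranet",
    "d2": "minutes of the intranet project meeting",
    "d3": "annual accounts report",
}
index = build_index(DOCS)
print(sorted(ask(index, "intranet project")))      # ['d1', 'd2']
print(sorted(ask(index, "knowledge management")))  # []
```

Picking the relevant documents from a non-empty result set is still left to the user. The index only records which words occur where, not what the documents mean.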

2.1 Defining quality for IR: Precision/Recall
The standard measures for IR are recall and precision [DiSt]. When searching and retrieving documents from a large collection, there are four groups of documents:

- Relevant documents that were retrieved by the system (A)
- Irrelevant documents that were retrieved by the system (B)
- Relevant documents that the system missed (C)
- Irrelevant documents that were not retrieved by the system (D)

                 Relevant   Irrelevant
Retrieved           A            B
Not retrieved       C            D

$$\text{Precision} = \frac{A}{A+B} \qquad \text{Recall} = \frac{A}{A+C}$$

The higher the precision and recall values, the better the IR effort has been. An ideal IR effort returns all relevant documents and no irrelevant ones, so both precision and recall are 100%. Unfortunately, no IR tool can do this today, and it will probably take quite some years before values near 100% are reached.

Figure 1 shows a collection of 1,000,000 documents, of which 100 are relevant (A + C). A search retrieves 1,000 documents (A + B), of which 30 are relevant (A). Hence B = 970 and C = 70, and D consists of the remaining 998,930 documents.

Figure 1: Precision/recall

In this example, the precision is 3%. This is the number of relevant retrieved documents divided by the number of retrieved documents: A/(A+B) = 30/1,000. The recall is 30%. This is the number of relevant retrieved documents divided by the number of relevant documents: A/(A+C) = 30/100. To combine recall and precision in a single effectiveness measure, Van Rijsbergen introduced the F-measure in 1979 [Rij79]:

$$F = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

The F-measure is the harmonic mean of precision and recall. For our example:

$$F = \frac{2 \cdot 0.03 \cdot 0.30}{0.03 + 0.30} \approx 0.0545 = 5.45\%$$

The higher this value, the better the IR effort.
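The calculation above can be checked with a few lines of code. The helper names below are hypothetical. They compute precision, recall and the F-measure from the counts in the example. The code returns 0 when a denominator is 0; that is an assumption of this sketch, since the measures are undefined in that case.

```python
def precision(relevant_retrieved: int, retrieved: int) -> float:
    """Relevant retrieved documents divided by all retrieved documents."""
    return relevant_retrieved / retrieved if retrieved else 0.0


def recall(relevant_retrieved: int, relevant: int) -> float:
    """Relevant retrieved documents divided by all relevant documents."""
    return relevant_retrieved / relevant if relevant else 0.0


def f_measure(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    return 2 * p * r / (p + r) if p + r else 0.0


# Example from the text: 1,000 documents retrieved, 100 relevant in total,
# 30 of the retrieved documents relevant.
p = precision(30, 1000)
r = recall(30, 100)
f = f_measure(p, r)
print(f"precision={p:.2%} recall={r:.2%} F={f:.2%}")
# precision=3.00% recall=30.00% F=5.45%
```

The harmonic mean is dominated by the smaller of its two inputs. The low precision therefore keeps F far below the arithmetic mean of 16.5%, despite the 30% recall.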

2.2 Summary
Information Retrieval (IR) is “the study of systems for indexing, searching, and recalling data, particularly text or other unstructured forms” [Wei97]. The quality of an IR effort is expressed by precision and recall, as well as the F-measure, and the higher the values of these measures, the better the IR effort. Unfortunately, today's systems cannot understand resources (texts/documents) as well as humans do. It is therefore quite possible that relevant documents are not retrieved, while the IR software considers non-relevant documents to be relevant. By finding a better way of querying, precision, recall and the F-measure should increase.




3 Search methods for the Internet
Many companies have developed tools for searching the Internet, usually with the main focus on the WWW. Paragraphs 3.1 and 3.2 deal with some of the most popular search tools, and paragraph 3.3 highlights some lesser-known ones. OneStat is a company that develops real-time website analysis software. According to a press release it issued on 15 April 2002 [One02], the most popular search engines were:
1. Google⁴ (46.5%)
2. Yahoo⁵ (20.6%)
3. MSN Search⁶ (7.8%)
4. Altavista⁷ (6.4%)
5. Terra Lycos⁸ (4.6%)
6. Ixquick⁹ (2.4%)
7. AOL Search¹⁰ (1.6%)
Each percentage represents the share of traffic from search engines to websites that use OneStat's web statistics technology. As a running example for this thesis, the piece of information we are looking for is a list of the kings who ruled Norway in the period 1850-1950. Lists that show more than the 4 kings mentioned below are only considered partly correct: they do supply the needed information, but one still needs to look for it. The correct answer should include the notion that Norway was in a union with Sweden until 1905, so the king of Sweden was also king of Norway. The correct answer is (retrieved from [KoN]):

King         Reign       Title
Oscar I      1844-1859   King of Sweden and Norway
Carl XV¹¹    1859-1872   King of Sweden and Norway
Oscar II     1872-1905   King of Sweden and Norway
Haakon VII   1905-1957   King of Norway

This chapter is meant to reveal some of the shortcomings that arise when searching the WWW with currently available techniques. It will be used as a reference for the next chapters.

4 http://www.google.com
5 http://www.yahoo.com
6 http://www.msn.com
7 http://www.altavista.com
8 http://www.lycos.com
9 http://www.ixquick.com
10 http://www.aol.com
11 The heading on [KoN] is incorrect: it says Carl IV instead of Carl XV, although the name is given correctly later in the paragraph.

3.1 Simple keyword search
To illustrate some of the capabilities and shortcomings of today's search engines, we will show what the top two search engines have to offer. Simple keyword search is the type of search that nearly all Internet users use frequently. You simply type in a couple of words and start skimming through the results, hoping to find the resource you were looking for. Usually this type of search leads to more frustration than clarification because of the sheer volume of returned results. Depending on the search engine used, the words entered are interpreted either as a Boolean 'AND' search or as a Boolean 'OR' search. In an 'AND' search with two search words, both words must occur on a web page for it to be displayed as a result. In an 'OR' search, either the first or the second word needs to be present for the page to be displayed in the result list. These Boolean operators can usually be forced by simply writing 'AND' or 'OR' between the two words. For example:
1) king AND Norway
2) king OR Norway
The first option returns only results (be it documents or web pages) containing both 'king' and 'Norway'. The second option returns results that include either 'king' or 'Norway' (or both, for that matter). Neither option is very efficient, because there is no guarantee that these words have anything to do with each other. A web page may, for example, describe the king of Belgium's visit to Norway, which has nothing to do with the king of Norway. (The two kings may well have met during this visit, but that is beside the point.) An intuitive starting point would be to search on the terms kings of Norway and kings of Norway list. On Google, these searches return approximately 193,000 and 43,000 hits respectively.
The same searches¹² on Yahoo return approximately 191,000 and 41,300 hits respectively.
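The difference between the two Boolean interpretations, and why neither captures meaning, can be shown on a hypothetical four-page collection. The sketch matches whole lower-cased words without stemming, so “kings” does not match “king”. This is an assumption of the sketch; real search engines may normalise word forms.

```python
# Hypothetical mini-collection: document id -> text.
DOCS = {
    "kings": "list of the kings of norway",
    "visit": "the king of belgium paid a visit to norway",
    "fjords": "fjords of norway",
    "spain": "the king of spain",
}


def terms(text: str) -> set[str]:
    """Lower-cased set of words in a text (no stemming: 'kings' != 'king')."""
    return set(text.lower().split())


def boolean_search(docs: dict[str, str], words: list[str], operator: str) -> list[str]:
    """Ids of documents containing all ('AND') or any ('OR') of the words."""
    wanted = {w.lower() for w in words}
    if operator == "AND":
        return [d for d, text in docs.items() if wanted <= terms(text)]
    if operator == "OR":
        return [d for d, text in docs.items() if wanted & terms(text)]
    raise ValueError("operator must be 'AND' or 'OR'")


print(boolean_search(DOCS, ["king", "Norway"], "AND"))
# ['visit']
print(boolean_search(DOCS, ["king", "Norway"], "OR"))
# ['kings', 'visit', 'fjords', 'spain']
```

The AND query returns only the page about the Belgian king's visit and misses the actual list of kings. The OR query returns every page. In both cases the words are matched without any notion of how they relate to each other.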

12 These searches were carried out at the beginning of July 2003. Because of the dynamic nature of the Internet, the same results cannot be guaranteed. At the beginning of May 2003, the same searches produced 159,000, 40,300, 137,000 and 33,800 hits respectively.



Search term “kings of Norway”
On Google, most of the hits on the first result page are about a book called “Heimskringla, The Chronicle of the Kings of Norway”. The 5th hit is the first relevant one, a web site about the kings of Norway. However, to reach the information we are interested in, we need to click a button. The last hit on the first page (hit number 10), titled “Reigns of the Kings of Norway”, is also relevant. However, this page only refers to the kings who reigned between 1412 and 1905 without listing them. This is because Norway was ruled by Denmark from 1412 to 1814, and by Sweden from 1814 to 1905. Both relevant hits thus lead us towards the information we are looking for, but neither presents it as a whole. On Yahoo, the result page shows 20 hits by default. To keep the comparison fair, only the first 10 hits are taken into consideration. Just as with Google, most of the hits on Yahoo's first result page are about the book “Heimskringla, The Chronicle of the Kings of Norway”. Hit number 4 is the same as Google's hit number 5, and hit number 9 is the same as Google's hit number 10. See the comments on these hits above.
Search term “kings of Norway list”
Adding 'list' to the search criteria reduces the number of hits to approximately 22% of the original number (about 25% in the May 2003 searches). However, the quality did not appear to improve. On Google, all hits except the 2nd describe books. The 2nd hit has a very useful list of all Norwegian kings from 872 to the present, which clearly notes the unions with Sweden (and with Denmark, for that matter) in certain eras. This is a very usable result, but it is still not exactly what we wanted. We only want the rulers between 1850 and 1950, so the kings who ruled from 872 to 1850 and from 1950 to the present can be considered noise, apart from the small overlaps 1844-1850 and 1950-1957.
On Yahoo, the results are the same as on Google: the 2nd hit is identical to Google's 2nd hit, and the rest of the hits are about books. Expanding the search to get the list given at the beginning of this chapter, and only this list, will not be easy. Adding the period 1850-1950 will most likely only cause irrelevant web sites to appear. It is impossible to declare a range of years in a simple search, so 1850 and 1950 will each be treated as a separate term. Neither year is significant for the kings themselves, since no reign began or ended in either year. Unfortunately, exact precision and recall values cannot be calculated for this type of search. Recall requires the total number of relevant pages on the Internet, which we do not know. Exact precision would require checking every one of the tens of thousands of retrieved pages. What we can conclude is that the number of retrieved pages is very high, while few of the hits that have been checked are relevant. This is one of the main weaknesses of this type of searching. The next paragraph shows options that the search engines offer to narrow down the results.
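For contrast, consider reigns stored as structured data rather than free text. The period 1850-1950 can then be handled as a real interval, and a reign qualifies if it overlaps that interval. The sketch assumes such records exist; here they are typed in by hand. They contain the four kings of the correct answer above, plus Carl XIV Johan (1818-1844) and Olav V (1957-1991) as rulers just outside the period. Providing this kind of machine-readable structure is what the Semantic Web aims at.

```python
from typing import NamedTuple


class Reign(NamedTuple):
    name: str
    start: int  # first year of the reign
    end: int    # last year of the reign
    title: str


# Hand-entered records: the four kings of the correct answer, plus two
# rulers just outside the period (Carl XIV Johan and Olav V) as distractors.
REIGNS = [
    Reign("Carl XIV Johan", 1818, 1844, "King of Sweden and Norway"),
    Reign("Oscar I", 1844, 1859, "King of Sweden and Norway"),
    Reign("Carl XV", 1859, 1872, "King of Sweden and Norway"),
    Reign("Oscar II", 1872, 1905, "King of Sweden and Norway"),
    Reign("Haakon VII", 1905, 1957, "King of Norway"),
    Reign("Olav V", 1957, 1991, "King of Norway"),
]


def ruled_between(reigns: list[Reign], first: int, last: int) -> list[str]:
    """Names of rulers whose reign overlaps the period [first, last]."""
    return [r.name for r in reigns if r.start <= last and r.end >= first]


print(ruled_between(REIGNS, 1850, 1950))
# ['Oscar I', 'Carl XV', 'Oscar II', 'Haakon VII']
```

Two periods overlap exactly when each one starts no later than the other ends. A keyword engine cannot express this condition when it treats 1850 and 1950 as independent terms.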



3.2 Advanced keyword search
Each search engine has its own type of advanced search. As in the previous paragraph, only the top two search engines will be highlighted, to indicate what can be achieved by using advanced search methods instead of the simple search. The advanced search options of the two search engines are very similar. In fact, the only option that Yahoo does not provide is searching for files stored in a specific file format. All options are listed in Table 1 and are largely self-explanatory.

#   Advanced search option (shows results which are/have)             Google  Yahoo
1   All of the words                                                  √       √
2   An exact phrase                                                   √       √
3   Any of the words                                                  √       √
4   Without the words                                                 √       √
5   Written in a (one or more) specific language                      √       √
6   Stored in a specific file format                                  √       X
7   Updated in a certain time period                                  √       √
8   Words located at a certain point on the web-page/document         √       √
9   Located on a specific site or domain                              √       √
10  Not filtered out because of explicit content                      √       √
11  Similar to a certain web-page                                     √       √
12  A link to a certain web-page                                      √       √
13  Are about products that are for sale or fall in another category  √       √

Table 1: Advanced search options of Google and Yahoo (√ = present, X = not present)

The main problem we saw in the results of the simple search is the presence of book references, which are irrelevant for us. Searching the WWW means balancing the query so that advertisements (for example for books) are filtered out while useful information is not lost. This can be quite tricky, because one never knows how pages are structured, or whether a page/document covers more than one topic. An easy way of getting rid of book references might seem to be excluding every result that contains the term “ISBN”. However, this approach carries a potential danger. The page/document that actually supplies the information we are looking for might mention that the list is also contained in book X, with ISBN 123456789-0. In this case we would actually lose valuable information! A quick test shows that this approach is not effective anyway, since the web sites that mention the book do not mention “ISBN”. Again, we do not know the number of relevant pages that exist on the Internet. We can say, however, that the more detailed our search is, the fewer pages are returned. Relevant pages may well be omitted too, which means that recall, and thus the quality of the result set, could actually decrease.
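The risk of the “ISBN” exclusion can be simulated on a few hypothetical hits; the titles, snippets and relevance judgements are invented for illustration. The filter removes the one relevant page that happens to mention an ISBN. The book advertisement, which does not mention “ISBN”, survives. As a result, the share of relevant hits actually drops.

```python
# Hypothetical first-page hits: (title, snippet, relevant to our question?)
HITS: list[tuple[str, str, bool]] = [
    ("Heimskringla", "The Chronicle of the Kings of Norway, hardcover edition", False),
    ("Kings list", "Kings of Norway 1850-1950, also printed in book X, ISBN 123456789-0", True),
    ("Reigns", "Reigns of the Kings of Norway", True),
]


def exclude(hits: list[tuple[str, str, bool]], word: str) -> list[tuple[str, str, bool]]:
    """Drop every hit whose snippet contains the word (case-insensitive)."""
    return [h for h in hits if word.lower() not in h[1].lower()]


def relevant_share(hits: list[tuple[str, str, bool]]) -> float:
    """Fraction of hits marked relevant (precision over this small sample)."""
    return sum(h[2] for h in hits) / len(hits) if hits else 0.0


filtered = exclude(HITS, "ISBN")
print([h[0] for h in filtered])
# ['Heimskringla', 'Reigns']
print(f"{relevant_share(HITS):.2f} -> {relevant_share(filtered):.2f}")
# 0.67 -> 0.50
```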



Besides excluding certain words, we could try two other options to narrow down the result set. In our case, options 5 and 10 (see Table 1) might be applicable. By indicating that we only want results written in English, we eliminate the possibility that a word has different meanings in different languages, which improves the result set. By indicating that we do not wish to receive results with explicit content, we may narrow down the result set further (even though it is quite unlikely that providers of explicit content use the keywords king and Norway to attract or fool more customers). In other cases, many of the other options mentioned may prove very useful. We will not go deeper into the advanced capabilities of these search engines, as this adds no further valuable information for the graphical ontology search. Readers interested in these capabilities can visit Google or Yahoo on the web and navigate to the advanced options.



3.3 Natural language search

Now that we have seen the possibilities and shortcomings of the simple search, and (in a very short overview) the advanced search features of both Google and Yahoo, we will show what natural language search tools have to offer. An example of a natural language search engine is AskJeeves13. It gives the user the option to ask a question, or to type a sentence, to carry out a search. Another product capable of carrying out natural language searches is CORPORUM™ Knowledge Server14. CORPORUM™ is a generic name for a group of products supplied by CognIT a.s, Norway15, that help users access and sort relevant information.

3.3.1 AskJeeves

AskJeeves lets the user search the Internet by writing a question, so instead of trying to think of a smart way of combining search terms, we can simply write a question the way we are used to in day-to-day communication, which seems much more convenient. The tested query is as follows: Who were the kings of Norway in between the year 1850 and 1950? However, the time period seems to confuse AskJeeves: none of the hits on the first page provide the information we are looking for. Instead, the information presented is about things that happened in the years 1850 and 1950; in other words, the range is not detected as a range. Since AskJeeves also supports sentences that are not formed as a question, we also test the imperative form (since we already found out that ranges are not detected, we try without a range): List the kings of Norway! This did not help much either; the 7th hit links to a page that in turn links to a full list of the kings of Norway (the same list as mentioned earlier in the simple search paragraph). The main problem with all of the search options we have seen so far is the lack of semantics. Words that are part of the query are not seen in relation to each other, but as separate entities. This causes the result page to be ‘polluted’ with irrelevant information (each of these results includes a combination of the words ‘kings’, ‘Norway’, ‘list’, etc., but there is not necessarily a relation between these terms).
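The lack of semantics can be illustrated with a deliberately naive keyword scorer. This is a sketch, not how AskJeeves or any other engine actually ranks pages: it treats the query as a bag of independent words (the function keyword_score and both example texts are made up for illustration), so a relevant list of kings and an unrelated shop page receive the same score.

```python
import re


def keyword_score(query, document):
    """Fraction of query words that occur anywhere in the document.

    Words are treated as independent tokens; their order and relations are ignored.
    """
    terms = set(re.findall(r"[a-z0-9]+", query.lower()))
    words = set(re.findall(r"[a-z0-9]+", document.lower()))
    return len(terms & words) / len(terms) if terms else 0.0


query = "list kings Norway"
relevant = "A list of the kings of Norway: Oscar I, Carl XV, Oscar II, Haakon VII."
unrelated = "Our shop has a list of chess kings and ships to Norway and Sweden."
print(keyword_score(query, relevant), keyword_score(query, unrelated))
# 1.0 1.0
```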

13 http://www.ask.com
14 A live demo: http://www.corporum.com
15 http://www.cognit.no



3.3.2 CORPORUM™ technology

The introduction of CORPORUM™ technology is much more elaborate than those of the previously mentioned technologies, because references to CORPORUM™ and MÍMÍR technology occur throughout this thesis. In general, CORPORUM™ uses advanced language and knowledge technology based on semantic analysis of text and cognitive models. Parts of the information presented in this paragraph are adopted from [BJ00]. The CORPORUM™ products are powered by a technology called MÍMÍR16. The MÍMÍR core is based on two basic concepts:

- A concept extraction facility. Concept extraction focuses on the semantics of a text, and is capable of finding relations between the concepts presented in the text. MÍMÍR looks at the concept behind the term; the concept extraction effort combines natural language processing and knowledge-intensive methods.
- A resonance algorithm. The resonance algorithm enables comparison between the contents of two texts: the conceptual structures of the two texts are analysed against each other. The resonance metaphor stems from the idea that the match process triggers a violent reaction if the frequency of the emitted signal (the text that is compared with the other text) is close to the natural undamped frequency of the objects themselves. In other words, if there is a good match, there is high resonance with respect to the content of the text found. The difference in amplitude determines the degree of resonance. The information model that represents the semantic content of the analysed text will yield a broad, albeit minor, response if there is some form of resonance. The set of amplitudes generated reflects the degree of match and determines the contextual and thematic aspects of a text. This means that it is possible to identify relevant sections of a document without necessarily having to refer to the document as a whole.
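MÍMÍR's resonance algorithm is not described here in enough detail to reproduce. As a loose, hypothetical stand-in for the idea of scoring sections rather than whole documents, the sketch below uses plain term-frequency vectors and cosine similarity (a standard vector-space technique, not CognIT's method) to find the section of a document that best matches a query. All names and texts are illustrative.

```python
import math
import re
from collections import Counter


def vector(text):
    """Term-frequency vector of a text (lower-cased alphabetic words)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def best_section(query, sections):
    """Return the index of the best-matching section together with all scores."""
    q = vector(query)
    scores = [cosine(q, vector(s)) for s in sections]
    best = max(range(len(sections)), key=lambda i: scores[i])
    return best, scores


sections = [
    "The weather in Bergen is often rainy.",
    "Haakon VII was king of Norway from 1905 until 1957.",
    "Chess openings for beginners.",
]
index, scores = best_section("king of Norway", sections)
print(index)
# 1
```

Unlike the resonance model, this scores surface words only; it merely shows why section-level scores let a system point at the relevant part of a document.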
Another feature is that searches, cataloguing, summaries and extractions can be performed according to specific interests or user profiles. These profiles are expressed in natural language, in a manner that mimics the process where a person asks a librarian for help. The agents17 act as a librarian who knows the user's interest profiles and uses that knowledge to recommend information relevant to the user's defined interests, much like the good old librarian who tells a user "I know you're particularly interested in this topic. Here I have a new book that I think will interest you." In this thesis we will mention the following three CORPORUM™ products:

- CORPORUM™ Summarizer
- CORPORUM™ OntoExtract
- CORPORUM™ Knowledge Server

16 MÍMÍR technology was invented by CognIT.
17 See paragraph 3.3.3 for an explanation of how agents are used in CORPORUM™ Knowledge Server.



The name “CORPORUM™ Summarizer” already describes what this product does: it summarizes texts. There are two different approaches to summarizing. The first uses the capabilities of MÍMÍR to extract the most interesting parts of a text and display these as a summary. The second uses a user-defined interest model: a user describes what he is most interested in, and MÍMÍR creates a summary that takes this interest model into consideration. An elaborate description of CORPORUM™ OntoExtract can be found in paragraph 5.1.1.
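The second, interest-driven approach can be approximated by a simple extractive summarizer. The sketch below is not MÍMÍR: it assumes the interest model is just a set of keywords, scores each sentence by how many of them it contains, and keeps the top k sentences in their original order. The function summarize and its parameters are hypothetical.

```python
import re


def summarize(text, interests, k=1):
    """Pick the k sentences that mention the most interest terms.

    Sentences are split naively after '.', '!' or '?'; ties keep text order,
    and the selected sentences are returned in their original order.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    wanted = {w.lower() for w in interests}

    def score(sentence):
        return sum(1 for w in re.findall(r"[a-z]+", sentence.lower()) if w in wanted)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


text = ("Norway is a country in Scandinavia. Haakon VII became king of Norway in 1905. "
        "The country has many fjords.")
print(summarize(text, interests={"king", "Haakon"}))
# ['Haakon VII became king of Norway in 1905.']
```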

3.3.3 CORPORUM™ Knowledge Server

CORPORUM™ Knowledge Server (KS)18 uses MÍMÍR to extract the most valuable information from a source text and to compare this semantic information with results found on the Internet/Intranet, or in a database. KS can be operated in two different ways: by specifying a software agent for the particular search, or by using a pre-indexed knowledge base. In this paragraph we focus on the use of software agents for searching; refer to the user manual of the product for an elaborate explanation of both options. An agent specification includes a text written in plain natural language, specifying what to look for, as well as a number of URLs to use as starting points for the search. URLs are defined with an internal and an external recursion19. An agent (or web crawler) crawls and analyses each of the encountered pages/documents by running them through the MÍMÍR core. The resonance algorithm compares the semantic information of each retrieved page/document to the contents of the agent. The amplitude is converted to a score, and the most important concepts of the page/document are displayed on the result page. These results can be viewed as a histogram, which makes it very easy to see where on the page/document the important information resides. Since KS is not meant to be an Internet search engine20 of the kind we are familiar with, but rather a "librarian" system, we have to use another way of showing the capabilities of this product. We use the results of the previously mentioned search engines as a starting point for the agent definition and let KS filter out the relevant information from this subset of the Internet, rather than crawling the whole Internet (which is too resource-demanding). Each of the encountered pages/documents is matched with the query.
With this approach, we rely on the search engines returning at least some relevant pages, which KS then picks up and sorts by relevance.
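The internal and external recursion limits described in footnote 19 can be sketched as a breadth-first crawl over an in-memory link graph. This is an illustration only: the function crawl, the graph format and the example URLs are hypothetical, and it assumes that the internal depth counter restarts whenever the crawler crosses to another site, which the footnote does not state explicitly. A page first reached via a longer path is not revisited, which is a simplification.

```python
from collections import deque
from urllib.parse import urlparse


def crawl(graph, start, internal=5, external=2):
    """Breadth-first crawl limited by internal depth and external hops.

    graph maps a URL to the URLs it links to. Internal depth counts links
    followed within the current site and (by assumption) restarts at 0 on
    entering another site; external hops count how often a link leaves a site.
    """
    seen = {start}
    visited = []
    queue = deque([(start, 0, 0)])  # (url, internal depth, external hops)
    while queue:
        url, depth, hops = queue.popleft()
        visited.append(url)  # a real agent would analyse the page here
        for link in graph.get(url, []):
            if urlparse(link).netloc == urlparse(url).netloc:
                nxt = (link, depth + 1, hops)
            else:
                nxt = (link, 0, hops + 1)
            if link in seen or nxt[1] > internal or nxt[2] > external:
                continue
            seen.add(link)
            queue.append(nxt)
    return visited


links = {
    "http://www.cognit.no/": ["http://www.cognit.no/a", "http://site1.example/"],
    "http://site1.example/": ["http://site2.example/"],
    "http://site2.example/": ["http://site3.example/"],
}
print(crawl(links, "http://www.cognit.no/"))
# ['http://www.cognit.no/', 'http://www.cognit.no/a', 'http://site1.example/', 'http://site2.example/']
```

With an external recursion of 2, the third external site is never reached, matching the behaviour described in the footnote.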

18 Screenshots can be found in Appendix A.
19 Internal and external recursion are terms used by CognIT a.s to describe link ‘depth’. Say that one of the specified URLs is http://www.cognit.no, with an internal recursion of 5 and an external recursion of 2. The web crawler (or agent, if you wish) will then follow links up to 5 levels ‘deep’ within that site. If it finds links to external resources (web sites outside the domain http://www.cognit.no), it can continue looking there as well. If that external web site links to yet another web site, the crawler continues looking there, but cannot go any further, since the external recursion level is set to 2.
20 It is possible to use CORPORUM™ Knowledge Server as an Internet search engine, but this would require large investments in bandwidth and hardware to support the processing (indexing) and storage. The real power of MÍMÍR lies in the resonance algorithm, which cannot be used to its full extent when pre-indexing the whole WWW.



3.4 Summary

Typical search engines for the WWW, for example Google and Yahoo, usually return incredibly large numbers of results, a very large part of which is irrelevant to the person doing the search. Most search engines offer advanced features to help the user get more relevant results, yet they still lack the ability to deal with semantics. AskJeeves tries to include some semantics in the query by letting a user ask a question rather than enter some keywords, yet the results are not semantically matched to the query, so irrelevant results remain. The problem with all three of these search engines is that too many results are presented, most of which are irrelevant as an answer to the query. CORPORUM™ Knowledge Server is not a standard type of search engine, as it does not start looking for answers before the question is asked. The query can be typed in plain natural language, and it forms the basis for an information retrieval effort that checks semantically whether a retrieved page/document is relevant to the query. Since the search effort is time-bound, not all relevant sites may be visited, but the sites that have been visited and contain relevant information are displayed and ranked by semantic relevance. CORPORUM™ can serve as an intermediate step towards the Semantic Web (see chapter 4), since it uses semantics (concepts and associations) for search.



4 The Semantic Web, an introduction

After presenting the current WWW and its search methods, we would like to introduce the ‘next version of the Internet’: The Semantic Web. Before explaining what The Semantic Web means, it is best to break down the term first. The following definitions are from the Longman dictionary [Lon98]:

se-man-tic /adj of meaning in language
se-man-tics /n the study of the meaning of words and other parts of languages
World Wide Web, the /abbrev. WWW, also the Web The system for making information available, anywhere in the world, to computer users who are connected to the Internet. Users can surf the Web or surf the Net (=search for information by going from one information page to another) by using a computer program called a browser.

Combining these expressions, we get a Web in which users can search for information by meaning. Or, in other words: “The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation.” [BHL01] Today’s Internet is based purely on content, the only structure being the grammar of the language the content is written in, if you’re lucky! In other words, the current web is human-readable, but it is not exactly easy for machines to understand what is published, be it text, images or video streams. In chapter 2, we have already identified some of the shortcomings of the current WWW regarding information retrieval, the main one being that no semantic information is available. Current search engines are keyword-based, which gives high recall but low precision. Keyword-based searches are sensitive to vocabulary: if one uses a synonym of a word that is mentioned on a web page, this page will not be found relevant. The Semantic Web will make searching easier by adding meaning to the concepts used on a web page.
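The vocabulary problem can be made concrete with a small sketch that contrasts keyword matching with matching on shared concepts. The mapping CONCEPTS below is a hypothetical stand-in for the meaning that Semantic Web markup would supply; it is not an existing vocabulary, and the function names are illustrative.

```python
import re

# Hypothetical word-to-concept mapping standing in for Semantic Web markup.
CONCEPTS = {
    "king": "Monarch", "kings": "Monarch", "monarch": "Monarch",
    "monarchs": "Monarch", "ruler": "Monarch", "sovereign": "Monarch",
    "norway": "Norway", "norwegian": "Norway",
}


def words(text):
    """Lower-case word set of a text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def concepts(text):
    """Map each word to its concept; unknown words stand for themselves."""
    return {CONCEPTS.get(w, w) for w in words(text)}


def keyword_match(query, page):
    """True if every query word literally occurs on the page."""
    return words(query) <= words(page)


def concept_match(query, page):
    """True if every query concept occurs on the page, whatever the wording."""
    return concepts(query) <= concepts(page)


page = "The monarchs of Norway since 1814"
print(keyword_match("Norway kings", page))  # False: the page says "monarchs"
print(concept_match("Norway kings", page))  # True: both words map to Monarch
```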
The Semantic Web is all about creating a Web that is understandable by both humans and machines. Computer users will still have the information presented in the way they are used to, but for machines The Semantic Web is a breakthrough. Machines will no longer have to reason based on grammar and mark-up languages meant to be interpreted by humans (and not computers), because the semantic structure of the text is already included. With The Semantic Web it will be a lot easier to find what you are looking for, since everything is already placed in context.



The Semantic Web:
• Allows effective combination of the independent work of diverse communities.
• Supports the ability to add new information without insisting that the old be modified.
• Provides communities the ability to resolve ambiguities and clarify inconsistencies.
• Uses descriptive conventions that can expand as human understanding expands. [BHL01]

“The Semantic Web will globalise knowledge representation, just as the WWW globalised hypertext” (W3C director Tim Berners-Lee). Soon machines will be able to process and “understand” information that is now only displayed; information like the list of the kings of Norway in a certain time period can easily be extracted with query languages for The Semantic Web (see paragraph 4.3).

Figure 2: The Semantic Web architecture21 (Tim Berners-Lee, XML 2000).

The Semantic Web is a stack of languages, each of which adds a layer of functionality. Ideally, all layers of this stack are used, to ensure the best possible level of security and information value. Until then, the layers can be adopted one at a time. In the table below, a short, non-detailed explanation is given for each layer.

21 Numbers added for reference in Table 2.




Layer | Description

1 (Unicode, URI) | Unicode is a standard for representing text on a computer. URI stands for Uniform Resource Identifier, the generic set of all names and addresses that refer to resources (a resource can be literally anything).

2 (XML, namespaces, XML Schema) | XML and XML Schema describe semi-structured data, giving machine-accessible meaning to a piece of information. For example, a schema for a CV can divide it into main parts such as “name”, “education”, “experience” and “private”. This way the machine already knows the context of the information, which makes it easier to process the text entered in these parts.

3 (RDF, RDF Schema) | Since layer 2 lacks (among other things) the possibility to create a domain-specific ontological vocabulary, the RDF and RDF Schema (RDFS)22 layer provides a metadata layer and a domain-specific library.

4 (Ontology vocabulary) | W3C defines ontologies as: “Ontologies figure prominently in the emerging Semantic Web as a way of representing the semantics of documents and enabling the semantics to be used by web applications and intelligent agents. Ontologies can prove very useful for a community as a way of structuring and defining the meaning of the metadata terms that are currently being collected and standardized. Using ontologies, tomorrow's applications can be "intelligent," in the sense that they can more accurately work at the human conceptual level.” [Hef03] The ontology vocabulary can be defined in DAML+OIL [Hor+01] (or OWL [Hef03]). It extends RDFS with logical expressions, data typing, cardinality and quantifiers, which enables wide interoperability.

5 (Logic) | This layer can include any system that can validate proofs. It does not assume one standard engine, which means that inference capabilities may differ.

6 (Proof) | Note: Not much information is available about this layer online (or in papers or books, for that matter). Proof can be interpreted in two different ways. The first is about proving that a person is who he says he is, i.e. who or what to trust on The Semantic Web. Anybody can claim to be the ruler of the universe and therefore to have access rights to anything, yet without proving this somehow, they will not be believed. The second interpretation is about proving that an information source provides correct information (this is partly covered under layer 7, Trust).

7 (Trust) | Note: Not much information is available about this layer online (or in papers or books, for that matter). Which person or which resource can you trust on the web? Perhaps it is possible to prove that a person or resource actually is what it claims to be, yet the information provided is not necessarily true. If a person writes that a whale is a fish, he is giving false information, since a whale is a mammal. If two people say different things about the same subject, for example a whale being a fish or a mammal, whom do we trust, and why? A possible solution is a so-called “web of trust”, in which you define whom you trust. Since the people you trust are likely to trust others in turn, those people can extend your base of trust, hence the web of trust. More information on the web of trust can be found in the presentation by Tim Berners-Lee23.

8 (Digital signature) | Digital signatures are not a layer but a supporting part. They make sure that the information presented in the adjacent layers can be verified to come from the institution or person it claims to come from, and that it was not tampered with in transit. Existing technology can be used for this, for example MD5 and PGP.

Table 2: Short explanation of Semantic Web layers

22 For more information on RDF and RDFS, refer to paragraph 4.2.
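The web-of-trust idea from layer 7 can be sketched as a bounded graph search: everyone you trust, everyone they trust, and so on up to a fixed number of hops. The hop limit, the function trusted and the example names are assumptions for illustration; Table 2 does not prescribe how far trust should propagate. The usage lines apply it to the whale example by accepting a claim only from a trusted source.

```python
from collections import deque


def trusted(trust, me, max_hops=2):
    """People reachable from `me` through at most max_hops trust statements."""
    result = set()
    seen = {me}
    queue = deque([(me, 0)])
    while queue:
        person, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not let trust propagate any further
        for other in trust.get(person, ()):
            if other not in seen:
                seen.add(other)
                result.add(other)
                queue.append((other, hops + 1))
    return result


# Hypothetical "who trusts whom" statements and conflicting claims.
TRUST = {"alice": ["bob"], "bob": ["carol"], "carol": ["dave"], "mallory": ["alice"]}
CLAIMS = {"carol": "a whale is a mammal", "mallory": "a whale is a fish"}

circle = trusted(TRUST, "alice")
believed = {source: claim for source, claim in CLAIMS.items() if source in circle}
print(sorted(circle))  # ['bob', 'carol']
print(believed)        # {'carol': 'a whale is a mammal'}
```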

4.1 Ontologies

“Ontologies figure prominently in the emerging Semantic Web as a way of representing the semantics of documents and enabling the semantics to be used by web applications and intelligent agents.” (W3C). Longman dictionary [Lon98]: on-tol-o-gy /n the branch of philosophy concerned with the nature of existence. Unfortunately, the Longman dictionary only gives the ‘well-known’ meaning of the word. Artificial-intelligence and Web researchers have co-opted the term for their own jargon. For them an ontology is a document or file that formally defines the relations among terms. The most typical kind of ontology for the Web has a taxonomy and a set of inference rules. [BHL01] Item 4 in Table 2 on page 19 shows W3C's definition of an ontology and how it is used on The Semantic Web, while the following definition is also widely accepted.

23 See http://www.w3.org/2002/Talks/04-sweb/



“An ontology is an explicit specification of a conceptualisation. The term is borrowed from philosophy, where an ontology is a systematic account of existence. For AI systems, what "exists" is that which can be represented. When the knowledge of a domain is represented in a declarative formalism, the set of objects that can be represented is called the universe of discourse. This set of objects, and the describable relationships among them, are reflected in the representational vocabulary with which a knowledge-based program represents knowledge. Thus, in the context of AI, we can describe the ontology of a program by defining a set of representational terms. In such an ontology, definitions associate the names of entities in the universe of discourse (e.g., classes, relations, functions, or other objects) with human-readable text describing what the names mean, and formal axioms that constrain the interpretation and well-formed use of these terms. Formally, an ontology is the statement of a logical theory.” [Gru93]

Context is very important for knowing in which sense a word is used. Ontologies represent not only the words but also their context, by binding words together. Knowing in which sense a word is used makes it much easier to communicate with like-minded people, or in our case documents. Ontologies can describe relations between two concepts with representational terms such as “is-a”, “has-a”, “is-subclass-of” and “has-part”, to name just a few. A more exhaustive overview is presented in paragraph 5.5. Ontologies are used as a common vocabulary for a domain that is relevant to certain groups of people, be they researchers or professionals from any field. Domain ontologies can serve as a conceptual model for the application area. Well-defined (domain) ontologies can serve as a knowledge base and enable intelligent knowledge retrieval and knowledge management.
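The role of context can be illustrated with a toy word-sense choice for “king”. The sketch assumes a hypothetical mini-ontology in which each sense is bound to a few related concepts, and picks the sense whose concepts overlap most with the surrounding words. This is a simplified overlap heuristic in the spirit of the Lesk algorithm, not a method from this thesis.

```python
import re

# Hypothetical mini-ontology: each sense of "king" bound to related concepts.
SENSES = {
    "king (monarch)": {"norway", "ruler", "throne", "crown", "reign"},
    "king (chess piece)": {"chess", "board", "pawn", "checkmate", "move"},
}


def disambiguate(context):
    """Return the sense whose related concepts overlap most with the context."""
    words = set(re.findall(r"[a-z]+", context.lower()))
    return max(SENSES, key=lambda sense: len(SENSES[sense] & words))


print(disambiguate("Haakon VII was king of Norway and began his reign in 1905"))
# king (monarch)
print(disambiguate("The king cannot move: it is checkmate on the board"))
# king (chess piece)
```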
The Semantic Web will become some sort of huge knowledge base where many different points of view (= ontologies) will be represented, just as the WWW today is a huge collection of information with many different points of view. However, the main advantage of the Semantic Web is that everything is machine-understandable and all information (or knowledge) is represented semantically. Some reasons to build ontologies are:

• To share common understanding of the structure of information among people or software agents
• To enable reuse of domain knowledge
• To make domain assumptions explicit
• To separate domain knowledge from the operational knowledge
• To analyze domain knowledge

Note: These points are taken from [NM01]; an elaborate explanation of these points can be found in that paper. So, when knowledge is explicit and available, we must be able to make use of it by being able to search this information/knowledge. This thesis proposes searching information stored as ontologies by means of small, self-made graphical ontologies. These searches will be conducted on The Semantic Web. More about this method of searching is shown in paragraphs 5.4 and 5.5.



4.2 RDF/RDFS

Information on the Semantic Web is often expressed in RDF (Resource Description Framework) or RDFS (RDF Schema), languages that can express ontologies. This paragraph gives a short overview of what RDF and RDFS are and what we can do with them. For an elaborate description of the complete RDF and RDFS syntax, as well as more examples, refer to [LS99] and [BG03] respectively. “RDF started as framework for metadata; providing interoperability between applications that exchange machine-understandable information on the Web. RDF is a universal format for data on the Web, which allows structured and semi-structured data to be mixed, exported and shared across different applications.” [MSB03] The basic RDF data model consists of three object types [LS99]:

• Resources: A resource can really be anything: a (part of a) web page, a collection of documents, even a book in a library (the resource does not have to be present on the web). Resources are identified by a Uniform Resource Identifier (URI). Since a resource can be anything, anything can have a URI.

• Properties: A property determines a certain aspect, characteristic, attribute or relation to describe a resource.

• Statements: A specific resource together with a named property plus the value of that property for that resource is an RDF statement. These three individual parts of a statement are called, respectively, the subject, the predicate, and the object.

With these object types we can create:

• Schemas and Ontologies
• Statements about Properties
• Statements about Statements

RDF statements are expressed as so-called triples. A triple may look like the following (the namespace URI bound to the prefix s is a placeholder):

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:s="http://example.org/kings#">
  <rdf:Description rdf:about="Oscar_I">
    <s:rulerOf>Norway</s:rulerOf>
  </rdf:Description>
</rdf:RDF>

This triple states that Oscar I (subject) was the ruler of (predicate) Norway (object). “RDF properties may be thought of as attributes of resources and in this sense correspond to traditional attribute-value pairs. RDF properties also represent relationships between resources. RDF however, provides no mechanisms for describing these properties, nor does it provide any mechanisms for describing the relationships between these properties and other resources. That is the role of the RDF vocabulary description language, RDF Schema. RDF Schema defines classes and properties that may be used to describe classes, properties and other resources.” [LS99]
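To show how a program can read the triple out of the RDF/XML example, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It embeds the snippet with namespace declarations (the kings namespace http://example.org/kings# is a placeholder) and handles only literal-valued properties inside rdf:Description elements; it is not a general RDF/XML parser, for which a dedicated RDF library would be needed.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:s="http://example.org/kings#">
  <rdf:Description rdf:about="Oscar_I">
    <s:rulerOf>Norway</s:rulerOf>
  </rdf:Description>
</rdf:RDF>"""


def extract_triples(rdf_xml):
    """Return (subject, predicate, object) tuples for literal-valued properties."""
    root = ET.fromstring(rdf_xml)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree reports tags as "{namespace}localname"; joining the two
            # parts gives the full property URI.
            predicate = prop.tag.replace("{", "").replace("}", "")
            triples.append((subject, predicate, (prop.text or "").strip()))
    return triples


print(extract_triples(RDF_XML))
# [('Oscar_I', 'http://example.org/kings#rulerOf', 'Norway')]
```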



4.3 Querying The Semantic Web

Of course it is nice to have a Web where all information is available in a machine-understandable form, but it is quite pointless if there is no way of accessing this information. Several Semantic Web query languages have been developed, yet in this paragraph we focus only on RQL, as, at this point in time, it seems24 to be becoming the standard query language for The Semantic Web. To give an impression of how these queries work, we do not use the web page [KoN] that served for the WWW search introduction (chapter 2), since it would only include one page in one namespace25. The real power of the Semantic Web and Semantic Web queries becomes clear when multiple namespaces are used. Hence, we use some entries of an online encyclopedia26 as an example, in which each of the previously mentioned kings is presented on his own web page. To save space in Schema 1, references are in some cases used instead of the full URI or property. The following tables can be used as a key:

Name | URI | Reference
Oscar I | http://www.encyclopedia.com/html/O/Oscar1.asp | URI1
Carl/Charles XV | http://www.encyclopedia.com/html/C/Charles15S1we.asp | URI2
Oscar II | http://www.encyclopedia.com/html/O/Oscar2.asp | URI3
Haakon VII | http://www.encyclopedia.com/html/H/Haakon7.asp | URI4
Sweden | http://www.encyclopedia.com/html/S/Sweden.asp | URI5
Norway | http://www.encyclopedia.com/html/N/Norway.asp | URI6

Table 3: URI references for Kings of Norway example

24 This is an assumption by the author, and is not based on facts found in literature.
25 Namespaces are a way to tie a specific use of a word in context to the dictionary (schema) where the intended definition is to be found. (http://www.w3.org/TR/REC-rdf-syntax/#schemas)
26 http://www.encyclopedia.com



Property | Reference
kings:rulerOf | P1
kings:countryName | P2
kings:title | P3
kings:successor_nr | P4
kings:ruledFrom | P5
kings:ruledUntil | P6
kings:firstName | P7
kings:lastName | P8
kings:marriedWith | P9
kings:siblingOf | P10

Table 4: Property references for Kings of Norway example

In short, Schema 1 (see page 25) shows the following information:

- King Oscar I was king of both Norway and Sweden and ruled from 1844 to 1859. Further, we can derive that he has (at least) two siblings, Carl/Charles and Oscar.

- King Carl XV was king of both Norway and Sweden and ruled from 1859 to 1872. He was the sibling of king Oscar I.

- King Oscar II was king of Norway, which he ruled from 1872 to 1905, and king of Sweden, which he ruled from 1872 to 1907. He was the sibling of king Oscar I.

- King Haakon VII was king of Norway and ruled from 1905 to 1957.
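Once these facts are stored as triples, the question that defeated the keyword engines, which kings ruled Norway between 1850 and 1950, becomes a simple filter. The Python sketch below is a plain stand-in for a Semantic Web query, not RQL. It assumes that each reign is a separate node linked from the king by kings:rulerOf and carrying kings:countryName, kings:ruledFrom and kings:ruledUntil, which is one reading of Schema 1; the blank-node names and helper functions are hypothetical. A king is included if one of his reigns over the country overlaps the requested period.

```python
# Reigns from the list above: (king, first name, country, ruled from, ruled until).
REIGNS = [
    ("URI1", "Oscar", "URI6", 1844, 1859), ("URI1", "Oscar", "URI5", 1844, 1859),
    ("URI2", "Carl", "URI6", 1859, 1872), ("URI2", "Carl", "URI5", 1859, 1872),
    ("URI3", "Oscar", "URI6", 1872, 1905), ("URI3", "Oscar", "URI5", 1872, 1907),
    ("URI4", "Haakon", "URI6", 1905, 1957),
]


def build_triples(reigns):
    """Encode every reign as its own node linked from the king by kings:rulerOf."""
    triples = set()
    for i, (king, first_name, country, start, end) in enumerate(reigns):
        reign = f"_:reign{i}"  # hypothetical blank-node identifier
        triples |= {
            (king, "kings:firstName", first_name),
            (king, "kings:rulerOf", reign),
            (reign, "kings:countryName", country),
            (reign, "kings:ruledFrom", start),
            (reign, "kings:ruledUntil", end),
        }
    return triples


def value(triples, subject, predicate):
    """Object of the first matching (subject, predicate, object) triple, or None."""
    return next((o for s, p, o in triples if s == subject and p == predicate), None)


def rulers_between(triples, country, start, end):
    """Kings whose reign over `country` overlaps the period [start, end]."""
    kings = set()
    for king, predicate, reign in triples:
        if predicate != "kings:rulerOf":
            continue
        if value(triples, reign, "kings:countryName") != country:
            continue
        if (value(triples, reign, "kings:ruledFrom") <= end
                and value(triples, reign, "kings:ruledUntil") >= start):
            kings.add(king)
    return kings


TRIPLES = build_triples(REIGNS)
print(sorted(rulers_between(TRIPLES, "URI6", 1850, 1950)))  # URI6 = Norway
# ['URI1', 'URI2', 'URI3', 'URI4']
```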



Schema 1: Kings of Norway example [Figure with three parts: “RDF/RDFS layer and namespace” (rdfs:Resource, rdfs:Class, rdfs:Property); “Application specific schema and namespace” (classes kings:Person, kings:Man, kings:Woman, kings:King, kings:Queen, kings:Regent, kings:Area and kings:Country; properties kings:firstName, kings:lastName, kings:title, kings:successor_nr, kings:rulerOf, kings:countryName, kings:ruledFrom, kings:ruledUntil, kings:marriedWith and kings:siblingOf); and “Application specific actual data (partial)” (the kings URI1 to URI4 with their reigns over Norway, URI6, and Sweden, URI5). Edge types: subClassOf, instanceOf, relation.]



4.3.1 RQL

RQL (RDF Query Language) is a declarative query language for RDF. Currently, two implementations of RQL are available: the original implementation, developed by ICS-FORTH, Greece27, and a second implementation based on it (which features better compliance with W3C standards [DFH03]), developed by Aidministrator, the Netherlands28. The ICS-FORTH RQL implementation has its roots in the query language OQL (Object Query Language), a query language relying on a functional approach. However, OQL, like any other existing query language, is unable to handle the many peculiarities of RDF. For details about these peculiarities, refer to [Kar+01]. An RQL query looks very similar to an SQL query, yet the usage differs quite a bit. It has three clauses: select, from and where. How to use RQL is explained in the next paragraph.

4.3.1.1 RQL queries

With RQL we can execute many different types of basic queries. Some of these queries are shown here to help understand how RQL queries can be used. The basic queries listed in Table 5 can be used on their own, or as part of a more complex query. All of these basic queries can also be written in an SQL-like form. In Appendix B, for each of the queries mentioned in Table 5, we present the basic query, its SQL-like version, and the answer based on the Kings of Norway example. In the examples we use prefixes to distinguish between the different variables: class variables are prefixed with a $ and property variables with a @. This paragraph is based on the information presented in [Kar+01], [BK01] and [KC], so some similarities may occur. For an elaborate explanation of the queries, please refer to [Kar+01], [BK01] and [KC].

27 http://www.ics.forth.gr
28 http://www.aidministrator.nl



# | Query | Retrieves
1 | Class | All known classes in the repository.
2 | Property | All known properties in the repository.
3 | The full name of a class, for example: King | All known instances of this class. In this case, all known instances of the class “King” and of all its subclasses.
4 | The full name of a class, prefixed by a ^, for example: ^King | All proper instances of this class. A proper instance of a class C is an instance that is not also an instance of any subclass of C.
5 | The full name of a property, for example: rulerOf | The source and target values of this property. In this case, all known resources that have a property “rulerOf”, or a sub-property of “rulerOf”, and the values to which the property connects the resource. In short, this query retrieves the subject and the object of all RDF triples that have “rulerOf” or any sub-property of “rulerOf” as their predicate.
6 | The full name of a property, prefixed by a ^, for example: ^rulerOf | The resources that have a property “rulerOf”, and the target value of that property. Sub-properties are not part of the result.
7 | subClassOf(NameOfClass) | All known subclasses of the class NameOfClass (replace this by the actual class name).
8 | subClassOf^(NameOfClass) | Only the direct subclasses of the class NameOfClass (replace this by the actual class name).
9 | ^subClassOf(NameOfClass) | All proper instances of all known subclasses of the class NameOfClass (replace this by the actual class name).

Table 5: Basic RQL queries
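To make the difference between rows 3/4 and 7/8 of Table 5 more concrete, the following is a minimal Python sketch that mimics the described semantics over an in-memory class hierarchy. The classes Person, Ruler and Queen and the resource Ruler_X are hypothetical, and the sketch assumes every resource is typed with exactly one class; it only illustrates what the queries return and says nothing about how an actual RQL engine evaluates them.

```python
# Hypothetical mini-schema (illustrative only, not the Appendix B data).
# Each class maps to its direct superclass; each resource to the one class it was typed with.
SUBCLASS_OF = {"Ruler": "Person", "King": "Ruler", "Queen": "Ruler"}
INSTANCE_OF = {"Oscar_II": "King", "Haakon_VII": "King", "Ruler_X": "Ruler"}


def direct_subclasses(cls):
    """Like subClassOf^(cls): only the direct subclasses of cls."""
    return [c for c, parent in SUBCLASS_OF.items() if parent == cls]


def subclasses(cls):
    """Like subClassOf(cls): all direct and indirect subclasses of cls."""
    result = []
    for c in direct_subclasses(cls):
        result.append(c)
        result.extend(subclasses(c))
    return result


def instances(cls):
    """Like the query `cls`: instances of cls and of all its subclasses."""
    classes = {cls, *subclasses(cls)}
    return sorted(r for r, c in INSTANCE_OF.items() if c in classes)


def proper_instances(cls):
    """Like `^cls`: instances of cls that are not instances of any subclass of cls."""
    # With one declared class per resource, these are exactly the resources typed with cls.
    return sorted(r for r, c in INSTANCE_OF.items() if c == cls)


print(instances("Ruler"))         # ['Haakon_VII', 'Oscar_II', 'Ruler_X']
print(proper_instances("Ruler"))  # ['Ruler_X']
print(subclasses("Person"))       # ['Ruler', 'King', 'Queen']
```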



Of course, all these queries can also be combined into more complex queries to narrow down the result set. This usually requires the user to have some knowledge about the resource (which can be obtained as the result of a simpler query). An example of such a combined query is the following:

select W from {X} firstName {Y}, {Z} rulerOf {W}. countryName {Q} where X = Z and Y = "Oscar"

This expresses the query “Find the countries that are ruled by someone named ‘Oscar’”. Wildcard searches are also allowed in RQL, which is a big help when the user does not know enough about the schemas in use. Example (taken from [BK01]): “Find the targets of properties whose name matches ‘*name’, and the classes of those targets”:

select Y, $Y from {X} @P {Y : $Y} where @P like "*name"
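The join and the wildcard query above can be imitated over a plain list of RDF triples. The Python sketch below is only an illustration: the resource identifiers (oscar2, haakon7, norway, sweden) are made up for this example, matching `like` case-insensitively is an assumption, and the class of each target ($Y) is left out for brevity.

```python
import fnmatch

# Illustrative triples (subject, predicate, object); not the Appendix B data set.
TRIPLES = [
    ("oscar2", "firstName", "Oscar"),
    ("oscar2", "rulerOf", "norway"),
    ("oscar2", "rulerOf", "sweden"),
    ("haakon7", "firstName", "Haakon"),
    ("haakon7", "rulerOf", "norway"),
    ("norway", "countryName", "Norway"),
    ("sweden", "countryName", "Sweden"),
]


def pairs(prop):
    """All (subject, object) pairs for one property, i.e. the basic query `prop`."""
    return [(s, o) for s, p, o in TRIPLES if p == prop]


def countries_ruled_by(name):
    """select W from {X} firstName {Y}, {Z} rulerOf {W}. countryName {Q}
    where X = Z and Y = name"""
    named = {w for w, _q in pairs("countryName")}  # W must have a countryName
    result = []
    for x, y in pairs("firstName"):
        for z, w in pairs("rulerOf"):
            if x == z and y == name and w in named:
                result.append(w)
    return sorted(result)


def targets_of_properties_like(pattern):
    """Targets of all properties whose name matches a wildcard pattern
    (case-insensitive matching is an assumption of this sketch)."""
    return sorted(o for _s, p, o in TRIPLES
                  if fnmatch.fnmatchcase(p.lower(), pattern.lower()))


print(countries_ruled_by("Oscar"))          # ['norway', 'sweden']
print(targets_of_properties_like("*name"))  # ['Haakon', 'Norway', 'Oscar', 'Sweden']
```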

4.4 Summary
Creating a Semantic Web can solve the irrelevance problem that we described in the previous chapter. The Semantic Web will be a successor of the current WWW that includes information in a machine-understandable form: semantics are stored and ready to be processed by computers. Ontologies play a large role on the Semantic Web, as they are used to represent the semantics of any text or document. Of course, such a Semantic Web, where information is neatly structured and the semantics are included to make machine interpretation possible, is quite pointless if there is no way of retrieving the stored information. Several query languages and inference engines have been developed for this task, of which we have described the RDF Query Language, RQL, which is used to retrieve information stored on the Semantic Web.



5 The Graphical Ontology Designer Environment
This chapter shows different building blocks and ideas that can serve as a foundation for building the Graphical Ontology Designer Environment (GODE). Mostly, the functionality and capabilities of the GODE will be described, but some ideas for the GUI will also be presented to give the reader an impression of what the GODE might look like. A number of ontology development tools already exist, and these platforms can serve as a source of experience or as a test bed. An overview of these tools is presented in paragraph 5.2. The problem with these tools is that they are mostly aimed at the knowledge engineer, and not at the ‘normal’ WWW user (who is probably not even aware of the Semantic Web yet). The same mostly goes for the query languages described in paragraph 4.2. For a user who is used to simply typing some words and then being presented with a list of results, these approaches are far too difficult and unintuitive. The GODE should make sure that these users, too, can benefit fully from the Semantic Web. Carrying out a search based on a graphically presented ontology is perhaps not as easy as typing the search concepts and pushing a button. This chapter introduces the GODE as a platform for this type of search, along with its possibilities and potential dangers. Before the GODE itself is explained, ways of representing an ontology are discussed and an algorithm that can be used for representing relations between concepts is shown.

5.1 Representing an ontology
Ontologies can be represented using various methods: in XML, in RDF, with topic maps29, etc. However, these representations are text-based and usually rather voluminous. Ontologies can also be displayed using UML, in a hyperbolic view or as a tree, but for representing them as a graph, the spring embedder algorithm provides an answer. The spring embedder algorithm [Ead84] is a heuristic approach to graph drawing based on a mechanical system in which a graph's edges are replaced by springs and its vertices by rings. From the initial configuration of ring positions, the system oscillates until it stabilizes at a minimum-energy configuration [KF96]. During the On-To-Knowledge project30, a whole range of different tools has been used and developed. Two of these tools are OntoExtract [EB01b] and the CCA viewer31. The CCA viewer has only been used in the OTK project to a very limited degree and has never left the project boundaries; hence, no public references exist. This paragraph describes these two tools to give a better understanding of the following paragraphs.

29 Topic Maps will not be explained in this thesis; refer to [TOP] and [Pep00] for more information.
30 On-To-Knowledge is a project in the European Union’s Information Society Technologies (IST) Program for Research, Technology Development & Demonstration under the 5th Framework Program. The project ran from 1999 to 2002. Project web site: http://www.ontoknowledge.org. See also [Fen+00] and [Fen+02].
31 OntoExtract has been developed by CognIT a.s, Norway, and the CCA viewer by AIdministrator, the Netherlands.



5.1.1 CORPORUM OntoExtract
CORPORUM OntoExtract (with the Ontology generator as a GUI), usually referred to as OntoExtract or OE, is based on the MÍMÍR core32 and can extract ontologies from a piece of natural language text and output them in XML or RDF (depending on the settings). The ontology generator program (which uses the OntoExtract technology) has a very straightforward interface (see Figure 3). It has an input pane, an output pane and a CCA view pane (which is not in use in version 0.3). Instead of using the CCA view pane, we display the graphs in the original CCA viewer (see paragraph 5.1.2). One can load a file or type text into the input pane. The user can choose from different output options: 3 different XML outputs or the RDF OIL output. As mentioned before, these output types are designed for use in the OTK project and are not necessarily compatible with other programs. The XML v0.1 format is compatible with the CCA viewer used by the author and will therefore be used for illustration. However, some RDF OIL output will also be shown to illustrate relations in RDF. When RDF OIL output is selected, one can choose to include background knowledge in the ontology output. Background knowledge means including meronyms or hypernyms33 for each noun; WordNet34 provides these meronyms and hypernyms. By running a noun through WordNet, the ontology is enriched with extra meaning around the noun, making it easier to find related resources that use a slightly different description of the same noun. We will show excerpts of the RDFS output (without background knowledge) as well as of the XML output, each accompanied by an explanation. The full output can be found in Appendix C.

32 Refer to paragraph 3.3.2, CORPORUM™ technology, for more information on the MÍMÍR core.
33 These, and other, relation types will be explained in paragraph 5.5.
34 WordNet® is an online lexical reference system whose design is inspired by current psycholinguistic theories of human lexical memory. English nouns, verbs, adjectives and adverbs are organized into synonym sets, each representing one underlying lexical concept. Different relations link the synonym sets. [WN] See http://www.cogsci.princeton.edu/~wn/ for more information.



Figure 3: Ontology generator, an OntoExtract interface

The RDFS output describes the ontology as extracted by OE. Before the ontology itself is described, however, the RDFS output contains Dublin Core based ontology metadata35, followed by the schema and property definitions. The ontology information first lists all concepts that are present in the ontology and defines each of them as a subclass of either #Top or #MISC, as shown in the excerpt below.

35 The metadata contains fields for: title, creator, subject, description (a summary), publisher, date, type, format and language. See http://dublincore.org/ for an elaborate explanation.

<oe:Concept rdf:about="#resource">
  <rdfs:label xml:lang="en">resource</rdfs:label>
  <rdfs:subClassOf rdf:resource="#Top"/>
</oe:Concept>
<oe:Concept rdf:about="#background_knowledge">
  <rdfs:label xml:lang="en">background knowledge</rdfs:label>
  <rdfs:subClassOf rdf:resource="#MISC"/>
</oe:Concept>



Then specifiers for the concepts are given. The excerpt below states that “#interface_1” has the properties “straightforward” and “very”:

<rdf:Description rdf:about="">
  <oe:isAbout>
    <ns1:interface rdf:about="#interface_1">
      <oe:hasSomeProperty xml:lang="en">straightforward</oe:hasSomeProperty>
      <oe:hasSomeProperty xml:lang="en">very</oe:hasSomeProperty>
    </ns1:interface>
  </oe:isAbout>
</rdf:Description>

In the same part, sub-classes are defined; for example, “#CCA_view_pane” is a subclass of “#pane”, as shown in the excerpt below.

<oe:Concept rdf:about="#CCA_view_pane">
  <rdfs:label xml:lang="en">CCA view pane</rdfs:label>
  <rdfs:subClassOf rdf:resource="#pane"/>
</oe:Concept>

Since the CCA viewer requires XML as input, the structure of the XML output is shown next. First, a list of all important concepts found in the input is created, as shown in the excerpt below (note that these are only some, not all, of the extracted concepts).

<CONCEPTLIST>
  <CONCEPT>language</CONCEPT>
  <CONCEPT>text</CONCEPT>
  <CONCEPT>display</CONCEPT>
  <CONCEPT>MÍMÍR_core</CONCEPT>
  <CONCEPT>MÍMÍR</CONCEPT>
  <CONCEPT>core</CONCEPT>
  <CONCEPT>ontology_generator</CONCEPT>
  <CONCEPT>ontology</CONCEPT>
  <CONCEPT>generator</CONCEPT>
</CONCEPTLIST>

Then, the relation strength between these concepts is given. Each relation is presented as a triple (concept, strength, concept), as shown in the excerpt below.

<RELATION>
  <CONCEPT>MÍMÍR_core</CONCEPT>
  <STRENGTH>0.8800</STRENGTH>
  <CONCEPT>MÍMÍR</CONCEPT>
</RELATION>



Relation strength is bi-directional: it is given separately for each direction, because a concept is usually not related to another concept with the same strength in both directions. This means that MÍMÍR can be more strongly related to MÍMÍR_core than the other way around (just as a whale is more related to an ocean than an ocean is to a whale). The CCA viewer uses the relation strength to create a graphical representation of the ontology. However, since it draws a single edge between two concepts, the CCA viewer has to use the average of the two relation strengths. How the CCA viewer works is described in the next paragraph.
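Reading the XML v0.1 output and collapsing the two directed strengths into one averaged edge weight can be sketched with Python's standard ElementTree module. This is a hedged illustration, not the CCA viewer's code: the `<ONTOLOGY>` root element is assumed (the excerpts do not show the root), and the reverse strength of 0.6000 is a made-up value.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample: the <ONTOLOGY> root and the reverse strength 0.6000
# are assumptions for illustration; only the first <RELATION> is from the excerpt.
SAMPLE = """<ONTOLOGY>
  <CONCEPTLIST>
    <CONCEPT>MÍMÍR_core</CONCEPT>
    <CONCEPT>MÍMÍR</CONCEPT>
  </CONCEPTLIST>
  <RELATION>
    <CONCEPT>MÍMÍR_core</CONCEPT><STRENGTH>0.8800</STRENGTH><CONCEPT>MÍMÍR</CONCEPT>
  </RELATION>
  <RELATION>
    <CONCEPT>MÍMÍR</CONCEPT><STRENGTH>0.6000</STRENGTH><CONCEPT>MÍMÍR_core</CONCEPT>
  </RELATION>
</ONTOLOGY>"""


def concepts(xml_text):
    """The concepts listed in <CONCEPTLIST>, in document order."""
    return [c.text for c in ET.fromstring(xml_text).find("CONCEPTLIST")]


def directed_strengths(xml_text):
    """{(source, target): strength} for every <RELATION> triple."""
    result = {}
    for rel in ET.fromstring(xml_text).iter("RELATION"):
        source, target = (c.text for c in rel.findall("CONCEPT"))
        result[(source, target)] = float(rel.findtext("STRENGTH"))
    return result


def averaged_strengths(directed):
    """One undirected weight per concept pair: the mean of both directions."""
    values = defaultdict(list)
    for (a, b), strength in directed.items():
        values[frozenset((a, b))].append(strength)
    return {pair: sum(v) / len(v) for pair, v in values.items()}


weights = averaged_strengths(directed_strengths(SAMPLE))
print(weights[frozenset(("MÍMÍR", "MÍMÍR_core"))])  # roughly 0.74
```

The averaging step is exactly where the directional information mentioned above is lost: 0.88 and 0.60 both become a single value.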

5.1.2 CCA viewer
The CCA viewer interprets the XML output from OntoExtract and creates an edge between each pair of related concepts (see Figure 4). This network of concepts is then run through the spring embedder algorithm, and the resulting layout displays the ontology graphically (see Figure 5). Closely related concepts are placed near each other; less related concepts are placed further apart.

Figure 4: Screenshot of CCA viewer before running the Spring Embedder algorithm



Figure 5: Screenshot of CCA viewer after running the Spring Embedder algorithm.

The spring embedder algorithm is based on the behaviour of springs: nodes are connected to each other by springs, and the spring forces attract or repel the nodes until a minimum-energy state is reached [Ead84]. When the link strength between concepts is seen as the strength of the springs, the relations between words can be displayed graphically. However, words are not always equally related to each other in both directions (e.g. a king is more related to Norway than Norway is to a king), so some information may be lost when the average link strength is used. The analogy with springs makes it easier to understand how to create your own graphical ontology: the more related two concepts are, the stronger the spring and the closer the concepts are to each other. When concepts are less related to each other, the spring is weaker and the distance between the concepts increases.
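A minimal Python sketch of an Eades-style spring embedder is given below. Adjacent vertices are pulled or pushed by a spring force c1·log(d/L). Non-adjacent vertices repel each other with c3/d². Each vertex is then moved by c4 times its net force. The constants are the values commonly quoted for Eades' method. Dividing the natural spring length L = c2 by the relation strength, so that strongly related concepts settle closer together, is our own assumption for how link strength could enter the model; it is not taken from [Ead84] or from the CCA viewer.

```python
import math


def spring_embed(nodes, edges, iterations=100, c1=2.0, c2=1.0, c3=1.0, c4=0.1):
    """Eades-style spring embedder (sketch).

    nodes: list of concept names
    edges: dict mapping frozenset({a, b}) -> relation strength (> 0)
    Returns {node: [x, y]}.
    """
    n = len(nodes)
    # Deterministic start: nodes evenly spaced on a circle of radius 1.5.
    pos = {v: [1.5 * math.cos(2 * math.pi * i / n), 1.5 * math.sin(2 * math.pi * i / n)]
           for i, v in enumerate(nodes)}
    for _ in range(iterations):
        force = {v: [0.0, 0.0] for v in nodes}
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[v][0] - pos[u][0], pos[v][1] - pos[u][1]
                d = math.hypot(dx, dy) or 1e-9
                ux, uy = dx / d, dy / d  # unit vector pointing from u to v
                key = frozenset((u, v))
                if key in edges:
                    # Spring: attracts when longer than its natural length, repels when shorter.
                    natural_length = c2 / edges[key]  # assumption: stronger relation, shorter spring
                    f = c1 * math.log(d / natural_length)
                else:
                    # Non-adjacent vertices only repel each other.
                    f = -c3 / d ** 2
                force[u][0] += f * ux
                force[u][1] += f * uy
                force[v][0] -= f * ux
                force[v][1] -= f * uy
        for v in nodes:
            pos[v][0] += c4 * force[v][0]
            pos[v][1] += c4 * force[v][1]
    return pos


def distance(pos, u, v):
    return math.dist(pos[u], pos[v])


layout = spring_embed(["king_norway", "king", "norway"],
                      {frozenset(("king_norway", "king")): 1.0,
                       frozenset(("king_norway", "norway")): 0.5})
```

In the usage example, “king” ends up closer to “king_norway” than “norway” does, because its edge has the higher (hypothetical) relation strength.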



5.2 Existing ontology development tools
A wide variety of ontology development tools is already available. The OntoWeb consortium36 has created a survey of ontology tools [OWC02]. This paragraph gives a short summary of each of these tools, based on that survey. For the full survey, including the evaluation parameters and the URLs of all vendors, please refer to [OWC02]. In addition to the tools from the survey, the KAON tool is included.

Name of ontology tool | Description
Apollo | Apollo supports all the basic principles of knowledge modelling: ontologies, classes, instances, functions and relations. Implemented in Java.
LinkFactory | LinkFactory is a formal ontology management system, designed to build and merge very large and complex language-independent formal ontologies. Implemented in Java.
OILEd | OILEd is a graphical ontology editor that allows users to build ontologies using DAML+OIL. The main task that OILEd is targeted at is editing ontologies or schemas. It provides a consistency check for DAML+OIL. Implemented in Java.
OntoEdit | OntoEdit supports the development and maintenance of ontologies by graphical means. OntoEdit comes in a free and in a professional version; only the professional version has the ability to work with graphical ontologies.
Ontolingua Server | Ontolingua Server is a set of tools and services that support the building of shared ontologies between distributed groups. The ontologies are accessible via OKBC, which enables remote editors to browse and edit ontologies.
OntoSaurus | OntoSaurus consists of two modules: an ontology server and an ontology browser server that automatically creates HTML pages to display the ontology hierarchy. The ontology can be edited using HTML forms.
OpenKnoME | KnoME is a large suite of tools for the collaborative development of ontologies in the GRAIL concept modelling language. One of the important tools is Tigger, which was developed for the rapid acquisition of knowledge from domain experts who are untrained in ontology engineering. Implemented in Smalltalk.
Protégé-2000 | Protégé-2000 provides a graphical and interactive ontology-design and knowledge-base-development environment. It helps knowledge engineers and domain experts to perform knowledge management tasks. Tree controls allow quick and simple navigation through a class hierarchy.
SymOntoX | SymOntoX is a software prototype for the management of domain ontologies. Domain concepts and relations are modelled according to OPAL, a methodology for ontology representation.
WebODE | WebODE is an ontological engineering workbench that provides various ontology-related services and supports most of the activities involved in the ontology development process and in ontology usage.
WebOnto | WebOnto supports collaborative browsing, creation and editing of ontologies, which are represented in the knowledge modelling language OCML. Ontologies can be managed using a graphical interface.
KAON | KAON is an open-source ontology management infrastructure targeted at business applications. It includes a comprehensive tool suite allowing easy ontology creation and management, as well as building ontology-based applications. An important focus of KAON is on integrating traditional technologies for ontology management and application with those used in business applications, such as relational databases. [KAON]

36 OntoWeb, project number IST-2000-29243, is a project in the European Union’s Information Society Technologies (IST) Program for Research, Technology Development & Demonstration under the 5th Framework Program. Project web site: http://www.ontoweb.org.

As mentioned before, these tools are aimed at the knowledge engineer and are meant as (complex) ontology creation and management suites. Their purpose is to enable the knowledge engineer to create an ontology representing a whole domain. What is suggested here is a simple environment for creating graphical ontologies for searching the (Semantic) Web. The next paragraphs describe this environment, as well as the search functionality of the GODE.



5.3 GODE GUI and functionality proposal
The interface for the Graphical Ontology Designer Environment should contain:

- A canvas for drawing the ontologies, using a graph editor37 (see Figure 6 for an impression)
  o This canvas includes two main circles, one for the main focus and one for background knowledge. Both circles can be adjusted in size.
  o The part of the canvas not covered by the circles can serve as a “temporary canvas” where concepts can be put “on hold” (meaning they are not part of the search).
- Result page
- Load and save function
- Tools for:
  o Making concepts
  o Editing concepts
  o Removing concepts
  o Creating links between concepts
  o Editing links between concepts (relation strength, type of relation, etc.)
  o Removing links between concepts
- Example ontologies
- A semantic knowledge base, or a connection to a semantic knowledge base
- Concept suggestion pane, to show related concepts
- A help function
- Algorithms to translate the graphical queries to:
  o Advanced search queries for the WWW
  o Semantic Web queries (RQL, TMQL, etc.)

Figure 6: Example canvas

37 A graph editor makes it possible to add, edit, move or remove the nodes and edges of the graphical ontology. The CCA viewer is a graph viewer and is limited to viewing the graphical ontology.

Legend for Figure 6:
1 – Canvas
2 – Background knowledge concepts
3 – Main concepts
Note: The canvas area can be used to temporarily store concepts that will not be part of the search.
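The canvas regions can be captured in a small data model. The Python sketch below is hypothetical: it assumes the main concept circle and the background knowledge circle share a common centre (inner circle and surrounding ring), which Figure 6 suggests but does not specify, and the radii are arbitrary defaults that the user could adjust.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Canvas:
    """Hypothetical GODE canvas model with two concentric, resizable circles."""
    centre: tuple = (0.0, 0.0)
    main_radius: float = 1.0        # circle holding the main concepts
    background_radius: float = 2.0  # ring holding the background knowledge concepts
    positions: dict = field(default_factory=dict)  # concept -> (x, y)

    def region(self, concept):
        """Return 'main', 'background' or 'on hold' for a concept on the canvas."""
        d = math.dist(self.centre, self.positions[concept])
        if d <= self.main_radius:
            return "main"
        if d <= self.background_radius:
            return "background"
        return "on hold"  # remainder of the canvas: not part of the search

    def search_concepts(self):
        """Concepts that take part in the search (everything not on hold)."""
        return {c for c in self.positions if self.region(c) != "on hold"}


canvas = Canvas(positions={
    "king_norway": (0.2, 0.1),
    "time_period": (-0.5, 0.0),
    "royalty": (1.5, 0.0),
    "chess": (5.0, 5.0),
})
print(canvas.region("royalty"))          # background
print(sorted(canvas.search_concepts()))  # ['king_norway', 'royalty', 'time_period']
```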



5.4 Simple search
The simple search version of the graphical ontology search does not take the type of relation between the search concepts into consideration (with the exception of negations and numbers), hence the name “simple search”. Because relation types are ignored, this search technique can also be applied to the WWW. Paragraph 5.5 explains the advanced search scenario, in which relation types are taken into consideration. Basically, there are two possible scenarios. Probably the most user-friendly one is guided search, which is shown first to get accustomed to the concept of graphical searching. Guided search here means that the user can type a query in plain natural language, which is automatically transformed into a graphical ontology that in turn can be edited to better suit the searcher's needs. The second option is building a graphical ontology from scratch, which of course requires some familiarity with the concept before its power can be used effectively. Both options already have a huge benefit compared to the search described in paragraph 3.1: context. Since words are related to each other and thereby placed in context, references to, for example, books can be eliminated when they do not describe exactly what we are looking for.

5.4.1 Guided search
As shown in paragraph 5.1, tools for making graphical ontologies already exist. However, they are not perfect: just as the beginning graphical searcher needs a hand to get started, these programs need a human hand in order to perform better [DFH03]. The main idea of guided search is that the user simply types a sentence, or perhaps a whole paragraph or document, to base the graphical query on. The GODE then runs this piece of text through OntoExtract and displays the graphical output (it automatically processes the XML output from OntoExtract and runs it through the spring embedder algorithm). The user can then take this graphical ontology as a starting point and add, remove, edit and/or move the nodes38. With today's potential building blocks for the GODE, the result for the input “I am looking for the names of the kings of Norway in the time period 1850 to 1950.” is shown in Figure 7 (see Appendix C for intermediate XML results).

38 Note that the CCA viewer only visualises the graph and does not give the user the possibility to add, remove, edit and/or move the nodes. Using a graph editor instead of the CCA viewer gives the user the possibility to carry out these tasks.



Figure 7: CCA view of kings of Norway example (with GODE canvas)

We can see right away that this alone, without human intervention, will most likely not lead to good results, as the numerical representation of the time period has disappeared completely. The reason is that the author of this thesis used a version of the natural language processing core of OntoExtract that does not yet support number handling (later versions do have this capability). Furthermore, ‘kings’ turned into ‘king’, because the natural language processing core uses state-of-the-art truncation and stemming algorithms (internally the original concept is always kept). Collocations (“time_period” and “king_norway”) have a higher information value than single concepts and therefore take a more central position. The collocations are also split up into single concepts, which is not necessary for such a small ontology, since the single concepts are not used in combination with other concepts. For example, if we also wanted to know, using the same query, who the king of Sweden was in this period, “king” would be connected not only to “king_norway” but also to a concept named “king_sweden”, which in turn would be connected to the concept “sweden”. As mentioned, the CCA viewer can only display the nodes, and no information can be altered. Therefore we will continue representing the ontologies in a different way, namely with MS Visio drawings. Figure 8 shows an edited version of Figure 7, in which superfluous information is removed (an inference engine can see for itself that a collocation consists of two words) and the numbers are reintroduced in the query. All concepts are equally important and are therefore connected by edges of equal length/strength. Even though the concepts are now semantically related to each other, the search algorithm needs extra information about the relation between the numbers.
The lines between the concepts only show that there is some sort of relation between these concepts, not which kind of relation it is (read more about relation types in paragraph 5.5). In the simple search scenario, specifying the kind of relation is not the intention either. There are two exceptions to this, however: negations and numbers. An edge between two concepts can carry a negation value, which simply means that these two concepts are not related to each other. Negations serve as a very strong ambiguity eliminator. Numbers are the second case: an edge between two numerical values can indicate a range. To prevent the search algorithm from assuming that the numbers are meant literally, we must indicate that the edge represents a range, which can be done by giving this particular edge a ‘between’ property.

Figure 8: Edited guided search example
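The three kinds of edge used in simple search (plain, negation and ‘between’) could be checked against a candidate result roughly as follows. This is a sketch under stated assumptions: a plain edge is approximated as “both concepts occur”, a negation as “the two concepts do not both occur”, and a ‘between’ edge as “some number in the result lies in the range”. The negated edge between “king” and “chess” is a hypothetical example of ambiguity elimination and is not part of Figure 8.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    a: str
    b: str
    kind: str = "related"  # "related", "negation" or "between"


def satisfies(concepts, numbers, edges):
    """Check a candidate result against simple-search edges.

    concepts: set of concepts found in the candidate result
    numbers:  numerical values (e.g. years) found in the candidate result
    """
    for e in edges:
        if e.kind == "between":
            low, high = sorted((float(e.a), float(e.b)))
            # At least one number in the result must fall inside the range.
            if not any(low <= n <= high for n in numbers):
                return False
        elif e.kind == "negation":
            # Assumed reading: the two concepts must not occur together.
            if e.a in concepts and e.b in concepts:
                return False
        elif not (e.a in concepts and e.b in concepts):
            # A plain edge only says "some relation": require both concepts.
            return False
    return True


edges = [Edge("king_norway", "time_period"),
         Edge("1850", "1950", "between"),
         Edge("king", "chess", "negation")]  # hypothetical negation
print(satisfies({"king_norway", "time_period"}, [1905], edges))  # True
```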

Creating an ontology can be simplified when a direct connection to a semantic knowledge base is available. Upon typing a concept, the program supplies semantically related concepts, and the user can select any of these to become part of the ontology. Let us assume that Figure 7 represents the knowledge base. It includes the following concepts:

- “king_norway”, related to “king”, “norway” and “time_period”
- “king”, related to “king_norway”
- “norway”, related to “king_norway”
- “time_period”, related to “king_norway”, “time” and “period”
- “time”, related to “time_period”
- “period”, related to “time_period”

When a user types “time_period” as the first concept, the following list of alternatives is presented:

- “time”
- “period”
- “king_norway”

Now, the user can simply pick the concepts he/she thinks are necessary for the search and arrange these concepts. If a concept is missing, the user can simply define a new concept.
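The suggestion mechanism in this example amounts to a lookup in the knowledge base. A minimal Python sketch, using exactly the knowledge base listed above; the function name `suggest` and the `already_chosen` filter, which hides concepts that are already part of the ontology, are illustrative assumptions.

```python
# The knowledge base from the example above: concept -> related concepts.
KNOWLEDGE_BASE = {
    "king_norway": ["king", "norway", "time_period"],
    "king": ["king_norway"],
    "norway": ["king_norway"],
    "time_period": ["king_norway", "time", "period"],
    "time": ["time_period"],
    "period": ["time_period"],
}


def suggest(typed, already_chosen=()):
    """Concepts related to the typed concept that are not yet in the ontology."""
    return [c for c in KNOWLEDGE_BASE.get(typed, []) if c not in already_chosen]


print(suggest("time_period"))                   # ['king_norway', 'time', 'period']
print(suggest("time_period", ["king_norway"]))  # ['time', 'period']
```

An unknown concept simply yields no suggestions, in which case the user defines the new concept manually, as described above.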



5.4.2 Building your own graphical ontology
Building your own graphical ontology from scratch is not as simple as typing a couple of keywords and being presented with a list of results. First, make nodes for the words/concepts that you would normally use in any ‘normal’ search engine. Where applicable, connect these concepts to one another by adding an edge. Then decide where to place the concepts in relation to each other. Bear in mind the spring embedder approach to guide you through the placement: strongly related concepts are connected by strong edges and are therefore placed close to each other, while weakly related concepts are placed further away from each other. When all concepts are at an equal distance from each other (only possible in very small ontologies), each concept is equally important. A semantic knowledge base, as mentioned in paragraph 5.4.1, can also help the user here to pick concepts faster and (perhaps) more accurately. As an example we can use the following question: Are Norwegian royalty descendants of the French emperor Napoleon? We start by filtering out the important concepts, as we tend to do when we make a normal keyword-based search. The main concepts here are “Norwegian”, “royalty”, “descendants”, “Napoleon” and possibly also “French” and “emperor”. But instead of using ‘loose’ concepts, it is better to keep the collocations together, which gives this list: “Norwegian royalty”, “descendants” and “French emperor Napoleon”. Three main concepts are easier to work with and define what we are looking for much more specifically. Now, how should these concepts be connected? Connecting these 3 concepts in the unweighted approach, as shown in Figure 9, involves a hazard: the query could be interpreted the other way around (i.e. asking whether Napoleon is a descendant of Norwegian royalty). To prevent this, we can use the weighted edge option of the advanced search.
By creating a one-way weighted edge (as indicated by the arrow in Figure 10) with a property “descendants”, this direction problem is eliminated. For both options, the collocations will automatically be “unfolded” if the need arises; what this might look like is shown in Figure 11.

Figure 9: Graphical ontology example: Napoleon

Figure 10: Advanced graphical ontology example: Napoleon



Figure 11: Graphical ontology with "unfolded" collocations (with GODE canvas)
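Why a one-way edge removes the direction problem can be illustrated by translating an edge into candidate triple patterns. In the Python sketch below, which is an illustration rather than a GODE algorithm, an undirected edge has to keep both subject/object readings, whereas a directed edge keeps only one. The placeholder predicate "?relation" is an assumption, and `unfold` shows one simple way a collocation could be split into its words.

```python
def triple_patterns(source, target, prop="?relation", directed=False):
    """Translate one graphical edge into candidate (subject, predicate, object) patterns."""
    patterns = [(source, prop, target)]
    if not directed:
        # An undirected edge does not say which concept is the subject,
        # so both readings have to be kept.
        patterns.append((target, prop, source))
    return patterns


def unfold(collocation):
    """Split a collocation into single words, each linked back to the collocation."""
    return [(collocation, word) for word in collocation.split()]


print(len(triple_patterns("Norwegian royalty", "French emperor Napoleon", "descendants")))  # 2
print(triple_patterns("Norwegian royalty", "French emperor Napoleon", "descendants", directed=True))
print(unfold("French emperor Napoleon"))
```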

5.4.2.1 Ontology checking
When you build your own graphical ontology without any guidance, you may be in doubt about the quality of your ontology. It would therefore be very useful to get feedback about which part of the ontology should be adjusted in case the result page is empty or contains irrelevant results. Ontology checking can be based either on a semantic knowledge base or on the results of a search. These ideas only apply to Semantic Web searches, not to WWW searches39. Some general advice regarding the graphical ontology can be given before running a query or an evaluation based on a knowledge base. This advice can take the following into consideration (this is a non-exhaustive list, partly inspired by information found in [NM01]):

- Too few concepts. This makes the search too general.
- Too many concepts placed too close to each other. This makes it hard to determine which concepts should have the highest weight in the search.
- More than one numerical range. When a range is related to the wrong concepts, this can cause trouble.
- Repeated concepts. A concept only needs to be present once, since it can always be related to other concepts in some way.
- Incorrectly spelled concepts. A simple spelling checker can supply alternatives.
- Wrong collocation concepts. A combination of words that can never form a collocation will never be a concept.
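Several items on this list can be checked mechanically before a query is run. The Python sketch below covers too few concepts, repeated concepts, more than one numerical range and spelling (using the standard library's `difflib.get_close_matches` against a vocabulary). The threshold `min_concepts=3` and the idea of taking the vocabulary from a semantic knowledge base are assumptions made for illustration.

```python
import difflib


def check_ontology(concepts, range_edges, vocabulary, min_concepts=3):
    """Return warnings for a user-built graphical ontology (heuristics only).

    concepts:    concept labels as placed on the canvas (a list; may contain repeats)
    range_edges: edges that carry a numerical range
    vocabulary:  known concepts, e.g. taken from a semantic knowledge base
    min_concepts is an illustrative threshold, not a value from the thesis.
    """
    warnings = []
    unique = sorted(set(concepts))
    if len(unique) < min_concepts:
        warnings.append("Too few concepts: the search will be too general.")
    for c in unique:
        if concepts.count(c) > 1:
            warnings.append(f"Repeated concept {c!r}: one node is enough.")
    if len(range_edges) > 1:
        warnings.append("More than one numerical range: check which concepts each range is related to.")
    for c in unique:
        if c not in vocabulary:
            suggestions = difflib.get_close_matches(c, vocabulary, n=3)
            if suggestions:
                warnings.append(f"Unknown concept {c!r}, did you mean one of {suggestions}?")
    return warnings


for warning in check_ontology(["king", "norwey", "king"], [], ["king", "norway", "period"]):
    print(warning)
```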

39 See paragraph 5.4.3 for information on the application areas of the graphical search.



When the ontology is checked after running a search on The Semantic Web, information can easily be gathered from the results. The result set may, for example, reveal that a concept is never related to a concept that the user chose to relate it to, meaning that one can assume this relation does not exist. Checking based on a semantic knowledge base introduces an extra potential source of error if the knowledge base is rather small: relations assumed by the user that are not present in the knowledge base might exist in large numbers on The Semantic Web, yet these relations will be marked as a potential problem, meaning that the ontology is evaluated wrongly.

5.4.3 Application area of simple graphical ontologies
Simple graphical search, with or without guidance, can be used for searching both the WWW and The Semantic Web. Unlike advanced graphical search, simple graphical search will not unleash the full power of The Semantic Web, since valuable relation information might be missing. When using simple graphical search for the WWW, the best results will probably be achieved with a two-step method. First, translate the relations into advanced queries for WWW search engines (as mentioned in paragraph 3.2), and then run the result set through CORPORUM technology to make sure the concepts are semantically related instead of simply being present on the same web page. Unfortunately, any number range information is lost when searching the WWW. The first of the two steps consists of creating a Boolean query from the concepts contained in the main concept ring40. Concepts in the second ring, as well as those on the remainder of the canvas, are not used at all in this iteration. All concepts in the main concept ring that are connected by an edge are bound by the AND operator. Negated edges get the negation operator of the chosen search engine. Taking Figure 12 as an example, the Boolean query for the first iteration will be: “time period” AND “king Norway”.

40 See Figure 6: Example canvas



Figure 12: Example of graphical ontology on canvas
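The first step, building a Boolean query from the main concept ring, can be sketched in Python as follows. Two details are assumptions of the sketch: which end of a negated edge is excluded (here the second concept of the edge), and the negation syntax, which differs per search engine and is therefore a parameter defaulting to "NOT ".

```python
def build_boolean_query(main_ring, edges, negation="NOT "):
    """First-iteration Boolean query from the main concept ring.

    main_ring: labels of the concepts inside the main concept ring
    edges:     (a, b, negated) tuples; for a negated edge, b is the excluded concept
               (which end gets excluded is an assumption of this sketch)
    negation:  negation syntax of the chosen search engine (assumed "NOT ")
    """
    required, excluded = [], []
    for a, b, negated in edges:
        if a not in main_ring or b not in main_ring:
            continue  # the second ring and the rest of the canvas are ignored here
        for concept in ((a,) if negated else (a, b)):
            if concept not in required:
                required.append(concept)
        if negated and b not in excluded:
            excluded.append(b)
    terms = [f'"{c}"' for c in required] + [f'{negation}"{c}"' for c in excluded]
    return " AND ".join(terms)


print(build_boolean_query({"time period", "king Norway"},
                          [("time period", "king Norway", False)]))
# "time period" AND "king Norway"
```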

A user-defined number of the results returned by the search engine can then serve as the basis for further processing by CORPORUM technology. These result pages are loaded into CORPORUM technology, which extracts the relevant information from each page and compares it to the original graphical ontology41. The results are then re-arranged according to their semantic relevance. In case the domain is already known (for example, when one is looking for course information at a particular university), the first step can be omitted, and CORPORUM can crawl through the domain directly and compare each page with the graphical ontology. This takes more time, since the web pages are not pre-indexed; however, the results will be more relevant and reliable.

When translating to Semantic Web queries, concept relation strength is calculated from the length of the edges and the placement of the concepts in relation to each other. Relation strength is equal from concept A to B and from B to A. This holds for all concepts except those placed on the square part of the canvas (the “on hold” part). More on translating graphical queries to Semantic Web queries can be found in chapter 6.

Graphical ontologies can also be used as an interest model for CORPORUM Summarizer. Interest models serve as background knowledge for creating more relevant summaries. At present, it is possible to create a summary of a text based on an interest profile written in natural language. The semantic information of the text is extracted by CORPORUM and then used as a background for deciding which sentences have the highest information value (based on the number of relevant concepts in a sentence).
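The rule that relation strength follows from edge length, is symmetric, and ignores concepts on hold can be written down in a few lines. The mapping 1 / (1 + length / scale) in the Python sketch below is a hypothetical choice. Any function that decreases with edge length would fit the description in the text, and the thesis defers the actual translation to chapter 6.

```python
import math


def relation_strengths(positions, edges, on_hold=frozenset(), scale=1.0):
    """Map edge lengths to symmetric relation strengths in (0, 1].

    positions: concept -> (x, y) on the canvas
    edges:     (a, b) pairs drawn by the user
    on_hold:   concepts on the "on hold" part of the canvas (excluded)
    The formula 1 / (1 + length / scale) is a hypothetical choice.
    """
    strengths = {}
    for a, b in edges:
        if a in on_hold or b in on_hold:
            continue  # concepts on hold are not part of the query
        length = math.dist(positions[a], positions[b])
        s = 1.0 / (1.0 + length / scale)  # shorter edge -> stronger relation
        strengths[(a, b)] = strengths[(b, a)] = s  # equal in both directions
    return strengths


pos = {"time_period": (0.0, 0.0), "king_norway": (1.0, 0.0), "norway": (3.0, 0.0)}
print(relation_strengths(pos, [("time_period", "king_norway"), ("time_period", "norway")]))
```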

41 See paragraph 3.3 for more information about CORPORUM technology



So, instead of having CORPORUM parse the natural language, one can define a (reusable) graphical interest model. Especially for news items it is useful to apply interest models, to make sure that the summaries suit the person's needs in the best possible way. Furthermore, simple graphical ontologies can be used as a reader's aid. Instead of reading a summary of a paper, one can simply look at the ontology and judge by the core concepts and their relations whether an article is worth reading. Of course, these ontologies should be larger than the example provided in this paragraph; a reasonable minimum is approximately 15 concepts. The simple graphical ontology can be produced either manually or by the steps described at the beginning of this chapter.
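The sentence selection principle mentioned for CORPORUM Summarizer, ranking sentences by the number of relevant concepts they contain, can be imitated with a toy Python function. This is only a stand-in to show the idea with an interest model given as a set of concepts; it is not CORPORUM's algorithm. It uses naive sentence splitting and only matches single-word concepts.

```python
import re


def summarize(text, interest_concepts, max_sentences=2):
    """Keep the sentences that mention the most interest-model concepts."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())

    def score(sentence):
        words = set(re.findall(r"\w+", sentence.lower()))
        return sum(1 for c in interest_concepts if c.lower() in words)

    # sorted() is stable, so ties keep their original order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore the original sentence order
    return [sentences[i] for i in keep]


text = "The king visited Oslo. It rained all day. Norway has a king and a parliament."
print(summarize(text, {"king", "norway"}))
# ['The king visited Oslo.', 'Norway has a king and a parliament.']
```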

5.4.4 Intended audience for simple graphical search

Simple graphical search is intended for users who want to become familiar with the strengths of The Semantic Web, as well as for users who want more relevant results on the WWW. Little a priori knowledge is needed to start using simple graphical search; however, some sense of logic and language is necessary. In general, users should recognise parallels with the keyword search they are used to, especially when using guided search.



5.5 Advanced search

It has already become clear that the simple search alternative is meant for a general, non-specialised audience; it cannot make full use of the semantic information provided on The Semantic Web. Hence, this paragraph presents an advanced search alternative that unleashes the full potential of The Semantic Web. Advanced search adds, among other things, bi-directional relation strengths and typed edges (e.g. negation and range) that indicate the type of relation between two concepts. The environment for advanced search will be much like that of simple search, but with many more options. Edges can have the following properties or types:

Bi-directional relation strength
As mentioned before, concepts are usually not equally related to each other. For example, a king is more strongly related to Norway than Norway is to a king; likewise, a whale is more related to an ocean than an ocean is to a whale.

Negation
An edge can indicate that a concept is not related to another concept. This is a much stronger indicator than simply leaving the two concepts unconnected.

Range
This property can only be used for edges that connect at least one numerical value or date. A range can be one of the following:

- X is larger than Y: X > Y
- X is larger than or equal to Y: X >= Y
- X is smaller than Y: X < Y
- X is smaller than or equal to Y: X <= Y
- Between X and Y: X..Y

Some restrictions apply when using these properties; the table below gives an overview.

X                 Possible properties     Y
Numerical value   >, >=, <, <=, ..        Numerical value
Numerical value   >, >=, <, <=            Date
Numerical value   >, >=, <, <=            Concept
Date              >, >=, <, <=            Numerical value
Date              >, >=, <, <=, ..        Date
Date              >, >=, <, <=            Concept
Concept           >, >=, <, <=            Numerical value
Concept           >, >=, <, <=            Date
Concept           Void                    Concept
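The restrictions in this table amount to a lookup from the pair (type of X, type of Y) to the range properties permitted on the edge, which an editor could use to reject invalid edges as they are drawn. The following is a sketch under assumptions: the short type names "number", "date" and "concept" and the function names are hypothetical, and the "between" check uses strict bounds because paragraph 6.2.1 translates "between" into the operators ">" and "<".

```python
import operator

# Type names abbreviate the table: "number" = Numerical value,
# "date" = Date, "concept" = Concept (hypothetical labels).
ORDER_OPS = frozenset({">", ">=", "<", "<="})

# Allowed range properties for an edge from X to Y, transcribed from the table.
ALLOWED = {
    ("number", "number"): ORDER_OPS | {".."},
    ("number", "date"): ORDER_OPS,
    ("number", "concept"): ORDER_OPS,
    ("date", "number"): ORDER_OPS,
    ("date", "date"): ORDER_OPS | {".."},
    ("date", "concept"): ORDER_OPS,
    ("concept", "number"): ORDER_OPS,
    ("concept", "date"): ORDER_OPS,
    ("concept", "concept"): frozenset(),  # "Void": no range property allowed
}

COMPARE = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


def check_range_edge(x_type, op, y_type):
    """Raise ValueError if `op` may not label an edge from x_type to y_type."""
    if op not in ALLOWED[(x_type, y_type)]:
        raise ValueError(f"{op!r} is not allowed between {x_type} and {y_type}")


def satisfies(value, op, bound, upper=None):
    """Evaluate a range property against a value."""
    if op == "..":
        # Strict bounds, matching the use of ">" and "<" for "between" in 6.2.1.
        return bound < value < upper
    return COMPARE[op](value, bound)


check_range_edge("date", "..", "date")      # accepted
print(satisfies(1905, "..", 1850, 1950))    # prints True
```

Calling `check_range_edge("concept", ">", "concept")` raises a `ValueError`, mirroring the "Void" row of the table.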



Synonym
Synonyms are concepts that have the same or nearly the same meaning. One would typically use the synonym property to supply background knowledge, for example to make sure that both spellings of a word (e.g. “aluminum” and “aluminium”) are taken into consideration when the ontology is matched.

Meronym
Meronyms indicate that a concept is a part or a member of another concept, e.g. “a steering wheel is a meronym of a car”. X is a meronym of Y if X is a part of Y.

Holonym
A holonym indicates the whole of which the meronym names a part; in other words, it is the opposite of meronym, e.g. “a car is the holonym of a steering wheel”. Y is a holonym of X if X is a part of Y.

Hypernym (super class)
The generic term used to designate a whole class of specific instances, e.g. “vehicle is a hypernym of Mercedes_01”. Y is a hypernym of X if X is a (kind of) Y.

Hyponym (sub class)
The specific term used to designate a member of a class; it is the opposite of hypernym, e.g. “Mercedes_01 is a hyponym of vehicle”. X is a hyponym of Y if X is a (kind of) Y.

Guidance can be built into advanced search as well. The guided search mentioned in paragraph 5.4.1 can also give users a kick-start with advanced search. However, it only establishes that a relation exists between concepts; the user must define the type of each relation. It is not always easy to express the correct meaning of a concept; WordNet can assist users by supplying all known meanings of a concept. The user then simply picks the correct meaning and adds it as a relation to the concept: disambiguation in practice! WordNet can also help find the correct relation type between concepts, or carry out a search on, for example, all meronyms of the concept “car”. Note, however, that WordNet is continuously being worked on; it aims to provide lexical references for all known words, but it will take time before this ultimate goal is reached.
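The definitions above come in inverse pairs (meronym/holonym, hyponym/hypernym), while synonym is symmetric, so an editor only needs to store one direction of each typed edge and can derive the other. A minimal sketch, assuming edges are kept as hypothetical (x, relation, y) triples read as "x is a <relation> of y"; the WordNet lookup itself is not shown.

```python
# Inverse of each relation type, following the definitions above:
# "X is a meronym of Y" <-> "Y is a holonym of X", and likewise
# "X is a hyponym of Y" <-> "Y is a hypernym of X".
INVERSE = {
    "synonym": "synonym",  # symmetric
    "meronym": "holonym",
    "holonym": "meronym",
    "hyponym": "hypernym",
    "hypernym": "hyponym",
}


def with_inverses(triples):
    """Return the (x, relation, y) triples plus the inverse triple of each one."""
    closed = set(triples)
    for x, relation, y in triples:
        closed.add((y, INVERSE[relation], x))
    return closed


edges = {
    ("steering_wheel", "meronym", "car"),
    ("Mercedes_01", "hyponym", "vehicle"),
    ("aluminum", "synonym", "aluminium"),
}
for triple in sorted(with_inverses(edges)):
    print(triple)
```

Deriving the inverses mechanically keeps the two directions of a typed edge consistent when the user edits only one of them.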



5.5.1 Application area of advanced graphical ontologies

It would be rather pointless to invest a lot of time in creating rich ontologies with all sorts of advanced relations, only to strip this information away in order to apply it to a WWW search. Therefore, advanced graphical search is intended solely for use on The Semantic Web, or in corporate settings where resources are semantically annotated. The first application of advanced graphical ontologies is searching The Semantic Web. With all the advanced properties on edges, it should be relatively easy to find relevant information on a topic. Methods for translating graphical ontologies to Semantic Web queries are covered in chapter 6 and will not be discussed in detail in this paragraph. Large(r) advanced graphical ontologies can also be used for knowledge management. A large corporation can, for example, define an ontology for each business unit, each work process and the skills of each employee. This way, skills management can be put into practice: the human resources department will be able to find employees internally to fill a vacancy, instead of sending out a job description. The same ontologies can also reveal skill gaps in the organisation, enabling more targeted hiring. Furthermore, these ontologies can be very useful for finding relevant documentation on the Semantic Web or on a (semantically annotated) corporate intranet. The problem with information is that it is often available, yet not visible. By making information visible, time can be saved and employees can work more effectively. Also, when an ontology is defined for each business unit, there will no longer be ambiguity between concepts that are written the same but differ in meaning. Take, for example, a medium-sized accountancy and advisory firm, large enough to have its own IT department. Accountants, advisors and/or lawyers42 use a completely different vocabulary from the IT department. 
Take “to implement something”: for an employee in the IT department it means something different than for a lawyer or an accountant. When the background of the employee is known, it can be used as background knowledge for searching: the concept “implement” is automatically enriched with the knowledge that this employee is interested in the IT sense of the word. In other words, by using advanced graphical ontologies to define the context of a person once (much like the interest model in CORPORUM Summarizer), one can already obtain better result sets, as ambiguous words are put in context.
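The enrichment step can be sketched as a simple query expansion: a per-department context ontology maps an ambiguous concept to the concepts that fix its meaning, and these are appended to the user's query. This is a hypothetical illustration only; the department names, the context concepts and the function name `enrich_query` are invented for the example, and a real context ontology would carry typed relations rather than a flat list.

```python
# Hypothetical per-department context: ambiguous concept -> context concepts.
# The entries are illustrative only, not taken from any real ontology.
DEPARTMENT_CONTEXT = {
    "it": {"implement": ["software", "system", "deployment"]},
    "accounting": {"implement": ["accounting_standard", "regulation"]},
}


def enrich_query(concepts, department):
    """Add the department's context concepts after each ambiguous query concept."""
    context = DEPARTMENT_CONTEXT.get(department, {})
    expanded = []
    for concept in concepts:
        expanded.append(concept)
        expanded.extend(context.get(concept, []))
    # Remove duplicates while keeping the first occurrence of each concept.
    return list(dict.fromkeys(expanded))


print(enrich_query(["implement", "plan"], "it"))
# prints ['implement', 'software', 'system', 'deployment', 'plan']
```

Because the profile is defined once per person or department, the user keeps typing "implement" and the context is supplied automatically.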

42 The vocabularies of these three groups are probably diverse enough in themselves to warrant a separate ontology for each, but that is not the point here.



5.5.2 Intended audience for advanced graphical search

Advanced graphical search is intended for people who have become familiar with the concept of graphical searching and simply want even better results. Knowledge engineers and medium-sized or large corporations with many procedures and business units are other candidates. As mentioned, the human resources department of a company could benefit greatly from being able to query a semantic database43 that contains an ontology describing each employee.

5.6 Possible traps

Simple graphical search still has the problem that it does not support relations like is-a, has-a, etc., so the full power of The Semantic Web cannot be used. Implicit relation strengths44 are a challenge to calculate. Figure 8 shows 4 concepts, each of which is related to all of the others. The diagonal edges are longer than the other edges, yet ideally all edges in this example have equal strength. The spring embedder algorithm would still place the concepts this way, as this is the minimum-energy configuration. A translating algorithm could therefore take the wrong decision and give the diagonally connected concepts a lower strength, which in turn could lead to unwanted results. Since this example is quite small, the effect would be limited, but larger ontologies may suffer from such miscalculations. The advanced option can be a real challenge to master, as many different ways of representing an ontology are possible. It may take quite a bit of training to be able to create advanced graphical ontologies.

5.6.1 Invested time vs. relevance of results

It has already been mentioned that building your own graphical ontology may be a time-consuming and complex task, so some questions arise:

• Is the effort of building a graphical ontology worth the trouble?
• Are the results that are returned really all relevant?
• Is it faster to build a graphical ontology and skim through relevant results than it is to sift through the irrelevant results of other types of search?
• Where is the break-even point?

Until the information retrieval approach based on graphical ontologies presented in this thesis is available as a computer program, it is impossible to answer these questions conclusively. We therefore recommend that future work in this direction evaluates these questions using a concrete implementation of the graphical ontology tool whose feasibility we have demonstrated. The evaluation should be based on a set of test cases that show to what extent the questions above can be answered positively, including where the break-even point lies.

43 Knowledge engineers can create this semantic database by creating graphical ontologies. The semantic relations of all concepts are then stored in a semantic database, for example the Sesame database [BKH01].

44 That is, strength derived from the placement of concepts rather than from an explicitly stated relation between two concepts.



These test cases can be quite straightforward: give two groups of people an equal amount of training in searching with graphical ontologies, give the groups two different sets of assignments, and let them solve the assignments using both traditional search engines and graphical ontology search. Measure the time taken, from reading the assignment until the result is given, and then let the groups swap assignments and start over again, to provide a control factor. To make the test more realistic, since people usually want their information from more than one source to make sure it is true, the first 5 to 10 relevant results could be required. This should be easy for graphical search (if the technology works as well as assumed), as all hits should be relevant. With traditional search, time will be lost sifting through irrelevant results. Such a test will show which approach is more efficient.

5.7 Summary

The Graphical Ontology Designer Environment (GODE) is meant to give users of today's search engines the possibility to make full use of the Semantic Web as well. By gently introducing the concept through guided search, where users type a natural language text from which a graphical ontology is created, people get the opportunity to explore the Semantic Web with a low threshold. In a later phase, users can construct their own graphical ontologies from scratch and run them through an ontology-checking algorithm to make sure the ontology is of good quality and does not contain any contradictions or “impossible” situations. As soon as users are ready for it, they can start using the advanced options of the GODE and define relation types between the search concepts. As a result, the searcher should see only relevant results in the result list, although this of course depends heavily on the quality of the ontology. Yet, will this new approach be faster than existing approaches? And what do people prefer: investing little time in creating a query and then spending time finding the relevant hits among the results, or investing more time in creating a graphical query to avoid the frustration of sifting through irrelevant results? Unfortunately, these questions cannot be answered until the GODE is in place.



6 Using graphical ontologies as a plug-in for existing search technologies

The previous chapter introduced ontologies and how to use them for searching both the WWW and The Semantic Web. This chapter presents some fairly simple algorithms as examples of how GODE technology could be put into practice on current and future systems.

6.1 CORPORUM

CORPORUM Knowledge Server takes plain natural language as the basis for an information extraction effort and creates an ontology from it in the background. From this ontology, at most the 10 concepts with the highest information value are displayed. The semantic relations between these concepts are not visualised at all. If one of these concepts turns out to be “wrong”, the user has to rewrite (part of) the original query and check whether the unwanted or incorrect concept has disappeared. Since CORPORUM Knowledge Server is meant for searching the WWW, we assume the simple graphical search alternative in this paragraph. Using the advanced alternative would be a waste of effort, as relation types are not supported on the WWW. We suggest creating a plug-in for CORPORUM Knowledge Server that makes it possible to use graphical ontologies as input. Two scenarios are possible: the first resembles guided search, the second resembles simple search. In the first case, users define an agent by writing a piece of text, just as they do in the current version, but instead of the concepts with the highest information value, an editable graphical ontology is displayed. If necessary, the user can edit the ontology and use the result as the basis for the search. The second option is to skip the text-writing part and build a graphical ontology from scratch to base the search on.



6.2 RQL

Both simple and advanced graphical search can be transformed into RQL queries. The effect of simple graphical search may be somewhat limited, since the relations between concepts only indicate that a relation exists, not what type of relation it is. This is nevertheless a big step forward compared to the WWW, where the only relation between two concepts is that they occur on the same web page or in the same document.

6.2.1 Simple graphical search

The graphical ontology for the kings of Norway example can be expressed as shown in Figure 8 (for convenience, Figure 13 is a copy of Figure 8). Relations exist between all concepts, and the property of the edge between 1850 and 1950 is “between”, indicating the time period from 1850 to 1950 instead of stating 1850 and 1950 as separate concepts. Since (bi-directional) relation strength is not part of simple graphical search, it must be calculated. As mentioned in paragraph 5.6, this can be a challenge when relation strength is not stated on the edges themselves. We take the shortest edge in the ontology, assume that it represents the strongest relation, and assign it length 1. From this, we can derive that the diagonal edges have length √2, meaning they represent a weaker relation. Or do they? In this specific case it is not possible to say for sure that the diagonal edges are weaker than the horizontal and vertical ones, since the spring embedder algorithm would find its minimum tension in this position even if all edges had the same strength.

Figure 13: Copy of Figure 8: Edited guided search example
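The length normalisation described above, combined with the strength rule used in this paragraph (1 divided by the normalised edge length, times 100), can be sketched as follows. The canvas coordinates are an assumption: the four concepts are placed on the corners of a unit square, with "time_period"/"king_norway" and "1850"/"1950" on the diagonals, which is consistent with the strength list in this paragraph. The names `POSITIONS` and `relation_strengths` are hypothetical.

```python
import math
from itertools import combinations

# Assumed canvas coordinates for Figure 8: a unit square with
# "time_period"/"king_norway" and "1850"/"1950" on opposite corners.
POSITIONS = {
    "time_period": (0.0, 0.0),
    "1850": (1.0, 0.0),
    "king_norway": (1.0, 1.0),
    "1950": (0.0, 1.0),
}


def relation_strengths(positions, edges):
    """Map each edge to a strength percentage derived from its drawn length."""
    lengths = {(a, b): math.dist(positions[a], positions[b]) for a, b in edges}
    shortest = min(lengths.values())
    # The shortest edge gets length 1; strength = 1 / normalised length * 100.
    return {edge: round(100 * shortest / length, 1) for edge, length in lengths.items()}


# Every pair of concepts is connected in this example.
for edge, pct in relation_strengths(POSITIONS, combinations(POSITIONS, 2)).items():
    print(edge, pct)
```

The sketch reproduces the 100% and 70.7% values, and with them the trap discussed in paragraph 5.6: the diagonals come out weaker even if the user intended all edges to be equally strong.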



To calculate the relation strength, divide 1 by the edge length and multiply by 100. For this ontology, the strengths are as follows:

“time_period”, “1850” = 100%
“time_period”, “1950” = 100%
“time_period”, “king_norway” = 70.7%
“king_norway”, “1850” = 100%
“king_norway”, “1950” = 100%
“king_norway”, “time_period” = 70.7%
“1850”, “time_period” = 100%
“1850”, “king_norway” = 100%
“1850”, “1950” = 70.7%
“1950”, “time_period” = 100%
“1950”, “king_norway” = 100%
“1950”, “1850” = 70.7%

Relations that have no specific property on their edge can make use of the definitions that some RDF Schemas provide. These define relation strength in the form “weaklyRelatedTo”, “relatedTo” and “stronglyRelatedTo”, so we have to decide which strength percentages match these definitions.

Next, we must find out whether the concepts mentioned in the graphical ontology exist in the repository. To do this, we can simply use each concept as a query45 and check whether it returns something. If so, we know that the concept exists and can continue with it later on. If not, and the concept is a collocation, we automatically break the collocation up into single concepts (“king_norway” becomes “king” and “norway”) and try again. If there are still no hits, we can run the word through WordNet to find synonyms. After confirming that the concepts exist, we can continue by finding out how these concepts are related to each other in the repository. For the property “between”, which lies between “1850” and “1950”, we can use the operators “>” and “<”. Now there is a problem: since we did not define what we are looking for, the query can be interpreted as “Did Norway have a king in the time period 1850-1950?”. This is not what we mean; we want the names of the kings. So here we have identified a problem in the query: it does not state what we are looking for. 
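The lookup cascade described above (the concept itself, then the parts of a collocation, then synonyms) might be organised as in the following sketch. It rests on assumptions: the repository is modelled as a plain set of lower-case terms instead of being queried through RQL, and the `synonyms` dictionary is a hypothetical stand-in for a WordNet lookup; `resolve_concept` is an illustrative name.

```python
def resolve_concept(concept, repository, synonyms):
    """Return the repository terms that a graphical-ontology concept maps to."""
    concept = concept.lower()
    # 1. Try the concept itself.
    if concept in repository:
        return [concept]
    # 2. Break a collocation such as "king_norway" into "king" and "norway".
    parts = concept.split("_")
    found = [part for part in parts if part in repository]
    if found:
        return found
    # 3. Fall back to synonyms (stand-in for WordNet) of the concept and its parts.
    matches = []
    for term in dict.fromkeys([concept] + parts):
        matches.extend(s for s in synonyms.get(term, []) if s in repository)
    return list(dict.fromkeys(matches))


repository = {"king", "norway", "regent", "country"}  # hypothetical contents
synonyms = {"monarch": ["king", "regent"]}            # stand-in for WordNet
print(resolve_concept("king_norway", repository, synonyms))  # prints ['king', 'norway']
print(resolve_concept("monarch", repository, synonyms))      # prints ['king', 'regent']
```

An empty result signals that the concept could not be grounded in the repository, so the user can be asked to rephrase it before the query is translated.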
By skimming through the attributes related to “king_norway”, we will most likely find the attribute “firstName”. By selecting it, we put our query back in context, and the remainder of the query can be converted to RQL. Exactly how to do this is left for further work. It may be noted, however, that the user has to look through an intermediate list to find the answer being sought, and may think: “here we go again, browsing different options”. Yet this approach differs considerably from Internet search-engine result lists, as the information is semantically related to the concept, rather than simply a word on a page or in a document. Selecting an attribute narrows down the result set considerably, leaving more relevant results.
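Although the exact conversion is left for further work, a hypothetical sketch shows what the translated kings-of-Norway query might look like once “firstName” has been selected. The query syntax is modelled on the RQL path expressions in Appendix B and has not been validated against an RQL engine. The property names `FirstName`, `RuledFrom` and `RuledUntil` are taken from the repository schema listed in Appendix B, and reading “between” as “the reign starts after 1850 and ends before 1950” is an assumption; an overlap reading would compare the bounds the other way round.

```python
def build_rql(cls, attribute, start_prop, end_prop, low, high):
    """Build an RQL-style query string for a 'between' edge (hypothetical translation)."""
    return (
        f"select N from {cls}{{K}}, {{K}}{attribute}{{N}}, "
        f"{{K}}{start_prop}{{F}}, {{K}}{end_prop}{{U}} "
        # 'between' uses the strict operators '>' and '<', as described in the text.
        f"where F > {low} and U < {high}"
    )


print(build_rql("King", "FirstName", "RuledFrom", "RuledUntil", 1850, 1950))
# prints: select N from King{K}, {K}FirstName{N}, {K}RuledFrom{F}, {K}RuledUntil{U} where F > 1850 and U < 1950
```

Generating the string from the edge data, rather than writing it by hand, is what would let a GODE plug-in hide RQL from the user entirely.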

45 See Table 5, queries 3 and 5.



6.2.2 Advanced graphical search

The conversion of advanced graphical search to RQL will to a large degree resemble the conversion of simple graphical search. There are, however, some differences, the most important being the richer edge properties and types. Although these edges are more complex, they should make it more straightforward to create an RQL query, because the relation types are stated explicitly instead of having to be inferred. Bi-directional relation strength values eliminate the need to calculate strengths, making the RQL statement more precise.

6.3 Summary

GODE does not have to be a stand-alone tool; it can also serve as a plug-in for existing search techniques. This chapter gives some ideas on how searching with graphical ontologies can be combined with existing search techniques. It does not describe these methods in detail, as the chapter is not meant to prescribe an implementation, but rather to sketch how people could benefit from this technology while using various types of inference engines. As long as the information extraction tool uses some type of semantics, it should be possible to create a plug-in for it, so that users can search using graphical ontologies.



7 Conclusions and further work

New technology such as the Semantic Web is emerging on the Internet, yet no search engines that are user-friendly enough for beginners have been developed for it. This thesis presents a roadmap for building a Graphical Ontology Designer Environment (GODE), an environment that gives users the ability to execute queries that would be (nearly) impossible to carry out using today's search-engine techniques. Some tools that can serve as building blocks for the GODE are already available today. By putting these tools together and adding some features, it is possible to create the GODE. OntoExtract can serve as the basis for the guided search alternative. For displaying intermediate results the CCA viewer can be used; however, a suitable graph editor still needs to be found or created to make the graph editable. In addition, algorithms for ontology checking can be put in place to give the user feedback on the quality of the ontology. By implementing the GODE as a plug-in for existing technology such as CORPORUM Knowledge Server, users can get accustomed to the new way of searching and be ready for the next step: the Semantic Web. As the Semantic Web matures, the GODE plug-in for RQL will make the Semantic Web as accessible to the general public as the WWW is today.



8 References

Note: The correct functioning of the specified URLs cannot be guaranteed!

[Ale+00] S. Alexaki, V. Christophides, G. Karvounarakis, D. Plexousakis, K. Tolle, B. Amann, I. Fundulaki, M. Scholl, and A-M. Vercoustre, “Managing RDF metadata for community webs”, in Proc. of the 2nd International Workshop on The World Wide Web and Conceptual Modelling (WCM'2000), Salt Lake City, Utah, US, October 2000.

[BJ00] Bremdal, B. and Johansen, F., “CORPORUM technology and applications”, White paper, CognIT a.s, 2000.

[BG03] Brickley, Dan and Guha, R. V., “RDF Vocabulary Description Language 1.0: RDF Schema”, W3C Working Draft, 23 January 2003. See http://www.w3.org/TR/rdf-schema/

[BHL01] Berners-Lee, Tim, Hendler, James and Lassila, Ora, “The Semantic Web”, Scientific American, May 2001 issue.

[BK01] Broekstra, Jeen and Kampman, Arjohn, “Query Language Definition: Technical Report describing the query language proposal for Sesame”, OnToKnowledge Project Deliverable 9, EU-IST On-To-Knowledge IST-1999-10132, May 2001. See http://sesame.aidministrator.nl/publications/del9.pdf

[BKH01] J. Broekstra, A. Kampman, and F. van Harmelen, “Sesame: An architecture for storing and querying RDF data and schema information”, in H. Lieberman, D. Fensel, J. Hendler and W. Wahlster, editors, Semantics for the WWW, MIT Press, 2001.

[Chr00] Christophides, Vassilis, “Community Webs (C-Webs): Technological Assessment and System Architecture”, C-WEB IST-1999-13479, September 2000.

[Chr+00] V. Christophides, D. Plexousakis, G. Karvounarakis, and S. Alexaki, “Declarative languages for querying portal catalogs”, in Proceedings of the DELOS Workshop: Information Seeking, Searching and Querying in Digital Libraries, pages 115-120, 2000.

[Cog02] CognIT a.s, Norway, “CORPORUM Knowledge Server Product Information”, 2002. See http://www.cognit.no

[DFH03] Davies, John, Fensel, Dieter and Harmelen, Frank van, “Towards The Semantic Web: Ontology-Driven Knowledge Management”, ISBN 0470 84867 7, Wiley, 2003.



[DiSt] Diagnostic Strategies, “Quantitative Issues in Information Retrieval”, date unknown. See http://www.diagnosticstrategies.com/info_retrieval.htm

[DSN02a] Decker, Stefan, Sintek, Michael and Nejdl, Wolfgang, “TRIPLE: A Logic for Reasoning with Parameterized Views over Semi-Structured Data”, 2002.

[DSN02b] Decker, Stefan, Sintek, Michael and Nejdl, Wolfgang, “The Model-Theoretic Semantics of TRIPLE”, May 2003.

[Ead84] Eades, P., “A heuristic for graph drawing”, Congressus Numerantium, 42, pp. 149-160, 1984.

[EB01a] Engels, R. and Bremdal, B., “Information extraction: state-of-the-art report”, Deliverable 5 of the EU 5th framework Project OnToKnowledge (IST-1999-10132), CognIT a.s, 2000.

[EB01b] Engels, R. and Bremdal, B., “Ontology Extraction Tool”, Deliverable 6 of the EU 5th framework Project OnToKnowledge (IST-1999-10132), CognIT a.s, 2001.

[Fen+00] Fensel, D., van Harmelen, F., Klein, M., Akkermans, H., Broekstra, J., Fluit, C., van der Meer, J., Schnurr, H.-P., Studer, R., Hughes, J., Krohn, U., Davies, J., Engels, R., Bremdal, B., Ygge, F., Lau, T., Novotny, B., Reimer, U., and Horrocks, I., “On-To-Knowledge: Ontology-based Tools for Knowledge Management”, in Proceedings of the eBusiness and eWork 2000 (EMMSEC2000) Conference, Madrid, Spain, 2000.

[Fen+02] Fensel, Dieter, Harmelen, Frank van, Ding, Ying, Klein, Michel, Akkermans, Hans, Broekstra, Jeen, Kampman, Arjohn, Meer, Jos van der, Sure, York, Studer, Rudi, Krohn, Uwe, Davies, John, Engels, Robert, Iosif, Victor, Kiryakov, Atanas, Lau, Thorsten, Reimer, Ulrich and Horrocks, Ian, “On-To-Knowledge in a Nutshell”, IEEE Computer, 2002.

[Gru93] Gruber, T. R., “A translation approach to portable ontologies”, Knowledge Acquisition, 5(2):199-220, 1993. See http://www-ksl.stanford.edu/kst/what-is-an-ontology.html and http://ksl-web.stanford.edu/KSL_Abstracts/KSL-92-71.html

[Hef03] Heflin, Jeff, “Web Ontology Language (OWL) Use Cases and Requirements”, section “Ontology definition”, W3C, February 2003. See http://www.w3.org/TR/2003/WD-webont-req-20030203/#onto-def

[Hor+01] Horrocks, I., van Harmelen, F., Patel-Schneider, P., Berners-Lee, T., Brickley, D., Connolly, D., Dean, M., Decker, S., Fensel, D., Hayes, P., Heflin, J., Hendler, J., Lassila, O., McGuinness, D., and Stein, L. A., “DAML+OIL”, The Joint United States / European Union ad hoc Agent Markup Language Committee, 2001. See http://www.daml.org/2001/03/daml+oil-index.html

[KAON] KAON. See http://kaon.semanticweb.org/

[Kar+00a] Karvounarakis, G., Christophides, V., Plexousakis, D., and Alexaki, S., “Querying community web portals”, Technical report, Institute of Computer Science, FORTH, Heraklion, Greece, 2000. See http://www.ics.forth.gr/proj/isst/RDF/RQL/rql.pdf

[Kar+00b] Karvounarakis, Greg et al., “Querying Semistructured (Meta)Data and Schemas on the Web: The case of RDF & RDFS”, 2000.

[Kar+01] Karvounarakis, Greg et al., “RQL: A Declarative Query Language for RDF”, November 2001.

[KC] Karvounarakis, Greg and Christophides, Vassilis, “The RQL v1.5 User Manual”, no date. See http://139.91.183.30:9090/RDF/RQL/Manual.html

[KF96] Kumar, Aruna and Fowler, Richard H., “A Spring Modeling Algorithm to Position Nodes of an Undirected Graph in Three Dimensions”, Technical report, Department of Computer Science, University of Texas, 1996. See http://www.cs.panam.edu/info_vis/spr_tr.html

[KoN] “Kings of Norway”. Main site: http://www.warholm.nu/Kongeno; list of “All the kings of Norway”: http://www.warholm.nu/Kingnor.html

[LS99] Lassila, Ora and Swick, Ralph R., “Resource Description Framework (RDF): Model and Syntax Specification”, W3C Recommendation, February 1999. See http://www.w3.org/TR/1999/REC-rdf-syntax-19990222/

[LGH02] S. Little, J. Geurts and J. Hunter, “Dynamic Generation of Intelligent Multimedia Presentations through Semantic Inferencing”, in 6th European Conference on Research and Advanced Technology for Digital Libraries, September 2002.

[Lon98] “Longman Dictionary of English Language and Culture”, ISBN 0 582 30203 X, Second edition, 1998.



[NM01] Noy, Natalya Fridman and McGuinness, Deborah L. “Ontology Development 101: A Guide to Creating Your First Ontology”, Stanford Knowledge Systems Laboratory Technical Report KSL-01-05 and Stanford Medical Informatics Technical Report SMI-2001-0880, March 2001.

[MSB03] Miller, Eric, Swick, Ralph and Brickley, Dan, “Resource Description Framework (RDF)”, W3C, 1997-2003 (continuously updated). See http://www.w3.org/RDF/

[Nej02] W. Nejdl, “Semantic web and peer-to-peer technologies for distributed learning repositories”, in 17th IFIP World Computer Congress, Intelligent Information Processing (IIP-2002).

[One02] “Google is the most popular search engine on the web according to OneStat.com”, press release by OneStat, April 15, 2002. See http://www.onestat.com/html/aboutus_pressbox3.html

[OWC02] OntoWeb Consortium, “Deliverable 1.3: A survey on ontology tools”, OntoWeb, IST-2001-29243, May 2002. See http://www.aifb.uni-karlsruhe.de/WBS/ysu/./publications/OntoWeb_Del_1-3.pdf

[Pep00] Pepper, Steve, “The TAO of Topic Maps: Finding the Way in the Age of Infoglut”, June 2000. See http://www.ontopia.net/topicmaps/materials/tao.html

[Rij79] Rijsbergen, C.J. van, “Information Retrieval”, Butterworths, London, 1979.

[RLS98] Robie, Jonathan (Texcel, Inc.), Lapp, Joe (webMethods, Inc.) and Schach, David (Microsoft Corporation), “XML Query Language (XQL)”, September 1998. See http://www.w3.org/TandS/QL/QL98/pp/xql.html

[SWS03] The Semantic Web Seminar, Oslo, held Friday 17 January 2003. Presentations available from http://semweb.cognit.com

[TOP] Topicmaps.org homepage. See http://www.topicmaps.org/

[Wie03] Wienhofen, Leendert W. M., “The Semantic Web for non-scholars: A guide to the use(fullness) of a graphical ontology environment”, February 2003. See http://www.cognit.no/leendert/Spesialpensum/The_Semantic_Web_for_non-scholars_LWM_Wienhofen_Feb_2003.pdf

[WN] “WordNet, a lexical database for the English language”. See http://www.cogsci.princeton.edu/~wn/



[Wei97] Weiss, Scott, “Glossary for Information Retrieval”, January 1997. See http://www.cs.jhu.edu/~weiss/glossary.html



Appendix A

Figure 14: Agent definition

Figure 15: Agent targets



Figure 16: Agent concepts

Figure 17: Agent scheduling



Appendix B

1) Find all classes in the repository
Basic query: Class
Complex query: select $C1 from $C1
Result ($C1): Class, Person, Man, Regent, Woman, King, Queen, Area, Country

2) Find all properties in the repository
Basic query: Property
Complex query: select @P1 from @P1
Result (@P1): Property, RulerOf, countryName, Title, Successor_nr, RuledFrom, RuledUntil, FirstName, LastName, marriedWith, siblingOf



3) Find all known instances of a class
Basic query: Person
Complex query: select $C1, $C2 from {$C1}Person{$C2}
Result ($C1, $C2): (Person, Man), (Person, Regent), (Person, Woman), (Person, King), (Person, Queen)

4) Find all proper instances of a class
Basic query: ^Person
Complex query: select $C1, $C2 from {$C1}^Person{$C2}
Result ($C1, $C2): (Person, Man), (Person, Regent), (Person, Woman)

5) Find all resource and target values of a property
Which classes can appear as domain and range of the property rulerOf?
Basic query: rulerOf
Complex query: select $C1, $C2 from {$C1}rulerOf{$C2}
Result ($C1, $C2): (Regent, Area), (Regent, Country), (King, Area), (King, Country), (Queen, Area), (Queen, Country)



6) Find direct resource and target values of a property
Basic query: ^rulerOf
Complex query: select $C1, $C2 from {$C1}^rulerOf{$C2}
Result ($C1, $C2): (Regent, Area), (Regent, Country), (King, Area), (King, Country), (Queen, Area), (Queen, Country)

7) Find all known subclasses of a class
Basic query: SubClassOf(Person)
Complex query: select $C1, $C2 from {$C1}SubClassOf(OfPerson){$C2}
Result ($C1, $C2): (Person, Man), (Person, Regent), (Person, Woman), (Person, King), (Person, Queen)

8) Find the direct subclasses of a class
Basic query: SubClassOf^(Person)
Complex query: select $C1, $C2 from {$C1}SubClassOf^(OfPerson){$C2}
Result ($C1, $C2): (Person, Man), (Person, Regent), (Person, Woman)



9) Find all proper instances of all known subclasses
Basic query: ^SubClassOf(Person)
Complex query: select $C1, $C2 from {$C1}^SubClassOf(Person){$C2}
Result ($C1, $C2): (Person, Man), (Person, Regent), (Person, Woman)


Appendix C

RDF output

<?xml version="1.0" encoding="ISO-8859-1"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:dcq="http://purl.org/dc/qualifiers/1.1/"
         xmlns:oe="http://ontoserver.cognit.no/otk_rdf#"
         xmlns:ns1="#">
  <rdf:Description rdf:about="">
    <dc:Title xml:lang="en">CORPORUM OntoExtract</dc:Title>
    <dc:Creator> CMCogLib%3A 1.0.4.74</dc:Creator>
    <dc:Subject xml:lang="en"></dc:Subject>
    <dc:Description xml:lang="en">CORPORUM OntoExtract. CORPORUM OntoExtract (with the Ontology generator as a GUI), usually referred to as OntoExtract or OE, is based on the MÍMÍR core and has the capability to extract ontologies from a piece of natural language text and display it in XML or RDF (dependant on settings). The ontology generator program (which uses the OntoExtract technology) has a very straightforward interface (see Figure 3). The user can choose from different output options, 3 different XML outputs or the RDF OIL output. </dc:Description>
    <dc:Publisher>local workstation</dc:Publisher>
    <dc:Date>2003-08-17</dc:Date>
    <dc:Type>text</dc:Type>
    <dc:Format>text/plain</dc:Format>
    <dc:Language>en</dc:Language>
  </rdf:Description>


  <rdfs:Class rdf:about="http://ontoserver.cognit.no/otk_rdf#Concept" rdfs:label="Concept">
    <rdfs:subClassOf rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
  </rdfs:Class>
  <oe:Concept rdf:about="#Top">
    <rdfs:label xml:lang="en">Top level concept</rdfs:label>
    <rdfs:subClassOf rdf:resource="http://www.w3.org/2000/01/rdf-schema#Resource"/>
  </oe:Concept>
  <oe:Concept rdf:about="#MISC">
    <rdfs:label xml:lang="en">Untyped concepts (extracted from RelatedTo statements)</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Top"/>
  </oe:Concept>
  <rdf:Property rdf:about="http://ontoserver.cognit.no/otk_rdf#hasSomeProperty">
    <rdfs:domain rdf:resource="#Top"/>
    <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
  </rdf:Property>
  <rdf:Property rdf:about="http://ontoserver.cognit.no/otk_rdf#isAbout">
    <rdfs:domain rdf:resource="#Page"/>
    <rdfs:range rdf:resource="#Top"/>
  </rdf:Property>
  <oe:Concept rdf:about="#piece">
    <rdfs:label xml:lang="en">piece</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Top"/>
  </oe:Concept>
  <oe:Concept rdf:about="#language">
    <rdfs:label xml:lang="en">language</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Top"/>
  </oe:Concept>
  <oe:Concept rdf:about="#text">
    <rdfs:label xml:lang="en">text</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Top"/>
  </oe:Concept>
  <oe:Concept rdf:about="#display">
    <rdfs:label xml:lang="en">display</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Top"/>
  </oe:Concept>



  <oe:Concept rdf:about="#core">
    <rdfs:label xml:lang="en">core</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Top"/>
  </oe:Concept>
  <oe:Concept rdf:about="#ontology">
    <rdfs:label xml:lang="en">ontology</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Top"/>
  </oe:Concept>
  <oe:Concept rdf:about="#generator">
    <rdfs:label xml:lang="en">generator</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Top"/>
  </oe:Concept>
  <oe:Concept rdf:about="#interface">
    <rdfs:label xml:lang="en">interface</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Top"/>
  </oe:Concept>
  <oe:Concept rdf:about="#technology">
    <rdfs:label xml:lang="en">technology</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Top"/>
  </oe:Concept>
  <oe:Concept rdf:about="#program">
    <rdfs:label xml:lang="en">program</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Top"/>
  </oe:Concept>
  <oe:Concept rdf:about="#view">
    <rdfs:label xml:lang="en">view</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Top"/>
  </oe:Concept>
  <oe:Concept rdf:about="#pane">
    <rdfs:label xml:lang="en">pane</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Top"/>
  </oe:Concept>
  <oe:Concept rdf:about="#output">
    <rdfs:label xml:lang="en">output</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Top"/>
  </oe:Concept>
  <oe:Concept rdf:about="#input">
    <rdfs:label xml:lang="en">input</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Top"/>
  </oe:Concept>
  <oe:Concept rdf:about="#viewer">
    <rdfs:label xml:lang="en">viewer</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Top"/>
  </oe:Concept>
  <oe:Concept rdf:about="#type">
    <rdfs:label xml:lang="en">type</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Top"/>
  </oe:Concept>



  <oe:Concept rdf:about="#option">
    <rdfs:label xml:lang="en">option</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Top"/>
  </oe:Concept>
  <oe:Concept rdf:about="#author">
    <rdfs:label xml:lang="en">author</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Top"/>
  </oe:Concept>
  <oe:Concept rdf:about="#background">
    <rdfs:label xml:lang="en">background</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Top"/>
  </oe:Concept>
  <oe:Concept rdf:about="#knowledge">
    <rdfs:label xml:lang="en">knowledge</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Top"/>
  </oe:Concept>
  <oe:Concept rdf:about="#meronym">
    <rdfs:label xml:lang="en">meronym</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Top"/>
  </oe:Concept>
  <oe:Concept rdf:about="#hypernym">
    <rdfs:label xml:lang="en">hypernym</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Top"/>
  </oe:Concept>
  <oe:Concept rdf:about="#noun">
    <rdfs:label xml:lang="en">noun</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Top"/>
  </oe:Concept>
  <oe:Concept rdf:about="#description">
    <rdfs:label xml:lang="en">description</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Top"/>
  </oe:Concept>
  <oe:Concept rdf:about="#capability">
    <rdfs:label xml:lang="en">capability</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Top"/>
  </oe:Concept>
  <oe:Concept rdf:about="#setting">
    <rdfs:label xml:lang="en">setting</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Top"/>
  </oe:Concept>
  <oe:Concept rdf:about="#figure">
    <rdfs:label xml:lang="en">figure</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Top"/>
  </oe:Concept>
  <oe:Concept rdf:about="#use">
    <rdfs:label xml:lang="en">use</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Top"/>
  </oe:Concept>



  <oe:Concept rdf:about="#version">
    <rdfs:label xml:lang="en">version</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Top"/>
  </oe:Concept>
  <oe:Concept rdf:about="#load">
    <rdfs:label xml:lang="en">load</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Top"/>
  </oe:Concept>
  <oe:Concept rdf:about="#user">
    <rdfs:label xml:lang="en">user</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Top"/>
  </oe:Concept>
  <oe:Concept rdf:about="#standard">
    <rdfs:label xml:lang="en">standard</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Top"/>
  </oe:Concept>
  <oe:Concept rdf:about="#relations">
    <rdfs:label xml:lang="en">relations</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Top"/>
  </oe:Concept>
  <oe:Concept rdf:about="#wordnet">
    <rdfs:label xml:lang="en">wordnet</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Top"/>
  </oe:Concept>
  <oe:Concept rdf:about="#resource">
    <rdfs:label xml:lang="en">resource</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Top"/>
  </oe:Concept>
  <oe:Concept rdf:about="#same_noun">
    <rdfs:label xml:lang="en">same noun</rdfs:label>
    <rdfs:subClassOf rdf:resource="#MISC"/>
  </oe:Concept>
  <oe:Concept rdf:about="#background_knowledge">
    <rdfs:label xml:lang="en">background knowledge</rdfs:label>
    <rdfs:subClassOf rdf:resource="#MISC"/>
  </oe:Concept>
  <oe:Concept rdf:about="#different_description">
    <rdfs:label xml:lang="en">different description</rdfs:label>
    <rdfs:subClassOf rdf:resource="#MISC"/>
  </oe:Concept>
  <oe:Concept rdf:about="#RDF_OIL">
    <rdfs:label xml:lang="en">RDF OIL</rdfs:label>
    <rdfs:subClassOf rdf:resource="#MISC"/>
  </oe:Concept>
  <oe:Concept rdf:about="#GUI">
    <rdfs:label xml:lang="en">GUI</rdfs:label>
    <rdfs:subClassOf rdf:resource="#MISC"/>
  </oe:Concept>



  <oe:Concept rdf:about="#OntoExtract">
    <rdfs:label xml:lang="en">OntoExtract</rdfs:label>
    <rdfs:subClassOf rdf:resource="#MISC"/>
  </oe:Concept>
  <oe:Concept rdf:about="#OE">
    <rdfs:label xml:lang="en">OE</rdfs:label>
    <rdfs:subClassOf rdf:resource="#MISC"/>
  </oe:Concept>
  <oe:Concept rdf:about="#XML">
    <rdfs:label xml:lang="en">XML</rdfs:label>
    <rdfs:subClassOf rdf:resource="#MISC"/>
  </oe:Concept>
  <oe:Concept rdf:about="#piece_natural_language_text_display">
    <rdfs:label xml:lang="en">piece natural language text display</rdfs:label>
    <rdfs:subClassOf rdf:resource="#MISC"/>
  </oe:Concept>
  <oe:Concept rdf:about="#MÍMÍR_core">
    <rdfs:label xml:lang="en">MÍMÍR core</rdfs:label>
    <rdfs:subClassOf rdf:resource="#MISC"/>
  </oe:Concept>
  <oe:Concept rdf:about="#ontology_generator">
    <rdfs:label xml:lang="en">ontology generator</rdfs:label>
    <rdfs:subClassOf rdf:resource="#MISC"/>
  </oe:Concept>
  <oe:Concept rdf:about="#CORPORUM_OntoExtract">
    <rdfs:label xml:lang="en">CORPORUM OntoExtract</rdfs:label>
    <rdfs:subClassOf rdf:resource="#MISC"/>
  </oe:Concept>
  <oe:Concept rdf:about="#CCA_view_pane">
    <rdfs:label xml:lang="en">CCA view pane</rdfs:label>
    <rdfs:subClassOf rdf:resource="#MISC"/>
  </oe:Concept>
  <oe:Concept rdf:about="#output_pane">
    <rdfs:label xml:lang="en">output pane</rdfs:label>
    <rdfs:subClassOf rdf:resource="#MISC"/>
  </oe:Concept>
  <oe:Concept rdf:about="#input_pane">
    <rdfs:label xml:lang="en">input pane</rdfs:label>
    <rdfs:subClassOf rdf:resource="#MISC"/>
  </oe:Concept>
  <oe:Concept rdf:about="#OTK">
    <rdfs:label xml:lang="en">OTK</rdfs:label>
    <rdfs:subClassOf rdf:resource="#MISC"/>
  </oe:Concept>
  <oe:Concept rdf:about="#output_type">
    <rdfs:label xml:lang="en">output type</rdfs:label>
    <rdfs:subClassOf rdf:resource="#MISC"/>
  </oe:Concept>



  <oe:Concept rdf:about="#3_different_XML_output">
    <rdfs:label xml:lang="en">3 different XML output</rdfs:label>
    <rdfs:subClassOf rdf:resource="#MISC"/>
  </oe:Concept>
  <oe:Concept rdf:about="#RDF">
    <rdfs:label xml:lang="en">RDF</rdfs:label>
    <rdfs:subClassOf rdf:resource="#MISC"/>
  </oe:Concept>
  <oe:Concept rdf:about="#author_CCA_viewer">
    <rdfs:label xml:lang="en">author CCA viewer</rdfs:label>
    <rdfs:subClassOf rdf:resource="#MISC"/>
  </oe:Concept>
  <oe:Concept rdf:about="#OntoExtract_technology">
    <rdfs:label xml:lang="en">OntoExtract technology</rdfs:label>
    <rdfs:subClassOf rdf:resource="#MISC"/>
  </oe:Concept>
  <oe:Concept rdf:about="#RDF_OIL_output">
    <rdfs:label xml:lang="en">RDF OIL output</rdfs:label>
    <rdfs:subClassOf rdf:resource="#MISC"/>
  </oe:Concept>
  <oe:Concept rdf:about="#output_option">
    <rdfs:label xml:lang="en">output option</rdfs:label>
    <rdfs:subClassOf rdf:resource="#MISC"/>
  </oe:Concept>
  <oe:Concept rdf:about="#ontology_output">
    <rdfs:label xml:lang="en">ontology output</rdfs:label>
    <rdfs:subClassOf rdf:resource="#MISC"/>
  </oe:Concept>
  <oe:Concept rdf:about="#ontology_generator_program">
    <rdfs:label xml:lang="en">ontology generator program</rdfs:label>
    <rdfs:subClassOf rdf:resource="#MISC"/>
  </oe:Concept>
  <oe:Concept rdf:about="#different_output_option">
    <rdfs:label xml:lang="en">different output option</rdfs:label>
    <rdfs:subClassOf rdf:resource="#MISC"/>
  </oe:Concept>
  <oe:Concept rdf:about="#OIL">
    <rdfs:label xml:lang="en">OIL</rdfs:label>
    <rdfs:subClassOf rdf:resource="#MISC"/>
  </oe:Concept>
  <oe:Concept rdf:about="#RO">
    <rdfs:label xml:lang="en">RO</rdfs:label>
    <rdfs:subClassOf rdf:resource="#MISC"/>
  </oe:Concept>
  <oe:Concept rdf:about="#CORPORUM">
    <rdfs:label xml:lang="en">CORPORUM</rdfs:label>
    <rdfs:subClassOf rdf:resource="#MISC"/>
  </oe:Concept>



  <oe:Concept rdf:about="#CO">
    <rdfs:label xml:lang="en">CO</rdfs:label>
    <rdfs:subClassOf rdf:resource="#MISC"/>
  </oe:Concept>
  <oe:Concept rdf:about="#view_pane">
    <rdfs:label xml:lang="en">view pane</rdfs:label>
    <rdfs:subClassOf rdf:resource="#MISC"/>
  </oe:Concept>
  <oe:Concept rdf:about="#CCA_pane">
    <rdfs:label xml:lang="en">CCA pane</rdfs:label>
    <rdfs:subClassOf rdf:resource="#MISC"/>
  </oe:Concept>
  <oe:Concept rdf:about="#meronym_hypernym">
    <rdfs:label xml:lang="en">meronym hypernym</rdfs:label>
    <rdfs:subClassOf rdf:resource="#MISC"/>
  </oe:Concept>
  <oe:Concept rdf:about="#type_text">
    <rdfs:label xml:lang="en">type text</rdfs:label>
    <rdfs:subClassOf rdf:resource="#MISC"/>
  </oe:Concept>
  <oe:Concept rdf:about="#original_CCA_viewer">
    <rdfs:label xml:lang="en">original CCA viewer</rdfs:label>
    <rdfs:subClassOf rdf:resource="#MISC"/>
  </oe:Concept>
  <oe:Concept rdf:about="#natural">
    <rdfs:label xml:lang="en">natural</rdfs:label>
    <rdfs:subClassOf rdf:resource="#MISC"/>
  </oe:Concept>
  <oe:Concept rdf:about="#MÍMÍR">
    <rdfs:label xml:lang="en">MÍMÍR</rdfs:label>
    <rdfs:subClassOf rdf:resource="#MISC"/>
  </oe:Concept>
  <oe:Concept rdf:about="#3_different">
    <rdfs:label xml:lang="en">3 different</rdfs:label>
    <rdfs:subClassOf rdf:resource="#MISC"/>
  </oe:Concept>
  <oe:Concept rdf:about="#CCA">
    <rdfs:label xml:lang="en">CCA</rdfs:label>
    <rdfs:subClassOf rdf:resource="#MISC"/>
  </oe:Concept>
  <oe:Concept rdf:about="#very_straightforward_interface">
    <rdfs:label xml:lang="en">very straightforward interface</rdfs:label>
    <rdfs:subClassOf rdf:resource="#MISC"/>
  </oe:Concept>
  <oe:Concept rdf:about="#original">
    <rdfs:label xml:lang="en">original</rdfs:label>
    <rdfs:subClassOf rdf:resource="#MISC"/>
  </oe:Concept>



  <oe:Concept rdf:about="#straightforward_interface">
    <rdfs:label xml:lang="en">straightforward interface</rdfs:label>
    <rdfs:subClassOf rdf:resource="#MISC"/>
  </oe:Concept>
  <oe:Concept rdf:about="#different">
    <rdfs:label xml:lang="en">different</rdfs:label>
    <rdfs:subClassOf rdf:resource="#MISC"/>
  </oe:Concept>
  <oe:Concept rdf:about="#piece_natural_language_text_display">
    <rdfs:label xml:lang="en">piece natural language text display</rdfs:label>
    <rdfs:subClassOf rdf:resource="#display"/>
  </oe:Concept>
  <rdf:Description rdf:about="">
    <oe:isAbout>
      <ns1:interface rdf:about="#interface_1">
        <oe:hasSomeProperty xml:lang="en">straightforward</oe:hasSomeProperty>
        <oe:hasSomeProperty xml:lang="en">very</oe:hasSomeProperty>
      </ns1:interface>
    </oe:isAbout>
  </rdf:Description>
  <oe:Concept rdf:about="#ontology_generator_program">
    <rdfs:label xml:lang="en">ontology generator program</rdfs:label>
    <rdfs:subClassOf rdf:resource="#program"/>
  </oe:Concept>
  <rdf:Description rdf:about="">
    <oe:isAbout>
      <ns1:interface rdf:about="#interface_2">
        <oe:hasSomeProperty xml:lang="en">straightforward</oe:hasSomeProperty>
      </ns1:interface>
    </oe:isAbout>
  </rdf:Description>
  <oe:Concept rdf:about="#CCA_view_pane">
    <rdfs:label xml:lang="en">CCA view pane</rdfs:label>
    <rdfs:subClassOf rdf:resource="#pane"/>
  </oe:Concept>
  <oe:Concept rdf:about="#output_option">
    <rdfs:label xml:lang="en">output option</rdfs:label>
    <rdfs:subClassOf rdf:resource="#option"/>
  </oe:Concept>
  <rdf:Description rdf:about="">
    <oe:isAbout>
      <ns1:output_option rdf:about="#output_option_1">
        <oe:hasSomeProperty xml:lang="en">different</oe:hasSomeProperty>
      </ns1:output_option>
    </oe:isAbout>
  </rdf:Description>
  <rdf:Description rdf:about="">
    <oe:isAbout>
      <ns1:noun rdf:about="#noun_1">
        <oe:hasSomeProperty xml:lang="en">same</oe:hasSomeProperty>
      </ns1:noun>
    </oe:isAbout>
  </rdf:Description>
  <rdf:Description rdf:about="">
    <oe:isAbout>
      <ns1:description rdf:about="#description_1">
        <oe:hasSomeProperty xml:lang="en">different</oe:hasSomeProperty>
      </ns1:description>
    </oe:isAbout>
  </rdf:Description>
</rdf:RDF>

XML output

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE CONCEPTGRAPH []>
<CONCEPTGRAPH>
  <CONCEPTLIST>
    <CONCEPT>time_period</CONCEPT>
    <CONCEPT>time</CONCEPT>
    <CONCEPT>period</CONCEPT>
    <CONCEPT>king_norway</CONCEPT>
    <CONCEPT>king</CONCEPT>
    <CONCEPT>norway</CONCEPT>
  </CONCEPTLIST>
  <RELATIONLIST>
    <RELATION>
      <CONCEPT>time_period</CONCEPT>
      <STRENGTH>0.7500</STRENGTH>
      <CONCEPT>time</CONCEPT>
    </RELATION>
    <RELATION>
      <CONCEPT>time_period</CONCEPT>
      <STRENGTH>0.7500</STRENGTH>
      <CONCEPT>period</CONCEPT>
    </RELATION>
    <RELATION>
      <CONCEPT>time_period</CONCEPT>
      <STRENGTH>0.6314</STRENGTH>
      <CONCEPT>king_norway</CONCEPT>
    </RELATION>
    <RELATION>
      <CONCEPT>time</CONCEPT>
      <STRENGTH>0.1000</STRENGTH>
      <CONCEPT>time_period</CONCEPT>
    </RELATION>
    <RELATION>
      <CONCEPT>period</CONCEPT>
      <STRENGTH>0.3000</STRENGTH>
      <CONCEPT>time_period</CONCEPT>
    </RELATION>
    <RELATION>
      <CONCEPT>king_norway</CONCEPT>
      <STRENGTH>0.7500</STRENGTH>
      <CONCEPT>king</CONCEPT>
    </RELATION>
    <RELATION>
      <CONCEPT>king_norway</CONCEPT>
      <STRENGTH>0.7500</STRENGTH>
      <CONCEPT>norway</CONCEPT>
    </RELATION>
    <RELATION>
      <CONCEPT>king_norway</CONCEPT>
      <STRENGTH>0.6314</STRENGTH>
      <CONCEPT>time_period</CONCEPT>
    </RELATION>
    <RELATION>
      <CONCEPT>king</CONCEPT>
      <STRENGTH>0.1000</STRENGTH>
      <CONCEPT>king_norway</CONCEPT>
    </RELATION>
    <RELATION>
      <CONCEPT>norway</CONCEPT>
      <STRENGTH>0.3000</STRENGTH>
      <CONCEPT>king_norway</CONCEPT>
    </RELATION>
  </RELATIONLIST>
</CONCEPTGRAPH>
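The XML output lists concepts and weighted relations, and the strengths are not symmetric: time_period → time has strength 0.7500, while time → time_period has 0.1000. A minimal sketch using Python's standard xml.etree.ElementTree reads the output into a dictionary of directed edges. It assumes, since the output itself does not say so, that the first CONCEPT in each RELATION is the source, the second is the target, and STRENGTH is the weight of that edge. The names CONCEPTGRAPH and load_graph are hypothetical, and the XML declaration and DOCTYPE line are left out of the embedded copy.

```python
# Sketch: read the CONCEPTGRAPH output into a directed, weighted graph.
# ASSUMPTION: in each RELATION the first CONCEPT is the source, the second
# the target, and STRENGTH is the weight of that directed edge.
import xml.etree.ElementTree as ET

# Copy of the XML output without the XML declaration and DOCTYPE line.
CONCEPTGRAPH = """
<CONCEPTGRAPH>
  <CONCEPTLIST>
    <CONCEPT>time_period</CONCEPT> <CONCEPT>time</CONCEPT> <CONCEPT>period</CONCEPT>
    <CONCEPT>king_norway</CONCEPT> <CONCEPT>king</CONCEPT> <CONCEPT>norway</CONCEPT>
  </CONCEPTLIST>
  <RELATIONLIST>
    <RELATION><CONCEPT>time_period</CONCEPT><STRENGTH>0.7500</STRENGTH><CONCEPT>time</CONCEPT></RELATION>
    <RELATION><CONCEPT>time_period</CONCEPT><STRENGTH>0.7500</STRENGTH><CONCEPT>period</CONCEPT></RELATION>
    <RELATION><CONCEPT>time_period</CONCEPT><STRENGTH>0.6314</STRENGTH><CONCEPT>king_norway</CONCEPT></RELATION>
    <RELATION><CONCEPT>time</CONCEPT><STRENGTH>0.1000</STRENGTH><CONCEPT>time_period</CONCEPT></RELATION>
    <RELATION><CONCEPT>period</CONCEPT><STRENGTH>0.3000</STRENGTH><CONCEPT>time_period</CONCEPT></RELATION>
    <RELATION><CONCEPT>king_norway</CONCEPT><STRENGTH>0.7500</STRENGTH><CONCEPT>king</CONCEPT></RELATION>
    <RELATION><CONCEPT>king_norway</CONCEPT><STRENGTH>0.7500</STRENGTH><CONCEPT>norway</CONCEPT></RELATION>
    <RELATION><CONCEPT>king_norway</CONCEPT><STRENGTH>0.6314</STRENGTH><CONCEPT>time_period</CONCEPT></RELATION>
    <RELATION><CONCEPT>king</CONCEPT><STRENGTH>0.1000</STRENGTH><CONCEPT>king_norway</CONCEPT></RELATION>
    <RELATION><CONCEPT>norway</CONCEPT><STRENGTH>0.3000</STRENGTH><CONCEPT>king_norway</CONCEPT></RELATION>
  </RELATIONLIST>
</CONCEPTGRAPH>
""".strip()


def load_graph(xml_text):
    """Return (concepts, edges); edges maps (source, target) -> strength."""
    root = ET.fromstring(xml_text)
    concepts = [c.text for c in root.find("CONCEPTLIST").findall("CONCEPT")]
    edges = {}
    for relation in root.find("RELATIONLIST").findall("RELATION"):
        source, target = (c.text for c in relation.findall("CONCEPT"))
        edges[(source, target)] = float(relation.find("STRENGTH").text)
    return concepts, edges


concepts, edges = load_graph(CONCEPTGRAPH)
# Print each edge next to its reverse edge to expose the asymmetry.
for (source, target), weight in sorted(edges.items()):
    print(f"{source} -> {target}: {weight:.4f} (reverse: {edges.get((target, source))})")
```

Keeping the edges directed, rather than merging each pair into one undirected link, preserves the difference between the two strengths of a pair such as time_period and time.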

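The RDF output earlier in this appendix declares some concepts more than once with different parents: #piece_natural_language_text_display appears under #MISC and again under #display, and #ontology_generator_program under #MISC and under #program. In RDF such declarations accumulate, so each concept ends up with a set of parents. A minimal sketch, assuming Python's standard xml.etree.ElementTree is acceptable for this flat RDF/XML (a dedicated RDF parser would be the more robust choice), collects those parents. Only a short excerpt of the output is embedded, and the names NS, EXCERPT and concept_parents are hypothetical.

```python
# Sketch: collect rdfs:subClassOf parents per oe:Concept from the RDF output.
# ASSUMPTION: plain ElementTree is enough for this flat RDF/XML excerpt; it
# does not implement general RDF/XML parsing rules.
import xml.etree.ElementTree as ET
from collections import defaultdict

# Namespace URIs as declared on the rdf:RDF element of the output.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "oe": "http://ontoserver.cognit.no/otk_rdf#",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

# Excerpt of the RDF output (hypothetical trimming of the full listing).
EXCERPT = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
 xmlns:oe="http://ontoserver.cognit.no/otk_rdf#">
 <oe:Concept rdf:about="#display"><rdfs:label xml:lang="en">display</rdfs:label><rdfs:subClassOf rdf:resource="#Top"/></oe:Concept>
 <oe:Concept rdf:about="#program"><rdfs:label xml:lang="en">program</rdfs:label><rdfs:subClassOf rdf:resource="#Top"/></oe:Concept>
 <oe:Concept rdf:about="#piece_natural_language_text_display"><rdfs:label xml:lang="en">piece natural language text display</rdfs:label><rdfs:subClassOf rdf:resource="#MISC"/></oe:Concept>
 <oe:Concept rdf:about="#ontology_generator_program"><rdfs:label xml:lang="en">ontology generator program</rdfs:label><rdfs:subClassOf rdf:resource="#MISC"/></oe:Concept>
 <oe:Concept rdf:about="#piece_natural_language_text_display"><rdfs:label xml:lang="en">piece natural language text display</rdfs:label><rdfs:subClassOf rdf:resource="#display"/></oe:Concept>
 <oe:Concept rdf:about="#ontology_generator_program"><rdfs:label xml:lang="en">ontology generator program</rdfs:label><rdfs:subClassOf rdf:resource="#program"/></oe:Concept>
</rdf:RDF>"""


def concept_parents(rdf_xml):
    """Map each concept (its rdf:about value) to the set of classes it is a subclass of."""
    root = ET.fromstring(rdf_xml)
    parents = defaultdict(set)
    for concept in root.findall("oe:Concept", NS):
        about = concept.get(RDF_ABOUT)
        for sub in concept.findall("rdfs:subClassOf", NS):
            parents[about].add(sub.get(RDF_RESOURCE))
    return parents


for concept, supers in sorted(concept_parents(EXCERPT).items()):
    print(concept, "->", sorted(supers))
```

Run over the full listing, the same loop would also show #CCA_view_pane (under #MISC and #pane) and #output_option (under #MISC and #option) with two parents each.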