Knowl. Org. 34(2007)No.4

KNOWLEDGE ORGANIZATION KO Official Quarterly Journal of the International Society for Knowledge Organization ISSN 0943 – 7444

International Journal devoted to Concept Theory, Classification, Indexing and Knowledge Representation

Contents

International Society for Knowledge Organization. 11th General Assembly 2008. Agenda ... 196

Articles

Fulvio Mazzocchi, Melissa Tiberi, Barbara De Santis, and Paolo Plini. Relational Semantics in Thesauri: Some Remarks at Theoretical and Practical Levels ... 197

Guglielmo Trentin. Graphic Tools for Knowledge Representation and Informal Problem-Based Learning in Professional Online Communities ... 215

Jody L. DeRidder. The Immediate Prospects for the Application of Ontologies in Digital Libraries ... 227

Koraljka Golub, Thierry Hamon, and Anders Ardö. Automated Classification of Textual Documents Based on a Controlled Vocabulary in Engineering ... 247

Book Reviews

Murtha Baca, Patricia Harpring, Elisa Lanzi, Linda McRae, and Ann Whiteside (eds.). Cataloging Cultural Objects: A Guide to Describing Cultural Works and Their Images. Chicago: American Library Association, 2006. 396 p. ISBN 978-0-8389-3564-4 (pbk.) ... 264

Patrick Lambe. Organising Knowledge: Taxonomies, Knowledge and Organisational Effectiveness. Oxford: Chandos, 2007. xix, 277 p. ISBN 978-1-84334-228-1 (hbk.); 978-1-84334-227-4 (pbk.) ... 266

ISKO News ... 268

Knowledge Organization Literature 34 (2007) No.4 ... 269

Personal Author Index 34 (2007) No.4 ... 282

Contents pages

Mazzocchi, Fulvio, Tiberi, Melissa, De Santis, Barbara, and Plini, Paolo. Relational Semantics in Thesauri: An Overview and Some Remarks at Theoretical and Practical Levels. Knowledge Organization, 34(4), 197-214. 39 references.

ABSTRACT: A thesaurus is a controlled vocabulary designed to allow for effective information retrieval. It consists of different kinds of semantic relationships, with the aim of guiding users to the choice of the most suitable index and search terms for expressing a certain concept. The relational semantics of a thesaurus deal with methods to connect terms with related meanings and are intended to enhance information recall capabilities. In this paper, focused on hierarchical relations, different aspects of the relational semantics of thesauri, and among them the possibility of developing richer structures, are analyzed. Thesauri are viewed as semantic tools providing, for operational purposes, the representation of the meaning of the terms. The paper stresses how theories of semantics, holding different perspectives about the nature of meaning and how it is represented, affect the design of the relational semantics of thesauri. The need for tools capable of representing the complexity of knowledge and of the semantics of terms as it occurs in the literature of their respective subject fields is advocated. It is underlined how this would contribute to improving the retrieval of information. To achieve this goal, even though in a preliminary manner, we explore the possibility of setting against the framework of thesaurus design the notions of language games and hermeneutic horizon.

Trentin, Guglielmo. Graphic Tools for Knowledge Representation and Informal Problem-Based Learning in Professional Online Communities. Knowledge Organization, 34(4), 215-226. 24 references.

ABSTRACT: The use of graphical representations is very common in information technology and engineering. Although these same tools could be applied effectively in other areas, they are not used because they are hardly known or are completely unheard of. This article aims to discuss the results of the experimentation carried out on graphical approaches to knowledge representation during research, analysis and problem-solving in the health care sector. The experimentation was carried out on conceptual mapping and Petri Nets, developed collaboratively online with the aid of the CMapTool and WoPeD graphic applications. Two distinct professional communities have been involved in the research, both pertaining to the Local Health Units in Tuscany. One community is made up of head physicians and health care managers whilst the other is formed by technical staff from the Department of Nutrition and Food Hygiene. It emerged from the experimentation that concept maps are considered more effective in analyzing the knowledge domain related to the problem to be faced (description of what it is). On the other hand, Petri Nets are more effective in studying and formalizing its possible solutions (description of what to do). For the same reason, those involved in the experimentation have proposed the complementary rather than alternative use of the two knowledge representation methods as a support for professional problem-solving.

DeRidder, Jody L. The Immediate Prospects for the Application of Ontologies in Digital Libraries. Knowledge Organization, 34(4), 227-246. 53 references.

ABSTRACT: The purpose, scope, usage, methodology, cross-mapping and encoding of ontologies are summarized. A snapshot of current research and development includes available tools, ontologies, and query engines, with their applications. Benefits, problems, and costs are discussed, and the feasibility and usefulness of ontologies are weighed with respect to potential and current digital library arenas. The author concludes that ontology application potentially has a huge impact within knowledge management, enterprise integration, e-commerce, and possibly education. Outside of heavily funded domains, feasibility depends on assessment of various evolving factors, including the current tools and systems, level of adoption in the field, time and expertise available, and cost barriers.

Golub, Koraljka, Hamon, Thierry, and Ardö, Anders. Automated Classification of Textual Documents Based on a Controlled Vocabulary in Engineering. Knowledge Organization, 34(4), 247-263. 33 references.

ABSTRACT: Automated subject classification has been a challenging research issue for many years now, receiving particular attention in the past decade due to the rapid increase of digital documents. The most frequent approach to automated classification is machine learning. It, however, requires training documents and performs well on new documents only if these are similar enough to the former. We explore a string-matching algorithm based on a controlled vocabulary, which does not require training documents; instead it reuses the intellectual work put into creating the controlled vocabulary. Terms from the Engineering Information thesaurus and classification scheme were matched against the title and abstract of engineering papers from the Compendex database. Simple string-matching was enhanced by several methods such as term weighting schemes and cut-offs, exclusion of certain terms, and enrichment of the controlled vocabulary with automatically extracted terms. The best results are 76% recall when the controlled vocabulary is enriched with new terms, and 79% precision when certain terms are excluded. Precision of individual classes is up to 98%. These results are comparable to state-of-the-art machine-learning algorithms.

These contents pages may be reproduced without charge.
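The string-matching idea summarized in the Golub, Hamon, and Ardö abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the vocabulary entries, class names, weights, and cut-off value below are hypothetical, and the functions `classify` and `precision_recall` are invented for illustration. The sketch only shows the general shape of weighted term matching with a cut-off and term exclusion, followed by the standard precision and recall measures.

```python
import re
from collections import defaultdict

# Hypothetical controlled vocabulary: term -> (class name, weight).
# None of these entries come from the Engineering Information thesaurus.
VOCABULARY = {
    "heat transfer": ("heat", 2.0),
    "fluid flow": ("fluids", 2.0),
    "thermodynamics": ("thermo", 0.5),
}


def classify(text, vocabulary, excluded=frozenset(), cutoff=1.0):
    """Assign every class whose summed term weights reach the cut-off."""
    text = text.lower()
    scores = defaultdict(float)
    for term, (cls, weight) in vocabulary.items():
        if term in excluded:
            continue  # exclusion of certain terms
        # Count whole-word occurrences of the vocabulary term in the text.
        hits = len(re.findall(r"\b" + re.escape(term) + r"\b", text))
        if hits:
            scores[cls] += hits * weight
    return {cls for cls, score in scores.items() if score >= cutoff}


def precision_recall(assigned, correct):
    """Precision and recall of assigned classes against the correct ones."""
    true_positives = len(assigned & correct)
    precision = true_positives / len(assigned) if assigned else 0.0
    recall = true_positives / len(correct) if correct else 0.0
    return precision, recall


title = "Heat transfer in turbulent fluid flow: a thermodynamics perspective"
assigned = classify(title, VOCABULARY)
print(sorted(assigned))                                # ['fluids', 'heat']
print(precision_recall(assigned, {"heat", "thermo"}))  # (0.5, 0.5)
```

In this toy run the low-weight term falls below the cut-off, so one correct class is missed (lower recall) and one assigned class is wrong (lower precision). This is the kind of trade-off that weighting schemes and cut-offs are meant to tune.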

KNOWLEDGE ORGANIZATION

This journal is the organ of the INTERNATIONAL SOCIETY FOR KNOWLEDGE ORGANIZATION (General Secretariat: H. Peter OHLY, Social Science Information Center, Lennestr. 30, D-53113 Bonn, Germany).

Editors

Dr. Richard P. SMIRAGLIA (Editor-in-Chief), Palmer School of Library and Information Science, Long Island University, 720 Northern Blvd., Brookville NY 11548 USA.

Dr. Clément ARSENAULT (Book Review Editor), École de bibliothéconomie et des sciences de l’information, Université de Montréal, C.P. 6128, succ. Centre-ville, Montréal (QC) H3C 3J7, Canada.

Dr. Ia MCILWAINE (Literature Editor), Research Fellow, School of Library, Archive & Information Studies, University College London, Gower Street, London WC1E 6BT U.K.

Dr. Nancy WILLIAMSON (Classification Research News Editor), Faculty of Information Studies, University of Toronto, 140 St. George Street, Toronto, Ontario M5S 3G6 Canada.

Hanne ALBRECHTSEN, Institute of Knowledge Sharing, Bureauet, Slotsgade 2, 2nd floor DK-2200 Copenhagen N Denmark.

Gabriel MCKEE (Editorial Assistant), Palmer School of Library and Information Science, Long Island University.

Consulting Editors

Prof. Clare BEGHTOL, Faculty of Information Studies, University of Toronto, 140 St. George Street, Toronto, Ontario M5S 3G6, Canada.

Dr. Gerhard BUDIN, Dept. of Philosophy of Science, University of Vienna, Sensengasse 8, A-1090 Wien, Austria.

Prof. Jesús GASCÓN GARCÍA, Facultat de Biblioteconomia i Documentació, Universitat de Barcelona, C. Melcior de Palau, 140, 08014 Barcelona, Spain.

Claudio GNOLI, University of Pavia, Mathematics Department Library, via Ferrata 1, I-27100 Pavia, Italy.

Dr. Rebecca GREEN, Assistant Editor, Dewey Decimal Classification, Dewey Editorial Office, Library of Congress, Decimal Classification Division, 101 Independence Ave., S.E., Washington, DC 20540-4330, USA.

Dr. Birger HJØRLAND, Royal School of Library and Information Science, Copenhagen Denmark.

Dr. Barbara H. KWASNIK, Professor, School of Information Studies, Syracuse University, Syracuse, NY 13244 USA, (315) 443-4547 voice, (315) 443-4506 fax.

Dr. Jens-Erik MAI, Faculty of Information Studies, University of Toronto, 140 St. George Street, Toronto, Ontario M5S 3G6, Canada.

Ms. Joan S. MITCHELL, Editor in Chief, Dewey Decimal Classification, OCLC Online Computer Library Center, Inc., 6565 Frantz Road, Dublin, OH 43017-3395 USA.

Dr. Widad MUSTAFA el HADI, URF IDIST, Université Charles de Gaulle Lille 3, BP 149, 59653 Villeneuve D’Ascq, France.

H. Peter OHLY, IZ Sozialwissenschaften, Lennestr. 30, 53113 Bonn Germany.

Dr. Hope A. OLSON, School of Information Studies, 522 Bolton Hall, University of Wisconsin-Milwaukee, Milwaukee, WI 53201 USA.

Ms. Annelise Mark PEJTERSEN, Systems Analysis Dept., Risoe National Laboratory, P.O. Box 49, DK-4000 Roskilde, Denmark.

Dr. M. P. SATIJA, Guru Nanak Dev University, School of Library and Information Science, Amritsar-143 005, India.

Prof. Dr. J.F. (Jos) SCHREINEMAKERS, School of Sciences, Department of Mathematics and Computer Science, Section Business Informatics / Informatiekunder, Vrije Universiteit Amsterdam, De Boelelaan 1081a, U3.56, 1081 HV Amsterdam, Netherlands.

Dr. Otto SECHSER, In der Ey 37, CH-8047 Zürich, Switzerland.

Dr. Winfried SCHMITZ-ESSER, Salvatorgasse 23, 6060 Hall, Tirol, Austria.

Dr. Dagobert SOERGEL, College of Information Studies, Hornbake Bldg. (So. Wing), Room 4105, University of Maryland, College Park, MD 20742.

Dr. Eduard R. SUKIASYAN, Vozdvizhenka 3, RU-101000, Moscow, Russia.

Dr. Joseph A. TENNIS, School of Library, Archival and Information Studies, University of British Columbia, 301 - 6190 Agronomy Road, Vancouver, BC V6T 1Z3, Canada.

Dr. Martin van der WALT, Department of Information Science, University of Stellenbosch, Private Bag X1, Stellenbosch 7602, South Africa.

Prof. Dr. Harald ZIMMERMANN, Softex, Schmollerstrasse 31, D-66111 Saarbrücken, Germany.

Founded under the title International Classification in 1974 by Dr. Ingetraut Dahlberg, the founding president of ISKO. Dr. Dahlberg served as the journal's editor from 1974 to 1997, and as its publisher (Indeks Verlag of Frankfurt) from 1981 to 1997.

Publisher

ERGON-Verlag, Grombühlstr. 7, GER-97080 Würzburg. Phone: +49 (931) 280084; FAX +49 (931) 282872; http://www.ergon-verlag.de

Editor-in-chief (Editorial office)

Dr. Richard P. SMIRAGLIA (Editor-in-Chief), Palmer School of Library and Information Science, Long Island University, 720 Northern Blvd., Brookville NY 11548 USA.

Instructions for Authors

Manuscripts should be submitted electronically (in Word, WordPerfect, or RTF format) in English only to the editor-in-chief and should be accompanied by an indicative abstract of 100 or 200 words. Submissions via email are preferred; submissions will also be accepted via post provided that they are accompanied by a 3.5” diskette encoded in Word, WordPerfect, or RTF format.

A separate title page should include the article title and the author’s name, postal address, and E-mail address, if available. Only the title of the article should appear on the first page of the text. To protect anonymity, the author’s name should not appear on the manuscript, and all references in the body of the text and in footnotes that might identify the author to the reviewer should be removed and cited on a separate page. Articles that do not conform to these specifications will be returned to authors.

Criteria for acceptance will be appropriateness to the field of the journal (see Scope and Aims), taking into account the merit of the contents and presentation. The manuscript should be concise and should conform as much as possible to professional standards of English usage and grammar. Manuscripts are received with the understanding that they have not been previously published, are not being submitted for publication elsewhere, and that if the work received official sponsorship, it has been duly released for publication. Submissions are refereed, and authors will usually be notified within 6 to 10 weeks. Unless specifically requested, manuscripts and illustrations will not be returned.

The text should be structured by numbered subheadings. It should contain an Introduction, giving an overview and stating the purpose, a main body, describing in sufficient detail the materials or methods used and the results or systems developed, and a conclusion or summary.

Reference citations within the text should have the following form: (author year). For example, (Jones 1990). Specific page numbers are optional, but preferred when applicable, e.g. (Jones 1990, 100). A citation with two authors would read (Jones & Smith, 1990); three or more authors would be: (Jones et al., 1990). When the author is mentioned in the text, only the date and optional page number should appear in parentheses – e.g. According to Jones (1990), …

References should be listed alphabetically by author at the end of the article. Author names should be given as found in the sources (not abbreviated). Journal titles should not be abbreviated. Multiple citations to works by the same author should be listed chronologically and should each include the author’s name. Articles appearing in the same year should have the following format: “Jones 2005a, Jones 2005b, etc.” Issue numbers are given only when a journal volume is not through-paginated. Examples:

Dahlberg, Ingetraut. 1978. A referent-oriented, analytical concept theory for INTERCONCEPT. International classification 5: 142-51.

Howarth, Lynne C. 2003. Designing a common namespace for searching metadata-enabled knowledge repositories: an international perspective. Cataloging & classification quarterly 37n1/2: 173-85.

Pogorelec, Andrej and Šauperl, Alenka. 2006. The alternative model of classification of belles-lettres in libraries. Knowledge organization 33: 204-14.

Schallier, Wouter. 2004. On the razor’s edge: between local and overall needs in knowledge organization. In McIlwaine, Ia C. ed., Knowledge organization and the global information society: Proceedings of the Eighth International ISKO Conference 13-16 July 2004 London, UK. Advances in knowledge organization 9. Würzburg: Ergon Verlag, pp. 269-74.

Smiraglia, Richard P. 2001. The nature of ‘a work’: implications for the organization of knowledge. Lanham, Md.: Scarecrow.

Smiraglia, Richard P. 2005. Instantiation: Toward a theory. In Vaughan, Liwen, ed. Data, information, and knowledge in a networked world; Annual conference of the Canadian Association for Information Science … London, Ontario, June 2-4 2005. Available http://www.cais-acsi.ca/2005proceedings.htm.

Footnotes are not permitted; all narration should be included in the text of the article.

Illustrations should be kept to a necessary minimum and should be submitted electronically when possible. Photographs (including color and half-tone) should be scanned with a minimum resolution of 600 dpi and saved as .tif files (Tagged Image File Format preferred). Tables and figures should be embedded within the document or, alternatively, saved as separate files with clear instructions indicating their placement in the text. Tables should contain a number and title at the top, and all columns and rows should have headings. All illustrations should be cited in the text as Figure 1, Figure 2, etc. or Table 1, Table 2, etc. Illustrations submitted in hard copy only should be marked to indicate their placement in the text.

Upon acceptance of a manuscript for publication, authors must provide a wallet-size photo and a one-paragraph biographical sketch. The photograph should be scanned with a minimum resolution of 600 dpi and saved as a .tif file (Tagged Image File Format).

Advertising Responsible for advertising: Dr. H.-J. Dietrich, ERGON-Verlag, Grombühlstr. 7, 97080 Würzburg (Germany).

© 2007 by ERGON-Verlag Dr. H.-J. Dietrich. All Rights reserved. KO is published quarterly by ERGON-Verlag. The price is € 115,00/ann. including airmail delivery.

Scope

The more scientific data is generated in the impetuous present times, the more ordering energy needs to be expended to control these data in a retrievable fashion. With the abundance of knowledge now available the questions of new solutions to the ordering problem and thus of improved classification systems, methods and procedures have acquired unforeseen significance. For many years now they have been the focus of interest of information scientists the world over.

Until recently, the special literature relevant to classification was published in piecemeal fashion, scattered over the numerous technical journals serving the experts of the various fields such as:

philosophy and science of science
science policy and science organization
mathematics, statistics and computer science
library and information science
archivistics and museology
journalism and communication science
industrial products and commodity science
terminology, lexicography and linguistics

Beginning in 1974, KNOWLEDGE ORGANIZATION (formerly INTERNATIONAL CLASSIFICATION) has been serving as a common platform for the discussion of both theoretical background questions and practical application problems in many areas of concern. In each issue experts from many countries comment on questions of an adequate structuring and construction of ordering systems and on the problems of their use in opening the information contents of new literature, of data collections and survey, of tabular works and of other objects of scientific interest. Their contributions have been concerned with

(1) clarifying the theoretical foundations (general ordering theory/science, theoretical bases of classification, data analysis and reduction)

(2) describing practical operations connected with indexing/classification, as well as applications of classification systems and thesauri, manual and machine indexing

(3) tracing the history of classification knowledge and methodology

(4) discussing questions of education and training in classification

(5) concerning themselves with the problems of terminology in general and with respect to special fields.

Aims

Thus, KNOWLEDGE ORGANIZATION is a forum for all those interested in the organization of knowledge on a universal or a domain-specific scale, using concept-analytical or concept-synthetical approaches, as well as quantitative and qualitative methodologies. KNOWLEDGE ORGANIZATION also addresses the intellectual and automatic compilation and use of classification systems and thesauri in all fields of knowledge, with special attention being given to the problems of terminology.

KNOWLEDGE ORGANIZATION publishes original articles, reports on conferences and similar communications, as well as book reviews, letters to the editor, and an extensive annotated bibliography of recent classification and indexing literature.

KNOWLEDGE ORGANIZATION should therefore be available at every university and research library of every country, at every information center, at colleges and schools of library and information science, in the hands of everybody interested in the fields mentioned above and thus also at every office for updating information on any topic related to the problems of order in our information-flooded times.

KNOWLEDGE ORGANIZATION was founded in 1973 by an international group of scholars with a consulting board of editors representing the world’s regions, the special classification fields, and the subject areas involved. From 1974-1980 it was published by K.G. Saur Verlag, München. Back issues of 1978-1992 are available from ERGON-Verlag, too.

As of 1989, KNOWLEDGE ORGANIZATION has become the official organ of the INTERNATIONAL SOCIETY FOR KNOWLEDGE ORGANIZATION (ISKO) and is included for every ISKO member, personal or institutional, in the membership fee (US $ 55/US $ 110).

Rates: From 2006 on for 4 issues/ann. (including indexes) € 115,00 (forwarding costs included). Membership rates see above. ERGON-Verlag, Grombühlstr. 7, GER-97080 Würzburg; Phone: +49 (931) 280084; FAX +49 (931) 282872; http://www.ergon-verlag.de

The contents of this journal are indexed and abstracted in Referativnyi Zhurnal Informatika and in the following online databases: Information Science Abstracts, INSPEC, Library and Information Science Abstracts (LISA), Library Literature, PASCAL, Sociological Abstracts, and Web Science & Social Sciences Citation Index.

International Society for Knowledge Organization

11th General Assembly 2008

Agenda

I am most pleased to invite you to the 11th ISKO General Assembly, which will take place in August 2008 in Montréal, Canada, at the 10th ISKO Conference. All ISKO members are encouraged to attend both the Conference and the General Assembly. The proposed Agenda is as follows:

1. Opening: Election of the General Assembly Chair and the Secretary
2. Approval of & Additions to the Agenda
3. Report of the President
4. Report of the Secretary and Treasurer
5. Report of the editor of the journal Knowledge Organization
6. New ISKO Chapters
7. Reports of the Representatives of ISKO Regional and National Chapters
8. The Eleventh International ISKO Conference
9. Elections of members for the Executive Committee
   a. Election of Secretary/Treasurer
   b. Election of two EC members
10. Any other business

I look forward very much to seeing as many ISKO members as possible at the 10th International Conference in Montréal and at this 11th General Assembly.

María J. López-Huertas, ISKO President.

Relational Semantics in Thesauri: Some Remarks at Theoretical and Practical Levels

Fulvio Mazzocchi*, Melissa Tiberi **, Barbara De Santis ***, Paolo Plini ****

* / *** / **** Institute for Atmospheric Pollution of CNR, Via Salaria km 29,300, Monterotondo staz., 00015 (RM), Italy

** Central National Library of Florence, Piazza dei Cavalleggeri, 1, I-50122 Florence, Italy

Fulvio Mazzocchi works as a researcher at the Institute for Atmospheric Pollution of the Italian National Research Council in Monterotondo (RM). He studied biological sciences and philosophy at ‘La Sapienza’ University in Rome. He has participated in a number of projects concerned with the design and implementation of thesauri for the environmental domain, such as EARTh and GEMET. His current research interests include the epistemological foundations of, and semantics in relation to, knowledge organization.

Melissa Tiberi obtained a degree in philosophy at ‘La Sapienza’ University in Rome. At present, she is working as an external consultant for the National Central Library in Florence, where she is taking part in the development of the Thesaurus of the Nuovo Soggettario. In the past, she also collaborated on the development of EARTh, researching the different kinds of semantic relationships and implementing them in the thesaurus.

Barbara De Santis obtained a degree in interpreting and translating (languages: English and German) at Bologna University. At present she is working as an external consultant for the Italian National Research Council on the development of the EARTh project, concentrating mainly on multilingual aspects within thesauri.

Paolo Plini was born in 1960 in Rome. He graduated in 1984 in Natural Sciences from the University of Rome. Since 1994 he has been a researcher at the Italian National Research Council. At present he is the scientific head of the Environmental Knowledge Organisation Laboratory of the Institute for Atmospheric Pollution. His main activities are focused on the design and management of EARTh (Environmental Applications Reference Thesaurus) and of other thesauri on specific environmental topics.

Mazzocchi, Fulvio, Tiberi, Melissa, De Santis, Barbara, and Plini, Paolo. Relational Semantics in Thesauri: An Overview and Some Remarks at Theoretical and Practical Levels. Knowledge Organization, 34(4), 197-214. 39 references.

ABSTRACT: A thesaurus is a controlled vocabulary designed to allow for effective information retrieval. It consists of different kinds of semantic relationships, with the aim of guiding users to the choice of the most suitable index and search terms for expressing a certain concept. The relational semantics of a thesaurus deal with methods to connect terms with related meanings and are intended to enhance information recall capabilities. In this paper, focused on hierarchical relations, different aspects of the relational semantics of thesauri, and among them the possibility of developing richer structures, are analyzed. Thesauri are viewed as semantic tools providing, for operational purposes, the representation of the meaning of the terms. The paper stresses how theories of semantics, holding different perspectives about the nature of meaning and how it is represented, affect the design of the relational semantics of thesauri. The need for tools capable of representing the complexity of knowledge and of the semantics of terms as it occurs in the literature of their respective subject fields is advocated. It is underlined how this would contribute to improving the retrieval of information. To achieve this goal, even though in a preliminary manner, we explore the possibility of setting against the framework of thesaurus design the notions of language games and hermeneutic horizon.

A thesaurus is a controlled vocabulary designed to allow for successful information retrieval (IR). It includes different types of semantic relationships that guide indexers and searchers to the selection of the most suitable terms for expressing given concepts/queries (Dextre Clarke 2001). The relational semantics of a thesaurus are concerned with methods to connect terms with related meanings and are constituted by the set of meaning relationships. Three basic relationships typify a traditional thesaurus: hierarchical, associative, and equivalence. Being functional and not semantic tools stricto sensu, in most cases thesauri do not provide a complete and precise definition of the meaning of terms (Schmitz Esser 1991). The relational structure is designed, in fact, mainly to enhance the information recall performance (Svenonius 2000). Nonetheless, thesauri can still be regarded as (operational) semantic tools in the sense that thesaurus relations are semantic relations and that a thesaurus provides the conceptual structure of a subject field (Hjørland 2007).

A number of scholars have stressed the importance of semantic research in relation to information science (IS), and in particular to its subfield of knowledge organization, which is concerned with “the construction, use, and evaluation of semantic tools for IR” (Hjørland 2007, 369). The way meaning is understood can, in fact, have a considerable impact on how knowledge organization systems (KOSs), such as thesauri, and their relational semantics are designed and implemented. Although at some levels the primary relationships employed in a thesaurus reflect certain basic cognitive inclinations of the human form of life (such as the one towards classification and hierarchization), they are not ‘given’ as such, and thus necessarily and universally valid, but ‘constructed’ and defined within a certain (cultural and) theoretical tradition. In some cases, they are even based on assumptions rooted in centuries of the history of philosophy (Hjørland 2007), as occurs with the notions of genus and species, whose origin can be traced back to Aristotle and which are based on an idea of meaning that has been predominant in Western culture.

A more detailed discussion of this topic is beyond the scope of this paper. It would require further investigation into the nature of semantic relations as mostly theoretical constructs, built within the framework of a cultural form of life (Wittgenstein 1953). The latter is, however, an expression of a more basic human form of life, which defines our primary cognitive means and the other basic characteristics we share as members of the same species. A number of models of conceptualization of the world have crystallized, and with them certain ways of regarding the relationships between words as meaningful. In Western culture, some of these relations (genus-species, synonyms, antonyms, etc.) are common to all knowledge fields. Others are more specific to particular domains (in a thesaurus they can be represented as subkinds of the associative relationship). However, the implementation of any relation always depends on the conceptual and linguistic knowledge of the domain it refers to (in a thesaurus it depends on operational concerns as well).

Thus, in order to acquire a deeper understanding of KOSs as operational semantic tools, it is important to investigate which theories lie behind the principles determining how the relations are to be established. At the same time, it is also important to explore whether other theoretical approaches exist and whether they can provide useful insights into such issues. An opportunity to examine this topic more deeply is offered by a new trend affecting thesauri. In recent years thesauri have entered a larger area of application that includes knowledge and language engineering. As a consequence, in this new framework and for present and future information retrieval and intelligent processing needs, the thesaurus relational structure is likely to require an enlargement and a refinement of its definition. To achieve these goals, a more thoughtful exploration of the theoretical bases that guide its development appears to be necessary.

This paper analyzes different aspects of the relational semantics of thesauri (the focus is restricted to the hierarchical relationship) and is structured as follows. Section 1 presents the basic roles of relational semantics in thesauri as well as the current trend towards its refinement. After introducing in section 2 the difference between the instance and the generic relationships, in section 3 we investigate a number of issues involved in the meaning representation that occurs in thesauri through the classificatory and taxonomic aspects of their relational semantics, such as the criteria upon which the construction of (logical) hierarchical trees is normally based and the distinction between genus-species and perspective hierarchies. In this framework, we also explore what insights may be gained from the perspective of hermeneutics and from Wittgenstein’s notion of language game, together with their possible practical implications for the retrieval of information. Section 4 analyzes the partitive relationship and the possibility of refining it through a differentiation into distinct subkinds. An overview of existing taxonomies of partitive relations is also presented. Taking the partitive relationship as a case study, we also outline a more general discussion of the factors on which the choice of the kinds of relations, as well as their implementation, depends.

1. Relational semantics in thesauri: its role and possible refinement

1.1 The (general) role of the relational semantics

Thesauri are tools designed for the purpose of improving information retrieval. They are based on a natural language that is transformed, however, by means of certain semantic treatments, into an ‘artificial’ and normalized language where terms are basically monosemic and relations among them are made explicit. Two different semantic structures are used to achieve this aim: the referential and the relational semantics (Svenonius 2000). Referential semantics consists of methods to limit the meanings or referents of thesaurus terms: homonyms and polysemes are disambiguated in order to improve precision in IR.

It is through the relational semantics of a thesaurus, the object of interest of this paper, that terms are connected to each other when related meanings are identified. This creates the relational structure that enhances information recall performance, although it can also help to improve precision by suggesting more specific terms that refine the search and help to eliminate unwanted information. The network of relations of a thesaurus plays a semantic role, since it provides a further representation of the meaning of each thesaurus term and a structured representation of the general understanding of a subject area. As stated by Soergel (1995, 369), in fact, “a good thesaurus provides, through its hierarchy augmented by associative relationships between concepts, a semantic road map for searchers and indexers and anybody else interested in an orderly grasp of a subject field”.

1.2 Trend towards a refinement of the relational semantics

Bearing in mind these important functions of the relational structure, it is then necessary to define the degree of complexity on the basis of which the thesaurus is conceived, in order to ensure its effectiveness for information indexing and retrieval. Methods to measure its richness have already been developed. Examples range from the number of relation types to more sophisticated indicators, e.g. the ratio of the number of semantic relations to the number of terms included in a thesaurus (Van Slype 1976). The traditional thesaurus format, which stems from the recommendations of the Standard for thesaurus development, now more than twenty years old, was created to cope with information needs in the library and archival fields (Schmitz Esser 1991).
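The two kinds of richness indicator mentioned above can be illustrated with a minimal Python sketch. The toy thesaurus, the tuple layout and the function names are assumptions made for illustration; the ratio follows the verbal description given here and is not claimed to reproduce the exact formula of Van Slype (1976).

```python
# Toy thesaurus: each relation is (source term, relation type, target term).
# Terms and links are invented for illustration only.
relations = [
    ("mammals", "BT", "animals"),
    ("animals", "NT", "mammals"),
    ("parrots", "BT", "birds"),
    ("birds", "NT", "parrots"),
    ("birds", "BT", "animals"),
    ("animals", "NT", "birds"),
    ("parrots", "RT", "pets"),
    ("pets", "RT", "parrots"),
]

def relation_type_count(rels):
    """Simplest indicator: how many distinct relation types are used."""
    return len({rel_type for _, rel_type, _ in rels})

def relations_per_term(rels):
    """Ratio of the number of relations to the number of distinct terms."""
    terms = {s for s, _, _ in rels} | {t for _, _, t in rels}
    return len(rels) / len(terms)

print(relation_type_count(relations))  # 3
print(relations_per_term(relations))   # 1.6
```

Whether reciprocal links (BT and NT) are counted once or twice is a design choice; here each direction is counted separately.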

However, many things have changed and are presently changing (this has been partially reflected in the development of new Standards like ANSI/NISO Z39.19-2005). Technological advance, which has also brought a larger and more differentiated community to search for information by computer, has established a different framework, which requires reassessing prior assumptions and reconsidering whether the existing types of relationships still cope with the current needs of information organization. Indeed, a rather widespread opinion is that the traditional thesaurus format is no longer the best-suited means of dealing with these needs. It seems that a richer and hierarchically organized set of relations would be better suited to meeting them and, as stated by Milstead (2001, 65):

There is reason to expect that provision of semantic relationships in controlled vocabularies will become much more extensive in a future standard, though this does not automatically mean that users will need to be aware of all kinds of relationships in order to use a particular vocabulary.


Despite the general trend towards an expansion of the semantic structure, the outcome of some past experiments comparing systems that incorporate different degrees of semantic structure seems to call into question the equation ‘more structure, more effectiveness’. Besides, in order to evaluate the effectiveness of a semantic structure in IR, other factors should be considered too, such as the comprehensiveness of the language or the manipulation in retrieval of the subject language (Svenonius 2000). Nevertheless, refinement is necessary to enhance thesaurus suitability for use in artificial intelligence (AI) and Semantic Web environments, as well as to increase possibilities for IR. In particular, AI applications are creating a demand for more elaborate KOSs able to ensure higher expressive capabilities in order to allow inference (Dextre Clarke 2001). In such a setting, the traditional relational structure is considered insufficiently detailed and lacking a well-defined semantics. “All the well-known relationships are fuzzy in most thesauri. We could afford to allow them to be fuzzy as long as their only purpose was to achieve the desired degree of order in our documents, which is a modest requirement compared with what we need for Language and Knowledge Engineering” (Schmitz Esser 1991, 145).

Hence, the refinement of the relational semantics might enable richer (conceptual and lexical) user interaction with the KOS by improving query formulation and subject browsing. New applications for which such refinement is advocated include support for automated processing, query expansion, RDF representations of thesauri for the Semantic Web, and interoperability among different KOSs (Soergel et al., 2004; Tudhope et al., 2001).

Finally, the adoption of more expressive semantic relations is also advised in order to improve the degree of internal structural consistency. In many cases, in fact, the standard set of relationships has not been consistently applied (for instance, many links labelled as hierarchical would be better expressed through an associative relationship). For some authors, this is precisely a consequence of the fact that thesaurus relationships are not provided with a precise semantics (Soergel et al., 2004).

Some advanced thesauri, mainly in the medical domain (such as UMLS or MeSH), are developing or have already included richer sets of semantic relationships. A further example is the Italian CNR’s EARTh project (Mazzocchi & Plini, 2005). Other projects, such as the FAO’s AGROVOC, are instead more concerned with the reengineering of thesauri into ontologies. They aim at developing an enriched set of relationships, explicitly labelled and applied with specified rules and constraints, on the basis of a more fully concept-oriented organizational model, where concepts are regarded as independent from and preceding their designation (Soergel et al., 2004). Indeed, the approach of building thesauri with an extended relational structure partially converges with the idea and work behind ontology development. An investigation of ontologies, however, is not the focus of the present paper, even though a number of assumptions normally associated with them are part of the discussion.

The idea of developing thesauri and other KOSs with a more precise and richer semantics, of using formal logic methods, and of employing a notion of concept as if it were an a priori entity can be viewed as expressions of the same theoretical point of view, based on logical positivism. The aim is to create the conditions for an unambiguous interpretation of terms and relationships, mainly to make KOSs suitable for AI applications. According to Svenonius (2004, 585):

The knowledge representations resting upon the epistemological foundations of logical positivism in its operationalist and representational approaches to meaning are … formalized to a greater degree and as such are simpler, more uniform, and relatively free from subjective interpretation. The objectivity they provide through definitional rigor is essential for automated applications in retrieval.

This idea of objectivity, however, conflicts with the fact that meanings and semantic structures in KOSs are always established within a given horizon (reflecting certain theoretical views and applied to specific knowledge domains and operational contexts).

While, of course, the choice can be made to reduce the complexity of reality for operational purposes, and attempts to narrow it down to such an extent that it becomes manageable are not rare in the AI tradition, a better refinement and specification of relations or the adoption of a logicist view of semantics does not in itself eliminate the issues posed by this complexity.

The role played by human judgement in such a task and the multiplicity of different contexts in which all of this can occur cannot, in fact, be ignored. This is something that we will try to demonstrate throughout the paper, with a special focus, though, on the hierarchical relationships.

2 An introductory note on the hierarchical relationship in thesauri

The hierarchical relationship connects pairs of terms when the scope of the broader term (BT) fully includes the scope of the narrower term (NT). Generally speaking, the purpose of the hierarchical relationship is to provide a semantic tree pathway, which can be useful both as a tool for semantic control and specification (the meaning of each term is, in fact, partially identified by its position within the tree) and as a navigational aid, by offering users the possibility of choosing the terms to employ when referring to a certain concept from a range situated at different levels of specificity (Dextre Clarke 2001). This relation comprises three different kinds: generic, instantial and partitive. In a restricted number of thesauri they are distinguished as follows:

BTG/NTG: generic
BTP/NTP: partitive
BTI/NTI: instantial
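As an illustration of how a thesaurus might record these three subkinds, here is a minimal Python sketch. The class name, the tuple layout and the partitive example pair are assumptions made for illustration and do not describe any particular thesaurus software.

```python
from enum import Enum

class HierKind(Enum):
    GENERIC = "G"     # BTG/NTG
    PARTITIVE = "P"   # BTP/NTP
    INSTANTIAL = "I"  # BTI/NTI

# (narrower term, broader term, kind). The partitive pair is invented.
links = [
    ("mammals", "animals", HierKind.GENERIC),
    ("Sahara desert", "deserts", HierKind.INSTANTIAL),
    ("engines", "cars", HierKind.PARTITIVE),
]

def labels(kind):
    """Return the broader/narrower tag pair for a relation kind."""
    return f"BT{kind.value}", f"NT{kind.value}"

def display(narrower, broader, kind):
    """Render both directions of one typed hierarchical link."""
    bt, nt = labels(kind)
    return [f"{narrower} {bt} {broader}", f"{broader} {nt} {narrower}"]

for link in links:
    print(display(*link))
```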

The next section will first introduce the generic and instance relationships. A discussion of the main features of the generic relation and a comparison with perspective hierarchies will follow. Special emphasis will be placed on how any given classification or hierarchization of a term depends on which of its conceptual features are made salient in the light of a given perspective. Section 4 will then analyze the partitive relationship.

3 The generic and instance relationships

The generic relationship, also called inclusion, subsumption or hyponymy, connects a genus with its species (e.g., animals/mammals). An important property of this relation, also used as a criterion for its identification, is the inheritance of properties: any attributes of the genus (hypernym) must also be attributable to the species (hyponym). In this sense, the meaning of the hyponym derives from the meaning of the hypernym, plus some additional features. Chaffin et al. (1988) distinguished four kinds of inclusion according to the type of concept involved: natural object-kind; artefact-kind; state-kind; and activity-kind. In the instance relationship the narrower terms are neither parts nor types, but individual instances of the broader terms. In a thesaurus, this characteristic of individuality is expressed through a proper name (e.g., deserts/Sahara desert).
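The inheritance property of the generic relationship described above can be sketched in Python as follows. The dictionary layout and the attribute names are invented for illustration; the sketch only shows the mechanism of "hypernym meaning plus additional features".

```python
# Each term lists its broader generic term (or None) and the attributes
# it adds. Attribute names are invented for illustration.
generic_tree = {
    "animals": {"broader": None, "adds": {"is alive", "moves"}},
    "mammals": {"broader": "animals", "adds": {"nurses young"}},
    "whales": {"broader": "mammals", "adds": {"lives in water"}},
}

def attributes(term):
    """Collect a term's own attributes plus those of every hypernym."""
    collected = set()
    while term is not None:
        node = generic_tree[term]
        collected |= node["adds"]
        term = node["broader"]
    return collected

def inherits_everything(hyponym, hypernym):
    """Inheritance criterion: genus attributes must hold for the species."""
    return attributes(hypernym) <= attributes(hyponym)

print(inherits_everything("whales", "animals"))  # True
print(inherits_everything("animals", "whales"))  # False
```

In this toy model the criterion holds by construction for every descendant; in a real thesaurus it would be a check applied to candidate links by an editor.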

At this stage, the distinction between the generic relationship and instantiation seems clearly stated. Nonetheless, Milstead (2001) has emphasized that no method used in the standards for thesauri to determine the genus-species relationship could not also be applied to the instance relationship. For example, the ‘all-and-some’ test, which is used to assess the validity of generic links (ISO 1986), can be applied to both cases (if grammatical differences in number are admitted). The same is also true for ‘is a’ attribution:

1a. All mammals are animals / Some animals are mammals
1b. All (although only one exists) Sahara desert are deserts / Some (one) deserts are (is) Sahara desert
2a. A mammal is an animal
2b. The Sahara desert is a desert

All of this may also lead one to conceive of the instance relationship as a variant of the genus-species relationship. However, unlike the generic relationship (a concept-to-concept relationship), the instance relationship involves a change of ‘logical level’ (an individual-to-concept relation).

3.1 Associative, perspective and logically-based hierarchies

The hierarchical relationship, and particularly the generic kind, is perhaps the most important within a thesaurus and its proper application plays a key role in ensuring the quality of a structured vocabulary. But can we estimate such aptness in an abstract sense? It is true that in many thesauri this relationship has been implemented in quite an inconsistent way, often resulting in unpredictable semantic structures (Dextre Clarke 2001).

As mentioned before, a higher degree of rigour is thus advocated to improve the level of structural consistency. Nonetheless, different contexts may require different solutions, each having its own implications. Furthermore, it is of the utmost importance to investigate the underlying assumptions entailed by the generic relationship, on the basis of which hierarchical trees are built, not only to deepen our understanding of it, but also to be able to analyze these assumptions critically in the light of a comparison with alternative models.

3.1.1 RT-kind version of hierarchy

Many existing thesauri have labelled as hierarchical relations between terms that do not belong to the same conceptual category. An example can be found in the GEMET thesaurus, where the term Recycling ratio (a parameter) is considered to be a Narrower Term of Recycling (an operation). Relationships like this have been established according to a definition of hierarchy that is of a ‘pragmatic’ nature and oriented towards the function of the search process: “Concept A is broader than concept B whenever the following holds: in any inclusive search for A all items dealing with B should be found. Conversely B is narrower than A” (Soergel 1974, 79).
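The pragmatic definition quoted above can be read as a statement about search behaviour. A minimal Python sketch of such an ‘inclusive search’ follows; the document identifiers and the index are hypothetical, and only the Recycling / Recycling ratio pair comes from the text.

```python
# narrower[A] lists the terms recorded as NT of A, regardless of whether
# the link is a logical genus-species one.
narrower = {
    "recycling": ["recycling ratio"],
    "recycling ratio": [],
}

# Hypothetical index: term -> identifiers of documents indexed with it.
index = {
    "recycling": {"doc1", "doc2"},
    "recycling ratio": {"doc3"},
}

def inclusive_search(term):
    """Return items indexed with the term or with any of its narrower terms."""
    hits = set(index.get(term, set()))
    for nt in narrower.get(term, []):
        hits |= inclusive_search(nt)
    return hits

print(sorted(inclusive_search("recycling")))  # ['doc1', 'doc2', 'doc3']
```

Under this reading, any link that makes the inclusive search return the desired items counts as hierarchical, which is why a parameter can end up as a narrower term of an operation.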

Using such a version of the hierarchical relation can be useful for managing certain databases. But while it may function efficiently at local levels, i.e. in a specific operative context, in a different and wider framework this choice may prove unsatisfactory, since a hierarchy developed in this way would lack consistency with other structures, as it does not conform to the standard thesaurus format. Moreover, confusion may also arise if RT-kind (associative) hierarchies, like the above example, are labelled in the same way as the genus-species relation (or in any case as a hierarchical kind).

3.1.2 Genus-species and perspective hierarchies

In developing the thesaural relational structure, and thus hierarchies, Foskett (1980) emphasized the importance of the logical perspective: a thesaurus would benefit if the choice of terms and relationships reflected the logical structure of a subject field, instead of being a scarcely systematized gathering of terms extracted from the literature. Other authors, such as Maniez (1988), stressed that the usefulness of logical relationships should be subordinated to the purposes of information indexing and retrieval. Svenonius (2000), for her part, underlines the distinction between genus-species and perspective hierarchies. In a more general sense, this distinction, taken up by a number of thesaurus standards, is expressed as being between paradigmatic/a priori relations (e.g., genus-species) and syntagmatic/a posteriori ones (among them, perspective hierarchies). The genus-species relationship is viewed as logically based, definitionally true and functioning independently of context. Besides, corresponding to the logical relationship of inclusion, it has been defined in terms of the properties of reflexivity, antisymmetry and transitivity.
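The three properties named above (reflexivity, antisymmetry, transitivity) are those of a partial order. As a minimal sketch, assuming a toy set of direct genus-species links, the following Python code builds the reflexive-transitive closure of the links and checks the three properties; the terms and function names are illustrative.

```python
from itertools import product

# Direct genus-species links (narrower, broader); illustrative terms.
direct = {("parrots", "birds"), ("birds", "animals")}
terms = {t for pair in direct for t in pair}

def closure(pairs, universe):
    """Reflexive-transitive closure of the direct links."""
    rel = set(pairs) | {(t, t) for t in universe}
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(rel), repeat=2):
            if b == c and (a, d) not in rel:
                rel.add((a, d))
                changed = True
    return rel

def is_partial_order(rel, universe):
    """Check reflexivity, antisymmetry and transitivity of a relation."""
    reflexive = all((t, t) in rel for t in universe)
    antisymmetric = all(a == b or (b, a) not in rel for a, b in rel)
    transitive = all((a, d) in rel
                     for (a, b), (c, d) in product(rel, repeat=2) if b == c)
    return reflexive and antisymmetric and transitive

subsumed_by = closure(direct, terms)
print(("parrots", "animals") in subsumed_by)  # True
print(is_partial_order(subsumed_by, terms))   # True
```

A cycle such as A BT B together with B BT A would violate antisymmetry, which is one way inconsistent hierarchical links could be detected automatically.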

Conversely, perspective hierarchies are regarded as functioning more contingently in given empirical contexts and depending on the point of view. Normally, they do not have the same logical properties as generic hierarchies. They express, in fact (Svenonius 2000, 164):

Points of view or aspects from which an object or concept is regarded. In many discipline-based classifications, the point of view is the knowledge domain in which the object or concept is located. … The genus-species relationship limits a rat to being a rodent; a perspective relationship allows it to be an agricultural pest, an experimental animal, and so on.

Thesaurus standards argue that the relationships to be included in a thesaurus should be a priori rather than a posteriori. However, the genus-species and the perspective relationships can have different functions and, in deciding which hierarchical relationships a thesaurus should contain, different factors should be taken into consideration, including the characteristics of the vocabulary to be structured and the purpose for which the relations are intended in retrieval.

Concerning the first point, Svenonius (2000 and 2004), for example, considers a stricter logical ordering particularly apt for structuring terms whose meanings are more fixed, e.g. scientific terms, whereas she regards perspective hierarchies as more suitable for representing polysemous and vague lexicons, as is mostly the case in the social sciences. Regarding the second aspect, the genus-species relation, being logically based, is valuable, for example, for search broadening and narrowing as well as for retrieval strategies that exploit inheritance properties. Perspective hierarchies, instead, are not suitable for these applications. Their added value in IR consists in providing contexts that elucidate the point of view from which a term is being considered. In this way, they can assist in navigation and are apt for the disambiguation of multireferential terms (Svenonius 2000).

Perspective hierarchies are used by classifications such as the Dewey Decimal Classification (DDC). The term ‘Insect’, for example, can be located in only a single genus-species hierarchy (BT: ‘Arthropoda’), but it can belong to several perspective hierarchies according to the points of view from which its meaning is regarded: an insect can be viewed, for example, as an agricultural pest, a disease carrier, etc. (Svenonius 2000 and 2004). In the EARTh thesaurus, the idea of multiple thematic classifications of terms, as a complement to placing them in the genus-species tree, has been developed on a similar basis (Mazzocchi & Plini, 2005).
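The different roles of the two kinds of hierarchy can be sketched in Python as follows. This is a minimal illustration, assuming one generic parent per term and any number of perspective parents; the data layout and function names are hypothetical and are not taken from DDC or EARTh.

```python
# One generic parent per term, any number of perspective parents.
# The 'Insect' entries follow the example in the text; 'Arthropoda' has
# no broader term recorded in this toy fragment.
generic_bt = {"Insect": "Arthropoda", "Arthropoda": None}
perspective_bt = {"Insect": ["Agricultural pest", "Disease carrier"]}

def broaden(term):
    """Search broadening walks the generic hierarchy only."""
    chain = []
    parent = generic_bt.get(term)
    while parent is not None:
        chain.append(parent)
        parent = generic_bt.get(parent)
    return chain

def contexts(term):
    """Perspective parents serve as contexts for browsing or disambiguation."""
    return perspective_bt.get(term, [])

print(broaden("Insect"))   # ['Arthropoda']
print(contexts("Insect"))  # ['Agricultural pest', 'Disease carrier']
```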

It should be noted that terms linked by perspective hierarchies belong to the same conceptual category. Yet, since these links are based on a situated perspective, they do not pass the ‘all-and-some’ test and thus, according to a strict application of the standards, are not accepted as valid hierarchies. To explain this, ISO 2788 gives the example of ‘Parrots BT Birds’, which is invariably a true (generic) hierarchy, and thus compatible with the all-and-some test, and ‘Parrots BT Pets’, which is not (being a perspective hierarchy), since only some Pets are Parrots and only some Parrots are Pets. Yet, even if this is mostly true, there may be special cases or particular circumstances where it does not apply. For example, in the restricted context of a specialized thesaurus on domestic animals, Parrots as an NT of Pets can instead be accepted.

In any case, special cases aside, since perspective hierarchies are to some extent context-dependent, it seems that only genus-species hierarchies have the potential to provide the basis for a more consistent application across different systems.

3.1.3 The all-and-some test

Indeed, this matter is more complex than it appears. Two criteria are normally used to determine genus-species hierarchies. First, terms have to belong to the same conceptual category. This is a necessary (but not sufficient) condition to ensure that a hierarchy is logically based. Both the logical and perspective hierarchies are compatible with it, but (normally) not the RT-kind hierarchy.

The other criterion is compatibility with the all-and-some test. In the latter, Fisher (1998, 20) has recognized the extensional definition of subsumption:

Informally, it is said there that concepts are taken as classes which have members, and that for a genuine narrower concept [all] its members must also be members of the broader concept while for the broader concept only [some] of its members must also be members of the narrower concept.

It should be said, however, that while its usefulness is undeniable, this test seems to present a number of issues that still need to be addressed. For example, the test does not discriminate which levels of a genus-species tree are linked when establishing a hierarchy. ‘Parrots BT Birds’, ‘Parrots BT Animals’ and ‘Parrots BT Organisms’ are all validated as hierarchies, since all parrots are birds, animals and organisms. But, of course, they encompass a different degree of (conceptual) information.
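Read extensionally, the all-and-some test amounts to a proper-subset check between classes of members. The following Python sketch illustrates this reading and the limitation just noted; the individuals and the function name are invented for illustration.

```python
# Each concept is modelled as a set of members (its extension).
# The individuals are invented: "polly" is a pet parrot, "kiwi" a wild
# parrot, "robin1" a wild bird and "rex" a pet dog.
extension = {
    "parrots": {"polly", "kiwi"},
    "birds": {"polly", "kiwi", "robin1"},
    "animals": {"polly", "kiwi", "robin1", "rex"},
    "pets": {"polly", "rex"},
}

def all_and_some(narrower, broader):
    """All members of narrower are in broader; only some of broader are in narrower."""
    n, b = extension[narrower], extension[broader]
    return bool(n) and n < b  # non-empty proper subset

print(all_and_some("parrots", "birds"))    # True
print(all_and_some("parrots", "animals"))  # True, although a level is skipped
print(all_and_some("parrots", "pets"))     # False, only some parrots are pets
```

The second call shows why the test alone cannot tell a direct link from one that skips intermediate levels.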

3.1.4 The intensional definition of the generic relationship and its historical predecessor

Naturally, the genus-species relationship may also be described on the basis of a representation of terms/concepts as sets of attribute values or features. We proceed from superordinates to subordinates, which contain all the attribute values of the former, by adding further key conceptual features (Fugmann 1993). In this formulation, Fisher (1998) has recognized a form of the intensional definition of subsumption. Of course, as concepts become more specific they will also correspond to smaller classes of referents.
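The feature-based description above can be sketched in Python by modelling each concept as a set of features, so that a narrower concept carries all the features of the broader one plus at least one more. The feature names are invented for illustration.

```python
# Concepts as sets of features (their intension). Feature names are illustrative.
features = {
    "animals": frozenset({"living", "sentient"}),
    "birds": frozenset({"living", "sentient", "feathered"}),
    "parrots": frozenset({"living", "sentient", "feathered", "hooked bill"}),
}

def subsumes(broader, narrower):
    """Broader subsumes narrower if narrower has all of broader's features and more."""
    return features[broader] < features[narrower]

def added_features(broader, narrower):
    """Features that distinguish the narrower concept from the broader one."""
    return features[narrower] - features[broader]

print(subsumes("animals", "parrots"))      # True
print(subsumes("parrots", "birds"))        # False
print(added_features("birds", "parrots"))  # frozenset({'hooked bill'})
```

Note the inversion with respect to the extensional view: the narrower concept has the larger feature set but the smaller class of referents.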

To clarify this scheme, it might be helpful to refer briefly to the philosophical tradition from which it derives. Broadly speaking, the origin of the notions of genus and species in the history of Western thought can be traced back to Plato’s and Aristotle’s philosophies, whereas the representation of a series of successive genus-species links through a vertical taxonomic structure, which starts from a top level (categories) and goes down to the ultimate or infima species (which in turn are superordinate to the individuum), was first conceived with the Porphyrian tree.

The crucial notion for the establishment of the genus-species relationship is that of specific differentia, which represents the key distinctive element differentiating a species from all others sharing the same genus (co-hyponyms). For example, the category ‘substance’ with the specific differentia ‘material’ becomes the subordinate genus ‘body’, while with the differentia ‘immaterial’ it becomes ‘spirit.’ The tree in figure 1 derives from adding, along different hierarchical levels, differentiae to the first of Aristotle’s ten categories, substance. Even though Aristotle never puts it in this way, analogous trees are expected to be developed by the same method from any of the other categories (quality, quantity, relation, where or place, when or time, position, having or state, action or operation, passion or process). According to some authors (Girgenti 2004, introduction to Porphyry’s Isagoge), the genus-species tree can be navigated both in an upward direction (ascension, according to a logical point of view) and in a downward direction (declination, based on an ontological perspective).

The same notion of differentia also plays a key role in definition. A classic example is the definition of man (human) as a ‘rational animal.’ The parts of this definiens are ‘animal’, the proximate genus, which incorporates within its range of meaning all the essential elements of the superordinate genera, and ‘rational’, the specific differentia distinguishing man from all other animals. Listing all the differentiae, ‘human’ is defined as ‘rational sensitive animate material substance.’
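The mechanism of definition by proximate genus and specific differentia can be sketched as a small data structure. The snippet below is only an illustration under stated assumptions: the dictionary `TREE`, the function names and the intermediate label ‘living being’ are hypothetical and are not taken from the sources cited here; only the genera and differentiae mentioned in the text are otherwise used.

```python
# Toy Porphyrian tree: each species maps to (proximate genus, specific differentia).
# 'living being' is an assumed label for the genus between 'body' and 'animal'.
TREE = {
    "body": ("substance", "material"),
    "spirit": ("substance", "immaterial"),
    "living being": ("body", "animate"),
    "animal": ("living being", "sensitive"),
    "human": ("animal", "rational"),
}


def short_definition(term):
    """Definition by proximate genus plus specific differentia."""
    genus, differentia = TREE[term]
    return f"{differentia} {genus}"


def full_definition(term):
    """Walk up to the top category, collecting every differentia on the way."""
    differentiae = []
    while term in TREE:
        genus, differentia = TREE[term]
        differentiae.append(differentia)
        term = genus
    # 'term' is now the top category, which has no genus above it.
    return " ".join(differentiae + [term])


print(short_definition("human"))  # rational animal
print(full_definition("human"))   # rational sensitive animate material substance
```

Walking upward corresponds to the ‘ascension’ reading of the tree; each step removes one conceptual feature, which is the inverse of the feature addition that generates the lower levels.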

Summing up, two items are most relevant in a hierarchical arrangement obtained in this way: the mechanism of conceptual feature addition (the lower level is always a subclass of the higher one) and the key differentiating character of the added conceptual features. For Aristotle, such a method reflects, on the logical and linguistic planes, a principle that operates on an ontological level with the purpose of identifying the distinctive features of things. Should this principle be adopted, the problem is then how to put it into practice, considering also that our highly structured contemporary knowledge systems seem to be developing more on a horizontal and sectorial plane than on a vertical one, as a univocal unfolding from an Ur-structure.

Figure 1. The Tree of Porphyry, as drawn by the 13th-century logician Peter of Spain (from Sowa 2000, slightly modified)

More generally, the very possibility of accessing on a rational level the ‘meta’ point of view (i.e., the fundamental ‘place of observation’ where the ontological order is unveiled) has become questionable from an epistemological point of view. Together with it, so has the chance to separate, in a final and objective way, what is essential from what is accidental, and thus to develop that ‘unique’ genus-species tree which derives from the successive addition of specific differentiae to the top categories.

According to Eco (1983), Aristotle himself, in some of his works such as De partibus animalium, recognizes at another level the possibility of developing multiple trees, which could complement one another according to different perspectives. Given the impossibility of univocally distinguishing accidental from distinctive features, distinctiveness can, in Eco’s view, be acquired only in relation to a situated perspective (e.g., the classificatory or definitory problem in question).

Contemporary biological systematics and taxonomy provide an interesting example of the synchronic copresence of different theoretical approaches. The classic Linnean approach, which arranges organisms by their morphological similarities, and cladistics (or phylogenetic systematics), where living beings are classified on the basis of their order of branching in an evolutionary tree, coexist and may also be used in combination to obtain further information. Different (theoretical) perspectives can thus lead to focusing on different sets of characteristics, but they need not be regarded as being in opposition. There may be cases in which they provide complementary information, useful in obtaining a more complete picture of the matter.

3.1.5 Classification as interpretation

Broadening the perspective, this latter position may (partially) be related to the notion of interpretative horizon as developed, in Gadamer’s work, in the framework of contemporary hermeneutics. This notion has mainly been used to explain the historicity of human understanding, yet in a more general way it can be regarded as the range of vision including “everything that can be seen from a particular vantage point” (Gadamer 1976, 302). In opposition to an objectivistic and universalistic view, the idea of ‘classification as interpretation’ acknowledges that any classificatory act is always made from a delimited horizon, which determines how classification is conceived and undertaken and thus, within the limits of certain basic constraints, which aspects of an item (term or object) are made salient.

In information science, Hjørland and Nissen Pedersen (2005) have developed a theory of classification for IR (which by extension can be applied to hierarchization) that to some extent reflects this principle. It has been summarized by Hjørland himself (2007, 373) as follows:

Classification is the ordering of objects (or processes or ideas) into classes on the basis of some properties. (The same is the case when terms are defined: It is determined what objects fall under the terms) …. The properties of objects [which are portrayed in the conceptual features of the terms used to name such objects] are not just ‘given’ but are available to us only on the basis of some descriptions and pre-understandings of those objects [although these still have ‘objective’ properties] …. Description (or every kind of representation) of objects is both a reflection of the thing described and of the subject creating the description …. The selection of the properties of the objects to be classified must reflect the purpose of the classification. There is no ‘neutral’ or ‘objective’ way to select properties for classification because any choice facilitates some kinds of use while limiting others …. Any given classification or definition will always be a reflection of a certain view or approach to the objects being classified.

Regarding classification as interpretation means acknowledging that we always act from a classificatory horizon (Paling 2004). This notion, however, needs further explanation, which can be given by indicating its possible constitutive elements. First, it comprises the ontological and epistemological meta-assumptions that provide the ‘lens’ through which we look at the world (Kuhn 1970) and the way in which they are reflected in scientific activity. For example, positivism and instrumentalism or hermeneutics have different views of the (same) world and, accordingly, lead to different conceptions of classification and hierarchization, too. Secondly, it includes the domain to which the classification refers. As stressed by Hjørland and Nissen Pedersen (2005) in their theory, criteria for classification are (usually) domain-specific, since different domains may need different descriptions and classifications of items in order to meet their specific purposes.

For example, ‘benzene’ can be described and defined in several different ways depending on the discipline or context in which it is considered. Chemists, of course, emphasize its structural properties as the precursor of a class of chemical compounds. Physicists, however, may focus on other properties and see it as a volatile and inflammable substance. Other descriptions can emphasize its possible effects (biologists may consider its toxicity and the different routes through which it can enter an organism) or its uses (engineers would consider it as a fuel for combustion engines) (Fugmann 1993). Furthermore, the fact that conflicting paradigms and views can coexist within the same domain should also be taken into consideration (Hjørland 2007, 385): “in every domain, there exist different theories, approaches, interests, or ‘paradigms’, which also tend to describe and classify objects according to their respective views and goals.”

Finally, the purpose of classification plays a role in determining the classificatory horizon, too. In fact, even if a domain can be viewed in terms of a common paradigm, different practical concerns may lead to different choices in establishing classificatory and hierarchical structures.

3.1.6 Possible insights from the language games theory

In this context, we believe that the notion of language games (Sprachspiele) can play a significant role and be relevant for IS issues, too. This notion was introduced by Wittgenstein (1953) to explain the multiplicity of language practices that occur within a language. Language does not consist of a single unified game; it is regarded, instead, as a collection of multiple and indefinite games. The basic assumption of this theory is that the meaning of a word should be regarded not in terms of its referent, but of its use. Speaking a language is a social action. To know the meaning of a word means to know how to use it as part of an activity, within the framework of a particular language game and its rules.

Wittgenstein also introduced the notion of family resemblances. Across several possible and different Sprachspiele, the instances of the use of a word do not (necessarily) share a common denominator or essence (as is assumed, instead, in class inclusion). They are ‘peripherally’ linked through family resemblances, being similar but each in a different manner, like members of a family (where some may have the same eyes, others the same form of mouth or chin, but without a single feature that all necessarily share).

Following this theoretical approach, it is clear that language and meaning, having the above characteristics, should not be confined to the rules of a particular language game. Although a deeper investigation would still be required, this has a number of important implications for the idea of hierarchical arrangement (in general and as applied to a thesaurus) and for a number of other issues. As stated by Svenonius (2004, 578):

Subscribing to the concept of language games entails subscribing as well to the position that knowledge representations are not descriptive of things and relations in the real world; rather they are descriptive of linguistic behavior. The use of knowledge representations to organize information is one kind of language game, one kind of linguistic behaviour.

Besides, linking the main point back to what was said in the previous paragraph, it could be affirmed that each field of knowledge, which has its own set of conceptualisations, also has its particular language games with specific rules (although this does not mean that they cannot share common elements). The meaning of words can, therefore, change (at least partially) from one domain to the next: “the meanings of words—and, thus, words used to name subjects—are in part fixed and, in part, variable. The variable part assumes its value by being contextualized within a system of concepts” (Svenonius 2004, 581).

Further considerations would be needed to investigate whether a hierarchy of conceptual features is possible, whether some of these features cannot be ‘cancelled’ (without causing the total alteration of the associated meaning), and what their nature is. The meaning of a term has, in fact, also a more stable part, which is likely to be maintained even after a major paradigm shift or across different domain-based viewpoints. Coming back to the example of ‘benzene’, all the listed descriptions share a common premise: benzene is, first of all, a ‘substance’ (that can have toxic effects, be used as fuel, etc.). Similarly, although diverse taxonomizations of a certain kind of animal may be possible (see note 5), none of them questions its recognition and classification at a higher level as an animal. These features thus provide a more stable background, while modifications occur mostly at a foreground level.

Furthermore, in a given historical period, certain semantic relations (and hence the conceptual features on which their establishment is based), being an expression of the dominant view, appear to be more ‘stable’ and can be (extensionally) validated by the all-and-some test. For example, according to the taxonomy of the scientific discipline that studies it (chemistry), benzene <is_a> ‘organic aromatic substance’ and this ‘always’ holds. But this is not always the most important aspect in terms of application. In a nature conservation thesaurus, it might be more useful to represent the meaning of benzene as a ‘pollutant’ rather than as an ‘organic aromatic substance’. It is, however, true that relationships of this kind, by virtue of the stronger consensus sustaining their institution, can (at least) provide a basis to ensure a certain degree of compatibility and interoperability among different systems.
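The extensional reading of the all-and-some test can be sketched with sets. The snippet below is a minimal illustration, assuming the usual formulation of the test (all members of the narrower class belong to the broader class, while only some members of the broader class belong to the narrower one). The sample identifiers, the class extensions and the function names are hypothetical, invented only to contrast a relation that ‘always’ holds with one that holds in some contexts.

```python
# Hypothetical extensions: each class is the set of (invented) samples falling under it.
benzene = {"benzene_sample_1", "benzene_sample_2", "benzene_sample_3"}
organic_aromatic_substance = benzene | {"toluene_sample_1", "naphthalene_sample_1"}
# Assumption: only one benzene sample (e.g. one released into groundwater) is a pollutant.
pollutant = {"benzene_sample_2", "lead_sample_1"}


def all_and_some(narrower, broader):
    """All narrower items are broader items, but only some broader items are narrower."""
    return narrower < broader  # proper subset


def overlaps_only(a, b):
    """The two classes share members, but neither includes the other."""
    return bool(a & b) and not a <= b and not b <= a


print(all_and_some(benzene, organic_aromatic_substance))  # True
print(all_and_some(benzene, pollutant))                   # False
print(overlaps_only(benzene, pollutant))                  # True
```

Under these assumptions the generic link to ‘organic aromatic substance’ passes the test, whereas the link to ‘pollutant’ depends on the situation of each sample, which is why it would be represented as a perspective hierarchy rather than as a genus-species one.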

Of course, not all words convey meaning in the same manner. Some have more variable meanings, i.e. meanings more dependent on context, than others. For example, words used in the social sciences are regarded as having more variable meanings, whereas words used in science are regarded as having more fixed meanings. But this is only partially true. Not only does the meaning of scientific words change over history in correspondence with paradigm shifts (Kuhn 1970); the idea that, in a given historical moment, science is a knowledge system based on universal conceptual structures, and that words used in scientific discourses have one and the same meaning in all disciplinary domains, has also been questioned by part of XXI century epistemology. Kuhn (2000), for example, regards each discipline or community of practitioners of a certain scientific field as bearing its own set of conceptualizations, crystallized in a particular lexical taxonomy, in the frame of which terms acquire specific meanings. This implies that, for a (restricted) number of terms, meaning changes across different disciplinary fields (local incommensurability).

Evidently, this fact can be particularly relevant for the design of the hierarchical arrangement of scientific thesauri whose subject field is multidisciplinary (such as those devoted to ‘environment’). Moreover, the fact that different theoretical views can exist simultaneously in a given field of knowledge, providing different descriptions of objects and interpretations of the meaning of terms, may be applied to scientific disciplinary areas too, although there it is less evident (and also less agreed upon) (see also note 5).

Thus, in all cases, concepts are not a priori (and as such universal) entities, but should be regarded in the context of a given conceptualization system in which they are embedded. The meaning of words, including those that are part of scientific vocabularies, should be understood according to the rules of the language games they belong to. The same word can have (slightly or significantly) different meanings according to its use in diverse language games, which can pertain to different knowledge fields or to different theoretical views inside the same domain.

3.1.7 Implications for the retrieval of information

Both the principles based on a hermeneutic perspective and the language games theory have practical implications for the retrieval of information (based on the use of a thesaurus). Many databases contain documents that have been produced in different subject fields and, even within the same domain, sometimes according to different theoretical perspectives. Meaning, however, cannot be defined by examining the documents of a literature as such. Documents should rather be seen as a means to access the conceptual structure of a given knowledge field and the language games that it encloses.

Words (used in documents), in fact, pertain to given language games. Each paradigm within a given domain (of which it embodies the ‘cognitive’ authority) specifies the basic rules for the use of any term and, hence, its meaning. If searchers, as is actually the case, look for concepts (contained in documents) as defined in subject fields and their literatures, semantic tools such as thesauri should be able to represent, by means of their hierarchical arrangement and other relations, the meaning of words consistently with how these are defined in the language games of such domains. The retrieval of information would, in fact, be facilitated if a subject field represented in the documents of a database had such documents indexed and searched by means of words used in accordance with the (domain-based) language games they refer to (Andersen & Christensen 1999).

In particular, users should be made aware of the possible different views on the meaning of words (as occurs in different language games) and, thus, of all the possible different views on a given topic (that can focus on as many aspects of it) which may be useful for them (Hjørland 1998). As underlined by Hjørland (2007, 389), while attempts at standardizing terminology can cause the removal of some of these views, “a precondition for designing quality KOS is that the designer knows the different views and is able to provide a reasonably informed and negotiated solution.”

Of course, a thesaurus has its own language game, too, whose rules are basically oriented towards the achievement of semantic univocity for operational purposes. However, there are a number of devices that can be used in a thesaurus to represent the different aspects of the semantics of terms and (wherever necessary) to disambiguate them. One of these is the coupled use of genus-species and perspective hierarchies, in order to exploit the different functions that they can have. As already mentioned in 3.1.2, perspective hierarchies can provide additional views about the semantics of a term (or the aspects of a given topic) and can be used for disambiguation purposes, while ‘all-and-some’ hierarchies can also provide a shared basis to make different KOSs more compatible and interoperable.

4. The partitive relationship

This section deals with the partitive relationship. A number of taxonomies organizing it into subclasses are also presented, followed by some remarks on the role played by ‘interpretation’ in implementing these relations (and semantic relations in general) to satisfy the needs of different conceptual contexts and empirical circumstances.

In the partitive relationship (also named meronymy) the narrower terms are parts of the broader ones. In linguistics, a number of test-frames are used to detect it, such as ‘an X is a part of a Y’ (or inversely ‘a Y has an X / Xs’), but none of them seems to provide an unambiguous indicator of it, since they can also be used to express non-meronymic relationships (Cruse 1986).

Furthermore, which basic properties (among reflexivity, antisymmetry and transitivity) may be ascribed to this relationship is still a debated topic (Iris et al., 1987; Winston et al., 1986). As a rule, thesaurus standards regard only four types of this relation as hierarchical: those holding among parts of the body, organizational structures, geographical locations, and disciplines or fields of knowledge. All other cases are classified, instead, as associative relationships, even though exceptions may be accepted in specific subject areas (ISO 1986). The partitive relationship is, thus, not restricted to material objects and should be viewed as a collection of different subkinds (Iris et al., 1988). Yet no consensus has been reached on the identification of such subkinds, nor on the linguistic patterns that express them.

4.1 An overview of existing taxonomies of partitive relations

A number of interesting studies have been undertaken in different knowledge fields, such as linguistics, logic and cognitive psychology, in order to develop a taxonomy of partitive relationships. Mostly, they focus on the degree of differentiation of the parts and on their role with respect to the whole. Despite their different origins and aims, the outcome of these studies provides useful insights also for a refinement of this relationship in thesauri.

Perhaps the most influential taxonomization is by Winston et al. (1987), based on experimental data and on a psychological perspective. Winston and his co-workers distinguish six subtypes on the basis of the values of three relational elements, which summarize the attributes of the relationships:

1. Functionality (functional/non functional): parts are/are not in a specific spatial or temporal position with respect to each other, which sustains their functional role with respect to the whole.

2. Degree of similarity (homeomerous/non homeomerous): parts are similar/dissimilar to each other and to the whole to which they belong.

3. Spatial cohesion (separable/inseparable): parts can/cannot be physically separated from the whole.

| Subtypes | Examples | Functional | Homeomerous | Separable |
|---|---|---|---|---|
| Integral object-component | Cup-handle, Linguistics-phonology | + | - | + |
| Collection-member | Forest-tree | - | - | + |
| Mass-portion | Salt-grain | - | + | + |
| Object-stuff | Bike-steel | - | - | - |
| Activity-feature | Shopping-paying | + | - | - |
| Area-place | Desert-oasis | - | + | - |

Table 1. Winston et al.’s taxonomy of the partitive relation
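Since each subtype in Table 1 is identified by a distinct combination of the three relational elements, the taxonomy can be read as a lookup from feature values to subtype. The following sketch encodes the table under that reading; the names `Features`, `WINSTON_SUBTYPES` and `subtype_for` are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple, Optional


class Features(NamedTuple):
    functional: bool
    homeomerous: bool
    separable: bool


# Values transcribed from Table 1 ('+' -> True, '-' -> False).
WINSTON_SUBTYPES = {
    "Integral object-component": Features(True, False, True),
    "Collection-member": Features(False, False, True),
    "Mass-portion": Features(False, True, True),
    "Object-stuff": Features(False, False, False),
    "Activity-feature": Features(True, False, False),
    "Area-place": Features(False, True, False),
}


def subtype_for(features: Features) -> Optional[str]:
    """Return the subtype whose feature values match, or None if the table has no such row."""
    for name, values in WINSTON_SUBTYPES.items():
        if values == features:
            return name
    return None


print(subtype_for(Features(functional=True, homeomerous=False, separable=True)))
print(subtype_for(Features(functional=True, homeomerous=True, separable=True)))
```

Three binary elements allow eight combinations, of which the table uses six; the second call above shows one of the two combinations for which the lookup returns nothing.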

This scheme has already been integrated in some advanced thesauri, e.g. in the project for the development of an environmental thesaurus, EARTh (Environmental Applications Reference Thesaurus).

Together with the description of each relation in Winston et al.’s scheme, and in order to show some results of this implementation, we list a number of illustrative partitive cases drawn from EARTh’s environmental (and closely related) terminology.

Integral object-component

It takes place between a whole (an ‘integral object’), which presents some kind of patterned organization or structure, and its components. The latter are also patterned and generally bear specific structural and functional relationships to one another and to the whole of which they are parts. Integral objects comprise both things having an extensive dimension, such as physical things (e.g., natural objects or artefacts), and things whose parts are not extensively contained in their wholes, such as abstract objects and organizations. For this reason, a further differentiation into subtypes might still be planned. Accordingly, in the EARTh thesaurus this relation is expressed in two ways: <has_component/is_component_of>, used for material objects and their parts (these objects include, for example, biological systems such as cells, anatomical structures and plants and, among artefacts, instruments, installations and buildings); and a second expression, which still needs to be defined (for the time being the generic <has_part/is_part_of>), used instead to express the relation between abstract entities, for example disciplines, and their ‘parts’.

Cell <has_component> Cell membrane
Cardiovascular system <has_component> Heart
Electric vehicle <has_component> Electric engine
Ecology <has_part> Land ecology

Collection-member

It records membership in a collection. This relationship does not require that members have a given structural organization or carry out a particular function in relation to each other and to the whole. Collection-member has some similarity to (and can consequently be confused with) the relationship of inclusion, since both involve membership of individuals in larger sets. Nevertheless, membership in a class (genus) is determined by similarity to the other members (species) based on a set of intrinsic properties. Membership in a collection is instead defined on the basis of characteristics that are extrinsic to the individual members, such as spatial or temporal proximity or a social connection. Chaffin and Herrmann (1988) distinguish three subkinds of this relationship: group-member (e.g., herd-cow); member-collection-member (e.g., tree-forest, fleet-ship); and organization-unit (e.g., army-battalion). Up to now, in EARTh collection-member has been applied to connect material objects and is expressed by <has_member/is_member_of>.

Flora <has_member> Plants
Game <has_member> Game species
Car population <has_member> Car

Mass-portion

Portions are homeomerous parts of physical objects or masses, since every portion is similar to the others and to the whole. They have arbitrary boundaries and lack a functional relation to the whole. They should also be distinguished from ‘pieces’ that originate, for example, from the destruction of an object and, unlike portions, are not always homeomerous. In Cruse’s words (1986, 158): “The contrast between parts and pieces is potentially operative even with highly integrated wholes such as animal bodies: there is a clear difference between such a body hacked to pieces, and one carefully dissected into its parts”. Chaffin and Herrmann (1988) also make a distinction between mass-measured portion (e.g., pie-slice) and mass-natural tiny piece (e.g., salt-grain). Furthermore, they include measure-unit (e.g., mile-yard) as a third subkind. In EARTh, it has so far had quite limited application and is expressed by <yields_portion/is_portion_of>.

Land <yields_portion> Parcel of land

Object-stuff

This relation links an object to the substance or material from which the object is naturally made or manufactured/created. It differs from the object-component relationship in that the stuff of which an object is made cannot be physically separated from it without altering its identity. Chaffin and Herrmann (1988) distinguish mass-stuff (e.g., trash-paper) from object-stuff (e.g., lens-glass). These authors, like others such as Ahmad and Fulford (1992) and Iris et al. (1988), do not regard this relationship as partitive. It can, in fact, also be considered a kind of associative relationship, as has occurred, for example, in EARTh, where it is expressed by <consists_of/is_matter_of>.

Road <consists_of> Asphalt
Can <consists_of> Tin
Bicycle <consists_of> Aluminium

Activity-feature

It refers to the relation focused on those parts (phases, stages, discrete periods, features, etc.) that form, in a structured manner, a process or an activity, which constitutes the whole. Chaffin and Herrmann (1988), who do not include it among partitive kinds, distinguish process-phase (e.g., growing up-adolescence), continuous activity-phase (e.g., cycling-pedaling), and discrete activity-phase (e.g., shopping-buying). In EARTh, this relationship has been applied to (mostly natural) processes and to (social and other related) activities and their ‘parts’. It is expressed by <includes_phase/is_phase_of>.

Metabolism <includes_phase> Anabolism
Environmental policy <includes_phase> Nature conservation policy
Transport planning <includes_phase> Road planning

Area-place

It is applied to things that have a spatial extent, indicating the relation between areas and specific places within them. The latter are inalienable parts of the whole (areas) in which they are included. However, like members of a collection, places are not parts by virtue of any functional contribution to the whole. In EARTh, it has been applied mostly to geographic entities and is expressed by <spatially_includes/is_spatially_included_in>.

Desert <spatially_includes> Oasis
Earth <spatially_includes> Continent
City <spatially_includes> City centre
Park <spatially_includes> Central park area
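Each refined relation is given as a pair of reciprocal expressions, so a statement recorded in one direction implies its inverse. The sketch below illustrates this with the relation names and a few of the example statements quoted in this section. The data layout and the function names `invert` and `wholes_of` are assumptions made for illustration and do not describe how EARTh is actually implemented.

```python
# Relation names as quoted in the text: forward expression -> inverse expression.
INVERSE = {
    "has_component": "is_component_of",
    "has_part": "is_part_of",
    "has_member": "is_member_of",
    "yields_portion": "is_portion_of",
    "consists_of": "is_matter_of",
    "includes_phase": "is_phase_of",
    "spatially_includes": "is_spatially_included_in",
}

# A few of the example statements from the text, stored as (whole, relation, part).
STATEMENTS = [
    ("Cell", "has_component", "Cell membrane"),
    ("Flora", "has_member", "Plants"),
    ("Land", "yields_portion", "Parcel of land"),
    ("Road", "consists_of", "Asphalt"),
    ("Metabolism", "includes_phase", "Anabolism"),
    ("Desert", "spatially_includes", "Oasis"),
]


def invert(statement):
    """Turn (whole, relation, part) into (part, inverse relation, whole)."""
    whole, relation, part = statement
    return (part, INVERSE[relation], whole)


def wholes_of(term, statements=STATEMENTS):
    """List (inverse relation, whole) pairs for every statement in which `term` is the part."""
    return [(rel, whole) for part, rel, whole in map(invert, statements) if part == term]


for statement in STATEMENTS:
    part, relation, whole = invert(statement)
    print(f"{part} <{relation}> {whole}")
```

Keeping the relation type explicit on every statement is what allows a system to fall back on the generic part-whole reading when a more specific subtype is not shared with another thesaurus.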

Apart from Winston et al.’s proposal, there are also other taxonomies of the partitive relationship, mostly developed in the linguistics domain. For example, the above-mentioned Chaffin & Herrmann (1988) distinguish a set of subkinds by using relational elements that do not coincide with those of Winston et al. Iris et al. (1988) propose a classification founded on four basic models. Three of them (the functional component; the segmented whole; collection and members) are similar to the first three of Winston et al.’s categories, whereas the other (sets and subsets) resembles the notion of class-inclusion. Another comparable list has been proposed by Gerstl & Pribbenow (1995), who identify kinds induced by the compositional structure (mass/quantities, collection/elements and complex/components) or independent of it (segments and portions). Finally, Cruse (1986) classifies the partitive relationship according to quantificational differences.

In the work carried out by the Subcommittee on Subject Relationships/Reference Structures of the ALA (American Library Association) Subject Analysis Committee (1997), which compiled a master list of 165 relationships from the subject indexing and cataloguing literature, two main categories are distinguished: the first, composition partitive relationships, focuses on aggregates or composites of various members of a class of entities; the other, whole/part pairs, is based on structural and spatial relations and consists of eight further subtypes.

| Winston et al. | Iris et al. | Gerstl & Pribbenow |
|---|---|---|
| Integral object/component | Functional component | Complex/components |
| Collection/member | Collection and members | Collection/elements |
| Mass/portion | Segmented whole | Mass/quantities |

Table 2. (Partial) overlapping of partitive categories in three of the cited taxonomies

Composition partitive relationships
Whole/part pairs
Non-physical whole/part pairs
Physical whole/part pairs
Anatomical whole/part pairs
Artefact whole/part pairs
Geographic whole/part pairs
Topic inclusion
Discipline/subdiscipline pairs
Whole/attachment pairs
Whole/integral part pairs
Whole/piece pairs
Whole/segmental part pairs
Whole/systemic part pairs

Table 3. Subtypes of the partitive relation from ALA (simplified version)

Of course, in the framework of ontologies, where attempts are made to eliminate problems of ambiguity by providing formal definitions of relations, the issue of meronymy is much discussed, too. An interesting paper dealing with this topic, though in the framework of a broader analysis, is that of Smith et al. (2005), who have advanced a Relation Ontology to assist the development of biomedical ontologies, such as the Gene Ontology, and to promote their interoperability.

4.2 Some remarks on relation refinement and implementation

Without going further into this analysis, and even though the overview is still incomplete, it seems possible to infer an interesting point that can be applied to all relational patterns. Despite the general agreement regarding a restricted number of basic relationships (namely hierarchical, associative and equivalence), which are in fact used in thesauri and other KOS, a consensus on how to differentiate them into distinct subkinds has still not been achieved, and seems more difficult to achieve. Some authors, such as Tudhope et al. (2001), have highlighted the risk of an undisciplined extension of the basic semantic model. For this reason, in order to ensure a certain degree of interoperability among advanced systems adopting different solutions, they advocate the adoption of a minimum common denominator, namely the basic thesaural relationships, for different types of applications.

All of this may be partly comprehensible since we are still at an experimental stage in this research field. However, even though, as viewed in the case of the partitive subkinds, there is a more stable consensus among scholars on some more specific relations, the difficulty of univocally determining the ‘final’ set of relations may also be connected to an impossibility of identifying a solution for any circumstance and context and which could be regarded as equally valid from all viewpoints. Hjørland (2007, 380-381) has underlined, for example, how choices concerning which kinds of semantic relations a system should include have to be related to their practical usage in IR: “In a way, it is the specific ‘information need’ that determines which relations are fruitful and which are not in a given search session. A semantic relation that increases recall and precision in a given search is relevant in that situation.”
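Hjørland's criterion can be made concrete with a small Python sketch that expands a query through one kind of relation at a time and compares the resulting precision and recall. The documents, index terms, relation links and relevance judgments below are all invented for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two sets of document identifiers."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Invented toy data: index terms per document and one link of each kind.
index = {"d1": {"kitchen"}, "d2": {"refrigerator"}, "d3": {"house"}, "d4": {"garden"}}
narrower = {"kitchen": {"refrigerator"}}  # hypothetical partitive link
related = {"kitchen": {"garden"}}         # hypothetical associative link
relevant = {"d1", "d2"}                   # assumed relevance judgments for one search


def search(term, expansion=None):
    """Retrieve documents indexed with the term or with its expansion terms."""
    terms = {term} | (expansion or {}).get(term, set())
    return {doc for doc, doc_terms in index.items() if doc_terms & terms}


baseline = precision_recall(search("kitchen"), relevant)
with_parts = precision_recall(search("kitchen", narrower), relevant)
with_related = precision_recall(search("kitchen", related), relevant)
```

In this invented search session the partitive link raises recall without lowering precision, while the associative link lowers precision; with different relevance judgments the outcome could be reversed, which is the point of the quoted passage.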

The fact is that the further differentiation of the basic semantic relations into subkinds and their better definition do not necessarily guarantee the same results in all applications. Once a shared set is established, it may still be implemented in dissimilar ways. As already said in describing classification, multiple features can, in fact, be ascribed to terms (or objects). Depending on which of these features are made salient in a given context, different relations can be established.

Indeed, the application of the relations in a thesaurus should reflect the knowledge of the subject area that the thesaurus aims to represent (with its paradigms and language games). Besides, it can vary according to different practical concerns and, in any case, according to the way in which the criteria defining relations are interpreted and implemented in given circumstances. This might apply to the partitive relation, too. Depending on all these factors, there could be room for different ways of conceiving how parts relate to wholes. As Chaffin & Herrmann (1988) underline in their study of partitive relations, even the same pair of objects, and thus of words representing these objects, can be viewed as being connected by different relations once the context changes. This means that, even though cases of such strong relational ‘ambiguity’ are limited in number, there is no single way to associate a word-pair with a relation kind, and this also concerns other kinds of relations (Chaffin & Herrmann 1988, 321-22):

The phenomenon of relation ambiguity makes the point that relations are constructed from knowledge of the two concepts related and that a particular relation may make use of some aspects of the two concepts and ignore others .… If two words have more than one relation, then each relation must be based on somewhat different aspects of the two concepts. This point about relation ambiguity may be clarified by comparison with ambiguity in other domains. The closest parallel is with categorization of concepts .… A word pair, more strictly a pair of word senses, may likewise support more than one relation. A relation need not to give equal weight to all aspects of the meaning of the two words. Relations typically emphasize some aspects and ignore others.

An example analyzed by different authors is ‘kitchen-refrigerator’ (Chaffin & Herrmann 1988; Iris et al. 1988; Winston et al. 1987). It has been viewed as:


integral object-component, when the most important aspect of the refrigerator is considered to be its function in relation to the kitchen (position shared by most of the authors);

mass-portion, when the important feature to focus on is size, e.g., in those situations where small kitchens in contrast to large refrigerators are considered (this attribution seems, however, too circumstantial);

area-place, when the focus is on the occupied spatial area in relation to the kitchen.
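The three readings listed above can be sketched as a small Python data structure in which a word-pair is not tied to a single relation kind but to several, each selected by the aspect that is made salient. The aspect labels and the lookup function are hypothetical and serve only to illustrate the point.

```python
# Relation kinds for a word-pair, keyed by the aspect made salient in a given
# context. The aspect labels paraphrase the list above; the structure itself
# is a hypothetical illustration, not an existing thesaurus format.
readings = {
    ("kitchen", "refrigerator"): {
        "function": "integral object-component",
        "size": "mass-portion",
        "spatial location": "area-place",
    },
}


def relation_for(whole, part, salient_aspect):
    """Return the relation kind for a pair under a salient aspect, or None."""
    return readings.get((whole, part), {}).get(salient_aspect)


functional_reading = relation_for("kitchen", "refrigerator", "function")
spatial_reading = relation_for("kitchen", "refrigerator", "spatial location")
```

A thesaurus that stores only one relation per pair has to pick one of these entries in advance; keeping the context explicit postpones that choice to the moment of use.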

In particular, the possibility of interpreting a word-pair by focusing either on the function of the component in relation to the whole or on the spatial relation between them also pertains to other cases concerning, for example, body structures and geographical items. A component (normally) plays a functional role in relation to an integral object taken as a whole but is separable from it, whereas a place does not stand in this relation to the area but is rather a spatial and inalienable part of it. These criteria, however, are not always easy to apply. A refrigerator normally stands in a kitchen (although it is not an inseparable part of it). From the viewpoint of a kitchen, refrigerators are functional but ‘optional’ parts since it is possible for a kitchen to lack a refrigerator (Cruse 1986). From the point of view of the refrigerator, however, its functional role can be considered apart from its relation with a kitchen (though this is its usual location). Its function, in fact, i.e. ‘to store food (or other products) at a low temperature,’ seems related more to ‘what’ (to store) than to ‘where.’

This is quite different from the relation, for example, between ‘handle’ and ‘cup,’ where the functional role of the handle applies only if it is attached to the cup (of which it constitutes a ‘canonical’ part) and only in relation to that whole. It is also interesting to note that, while they regard a refrigerator as being (normally) a functional part of a kitchen, Winston et al. (1987, 433) consider the kitchen itself to be “merely a place within a house, not a component of the house” (in other words, ‘house-kitchen’ is an example of the area-place kind). Yet this attribution seems rather problematic (who would live in a house without a kitchen?).

Summarizing, in our interpretation, neither ‘kitchen-refrigerator’ (where a refrigerator is separable from a kitchen and has a ‘partial’ functional role in relation to it, in the sense that it has a kitchen primarily as its usual functional location), nor pairs like ‘house-kitchen’ (where the part is not separable from the whole but has a functional role in relation to it) seem to fit entirely into one of Winston et al.’s categories, and they can, also for this reason, be classified differently. This is not only a possible flaw of the taxonomy; it may also derive from the fact that the complexity of the matter seems to require descriptions based on different perspectives in order to obtain a fuller view. This case also seems to underline the need for relational categories with fuzzier boundaries: many situations could be more easily classified if conceived as part of a continuum between the two categories discussed.

What has been discussed in this section obviously offers only some preliminary remarks on this topic. However, to conclude, we may affirm that, while a more elaborate structure can help to decrease the level of arbitrariness in the implementation of thesaurus relations, which is of course highly desirable, there is no guarantee that only one valid set of relations exists or that the implementation of more specific relations can provide consistent results in all situations. The hermeneutic principle mentioned in the discussion about classification is, in fact, still relevant, since different choices can be made according to different perspectives and in order to satisfy the needs of different domains and operational contexts.

5. Conclusion

A thesaurus is a tool which semantically organizes a domain of knowledge for operational purposes. Its relational semantics is concerned with methods to connect terms with related meanings and designed to support information indexing and retrieval. With focus on hierarchical relations, different aspects of the relational semantics of thesauri as well as the possibility to develop richer structures by differentiating standard relationships into subtypes have been analyzed. We have also examined how semantic issues are implied in thesaurus construction. From a certain viewpoint, a thesaurus relational structure may be regarded as a system providing the representation, for operational purposes, of the meanings of the terms contained in the thesaurus. Thus, theories of semantics, which hold different perspectives about the nature of meaning and how it is represented, affect the way in which the relational semantics of thesauri is designed.

In traditional approaches to knowledge organization the influence of logical positivism has played a significant role. This is also reflected in the current trend towards increased formalism and standardization. The search for a more refined relational semantics in thesauri has arisen from this same framework and, according to its advocates, holds the promise of eliminating many of the problems of ambiguity.

In our opinion, while it is likely that this field of study will bring valuable results in terms of an improved methodological basis and a more consistent application, different ways of interpreting meanings and of establishing semantic structures (and thus of organizing knowledge) will continue to be developed, on the basis of different paradigms, domains and operational contexts. Thus, although standardization might be justified in given operational frameworks, other solutions should be explored, too. The usefulness of static and monolithic structures is, in fact, rather limited. What is needed, instead, are tools capable of representing the universe of knowledge domains and structures in its complexity (and also flexible enough to incorporate the continuous changes in languages and meanings, not to mention how all of this is affected by the development of technology), in order to facilitate access to its constitutive elements (concepts), which are the true object of searching.

Therefore, it is important to consider which contributions may derive from theoretical positions such as those based on hermeneutics and those based on Wittgenstein’s view of language and meaning, which are more inclined to value such complexity (in terms of diversity of perspectives, contexts, rules, etc.). The possibility of their application in thesaurus design and other IR issues has been illustrated, even if this topic needs to be further investigated.

References

Ahmad, Khurshid and Fulford, Heather. 1992. Semantic relations and their use in elaborating terminology. Computing Science reports CS-92-7. Surrey: University of Surrey.

American Library Association (ALA), Subject Analysis Committee, Subcommittee on Subject Relationships/Reference Structures. 1997. Final report to the ALCTS/CCS Subject Analysis Committee. http://www.ala.org/ala/alctscontent/catalogingsection/catcommittees/subjectanalysis/subjectrelations/finalreport.htm (consulted: 15.09.2007).

Andersen, Jack and Christensen, Frank Sejer. 1999. Wittgenstein and indexing theory. In Albrechtsen, Hanne and Mai, Jens-Erik, eds., Advances in classification research. Proceedings of the 10th ASIS SIG/CR Classification Research Workshop, vol. 10. Medford, NJ: Information Today, pp. 1-21.

Chaffin, Roger and Herrmann, Douglas J. 1988. The nature of semantic relations: a comparison of two approaches. In Evens, Martha Walton, ed., Relational model of the lexicon, representing knowledge in semantic networks. Studies in natural language processing. Cambridge: Cambridge University Press, pp. 249-94.

Chaffin, Roger, Herrmann, Douglas J., and Winston, Morton. 1988. An empirical taxonomy of part-whole relations: effects of part-whole type on relation identification. Language and cognitive processes 3: 17-48.

Cruse, D. Alan. 1986. Lexical semantics. Cambridge: Cambridge University Press.

Dextre Clarke, Stella G. 2001. Thesaural relationships. In Bean, Carol and Green, Rebecca, eds., Relationships in the organization of knowledge. Dordrecht: Kluwer, pp. 37-52.

Eco, Umberto. 1983. L’antiporfirio. In Vattimo, Gianni and Rovatti, Pier Aldo, eds., Il pensiero debole. Milan: Feltrinelli, pp. 52-80.

Fischer, Dietrich. 1998. From thesauri towards ontologies? In el-Hadi, Mustafa, Maniez, Jacques, and Pollitt, Stephen A., eds., Structure and relations in knowledge organization: Proceedings of the 5th International ISKO Conference. Würzburg: Ergon, pp. 18-30.

Foskett, Douglas J. 1980. Thesaurus. In Kent, Allen, ed., Encyclopedia of library and information science, vol. 20. New York: Marcel Dekker, Inc., pp. 416-63.

Fugmann, Robert. 1993. Subject analysis and indexing: theoretical foundation and practical advice. Frankfurt/Main: INDEKS Verlag.

Gadamer, Hans-Georg. 1976. Truth and method, trans. G. Barden and J. Cumming from the 2nd German ed. London: Sheed and Ward.

Gerstl, Peter and Pribbenow, Simone. 1995. Midwinters, end games and body parts: a classification of part-whole relations. International journal of human-computer studies 43: 865-89.

Hjørland, Birger. 1998. Information retrieval, text composition, and semantics. Knowledge organization 25: 16-31.

Hjørland, Birger. 2007. Semantics and knowledge organization. Annual review of information science and technology 41: 367-405.

Hjørland, Birger and Pedersen, Karsten Nissen. 2005. A substantive theory of classification for information retrieval. Journal of documentation 61: 582-97.

International Standards Organization (ISO). 1986. ISO 2788: Documentation - guidelines for the establishment and development of monolingual thesauri. 2nd ed. Geneva: ISO.

Iris, Madelyn A., Litowitz, Bonnie E. and Evens, Martha Walton. 1988. Problems of the part-whole relation. In Evens, Martha Walton, ed., Relational model of the lexicon, representing knowledge in semantic networks. Studies in natural language processing. Cambridge: Cambridge University Press, pp. 261-88.

Kuhn, Thomas S. 1970. The structure of scientific revolutions. 2nd ed. Chicago: University of Chicago Press.

Kuhn, Thomas S. 2000. The road since structure: philosophical essays, 1970-1993, with an autobiographical interview. Conant, James and Haugeland, John, eds. Chicago: University of Chicago Press.

Maniez, Jacques. 1988. Relationships in thesauri: some critical remarks. International classification 15: 133-38.

Mazzocchi, Fulvio and Plini, Paolo. 2005. Thesaurus classification and relational structure: the EARTh experience. In Madsen, Bodil Nistrup and Thomsen, Hanne Erdman, eds., Terminology and content development. Proceedings of the 7th International conference on Terminology and Knowledge Engineering. Copenhagen, pp. 265-78.

Milstead, Jessica L. 2001. Thesaural relationships. In Bean, Carol and Green, Rebecca, eds., Relationships in the organization of knowledge. Dordrecht: Kluwer, pp. 53-66.

National Information Standards Organization (NISO). 2005. ANSI/NISO Z39.19-2005: Guidelines for the construction, format and management of monolingual controlled vocabularies. Bethesda (USA): NISO Press.

Paling, Stephen. 2004. Classification, rhetoric, and the classification horizon. Library trends 52(3): 588-603.

Porphyry. 2004. Isagoge. Girgenti, Giuseppe, ed. Milan: Bompiani.

Schmitz-Esser, Winfried. 1991. New approaches in thesaurus application. International classification 18: 143-47.

Smith, Barry, Ceusters, Werner, Klagges, Bert, Köhler, Jacob, Kumar, Anand, Lomax, Jane, Mungall, Chris, Neuhaus, Fabian, Rector, Alan L. and Rosse, Cornelius. 2005. Relations in biomedical ontologies. Genome biology 6: R46.

Soergel, Dagobert. 1974. Indexing languages and thesauri: construction and maintenance. Los Angeles: Melville Publishing.

Soergel, Dagobert. 1995. The Art and Architecture Thesaurus (AAT): a critical appraisal. Visual resources 10: 369-400.

Soergel, Dagobert, Lauser, Boris, Liang, Anita, Fisseha, Frehiwot, Keizer, Johannes and Katz, Stephen. 2004. Reengineering thesauri for new applications: the AGROVOC example. Journal of digital information 4, Issue 4, Article No. 257. http://jodi.ecs.soton.ac.uk/Articles/v04/i04/Soergel/ (consulted: 01.10.2007).

Sowa, John F. 2000. Knowledge representation: logical, philosophical, and computational foundations. New York: Brooks/Cole.

Svenonius, Elaine. 2000. The intellectual foundation of information organization. Cambridge, MA: The MIT Press.

Svenonius, Elaine. 2004. The epistemological foundations of knowledge representations. Library trends 52(3): 571-87.

Tudhope, Douglas, Alani, Harith and Jones, Christopher. 2001. Augmenting thesaurus relationships: possibilities for retrieval. Journal of digital information 1, Issue 8, Article No. 41. http://jodi.ecs.soton.ac.uk/Articles/v01/i08/Tudhope/ (consulted: 01.10.2007).

Van Slype, Georges. 1976. Definition of the essential characteristics of thesauri. Prepared for the Commission of the European Communities. Bruxelles: Bureau Marcel van Dijk.

Winston, Morton E., Chaffin, Roger, and Herrmann, Douglas J. 1987. A taxonomy of part-whole relations. Cognitive science 11: 417-44.

Wittgenstein, Ludwig. 1953. Philosophical investigations, trans. Gertrude Elizabeth Margaret Anscombe. New York: Macmillan.


Knowl. Org. 34(2007)No.4 G. Trentin. Graphic Tools for Knowledge Representation and Informal Problem-Based Learning


Graphic Tools for Knowledge Representation and Informal Problem-Based Learning in Professional Online Communities

Guglielmo Trentin

Istituto Tecnologie Didattiche, Consiglio Nazionale delle Ricerche, Via De Marini 6, 16149 Genova, Italy

Guglielmo Trentin is with the Institute for Educational Technology (ITD) of the Italian National Research Council (CNR). His studies have largely focused on the use of ICT in formal and informal learning. In this field he has managed several projects and scientific activities, developing technological applications and methodological approaches to support networked collaborative learning. He is contributing editor of Educational Technology (USA) and member of the editorial board of the International Journal of Technology, Pedagogy & Education (UK). Since 2002 he has taught Network Technology & Human Resources Development at the University of Turin, Faculty of Political Science.

Trentin, Guglielmo. Graphic Tools for Knowledge Representation and Informal Problem-Based Learning in Professional Online Communities. Knowledge Organization, 34(4), 215-226. 24 references.

ABSTRACT: The use of graphical representations is very common in information technology and engineering. Although these same tools could be applied effectively in other areas, they are not used because they are hardly known or are completely unheard of. This article aims to discuss the results of the experimentation carried out on graphical approaches to knowledge representation during research, analysis and problem-solving in the health care sector. The experimentation was carried out on conceptual mapping and Petri Nets, developed collaboratively online with the aid of the CMapTool and WoPeD graphic applications. Two distinct professional communities have been involved in the research, both pertaining to the Local Health Units in Tuscany. One community is made up of head physicians and health care managers, whilst the other is formed by technical staff from the Department of Nutrition and Food Hygiene. It emerged from the experimentation that concept maps are considered more effective in analyzing the knowledge domain related to the problem to be faced (description of what it is). On the other hand, Petri Nets are more effective in studying and formalizing its possible solutions (description of what to do). For this reason, those involved in the experimentation have proposed the complementary rather than alternative use of the two knowledge representation methods as a support for professional problem-solving.

1. Introduction

In a discussion group, when trying to best explain one’s viewpoint, oral communication is often accompanied by simple diagrams drawn on the spot either on paper or on a board. One therefore gives a sort of conceptual image (van Lambalgen and Hamm 2001; Stokhof 2002; Wheeler 2006) of the portion of knowledge to be discussed. This in turn triggers a process involving explicit, implicit and tacit knowledge (Polanyi 1975; Nonaka and Takeuchi 1995). The same thing often occurs during interaction among members of an online professional community. In this case, though, instead of paper or boards, ad hoc graphic editors are used which allow the online circulation of graphical representations as a support for collaborative interaction. This article, in particular, will refer to two specific methods for the graphical representation of knowledge (Concept Maps and Petri Nets) and related software applications.

2. Graphical Representations

Graphical representations are de facto a language of communication and, like any language, need syntactic rules if they are to act as a medium of communication between two or more individuals (Donald 1987). Hence, specific graphic languages geared towards knowledge representation have been defined and formalized (hierarchical representations, semantic networks, concept maps, approaches to the representation of procedural knowledge, etc.). Their development has been given considerable impetus by the field of artificial intelligence and, more generally, by all those areas which have attempted “to capture in digital” knowledge domains. These domains are formally represented so that they can be used by specific software engines: see, for example, intelligent systems, decision support systems, semantic webs (Bosch 2006) and simulation systems.

Thanks to their simplicity and effectiveness, some of these graphic languages later spread beyond the specific areas in which they originated, and their use there was often simplified and less rigorous (Trentin 1991), so that even non-specialists could capitalize on the basic concepts. The question is: when are these graphical representations useful for professional communities? A first consideration regards their effectiveness in facilitating the multi-perspective study of a given knowledge domain and/or area of exploration: new knowledge, the solution to a problem, the functionalities of a complex system. The representation of concepts through graphics amplifies, in the eyes of the interlocutors, the existence of multiple interpretations of one subject of study or debate (Cunningham 1991). A second consideration concerns the community’s need for technological aids to improve the flow and organization of community knowledge (Shipman 1993; Prusak 1994; Haldin-Herrgard 2000).

We are aware that knowledge-sharing processes (theoretical and procedural) are favored by two types of technological support: one for interpersonal communication and the other for the collection and management of information and knowledge (Auger et al. 2001). Both require a conceptual schematic representation of the knowledge domain of reference (or portions of it) for a given community. Graphical representations can give an inside view of the conceptual interconnections between the elements making up the knowledge that is being discussed and shared. They are therefore an effective way to facilitate the communication of conceptual images as well as the semantic organization of informative, documentary and factual material contained in the community memory (Lave and Wenger 1991). This last aspect is particularly interesting, as many search engines now use conceptual representations of the knowledge domain in which they work for the selective retrieval of information (for example http://www.webbrain.com).

Before dealing with the experimentation which is the subject of this article, the two underlying knowledge representation tools are summarized below.

3. Concept Maps

A concept map is a coherent visual logical representation of knowledge on a specific topic which encourages individuals to direct, analyse and expand their analytical skills (Novak and Wandersee 1991; Halimi 2006). The approach was developed by J.D. Novak (1991) based on Ausubel’s theories (1963; 1968) and Quillian’s studies on semantic networks (1968). Concept maps use diagrammatic representations which highlight meaningful relationships between concepts in the form of propositions, also called semantic units, or units of meaning. A proposition is the statement represented by a relationship connecting two concepts. Therefore, there are two basic features used to construct concept maps: concepts and their relationships (Figure 1).

Figure 1. Example of a concept map drawn with CMapTool

Besides the two basic features, a concept map is then characterized by hierarchical relationships between concepts and by cross-links between concepts belonging to different domains of the same map.
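The two basic features can be sketched in Python as a list of propositions, each joining two concepts through a labelled relationship. The class, its field names and the sample content are assumptions made for illustration; they do not reproduce CMapTool's internal format or any map produced in the experimentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposition:
    """One unit of meaning: two concepts joined by a labelled relationship."""
    source: str
    relation: str
    target: str

    def as_sentence(self):
        """Read the proposition as a plain statement."""
        return f"{self.source} {self.relation} {self.target}"


# Invented sample content describing a concept map in its own terms.
concept_map = [
    Proposition("concept map", "is composed of", "concepts"),
    Proposition("concept map", "is composed of", "relationships"),
    Proposition("relationships", "connect", "concepts"),
]


def concepts_of(props):
    """Collect every concept that appears at either end of a proposition."""
    return {p.source for p in props} | {p.target for p in props}


all_concepts = concepts_of(concept_map)
```

Hierarchical relationships and cross-links would both be stored as propositions in this representation; what distinguishes them is where the two concepts sit in the map, not the shape of the data.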

Various graphic tools for editing concept maps have been developed, and the dialogue window in Figure 1 shows one of the best-known: CMapTool (http://cmap.ihmc.us/). Many of these environments are able to link the different concepts to a variety of items (documents, images, films, URLs, other concept maps), with the possibility then of converting them into HTML format, thereby creating structured repositories that can be accessed online. This, for example, is one of the possible ways to organize an online community’s shared memory.

Designing concept maps with these software applications is very simple. Here, for example, is how one can work with CMapTool:

– after opening a new map and double clicking on the white area, the starting concept may be defined (Figure 2a);

– by clicking and dragging the arrow one can create a link between a new concept and the starting concept (Figure 2b);

– then the two concepts and the relation type linking them have to be described (Figure 2c).

Figure 2a. The starting concept

Figure 2b. The link between two concepts

Figure 2c. Description of concepts and relation type

By proceeding in this way, it is possible to obtain graphical representations like the one reported in Figure 3, which shows a map produced during the experimentation described here.

When very complex knowledge domains have to be described, such as the Clinical Audit in Figure 3, the corresponding concept maps tend to become much larger and difficult to manage. For this reason, CMapTool provides a function to compress/explode sections of the map being drawn. For example, by clicking on the symbol “>>” that appears to the right of “evidence-based practice”, the map linked to that concept expands (see Figure 4). Clicking on the symbol “<<” then takes you back to Figure 3.

4. Petri Nets and Procedural Knowledge Representation

Petri Nets provide an effective way to describe and analyze models, whether complex systems, processes, knowledge domains, etc. (Peterson 1981). On account of this characteristic, they are often used in the graphical representation of procedural knowledge.

4.1. Resources and activities

A Petri Net is a directed graph in which two node types are represented: resources (indicated with circles in Figure 5) and activities (indicated with segments). In the literature on Petri Nets these nodes are called places and transitions respectively (Peterson 1981). An arc directed from a resource to an activity indicates that the resource is necessary to carry out that activity. Similarly, an arc directed from an activity to a resource indicates that the resource is the product of that activity.
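The structure just described can be sketched in a few lines of Python. The sketch adds the token marking and firing rule of standard place/transition nets (with all arc weights equal to one), which the text above does not describe; the class, the activity and the resource names are invented for illustration and have nothing to do with WoPeD's own data model.

```python
class PetriNet:
    """Minimal place/transition net: resources are places, activities are transitions."""

    def __init__(self):
        self.inputs = {}    # activity -> set of resources it requires
        self.outputs = {}   # activity -> set of resources it produces
        self.marking = {}   # resource -> number of tokens currently available

    def add_activity(self, name, inputs, outputs):
        """Register an activity together with its incoming and outgoing arcs."""
        self.inputs[name] = set(inputs)
        self.outputs[name] = set(outputs)
        for resource in self.inputs[name] | self.outputs[name]:
            self.marking.setdefault(resource, 0)

    def enabled(self, name):
        """An activity can be carried out when every input resource holds a token."""
        return all(self.marking[r] > 0 for r in self.inputs[name])

    def fire(self, name):
        """Consume one token from each input resource and add one to each output."""
        if not self.enabled(name):
            raise ValueError(f"activity {name!r} is not enabled")
        for r in self.inputs[name]:
            self.marking[r] -= 1
        for r in self.outputs[name]:
            self.marking[r] += 1


# Invented example: two resources are needed to produce a third.
net = PetriNet()
net.add_activity("write report", inputs=["data", "template"], outputs=["report"])
net.marking["data"] = 1
net.marking["template"] = 1
net.fire("write report")
```

After the activity has fired, the two input resources are used up and the product is available, which mirrors the reading of the arcs given above.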


Figure 3. A concept map on the Clinical Audit developed with CMapTool

Figure 4. Example of a complex concept expansion


Figure 5. An example of Petri Net

What has just been listed are, so to speak, the basic “ingredients” for giving shape to Petri Nets according to the use suggested within the experimentation referred to here. In actual fact, the theory underlying Petri Nets is much more articulated and rigorous (Peterson 1981). In our case only the key concepts have been used, to enable the two communities involved to assess the general philosophy governing the specific approach.

Figure 6. Example of environment to edit and implement Petri Nets

Just as for concept maps, ad hoc software environments have also been developed for Petri Nets. By way of example, Figure 6 shows the dialogue screen of one of these environments, specifically that of WoPeD (Workflow Petri Net Designer, http://www.woped.org/).

Such applications not only provide an environment for editing Petri Nets, but also offer functions for checking syntax and for simulating the procedures/systems that the nets describe.

4.2. Successive refinements (top-down expansion)

When one attempts to describe a process/procedure or knowledge domain with ever greater precision, starting from an initial Petri Net, more and more activities, resources and links are often added. This produces very complex graphs that are hard to process and read. A good method to overcome this drawback is to describe the network through successive refinements (or stages), expanding it using a top-down approach (Trentin 1991). In the first stage an overall (undetailed) representation is given of what one wants to describe. The main resources and activities are reported together with their respective interconnections. In the same network, the complex activities that will be described in more refined detail in a specific sub-network are then highlighted. See, in Figure 6, activity “AC development” represented with a grey square.

The following stage involves developing the refinement sub-networks giving a detailed description of the more complex activities. For example, Figure 7 reports the refinement of activity “AC development” shown in the Petri Net of Figure 6.

The refinement process is iterated until the de-sired level of detail given to the representation is at-tained.

The refinement activity is a consequence of the need to foster the so-called “functional abstraction” (Stein 2002), the process through which the atten-tion of the individual or whole group/community focuses on one aspect of what is being described at a time.

This is a process developed stepwise. It begins with an overview of the subject matter, such as a professional issue, where the key elements characterizing it are identified (macro-representation of the domain). In the following steps, each key element is isolated and described in more detail by breaking it down into less complex sub-elements (for example, a complex activity is broken down into sub-activities). This is done by considering the elements one at a time and abstracting as much as possible from everything outside the confines of the element under consideration (the other elements), to guarantee maximum success of its specific analysis.

Should this refinement step be inadequate for a deep analysis of the element being dealt with, the refinement process is iterated until the level of detail is considered the most functional to reach the final objective (analyzing a situation, solving a problem, describing a complex system).
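
The successive refinements described above can be pictured as a nested structure in which each complex activity carries its own sub-network. The sketch below is only an illustration under that assumption: the `Activity` class, the `view` function and the sub-activity names are hypothetical, and only “AC development” comes from Figure 6.

```python
from dataclasses import dataclass, field


@dataclass
class Activity:
    name: str
    # Sub-activities that refine this activity; empty means "not refined further"
    refinement: list["Activity"] = field(default_factory=list)


def view(activity, depth):
    """Return the activity names visible after expanding `depth` refinement levels."""
    if depth == 0 or not activity.refinement:
        return [activity.name]
    names = []
    for sub in activity.refinement:
        names.extend(view(sub, depth - 1))
    return names


# Hypothetical process: the overview stage shows two activities,
# one of which ("AC development") is refined in a sub-network.
process = Activity("process", [
    Activity("planning"),
    Activity("AC development", [Activity("drafting"), Activity("checking")]),
])

print(view(process, 1))  # overview stage
print(view(process, 2))  # after one refinement stage
```

At depth 1 the reader sees only the main activities; at depth 2 the complex activity is replaced by its sub-activities, so each stage shows one level of detail at a time.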

5. Research Issue

The use of graphical representations is very popular in information technology and engineering. Although the same tools could be applied effectively in other areas, in practice they are not, because they are little known or completely unheard of there. This is due to study curricula and/or training courses that offer no occasion to learn these techniques and technologies, since they are not considered important for the given disciplinary/professional area.

This is the reason why, within the two specific projects aimed at fostering the launch and development of professional communities in the health care sector, research was carried out on the use of graphical approaches to professional knowledge representation. The aim was to analyze and discuss their actual usability and effectiveness in fostering collaborative interaction, debate and reciprocal clarification during a process geared towards examining a specific professional theme/issue.

6. Experimental Setting

Two distinct professional communities have been involved in the research. The first (Audit community) was made up of 31 head physicians and health care managers pertaining to Local Health Unit 11 of Livorno (Tuscany Region), who had the task of dealing with the theme of Clinical Audit, the key elements characterizing it and the working methods to carry it out. The second (Alert community) was formed by 18 technical staff from the Department of Nutrition and Food Hygiene coming from all the health care units in Tuscany. In their case, the task was to define the organization of a Regional Working Group on the problem of managing food alerts.

Figure 7. Example of refinement derived from Figure 6

In both cases, as already mentioned, concept maps and Petri Nets have been proposed as methods for graphical representations of knowledge. The development of each graphical representation has been divided into three stages:

– a face-to-face meeting for the first familiarization with the graphic approach and the related editing software;

– two weeks of online collaborative activities in sub-groups;

– a closing meeting to evaluate and compare the graphical representations produced, and to discuss the online collaborative process implemented to produce them.

The participants were divided into sub-groups of 5-6 members and were asked to structure their work into two one-week periods:

– individual drawing up of one’s draft of the graphical representation;

– sharing of graphical representation and convergence towards one single sub-group version of it.

To co-construct the two representations the following applications have been used:

– CMapTool (http://cmap.ihmc.us/) and WoPeD (Workflow Petri Net Designer) (http://www.woped.org/), respectively for the development of concept maps and Petri Nets;

– Moodle (http://moodle.org/) as environment to run interpersonal group communication.

7. Methodology

At the end of the collaborative activity, the participants were given a questionnaire divided into 4 sections:

A. Learnability, intended to pinpoint the times and possible learning difficulties of the approaches to the formal representation of knowledge used in the experimentation.

B. Study and/or problem-solving, intended to research the perception of the general usefulness of the tools proposed for the study activities, analysis and search for solutions.

C. Usefulness on an individual level in one’s own professional practice, intended to research the perceived usefulness of tools proposed in relation to an individual use in one’s own professional practice.

D. Usefulness in facilitating collaborative group work, intended to discover the perceived usefulness of tools proposed in fostering or not fostering group work when dealing with aspects related to their own professional practice.

In the questionnaire, two questions are associated with each survey indicator: one with a closed-ended answer based on attributing a score (on a 1-5 Likert scale); the other with an open-ended answer asking the respondent to explain the score attributed or to give further information about the same indicator. 25 participants belonging to the Audit community and 16 to the Alert community answered the questionnaire anonymously.

8. Results

The survey data revealed positive evaluations regarding the professional use of the proposed graphic formalization methods. However, there were various and sometimes considerable differences between what was expressed by the two communities. This is likely to be related to the different roles covered by the respective individuals: on the one hand, positive but lower scores were given by the Audit community, made up mainly of people with a managerial role; on the other hand, higher scores were assigned by the Alert community, made up of staff with a more technical role. A more analytical examination of the participants’ answers is provided in the following sections.

8.1. Learnability

As shown by Table 1, both groups stated that they found it more difficult to enter the logic of the Petri Nets than the concept maps.

Learnability | Audit | Alert
How easy has it been for you to master the logic and syntax of the concept maps? | 3.1 | 3.7
How easy has it been for you to master the logic and syntax of the Petri Nets? | 2.6 | 2.8

Table 1. Average data relating to answers on learnability


This is a fairly common reaction, met in other similar experimentations (Trentin 1991; Stein 2002), and should be related to the greater effort of abstraction (and of dissection) that the top-down development of a Petri Net requires. The free answers given by the participants show how the use of concept maps seems to best mirror their way of coping with professional problems, i.e. considering the elements characterizing them all together and simultaneously. The use of the Petri Nets, with a top-down approach, generally baffles the professional not used to functional abstraction mechanisms, which are more familiar in information technology and engineering.

This was confirmed by directly observing the participants’ first approach towards elaborating a Petri Net, where individuals tended to draw a very detailed, and therefore complex, graph already at the overview stage of the knowledge domain. Some open answers given by participants pointed out, among the probable causes of difficulties, that they are used to a sequential approach to analyzing problems, which is closer to the logic of flow-charts (used occasionally by some of them) than to top-down logic.
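
As a small worked check of the learnability figures, the sketch below computes, for each community, how much easier concept maps were rated than Petri Nets. The values are transcribed from Table 1; the variable names are illustrative only.

```python
# Average Likert scores (1-5) transcribed from Table 1
table1 = {
    "concept maps": {"Audit": 3.1, "Alert": 3.7},
    "Petri Nets": {"Audit": 2.6, "Alert": 2.8},
}

# Difference in perceived ease of learning, per community
gap = {
    community: round(
        table1["concept maps"][community] - table1["Petri Nets"][community], 1
    )
    for community in ("Audit", "Alert")
}
print(gap)
```

The gap is 0.5 points for the Audit community and 0.9 points for the Alert community, so both groups rated the Petri Nets as harder to master, with the larger difference in the Alert community.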

8.2. General usefulness for study activities, analysis and problem-solving

To best understand the convergences and divergences expressed by the participants on this point, we will first make a quantitative comparison of the average scores assigned by the two communities and then summarize the usefulness of the two approaches in relation to every single activity indicated in the questionnaire.

8.2.1. Quantitative comparison of the scores assigned by the two communities

As can be observed in Figure 8, the trends of average scores attributed by the two communities are fairly similar even though they are quantitatively different. The only rather noticeable divergence corresponds to the use of concept maps for study activities. In this regard, 8 members of the Audit community justified the low score by claiming that drawing up a concept map on a given topic can be done only if one already has sufficient knowledge about it. They therefore think that concept maps can be more useful as a tool for self-checking one’s learning than as an aid to studying (at least the basics). On the other hand, the rather high score attributed by the Alert community should be related to their idea of using the concept maps as a tool to support collaborative study processes.

Figure 8. Quantitative comparison between the average scores assigned by the two communities in relation to the usefulness of graphical representations in their profession

8.2.2. Summary on the different usefulness of the two approaches

Apart from the deviation between the quantitative evaluations formulated by the two groups and the above-described divergence, from the graph in Figure 8 it can be deduced that:

– the graphical representations are considered particularly useful for analysis and problem-solving activities and less useful for study activities. The Alert community’s evaluation of the use of concept maps is an exception to this;

– both communities agreed (despite attributing rather different average scores) that the use of the concept maps is more recommended in analysis activities, whilst that of the Petri Nets is more recommended in problem-solving activities.

To sum up, the participants indicate that the concept maps are more useful in describing “what it is” whilst the Petri Nets in describing “what to do to.”

8.3. Usefulness of graphical representations on a personal and group level

After the general considerations described in the previous sections, participants were asked to evaluate the perceived usefulness of the two graphic methodologies as a tool for both personal and group use in their professional practice. Here are their evaluations:

Personal usefulness of graphical representations | Audit | Alert
How much do you think Concept Maps can/could be useful in your professional practice? | 3.3 | 3.8
How much do you think Petri Nets can/could be useful in your professional practice, for the representation of procedural knowledge? | 3.3 | 3.3
How much do you think Petri Nets can/could be useful in your professional practice, to describe complex situations/systems? | 3.2 | 3.6

Table 2. Average data relating to the personal usefulness of graphical representations

As can be seen, both communities gave average to moderately high scores regarding the personal usefulness of graphical representations.

The attitude changes when instead the same tools are considered for collaborative group activities.

Usefulness of graphical representations in group work | Audit | Alert
How much do you think Concept Maps can/could be useful in group work? | 3.7 | 4.1
How much do you think Petri Nets can/could be useful in group work, for the representation of procedural knowledge? | 3.8 | 3.8
How much do you think Petri Nets can/could be useful in group work, to describe complex situations/systems? | 3.7 | 3.9

Table 3. Average data relating to the usefulness of graphical representations in group work

A comparison between Table 2 and Table 3 shows that the participants consider graphical representations more useful in group work than in individual work. Here, both communities have shown a certain convergence of opinion, although there are the usual deviations in average values.
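
The comparison between the two tables can be made explicit with a short calculation. The sketch below uses the averages transcribed from Table 2 and Table 3 (Audit first, Alert second); the shortened question labels and variable names are illustrative only.

```python
# (Audit, Alert) average scores, transcribed from Table 2 and Table 3
personal = {
    "concept maps": (3.3, 3.8),
    "Petri Nets, procedural knowledge": (3.3, 3.3),
    "Petri Nets, complex systems": (3.2, 3.6),
}
group = {
    "concept maps": (3.7, 4.1),
    "Petri Nets, procedural knowledge": (3.8, 3.8),
    "Petri Nets, complex systems": (3.7, 3.9),
}

# Increase from personal use to group use, per question and community
increase = {
    question: tuple(
        round(g - p, 1) for p, g in zip(personal[question], group[question])
    )
    for question in personal
}
for question, (audit, alert) in increase.items():
    print(f"{question}: Audit +{audit}, Alert +{alert}")
```

Every difference is positive, ranging from 0.3 to 0.5 points, which is consistent with the reading that both communities rate the tools higher for group work than for individual work.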

From the diagram in Figure 9 it is interesting to observe that there is an appreciable divergence between the two communities regarding the usefulness of the Petri Nets. The Audit community believe they are more effective for representing procedural knowledge. On the other hand, the Alert community consider them more useful for those activities connected to the description/analysis of complex systems. This is the case for both individual and group activities. Again, the divergence of opinion is likely to be related to the members’ role within the two different communities in the respective local health units.


9. Conclusions

Perhaps the most interesting result emerging from the research is the idea of combining the use of the two graphic tools for professional problem-solving activities. In particular, as the participants indicate explicitly in some answers, the concept maps are believed to be more effective in analyzing the knowledge domain related to the problem to be faced. On the other hand, the Petri Nets are thought to be more effective in studying and describing the procedures to solve the problem itself.

Indeed this is confirmed by the typical stages characterizing problem-solving strategies (Heller and Reif 1984; Gick 1986):

1. analysis of reference scenario related to the problem;

2. description of what is already known regarding the specific problem;

3. formalization of the problem and of its possible breakdown into sub-problems;

4. identification of actions to undertake to provide a solution to the problem and/or the individual sub-problems into which it can be broken down;

5. identification of necessary resources to carry out actions determined in the previous point.

As can be observed, in the early stages (see points 1-2), where the question is to define the problem in terms of “what is it”, the concept map would in fact appear to be the most suitable tool. In the successive stages (3-4-5), the Petri Nets would instead have the advantage of favoring the procedural description of “what to do to”, at a macro level (solution overview) as well as at a micro level (solution details to the sub-problems comprising the general problem).

With regard to the procedural representation of knowledge, it is worth pointing out that some participants found Petri Nets more effective than flow-charts in describing processes/solutions. This is due to at least two reasons:

– besides indicating the link between activities characterizing a process, Petri Nets require the necessary resources for their development to be defined (flow-charts focus only on the statements);

– the top-down refinement helps focus step by step on the specific parts of the process and therefore avoids managing the complexity of what is being studied/analysed with just one graphical representation.

Figure 9. Comparison between the average scores assigned by the two groups regarding the usefulness of graphical representations respectively for individual and collaborative use

These are fairly interesting conclusions that could lead to new developments in researching technological solutions to support the integration of the two methods of formal knowledge representation discussed here. Such solutions need to offer, within the same software environment, functions supporting both conceptualization and proceduralization in problem-solving activities.

These activities, as is known, provide the ideal opportunity to trigger informal peer-to-peer learning processes, which are typical in online professional communities.

References

Augier, Mie, Shariq, Syed Z. and Vendelø, Morten T. 2001. Understanding context: its emergence, transformation and role in tacit knowledge sharing. Journal of knowledge management 5: 125-36.

Ausubel, David P. 1963. The psychology of meaningful verbal learning. Grune and Stratton: New York.

Ausubel, David P. 1968. Educational psychology: a cognitive view. Holt, Rinehart & Winston: New York.

Bosch, Mela. 2006. Ontologies, different reasoning strategies, different logics, different kinds of knowledge representation: working together. Knowledge organization 33: 153-59.

Cunningham, Donald J. 1991. Assessing construction and constructing assessments: a dialogue. Educational technology 31(5): 38-45.

Donald, Janet G. 1987. Learning schemata: methods of representing cognitive, content and curriculum structures in higher education. Instructional science 16: 187-211.

Gick, Mary L. 1986. Problem-solving strategies. Educational psychologist 21: 99-120.

Haldin-Herrgard, Tua. 2000. Difficulties in diffusion of tacit knowledge in organizations. Journal of intellectual capital 1: 357-65.

Halimi, Sonia. 2006. The concept map as a cognitive tool for specialized information recall. In A. J. Cañas and J. D. Novak, eds., Concept Maps: Theory, Methodology, Technology: Proceedings of the Second International Conference on Concept Mapping. San José, Costa Rica: Universidad de Costa Rica, Sección de Impresión del SIEDIN, pp. 213-222.

Heller, Joan I. and Reif, Frederick. 1984. Prescribing effective human problem-solving processes: problem description in physics. Cognition and instruction 1: 177-216.

Lave, Jean and Wenger, Etienne. 1991. Situated learning: legitimate peripheral participation. Cambridge University Press.

Nonaka, Ikujiro and Takeuchi, Hirotaka. 1995. The knowledge-creating company: how Japanese companies create the dynamics of innovation. Oxford University Press: New York.

Novak, Joseph D. 1991. Clarify with concept maps. The science teacher 58(7): 45-49.

Novak, Joseph D. and Wandersee, Jim, eds. 1991. Special Issue on “Concept Mapping” of Journal of research in science teaching 28 (10). New York: Wiley.

Peterson, James L. 1981. Petri net theory and the modeling of systems. Prentice-Hall, Inc.: Englewood Cliffs, N.J.

Polanyi, Michael. 1975. The tacit dimension. University of Chicago Press: Chicago.

Prusak, Laurence. 1994. How virtual communities enhance knowledge, Knowledge@Wharton. Retrieved from: http://www.knowledge.wharton.upenn.edu/articles.cfm?catid=7&articleid=152.

Quillian, M. Ross. 1968. Semantic memory. In M. Minsky (ed), Semantic information processing. MIT Press: Cambridge, pp. 216-70.

Shipman, Frank M. 1993. Supporting knowledge-base evolution with incremental formalization. Technical report CU-CS-658-93, Department of Computer Science, University of Colorado, USA.

Stein, Benno. 2002. Design problem-solving by functional abstraction. Retrieved from: http://www-is.informatik.uni-oldenburg.de/~sauer/puk2002/papers/stein.pdf.

Stokhof, Martin J.B. 2002. Meaning, interpretation, and semantics. In D. Barker-Plummer, D. Beaver, J. van Benthem and P. Scotto di Luzio, eds, Words, proofs, and diagrams. Stanford, CA: CSLI Press, pp. 217-40.

Trentin, Guglielmo. 1991. Description of problem solving using Petri Nets. Proceedings of the XXVth AETT International Conference, “Realizing Human Potential”, AETT (Aspects of Educational and Training Technology), Roy Winterburn ed, v. 24. London: Kogan Page, pp. 122-28.

van Lambalgen, Michiel and Hamm, Fritz. 2001. Moschovakis’ notion of meaning as applied to linguistics. Retrieved from: http://staff.science.uva.nl/ _michiell.


Wheeler, Thomas J. 2006. Collaborative multidiscipline/multiscale analysis, modeling, simulation and integration in complex systems. In Marina L Gavrilova et al., eds., Computational science and its applications: ICCSA 2006: International Conference, Glasgow, UK, May 8-11, 2006: Proceedings. Berlin/Heidelberg: Springer, 654-664.


Knowl. Org. 34(2007)No.4 J. L. DeRidder. The Immediate Prospects for the Application of Ontologies in Digital Libraries


The Immediate Prospects for the Application of Ontologies in Digital Libraries

Jody L. DeRidder

Digital Library Center, James D. Hoskins Library, University of Tennessee, Knoxville, Tennessee, USA

Jody L. DeRidder received her M.S. in Computer Science from the University of Tennessee in 2002, after developing repositories for the Open Archives Initiative in its alpha phases. As the lead developer for the Digital Library Center of the University of Tennessee Libraries, she has built, customized and altered software to create interoperable digital library systems which provide usability features beyond the norm. Nearing completion of her M.S. in Information Sciences, her research interests have turned to interoperability between systems to support usability, sustainability in digital libraries, and the application and use of ontologies via automated cross-mapping by query engines.

DeRidder, Jody L. The Immediate Prospects for the Application of Ontologies in Digital Libraries. Knowledge Organization, 34(4), 227-246. 53 references.

ABSTRACT: The purpose, scope, usage, methodology, cross-mapping and encoding of ontologies are summarized. A snapshot of current research and development includes available tools, ontologies, and query engines, with their applications. Benefits, problems, and costs are discussed, and the feasibility and usefulness of ontologies are weighed with respect to potential and current digital library arenas. The author concludes that ontology application potentially has a huge impact within knowledge management, enterprise integration, e-commerce, and possibly education. Outside of heavily funded domains, feasibility depends on assessment of various evolving factors, including the current tools and systems, level of adoption in the field, time and expertise available, and cost barriers.

1. Introduction: defining ontology

Each of us has a slightly different way of looking at the world. Across cultures and research areas, these differences become palpable. What is clearly understood within a community may be unknown elsewhere, and technically specific terminology needs to be translated, as if to a different language, for the general user. For applications to be able to serve us in search and retrieval across all these variations, human knowledge needs to be made comprehensible to computer programs. Building an ontology requires capturing concepts (including implicit ones), the relationships between them, and any constraints on those relationships (de Bruijn 2003, 35). In technical terms, an ontology represents a “language” of concepts, relations, instances and axioms (de Bruijn and Polleres 2004), which enable computer applications to logically reason out solutions or adapt queries. Stanford University offers a sample ontology application that suggests wine selections for your choice of food and includes encoding examples and explanations (Hsu 2003). To illustrate an ontology description of an object, a graphic example of an ontology application to an audio tape of a performance of a single concerto (in the ABC ontology) is shown in figure 1 (Hunter 2001).
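
To give a feel for how concepts, relations, instances and axioms let a program “reason”, here is a deliberately tiny sketch. It is not the ABC ontology or the Stanford wine example: the concept names, the instance and the helper functions are all hypothetical, and the only axiom modeled is that the is-a relation is transitive.

```python
# Concepts and an "is-a" relation between them (hypothetical toy domain)
SUBCLASS_OF = {
    "Concerto": "MusicalWork",
    "MusicalWork": "Work",
}

# Instances and the concept each one directly belongs to
INSTANCE_OF = {"concerto_recording_1": "Concerto"}


def ancestors(concept):
    """Apply the axiom that is-a is transitive by walking up the hierarchy."""
    found = []
    while concept in SUBCLASS_OF:
        concept = SUBCLASS_OF[concept]
        found.append(concept)
    return found


def is_a(instance, concept):
    """True if the instance belongs to the concept directly or by inference."""
    direct = INSTANCE_OF[instance]
    return concept == direct or concept in ancestors(direct)


print(is_a("concerto_recording_1", "Work"))
```

Nothing in the data states that the recording is a `Work`; the program deduces it from the two is-a links, which is the kind of inference a keyword match cannot perform.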

1.1 Points to consider

For ontologies to be useful and feasible in digital libraries, several requirements must be met. First, there must be evidence that they are helpful to users. Usefulness must outweigh the cost and effort of creation and maintenance. Here we must consider further the identification of our user audiences, and the purpose and scope of what we wish to accomplish. Secondly, what is the state of the art? What parts of this territory have been mapped out, and what are still murky waters? Is there, or will there soon be, broad support for the use of cross-mapped ontologies? If the road is clear and support is available, it behooves us to make our digital libraries accessible via ontology mapping, to increase accessibility and interoperability, and to leverage the work in the broader arena to meet our constituents’ needs. If it will be years before the path is paved, standards will likely change rapidly over that time. Those with the funding and the capability can lead the way, contributing to the development of standards and interoperability. If funding and capabilities are limited, it is wiser to wait till the paths are well-laid, and the process is easier. Thirdly, we need user-friendly tools and methodology. What are the steps? What personnel and tools are needed? As the field is still clearly in the beginning stages, an overview of current research and development is provided for further investigation. Finally, we must seriously consider the costs. What level of funds, personnel, and expertise is available?

1.2 Benefits

As systems grow in decentralized manners, semantic heterogeneity is inevitable; how do we provide functional search and retrieval across distributed digital libraries? Searching by keyword retrieves irrelevant information when a term has multiple meanings; and information is missed when multiple terms have the same meaning. In addition, concepts that may not be represented by the terminology in the document or metadata are not available to searchers. Information retrieval is a negotiation process, and as digital content multiplies, users need assistance in wading through the results of their searches. A comparison of precision and recall between full text searching, latent semantic indexing, and ontology-based retrieval (with manual assignment of concepts to query) finds ontologies capable of providing far better retrieval efficiency (Paralič and Kostial 2003).
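
The two keyword-search failures mentioned above, and the measures used to compare retrieval methods, can be illustrated with a toy example. This is not the experiment reported by Paralič and Kostial: the four documents, the relevance judgement and the synonym set are all hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Precision: share of retrieved items that are relevant.
    Recall: share of relevant items that were retrieved."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical collection: document id -> index terms
DOCS = {
    "d1": {"automobile", "engine"},
    "d2": {"car", "repair"},
    "d3": {"jaguar", "habitat"},  # the animal
    "d4": {"jaguar", "engine"},   # the car brand
}
RELEVANT = {"d1", "d2", "d4"}  # assumed judgement: documents about cars


def search(terms):
    """Plain keyword match: any document sharing at least one term."""
    return {doc for doc, words in DOCS.items() if words & terms}


# Synonymy: "car" alone misses the document indexed under "automobile"
print(precision_recall(search({"car"}), RELEVANT))
# A concept that groups both terms recovers it
print(precision_recall(search({"car", "automobile"}), RELEVANT))
# Polysemy: "jaguar" also retrieves the document about the animal
print(precision_recall(search({"jaguar"}), RELEVANT))
```

Grouping synonyms under one concept raises recall without lowering precision in this toy case, while the ambiguous term shows how precision drops when one word carries several meanings.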

Digital libraries routinely provide their services without human assistance; thus it is essential that their metadata be suitable for computation, supporting inference. The reference interview is not available; therefore, computer applications need to be able to reason about their contents to reformulate queries, deduce relations between works, and customize services to the task and user. This is only possible via ontologies (Weinstein and Birmingham 1998).

Imagine a user entering a query, and the computer application offering different meanings for the entered terms; the user selects the intended meaning, or chooses one of the related terms offered. The query engine transforms the query into a language that matches the terminology used in describing the data sources. In addition, it locates material related to the query, based on logical deduction and inference, offering these results on the side. In this manner, relevance and pertinence are improved, and browsing is enabled. With ontologies, we enable computer applications to perform intelligent searching instead of keyword matching, query answering instead of information retrieval, and to provide customized views of materials. A standardized vocabulary referring to natural language semantics enables automatic and human agents to share information and interoperate functionally (Fensel et al. 2003c).

Figure 1.


1.3 Depth and breadth

There are many different ways to classify ontologies; two of the most useful reflect the depth and the breadth of the ontology. In the depth dimension, the specificity of the ontology determines its “weight.” Lightweight ontologies are little more than taxonomies, and include only concepts and their properties, relationships between those concepts, and controlled vocabularies. Heavyweight ontologies also include axioms and constraints that increase the capability of a computer application to logically reason with the data given. Dublin Core might be considered an extremely lightweight ontology, whereas Cyc (created using the Knowledge Interchange Format, a proposed standard) may be the most extensive top-level ontology currently in existence (de Bruijn 2003, 6-9). (Two limited open-source versions of this encyclopedic ontology are available: OpenCyc and Research Cyc.) In the breadth dimension, there are general (top-level, or global) ontologies, domain ontologies (specific to a particular area) and application ontologies, which describe concepts depending on the task as well as the domain (some refer to application ontologies as another form of domain).

1.4. Cross-mapping issues

In order to provide searching via natural vocabulary, a mapping is needed from the natural language of each user group to the entries in each metadata vocabulary. This is known as an “entry vocabulary index” or EVI. In addition, to search across databases, it is necessary to have mappings between each possible pair of system vocabularies, or ontologies. Mapping between ontologies must be done by people competent in both domains; the current status is that human assistance in mapping will likely be necessary for some time to come, for high quality mappings (Bockting 2005).
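
The cost implied by “each possible pair” grows quickly with the number of vocabularies, as the following sketch shows. The index entries, vocabulary labels and function names are hypothetical; the count is simply the number of unordered pairs among n items, n(n-1)/2.

```python
from itertools import combinations

# Hypothetical entry vocabulary index: everyday term -> metadata vocabulary entry
EVI = {
    "heart attack": "myocardial infarction",
    "cancer": "neoplasms",
}


def vocabulary_pairs(vocabularies):
    """Every unordered pair of vocabularies needs its own mapping."""
    return list(combinations(vocabularies, 2))


def pair_count(n):
    """Closed form for the number of unordered pairs among n vocabularies."""
    return n * (n - 1) // 2


pairs = vocabulary_pairs(["A", "B", "C", "D"])
print(len(pairs), pair_count(4))  # four vocabularies
print(pair_count(10))             # ten vocabularies
```

Four vocabularies already require six pairwise mappings and ten require forty-five, each of which, according to the text, still needs human expertise in both domains.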

Problems in cross-mappings can be of several types. Data objects of the same name may describe different real-world elements; concepts may be ascribed to different levels of the metadata structures (an attribute in one ontology may be a class in another); conceptual approaches may preclude a functional correspondence; descriptions of a single real-world element may vary considerably and conflict with one another; and one of the ontologies may have incorrect information (Adam, Atluri, and Adiwijaya 2000). A concept in one ontology may not exist in another, or may have an entirely different meaning. For example, in the Harmony Project, members of the closely-related domains of digital libraries and cultural heritage and museum communities sought to merge the digital library ABC Ontology (Lagoze and Hunter 2001) with the CIDOC (International Committee for Documentation of the International Council of Museums) Conceptual Reference Model (ICOM/CIDOC and CIDOC CRM 2005). They uncovered cultural biases, particularly in terms of the nature of change; while both ontologies were concerned with change over time, one modeled the change of objects, while the other modeled changes in the context and meaning for those objects (Doerr, Hunter, and Lagoze 2003). A comprehensive overview of the problem areas of mapping, including variation of expressiveness and the differing modeling paradigms or styles, is discussed by Klein, and diagrammed in figure 2 (Klein 2001).

Figure 2.

An IMLS-funded effort (National Leadership Grant No. 178), based on prior research partially supported by a DARPA (Defense Advanced Research Projects Agency) contract, explored the feasibility of cross-mapping vocabularies of numeric data sets and text files (Buckland et al. 2007). It was discovered that the vocabularies for topical categorization vary greatly, requiring interpretive mappings between systems, and that specification of geographical area and time period is problematic. Both names of places and of time periods are culturally based, unstable, and ambiguous. The use of geospatial coordinates is suggested as the only effective method of relating locations to search terms, which means that both gazetteers and map visualizations become critical to implementing search retrieval in a user-friendly manner. A similar application needs to be developed for time periods, and this issue is being addressed in a subsequent IMLS-funded study by the Electronic Cultural Atlas Initiative (Electronic Cultural Atlas Initiative 2006). Among other objectives, the intent is to contextualize objects in library and museum collections by using or adapting existing and emerging standards and protocols. This initiative is described further by Petras et al. (2006).

Ontologies must be expected to evolve over time as knowledge and understanding grow, and terminology changes. Their mappings to other ontologies must also evolve, and this evolution may require change in other ontologies to which they are mapped (de Bruijn and Polleres 2004, 11). Thus the initial effort to develop ontologies is insufficient; they must not only be maintained but also versioned over time, and compatibility with other ontologies considered with each evolution. Cross-mappings are rare, expensive, time-consuming, and difficult to maintain. With 135 semantic types and 54 relationships, the Unified Medical Language System Metathesaurus is a notable example (Smith et al. 2004).

1.5. A bird’s eye view

It is insufficient to consider ontology mapping as a singular or only a local problem. Many differing ontologies already exist with overlapping domains of knowledge and application (de Bruijn 2003). And there are at least three basic conceptual approaches to interoperability: a global ontology to which all local ontologies are mapped, a peer-to-peer system (where mappings exist between local ontologies where needed), and a combination of the two. A central, heavyweight global ontology is clearly preferable for computer applications, as one-to-one mappings of all involved ontologies do not scale. However, obtaining global agreement on controlled terms and relationships is infeasible, so a layering approach based on generality is more likely to succeed, with mapping between domains and higher-level ontologies as needed (Meersman 1999). A single general lightweight ontology to be shared by multiple domains was explored by Stuckenschmidt and van Harmelen (2005). After developing their framework, the authors stated that the shared ontology can only be developed if all sources of information are known, and the conceptualization of each source is accessible; they concluded this was only feasible for a single domain (Stuckenschmidt and van Harmelen 2005, 249). De Bruijn and Polleres add that a limitation to this approach would be the likely lack of agreement on the interpretation of the concepts in the shared ontology by all the authors of local ontologies (de Bruijn and Polleres 2004).
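
The scaling argument can be made concrete with a small count. The sketch below assumes one mapping per unordered pair of ontologies in the peer-to-peer case and one mapping per local ontology in the global case; the function names are illustrative only.

```python
def pairwise_mappings(n):
    """Mappings needed if every pair of n ontologies is mapped directly."""
    return n * (n - 1) // 2


def hub_mappings(n):
    """Mappings needed if each of n ontologies maps to one global ontology."""
    return n


for n in (5, 10, 100):
    print(n, pairwise_mappings(n), hub_mappings(n))
# 5 10 5
# 10 45 10
# 100 4950 100
```

Under these assumptions the pairwise count grows quadratically while the global count grows linearly, which is the sense in which one-to-one mappings do not scale.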

Another possible middle ground between the peer-to-peer approach and the central core ontology method would be to implement layers or a hierarchical application (de Bruijn and Polleres 2004). One way to envision this is to compare a scientific discipline with a group of islands, where each area of research is an island, and each island has a further breakdown of specificity into “dialects.” If a single island had 3 dialects, each dialect would be a Level 1 ontology, probably the most specific in terminology. A shared ontology for the entire island would be a Level 2 ontology. Islands (or domains) could map to one another as needed. A shared ontology for the group of islands would be a Level 3 ontology, the most general so far. Other sets of islands could have similar structure, and again, the hierarchy could continue as needed, but with a distributed, organically growing base rather than a single top-down application. This may be the only feasible solution, as it reflects the grassroots approach and grows as needed.

2. State of the art

Currently, the semantic search engine Swoogle states that there are at least 10,000 ontologies in use on the WWW, and provides a list of ontology repositories, semantic web search engines and crawlers. The 2005 version of Swoogle indexed 337,182 documents, while the 2006 version currently lists 2,030,039 documents (Swoogle 2006), roughly a sixfold increase. This cursory comparison indicates a growing interest in the implementation of ontologies.

2.1. Within domains

Ontologies seem to have already found a home in instructional technology, as an outgrowth of KOS (Knowledge Organization Systems). The primary difference is that ontologies apply logic to the relations (Binding and Tudhope 2004). Other differences are that existing KOS lack conceptual abstractions, semantic coverage, consistency, and automatable processing (Soergel et al. 2004). Ontologies are important to education because concepts and the relationships between them “provide a powerful, and perhaps the only, level of granularity with which to support effective access and learning” (Smith et al. 2004, 2). A portal already exists for sharing tools, projects, research and information for ontology use in education (Dicheva et al. 2006), and a commercial success in the education arena is Xyleme, which depends upon the existing heterogeneous XML structure in documents for pattern-matching, mapping, encoding, and creating “views” for abstract query response (Aquilera et al. 2000).

The Alexandria Digital Earth Prototype (ADEPT), currently in use for teaching geography courses at the University of California, employs an ontology to link the current lecture material to a graph showing its relation to other concepts, and also links to examples from the digital library. All three views are presented at the same time, to give students the context and examples they need to make sense of what the teacher is trying to communicate. In addition, the ontology supports a Virtual Learning Environment that lets the teacher create, use, and re-use learning materials in different fields of science and in various learning environments (Smith et al. 2004).

Yet here the content of the digital library itself is limited to examples, primarily images and graphs. For digital libraries containing complex materials, there exists the need for two levels of access: discovery of resources, and discovery within the resources, the latter of which requires the creation of descriptions of semantic and internal structural organization through resource decomposition. The GREEN digital library project explored the problems and possibilities in this area, using term extraction algorithms, performing text analysis, and extending a combination of metadata schemes (LOM for learning objects and MatML for materials). This group noted the need for a convergence of metadata schemes and robust mechanisms for navigating a complex associational web of resources (Shreve and Zeng 2003). Clearly, the ability to locate specific content, regardless of its location within materials, would be extremely useful for isolating information and minimizing the time spent sifting through search results. As the quantity of materials online explodes, findability becomes critical.

An example of ontology use in enterprise integration would be the Unified Medical Language System (UMLS), which provides services for computer applications across a multitude of health-industry areas. The UMLS Metathesaurus is a compendium and synthesis of more than 100 different thesauri, classifications and code sets for health care, billing, statistics, medical literature, research and resources, and requires constant updating and renovation. The Metathesaurus preserves the many views present in the source vocabularies, as each may be useful for different tasks. Hence, it must be customized to be effective in any one application (U.S. National Library of Medicine March 2006a). UMLS includes a Semantic Network to “provide a consistent categorization of all concepts represented in the UMLS Metathesaurus and to provide a set of useful relationships between these concepts” (U.S. National Library of Medicine March 2006b). In addition, the SPECIALIST Lexicon provides a general English vocabulary that includes biomedical terms, for Natural Language Processing (NLP), to improve searchability for the general user (U.S. National Library of Medicine March 2006c).

E-Commerce potential is clearly indicated by the degree to which ontologies have already proven their value in critical government defense, finance, and manufacturing. An example in the business arena is Australia’s InfoMaster. In the United States, Ontology Works, founded in 1998 by former members of the intelligence community, currently serves the critical needs of such clients as the U.S. Department of Defense, the U.S. Department of Justice, Science Applications International Corporation, Boeing, Northrop Grumman, and the Sierra Nevada Corporation. Ontology Works is a highly successful commercial venture, and claims to have the most sophisticated ontology-driven database on the market (Ontology Works 2005). Another commercial success is Ontobroker, a deductive, object-oriented database system, now available via Ontoprise.

MOMIS (Mediator environment for Multiple Information Sources) has been used to model a tourism information provider system. In the MOMIS Integration Methodology, local source schemata are extracted. If the source material is unstructured, text is extracted and analyzed, and an XML schema is generated. Then a meaning for each element of the source schema is chosen from a lexical database of English, WordNet (prompts for choices are given to a human; the choice is manual). A common thesaurus, a global schema, and sets of mappings to local schemata are generated. Finally, a meaning is assigned (semi-automatically) to each element of the global schema. The query manager then rewrites the incoming global query as an equivalent set of queries to match the local source schemata; local sources are queried with these, and the resulting responses are fused and reconciled into a final response (Bergamaschi et al. 2005).
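
The final step, rewriting a global query into local queries and fusing the answers, can be sketched roughly as follows. This is not MOMIS code: the source names, attribute mappings, sample rows, and function names are all hypothetical, and the fusion step here is reduced to dropping exact duplicates.

```python
# Hypothetical mappings from global-schema attributes to each local schema.
MAPPINGS = {
    "hotels_db": {"name": "hotel_name", "city": "town"},
    "guide_xml": {"name": "title", "city": "location"},
}

# Toy local sources standing in for real databases or XML stores.
SOURCES = {
    "hotels_db": [{"hotel_name": "Roma Inn", "town": "Rome"},
                  {"hotel_name": "Lido", "town": "Venice"}],
    "guide_xml": [{"title": "Roma Inn", "location": "Rome"},
                  {"title": "Forum Suites", "location": "Rome"}],
}


def rewrite(global_query, source):
    """Translate {global_attr: value} into the source's attribute names."""
    mapping = MAPPINGS[source]
    return {mapping[attr]: value for attr, value in global_query.items()}


def query_local(source, local_query):
    """Return the rows of one source that satisfy every condition."""
    return [row for row in SOURCES[source]
            if all(row.get(k) == v for k, v in local_query.items())]


def answer(global_query):
    """Query every source, map rows back to global names, fuse duplicates."""
    fused = []
    for source, mapping in MAPPINGS.items():
        back = {local: glob for glob, local in mapping.items()}
        for row in query_local(source, rewrite(global_query, source)):
            record = {back[k]: v for k, v in row.items()}
            if record not in fused:
                fused.append(record)
    return fused


print(answer({"city": "Rome"}))
# [{'name': 'Roma Inn', 'city': 'Rome'}, {'name': 'Forum Suites', 'city': 'Rome'}]
```

Real reconciliation is considerably harder than this duplicate check, since records describing the same entity rarely match exactly across sources.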

Exploration has been made into non-textual content as well. Annotation of historical images with a domain-specific ontology enables users to retrieve images for which they have inadequate historical knowledge and keywords (Soo et al. 2002). An Amsterdam research group has developed a visual ontology using MPEG-7 and WordNet, which supports descriptions of colors and shapes of objects, to support automatic annotation (Hollink et al. 2005). By extracting and analyzing visual features, and mapping clusters of sequences and patterns to ontological concepts, another experiment has demonstrated the feasibility of semi-automated ontology annotation of domain-specific videos (Bertini et al. 2005). In a fourth model, audio tapes of sports broadcasts were annotated (Khan, McLeod, and Hovy 2003), though the text analyzed was extracted from the closed captions that came with the audio objects. In this project, only three relations were modeled (isA, Instance-Of, and Part-Of), and an automatic query expansion mechanism was built using WordNet as a generic ontology, though the authors found it too incomplete to functionally model the domain.

According to Ontology Works (2005), the leading research groups in ontologies are IFOMIS (The Institute for Formal Ontology and Medical Information Science), ECOR (European Centre for Ontological Research), LOA (Laboratory for Applied Ontology), and NCOR (National Center for Ontological Research). Based on the number of recent ontological projects, Stanford University’s Knowledge Systems, Artificial Intelligence Laboratory and the Sirma Group’s OntoText Semantic Technology Lab should perhaps be added to this list.

2.2. Across domains

One of the primary purposes of cross-mapping is to allow searching of heterogeneous resources from a single interface. The Digital Government Research Center Energy Data Collection project used an overarching ontology (SENSUS) to provide searching across more than 50,000 database tables, manually defining the domain model with 500 concept nodes, then mapping them with intentionally vague semantic meaning to the possible 70,000 nodes of the larger ontology. While much of the model building was automated, it was far from simple to create a coherent domain model out of the variation of metadata and domain terms within the databases. The end product cannot support automated inference, but does enable browsing and non-expert searching with familiar terms (Hovy 2003).

OntoMedia, an open-source effort, builds on the CIDOC Conceptual Reference Model and the IFLANET (International Federation of Library Associations and Institutions) FRBR model (Functional Requirements for Bibliographic Records) to facilitate the annotation of semantic content of multimedia. It provides the user with a graphical user interface with metadata indexing and search capabilities, for organizing multimedia collections, though the ontology is presented as a general, high-level ontology for reuse across domains (Lawrence et al. 2005).

Semantic Interoperability of Metadata and Information in unLike Environments (SIMILE) is a joint project of MIT Libraries and MIT Computer Science and Artificial Intelligence Laboratory, which leverages and extends DSpace. The intent is to enhance general interoperability across distributed information stores of varying types, and to provide useful end-user services for mining that material (Leuf 2006, 223-4). In an early prototype of the project, VRA Core (Visual Resources Association Data Standards Committee) and IMS LOM (Learning Object Metadata) were translated into RDF schemas with enrichment obtained from Wikipedia and the prototype OCLC Library of Congress Name Authority Service. Then the datasets were transformed from XML to RDF/XML using XSLT. While the developers were able to automate linkage of RDF datasets using string similarity techniques, the approach was error-prone and results had to be manually reviewed. The enrichment techniques could likewise be automated, but again required human intervention to verify the validity of the data produced (Butler et al. 2004).
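
To see why string-similarity linkage needs manual review, consider the following minimal sketch. It does not reproduce the SIMILE technique; it assumes a generic similarity ratio from Python's standard `difflib` module, and the function name, labels, and threshold are illustrative.

```python
from difflib import SequenceMatcher


def propose_links(names_a, names_b, threshold=0.85):
    """Pair up labels whose similarity ratio meets the threshold.

    The pairs returned are only candidates: a high score does not
    guarantee that two labels denote the same entity.
    """
    candidates = []
    for a in names_a:
        for b in names_b:
            score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if score >= threshold:
                candidates.append((a, b, round(score, 2)))
    return candidates


print(propose_links(["Claude Monet"], ["claude monet", "Pablo Picasso"]))
# [('Claude Monet', 'claude monet', 1.0)]
```

With a more permissive threshold of 0.8, the same function would also pair the labels "Monet" and "Manet", which name two different painters. False matches of this kind are what a human reviewer has to catch.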

3. Fundamentals

3.1. Methodology

A recent analysis of the state of ontology engineering bemoans a lack of guidance, unified methodology, cost-benefit analysis tools, and selection support to choose engineering approaches (Simperl and Tempich 2006). Real-world applications require comprehension of the scope and progression of the project, customizable workflows, user-friendly tools, and automation of the majority of the tasks. While several ontology management tools are relatively mature, many necessary ontology engineering activities are not yet adequately supported by technology, and critical aspects, such as automation of ontology creation, application, and mapping, are still being researched. The basic model for implementation of an ontology (without consideration of ontology mapping: see figure 3) includes a feasibility study, domain analysis, conceptualization, encoding, maintenance and use (Simperl and Tempich 2006).

3.2. Purpose and scope

Before choosing, adapting, or creating an ontology, the purpose and the user audience must be determined. If the domain is clearly delineated and there is no desire for interoperability or cross-mapping to outside ontologies, the scope and direction are simplified. If, however, the desired outcome is more diverse and interoperable, the choices made in this assessment will be both critical and complex.

One unusual investigation tested the hypothesis that the more indexing is geared toward the user task, the better the results. Kabel, Hoog, Wielinga and Anjewierden (Kabel et al. 2004) compared the efficiency, effectiveness, precision of use, and quality of results when users were given access to keywords versus a domain index versus an instructional index, for creating lesson plans. The domain index was content-based, with specific terminology. The instructional index provided classification of objects by use in instructional material, and hence was task-oriented (an application ontology). An example of this would be a “behavioral description” with “specific” scope, and the instructional role of “illustration.” Their hypothesis was generally correct. The domain index provided more efficient, effective search and retrieval than the keyword search, and the instructional index provided better precision than the use of keywords and domain indexing. Hence, it appears that we need to clearly understand the needs of our users, in order to choose the type of ontology that will actually provide the specificity they need for the task at hand.

Figure 3. Ontology Engineering Activities

ScholOnto (Shum et al. 2000), for example, is an effort to develop an ontology for discourse about research, rather than for the research itself, which is an interesting twist. Designed to provide an ontology for scholars to interpret, discuss, analyze and debate existing literature, ScholOnto (developed using OCML, the Operational Conceptual Modeling Language) overlays existing metadata and does not attempt to directly describe the content of the research. Instead, the ontology provides a structure to clarify the intellectual lineage of ideas, their impact, scholarly perspectives on those ideas, inconsistencies in approaches or claims, and convergences of different streams of research (Shum et al. 2000, 3). Here, the comments about the literature become the objects for retrieval and for building new structures to define the usefulness of the object. This is a social networking function, an interactive community-created layer over the research itself. This could be an invaluable way to add context and clarity to understanding and exploration of a domain. Thus, the application of ontologies to digital libraries might not be in querying the documents themselves, but in building relationships and connections and social context around the documents.

3.3. Conceptualization

If ontologies exist that can be adapted to the purpose at hand, tools are needed to perform such adaptation. If an appropriate ontology does not yet exist, tools are needed for modeling and constructing the ontology. Selecting or creating an ontology involves a fundamental tradeoff between the degree of complexity and generality versus the degree of efficiency of interpretation and reasoning within the language (Weinstein and Birmingham 1998, 35). Maximum consideration must be given to the desired services. The following findings are intended to provide a starting point for further exploration.

One possibility is that of creating ontologies out of existing metadata schemes or thesauri, adapting and adding as needed. The more complex and structurally coherent the metadata scheme, the more feasible this may be. One effort under development is an adaptation of the AGROVOC Thesaurus, developed and maintained by the Food and Agriculture Organization of the United Nations (Soergel et al. 2004). An older effort to transform MARC (MAchine Readable Cataloging) uncovered difficulties in the varying dimensions and multiple levels of granularity containing partial descriptions, which is a requisite feature of bibliographic data (Weinstein and Birmingham 1998). Another possibility is creating an ontology from scratch, using existing models to pave the way. OCML (Operational Conceptual Modeling Language) supports the construction of ontologies and problem-solving methods, and is supported by a large library of reusable models (via the WebOnto editor). Currently in use by several projects, OCML is available free of charge for non-commercial use.

Building on previous work is a third option, and the one which offers the greatest variety of tools at present. Many of these are domain-specific.

– The ABC Metadata Model Constructor’s fundamental classes for digital libraries were determined by analyzing commonalities between Dublin Core, INDECS (Interoperability of Data in e-Commerce Systems), MPEG-7 (Multimedia Content Description Interface), the CIDOC (International Committee for Documentation of the International Council of Museums) Conceptual Reference Model and the IFLANET (International Federation of Library Associations and Institutions) FRBR model (Functional Requirements for Bibliographic Records). These classes form building blocks for developing either application or domain-specific ontologies, with event-aware views for modeling different manifestations of a relationship (Hunter 2001). This tool provides graphical user interfaces and is free to download, but it is still an experimental prototype (Leuf 2006, 217-8), without support, and assumes users understand Java, RDF, and basic ontology and metadata principles.

– WebOnto is a freely available Java applet coupled with a customized web server (LispWeb), which provides browsing, visualization and editing of knowledge models via the web. WebOnto is currently being used with ScholOnto (discussed above) and PlanetOnto, for search, retrieval, news feeds, alerts, and presentations of laboratory-related information.

– The Kraft project outlines steps to building shared ontologies: ontology scoping, domain analysis, ontology formulation, and top-level ontology (Jones et al. 1998). However, their methodology lacks comprehensive evaluation of ontologies and is not applicable to global domains (Stuckenschmidt 2005, 68).

– The Protégé open-source Ontology Editor provides two main ways of modeling ontologies, and can export in various formats including OWL. Used extensively in clinical medicine and the biomedical sciences, Protégé covers the full range of development processes (Leuf 2006, 209-210).

– KAON (KArlsruhe ONtology) offers a stable, open-source, comprehensive tool suite for ontology creation and management, and a framework for building applications; it was designed for business applications requiring scalability and efficient reasoning capabilities (Leuf 2006, 213).

– Chimæra is a system for creating and maintaining distributed web ontologies, as well as for merging ontologies and providing multidimensional diagnoses to identify problems (Leuf 2006, 210-211). Chimæra can load and export files in OWL, and is available as open source.

There are many possible variations in the ability of software to combine and relate ontologies; Klein provides a comparison of several different approaches (see table 1): SKC (Scalable Knowledge Composition), Chimæra, PROMPT, SHOE (Simple HTML Ontology Extensions), OntoMorph, metamodel, OKBC (Open Knowledge Base Connectivity) and layering. Of these, OntoMorph addresses the majority of the stated problems in combining ontologies. However, Klein states that “mismatches in expressiveness between languages is not solvable” and more comprehensive schemes need to be developed for interoperability of ontologies (Klein 2001).

3.4. Encoding

For computer applications to be able to use ontologies, they must be encoded in machine-readable languages: in particular, all implicit relations between concepts must be explicitly encoded. To enable interoperability between ontologies and query engines, we need to agree on standards for these encodings. As in any other area, there is some disagreement on what is the most useful path. Ontology Works used the draft ISO (International Organization for Standardization) standard, SCL (Simple Common Logic), which has been superseded by the Common Logic Standard, currently under development (ISO 2006). Since Ontology Works does not seek interoperability with the broader public (it is a commercial effort), their focus was on what was most efficient and effective for their needs. However, if this standard is adopted by the ISO, it will likely compete with OWL for wider ontology development. CyCorp developed its own language, CycL, for their powerful Cyc system; however, their open-source components (OpenCyc and ResearchCyc) provide translators to certain other languages, and the ability to export selectively in OWL (CyCorp 2002). Schematron, “a language for making assertions about patterns found in xml documents,” is based on the tree pattern uncovered in the marked-up document. It allows you to determine which variant of a language you are working with, as well as to verify that it conforms to a particular schema (Leuf 2006, 218). Schematron was published as a draft ISO standard in 2004.

Table 1. Table of problems and approaches for combined use of ontologies (Klein 2001)

Legend: A = solves problem automatically; U = solutions suggested to user; M = provides mechanism for specifying solution

A proposed RDF Thesaurus Specification provides “conceptual relationships for encoding thesauri, classification systems and organized metadata”, as well as a proposal for encoding a core set of thesaurus relationships (Cross et al. 2003). The two standards that have been adopted by the World Wide Web Consortium are the Resource Description Framework (RDF) and the Web Ontology Language (OWL). RDF is a simple notation for representing relationships between and among objects. RDF uses URIs (Uniform Resource Identifiers) for identification, and describes resources in terms of three parts: subject, predicate (the type of property about the subject), and object (the value of the property about the subject) (World Wide Web Consortium 2004b). OWL was developed for defining and instantiating web ontologies so that computers can logically interpret information. An extension of RDF, OWL has three increasingly complex sublanguages:

– OWL Lite is the simplest, and most closely related to thesauri.
– OWL DL is based on description logics, which enable computer applications to reason logically and make inferences.
– OWL Full provides maximum expression with no computational guarantees.

OWL Full will probably never have wide usage due to its lack of tractability and lack of logic support; practical applications will likely use some subset of OWL DL, as it can provide both power and functionality (de Bruijn 2003, 74).
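
The subject-predicate-object structure of RDF described above can be illustrated with plain tuples. This sketch does not use an RDF library; the `example.org` URI and the helper `objects` are made up for illustration, while the predicate URIs follow the Dublin Core element namespace.

```python
# Each RDF statement is a (subject, predicate, object) triple.
# BOOK is an invented identifier; DC is the Dublin Core elements namespace.
DC = "http://purl.org/dc/elements/1.1/"
BOOK = "http://example.org/book/organising-knowledge"

triples = [
    (BOOK, DC + "title", "Organising Knowledge"),
    (BOOK, DC + "creator", "Patrick Lambe"),
    (BOOK, DC + "date", "2007"),
]


def objects(subject, predicate, graph=triples):
    """Return every object value stated for a subject and predicate."""
    return [o for s, p, o in graph if s == subject and p == predicate]


print(objects(BOOK, DC + "creator"))  # ['Patrick Lambe']
```

OWL adds class and property definitions on top of such statements, which is what allows a reasoner to draw inferences from them.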

3.5. Tools

Much of the research in the cross-mapping arena is focused on identifying and seeking solutions to the problems, rather than developing tools. However, XeOML offers an extensible markup language for mapping ontologies against one another, two at a time. Simple mappings are one-to-one relations, and complex mappings may involve more than one element or element type in either or both languages (Pazienza 2004).

MetaNet is a metadata term thesaurus created by the Harmony project to provide additional semantic knowledge that does not exist in XML-encoded metadata descriptions. Since many entities and relationships occur across all domains, it is possible to generate a simplified set of semantic relationships from metadata terms in domain schemas to the preferred terms in the ABC ontology, and then (based on these relationships) generate cross-domain semantic relationships between each of those original metadata terms, outputting the results in RDF (Hunter 2001). In addition, Harmony offers the ABC Metadata Model Constructor for use with their ABC ontology, an RDF visualization tool for complex metadata (RDFViz), and a simple RDF query language (Rudolf “Squish”) (Brickley et al. 2002b).
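
The two-step derivation described for MetaNet, mapping domain terms to preferred terms and then relating the domain terms to each other, might be sketched as follows. The schema names, terms, and hub labels are hypothetical placeholders rather than actual MetaNet or ABC content, and the sketch returns tuples rather than RDF.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical mappings: (schema, term) -> preferred hub term.
TO_HUB = {
    ("dc", "creator"): "Agent",
    ("vra", "maker"): "Agent",
    ("dc", "date"): "Time",
    ("lom", "date"): "Time",
}


def cross_domain_relations(to_hub):
    """Relate terms from different schemas that share a hub term."""
    by_hub = defaultdict(list)
    for term, hub in to_hub.items():
        by_hub[hub].append(term)
    relations = []
    for hub, terms in by_hub.items():
        for a, b in combinations(sorted(terms), 2):
            if a[0] != b[0]:  # keep only pairs that cross schemas
                relations.append((a, b, hub))
    return relations


for relation in cross_domain_relations(TO_HUB):
    print(relation)
# (('dc', 'creator'), ('vra', 'maker'), 'Agent')
# (('dc', 'date'), ('lom', 'date'), 'Time')
```

The design point is that each schema is mapped only once, to the hub, and the cross-domain relationships are then derived rather than authored by hand.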

The SIMILE project (Semantic Interoperability of Metadata and Information in unLike Environments) assessed existing tools in 2003, including RDF editors (IsaViz and RDFAuthor), schema editors (Protégé-2000, KAON OI-Modeller, and Ontolingua), ontology visualization software (OntoRama and Ontosaurus), application profile editors (SCART: The MEG Registry Client), metadata instance editors (Haystack, Standardized Hyper Adaptable Metadata Editor, and Simple Instance Creator), XForms for combining XML and forms, and thesaurus construction software (WebChoir vocabulary tools, Thesaurus Builder, MultiTes, and Term Tree) (Gilbert and Butler 2003). They determined that the existing tools only assist users in formally capturing existing models, rather than helping them to model their own schema. In addition, they found no formal approach for creating RDF models, so they proceeded to fill the gaps. Some of the tools they created include: a faceted browser for RDF browsing via standard web browsers (Longwell), an interactive graphical RDF visualization browser (Welkin), a tool for converting existing syntaxes into RDF (RDFizers), a tool that summarizes the structure of an XML dataset (Gadget), and a generic ontology for rendering RDF in a human-friendly manner (Fresnel, still in development) (Mazzocchi, Garland and Lee 2005).

University of Maryland’s Mindswap Lab has developed an open-source OWL-DL reasoner, Pellet, for which commercial-level support is available. The InfoSleuth project is working to develop a commercial query server that dynamically adapts to the available information sources and services, fusing related information from heterogeneous resources and abstracting results to the level appropriate to the user needs (Telcordia Technologies 2005). Query engines can currently be classified coarsely by whether they use a centralized ontology to which all others are mapped, or whether they support individual mappings between ontologies. TSIMMIS, InfoMaster, MOMIS, and Xyleme (an industrial solution) are based on a framework in which a single central schema is mapped to local schemas (Pazienza et al. 2004). The Bremen University Semantic Translator for Enhanced Retrieval (BUSTER) is a middleware of this same type, designed to access and integrate multiple ontologies which are based on a common vocabulary. The general top-level ontology it uses is based on simple Dublin Core with some added refinements (Visser and Schuster 2002). Thus the user must commit to the basic generalized vocabulary that is used to define concepts in all the source ontologies, and is not presented with a specific domain view (Stuckenschmidt 2005, 199-207).

In contrast, the OBSERVER (Ontology Based System Enhanced With Relationships for Vocabulary hEterogeneity Resolution) system requires the user to select query terms from one of the ontologies it supports; the source material that ontology covers is then queried (Figure 4). If the results are not satisfactory, the user query is rewritten into the ontologies of other information sources in order to query other holdings (Mena et al. 2000).

OBSERVER uses synonyms, hypernyms, hyponyms, overlap, disjointness and coverage to map between ontologies, storing these relations in a central repository to use for translating queries (Stuckenschmidt 2005, 192-198). In this manner, heterogeneous databases and ontologies are managed without the need for a single global ontology (Mena et al. 2000). MAFRA (the Ontology MApping FRAmework) is also based on distributed mediation systems rather than a centralized one (Pazienza et al. 2004).
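The query rewriting described above can be illustrated with a minimal sketch. It is not OBSERVER's actual algorithm: the relation store, the ontology names ("bib", "lib"), the terms, and the function names are all hypothetical, and only three of the relation types (synonym, hypernym, hyponym) are modeled. A synonym is treated as an exact translation, while a hypernym or hyponym substitution is flagged as approximate, since it broadens or narrows the query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    relation: str      # "synonym", "hypernym", or "hyponym" (assumed labels)
    target_term: str


# Hypothetical central repository of inter-ontology relations:
# (source ontology, source term, target ontology) -> mappings.
RELATIONS = {
    ("bib", "Publication", "lib"): [Mapping("synonym", "Document")],
    ("bib", "Thesis", "lib"): [Mapping("hypernym", "Document")],
    ("bib", "Periodical", "lib"): [
        Mapping("hyponym", "Journal"),
        Mapping("hyponym", "Newspaper"),
    ],
}


def rewrite_term(term, source, target):
    """Translate one term; return (target terms, whether the translation is exact)."""
    mappings = RELATIONS.get((source, term, target), [])
    synonyms = [m.target_term for m in mappings if m.relation == "synonym"]
    if synonyms:
        return synonyms, True
    # Fall back to broader or narrower terms; the result is only approximate.
    approx = [m.target_term for m in mappings
              if m.relation in ("hypernym", "hyponym")]
    return approx, False


def rewrite_query(terms, source, target):
    """Rewrite a list of query terms from the source ontology into the target one.

    Returns (alternatives per translated term, overall exactness, untranslated terms).
    """
    rewritten, untranslated, exact = [], [], True
    for term in terms:
        targets, is_exact = rewrite_term(term, source, target)
        if not targets:
            untranslated.append(term)
            exact = False
            continue
        rewritten.append(targets)
        exact = exact and is_exact
    return rewritten, exact, untranslated
```

Because each mapping is stored per pair of ontologies, no global ontology is needed; the price is that a term with no stored relation simply cannot be translated, which the sketch reports rather than hides.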

Figure 4

3.6 Costs

While ontologies offer benefits in terms of interoperability, browsing and searching, reuse, and structuring knowledge in a domain, the costs must be considered. Costs include construction, learning, cross-mapping, and maintenance and continual development of both the ontologies and the software (Menzies 1999). Information about cost is difficult to obtain, as most efforts are prototypes or commercial developments. Tim Berners-Lee, a major proponent of the Semantic Web, downplays the total cost, and fails to consider methodologies, depth of ontology, or even level of usability in his online assessment (Berners-Lee 2005). In a later article with others, however, this stance is modified somewhat: the authors imply that general web applications may need only lightweight ontologies, and recognize that in certain commercial applications the use of powerful heavyweight ontologies will easily recoup the cost (Shadbolt et al. 2006). Recently a cost estimation approach, ONTOCOM, has been developed; a detailed description is available in Bontas and Mochol (2006), and an example of its application to a particular ontology (DILIGENT) is described in Bontas and Tempich (2005), though the actual results of applying the many formulas to the various cost drivers are not included in that publication. These cost drivers include:

– Product factors: complexity of the domain analysis, conceptualization, implementation, instantiation, evaluation, integration, reusability, and documentation

– Personnel factors: ontologist/domain expert capability and experience, language and tool experience, and personnel continuity

– Project factors: tool support, multi-site development, and required development schedule

– Reuse/maintenance factors: ontology understandability, domain/expert unfamiliarity, and complexity of evaluation, modifications, and translations (Institut für Informatik 2006b)
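The ONTOCOM formulas themselves are not reproduced here. As a rough illustration of how cost drivers of this kind are usually combined in parametric cost models of the COCOMO family, the sketch below scales a size-based baseline by one effort multiplier per driver. The multiplicative form, the constants `a` and `alpha`, the size unit, and every multiplier value are assumptions chosen for illustration, not calibrated ONTOCOM figures.

```python
from math import prod


def estimate_person_months(size_kilo_entities, multipliers, a=1.0, alpha=1.0):
    """Illustrative parametric estimate: a * size**alpha * product of multipliers.

    size_kilo_entities: ontology size in thousands of entities (assumed unit).
    multipliers: driver name -> effort multiplier; values above 1.0 raise the
        estimate, values below 1.0 lower it (placeholder values only).
    a, alpha: baseline and scale constants (placeholders, not calibrated).
    """
    if size_kilo_entities <= 0:
        raise ValueError("size must be positive")
    return a * size_kilo_entities ** alpha * prod(multipliers.values())


# Hypothetical ratings, one per driver group listed above.
drivers = {
    "domain_analysis_complexity": 1.2,   # product factor
    "ontologist_experience": 0.9,        # personnel factor
    "tool_support": 1.1,                 # project factor
}
effort = estimate_person_months(2.0, drivers, a=3.0)
```

In a model of this shape each driver acts independently, so a single poorly rated factor, such as weak tool support, inflates the whole estimate proportionally.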

Development of an ontology requires a shared conceptualization by domain experts, users and designers (de Bruijn 2003, 5); this is not only difficult, but requires such a high initial investment that it will only be supportable where there is commercial interest (Stuckenschmidt and van Harmelen 2005, 249). While the initial cost of ontology implementation is daunting, one IBM researcher predicts that the long-term maintenance of an ontology will account for 80% of the cost (Welty 2005). In a recent survey of 34 ontology engineering projects, half of which were commercial, all participants emphasized the resource-intensive nature of domain analysis and the lack of low-barrier methods and tools (Simperl and Tempich 2006). The implications are that there must be a clear and pressing need for the benefits of ontological indexing and retrieval, sufficient to provide extensive funding or the dedicated volunteer labor of known and trusted professionals. From the limited survey of the landscape performed for this report, it appears that funding is currently available in medical fields, environmental research, national defense, and business applications. The educational field may contain sufficient volunteer experts, university support, and grant-funded development to make ontology development feasible for instructional materials.

To be able to effectively apply an ontology, much less change it, one must learn it, another time-consuming task. Apart from domain knowledge, the person encoding the document must have a level of understanding approaching that of a skilled knowledge engineer (Marshall and Shipman 2003). To expect the average citizen to have or develop the necessary knowledge and skill to coherently apply a domain ontology to a document is infeasible (Marshall 2004). If the users will not apply the ontologies, then the application of metadata to resources must be performed by the institution or service. Hence the users only bear the cost if they pay for the service, either directly or indirectly; this implies that ontologies may indeed only be feasible, in the long term, for applications in commercial services.

The only other solution to this cost would be to automate the application of ontologies to resources. The development of this functionality depends heavily on research and tools developed by the artificial intelligence community. Some of the techniques developed include a noun phrasing technique for concept extraction, and concept association based on context, frequency and co-occurrence of terms (Chen 1999). However, precise meanings for every relation are necessary for automatic classification (Weinstein and Birmingham 1998). A 2003 assessment stated that a number of issues must be resolved before natural language can be understood by computers, and the majority of information present on the web is in natural language (Fensel 2003a). For technical fields with more structured terminology, however, a text-mining system for scientific literature, Textpresso, shows considerable promise for assisting in automatic ontology annotation. While the machine cannot replace the human expert, it can increase efficiency greatly (Müller et al. 2004). Further investigation into current developments in this area is warranted.
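Concept association by co-occurrence can be sketched in a few lines. This is a deliberately simplified illustration, not the method reported by Chen (1999): it assumes the noun phrases have already been extracted, ignores context and term frequency within a document, and uses a plain asymmetric weight (the share of documents containing term a that also contain term b). The function name and the sample terms are hypothetical.

```python
from collections import Counter
from itertools import combinations


def association_weights(documents):
    """Map each ordered term pair (a, b) to co-occurrence count / document frequency of a.

    documents: iterable of term collections, one per document (terms are
    assumed to be pre-extracted noun phrases).
    """
    doc_freq = Counter()
    pair_freq = Counter()
    for terms in documents:
        unique = sorted(set(terms))      # count each term once per document
        doc_freq.update(unique)
        for a, b in combinations(unique, 2):
            pair_freq[(a, b)] += 1       # record both directions so the
            pair_freq[(b, a)] += 1       # weight can be asymmetric
    return {(a, b): n / doc_freq[a] for (a, b), n in pair_freq.items()}


docs = [
    ["ontology", "thesaurus"],
    ["ontology", "metadata"],
    ["ontology", "thesaurus", "metadata"],
]
weights = association_weights(docs)
```

In this sample, every document that mentions "thesaurus" also mentions "ontology", but not the reverse, so the weight from "thesaurus" to "ontology" is higher than the weight in the opposite direction. Such weights suggest candidate associations only; they carry none of the precise relation semantics that automatic classification requires.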

For the ontology to be widely usable and interoperable, cross-mapping to other ontologies and domains is necessary, requiring the involvement of multiple domain experts (Adam, Atluri, and Adiwijaya 2000). Ontologies (and their supporting software) must also be expected to change (de Bruijn 2003, 35), as knowledge and terminology are continually evolving. It is quite possible that this aspect may restrict the usability of ontologies to specified domains. Cross-mapping is only likely if there is sufficient need and funding to offset the expense, and then it is not likely to be maintained over time without continued funding and demand.


4. Conclusions

The decision about if, when, and how one should apply ontologies to one’s digital library is a complex one. There are many aspects to consider, and several of those aspects are moving targets. Any assessment or survey, such as this one, can only be a snapshot of an evolving landscape, and as such, is useful primarily in helping one get one’s bearings for the moment. Further research and feasibility studies are necessary components for any digital library considering the application of ontologies.

Ontologies may be particularly useful for some purposes. Dieter Fensel predicted in 2003 that three areas in which ontology application potentially has a huge impact are knowledge management, enterprise integration, and e-commerce (Fensel 2003b). Already this prediction seems to be proving true. If one’s digital library falls into these domains, the usefulness may outweigh the cost: funding created by demand for a service may well be sufficient to overcome other obstacles. Usefulness in educational realms seems quite promising, but the return on investment has yet to be proven (Milam 2005).

Outside of heavily funded domains, feasibility is yet to be determined. If the target audience for the digital library is the general public, at no cost to the user, then it is not likely that the application of ontologies is currently monetarily feasible. Ontologies incur tremendous expenditures of resources in their creation or adoption, application, cross-mapping, maintenance, and possibly software development. Tools exist to assist in modifying existing ontologies, but they are not simple, and require extensive domain knowledge and understanding of the concepts and relations required for the ontology to be functional. Tools to apply ontologies to existing resources are still under development. Cross-mapping ontologies for use beyond a single domain is new territory; if the source ontologies have the same basis, query engines appear to have good results, but that is a rather telling caveat. Otherwise, it seems that only general mappings are feasible, supporting general queries with limited precision. To some extent, mappings can be automated, but they must still be reviewed by a human.

Systems to support ontology use (query engines and semantic web browsers) are becoming available, but their usefulness is limited by the ontologies and their mappings. And the cost of maintenance and continual evolution of an ontology is as yet unmeasured. On the other hand, a general ontology language has been adopted by W3C, tools and systems continue to evolve, and new ontologies appear every year. If funding exists, and an acceptable ontology exists in OWL for a domain covered by a particular digital library, it would be reasonable to assess the existing tools for application and delivery, and possibly move forward in implementation. As ontology and tool development lowers the technical and cost barriers, general digital libraries should certainly become involved, perhaps in the very near future.

References

ACS. World Ranking Thesaurus Software. Active Classification Solutions. http://www.termtree.com.au/, viewed 1 June 2007.

Adam, Nabil R., Atluri, Vijayalakshmi and Adiwijaya, Igg. 2000. SI [system integration] in digital libraries. Communications of the ACM, 46: 6.

Aquilera, Vincent, Cluet, Sophie, Veltri, Pierangelo, Vodislav, Dan, and Wattez, Fanny. 2000. Querying XML documents in Xyleme. In Proceedings of the ACM SIGIR Workshop on XML and Information Retrieval, 28 July 2000. http://www.haifa.il.ibm.com/sigir00-xml/final-papers/xyleme/XylemeQuery/XylemeQuery.html.

Becker, Peter, Green, Steve, and Roberts, Nataliya. Ontorama. knowledge, visualization and ordering laboratory. http://www.kvocentral.org/software/ontorama.html, viewed 1 June 2007.

Beckett, Dave, Steer, Damian, Heery, Rachel, and Johnston, Pete. 2003. MEG registry project. UKOLN Metadata for Education Group. http://www.ukoln.ac.uk/metadata/education/regproj/, last updated 12 September 2003.

Bergamaschi, Sonia, Beneventano, Domenico, Guerra, Francesco, and Vincini, Maurizio. 2005. Building a tourism information provider with the MOMIS system. Journal of information technology and tourism 7: 3-4. http://tourism.wu-wien.ac.at/Jitt/JITT_7_34_Bergamaschi_et_al.pdf.

Berners-Lee, Tim. 2005. Putting the Web Back in Semantic Web. In 4th International Semantic Web Conference, November, 2005, slide 17. http://www.w3.org/2005/Talks/1110-iswc-tbli/.

Bertini, M., Del Bimbo, A., and Torniai, C. 2005. Automatic video annotation using ontologies extended with visual information. In ACM Multimedia Conference 2005. ACM, November 2005.

Binding, Ceri and Tudhope, Douglas. 2004. KOS at your Service: programmatic access to knowledge organisation systems. Journal of digital information 4: 4. http://jodi.tamu.edu/Articles/v04/i04/Binding.

Bockting, Sander. 2005. A semantic translation service using ontologies. In University of Twente, Faculty of Electrical Engineering, Mathematics and Computer Science 3rd Twente Student Conference on IT, Enschede, 20 June 2005. http://bockting.student.utwente.nl/documents/semantic_translation_service_using_ontologies.pdf.

Bontas, Elena Paslaru, and Mochol, Malgorzata. 2006. Ontology engineering cost estimation with ONTOCOM. Technical Report TR-B-06-01, Freie Universität Berlin, Germany, 7 February 2006. http://ontocom.ag-nbi.de/docs/tr-b-06-01.pdf.

Bontas, Elena Paslaru, and Tempich, Christoph. 2005. How much does it cost? Applying ONTOCOM to DILIGENT. Technical report TR-B-05-20, Freie Universität Berlin, Germany, 27 October 2005. http://ontocom.ag-nbi.de/docs/tr-b-05-20.pdf.

Brickley, Dan, Miller, Libby, Hunter, Jane and Lagoze, Carl, Principal Investigators. 2002a. About Harmony. A joint project of the Distributed Systems Technology Center (Australia), the Institute for Learning and Research Technology (UK), and Cornell Digital Library Research Group (USA). http://metadata.net/harmony/index.html, viewed 29 May 2007.

Brickley, Dan, Miller, Libby, Hunter, Jane and Lagoze, Carl, Principal Investigators. 2002b. Harmony: Results. A joint project of the Distributed Systems Technology Center (Australia), the Institute for Learning and Research Technology (UK), and Cornell Digital Library Research Group (USA). http://metadata.net/harmony/Results.htm, viewed 30 May 2007.

Buckland, Michael, Chen, Aitao, Gey, Fredric C., and Larson, Ray R. 2006. Search across different media: numeric data sets and text files. In Information technology and libraries 25: 182. http://metadata.sims.berkeley.edu/searchacross.pdf.

Butler, Mark H., Gilbert, John, Seaborne, Andy, and Smathers, Kevin. 2004. Data conversion, extraction and record linkage using XML and RDF tools in Project SIMILE. Hewlett-Packard Company Technical Report HPL-2004-147, 31 August 2004. http://metadata.sims.berkeley.edu/searchacross.pdf.

CES. 2003. MetaNet: An Overview. A project of the European Commission Community Research Information Society Technologies, published by the University of Edinburgh, Research Centre of the School of Education, Centre for Educational Sociology. http://www.epros.ed.ac.uk/metanet/, last modified 3 November 2003.

Chalupsky, Hans. OntoMorph: a translation system for symbolic knowledge. University of Southern California Information Sciences Institute. http://www.isi.edu/~hans/ontomorph/presentation/ontomorph.html, viewed 30 May 2007.

Chen, Hsinchun. 1999. Semantic research for digital libraries. D-Lib Magazine 5: 10. http://www.dlib.org/dlib/october99/chen/10chen.html.

Cross, Phil, Brickley, Dan, and Koch, Traugott. 2003. RDF thesaurus specification (draft). Institute for Learning and Research Technology Technical Report Number 1011, 21 July 2003. http://www.ilrt.bris.ac.uk/publications/researchreport/rr1011/report_html.

CyCorp. 2007. What is Cyc? CyCorp, Inc. http://www.cyc.com/cyc/technology/whatiscyc, viewed 29 May 2007.

CyCorp. 2007. Opencyc.org: OpenCyc license information. CyCorp, Inc. http://www.opencyc.org/license, viewed 29 May 2007.

CyCorp. 2007. ResearchCyc. CyCorp, Inc. http://research.cyc.com/, viewed 29 May 2007.

CyCorp. 2002a. The Syntax of CycL. CyCorp, last updated 28 March 2002. http://www.cyc.com/cycdoc/ref/cycl-syntax.html

CyCorp. 2002b. Frequently Asked Questions about OpenCyc, Version 07b. CyCorp, last updated 20 September 2002. http://www.opencyc.org/faq/opencyc_faq

Davies, John, Fensel, Dieter, and Van Harmelen, Frank, editors. 2003. Towards the semantic web: ontology-driven knowledge management. West Sussex: John Wiley & Sons.

de Bruijn, Jos. 2003. Using ontologies: enabling knowledge sharing and reuse on the semantic web. Digital Enterprise Research Institute Technical Report DERI-2003-10-29, October 2003. http://www.deri.at/fileadmin/documents/DERI-TR-2003-10-29.pdf.

de Bruijn, Jos and Polleres, Axel. 2004. Towards an ontology mapping specification language for the semantic web. Digital Enterprise Research Institute Technical Report DERI-2004-06-30, June 2004. http://www.deri.at/fileadmin/documents/DERI-TR-2004-06-30.pdf.

Delugach, Harry, Editor. 2007. Common logic standard. ISO Final Draft International Standard 24707. http://cl.tamu.edu/, viewed 30 May 2007.


DGRC. The EDC Project. Digital Government Research Center. http://www.isi.edu/dgrc/dgrc-research.html, viewed 29 May 2007.

Dicheva, Darina, Sosnovsky, Sergey, Gavrilova, Tatiana, and Brusilovsky, Peter. 2006. Ontologies for Education Portal. A collaborative project of Winston Salem State University, University of Pittsburgh, and Saint-Petersburg State Polytechnic University. http://iiscs.wssu.edu/o4e/viewhome.do?tm=O4E.xtm, viewed 1 April 2007.

Doerr, Martin, Hunter, Jane, and Lagoze, Carl. 2003. Towards a core ontology for information integration. Journal of digital information 4:1, Article 169, 9. http://jodi.ecs.soton.ac.uk/Articles/v04/i01/Doerr/.

Domingue, John. WebOnto. The Open University Knowledge Media Institute. http://kmi.open.ac.uk/projects/webonto/, viewed 30 May 2007.

Domingue, John, and Motta, Enrico. PlanetOnto. The Open University Knowledge Media Institute. http://kmi.open.ac.uk/projects/planetonto/, viewed 30 May 2007.

Dublin Core Metadata Initiative. 2004. Dublin Core Metadata Element Set, Version 1.1: reference description. Dublin Core Metadata Initiative, 20 December 2004. http://dublincore.org/documents/dces/, viewed 29 May 2007.

Electronic Cultural Atlas Initiative. 2006. Support for the learner: what, where, when, and who. University of California, Berkeley. http://ecai.org/imls2004/, viewed 29 May 2007.

Faculty of Engineering at Modena. 2004. The Mediator envirOnment for Multiple Information Sources (MOMIS) Project. University of Modena e Reggio Emilia, 2 October 2004. http://www.dbgroup.unimo.it/Momis/, viewed 15 May 2007.

Fensel, Dieter. 2003a. From a presentation for the Next Web Generation Seminar at the University of Innsbruck, Summer, 2003. In de Bruijn, J. Using Ontologies: Enabling Knowledge Sharing and Reuse on the Semantic Web. Digital Enterprise Research Institute Technical Report DERI-2003-10-29, October 2003. http://www.deri.at/fileadmin/documents/DERI-TR-2003-10-29.pdf.

Fensel, Dieter. 2003b. Ontologies: a silver bullet for knowledge management and electronic commerce. In de Bruijn, J. Using ontologies: enabling knowledge sharing and reuse on the semantic web. Berlin: Springer-Verlag. Digital Enterprise Research Institute Technical Report DERI-2003-10-29, October 2003. http://www.deri.at/fileadmin/documents/DERI-TR-2003-10-29.pdf.

Fensel, Dieter, Hendler, Jim, Lieberman, Henry, and Wahlster, Wolfgang. 2003c. Introduction. In Spinning the semantic web. (Cambridge: MIT Press) http://w5.cs.uni-sb.de/teaching/ws03/internetagenten/Introduction.pdf, viewed 15 May 2007.

Food and Agriculture Organization of the United Nations. 2007. Agriculture Information Management Standards: AGROVOC Thesaurus. http://www.fao.org/aims/ag_intro.htm, last updated 22 May 2007.

FZI WIM and AIFB LS3. 2007. KAON Tool Suite. http://kaon.semanticweb.org/, last updated 10 May 2005.

Genesreth, Michael R. 2004. Knowledge interchange format: draft proposed American National Standard (dpANS), NCITS.T2/98-004. Stanford Logic Group, Stanford University. http://logic.stanford.edu/kif/dpans.html, viewed 14 April 2007.

Gilbert, John, and Butler, Mark H. 2003. Review of existing tools for working with schemas, metadata, and thesauri. Hewlett-Packard Company Technical Report HPL-2003-218, 6 November 2003. http://www.hpl.hp.com/techreports/2003/HPL-2003-218.pdf.

Gray, Peter, Gray, Alex, Fiddian, Nick, Shave, Michael, and Bench-Capon, Trevor, Principal Investigators. 2000. KRAFT: Knowledge Reuse & Fusion/Transformation. A joint project of The University of Aberdeen Computer Science Department, The Cardiff University School of Computer Science, and The University of Liverpool Computer Science Department. http://www.csd.abdn.ac.uk/~apreece/Research/KRAFT/, viewed 30 May 2007.

Hjørland, Birger. 2007. Knowledge organization systems. Core concepts in library and information science (LIS), 11 February 2007. http://www.db.dk/bh/lifeboat_ko/CONCEPTS/knowledge_organization_systems.htm, viewed 15 May 2007.

Hovy, Eduard. 2003. Using an ontology to simplify data access. Communications of the ACM 46.

Hollink, Laura, Worring, Marcel, and Schreiber, A. Th. (Guus). 2005. Building a visual ontology for video retrieval. In ACM Multimedia Conference 2005. ACM, November 2005. http://www.cs.vu.nl/%7Eguus/papers/Hollink05b.pdf, viewed 10 November 2007. Ontology available at http://appling.kent.edu/nsdlgreen/default.htm, viewed 29 May 2007.

Hsu, Eric I. 2003. Wine agent 1.0: how does it work? Stanford University Knowledge Systems Artificial Intelligence Laboratory, last updated 8 April 2003. http://www.ksl.stanford.edu/projects/wine/explanation.html

Hunter, Jane. 2001. MetaNet: a metadata term thesaurus to enable semantic interoperability between metadata domains. Journal of Digital Information 1:8, No. 42, 8 February 2001. http://jodi.tamu.edu/Articles/v01/i08/Hunter/.

Hüsemann, Bodo. 2006. OntoMedia. University of Muenster. http://www.ontomedia.de/, viewed 29 May 2007.

ICOM/CIDOC Document Standards Group and CIDOC CRM Special Interest Group. 2005. Definition of the CIDOC Conceptual Reference Model, Version 4.2. edited by Crofts, Nick, Doerr, Martin, Gill, Tony, Stead, Stephen, and Stiff, Matthew, June 2005. http://cidoc.ics.forth.gr/docs/cidoc_crm_version_4.2.pdf.

IFLA. 1998. Functional requirements for bibliographic records. Final Report. International Federation of Library Associations and Institutions, Cataloguing Section, FRBR Review Group. http://www.ifla.org/VII/s13/frbr/frbr.htm, viewed 29 May 2007.

InfoMaster. 2006. InfoMaster: the power to make better decisions. Efekt Pty Ltd. http://www.infomaster.com.au/, viewed 29 May 2007.

IMS. 2007. Learning resource meta-data specification. IMS Global Learning Consortium, Inc. http://www.imsproject.org/metadata/, viewed 29 May 2007.

Information Sciences Institute, University of Southern California. Large resources: ontologies (SENSUS) and lexicons. Information Sciences Institute, University of Southern California. http://www.isi.edu/natural-language/projects/ONTOLOGIES.html, viewed 29 May 2007.

Institut für Informatik. 2006a. Ontology engineering cost estimation with ONTOCOM. Institut für Informatik, Networked Information Systems, Freie Universität Berlin. http://ontocom.ag-nbi.de/index.html, viewed 1 June 2007.

Institut für Informatik. 2006b. ONTOCOM cost drivers. Institut für Informatik, Networked Information Systems, Freie Universität Berlin. http://ontocom.ag-nbi.de/ontocom.html, viewed 1 June 2007.

International Standards Organization. 2006. Information technology – common logic (CL): a framework for a family of logic-based languages. (ISO/IEC JTC 1/SC 32 N 1498), 31 December 2006. http://cl.tamu.edu/docs/cl/24707-31-Dec-2006.pdf.

International Standards Organization. 2004. MPEG-7 overview. (ISO/IEC JTC1/SC29/WG11 N 6828) Coding of Moving Pictures and Audio. http://www.chiariglione.org/mpeg/standards/mpeg-7/mpeg-7.htm, viewed 29 May 2007.

International Standards Organization, International Electrotechnical Commission. 2004. Document schema definition languages (DSDL) – part 3: rule-based validation – schematron. (ISO/IEC FDIS 19757-3). http://www.schematron.com/iso/dsdl-3-fdis.pdf.

Jelliffe, Rick. Schematron: a language for making assertions about patterns found in XML documents. http://www.schematron.com/overview.html, viewed 30 May 2007.

Jones, D.M., Bench-Capon, T.J.M., and Visser, P.R.S. 1998. Methodologies for ontology development. In Jose Cuena, ed., IT&KNOWS information technology and knowledge systems: Proceedings of the XV. IFIP World Computer Congress, 31 Aug.-4 Sept. 1998, Vienna, Austria and Budapest, Hungary. Vienna: Austrian Computer Society/International Federation for Information Processing, pp. 62-75.

Kabel, S., de Hoog, R., Wielinga, B.J., Anjewierden, A. 2004. The added value of task and ontology-based markup for information retrieval. Journal of the American Society for Information Science and Technology 55:348-62.

Kahn, L., McLeod, D., and Hovy, E. 2004. Retrieval effectiveness of an ontology-based model for information selection. The international journal on very large data bases 13: 71-85.

Karger, David R., Principal Investigator. Haystack Project. Massachusetts Institute of Technology Computer Science and Artificial Intelligence Laboratory (MIT CSAIL). http://haystack.lcs.mit.edu/, viewed 1 June 2007.

Kent State University. Green’s Functions Digital Library. A collaborative effort of Kent State University, National Institute of Standards and Technology (NIST), and Massachusetts Institute of Technology (MIT). http://appling.kent.edu/nsdlgreen/default.htm, viewed 29 May 2007.

Klein, Michael. 2001. Combining and relating ontologies: an analysis of problems and solutions. In International Joint Conferences on Artificial Intelligence Workshop on Ontologies and Information Sharing. http://www.informatik.uni-bremen.de/agki/www/buster/IJCAIwp/Finals/klein.pdf.

Knowledge Media Institute and the Open University. 2004. Scholarly ontologies project: Summary. http://kmi.open.ac.uk/projects/scholonto/summary.html, viewed 30 May 2007.

Knowledge Media Institute. 2000. OCML: operational conceptual modelling language. http://kmi.open.ac.uk/projects/ocml/, viewed 14 March 2007.

KSL. 2005a. Chimæra. Stanford University Computer Science Department, Knowledge Systems Artificial Intelligence Laboratory. http://www-ksl.stanford.edu/software/chimaera/, viewed 30 May 2007.

KSL. 2005b. Ontolingua. Stanford University Computer Science Department, Knowledge Systems Artificial Intelligence Laboratory. http://www-ksl.stanford.edu/software/ontolingua/, viewed 1 June 2007.

Lagoze, Carl and Hunter, Jane. 2001. The ABC ontology and model. Journal of digital information 2:2, Art. 77. http://jodi.ecs.soton.ac.uk/Articles/v02/i02/Lagoze/.

Lawrence, Faith, Tuffield, Mischa M., Jewell, Mike O., Prügel-Bennett, Adam, Millard, David E., Nixon, Mark S., Schraefel, Monica, and Shadbolt, Nigel R. 2005. OntoMedia creating an ontology for marking up the contents of heterogenous media. In Proceedings Ontology Patterns for the Semantic Web ISWC-05 Workshop. http://eprints.ecs.soton.ac.uk/11153/01/onto_workshop.pdf.

Leuf, Bo. 2006. The semantic web: crafting infrastructure for agency. West Sussex: John Wiley & Sons.

Lin, David and Hunter, Jane. 2001. ABC Metadata Model Constructor: The ABC Ontology. A result of the DSTC (Australia), JISC (UK), and NSF (US) funded Harmony Project. Developed under the direction of Carl Lagoze. http://metadata.net/harmony/constructor/ABC_Constructor.htm, viewed 30 May 2007.

Library of Congress. 2007. MARC standards. Library of Congress, Network Development and MARC Standards Office. http://www.loc.gov/marc/, last updated 24 January 2007.

LSDIS. 2005. ADEPT: Alexandria Digital Earth Prototype (1999-2004). Large Scale Distributed Information Systems, University of Georgia, Computer Science Department. http://lsdis.cs.uga.edu/projects/past/ADEPT/, viewed 29 May 2007.

Marshall, Catherine C. 2004. Taking a stand on the semantic web. http://www.csdl.tamu.edu/~marshall/mc-semantic-web.html, viewed 15 May 2007.

Marshall, Catherine C. and Shipman, Frank M. 2003. Which semantic web? In HyperText ’03 Conference, Nottingham, UK, 26-30 August, 2003. Copyright 2003 ACM. http://www.csdl.tamu.edu/~marshall/ht03-sw-4.pdf, viewed 15 May 2007.

MatML Working Group. 2004. MatML: XML for materials property data. http://www.matml.org/schema.htm, viewed 29 May 2007.

Mazzocchi, Stephano, Garland, Stephen, and Lee, Ryan. 2005. SIMILE: practical metadata for the semantic web. In O’Reilly XML.com 2005. XML From the Inside Out, 26 January 2005. http://www.xml.com/pub/a/2005/01/26/simile.html

Meersman, Robert. 1999. Semantic ontology tools in IS design. In Zbigniew W. Ras and Andrzej Skowron, eds., Foundations of intelligent systems: 11th International Symposium, ISMIS’99, Warsaw, Poland, June 8-11, 1999: proceedings. Berlin and New York: Springer, 30-45.

Mena, Eduardo, Illarramendi, Arantza, Kashyap, Vipul, and Sheth, Amit P. 2000. OBSERVER: an approach for query processing in global information systems based on interoperation across pre-existing ontologies. Distributed and Parallel Databases 8: 223-71.

Menzies, Tim. 1999. Cost benefits of ontologies. Intelligence 10, no. 3: 26-32.

Milam, John. 2005. Ontologies in higher education. In HigherEd.org. http://highered.org/docs/milam-ontology.pdf, viewed 1 April 2007.

Mindswap Lab. 2006. Pellet: an OWL DL reasoner. Developed at the University of Maryland’s Mindswap Lab, commercially supported by Clark & Parsia, LLC. http://pellet.owldl.com/, last updated 4 November 2006.

Miller, George A. WordNet: a lexical database for the English language. Cognitive Science Laboratory, Princeton University. http://wordnet.princeton.edu/, viewed 29 May 2007.

MIT. 2007a. Simile: semantic interoperability of metadata in unlike environments. A joint project of Massachusetts Institute of Technology Libraries and Massachusetts Institute of Technology Computer Science and Artificial Intelligence Laboratory. http://simile.mit.edu/, viewed 29 May 2007.

MIT and Hewlett Packard. 2007b DSpace: Welcome to DSpace. A joint project of Massachusetts Insti-

Page 55: KNOWLEDGE ORGANIZATION KO - Ergon-Verlag · with the aid of the CMapTool and WoPeD graphic applica-tions. Two distinct professional communities have been in-volved in the research,

Knowl. Org. 34(2007)No.4 J. L. DeRidder. The Immediate Prospects for the Application of Ontologies in Digital Libraries

244

tute of Technology Libraries and Hewlett Pack-ard. http://www.dspace.org/, viewed 29 May 2007.

Motta, Enrico. OCML: operational conceptual model-ling language. The Open University Knowledge Media Insitute. http://kmi.open.ac.uk/projects/ ocml/, viewed 30 May 2007.

Müller, Hans-Michael, Kenny, Eimear E., and Stern-berg, Paul W. 2004. Textpresso: an ontology-based information retrieval and extraction system for biological literature. Public library of science: biol-ogy 2: 11. http://www.pubmedcentral.nih.gov/ articleren-der.fcgi?tool=pubmed&pubmedid=15383839.

Multisystems. 2007. Thesaurus construction and pub-lishing solutions. Multisystems. h t t p : / / w w w. multites.com/, viewed 1 June 2007.

Noy, Natasha. Prompt. Stanford Medical Informat-ics, Stanford University. http://protege.stanford .edu/plugins/prompt/prompt.html, viewed 30 May 2007.

OCLC. 2007. Learn more about LC Name Author-ity Service. Online Computer Library Center Pro-grams and Research, ResearchWorks. http://www .oclc.org/research/researchworks/authority/ default.htm, viewed 29 May 2007.

OKBC Working Group. 1995. Open Knowledge Base Connectivity Home Page. A joint project of Cy-Corp, Information Sciences Institute, Stanford Knowledge Systems Laboratory, Science Applica-tions International Corporation (SAIC), SRI In-ternational, and Teknowledge; Richard Fikes, working group chair. http://www.ai.sri.com/~ okbc/, viewed 30 May 2007.

Ontology Works, Inc. 2005. Ontology Works knowl-edge server. http://www.ontologyworks.com/ks .php, viewed 8 March 2007.

Ontoprise. 2007. Know how to use know-how. Onto-prise GmbH. http://www.ontoprise.de/content/, viewed 29 May 2007.

Palmer, Matthias, Naeve, Ambjörn, Enoksson, Fredrik, Nilsson, Mikael, Eriksson, Henrik, Danils, Jan, and Stark, Jöran. 2007. SHAME: stan-dardized hyper adaptible metadata editor. http:// kmr.nada.kth.se/shame/wiki/Overview/Main, last updated 5 December 2006.

Paralič, Jan and Kostial, Ivan. 2003. Ontology-based Information Retrieval. In Proceedings of the 14th International Conference on Information and Intel-ligent Systems (IIS2003), Varadin, Croatia. ISBN 953-6071-22-3, 23-28.

Parallel Understanding Systems Group. SHOE: sim-ple html ontology extensions. Department of Com-

puter Science, University of Maryland at College Park. http://www.cs.umd.edu/projects/plus/SHOE/index.html, viewed 30 May 2007.

Pazienza, MariaTeresa., Stellato, Armando, Vindigni, Michele, Zanzotto, Fabio Massimo. 2004. XeOML: an XML-based extensible ontology mapping language. Paper presented at the 3rd In-ternational Semantic Web Conference (ISWC2004) in Hiroshima, Japan, November 2004. http:// ai-nlp.info.uniroma2.it/stellato/publications/2004 _ISWC-04_XeOML%20An%20XML-based%20 extensible%20Ontology%20Mapping%20 Language.pdf, viewed 6 February 2007.

Petras, Vivien, Larson, Ray, and Buckland, Michael. 2006. Time period directories: a metadata infra-structure for placing events in temporal and geo-graphic context. In Opening information horizons: Joint Conference on Digital Libraries, Chapel Hill, NC, 11-15 June 2006. http://metadata.sims. berkeley.edu/tpdJCDL06.pdf, viewed 24 February 2007.

Pietriga, Emmanuel. 2007. IsaViz: a visual authoring tool for RDF. World Wide Web Consortium, RDF Developer. http://www.w3.org/2001/11/IsaViz/, May 2007.

Riva, Alberto. LispWeb. Common Lisp Web Server. http://snpper.chip.org/lispweb, viewed 30 May 2007.

Russ, Tom and Patil, Ramesh. 2006. Loom ontosau-rus. University of Southern California Informa-tion Sciences Institute. http://www.isi.edu/isd/ ontosaurus.html, last updated 5 December 2006.

Rys, Michael. 1998. The Stanford-IBM Manager of Multiple Information Sources (TSIMMIS). Stan-ford University, last updated 4 April 1998. http:// www-db.stanford.edu/tsimmis/, viewed 3 Febru-ary 2007.

Shadbolt, Nigel, Hall, Wendy, and Berners-Lee, Tim. 2006. The semantic web revisited. IEEE Intelligent Systems 21: 96-101. http://eprints.ecs.soton.ac.uk/ 12614/01/Semantic_Web_Revisted.pdf.

Shreve, Gregory M. and Zeng, Marcia Lei. 2003. In-tegrating resource metadata and domain markup in an NSDL collection. In Proceedings of the In-ternational DCMI Metadata Conference and Work-shop, Seattle, WA, 28 September - 2 October, 2003. http://www.siderean.com/dc2003/604_ paper62.pdf, viewed 16 March 2007.

Shum, Simon Buckingham, Motta, Enrico and Dominigue, John. 2000. ScholOnto: an ontology-based digital library server for research documents

Page 56: KNOWLEDGE ORGANIZATION KO - Ergon-Verlag · with the aid of the CMapTool and WoPeD graphic applica-tions. Two distinct professional communities have been in-volved in the research,

Knowl. Org. 34(2007)No.4 J. L. DeRidder. The Immediate Prospects for the Application of Ontologies in Digital Libraries

245

and discourse. International Journal on Digital Li-braries 3: 3. http://kmi.open.ac.uk/projects/ scholonto/docs/ScholOnto-IJoDL-2000.pdf, viewed 4 February 2007.

SID. 2001. Interoperable Database Group, Research Group of Distributed Information Systems (SID). http://sid.cps.unizar.es/OBSERVER/, up-dated 12 April 2001.

Silva, Nuno, Varady, Zoltan, Westerhausen, Frank, Fodor, Oliver, Silva, PedroVieira, and Maio, Paulo. 2006. MAFRA toolkit. h t tp : / /maf ra - too lk i t .sourceforge.net/, last updated 2 January 2006.

Simperl, Elena Paslaru Bontas, and Tempich, Chris-toph. 2006. Ontology engineering: a reality check. In 5th International Conference on Ontologies, Da-tabases, and Applications of Semantics. http:// ontocom.ag-nbi.de/docs/odbase2006.pdf, viewed 16 March 2007.

Sirin, Evren. Simple Instance Creator (SIC). Mary-land Information and Network Dynamics Lab Semantic Web Agents Project (MINDSWAP). http://www.mindswap.org/~evren/SIC/, viewed 1 June 2007.

Smith, Terence R., Zeng, Marcia L., and the ADEPT Project Team. 2004. Building semantic tools for concept-based learning spaces: knowledge bases of strongly-structured models for scientific con-cepts in advanced digital libraries. Journal of digi-tal information 4:4, Art. 263, 28 January 2004. http://jodi.tamu.edu/Articles/v04/i04/Smith/.

Soergel, Dagobert, Lauser, Boris, Liang, Anita, Fi-esseha, Frehiwot, Keizer, Johannes, and Katz, Ste-phen. 2004. Reengineering thesauri for new appli-cations: the AGROVOC Example. In Journal of digital information 4:3, No. 257. http://jodi.tamu .edu/Articles/v04/i04/Soergel.

Soo, Von-Wun, Lee, Chen-Yu and Yeh, Jaw Jium. Us-ing sharable ontology to retrieve historical images. In International Conference on Digital Libraries, Proceedings of the 2nd ACM/IEEE-CS Joint Con-ference on Digital Libraries, ACM Press, July 2002.

Stanford Medical Informatics. 2007. Protégé. Stanford University School of Medicine, Stanford Medical Informatics. http://protege.stanford.edu/, last up-dated 25 May 2007.

Steer, Damian. 2003a. RDF author. http://rdfweb .org/people/damian/RDFAuthor/, last modified 2 August 2003.

Steer, Damian. 2003b. The Meg Registry Client Soft-ware (SCART). UKOLN Metadata for Education Group.

http://www.ukoln.ac.uk/metadata/education/regproj/scart/, viewed 1 June 2007.

Stuckenschmidt, Heiner, and van Harmelen, Frank. 2005. Information Sharing on the Semantic Web. Berlin: Springer.

Swoogle. 2006. Swoogle manual: FAQs. University of Maryland, Baltimore County Ebiquity Research Group. http://swoogle.umbc.edu/index.php ?option=com_swoogle_manual&manual=faq, viewed 16 March 2007.

Thesaurus Builder. 2007. Thesaurus Builder thesaurus management software. Thesaurus Builder. http:// www.thesaurusbuilder.com/, viewed 1 June 2007.

TECUP Consortium. 2001. INDECS: interoperabil-ity of data in e-commerce systems. TECUP Con-sortium, lead by Universität Göttingen / Nied-ersächsische Staats- und Universitätsbibliothek (UNIGOE) as Coordinator. http://gdz.sub.uni -goettingen.de/tecup/indecs.htm, viewed 30 May 2007.

Telcordia Technologies. 2005. The InfoSleuth agent system. Applied Research Greenhouse, Telcordia Technologies. http://www.argreenhouse.com/ InfoSleuth/index.shtml, viewed 11 April 2007.

U.S. National Library of Medicine. 2006a UMLS me-tathesaurus fact sheet. National Institutes of Health, U.S. Department of Health and Human Services, 28 March 2006. http://www.nlm.nih.gov /pubs/factsheets/umlsmeta.html, viewed 14 March 2007.

U.S. National Library of Medicine. 2006b. Unified medical language system. National Institutes of Health, U.S. Department of Health and Human Services, 28 March 2006. http://www.nlm.nih .gov/research/umls/about_umls.html, viewed 14 March 2007.

U.S. National Library of Medicine. 2006c. SPE-CIALIST lexicon fact sheet. National Institutes of Health, U.S. Department of Health and Human Services, 28 March 2006. http://www.nlm.nih. gov/pubs/factsheets/umlslex.html, viewed 14 March 2007.

U.S. National Library of Medicine. 2005. UMLS knowledge source server (UMLSKS), Version 5.0. National Institutes of Health, U.S. Department of Health and Human Services, 30 August 2005. http://umlsks.nlm.nih.gov/, viewed 14 March 2007.

Visser, Ubbo and Hübner, Sebastian. 2003. Bremen University Semantic Translator for Enhanced Re-trieval (BUSTER). http://www.informatik.uni -bremen.de/agki/www/buster/new/application .html, last modified 25 May 2003.

Page 57: KNOWLEDGE ORGANIZATION KO - Ergon-Verlag · with the aid of the CMapTool and WoPeD graphic applica-tions. Two distinct professional communities have been in-volved in the research,

Knowl. Org. 34(2007)No.4 J. L. DeRidder. The Immediate Prospects for the Application of Ontologies in Digital Libraries

246

Visser, Ubbo and Schuster, Gerhard. 2002. Finding and integration of information: a practical solu-tion for the semantic web. In Proceedings of ECAI 02, Workshop on Ontologies and Semantic Interop-erability, pp. 73-78. http://www.informatik.uni -bremen.de/agki/www/buster/papers/ECAI02WS .pdf viewed 14 March 2007.

VRA. 2007. VRA core. Visual Resources Association, The International Association of Image Media Professionals. http://www.vraweb.org/projects/ vracore4/, viewed 29 May 2007.

WebChoir. 2006. Project vocabulary tools. WebChoir, Inc. http://www.webchoir.com/products/pvt .html, viewed 1 June 2007.

Weinstein, PeterC. and Birmingham, William P. 1998. Creating ontological metadata for digital library content and services. International journal on digital libraries 2: 20-37. http://deepblue.lib.umich.edu/ handle/2027.42/42334.

Welty, Chris. 2004. Ontology maintenance support: text, tools, and theories. Presentation at the 7th International Protégé Conference, Bethesda MD. http://protege.stanford.edu/conference/2004/ slides/2.1_Welty_Ontology_Maintenance_ Support_v3.pdf, viewed 16 March 2007.

Wiederhold, Gio, Jannink, Jan, Prasenjit, Mitra, Decker, Stefan, and Vasan, Pichai S. Scalable knowl-edge composition (SKC). Stanford University In-foLab. http://infolab.stanford.edu/SKC/, viewed 30 May 2007.

World Wide Web Consortium. 2007. Extensible markup language (XML). W3C Architecture Do-main. http://www.w3.org/XML/, viewed 29 May 2007.

World Wide Web Consortium. 2006. XForms 10 (Sec-ond Edition): W3C Recommendation 14 March 2006. http://www.w3.org/TR/xforms/, viewed 26 March 2007.

World Wide Web Consortium. (2004a) OWL web on-tology language guide. W3C Recommendation, Feb-ruary 2004. http://www.w3.org/TR/owl-guide/, viewed 26 March 2007.

World Wide Web Consortium. (2004b) RDF primer: W3C recommendation 10 February 2004. http:// www.w3.org/TR/REC-rdf-syntax/, viewed 21 March 2007.

World Wide Web Consortium. 2004c. RDF/XML syntax specification (revised). W3C Recommenda-tion 10 February 2004. http://www.w3.org/TR/ rdf-syntax-grammar/, viewed 29 May 2007.

World Wide Web Consortium. 2000. Resource de-scription framework (RDF) schema specification 1.0. http://www.w3.org/TR/2000/CR-rdf-schema -20000327/, viewed 29 May 2007.

World Wide Web Consortium. 1999. XSL transfor-mations (XSLT), Version 1.0. W3C Recommenda-tion 16 November 1999. http://www.w3.org/TR/ xslt, viewed 29 May 2007.

XYLEME. 2006. Xyleme: harness the power of XML. XYLEME, 2006. http://www.xyleme.com/, viewed 29 May 2007.

Page 58: KNOWLEDGE ORGANIZATION KO - Ergon-Verlag · with the aid of the CMapTool and WoPeD graphic applica-tions. Two distinct professional communities have been in-volved in the research,

Knowl. Org. 34(2007)No.4 K. Golub, Th. Hamon, A. Ardö. Automated Classification of Textual Documents


Automated Classification of Textual Documents Based on a Controlled Vocabulary in Engineering†

Koraljka Golub,* Thierry Hamon,** and Anders Ardö***

*KnowLib Research Group, Lund University, P. O. Box 118, SE-221 00 Lund, Sweden <[email protected]>

** Laboratoire d'Informatique de Paris-Nord – UMR CNRS 7030, Institut Galilée, Université Paris-Nord, Avenue J.-B. Clément, 93430 Villetaneuse, France

<[email protected]>

***KnowLib Research Group, Lund University, P. O. Box 118, SE-221 00 Lund, Sweden <[email protected]>

Koraljka Golub has an interest in traditional and recent knowledge organization systems in the context of digital libraries. She acquired her doctorate from Lund University, Sweden in 2007. Her thesis dealt with automated subject classification in Web-based hierarchical browsing systems. From 2008 she will work as a research officer at UKOLN, Bath. One project will be about terminology registries, and the other on social tagging and ways it can enhance information retrieval, especially when combined with more traditional controlled vocabularies.

Thierry Hamon is assistant professor at the Computer Science Department, Paris-Nord University. He received his Ph.D. in computer science in 2000, on the topic of semantic variation in specialized corpora. His current research interest is in terminology acquisition and structuring, and bringing together tools for Natural Language Processing (NLP). He has developed several NLP tools: a terminological system SynoTerm dedicated to the acquisition of synonymy relations between terms, based on lexical resources, a term extractor, and a linguistic platform for the enrichment of specialized web documents.

Anders Ardö is Associate Professor at the Department of Electrical and Information Technology, Lund University, where he manages the Knowledge Discovery and Digital Library Research Group (KnowLib). He has a background in Computer Systems with a PhD from Lund University in 1986. Since 1992 he has worked with research and development for digital library services. He has participated in many EU projects including DESIRE, Telematics Applications Programme, Renardus, ALVIS and DELOS.

† Many thanks to Traugott Koch, Douglas Tudhope, Marianne Lykke Nielsen, and anonymous reviewers for providing comments on the manuscript, which helped improve the paper considerably. The authors also wish to thank two subject experts who helped in evaluation. This work was supported by the IST Programme of the European Community under ALVIS (IST-002068-STP).

Golub, Koraljka, Hamon, Thierry, and Ardö, Anders. Automated Classification of Textual Documents Based on a Controlled Vocabulary in Engineering. Knowledge Organization, 34(4), 247-263. 33 references.


ABSTRACT. Automated subject classification has been a challenging research issue for many years now, receiving particular attention in the past decade due to the rapid increase of digital documents. The most frequent approach to automated classification is machine learning. It, however, requires training documents and performs well on new documents only if these are similar enough to the former. We explore a string-matching algorithm based on a controlled vocabulary, which does not require training documents; instead, it reuses the intellectual work put into creating the controlled vocabulary. Terms from the Engineering Information thesaurus and classification scheme were matched against the title and abstract of engineering papers from the Compendex database. Simple string-matching was enhanced by several methods such as term weighting schemes and cut-offs, exclusion of certain terms, and enrichment of the controlled vocabulary with automatically extracted terms. The best results are 76% recall when the controlled vocabulary is enriched with new terms, and 79% precision when certain terms are excluded. Precision of individual classes is up to 98%. These results are comparable to state-of-the-art machine-learning algorithms.

1. Introduction

Subject classification is the organization of objects into topically related groups and the establishment of relationships between them. In automated subject classification (in further text: automated classification) human intellectual processes are replaced by, for example, statistical and computational linguistics techniques. Automated classification of textual documents has been a challenging research issue for several decades. Its relevance is rapidly growing with the advancement of the World Wide Web. Due to the high costs of human-based subject classification and the ever-increasing number of documents, there is a danger that recognized objectives of bibliographic systems (Svenonius 2000, 20-21) would be left behind; automated means could provide a solution to preserve them (30).

Automated classification of text has many different applications (see Sebastiani 2002 and Jain et al. 1999); in this paper, the application context is that of information retrieval. In information retrieval systems, e.g., library catalogues or indexing and abstracting services, improved precision and recall are achieved by controlled vocabularies, such as classification schemes and thesauri. The specific aim of the classification algorithm is to provide a hierarchical browsing interface to a document collection, through a classification scheme. In our opinion, one can distinguish between three major approaches to automated classification: text categorization, document clustering, and document classification (Golub 2006a).

In document clustering, both subject clusters or classes into which documents are classified and, to a limited degree, relationships between them are automatically produced. Labeling the clusters is a major research problem, with relationships between them, such as those of equivalence, related-term and hierarchical relationships, being even more difficult to automatically derive (Svenonius 2000, 168). In addition, “[a]utomatically-derived structures often result in heterogeneous criteria for category membership and can be difficult to understand” (Chen and Dumais 2000, 146). Also, clusters’ labels and relationships between them change as new documents are added to the collection; unstable class names and relationships are user-unfriendly in information retrieval systems, especially when used for subject browsing.

Text categorization (machine learning) is the most widespread approach to automated classification of text. Here characteristics of subject classes, into which documents are to be classified, are learnt from documents with human-assigned classes. However, human-classified documents are often unavailable in many subject areas, for different document types or for different user groups. Judging by the standard Reuters Corpus Volume 1 collection (RCV1) (Lewis et al. 2004), some 8,000 training and testing documents would be needed per class. A related problem is that the algorithm performs well on new documents only if they are similar enough to the training documents. The issue of document collections was also pointed out by Yang (1999), who showed how certain versions of one and the same document collection had a strong impact on performance.

In document classification, matching is conducted between a controlled vocabulary and text of documents to be classified. A major advantage of this approach is that it does not require training documents. If using a well-developed classification scheme, it will also be suitable for subject browsing in information retrieval systems. This would be less the case with automatically-developed classes and structures of document clustering or home-grown directories not created in compliance with professional principles and standards. Apart from improved information retrieval, another motivation to apply controlled vocabularies in automated classification is to reuse the intellectual effort that has gone into creating such a controlled vocabulary (see also Svenonius 1997).

The importance of controlled vocabularies such as thesauri in automated classification has been recognized in recent research. Bang et al. (2006) used a thesaurus to improve performance of a k-NN classifier and managed to improve precision by 14%, without degrading recall. Medelyan and Witten (2006) showed how information from a subject-specific thesaurus improved performance of keyphrase extraction by more than 1.5 times in F1, precision, and recall.

The overall purpose of this experiment is to gain insight into the degree to which a good controlled vocabulary such as the Engineering Information thesaurus and classification scheme (Milstead 1995) (in further text: Ei controlled vocabulary) could be used in automated classification of text, using string-matching. Vocabulary control in thesauri is achieved in several ways (Aitchinson et al. 2000). We believe that the following could be beneficial in the process of automated classification:

– Terms in thesauri are usually noun phrases, which are content words;

– Three main types of relationships are displayed in a thesaurus:
  - equivalence (e.g., synonyms, lexical variants);
  - hierarchical (e.g., generic, whole-part, instance relationships); and,
  - associative (terms that are closely related conceptually but not hierarchically and are not members of an equivalence set).

– In automated classification, equivalence terms could allow for discovering concepts and not just terms expressing the concepts. Hierarchies could provide additional context for determining the correct meaning of a term; and so could associative relationships;

– When a term has more than one meaning in the thesaurus, each meaning is indicated by the addition of scope notes and definitions, providing additional context for automated classification.

In a previous paper Golub (2006c) explored to what degree different types of Ei thesaurus terms and Ei classification captions influence performance of automated classification. In short, the algorithm searched for terms from the Ei controlled vocabulary in engineering documents to be classified (see 2.1). The majority of classes were found when using all the types of terms: preferred terms, their synonyms, related, broader, narrower terms and captions, in combination with a stemmer: recall was 73%. The remaining 27% of classes were not found because the words in the term list designating the classes did not exist in the text of the documents to be classified. No weighting or cut-offs were applied in the experiment. Apart from showing that all those types of terms should be used for a term list in order to achieve best recall, it was also indicated that higher weights could be given to preferred terms (from the thesaurus), captions (from the classification scheme) and synonyms (from the thesaurus), as those three types of terms yielded highest precision.

The aim of this experiment is to improve the classification algorithm based on string-matching between the Ei controlled vocabulary and engineering documents to be classified. We especially wanted to do the following:

– increase levels of F1 and precision, similar to those of recall from the previous experiment (Golub 2006c, 964), by applying different weights and cut-offs; and,

– increase levels of recall to more than those achieved in the previous experiment by adding new terms extracted using natural language processing methods such as multi-word morpho-syntactic analysis and synonym extraction.
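Since the goals above are stated in terms of precision, recall and F1, a minimal sketch of these measures for a single document may be helpful. It uses the standard set-based definitions and is an assumption on our part about how automatically assigned classes would be compared with human-assigned ones; the function name `precision_recall_f1` and the sample class notations are hypothetical.

```python
def precision_recall_f1(assigned, correct):
    """Standard set-based measures for one document (illustrative sketch).

    assigned: classes proposed by the algorithm
    correct:  human-assigned classes
    """
    assigned, correct = set(assigned), set(correct)
    true_positives = len(assigned & correct)
    precision = true_positives / len(assigned) if assigned else 0.0
    recall = true_positives / len(correct) if correct else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: two classes proposed, three human-assigned, one shared.
p, r, f1 = precision_recall_f1({"931.2", "943.2"}, {"931.2", "942.1", "901"})
```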

2. Methodology

2.1 String matching algorithm

This section describes the classification algorithm used in the experiment. It is based on searching for terms from the Ei controlled vocabulary, in the field of engineering, in text of documents to be classified (also in the field of engineering). The Ei controlled vocabulary consists of two parts: a thesaurus of engineering terms, and a hierarchical classification scheme of engineering topics. These two controlled vocabulary types have each traditionally had distinct functions: the thesaurus has been used to describe a document with as many controlled terms as possible, while the classification scheme has been used to group similar documents together for the purpose of shelving them and allowing systematic browsing. The aim of the algorithm was to classify documents into classes of the Ei classification scheme in order to provide a browsing interface to the document collection. A major advantage of Ei is that thesaurus descriptors are mapped to classes of the classification scheme. These mappings have been made manually (intellectually) and are an integral part of the thesaurus. Compared with captions alone, mapped thesaurus terms provide a rich additional vocabulary for every class: instead of having only one term per class (there is only one caption per class), in our experiment there were on average 88 terms per class. (A caption is a class notation expressed in words, e.g., in the Ei classification scheme “Electric and Electronic Instruments” is the caption for class “942.1”.)

Pre-processing steps of Ei included normalizing upper- and lower-case words. Upper-case words were left in upper case in the term list, assuming that they were acronyms; all other words containing at least one lower-case letter were converted into lower case. The first major step in designing the algorithm was to extract terms from Ei into what we call a term list. It contained class captions, thesaurus terms (Term), classes to which the terms and captions map or denote (Class), and weight indicating how appropriate the term is for the class to which it maps or which it designates (Weight). Geographical names, all mapping to class 95, were excluded on the grounds that they are not engineering-specific. The term list was formed as an array of triplets:

Weight: Term (single word, Boolean term or phrase) = Class
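The case rule and the triplet notation above can be illustrated with a short sketch. This is not the authors' implementation; the function names `normalize_word`, `normalize_term` and `format_triplet` are hypothetical, and the sample term "CAD Systems" is invented for illustration.

```python
def normalize_word(word: str) -> str:
    """Lower-case a word if it contains at least one lower-case letter.

    Words without any lower-case letter (assumed to be acronyms) are kept as-is.
    """
    if any(ch.islower() for ch in word):
        return word.lower()
    return word


def normalize_term(term: str) -> str:
    """Apply the case rule to every whitespace-separated word of a term."""
    return " ".join(normalize_word(word) for word in term.split())


def format_triplet(weight, term: str, cls: str) -> str:
    """Render one term-list entry in the 'Weight: Term = Class' notation."""
    return f"{weight}: {term} = {cls}"


entry = format_triplet(1, normalize_term("Angle measurement"), "943.2")
```

Here `entry` holds the string `1: angle measurement = 943.2`, while a term such as "CAD Systems" would become "CAD systems" because its first word has no lower-case letter.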

Single-word terms were terms consisting of one word. Boolean terms were terms consisting of two or more words that must all be present, but in any order and at any distance from each other. Boolean terms in this form were not explicitly part of Ei, but were created for our purpose. They were considered to be those terms which in Ei contained the following strings: and, vs. (short for versus), , (comma), ; (semi-colon, separating different concepts in class captions), ( and ) (parentheses, indicating the context of a homonym), : (colon, indicating a more specific description of the previous term in a class caption), and -- (double dash, indicating a heading--subheading relationship). We replaced these strings with @and, which indicated the Boolean relation in the term. All other terms consisting of two or more words were treated as phrases, i.e., strings that need to be present in the document in the exact same order and form as in the term. Ei comprises a large proportion of composite terms (3,474 of the total of 4,411 distinct terms in our experiment); as such, Ei provides a rich and precise vocabulary with the potential to reduce the risk of false hits.
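The distinction between single-word terms, Boolean terms and phrases can be sketched as follows. This is an illustrative reading of the rules described above, not the authors' code: the name `to_term` and the regular expression are assumptions, the input is assumed to be case-normalized already, and both the double dash and an en dash are accepted as the heading/subheading marker.

```python
import re

# Strings that turn an Ei term into a Boolean term: "and", "vs.", comma,
# semi-colon, parentheses, colon, and the double dash (an en dash is also
# accepted here as an assumption about how the marker may be encoded).
_MARKERS = re.compile(r"\s*(?:\band\b|\bvs\.|--|–|[,;:()])\s*", re.IGNORECASE)


def to_term(raw: str):
    """Return (kind, term), where kind is 'single', 'boolean' or 'phrase'."""
    parts = [part.strip() for part in _MARKERS.split(raw) if part.strip()]
    term = " @and ".join(parts)
    if len(parts) > 1:
        kind = "boolean"   # all parts must occur, in any order
    elif " " in term:
        kind = "phrase"    # must occur in exactly this order and form
    else:
        kind = "single"
    return kind, term


kind, term = to_term("physical properties of gases, liquids and solids")
```

With this input, `term` becomes `physical properties of gases @and liquids @and solids`, which agrees with the form used in the term-list notation.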

The following are two excerpts from the Ei classi-fication scheme and thesaurus, based on which the excerpt from the term list (further below) is created:

From the classification scheme:
931.2 Physical Properties of Gases, Liquids and Solids
…
942.1 Electric and Electronic Instruments
…
943.2 Mechanical Variables Measurements

From the thesaurus:
TM Amperometric sensors
UF Sensors--Amperometric measurements
MC 942.1
…
TM Angle measurement
UF Angular measurement
UF Mechanical variables measurement--Angles
BT Spatial variables measurement
RT Micrometers
MC 943.2
…
TM Anisotropy
NT Magnetic anisotropy
MC 931.2

All the different thesaurus terms as well as captions were added to the term list. Although choosing all types of thesaurus terms might lead to precision losses, we decided to do just that in order to achieve maximum recall, as shown in a previous paper (Golub 2006c). In the thesaurus, TM stands for the preferred term, UF (“Used For”) for an equivalent term, BT for broader term, RT for related term, NT for narrower term; MC represents the main class; sometimes there is also OC, which stands for optional class, valid only in certain cases. Main and optional classes are classes from the Ei classification scheme; the mappings to them have been made manually (intellectually) and are an integral part of the thesaurus. Based on the above excerpts, the following term list would be created:

1: physical properties of gases @and liquids @and solids = 931.2,
1: electric @and electronic instruments = 942.1,
1: mechanical variables measurements = 943.2,
1: amperometric sensors = 942.1,
1: sensors @and amperometric measurements = 942.1,
1: angle measurement = 943.2,
1: angular measurement = 943.2,
1: mechanical variables measurement @and angles = 943.2,
1: spatial variables measurement = 943.2,
1: micrometers = 943.2,
1: anisotropy = 931.2,
1: magnetic anisotropy = 931.2,
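How thesaurus records such as those in the excerpt give rise to term-list entries can be sketched as below. This is a simplified illustration under several assumptions: a record is taken to start at each TM line, every term in a record is mapped to that record's MC class, OC (optional class) lines are ignored, and the replacement of Boolean markers with @and is left out. The name `build_term_list` is hypothetical.

```python
from typing import Iterable, List, Tuple

# Field tags whose values are terms: preferred, used-for, broader, related, narrower.
TERM_TAGS = {"TM", "UF", "BT", "RT", "NT"}


def _normalize(term: str) -> str:
    # Lower-case words containing a lower-case letter; keep assumed acronyms.
    return " ".join(w.lower() if any(c.islower() for c in w) else w
                    for w in term.split())


def build_term_list(lines: Iterable[str], weight: int = 1) -> List[Tuple[int, str, str]]:
    """Turn 'TAG value' thesaurus lines into (weight, term, class) triplets."""
    triplets: List[Tuple[int, str, str]] = []
    terms: List[str] = []
    classes: List[str] = []

    def flush() -> None:
        # Emit every term of the finished record for each of its main classes.
        for cls in classes:
            for term in terms:
                triplets.append((weight, term, cls))
        terms.clear()
        classes.clear()

    for line in lines:
        tag, _, value = line.strip().partition(" ")
        value = value.strip()
        if tag == "TM":          # a new record begins (assumption)
            flush()
        if tag in TERM_TAGS and value:
            terms.append(_normalize(value))
        elif tag == "MC" and value:
            classes.append(value)
    flush()
    return triplets


records = ["TM Anisotropy", "NT Magnetic anisotropy", "MC 931.2",
           "…",
           "TM Angle measurement", "UF Angular measurement", "MC 943.2"]
term_list = build_term_list(records)
```

For these sample lines the result contains four triplets, two for class 931.2 and two for class 943.2, each with the baseline weight of 1.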


The number at the beginning of each triplet is a weight estimating the probability that the term of the triplet designates the class; in this example it is set to 1 as a baseline, and experiments with different weights are discussed later on.

The algorithm searches for strings from a given term list in the document to be classified and if the string (e.g., magnetic anisotropy from the above list) is found, the class(es) assigned to that string in the term list (931.2 in our example) are assigned to the document. One class can be designated by many terms, and each time a term is found, the corresponding weight (1 in our example) is added to a score for the class. The scores for each class are summed up and classes with scores above a certain cut-off (heuristically defined, discussed later on) are selected as the final ones for the document being classified.
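The matching and scoring just described can be sketched as follows. This is a minimal illustration rather than the authors' implementation, and it rests on stated assumptions: the document text is simply lower-cased, terms are matched on word boundaries, a term contributes its weight once per document however often it occurs, and a class is kept only if its score is strictly above the cut-off. The names `term_found`, `score_classes` and `classify` are hypothetical.

```python
import re
from collections import defaultdict


def term_found(term: str, text: str) -> bool:
    """True if a term-list entry matches the text.

    Phrases and single words must occur as written; the parts of a Boolean
    term (separated by @and) must all occur, in any order.
    """
    parts = [part.strip() for part in term.split("@and")]
    return all(re.search(r"\b" + re.escape(part) + r"\b", text) for part in parts)


def score_classes(text: str, term_list):
    """Sum the weights of all matching terms per class."""
    text = text.lower()
    scores = defaultdict(float)
    for weight, term, cls in term_list:
        if term_found(term, text):
            scores[cls] += weight
    return dict(scores)


def classify(text: str, term_list, cutoff: float):
    """Return the classes whose score is strictly above the cut-off."""
    scores = score_classes(text, term_list)
    return sorted(cls for cls, score in scores.items() if score > cutoff)


term_list = [
    (1, "anisotropy", "931.2"),
    (1, "magnetic anisotropy", "931.2"),
    (1, "angle measurement", "943.2"),
    (1, "sensors @and amperometric measurements", "942.1"),
]
text = "Magnetic anisotropy and angle measurement in thin films"
classes = classify(text, term_list, cutoff=1)
```

In this invented example class 931.2 scores 2 (two of its terms match) and class 943.2 scores 1, so a cut-off of 1 keeps only 931.2, while a cut-off of 0 would keep both.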

The Ei classification scheme is hierarchical and consists of six main classes divided into 38 finer classes which are further subdivided into 182 classes. These are subdivided even further, resulting in some 800 individual classes in a five-level hierarchy. For this experiment one of the six main classes was selected, together with all its subclasses: class 9, Engineering, General. The reason for choosing this class was that it covers both natural sciences such as physics and mathematics, and social science fields such as the engineering profession and management. The literature of the latter tends to contain more polysemic words than the former, and as such presents a more complex challenge for automated classification. Within class 9, there are 99 subclasses. However, for seven of them the database from which the document collection was created (see 2.2 Document collection) held fewer than 100 documents. Thus those seven classes were excluded from the experiment altogether. These were: 9 (Engineering, General), 902 (Engineering Graphics; Engineering Standards; Patents), 91 (Engineering Management), 914 (Safety Engineering), 92 (Engineering Mathematics), 93 (Engineering Physics), and 94 (Instruments and Measurement). Of the remaining 92 classes, the distribution at the five different hierarchical levels is as follows: at the fifth hierarchical level 11 classes, at the fourth 67, at the third 14, and at the second hierarchical level 5.

2.2 Document collection

The document collection comprised 35,166 bibliographic records from the Compendex database (Engineering Information 2006). (Compendex being a commercial database, the document collection cannot be made available to others, but the authors are willing to provide documents' identification numbers on request.) The records were selected by simply retrieving the top 100 or more of them upon entering the class notation. A minimum of 100 records per class were downloaded at several different points in time during the years of 2005 and 2006.

Each record had at least one human-assigned class from among the 92 selected classes (see 2.1). A subset of this collection was created to include only those records whose main class (the first one listed in the Ei classification codes field of the record) was in class 9; this subset contained 19237 documents.

From each bibliographic record (in further text: document) the following elements were extracted: an identification number, title, abstract and human-assigned classes (Ei classification codes). Thesaurus descriptors (in Compendex called Ei controlled terms) were not extracted since the purpose of this experiment was to compare automatically assigned classes (and not descriptors) against the human-assigned ones. Below is an example of one document:

Identification number: 03337590709

Title: The concept of relevance in IR

Abstract: This article introduces the concept of relevance as viewed and applied in the context of IR evaluation, by presenting an overview of the multidimensional and dynamic nature of the concept. The literature on relevance reveals how the relevance concept, especially in regard to the multidimensionality of relevance, is many faceted, and does not just refer to the various relevance criteria users may apply in the process of judging relevance of retrieved information objects. From our point of view, the multidimensionality of relevance explains why some will argue that no consensus has been reached on the relevance concept. Thus, the objective of this article is to present an overview of the many different views and ways by which the concept of relevance is used - leading to a consistent and compatible understanding of the concept. In addition, special attention is paid to the type of situational relevance. Many researchers perceive situational relevance as the most realistic type of user relevance, and therefore situational relevance is discussed with reference to its potential dynamic nature, and as a requirement for interactive information retrieval (IIR) evaluation.

Ei classification codes: 903.3 Information Retrieval & Use, 723.5 Computer Applications, 921 Applied Mathematics

Automated classification was based on title and abstract, and automatically assigned classes were compared against human-assigned ones (Ei classification codes in the example). On average, 2.2 classes per document were human-assigned, ranging from 1 to 10.

2.3 Evaluation methodology

2.3.1 Evaluation challenge

According to the ISO standard on methods for examining documents, determining their subjects, and selecting index terms (International Standards Organization 1985), human-based subject indexing is a process involving three steps: 1) determining the subject content of a document, 2) conceptual analysis to decide which aspects of the content should be represented, and 3) translation of those concepts or aspects into a controlled vocabulary. These steps, in particular the second one, are based on a specific library's policy with respect to its document collections and user groups. Thus, when evaluating automatically assigned classes against the human-assigned ones, it is important to know the human-based indexing policies. Unfortunately, we were unable to obtain the indexing policies applied in the Compendex database. What we could derive from the document collection was the number of human-assigned classes per document, which was used in evaluation. However, without a thorough qualitative analysis of automatically assigned classes one cannot be sure whether, for example, the classes assigned by the algorithm, but not human-assigned, are actually wrong, or if they were left out by mistake or because of the indexing policy. A further issue is that we did not know whether the articles had been human-classified based on their full text, their abstracts, or both; we had, however, only abstracts.

Another problem to consider when evaluating automated classification is that some human-assigned subjects are themselves erroneous. When indexing, people make errors such as those related to exhaustivity policy (too many or too few terms are assigned) or specificity of indexing (which usually means that people do not assign the most specific term); they may also omit important terms, or assign an obviously incorrect term (Lancaster 2003, 86-87). In addition, it has been reported that different people, whether users or professional subject indexers, would assign different subject terms or classes to the same document. Studies on inter- and intra-indexer consistency report generally low indexer consistency (Olson and Boll 2001, 99-101). Markey (1984) reviewed 57 indexer consistency studies and reported that consistency levels range from 4% to 84%, with only 18 studies showing over 50% consistency. There are two main factors that seem to affect it:

1. Higher exhaustivity and specificity of subject indexing both lead to lower consistency, i.e., indexers choose the same first term for the major subject of the document, but the consistency decreases as they choose more classes or terms;

2. The bigger the vocabulary, or, the more choices the indexers have, the less likely they are to choose the same classes or terms (Olson and Boll 2001, 99-101).

Both of these factors were present in our experiment:

1. High exhaustivity: on average, 2.2 classes per document had been human-assigned, ranging from 1 to 10;

2. Ei controlled vocabulary is rather big (we chose 92 classes) and deep (five hierarchical levels), allowing many different choices.

An analysis of automatically and human-assigned classes in a previous study showed, among other things, that certain human-assigned classes were actually wrong and that some automatically assigned classes that were not human-assigned were correct (Golub 2006b). An analysis conducted within this study showed the same (see section 4.3).

Today evaluation in automated classification experiments is mostly conducted under controlled conditions, ignoring the above-discussed issues. As Sebastiani (2002, 32) puts it:

The evaluation of document classifiers is typically conducted experimentally, rather than analytically. The reason is that … we would need a formal specification of the problem that the system is trying to solve (e.g., with respect to what correctness and completeness are defined), and the central notion … that of membership of a document in a category is, due to its subjective character, inherently nonformalizable.


Because methodology for such experiments has yet to be developed, and because of limited resources, we followed the common approach to evaluation and started from the assumption that human-assigned classes in the document collection were correct, and compared automatically assigned classes against them.

2.3.2 Evaluation measures

The subset of the Ei controlled vocabulary we used comprised 92 classes that are all topically related to each other. The topical relatedness is expressed in numbers representing the classes: the more initial digits any two classes have in common, the more related they are. For example, 933.1.2 for Crystal Growth is closely related to 933.1 for Crystalline Solids, both of which belong to 933 for Solid State Physics, and finally to 93 for Engineering Physics. Each digit represents one hierarchical level: class 933.1.2 is at the fifth hierarchical level, 933.1 at the fourth etc. Thus, comparing two classes on only their first few digits (later referred to as partial matching) instead of all five also makes sense. Still, unless specifically noted, the evaluation in this experiment was conducted based on all five levels (later referred to as complete matching), i.e., an automatically assigned class was considered correct only if all its digits were the same as a human-assigned class for the same document.
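Complete and partial matching can be sketched as a comparison of notations truncated to a given number of digits. This is an illustration only: the function names `truncate` and `is_match` are hypothetical, and the sketch assumes that every digit of an Ei notation corresponds to one hierarchical level, as described above.

```python
def truncate(notation, level):
    """Keep the first `level` digits of an Ei notation such as '933.1.2'."""
    digits = notation.replace(".", "")[:level]
    head, tail = digits[:3], digits[3:]
    # Ei notations put a dot before the fourth and the fifth digit.
    return ".".join([head, *tail]) if tail else head

def is_match(auto_class, human_class, level=5):
    """Complete matching at level 5; partial matching at lower levels."""
    return truncate(auto_class, level) == truncate(human_class, level)
```

For example, 933.1.2 and 933.1 differ under complete matching but agree when compared at the fourth level or above.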

Evaluation measures used were the standard microaveraged and macroaveraged precision, recall and F1 (Sebastiani 2002, 40-41), for both complete and partial matching:

Precision = correctly automatically assigned classes / all automatically assigned classes

Recall = correctly automatically assigned classes / all human-assigned classes

F1 = 2*Precision*Recall / (Precision + Recall)

In macroaveraging the results are first calculated for each class, and then summed and divided by the number of classes. In microaveraging the results for each part of every equation are summed up first (e.g., all correctly automatically assigned classes are added together, all automatically assigned classes are added together), and then the "aggregated" values are used in one equation. Equations for macroaveraged and microaveraged precision are given below:

Precision_macroaveraged = sum of precision values for each class / number of all classes

Precision_microaveraged = sum of correct automated assignments for each class / sum of all automated assignments for each class

In microaveraging more value is given to classes that have many automatically assigned instances, the majority of which are correct, while in macroaveraging the same weight is given to each class, no matter whether there are many or few automatically assigned instances of it. The differences between macroaveraged and microaveraged values can be large, but whether one is better than the other has not been agreed upon (Sebastiani 2002, 41-42). Thus, in this experiment, it is the mean of macroaveraged and microaveraged F1 that is mostly used.
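The difference between the two averaging modes can be made concrete with a small sketch for precision. The per-class counts below are invented for illustration, the function names are hypothetical, and treating a class with no automated assignments as having precision 0 is an assumption, not something the text specifies.

```python
def f1(precision, recall):
    """F1 = 2*Precision*Recall / (Precision + Recall); 0 when both are 0."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

def macro_precision(counts):
    """counts maps class -> (correct, assigned); mean of per-class precision."""
    values = [c / a if a else 0.0 for c, a in counts.values()]
    return sum(values) / len(values)

def micro_precision(counts):
    """Sum numerators and denominators over all classes, then divide once."""
    correct = sum(c for c, _ in counts.values())
    assigned = sum(a for _, a in counts.values())
    return correct / assigned if assigned else 0.0

# Invented counts: one frequent, accurate class and one rare, inaccurate class.
counts = {"921": (90, 100), "943": (1, 10)}
macro = macro_precision(counts)  # (0.9 + 0.1) / 2
micro = micro_precision(counts)  # 91 / 110
```

In this toy case the microaveraged value is dominated by the frequent class, while the macroaveraged value weights both classes equally.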

In order to examine different aspects of the automated classification performance, several other factors were also taken into consideration:

– Whether the (human-assigned) main class is found;

– The number of documents that got automatically assigned at least one class;

– Whether the class with highest score was the same as the human-assigned main class;

– The distribution of automatically versus human-assigned classes; and,

– The average number of classes assigned to each document.

There were 2.2 human-assigned classes per document, and our aim was to achieve a similar number. In the context of hierarchical browsing based on a classification scheme, having too many classes assigned to a document would place one document in too many different places, which would defeat the original purpose of a classification scheme, that of grouping similar documents together.

3. Improving the algorithm

The major aim of the experiment was to improve the algorithm that was previously experimented with in Golub 2006c, where the highest (microaveraged) recall was 73% when all types of terms were included in the term list. In that experiment neither weights nor cut-offs were experimented with, so all the classes that were found for a document were assigned to it. Here we wanted to achieve precision levels as high as possible by use of term weighting and class cut-offs. In order to also allow for better recall, the basic term list was enriched with new terms extracted from documents in the Compendex database, using multi-word morpho-syntactic analysis and synonym acquisition.

3.1 Term weights

The aim of this part of the experiment was to achieve precision levels as high as possible by use of weighting and cut-offs. As shown in Golub 2006c, all types of terms need to be used in the term list for maximum recall. Thus, all the different types of terms and their mappings to classes were merged into the final term list. This resulted in a number of duplicate cases which were dealt with in the following manner:

– If one term mapping to the same class was a caption, a preferred term, and a synonym at the same time, the highest preference was, based on their performance (see Table 4), given to captions, followed by preferred terms, followed by synonyms, while the others were removed from the list;

– If one term mapping to both optional class (OC) and main class (MC) was a caption, a preferred term, and a synonym at the same time, the highest preference was, based on their performance (see Table 4), given to captions, followed by preferred terms, followed by synonyms, while the others were removed from the list;

– If one thesaurus term of the same type mapped to both optional class (OC) and main class (MC), the one that mapped to the optional class was removed (based on their performance, see Table 2).

The final term list consisted of 8099 terms, out of which 92 were captions (all mapped to main class (MC)), 668 were broader terms, 729 narrower, 1653 preferred, 3224 related, and 1733 were synonym terms. This large number of terms that have been human-mapped to classes indicates the potential usefulness of such a controlled vocabulary in a string-matching algorithm for automated classification.

In order to systematically vary different parameters, the following 14 weighting schemes were devised:

1. w1: All terms in the term list were given the same weight, 1. This term list served as a baseline.

2. w134: Different term types were given different weights: single-word terms 1, phrases 3, and Boolean terms 4. These weights were heuristically derived in a separate experiment (Table 1). Three different term lists were created, containing only single-word terms, only phrases, or only Boolean terms. Weight 1 was assigned to all of them. The documents were classified using these three term lists and their performance was compared for precision.

                    Single  Phrase  Boolean
Avg. precision (%)       8      26       33
Derived weight           1       3        4

Table 1. Single, phrase and Boolean term lists and their performance as a basis for weights.

Avg. precision (%) is mean microaveraged and macroaveraged precision. Derived weights were based on dividing precision values (Avg. precision) by the lowest precision value (in this case 8).

3. w12: Terms mapping to a main class (MC) were given weight 2, and those mapping to an optional class (OC) were given weight 1. These weights were heuristically derived in a separate experiment (Table 2). Two different term lists were created, one containing only those terms that map to a main class, and another one containing only those terms that map to an optional class. Weight 1 was assigned to all of them. The documents were classified using these two term lists and their performance was compared for precision.

                    MC  OC
Avg. precision (%)  13   6
Derived weight       2   1

Table 2. Main code and optional code term lists and their performance as a basis for weights.

Avg. precision (%) is mean microaveraged and macroaveraged precision. Derived weights were based on dividing precision values (Avg. precision) by the lowest precision value (in this case 6).

4. w134_12: This list was a combination of the two preceding lists. Weights for term type 1, 3, and 4 for single, phrase or Boolean term were multiplied by the weight for the type of class to which the term mapped – 1 or 2 for optional or main class.

5. wOrig: The original term weighting scheme, used when the string-matching algorithm based on Ei was first applied (Koch and Ardö 2000). These weights were intuitively derived. They combined the type of term (single-word term, Boolean or phrase) with whether the assigned class was main (MC) or optional (OC).

    Phrase  Boolean  Single
OC       4        2       1
MC       8        3       2

Table 3. Weights in the original algorithm.

6. w1234: With weights for different Ei term types as experimented with in Golub 2006c (captions are from the classification scheme, all others are thesaurus terms).

7. w134_1234: This list was a combination of two previous lists, w134 and w1234. Weights for term type 1, 3, and 4 for single, phrase or Boolean term were multiplied by the weight for the type of Ei term as given in Table 4.

8. w134_12_1234: This list was a combination of two previous lists, w134_12 and w1234. Weights for term type 1, 3, and 4 for single, phrase or Boolean term were multiplied by the weight for the type of class to which the term mapped – 1 or 2 for optional or main class, and by the weight for the type of Ei term as given in Table 4.

9. wTf10: In this list weights were based on the number of words the term consisted of, and of the number of times each of its words occurred in other terms (cf. tf-idf, term frequency – inverse document frequency, Salton and McGill 1983, 63, 205). If f were the frequency with which a word w from the term t occurred in other terms, term t consisting of n words, then the weight of that term was calculated as follows:

$\mathrm{weight}_t = \log(n) \cdot \left( \frac{1}{f_{w_1}} + \frac{1}{f_{w_2}} + \dots + \frac{1}{f_{w_n}} \right)$

The logarithm was applied in order to reduce the impact of parameter n, i.e., to avoid getting overly high weights for terms consisting of several sparse words. In order to get integers as weights, the weights were multiplied by 10, rounded and increased by 1 to avoid zeros.

10. wTf10Boolean: As in wTf10, with all the phrases modified into Boolean terms. This list was created in order to study the influence of phrases and Boolean terms on precision and recall.

11. wTf10Phrases: As in wTf10, with all the Boolean terms modified into phrases. This list was created in order to study the influence of phrases and Boolean terms on precision and recall.

12. wTf10_12: As in wTf10, with those weights multiplied by the weight for the type of class to which the term maps – 1 or 2 for optional or main class. The multiplication was done before the rounding.

13. wTf10_1234: As in wTf10, with those weights multiplied by the weight for the type of relationship (Table 4). The multiplication was done before the rounding.

14. wTf10_12_1234: As in wTf10_12, with those weights multiplied by the weight for the type of relationship (Table 4). The multiplication was done before the rounding.
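The weighting schemes listed above can be illustrated with a minimal sketch covering three ingredients: deriving weights from precision values, multiplying weights for combined schemes such as w134_12, and a wTf10-style weight. Several points are assumptions rather than details given in the text: Python's built-in `round` stands in for the unspecified rounding rule, the natural logarithm stands in for the unspecified base, and a word frequency of 0 is treated as 1 to avoid division by zero. All function names are hypothetical.

```python
import math

def derive_weights(avg_precision):
    """Divide each precision value by the lowest one and round (assumed rule)."""
    lowest = min(avg_precision.values())
    return {name: round(value / lowest) for name, value in avg_precision.items()}

# Precision values taken from Table 1 and Table 2.
TERM_FORM = derive_weights({"single": 8, "phrase": 26, "boolean": 33})
CLASS_TYPE = derive_weights({"MC": 13, "OC": 6})

def combined_weight(form, class_type):
    """w134_12-style weight: term-form weight times class-type weight."""
    return TERM_FORM[form] * CLASS_TYPE[class_type]

def tf10_weight(word_freqs):
    """wTf10-style weight; word_freqs holds f for each word of the term."""
    n = len(word_freqs)
    # Assumptions: natural log, and frequencies below 1 are treated as 1.
    raw = math.log(n) * sum(1 / max(f, 1) for f in word_freqs)
    return round(raw * 10) + 1
```

Under these assumptions the sketch reproduces the derived weights of Tables 1 and 2, and a single-word term always receives weight 1 because log(1) is 0. Applied to the rounded percentages printed in Table 4, the same rounding rule would not reproduce every listed weight (35/10 rounds to 4 here, whereas the table lists 3), so the inputs or rounding actually used presumably differed.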

3.1.1 Stop-word list and stemming

Although the terms and captions in the Ei controlled vocabulary are usually noun phrases which are good content words, they can also contain words which are frequently used in many contexts and as such are not very indicative of any document's topicality (e.g., word general in the Ei class caption Engineering, General). Thus, a stop-word list was used. It contained 429 such words, and was taken from Onix text retrieval toolkit (Onix text retrieval toolkit). For stemming, the Porter's algorithm (Porter 1980) was used. The stop-word list was applied to the term lists, and stemming to the term lists as well as documents.
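The preprocessing described here can be sketched as follows. The stop-word set is a tiny hypothetical stand-in for the 429-word Onix list, and `toy_stem` is a deliberately crude placeholder that only strips a final "s"; it is not Porter's algorithm and is used only so that the example stays self-contained.

```python
# Hypothetical stand-in for the 429-word Onix stop-word list.
STOP_WORDS = {"general", "of", "the", "and"}

def toy_stem(word):
    """Placeholder stemmer (NOT Porter's algorithm): strips a final 's'."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word

def normalize_term(term, stem=toy_stem):
    """Term lists: remove stop words, then stem the remaining words."""
    words = term.lower().replace(",", " ").split()
    return " ".join(stem(w) for w in words if w not in STOP_WORDS)

def normalize_document(text, stem=toy_stem):
    """Documents: stemming only, so stemmed terms can match stemmed text."""
    return " ".join(stem(w) for w in text.lower().split())
```

Any real stemmer could be passed in through the `stem` parameter in place of the placeholder.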

3.2 Cut-offs

In a previous experiment (Golub 2006c) cut-offs were not used; instead, all the classes that were found for a document were assigned to it. In the context of hierarchical browsing based on a classification scheme, having too many classes assigned to a document would place one document in many different places, which would defeat the original purpose of a classification scheme (grouping similar documents together). In the document collection, there were 2.2 human-assigned classes per document, and the aim of automated classification was to achieve a similar number. The effect of several different cut-offs was investigated:

1. All automatically derived classes are assigned as final ones (no cut-off).

2. In order to assign a certain class as final, the score of that class had to reach a minimum percentage of the sum of all the classes' scores. Different values for the minimum percentage were tested: 1, 5, 10, 15 and 20, as well as some others (see section 4 Results).

3. The second type of cut-off in combination with the rule that if there were no class with the required score, the one with the highest score would be assigned.

4. In order to follow the subject classification principle of always assigning the most specific class possible, the principle of score propagation was introduced. The principle was implemented so that the scores for classes at deeper hierarchical levels were a sum of their own score together with scores of classes at upper hierarchical levels if such were assigned.

                    Broader  Captions  Narrower  Preferred  Related  Synonyms
Avg. precision (%)       10        43        25         39       10        35
Derived weight            1         4         2          4        1         3

Table 4. Different types of thesaurus terms and captions and their performance as a basis for weights.
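The first three cut-off variants of section 3.2 can be sketched in a few lines. The function name `apply_cutoff`, its parameters, and the sample scores are hypothetical; the sketch assumes that "a minimum percentage" means a class is kept when its share of the total score is at least the given value.

```python
def apply_cutoff(scores, min_percent=None, ensure_one=False):
    """scores maps class -> score; returns the classes kept as final."""
    if min_percent is None:
        return sorted(scores)  # variant 1: no cut-off
    total = sum(scores.values())
    # Variant 2: keep classes holding at least min_percent of the total score.
    kept = [c for c, s in scores.items()
            if total and 100 * s / total >= min_percent]
    if not kept and ensure_one and scores:
        # Variant 3: fall back to the single highest-scoring class.
        kept = [max(scores, key=scores.get)]
    return sorted(kept)

# Invented scores with shares of 60%, 30% and 10%.
example = {"931.2": 6, "933.1": 3, "921": 1}
```

With these invented scores, a minimum of 20 percent keeps two classes, while a minimum of 70 percent keeps none unless the fallback of the third variant is enabled.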

3.3 Enriching the term list with new terms

In the previous experiment (Golub 2006c), the highest achieved recall was 73% (microaveraged), when all types of terms were included in the term list. In order to further improve recall, the basic term list was enriched with new terms. These terms were extracted from bibliographic records of the Compendex database, using multi-word morpho-syntactic analysis and synonym acquisition, based on the existing preferred and synonymous terms (as they gave best precision results).

Multi-word morpho-syntactic analysis was conducted using a parser FASTER (Jacquemin 1996) which analyses raw technical texts and, based on built-in meta-rules, detects morpho-syntactic variants. The parser exploits morphological (derivational and inflectional) information as given by the database CELEX (Baayen et al. 1995). Morphological analysis was used to identify derivational variants, such as:

effect of gravity: gravitational effect

architectural design: design of the proposed architecture

supersonic flow: subsonic flow

structural analysis: analysis of the structure

Syntactical analysis was used to:

a) insert a word inside a term, such as:

– flow measurement: flow discharge measurements

– distribution of good: distribution of the finished goods

– construction equipment: construction related equipment

– intelligent control: intelligent distributed control

b) permute components of a term, such as:

– control of the inventory: inventory control

– flow control: control of flow

– development of a flexible software: software development

c) add a coordinated component to a term, such as:

– project management: project schedule and management

– control system: control and navigation system

Synonyms were acquired through a rule-based system SynoTerm (Hamon and Nazarenko 2001) which infers synonymy relations between complex terms by employing semantic information extracted from lexical resources. First the documents were preprocessed and tagged with part-of-speech information and lemmatized. Then terms were identified through the YaTeA term extractor (Aubin and Hamon 2006). The semantic information provided by the database WordNet (Fellbaum 1998; WordNet) was used as a bootstrap to acquire synonym terms of the basic terms. The synonymy of the complex candidate terms was assumed to be compositional, i.e., two terms were considered synonymous if their components were identical or synonymous (e.g., building components: construction components, building components: construction elements).
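The compositionality assumption can be illustrated with a toy check. This is not SynoTerm's actual rule set: the word-level synonym pairs below are a hypothetical stand-in for WordNet, and comparing components position by position is a simplifying assumption.

```python
# Hypothetical word-level synonym pairs standing in for WordNet.
SYNONYMS = {
    frozenset({"building", "construction"}),
    frozenset({"components", "elements"}),
}

def words_equivalent(a, b):
    """Two components are equivalent if identical or listed as synonyms."""
    return a == b or frozenset({a, b}) in SYNONYMS

def terms_synonymous(term1, term2):
    """Compositional check: every aligned pair of components is equivalent."""
    w1, w2 = term1.lower().split(), term2.lower().split()
    return len(w1) == len(w2) and all(
        words_equivalent(a, b) for a, b in zip(w1, w2)
    )
```

Under these assumptions, "building components" would be paired with both "construction components" and "construction elements", matching the examples in the text.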

Although verification by a subject expert is desirable for all automatically derived terms, due to limited resources only the extracted synonyms were verified. Checking the synonyms is also most important since computing those leads to a bigger semantic shift than morphological and syntactical operations do. The verification was conducted by a subject expert, a fifth-year student of engineering physics. Suggested synonym terms were displayed in the user interface of SynoTerm. The verification was not strict: derived terms were kept if they were semantically related to the basic term. Thus, hyperonym (generic/specific) or meronym (part/whole) terms were also accepted as synonyms. The expert spent 10 hours validating the derived terms. Of the 292 automatically acquired synonyms, 168 (57.5%) were validated and used in the experiment.

4. Results

4.1 Improving F1 and precision: applying weights and cut-offs

Based on each of the 14 term lists, the classification algorithm was run on the document collection of 35,166 documents (see 2.2). As described earlier (2.3.2), several aspects were evaluated and different evaluation measures were used; thus, for each term list, the following types of results were obtained:

1. min 1: if no classes were assigned because their final scores were below the pre-defined cut-off value (described in 3.2), the class with the highest score was assigned;

2. cut-off: the applied cut-off value;

3. min 1 correct: number of documents that were assigned at least one correct class;

4. min 1 auto: number of documents that were assigned at least one class;

5. avg auto/doc: average number of classes that were assigned per document, based on documents that were assigned at least one class;

6. macroa P: macroaveraged precision;

7. macroa R: macroaveraged recall;

8. macroa F1: macroaveraged F1;

9. microa P: microaveraged precision;

10. microa R: microaveraged recall;

11. microa F1: microaveraged F1;

12. mean F1s: arithmetic mean of macroaveraged and microaveraged F1 values.

The same experiment was run on all the 14 term lists. For each term list, two parameters were varied: 1) whether min 1 was assigned or not; and, 2) the first two cut-off variants from section 3.2.

When looking at mean F1 values, the differences between the term lists are not larger than four percent. Performance of the different lists measured in precision and recall is also similar. The three lists that perform best in terms of mean F1 are w1234, w134_1234 and w134_12_1234, all of them based on weights for different Ei term types. In comparison to the baseline when no weights or cut-offs are used, an improvement of six percent is achieved when using these three term lists. As expected, best precision results are gained when the cut-off is highest, up to 0.37 macroaveraged, and best recall when there is no cut-off, up to 0.54.

When using cut-offs, two sets of experiments were conducted: one with assigning at least the class with highest score, and the other following the threshold calculation only. Because the former results in more documents with assigned correct classes, in further experiments the rule to assign at least the class with highest score is applied.

4.1.1 Stop words: removal and stemming

Next, the influence of stop-word removal and stemming was tested (as described in 3.1.1). For this experiment the three lists that performed best in the previous one were chosen: w1234, w134_1234 and w134_12_1234. Every list was run with stop-word removal, with stemming, and with both, each in combination with different cut-off values: 5, 10 and 15. Improvements of up to two percent are achieved in the majority of cases when using either stemming or stop-word removal or both. There is also a slight increase in the number of correctly found classes without finding more incorrect classes. The differences between the three term lists measured in mean F1 are minor, one or two percent. The best term list is w134_12_1234 used in combination with stemming, stop-word removal and cut-off 10; its best mean F1 is 0.24. For this list more cut-offs were experimented with for better results; the value of 9 proved to perform best, but better than 10 only in the third decimal digit. In the following experiments, unless specifically noted, we used the best-performing w134_12_1234 term list and setting (applying stemming and stop-word removal, cut-off 9).

4.1.2 Individual classes

It was shown that certain classes perform much better than the average. Performance of different classes varies quite a lot. For example, the top three performing classes as measured in precision are different from the top three classes for recall or F1: see Table 5.

4.1.3 Partial matching

As expected, the algorithm performs better when evaluation is based on partial matching between automatically and human-assigned classes (see section 2.3.2). As seen from Table 6, at the second hierarchical level F1 is up to 0.66 and at the third 0.59. At the second hierarchical level the best F1 is achieved by classes Engineering mathematics (represented by notation 92) and General engineering (90). At the third hierarchical level, the class that performs best of all is 921 Applied Mathematics, while the worst one is 943 Mechanical and Miscellaneous Instruments. In conclusion, for the 14 classes at top three hierarchical levels mean F1 is almost twice as good as for the complete matching, which implies that our classification approach would better suit those information systems in which fewer hierarchical levels are needed, like the Intute subject gateway on engineering (Intute Consortium 2006).

The variations in performance between individual classes for both complete and partial matching are quite big, but at this stage it is difficult to say why. The two best-performing classes at the second hierarchical level have by far the smallest number of terms designating them (terms). However, in other cases there does not seem to be any correlation between number of terms and performance, as also discovered in Golub 2006b. Further research is needed to explore what the factors contributing to performance are.

4.1.4 Score propagation

A relevant subject classification principle is to always assign the most specific class available. This principle provided us with a basis for score propagation, in which scores of classes at narrower (more specific) hierarchical levels were increased by scores assigned to their broader classes (later referred to as "propagated down"). In another run, this was slightly varied, so that the broader classes from which scores were propagated to their narrower classes were removed ("propagated down, broader removed").

These types of score propagation were tested on the best performing term list and setting (w134_12_1234 with stemming and stop-words removal). In complete matching, “propagated down” performs best of the two variants. However, it is slightly worse than when not using score propagation at all. In partial matching, both “propagated down” and “propagated down, broader removed” perform slightly better than the original on the first two or three hierarchical levels, and slightly worse on the fourth and fifth ones. These modest results with score propagation can be partially explained by the fact that the term list contained both broader and narrower terms, which was done in order to achieve best recall (Golub 2006c).
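The two propagation variants described above can be illustrated with a minimal Python sketch. It is not the implementation used in the experiment. It assumes that the class notation itself encodes the hierarchy (913.4.3 under 913.4 under 913 under 91 under 9) and that each narrower class receives the original scores of all its broader classes that were assigned to the document; the function names `broader` and `propagate_down` are hypothetical.

```python
def broader(notation):
    """Return the next broader class notation, or None at the top level.

    Assumption: the notation encodes the hierarchy,
    e.g. 913.4.3 -> 913.4 -> 913 -> 91 -> 9.
    """
    if "." in notation:
        return notation.rsplit(".", 1)[0]
    return notation[:-1] if len(notation) > 1 else None


def propagate_down(scores, remove_broader=False):
    """Add the scores of broader classes to their narrower classes.

    scores maps class notation -> score for one document.
    With remove_broader=True, classes that passed their score down
    are dropped ("propagated down, broader removed").
    """
    result = dict(scores)
    donors = set()
    for notation in scores:
        parent = broader(notation)
        while parent is not None:
            if parent in scores:
                # Use the original (unpropagated) score of the broader class.
                result[notation] += scores[parent]
                donors.add(parent)
            parent = broader(parent)
    if remove_broader:
        for donor in donors:
            result.pop(donor, None)
    return result


if __name__ == "__main__":
    doc_scores = {"91": 2.0, "913": 3.0, "913.4": 5.0, "92": 1.0}
    print(propagate_down(doc_scores))
    print(propagate_down(doc_scores, remove_broader=True))
```

In this toy example the most specific class, 913.4, ends up with the highest score, which is the intended effect of favouring the most specific class available.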

4.1.5 Finding main classes

We further analyzed the degree to which the one most important concept of every document is found by the algorithm. For this purpose, a subset of 19,153 documents was used which had the human-assigned main class in class 9 (there is one main class per document). In complete matching, 78% of main classes are found when no cut-offs are applied. When cut-offs are applied, 22% of main classes are found. In partial matching, more main classes are found at the second and third hierarchical levels when using both types of score propagation, up to 59% and 38% respectively. Thus, score propagation could be used in services for which fewer hierarchical levels are needed (e.g., Intute Consortium 2006).

Precision - class: value | Recall - class: value | F1 - class: value
Cellular Manufacturing (913.4.3): 0.98 | Amorphous Solids (933.2): 0.61 | Crystal Growth (933.1.2): 0.45
Electronic Structure of Solids (933.3): 0.97 | Crystal Growth (933.1.2): 0.52 | Amorphous Solids (933.2): 0.44
Information Retrieval and Use (903.3): 0.82 | Manufacturing (913.4): 0.50 | Optical Variables Measurement (941.1): 0.40

Table 5. Top three performing individual classes

Second hierarchical level
class | 90 General | 91 Management | 92 Maths | 93 Physics | 94 Instruments
F1 | 0.65 | 0.5 | 0.66 | 0.51 | 0.49
terms | 679 | 1922 | 848 | 2902 | 1748

Third hierarchical level
class | 901 | 902 | 903 | 911 | 912 | 913 | 914 | 921 | 922 | 931 | 932 | 933 | 941 | 942 | 943 | 944
F1 | 0.4 | 0.3 | 0.5 | 0.3 | 0.4 | 0.3 | 0.3 | 0.6 | 0.3 | 0.44 | 0.3 | 0.5 | 0.3 | 0.4 | 0.2 | 0.4
terms | 275 | 241 | 163 | 237 | 596 | 393 | 696 | 628 | 220 | 1648 | 801 | 453 | 422 | 373 | 604 | 349

Table 6. Results for partial matching at the second and third hierarchical levels, and number of terms per class.


4.1.6 Distribution of classes

Using the same best setting achieved so far, the algorithm was also evaluated for distribution of automatically assigned classes in comparison to that of the human-assigned ones. The comparison was based on how often two classes get assigned together when using the algorithm in comparison to when they get human-assigned. Figure 1 shows the frequency distribution of assigned class pairs. The x-coordinate presents human-assigned class pairs ordered by descending frequency. One point represents one class pair: e.g., the pair of classes 912.2 and 903 occurs most frequently in human-based classification (48 times, as marked on the y-coordinate) and is represented by point 1 on the x-coordinate; point 500 on the x-coordinate represents the 913.5 and 911 pair that occurs 3 times, as marked on the y-coordinate. Thus, the smoothest line (Human-assigned) represents the human-assigned classes. The minimum of 2538 pairs of classes that both the algorithm and people have produced are shown.

A correlation of 0.38 exists between the human-assigned classes and automatically assigned classes (Automated). However, for the 100 most frequent pairs, the correlation drops to 0.21. In the top 10 most frequent pairs of classes, there is no overlap at all. In conclusion, the distribution of human-assigned and automatically assigned classes is more correlated when looking at all pairs of classes occurring together, but less so for more frequently occurring pairs.
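The pair-based comparison can be sketched in Python as follows. This is only an illustration: the text does not state which correlation coefficient was used, so Pearson correlation is an assumption here, as is the restriction to pairs produced by both sources. The names `pair_counts`, `pearson` and `pair_correlation` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def pair_counts(assignments):
    """Count how often two classes are assigned to the same document.

    assignments is a list of class lists, one list per document.
    """
    counts = Counter()
    for classes in assignments:
        for pair in combinations(sorted(set(classes)), 2):
            counts[pair] += 1
    return counts


def pearson(xs, ys):
    """Pearson correlation coefficient (assumed measure); 0.0 if undefined."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denom = sqrt(var_x * var_y)
    return cov / denom if denom else 0.0


def pair_correlation(human, automated, top=None):
    """Correlate pair frequencies over pairs produced by both sources.

    Pairs are ordered by descending human frequency; `top` limits the
    comparison to the most frequent human-assigned pairs.
    """
    h, a = pair_counts(human), pair_counts(automated)
    shared = [pair for pair, _ in h.most_common() if pair in a]
    if top is not None:
        shared = shared[:top]
    return pearson([h[p] for p in shared], [a[p] for p in shared])
```

Calling `pair_correlation(human, automated, top=100)` would correspond to restricting the comparison to the 100 most frequent human-assigned pairs.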

4.1.7 Implications for application

Since automated classification algorithms can have a number of different applications, it is important to emphasize that an algorithm can be adjusted for the specific application need. Here those applications are pointed out in which our algorithm was shown to yield promising results in terms of F1 and precision.

1. In all applications, best precision and F1 are achieved when applying the w134_12_1234 term list, together with stemming and stop-words removal.

2. In information systems such as Intute (Intute Consortium 2006), several broader hierarchical levels are used. For the purpose of such an application, the classification algorithm should be implemented so that only classes from the top three hierarchical levels are used, but so that scores from classes at lower hierarchical levels are added to the final sum of their broader classes.

3. In applications where classes at all hierarchical levels are needed, such as other hierarchical browsing systems, searching or machine-aided indexing software, a cut-off level of nine and the principle of assigning at least the class with highest score should be implemented. In addition, the choice can be made to assign only the class with highest score, i.e., the class with highest probability that it is correct, as it is done in the Thunderstone’s web site catalog (Thunderstone 2005). Alternatively, the classes can be ranked in descending order based on the score indicating the probability that the document is dealing with the topic designated by the class.

Figure 1. Frequency distribution of assigned pairs of classes (2538 pairs).
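Items 2 and 3 of the list above can be sketched in Python as follows. This is a hedged illustration rather than the authors' code: it assumes that the hierarchical level of a class equals the number of digits in its notation (913.4.3 is at level five and rolls up to 913), and it treats the cut-off as a minimum score, which is an assumption because the cut-off is defined elsewhere in the paper. The names `to_level`, `roll_up` and `select` are hypothetical.

```python
from collections import defaultdict


def to_level(notation, level=3):
    """Map a class notation to its broader class at the given level.

    Assumption: one digit per hierarchical level, dots ignored,
    so 913.4.3 -> 913 and 92 stays 92.
    """
    return notation.replace(".", "")[:level]


def roll_up(scores, level=3):
    """Add scores of lower-level classes to the sum of their broader class."""
    rolled = defaultdict(float)
    for notation, score in scores.items():
        rolled[to_level(notation, level)] += score
    return dict(rolled)


def select(scores, cutoff):
    """Rank classes by descending score and keep those at or above the cut-off.

    If no class reaches the cut-off, the single highest-scoring class is
    still assigned. Treating the cut-off as a minimum score is an assumption.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if not ranked:
        return []
    kept = [notation for notation, score in ranked if score >= cutoff]
    return kept if kept else [ranked[0][0]]
```

Assigning only the single best class, as in a directory-style catalog, corresponds to taking the first element of the ranked result.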

4.2 Enhancing the term list with new terms

In the previous experiment (Golub 2006c), the highest achieved recall was 73% (microaveraged), when all types of terms were included in the term list. In order to further improve recall, the basic term list was enriched with new terms. These terms were extracted from bibliographic records of the Compendex database, using multi-word morpho-syntactic analysis and synonym acquisition, based on the existing preferred and synonymous terms (as they gave best precision results). The number of terms added to the term list was as follows:

1. Based on multi-word morpho-syntactic analysis:

– derivation: 705, out of which 93 adjective to noun, 78 noun to adjective, and 534 noun to verb;
– permutation: 1373;
– coordination: 483;
– insertion: 742; and
– preposition change: 69.

2. Based on semantic variation (synonymy): 292 automatically extracted, out of which 168 were verified as correct by the subject expert.

In order to examine the influence of different types of extracted terms, nine different term lists were created and the classification was based on each of them. It was shown that the number of terms is not proportional to performance, e.g., permutation-based extraction comprises 1373 terms, and, when stemming is applied, has performance as measured in mean F1 of 0.02, whereas coordination comprises 403 terms, with performance of 0.07. These two cases can be explained by the fact that permutation also implies variation based on insertion and preposition change (e.g., engineering for commercial window systems: system engineering) which leads to bigger semantic shift than the identification of term variant based on the coordination. By combining all the extracted terms into one term list, the mean F1 is 0.14 when stemming is applied, and microaveraged recall is 0.11, which would imply that enriching the original Ei-based term list with these newly extracted terms should improve recall. In comparison to results gained in Golub 2006c, where microaveraged recall with stemming is 0.73, here the best recall, also microaveraged and with stemming, is 0.76.

The next step was to assign appropriate weights to the newly extracted terms (Table 7). We used the w134_12_1234 term list, earlier shown to perform best. The result as measured in mean F1 is the same as in the original, 0.24 (cut-off 10, stemming applied but not stop-word removal). The difference is that recall and the number of correctly assigned classes increase by 3%, but precision decreases. Thus, depending on the final application, terms extracted in this way could be added to the term list or not.
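For readers less familiar with the measures, the following Python sketch shows the standard definitions of macroaveraged and microaveraged precision, recall and F1. Two points are assumptions rather than statements from the text: macroaveraged F1 is computed as the mean of per-class F1 values, and mean F1 is taken as the arithmetic mean of the macroaveraged and microaveraged F1, which is consistent with the figures reported in Table 7. The function names are hypothetical.

```python
def prf(tp, fp, fn):
    """Precision, recall and F1 from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def averaged_scores(per_class):
    """per_class maps class notation -> (tp, fp, fn).

    Macroaveraging averages the per-class values, so every class counts
    equally; microaveraging pools the counts, so frequent classes dominate.
    """
    per = [prf(*counts) for counts in per_class.values()]
    macro = tuple(sum(values[i] for values in per) / len(per) for i in range(3))
    tp = sum(counts[0] for counts in per_class.values())
    fp = sum(counts[1] for counts in per_class.values())
    fn = sum(counts[2] for counts in per_class.values())
    micro = prf(tp, fp, fn)
    # Assumption: mean F1 is the arithmetic mean of macro and micro F1.
    return {"macro": macro, "micro": micro, "mean_f1": (macro[2] + micro[2]) / 2}
```

Because the two averages weight classes differently, adding terms that mostly help frequent classes can raise microaveraged recall while leaving the macroaveraged figures nearly unchanged.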

4.2.1 Implications for application

Enriching the term list with terms extracted using multi-word morpho-syntactic analysis and synonym acquisition slightly improves recall. At the same time, precision decreases. Thus, enhancing the term list in this way would be appropriate for applications such as focused crawling, when the purpose is to crawl as many documents as possible and precision is less important. To maximize recall, no weights or cut-offs need be applied.

all combined
stemming | no | yes | no | yes
stop-words out | no | no | yes | yes
min 1 correct | 24479 | 29639 | 26039 | 30466
min 1 auto | 34086 | 34966 | 34425 | 34987
avg auto/doc | 16.79 | 28.61 | 18.06 | 29.68
macroa P | 0.11 | 0.09 | 0.11 | 0.09
macroa R | 0.54 | 0.71 | 0.55 | 0.72
macroa F1 | 0.19 | 0.16 | 0.18 | 0.15
microa P | 0.07 | 0.06 | 0.07 | 0.06
microa R | 0.55 | 0.73 | 0.59 | 0.76
microa F1 | 0.13 | 0.11 | 0.13 | 0.10
mean F1 | 0.16 | 0.13 | 0.16 | 0.13

Table 7. Performance of the w1 term list enriched with all automatically extracted terms.

4.3 Term analysis and shortened term lists

In the original term list there were 4,411 distinct terms. In the document collection, 53% of them were found. The average length of the terms found was between one and two words, while the longer ones were less frequently found. Of the terms found in the collection, 16% always led to correct classes, while 43% always led to incorrect classes. For a sample of documents containing terms that were shown to always yield incorrect results, we asked a subject expert to confirm whether, in his opinion, the documents were in the wrong class. For the 10 always-incorrect terms with the most frequent occurrences, the subject expert looked at 30 randomly selected abstracts containing those terms. Based on his judgments, it was shown that 24 out of those 30 documents were indeed incorrectly classified, but there were also 6 which he deemed to be correct. This is another indication of how problematic it is to evaluate subject classification in general, and automated subject classification in particular. Perhaps one way would be to have a number of subject experts agree on all the possible subjects and classes for every document in a test collection for automated classification; another way could be to evaluate automated classification in context, by end-users.

Based on the term analysis, three new term lists were extracted from the original one, and tested for performance:

1. Containing only those terms that found classes which were always correct (1,308 terms). When cut-off is between 5 and 10, macroaveraged precision reaches 0.89, and microaveraged 0.99, when neither stemming nor stop-words removal are applied. Stemming does not really improve general performance because recall increases only slightly, by 0.03, while precision decreases by 0.2. However, when using only those 1,308 terms, only 5% of documents are classified. The best mean F1, 0.15, is achieved when stemming and the stop-word removal are used.

2. Containing those terms that found classes which were correct in more instances than they were incorrect (1,924 terms). This list yields the best mean F1, 0.38. This value is achieved when stemming is used but no stop-words are removed. In this setting 65% of documents are classified, with an average of 1.7 classes per document. When stemming is not used, precision levels are 0.75 for microaveraged, and 0.79 for macroaveraged.

3. Containing all terms excluding those that found classes which were always incorrect (4,751 terms). The mean F1 is 0.25, when cut-off is 10 and both stop-words removal and stemming are used. The slight improvement in comparison to the original list is due to an increase in precision.
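The three shortened term lists can be derived mechanically once each term has a count of the correct and incorrect classes it led to. The Python sketch below assumes such per-term counts are available as `(correct, incorrect)` tuples, with `(0, 0)` for terms never found in the collection; this data layout and the function name `shortened_term_lists` are hypothetical, and the example terms are invented for illustration.

```python
def shortened_term_lists(term_stats):
    """Split a term list according to how reliably each term finds classes.

    term_stats maps term -> (correct, incorrect): the number of times the
    term led to a correct or an incorrect class (assumed data layout).
    """
    always_correct = [
        term for term, (correct, incorrect) in term_stats.items()
        if correct > 0 and incorrect == 0
    ]
    mostly_correct = [
        term for term, (correct, incorrect) in term_stats.items()
        if correct > incorrect
    ]
    not_always_incorrect = [
        term for term, (correct, incorrect) in term_stats.items()
        if not (correct == 0 and incorrect > 0)
    ]
    return always_correct, mostly_correct, not_always_incorrect


if __name__ == "__main__":
    stats = {
        "crystal growth": (5, 0),   # always correct
        "manufacturing": (3, 1),    # correct more often than not
        "system": (0, 7),           # always incorrect
        "solid": (2, 2),            # as often wrong as right
        "unused term": (0, 0),      # never found in the collection
    }
    for name, terms in zip(("list 1", "list 2", "list 3"), shortened_term_lists(stats)):
        print(name, terms)
```

Under this reading, list 1 is a subset of list 2, and list 3 keeps terms that were never found in the collection, since they were not shown to be always incorrect.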

4.3.1 Implications for application

With the same w134_12_1234 term list, precision and F1 are improved beyond what weights and cut-offs alone achieve by excluding terms that always yield incorrect classes. This setting improves precision without degrading recall, so it should be used in applications when either, or both, are important. The best F1 throughout the whole experiment is achieved when terms that yield incorrect classes in the majority of cases are excluded.

5. Conclusion

The study showed that the string-matching algorithm could be enhanced in a number of ways:

1. Weights: adding different weights to the term list based on whether a term is single, phrase or Boolean, which type of class it maps to, and Ei term type, improves precision and relevance order of assigned classes, the latter being important for browsing;

2. Cut-offs: selecting as final classes those above a certain cut-off level improves precision and F1;

3. Enhancing the term list with new terms based on morpho-syntactic analysis and synonyms acquisition improves recall;

4. Excluding terms that in most cases gave wrong classes yields best performance in terms of F1, where the improvement is due to increased precision levels.

The best achieved recall is 76%, when the basic term list is enriched with new terms, and precision 79%, when only those terms previously shown to yield correct classes in the majority of documents are used. Performance of individual classes, measured in precision, is up to 98%. At third and second hierarchical levels mean F1 reaches up to 60%.

These results are comparable to machine-learning algorithms (see, for example, Sebastiani 2002), which require training documents and are collection-dependent. Another benefit of classifying documents into classes of well-developed classification schemes is that they are suitable for subject browsing, unlike automatically-developed controlled vocabularies or home-grown directories often used in document clustering and text categorization (Golub 2006a).

The experiment has also shown that different versions of the algorithm could be implemented so that it best suits the application of the automatically classified document collection. If the application requires high recall, such as, for example, in focused crawling, cut-offs would not be used. Or, if one provides directory-style browsing interface to a collection of automatically classified web pages, web pages could be ranked by relevance based on weights. In such a directory, one might want to limit the number of web pages per class, e.g., assign only the class with highest probability that it is correct, as it is done in the Thunderstone’s web site catalog (Thunderstone 2005).

References

Aitchison, Jean, Gilchrist, Alan, and Bawden, David. 2000. Thesaurus construction and use: a practical manual, 4th ed., Aslib, London.

Aubin, Sophie, and Hamon, Thierry. 2006. Improving term extraction with terminological resources. Proceedings of the 5th International Conference on NLP, FinTAL, pp. 380-387.

Baayen, R.H., Piepenbrock, R., and Gulikers, L. 1995. The CELEX lexical database, release 2, Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA. [CD-ROM].

Bang, Sun Lee, Yang, Jae Dong, and Yang, Hyung Jeong. 2006. Hierarchical document categorization with k-NN and concept-based thesauri. Information processing and management 42: 387-406.

Chen, Hao, and Dumais, Susan T. 2000. Bringing order to the web: automatically categorizing search results. In T. Turner and G. Szwillus, eds., CHI '00: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. New York: ACM Press, 145-152.

Engineering Information. 2006. Compendex, Engineering Information, Elsevier, available at: http://www.ei.org/databases/compendex.html (accessed 30 June 2006).

Fellbaum, Christiane. 1998. WordNet: an electronic lexical database, MIT Press, Cambridge, MA.

Golub, Koraljka. 2006a. Automated subject classification of textual web documents. Journal of documentation 62: 350-71.

Golub, Koraljka. 2006b. Automated subject classification of textual web pages, based on a controlled vocabulary: challenges and recommendations. New review of hypermedia and multimedia 12: 11-27.

Golub, Koraljka. 2006c. The role of different thesauri terms in automated subject classification of text. In T. Nishida, ed., 2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2006 Main Conference Proceedings) (WI '06): Proceedings: 18-22 December 2006, Hong Kong, China. Los Alamitos, Calif.: IEEE Computer Society, 961-65.

Hamon, Thierry, and Nazarenko, Adeline. 2001. Detection of synonymy links between terms: experiment and results. Recent advances in computational terminology, ed. Didier Bourigault et al. Amsterdam: John Benjamins, pp. 185-208.

International Standards Organization. 1985. Documentation–methods for examining documents, determining their subjects, and selecting index terms: ISO 5963, Geneva, ISO.

Intute Consortium. 2006. Intute: science, engineering and technology – engineering, available at: http://www.intute.ac.uk/sciences/engineering/ (accessed 30 August 2007).

Jacquemin, Christian. 1996. A symbolic and surgical acquisition of terms through variation. Connectionist, statistical and symbolic approaches to learning for natural language processing, ed. Stefan Wermter et al. Berlin: Springer, pp. 425-38.

Jain, Anil K., Murty, M. Narasimha, and Flynn, Patrick J. 1999. Data clustering: a review. ACM Computing Surveys 31: 264-323.

Koch, Traugott, and Ardö, Anders. 2000. Automatic classification (DESIRE II D3.6a, Overview of results), available at: http://www.it.lth.se/knowlib/DESIRE36a-WP2.html (accessed 29 November 2007).

Lancaster, F.W. 2003. Indexing and abstracting in theory and practice, 3rd ed., Facet, London.

Lewis, David D., Yang, Yiming, Rose, Tony G., and Li, Fan. 2004. RCV1: a new benchmark collection for text categorization research. The journal of machine learning research 5: 361-97.

Markey, Karen. 1984. Interindexer consistency tests: a literature review and report of a test of consistency in indexing visual materials. Library & information science research 6: 155-77.

Medelyan, Olena, and Witten, Ian H. 2006. Thesaurus based automatic keyphrase indexing. In Gary Marchionini, Michael L. Nelson, and Cathy Marshall, eds., 6th ACM/IEEE-CS Joint Conference on Digital Libraries 2006: Opening Information Horizons. New York: ACM Press, 296-97.

Milstead, Jessica L., ed. 1995. Ei thesaurus, 2nd ed. Hoboken, NJ: Engineering Information Inc.

Olson, Hope A., and Boll, John J. 2001. Subject analysis in online catalogs, 2nd ed., Libraries Unlimited, Englewood, CO.

Onix text retrieval toolkit: Stop word list 1, available at: http://www.lextek.com/manuals/onix/stopwords1.html (accessed 29 November 2007).

Porter, Martin F. 1980. An algorithm for suffix stripping. Program 14 no. 3: 130-37.

Salton, Gerard, and McGill, Michael J. 1983. Introduction to modern information retrieval, McGraw-Hill, Auckland.

Sebastiani, Fabrizio. 2002. Machine learning in automated text categorization. ACM computing surveys 34: 1-47.

Svenonius, Elaine. 1997. Definitional approaches in the design of classification and thesauri and their implications for retrieval and for automatic classification. In I.C. McIlwaine, ed., Knowledge organization for information retrieval: Proceedings of the Sixth International Study Conference on Classification Research held at University College London, 16-18 June 1997. The Hague: International Federation for Information Documentation, 12-16.

Svenonius, Elaine. 2000. The intellectual foundations of information organization, MIT Press, Cambridge, MA.

Thunderstone. 2005. Thunderstone’s Web Site Catalog, available at: http://search.thunderstone.com/texis/websearch (accessed 29 November 2007).

WordNet, “WordNet Search”, available at: http://wordnet.princeton.edu/perl/webwn (accessed 29 November 2007).

Yang, Yiming. 1999. An evaluation of statistical approaches to text categorization. Journal of information retrieval 1: 67-88.


Knowl. Org. 34(2007)No.4 Book Reviews


Book Reviews

Edited by Clément Arsenault, Book Review Editor

Murtha Baca, Patricia Harping, Elisa Lanzi, Linda McCrea, and Ann Whiteside (eds.). Cataloging Cultural Objects: A Guide to Describing Cultural Work and Their Images. Chicago: American Library Association, 2006. 396 p. ISBN 978-0-8389-3564-4 (pbk.)

At a time when cataloguing code revision is continuing apace with the consolidation of the International Standard Bibliographic Description (ISBD), the drafting of RDA: Resource Description and Access, and the development of common principles for an international cataloguing code (International Meeting of Experts on an International Cataloguing Code [IME ICC]), the publication of a guide for cataloguing cultural objects is timely and purposeful. Compiling this data content standard on behalf of the Visual Resources Association, the five editors, with oversight from an advisory board, have divided the guide into three parts. Following a brief introduction outlining the purpose, intended audience, and scope and methodology for the publication, Part One, General Guidelines, explains both what the Cataloging Cultural Objects (CCO) guide is, “a broad document that includes rules for formatting data, suggestions for required information, controlled vocabulary requirements, and display issues” (p. 1), and what it is not: “not a metadata element set per se” (p. 1). Part Two, Elements, is further divided into nine chapters dealing with one or more metadata elements, and describing the relationships between and among each element. Part Three, Authorities, discusses what elements to include in building authority records. A Selected Bibliography, Glossary, and Index, respectively, round out the guide.

As the editors note in their introduction, “Standards that guide data structure, data values, and data content form the basis for a set of tools that can lead to good descriptive cataloging, consistent documentation, shared records, and increased end-user access” (p. xi). The VRA Core Categories, for example, represent a set of metadata elements expressed within an XML structure (data structure). Likewise, the Art & Architecture Thesaurus contains sets of terms and relationships, or defined data values. While much effort has been expended on developing both data structures and values, the editors argue, the third leg of the stool, data content, has received less attention. Unlike the library community with its Anglo-American Cataloging Rules [sic; though RDA is referenced in the Selected Bibliography], or its archival equivalent, Describing Archives: A Content Standard (DACS), those in the domain of cultural heritage responsible for describing and documenting works of art, architecture, cultural artifacts, and their respective images, have not had the benefit of such data content standards. CCO is intended to address (or redress) that gap, emphasizing the exercise of good judgment and cataloguer discretion over the application of “rigid rules” [p. xii], and building on existing standards.

Part One, General Guidelines, sets the foundation. Beginning with the question, “What are you Cataloguing?”, this 41-page section articulates the difference between a work and an image, and continues with what institutions need to consider in determining what kinds of, and how much information to include in, a minimal description for a Work Record (elements subsequently covered in Chapters 1–8 of Part 2), an Image Record (dealt with in Chapter 9 of Part 2), records for a group, collection, or series of cultural objects, and related works, or, “those having an important conceptual relationship to each other” (p. 13). Less familiar, perhaps, to the eyes of those responsible for bibliographic or archival description, is the inclusion of recommendations concerning database design, field structures, database construction, and the purpose of a database: as a cataloguing tool? collection management system? digital asset management system? online catalogue? This latter part, while a useful inclusion, seems somewhat contradictory within a set of guidelines that profess to be “system independent”. Part One concludes with definitions of, and guidelines for, creating and maintaining controlled vocabularies and authority files, respectively. Examples of work records (Figures 1–7), and a work record with two related image records (Figure 8) provide concrete, visual samples of the issues covered throughout the General Guidelines, and foreshadow the part to follow.

Part Two, Elements, provides (1) definition, context, and terminology, (2) cataloguing rules, and (3) guidelines on presentation of data for each of eight broad metadata element types, grouped by purpose, and associated with a work record (e.g., object naming [work type/title]; creator information [creator/creator role]; stylistic, cultural, and chronological information [style/culture/date]; subject; etc.). The ninth chapter, view information elements, addresses how to describe aspects of a work as captured in its surrogate, an image of the work. Each chapter within Part Two concludes with illustrated examples, again, to reinforce concepts and applications discussed relative to a particular element set. Those expecting the inclusion of administrative, structural, and/or technical metadata for creating and managing digital repositories will be disappointed. The list of elements in Part Two is explicitly restricted to descriptive metadata.

Part Three, Authorities, follows a similar format as Part Two, including discussion and terminology, editorial rules, and presentation of data for (1) personal and corporate name authority, (2) geographic place authority, (3) concept authority, and (4) subject authority. As with Part Two, examples liberally populate the text of each chapter, with specific illustrations of the four types of authority record coming at the end of respective chapters 1–4.

The consistent formatting of chapters within the text, overall, ensures that prospective cataloguers understand the meaning, context, terminology, and application of guidelines for descriptive metadata and authority control. Thus, in its own internal structure, CCO remains true to its stated objective of promoting consistency of interpretation and implementation. Bolded recommendations throughout Part One are, in some instances, broad level: “CCO recommends good and versatile database design and consistent cataloging rules” (p. 25); and in others, appropriately specific: “Because of the complexity of cultural information and the importance of Authority Records, CCO recommends using a relational database” (p. 20). Regardless of their degree of specificity, recommendations provide clear, logical, and principles-based guideposts for both institutions and individual cataloguers, alike. They also provide context for the series of “rules” which follow in Parts Two and Three. The rules, while named as such, and articulated in a prescriptive tone, are discussed and presented throughout in a spirit of “recommended best practice”. This is to allow for individual institutions to “make and enforce” local rules that accommodate their requirements and those of their end-users most effectively and efficiently (p. 2).

This manual will serve as an important tool for museum documentation specialists, visual resources curators, archivists, librarians, or others responsible for providing descriptive metadata and authority control for a variety of cultural objects, including architecture, paintings, sculpture, prints, manuscripts, photographs and other visual media, performance art, archeological sites and artifacts, and different functional objects associated with material culture. While its coverage is impressively wide-ranging, CCO is not intended for natural history or scientific collections.

Cataloging Cultural Objects, in linking the work of cataloguers from different institutional contexts, provides a timely and useful content standard for cross-domain application. It also serves as an effective teaching tool for those who recognize and value, less the location (museum, archive, library) where descriptive metadata are to be assigned, and more the purpose for which they are intended, namely to facilitate access to, and sharing of both records and their corresponding objects. While this reviewer would have appreciated more than a “Selected Bibliography”, and an expanded Glossary (e.g., where is a definition of “format controlled” among “controlled fields”, “controlled list”, and “controlled vocabulary”?), the inclusion of additional specialized sources for cataloguing museum collections, and within-chapter references to standard tools for particular metadata elements, are especially foresighted, and commendable. There is mention throughout the text of a “CCO website”. A URL or other link eluded this reviewer, though a Google™ search led to http://vraweb.org/ccoweb/cco/index.html [accessed September 28, 2007].

Overall, Cataloging Cultural Objects with its attending guidelines for descriptive metadata and authority control for “one-of-a-kind cultural objects” should merit a place among the “well-established” data content standards of the library and archival communities that CCO references with obvious regard.

Lynne C. Howarth, Professor, Faculty of Information Studies, University of Toronto, 140 St. George Street, Toronto, Ontario M5S 3G6, Canada. E-mail: [email protected]


Patrick Lambe. Organising Knowledge: Taxonomies, Knowledge and Organisational Effectiveness. Oxford: Chandos, 2007. xix, 277 p. ISBN 978-1-84334-228-1 (hbk.); 978-1-84334-227-4 (pbk.)

The knowledge and information world we live in can rarely be described from a single coherent and predictable point of view. In the global economy and mass society, an explosion of knowledge sources, different paradigms and information-seeking behaviors, fruition contexts and access devices are overloading our existence with an incredible amount of signals and stimulations, all competing for our limited attention. Taxonomies are often cited as tools to cope with, organize and make sense of this complex and ambiguous environment.

Leveraging an extensive review of literature from a variety of disciplines, as well as a wide range of relevant real-life case studies, Organising Knowledge by Patrick Lambe has the great merit of liberating taxonomies from their recurring obscure and limiting definition, making them living, evolving and working tools to manage knowledge within organizations. Primarily written for knowledge and information managers, this book can help a much larger audience of practitioners and students who wish to design, develop and maintain taxonomies for large-scale coordination and organizational effectiveness both within and across societies. Patrick Lambe opens our eyes to the fact that, far from being just a synonym for pure hierarchical trees to improve navigation, findability and information retrieval, taxonomies take multiple forms (from lists, to trees, facets and system maps) and play different roles, ranging from basic information organization to more subtle tasks, such as establishing common ground, overcoming boundaries, discovering new opportunities and helping in sense-making.

Over the course of the book, a number of misconceptions haunting taxonomy work are addressed and carefully dispelled. Taxonomy development is often thought to be an abstract task of analyzing and classifying entities, performed in complete isolation. On the contrary, taxonomies are to a large extent products of users’ perceptions and worldviews, strongly influenced by the pre-existing information infrastructure. They can also be dangerous tools, having the potential to reveal and clarify but also to exclude and conceal critical details that can have a large impact on basic business activities such as managing risk, controlling costs, understanding customers and supporting innovation.

If the first part of the book introduces concepts, provides definitions and challenges wrong assumptions about taxonomies and the work of taxonomy-building, the second takes us step by step through a typical project. From here on, insights become part of practicable frameworks that form the basis of a concrete information-management strategy and a process flexible enough to be used in very different organizational environments and scenarios. Starting from the definition of stakeholders, purpose and scope and ending with deployment, validation and governance, a taxonomy-building project is realistically presented as an iterative and fascinating journey over competing needs, changing goals, mixed cues and technical and cognitive constraints.

Beyond introducing fundamental guiding principles and addressing relevant implementation challenges, Organising Knowledge provides a large dose of political and pragmatic advice to make your efforts useful in contributing to the overall knowledge and information infrastructure. Taxonomies, much like an architect’s blueprints, represent only theory until they are implemented in practice, involving real people and real content. As Lambe explains, this step requires crossing over to the other side of the barricade, putting oneself in the user’s shoes and constructing an information neighborhood, designing and populating a metadata framework, solving usability issues and successfully dealing with records management and information architecture concerns.

While every paragraph of the book is packed with valuable advice and real-life experience, I consider the last chapter to be the most intriguing and ground-breaking one. It is only here that taxonomists meet folksonomists and ontologists in a fundamental attempt to write a new page on the relative position of old and emerging classification techniques. In a well-balanced and sober analysis that forgoes excessive enthusiasm in favor of more appropriate considerations about content scale, domain maturity, precision and cost, knowledge infrastructure tools are all arrayed from inexpensive and expressive folksonomies on one side, to the smart, formal, machine-readable but expensive world of ontologies on the other. In light of so many different tools, information infrastructure clearly appears more as a complex dynamic ecosystem than a static, overly designed environment. Such a variety of tasks, perspectives, work activities and paradigms calls for a resilient, adaptive and flexible knowledge environment with a minimum of standardization and uniformity. The right mix of tools and approaches can only be determined case by case, by carefully considering the particular objectives and requirements of the organization while aiming to maximize its overall performance and effectiveness.

Starting from the history of taxonomy-building and ending with the emerging trends in Web technologies, artificial intelligence and social computing, Organising Knowledge is thus both a guiding tool and inspirational reading, not only about taxonomies, but also about effectiveness, collaboration and finding middle ground: exactly the right principles to make your intranet, portal or document management tool a rich, evolving and long-lasting ecosystem.

Emanuele Quintarelli E-mail: [email protected]

ISKO News

Edited by Hanne Albrechtsen, Communications Editor

ISKO’s Nordic chapter was founded on November 8th, 2007 in Copenhagen and covers Sweden, Denmark, Norway, Finland, Iceland and the Faroe Islands. We will eventually approach the Baltic countries to determine their interest in the project. The first board constituted itself with Mikkel Christoffersen (DK) as chairman and, as board members, Professor Birger Hjørland (DK), Hanne Albrechtsen (DK) and Per Nyström (SWE). So far 22 people have expressed interest in the chapter, but until we see who pays the membership fee for 2008, we will not know how many members we will have.

We will establish a web presence soon, and the plan is to hold a conference in odd-numbered years, the first one in 2009 in Sweden. The theme of the first conference will probably be whether there is a Nordic school of thought in knowledge organisation. We hope the chapter will bring together the Nordic researchers in KO and facilitate more communication and exchange of ideas, as well as cooperation and general awareness of ongoing research and of those involved in it.

Mikkel Christoffersen, PhD fellow, Danmarks Biblioteksskole / Royal School of Library and Information Science

Knowledge Organization Literature

Ia C. McIlwaine: Literature Editor

Assisted by: Marie Baliková, Victoria Frâncu, Claudio Gnoli, Ágnes Barátné Hajdu, John McIlwaine, Gerhard Riesthuis, Aida Slavic, Rosa San Segundo, Alenka Sauperl, Nancy Williamson. Without their assistance the task would not be possible, and their help is greatly appreciated, as would be contributions from any other willing person.
ICM

0 Form division

02 Literature Reviews in Knowledge Organization
0479 021 Miksa, F. – “The power to name”: a review essay (Lang.: eng). – In: Libraries and the Cultural Record, 42(2007)1, p.75-79.
0480 021;182 Broughton, V. – Classification and subject organization and retrieval. British librarianship and information work, 1991-2000; ed. J. H. Bowman (Lang.: eng). – London: Ashgate Publishing, 2006, p.494-516.
0481 021;182 Broughton, V. - Classification and subject organization and retrieval. British librarianship and information work, 2001-2005; ed. J. H. Bowman (Lang.: eng). London: Ashgate Publishing, 2007, p.467-488.
0482 026 Andrews, J.E. – (Book review of) Spink, A., Cole, C., eds. - New directions in cognitive information – Dordrecht: Springer, 2005 - viii, 250 p. - ISBN: 140204013X (HB); 9781402040139 (HB); 1402040148 (e-book); 9781402040146 (e-book) (Lang.: eng). - In: LIBRES: Library & Information Science Research, 29(2007)1, p.146-147.

04 Universal Classification Systems
0483 042.1 Universal'naja destjatičnaja klasifikacija (UDK): T. 5: 61 Medicinske nauki [Universal Decimal Classification: Volume 5: 61 Medical sciences]. 4th full ed. (Lang.: rus). Editor in chief Ju. M. Arskij. - Moscow: VINITI; RAN, 2006. - 305p. (Publication No: UDC-PO51).
0484 042.1 Universal'naja destjatičnaja klasifikacija (UDK): T.8: 66 Himničeskaja tehnologija. Himničeskaja prom'išlenost'. Piščevaja prom'išlennost'. Metallurgija. Rodstvenn'ie otrasli [Universal Decimal Classification: Volume 8: 66: Chemical technology]. 4th full ed. (Lang.: rus). Editor in chief Ju. M. Arskij. - Moscow: VINITI; RAN, 2007. - 310p. (Publication No: UDC-PO51). - ISBN 5-94577-031-0.
0485 042.2 Universalioji dešimtainė klasifikacija (UDK) : lentelės mokslinėms bibliotekom [Universal Decimal Classification: standard edition] (Lang.: lith). - Vilnius: Lietuvos nacionalinė Martyno Mažvydo biblioteka, 2006. – 2 vols. - ISBN: 9955-541-56-3.
0486 042.3 Universal'naja destjatičnaja klasifikacija (UDK): Vtoroe sokraščenn'ie tablic'i [Universal Decimal Classification: Abridged Tables]. Editor in chief Ju. M. Arskij (Lang.: rus). - Moscow: VINITI; RAN, 2006. - 150p.
0487 042.5 UDK Täiendusvihik [UDC update according to the Extensions and Corrections to the UDC 20 – 27]. Online edition. Translated and edited by Katrin Karus and Sirje Nilbe (Lang.: est). - Tallinn: Eesti Raamatukoguhoidjate Ühing; ELNET Konsortsium, 2007. - 58p. URL: http://www2.nlib.ee/ERY/liigit_marks_toimk/UDK_TV.pdf
0488 042.5 Universal'naja destjatičnaja klasifikacija (UDK): izmenenenija i dopolnenija: v'ipusk 4 [Universal Decimal Classification: corrections and extensions: issue 4]. Prepared by Rossijskaja akademija nauk, VINITI. Editor in chief: Ju. M. Arskij. (Lang.: rus). - Moscow: VINITI, 2006. - 145p.
0489 042.5 Universal'na desjatkova klasifikacija (UDK): zmini ta dop.(1998-1999, 2001-2002) [Universal Decimal Classification: corrections and extensions]. M. I. Ahverdova [ed.] (Lang.: ukr). - Kiev: Knižkova palata Ukraini, 2006. - 199 p. - ISBN 966-647-065-9

06 Conference Reports and Proceedings
0490 06.04-07-06.14/15 Smiraglia, R. – A glimpse at knowledge organization in North America (Lang.: eng). – In: Knowledge Organization, 34(2007)2, p.69-71.

07 Textbooks (whole field)
0491 07.23 Broughton, V. - Essential thesaurus construction (Lang.: eng). - London: Facet Publishing, 2006. – viii, 296p. - ISBN: 13:978-1-85604-565-0
Book reviews by:
Quinn, S. (0492) - Words at work (Lang.: eng). – In: Australian Library Journal, 56(2007)2, p.183-184.
Trickey, K.V. (0493) (Lang.: eng). – In: New Library World, 108(2007)3/4, p.190-191.
0494 07.3 Taylor, A.G., Miller, D. P. - Introduction to cataloging and classification. 10th ed. (Lang.: eng). - Westport, CN.: Libraries Unlimited, 2006. - xviii, 589 p. - ISBN: 159158230X;1591582350.
Book reviews by:
Conway, C.N. (0495) (Lang.: eng). – In: Reference and User Services Quarterly, 46(2007)3, p.104-105;
Intner, S.S. (0496) (Lang.: eng). - In: Technicalities, 27(2007)2, p.19-20.

1 Theoretical Foundations and general Problems

11 Order and Knowledge Organization
0497 111 Du Preez, M. - (Book review of) Nissen, M.E. - Harnessing knowledge dynamics: principled organizational knowing & learning – Hershey, PA: IRM Press, 2006. - xix, 278 p. - ISBN: 1591407737;1591407745;1591407753 (ebook) (Lang.: eng). – In: The Electronic Library, 25(2007)1, p.118-119.
0498 111 Salo, J. - A conceptual model of trust in the online environment (Lang.: eng). – In: Online Information Review, 31(2007)5, p.604-621.

12 Conceptology in Knowledge Organization
0499 122 Barátné Hajdu, Á. - Human perception and knowledge organization: visual imagery (Lang.: eng). – In: Library Hi Tech, 25(2007)3, p.338-351.
0500 122 Barátné Hajdu, Á. - A percepció és megjelenítés jelentősége az információkereső nyelvekben [The importance of perception and visualisation in information retrieval] (Lang.: hun). In: Tudományos és Műszaki Tájékoztatás, 54(2007)10. URL: http://tmt.omikk.bme.hu/show_news.html?id=4785&issue_id=487
0501 122 Karamuftuoglu, M. - Need for a systemic theory of classification in information science (Lang.: eng). – In: Journal of the American Society for Information Science & Technology, 58(2007)13, p.1977-1987.
0502 122 Ponnusamy, R., Gopal, T.V. – A concept matrix based approach (Lang.: eng). – In: Information studies, 12(2006)3, p.179-195.

13 Mathematics in Knowledge Organization
0503 131 Wang, Tai-Yue, Chiang, Huei-Min - Fuzzy support vector machine for multi-class text categorization. (Lang.: eng). – In: Information Processing & Management, 43(2007)4, p.914-929.

14 System Theory and Knowledge Organization
0504 147 Manevitz, L., Yousef, M. - One-class document classification via Neural Networks (Lang.: eng). – In: Neurocomputing, 70(2007)7/9, p.1466-1481.
0505 149;918 Angrosh, M. L., Urs, S. R. - Ontology-driven knowledge management systems for digital libraries: towards creating semantic metadata-based information services (Lang.: eng). – In: Information Studies, 12(2006)3, p.151-168.
0506 149 Kasten, J. – Thoughts on the relationship of knowledge organization to knowledge management (Lang.: eng). – In: Knowledge Organization, 34(2007)1, p.9-15.

15 Psychology and Knowledge Organization
0507 157 Chen, Z., Lu, K. - A preprocess algorithm of filtering irrelevant information based on the minimum class difference (Lang.: eng). – In: Knowledge-based Systems, 19(2006)6, p.422-429.

18 Classification and Indexing Research
0508 182 Panici, A. - Noutăţile catalogării, clasificării şi indexării resurselor bibliografice [Novelties in cataloguing, classification and indexing of the bibliographic resources] (Lang.: rom). - In: Magazin Bibliologic, 1 (2006), p.23-26.

19 History of Knowledge Organization
0509 191 Van der Linden, H. M.M. - De actualiteit van een 19e eeuwse classificatietheorie in de digitale wereld: over brievenbussen en andere ordeningen [The actuality of a 19th-century classification theory in the digital world: about letter boxes and other forms of organization] (Lang.: du). – In: Informatie Professional, 11(2007)7/8, p.12-17.

2 Classification Systems and Thesauri, Structure and Construction

21 General Problems of Classification Systems and Thesauri
0510 211 Dalbin, S. - Thesaurus et informatique documentaires: partenaires de toujours? / Dokumentarische Thesauri und dokumentarische Informatik: Partner fur immer? / Tesauros e informatica documentales: socios desde siempre? / The thesaurus and the digital library: still partners? (Lang.: fr). – In: Documentaliste - Sciences de l'Information, 44(2007)1, p.42-55.
0511 211 Dalbin, S. - Thesaurus et informatique documentaires: des Noces d'Or. / Dokumentarischen Thesauri und dokumentarische Informatik: die Goldene Hochzeit. / Tesauros e informatica documentales: Bodas de oro. / Information languages and the thesaurus: celebrating their golden anniversary (Lang.: fr). – In: Documentaliste - Sciences de l'Information, 44(2007)1, p.76-80.
0512 211 Hjørland, B. – Information: objective or subjective/situational? (Lang.: eng). – In: Journal of the American Society for Information Science and Technology, 58(2007)10, p.1448-1457.
0513 212 Kishida, K. - Effectiveness and functionality of controlled vocabulary in the Internet age (Lang.: jap). – In: Journal of Information Science and Technology Association (Joho no Kagaku to Gijutsu), 57(2007)2, p.62-67.
0514 214 Bianchini, D. et al. - Ontology-based methodology for e-service discovery (Lang.: eng). – In: Information Systems, 31(2006)4-5, p.361-380.
0515 214 Jimenez, A. G. - Una aproximacio als llenguatges 'documentals' en la web semantica [An approach to "bibliographic" languages on the semantic web] (Lang.: cat). - In: Item, 42(2006) p.33-50.
0516 214 Madalli, P. – Ontologies as knowledge structures for semantic retrieval (Lang.: eng). – In: Information Studies, 12(2006)4, p.205-212.
0517 214 Ungváry, R. - Az ontológiák és legfontosabb fogalmaik [Ontologies and their most general concepts] (Lang.: hun). - In: Tudományos és Műszaki Tájékoztatás, 54(2007)10, 2007. URL: http://tmt.omikk.bme.hu/show_news.html?id=4789&issue_id=487

22 Structure and elements of CS & T
0518 225 Hunt, K. - Faceted browsing: breaking the tyranny of keyword searching (Lang.: eng). - In: Feliciter, 52(2006), p.36-37.
0519 225 Lin, Wen-Yau C. - The concept and applications of faceted classification (Lang.: chi). – In: Journal of Educational Media & Library Sciences, 44(2006)2, p.153-171.
0520 225 Miksa, S.D., et al. - The development of a facet analysis system to identify and measure the dimensions of interaction in online learning (Lang.: eng). – In: Journal of the American Society for Information Science and Technology, 58(2007)11, p.1569-1578.
0521 225;752 La Barre, K. – Faceted navigation and browsing features in new OPACs: robust support for scholarly information seeking? – (Lang.: eng). - In: Knowledge Organization, 34(2007)2, p.78-90.
0522 226 Buizza, P. – (Book review of) Biblioteca nazionale centrale di Firenze. Nuovo soggettario: guida al sistema italiano di indicazzazione per soggetto, prototipo del thesaurus [National Central Library of Florence. New subject headings: a guide to the Italian system of subject description, prototype of a thesaurus] (Lang.: eng). – Milan: Bibliografica, 2007. – 246p.1 CD-ROM. – ISBN 978-88-7075-633-3(bb) – In: Knowledge Organization, 35(2007)1, p.58-60.
0523 226 Soonja Lee Koh, G. - Capturing the intended messages of subject headings as exemplified in The List of Korean Subject Headings (Lang.: eng). - In: International Cataloguing & Bibliographic Control, 36(2007)2, p.27-36.
0524 226 Ojala, M. - Finding and using the magic words: keywords, thesauri, and free text search (Lang.: eng). – In: Online, 31(2007)4, p.40-42.
0525 226 Shimada, M. - The revision of the National Diet Library List of Subject Headings (NDLSH), and its future (Lang.: jap). – In: Journal of Information Science and Technology Association (Joho no Kagaku to Gijutsu), 57(2007)2, p.73-78.
0526 229 Hunter, J., Cheung, K. - Provenance Explorer - a graphical interface for constructing scientific publication packages from provenance trails (Lang.: eng). – In: International Journal on Digital Libraries, 7(2007)1/2, p.99-107.

23 Construction of Classification Systems and Thesauri
0527 232 Rowley, J. – (Book review of) British Standard. Structured vocabularies for information retrieval – Guide. Part 1: Definitions, symbols and abbreviations. Part 2: Thesauri (BS8723-1 & 2: 2005) (Lang.: eng). – In: Journal of Documentation, 63(2007)3, p.428-431.

24 Relationships
0528 241;761 Mortchev-Bouveret, M. - Fonctions lexicales pour le typage de relations syntagmatiques et paradigmatiques : une approche lexicographique du terme [Lexical functions for the characterisation of syntagmatic and paradigmatic relations : a lexicographic approach to terms] (Lang.: fr). - In: Terminology, 12 (2006)2, p.235-259.

25 Numerical Taxonomy
0529 252 Cathey, R.J. et al. – Exploiting parallelism to support scalable hierarchical clustering (Lang.: eng). – In: Journal of the American Society for Information Science and Technology, 58(2007)8, p.1207-1222.
0530 252 Dunlavy, D.M. et al. - QCS: a system for querying, clustering and summarizing documents (Lang.: eng). – In: Information Processing & Management, 43(2007)6, p.1588-1605.

26 Notation. Codes
0531 265 Satija, M. P. – Book numbers in India with special reference to the author designed and used by the National Library of India (Lang.: eng). – In: Knowledge Organization, 34(2007)1, p.34-40.

29 Evaluation of C S & T
0532 292;048-46 Kim, S., Beck, H. W. - A practical comparison between thesaurus and ontology techniques as a basis for search improvement (Lang.: eng). – In: Journal of Agricultural & Food Information, 7(2006)4, p.23-42.
0533 294 Harper, C.A., Tillett, B. - Library of Congress controlled vocabularies and their application to the Semantic Web (Lang.: eng). – In: Cataloging & Classification Quarterly, 43(2007)3/4, p.47-68.

34 Classing and Indexing
0534 34 Sukula, S. K. - Indexing in electronic environment (Lang.: eng). – In: SRELS journal of information management, 44(2007)3, p.249-254.

0535 343 Arazy, O., Woo, C. – Enhancing information retrieval through statistical natural language processing: a study of collocation indexing (Lang.: eng). – In: MIS Quarterly, 31(2007)3, p.525-547.
0536 344 Basile, P. et al. - The JIGSAW Algorithm for word sense disambiguation and semantic indexing of documents (Lang.: eng). – In: Lecture Notes in Computer Science, 4733(2007), p.313-325.
0537 344 Costa, V. S., Sagonas, K., Lopes, R. - Demand-driven indexing of prolog clauses (Lang.: eng). – In: Lecture Notes in Computer Science, 4670 (2007), p.395-409.
0538 344 Peng, D. - Automatic conceptual indexing of Web services and its application to service retrieval (Lang.: eng). – In: Lecture Notes in Computer Science, 4494(2007), p.290-301.
0539 344 Samantray, S. D., Vasudev, P. - A data mining approach for concept based document classification and automated text summarization (Lang.: eng). - In: International Conference Multidisciplinary Information Sciences and Technologies, 1, 2(2006), p.3-7.
0540 344 Shen, J-J., Chang, C-C., Li, Y-C. – Combined association rules for dealing with missing values (Lang.: eng). – In: Journal of Information Science, 33(2007)4, p.468-481.
0541 344 Zhan, J., Loh, H. T. - Using latent semantic indexing to improve the accuracy of document clustering (Lang.: eng). – In: Journal of Information and Knowledge Management, 6(2007)3, p.181-188.
0542 346 De Campos, L. M. et al. – Automatic indexing from a thesaurus using Bayesian networks: application to the classification of parliamentary initiatives (Lang.: eng). – In: Lecture Notes in Computer Science, 4724 (2007), p.865-877.
0543 347 Lioma, C., Ounis, I. - A syntactically-based query reformulation technique for information retrieval (Lang.: eng). – In: Information Processing & Management, 44(2008)1, p.143-162.
0544 348 Giunchiglia, F., Zaihrayeu, I., Kharkevich, U. - Formalizing the get-specific document classification algorithm (Lang.: eng). – In: Lecture Notes in Computer Science, 4675(2007), p.26-37.
0545 348 Li, T., Zhu, S., Ogihara, M. - Hierarchical document classification using automatically generated hierarchy (Lang.: eng). – In: Journal of Intelligent Information Systems, 29(2007)2, p.211-230.
0546 348 Ru, Y., Horowitz, E. - Automated classification of HTML forms on e-commerce web sites (Lang.: eng). – In: Online Information Review, 31(2007)4, p.51-466.
0547 348 Song, D. et al. - An intelligent information agent for document title classification and filtering in document-intensive domains (Lang.: eng). – In: Decision Support Systems, 44(2007)1, p.251-265.

35 Manual and Automatic Order Techniques
0548 356 Agosti, M., Bonfiglio-Dosio, G., Ferro, N. - A historical and contemporary study on annotations to derive key features for systems design (Lang.: eng). – In: International Journal on Digital Libraries, 8(2007)1, p.1-19.
0549 357 Doucet, A., Lehtonen, M. - Unsupervised classification of text-centric XML document collections (Lang.: eng). - In: Lecture Notes in Computer Science, 4518(2007) p.497-509.
0550 357;751 Gery, M. - Indexing "Reading Paths" for a structured information retrieval at INEX 2006 (Lang.: eng). – In: Lecture Notes in Computer Science, 4518(2007) p.160-164.

36 Coding
0551 361 Tennis, J.T. - Scheme versioning in the Semantic Web (Lang.: eng). – In: Cataloging & Classification Quarterly, 43(2007)3/4, p.85-104.

39 Evaluation of Classing and Indexing
0552 393 Efron, M. - Query expansion and dimensionality reduction: notions of optimality in Rocchio relevance feedback and latent semantic indexing (Lang.: eng). – In: Information Processing & Management, 44(2008)1, p.163-180.

4 On Universal Classification Systems and Thesauri

42 On the Universal Decimal Classification
0553 42 Caranfil, L. - Clasificarea Zecimală Universală şi catalogul tematic al Bibliotecii Academiei Române (I-3) [The Universal Decimal Classification and the subject catalogue of the Romanian Academy Library] (Lang.: rom). - In: Biblioteca, 18(2007)2, p.50-51; 3/4, p.80-81; 5, p.127-128.

0554 42 Cordeiro, I. M. - The UDC in a time of change: a status report (Lang.: eng). Proceedings of the International Conference on Future of Knowledge Organization in Networked Environment (IKONE 2007), Bangalore, 3-5 September 2007; ed. K.S. Raghavan. - Bangalore: Indian Statistical Institute, Documentation Research & Training Centre, 2007. (Indian Statistical Institute Platinum Jubilee Conference Series), p.105-114.
0555 42 Frâncu, V. - Seminar internaţional CZU (I) [An international seminar on the UDC] (Lang.: rom). - In: Biblioteca, 18(2007)7, p.190-191.
0556 42 Kovac, T. et al. - Univerzalna decimalna klasifikacija: prirocnik [Universal Decimal Classification: handbook] (Lang.: slo). – Ljubljana, Narodna in univerzitetna knižnica, 2006. - 130pp.

43 On the Dewey Decimal Classification
0557 43 Fleharty, C., Smith, S. - Biographies: where are they? (Lang.: eng). – In: School Library Media Activities Monthly, 23(2007)9, p.30-31.
0558 43 Khairy, I. - Le projet Web Dewey en arabe de la Bibliothèque Alexandrie [The project for an Arabic Web Dewey in the Biblioteca Alexandrina] (Lang.: fr). - Paper presented to 29th annual conference of MELCOM: the European Middle Eastern Libraries Association, Sarajevo, June 4-6, 2007. 5p. URL: http://www.sant.ox.ac.uk/mec/melcomintl/melcom/Papers-2007/iman.doc
0559 43 Montgomery, P. – Dewey Decimal Sudoku (Lang.: eng). In: School Library Media Activities Monthly, 23(2007)10, p.16.
0560 43 Petgnet, D. - Y a-t-il une vie après la Dewey? [Is there a life after Dewey?] (Lang.: fr). – In: Bulletin des Bibliothèques de France, 52(2007)3, p.107-108.
0561 43 Richman, D. - Social search comes of age (Lang.: eng). – In: Information Outlook, 11(2007)8, p.18-24.
0562 43 Weihs, J. – (Book review of) Mitchell, J. S., Vizine-Goetz, D. Moving beyond the presentation layer: content and context in the Dewey Decimal Classification (DDC) system - New York: Haworth Information Press, 2006. - xix, 239 p. - ISBN: 0789034522; 9780789034526 (Lang.: eng). – In: Feliciter, 53(2007)5, p.265-265.

44 On the Library of Congress Classification and Library of Congress Subject Headings
0563 44 Making LC call numbers visual (Lang.: eng). - In: Library Journal, 130(2005)6, p.15.
0564 44;448 Schiff, A. - New edition of SACO Participants' Manual forthcoming (Lang.: eng). – In: ALCTS Newsletter Online, 16(2005)6, p.1.
0565 448 Kuntz, B. - Arabic librarians talk back to the empire (Lang.: eng). - Paper presented to 29th annual conference of MELCOM: the European Middle Eastern Libraries Association, Sarajevo, June 4-6, 2007. 5p. URL: http://www.sant.ox.ac.uk/mec/melcomintl/melcom/Papers-2007/Blair-Kuntz.doc
0566 448 Orphan, S. - EBSCO offers alternate subject headings through A-to-Z service (Lang.: eng). – In: College & Research Libraries News, 67(2006)6, p.351.

48 On other Universal Classification Systems and Thesauri
0567 481 Bilodeau, B. - RASUQAM: the thesaurus of descriptors of the Université du Québec à Montreal (UQAM) / RASUQAM le thesaurus de descripteurs de l'Université du Québec à Montreal (UQAM) (Lang.: eng). – In: Documentation et Bibliothèques, 52(2006)2, p.109-120.
0568 481 Sato, H. - The main purpose and the basic rules of developing "ExpressFinder/Thesaurus" (Lang.: jap). – In: Journal of Information Science and Technology Association (Joho no Kagaku to Gijutsu), 57(2007)2, p.84-88.

6 On Special Subjects Classifications and Thesauri
0569 6 Gagnon, G. – (Book review of) Johnston, B. H. - Anishinaubae thesaurus – East Lansing, MI: Michigan State University Press, 2006 – 320p. - ISBN 0-87013-753-0 (Lang.: eng). – In: Choice: Current Reviews for Academic Libraries, 45(2007)1, p.62.

62 On C S & T in Physics, Chemistry, Electronics, Energy
0570 624 Access Innovations develops a thesaurus for the Institute of Electrical and Electronic Engineers (Lang.: eng). – In: Key Words, 15(2007)1, p.10.

64 On C S & T in Biological, Veterinary Science, Agriculture, Food Sciences, Ecology
0571 649;221 Stirling, D. A. - EPA glossaries: the struggle to define environmental terms (Lang.: eng). – In: Government Information Quarterly, 24(2007)2, p.414-428.

65 On C S & T in Human Biology, Medicine, Psychology, Education, Labour, Sports, Household
0572 651/4 Arencibia-Jorge, R., Vega-Almeida, R.L., Martí-Laher, Y. – Domain analysis for the construction of a conceptual structure: a case study (Lang.: eng). – In: LIBRES: Library and Information Science Research Electronic Journal, 17(2007)2. URL: http://libres.curtin.edu.aul
0573 651/4 Lacoste, C. et al. - Inter-media concept-based medical image indexing and retrieval with UMLS at IPAL (Lang.: eng). – In: Lecture Notes in Computer Science, 4730(2007) p.694-701.
0574 651/4 Muh-Chyun Tang - Browsing and searching in a faceted information space: a naturalistic study of PubMed users' interaction with a display tool (Lang.: eng). – In: Journal of the American Society for Information Science & Technology, 58(2007)13, p.1998-2006.
0575 651/4; 743 Stojmirović, A., Pestov, V. - Indexing schemes for similarity search in datasets of short protein fragments (Lang.: eng). – In: Information Systems, 32(2007)8, p.1145-1165.

66 On C S & T in Sociology, Politics, Social Policy, Law, Area Planning, Military Science, History
0576 66 Levinson, D. - Anthropology, taxonomies, and publishing (Lang.: eng). – In: Online, 30(2006)4, p.28-30.
0577 661 López-Huertas, M. J., Ramírez, I de T. - Gender terminology and indexing systems: the case of woman's body, image and visualization (Lang.: eng). – In: Libri: International Journal of Libraries & Information Services, 57(2007)1, p.34-44.
0578 666 Dabney, D. - The universe of unthinkable thoughts: literary warrant and West’s Key Number System (Lang.: eng). – In: Law Library Journal, 99(2007)2, p.229-247.
0579 666 Hickey, L. - (Book review of) Burton, W.C. - Burton's legal thesaurus. 4th ed. - New York : McGraw-Hill, c2007. - xvii, 1063 p. – ISBN 0071472622; 9780071472623 (Lang.: eng). – In: Choice: Current Reviews for Academic Libraries, 45(2007)1, p.70.
0580 666 Modeste, J., Dina, Y. – Use of the Elizabeth Moys Classification Scheme for Legal Materials in the Caribbean (Lang.: eng). – In: Caribbean Libraries in the 21st Century: Changes, Challenges and Choices; ed. by C. Peltier-Davies & S. Renwick. – Medford, NJ: Information Today, 2007 – ISBN 978-1-57387-301-7 – p.119-129.

68 On C S & T in Science of Science, Information Science, Computer Science, Communication Science, Semiotics
0581 682 Naumis-Peña, C. - Estudio comparativo de tesauros bibliotecológicos en lengua española [Comparative study of Spanish thesauri in LIS] (Lang.: sp). – In: Investigacion Bibliotecologica, 21(2007)42, p.195-210.
0582 69 Broughton, V., Slavic, A. - Building of a faceted classification for humanities: method and procedure (Lang.: eng). – In: Journal of Documentation, 63(2007)5, p.727-754. URL: http://dlist.sir.arizona.edu/1976/.

7 Knowledge Representation by Language and Terminology

71 General Problems of Natural Language in Relation to Knowledge Organization
0583 715 Ahn, H., Kim, K., Han, I. - Global optimization of feature weights and the number of neighbors that combine in a case-based reasoning system (Lang.: eng). – In: Expert Systems, 23(2006)5, p.290-301.

73 Automatic Language Processing
0584 732 Niemi, T., Jamsen, J. – A query language for discovering semantic associations, Pt. 1: approach and formal definition of query primitives; Part 2: sample queries and query evaluation (Lang.: eng). – In: Journal of the American Society for Information Science and Technology, 58(2007)11, p. 1559-1568; 1686-1701.
0585 733 Lazarinis, F. – Engineering and utilizing a stopword list in Greek Web retrieval (Lang.: eng). – In: Journal of the American Society for Information Science and Technology, 58(2007)11, p.1645-1653.
0586 733 Strossa, P.
- Komunikace mezi člověkem a počítačem v při-rozeném jazyce [Communication between man and compu-ter in natural language] (Lang.: cz). - Science WORLD [Online]. Praha, 2004.

URL: http://scienceworld.cz/sw.nsf/ID/10D74E2E7ED7 559EC1256F32005ACF20?OpenDocument, October 19, 2007. 0587 736 Ou, S., Khoo, C.S.G., Goh, D.H. – Automatic multidocu-ment summarization of research abstracts: design and user evaluation (Lang. eng). – In: Journal of the American Soci-ety for Information Science and Technology, 58(2007)10, p. 1419-1434. 74 Grammar Problems 0588 743 Dehuri, S., Mall, R. - Predictive and comprehensible rule discovery using a multi-objective genetic algorithm (Lang.: eng). – In: Knowledge-based Systems, 19(2006)6, p.413-421. 0589 744 Hu, J. et al. - Locality discriminating indexing for document classification keywords: information retrieval (Lang.: eng). – In: Proceedings of the Annual International ACM SIGIR Conference on Research and Development, (2007), p.689-690. 0590 744 Ke, W., Mostafa, J., Fu, Y. - Collaborative classifier agents: studying the impact of learning in distributed document clas-sification keywords (Lang.: eng). – In: Joint Conference on Digital Libraries, 7(2007), p.428-437. 75 On-Line Retrieval Systems and Technologies 751 General and Theoretical Problems 0591 751 Hutchinson, H.B., Druin, A., Bederson, B. – Supporting elementary-age children’s searching and browsing: design and evaluation using the International Children’s Digital Library (Lang.: eng). – In: Journal of the American Society for In-formation Science and Technology, 58(2007)11, p. 1618-1631. 0592 751 Rédy, G., Neumann, A., Sutó, Z. - Információkeresés [In-formation retrieval] (Lang.: hun). – In: Tudomanyos es Muszaki Tajekoztatas, 54(2007)2, p.55-61. 0593 751 Ryoo, J., Saiedian, H. - A framework for classifying and de-veloping extensible architectural views (Lang.: eng). – In: In-formation and Software Technology, 48(2006)7, p.456-470. 0594 751 Williamson, N.J. - Knowledge structures and the Internet: progress and prospects (Lang.: eng). – In: Cataloging & Classification Quarterly, 44(2007)3/4, p.329-342.


0595 751 Zhang, R., Hu, Y.C. - Assisted peer-to-peer search with partial indexing (Lang.: eng). – In: IEEE Transactions on Parallel and Distributed Systems, 18(2007), p.1146-1158.
0596 751 Zhu Xiaomin, Liao Jianxin – (Book review of) Lazar, J. - Web usability : a user-centered design approach. - Boston: Addison Wesley, 2006. - xxi, 394 pp. ISBN: 9780321321350 (Lang.: eng). – In: Journal of the American Society for Information Science & Technology, 58(2007)7, p.1066-1067.
0597 751;225 Uddin, M.N., Janecek, P. – Faceted classification in web information architecture (Lang.: eng). – In: The Electronic Library, 25(2007)2, p.218-233.

752 Dialogue systems. Interactive Catalogues
0598 752 Kapoor, K., Goyal, O.P. – Web-based OPACs in Indian academic libraries: a functional comparison (Lang.: eng). – In: Program: Electronic Library and Information Systems, 41(2007)3, p.291-310.
0599 752 Wells, D. - What is a library OPAC? (Lang.: eng). – In: The Electronic Library, 25(2007)4, p.386-394.
0600 752.3 Harcourt, K., Wacker, M., Wolley, I. - Automated access level cataloging for Internet resources at Columbia University Libraries (Lang.: eng). – In: Library Resources and Technical Services, 51(2007)3, p.212-225.
0601 752.3 O'Leary, M. - Northern light: better the second time around? (Lang.: eng). – In: Information Today, 24(2007)2, p.33,36.

753 Online Access, Queries, Free Text Searching
0602 753 Mandl, T. – The impact of web site structure on link analysis (Lang.: eng). – In: Internet Research, 17(2007)2, p.196-207.

754 Programs for on-line queries, e.g. for ranking. Relevance ranking
0603 754 Fourie, I., Bothma, T. – Information seeking: an overview of web tracking and the criteria for tracking software (Lang.: eng). – In: Aslib Proceedings: New Information Perspectives, 59(2007)3, p.264-285.

757 Expert Systems in Searching. Search Engines
0604 757 Boldiš, P. – Vyhledávače: současné problémy a trendy vývoje [Search engines: present questions and trends] (Lang.: cz). – In: Knihovna plus [Online]. 2005, č. 1. URL: http://knihovna.nkp.cz/knihovnaplus51/boldis.htm, October 19, 2007.
0605 757 Han, L., Chen, G., Xie, L. – AASA: a method of automatically acquiring semantic annotations (Lang.: eng). – In: Journal of Information Science, 33(2007)4, p.435-451.
0606 757 Kuramochi, M., Karypis, G. - Discovering frequent geometric subgraphs (Lang.: eng). – In: Information Systems, 32(2007)8, p.1101-1120.
0607 757 Tho, Q.T., Fong, A.C.M., Hui, S.C. – A scholarly semantic web system for advanced search functions (Lang.: eng). – In: Online Information Review, 31(2007)3, p.353-365.
0608 757 Xiaodong Shi, Yang, C.C. - Mining related queries from Web search engine query logs using an improved association rule mining model (Lang.: eng). – In: Journal of the American Society for Information Science & Technology, 58(2007)12, p.1871-1883.

759 Evaluation of On-Line Information Retrieval Systems and Techniques
0609 759 Chen, Y-L., Cheng, L-C., Cheng, Y-L. – Using position, fonts and cited references to retrieve scientific documents (Lang.: eng). – In: Journal of Information Science, 33(2007)4, p.492-519.
0610 759 Raban, D.R. – User-centred evaluation of information: a research challenge (Lang.: eng). – In: Internet Research, 17(2007)3, p.306-323.

76 Lexicon/Dictionary Problems
0611 761;722 Gómez González-Jover, A. - Meaning and anisomorphism in modern lexicography (Lang.: eng). – In: Terminology, 12(2006)2, p.215-234.
0612 762 Bergenholtz, H., Nielsen, S. - Subject-field components as integrated parts of LSP dictionaries (Lang.: eng). – In: Terminology, 12(2006)2, p.281-303.

77 Problems of Terminology
0613 773 L'Homme, M.C. - The processing of terms in dictionaries: new models and techniques. State of the art (Lang.: eng). – In: Terminology, 12(2006)2, p.181-188.


0614 773;78-75 Faber, P. et al. - Process-oriented terminology management in the domain of coastal engineering (Lang.: eng). – In: Terminology, 12(2006)2, p.189-213.
0615 777 Chou, C.-H., Han, C.-C., Chen, Y.-H. - GA based optimal keyword extraction in an automatic Chinese Web document classification system (Lang.: eng). – In: Lecture Notes in Computer Science, 4743(2007), p.224-234.
0616 777 Fu Lee Wang, Yang, C. - Mining Web data for Chinese segmentation (Lang.: eng). – In: Journal of the American Society for Information Science & Technology, 58(2007)12, p.1820-1837.

78 Subject-Oriented Terminology Work
0617 78-49 Chen, Z. et al. - Semantic integration of government data for water quality management (Lang.: eng). – In: Government Information Quarterly, 24(2007)4, p.716-735.
0618 78-66;448 Whited, M. - ALA/ALCTS Cataloging and Classification Section/Subject Analysis Committee (Lang.: eng). – In: Law Library Journal, 96(2004)4, p.862-863.
0619 78-93 Nero, L. M., Mitchell, J. S., Vizine-Goetz, D. - Classifying the popular music of Trinidad and Tobago (Lang.: eng). – In: Cataloging & Classification Quarterly, 42(2006)3/4, p.119-133.
0620 78-93;357 Pinto, A., Haus, G. - A novel XML music information retrieval method using graph invariants (Lang.: eng). – In: ACM Transactions on Information Systems, 25(2007)4, p.19-44.
0621 78-97 Tiberi, M., Mazzocchi, F. - La gestione della polisemia nei thesauri: il caso dei termini filosofici [Management of polysemy in thesauri: the case of philosophical terms] (Lang.: it). – In: Bollettino AIB, 47(2007)1/2, p.93-107.

79 Multilingual Systems and Translation
0622 791 Amaral, C., Laurent, D. - Implementation of a QA system in a real context (Lang.: eng). - Paper presented at TEL-ME-MOR/M-CAST Seminar On Subject Access, Prague, November 24, 2006. URL: http://knihovnam.nkp.cz/docs/telmemor/subject/TellMeMore_version_2.ppt?PHPSESSID=e93ac950899817d81e359e42355fb5ae, October 19, 2007.
0623 791 Balíková, M. - M-CAST in libraries (Lang.: eng). - Paper presented at TEL-ME-MOR/M-CAST Seminar On Subject Access, Prague, November 24, 2006. URL: http://knihovnam.nkp.cz/docs/telmemor/subject/M-CAST_in_libraries.ppt?PHPSESSID=e93ac950899817d81e359e42355fb5ae, October 19, 2007.
0624 791 Czerniejewski, B. - Multilingual Content Aggregation System based on TRUST Search Engine (M-CAST) (Lang.: eng). - Paper presented at TEL-ME-MOR/M-CAST Seminar On Subject Access, Prague, November 24, 2006. URL: http://knihovnam.nkp.cz/docs/telmemor/subject/M-CAST-project_presentation-final.ppt?PHPSESSID=e93ac950899817d81e359e42355fb5ae, October 19, 2007.
0625 791 Heuwing, B., Mandl, T., Strotgen, R. - Multilingual web retrieval experiments with field specific indexing strategies for WebCLEF 2006 at the University of Hildesheim (Lang.: eng). – In: Lecture Notes in Computer Science, 4730(2007), p.834-837.
0626 791 Lisek, S. - P2P networks for distributed queries (Lang.: eng). - Paper presented at TEL-ME-MOR/M-CAST Seminar On Subject Access, Prague, November 24, 2006. URL: http://knihovnam.nkp.cz/docs/telmemor/subject/M-CAST-P2P.ppt?PHPSESSID=e93ac950899817d81e359e42355fb5ae, October 19, 2007.
0627 791;871 Ménard, E. – Indexing and retrieving images in a multilingual world (Lang.: eng). – In: Knowledge Organization, 34(2007)2, p.91-100.
0628 793 Strossa, P. - Information query formulation in a Slavonic language and its automatic processing : experience from Polish and Czech in comparison to Western European languages (Lang.: eng). - Paper presented at TEL-ME-MOR/M-CAST Seminar On Subject Access, Prague, November 24, 2006. URL: http://knihovnam.nkp.cz/docs/telmemor/subject/TEL-ME-MOR-PS.ppt?PHPSESSID=e93ac950899817d81e359e42355fb5ae.
0629 799 Clavel, P. – How localization challenges international portals: character sets and international access (Lang.: eng). – In: International Cataloguing and Bibliographic Control, 36(2007)3, p.51-55.
0630 799;945 Harai, N. – Japanese scripts and UNIMARC (Lang.: eng). – In: International Cataloguing and Bibliographic Control, 36(2007)3, p.55-58.


8 Applied Classing and Indexing

81 General Problems, Catalogues, Guidelines, Rules, Indexes
0631 811 Haslinger, I., Van Otegem, M. - Machines moeten zoeken, mensen willen vinden [Engines should search, people want to find] (Lang.: du). – In: Informatie Professional, 11(2007)1, p.16-19.

82 Data Classing and Indexing
0632 82-92;43 Hu Yuefang, Chen Yintao - Differences between the DDC and the CLC in classifying works of literature (Lang.: eng). – In: Illinois Libraries, 86(2007)4, p.5-10.

83 Titled classing and Indexing. Derived Indexing, Folksonomy
0633 831 Kelsey, P.J. - The Financial Counseling and Planning Indexing Project: establishing a correlation between indexing, total citations, and library holdings (Lang.: eng). – In: Financial Counseling and Planning, 18(2007)1, p.19-24.
0634 835 Abbas, J. – In the margins: reflections on scribbles, knowledge organization and access (Lang.: eng). – In: Knowledge Organization, 34(2007)2, p.72-77.
0635 835 Francis, E., Quesnel, O. - Indexation collaborative et folksonomies. / Gemeinschaftliche Indexierung und Folksonomien. / Indizacion colaborativa y folksonomias. / Collaborative indexing and folksonomies (Lang.: fr). - In: Documentaliste - Sciences de l'Information, vol. 44(2007)1, p.58-63.
0636 835 Munk, T.B., Mørk, K. – Folksonomy, the power law & the significance of the least effort (Lang.: eng). – In: Knowledge Organization, 34(2007)1, p.16-33.

84 Primary Literature Classification and Indexing
0637 842 Blanchard, A. - Understanding and customizing stopword lists for enhanced patent mapping (Lang.: eng). – In: World Patent Information, 29(2007)4, p.308-316.
0638 842 Kang, In-Su, et al. - Cluster-based patent retrieval (Lang.: eng). - In: Information Processing & Management, 43(2007)5, p.1173-1182.
0639 842 Kim, Jae-Ho, Choi, Key-Sun - Patent document categorization based on semantic structural information (Lang.: eng). – In: Information Processing & Management, 43(2007)5, p.1200-1215.
0640 842 Li, Y., Shawe-Taylor, J. - Advanced learning algorithms for cross-language patent retrieval and classification (Lang.: eng). – In: Information Processing & Management, 43(2007)5, p.1183-1199.
0641 844;918 Cuvillier, J. - Indexing grey resources: considering usual behavior of library users and the use of Dublin Core metadata via a database of specialized vocabulary (Lang.: eng). - In: Publishing Research Quarterly, 23(2007)1, p.78-88.
0642 846;88-46 Hoover, L. L. - Agriculture and food related theses and dissertations available on the Web (Lang.: eng). - In: Journal of Agricultural & Food Information, 7(2006)2/3, p.87-108.

85 (Back of the) Book Classification and Indexing
0643 854 Kunze, H., Dahlberg, I. – (Book review of) Fugmann, R. Die Buchregister: Methodische Grundlagen und praktikische Anwendungen [The book index: methodological foundations and practical applications] – Frankfurt am Main: DGI, 2006. – 136 pp. – (Reihe Informationswissenschaft der DGI, Bd. 10) – ISBN 978-3-925474-59-0; 3-925474-59-5 (Lang.: eng). - In: Knowledge Organization, 34(2007)1, p.60-61.
0644 854 Matthews, D. - Indexing (Lang.: eng). - In: Author, 18(2007)2, p.61-62.

86 Secondary Literature Classification and Indexing
0645 864 Bornmann, L., Daniel, H-D. – Multiple publication on a single research study: does it pay? The influence of number of research articles on total citation counts in biomedicine (Lang.: eng). – In: Journal of the American Society for Information Science and Technology, 58(2007)8, p.1100-1108.
0646 864 Kousha, K., Thelwall, M. - How is science cited on the Web? A classification of Google unique Web citations (Lang.: eng). - In: Journal of the American Society for Information Science & Technology, 58(2007)11, p.1631-1644.
0647 864 Rousseau, R. – On Egghe’s construction of Lorenz curves (Lang.: eng). – In: Journal of the American Society for Information Science and Technology, 58(2007)10, p.1551-1552.
0648 864 Sawyer, S., Huang, H. – Conceptualizing information, technology and people: comparing information science and information systems literature (Lang.: eng). – In: Journal of the American Society for Information Science and Technology, 58(2007)10, p.1436-1448.
0649 864 Schneider, J.W., Borlund, P. – Matrix comparison: motivation and important issues for measuring the resemblance between proximity measures or ordination results. Parts 1 & 2 (Lang.: eng). – In: Journal of the American Society for Information Science and Technology, 58(2007)11, p.1586-95; 1596-1610.
0650 864 Vanclay, J.K. – On the robustness of the h-index (Lang.: eng). – In: Journal of the American Society for Information Science and Technology, 58(2007)11, p.1547-1550.
0651 864 Zanotto, E. D. - The scientists pyramid (Lang.: eng). – In: Scientometrics, 69(2006)1, p.175-181.
0652 864;757 Schneider, J.W. - Concept symbols revisited: naming clusters by parsing and filtering of noun phrases from citation contexts of concept symbols (Lang.: eng). – In: Scientometrics, 68(2006)3, p.573-593.

87 Classification and Indexing of Non-Book Materials
0653 871 Chatterjee, K., Chen, S.-C. - A novel indexing and access mechanism using affinity hybrid tree for content-based image retrieval in multimedia databases (Lang.: eng). – In: International Journal of Semantic Computing, 1(2007)2, p.147-170.
0654 871 Dmitry, M., Bovbel, E. - Indexing and retrieval scheme for content-based multimedia applications (Lang.: eng). - In: Lecture Notes in Computer Science, 4629(2007), p.162-169.
0655 871 Hsieh-Lee, I. – Organizing audio-visual and electronic resources for access: a cataloguing guide. 2nd ed. – Westport, CN: Libraries Unlimited, 2006. – xix, 376 p. (Lang.: eng). Book reviews by: Hillson, B. (0656) - (Lang.: eng). - Reference and User Services Quarterly, 46(2007)3, p.105-106; Seabrook, N. (0657) - (Lang.: eng). – Library Review, 56(2007)7, p.629-631.
0658 871;847 Conduit, N., Rafferty, P. - Constructing an image indexing template for the Children's Society: users' queries and archivists' practice (Lang.: eng). – In: Journal of Documentation, 63(2007)6, p.898-919.
0659 872 Fielding, E. - Unlocking the garage: a web portal for car enthusiasts (Lang.: eng). – In: Electronic Library, 25(2007)4, p.453-464.

0660 872 Rorissa, A. – Relationships between perceived features and similarity images: a test of Tversky’s contrast model (Lang.: eng). – In: Journal of the American Society for Information Science and Technology, 58(2007)10, p.1401-1419.
0661 872 Yang, S. et al. - Semantic categorization of digital home photo using photographic region templates (Lang.: eng). – In: Information Processing & Management, 43(2007)2, p.503-514.
0662 875 Balkhatir, M., Charhad, M. - A conceptual framework for automatic text-based indexing and retrieval in digital video collections (Lang.: eng). – In: Lecture Notes in Computer Science, 4653(2007), p.392-403.
0663 876 Mangan, E. - Cartographic materials: a century of cataloging at Library of Congress and beyond (Lang.: eng). – In: Journal of Map & Geography Libraries, 3(2007)2, p.23-44.
0664 878 Baca, M. et al. - Cataloging cultural objects : a guide to describing cultural works and their images. (Lang.: eng). - Chicago, IL.: American Library Association, 2006. - xiii, 396 p. - ISBN: 0838935648; 9780838935644. Book reviews by: Chapman, J.W. (0665) – (Lang.: eng). - In: Technicalities, 27(2007)2, p.15-16; Frosch, P. (0666) - (Lang.: eng). – In: Library Journal, 132(2007)3, p.152.
0667 878 Leman, S. - Let op uw woorden: thesauri in de dagelijkse museumpraktijk [Take care of your words: thesauri in day-to-day practice in museums] (Lang.: du). – In: Bibliotheek- en Archiefgids, 83(2007)1, p.36-38.
0668 878 Uralman, N. H. - 21. yuzyila girerken bir bilgi kurumu olarak muze [The museum as an information institution in the 21st century] (Lang.: tur). – In: Bilgi Dunyasi / Information World, 7(2006)2, p.250-266.

88 Classification and Indexing in Subject Fields
0669 88-4/5 Acosta, S. - Classification of life (Lang.: eng). – In: Library Media Connection, 25(2007)4, p.95.

9 Knowledge Organization Environment

91 Professional and Organisational Problems in General and in Institutions
0670 918 Calhoun, K. - Being a librarian: metadata and metadata specialists in the twenty-first century (Lang.: eng). – In: Library Hi-Tech, 25(2007)2, p.174-187.


0671 918 Chapman, A. - Resource discovery: catalogs, cataloging and the user (Lang.: eng). – In: Library Trends, 55(2007)4, p.917-931.
0672 918 D’Ambrosio, D.M. – Conceptualizing metadata via repertory grids: exploring a method for the development of domain-specific systems for knowledge organization (Lang.: eng). – In: Knowledge Organization, 34(2007)1, p.41-57.
0673 918 Hert, C.A., et al. - Investigating and modelling metadata use to support information architecture development in the statistical knowledge network (Lang.: eng). – In: Journal of the American Society for Information Science and Technology, 58(2007)9, p.1267-1285.
0674 918 Metadata and its impact on libraries; ed. by Intner, S.S., Lazinger, S.S., Weihs, J. Westport, CO & London: Libraries Unlimited, 2006. - 272 pp. (Lang.: eng). Book reviews by: Arnott, G. (0675) – (Lang.: eng). – In: Journal of Librarianship and Information Science, 39(2007)1, p.59-60; Petrou, A.D. (0676) – (Lang.: eng). – In: Journal of the American Society for Information Science and Technology, 58(2007)6, p.909-910.
0677 918;88-51/4 Michon, J. – Biomedicine and the Semantic Web: a knowledge model for visual phenotype (Lang.: eng). – In: Cataloging & Classification Quarterly, 43(2007)3/4, p.149-160.
0678 918;88-54 Ferraioli, L. - An exploratory study of metadata creation in a health care agency (Lang.: eng). – In: Cataloging & Classification Quarterly, 40(2005)3/4, p.75-102.
0679 918;88-54 Hatfield, A.J., Kelley, S.D. - Case study: lessons learned through digitizing the National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research Collection (Lang.: eng). – In: Journal of the Medical Library Association, 95(2007)3, p.267-270.

92 Persons and Institutions in Knowledge Organization
0680 922 Kofnovec, L. - Ing. Dušan Simandl - život a dílo [Dusan Simandl - life and work] (Lang.: cz). – In: Čtenář, 59(2007)2, p.56-58.

94 Bibliographic Control. Bibliographic Records
0681 942 Danskin, A. – “Tomorrow never knows”: the end of cataloguing (Lang.: eng). – In: IFLA Journal, 33(2007)3, p.205-209.
0682 942 Rafferty, P., Hidderley, R. - Flickr and Democratic Indexing: dialogic approaches to indexing (Lang.: eng). – In: Aslib Proceedings: New Information Perspectives, 59(2007), p.397-410.
0683 942;918 Yee, M. M. - Cataloging compared to descriptive bibliography, abstracting and indexing services, and metadata (Lang.: eng). – In: Cataloging and Classification Quarterly, 44(2007)3/4, p.307-328.
0684 944; 357 Isaac, A. - SKOS: simple knowledge organisation system (Lang.: du). – In: Informatie Professional, 10(2006)11, p.40-43.
0685 944 Semantic Web Deployment Working Group requests comments (Lang.: eng). – In: Library Hi Tech News, 24(2007)6, p.48-49.
0686 945 Mao, C-C. A., Hsu, C-f. F. - Chinese MARC (Taiwan) and its bibliographic database (Lang.: eng). – In: International Cataloguing and Bibliographic Control, 36(2007)3, p.58-60.
0687 945 Panici, A. - De la MARC la UNIMARC : etapele dezvoltării, semnificaţii, importanţă, utilitate [From MARC to UNIMARC : stages of development, significance, importance and usefulness] (Lang.: rom). - In: Magazin Bibliologic, (2006)2-3, p.10-13.
0688 949 Chihaia, L., Chipcea, E., Covaci, M. - Fişierul de autoritate pentru nume de persoane: ghid practic de utilizare în Aleph 500 [Authority files for personal names: a practical guide to their use in Aleph 500] (Lang.: rom). - Iaşi: Biblioteca Centrală Universitară „Mihai Eminescu”, 2005.
0689 949 Chihaia, L., Popa, M. - Fişier de autoritate pentru nume de colectivitate în Aleph 500.16.02 [Authority file for corporate body names in Aleph 500.16.02] (Lang.: rom). - Iaşi: Biblioteca Centrală Universitară „Mihai Eminescu”, 2006. - 65 p.
0690 949 Galvez, C., Moya-Anegón, F. - Approximate personal name-matching through finite-state graphs (Lang.: eng). – In: Journal of the American Society for Information Science & Technology, 58(2007)13, p.1960-1976.

95 Education and Training in Knowledge Organization
0691 952 Bawden, D. - Information seeking and information retrieval: the core of the information curriculum? (Lang.: eng). – In: Journal of Education for Library & Information Science, 48(2007)2, p.125-138.
0692 952 Brunt, R. – Information storage and retrieval in the professional curriculum (Lang.: eng). – In: Library Review, 56(2007)7, p.552-567.
0693 952 Lørring, L. - Didactical models behind the construction of an LIS curriculum (Lang.: eng). - In: Journal of Education for Library & Information Science, 48(2007)2, p.82-93.
0694 952 Poulter, A. – On reading “Information storage and retrieval in the professional curriculum” by Rodney Brunt (Lang.: eng). – In: Library Review, 56(2007)7, p.557-560.
0695 953 Spotti Lopes Fujita, M. - La enseñanza de la lectura documentaria en el abordaje cognitivo y socio-cognitivo: orientaciones a la formación del indizador [Teaching documentary reading from a cognitive and sociocognitive approach: orientation for the training of the novice learner] (Lang.: sp). – In: Anales de Documentacion, 10(2007), p.397-412.

98 User Studies
0696 981 Chun-Yao Huang et al. - Characterizing Web users' online information behavior (Lang.: eng). – In: Journal of the American Society for Information Science & Technology, 58(2007)13, p.1988-1997.

0697 982 Cole, C. et al. - A classification of mental models of undergraduates seeking information for a course essay in history and psychology: preliminary investigations into aligning their mental models with online thesauri (Lang.: eng). – In: Journal of the American Society for Information Science & Technology, 58(2007)13, p.2092-2104.
0698 982 Given, L.M. et al. – Inclusive interface design for seniors: image-browsing for a health information context (Lang.: eng). – In: Journal of the American Society for Information Science and Technology, 58(2007)11, p.1610-1618.
0699 982 Moore, J.L., Erdelez, S., He, W. – The search experience variable in information behaviour research (Lang.: eng). – In: Journal of the American Society for Information Science and Technology, 58(2007)10, p.1529-1547.
0700 982 Tenopir, C. et al. - Academic users’ interactions with ScienceDirect in search tasks: Affective and cognitive behaviors (Lang.: eng). – In: Information Processing & Management, 44(2008)1, p.105-121.


Knowl. Org. 34(2007)No.4 Personal Author Index 34(2007)


Personal Author Index

Abbas, J. 0634
Acosta, S. 0669
Agosti, M. 0548
Ahn, H. 0583
Amaral, C. 0622
Andrews, J.E. 0482
Angrosh, M. L. 0505
Arazy, O. 0535
Arencibia-Jorge, R. 0572
Arnott, G. 0675
Baca, M. 0664
Balíková, M. 0623
Balkhatir, M. 0662
Barátné Hajdu, Á. 0499, 0500
Basile, P. 0536
Bawden, D. 0691
Beck, H.W. 0532
Bederson, B. 0591
Bergenholtz, H. 0612
Bianchini, D. 0514
Bilodeau, B. 0567
Blanchard, A. 0637
Boldiš, P. 0604
Bonfiglio-Dosio, G. 0548
Borlund, P. 0649
Bornmann, L. 0645
Bothma, T. 0603
Bovbel, E. 0654
Broughton, V. 0480, 0481, 0491, 0582
Brunt, R. 0692
Buizza, P. 0522
Burton, W.C. 0579
Calhoun, K. 0670
Caranfil, L. 0553
Cathey, R.J. 0529
Chang, C-C. 0540
Chapman, A. 0671
Chapman, J.W. 0665
Charhad, M. 0662
Chatterjee, K. 0653
Chen, G. 0605
Chen, S.-C. 0653
Chen, Y.-H. 0615
Chen, Y-L. 0609
Chen Yintao 0632
Chen, Z. 0507, 0617
Cheng, L-C. 0609
Cheng, Y.-L. 0609
Cheung, K. 0526
Chiang, Huei-Min 0503
Chihaia, L. 0688, 0689
Chipcea, E. 0688
Choi, Key-Sun 0639
Chou, C.-H. 0615
Chun-Yao Huang 0696
Clavel, P. 0629
Cole, C. 0482, 0697
Conduit, N. 0658
Conway, C.N. 0495
Cordeiro, I. M. 0554
Costa, V. S. 0537
Covaci, M. 0688
Cuvillier, J. 0641
Czerniejewski, B. 0624
Dabney, D. 0578
Dahlberg, I. 0643
Dalbin, S. 0510, 0511
D’Ambrosio, D.M. 0672
Daniel, H-D. 0645
Danskin, A. 0681
De Campos, L. M. 0542
Dehuri, S. 0588
Dina, Y. 0580
Dmitry, M. 0654
Doucet, A. 0549
Druin, A. 0591
Dunlavy, D.M. 0530
Du Preez, M. 0497
Efron, M. 0552
Erdelez, S. 0699
Faber, P. 0614
Ferraioli, L. 0678
Ferro, N. 0548
Fielding, E. 0659
Fleharty, C. 0557
Fong, A.C.M. 0607
Fourie, I. 0603
Francis, E. 0635
Frâncu, V. 0555
Frosch, P. 0666
Fu, Y. 0590
Fu Lee Wang 0616
Fugmann, R. 0643
Gagnon, G. 0569
Galvez, C. 0690
Gery, M. 0550
Giunchiglia, F. 0544
Given, L.M. 0698
Goh, D.H. 0587
Gómez González-Jover, A. 0611
Gopal, T.V. 0502
Goyal, O.P. 0598
Han, C.-C. 0615
Han, I. 0583
Han, L. 0605
Harai, N. 0630
Harcourt, K. 0600
Harper, C.A. 0533
Haslinger, I. 0631
Hatfield, A.J. 0679
Haus, G. 0620
He, W. 0699
Hert, C.A. 0673
Heuwing, B. 0625
Hickey, L. 0579
Hidderley, R. 0682
Hillson, B. 0656
Hjørland, B. 0512
Hoover, L. L. 0642
Horowitz, E. 0546
Hsieh-Lee, I. 0655
Hsu, C-f. F. 0686
Hu, J. 0589
Hu, Y. C. 0595
Hu Yuefang 0632
Huang, Chun-Yao 0696
Huang, H. 0648
Hui, S.C. 0607
Hunt, K. 0518
Hunter, J. 0526
Hutchinson, H.B. 0591
Intner, S.S. 0496, 0674
Isaac, A. 0684
Jamsen, J. 0584
Janecek, P. 0597
Jimenez, A. G. 0515
Johnston, B. H. 0569
Kang, In-Su 0638
Kapoor, K. 0598
Karamuftuoglu, M. 0501
Karypis, G. 0606
Kasten, J. 0506
Ke, W. 0590
Kelley, S.D. 0679
Kelsey, P.J. 0633
Khairy, I. 0558
Kharkevich, U. 0544
Khoo, C.S.G. 0587
Kim, Jae-Ho 0639
Kim, K. 0583
Kim, S. 0532
Kishida, K. 0513
Kofnovec, L. 0680
Kousha, K. 0646
Kovac, T. 0556
Kuntz, B. 0565
Kunze, H. 0643
Kuramochi, M. 0606
La Barre, K. 0521
Lacoste, C. 0573
Laurent, D. 0622
Lazar, J. 0596
Lazarinis, F. 0585
Lazinger, S.S. 0674
Lehtonen, M. 0549
Leman, S. 0667
Levinson, D. 0576
L'Homme, M.C. 0613
Li, T. 0545
Li, Y. 0640
Li, Y-C. 0540
Liao Jianxin 0596
Lin, Wen-Yau C. 0519
Lioma, C. 0543
Lisek, S. 0626
Loh, H. T. 0541
Lopes, R. 0537
López-Huertas, M. J. 0577
Lørring, L. 0693
Lu, K. 0507
Madalli, P. 0516
Mall, R. 0588
Mandl, T. 0602, 0625
Manevitz, L. 0504
Mangan, E. 0663
Mao, C-C. A. 0686
Martí-Laher, Y. 0572
Matthews, D. 0644
Mazzocchi, F. 0621
Ménard, E. 0627
Michon, J. 0677
Miksa, F. 0479
Miksa, S.D. 0520
Miller, D. P. 0494
Mitchell, J. S. 0562, 0619
Modeste, J. 0580
Montgomery, P. 0559
Moore, J.L. 0699
Mørk, K. 0636
Mortchev-Bouveret, M. 0528
Mostafa, J. 0590
Moya-Anegón, F. 0690
Muh-Chyun Tang 0574
Munk, T.B. 0636
Naumis-Peña, C. 0581
Nero, L. M. 0619
Neumann, A. 0592
Nielsen, S. 0612
Niemi, T. 0584
Nissen, M.E. 0497
Ogihara, M. 0545
Ojala, M. 0524
O'Leary, M. 0601
Orphan, S. 0566
Ou, S. 0587
Ounis, I. 0543
Panici, A. 0508, 0687
Peng, D. 0538
Pestov, V. 0575
Petgnet, D. 0560
Petrou, A.D. 0676
Pinto, A. 0620
Ponnusamy, R. 0502
Popa, M. 0689
Poulter, A. 0694
Quesnel, O. 0635
Quinn, S. 0492
Raban, D.R. 0610
Rafferty, P. 0658, 0682
Ramírez, I de T. 0577
Rédy, G. 0592
Richman, D. 0561
Rorissa, A. 0660
Rousseau, R. 0647
Rowley, J. 0527
Ru, Y. 0546
Ryoo, J. 0593
Sagonas, K. 0537
Saiedian, H. 0593
Salo, J. 0498
Samantray, S. D. 0539
Satija, M. P. 0531
Sato, H. 0568


Sawyer, S. 0648
Schiff, A. 0564
Schneider, J.W. 0649, 0652
Seabrook, N. 0657
Shawe-Taylor, J. 0640
Shen, J-J. 0540
Shi, Xiaodong 0608
Shimada, M. 0525
Slavic, A. 0582
Smiraglia, R. 0490
Smith, S. 0557
Song, D. 0547
Soonja Lee Koh, G. 0523
Spink, A. 0482
Spotti Lopes Fujita, M. 0695
Stirling, D. A. 0571
Stojmirović, A. 0575
Strossa, P. 0586, 0628
Strotgen, R. 0625
Sukula, S. K. 0534
Sutó, Z. 0592
Tang, Muh-Chyun 0574
Taylor, A.G. 0494
Tennis, J.T. 0551
Tenopir, C. 0700
Thelwall, M. 0646
Tho, Q.T. 0607
Tiberi, M. 0621
Tillett, B. 0533
Trickey, K.V. 0493
Uddin, M.N. 0597
Ungváry, R. 0517
Uralman, N. H. 0668
Urs, S. R. 0505
Van der Linden, H. M M. 0509
Van Otegem, M. 0631
Vanclay, J.K. 0650
Vasudev, P. 0539
Vega-Almeida, R.L. 0572
Vizine-Goetz, D. 0562, 0619
Wacker, M. 0600
Wang, Fu Lee 0616
Wang, Tai-Yue 0503
Weihs, J. 0562, 0674
Wells, D. 0599
Whited, M. 0618
Williamson, N.J. 0594
Wolley, I. 0600
Woo, C. 0535
Xiaodong Shi 0608
Xie, L. 0605
Yang, C. 0616
Yang, C.C. 0608
Yang, S. 0661
Yee, M. M. 0683
Yintao, Chen 0632
Yousef, M. 0504
Yuefang, Hu 0632
Zaihrayeu, I. 0544
Zanotto, E. D. 0651
Zhan, J. 0541
Zhang, R. 0595
Zhu, S. 0545
Zhu Xiaomin 0596


CULTURE NEEDS BUSINESS
BUSINESS NEEDS CULTURE

“Winning sponsors - but how?”

Training courses for cultural institutions and artists

At a time of shrinking public subsidies, business and culture must work together to keep culture alive as a pillar that matters to both partners.

How do you find sponsors? And where?

How do you approach them?

How do you arrive at a partnership that is fruitful for both sides?

Media-Agentur Schaefer - Dr. Frauke Schaefer - Lange Straße 14 - 04103 Leipzig
Tel.: +49 341/30 10 620 - Fax: +49 341/30 10 621 - [email protected]

www.media-schaefer.de


HARRASSOWITZ VERLAG • WIESBADEN
www.harrassowitz-verlag.de · [email protected]

Dan Hazen, James Henry Spohrer (Eds.)

Building Area Studies Collections
Beiträge zum Buch- und Bibliothekswesen, Volume 52
2007. VIII, 163 pages, hc
ISBN 978-3-447-05512-3
€ 68,− (D) / sFr 116,−

These essays by noted Area Studies specialists at a number of US research libraries serve as a practical and theoretical guide to university and college administrators, library directors and heads of collection development, as well as selection practitioners who work to create foreign-language collections for research libraries. The volume constitutes a general introduction for new practitioners and even the most experienced Area Studies librarians will find useful practical advice for reviewing and refining their existing collecting practices. Coverage includes the Middle East, East Asia, Latin America, Southeast Asia, Africa, and the Romance language areas of Europe, as well as the German/Nordic/Netherlandic countries. Each essay presents the Area Studies topic in question from an historical perspective and provides background on its present status and anticipated future development. Special emphasis is placed on the techniques of both print and digital collecting and on the assessment methods by which collection strengths and future needs are determined. Guidelines for expenditures for both collections and collateral activities such as providing access and preservation are provided, and contributors also supply extensive documentation for the burgeoning array of online digital resources which have emerged in the past decade. The volume editors, Dan C. Hazen (Harvard) and James H. Spohrer (University of California, Berkeley), also provide a general introduction to the topic and a detailed summary of current cooperative activities in Area Studies collecting.

Konrad Umlauf

Medienkunde
2nd, updated and revised edition
With the collaboration of Susanne Hein and Daniella Sarnowski
Bibliotheksarbeit 8
2006. 350 pages, 45 tables, pb
ISBN 978-3-447-05052-4
€ 34,− (D) / sFr 59,−

In libraries, archives, and media collections in general, holdings today are multimedia. This book presents, in context:

• the technical foundations of non-print media,
• their production and distribution,
• the structures and focal points of their content and forms of presentation: music, film, electronic publications, literature and children's programmes, computer and video games,
• key findings of reception research relating to non-print media.

Ewa Bagłajewska-Miglus, Rainer Berg

POLNISCH
Wörterbuch für Bibliotheken
Deutsch-Polnisch, Polnisch-Deutsch
Bibliotheksarbeit 13
2006. XXVIII, 320 pages, hc
ISBN 978-3-447-05323-5
€ 49,80 (D) / sFr 86,−

German-Polish relations are becoming ever more varied and intensive, in the library field as well. The „Wörterbuch für Bibliotheken“, with its roughly 7,000 specialist terms, is aimed primarily at all those who need German and Polish in their everyday library work. Beyond that, it can also be of great use to scholars and other interested readers in both countries. The dictionary covers the vocabulary relevant to the book and library sector, and thus also the terminology of computer science and the computing world, as comprehensively as possible.

Jahrbuch der Deutschen Bibliotheken
Vol. 62, 2007/2008
Edited by the Verein Deutscher Bibliothekare
2007. 594 pages, hc
ISBN 978-3-447-05526-0
€ 79,− (D) / sFr 134,−

The yearbook has appeared every two years since 1902 and provides information on, among other things, the staff, organization, collecting areas, and budgets of more than 700 academic libraries. The directory of academic libraries contains names and addresses, telephone and fax numbers, e-mail addresses and homepages, as well as the size of the holdings, more detailed information on the nature of the holdings, the size of the textbook collection, the level of acquisition funds, opening and lending hours, the number and grading of staff, the names of academic staff, publications about the library and, where applicable, information on an existing legal deposit right, a collection of official publications, and special collecting areas.

Engelbert Plassmann, Hermann Rösch, Jürgen Seefeldt, Konrad Umlauf

Bibliotheken und Informationsgesellschaft in Deutschland
Eine Einführung
2006. XI, 333 pages, 6 maps, hc
ISBN 978-3-447-05230-6
€ 39,80 (D) / sFr 69,−

For almost 3,000 years, communication in society has relied on libraries and their precursors. With this volume the authors aim to show that professional information management is more important than ever in the information society. They also aim to demonstrate that libraries and library techniques are excellently suited to this task, provided that the necessary innovations and collaborations are initiated swiftly and practised consistently.