Draft paper for New Review of Hypermedia and Multimedia

Faceted classification as a basis for knowledge organization in a digital environment; the Bliss Bibliographic Classification as a model for vocabulary management and the creation of multidimensional knowledge structures.

Abstract: The library classification scheme was the first means of subject access to information, but is largely disregarded as a tool for the management of electronic resources; modern classifications built on facet analytical principles are more appropriate to this purpose than is generally realised. Faceted classifications as exemplified by the Bliss Bibliographic Classification (BC2) are powerful tools for the management of vocabulary, characterised by a rigorous analytical approach to terms, and the clear identification of semantic and syntactic relationships and structures. The philosophy and function of BC2 are described, as is the process of building a knowledge structure on facet analytical principles. The range of related functions of such structures when employed as knowledge management tools (as classification, thesaurus, subject heading list, browsable index) is considered, as is the potential of facet analytical knowledge structures for the management of digital materials. Facet analysis is regarded as a powerful methodology for the creation of structures appropriate to specific retrieval requirements in a range of contexts, with emphasis on the problems of complex subject description and retrieval and multidimensionality.

1. The historical perspective; library classifications as retrieval tools

Historically, subject access to document collections was only achieved with the arrival of the bibliographic classification scheme, or its close relation the alphabetical subject heading list. Prior to this, arrangement of books was usually arbitrary; some principle of organization might be employed, such as the date of accession, or the name of the author, but the retrieval of a document by its intellectual content was usually impossible. Catalogues operated on the basis of author or title entry, and known-item retrieval was the only way to access the collection (except insofar as the librarian was familiar with the contents of the library).

At the end of the nineteenth century, therefore, the classification scheme was innovative in its approach, effective in its purpose and intellectually credible.

Classification schemes at that time had two distinct, but interdependent, functions. The first, and always dominant, purpose was the physical arrangement of the items in the collection. In addition to broad subject grouping, the library organized by a classification scheme offered the advantage of browsability; not only were documents about the same subject placed together, but they were also contiguous with documents about related subjects, and there was a general progression along the shelves (and through the classified file or catalogue).

The second purpose was retrievability. Not only could the user scan the shelves or files for all that was available in a broad subject area, he could also retrieve an item on a specific subject (provided of course that the detail of the classification allowed for this). This had always been recognised as a function of the classification scheme, from its inception. Dewey (1) himself recommends the close classification of all documents as essential to the function of the classification scheme. “Assign every book to the most specific hed that wil contain it: do not mark all ancient history 930, but put history of ancient Rome in 937, of ancient Greece in 938, etc. If there ar only 10 books on a given subject it is useful to hav them stil farther groupt by topics or form of treatment, for otherwise, they hav only accidental order, which servs no one”. [The spelling is Dewey’s own. He was an advocate of spelling reform, and the Introductory matter in the Decimal Classification was in simplified spelling until the 16th edition.]

Nevertheless, in the twenty-first century classification schemes are largely disregarded as effective mechanisms for indexing or searching over very large databases. A recent survey by Koch (2) of the use and effectiveness of traditional classification schemes for the organization of managed Internet hubs and gateways was generally critical of them. Several features of traditional schemes suggest why they fail in this respect; these arise from the lack of internal logic and structural consistency that characterises the classification schemes originating in the nineteenth century.

2. Characteristics of the traditional library classification scheme

Classifications are principally distinguished by two factors: their approach to terminology, and the basis of their internal structure.

The significant attribute that sets a classification apart from a word list, thesaurus, or set of subject headings, is the systematic or structural basis of the classification. Its approach is to collocate like with like, and to present an ordered representation of knowledge. From this spring some distinctive properties of the classification scheme:

Classifications have traditionally been thought of as managing concepts rather than terms; the classification concerns itself with ideas rather than words, and a class represents the notion of, say, a parrot, or a typewriter, rather than the word itself. To this end the control of vocabulary is an important feature of such systems; synonyms and quasi-synonyms are identified and treated as equivalents, and other semantic relationships are incorporated into the structure of the scheme.

Within such a system notational encoding is essential to the management of concepts, since otherwise there is no means of maintaining a linear order (the first requirement of the conventional scheme). The notation is also seen as in some way representative of the concept it is attached to. With the arrival of automation in libraries more emphasis was placed on the retrieval function of classifications; classifications such as the Universal Decimal Classification (3), with its expressive notation and high level of specificity, were regarded as particularly suitable for use with mechanised systems, since the notational codes could be searched in a similar way to keywords.

There is a generally held view that the use of concepts avoids the imprecision and complications of natural language; although this is true to a large extent, and the classification is a good working tool in this respect, difficulties can still be encountered. Perfect synonymy is rare, and, particularly in a multilingual environment, problems arise from the lack of hierarchical correspondence between the conceptual structure of one natural language and that of another. An IFLA working party led by Riesthuis (4), looking at the issue of inter-language switching, has identified a number of problems, particularly regarding the lack of equivalents for names of groups and sets (as opposed to single entities).

However, the major shortcoming of the early library classifications lies in their internal structure. Classifications originating at the end of the nineteenth century do not, on the whole, display very much in the way of internal logic or fundamental structural principles; classificatory theory was not expressed in a formal way before the 1920s and 1930s, mainly in the works of H. E. Bliss (5, 6) and S. R. Ranganathan (7, 8). The most that can be said in the majority of cases is that they view knowledge as consisting of a number of main classes (disciplines or fields of study) which are then divided and subdivided into smaller and more specific sub-classes. There are usually no rules governing how this is done. Beginning as they do from the point of view of knowledge as an integral whole, and taking a discipline- or aspect-based (as opposed to entity-based) approach to knowledge, traditional schemes are also ineffectual at addressing the specific problems of vocabulary; they do not consider the precise relations between concepts, but concern themselves mainly with displaying relationships of hierarchical subordination, whether these are justified or not.

3. Facet analysis as a basis for vocabulary management

Classifications developed from the mid-20th century onward take a fundamentally different approach to the problem of knowledge organization. Instead of dividing a large theoretical ‘homogenous mass’ of knowledge, classificationists began with individual terms or concepts and built these upwards into structures reflecting identified items of knowledge.

The method of facet analysis was conceived by S. R. Ranganathan (7, 8) in the 1930s. It is a methodology which involves creating classificatory structures “from the bottom up”, that is, starting with the individual terms or concepts of a subject, rather than working “from the top down”. Ranganathan, while still working within the broad structure of the traditional disciplines (history, chemistry, religion, arts, etc.), began with the individual terms in each subject area. These terms he organized into five broad categories or facets:

Personality – the most significant group of terms in each discipline e.g. substances in Chemistry, plants in botany, nations in history

Matter – materials and physical constituents of things

Energy – action or activity terms

Space – where things occur

Time – when they occur

Terms from each category are then combined (in the order PMEST) to provide for the description and location of documents with compound subject content. The classification itself consists only of codes representing simple concepts or isolates, which are combined when required, using the facet formula. The thinking behind this approach to the understanding of intellectual content underpins all of modern facet classificatory theory, and in it lies the essence of the move away from the enumerative systems, such as the Dewey Decimal Classification (9) or the Library of Congress Classification (10), which list almost all possible classes, including many compound subjects.
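The principle of combining simple isolates in a fixed category order can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Colon Classification's notation or rules: the isolates, their codes and the " / " separator are all hypothetical, and only the PMEST sequence itself comes from the text.

```python
# Minimal sketch of facet synthesis in PMEST citation order.
# The isolates, their codes and the " / " separator are hypothetical
# illustrations, not Colon Classification notation.

PMEST_ORDER = ["Personality", "Matter", "Energy", "Space", "Time"]

# Toy schedule of isolates: term -> (fundamental category, invented code).
ISOLATES = {
    "wheat": ("Personality", "W"),
    "harvesting": ("Energy", "h"),
    "India": ("Space", "44"),
    "1990s": ("Time", "N9"),
}


def synthesize(terms):
    """Return a compound class built from `terms`, cited in PMEST order."""
    rank = {category: position for position, category in enumerate(PMEST_ORDER)}
    # Look up each term, then sort by the rank of its category so that the
    # indexer's input order has no effect on the result.
    isolates = sorted((ISOLATES[term] for term in terms), key=lambda pair: rank[pair[0]])
    return " / ".join(code for _category, code in isolates)


print(synthesize(["India", "harvesting", "wheat", "1990s"]))  # W / h / 44 / N9
```

Because the sequence is fixed by the categories rather than by the indexer, the same set of concepts always yields the same compound, which is what makes the placement of compound subjects predictable.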

The fundamental PMEST analysis is the foundation for the development of the first faceted scheme of classification, the Colon Classification (11), which is still widely used in India today, and has been occasionally used in the United Kingdom.

Ranganathan’s basic categorization, although it represents a colossal breakthrough in the methodology of indexing, contains a very small number of categories with which to manage the complexities of relationships as they occur in modern documentation. As a consequence, the Colon Classification has to use the notion of different rounds and levels, particularly of the Personality and Energy facets, in order to cope with obviously distinct sub-facets within a given subject. While the Colon Classification clearly identifies the facet structure of specific subjects, the facet formulae for the relevant classes will normally consist of repetitions of P, M and E. For example, the facet formula which is to be applied in Class L, Medicine, is L, [P]; [M-P]: [E], [2P], and in Class O, Literature, it is O [P], [P2] [P3], [P4]. The Colon Classification is therefore rather complex to apply, and has not generally found favour outside the sub-continent. The last known British application of the Colon Classification is at Christ’s College, Cambridge, where, sadly, it is in the process of being replaced by the Library of Congress Classification. The Library website (12) says of this decision “… the classification scheme was … becoming increasingly impractical. It was also an extremely difficult system to use. The Library of Congress scheme has been adopted to replace Colon. This will be much more straightforward to use….” Nowadays the convenience of using an institutionally supported scheme with centralised services commonly overrides questions of effective performance, and it remains to be seen whether the Library of Congress Classification will function as effectively as a retrieval tool.

4. The development of faceted classification

There is currently much interest in faceted classification, although facet analysis has many disparate interpretations. At the broadest level it is taken to mean any system in which there is any element of synthesis in document description, whether this is in the form of words, or an encoded notation. This approach shows little understanding of the techniques involved in facet analysis, or indeed of the complexity of the analysis and the degree of sophistication that can be achieved in document description. Often a faceted classification is understood to be no more than a scheme in which different aspects of a document can be identified and tagged; these aspects might be no more than the form of the document, or the date or place of publication. I have even heard the subject of a document described as ‘one facet’ (this within the context of the Library of Congress Classification, which is indeed by this standard a faceted classification, since the classmarks denote the subject, author and date of publication). Much of the discussion of faceted classification on the Web regards it in this rather basic way.

Nevertheless there have been some serious studies of the role of faceted classification as a retrieval tool, which consider its usefulness as an approach to information handling beyond the management of physical collections. Prior to the emergence of the World Wide Web these focused on the suitability of facet techniques for storage and retrieval of managed on-line databases.

In a paper which considers the process of searching across on-line databases using faceted classifications for index description (including Library and Information Science Abstracts, which employs the CRG LIS classification (12)), Godert (13) demonstrates that faceted classifications are also effective in supporting searching using Boolean operators and truncation. “It also becomes clear … that classificatory structures permit the answering of questions that cannot be answered by verbal questions alone, …. without the elaboration of conceptual networks.”
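How Boolean operators and truncation can work over faceted classmarks may be easier to see in a small Python sketch. Everything in it is assumed for illustration: the documents, the facet names and the codes are hypothetical, and the notation is taken to be hierarchically expressive (a code's prefix denotes its broader class), which, as noted later for BC2, is not always true of real schemes.

```python
# Minimal sketch of Boolean retrieval with right truncation over
# faceted classmarks. Documents, facet names and codes are hypothetical,
# and the notation is ASSUMED to be hierarchically expressive
# (a code's prefix denotes its broader class).

DOCUMENTS = {
    "d1": {"entity": "B3", "operation": "K1"},
    "d2": {"entity": "B31", "operation": "K1"},
    "d3": {"entity": "B31", "operation": "K2"},
    "d4": {"entity": "C7", "operation": "K1"},
}


def code_matches(code, pattern):
    """A trailing '*' means right truncation; otherwise require an exact match."""
    if pattern.endswith("*"):
        return code.startswith(pattern[:-1])
    return code == pattern


def facet_matches(code, patterns):
    """A tuple of patterns is an OR within a single facet."""
    if isinstance(patterns, str):
        patterns = (patterns,)
    return any(code_matches(code, pattern) for pattern in patterns)


def search(query):
    """Return documents satisfying every facet in `query` (Boolean AND)."""
    return sorted(
        doc_id
        for doc_id, facets in DOCUMENTS.items()
        if all(facet in facets and facet_matches(facets[facet], patterns)
               for facet, patterns in query.items())
    )


print(search({"entity": "B3*", "operation": "K1"}))  # ['d1', 'd2']
print(search({"entity": ("C7", "B31")}))             # ['d2', 'd3', 'd4']
```

An exact pattern such as "B3" retrieves only d1, while the truncated "B3*" also gathers the more specific class B31; that gathering of subordinate classes is what truncation on a hierarchical notation is meant to provide.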

In an investigation of the application of Ranganathan’s principles to a range of information retrieval problems, Ingwersen and Wormell (14) conclude that “… the discussion demonstrates the suitability of the faceted categorisation, not only for textual documents but also with other forms of carriers of information. Faceted categorisation may provide multi-dimensional and hence structured entry points to document contents, and thus give intellectual access to generated and stored knowledge.”

The effectiveness of a logically structured system applicable to any type of information carrier, and with potential for multiple points of access, had therefore already been recognised as particularly appropriate for the management of documents within an electronic environment, at a stage when that environment was still relatively controlled, and Ingwersen and Wormell seem to point the way ahead for the application of facet analysis to digital libraries and to the World Wide Web.

5. Development of facet analysis in the United Kingdom

Facet analytical theory as it is generally understood in Europe today was developed largely by the members of the United Kingdom Classification Research Group during the 1950s and 1960s. This group expanded Ranganathan’s original five categories (Personality, Matter, Energy, Space and Time) to a set of thirteen categories (see below) which are used for the analysis and organization of terms. In addition to this categorical analysis, a standard citation order, or order of combination between the categories, was also developed, based on various principles of ordering, such as the progression from general to special, increasing concreteness of terms, and pragmatic order based on preferred collocations of documents.

The combination of categorized sets of terms (which displayed taxonomic features of equivalence, and super- and sub-ordination) and the imposition of rules for combination allowed the generation of complex subject classifications which are systematic and predictable, and which exhibit a rigorous internal logic. Examples of special classifications developed during this period are those by Coates (15), Foskett (16), Langridge (17), Croghan (18), Broxis (19) and Daniel (20), while the general theory of faceted classification construction is represented in Vickery’s work (21). Vickery’s volume on faceted classification in science (22) also includes model faceted classifications for soil science and container manufacture prepared by him and Foskett. Other classifications, and thesauri built on faceted principles with constituent classificatory structures, include Thesaurofacet by Jean Aitchison (23), the prototype for modern thesauri, and the Construction Industry Thesaurus (24).

These classifications were intended for the organization of documentation, whether in the form of a physical collection, or in a catalogue or published bibliography; the primary purpose of the knowledge structure created was the reduction of a complex n-dimensional subject to a formulaic expression which could be predictably located in a linear sequence. The process is not dissimilar to that of map referencing using gridlines to provide a numerical code for the location of a feature, although in the case of the classification, any number of dimensions can be accommodated. Retrieval from such an organized sequence is extremely efficient, and a considerable advance on the retrievability that can be achieved using older, less rigorously structured classifications.
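The "map reference" analogy can be sketched in Python, assuming that each compound subject has already been reduced to a tuple of facet codes in citation order. The subjects and codes are hypothetical, and ordinary string comparison stands in for the filing order of a real notation.

```python
# Minimal sketch of the "map reference" idea: each compound subject is
# assumed to be already reduced to a tuple of facet codes in citation order.
# Codes are hypothetical, and plain string comparison stands in for the
# filing order of a real notation.

SUBJECTS = {
    "Wheat harvesting in India": ("W", "h", "44"),
    "Barley": ("B",),
    "Wheat": ("W",),
    "Barley harvesting": ("B", "h"),
    "Wheat harvesting": ("W", "h"),
}


def linear_order(subjects):
    """Return subject names in filing order.

    Tuples compare facet by facet, like successive grid coordinates, and a
    tuple that is a prefix of another sorts first, so a broader subject
    files before its more specific combinations.
    """
    return [name for name, key in sorted(subjects.items(), key=lambda item: item[1])]


for name in linear_order(SUBJECTS):
    print(name)
```

However many facets a subject has, each one acts as a further coordinate, and the result is still a single predictable position in one linear sequence.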

It seems likely that such logical and predictable structures may be useful not only within the environment of a print-based physical document collection, but also within a digital context, where linearity is not a consideration, but multidimensionality is, and where conventional classification schemes fail to address the problems of subject description and retrieval at an appropriate level of sophistication. The use of both the semantic structures (categorization) and system syntax (citation order) of classical facet analysis to generate knowledge structures which fully express and exploit the multiplicity of attributes and relationships of digital objects, would appear to be a logical next step in the development of the facet analytical methodology.

6. The Bliss Bibliographic Classification

The only example of a universal system of bibliographic classification built on the facet analytical principles developed by the CRG is the Bliss Bibliographic Classification 2nd edition (25), hereafter referred to as BC2.

Originally intended as a straightforward revision of the original Bibliographic Classification of H. E. Bliss (26), BC2 later came to be regarded as the new general classification scheme long envisaged by the UK Classification Research Group, which would embody all of the theory developed by them during the latter half of the twentieth century. The history and background of this proposed new general classification and its relation to BC2 are discussed in a recent article reviewing the work of the CRG (27).

Bliss’s original classification is highly regarded for the scholarly principles underlying its construction, and many of its characteristic features have been preserved. The main class order is maintained, and there is a general correspondence to the overall structure of most classes. But while BC2 sits on the infrastructure of the original scheme, much of its detail and internal structure is new. At the broad level notation has been retained as far as possible, but the exponential growth in most disciplines since the time of Bliss, the consequent increase in the size of terminologies, and the complexity of the subject content of documentation mean that the detail of the new classification is considerably different from the original. The Bliss Classification Association (28) holds the copyright on the new edition and is responsible for the management of the scheme.

The classification is published in separate subject volumes, together with an Introduction (29) which deals at some length with the theory and structure of the classification, and rules for its practical application. (The Introduction to BC2 also constitutes one of the few published statements of modern facet analytical theory.) The development of what is essentially a series of subject-specific classifications has provided a testing ground for the underlying theory. The mechanics of schedule construction have become considerably more sophisticated during this process, and the method of facet analysis has been found workable across a number of disciplines, including the most obscure and intractable of subjects.

In the rest of this paper BC2 is used as the model for the modern faceted classification scheme, and examples are taken from BC2 schedules throughout. Although others are also active in this field (notably at the Documentation Research and Training Centre in Bangalore where the Colon Classification continues to be developed) the confines of this paper do not permit a detailed discussion of all current work on faceted classification.

7. Relationships between terms within a controlled vocabulary

Modern controlled vocabularies will normally identify and display two major categories of relationships: semantic relationships and syntactic relationships. This will be true both of faceted classifications and of thesauri built on the same principles. It is relatively easy to generate a thesaurus from a faceted classification; the resultant thesaurus will preserve the taxonomic relationships of the original classification, as well as general syntactic relations, and the thesaurus may be regarded in this respect as one embodying facet analytical principles. A thesaurus of this type will normally consist of the faceted classificatory or systematic display and a complementary alphabetical list of descriptors.

The following discussion concerns itself only with the relationships contained in the systematic structure or classification. The process of conversion from a faceted systematic structure to an alphabetical thesaurus display is clearly articulated in the work of Aitchison and Gilchrist (30), and elsewhere Aitchison (31) has described a methodology for deriving a thesaurus specifically from BC2 structures.

7.1 Semantic relationships

These are relationships between terms which deal with meaning and content, and include relationships such as those of containment, equivalence, near equivalence, and super- and sub-ordination.

Relationships of this type constitute the permanent hierarchical relationships in a taxonomy. These are relationships that will be found between terms in the same facet, and will principally consist of thing-kind and whole-part hierarchical relationships.

The following are examples of hierarchical relationships of both types, taken from BC2 schedules. [Note that the notation in BC2 is not always expressive of hierarchy i.e. the number of characters in the notation does not correspond to the level of hierarchy in the display; this is because the notation is only required to maintain the linear order.]

Thing-kind (genus-species) relationships

GRC Carnivora
GRE Felidae. Cat family
GRE M Felis domestica. Domestic cat

EHN U Lakes
EHN UP Open lakes
EHN UR Closed lakes

Whole-part relationships

HWJ Mouth (parts)
HWJ SY Glands of the mouth
HWJ U Parotid gland
HWJ VP Palate
HWJ W Lips

EA United Kingdom
EB England
EF East Anglia
EFG Suffolk

These relationships are equivalent to the BT (broader term) and NT (narrower term) relationships in a thesaurus. Although there may be occasional examples of relationships which can alter (such as political subdivisions), these relationships are normally permanent.
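Since the BC2 notation is not always expressive of hierarchy, a machine-readable version of a systematic display would need to record each class's immediate broader class explicitly. The Python sketch below assumes such a parent link (a data layout chosen for this illustration, not a BC2 format) and derives thesaurus-style BT and NT statements from it, using the whole-part example for the United Kingdom quoted above.

```python
# Minimal sketch: deriving thesaurus BT/NT relations from a systematic display.
# Because BC2 notation is not always expressive, each class records its
# immediate broader class explicitly (this layout is an assumption).

SCHEDULE = [
    # (notation, caption, notation of broader class)
    ("EA", "United Kingdom", None),
    ("EB", "England", "EA"),
    ("EF", "East Anglia", "EB"),
    ("EFG", "Suffolk", "EF"),
]


def thesaurus_relations(schedule):
    """Map each caption to its broader (BT) and narrower (NT) terms."""
    captions = {notation: caption for notation, caption, _ in schedule}
    relations = {caption: {"BT": [], "NT": []} for _, caption, _ in schedule}
    for _, caption, parent in schedule:
        if parent is not None:
            broader = captions[parent]
            relations[caption]["BT"].append(broader)  # e.g. England BT United Kingdom
            relations[broader]["NT"].append(caption)  # e.g. United Kingdom NT England
    return relations


relations = thesaurus_relations(SCHEDULE)
print(relations["England"])  # {'BT': ['United Kingdom'], 'NT': ['East Anglia']}
```

Each hierarchical link is stated once in the schedule and produces a reciprocal BT/NT pair, which is one reason a thesaurus derived from a faceted classification keeps the classification's taxonomic structure intact.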

Also included are relevant relationships in the area of vocabulary control (synonymy, partial synonymy and conceptual equivalence) that are normally described by USE and UF tags in a thesaurus; these are usually collocated in a classification, e.g. in BC2 Class AY/B General Science and Physics, we find:

BKN L Low frequency waves, kilometric waves, LF waves
BKP Very high frequency waves, VHF waves, metric waves, millimetre waves

Here the notation is used to control the vocabulary, rather than the nomination of preferred terms as in a thesaurus. Preferred terms need not be identified since the notation is used to represent all synonyms or near-synonyms.
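Control of vocabulary through notation, with no preferred term, can be sketched as follows. The class entries are those quoted from BC2 Class AY/B above; the data structure and the index format are assumptions made for this illustration. Each synonym or near-synonym points to the same class, and an alphabetical index is generated from the classified entries.

```python
# Minimal sketch: the class notation, not a preferred term, controls the
# vocabulary. Every synonym or near-synonym points to the same class, and
# an alphabetical index is generated from the classified entries.
# Entries are those quoted from BC2 Class AY/B in the text.

CLASSES = {
    "BKN L": ["Low frequency waves", "kilometric waves", "LF waves"],
    "BKP": ["Very high frequency waves", "VHF waves", "metric waves",
            "millimetre waves"],
}


def alphabetical_index(classes):
    """Return (term, notation) pairs sorted A-Z, ignoring case."""
    entries = [(term, notation) for notation, terms in classes.items() for term in terms]
    return sorted(entries, key=lambda entry: entry[0].casefold())


for term, notation in alphabetical_index(CLASSES):
    print(f"{term} -> {notation}")
```

Where a thesaurus would need USE and UF references between the terms, here every entry leads straight to the shared notation.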

Relationships that do not conform to these criteria are considered not to be true relationships of containment or subordination; although such relationships occur frequently in the taxonomies of bibliographic classifications, they are more usually examples of syntactic relationships.

7.2 Syntactic relationships

These are relationships between terms that are not significant in terms of hierarchy or meaning. The commonest examples are the relationships between the component words of compound terms and subjects, e.g. heart surgery, fashion photography. These relationships are essentially impermanent, i.e. they do not state a necessary, continuing, or dependent relationship between the component terms; the relationship only exists when the concepts are brought together within the compound notion. In the example of, say, wheat harvesting, there is no essential connection between the two terms; the relationship is that of an object and an operation on it – harvesting is not a kind, or part, of wheat, and as an operation it can be linked with any other entity or crop.

It is sometimes the case that these intersections give rise to unique terms; for instance, the combination of ‘joints’ and ‘inflammation’ creates the concept of ‘arthritis’, and in chemistry the joining of the notion of a carboxyl radical with a methyl radical gives us the term ‘acetic acid’. From the indexer’s point of view, these terms need to be identified and acknowledged in the systematic display and in accompanying alphabetical indexes and thesauri. The relationship between the contributing elements and the new term seems ‘stronger’ and more permanent than the mere coupling of terms from different areas of the vocabulary. Nevertheless, there is no necessary connection between, for example, the two concepts appendix and surgery except when they occur in combination in the concept appendectomy. It can be seen that appendectomy is not a true sub-class of either of its constituent elements, and should not be subordinated to either (in a hierarchical sense).
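One way of keeping the two kinds of relationship apart in a machine-readable vocabulary is sketched below in Python, under a data model assumed for this illustration. Hierarchical (BT) links are accepted only between terms in the same facet, which follows the statement above that semantic relationships are found within a facet. A compound such as appendectomy is recorded as a combination of its constituents instead of being subordinated to either. The facet assignments are illustrative and are not taken from BC2.

```python
# Minimal sketch: hierarchical (BT) links are accepted only between terms
# in the same facet; a compound concept is recorded as a syntactic
# combination of its constituents. Facet assignments are illustrative only.

FACET = {
    "large intestine": "Part",
    "appendix": "Part",
    "surgery": "Operation",
    "excision": "Operation",
}

# Compound concepts point at their constituents instead of at a broader term.
COMPOUNDS = {"appendectomy": ("appendix", "surgery")}


def broader_link(narrower, broader):
    """Return a (narrower, 'BT', broader) link, or raise if it is not hierarchical."""
    if narrower in COMPOUNDS:
        raise ValueError(
            f"{narrower!r} combines {COMPOUNDS[narrower]}; "
            "record it as a combination, not as a subclass"
        )
    if FACET[narrower] != FACET[broader]:
        raise ValueError(
            f"{narrower!r} ({FACET[narrower]}) and {broader!r} ({FACET[broader]}) "
            "are in different facets"
        )
    return (narrower, "BT", broader)


print(broader_link("appendix", "large intestine"))  # whole-part, same facet
print(broader_link("excision", "surgery"))          # thing-kind, same facet
try:
    broader_link("appendectomy", "appendix")
except ValueError as error:
    print("rejected:", error)
```

A check of this kind would reject the sort of cross-facet subordination that the text goes on to criticise in enumerative schemes.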

Within the context of specific document descriptions, the number and nature of non-permanent syntactic relationships tends to increase; for example, in documents with subjects such as the following:

i. Distribution of long wave photoreceptors in the compound eye of the honey bee
ii. Non-visual migration orientation of European robin
iii. Effect of ionising radiation on the chromosomes in meiotic and mitotic cells

In example iii the following categories are represented:

Entities (cells) – kinds (meiotic, mitotic) – parts (chromosomes) – relation (effect) – process (ionising) – agent (radiation)

There are clearly multiple and complex relations between the various constituent elements; none of these are topics that can be regarded as simple sub-sets of the broader class Cytology, and the placing of such a compound must be subject to some system of regulation.

Nevertheless, conventional library classification and indexing schemes (such as DDC, UDC and LCC) frequently fail to make adequate distinction between permanent hierarchical relationships, and relationships of syntactic association in complexes. As a result, structures are not logical (since the analysis is not rigorous), positioning of compound subjects is not predictable (since no operating rules for combination are normally present), and retrieval is unreliable.

The following illogical, but not untypical, hierarchy can be found in Class H, Social sciences, of the Library of Congress Classification (32):

HV Criminal psychology
HV 6083 Criminal responsibility
HV 6085 Language of criminals. Slang
HV 6089 Prison psychology
Intellectual and aesthetic characteristics
HV 6098 Tattooing
HV 6110 Hypnotism and crime

Here there are a number of illogical subordinations, which reflect the compounding of conceptually distinct terms (criminal responsibility and psychology, tattooing and prison) rather than the containment of one by the other.

In contrast, a fully faceted classification scheme such as BC2 rests on a sound theoretical basis for the construction of knowledge structures and semantic networks, one which distinguishes clearly between semantic and syntactic relations.

8. Features of the facet analytical structure of BC2

8.1 The analysis and organization of vocabulary

Facet analysis provides a means for the primary organization of any vocabulary using a category based structure.

In BC2 thirteen standard categories are used, which have been developed from Ranganathan’s original five categories of Personality, Matter, Energy, Space and Time. The standard categories recognised in ‘classical’ facet analysis are:

Thing/entity
Kind
Part
Property
Material
Process
Operation
Patient
Product
By-product
Agent
Space
Time
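For readers building tools on this model, the list can be held as an ordered enumeration. In this minimal sketch the member names are our own labels, not BC2 terminology, and the numbering simply follows the order of the list above, which section 9 identifies as the standard citation order.

```python
from enum import IntEnum


class Category(IntEnum):
    """The thirteen standard categories, numbered in the order listed above.

    Member names are illustrative labels chosen for this sketch; BC2 itself
    does not define them.
    """
    THING = 1  # thing/entity
    KIND = 2
    PART = 3
    PROPERTY = 4
    MATERIAL = 5
    PROCESS = 6
    OPERATION = 7
    PATIENT = 8
    PRODUCT = 9
    BY_PRODUCT = 10
    AGENT = 11
    SPACE = 12
    TIME = 13


# IntEnum members compare as integers, so any set of categories can be put
# back into the standard order simply by sorting it.
found = {Category.TIME, Category.PART, Category.THING}
print([c.name for c in sorted(found)])  # ['THING', 'PART', 'TIME']
```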

It can be seen that the categorical analysis is on a functional basis, in which there are two different types of functionality: linguistic function and operational function. In essence, the categories represent a ‘production process’, and are particularly suitable for the analysis and organization of terms in technology (the field for which they were originally developed). In analysing a given field one can ask: what is being done (or produced), what are its parts and properties, how is this achieved, by what means and by whom, where and when?

The categories can also be approached on a linguistic basis, since they can be equated with various parts of speech; entities and their types and parts are usually nouns, properties are adjectives, processes intransitive verbs, operations transitive verbs, and so on. This is a helpful technique for students and others who are new to the idea of categorical analysis; however, categories regarded in this way cannot be defined as precisely as they can in terms of operational function, which must be considered the primary understanding of the category.

These thirteen fundamental categories have been found to be sufficient for the analysis of vocabulary in almost all areas of knowledge. It is, however, quite likely that other general categories exist; it is certainly the case that there are some domain-specific categories, such as those of form and genre in the field of literature.

8.2 Application of standard categories to a subject in BC2

The first stage in organizing the vocabulary of a subject is to allocate terms to various categories. If we take medicine as an example, we can identify most of the standard categories that are listed above:

Entities: human beings
Kinds: human beings by age/gender/etc.
    Children, the aged, women
Parts: parts, organs, and systems of the body
    Heart, immune system, nervous system
Materials: constituents of the body
    Body chemicals, cells, membranes
Processes: internal processes
    Respiration, disease
Operations: therapies
    Surgery, radiotherapy
Agents: persons; medical personnel
    Equipment, institutions
Space: political place
    Spatial concepts: anterior, central, external
Time: period
    Temporal concepts: continuous, chronic, terminal

With terms assigned to categories, these categories are now named as facets within the discipline; thus medicine contains a parts/organs/systems facet corresponding to the category parts, a diseases facet to the category processes, a therapy facet to the category operations and so on.
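Once terms have been allocated, the relation between categories and facets is simple data. The sketch below is one possible representation, assuming a plain dictionary keyed by category labels of our own choosing; the facet names and example terms come from the lists above, with ‘carcinoma’ borrowed from section 9. The allocation itself is made by hand; only the reverse index, which an indexing tool would consult to find the facet of a term, is derived mechanically.

```python
# Facets of medicine: category label -> (facet name, example terms).
# Labels are our own; facet names and terms follow the lists in the text.
MEDICINE_FACETS = {
    "kind": ("kinds of person", ["children", "the aged", "women"]),
    "part": ("parts/organs/systems", ["heart", "immune system", "nervous system"]),
    "material": ("constituents", ["body chemicals", "cells", "membranes"]),
    "process": ("diseases", ["disease", "carcinoma"]),
    "operation": ("therapies", ["surgery", "radiotherapy"]),
    "agent": (
        "personnel, organizations, equipment",
        ["medical personnel", "equipment", "institutions"],
    ),
}

# Reverse index, term -> (category, facet), derived mechanically from the
# hand-made allocation above.
TERM_INDEX = {
    term: (category, facet)
    for category, (facet, terms) in MEDICINE_FACETS.items()
    for term in terms
}

print(TERM_INDEX["radiotherapy"])    # ('operation', 'therapies')
print(TERM_INDEX["nervous system"])  # ('part', 'parts/organs/systems')
```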

Arrangement of terms within a facet proceeds on a rather more pragmatic basis, clustering terms by the use of different ‘characteristics of division’. For example, within the parts/organs/systems facet, a number of systems can be identified which are pervasive (vascular system, muscular system) rather than localised (liver, spleen); these will be listed together. Similarly, therapies might be identified as being chemical, surgical, occupational or even alternative. A group of terms generated by the application of a specific principle of division is known as an array. The order of arrays within a facet is decided upon using a variety of ordering criteria; obvious examples are chronological order, developmental order, physical contiguity, reflection of overall main class order, and the commonly used classificatory principles of increasing concreteness (abstract to concrete), increasing speciality (general before special) and increasing complexity (integrative level theory). The order of terms within an array is decided by the same process.

8.3 Examples of arrays and orders within medicine:

HQE Neoplasms, tumours (Types of neoplasms)
(By intensity)
HQE JCR Active
HQE JCS Quiescent
(By cause)
HQE NF Radiation induced
HQE NO Viral
(By action)
HQE Q Benign
HQE S Malignant
(By degree of cell differentiation and histologic origin)
HQE TS Simple
HQE TV Histoid
HQE TX Organoid

HMW H Teeth (Types of teeth)
HWN C Unerupted tooth
(By position and function)
HWN E Maxillary, upper tooth
HWN G Mandibular, lower tooth
HWN H Anterior
HWN L Posterior
(By permanence)
HWN Q Deciduous tooth, primary tooth
HWN R Permanent, secondary tooth

8.4 Examples of order within array:

Chronological order:
Q (Persons by age)
QLT V Adults
QLT W Young adults
QLT X Middle-aged
QLV Elderly, aged, old persons

Graded order:
Auxiliary table of persons
U (By rank or function)
UD Senior staff
UE Middle ranks
UF Junior ranks
UG Probationers

Quantified order:
AS Climatic regions (By rainfall)
ASD Wet, heavy rainfall
ASE Humid areas
ASF Moderate rainfall areas
ASG Arid, dry areas
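It is the notation that carries these orders. Assuming, as the examples suggest, that the spaces in a BC2 classmark only aid legibility (the letters are printed in groups of three), a filing key can simply drop them; ordinary string comparison then reproduces the chosen order within an array, and a classmark files before any longer classmark that extends it. The function name is hypothetical.

```python
def filing_key(classmark: str) -> str:
    """Hypothetical filing key: remove the spaces used only to aid reading."""
    return classmark.replace(" ", "")


ages = ["QLV", "QLT X", "QLT V", "QLT W"]
rainfall = ["ASG", "ASE", "ASD", "ASF"]
neoplasms = ["HQE Q", "HQE", "HQE JCR"]

print(sorted(ages, key=filing_key))       # ['QLT V', 'QLT W', 'QLT X', 'QLV']
print(sorted(rainfall, key=filing_key))   # ['ASD', 'ASE', 'ASF', 'ASG']
print(sorted(neoplasms, key=filing_key))  # ['HQE', 'HQE JCR', 'HQE Q']
```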

Within categories, the only relationships identified between terms (intra-category relations) are those of strict sub- and super-ordination, the semantic relations discussed above, which may be regarded as a permanent part of the hierarchical structure.

Other relationships between terms within the structure (inter-category relations) are not part of this permanent structure, and are generated only as needed to meet the demands of compound concepts within specific documents. These are the syntactic relations.
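The distinction can be made concrete in a small data model. In this sketch (all names hypothetical) the permanent structure is just a mapping from each term to its broader term within the same facet, while a compound such as ‘appendectomy’ is recorded only as a combination of its constituents and is never entered into that mapping, so it is not treated as narrower than either of them.

```python
# Permanent, intra-category (semantic) structure: term -> broader term.
# Terms are examples from the text; the mapping is illustrative only.
BROADER = {
    "microsurgery": "surgery",
    "concave lens": "lens",
    "fish-eye lens": "lens",
}

# Inter-category (syntactic) compounds, created only when a document needs
# one. They are kept apart from BROADER and carry no hierarchical status.
COMPOUNDS = {
    "appendectomy": ("appendix", "surgery"),
}


def broader_terms(term):
    """Walk up the permanent hierarchy, nearest broader term first."""
    chain = []
    while term in BROADER:
        term = BROADER[term]
        chain.append(term)
    return chain


def is_narrower(term, candidate):
    """True only if `candidate` lies above `term` in the permanent hierarchy."""
    return candidate in broader_terms(term)


print(is_narrower("microsurgery", "surgery"))  # True
print(is_narrower("appendectomy", "surgery"))  # False
```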


In an elaborated faceted classification structure, almost all of the relationships between compounded terms will be of this kind. A knowledge structure can be built as required to accommodate such compounds, but the placing is determined by the application of the system syntax (see below). No inter-category relations are displayed within the permanent structure of the classification; that structure only consists of elementary concepts organised into categories/facets.

9. Compounding terms for subject description; the system syntax

A controlled indexing language such as BC2 must be provided with a methodology for combination of terms, since the intellectual content of documents will not be restricted to simple single terms. The major issue for conventional bibliographic classification is the management of documents with compound subjects, since decisions must be made about the relative importance of the individual constituents of the compound. For example, should a paper about heart surgery be located with other documents about the heart, or with other documents about surgery? The subject of a typical modern research paper will contain several different concepts, and the relative ordering of these terms in the subject description is clearly significant. If retrieval is to be achieved at all, then a system of rules for describing and locating subjects which are compounds must be in operation. Such a set of rules for the handling of terms is known as the system syntax.

Much work was carried out in the early days of information retrieval theory on the use of relational operators or indicators to be attached to index terms to specify their functional roles. Indexing using controlled languages of this type was intellectually demanding, and the notational codes were often very awkward to manipulate. (PRECIS (33), the subject indexing system developed by Derek Austin for use at the British National Bibliography, which was another product originating in the work of the CRG, was ultimately abandoned partly because of the demands on staff time and energy in its application.) The system syntax of BC2 is far simpler than this; it places the responsibility for functional status on the facet structure. In BC2 the syntax deals primarily with rules of order in compounding between categories or facets; this is known as the citation order.

Citation order in BC2 is ‘standard citation order’ and is the order of categories given above. For example, a document about ‘chemotherapy for liver carcinoma’ contains terms from the parts, processes, and operations categories (parts/organs/systems, diseases and therapies facets) and combination will follow the standard citation order parts – processes – operations, or in this specific case, liver – carcinoma – chemotherapy.
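Applying a citation order is mechanical once each term has been allocated to a category. A minimal sketch, using lowercase category labels of our own choosing: sort the (category, term) pairs by the position of their category in the standard citation order.

```python
# Standard citation order, as listed in section 8.1 (labels are our own).
STANDARD_ORDER = [
    "thing", "kind", "part", "property", "material", "process", "operation",
    "patient", "product", "by-product", "agent", "space", "time",
]


def cite(analysis, order=STANDARD_ORDER):
    """Return the terms of a (category, term) analysis in citation order."""
    rank = {category: i for i, category in enumerate(order)}
    return [term for category, term in sorted(analysis, key=lambda p: rank[p[0]])]


subject = [("operation", "chemotherapy"), ("part", "liver"), ("process", "carcinoma")]
print(" – ".join(cite(subject)))  # liver – carcinoma – chemotherapy
```

Because the order is a parameter, a variant such as the process-first order for biology discussed below can be supplied without changing the function; within one discipline the same order would be used throughout.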

This order between facets possesses a measure of intuitive logic, and is based both on pragmatic decisions about the requirements of the subject field and on theoretical considerations, such as increasing concreteness and ‘general before special’.

It is clear that in applying a citation order to compound subjects, some aspects of the subject will be collocated, and others will be distributed according to their position in the citation order. It is of fundamental importance to establish what are the most significant elements in order to satisfy user needs. For example, if historians study principally the history of nations, then Place must be the first cited facet in order to keep material on individual places together. If Period is chosen instead, documents on the history of nations will be scattered under different time periods. The needs of the user, therefore, and the way in which the subject is studied (what Bliss called educational consensus) are important considerations in the establishment of citation order.
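The effect of the choice of first-cited facet can be seen with three history documents (invented for illustration). Grouping by place keeps the two documents on France together; grouping by period scatters them.

```python
from collections import defaultdict

# Invented example documents, each analysed into a place and a period.
DOCS = [
    {"title": "France in the Middle Ages", "place": "France", "period": "Middle Ages"},
    {"title": "Modern France", "place": "France", "period": "Modern"},
    {"title": "Medieval England", "place": "England", "period": "Middle Ages"},
]


def collocate(docs, first_cited):
    """Group document titles under the value of the first-cited facet."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[first_cited]].append(doc["title"])
    return dict(groups)


print(collocate(DOCS, "place"))
# {'France': ['France in the Middle Ages', 'Modern France'], 'England': ['Medieval England']}
print(collocate(DOCS, "period"))
# {'Middle Ages': ['France in the Middle Ages', 'Medieval England'], 'Modern': ['Modern France']}
```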

There are also some theoretical principles to guide us. Ranganathan’s original facet formula of PMEST was based on what he called ‘the principle of increasing concreteness’. That is, the most specific or most concrete aspects of the subject are to be regarded as the most dominant in a subject (using the method of schedule inversion discussed below, the first cited facet appears last in the schedule and files last in a linear sequence). This is equivalent to Bliss’s rule of ‘general-before-special’, and it is true that most searchers will intuitively seek more abstract concepts at the beginning of a sequence, moving onward to the more specific, more concrete and more complex notions.

The notions of user need, and of ‘general-before-special’ are to some extent combined in more recent theory into that of the ‘dependency’ of categories. The primary facet in a subject is considered as the ‘end-product’, or focus, of the subject, and is generally fairly easily arrived at by consideration of the subject, its purpose and users and activity in the field. This decided, the order between other categories follows on in a series of dependent notions. For example, in the area of Agriculture, the ‘crop’ (whether this be a plant or an animal and its products) is the end-product or thing. Dependent on the thing are its kinds followed by parts and properties (e.g. the leaves, roots and fruit of a tree are logically subordinate to the tree itself). Operations can only be performed on the thing as described, and the agent of an operation presumes the existence of the operation. The standard citation order therefore proceeds as a dependent chain from concrete to abstract, from special to general.

Standard citation order has been found to give optimum collocation of topics across a wide range of disciplines, but it is perfectly possible to use a variation on this order. For example, standard citation order will produce a structure in the biological sciences which makes organisms the primary facet, and distributes processes. During the second half of the 20th century the focus in biological science switched away from studies of particular organisms to the study of particular processes (ecology, molecular biology, genetics, etc.). As a consequence it became more useful to make process the first cited facet in any new classification for biological science.

Citation order may be variable, but within the same subject or discipline one order should be adhered to, in the interests of predictability.

9.1 Citation order and schedule inversion

Within a physical environment, with the need to establish a linear order, the principle of schedule inversion deals neatly with a number of practical issues. The constituent facets of a subject are inverted in the classification schedule proper; that is, they are listed or scheduled in the reverse of the citation order. For example, using the list of medicine facets from above:

Citation order:

(Kinds) Kinds of person
(Parts) Parts, organs, systems
(Materials) Constituents
(Processes) Diseases
(Operations) Therapies
(Agents) Personnel, organizations, equipment

Filing, or schedule order:

HH Medical sciences
(Agents)
HHG Personnel
HHH Organizations
HI Medical materials and equipment
(Operations)
HJ Preventive medicine
HL Curative medicine
HN Clinical medicine
HNP Therapy
HNV Drug therapy
HOL Surgery
(Processes)
HP Diseases, pathology
(Parts/organs/systems)
HTD Materials
HTE Cells, tissues
HTJ Locomotor system
HUR Nervous system
HWI Digestive system
(Types of persons)
HXF Females
HXO Children
HXW Aged
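The relationship between the two lists can be checked mechanically. The sketch below takes one classmark from the schedule above to stand for each facet (the choice of representative classmark, and the facet labels, are ours) and confirms that filing by notation gives exactly the reverse of the citation order.

```python
CITATION_ORDER = ["kinds", "parts", "materials", "processes", "operations", "agents"]

# One classmark from the schedule excerpt above, standing for each facet.
FACET_START = {
    "agents": "HHG",
    "operations": "HJ",
    "processes": "HP",
    "materials": "HTD",
    "parts": "HTJ",
    "kinds": "HXF",
}

# Filing the facets by their notation yields the schedule order.
schedule_order = sorted(FACET_START, key=FACET_START.get)
print(schedule_order)
# ['agents', 'operations', 'processes', 'materials', 'parts', 'kinds']
print(schedule_order == list(reversed(CITATION_ORDER)))  # True
```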

The basic citation order may be repeated at any point in the schedule where needed to qualify any particular facet. For example, a lens is part of a camera, but it in turn has kinds or types, e.g. ‘concave’, ‘fish-eye’, and parts, e.g. ‘surfaces’. In the same way the film that is used (itself an agent) may have parts, e.g. ‘frame’, properties, e.g. ‘light sensitivity’, and processes such as ‘deterioration’. Operations may also have types (‘microsurgery’ is a type of ‘surgery’), and parts (‘opening’ is a part or stage of ‘surgery’). All that matters is that the relative order remains the same.

When compounding terms between facets, the same consistent application of standard citation order ensures that the compound files under the later scheduled term e.g. ‘diseases of the nervous system in women’ has the citation order ‘women – nervous system – diseases’ and will file under women. All of the component parts of a compound must be listed before the compound itself, so that no specific subject precedes any more general subject. This is the general classificatory principle of ‘general-before-special’ found in all bibliographic classifications, and which is logical and natural to users. Earlier terms are continually ‘brought down’ under later terms when compounding, effectively building backwards through the schedule, following the citation order through from the specific terms (at the end of the schedule) to the more general terms (at the beginning).

In practice this is achieved in BC2 by adding the notation in the reverse of alphabetical order (the initial letter of the class is dropped when compounding within the class).

In the example given above, a complex structure will be generated in the classified file and on the shelf, of the form:

HXF Women
HXF P Diseases of women
HXF UR The nervous system in women
HXF URP Diseases of the nervous system in women
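The synthesis of these classmarks can be sketched as follows. The function (a hypothetical name) takes the elementary classmarks in citation order and drops the class letter H from every component after the first. It also checks, as a safeguard of our own, that each added component is scheduled earlier than the one before it, which is what adding the notation in the reverse of alphabetical order amounts to. The display function regroups the letters in threes, as in the printed classmarks.

```python
def build_classmark(*components, main_class="H"):
    """Combine elementary classmarks, given in citation order, into one classmark."""
    marks = [c.replace(" ", "") for c in components]
    built = marks[0]
    for previous, mark in zip(marks, marks[1:]):
        if not mark.startswith(main_class):
            raise ValueError(f"{mark} is not in class {main_class}")
        if mark >= previous:
            raise ValueError(f"{mark} must be scheduled before {previous}")
        built += mark[len(main_class):]  # drop the initial class letter
    return built


def display(mark):
    """Print notation in groups of three letters for legibility."""
    return " ".join(mark[i:i + 3] for i in range(0, len(mark), 3))


women, nervous_system, diseases = "HXF", "HUR", "HP"
built = [
    build_classmark(women),
    build_classmark(women, diseases),
    build_classmark(women, nervous_system),
    build_classmark(women, nervous_system, diseases),
]
for mark in sorted(built):
    print(display(mark))
# HXF
# HXF P
# HXF UR
# HXF URP
```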

The application of the system syntax in this manner generates a knowledge structure which accommodates compounds of any degree of complexity in a straightforward, logical and predictable manner. This potential structure is usually realised to some extent within the published schedules, in order to guide the indexer, and to accommodate the large numbers of unique technical terms in subjects such as medicine. An example is given here of the complexity of structure that can be achieved by this mechanical application of the citation order:

[The notation used in compounding here is taken from Auxiliary Table H3, and ‘built’ classmarks are indicated by an asterisk.]

HVD Eye, ophthalmology, sight
HVD PH Vision disorders *
(Elements derived from other Parts/organs/systems) *
HVD QS Adnexa oculi, accessory parts
HVD QTL (Bones) *
HVD QTM Orbit, eye socket
HVD QTM E (Neoplasms) Orbital neoplasms *
HVD QTU (Muscles) *
HVD QTV Oculomotor muscles *
HVD QTV H (Pathology) *
HVD QTV RJ Oculomotor paralysis *
HVD QTW E Extrinsic muscles
HVD QTW EGL (Surgery) *
HVD QTW E GNE Tenotomy
HVD QUS (Nerves) *
(Cranial nerves) *
HVD QUW TB (Sympathetic cervical ganglia)
(Pathology) *
HVD QUW TBH Horner’s syndrome
HVE N Lens of the eye, crystalline lens
HVE NH (Pathology) *
(Absence) *
HVE NJM X Aphakia *
(Optical defects) *
HVE NKM Cataract, lens opacities *
HVE NKM GL (Surgery) *
HVE NKM GNE Cataract extraction *

The compound terms in the above structure are all generated by the systematic application of the citation order to the basic facet structure of the scheme with its elementary single concept classes. Deep levels of the hierarchy are achieved by this mechanical process, and this could be taken further still; for example the last class HVE NKM GNE ‘Cataract extraction’ could have concepts such as the surgical technique, equipment, location and so on, attached if required. The concept analysis or subject string might then appear as:

Eye [entity] – lens [part] – pathology [process] – optical defects [product of process] – cataracts [kind of product of process] – surgery [operation] – corneal replacement [kind of operation] – (using) lasers [agent 1] – (in) mobile units [agent 2] – (in) India [place]

which has taken us to eleven levels of hierarchy within the expanded structure. In theory there is no limit to the extent of compounding and the degree of complexity that can be attained.

It can be seen that a faceted scheme of classification is not an arbitrary arrangement of classes in a subject field, characterised by particular structural features, although it is often so regarded. Rather it is the end result of a process of analysis and synthesis using a methodology for the management of the vocabulary within the subject field. This methodology can be applied in different ways in different contexts to create different knowledge structures for different needs.

Facet analytical theory is essentially a method for the organization of vocabulary; it creates a systematic knowledge structure which can be used to create workable tools for vocabulary management (index and thesaurus terms, keywords, subject headings, descriptors, etc.). There is no obvious way in which the core vocabulary can be dealt with by machines, except insofar as linguistic categories might be identified mechanically; the initial allocation of vocabulary to categories must be carried out intellectually, but the detailed structure of the system can be generated mechanically once the infrastructure is in place. Compatible thesauri, subject headings and so on can also be produced by standard mechanisms from the completed classification.
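As an illustration of how a thesaurus might be derived from the classification, the sketch below takes a fragment of the teeth facet (simplified to a single level, with terms from section 8.3) and produces broader term (BT), narrower term (NT), used for (UF) and USE entries. The data layout and function name are assumptions; this is not an implementation of any particular thesaurus standard.

```python
# A fragment of a facet: each class with its broader class and the
# synonyms it absorbs. Simplified from the teeth arrays in section 8.3.
CLASSES = {
    "teeth": {"broader": None, "synonyms": []},
    "deciduous tooth": {"broader": "teeth", "synonyms": ["primary tooth"]},
    "permanent tooth": {"broader": "teeth", "synonyms": ["secondary tooth"]},
}


def thesaurus(classes):
    """Derive thesaurus entries from the classified structure."""
    entries = {term: {"BT": [], "NT": [], "UF": []} for term in classes}
    for term, info in classes.items():
        broader = info["broader"]
        if broader is not None:
            entries[term]["BT"].append(broader)
            entries[broader]["NT"].append(term)  # reciprocal relation
        for synonym in info["synonyms"]:
            entries[term]["UF"].append(synonym)
            entries[synonym] = {"USE": term}  # non-preferred term points to preferred
    return entries


t = thesaurus(CLASSES)
print(t["teeth"]["NT"])    # ['deciduous tooth', 'permanent tooth']
print(t["primary tooth"])  # {'USE': 'deciduous tooth'}
```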

The essence of facet analytical theory is that:

- It provides a basis for the analysis of terminology – categorical analysis
- The categories can be based on whatever attributes of documents/subjects need to be identified for indexing and retrieval purposes
- Categories can therefore vary from one situation to another
- Intra-category or semantic relations are identified and can be used to create permanent hierarchical or taxonomic structures within categories
- Inter-category relations are syntactic and should not be regarded as having taxonomic value
- A system syntax is provided to cope with inter-category relations and the combination of terms in complexes, which determines the correct placing of inter-category compounds
- There must be rules for compounding between terms and between categories, including citation order of terms
- There are levels of complexity within subjects that require repetition of the citation order, and rules that allow this to proceed logically
- In a physical environment where linear order is a priority, citation order is a necessary part of the system syntax, but this need not be the case in a digital context
- The combination of the faceted taxonomic structure and the system syntax can be used to create complex knowledge structures (or semantic webs) of a regular and logical form

10. Subject access requirements of the digital library

Against this background, current co-operative work between the School of Library, Archive & Information Studies at University College London, and the Arts and Humanities Data Service (34) and the Humbul Humanities Hub (35) has tried to identify the subject organization needs of these two very large managed collections of digital resources in the humanities. Both of these JISC (Joint Information Systems Committee) funded digital collections are also currently engaged in the development of the Humanities portal, a project which aims not only to identify, evaluate, collect and organize scholarly digital resources within the gateways, but to provide controlled access to the wider world of the Web. To this end they require an effective tool for the management of their resources, and preferably one which does not require tremendous intellectual input, or extensive training on the part of the indexers or meta-taggers (who for the most part may be the originators of the resources).

The identification and description of the intellectual content of digital resources does not differ considerably from that of print-based material, nor indeed from that of indexed entities of any other kind. What is peculiar to the digital context is the search facility and the absence of any need to adopt a linear arrangement. This can allow us to capitalise on the complexity of the document description and the potential multiplicity of access points. Other requirements of the digital library include a desire to exploit hypertext – most obviously in terms of the layout of the portal – and the usefulness in this context of the regular web-like or network structure of the expanded faceted knowledge structure.

Attention must therefore be given to the overall organization of the site, to the way in which the faceted structure can support different means of access to the information it contains, and to the creation of controlled vocabularies for attachment to objects in the form of metadata.
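In a digital collection the same facet analysis can support retrieval without any linear order. In this minimal sketch, with invented resource identifiers, each resource is described by a set of (facet, term) pairs, and a query is satisfied by any resource whose description contains all the requested pairs, in whatever order they are given; citation order plays no part.

```python
# Invented resources described by (facet, term) pairs.
RESOURCES = {
    "r1": {("religion", "Judaism"), ("place", "Europe"), ("period", "Nineteenth century")},
    "r2": {("religion", "Judaism"), ("period", "Middle Ages")},
    "r3": {("religion", "Christianity"), ("place", "Europe")},
}


def search(resources, *wanted):
    """Identifiers of resources whose description includes every wanted pair."""
    query = set(wanted)
    return sorted(rid for rid, description in resources.items() if query <= description)


print(search(RESOURCES, ("place", "Europe")))  # ['r1', 'r3']
print(search(RESOURCES, ("period", "Nineteenth century"), ("religion", "Judaism")))  # ['r1']
```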

Ellis and Vasconcelos (36) in their study of the role of facet analysis in World Wide Web subject access suggest that “facet analysis cannot solve the problems of indexing for the unknown user. However because it adopts an a posteriori not a priori approach to classification, that is the classification is derived inductively from the concepts or terms used in the subject field, it can alleviate some problems in searching the World Wide Web by being applied to using subject directories or search engines.”

A classification or knowledge structure built on faceted principles can certainly achieve both of those objectives, and within the more constrained context of the digital library has additional features to offer.

11. Faceted structures as digital library management tools

At the School of Library, Archive & Information Studies, University College London, we are currently building a prototype of a faceted classification, on the model of BC2, for the organization of the new combined AHDS/Humbul Humanities portal. This will take the form of a faceted terminology for those subject fields covered by the portal, with specific rules for compounding with a view to developing a complex ‘directory’ style structure, populated by the resources themselves. This structure will be browsable via menus with hypertext links, and searchable using the terminology of the base classification. A number of other subject access tools will be developed from the basic structure.

11.1 Document description

With an attached notation a classification provides a linear ordering device for the physical arrangement of documents or other entities, or for the management of files and document surrogates. In its original function as a bibliographic classification scheme, it achieves this by enabling complex subjects or any other n-dimensional objects to be described comprehensively, but in a way that allows reduction to a linear order without sacrificing retrievability. This capacity for complex description is of particular usefulness within the digital collection, even where there is no specific need for reduction to a linear order.

For the AHDS/Humbul project, a very simple faceted classification for religion was built as a demonstration. Terms were combined to show the level of complexity in combination that could be achieved for specific resources/documents/objects/websites (this is comparable to the example from the BC2 medicine schedule given above). It is envisaged that this expanded structure, with the simple classes of the original faceted classification populated by real examples of compound classes, will form the basis of a ‘directory’ style approach to the organization of the portal, using the headings to form a series of ‘menus’ which can be expanded by using hypertext to access the next level of the hierarchy.

Religion, theology
  Common subdivisions
    Form subdivisions
      Bibliography
      Dictionaries, encyclopaedias
      Journals
    Phase relations (influence, bias, comparison)
    Place subdivisions
      Europe
        United Kingdom
      Asia
      Africa
      North America
        United States
    Period subdivisions
      Middle Ages
        Fifteenth century
      Modern times. Post-Renaissance
        Sixteenth century
        Nineteenth century
        Twentieth century
  Philosophy and theory of religion
  Religious concepts. Religious ideas
    God
    The Universe
    Mankind
  Evidences of religion
    Sacred books
      Pseudo-canonical works
      Criticism and interpretation
    Liturgical texts
  Religious activities. Religious practice
    Moral behaviour. Moral theology. Religious ethics
    Social behaviour. Social theology
    Pastoral theology
    Ritual behaviour. Worship. Rites and ceremonies
      Sacraments
      Feasts and festivals
      Pilgrimages
    Reflective religion. Mysticism
  Religious organization
    Ecclesiology
    Canon law
    Religious societies
    Religious orders
    Sects. Sectarian movements
  Religious faiths
    Prehistoric and primitive religions
    Taoism
    Shinto
    Indian religion. “Hindu” religion broadly
      Hinduism narrowly
      Jainism
      Sikhism
    Buddhism
    Religions of antiquity and minor religions
    Judaism
    Christianity
      Eastern Church
        Orthodox church
      Western Church
        Roman Catholic church
        Protestant churches
          Anglican church
    Islam
      Sunnite Islam
      Shi’ite Islam
    Modern spiritual movements

The terms in this basic schedule can now be combined to give more specific classes.

It is not absolutely necessary to use a notation for this structure (although it could be considered helpful in other respects) since the structure may be set up in such a way that it can be browsed using hypertext to expose the deeper levels. If the structure is opened up in successive stages by exposing one additional level of the hierarchy through each link (effectively the same thing as the addition of one compounded concept at a time), the amount of the hierarchy evident at any one stage will hopefully be manageable without the need for an ordering device. The notation may of course be useful to the indexer in combining terms, but need not be made evident to the browser or searcher. Compound classes can be built without reference to notation, provided the schedule order is maintained.
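A minimal sketch of this kind of browsing, with no notation at all: compound classes are held as tuples of terms in citation order (terms from the Judaism examples that follow), listed in schedule order, and following a link simply reveals the classes exactly one term longer than the current one. The representation and function name are assumptions for illustration.

```python
# Compound classes as term tuples, listed in schedule order.
CLASSES = [
    ("Judaism",),
    ("Judaism", "Europe"),
    ("Judaism", "Nineteenth century"),
    ("Judaism", "Nineteenth century", "Europe"),
    ("Judaism", "Bible"),
    ("Judaism", "Bible", "Middle Ages"),
]


def next_menu(classes, path):
    """Classes one level below `path` (one more compounded term), in schedule order."""
    depth = len(path)
    return [c for c in classes if len(c) == depth + 1 and c[:depth] == path]


for c in next_menu(CLASSES, ("Judaism",)):
    print(" – ".join(c))
# Judaism – Europe
# Judaism – Nineteenth century
# Judaism – Bible
```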

11.2 Compound classes

An example of expansion is here provided:

Judaism*
  (Form subdivisions)
    Bibliography of Judaism
    Encyclopaedia of Judaism
  (Place subdivisions)
    Judaism in Europe
  (Period subdivisions)
    Judaism in the Middle Ages
    Judaism in the Nineteenth Century
      Judaism in Nineteenth century Europe
  (Philosophy and theory of religion)
    Religious philosophy of Judaism
  (Sacred books)
    Hebrew Bible
      Mediaeval Hebrew Bible
  (Worship)
    Jewish festivals
  (Organization of the religion)
    Jewish religious law
      (Sacred books)
        The Hebrew Bible in Jewish religious law

Anglican Church
  Church of England
    (Phase relations)
      Relations with the Roman Catholic church
    (Period subdivisions)
      (Sixteenth century)
        Origins of the Anglican Church
    (Sacred books)
      The English Bible
    (Liturgical texts)
      The Book of Common Prayer
      Liturgical reform in the 20th century
    (Worship)
      (Sacraments)
        Holy Communion in the Anglican tradition
        The ordination of women
      (Festivals)
        The celebration of Easter in the Anglican church
    (Church organization & administration)
      Anglican synod
    (Movements and sects)
      The Oxford movement

* Compound classes here are represented in natural language form, rather than as subject strings.

11.3 Subject headings

The concept analyses or subject strings can themselves be used as subject headings to expand the basic classification, providing a hierarchical structure which can be opened up by the use of hypertext. For example, the headings ‘Church of England’ or ‘Judaism’ can be expanded to give the following lists of headings. If the density of subject headings warrants this, several levels of the hierarchy might be managed in this way.

Anglicanism
Church of England
Church of England – Relations with Roman Catholic church
Church of England – Sixteenth century
Church of England – Bible
Church of England – Liturgical texts
Church of England – Liturgical texts – Twentieth century
Church of England – Sacraments – Communion
Church of England – Sacraments – Ordination – Women
Church of England – Festivals – Easter
Church of England – Administrative structure – Synod
Church of England – Movements – Oxford Movement

Judaism
Judaism – Bibliography
Judaism – Encyclopaedias
Judaism – Europe
Judaism – Middle Ages
Judaism – Nineteenth century
Judaism – Nineteenth century – Europe
Judaism – Religious philosophy
Judaism – Bible
Judaism – Bible – Middle Ages
Judaism – Festivals
Judaism – Religious law
Judaism – Religious law – Bible

These can be left in this order, to represent the systematic structure, or they can be alphabetized:

Anglicanism
Church of England
Church of England – Administrative structure – Synod
Church of England – Bible
Church of England – Festivals – Easter
Church of England – Liturgical texts
Church of England – Liturgical texts – Twentieth century
Church of England – Movements – Oxford Movement
Church of England – Relations with Roman Catholic church
Church of England – Sacraments – Communion
Church of England – Sacraments – Ordination – Women
Church of England – Sixteenth century
Judaism
Judaism – Bible
Judaism – Bible – Middle Ages
Judaism – Bibliography
Judaism – Encyclopaedias
Judaism – Europe
Judaism – Festivals
Judaism – Middle Ages
Judaism – Nineteenth century
Judaism – Nineteenth century – Europe
Judaism – Religious law
Judaism – Religious law – Bible
Judaism – Religious philosophy

Because the structure is visibly logical and predictable, retrieval by browsing is made much easier.
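Both the expansion and the alphabetization can be sketched in a few lines. The example below assumes a hypothetical nested-dictionary representation of the Judaism headings (facet indicators such as ‘(Period subdivisions)’ are treated as display devices and left out of the strings). A depth-first walk yields the headings in systematic order, and a filing key that compares headings element by element, ignoring case, gives the alphabetical order. Comparing elements rather than whole strings keeps each heading together with its own subdivisions, filed directly after it, whatever the character code of the separator.

```python
SEPARATOR = " – "  # en dash, as in the printed headings

# Hypothetical nested representation of the Judaism headings: each key is a
# heading element, each value holds its subordinate elements.
STRUCTURE = {
    "Judaism": {
        "Bibliography": {},
        "Encyclopaedias": {},
        "Europe": {},
        "Middle Ages": {},
        "Nineteenth century": {"Europe": {}},
        "Religious philosophy": {},
        "Bible": {"Middle Ages": {}},
        "Festivals": {},
        "Religious law": {"Bible": {}},
    }
}


def expand(tree, prefix=()):
    """Yield heading strings in systematic order (depth first, schedule order)."""
    for term, children in tree.items():
        path = prefix + (term,)
        yield SEPARATOR.join(path)
        yield from expand(children, path)


def filing_key(heading):
    """Compare headings element by element, ignoring case."""
    return tuple(element.casefold() for element in heading.split(SEPARATOR))


def alphabetize(headings):
    return sorted(headings, key=filing_key)


if __name__ == "__main__":
    systematic = list(expand(STRUCTURE))
    print("\n".join(systematic))
    print()
    print("\n".join(alphabetize(systematic)))
```

With this key, a hypothetical heading such as ‘Bible study’ would file after every ‘Bible – …’ subdivision rather than among them.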

Another example of a faceted structure with hierarchies used in this way can be found in the UK Government Category List (GCL), currently under development by Stella Dextre Clarke, a draft of which is available for comment (37). The GCL is a high-level taxonomy, essentially a tool for browsing by the enquirer, which makes usability a major aim. The amount of visible text is deliberately limited to maintain clarity, and the searcher proceeds through the structure step by step.

This seems a workable means of access for the average searcher. More complex ways of organizing the front end of a gateway have also been explored, going back to the late 1980s and Elizabeth Duncan’s work on concept maps (38), which used facet analysis to create a two-dimensional approach to subject structure. Different aspects of a central subject domain are displayed graphically, and each icon may be opened up hypertextually to lead either to text or to a further expansion. Similar work has been done more recently in the United States by Uta Priss (39), investigating semantic maps and the application of facet techniques to the graphical display of subject structure. Nevertheless, digital objects can be very complex, and it is not altogether clear how a two-dimensional map can accommodate the n-dimensionality encountered in digital document description.

This complexity of subject description is a core issue in managing the resources in the Humanities portal. Not only must the semantic content of the object be described accurately; other attributes of the document must also be identified. In addition to the subject content of the digital resource, multimedia components such as graphics, images, sound and animation are expected to need expression, as well as properties special to the digital format, since any or all of these may be sought concepts. Since some of these individual parts may also carry independent semantic content, a highly sophisticated form of subject description is clearly required. It is possible to imagine a site on a specific topic (say ‘environmental protection in the Amazonian rainforest and the effects of deforestation’) which contains images (perhaps of particular plant organisms, or ethnic communities, some of which may be created by known artists, photographers or designers) and sound (again, of named music by particular composers); the format of these various components may also be significant, as may the originators and designers of the site. Here facet methodology comes into its own, since categories can be created for any set of attributes that require tagging or indexing; these categories can be accommodated within the basic classification structure and appropriate citation orders established between them. A complex structured string containing all of the required descriptors can be produced, which can then be manipulated and utilised for any of the methods of subject access described here.
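A minimal sketch of how such a string might be assembled, assuming a hypothetical set of categories and citation order for the rainforest example (the category names, their sequence and the sample values are illustrative inventions, not drawn from BC2 or from any portal’s schema). Each descriptor is tagged with its category, and the string is produced simply by reading the categories in citation order:

```python
# Hypothetical categories for the rainforest example, in an assumed citation order.
CITATION_ORDER = [
    "Topic",
    "Place",
    "Organism",
    "Community",
    "Media component",
    "Format",
    "Originator",
]

SEPARATOR = " – "


def structured_string(descriptors, order=CITATION_ORDER):
    """Arrange a resource's descriptors by category, following the citation order.

    `descriptors` maps a category name to a list of terms; categories the
    resource lacks are skipped, and unknown categories are rejected.
    """
    unknown = set(descriptors) - set(order)
    if unknown:
        raise ValueError(f"no place in the citation order for: {sorted(unknown)}")
    parts = [term for category in order for term in descriptors.get(category, [])]
    return SEPARATOR.join(parts)


if __name__ == "__main__":
    site = {
        "Originator": ["Site designer"],
        "Topic": ["Environmental protection", "Deforestation"],
        "Place": ["Amazonian rainforest"],
        "Media component": ["Photographs", "Music"],
    }
    print(structured_string(site))
```

Because the order is fixed by the categories rather than by the indexer, the same set of descriptors always yields the same string, which is what keeps the later manipulations (inversion, alphabetization) predictable.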

11.5 Inversion and conversion to index entries

Within the directory structure derived from the expanded classification, the citation strings (subject strings) created for specific documents or entities are in essence being used as subject headings for the directory. They can also be employed, either in authority files or in user-accessible files, to give alphabetical subject access to material as an alternative to the classified sequence.

Alternatively, these subject strings can be inverted to provide alphabetical access through a browsable index; such an index will collocate the ‘distributed relatives’ that are dispersed in the linear sequence by virtue of their subordinate position in the citation order:

Anglicanism
Bible – Church of England
Bible – Judaism
Bible – Religious law – Judaism
Bibliographies – Judaism
Church of England
Communion – Sacraments – Church of England
Encyclopaedias – Judaism
Europe – Judaism
Europe – Nineteenth century – Judaism
Festivals – Church of England
Festivals – Judaism
Judaism
Liturgical texts – Church of England
Middle Ages – Bible – Judaism
Middle Ages – Judaism
Nineteenth century – Judaism
Oxford Movement – Church of England
Religious law – Judaism
Religious philosophy – Judaism
Roman Catholic Church – relations with – Church of England
Sixteenth century – Church of England
Synod – Administrative structure – Church of England
Twentieth century – Liturgical texts – Church of England
Women – Ordination – Sacraments – Church of England
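Most entries in the list above are simple reversals of the subject strings: ‘Judaism – Religious law – Bible’ becomes ‘Bible – Religious law – Judaism’, bringing a subordinate term to the front so that it files with every other occurrence of the same term. A sketch of that rule alone, assuming full reversal of the elements; entries reworded by hand, such as the one for relations with the Roman Catholic Church, are outside what it attempts:

```python
SEPARATOR = " – "


def invert(subject_string):
    """Reverse the elements of a subject string: 'A – B – C' becomes 'C – B – A'."""
    return SEPARATOR.join(reversed(subject_string.split(SEPARATOR)))


def filing_key(entry):
    """File entries element by element, ignoring case."""
    return tuple(element.casefold() for element in entry.split(SEPARATOR))


def browsable_index(subject_strings):
    """Invert every string, drop duplicates, and file the entries alphabetically."""
    return sorted({invert(s) for s in subject_strings}, key=filing_key)


if __name__ == "__main__":
    strings = [
        "Judaism",
        "Judaism – Bible",
        "Judaism – Religious law – Bible",
        "Church of England – Bible",
        "Church of England – Sacraments – Communion",
    ]
    for entry in browsable_index(strings):
        print(entry)
```

In the output the three ‘Bible’ entries, scattered across two religions in the classified sequence, stand together at the head of the index.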

In addition, the structure is very easily converted to a thesaural format for use as a controlled vocabulary: a source of indexing terms, keywords and subject descriptors for meta-tagging by resource originators, if this is required. Given the regularity of the structure and the clear rules for combination, it is possible that subject strings (for use as subject headings or index entries) could be created mechanically from sets of descriptors provided by authors.
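The conversion to thesaural format can be sketched as reading the reciprocal relationships off the hierarchy. The fragment below assumes a hypothetical parent map for a few religion terms; broader (BT), narrower (NT) and top (TT) terms follow mechanically from it, whereas related-term (RT) links would have to come from elsewhere in the analysis and are not derived here.

```python
from collections import defaultdict

# Hypothetical hierarchy fragment: each term mapped to its broader term.
BROADER = {
    "Religion": None,
    "Judaism": "Religion",
    "Christianity": "Religion",
    "Anglicanism": "Christianity",
}


def to_thesaurus(broader):
    """Derive BT, NT and TT relations from a term -> broader-term map."""
    narrower = defaultdict(list)
    for term, parent in broader.items():
        if parent is not None:
            narrower[parent].append(term)

    def top_term(term):
        while broader[term] is not None:
            term = broader[term]
        return term

    thesaurus = {}
    for term in sorted(broader, key=str.casefold):
        parent = broader[term]
        thesaurus[term] = {
            "BT": [parent] if parent is not None else [],
            "NT": sorted(narrower[term], key=str.casefold),
            "TT": top_term(term) if parent is not None else None,
        }
    return thesaurus


if __name__ == "__main__":
    for term, relations in to_thesaurus(BROADER).items():
        print(term)
        for tag in ("BT", "NT"):
            for related in relations[tag]:
                print(f"  {tag} {related}")
        if relations["TT"]:
            print(f"  TT {relations['TT']}")
```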


There are also a number of ways in which this faceted approach can improve the searching as well as the browsing function. The use of a controlled vocabulary, with or without representative notational codes, will obviously facilitate retrieval in the way described by Godert (13), where sets of descriptors (either from a controlled vocabulary or with notational encoding) are attached to a given document or object.
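Where each object carries a set of descriptors, a query reduces to a comparison of sets. A minimal sketch, assuming hypothetical document identifiers and descriptor sets and a simple conjunctive (AND) match; it is not a reconstruction of the retrieval model Godert describes, and notational codes could stand in for the words without changing the logic:

```python
# Hypothetical documents, each carrying a set of controlled descriptors.
DOCUMENTS = {
    "doc1": {"Judaism", "Bible", "Middle Ages"},
    "doc2": {"Church of England", "Liturgical texts"},
    "doc3": {"Judaism", "Festivals"},
}


def retrieve(query, documents=DOCUMENTS):
    """Return, in identifier order, the documents whose descriptors include every query term."""
    wanted = set(query)
    return sorted(doc for doc, descriptors in documents.items() if wanted <= descriptors)


if __name__ == "__main__":
    print(retrieve({"Judaism"}))           # ['doc1', 'doc3']
    print(retrieve({"Judaism", "Bible"}))  # ['doc1']
```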

The faceted structure can also support searching in a more open-ended environment by providing the search software with subject taxonomies which either the searcher (if the taxonomies are visible) or the program itself (if they are not) can exploit to modify the search: for example, by moving up the hierarchy to a superordinate term to broaden the search if results are poor, or down the structure to a subordinate term to narrow it if results are too numerous.
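The broadening and narrowing moves can be sketched as a function that consults the taxonomy when the result count falls outside a target range. The hierarchy fragment and the thresholds (fewer than 5 or more than 200 hits) are assumptions for illustration only:

```python
# Hypothetical hierarchy fragment: term -> broader term (None at the top).
BROADER = {
    "Religion": None,
    "Judaism": "Religion",
    "Religious law": "Judaism",
    "Jewish festivals": "Judaism",
}

TOO_FEW, TOO_MANY = 5, 200  # assumed thresholds for "poor" and "too numerous"


def narrower_terms(term, broader=BROADER):
    return [t for t, parent in broader.items() if parent == term]


def adjust_query(term, hit_count, broader=BROADER):
    """Suggest replacement query terms after a search on `term`.

    Too few hits: move up to the superordinate term (broaden).
    Too many hits: move down to the subordinate terms (narrow).
    Otherwise, or if there is nowhere to move, keep the term.
    """
    if hit_count < TOO_FEW and broader.get(term) is not None:
        return [broader[term]]
    if hit_count > TOO_MANY:
        children = narrower_terms(term, broader)
        if children:
            return children
    return [term]


if __name__ == "__main__":
    print(adjust_query("Religious law", 2))  # ['Judaism']
    print(adjust_query("Judaism", 500))      # ['Religious law', 'Jewish festivals']
```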

This process can also exploit syntactic relations to refine the search if required. A good example of this kind of searching over a managed database is Pollitt’s View-Based Searching (40). This system lets the searcher open and examine hierarchies within a specific aspect or facet of the subject; further windows can then be opened to reveal the contents of additional facet hierarchies, so that a number of different facets may be displayed on a single screen at any one time. By combining terms from successive facets, the searcher can continually and interactively modify the search query according to the results. This technique allows the searcher to explore the n-dimensional space of the faceted structure in a guided fashion. As Pollitt explains, “It is important to provide maps to the user for his or her exploration. These maps may be the simple devices that group objects into simple sets …. Or more complicated structures as used in classification schemes or thesauri as advocated by the British Classification Research Group.”
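The combination of terms across facet views can be illustrated with a toy collection in which each object has values in several facets. This is not Pollitt’s system; it only shows the underlying idea that, once terms are selected in some facets, the remaining facets can be displayed with counts of the matching objects, so the searcher sees the effect of each further term. The facet names, the objects and the use of counts are all assumptions:

```python
from collections import Counter

# Hypothetical collection: each object carries one or more terms in each facet.
OBJECTS = [
    {"Religion": {"Judaism"}, "Period": {"Middle Ages"}, "Form": {"Bible"}},
    {"Religion": {"Judaism"}, "Period": {"Nineteenth century"}, "Form": {"Philosophy"}},
    {"Religion": {"Church of England"}, "Period": {"Sixteenth century"}, "Form": {"Bible"}},
    {"Religion": {"Church of England"}, "Period": {"Twentieth century"}, "Form": {"Liturgical texts"}},
]


def matches(obj, selection):
    """True if the object carries the selected term in every facet that has one."""
    return all(term in obj.get(facet, set()) for facet, term in selection.items())


def facet_view(objects, selection, facet):
    """Count each term of one facet among the objects matching the current selection."""
    counts = Counter()
    for obj in objects:
        if matches(obj, selection):
            counts.update(obj.get(facet, set()))
    return dict(counts)


if __name__ == "__main__":
    selection = {"Form": "Bible"}
    print(facet_view(OBJECTS, selection, "Religion"))
    selection["Religion"] = "Judaism"  # combine a term from a second facet
    print(facet_view(OBJECTS, selection, "Period"))
```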

12. The creation of knowledge structures and networks using facet analytical theory; conclusions

It is clear that many of the features of a knowledge structure based on facet analytical principles are relevant to an electronic context. If the methodology is used flexibly, facet analytical theory may have much to offer in the organization both of resources in an electronic environment and of digital objects themselves.

In the first instance, the rigour of the analysis and the strong internal logic of the structures appear peculiarly appropriate to machine manipulation, particularly when compared with traditional classification systems of a hierarchical type.

All of the knowledge management functions described above (linear ordering, index generation and conversion to a thesaurus) are desirable within an electronic context; in addition the regularity and multidimensionality of the faceted network may be especially hospitable to hypertext links. There also seems to be some potential for the incorporation of associated index and thesaural systems into search query software for the interrogation of any database associated with the faceted structure.

In terms of building a knowledge structure or semantic network itself, the facet analytical method may be better suited to digital materials than other existing systems. The range of standard categories could be extended to accommodate additional properties of digital materials, and terminology relevant to the management of those materials. Rules for combination need not be as complex as for a physical collection, where reduction to a linear order is paramount, but they are still needed to create predictable structures for the accommodation of complex objects.

The characteristics of structures or networks generated by the application of facet analytical theory can be summarised as follows:

The combination of categorization with citation order can create mathematically regular, predictable and therefore efficiently searchable structures.

Such structures display both semantic and syntactic relationships, and maintain a rigorous distinction between them.

Such structures can accommodate previously unspecified complexes: newly occurring combinations can be processed as they arise, without the need for intellectual decisions about their relationship to existing objects or classes.

Application of the system syntax allows the accurate placing of new compounds and complexes in the network or structure; since the structure and its rules are inherently logical, there is reason to think this might be carried out automatically.

New individual concepts or terms can also be assimilated; provided the original analysis is accurate, the logical location even of quite new elementary concepts ought to be evident.

Generally, all that is required for the creation of a structure is an internal logic to the system: categorization and combination need to be consistent in a given context, but the precise nature of the categories and the principles of combination can vary according to the needs of the environment.
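The suggestion that new compounds could be placed automatically can be illustrated with a hypothetical mini-schedule. Each compound’s filing position is computed purely from its elements: for each category, in citation order, the position of the term in that category’s schedule, with a missing element scoring -1. The categories, terms and scoring rule are assumptions for illustration; a term not yet in the schedule raises an error, corresponding to the case where a new elementary concept must first be located by analysis.

```python
import bisect

# Hypothetical mini-schedule: categories in citation order, terms in schedule order.
CITATION_ORDER = ["Religion", "Aspect", "Period"]
SCHEDULE = {
    "Religion": ["Judaism", "Church of England"],
    "Aspect": ["Bible", "Festivals", "Religious law"],
    "Period": ["Middle Ages", "Nineteenth century"],
}


def filing_key(compound):
    """Filing position computed only from the compound's elements.

    For each category in citation order, take the term's position in the
    schedule; a missing element scores -1, so a class files before its own
    subdivisions. An unknown term raises ValueError (from list.index).
    """
    return tuple(
        SCHEDULE[category].index(compound[category]) if category in compound else -1
        for category in CITATION_ORDER
    )


def label(compound):
    return " – ".join(compound[c] for c in CITATION_ORDER if c in compound)


def place(sequence, compound):
    """Insert a new compound into an already ordered list of (key, label) pairs."""
    bisect.insort(sequence, (filing_key(compound), label(compound)))
    return sequence


if __name__ == "__main__":
    sequence = []
    for compound in [
        {"Religion": "Judaism"},
        {"Religion": "Judaism", "Aspect": "Bible"},
        {"Religion": "Judaism", "Period": "Middle Ages"},
    ]:
        place(sequence, compound)
    # A newly occurring combination is placed by the rules alone.
    place(sequence, {"Religion": "Judaism", "Aspect": "Bible", "Period": "Middle Ages"})
    print([text for _, text in sequence])
```

With this key a class always files before its own subdivisions, and, as in the Judaism schedule earlier, a class qualified only by period files ahead of one qualified by an aspect such as the Bible.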

Facet analytical theory is a powerful tool for the analysis and management of vocabulary. To date it has mainly been applied to physical collections of documents and their surrogates, but it has been found to be an effective retrieval tool in a range of subject fields. Its capacity to create highly sophisticated structures for the accommodation of complex objects suggests that it is worth investigating as an organizational tool for digital materials, and that the results of such investigation would be knowledge structures of unparalleled utility and elegance.

References

1. Dewey, M. Abridged Decimal Classification and Relativ Index 3rd edition revised New York; Forest Press 1926 p. 8

2. Koch, T. The role of classification schemes in Internet resource description and discovery; work package 3 of Telematics for Research project DESIRE (RE 1004) www.ukoln.ac.uk/metadata/desire/classification (Seen 29.08.01)

3. Universal Decimal Classification Standard Edition London; British Standards Institution 1993

4. Riesthuis, G. “Multilingual subject access and guidelines for the establishment and development of multilingual thesauri” in Dynamism and stability in knowledge organization; proceedings of the sixth international ISKO conference 10-13 July 2000 Toronto, Canada edited by Clare Beghtol, Lynne C. Howarth, Nancy J. Williamson Würzburg; Ergon Verlag 2000

5. Bliss, H. E. The organization of knowledge and the system of the sciences New York; H. W. Wilson 1929

6. Bliss, H. E. The organization of knowledge in libraries and the subject approach to books New York; H. W. Wilson 1933

7. Ranganathan, S. R. Prolegomena to library classification Madras; Madras Library Association 1937

8. Ranganathan, S. R. Elements of library classification Poona; N. K. Publishing House 1945

9. Dewey Decimal Classification 21st edition edited by Joan Mitchell New York; Forest Press 199?

10. Library of Congress. Cataloging Policy and Support Office. Library of Congress Classification Washington D.C.; The Library

11. Ranganathan, S. R. Colon Classification Madras; Madras Library Association 1933 (1st edition) 7th edition 1987

12. Christ’s College Cambridge Library website http://www.christs.cam.ac.uk/life/library/lcintro.shtml (Seen 29.08.01)

13. Godert, Winfried “Facet classification in online retrieval” International Classification 18(2) 1991 98-109 p.101

14. Ingwersen, Peter and Irene Wormell “Ranganathan in the perspective of advanced information retrieval” Libri 42 (1992) 184-201 p.199

15. Coates, E. J. The British Catalogue of Music classification London; Council of the British National Bibliography 1960

16. Foskett, D. J. and Joy Foskett The London education classification; a thesaurus/classification of British educational terms 2nd edition London; London University, Institute of Education 1974 (original edition 1963)

17. Langridge, D. W. Your jazz collection Bingley 1970 includes a faceted classification for the literature of jazz pp 80-104.

18. Croghan, A. Classification of the performing arts The author 1968


19. Broxis, P. F. “A faceted classification for arts” Chapter 3 in Organizing the arts Bingley 1970

20. Daniel, Ruth and J. Mills A classification and thesaurus of library and information science London; Polytechnic of North London, School of Librarianship 1972

21. Vickery, B. C. Faceted classification; a guide to the construction and use of special schemes London; Aslib 1960

22. Vickery, B.C. Classification and indexing in science London; Butterworth 2nd edition 1975

23. Aitchison, Jean Thesaurofacet; a thesaurus and classification for engineering and related subjects English Electric Company 1969

24. Roberts, M.J. et al. Construction Industry Thesaurus Great Britain. Department of the Environment 1972

25. Mills, J. and Vanda Broughton Bliss Bibliographic Classification 2nd edition London; Butterworth and Bowker-Saur 1977-

26. Bliss, H. E. Bibliographic Classification New York; H. W. Wilson 1940-1953

27. McIlwaine, I. and Vanda Broughton “The Classification Research Group; then and now” Knowledge Organization 27(4) 2000 pp 195-199

28. Bliss Classification Association http://www.sid.cam.ac.uk/bca/bcahome.htm

29. Mills, J. and Vanda Broughton Bliss Bibliographic Classification 2nd edition Introduction and auxiliary schedules London; Butterworth 1977

30. Aitchison, Jean. and Alan Gilchrist Thesaurus construction; a practical manual 2nd edition London; Aslib 1987 [This is now in the third edition, by Jean Aitchison, Alan Gilchrist and David Bawden]

31. Aitchison, J. “A classification as a source for a thesaurus: the Bibliographic Classification of H. E. Bliss as a source of thesaurus terms and structure” Journal of Documentation 42(3) 1986 pp 160-181

32. Library of Congress. Cataloging Policy and Support Office. Library of Congress Classification: H. Social Science Washington D. C.; The Library 1994 edition p. 625

33. Austin, Derek PRECIS : a manual of concept analysis and subject indexing London; British Library Bibliographic Services Division 1984

34. Arts and Humanities Data Service http://www.ahds.ac.uk (viewed 15.11.01)

35. Humbul Humanities Hub http://www.humbul.ac.uk (viewed 15.11.01)


36. Ellis, David and Ana Vasconcelos “The relevance of facet analysis for World Wide Web subject organization and searching” Journal of Internet cataloguing 2(3/4) 2000 97-114

37. UK Government Category List www.govtalk.gov.uk/interoperability/egif_document.asp?docnum=346 (viewed 15.11.01)

38. Duncan, E. B. “A concept-map thesaurus as a knowledge-based hypertext interface to a bibliographic database” in Informatics 10; prospects for intelligent retrieval edited by Kevin P. Jones London; Aslib 1990

39. Priss, U. “Comparing classification systems using facets” in Dynamism and stability in knowledge organization; proceedings of the sixth international ISKO conference 10-13 July 2000 Toronto, Canada edited by Clare Beghtol, Lynne C. Howarth, Nancy J. Williamson Würzburg; Ergon Verlag 2000

40. Pollitt, A. S. Navigating n-dimensional information space with data and documents through view based searching Paper given at the BCS-IRSG 2nd annual colloquium 5-7 April 2000 Sidney Sussex College Cambridge http://irsg.eu.org/irsg2000outline/papers/pollitt.htm
