
Text information storage and retrieval and the CDS/ISIS program

***

Paul NIEUWENHUYSEN

University Library, Vrije Universiteit Brussel,

Pleinlaan 2, B-1050 Brussel, Belgium

What is a database?

• A database is a collection of similar data records stored in a common file (or collection of files).

***

Software type = information retrieval software

• Software for information storage and retrieval (ISR software)

• Text(-oriented) database management systems (Text-DBMS)

• Text information management systems (TIMS)

• Document retrieval systems

• Document management systems

***

Information retrieval: via a database to the user

***

[Diagram: information content; database (linear file, inverted file); search engine; search interface; user]

Information retrieval: the basic processes in search systems

[Diagram: information problem, representation, query; text documents, representation, indexed documents; comparison; retrieved documents; evaluation and feedback]

***

Information retrieval systems: many components make up a system

• Any retrieval system is built up of many more or less independent components.

• These components can be modified more or less independently to increase the quality of the results.

***

Information retrieval systems: important components

***

• the information content

• system to describe formal aspects of information items

• system to describe the subjects of information items

• concrete descriptions of information items = application of the information description systems used

• information storage and retrieval computer program(s)

• computer system used for retrieval

• type of medium or information carrier used for distribution

Information retrieval systems: the information content

• The information content is the information that is created or gathered by the producer.

• The information content is independent of software and of distribution media.

• The information content is input into the retrieval system using

» a system (rules) to describe the formal aspects

» a system (rules) to describe the contents (classification, thesaurus,...)

***

Information retrieval systems: media used for distribution

• Hard copy (for information retrieval systems only in the broad sense)

» Print

» Microfiche

• For computers (for information retrieval systems stricto sensu)

» Magnetic tape

» Floppy disk; optical disk (CD-ROM, CD-i, Photo-CD,...)

» Online

***

Information retrieval systems: the computer program

The information retrieval program consists of several modules, including:

• The module that allows the creation of the inverted file(s) = index file(s) = dictionary file(s).

• The search engine, which provides the search features and power that allow the inverted file(s) to be searched.

• The interface between the system and the user, which determines how they (can) interact to search the database (using menus and/or icons and/or templates and/or commands).

***

What determines the results of a search in a retrieval system?

• the information retrieval system (= contents + system)

• the user of the retrieval system and the search strategy applied to the system

***

[Diagram: result of a search]

Characteristics / definition of structured text information

• The text information is structured (files, records, fields, sub-fields, links/relations among records,...).

• The length of records and fields can be “long”.

• Some fields are multi-valued, i.e. they occur more than once.

***

Layered structure of a database

[Diagram: Database > File > Records > Fields > Characters; + in many systems: relations / links between records]

***

Structure of a bibliographic file

[Diagram: Record No. 1 with fields Title; Author 1 (name + first name); Author 2; ...; Source; Descriptor 1; Descriptor 2; ...; then Record No. 2. Labels: sub-fields; repeated fields]

***
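The layered structure and the bibliographic file above can be mirrored in a few lines of Python. This is a minimal sketch, assuming that a record maps numeric field tags to a list of occurrences (so that a field can repeat) and that each occurrence maps one-letter sub-field codes to values. The tags 10, 20, 30 and 40, the sub-field codes and the sample data are hypothetical; they do not come from any actual CDS/ISIS database definition.

```python
from typing import Dict, List

# Hypothetical field tags, chosen for illustration only.
TITLE, AUTHOR, SOURCE, DESCRIPTOR = 10, 20, 30, 40

Occurrence = Dict[str, str]           # sub-field code -> value
Record = Dict[int, List[Occurrence]]  # field tag -> occurrences (repeatable)

record_1: Record = {
    TITLE: [{"a": "Text information storage and retrieval"}],
    # Repeated field: two authors, each split into sub-fields
    # (n = name, f = first name).
    AUTHOR: [{"n": "Smith", "f": "Anne"}, {"n": "Jones", "f": "Bert"}],
    SOURCE: [{"a": "Hypothetical Journal, 1995"}],
    # Repeated field: one occurrence per descriptor.
    DESCRIPTOR: [{"a": "information retrieval"}, {"a": "thesauri"}],
}

# The file: record number -> record.
bibliographic_file: Dict[int, Record] = {1: record_1}


def occurrences(record: Record, tag: int) -> List[Occurrence]:
    """All occurrences of one field; an empty list if the field is absent."""
    return record.get(tag, [])


for author in occurrences(bibliographic_file[1], AUTHOR):
    print(f"{author['n']}, {author['f']}")
```

Keeping every field as a list, even when it usually occurs once, means the program never needs a special case for repeatable fields.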

Thesaurus: description

• Thesaurus =

» system to control a vocabulary +

» the contents of this vocabulary

• Thesaurus program =

program to create, manage, modify and/or search a thesaurus using a computer

***

Thesaurus relations

[Diagram: a term linked by BT (= Broader Term) to term(s) with broader meaning, by NT (= Narrower Term) to term(s) with narrower meaning, by RT (= Related Term) to other term(s), and by UF (= Use For) to its synonym(s)]

***
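One way a program can exploit these relations during a search is query expansion: a search term is widened with its synonyms (UF) and its narrower terms (NT) to increase recall. The sketch below is a minimal illustration, assuming a small in-memory thesaurus with hypothetical terms; the `Term` class and the `expand` function are illustrative names, not part of any existing thesaurus program.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Term:
    """Relations of one controlled ("good") term."""
    bt: List[str] = field(default_factory=list)  # broader terms
    nt: List[str] = field(default_factory=list)  # narrower terms
    rt: List[str] = field(default_factory=list)  # related terms
    uf: List[str] = field(default_factory=list)  # synonyms (use for)


# Hypothetical mini-thesaurus, for illustration only.
thesaurus: Dict[str, Term] = {
    "databases": Term(nt=["bibliographic databases"],
                      rt=["information retrieval"], uf=["data banks"]),
    "bibliographic databases": Term(bt=["databases"], uf=["reference databases"]),
    "information retrieval": Term(rt=["databases"]),
}


def expand(term: str, include_related: bool = False) -> Set[str]:
    """Return the term, its synonyms and all narrower terms (recursively);
    optionally add the related terms of the starting term."""
    visited: Set[str] = set()
    result: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in visited:
            continue
        visited.add(current)
        result.add(current)
        entry = thesaurus.get(current)
        if entry is None:
            continue
        result.update(entry.uf)   # synonyms
        stack.extend(entry.nt)    # walk down the hierarchy
    if include_related and term in thesaurus:
        result.update(thesaurus[term].rt)
    return result


print(sorted(expand("databases")))
```

Adding RT terms is left optional because related terms broaden a search more loosely than narrower terms and synonyms do, which can cost precision.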

Thesaurus applications

• To find/choose index terms to assign to items, when terms are taken from a controlled vocabulary

• To find more and/or better terms to search a database (to increase recall and precision)

• To find more and/or better terms during writing

• To understand the meaning of a term, by inspecting

» the scope note of the term and/or

» the relations with other terms

***

Database systems: why study this subject briefly?

• To achieve a better understanding of the inner workings of the external information retrieval systems that you use, so that you can exploit these more efficiently

• To be able to evaluate the quality of database systems you are confronted with, so that you can

» make better choices among available systems,

» offer constructive suggestions to the manager,

» ...

***

Database systems: why study this subject in detail?

To acquire the knowledge and skills to create / set up / manage your own local database system on a computer

**-

Database systems: definition

A database (management) system is a program or set of programs providing a means by which a user can easily store and retrieve data in the form of “databases”.

***

Information retrieval software: related terms

• Software for information storage and retrieval (ISR software)

• Text(-oriented) database management systems (Text-DBMS)

• Text information management systems (TIMS)

• Document retrieval systems

• Document management systems

**-

Information retrieval software: applications (Part 1)

• Documentation centres: Documents

• Archives: Archived documents

• Libraries: Books / Documents

• Museums: Objects / Books / ...

• Medical files: Patients’ histories

• Marketing departments: Clients / Potential clients

• Schools: Courses / Teachers

• Bibliographic databases: Publications / ...

**-

Information retrieval software: applications (Part 2)

• Meeting calendars: Meetings = conferences

• Product information: Product descriptions

• Laboratories: Recipes

• Personal documentation: Documents

• Patent office: Patents

• Co-operating information networks: Documents / Persons / Institutes / Events / ...

• ...

**-

Cataloguing: hard copy versus computer-based

• Hard copy

» “Input”, i.e. cataloguing, on cards directly determines the “output”, i.e. the format of the data on the card as presented to the user

» Summarized: INPUT = OUTPUT

• Computer-based

» Input into the database in fields allows later output in various formats for presentation

» Summarized: 1. INPUT, 2. various OUTPUTs

**-
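The principle “1. INPUT, 2. various OUTPUTs” can be illustrated with one record entered once, field by field, and presented in two ways. This is a minimal sketch; the field names, the sample data and both output styles are hypothetical and stand in for whatever formats a real cataloguing program offers.

```python
from typing import Dict, List

Record = Dict[str, List[str]]

# Entered once, field by field (hypothetical field names and data).
record: Record = {
    "author": ["Smith, A.", "Jones, B."],
    "title": ["An introduction to text retrieval"],
    "year": ["1995"],
}


def first(rec: Record, name: str) -> str:
    """First occurrence of a field, or an empty string."""
    return (rec.get(name) or [""])[0]


def card_format(rec: Record) -> str:
    """Catalogue-card style: authors, title and year on separate lines."""
    return "\n".join(["; ".join(rec.get("author", [])),
                      first(rec, "title"),
                      first(rec, "year")])


def citation_format(rec: Record) -> str:
    """One-line citation built from the same fields."""
    authors = " and ".join(rec.get("author", []))
    return f"{authors} ({first(rec, 'year')}). {first(rec, 'title')}."


print(card_format(record))
print(citation_format(record))
```

Because the data are stored in fields rather than as a finished card layout, adding a third output style means writing one more function, not re-cataloguing the record.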

Text-information management systems: characteristics and definition

The information in the database is text-oriented. Therefore, several features are required:

» ability to store relatively long blocks of text

» ability to retrieve items in which specific words or terms occur anywhere

***

Text-information management: from free-form to structure

Free-form text information without structure

Text database with information structured in files, records, fields, sub-fields, with links/relations among records,... (Ideally, each field is repeatable = can be multi-valued = can occur more than once in each record.)

**-

Text-information management: types of software

Software type | Features
Word processing software | Must be learnt anyway. Slow sequential searching.
Free-form or structured text information database software | Additional software to be purchased and learnt. Fast searching via index(es).

***
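The contrast between slow sequential searching and fast searching via index(es) can be shown with the same question answered both ways. This is a minimal sketch, assuming a naive word tokenizer and an in-memory Python dictionary standing in for the index files that real text retrieval programs keep on disk; the records and function names are hypothetical. Both searches return the same record numbers; the difference lies in how much work each query costs.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical records: record number -> text.
records: Dict[int, str] = {
    1: "Text information storage and retrieval",
    2: "Word processing software",
    3: "Retrieval of structured text",
}


def words(text: str) -> List[str]:
    """Lower-cased words of a text (a deliberately simple tokenizer)."""
    return re.findall(r"\w+", text.lower())


def sequential_search(word: str) -> Set[int]:
    """Read every record in turn: the work grows with the size of the file."""
    return {rec_no for rec_no, text in records.items() if word.lower() in words(text)}


def build_index() -> Dict[str, Set[int]]:
    """Invert the file once: word -> set of record numbers."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for rec_no, text in records.items():
        for w in words(text):
            index[w].add(rec_no)
    return dict(index)


index = build_index()


def indexed_search(word: str) -> Set[int]:
    """A single lookup in the index, without scanning the records."""
    return index.get(word.lower(), set())


print(sequential_search("retrieval"), indexed_search("retrieval"))
```

The index has to be built (and kept up to date) before it pays off, which is the price of the fast searching listed in the table.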

Advantages of structured text-retrieval versus X-base systems

Feature | Text-retrieval | X-base systems
Many long fields, forming long records | Yes | No
Repeatable fields | Yes | No
Subfields | Yes | No
Variable field lengths | Yes | No
Fast searching for any word in all fields | Yes | No
Thesaurus to help searching | Yes | No

**-

Hierarchy in the use of a database

[Diagram: Database structure > Input / Editing > Searching / Output]

***

Functions of database management software

• Input / edit using keyboard or batch input

• Indexing of the database(s)

• Browse / Search / Select / Retrieve data from database

• Output (Sort / Display / Print to file / Print to paper)

+

• Export / Import

***

!? Question !? Task !? Problem !?

What advantages does a document management system on computer offer?

***

Advantages of a document system on computer, for the user(s)

Access to information is easier.

Access to information is faster.

Online access is possible even when the centre is closed.

Online access is possible from a distance.

The search module can be integrated with data on loan status.

More elements of the records can serve as search terms.

Combinations of search terms can be used.

Results / selections can be stored as computer files.

***
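Combining search terms usually comes down to set operations on the record numbers that the inverted file returns for each term. A minimal sketch with a hypothetical index; AND, OR and NOT are the generic Boolean operators, and the code does not reproduce the search syntax of any particular program.

```python
from typing import Dict, Set

# Hypothetical inverted file: search term -> record numbers containing it.
index: Dict[str, Set[int]] = {
    "thesaurus": {1, 4, 7},
    "retrieval": {1, 2, 4, 9},
    "cd-rom": {4, 5},
}


def hits(term: str) -> Set[int]:
    """Record numbers for one search term (empty set if the term is unknown)."""
    return index.get(term, set())


both = hits("thesaurus") & hits("retrieval")    # AND -> {1, 4}
either = hits("thesaurus") | hits("cd-rom")     # OR  -> {1, 4, 5, 7}
without = hits("retrieval") - hits("cd-rom")    # NOT -> {1, 2, 9}
print(sorted(both), sorted(either), sorted(without))
```

Only the short lists of record numbers are combined; the records themselves are read only when the final result is displayed.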

The CDS/ISIS text database management program

• Software to create and manage local, in-house databases with primarily structured text as contents (NOT numbers, graphics, sound,...)

• Versions available for

» Mainframes (IBM)

» Minicomputers (Digital VAX)

» Microcomputers (DOS)

**-

Micro-CDS/ISIS: original main menu on the display

*--

CDS/ISIS database definition services: display menu

*--

CDS/ISIS database definition table: display of an example

*--

CDS/ISIS manual data entry, editing / input services: display menu

*--

Batch input / Import

• Is batch input possible?

• Is a format conversion program included or available?

• ...

**-

Activities related to indexing

Activity | Who does it? | Concrete action
Intellectual, human indexing | Database producer / Thesaurus producer | Assign subject terms to records
Develop an automatic indexing method | Database producer / Software features | Making an index method file
Automatic indexing | Computer with program | Making inverted file(s)

**-

Indexes in books and databases: a comparison

Book (printed index):

Index_term_1   page x1, y1, z1,...
Index_term_2   page x2, y2, z2,...
...

Database (invisible index):

Index_term_1   record nr. x1 / field type nr. x1 / field occurrence x1 / position x1
               record nr. y1 / field type nr. y1 / field occurrence y1 / position y1
               ...
Index_term_2   record nr. x2 / field type nr. x2 / field occurrence x2 / position x2
               record nr. y2 / field type nr. y2 / field occurrence y2 / position y2
               ...
...

**-
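The “invisible” database index in the comparison can be sketched as a mapping from each index term to a list of postings, each posting carrying the four coordinates listed there: record number, field type number, field occurrence and position. This is a minimal sketch, assuming upper-cased word indexing and hypothetical records and field tags; it does not describe how CDS/ISIS actually lays out its inverted file on disk.

```python
import re
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Posting(NamedTuple):
    record: int      # record number
    field_tag: int   # field type number
    occurrence: int  # occurrence of a repeatable field, counted from 1
    position: int    # word position within that occurrence, counted from 1


# Hypothetical records: record number -> field tag -> list of occurrences.
Records = Dict[int, Dict[int, List[str]]]
records: Records = {
    1: {10: ["Text retrieval"], 40: ["databases", "text processing"]},
    2: {10: ["Database design"]},
}


def invert(recs: Records) -> Dict[str, List[Posting]]:
    """Build an inverted file: index term -> list of postings."""
    index: Dict[str, List[Posting]] = defaultdict(list)
    for rec_no, fields in sorted(recs.items()):
        for tag, occs in sorted(fields.items()):
            for occ_no, text in enumerate(occs, start=1):
                for pos, word in enumerate(re.findall(r"\w+", text.upper()), start=1):
                    index[word].append(Posting(rec_no, tag, occ_no, pos))
    return dict(index)


index = invert(records)
for posting in index["TEXT"]:
    print(posting)
```

Storing the field tag and position, not just the record number, is what later allows searches restricted to one field or requiring two words to be adjacent.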

Index in a text retrieval system (such as CDS/ISIS)

Terminology: Index = Inverted file = Dictionary

[Diagram: database dictionary on display; database complete inverted file]

**-

Methods of inverted file creation

• Word indexing

» Simple / automatic / no indication required

» Loss of word context

» A field structure is not required

• Phrase indexing

» Indication of phrases during input is required

» Richer than separate words

» A field structure is not required

• Field indexing

» Simple / automatic / no indication required

» Context is better preserved

» A field structure is required

**-
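The three methods can be compared by deriving index keys from the same kind of field content. This is a minimal sketch under several assumptions: phrases are marked at input time with angle brackets `<...>`, the stopword list and the maximum key length are hypothetical values, and keys are upper-cased. None of these choices is presented as the behaviour of a specific program.

```python
import re
from typing import List, Set

STOPWORDS: Set[str] = {"AND", "OF", "THE"}  # hypothetical stopword list
MAX_KEY_LENGTH = 30                          # hypothetical maximum index entry length


def word_keys(value: str) -> List[str]:
    """Word indexing: each word (minus stopwords) becomes a key; word context is lost."""
    return [w for w in re.findall(r"\w+", value.upper()) if w not in STOPWORDS]


def phrase_keys(value: str) -> List[str]:
    """Phrase indexing: only phrases marked at input (here: <...>) become keys."""
    return [p.upper() for p in re.findall(r"<([^>]+)>", value)]


def field_keys(value: str) -> List[str]:
    """Field indexing: the whole field content is one key; context is preserved."""
    return [value.strip().upper()[:MAX_KEY_LENGTH]]


print(word_keys("Storage and retrieval of structured text"))
print(phrase_keys("Storage and retrieval of <structured text>"))
print(field_keys("Smith, Anne"))
```

Field indexing suits short, controlled fields such as author names; word indexing suits titles and abstracts, where any word may be searched.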

CDS/ISIS inverted file services: display menu

*--

Automatic indexing (file inversion)

Possible? Obligatory?

**-

• Word indexing? With proximity indexing?

• Field indexing?

• Sub-field indexing?

• Phrase indexing?

Maximum length of index entry?

List of stopwords available?

Immediately after input or in batch? (Slow down...?)

Indexing speed?

Adding prefixes/tags possible?

Modification of indexing possible?

!? Question !? Task !? Problem !?

Why can the index of a database be so large in comparison with the size of the database?

**-

CDS/ISIS information retrieval services: display menu

*--

CDS/ISIS information retrieval: example of a dictionary on the display

*--

Output from a database to various “devices”

• to video display

• to printer

• to computer file (“printing” to a file)

**-

CDS/ISIS output (sorting and printing) services: display menu

*--

Formatting of data within each record in output

• Independent of output device:

» Determine the sequence of the fields in each record.

» Omit specific fields from each record.

» Add field names or tags to the fields in each record.

» Indicate the search term(s) in each record.

• Dependent on output device:

» Specify character formats in each (sub)field: typeface + size + bold/italic/underline

**-
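The four device-independent operations can be combined in one small formatting function. This is a minimal sketch; the record, its short field tags and the choice of asterisks to mark search terms are hypothetical, and device-dependent character formatting (typeface, size, bold) is deliberately left out.

```python
import re
from typing import Dict, List, Sequence

# Hypothetical record with short field names as tags.
record: Dict[str, str] = {
    "TI": "Thesaurus construction",
    "AU": "Smith, A.",
    "NOTES": "internal remark",
    "DE": "thesauri; vocabulary control",
}


def format_record(rec: Dict[str, str], field_order: Sequence[str],
                  search_terms: Sequence[str] = ()) -> str:
    """Device-independent formatting of one record."""
    lines: List[str] = []
    for tag in field_order:        # sequence of fields; unlisted fields are omitted
        if tag not in rec:
            continue
        text = rec[tag]
        for term in search_terms:  # mark each search term with asterisks
            pattern = rf"\b({re.escape(term)})\b"
            text = re.sub(pattern, r"*\1*", text, flags=re.IGNORECASE)
        lines.append(f"{tag}: {text}")  # add the tag as a field label
    return "\n".join(lines)


print(format_record(record, ["AU", "TI", "DE"], search_terms=["thesauri"]))
```

A device-dependent layer could then translate the neutral markers into bold on a printer or reverse video on a display.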

Sorting / arranging of records in the whole output

• Can the user determine the sequence of the records?

• Which elements can be used as a basis for sorting?

• Can stopwords be omitted as a basis for sorting?

• What is the maximum number of sort levels?

• Can the user choose between ascending and descending order?

• Can duplicate records be eliminated? (If yes: can the user determine what counts as a duplicate?)

• Can output formats (styles) be stored?

**-
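Several of these questions (sort levels, ascending versus descending order, ignoring stopwords, user-defined duplicates) can be answered in a few lines. This is a minimal sketch with hypothetical records and a hypothetical list of leading articles; it relies on Python's `sorted` being stable, so a two-level sort is done by sorting on the secondary key first and the primary key second.

```python
from typing import Any, Dict, List, Sequence, Tuple

Record = Dict[str, Any]

# Hypothetical records; the last one duplicates the second.
records: List[Record] = [
    {"author": "Smith", "title": "The thesaurus", "year": 1994},
    {"author": "Jones", "title": "Indexing", "year": 1996},
    {"author": "Smith", "title": "An index", "year": 1992},
    {"author": "Jones", "title": "Indexing", "year": 1996},
]

LEADING_STOPWORDS: Tuple[str, ...] = ("the ", "a ", "an ")  # hypothetical list


def title_sort_key(title: str) -> str:
    """Ignore a leading article when sorting on the title."""
    t = title.lower()
    for stopword in LEADING_STOPWORDS:
        if t.startswith(stopword):
            return t[len(stopword):]
    return t


def remove_duplicates(recs: List[Record], key_fields: Sequence[str]) -> List[Record]:
    """Keep the first record for each combination of the chosen fields;
    the user decides which fields define a duplicate."""
    seen = set()
    unique = []
    for rec in recs:
        key = tuple(rec[f] for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


unique = remove_duplicates(records, ("author", "title", "year"))

# Two sort levels: secondary key first (year, descending),
# then primary key (author, ascending); stability keeps the year order.
by_author = sorted(unique, key=lambda r: r["year"], reverse=True)
by_author = sorted(by_author, key=lambda r: r["author"])

by_title = sorted(unique, key=lambda r: title_sort_key(r["title"]))

print([r["title"] for r in by_author])
print([r["title"] for r in by_title])
```

More sort levels follow the same pattern: one extra stable pass per level, from the least to the most significant key.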

Thesaurus program module: purpose

• Does the database management program offer a thesaurus module which allows the user to create, modify, store, and delete relations between terms used in the database?

• This is mainly used to establish relations among controlled subject indexing terms.

• If more than one controlled vocabulary is used, these should be managed separately.

**-

Structure of a thesaurus database record (Fields for “good” terms)

• “Good” term

• Controlled vocabulary to which the term belongs (if more than 1 is used in the same database)

• Scope note (= definition of the controlled term)

• Date of creation or modification of the term

• Notes

**-

Structure of a thesaurus database record (Fields for relations)

• BT (= broader term): term(s) with broader meaning

• TT (= top term): term highest in the hierarchy

• NT (= narrower term): term(s) with narrower meaning

• RT (= related term): other term(s) related to this one

• UF (= use for): synonym(s)

**-

Structure of a thesaurus database record (Fields for forbidden terms)

• Forbidden term

• US (= use instead): “good” term in the controlled vocabulary

**-

Structure of a thesaurus database record (Fields for candidate terms)

• Candidate “good” term in the controlled vocabulary

• (Other fields as in the case of “good” terms)

**-

Structure of a multilingual thesaurus database record

Each type of field in a thesaurus record occurs for each language.

**-

Thesaurus program: desirable properties (Part 1)

• Multilingual user interface = menus and messages in more than 1 language

• Multilingual contents = terms in more than 1 language

• When a term in the thesaurus database is added, changed or deleted, the program automatically makes the corresponding changes throughout the whole thesaurus database, wherever that term occurs

• The program checks for, and prevents, the creation of impossible (= forbidden) or undesirable relations

**-
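The two automatic behaviours described above, propagating changes and refusing forbidden relations, can be sketched for the BT/NT hierarchy alone. This is a minimal sketch; the `Thesaurus` class and its method names are hypothetical. Adding a BT relation automatically adds the reciprocal NT relation, renaming a term updates every relation in which it occurs, and a BT relation that would make a term broader than itself (directly or through a chain) is refused.

```python
from collections import defaultdict
from typing import Dict, Set


class Thesaurus:
    """Keeps BT/NT relations reciprocal and refuses relations that would
    create a loop in the hierarchy."""

    def __init__(self) -> None:
        self.bt: Dict[str, Set[str]] = defaultdict(set)  # term -> broader terms
        self.nt: Dict[str, Set[str]] = defaultdict(set)  # term -> narrower terms

    def broader_chain(self, term: str) -> Set[str]:
        """All terms above `term` in the hierarchy."""
        found: Set[str] = set()
        stack = list(self.bt.get(term, ()))
        while stack:
            t = stack.pop()
            if t not in found:
                found.add(t)
                stack.extend(self.bt.get(t, ()))
        return found

    def add_broader(self, term: str, broader: str) -> None:
        """Record `term` BT `broader`; the NT relation is added automatically."""
        if term == broader or term in self.broader_chain(broader):
            raise ValueError(f"forbidden relation: {term} BT {broader}")
        self.bt[term].add(broader)
        self.nt[broader].add(term)

    def rename(self, old: str, new: str) -> None:
        """Change a term everywhere it occurs in the thesaurus."""
        for relation in (self.bt, self.nt):
            if old in relation:
                relation[new] |= relation.pop(old)
            for targets in relation.values():
                if old in targets:
                    targets.discard(old)
                    targets.add(new)


th = Thesaurus()
th.add_broader("cats", "mammals")
th.add_broader("mammals", "animals")
th.rename("cats", "felines")
print(dict(th.nt))
```

RT, UF and USE relations could be handled the same way; they are left out to keep the sketch short.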

Thesaurus program: Thesaurus program: desirable properties (Part 2)desirable properties (Part 2)

Thesaurus program: Thesaurus program: desirable properties (Part 2)desirable properties (Part 2)

• Can the thesaurus contents be formatted and printed, or sent to a file?

• Can more than one thesaurus be managed and linked to the same database?

• Can a thesaurus database be used with more than one primary database?

• Can the program signal the presence of orphan terms (= terms without any relation)?
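A check for orphan terms can be sketched as follows, assuming a hypothetical in-memory thesaurus that maps each term to its relation codes (BT, NT, RT, UF, US, ...) and their target terms. A term counts as an orphan when its own record has no relation and no other record refers to it.

```python
from typing import Dict, List

# Hypothetical in-memory thesaurus: term -> {relation code -> list of related terms}
Thesaurus = Dict[str, Dict[str, List[str]]]


def orphan_terms(thesaurus: Thesaurus) -> List[str]:
    """Return terms without relations of their own that no other term refers to."""
    referenced = {target
                  for relations in thesaurus.values()
                  for targets in relations.values()
                  for target in targets}
    return sorted(term for term, relations in thesaurus.items()
                  if not any(relations.values()) and term not in referenced)
```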

**-

Thesaurus program: integration with input/editing of the primary database

How simply and quickly can the user

» search the thesaurus during manual input/editing (for instance, to use it as an authority list)?

» copy a term from the thesaurus and paste it into a database record?

» copy a term from the database and paste it into the thesaurus?

» ...

**-

Thesaurus program: integration with searching of the primary database

• Can the user browse the thesaurus during a search in the database?

• Can the program automatically formulate a query when the user selects terms in the thesaurus module?

• Does the program make it easy and quick to include synonyms, narrower terms and broader terms in a query?

• ...
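Automatic query formulation from selected thesaurus terms can be sketched as an OR-combination of the selected term with its synonyms (UF), narrower terms (NT) and broader terms (BT). This is a minimal sketch over a hypothetical in-memory thesaurus; the default `" + "` reflects our understanding that CDS/ISIS uses `+` as its OR operator (check the manual of your version), and quoting rules for multi-word terms are not handled.

```python
from typing import Dict, List, Sequence

# Hypothetical in-memory thesaurus: term -> {relation code -> list of related terms}
Thesaurus = Dict[str, Dict[str, List[str]]]


def expand_query(term: str, thesaurus: Thesaurus,
                 include: Sequence[str] = ("UF", "NT", "BT"),
                 or_operator: str = " + ") -> str:
    """Combine a selected term with its related terms using the OR operator."""
    terms = [term]
    relations = thesaurus.get(term, {})
    for code in include:
        for related in relations.get(code, []):
            if related not in terms:  # avoid repeating a term in the query
                terms.append(related)
    return or_operator.join(terms)
```

For example, with `{"libraries": {"UF": ["library"], "NT": ["public libraries"], "BT": ["information services"]}}`, `expand_query("libraries", ...)` returns `"libraries + library + public libraries + information services"`. Passing `include=("UF",)` restricts the expansion to synonyms.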

**-

Automatic creation, deletion or adaptation of the reciprocal relation

Does a change by the user of a relation in one record make the thesaurus program automatically change the reciprocal relation in the corresponding record of the thesaurus database? Examples:

» a change of BT changes NT in the corresponding record

» a change of NT changes BT in the corresponding record

» a change of RT changes RT in the corresponding record

» a change of UF changes US in the corresponding record

» a change of US changes UF in the corresponding record
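The reciprocal pairs listed above can be maintained as sketched below. This assumes a hypothetical in-memory thesaurus (term -> relation code -> related terms); a change is modelled as removal of the old relation followed by creation of the new one, and relations to terms that do not exist are refused.

```python
from typing import Dict, List

# Hypothetical in-memory thesaurus: term -> {relation code -> list of related terms}
Thesaurus = Dict[str, Dict[str, List[str]]]

# Reciprocal of each relation code, as listed on the slide
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT", "UF": "US", "US": "UF"}


def _link(thesaurus: Thesaurus, term: str, code: str, other: str) -> None:
    targets = thesaurus[term].setdefault(code, [])
    if other not in targets:
        targets.append(other)


def _unlink(thesaurus: Thesaurus, term: str, code: str, other: str) -> None:
    targets = thesaurus.get(term, {}).get(code, [])
    if other in targets:
        targets.remove(other)


def add_relation(thesaurus: Thesaurus, source: str, code: str, target: str) -> None:
    """Record 'source code target' and the reciprocal relation in the target's record."""
    for term in (source, target):
        if term not in thesaurus:
            raise KeyError(f"term not in thesaurus: {term}")  # refuse incomplete relations
    _link(thesaurus, source, code, target)
    _link(thesaurus, target, RECIPROCAL[code], source)


def remove_relation(thesaurus: Thesaurus, source: str, code: str, target: str) -> None:
    """Delete a relation together with its reciprocal."""
    _unlink(thesaurus, source, code, target)
    _unlink(thesaurus, target, RECIPROCAL[code], source)


def change_relation(thesaurus: Thesaurus, source: str, code: str,
                    old_target: str, new_target: str) -> None:
    """Replace 'source code old_target' by 'source code new_target' on both sides."""
    if new_target not in thesaurus:
        raise KeyError(f"term not in thesaurus: {new_target}")  # check before changing anything
    remove_relation(thesaurus, source, code, old_target)
    add_relation(thesaurus, source, code, new_target)
```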

**-

Automatic control of the creation of impossible or undesirable relations

Does the thesaurus program prevent the creation of impossible or undesirable relations, or does it warn the user? Examples of such relations:

» circular hierarchy (a NT b, b NT c, c NT a, or longer)

» circular synonym relation (a UF b, b UF a)

» iterative synonym relations (a US b, b US c, or longer)

» incomplete relations (a RT b, while b does not exist)

» term related to itself (for instance: a NT a)

» ...
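The checks listed above can be sketched as a validation pass over a hypothetical in-memory thesaurus (term -> relation code -> related terms). Circular hierarchies are found by a depth-first search over NT links: a link back to a term still on the current path closes a cycle. The message texts are illustrative.

```python
from typing import Dict, List

# Hypothetical in-memory thesaurus: term -> {relation code -> list of related terms}
Thesaurus = Dict[str, Dict[str, List[str]]]


def _nt_cycles(thesaurus: Thesaurus) -> List[List[str]]:
    """Depth-first search over NT links; returns each cycle found as a list of terms."""
    state = {term: "new" for term in thesaurus}
    cycles: List[List[str]] = []

    def visit(term: str, path: List[str]) -> None:
        state[term] = "on_path"
        path.append(term)
        for narrower in thesaurus[term].get("NT", []):
            if narrower == term or narrower not in thesaurus:
                continue  # self and incomplete relations are reported separately
            if state[narrower] == "on_path":
                cycles.append(path[path.index(narrower):] + [narrower])
            elif state[narrower] == "new":
                visit(narrower, path)
        path.pop()
        state[term] = "done"

    for term in thesaurus:
        if state[term] == "new":
            visit(term, [])
    return cycles


def find_problems(thesaurus: Thesaurus) -> List[str]:
    problems = []
    for term, relations in thesaurus.items():
        for code, targets in relations.items():
            for target in targets:
                if target == term:
                    problems.append(f"term related to itself: {term} {code} {term}")
                elif target not in thesaurus:
                    problems.append(f"incomplete relation: {term} {code} {target}")
        for target in relations.get("US", []):
            further = thesaurus.get(target, {}).get("US", [])
            if target != term and further:  # a US b while b itself has US
                problems.append(f"iterative synonym relation: {term} US {target} US {', '.join(further)}")
        for target in relations.get("UF", []):
            # report each circular pair once (from the alphabetically first term)
            if term < target and term in thesaurus.get(target, {}).get("UF", []):
                problems.append(f"circular synonym relation: {term} UF {target}, {target} UF {term}")
    for cycle in _nt_cycles(thesaurus):
        problems.append("circular hierarchy: " + " NT ".join(cycle))
    return problems
```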

**-

Trilingual thesaurus program module for CDS/ISIS: properties

• It is an additional program written in the CDS/ISIS Pascal language

• Usage is free of charge, as in the case of CDS/ISIS

• Thesaurus database management is based on CDS/ISIS

• Like CDS/ISIS itself, the thesaurus program offers a user interface in English, French and Spanish

• The contents of a thesaurus database are trilingual: each term is given in English, French and Spanish (each language can be replaced by another one)

*--

Trilingual thesaurus program for CDS/ISIS: the relations among terms

• The available relations are: US, UF, NT, BT, TT, RT

• Unlimited number of occurrences of each type of relation in each record

• After a change of a relation, the program automatically adapts the reciprocal relation in the corresponding thesaurus term records

*--

Trilingual thesaurus program for CDS/ISIS: control of relations

The program prevents the creation of some impossible or undesirable relations:

» circular synonym relation (a UF b, b UF a)

» iterative synonym relations (a US b, b US c, or longer)

» incomplete relations (a RT b, while b does not exist)

*--

Trilingual thesaurus for CDS/ISIS: integration with searching

• The user can browse the thesaurus during a search in the primary database.

• The program automatically formulates a query in the primary database when the user selects terms in the thesaurus module.

• The program makes it easy and quick to include synonyms, narrower terms and broader terms in a query.

• The thesaurus database can be used for searching with more than one primary database.

*--

Trilingual thesaurus program module for CDS/ISIS: further properties

• Each record describing a term contains a field for a scope note.

• A field for the date of term creation is present.

• Several printout formats are included.

*--

How to obtain the trilingual thesaurus program for CDS/ISIS?

• From the national distributor in your country

• From UNESCO Headquarters, General Information Programme, 1 rue Miollis, Paris, France

• ...

*--

Trilingual thesaurus program module for CDS/ISIS: conclusions

- Negative: not well integrated with the input/editing module of CDS/ISIS

+ Positive: exceptionally interesting price/quality ratio

*--

Security / privacy / protection of databases

• Password for searching specific database(s) and/or fields and/or records

• Password for editing specific database(s) and/or fields and/or records

• Password for changing

» the database structure

» input and modification worksheets

» sort and print formats of data in records

» sort and print formats of records in a selection
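Passwords scoped to an action, a database and (optionally) single fields can be sketched as below. This is a generic illustration, not how CDS/ISIS stores its passwords: the account table, user names, database name `CATALOG` and field tag `24` are hypothetical, and `*` stands for "all fields". Passwords are kept as salted PBKDF2 hashes using Python's `hashlib`.

```python
import hashlib
import hmac
import os
from typing import Dict, Set, Tuple

Grant = Tuple[str, str, str]  # (action, database, field tag or "*")


def hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def make_account(password: str, grants: Set[Grant]) -> dict:
    salt = os.urandom(16)
    return {"salt": salt, "hash": hash_password(password, salt), "grants": set(grants)}


# Hypothetical accounts: a reader may search everything, an editor may also edit field 24
ACCOUNTS: Dict[str, dict] = {
    "reader": make_account("r3ad", {("search", "CATALOG", "*")}),
    "editor": make_account("3dit", {("search", "CATALOG", "*"), ("edit", "CATALOG", "24")}),
}


def is_allowed(user: str, password: str, action: str, database: str, field: str) -> bool:
    """Check the password, then whether the account grants this action on this field."""
    account = ACCOUNTS.get(user)
    if account is None:
        return False
    if not hmac.compare_digest(hash_password(password, account["salt"]), account["hash"]):
        return False
    grants = account["grants"]
    return (action, database, field) in grants or (action, database, "*") in grants
```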

**-

Security / privacy / protection provided by DOS

DOS can make files

» read-only

» hidden

*--

Security / privacy / protection in CDS/ISIS

• The SYSPAR.PAR file (entry 0) asks for a password, which can limit access to a particular

» database

» set of worksheets

» set of menus

» set of additional CDS/ISIS programs

• Using the read-only version, named ISISCD.EXE, prevents modifications.

• Menus can be changed or removed to prevent access.

*--

Passwords and usage tracking

• Does the use of passwords linked to users or user groups allow usage tracking by a systems manager?

“Usage” = for instance, the number and types of search and/or edit actions.

• This can be useful for studies and system management.

**-

Data export in the case of CDS/ISIS

[Diagram: the contents and structure of a CDS/ISIS database can be passed on by copying all database files, by “export” of data, or by “printing” data to a file; the recipients shown are another CDS/ISIS user with the same database structure, another CDS/ISIS user without the database, and another database management system]

*--

Manual versus batch import of data in a database

[Diagram: information items enter a database either by manual input or by batch input]

**-

Conversion and batch input in the case of a CDS/ISIS database

File with database records in ASCII with field tags

→ Fangorn program + conversion specification file

File with records in the format of the CDS/ISIS database

→ Import module in CDS/ISIS

Records in the CDS/ISIS database
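The conversion step above can be sketched as follows. This is not Fangorn: its specification syntax is not shown on these slides. The sketch assumes a hypothetical input layout (one `TAG value` line per field, records separated by blank lines) and models the conversion specification as a table mapping input tags to CDS/ISIS field tags; writing the result in the file format read by the CDS/ISIS import module is a separate step.

```python
from typing import Dict, List, Tuple

TaggedRecord = List[Tuple[str, str]]


def parse_tagged_ascii(text: str) -> List[TaggedRecord]:
    """Split text into records (blank-line separated) of (tag, value) pairs."""
    records: List[TaggedRecord] = []
    current: TaggedRecord = []
    for line in text.splitlines():
        if not line.strip():
            if current:
                records.append(current)
                current = []
            continue
        tag, _, value = line.partition(" ")
        current.append((tag, value.strip()))
    if current:
        records.append(current)
    return records


def convert(records: List[TaggedRecord], spec: Dict[str, int]) -> List[Dict[int, List[str]]]:
    """Map input tags to CDS/ISIS field tags; repeated tags become repeated occurrences."""
    converted = []
    for record in records:
        out: Dict[int, List[str]] = {}
        for tag, value in record:
            target = spec.get(tag)
            if target is not None:  # tags absent from the specification are dropped
                out.setdefault(target, []).append(value)
        converted.append(out)
    return converted
```

For example, the specification `{"TI": 24, "AU": 70}` (hypothetical tag numbers) turns the lines `TI Text retrieval`, `AU Smith, J.`, `AU Jones, K.` into `{24: ["Text retrieval"], 70: ["Smith, J.", "Jones, K."]}`.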

*--

Format conversion program Fangorn

• Authors: Besemer and Nieuwenhuysen

• Available via anonymous ftp from

» PCWS1.SCI.SNS.IT

» ftp.vub.ac.be in the directory \pub\projects\Docinfo\paul\cursus\isis\

» …

*--

Specification of a format conversion in the case of Fangorn for CDS/ISIS

*--

!? Question !? Task !? Problem !?

Which software packages for storage and retrieval of structured text do YOU know?

**-

Microcomputer software packages for structured text retrieval: examples

**-Examples

• askSam

• Bib-Search

• CAIRS

• Cardbox-Plus

• CDS/ISIS

• Headfast

• IdeaList

• Inmagic

• Notes (Lotus / IBM)

• Personal Librarian

• Pro-Cite

• Reference Manager

• Strix

• STATUS

• Topic (Verity)

• ...

!? Question !? Task !? Problem !?

How can you use a word processing program together with a text retrieval system?

**-

Word processing program to assist a retrieval program

• To polish text data before importing it into the database managed by the retrieval program

• To preview output intended for the printer before actually printing

• To accept output from the retrieval program for further and better formatting, followed by printing

**-

!? Question !? Task !? Problem !?

Which benefits does a field structure offer to databases?

**-

Field structure in records: benefits concerning input

• The indication of fields in input worksheets guides the input.

• Default values can be assigned to fields, which can avoid errors and make input faster.

• The existence of fields allows control of the content format of each specific field during input.

• ...
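Default values and per-field format control during input can be sketched as follows. The worksheet definition (field names, defaults, patterns, required flags) is hypothetical; each field that is left empty receives its default, and each value is checked against the pattern for its field.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical worksheet: field -> (default value, pattern the content must match, required?)
WORKSHEET: Dict[str, Tuple[str, str, bool]] = {
    "title": ("", r".+", True),
    "language": ("EN", r"[A-Z]{2}", False),  # default saves typing the most common value
    "date": ("", r"\d{8}", False),           # assumed format YYYYMMDD
}


def check_input(entered: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Apply default values and per-field format control to one record typed in a worksheet."""
    record: Dict[str, str] = {}
    errors: List[str] = []
    for name, (default, pattern, required) in WORKSHEET.items():
        value = entered.get(name, "").strip() or default
        if not value:
            if required:
                errors.append(f"{name}: value required")
            continue
        if not re.fullmatch(pattern, value):
            errors.append(f"{name}: {value!r} has the wrong format")
            continue
        record[name] = value
    return record, errors
```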

**-

Field structure in records: benefits concerning searching

• The user can limit a search to specific fields.

• The field type adds information to the contents.

• Field indexing keeps data together in the index.

• ...
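Limiting a search to one field can be sketched with records held as hypothetical dictionaries of field occurrences. The sketch scans records linearly and returns 1-based record numbers (in the spirit of CDS/ISIS master file numbers); a real retrieval program would consult an inverted file instead.

```python
from typing import Dict, List, Optional

Record = Dict[str, List[str]]  # hypothetical: field name -> occurrences


def search(records: List[Record], word: str, field: Optional[str] = None) -> List[int]:
    """Return the 1-based numbers of records whose (selected) fields contain `word`."""
    word = word.lower()
    hits = []
    for number, record in enumerate(records, start=1):
        names = [field] if field is not None else list(record)
        if any(word in value.lower().split()
               for name in names
               for value in record.get(name, [])):
            hits.append(number)
    return hits
```

With one record titled "Paris libraries" and another published by "Paris press", searching for `paris` finds both, while `field="title"` finds only the first.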

**-

Field structure in records: benefits concerning output

• The field structure makes output easier to understand.

• In output, each field can be indicated with a tag/prefix.

• Records can be sorted based on the contents of a field.

• In output, the fields can be sorted within each record.

• In output, some fields can be omitted.

• ...
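The output benefits listed above (sorting records on a field, choosing the field order, prefixing each field, omitting fields) can be sketched in one small formatting function. The record layout, field names and prefixes are hypothetical.

```python
from typing import Dict, List

Record = Dict[str, List[str]]  # hypothetical: field name -> occurrences


def format_records(records: List[Record], sort_field: str,
                   field_order: List[str], labels: Dict[str, str]) -> str:
    """Sort records on one field and print the chosen fields, each with a prefix."""
    def key(record: Record) -> str:
        return (record.get(sort_field) or [""])[0].lower()

    lines = []
    for record in sorted(records, key=key):
        for name in field_order:  # fields not listed here are omitted
            for value in record.get(name, []):
                lines.append(f"{labels.get(name, name)}: {value}")
        lines.append("")  # blank line between records
    return "\n".join(lines)
```

For two records titled "Thesauri" and "Indexing", `format_records(records, "title", ["author", "title"], {"author": "AU", "title": "TI"})` prints the "Indexing" record first, shows the author before the title, and leaves out any field not listed.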

**-

!? Question !? Task !? Problem !?

Besides all the benefits offered by a field structure in a database, which problems does it cause?

**-

Field structure in records: problems (Part 1)

• In the short term, it is more expensive and time-consuming than handling less structured data.

• Initially, the database manager who wants to create a new database has to decide:

» which fields to create to subdivide the database records,

» which field tags or names to use for the internal housekeeping of the database by the chosen database management software package.

**-

Field structure in records: problems (Part 2)

• The exchange of data, i.e. importing into one database data that have been exported from another database, is hindered when the database structures are not identical or compatible.

• ...

**-

Exchange formats and standards for text database systems

• Usage and aims:

» to allow efficient exchange of information among databases without loss of structural information

» to guide database managers in the creation of a database structure (records divided into fields and subfields)

• Examples (MARC = Machine-Readable Cataloging):

» LC-MARC (= Library of Congress MARC); UNIMARC

» Common Communication Format (of UNESCO)

» SGML

***

Common Communication Format (CCF): description

• Developed by the Unesco General Information Programme for international application

• Includes a system of numeric tags indicating

» the location of fields and subfields in the records

» the meaning of the fields and subfields

**-

Common Communication Format (CCF): availability

Published and made available free of charge by the Unesco General Information Programme:

» Printed manuals

» Printed implementation notes

» Example CDS/ISIS database structured according to the Common Communication Format

**-

Exchange of data among systems: requirements

• Subject thesaurus (relation structure + contents)

• Subject classification scheme + level of usage

• Contents of fields (and subfields) in the records (for bibliographic databases: cataloguing input rules)

• Database structure: records, fields, subfields, ... as seen by the database manager

• Version of the program for database management

• Type of program for database management

• Alphabet used for the data

**-

Compatibility among databases: an example

• Library of Congress Subject Headings (LCSH) (a thesaurus)

• Universal Decimal Classification (UDC)

• Anglo-American Cataloguing Rules (AACR)

• Common Communication Format (CCF)

• Version 3.0

• CDS/ISIS program

• Extension of ASCII by IBM

ISO standard for record storage!
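The ISO standard for record storage mentioned above is presumably ISO 2709, the record structure shared by MARC formats and CCF and used by CDS/ISIS for data exchange. A minimal sketch of writing and reading one such record: a 24-character leader, a directory of 12-character entries (3-digit tag, 4-digit field length, 5-digit start position), and data fields each ending in a field terminator, with a record terminator at the end. The status and implementation-code positions of the leader are filled with placeholder values (an assumption; a real exchange format defines them), and the default encoding `cp437` assumes the IBM extension of ASCII listed on the slide.

```python
from typing import List, Tuple, Union

FIELD_TERMINATOR = b"\x1e"   # IS2
RECORD_TERMINATOR = b"\x1d"  # IS3


def write_iso2709(fields: List[Tuple[Union[int, str], str]], encoding: str = "cp437") -> bytes:
    """Build one ISO 2709 record: leader, directory, then the data fields."""
    directory, data = b"", b""
    for tag, value in fields:
        body = value.encode(encoding) + FIELD_TERMINATOR
        if not 0 <= int(tag) <= 999 or len(body) > 9999:
            raise ValueError(f"field {tag} does not fit a 3-4-5 directory entry")
        # entry = tag (3 digits) + field length (4) + start position within the data (5)
        directory += f"{int(tag):03d}{len(body):04d}{len(data):05d}".encode("ascii")
        data += body
    directory += FIELD_TERMINATOR
    base_address = 24 + len(directory)  # where the data fields start
    record_length = base_address + len(data) + len(RECORD_TERMINATOR)
    if record_length > 99999:
        raise ValueError("record too long for a 5-digit record length")
    # length(5) status(1) implementation codes(4) indicator/identifier lengths(2)
    # base address(5) user positions(3) entry map(4) = 24 characters; placeholders assumed
    leader = f"{record_length:05d}n    00{base_address:05d}   4500".encode("ascii")
    return leader + directory + data + RECORD_TERMINATOR


def read_iso2709(record: bytes, encoding: str = "cp437") -> List[Tuple[str, str]]:
    """Decode the (tag, value) pairs of one ISO 2709 record."""
    base_address = int(record[12:17].decode("ascii"))
    directory = record[24:base_address - 1]  # drop the directory terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        raw = record[base_address + start:base_address + start + length - 1]  # drop terminator
        fields.append((tag, raw.decode(encoding)))
    return fields
```

For `[(24, "Thesauri"), ("70", "Smith, J.")]` the record is 69 bytes long, its data starts at byte 49, and its directory reads `024000900000070001000009`.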

**-Example