Thesauri usage in information retrieval systems: example of LISTA
and ERIC database thesaurus
Kristina Feldvari
Department of Information Sciences, Faculty of Philosophy in Osijek
Lorenza Jägera 9, Osijek, Croatia
How many of you use a thesaurus to refine your retrieval?
Content
• Why thesaurus?
• Thesaurus and IR systems
• User comprehension of thesaurus
• Comparison of the ERIC and LISTA database thesauri:
1) Which functions do thesauri support?
2) How are thesauri displayed in databases?
• Conclusion
Why thesaurus?
• Lack of quality control: the main problem
• A thesaurus is a vocabulary of key words, i.e., a standardized set of terms and phrases authorized for use in an indexing system to describe a subject area or information domain (Librarian Lexicon, 1984).
Thesaurus and IR systems
• limits and controls the diversity of natural language by specifying the expression that should be used for each concept (D. Bawden, 2001)
• guides indexing and retrieval based on controlled as well as natural language indexing (M. Lykke Nielsen, 2004)
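As a minimal sketch of how a controlled vocabulary narrows natural-language variety, the snippet below maps entry terms to a single preferred descriptor before searching. The `USE` table and its terms are hypothetical examples, not taken from ERIC or LISTA.

```python
# Hypothetical USE references: non-preferred entry term -> preferred descriptor.
USE = {
    "teenagers": "adolescents",
    "youth": "adolescents",
    "e-books": "electronic books",
}


def to_preferred(term):
    """Return the preferred descriptor for a term, or the term itself if none is recorded."""
    key = term.strip().lower()
    return USE.get(key, key)


def normalize_query(terms):
    """Map every query term to its descriptor and drop duplicates, keeping order."""
    seen = []
    for term in terms:
        preferred = to_preferred(term)
        if preferred not in seen:
            seen.append(preferred)
    return seen
```

Here `normalize_query(["Teenagers", "youth", "libraries"])` collapses the two synonyms into one descriptor, so indexer and searcher end up using the same expression.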
Applications of thesaurus in storage and retrieval
1) To serve as a term authority for indexers, so that only "acceptable" terms are employed by indexers.
2) To enable indexers quickly to find the "right" term to signify a concept in mind, "right" in the sense that the term must not only connote the proper concept but also must be appropriately specific (or general) with respect to the information being indexed.
3) To serve as a means of validating the results of the indexing effort, from the viewpoint of correctness of spelling, to ensure that non-preferred synonyms are not employed by indexers, and to "flag" any terms newly required by the system.
4) To enable the addition of cross-references between terms in any publication and to validate such cross-references to guarantee against circularity and "blindness".
5) To enable appropriate formulation of queries put to either printed or computerized indexes.
6) To provide a starting point for other systems which require a vocabulary significantly similar to the one encompassed by the thesaurus at hand.
7) To encourage consistent use of terminology by authors, abstractors, and other originators of information.
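Point 4 above, validating cross-references against circularity and blindness, can be sketched roughly as follows. This is an illustration under assumptions: the function names and the dictionary layout are hypothetical, a "blind" reference is taken to mean a USE reference whose target is not an authorized descriptor, and circularity is checked along Broader Term (BT) links only.

```python
def find_blind_references(use_refs, descriptors):
    """Return USE references whose target is not an authorized descriptor."""
    return {src: dst for src, dst in use_refs.items() if dst not in descriptors}


def find_circular_terms(broader):
    """Return the terms that can reach themselves by following BT links."""
    circular = set()
    for start in broader:
        stack = list(broader.get(start, []))
        visited = set()
        while stack:
            term = stack.pop()
            if term == start:
                # Following broader terms led back to the starting term.
                circular.add(start)
                break
            if term in visited:
                continue
            visited.add(term)
            stack.extend(broader.get(term, []))
    return circular
```

A thesaurus editor could run both checks after every update and review whatever they return before publishing the new version.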
User comprehension of thesaurus
Three main questions:
1. Thesaurus interface design
2. Processing options
3. End-user warrant
ERIC database thesaurus: main features
• Alphabetical listing of terms
• Browsing the Thesaurus by 41 categories
• Boolean operators (basic and advanced search) to refine retrieval and possibility of truncation
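A rough sketch of Boolean operators combined with truncation, assuming a toy inverted index and `*` as the truncation symbol. The `INDEX` contents and the `lookup` helper are hypothetical and do not reproduce the actual ERIC search syntax.

```python
# Hypothetical inverted index: descriptor -> set of record ids.
INDEX = {
    "libraries": {1, 2, 3},
    "librarians": {2, 4},
    "thesauri": {3, 4, 5},
    "indexing": {5},
}


def lookup(pattern):
    """Return record ids for a term; a trailing '*' matches any ending."""
    p = pattern.lower()
    if p.endswith("*"):
        stem = p[:-1]
        hits = [ids for term, ids in INDEX.items() if term.startswith(stem)]
    else:
        hits = [INDEX.get(p, set())]
    return set().union(*hits)


# Boolean operators map onto set operations:
both = lookup("librar*") & lookup("thesauri")      # AND
either = lookup("libraries") | lookup("indexing")  # OR
without = lookup("thesauri") - lookup("indexing")  # NOT
```

Truncation widens the search (`librar*` matches both "libraries" and "librarians"), while AND and NOT narrow it again.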
ERIC database thesaurus: main features
• Seven types of cross-references are used: Scope Note (SN), Use For (UF) and Use (USE) references, Narrower Terms (NT), Broader Terms (BT), Related Terms (RT) and Parenthetical Qualifiers
• Also contains: Record Type; Category; Use Term; Add Date and Posting
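One way to picture a thesaurus entry with these cross-references is as a small record type. The sketch below is an assumption about how such a record could be modelled, not the actual ERIC data format; the class name, field names and sample entry are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TermRecord:
    term: str
    scope_note: str = ""                           # SN
    used_for: list = field(default_factory=list)   # UF
    use: str = ""                                  # USE (set only on non-preferred terms)
    narrower: list = field(default_factory=list)   # NT
    broader: list = field(default_factory=list)    # BT
    related: list = field(default_factory=list)    # RT
    qualifier: str = ""                            # parenthetical qualifier

    def display(self):
        """Render the record in a conventional tagged layout."""
        label = f"{self.term} ({self.qualifier})" if self.qualifier else self.term
        lines = [label]
        if self.scope_note:
            lines.append(f"  SN  {self.scope_note}")
        if self.use:
            lines.append(f"  USE {self.use}")
        groups = (("UF", self.used_for), ("BT", self.broader),
                  ("NT", self.narrower), ("RT", self.related))
        for tag, values in groups:
            for value in values:
                lines.append(f"  {tag}  {value}")
        return "\n".join(lines)


record = TermRecord(term="Mercury", qualifier="Planet", broader=["Planets"])
print(record.display())
```

The parenthetical qualifier is what separates homographs: a second record such as "Mercury (Metal)" can coexist with the one above without ambiguity.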
LISTA database thesaurus: main features
• 3 types of displays: Term begins with, Term contains and Relevancy ranked display
• Six types of cross-references are used: Scope Note (SN), Use For (UF) and Use (USE) references, Narrower Terms (NT), Broader Terms (BT), and Related Terms (RT)
• Boolean operators to refine retrieval and truncation
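The three display types can be illustrated with a short sketch. The first two are plain string matches. The ranking rule in `relevancy_ranked` (exact match first, then prefix match, then any other match) is only an assumption for illustration, since the ranking actually used by LISTA is not described here; the term list is hypothetical.

```python
def begins_with(terms, query):
    """'Term begins with' display: alphabetical list of terms starting with the query."""
    q = query.lower()
    return sorted(t for t in terms if t.lower().startswith(q))


def contains(terms, query):
    """'Term contains' display: alphabetical list of terms containing the query."""
    q = query.lower()
    return sorted(t for t in terms if q in t.lower())


def relevancy_ranked(terms, query):
    """Assumed ranking: exact matches, then prefix matches, then other matches."""
    q = query.lower()

    def score(term):
        s = term.lower()
        if s == q:
            return 0
        if s.startswith(q):
            return 1
        return 2

    # sorted() is stable, so ties keep the alphabetical order from contains().
    return sorted(contains(terms, query), key=score)


TERMS = ["Digital libraries", "Libraries", "Library science", "Public libraries"]
```

With this list, a "begins with" search for "librar" returns only "Libraries" and "Library science", while the ranked display for "libraries" puts the exact match "Libraries" ahead of the compound terms.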
ERIC database thesaurus: advantages and deficiencies
Advantages:
• Basic thesaurus introduction
• Search tips
• Help topics and tutorials, "user feedback"
• Thesaurus updates list
• Category searchable
• Containing: record type, use term, added date and posting
Deficiencies:
• Lack of relevance rating
• Some terminology not well explained (e.g. "n/a")
LISTA database thesaurus: advantages and deficiencies
Advantages:
1. Cross-reference Use (USE) appears in all displays
2. Possibility of "exploding" the term
3. Relevance rating
Deficiencies:
1. Lack of thesaurus introduction, search tips and tutorials
2. Lack of terminology explanation
3. Lack of homograph identification
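"Exploding" a term, listed among the advantages above, is commonly understood as searching the term together with its narrower terms. A minimal sketch under that assumption follows; the `explode` function and the sample hierarchy are hypothetical and do not reproduce LISTA's implementation.

```python
def explode(term, narrower):
    """Return the term followed by all of its narrower terms, at every level."""
    result = []
    stack = [term]
    while stack:
        current = stack.pop()
        if current in result:
            continue  # guards against repeated or circular NT links
        result.append(current)
        # reversed() keeps siblings in their listed order when popped.
        stack.extend(reversed(narrower.get(current, [])))
    return result


# Hypothetical NT hierarchy.
NARROWER = {
    "libraries": ["academic libraries", "public libraries"],
    "public libraries": ["branch libraries"],
}
```

Calling `explode("libraries", NARROWER)` yields the full list of terms to OR together, so a searcher does not need to know every narrower descriptor in advance.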
Conclusion
• Information sources are growing enormously, so there is a need for more effective information retrieval
• Boolean operators and keyword searching in retrieval are not enough because of linguistic problems that can occur
• A thesaurus copes with these problems, which makes it a vital retrieval tool in databases
• The main problem: users' limited comprehension of thesauri
Thank you for your attention!