Cape Town, June 2006
PART 1 : What is a thesaurus? Concept and samples
Christine Laaboudi-Spoiden
Publications Office of the European Communities
EUR-LEX Unit – Documentary section
EUR-Lex – Searching information
EUR-LEX http://eur-lex.europa.eu/en/index.htm
Direct free access to European Union law
• the treaties, legislation, case-law and legislative proposals
– Official Journal of the European Union
– Official Journal L – Legislation
– Official Journal C – Information and notices
– Official Journal – Special editions
– European Court Reports
– Documents of the institutions
– Consolidated texts
EUR-Lex : Searching information
COMPUTER CRIME
• Title and text: computer crime (40 hits)
COMPUTER RELATED CRIME
• Title and text: computer crime (58 hits)
CYBERCRIME
• Title and text: cybercrime (55 hits)
CYBER CRIME
• Title and text: cyber crime (48 hits)
COMPUTER CRIME, CYBERCRIME (Boolean - OR)
• Title and text: computer crime, cybercrime (129 hits)
USE OF SYNONYMS OR EQUIVALENT TERMS
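The effect of combining synonyms with a Boolean OR can be sketched in a few lines of Python. This is only an illustration: the four document titles are invented, and `search` / `search_any` are hypothetical helper names, not EUR-Lex functions.

```python
# Toy corpus: the titles are invented for illustration only.
DOCUMENTS = {
    1: "Council decision on attacks against information systems and computer crime",
    2: "Communication on fighting cybercrime",
    3: "Report on cyber crime and network security",
    4: "Regulation on wine labelling",
}


def search(term: str) -> set[int]:
    """Return the ids of documents whose text contains the term (case-insensitive)."""
    needle = term.lower()
    return {doc_id for doc_id, text in DOCUMENTS.items() if needle in text.lower()}


def search_any(terms: list[str]) -> set[int]:
    """Boolean OR: the union of the hits for each individual term."""
    hits: set[int] = set()
    for term in terms:
        hits |= search(term)
    return hits


single = search("computer crime")
combined = search_any(["computer crime", "cybercrime", "cyber crime"])
print(sorted(single))    # one spelling only
print(sorted(combined))  # all spellings combined
```

Each spelling finds a different subset of the documents; only the OR query retrieves all of them, which is why the searcher has to think of every synonym when no controlled vocabulary is available.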
EUR-Lex sample – Bibliographic Notice
INDEXING TERMS or DESCRIPTORS (TERMES D’INDEXATION ou DESCRIPTEURS)
EUROVOC DESCRIPTORS
INDEXING TERMS
PREFERRED TERMS
CLASSIFICATION SCHEME
SUBJECT HEADINGS
Indexing process
Indexing = identify the concepts represented in a document
EUROVOC descriptors : information society, computer crime, personal data, electronic mail, confidentiality
For information retrieval (information request)
Title and text: computer crime, cybercrime (129 hits)
Content indexing = a single process, done once
Searching = start again if the results are not relevant to the question.
Search Results
Relevance = relationship between a document and a request.
– The document is relevant to the topic
– It replies to the user’s request
Pertinence = relationship between a document and an information need.
• Relevant and useful for a user
• Relevant but the user doesn’t find it useful (language, level of comprehensibility, type)
Irrelevant results retrieved = NOISE
Relevant results not retrieved = SILENCE
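Noise and silence can be expressed as simple set differences between what was retrieved and what is relevant. The sketch below uses made-up document ids; precision and recall are the standard retrieval measures and are added here for illustration, they do not appear on the slide.

```python
def noise_and_silence(retrieved: set[str], relevant: set[str]) -> tuple[set[str], set[str]]:
    """Noise = retrieved but irrelevant; silence = relevant but not retrieved."""
    noise = retrieved - relevant
    silence = relevant - retrieved
    return noise, silence


# Hypothetical search outcome.
retrieved = {"doc1", "doc2", "doc3", "doc4"}
relevant = {"doc2", "doc3", "doc5"}

noise, silence = noise_and_silence(retrieved, relevant)
good_hits = retrieved & relevant

precision = len(good_hits) / len(retrieved)  # share of the results that are relevant
recall = len(good_hits) / len(relevant)      # share of the relevant documents that were found

print(sorted(noise), sorted(silence), precision, round(recall, 2))
```

More noise lowers precision, more silence lowers recall.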
Causes of searching failures
Two words don’t mean exactly the same thing
Enormous range of choices of words and expressions
No true synonyms, although words are often close in meaning
Words are not clearly understood
Inconsistent use of words
Users are unlikely to choose all the relevant terms
The user might choose the terms used by the indexer with a different understanding of meaning.
Need for a controlled vocabulary
A controlled vocabulary = a consistent set of words/expressions, along with rules of usage, to be followed when indexing / searching
Nature of indexing language
• A list of terms acceptable to users
• Mechanisms for structuring and using those terms
• Minimizes the ambiguity of isolated vocabulary that may be out of context
Out of context information
What does SENSITIVE AREA mean?
• urban
• military
• environmental
• sensitive epidermis …
A sensitive area protected by special measures to preserve a highly vulnerable habitat (Eurovoc thesaurus)
Types of Vocabulary – Authority List
Simple list or index enumerating the terms available for indexing a collection of documents
Author names, organization names, countries
E.g.
• Library of Congress Authorities
• ISO Country Codes
Vocabulary control – Classification Scheme
Heading / Caption
Notation
Class
Sub-classes
Upper level
Lower level
EUR-Lex directory codes
Vocabulary control – Classification Scheme
Systematic arrangement of entities/concepts into classes (groups or categories)
A class is a group of concepts whose members share a common feature
Vertical arrangement – level of specificity
Words may appear in several classes
Vocabulary control – Classification Scheme
Classes are identified by
• a heading/caption
• a notation (alphabetical and/or numerical code)
– Key for arranging items in physical libraries
Expressiveness (the notation reflects the structure of the scheme)
• 11.60.30.20 External relations / Commercial policy / Trade arrangements / Common import arrangements
EUR-Lex Directory Codes
Numerical classification of the “Directory of Community legislation in force”, used to index legislation and preparatory acts.
http://eur-lex.europa.eu/RECH_repertoire.do
20 principal chapters, each covering a specific area of European Union activity.
Each directory code is composed of eight digits
• (principal chapter heading and up to three subsequent subdivisions, each represented by two digits)
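Because the notation is expressive, the position of a code in the scheme can be read from the code itself. The following is a minimal sketch: `parse_directory_code` is a hypothetical helper, and it assumes that unused subdivisions are padded with "00", as in the code 03.60.55.00 quoted elsewhere in these slides.

```python
import re

# Four two-digit groups separated by dots, e.g. "11.60.30.20".
CODE_PATTERN = re.compile(r"\d{2}(\.\d{2}){3}")


def parse_directory_code(code: str) -> list[str]:
    """Return the chain of levels from the principal chapter down to the code itself."""
    if not CODE_PATTERN.fullmatch(code):
        raise ValueError(f"not an eight-digit directory code: {code!r}")
    parts = code.split(".")
    levels = []
    for depth in range(1, 5):
        # Assumption: "00" marks an unused subdivision, so the chain stops there.
        if depth > 1 and parts[depth - 1] == "00":
            break
        levels.append(".".join(parts[:depth]))
    return levels


print(parse_directory_code("11.60.30.20"))
print(parse_directory_code("03.60.55.00"))
```

The first call yields four levels (chapter 11 and three subdivisions), the second only three, matching the three headings "Agriculture / Products subject to market organisation / Wine".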
EUR-Lex – Subject Headings
One to a maximum of five descriptors based on the subject-matter list of terms
The alphabetically structured list of over 200 keywords is based on the subdivisions of the treaties and the areas of activity of the institutions.
The descriptors are less specific than those of the Directory code but provide a general overview of the content of the document.
Thesaurus - Definition
ISO 2788 (1984)
A structured list of expressions intended to represent in an unambiguous way the conceptual content of the documents in a documentary system and of the queries addressed to the system.
= NOUN, NOUN PHRASE
= INDEXING PROCESS
= ONE SINGLE INTERPRETATION
Thesaurus - Definition
BSI 8723 (2006)
A controlled vocabulary in which concepts are represented by descriptors, formally organized so that paradigmatic relationships between the concepts are made explicit, and the descriptors are accompanied by lead-in entries for synonyms and quasi-synonyms.
The purpose of a thesaurus is
• to guide both the indexer and the searcher to select the same descriptor or combination of descriptors to represent a given subject.
= MUTUALLY EXCLUSIVE RELATIONSHIPS
= EQUIVALENCE
= INDEXING PROCESS
Eurovoc - Scope
Eurovoc
A multilingual thesaurus (hierarchical list of terms)
Multidisciplinary vocabulary
• Community and national point of view
• Parliamentary activities
Definition of concepts
Samples from Eurovoc
Eurovoc - Coverage
21 FIELDS = HEADINGS
04 POLITICS
08 INTERNATIONAL RELATIONS
10 EUROPEAN COMMUNITIES
12 LAW
16 ECONOMICS
20 TRADE
24 FINANCE
28 SOCIAL QUESTIONS
32 EDUCATION AND COMMUNICATIONS
36 SCIENCE
40 BUSINESS AND COMPETITION
44 EMPLOYMENT AND WORKING CONDITIONS
48 TRANSPORT
52 ENVIRONMENT
56 AGRICULTURE, FORESTRY AND FISHERIES
60 AGRI-FOODSTUFFS
64 PRODUCTION, TECHNOLOGY AND RESEARCH
66 ENERGY
68 INDUSTRY
72 GEOGRAPHY
76 INTERNATIONAL ORGANISATIONS
127 MICROTHESAURI = CLASSES
0806 international affairs
0811 cooperation policy
0816 international balance
0821 defence
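The sample codes suggest that a microthesaurus number starts with the two digits of its field (0806 under field 08). Assuming that convention holds, which the slides imply but do not state, the field of any microthesaurus can be looked up as sketched below; `field_of` is a hypothetical helper and only a few of the 21 fields are listed.

```python
# A few of the 21 fields listed on the slide.
FIELDS = {
    "08": "INTERNATIONAL RELATIONS",
    "12": "LAW",
    "60": "AGRI-FOODSTUFFS",
    "72": "GEOGRAPHY",
}


def field_of(mt_code: str) -> str:
    """Return the field heading for a four-digit microthesaurus code."""
    if len(mt_code) != 4 or not mt_code.isdigit():
        raise ValueError(f"not a four-digit microthesaurus code: {mt_code!r}")
    # Assumption: the first two digits of the code are the field number.
    return FIELDS[mt_code[:2]]


for code in ("0806", "0821", "6021"):
    print(code, "->", field_of(code))
```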
Eurovoc – Contextual information
DESCRIPTOR
MT - MICROTHESAURUS (MAIN CLASS)
UF (USED FOR) - NON-DESCRIPTOR
This descriptor is USED FOR a non-descriptor
BT - BROADER TERM / GENERIC TERM
NT - NARROWER TERM / SPECIFIC TERM
RT – RELATED TERM
Eurovoc – Relationships
TOP TERM = highest term in the hierarchy
SCOPE NOTE (SN) = usage or definition note
NT1
NT3
Hierarchical relationship (MT, BT, NT)
Associative relationship (RT)
Equivalence relationship (USE, UF)
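One way to picture how these relationships hang together is a small record type holding one descriptor and its contextual information. This is a sketch only, not the Eurovoc data model: the `Descriptor` class and its field names are hypothetical, and the sample entry reuses the "wine" descriptor shown elsewhere in these slides.

```python
from dataclasses import dataclass, field


@dataclass
class Descriptor:
    label: str
    mt: str                                             # microthesaurus (main class)
    scope_note: str = ""                                # SN
    used_for: list[str] = field(default_factory=list)   # UF: non-descriptors
    broader: list[str] = field(default_factory=list)    # BT1, BT2, ... going up the chain
    narrower: list[str] = field(default_factory=list)   # NT1: direct narrower terms
    related: list[str] = field(default_factory=list)    # RT

    def display(self) -> str:
        """Render the entry in the usual thesaurus layout."""
        lines = [self.label, f"  MT {self.mt}"]
        if self.scope_note:
            lines.append(f"  SN {self.scope_note}")
        lines += [f"  UF {term}" for term in self.used_for]
        lines += [f"  BT{level} {term}" for level, term in enumerate(self.broader, start=1)]
        lines += [f"  NT1 {term}" for term in self.narrower]
        lines += [f"  RT {term}" for term in self.related]
        return "\n".join(lines)


wine = Descriptor(
    label="wine",
    mt="6021 beverages and sugar",
    broader=["alcoholic beverage", "beverage"],
    narrower=["bottled wine", "champagne", "flavoured wine", "fortified wine"],
)
print(wine.display())
```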
Vocabulary Control – Thesaurus
The scope of a descriptor is limited to a single meaning (unambiguous)
• Nouns or noun phrases
• Pre-coordination of concepts
The context is provided by :
• The hierarchical relationships (MT, BT, NT)
• The scope note (SN)
– (states the chosen meaning or indicates other meanings excluded for indexing purposes)
A concept may be expressed by two or more synonyms
• One term selected as a descriptor (indexing term)
• Equivalents = non-descriptors
– (lead-in entries or references to the descriptor – USE, UF)
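The equivalence relationship can be sketched as a lookup that sends every lead-in entry to its descriptor, so that indexer and searcher end up with the same term. In the sketch below, `to_descriptor` and `used_for` are hypothetical helpers; the "Community Customs Code" entry comes from these slides, while the "cybercrime" entry is an assumed example and not a confirmed Eurovoc non-descriptor.

```python
# USE references: non-descriptor -> descriptor (keys in lower case).
USE = {
    "community customs code": "customs regulations",  # mentioned in the slides
    "cybercrime": "computer crime",                    # hypothetical lead-in entry
}
DESCRIPTORS = {"customs regulations", "computer crime"}


def to_descriptor(term: str) -> str:
    """Resolve a term typed by an indexer or searcher to the preferred descriptor."""
    key = " ".join(term.lower().split())  # normalise case and spacing
    if key in DESCRIPTORS:
        return key
    if key in USE:
        return USE[key]
    raise KeyError(f"{term!r} is not in the controlled vocabulary")


def used_for(descriptor: str) -> list[str]:
    """The reverse view (UF): all non-descriptors pointing at a descriptor."""
    return sorted(nd for nd, target in USE.items() if target == descriptor)


print(to_descriptor("Community Customs Code"))
print(used_for("computer crime"))
```

The same function would be applied when a document is indexed and when a query is entered, which is what keeps the two sides consistent.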
Vocabulary control - Targets
Represents the general conceptual structure of a subject area and presents a guide to the user of an index
Reflects closely the literature vocabulary and the user’s own technical usage
Employs pre-coordinated phrases to reduce false drops to a minimum
• Venetian Blind
Controls synonyms and near-synonyms in order to increase consistency
Only one term from a list of similar terms will be used in indexing
Horizontal and vertical relationships among terms (cross-references)
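The "Venetian Blind" bullet is presumably the classic illustration of a false drop: combining the separate words "venetian" and "blind" also retrieves texts about a blind Venetian. The sketch below assumes that reading; both document texts are invented, and `and_search` / `phrase_search` are hypothetical helpers.

```python
DOCUMENTS = {
    "a": "How to fit a venetian blind in a bay window",
    "b": "The blind Venetian painter and his workshop",
}


def words(text: str) -> list[str]:
    return text.lower().split()


def and_search(terms: list[str]) -> set[str]:
    """Post-coordination: each word must occur somewhere in the text."""
    return {doc for doc, text in DOCUMENTS.items() if all(t in words(text) for t in terms)}


def phrase_search(phrase: str) -> set[str]:
    """Pre-coordinated phrase: the words must occur together, in order."""
    target = words(phrase)
    size = len(target)
    hits = set()
    for doc, text in DOCUMENTS.items():
        tokens = words(text)
        if any(tokens[i:i + size] == target for i in range(len(tokens) - size + 1)):
            hits.add(doc)
    return hits


print(sorted(and_search(["venetian", "blind"])))  # includes the false drop "b"
print(sorted(phrase_search("venetian blind")))    # only the window blind
```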
Classification & Thesaurus - Difference
Classification
Single preferred location (physical libraries)
• Directory code :
03.60.55.00 Agriculture / Products subject to market organisation / Wine
• Post-coordination of concepts
Eurovoc
Admits relationships, such as hierarchical ones
wine
MT 6021 beverages and sugar
BT1 alcoholic beverage
BT2 beverage
NT1 bottled wine
NT1 champagne
NT1 flavoured wine
NT1 fortified wine
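The hierarchical relationships in the "wine" entry can be used to widen or narrow a search. The sketch below stores only the terms shown on the slide (the real microthesaurus contains more) and uses two hypothetical helpers, `expand` and `broader_chain`.

```python
# Broader term -> direct narrower terms, limited to what the slide shows.
NARROWER = {
    "beverage": ["alcoholic beverage"],
    "alcoholic beverage": ["wine"],
    "wine": ["bottled wine", "champagne", "flavoured wine", "fortified wine"],
}


def expand(descriptor: str) -> list[str]:
    """Return the descriptor followed by all of its narrower terms (depth-first)."""
    result = [descriptor]
    for child in NARROWER.get(descriptor, []):
        result.extend(expand(child))
    return result


def broader_chain(descriptor: str) -> list[str]:
    """Return BT1, BT2, ... for a descriptor by walking up the hierarchy."""
    parent_of = {child: parent for parent, children in NARROWER.items() for child in children}
    chain = []
    while descriptor in parent_of:
        descriptor = parent_of[descriptor]
        chain.append(descriptor)
    return chain


print(expand("wine"))
print(broader_chain("wine"))
```

A search on "wine" expanded this way would also retrieve documents indexed only with "champagne", something a single fixed classification location does not offer.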
Indexing systems - Types
Assigned-term system
• The indexer determines the scope of the document and assigns descriptors from a controlled vocabulary
• Descriptors identify the concepts expressed by the documents
• Subject heading list, thesaurus, classification, taxonomy
• Intellectual effort
• Greater time and effort
• Cost is important
Derived-term system
• All descriptors are taken from the text itself
• Natural language or free-text indexing
• Automatic indexing
PART 2 : EUROVOC THESAURUS
Christine Laaboudi-Spoiden
Publications Office of the European Communities
EUR-LEX Unit – Documentary section
Eurovoc 4.2 - Languages
http://europa.eu/eurovoc/
Official EU languages
CS, DA, DE, EL, EN, ES, ET(*), FI, FR, HU, IT, LT, LV, NL, PL, PT, SI, SK, SV
Acceding countries
BG – Bulgarian, RO – Romanian
Candidate country
HR – Croatian
Local sites
Other languages
Albanian, Ukrainian, Russian, Georgian, Serbian
Regional languages : Basque, Catalan
Eurovoc 4.2 in figures
                             Eurovoc 4.1   Eurovoc 4.2
DOMAINS                           21            21
MICROTHESAURI                    127           127
DESCRIPTORS                     6501          6645
GENERIC RELATIONSHIPS           6510          6669
ASSOCIATIVE RELATIONSHIPS       3542          3636
Eurovoc – fields most frequently used
[Bar chart : number of users per field]
04 – POLITICS : 18
12 – LAW : 17
10 – EUROPEAN COMMUNITIES : 11
28 – SOCIAL QUESTIONS : 10
08 – INTERNATIONAL RELATIONS : 9
16 – ECONOMICS : 7
52 – ENVIRONMENT : 5
20 – TRADE : 4
24 – FINANCE : 4
32 – EDUCATION AND COMMUNICATIONS : 4
44 – EMPLOYMENT AND WORKING CONDITIONS : 4
72 – GEOGRAPHY : 3
48 – TRANSPORT : 2
56 – AGRICULTURE, FORESTRY AND FISHERIES : 2
66 – ENERGY : 2
36 – SCIENCE : 1
40 – BUSINESS AND COMPETITION : 1
68 – INDUSTRY : 1
76 – INTERNATIONAL ORGANISATIONS : 1
Eurovoc – Polyhierarchical relationship
Main rule : Descriptors belong to one category (1 BT, 1 MT)
Exception : Descriptors from Domains 72 & 76
Field 72 : Geography
Field 76 : International Organizations
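The main rule and its exception lend themselves to a simple consistency check. The sketch below is an assumption-laden illustration: `check_broader_terms` is a hypothetical helper, "broader terms" means direct parents (not the BT1/BT2 levels of a single chain), and the sample parent labels are made up.

```python
# Fields whose descriptors may have more than one direct broader term.
POLYHIERARCHICAL_FIELDS = {"72", "76"}


def check_broader_terms(mt_code: str, direct_broader_terms: list[str]) -> bool:
    """Return True if the number of direct broader terms respects the rule."""
    # Assumption: the first two digits of the microthesaurus code give the field.
    if mt_code[:2] in POLYHIERARCHICAL_FIELDS:
        return True
    # Top terms have no broader term, all other descriptors exactly one.
    return len(direct_broader_terms) <= 1


print(check_broader_terms("6021", ["alcoholic beverage"]))        # one parent: allowed
print(check_broader_terms("6021", ["parent A", "parent B"]))      # two parents outside 72/76
print(check_broader_terms("7211", ["parent A", "parent B"]))      # geography: allowed
```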
Eurovoc - Advantages
Multilingualism
• Indexing in the documentalist’s language
• Search in the user’s language
Update
• 18 months
Cooperation
• National parliaments
• Candidate descriptors
Standardisation
• ISO 2788 & 5964
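Multilingual indexing and searching works when documents carry language-neutral concept identifiers and each language only supplies labels. The sketch below illustrates the idea; the identifiers, labels and document names are illustrative and are not taken from the Eurovoc files.

```python
# Concept id -> label per language (illustrative values).
LABELS = {
    "c1": {"en": "computer crime", "fr": "criminalité informatique"},
    "c2": {"en": "wine", "fr": "vin"},
}

# Documents are indexed with concept ids, whatever language the indexer worked in.
INDEX = {
    "doc-fr-1": {"c1"},
    "doc-en-7": {"c1", "c2"},
}


def concept_for(label: str, lang: str) -> str:
    """Find the concept whose label in the given language matches."""
    for concept_id, labels in LABELS.items():
        if labels.get(lang, "").lower() == label.lower():
            return concept_id
    raise KeyError(f"no {lang} descriptor {label!r}")


def search(label: str, lang: str) -> set[str]:
    """Search in the user's language; results do not depend on the indexing language."""
    concept_id = concept_for(label, lang)
    return {doc for doc, concepts in INDEX.items() if concept_id in concepts}


print(sorted(search("computer crime", "en")))
print(sorted(search("vin", "fr")))
```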
Eurovoc - Limits
Generic vocabulary, not specific
Does not cover national specificities
Eurovoc - Display
Formats
• Printed – paper version
• Web site http://europa.eu/eurovoc/
• XML files (provided to licensees)
• PDF files to download
Types of display
• Alphabetical
• Thematic
– Alphabetical listing by field/domain
Eurovoc – Thematic display
Microthesauri
Top Term / Broader Term
Specific Terms NT1 – NT2
Related Terms
Alphabetical index of descriptors/non-descriptors of the current field
Eurovoc – Terminology of the field
Alphabetical index of descriptors and non-descriptors
Eurovoc - History
1982 :
• comparative study of the existing documentary languages at the European Commission and the European Parliament
1984 : first edition
• seven languages (DA, DE, EN, FR, EL, IT, NL)
1987 : 2nd edition
• + ES, PT
1995 : 3rd edition - 1999 : 3.1 edition
• + SV, FI
2002 : 4.0 edition - 2004 : 4.1 edition
2005 : 4.2 edition
• 17 languages
2006 : 4.3 edition
• 21 languages
Eurovoc - Users
National parliaments
European institutions (European Parliament, Publications Office, Court of Justice)
Private users = Eurovoc licence holders
Eurovoc – Users
[Bar chart : number of users by type of organisation. Categories : National Parliament, National Administration, EU Institutions, Consultants, Universities, Private User, Research Institutes. Values shown : 16, 6, 5, 4, 3, 2, 2]
Eurovoc – Users
[Pie chart : users by profession. Categories : Translators, Informatics, Terminologists / Linguists, Librarians / Documentalists, Researchers, Other. Shares shown : 1%, 6%, 3%, 56%, 14%, 20%]
Eurovoc - Licenses (1)
Number of licences
2003 : 15
2004 : 25
2005 : 44
Eurovoc – Licenses (2)
[Bar chart : number of licences by type of use (Academic, Commercial, Translation, Indexing) for 2004, 2005 and 2006]
PART 3 : EUROVOC MAINTENANCE
Christine Laaboudi-Spoiden
Publications Office of the European Communities
EUR-LEX Unit – Documentary section
Eurovoc - Maintenance
2 interinstitutional committees
Maintenance committee
• Commission, Council, Parliament, Court of Justice, Court of Auditors
Steering committee
• Commission, Council, Parliament, Court of Justice, Court of Auditors
Eurovoc Maintenance Team
• Publications Office
Eurovoc - Steering committee
Supervises the Eurovoc project
• Objectives, priorities, overall timetable
• Resources and budget
Officially adopts each new version
Chaired by a representative of the European Parliament
Eurovoc – The maintenance committee
Examines and votes on the proposals for updating the thesaurus
Decides on the amendments to be made
Chaired by the Publications Office
Meets twice a year
Eurovoc – The maintenance team
Location: Publications Office
Collects and examines the proposals made by all users
Coordinates the work of the Maintenance Committee
Responsible for IT developments, translation monitoring, web site
Works through a maintenance interface
Eurovoc – Maintenance process
The European Parliament
– Collects, examines and filters the proposals from the national parliaments
The Maintenance Team
– Collects the proposals made by all users (E.P., licensees, OPOCE)
– Manages the proposals through the maintenance system
The Maintenance Committee
– Votes on the various proposals
– Decides on the final amendments
The Maintenance Team
– Sends new descriptors and amendments to the E.C. for translation
The Maintenance Committee
– Reviews the multilingual draft version
The Steering Committee
– Officially adopts the new version
EUROVOC – The maintenance interface
https://webgate.cec.eu.int/eurovoc/maint
Users
• EU Institutions : members of the maintenance committee, translators
• National parliaments
Features
• Propose candidate descriptors, amendments
• Translation module
• A dedicated layer for each user
EUROVOC – Maintenance
How to propose new concepts / amendments
Eurovoc maintenance form (web site)
Email to [email protected]
CANDIDATE DESCRIPTOR
EUROVOC – Maintenance
Criteria for acceptance / non-acceptance of candidate descriptors
Acceptance :
Creation necessary :
• European Food Safety Authority (new European body)
• Greater Poland province in Regions of Poland in MT 7211 (new regions to incorporate)
New concept interesting and useful
• Access to healthcare
• Self-regulation
EUROVOC – Maintenance
Criteria for acceptance / non-acceptance of candidate descriptors
Non-acceptance :
Descriptor already existing under another form
• Second home → secondary home
• Community Customs Code exists as a non-descriptor of “Customs regulations”
Concept which can be obtained by combining two or three existing descriptors
• European Refugee Fund → EC fund + aid to refugees
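Rejecting a candidate because it can be obtained by combination relies on searching with several descriptors at once. The sketch below shows that kind of AND search; the document ids and the extra descriptors assigned to them are invented, and `search_all` is a hypothetical helper.

```python
# Hypothetical index: document id -> descriptors assigned by the indexer.
INDEX = {
    "doc1": {"EC fund", "aid to refugees"},
    "doc2": {"EC fund", "regional development"},
    "doc3": {"aid to refugees", "humanitarian aid"},
}


def search_all(descriptors: list[str]) -> set[str]:
    """Return the documents indexed with every one of the given descriptors."""
    wanted = set(descriptors)
    return {doc for doc, assigned in INDEX.items() if wanted <= assigned}


# The concept "European Refugee Fund" expressed as a combination of two descriptors.
print(sorted(search_all(["EC fund", "aid to refugees"])))
```

Only the document carrying both descriptors is returned, so no separate descriptor is needed for the combined concept.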
EUROVOC – Maintenance
Criteria for acceptance / non-acceptance of candidate descriptors
Non-acceptance :
Term too specific (not used enough)
• Arctic agriculture
Term too national (not useful for the other users)
• Popular school (in SV)
Term too vague
• Right to peace
• Small states
PART 4 : INDEXING AND SEARCHING WITH EUROVOC & the EP Library
Christine Laaboudi-Spoiden
Publications Office of the European Communities
EUR-LEX Unit – Documentary section
Isabelle Gautier – European Parliament - Library
INDEXING AND SEARCHING WITH EUROVOC
1. Content analysis and subject determination :
Example from Eur-Lex database (Directive 50/2006)
Example from Eur-Lex database (Regulation 802/2006)
INDEXING AND SEARCHING WITH EUROVOC
2. Term selection in Eurovoc
• Check the relationships (hierarchy and semantic environment of a descriptor)
• Definition of horizontal or vertical specificity
• Translation of concepts into indexing terms : cases of generic terms, compound terms, lack of precision, proper names.
3. Depth of indexing :
• Exhaustivity and selectivity
4. Making choices : indexing policy
EUROVOC at EP LIBRARY
1999 : the data processing system of our catalogue was changed, which required the library to manage a new indexing policy.
New catalogue => need to develop a new consistency in indexing ;
to obtain this consistency, training was organized for all indexers ;
a Working Group was created, in charge of indexing coordination within the library.
EUROVOC at EP LIBRARY
The Indexing Coordination Group
Working Group formed by indexers and information specialists (of different nationalities and languages) in charge of :
writing an internal guide for the department on how to apply the practical rules for indexing ;
creating updated lists (descriptors studied and descriptors created for the Library) and templates (to propose a creation or a modification) useful to colleagues ;
organizing regular meetings on the indexing policy and its implementation ;
training new colleagues.
EUROVOC at EP LIBRARY
The Indexing Guide
Target : to obtain better consistency of indexing in the catalogue and a good knowledge of the new data processing system.
Contents, in three parts :
• definition and basic rules for indexing ;
• the indexing policy in the library ;
• practical application in our catalogue.
Supplemented by advice sheets for indexing where necessary.
EUROVOC at EP LIBRARY
Indexing Meetings
Target : the group studies the proposals for new descriptors or modifications sent by colleagues ;
it answers specific questions asked by colleagues and, if necessary, writes advice sheets ;
questions are analysed by the group in its meetings and presented in meetings at department level ;
advisory and support role.
EUROVOC at EP LIBRARY
Examples of proposals received by the Group
Candidate descriptor created (library level) : Community law-international law
MT 1231 international law - BT international law
SN influence of Community law on international law and vice versa
Candidate descriptor rejected : environmental damage principle
Advice : index with environment impact + risk prevention
Modification of a descriptor : polluter pays principle
Proposal to change the English term (in place of polluter pays policy).
EUROVOC at THE EP LIBRARY
Training
Training organisation for new colleagues :
internal, with a presentation of the thesaurus, the indexing guide, the indexing policy of the department and indexing in our catalogue, plus short practical exercises ;
internal but with an external trainer, to review or, if necessary, to train a group of people in indexing ;
external : according to the needs expressed by indexers and if training is available in the different countries.
EUROVOC at EP LIBRARY
European Parliament’s role as member of the Maintenance Committee :
Represents both the EP and the national parliaments at the Maintenance Committee ;
Receives, as their representative, the proposals of the national parliaments that use the thesaurus ;
Filters the proposals (rejection criteria : concept too national, too specific or too vague) ;
Forwards the proposals of the department and of the national parliaments to the Committee ;
Regularly organises seminars with national parliaments.