![Page 1: LBSC 670](https://reader035.vdocument.in/reader035/viewer/2022062410/56816556550346895dd7d53f/html5/thumbnails/1.jpg)
LBSC 670
Information Organization
![Page 2: LBSC 670](https://reader035.vdocument.in/reader035/viewer/2022062410/56816556550346895dd7d53f/html5/thumbnails/2.jpg)
Today
• Guest Speaker – Jeremy York – HathiTrust
• Classification Thoughts and CV
  – Overview & History
  – Related concepts
  – Examples
• A note on MARC specifications
![Page 3: LBSC 670](https://reader035.vdocument.in/reader035/viewer/2022062410/56816556550346895dd7d53f/html5/thumbnails/3.jpg)
Classification concepts
Aboutness, specificity, granularity
“Words have power” - classification systems exist within a socio-political context
Classification methods: manual/automatic, pre/post-coordinate, hierarchical/faceted, formal/social
![Page 4: LBSC 670](https://reader035.vdocument.in/reader035/viewer/2022062410/56816556550346895dd7d53f/html5/thumbnails/4.jpg)
CV overview
• What are controlled vocabularies?
  – Types
  – Basic concepts
• How are CVs created and maintained?
  – Metadata standards
  – Example systems
• When does a CV turn into a KO?
  – Term lists, thesauri, taxonomies, ontologies
![Page 5: LBSC 670](https://reader035.vdocument.in/reader035/viewer/2022062410/56816556550346895dd7d53f/html5/thumbnails/5.jpg)
Controlled Vocabularies
“organized lists of words and phrases, or notation systems, that are used to initially tag content, and then to find it through navigation or search.” (Warner via Leise, Fast)
“the primary purpose of vocabulary control is to achieve consistency in the description of content objects and to facilitate retrieval” (ANSI Z39.19)
![Page 6: LBSC 670](https://reader035.vdocument.in/reader035/viewer/2022062410/56816556550346895dd7d53f/html5/thumbnails/6.jpg)
Knowledge Organization
• “tools that present the organized interpretation of knowledge structures” (Hjørland)
• “classification schemes that organize materials at a general level…, subject headings that provide more detailed access, and authority files that control variant versions of key information” (Hodge)
![Page 7: LBSC 670](https://reader035.vdocument.in/reader035/viewer/2022062410/56816556550346895dd7d53f/html5/thumbnails/7.jpg)
Uses of controlled vocabulary
• Define scope, content, and context of a body of knowledge
• Support discovery - Navigation, search, browsing
• Map information objects to user terminology
• Enforce term consistency and relationships
![Page 8: LBSC 670](https://reader035.vdocument.in/reader035/viewer/2022062410/56816556550346895dd7d53f/html5/thumbnails/8.jpg)
A good CV. . .
• Removes ambiguity
• Defines relationships between things
• Contextualizes information
A+
![Page 9: LBSC 670](https://reader035.vdocument.in/reader035/viewer/2022062410/56816556550346895dd7d53f/html5/thumbnails/9.jpg)
CV Concepts
• Content Analysis
  – Ambiguity
  – Synonymy
  – Exhaustivity
  – Specificity
  – Co-extensivity
  – Aboutness
  – Semantic structure
  – Warrant (User, Literary, Organizational)
• Form Analysis
  – Linguistics
  – Grammar
  – Semiotics
  – Single / Multiple terms
• Indexing & Retrieval
  – Pre vs. Post Coordinate
  – Recall vs. Precision
  – Natural language processing (NLP)
http://bit.ly/lbsc_670_cv
![Page 10: LBSC 670](https://reader035.vdocument.in/reader035/viewer/2022062410/56816556550346895dd7d53f/html5/thumbnails/10.jpg)
Content Analysis
• Ambiguity
  – Each term should relate to a single concept
• Synonymy
  – Each concept should be identified by a single entry
• Specificity
  – Using the most specific word or phrase expressing the subject
• Exhaustivity
  – The extent to which the entire document is indexed (summarization, depth)
• Co-extensivity
  – “Assign as many terms as needed to bring out the main theme, and according to guidelines sub-themes.” (p. 29, Lancaster)
  – “nothing more, nothing less”
• Semantic Structure
  – Terms can be related through equivalence, hierarchical, or associative relationships (Use, See, NT, BT, RT)
![Page 11: LBSC 670](https://reader035.vdocument.in/reader035/viewer/2022062410/56816556550346895dd7d53f/html5/thumbnails/11.jpg)
Content Analysis (2)
• Aboutness = Subject/topic?
  – Wilson (1968)
    • Author intent, topicality, relationship to other resources, textual analysis
  – Fairthorne (1969)
    • Intentional aboutness (author), extensional aboutness (document)
  – Maron (1977)
    • Objective about (document), subjective about (user), and retrieval about (information retrieval)
  – Hjørland (2001)
    • “Closely related to theories of meaning, interpretation, and epistemology”
![Page 12: LBSC 670](https://reader035.vdocument.in/reader035/viewer/2022062410/56816556550346895dd7d53f/html5/thumbnails/12.jpg)
Content Analysis (3)
• Wilson’s criteria for evaluating aboutness (1968)
  – Identify author’s purpose (intent)
  – Weigh the predominant topics, elements (topical analysis)
  – Group/count a document’s use of concepts and references (bibliometrics)
  – Identify essential elements (text analysis)
![Page 13: LBSC 670](https://reader035.vdocument.in/reader035/viewer/2022062410/56816556550346895dd7d53f/html5/thumbnails/13.jpg)
Content Analysis (4)
• Literary Warrant
  – “The inclusion of a vocabulary term in a controlled vocabulary based on its appearance in one or more content items. For example, a medical text may use the term ‘oncology.’ Based on literary warrant, that term would be included in the controlled vocabulary even though the general public uses the term ‘cancer.’” (Glosso-Thesaurus)
• User Warrant
  – “The inclusion of a vocabulary term in a controlled vocabulary based on use by users. Such terms can be identified through search log analysis or free listing.” (Glosso-Thesaurus)
• Organizational Warrant
  – “Justification for the...selection of a preferred term due to the characteristics and context of the organization using the resource” (ANSI Z39.19)
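The search log analysis mentioned under user warrant can be sketched roughly as follows. This is only an illustration: the log entries, the one-term vocabulary, the `min_count` threshold, and the function name `candidate_terms` are all assumptions made for the example, not part of any particular system.

```python
from collections import Counter

def candidate_terms(search_log, vocabulary, min_count=2):
    """Return (query, count) pairs for queries that users typed at least
    min_count times but that are not yet entries in the vocabulary."""
    known = {term.lower() for term in vocabulary}
    # Normalize case and surrounding whitespace before counting.
    counts = Counter(query.strip().lower() for query in search_log)
    return [(q, n) for q, n in counts.most_common()
            if n >= min_count and q not in known]

# Hypothetical data: the vocabulary follows literary warrant ("Oncology"),
# while users mostly search for "cancer".
log = ["cancer", "Cancer", "oncology", "cancer ", "tumor", "tumor"]
vocab = ["Oncology"]
print(candidate_terms(log, vocab))  # → [('cancer', 3), ('tumor', 2)]
```

Terms surfaced this way are only candidates; an editor would still decide whether each becomes a preferred term or an entry term pointing to an existing one.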
![Page 14: LBSC 670](https://reader035.vdocument.in/reader035/viewer/2022062410/56816556550346895dd7d53f/html5/thumbnails/14.jpg)
Form Analysis
  – Linguistics
    • Syntax/Form (grammar)
    • Morphology (internal word structure)
    • Semantics (meaning)
    • Pragmatics, discourse analysis (word/phrase use)
  – Semiotics
    • Study of signs/symbols
  – Lexical structure
    • Document layout, markup, tags (think DOM)
![Page 15: LBSC 670](https://reader035.vdocument.in/reader035/viewer/2022062410/56816556550346895dd7d53f/html5/thumbnails/15.jpg)
Indexing & Retrieval
• Pre/Post-Coordinate
  – Pre-coordinate: organization prior to retrieval
  – Post-coordinate: organization at the point of retrieval
• Recall / Precision
  – Recall: number of retrieved relevant docs / total number of relevant docs in collection
  – Precision: number of retrieved relevant docs / total number of retrieved docs
• Natural language processing
  – Uses semantics and syntax to automatically distill ‘aboutness’
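One way to picture the pre/post-coordinate distinction is the minimal sketch below. The document IDs, index terms, and compound headings are invented for illustration, and the two function names are hypothetical.

```python
# Post-coordinate: the indexer assigns single-concept terms; the searcher
# combines them (here with a Boolean AND) at the point of retrieval.
post_index = {
    "doc1": {"libraries", "history"},
    "doc2": {"libraries", "automation"},
    "doc3": {"museums", "history"},
}

def post_coordinate_search(index, *terms):
    """Return the documents whose assigned terms include every query term."""
    wanted = set(terms)
    return sorted(doc for doc, assigned in index.items() if wanted <= assigned)

# Pre-coordinate: the indexer combines concepts into a compound heading
# before retrieval; the searcher looks up the heading as a unit.
pre_index = {
    "Libraries--History": ["doc1"],
    "Libraries--Automation": ["doc2"],
    "Museums--History": ["doc3"],
}

def pre_coordinate_search(index, heading):
    """Return the documents filed under one pre-built compound heading."""
    return index.get(heading, [])

print(post_coordinate_search(post_index, "libraries", "history"))  # → ['doc1']
print(pre_coordinate_search(pre_index, "Libraries--History"))      # → ['doc1']
```

Both searches reach the same document; the difference is who does the combining, and when.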
![Page 16: LBSC 670](https://reader035.vdocument.in/reader035/viewer/2022062410/56816556550346895dd7d53f/html5/thumbnails/16.jpg)
Recall & Precision
• A collection of 100 documents
• Searches
  – “Vocabularies”
    • Recall 100/100 = 1
    • Precision 100/100 = 1
  – “Facet”
    • Recall 20/100 = .2
    • Precision 20/28 = .71
  – “OWL”
    • Recall 1/100 = .01
    • Precision 1/1 = 1
CV Entry: # of docs
• Controlled Vocabularies: 100
• Faceted analysis: 20
• Ontologies: 5
• OWL: 1
• RDF: 3
Recall = # of relevant docs retrieved / total # of relevant docs in collection
Precision = # of relevant docs retrieved / total # of docs retrieved
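As a small worked sketch, recall and precision can be computed from two sets of document IDs, using the standard information-retrieval definitions: recall is relevant retrieved over all relevant, and precision is relevant retrieved over all retrieved. The document IDs below are made up for illustration and are not the slide's 100-document collection.

```python
def recall(retrieved, relevant):
    """Fraction of the relevant documents that the search retrieved."""
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)

def precision(retrieved, relevant):
    """Fraction of the retrieved documents that are relevant."""
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)

# Hypothetical search: 4 documents are relevant; the search returns 3
# documents, 2 of which are relevant.
relevant_docs = {1, 2, 3, 4}
retrieved_docs = {3, 4, 5}

print(recall(retrieved_docs, relevant_docs))     # 2 of 4 relevant found → 0.5
print(precision(retrieved_docs, relevant_docs))  # 2 of 3 retrieved are relevant
```

A broad search term tends to raise recall and lower precision; a narrow term tends to do the opposite.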
![Page 17: LBSC 670](https://reader035.vdocument.in/reader035/viewer/2022062410/56816556550346895dd7d53f/html5/thumbnails/17.jpg)
Types of Controlled Vocabularies
• Term Lists
  – Glossaries, Dictionaries, Gazetteers, Folksonomies
• Synonym rings
  – Z39.19 example
  – Oracle Text
• Taxonomies
  – Website navigation scheme
• Thesauri / Ontologies
  – Authority files, subject thesauri, topic maps
![Page 18: LBSC 670](https://reader035.vdocument.in/reader035/viewer/2022062410/56816556550346895dd7d53f/html5/thumbnails/18.jpg)
http://www.taxotips.com/
![Page 19: LBSC 670](https://reader035.vdocument.in/reader035/viewer/2022062410/56816556550346895dd7d53f/html5/thumbnails/19.jpg)
Thesauri & taxonomy examples
• List of vocabularies
  – http://www.slais.ubc.ca/resources/indexing/database1.htm
  – Taxonomy Warehouse
• Two Examples
  – Health & Ageing Thesaurus
  – Thesaurus of Geographic Names
![Page 20: LBSC 670](https://reader035.vdocument.in/reader035/viewer/2022062410/56816556550346895dd7d53f/html5/thumbnails/20.jpg)
CV Structures
• Organization structures
  – Hierarchical systems
    • Term Lists / Enumerative systems
    • Hierarchies
    • Trees
  – Facets / Associative relationships
  – Folksonomies
![Page 21: LBSC 670](https://reader035.vdocument.in/reader035/viewer/2022062410/56816556550346895dd7d53f/html5/thumbnails/21.jpg)
Hierarchies
• Features
  – Inclusiveness
  – “Is-a” relationship
  – Inheritance
  – Transitivity
  – Systematic
  – Mutually exclusive
  – Necessary and sufficient
From http://bit.ly/lbsc_670_cv
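The “is-a” relationship and transitivity can be illustrated with a short sketch. The terms and the `broader` mapping are hypothetical, and the single-parent structure is a simplifying assumption (a strict hierarchy in which each term has at most one broader term).

```python
# Each term points to its single broader term (assumed strict hierarchy).
broader = {"poodle": "dog", "dog": "mammal", "mammal": "animal"}

def ancestors(term, broader):
    """Walk up the hierarchy and return every broader term, nearest first."""
    chain = []
    seen = {term}
    while term in broader:
        term = broader[term]
        if term in seen:
            # A term cannot be its own ancestor in a valid hierarchy.
            raise ValueError(f"cycle detected at {term!r}")
        seen.add(term)
        chain.append(term)
    return chain

def is_a(term, candidate, broader):
    """Transitivity: term is-a candidate if candidate is anywhere above it."""
    return candidate in ancestors(term, broader)

print(ancestors("poodle", broader))       # → ['dog', 'mammal', 'animal']
print(is_a("poodle", "animal", broader))  # → True
```

Inheritance follows the same path: anything asserted about "mammal" would also hold for "dog" and "poodle".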
![Page 22: LBSC 670](https://reader035.vdocument.in/reader035/viewer/2022062410/56816556550346895dd7d53f/html5/thumbnails/22.jpg)
Relationships
• Equivalence (Term Lists)
  – “use”, “see”, “isVersionOf”, “isFormatOf”
• Hierarchical (Thesauri, Taxonomies)
  – Generic – “is a”
  – Partitive – “is part of”, “has part”, “has conceptual part”, “member of”
  – Instance –
• Associative (Facets, Ontologies)
  – “isReferencedBy”, “isRequiredBy”, “hasDerivative”
![Page 23: LBSC 670](https://reader035.vdocument.in/reader035/viewer/2022062410/56816556550346895dd7d53f/html5/thumbnails/23.jpg)
Faceted vocabularies
Multi-dimensional, multi-relationship driven, Subject, Object, Predicate
From http://bit.ly/lbsc_670_cv
![Page 24: LBSC 670](https://reader035.vdocument.in/reader035/viewer/2022062410/56816556550346895dd7d53f/html5/thumbnails/24.jpg)
Folksonomy
• Features
  – Single level description
  – Open vocabulary list
  – User supplied/harvested tags
http://trendistic.indextank.com/
![Page 25: LBSC 670](https://reader035.vdocument.in/reader035/viewer/2022062410/56816556550346895dd7d53f/html5/thumbnails/25.jpg)
Term List Examples
• Authority files – Map to preferred terms
  – Library of Congress
  – Encoded Archival Context
  – Union List of Artist Names
• Glossaries/Dictionaries – Words & definitions, sometimes topic focused
  – Glosso-Thesaurus
• Folksonomies
  – Contextualization, trend discovery, personal information
• Synonym rings – Used for back-end equivalence in searching
  – Princeton WordNet
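Back-end equivalence with a synonym ring might look something like the sketch below: a query term that belongs to a ring is expanded to every member of that ring before searching. The rings and the function name `expand` are invented for the example; a real system would load its rings from a maintained vocabulary.

```python
# Each ring is a set of terms treated as equivalent for searching.
# No member is "preferred"; that is what distinguishes a ring from a thesaurus.
rings = [
    {"car", "automobile", "auto"},
    {"film", "movie", "motion picture"},
]

def expand(query, rings):
    """Return every term equivalent to the query, or just the query itself
    if it does not belong to any ring."""
    q = query.strip().lower()
    for ring in rings:
        if q in ring:
            return sorted(ring)
    return [q]

print(expand("Auto", rings))  # → ['auto', 'automobile', 'car']
print(expand("boat", rings))  # → ['boat']
```

The expanded list would then be passed to the search engine as an OR query, so the user never sees the ring itself.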
![Page 26: LBSC 670](https://reader035.vdocument.in/reader035/viewer/2022062410/56816556550346895dd7d53f/html5/thumbnails/26.jpg)
Choosing a framework
• Use questions
  – Who is your user, what are their needs?
  – What systems are your users familiar with?
  – Will this system be internal/external?
• Content questions
  – How extensive, defined is the information?
  – Is your subject matter static or fluid?
  – What organizational framework best describes your content?
• System questions
  – What access are you trying to provide?
  – What external pressures exist?
  – What external entities/theories will interact with this system?
![Page 27: LBSC 670](https://reader035.vdocument.in/reader035/viewer/2022062410/56816556550346895dd7d53f/html5/thumbnails/27.jpg)
Thesauri Definitions
  – “Guide to use of terms, showing relationships between them, for the purpose of providing standardized, controlled vocabulary for information storage and retrieval” (Monash)
  – “A list of words showing similarities, differences, dependencies, and other relationships to each other” (USG)
![Page 28: LBSC 670](https://reader035.vdocument.in/reader035/viewer/2022062410/56816556550346895dd7d53f/html5/thumbnails/28.jpg)
Creating a CV (1)
• Design methods
  – Re-use existing, start with content & desired use ideas
  – Committee / community approach
    • Top-down
      – Concept driven
    • Bottom-up
      – Document driven
  – Empirical approach
    • Deductive approach
      – Select terms, create relationships, perform term control
    • Inductive approach
      – Establish CV at outset, build hierarchies on an as-needed basis
![Page 29: LBSC 670](https://reader035.vdocument.in/reader035/viewer/2022062410/56816556550346895dd7d53f/html5/thumbnails/29.jpg)
Creating a CV (2)
• Top-down (deductive)
  – Identify audience
  – Identify all topics, concepts, uses, and context of the domain
  – Sort topics identified into an appropriate organization scheme (enumerative, hierarchical, faceted)
  – Solidify structure and clean up gaps & redundancies
  – Assign documents to categories, test retrieval
• Bottom-up (inductive)
  – Identify audience
  – Survey documents for topics/concepts
  – Build system on the fly – let content drive structure and limits of system
  – Identify gaps & redundancies in system
  – Test retrieval
![Page 30: LBSC 670](https://reader035.vdocument.in/reader035/viewer/2022062410/56816556550346895dd7d53f/html5/thumbnails/30.jpg)
Creating a CV (3)
• Think about scope, use, content, maintenance
• Gather terms
  – Based on existing systems, content
  – Based on user needs/expectations
  – Investigate issues of specificity, exhaustivity, granularity
• Build hierarchies, relationships
  – Broader/narrower terms, Related terms, Use/Use for, see/see also
• Establish rules
• Implement
• Evaluate
• Maintain
http://www.boxesandarrows.com/view/creating_a_controlled_vocabulary
![Page 31: LBSC 670](https://reader035.vdocument.in/reader035/viewer/2022062410/56816556550346895dd7d53f/html5/thumbnails/31.jpg)
Evaluating a CV
• Goals
  – Determine if the CV solves retrieval needs of user/system
  – Determine if CV matches user’s content model/term expectations
• Methods
  – Expert evaluation of CV
  – User-based card sorting compared to actual CV
  – Identification of non-included documents
  – Analysis of use of system - HCI
![Page 32: LBSC 670](https://reader035.vdocument.in/reader035/viewer/2022062410/56816556550346895dd7d53f/html5/thumbnails/32.jpg)
CV Maintenance
• Primary responsibility
  – Editor, board, committee
• New terms
  – Is it really new or a different view?
  – What is the proper form & placement?
• Modified terms
  – Include a change log
  – Use a “USE” reference to point to new term
• Deleted terms
  – Unused / Overused terms
  – May want to keep for historical retrieval purposes
• Modification history
  – Use modification notes, date/time stamps
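The maintenance steps for a modified term (change log, “USE” reference, date/time stamp) could be sketched as below. The dictionary layout, the field names, and the functions `replace_term` and `resolve` are assumptions made for illustration, not a prescribed record format.

```python
from datetime import datetime, timezone

def replace_term(vocabulary, change_log, old, new, note=""):
    """Make `new` the preferred term, keep `old` as a non-preferred entry
    with a USE reference, and record the change with a timestamp."""
    vocabulary[new] = {"status": "preferred"}
    vocabulary[old] = {"status": "non-preferred", "USE": new}
    change_log.append({
        "action": "modified",
        "old": old,
        "new": new,
        "note": note,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })

def resolve(vocabulary, term):
    """Follow USE references until a preferred term is reached.
    Returns None if the term is unknown or the references loop."""
    seen = set()
    while term in vocabulary and term not in seen:
        seen.add(term)
        entry = vocabulary[term]
        if "USE" not in entry:
            return term
        term = entry["USE"]
    return None

# Hypothetical example: a spelling change.
vocab = {"Aeroplanes": {"status": "preferred"}}
log = []
replace_term(vocab, log, "Aeroplanes", "Airplanes", note="spelling update")
print(resolve(vocab, "Aeroplanes"))  # → Airplanes
```

Keeping the old term as a non-preferred entry, rather than deleting it, means records indexed under the earlier form remain findable.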
![Page 33: LBSC 670](https://reader035.vdocument.in/reader035/viewer/2022062410/56816556550346895dd7d53f/html5/thumbnails/33.jpg)
Case study - MeSH
• http://www.nlm.nih.gov/bsd/disted/video/
![Page 34: LBSC 670](https://reader035.vdocument.in/reader035/viewer/2022062410/56816556550346895dd7d53f/html5/thumbnails/34.jpg)
Thesauri Concepts
• Preferred terms
• Non-preferred terms
• Semantic relations between terms
• How to apply terms (guidelines, rules)
• Scope notes
• Adding terms (how to produce terms that are not listed explicitly in the thesaurus)
![Page 35: LBSC 670](https://reader035.vdocument.in/reader035/viewer/2022062410/56816556550346895dd7d53f/html5/thumbnails/35.jpg)
Common thesaural identifiers
• SN Scope Note
  – Instruction, e.g. don’t invert phrases
• USE Use (another term in preference to this one)
• UF Used For
• BT Broader Term
• NT Narrower Term
• RT Related Term
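A thesaurus entry carrying these identifiers can be modelled with a small record type, along with a check that BT/NT and RT links are reciprocal. This is a sketch only: the `Term` class, its field names, and the sample terms are hypothetical, and USE/UF reciprocity is left out to keep it short.

```python
from dataclasses import dataclass, field

@dataclass
class Term:
    label: str
    sn: str = ""                             # SN: scope note
    use: str = ""                            # USE: preferred term to use instead
    uf: list = field(default_factory=list)   # UF: non-preferred terms this one stands for
    bt: list = field(default_factory=list)   # BT: broader terms
    nt: list = field(default_factory=list)   # NT: narrower terms
    rt: list = field(default_factory=list)   # RT: related terms

def missing_reciprocals(thesaurus):
    """Return (term, relation, other) triples where the inverse link is absent:
    a BT should be answered by an NT, an NT by a BT, and an RT by an RT."""
    inverse = {"bt": "nt", "nt": "bt", "rt": "rt"}
    problems = []
    for label, term in thesaurus.items():
        for rel, inv in inverse.items():
            for other in getattr(term, rel):
                target = thesaurus.get(other)
                if target is None or label not in getattr(target, inv):
                    problems.append((label, rel.upper(), other))
    return problems

# Hypothetical entries: "Mammals" is missing its NT link back to "Dogs".
thesaurus = {
    "Dogs": Term("Dogs", uf=["Canines"], bt=["Mammals"]),
    "Mammals": Term("Mammals"),
}
print(missing_reciprocals(thesaurus))  # → [('Dogs', 'BT', 'Mammals')]

thesaurus["Mammals"].nt.append("Dogs")
print(missing_reciprocals(thesaurus))  # → []
```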
![Page 36: LBSC 670](https://reader035.vdocument.in/reader035/viewer/2022062410/56816556550346895dd7d53f/html5/thumbnails/36.jpg)
Thesauri Guides
• National Information Standards Organization. (2005). Guidelines for the construction, format, and management of monolingual controlled vocabularies. ANSI/NISO Z39.19-2005. Bethesda, MD: NISO Press.
  – http://www.niso.org/standards/resources/Z39-19-2005.pdf?CFID=5559601&CFTOKEN=31747314
• Aitchison, Jean & Gilchrist, Alan. Thesaurus Construction: A Practical Guide. 3rd ed. London: Aslib, 1997.
• Willpower Information Management Consultants
  – http://www.willpower.demon.co.uk/thesprin.htm
![Page 37: LBSC 670](https://reader035.vdocument.in/reader035/viewer/2022062410/56816556550346895dd7d53f/html5/thumbnails/37.jpg)
Thesaurus Exploration
• http://www.getty.edu/research/tools/vocabularies/tgn/
• Protégé introduction and tour
  – What is Protégé?
  – What is it used for?
  – How will we use it this semester?
![Page 38: LBSC 670](https://reader035.vdocument.in/reader035/viewer/2022062410/56816556550346895dd7d53f/html5/thumbnails/38.jpg)
When is a CV an Ontology?
• “The study of being or existence”
• “A specification of a conceptualization” (Gruber)
• “An ontology formally defines a common set of terms that are used to describe and represent a domain.” (OWL)
![Page 39: LBSC 670](https://reader035.vdocument.in/reader035/viewer/2022062410/56816556550346895dd7d53f/html5/thumbnails/39.jpg)
Webster’s Dictionary
• Webster’s Third New International Dictionary defines Ontology as:
  1. A science or study of being, specifically a branch of metaphysics* relating to the nature and relations of being.
  2. A theory concerning the kinds of entities and specifically the kinds of abstract entities that are to be admitted to a language system.
*Metaphysics: Nature of being “or” existence.
![Page 40: LBSC 670](https://reader035.vdocument.in/reader035/viewer/2022062410/56816556550346895dd7d53f/html5/thumbnails/40.jpg)
Next Week
• Work time for Protégé
• Exploration of ontologies