Taxonomy Development and Digital Projects
DESCRIPTION
Presentation from the ALA (American Library Association) Midwinter 2009 meeting, given as part of the Networked Resources and Metadata Interest Group (NRMIG). A discussion on taxonomy development led by Laura Dorricott, a Taxonomy Project Delivery Manager with Dow Jones Taxonomy Services, on Sunday, January 25th, 2009. A corresponding blog post with Laura's notes from the session is available here: http://synapticacentral.com/content/notes-session-taxonomy-development-and-digital-projects
TRANSCRIPT
© Copyright 2009 Dow Jones and Company
Taxonomy Development and Digital Projects
Laura Dorricott
Project Delivery Manager, Taxonomy Services
Dow Jones Client Solutions
January 25, 2009
Networked Resources and Metadata Interest Group
ALA Midwinter 2009
Introduction
Laura Dorricott, Project Delivery Manager, Taxonomy Services, Dow Jones Client Solutions
IHS, Inc. – Indexer and Lexicographer
Synapse – 1995-2005 – Taxonomist and Operations Director
Dow Jones – 2005 – Project Delivery Manager
Information management needs – what do we do with this?
[Diagram: the concept “Dr. Seuss” linked to Theodore Seuss Geisel, Theo LeSieg, American, children’s writer, March 2, 1904, Springfield, MA, and articles about “Dr. Seuss”]
© 2007, Dow Jones
Taxonomy’s Evolutionary Path
Dictionaries & Flat Lists
Hierarchical Taxonomies
Controlled Vocabulary Thesauri
Ontologies
Structured Authority Files
Taxonomies are the building blocks for ontologies, and ontologies are semantic representations of the real world in all its rich diversity.
Taxonomy is evolving organically…
Definitions of Controlled Vocabularies
List:
“Sometimes called a pick list, a limited set of terms arranged as a simple alphabetical list or in some other logically evident way.”
Synonym ring:
“A group of terms that are considered equivalent for the purposes of retrieval.”
Taxonomy:
“A collection of controlled vocabulary terms organized into a hierarchical structure. Each term is in one or more parent/child (broader/narrower) relationships to other terms.”
Thesaurus:
“A controlled vocabulary arranged in a known order and structured so that the various relationships among terms are displayed clearly and identified by standardized relationship indicators. Relationship indicators should be employed reciprocally.”
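A synonym ring is the simplest of these structures to put to work: every term in the group counts as equivalent for retrieval, and none is marked as preferred. The following is a minimal Python sketch of that idea. The ring reuses the salinity/saltiness pair that appears later in this deck. The document texts and the function names `expand_query` and `search` are hypothetical, and the plain substring matching is deliberately naive.

```python
# A synonym ring: all terms in a group are equivalent for retrieval,
# with no preferred term. The ring contents are illustrative.
SYNONYM_RINGS = [
    {"salinity", "saltiness"},
]

def expand_query(term: str) -> set[str]:
    """Return the term plus every term sharing a synonym ring with it."""
    term = term.lower()
    expanded = {term}
    for ring in SYNONYM_RINGS:
        if term in ring:
            expanded |= ring
    return expanded

# Hypothetical document collection.
docs = {
    "doc1": "Ocean salinity varies with depth.",
    "doc2": "The saltiness of the soup was noted.",
    "doc3": "Freshwater lakes.",
}

def search(term: str) -> list[str]:
    """Return IDs of documents containing any equivalent term (naive substring match)."""
    terms = expand_query(term)
    return sorted(doc_id for doc_id, text in docs.items()
                  if any(t in text.lower() for t in terms))

print(search("saltiness"))  # ['doc1', 'doc2']
```

A plain keyword search for "saltiness" would miss doc1. The ring lets either word find both documents.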
Next Generation
Ontology:
“A controlled vocabulary developed to bridge the gap between the real world and the information world, by striving to exactly model and control all the fundamentals of information concepts with the goal of building a new class of intelligent technologies and knowledge systems.”
Purposes of Controlled Vocabularies
Translation
Consistency
Provide a framework of concepts that accurately represents the real world.*
Indication of semantic relationships
Hierarchical arrangement to assist browsing
Search and retrieval
• Improve precision and recall
• Reduce search time
* Real world includes physical objects, databases, digital content, and abstract domains of knowledge
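Precision and recall are the standard retrieval measures behind the "improve precision and recall" goal. Precision is the share of retrieved documents that are relevant. Recall is the share of relevant documents that were retrieved. The following is a minimal Python sketch. The document IDs and the scenario (a keyword search for "mercury" by someone who wants the planet) are hypothetical.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Compute precision and recall for one query; empty sets yield 0.0."""
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical: four documents are about the planet Mercury; the keyword
# search returns two of them plus three about the metal and the car.
relevant = {"d1", "d2", "d3", "d4"}
retrieved = {"d1", "d2", "d5", "d6", "d7"}

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.40 recall=0.50
```

Disambiguated terms such as "Mercury (planet)" aim to remove d5 to d7, which raises precision. Equivalence links aim to pull in d3 and d4, which raises recall.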
SEARCH
Keyword Search
Keyword searching is insufficient:
• People do not always know what they want
• People all have different “keywords”
• People don’t perform complex keyword searches
• One word can have many meanings
• Two or more words can share the same meaning
One thing can have many different names
Dr. Peter Roget
One word can mean very different things
Taxonomy helps people filter out the noise and discover the relevant things, regardless of what they are called.
NAVIGATE
Search and navigation are not alternative solutions; they are complementary solutions.
Users expect both.
Points of view…
one point of view…
another point of view…
Different audiences will have different views, and good navigation will serve all of them.
Building a Taxonomy or Controlled Vocabulary
Now that we know what taxonomies and controlled vocabularies are, and can see some of the reasons we need them, what do we do next?
Building a Taxonomy or Controlled Vocabulary
Basic issues and principles
• One word can have multiple meanings (ambiguity)
• Two words can share the same meaning (synonymy)
• Semantic relationships
• Facets
• Warrant
• Structures
• Metadata
Ambiguity
Polysemes (homonyms, homographs)
cranes (birds), cranes (equipment)
Mercury (planet), Mercury (god), Mercury (car), Mercury (metal)
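The slide resolves each homograph with a parenthetical qualifier, so that each term names exactly one concept. The following is a minimal Python sketch of how such qualified terms let a system offer a choice when a user types an ambiguous bare word. The qualified terms come from the slide. The index structure and the `build_index` and `disambiguate` names are hypothetical.

```python
from collections import defaultdict

# Qualified preferred terms from the slide: one term per concept.
TERMS = [
    "cranes (birds)", "cranes (equipment)",
    "Mercury (planet)", "Mercury (god)", "Mercury (car)", "Mercury (metal)",
]

def build_index(terms: list[str]) -> dict[str, list[str]]:
    """Map each bare (unqualified) word to every qualified term that shares it."""
    index = defaultdict(list)
    for term in terms:
        base = term.split(" (")[0].lower()  # strip the qualifier
        index[base].append(term)
    return index

INDEX = build_index(TERMS)

def disambiguate(word: str) -> list[str]:
    """Return the candidate concepts a user should choose between."""
    return INDEX.get(word.lower(), [])

print(disambiguate("mercury"))
# ['Mercury (planet)', 'Mercury (god)', 'Mercury (car)', 'Mercury (metal)']
```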
Synonymy
Two words with the same or similar meaning:
• Popular vs. scientific names
• Generic vs. trade names
• Slang vs. traditional terms
• Dialectal variants
• Near-synonyms
• Lexical variants
• Generic postings
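In a thesaurus, synonymy is handled by choosing one preferred term. Each non-preferred (entry) term points to it with USE, and the preferred term lists its entry terms with UF. The following is a minimal Python sketch. The example pairs illustrate some of the categories above and are not taken from any published vocabulary. The `preferred` and `used_for` names are hypothetical.

```python
# Equivalence relationships: entry term -> preferred term (USE).
# Illustrative pairs only, not from any published vocabulary.
USE = {
    "saltiness": "salinity",                   # near-synonym
    "heart attacks": "myocardial infarction",  # popular vs. scientific name
    "kleenex": "facial tissues",               # trade vs. generic name
    "lorries": "trucks",                       # dialectal variant
}

def preferred(term: str) -> str:
    """Return the preferred term to index or search under."""
    key = term.lower()
    return USE.get(key, key)

def used_for(pref: str) -> list[str]:
    """Derive the reciprocal UF list for a preferred term."""
    return sorted(entry for entry, p in USE.items() if p == pref)

print(preferred("Heart attacks"))  # myocardial infarction
print(used_for("salinity"))        # ['saltiness']
```

Deriving UF from USE, rather than storing the two lists separately, keeps the pair from drifting out of sync.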
Semantic Relationships
Basic types:
• Equivalence (USE/UF)
• Hierarchical (BT/NT)
• Associative (RT/RT)
Represented by standard codes/symbols
Reciprocity
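Reciprocity means every relationship is recorded from both ends. If A has the narrower term B, then B has the broader term A, and the same holds for USE/UF and RT/RT. The following is a minimal Python sketch of a store that enforces this by writing the reciprocal at the same moment as the original link. It uses the indicator pairs from the slide and the cacti example from the generic hierarchy slide. The `Thesaurus` class is hypothetical, and so is the "deserts" associative link.

```python
from collections import defaultdict

# Reciprocal pairs of the standard relationship indicators on the slide.
RECIPROCAL = {"BT": "NT", "NT": "BT", "USE": "UF", "UF": "USE", "RT": "RT"}

class Thesaurus:
    def __init__(self) -> None:
        # term -> indicator -> set of related terms
        self.rel = defaultdict(lambda: defaultdict(set))

    def add(self, term: str, indicator: str, other: str) -> None:
        """Record a relationship and its reciprocal in a single step."""
        self.rel[term][indicator].add(other)
        self.rel[other][RECIPROCAL[indicator]].add(term)

    def display(self, term: str) -> str:
        """Render a term entry in conventional thesaurus order."""
        lines = [term]
        for ind in ("USE", "UF", "BT", "NT", "RT"):
            for other in sorted(self.rel[term][ind]):
                lines.append(f"  {ind} {other}")
        return "\n".join(lines)

t = Thesaurus()
t.add("succulent plants", "NT", "cacti")
t.add("cacti", "RT", "deserts")  # hypothetical associative link
print(t.display("cacti"))
# cacti
#   BT succulent plants
#   RT deserts
```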
Hierarchical Relationships
Allow for:
• Browsable structures
• Information discovery
• Search expansion
Three types:
• Generic
• Instance
• Whole-part
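Search expansion is the most mechanical of these uses. A query on a broad term can also retrieve material indexed under its narrower terms. The following is a minimal Python sketch that walks NT links downward. The succulent plants/cacti and mountains/Rocky Mountains pairs come from the slides that follow. The "plants" top term and the whole-part pair "United States" NT "Colorado" are hypothetical, and so is the `expand` function.

```python
# Narrower-term (NT) links covering the three hierarchy types.
NARROWER = {
    "plants": ["succulent plants"],        # generic (hypothetical top term)
    "succulent plants": ["cacti"],         # generic
    "mountains": ["Rocky Mountains"],      # instance
    "United States": ["Colorado"],         # whole-part (hypothetical)
}

def expand(term: str) -> list[str]:
    """Return the term plus all of its narrower terms, depth-first."""
    result = [term]
    for child in NARROWER.get(term, []):
        result.extend(expand(child))
    return result

print(expand("plants"))  # ['plants', 'succulent plants', 'cacti']
```

The same links, read in the other direction, let a user broaden a search that returns too little.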
Generic Hierarchical Relationships
Between a class and its members
“IsA” relationship
A cactus IsA succulent plant, therefore:
succulent plants NT cacti
Instance Hierarchical Relationships
Between a general category of things or events and an individual instance of that category
The instance is often a proper noun
Also an “IsA” relationship type
Example: mountains NT Rocky Mountains
Whole-Part Hierarchical Relationships
One concept is inherently included in another
Examples:
• Systems and organs of the body
• Geographic locations
• Corporate, social, or political structures
Polyhierarchy
A concept logically fits into two different hierarchical structures
An advantage of electronic structures: allows for different viewpoints
Example: biochemistry
BT biology
BT chemistry
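A polyhierarchy lets a term have more than one broader term. A browsable display therefore has to place it under every parent. The following is a minimal Python sketch that lists each path from a top term down to biochemistry. The "sciences" top term and the `paths_to_top` function are hypothetical. The sketch assumes the hierarchy has no cycles.

```python
# Broader-term (BT) links; a term may have several.
BROADER = {
    "biochemistry": ["biology", "chemistry"],
    "biology": ["sciences"],     # hypothetical top term
    "chemistry": ["sciences"],
}

def paths_to_top(term: str) -> list[list[str]]:
    """List every path from a top term down to `term` (assumes no cycles)."""
    parents = BROADER.get(term, [])
    if not parents:
        return [[term]]
    return [path + [term] for p in parents for path in paths_to_top(p)]

for path in paths_to_top("biochemistry"):
    print(" > ".join(path))
# sciences > biology > biochemistry
# sciences > chemistry > biochemistry
```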
Associative Relationships
May suggest additional terms for indexing or searching
Between terms in the same hierarchy:
• Overlapping sibling terms
• Derivational relationships
Between terms in different hierarchies:
• Many types
• Examples: process/agent; action/property; cause/effect
Form of Terms
Single-word or compound terms
Grammatical forms:
• Nouns and noun phrases
• Singular / plural
Capitalization:
• Predominantly lowercase characters, except for proper names, acronyms, trade names, etc.
Punctuation
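Of these form rules, capitalization is the easiest to check automatically when terms are loaded. The following is a minimal Python sketch of such a check. It assumes that proper names and trade names are supplied as an explicit allow-list, and it treats an all-caps word of two or more letters as an acronym. Both assumptions are simplifications. The `form_warnings` function is hypothetical.

```python
def form_warnings(term: str, proper_names: set[str]) -> list[str]:
    """Flag capitalized words in a term that is not on the proper-name list.

    Assumptions: proper and trade names are listed in `proper_names`;
    an all-caps word of 2+ letters is treated as an acronym and allowed.
    """
    warnings: list[str] = []
    if term in proper_names:
        return warnings
    for word in term.split():
        is_acronym = len(word) > 1 and word.isupper()
        if word[:1].isupper() and not is_acronym:
            warnings.append(f"capitalized word: {word!r}")
    return warnings

proper = {"Rocky Mountains", "Mercury (planet)"}
print(form_warnings("Succulent plants", proper))  # ["capitalized word: 'Succulent'"]
print(form_warnings("Rocky Mountains", proper))   # []
print(form_warnings("NISO standards", proper))    # []
```

The check cannot detect a proper name that was wrongly lowercased. That still needs human review.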
2007 Factiva, Inc. All Rights Reserved.
Standards
• “Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies,” ANSI/NISO Z39.19-2005
• “Z39.50: A Primer on the Protocol,” ANSI/NISO Z39.50
• “Structured Vocabularies for Information Retrieval. Guide. Definitions, Symbols and Abbreviations,” BS 8723-1:2005
• “Structured Vocabularies for Information Retrieval. Guide. Thesauri,” BS 8723-2:2005
• “Guidelines for the Establishment and Development of Multilingual Thesauri,” ISO 5964-1985
• “Guidelines for the Establishment and Development of Monolingual Thesauri,” ISO 2788-1986
• Web Ontology Language (OWL) Overview
Value Proposition, or “So what?”
“40% of corporate users…cannot find the information they need to do their jobs on their intranets.”
Susan Feldman, “The High Cost of Not Finding Information,” KMWorld, March 2004
Information retrieval issues within companies
Low productivity
High frustration
Little leverage of information assets
Too many search results
Too many irrelevant hits
“The more precise I get, the more I miss”
End-user search illiteracy
Multilingual content
Ambiguous results
The controlled vocabulary value proposition
Unlock the value of internal and external content to:
Improve productivity
“Stop searching, start finding”
Reduce cost
Make existing content actionable, not dormant
Avoid reinventing the wheel
Gain competitive advantage
Be better informed, act quicker
Controlled vocabulary’s role in portal success
Drive usage
• Improve user experience, leverage portal investment
Drive cultural change
• Help develop a common language
• Support information exchange/reuse
Leverage information management skills
• Turn information officers into information architects
Value Proposition
Taxonomies make it easier to find information so people are more likely to use intranets and extranets. This results in better return on the time and effort already invested in these intranets and extranets.
Taxonomies improve “hit” rates: people find what they need. Everyone has experienced irrelevant results from internet search engines because:
• Two or more words or terms can be used to represent a single concept (salinity/saltiness)
• Two or more words that have the same spelling can represent different concepts: Mercury (planet), Mercury (metal), Mercury (automobile)
Taxonomies eliminate much of this problem.
People spend less time searching and more time finding.
With a common taxonomy across the organization, knowledge can be more readily shared, reused and repurposed
Controlled vocabulary can help reduce costs and increase revenue
Taxonomies can help organizations save money:
• Reduce the number of hours spent seeking information. Hierarchical relationships allow users to easily narrow or broaden searches, as well as look for related information.
• Improve productivity by reusing and repurposing content
A taxonomy can help increase revenue:
• Increase customer satisfaction by improving search efficiency, findability, and relevance
• Provide timely information with up-to-date terminology
• Provide more precise information retrieval