implementing a taxonomy in a content management portal
Post on 27-Jan-2015
Implementing a Taxonomy in a Content Management Portal
Content Week 2005
Miami, Florida
Monday, January 31, 2005
Workshop H
2:45 pm – 4:45 pm
Marjorie M.K. Hlava
Access Innovations, Inc.
505-998-0800
mhlava@accessinn.com
www.accessinn.com
Introductions
• Name
• Project
• Expectations for these two short hours
• Please fill in the sign-up sheet
• Would you like
– 1. Copy of this presentation?
– 2. Sample software?
– 3. Other information?
Copyright © 2005 Access Innovations, Inc.
What will we talk about this afternoon?
• 1. Definitions
• 2. Where taxonomy fits in the Information Circle
• 3. Where to use a taxonomy
• 4. Taxonomies for Communities of Practice
• 5. Surrounding theories and applications
• 6. How to build and maintain
• 7. How it is used in enterprise information
[Diagram labels: Thesaurus Master; Data Feed; MAI to add metadata; Database Management System (add metadata using MAI); Search; Inverted File]
1. Definitions
What is a taxonomy?
• A hierarchical thesaurus with authority terms applied at the final node
• A browse-able web interface
• A Linnaean System
• A browse-able list with the term instance at the final leaf
Types of Taxonomies
• Naming and organizing things into groups that share similar characteristics
• 1. Flat – just a list
• 2. Hierarchical
– Taxonomic view
• 3. Faceted
– Sorted by a single characteristic
– Metadata - Dublin Core
– COSATI - GILS
• 4. Thesaurus
– Term records
– Database backend
– Easier to modify and maintain
Taxonomy in meta data
• Definition
– Taxonomy is a thesaurus in its hierarchical view with the authority files applied at the final nodes
– It allows the browse-able front end to a portal
– It provides keyword and name access to the content in the portal
Taxonomy definition
• A taxonomy is a thesaurus in hierarchical view with authority file terms added at the final nodes
• Thesaurus
• Authority file
• Hierarchical form
• Final nodes
Thesaurus
• Concepts
• Methods
• Procedures
• Cognitive approach
• The knowledge capture piece
• The topics or subjects
Authority file
• People
• Places
• Things
• The tangible approach
• Concrete Entities
Hierarchical view
• Gives the Portal view
• The view of all the preferred terms in categorized order
• An outline of the thesaurus
Final Nodes
• The last position on the hierarchical tree
– Taxonomy
• concept
– narrower terms
» final node - people, place or thing term
» document instance
» Letter to George Wiesman Dec 12, 2003
» Technical report number TR-1039
» Museum artifact 1706 wooden wagon wheel
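The hierarchy described on the Final Nodes slide can be sketched as a small tree. This is only an illustration under stated assumptions: the `Node` class, the `outline` helper, and the labels "Transportation", "Vehicles", and "Wagon wheels" are hypothetical; only the museum artifact instance comes from the slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One position in the hierarchical view (hypothetical structure)."""
    label: str
    children: List["Node"] = field(default_factory=list)
    instances: List[str] = field(default_factory=list)  # document instances

    def is_final(self) -> bool:
        # A final node is the last position on its branch: no narrower terms.
        return not self.children


def outline(node: Node, depth: int = 0) -> List[str]:
    """Return the tree as indented outline lines, instances marked with '*'."""
    lines = ["  " * depth + node.label]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    for doc in node.instances:
        lines.append("  " * (depth + 1) + "* " + doc)
    return lines


# Hypothetical concept labels; the instance text is taken from the slide.
wheel = Node("Wagon wheels", instances=["Museum artifact 1706 wooden wagon wheel"])
taxonomy = Node("Taxonomy", children=[
    Node("Transportation", children=[Node("Vehicles", children=[wheel])]),
])

print("\n".join(outline(taxonomy)))
```

Keeping document instances separate from child terms mirrors the slide's distinction between narrower terms and the items attached at the final node.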
Term Records – the Database Part
• Associative terms
– Related terms
• Equivalence terms
– Preferred and non preferred
– Use and used for
– Synonyms
• Hierarchical terms
– Broader narrower terms
– Parent Child
Other term record fields
• Scope notes
• Cross references
• History
• Term Status
• Category
• User defined
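As a rough sketch of the "database part", a term record could be modeled as below. The class and field names are assumptions for illustration, not the schema of any particular thesaurus product; the fields follow the relationship types and term record fields listed on the two slides above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TermRecord:
    """One thesaurus term record; field names are illustrative."""
    term: str
    broader: List[str] = field(default_factory=list)   # hierarchical: parents
    narrower: List[str] = field(default_factory=list)  # hierarchical: children
    related: List[str] = field(default_factory=list)   # associative terms
    used_for: List[str] = field(default_factory=list)  # non-preferred synonyms
    scope_note: str = ""
    history: str = ""
    status: str = "candidate"
    category: str = ""


class Thesaurus:
    def __init__(self) -> None:
        self.records: Dict[str, TermRecord] = {}

    def get(self, term: str) -> TermRecord:
        """Return the record for a term, creating an empty one if needed."""
        if term not in self.records:
            self.records[term] = TermRecord(term)
        return self.records[term]

    def link_hierarchy(self, parent: str, child: str) -> None:
        """Store broader/narrower as a reciprocal pair."""
        p, c = self.get(parent), self.get(child)
        if child not in p.narrower:
            p.narrower.append(child)
        if parent not in c.broader:
            c.broader.append(parent)

    def link_related(self, a: str, b: str) -> None:
        """Store a related-term link in both directions."""
        ra, rb = self.get(a), self.get(b)
        if b not in ra.related:
            ra.related.append(b)
        if a not in rb.related:
            rb.related.append(a)


thesaurus = Thesaurus()
thesaurus.link_hierarchy("Vehicles", "Automobiles")
thesaurus.link_related("Automobiles", "Highways")
thesaurus.get("Automobiles").used_for.append("Cars")
```

Writing both sides of each relationship in one call keeps the reciprocal links from drifting apart when terms are edited.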
2. Where does a taxonomy fit in the information circle?
Information Circle - Overview
[Diagram: the information circle, with four linked nodes: Taxonomy, User, Content, Output]
Content
• Web Pages
• White Papers
• Research Reports
• Licensed Data Feeds
• Intranet
• Internal Reports
• Lotus Notes files
• Databases
• Public Relations Documents/Press Releases
• Market Research Reports
• Customer Relationship Management (CRM)
• HR Files
• Accounting/Financial Records
• Legal Documents
• Patents
• Museum artifacts
Content – cont’d
Content Creation:
• HTML – Meta name / Keywords
• DB – Field / Meta tag / Element
• XML – Entity table for valid values
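For the HTML case, a minimal sketch of writing taxonomy terms into a keywords meta tag might look like this. The function name `keywords_meta_tag` and the sample terms are assumptions for illustration.

```python
from html import escape
from typing import Iterable


def keywords_meta_tag(terms: Iterable[str]) -> str:
    """Build an HTML keywords meta tag from taxonomy terms.

    Blank terms are dropped and duplicates removed, keeping first-seen order.
    """
    cleaned = [t.strip() for t in terms if t.strip()]
    unique = list(dict.fromkeys(cleaned))
    content = escape(", ".join(unique), quote=True)
    return f'<meta name="keywords" content="{content}">'


print(keywords_meta_tag(["Taxonomies", "Thesauri", "Taxonomies"]))
```

Escaping the attribute value matters because terms can contain characters such as "&" or quotation marks.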
Taxonomy
Taxonomy is applied to new and existing content:
• Meta Tags
– Thesaurus Terms
– Authority Terms
– Date
– Author
– Description
– etc.
[Diagram label: Rule Base Taxonomy]
Taxonomy – cont’d
• Index data
– Manually
– Automatically
• Suggest new candidate terms
• Review
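Automatic indexing against a rule base can be illustrated with a very small phrase matcher. This is a simplified sketch and does not represent how MAI or any other product works; the `RULES` table and its trigger phrases are hypothetical.

```python
import re
from typing import Dict, List

# Hypothetical rule base: preferred term -> text phrases that trigger it.
RULES: Dict[str, List[str]] = {
    "Automobiles": ["automobile", "automobiles", "car", "cars"],
    "Microcomputer": ["microcomputer", "personal computer"],
}


def suggest_terms(text: str, rules: Dict[str, List[str]] = RULES) -> List[str]:
    """Return preferred terms whose trigger phrases occur as whole words."""
    lowered = text.lower()
    hits = set()
    for preferred, phrases in rules.items():
        for phrase in phrases:
            # \b keeps "car" from matching inside "carpet".
            if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
                hits.add(preferred)
                break
    return sorted(hits)


print(suggest_terms("She bought a new car and a personal computer."))
```

The suggestions would still go through the review step named on the slide before being stored as metadata.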
Output
• Searchable Data
– Internal Data
– External Data
User
• Web Browsing/Searching
• Database Browsing/Searching
• Query Resolution
User – cont’d
• User Input
– Suggested Candidate Terms
– New Documents
• Reports Based on User Search
– Search Logs
– Null Hits (these will also suggest new candidate terms)
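Mining null hits from a search log for candidate terms could be sketched as follows. The log format, a list of (query, hit count) pairs, and the `min_count` threshold are assumptions; real search logs will differ.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def null_hit_candidates(
    log: Iterable[Tuple[str, int]], min_count: int = 2
) -> List[Tuple[str, int]]:
    """Count queries that returned zero hits and keep the frequent ones.

    Queries are normalized (trimmed, lowercased) before counting.
    """
    counts = Counter(
        query.strip().lower()
        for query, hits in log
        if hits == 0 and query.strip()
    )
    return [(q, n) for q, n in counts.most_common() if n >= min_count]


# Hypothetical log entries: (query text, number of hits returned).
sample_log = [
    ("folksonomy", 0),
    ("taxonomy", 12),
    ("Folksonomy ", 0),
    ("ontology", 0),
]
print(null_hit_candidates(sample_log))
```

A query that repeatedly finds nothing is a sign that users have a word for a concept the taxonomy does not yet cover, either as a new term or as a synonym of an existing one.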
New Content
[Diagram: the information circle, with New Content in place of Content]
The cycle begins again
3. Where to use a taxonomy
• Link the Taxonomy and Indexing
• Always in sync with the industry
• Keep up to date with terminology
• Automatically index the old data
• Filter newsfeeds
• Search using the Taxonomy
• File using the taxonomy
• Spell check using the taxonomy
• Link to translation system
• Catalog using the taxonomy
• Index a book
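One of the uses above, spell checking against the taxonomy, can be sketched with Python's standard `difflib` module. The function name, the sample vocabulary, and the 0.8 similarity cutoff are assumptions chosen for illustration.

```python
from difflib import get_close_matches
from typing import List, Sequence


def suggest_spelling(word: str, vocabulary: Sequence[str], limit: int = 3) -> List[str]:
    """Suggest taxonomy terms that closely match a possibly misspelled word."""
    lookup = {term.lower(): term for term in vocabulary}
    key = word.strip().lower()
    if key in lookup:
        # Already a taxonomy term: return it in its preferred casing.
        return [lookup[key]]
    matches = get_close_matches(key, list(lookup), n=limit, cutoff=0.8)
    return [lookup[m] for m in matches]


# Hypothetical vocabulary drawn from terms used in this presentation.
terms = ["Taxonomy", "Thesaurus", "Authority file"]
print(suggest_spelling("taxonmy", terms))
```

Because the suggestions come only from the taxonomy, the checker also nudges users toward the preferred form of a term.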
Thesaurus Master
Portal Searching
[Diagram labels: Database Management System (add metadata using MAI); Search; Inverted File; Record locator; Database records]
• Inverted File: Aardvark, Alligator, Apple, Advantage, …, Zebra
• Record locator: Accessinn.com/12345/demofile/recid15
• Database records, each with many elements
Portal Searching – cont’d
[Diagram: the same search path: Search; Inverted File; Record locator; Database records]
• Many databases can be reached
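The inverted file on the Portal Searching slides maps each word to the locators of the records that contain it. A minimal sketch, assuming short string locators such as "recid15" and plain-text records (both hypothetical), could look like this:

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_file(records: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each word to the set of record locators that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for locator, text in records.items():
        for word in tokenize(text):
            index[word].add(locator)
    return dict(index)


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return locators of records that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)


# Hypothetical records keyed by record locator.
records = {
    "recid15": "Alligator habitats in Florida",
    "recid16": "Apple orchards and alligator farms",
    "recid17": "Zebra migration",
}
index = build_inverted_file(records)
print(search(index, "alligator"))
```

A search consults only the inverted file and follows the locators back to the full database records, which is why one portal can reach many databases.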
4. Taxonomies for Communities of Practice
Taxonomies in a Community of Practice
• Nature of Communities of Practice (CoP)
• Taxonomies in context
• Value of taxonomies
• Creating a taxonomy
• Applying the taxonomy
Nature of CoPs
• Free flowing, loosely structured
• Simple, ad hoc categorization
• Active CoPs need organization
• Search tends to be hit-or-miss
Courtesy of Lillian Gassie, Naval Postgraduate School, Monterey, CA
Taxonomies in Context
A taxonomy aspires to be:
• a correlation of the different functional, regional and (possibly) national languages used by a community of practice
• a support mechanism for navigation
• a support tool for search engines and knowledge maps
• an authority for tagging documents and other information objects
• a knowledge base in its own right
Reference: “Taxonomies: the vital tool of information architecture”, www.tfpl.com
Value of Taxonomies
• Improves organization & structure
• Facilitates navigation
• Facilitates knowledge discovery
• Reduces effort
• Saves time
“Taxonomies are better created by professional indexers or librarians than by domain experts.”
Courtesy of Lillian Gassie, Naval Postgraduate School, Monterey, CA
Naval Postgraduate School’s Homeland Security Taxonomy (1)
Naval Postgraduate School’s Homeland Security Taxonomy (2)
IBM Insight graphical view
Applying a Taxonomy (1)
Manually
• Add terms into meta data fields
• Design navigation & site indexes with taxonomy hierarchy
Courtesy of Lillian Gassie, Naval Postgraduate School, Monterey, CA
Incorporating Hierarchical Classification from a Taxonomy
Courtesy of Lillian Gassie, Naval Postgraduate School, Monterey, CA
Applying a Taxonomy (2)
System integration
• Search & retrieval systems
• Auto-assignment of metadata
• Categorization systems
Courtesy of Lillian Gassie, Naval Postgraduate School, Monterey, CA
Applying the Taxonomy to a Digital Library
[Diagram labels: Web portal; Meta-Search Tool; Filtered content; Automated categorization; Locally held documents; Library catalogs; Public repositories; Commercial data sources; Agency data sources; INTERNET (public); spiders; Search engine (repeated)]
Courtesy of Lillian Gassie, Naval Postgraduate School, Monterey, CA
5. Surrounding theories and applications
Other Vocabulary types
• Uncontrolled lists
• Classification System
• Subject headings
• Controlled vocabulary
– usually synonyms and spelling
• Authority files
• Thesaurus
• Taxonomy
Uncontrolled list - defined
• Add terms as they occur
• No cross reference
• Simple flat structure
Controlled term lists - defined
• State the preferred terms
• Provide allowed term entry
• Heavily cross referenced
• Not generally hierarchical
• Popular
• Easy to create
Controlled term list - format
• Cars – use Automobiles
• Personal Computer – use Microcomputer
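The "use" format above amounts to a lookup from non-preferred entry terms to preferred terms. A minimal sketch, using the two slide examples and a hypothetical `preferred_term` helper:

```python
from typing import Dict

# Entry term (lowercase) -> preferred term, taken from the two slide examples.
USE: Dict[str, str] = {
    "cars": "Automobiles",
    "personal computer": "Microcomputer",
}


def preferred_term(entry: str, use: Dict[str, str] = USE) -> str:
    """Resolve an entry term to its preferred term.

    Entries without a "use" reference are returned trimmed but otherwise
    unchanged; that fallback is a design assumption of this sketch.
    """
    key = " ".join(entry.lower().split())  # normalize case and spacing
    return use.get(key, entry.strip())


print(preferred_term("Cars"))
```

The same table serves both indexers and searchers: either one can type the familiar word and be routed to the term the system actually uses.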
Classification vs Subject Headings
• Classification
– single spot or placement
– browse physical list
– often a numbering system
– clear hierarchy
– no or few cross references
Classification vs Subject Headings
• Subject headings
– generic search
– hidden classification system
– related terms and cross references in heavy use
– Usually the inverted form
• cells, electric
– Alphabetic access
Authority systems - defined
• Lists of terms in the preferred format for use
• Frequently have cross references
• Widely available
• Frequently coded lists
• Brand names
Authority lists - examples
• ISO Country Name and Code
– International Organization for Standardization
• ISO Language list
• NAICS (SIC)
– Standard Industrial Classification Code (SIC)
– Replaced by
– North American Industry Classification System (NAICS)
What is a thesaurus?
• Jessica L. Milstead. All Rights Reserved
• “For writers, it is a tool like Roget’s, one with words grouped and classified to help select the best word to convey a specific nuance of meaning.
• For indexers and searchers, it is an information storage and retrieval tool: a listing of words and phrases authorized for use in an indexing system, together with relationships, variants and synonyms, and aids to navigation through the thesaurus”
• www.jelem.com
Thesaurus - defined
• For information retrieval 1960’s
– indexing either intellectual or automatic
– in searching
– searching but not indexing
– indexing but not searching
– hierarchical view for searching
Thesaurus - defined
• Monolingual - standard
– British – English - ISO 2788
– American – English – ANSI/NISO Z39.19
• Multilingual – standard ISO 5964
– concept mapping
– Eurovoc
• Discipline or Mission based - ad hoc
Thesaurus - standard format
• Main Entries
• Top Terms - TT
• Broader Terms - BT
• Narrower Terms - NT
• Related Terms - RT
• Scope Notes - SN
• History - HI
• Date term added/changed - DA
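A main entry printed with the relationship codes above might be produced by a small formatter like the one below. The layout (two-space indent, code then value) and the sample terms are assumptions for illustration; the codes are listed in the slide's order, which is not necessarily the order a published thesaurus would use.

```python
from typing import Sequence


def format_entry(
    term: str,
    tt: Sequence[str] = (),
    bt: Sequence[str] = (),
    nt: Sequence[str] = (),
    rt: Sequence[str] = (),
    sn: str = "",
) -> str:
    """Render one main entry with TT, BT, NT, RT and SN lines."""
    lines = [term]
    for code, values in (("TT", tt), ("BT", bt), ("NT", nt), ("RT", rt)):
        for value in values:
            lines.append(f"  {code} {value}")
    if sn:
        lines.append(f"  SN {sn}")
    return "\n".join(lines)


# Hypothetical entry for illustration.
print(format_entry(
    "Automobiles",
    bt=["Vehicles"],
    nt=["Sedans"],
    rt=["Highways"],
    sn="Passenger cars",
))
```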
Standards
• Monolingual
– NISO / ANSI – Z39.19
– ISO 2788
• Multilingual
– ISO 5964
ISO Standards
• Set up already - easy to adopt
• Multiple broader terms
• The standards outline procedures
– ISO - better for implementation
– NISO - much better reading
Why do we index?
• Improve precision
– define scope of terms
• Improve recall
– different terms for same concept
• Guide to a field of expertise
• Learning tool
• Richer expression
Uses?
• Indexing*
– …process by which subject terms or classification symbols are assigned to concepts in documents
– A thesaurus is also known as an indexing language
– * not the building of the inverted file in the computer sense of indexing
What are we controlling?
• Synonyms
– different terms same concept
• Polysemes or Homonyms
– same word different meanings
– Lead
– Reading
How?
• Meaning
– delineation of scope of a term
• Term equivalence
– linking of synonyms
• Disambiguation of homonyms
– lead (metal)
– lead (element)
– lead (management)
Precision options
• Language specificity
• Coordination
• Compound terms - level of precoordination
• Homographs and scope notes
• Word distance indication
Precision options
• Structural relationships
• Links and roles
• Treatment and aspect codes
• Weighting
Disambiguation
Bill Invoice
Bill Legislative
Bill Sport
Bill Person
Disambiguation
Bills Invoices
Bills Legislation
Bill Animal
Bill Person
[Diagram: the terms above linked by relationship labels PT, BT, NT, and RT]
6. How to build and maintain a taxonomy
How to build a taxonomy
• Collect the terms
• Pull out authority terms
• Organize into arrays
• Choose top terms
• Organize hierarchically
• Flesh out term records
• Test, review, and edit
Or said another way …
• Define scope
• Collect terms and relationships
• Identify existing taxonomies
• Identify resources
• Create & refine taxonomy
• Apply taxonomy
• Review and update
Maintain
• Steady stream of terms
– Web logs
– Null sets
– New announcements
– Indexing team
– Library
– Records managers
– Etc.
• Candidate terms
• An out-of-date taxonomy is nearly useless
Best Results Measures
• Accuracy
• Productivity
• Hits, Misses and Noise
• Precision (Recall)
• Relevance
• Ease of set up
• Time to production
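Precision and recall can be computed from counts of hits, misses and noise. The sketch below assumes the usual information retrieval reading of those words, which the slide does not spell out: a hit is a relevant item that was retrieved, a miss is a relevant item that was not retrieved, and noise is a retrieved item that is not relevant.

```python
def precision(hits: int, noise: int) -> float:
    """Share of retrieved items that are relevant: hits / (hits + noise)."""
    retrieved = hits + noise
    return hits / retrieved if retrieved else 0.0


def recall(hits: int, misses: int) -> float:
    """Share of relevant items that were retrieved: hits / (hits + misses)."""
    relevant = hits + misses
    return hits / relevant if relevant else 0.0


# Hypothetical counts from reviewing one batch of indexed documents.
print(precision(hits=8, noise=2), recall(hits=8, misses=8))
```

Returning 0.0 when nothing was retrieved or nothing was relevant is a convention of this sketch; some evaluations treat those cases as undefined instead.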
Integration
• Thesaurus
– full featured
– multiple views
– multiple versions
– multiple languages
• Automatic indexing
– filtering
– assisted
• Data Harmony MAI and Thesaurus Master
Visual Taxonomy
• Ways to look
– Hierarchical
– Alphabetic – by term
– Ring diagrams
– Topic maps
– Related terms
API to Many Systems for CMS
Apply to the meta data
• Automatic application?
• Spider setting internally
• External web crawls – use all aliases
• Filter data
• Enhance search experience
Meta data
• The fields
• The elements
– Class codes
– Title
– Author
– Plaintiff
– Product
– subject / topic
• Meta Name Keywords in HTML
7. How Taxonomies are used in Enterprise Information
[Screenshot callouts]
• Brand is repeated in several spots and tied to search as well
• Another way of listing brands
• Category list from taxonomy is tied to brand list and product list
• Category code from the taxonomy is tied to the brand list and the product list
Enterprise Taxonomy Management
• Consistent application across entire site
• Synonyms are used interchangeably
• User doesn’t need to know the taxonomy
• Pop up view is helpful
• Site map for construction and browsing
• Allows hidden sections for internal use
Taxonomies
• Form the basis for knowledge sharing
• Add value to discussion
• Allow deeper retrieval
• Are straightforward to create
• Require on-going maintenance
Your Taxonomy
• There is too much information to pile it on the floor.
• It fits in many places in the information flow
Thank you for your time!
Questions?
Marjorie M.K. Hlava
Access Innovations, Inc.
505-998-0800
mhlava@accessinn.com
www.accessinn.com