Implementing a Taxonomy in a Content Management Portal

Implementing a Taxonomy in a Content Management Portal. Content Week 2005, Miami, Florida. Monday, January 31, 2005. Workshop H, 2:45 pm – 4:45 pm. Marjorie M.K. Hlava, Access Innovations, Inc. 505-998-0800, [email protected], www.accessinn.com

DESCRIPTION

On the uses and implementation of taxonomy on the Web, with a particular focus on the taxonomy as part of an enterprise information environment. Presented by Marjorie M.K. Hlava during Content Week 2005 in Miami, Florida.

TRANSCRIPT

Page 1: Implementing a Taxonomy in a Content Management Portal

Implementing a Taxonomy in a Content Management Portal

Content Week 2005
Miami, Florida

Monday, January 31, 2005
Workshop H

2:45 pm – 4:45 pm

Marjorie M.K. Hlava
Access Innovations, Inc.

[email protected]

www.accessinn.com

Page 2: Implementing a Taxonomy in a Content Management Portal

Introductions

• Name
• Project
• Expectations for these two short hours
• Please fill in the sign up sheet
• Would you like
– 1. Copy of this presentation?
– 2. Sample software?
– 3. Other information?

Page 3: Implementing a Taxonomy in a Content Management Portal

Copyright © 2005 Access Innovations, Inc.

What will we talk about this afternoon?

• 1. Definitions
• 2. Where taxonomy fits in the Information Circle
• 3. Where to use a taxonomy
• 4. Taxonomies for Communities of Practice
• 5. Surrounding theories and applications
• 6. How to build and maintain
• 7. How it is used in enterprise information

Page 4: Implementing a Taxonomy in a Content Management Portal

[Diagram with these labeled components: Data Feed; Thesaurus Master; MAI to add Metadata; Database Management System (Add Metadata using MAI); Inverted File; Search]

Page 5: Implementing a Taxonomy in a Content Management Portal

1. Definitions

Page 6: Implementing a Taxonomy in a Content Management Portal

What is a taxonomy?

• A hierarchical thesaurus with authority terms applied at the final node
• A browse-able web interface
• A Linnaean System
• A browse-able list with the term instance at the final leaf

Page 7: Implementing a Taxonomy in a Content Management Portal

Types of Taxonomies

• Naming and organizing things into groups that share similar characteristics
• 1. Flat – just a list
• 2. Hierarchical
– Taxonomic view
• 3. Faceted
– Sorted by a single characteristic
– Metadata - Dublin Core
– COSATI - GILS
• 4. Thesaurus
– Term records
– Database backend
– Easier to modify and maintain

Page 8: Implementing a Taxonomy in a Content Management Portal

Taxonomy in meta data

• Definition
– Taxonomy is a thesaurus in its hierarchical view with the authority files applied at the final nodes
– It allows the browse-able front end to a portal
– It provides keyword and name access to the content in the portal

Page 9: Implementing a Taxonomy in a Content Management Portal

Taxonomy definition

• A taxonomy is a thesaurus in hierarchical view with authority file terms added at the final nodes

• Thesaurus
• Authority file
• Hierarchical form
• Final nodes

Page 10: Implementing a Taxonomy in a Content Management Portal

Thesaurus

• Concepts
• Methods
• Procedures

• Cognitive approach
• The knowledge capture piece
• The topics or subjects

Page 11: Implementing a Taxonomy in a Content Management Portal

Authority file

• People
• Places
• Things

• The tangible approach
• Concrete Entities

Page 12: Implementing a Taxonomy in a Content Management Portal

Hierarchical view

• Gives the Portal view
• The view of all the preferred terms in categorized order
• An outline of the thesaurus

Page 13: Implementing a Taxonomy in a Content Management Portal

Final Nodes

• The last position on the hierarchical tree
– Taxonomy
• concept
– narrower terms
» final node - people, place or thing term
» document instance
» Letter to George Wiesman Dec 12, 2003
» Technical report number TR-1039
» Museum artifact 1706 wooden wagon wheel
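
The chain from concept to narrower term to final node can be pictured as a small tree. The sketch below is only an illustration, not the presenter's software: the `Node` class, the `final_nodes` helper, and every label except the three document instances from the slide are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One position in the hierarchy: a concept, a narrower term, or a final node."""
    label: str
    children: list["Node"] = field(default_factory=list)
    # Document instances hang off final nodes only.
    instances: list[str] = field(default_factory=list)

    def is_final(self) -> bool:
        # A final node is the last position on its branch: no narrower terms.
        return not self.children


def final_nodes(node: Node) -> list[Node]:
    """Collect every final node at or below the given node, depth first."""
    if node.is_final():
        return [node]
    found = []
    for child in node.children:
        found.extend(final_nodes(child))
    return found


# Hypothetical labels; only the three document instances come from the slide.
taxonomy = Node("Taxonomy", [
    Node("Transportation", [
        Node("Wagons", [
            Node("Wagon wheels",
                 instances=["Museum artifact 1706 wooden wagon wheel"]),
        ]),
    ]),
    Node("Correspondence", [
        Node("Letters", instances=["Letter to George Wiesman Dec 12, 2003"]),
    ]),
    Node("Reports", [
        Node("Technical reports", instances=["Technical report number TR-1039"]),
    ]),
])

for leaf in final_nodes(taxonomy):
    print(leaf.label, "->", leaf.instances)
```

Keeping the document instances on the final nodes, rather than on the concepts, mirrors the definition used throughout the deck: the thesaurus supplies the upper levels and the authority terms sit at the leaves.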

Page 14: Implementing a Taxonomy in a Content Management Portal

Term Records – the Database Part

• Associative terms
– Related terms
• Equivalence terms
– Preferred and non preferred
– Use and used for
– Synonyms
• Hierarchical terms
– Broader narrower terms
– Parent Child

Page 15: Implementing a Taxonomy in a Content Management Portal

Other term record fields

• Scope notes
• Cross references
• History
• Term Status
• Category
• User defined
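
A term record with these fields might be modeled as below. This is a minimal sketch under assumptions: the class and method names are invented for illustration, the example terms are hypothetical, and only a few of the listed fields are included. The point it shows is that hierarchical and associative links are reciprocal, so adding one side should add the other.

```python
from dataclasses import dataclass, field


@dataclass
class TermRecord:
    term: str
    broader: set[str] = field(default_factory=set)    # hierarchical: parent terms
    narrower: set[str] = field(default_factory=set)   # hierarchical: child terms
    related: set[str] = field(default_factory=set)    # associative
    used_for: set[str] = field(default_factory=set)   # equivalence: non-preferred terms
    scope_note: str = ""
    status: str = "candidate"


class Thesaurus:
    def __init__(self) -> None:
        self.records: dict[str, TermRecord] = {}

    def record(self, term: str) -> TermRecord:
        # Fetch the record for a term, creating an empty one on first use.
        return self.records.setdefault(term, TermRecord(term))

    def add_broader(self, term: str, parent: str) -> None:
        # A broader term on the child implies a narrower term on the parent.
        self.record(term).broader.add(parent)
        self.record(parent).narrower.add(term)

    def add_related(self, a: str, b: str) -> None:
        # Associative links are symmetric.
        self.record(a).related.add(b)
        self.record(b).related.add(a)

    def add_synonym(self, preferred: str, non_preferred: str) -> None:
        self.record(preferred).used_for.add(non_preferred)


t = Thesaurus()
t.add_broader("Automobiles", "Vehicles")
t.add_related("Automobiles", "Highways")
t.add_synonym("Automobiles", "Cars")
print(t.records["Vehicles"].narrower)  # {'Automobiles'}
```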

Page 16: Implementing a Taxonomy in a Content Management Portal

2. Where does a taxonomy fit in the information circle?

Page 17: Implementing a Taxonomy in a Content Management Portal

Information Circle - Overview

[Diagram: the information circle, linking Taxonomy, User, Content, and Output]

Page 18: Implementing a Taxonomy in a Content Management Portal

Content

• Web Pages
• White Papers
• Research Reports
• Licensed Data Feeds
• Intranet
• Internal Reports
• Lotus Notes files
• Databases
• Public Relations Documents/Press Releases
• Market Research Reports
• Customer Relationship Management (CRM)
• HR Files
• Accounting/Financial Records
• Legal Documents
• Patents
• Museum artifacts

Page 19: Implementing a Taxonomy in a Content Management Portal

Content – cont’d

Content Creation:

HTML – Meta name / Keywords
DB – Field / Meta tag / Element
XML – Entity table for valid values
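
For the HTML case, taxonomy terms usually end up in a keywords meta tag. The helper below is a minimal sketch of that step; the function name and the example terms are assumptions, and a real content management system would normally fill this tag from its own metadata fields.

```python
from html import escape


def keywords_meta_tag(terms: list[str]) -> str:
    """Render taxonomy terms as an HTML keywords meta tag."""
    content = ", ".join(terms)
    # Escape quotes and ampersands so the attribute value stays well-formed.
    return f'<meta name="keywords" content="{escape(content, quote=True)}">'


print(keywords_meta_tag(["Taxonomies", "Content management", "Portals & intranets"]))
# <meta name="keywords" content="Taxonomies, Content management, Portals &amp; intranets">
```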

Page 20: Implementing a Taxonomy in a Content Management Portal

Taxonomy

Taxonomy is applied to new and existing content:

Meta Tags
Thesaurus Terms
Authority Terms
Date
Author
Description
etc.

Rule Base Taxonomy

Page 21: Implementing a Taxonomy in a Content Management Portal

Taxonomy – cont’d

Index data
- Manually
- Automatically

Suggest new candidate terms

Review
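
Automatic indexing against a rule base can be sketched in a few lines. This is not how MAI works internally (the slides do not say); it is a deliberately simple phrase matcher, and the `RULES` table and function name are hypothetical.

```python
import re

# Hypothetical rule base: each matching phrase points to the preferred term to assign.
RULES = {
    "taxonomy": "Taxonomies",
    "taxonomies": "Taxonomies",
    "thesaurus": "Thesauri",
    "thesauri": "Thesauri",
    "metadata": "Metadata",
    "meta data": "Metadata",
}


def suggest_terms(text: str, rules: dict[str, str] = RULES) -> list[str]:
    """Return the preferred terms whose matching phrases occur in the text."""
    lowered = text.lower()
    assigned = set()
    for phrase, preferred in rules.items():
        # \b limits matches to whole words or phrases.
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
            assigned.add(preferred)
    return sorted(assigned)


print(suggest_terms("Adding meta data with a thesaurus"))
# ['Metadata', 'Thesauri']
```

In an assisted workflow the returned list would go to a human indexer for review rather than straight into the record.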

Page 22: Implementing a Taxonomy in a Content Management Portal

Output

Searchable Data
- Internal Data
- External Data

Page 23: Implementing a Taxonomy in a Content Management Portal

User

Web Browsing/Searching

Database Browsing/Searching

Query Resolution

Page 24: Implementing a Taxonomy in a Content Management Portal

User – cont’d

User Input
- Suggested Candidate Terms
- New Documents

Reports Based on User Search
- Search Logs
- Null Hits

(These will also suggest new candidate terms)
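
Mining null hits for candidate terms could look something like the sketch below. The log format (pairs of query and hit count), the function name, and the threshold are all assumptions; real search logs will need their own parsing.

```python
from collections import Counter


def candidate_terms(search_log, known_terms, min_count=2):
    """Suggest candidate terms from queries that returned no hits.

    search_log: iterable of (query, hit_count) pairs (an assumed log format).
    known_terms: terms and synonyms already in the taxonomy.
    """
    known = {t.lower() for t in known_terms}
    nulls = Counter(
        query.strip().lower()
        for query, hits in search_log
        if hits == 0 and query.strip().lower() not in known
    )
    # Most frequent null queries first; one-off queries are often typos.
    return [q for q, n in nulls.most_common() if n >= min_count]


log = [("wikis", 0), ("Wikis", 0), ("taxonomy", 42), ("blogs", 0), ("wikis ", 0)]
print(candidate_terms(log, known_terms=["Taxonomy"]))  # ['wikis']
```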

Page 25: Implementing a Taxonomy in a Content Management Portal

New Content

The cycle begins again

Page 26: Implementing a Taxonomy in a Content Management Portal

Information Circle - Overview

[Diagram: the information circle, linking Taxonomy, User, Content, and Output]

Page 27: Implementing a Taxonomy in a Content Management Portal

3. Where to use a taxonomy

• Link the Taxonomy and Indexing
• Always in sync with the industry
• Keep up to date with terminology
• Automatically index the old data
• Filter newsfeeds
• Search using the Taxonomy
• File using the taxonomy
• Spell check using the taxonomy
• Link to translation system
• Catalog using the taxonomy
• Index a book

Page 28: Implementing a Taxonomy in a Content Management Portal

Page 29: Implementing a Taxonomy in a Content Management Portal

Page 30: Implementing a Taxonomy in a Content Management Portal

Page 31: Implementing a Taxonomy in a Content Management Portal

Thesaurus Master

Page 32: Implementing a Taxonomy in a Content Management Portal

Page 33: Implementing a Taxonomy in a Content Management Portal

Portal Searching

[Diagram with these labeled components: Database Management System (add metadata using MAI); Search; Inverted File, listing Aardvark, Alligator, Apple, Advantage, …, Zebra; Record locator, Accessinn.com/12345/demofile/recid15; Database records, each with many elements]

Page 34: Implementing a Taxonomy in a Content Management Portal

Portal Searching

[Diagram with these labeled components: Search; Inverted File, listing Aardvark, Alligator, Apple, Advantage, …, Zebra; Record locator, Accessinn.com/12345/demofile/recid15; Database records, each with many elements]

Many data bases can be reached
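
An inverted file maps each index term to the locators of the records that carry it, which is what lets one search reach many databases. The sketch below is a toy version under assumptions: the function name is invented, the second locator and all term assignments are made up, and only the first locator appears on the slide.

```python
from collections import defaultdict


def build_inverted_file(records: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each index term to the sorted record locators that carry it."""
    inverted = defaultdict(set)
    for locator, terms in records.items():
        for term in terms:
            inverted[term.lower()].add(locator)
    # List the terms alphabetically, each with its locators in a stable order.
    return {term: sorted(locs) for term, locs in sorted(inverted.items())}


# The first locator is the one on the slide; everything else is illustrative.
records = {
    "Accessinn.com/12345/demofile/recid15": ["Apple", "Zebra"],
    "Accessinn.com/12345/demofile/recid16": ["Aardvark", "Apple"],
}
index = build_inverted_file(records)
print(index["apple"])
```

A search for a term is then a dictionary lookup followed by fetching the records behind each locator, wherever those records live.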

Page 35: Implementing a Taxonomy in a Content Management Portal

4. Taxonomies for Communities of Practice

Page 36: Implementing a Taxonomy in a Content Management Portal

Taxonomies in a Community of Practice

• Nature of Communities of Practice (CoP)
• Taxonomies in context
• Value of taxonomies
• Creating a taxonomy
• Applying the taxonomy

Page 37: Implementing a Taxonomy in a Content Management Portal

Nature of CoPs

• Free flowing, loosely structured

• Simple, ad hoc categorization

• Active CoPs need organization

• Search tends to be hit-or-miss

Courtesy of Lillian Gassie, Naval Postgraduate School, Monterey, CA

Page 38: Implementing a Taxonomy in a Content Management Portal

Taxonomies in Context

A taxonomy aspires to be:
• a correlation of the different functional, regional and (possibly) national languages used by a community of practice
• a support mechanism for navigation
• a support tool for search engines and knowledge maps
• an authority for tagging documents and other information objects
• a knowledge base in its own right

Reference: “Taxonomies: the vital tool of information architecture”, www.tfpl.com

Page 39: Implementing a Taxonomy in a Content Management Portal

Value of Taxonomies

• Improves organization & structure
• Facilitates navigation
• Facilitates knowledge discovery
• Reduces effort
• Saves time

“Taxonomies are better created by professional indexers or librarians than by domain experts.”

Courtesy of Lillian Gassie, Naval Postgraduate School, Monterey, CA

Page 40: Implementing a Taxonomy in a Content Management Portal

Naval Postgraduate School’s Homeland Security Taxonomy (1)

Page 41: Implementing a Taxonomy in a Content Management Portal

Naval Postgraduate School’s Homeland Security Taxonomy (2)

Page 42: Implementing a Taxonomy in a Content Management Portal

IBM Insight graphical view

Page 43: Implementing a Taxonomy in a Content Management Portal
Page 44: Implementing a Taxonomy in a Content Management Portal

Applying a Taxonomy (1)

Manually
• Add terms into meta data fields
• Design navigation & site indexes with taxonomy hierarchy

Courtesy of Lillian Gassie, Naval Postgraduate School, Monterey, CA

Page 45: Implementing a Taxonomy in a Content Management Portal

Incorporating Hierarchical Classification from a Taxonomy

Courtesy of Lillian Gassie, Naval Postgraduate School, Monterey, CA

Page 46: Implementing a Taxonomy in a Content Management Portal

Applying a Taxonomy (2)

System integration
• Search & retrieval systems
• Auto-assignment of metadata
• Categorization systems

Courtesy of Lillian Gassie, Naval Postgraduate School, Monterey, CA

Page 47: Implementing a Taxonomy in a Content Management Portal

Applying the Taxonomy to a Digital Library

[Diagram with these labeled components: Web portal; Meta-Search Tool; Filtered content; Automated categorization; spiders; Search engine (repeated, one per source); and the sources Locally held documents, Library catalogs, Public repositories, Commercial data sources, Agency data sources, and INTERNET (public)]

Courtesy of Lillian Gassie, Naval Postgraduate School, Monterey, CA

Page 48: Implementing a Taxonomy in a Content Management Portal

5. Surrounding theories and applications

Page 49: Implementing a Taxonomy in a Content Management Portal

Other Vocabulary types

• Uncontrolled lists
• Classification System
• Subject headings
• Controlled vocabulary
– usually synonyms and spelling
• Authority files
• Thesaurus
• Taxonomy

Page 50: Implementing a Taxonomy in a Content Management Portal

Uncontrolled list - defined

• Add terms as they occur
• No cross reference
• Simple flat structure

Page 51: Implementing a Taxonomy in a Content Management Portal

Controlled term lists - defined

• State the preferred terms
• Provide allowed term entry
• Heavily cross referenced
• Not generally hierarchical
• Popular
• Easy to create

Page 52: Implementing a Taxonomy in a Content Management Portal

Controlled term list - format

• Cars – use Automobiles

• Personal Computer – use Microcomputer
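
The two USE references above amount to a lookup from entry term to preferred term. A minimal sketch, assuming a plain dictionary is enough and with a hypothetical `resolve` helper:

```python
from typing import Optional

# USE references from the slide: non-preferred entry term -> preferred term.
USE = {
    "cars": "Automobiles",
    "personal computer": "Microcomputer",
}
PREFERRED = {"Automobiles", "Microcomputer"}


def resolve(entry: str) -> Optional[str]:
    """Return the preferred term for an entry term, or None if it is not listed."""
    key = " ".join(entry.lower().split())  # normalize case and spacing
    if key in USE:
        return USE[key]
    for term in PREFERRED:
        if term.lower() == key:
            return term
    return None


print(resolve("Cars"))               # Automobiles
print(resolve("personal  computer")) # Microcomputer
print(resolve("trucks"))             # None
```

The same lookup serves both sides of the system: indexers are steered to the preferred term, and searchers can type either form.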

Page 53: Implementing a Taxonomy in a Content Management Portal

Classification vs Subject Headings

• Classification
– single spot or placement
– browse physical list
– often a numbering system
– clear hierarchy
– no or few cross references

Page 54: Implementing a Taxonomy in a Content Management Portal

Classification vs Subject Headings

• Subject headings
– generic search
– hidden classification system
– related terms and cross references in heavy use
– Usually the inverted form
• cells, electric
– Alphabetic access

Page 55: Implementing a Taxonomy in a Content Management Portal

Authority systems - defined

• Lists of terms in the preferred format for use
• Frequently have cross references
• Widely available
• Frequently coded lists
• Brand names

Page 56: Implementing a Taxonomy in a Content Management Portal

Authority lists - examples

• ISO Country Name and Code
– International Organization for Standardization
• ISO Language list
• NAICS (SIC)
– Standard Industrial Classification (SIC)
– Replaced by
– North American Industry Classification System (NAICS)

Page 57: Implementing a Taxonomy in a Content Management Portal

What is a thesaurus?

• Jessica L. Milstead. All Rights Reserved
• “For writers, it is a tool like Roget’s, one with words grouped and classified to help select the best word to convey a specific nuance of meaning.

• For indexers and searchers, it is an information storage and retrieval tool: a listing of words and phrases authorized for use in an indexing system, together with relationships, variants and synonyms, and aids to navigation through the thesaurus”

• www.jelem.com

Page 58: Implementing a Taxonomy in a Content Management Portal

Thesaurus - defined

• For information retrieval 1960’s
– indexing either intellectual or automatic
– in searching
– searching but not indexing
– indexing but not searching
– hierarchical view for searching

Page 59: Implementing a Taxonomy in a Content Management Portal

Thesaurus - defined

• Monolingual - standard
– British – English - ISO 2788
– American – English – ANSI/NISO Z39.19
• Multilingual – standard ISO 5964
– concept mapping
– Eurovoc
• Discipline or Mission based - ad hoc

Page 60: Implementing a Taxonomy in a Content Management Portal

Thesaurus - standard format

• Main Entries
• Top Terms - TT
• Broader Terms - BT
• Narrower Terms - NT
• Related Terms - RT
• Scope Notes - SN
• History - HI
• Date term added/changed - DA
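
A main entry with its tagged fields is easy to print in a fixed order. The function below is a rough sketch: the tag order follows the slide, while the function name, the two-space indent, and the sample entry are assumptions rather than anything a standard prescribes.

```python
def format_term(term: str, fields: dict[str, list[str]]) -> str:
    """Render one main entry with its tagged fields in a fixed order."""
    order = ["TT", "BT", "NT", "RT", "SN", "HI", "DA"]
    lines = [term]
    for tag in order:
        for value in fields.get(tag, []):
            lines.append(f"  {tag} {value}")
    return "\n".join(lines)


# Hypothetical entry; the tags are the ones listed on the slide.
print(format_term("Automobiles", {
    "BT": ["Vehicles"],
    "TT": ["Transportation"],
    "NT": ["Electric automobiles", "Sports cars"],
    "RT": ["Highways"],
    "SN": ["Passenger vehicles only"],
}))
```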

Page 61: Implementing a Taxonomy in a Content Management Portal

Standards

• Monolingual
– NISO / ANSI – Z39.19
– ISO 2788
• Multilingual
– ISO 5964

Page 62: Implementing a Taxonomy in a Content Management Portal

ISO Standards

• Set up already - easy to adopt
• Multiple broader terms
• The standards outline procedures
– ISO - better for implementation
– NISO much better reading

Page 63: Implementing a Taxonomy in a Content Management Portal

Why do we index?

• Improve precision
– define scope of terms
• Improve recall
– different terms for same concept
• Guide to a field of expertise
• Learning tool
• Richer expression

Page 64: Implementing a Taxonomy in a Content Management Portal

Uses?

• Indexing*
– …process by which subject terms or classification symbols are assigned to concepts in documents
– A thesaurus is also known as an indexing language
– * not the building of the inverted file in the computer sense of indexing

Page 65: Implementing a Taxonomy in a Content Management Portal

What are we controlling?

• Synonyms
– different terms, same concept
• Polysemes or Homonyms
– same word, different meanings
– Lead
– Reading

Page 66: Implementing a Taxonomy in a Content Management Portal

How?

• Meaning
– delineation of scope of a term
• Term equivalence
– linking of synonyms
• Disambiguation of homonyms
– lead (metal)
– lead (element)
– lead (management)
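
Parenthetical qualifiers such as "lead (metal)" are easy to handle mechanically. The sketch below splits a qualified term into word and qualifier and groups the senses of each homonym; the function names are invented for illustration and the regular expression assumes one trailing qualifier per term.

```python
import re
from collections import defaultdict
from typing import Optional

_QUALIFIED = re.compile(r"^(.+?)\s*\(([^()]+)\)$")


def split_qualifier(term: str) -> tuple[str, Optional[str]]:
    """Split 'lead (metal)' into ('lead', 'metal'); unqualified terms get None."""
    match = _QUALIFIED.match(term.strip())
    if match:
        return match.group(1), match.group(2)
    return term.strip(), None


def senses(terms: list[str]) -> dict[str, list[str]]:
    """Group qualified terms by their shared word to list each homonym's senses."""
    grouped = defaultdict(list)
    for term in terms:
        word, qualifier = split_qualifier(term)
        if qualifier is not None:
            grouped[word].append(qualifier)
    return dict(grouped)


print(senses(["lead (metal)", "lead (element)", "lead (management)", "zebra"]))
# {'lead': ['metal', 'element', 'management']}
```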

Page 67: Implementing a Taxonomy in a Content Management Portal

Precision options

• Language specificity
• Coordination
• Compound terms - level of precoordination
• Homographs and scope notes
• Word distance indication

Page 68: Implementing a Taxonomy in a Content Management Portal

Precision options

• Structural relationships
• Links and roles
• Treatment and aspect codes
• Weighting

Page 69: Implementing a Taxonomy in a Content Management Portal

Disambiguation

Bill Invoice

Bill Legislative

Bill Sport

Bill Person

Page 70: Implementing a Taxonomy in a Content Management Portal

Disambiguation

Bills Invoices

Bills Legislation

Bill Animal

Bill Person

[Diagram: the senses above linked by relationship labels PT, BT, NT, and RT]

Page 71: Implementing a Taxonomy in a Content Management Portal

6. How to build and maintain a taxonomy

Page 72: Implementing a Taxonomy in a Content Management Portal

How to build a taxonomy

• Collect the terms
• Pull out authority terms
• Organize into arrays
• Choose top terms
• Organize hierarchically
• Flesh out term records
• Test, review, and edit
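
The "choose top terms" and "organize hierarchically" steps can be sketched once each collected term has been given a broader term. The code below is only an illustration under assumptions: the function names and the sample terms are hypothetical, and it supports a single broader term per entry.

```python
from collections import defaultdict
from typing import Optional


def build_hierarchy(parent_of: dict[str, Optional[str]]):
    """Turn term -> broader-term pairs into (top terms, children map)."""
    children = defaultdict(list)
    top_terms = []
    for term, parent in sorted(parent_of.items()):
        if parent is None:
            top_terms.append(term)      # no broader term: a top term
        else:
            children[parent].append(term)
    return top_terms, dict(children)


def outline(top_terms, children, depth=0):
    """Yield the hierarchy as indented lines, one term per line."""
    for term in top_terms:
        yield "  " * depth + term
        yield from outline(children.get(term, []), children, depth + 1)


# Hypothetical collected terms with the broader term chosen for each.
collected = {
    "Vehicles": None,
    "Automobiles": "Vehicles",
    "Wagons": "Vehicles",
    "Sports cars": "Automobiles",
}
tops, kids = build_hierarchy(collected)
print("\n".join(outline(tops, kids)))
```

Printing the outline is a cheap way to support the final step, since reviewers can read the indented list and spot misplaced terms.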

Page 73: Implementing a Taxonomy in a Content Management Portal

Or said another way …

• Define scope
• Collect terms and relationships
• Identify existing taxonomies
• Identify resources
• Create & refine taxonomy
• Apply taxonomy
• Review and update

Page 74: Implementing a Taxonomy in a Content Management Portal

Maintain

• Steady stream of terms
– Web logs
– Null sets
– New announcements
– Indexing team
– Library
– Records managers
– Etc.
• Candidate terms
• Out of date is nearly useless

Page 75: Implementing a Taxonomy in a Content Management Portal

Best Results Measures

• Accuracy
• Productivity
• Hits, Misses and Noise
• Precision (Recall)
• Relevance
• Ease of set up
• Time to production
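
Hits, misses, and noise translate directly into precision and recall. The slide does not define them, so the sketch below assumes the usual reading: a hit is a correct term that was assigned, a miss is a correct term that was left out, and noise is an assigned term that should not be there. Precision is then hits / (hits + noise) and recall is hits / (hits + misses). The function name and sample terms are hypothetical.

```python
def score_indexing(assigned: set[str], expected: set[str]) -> dict[str, float]:
    """Compare assigned index terms with the terms a reviewer expected."""
    hits = len(assigned & expected)      # correct terms that were assigned
    noise = len(assigned - expected)     # assigned terms that should not be there
    misses = len(expected - assigned)    # correct terms that were left out
    precision = hits / (hits + noise) if assigned else 0.0
    recall = hits / (hits + misses) if expected else 0.0
    return {"hits": hits, "misses": misses, "noise": noise,
            "precision": precision, "recall": recall}


result = score_indexing(
    assigned={"Taxonomies", "Portals", "Zebras"},
    expected={"Taxonomies", "Portals", "Metadata", "Thesauri"},
)
print(result)
```

Here two of the three assigned terms are correct and two expected terms are missing, so precision is 2/3 and recall is 1/2.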

Page 76: Implementing a Taxonomy in a Content Management Portal

Integration

• Thesaurus
– full featured
– multiple views
– multiple versions
– multiple languages
• Automatic indexing
– filtering
– assisted
• Data Harmony MAI and Thesaurus Master

Page 77: Implementing a Taxonomy in a Content Management Portal

Visual Taxonomy

• Ways to look
– Hierarchical
– Alphabetic – by term
– Ring diagrams
– Topic maps
– Related terms

Page 78: Implementing a Taxonomy in a Content Management Portal
Page 79: Implementing a Taxonomy in a Content Management Portal
Page 80: Implementing a Taxonomy in a Content Management Portal
Page 81: Implementing a Taxonomy in a Content Management Portal
Page 82: Implementing a Taxonomy in a Content Management Portal

API to Many Systems for CMS

Page 83: Implementing a Taxonomy in a Content Management Portal

Apply to the meta data

• Automatic application?
• Spider setting internally
• External web crawls – use all aliases
• Filter data
• Enhance search experience

Page 84: Implementing a Taxonomy in a Content Management Portal

Meta data

• The fields
• The elements
– Class codes
– Title
– Author
– Plaintiff
– Product
– subject / topic
• Meta Name Keywords in HTML

Page 85: Implementing a Taxonomy in a Content Management Portal

Page 86: Implementing a Taxonomy in a Content Management Portal

7. How Taxonomies are used in Enterprise Information

Page 87: Implementing a Taxonomy in a Content Management Portal

Brand is repeated in several spots and tied to search as well

Page 88: Implementing a Taxonomy in a Content Management Portal
Page 89: Implementing a Taxonomy in a Content Management Portal
Page 90: Implementing a Taxonomy in a Content Management Portal

Another way of listing brands

Page 91: Implementing a Taxonomy in a Content Management Portal

Category list from taxonomy is tied to brand list and product list

Page 92: Implementing a Taxonomy in a Content Management Portal

Category code from the taxonomy is tied to the brand list and the product list

Page 93: Implementing a Taxonomy in a Content Management Portal

Enterprise Taxonomy Management

• Consistent application across entire site
• Synonyms are used interchangeably
• User doesn’t need to know the taxonomy
• Pop up view is helpful
• Site map for construction and browsing
• Allows hidden sections for internal use

Page 94: Implementing a Taxonomy in a Content Management Portal

Taxonomies

• Form the basis for knowledge sharing
• Add value to discussion
• Allow deeper retrieval
• Are straightforward to create
• Require on-going maintenance

Page 95: Implementing a Taxonomy in a Content Management Portal

Your Taxonomy

• There is too much information to pile it on the floor.

• It fits in many places in the information flow

Page 96: Implementing a Taxonomy in a Content Management Portal

Page 97: Implementing a Taxonomy in a Content Management Portal

[Diagram with these labeled components: Data Feed; Thesaurus Master; MAI to add Metadata; Database Management System (Add Metadata using MAI); Inverted File; Search]

Page 98: Implementing a Taxonomy in a Content Management Portal

Thank you for your time!
Questions?

Marjorie M.K. Hlava

Access Innovations, Inc.

505-998-0800

[email protected]

www.accessinn.com