Post on 16-Dec-2015

Cyborg Categorization The Basics

Tom Reamy

Knowledge Architect

Intranet Consultant

Categorization Explosion

Autonomy, Semio, Verity, Inxight, Topical Net, Mohomine, Simile, H5 Technologies, YellowBrix, GammaSite, MetaTagger, Applied Semantics, Sageware, SmartLogik, Quiver, Stratify, Vivisimo

Other: Tacit

Categorization: Why Now?

Search Stinks

Professionals spend more time looking for information than using it.

Solution: Browse and Search

Buy Search to get Categorization

Need a Taxonomy

Taxonomy: How

Old Answer: Manual
– Hire a bunch of librarians and IAs
– Costly, difficult to maintain

New Answer:
– Automatic Categorization

A Better Answer:
– Cyborg Categorization
– Integrate Content Management, Search, Taxonomy
– Integrate central IAs and local authors

Auto-Categorization: the How

Automatic Methods

Catalog by Example
– Training Sets (5-500)
– Bag of Words or language and concepts

Statistical Clustering
– Set of Documents & Taxonomy Level

Semi-Automatic: Rules
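A minimal sketch of "Catalog by Example" with a bag of words, not any vendor's actual method: each category's training documents are merged into one word-count profile, and a new document goes to the category whose profile it most resembles. The category names, sample texts, and the choice of cosine similarity are all assumptions made for illustration.

```python
from collections import Counter
import math
import re


def bag_of_words(text):
    """Lowercase the text and count word occurrences."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(training_sets):
    """Merge each category's example documents into one word-count profile."""
    profiles = {}
    for category, examples in training_sets.items():
        profile = Counter()
        for doc in examples:
            profile.update(bag_of_words(doc))
        profiles[category] = profile
    return profiles


def categorize(text, profiles):
    """Return (best_category, similarity) for a new document."""
    words = bag_of_words(text)
    scores = {cat: cosine(words, prof) for cat, prof in profiles.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical training sets; real ones would hold 5-500 documents each.
TRAINING_SETS = {
    "Benefits": ["health insurance enrollment form", "dental and vision insurance plan"],
    "Travel": ["expense report for airline travel", "hotel and airline booking policy"],
}

profiles = train(TRAINING_SETS)
category, score = categorize("How do I change my insurance plan?", profiles)
print(category, round(score, 2))
```

With only two tiny training sets the similarity scores are crude, which is why the deck suggests anywhere from 5 to 500 example documents per category.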

Auto-Categorization: the How

Next Generation

Support Vector Machines

Machine Learning

World Knowledge

Incremental Improvement: From 75% to 85%

Critical Issue: Integration

Automatic vs. Humanatic

Humans are better, but not as consistent
– General bin, understandable mistakes
– Bring outside contexts to the document
  – Purpose, similar documents, common sense

Automatic is faster and cheaper.
– Faster yes, Cheaper?
– Cost of poorer quality categorization
  – Intranet: 20,000 users taking 60 seconds longer = $20,000 a week
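The slide does not say how the $20,000 figure was derived. One reading that reproduces it, offered here only as an assumption, is that each user loses 60 seconds per week and that an hour of staff time costs $60 fully loaded. The function and parameter names below are illustrative.

```python
def weekly_cost(users, extra_seconds_per_week, hourly_cost):
    """Dollar cost per week of the time lost across all users."""
    # Multiply first, then convert seconds to hours (3600 seconds per hour).
    return users * extra_seconds_per_week * hourly_cost / 3600


# Assumed inputs: 60 lost seconds per user per week, $60 per staff hour.
cost = weekly_cost(users=20_000, extra_seconds_per_week=60, hourly_cost=60)
print(f"${cost:,.0f} a week")
```

If users lost 60 seconds per search rather than per week, the same formula would give a much larger number, so the result depends heavily on which reading is intended.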

Automatic vs. Humanatic: News Feeds to Corporate Intranets

News Feeds and Content providers
– Uniform content, size and structure
– Professional writers
– Simple or standard vocabulary

Corporate intranet
– Wildly varied content
– Mix of good, bad, and ugly writers
– Tower of Babel: Acronyms, special meanings

The Answer is Cyborg

No one software has best of automatic

Automatic Categorization is not

Integration not Assimilation

Human and Computer Integration

Cyborg Integration and Content Management, Search

Human - Computer Integration

Humans
– Create top level taxonomy
– Create rules, select training sets
– Final Quality Control

Automatic
– Provisional Categorization and Meta Data
– Automatic Summarization

Combination
– Integration of Rules, World Knowledge
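The division of labor above can be sketched as a small workflow: human-written rules produce a provisional category with a confidence score, and anything below a threshold is flagged for human quality control. This is an illustrative sketch only; the rule terms, the 0.5 threshold, and every name in the code are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical rules written by a human: category -> trigger terms.
RULES = {
    "Benefits": {"insurance", "enrollment", "pension"},
    "Travel": {"airline", "hotel", "expense"},
}
REVIEW_THRESHOLD = 0.5  # assumed cut-off, to be tuned per site


@dataclass
class Provisional:
    category: str
    confidence: float
    needs_review: bool


def provisional_category(text):
    """Automatic step: score each category by the share of its rule terms found."""
    words = set(text.lower().split())
    scores = {cat: len(words & terms) / len(terms) for cat, terms in RULES.items()}
    best = max(scores, key=scores.get)
    confidence = scores[best]
    if confidence == 0:
        best = "Uncategorized"
    return Provisional(best, confidence, confidence < REVIEW_THRESHOLD)


def final_category(provisional, human_choice=None):
    """Human step: an editor's choice, when given, overrides the provisional one."""
    return human_choice or provisional.category


strong = provisional_category("airline and hotel expense policy")
weak = provisional_category("pension questions")
print(strong)
print(final_category(weak, human_choice="Retirement"))
```

Routing only the low-confidence documents to people keeps the human effort focused on final quality control rather than on categorizing every document by hand.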

Content Management & Search

Content Management
– Distributed Work Flow: Central IA & local authors
– Collaborative Categorization
– Taxonomic Publishing Model

Search
– Support Browse and Search
– Real time clustering, categorizing
– Collaborative filtering - by category
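One simple way search can support browsing, sketched under the assumption that every hit already carries a category from the content management side, is to group results by category at query time. The hit titles and category names are made up for illustration, and this grouping stands in for, rather than implements, real-time clustering.

```python
from collections import defaultdict


def group_hits_by_category(hits):
    """Group (title, category) search hits so results can be browsed by category."""
    groups = defaultdict(list)
    for title, category in hits:
        groups[category].append(title)
    # Largest categories first, ties broken alphabetically.
    return sorted(groups.items(), key=lambda item: (-len(item[1]), item[0]))


# Hypothetical search results for a single query.
HITS = [
    ("Dental plan summary", "Benefits"),
    ("Airline booking policy", "Travel"),
    ("Open enrollment dates", "Benefits"),
]

for category, titles in group_hits_by_category(HITS):
    print(category, titles)
```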

Lessons Learned

Out of the Box, Out of Your Mind

Play well with others

Brain surgery is fun!

World revolves around you

Quality counts and size matters

Let a Hundred flowers Bloom

The End

The END

Really.
