Cyborg Categorization The Basics Tom Reamy Knowledge Architect Intranet Consultant


Post on 16-Dec-2015


Page 1: Cyborg Categorization The Basics Tom Reamy Knowledge Architect Intranet Consultant

Cyborg Categorization The Basics

Tom Reamy

Knowledge Architect

Intranet Consultant

Page 2

Categorization Explosion

Autonomy, Semio, Verity, Inxight, Topical Net, Mohomine, Simile, H5Technologies, YellowBrix

GammaSite, MetaTagger, Applied Semantics, Sageware, SmartLogik, Quiver, Stratify, Vivisimo, Other - Tacit

Page 3

Categorization: Why Now?

Search Stinks
Professionals spend more time looking for information than using it.
Solution: Browse and Search
Buy Search to get Categorization
Need a Taxonomy

Page 4

Taxonomy: How

Old Answer: Manual
– Hire a bunch of librarians and IAs
– Costly, difficult to maintain

New Answer:
– Automatic Categorization

A Better Answer:
– Cyborg Categorization
– Integrate Content Management, Search, Taxonomy
– Integrate central IAs and local authors

Page 5

Auto-Categorization: the How

Automatic Methods

Catalog by Example
– Training Sets (5-500)
– Bag of Words or language and concepts

Statistical Clustering
– Set of Documents & Taxonomy Level

Semi-Automatic: Rules
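As an illustration of "catalog by example" with a bag-of-words model, here is a minimal sketch. It is not any vendor's method: the two categories, the tiny training sets, and the function names (`bag_of_words`, `train`, `categorize`) are all hypothetical, and cosine similarity against a merged word-count profile is just one simple way to compare a document with its examples.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count how often each word occurs."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(training_sets):
    """Merge each category's example documents into one word-count profile."""
    profiles = {}
    for category, docs in training_sets.items():
        profile = Counter()
        for doc in docs:
            profile.update(bag_of_words(doc))
        profiles[category] = profile
    return profiles


def categorize(text, profiles):
    """Return the best-matching category and its similarity score."""
    words = bag_of_words(text)
    scores = {cat: cosine(words, prof) for cat, prof in profiles.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical training sets; real ones would hold 5-500 documents each.
TRAINING_SETS = {
    "benefits": [
        "health insurance enrollment plan",
        "vacation policy and sick leave benefits",
    ],
    "engineering": [
        "server deployment build pipeline",
        "code review and software release notes",
    ],
}

profiles = train(TRAINING_SETS)
category, score = categorize("How do I change my health insurance plan?", profiles)
print(category, round(score, 2))
```

A pure bag-of-words model ignores word order and meaning, which is why the slide sets it against approaches based on "language and concepts".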

Page 6

Auto-Categorization: the How

Next Generation
– Support Vector Machines
– Machine Learning
– World Knowledge

Incremental Improvement
– From 75% to 85%

Critical Issue: Integration

Page 7

Automatic vs. Humanatic

Humans are better, but not as consistent
– General bin, understandable mistakes
– Bring outside contexts to the document: purpose, similar documents, common sense

Automatic is faster and cheaper.
– Faster yes, cheaper?
– Cost of poorer quality categorization: on an intranet, 20,000 users taking 60 seconds longer = $20,000 a week
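The $20,000 figure can be reproduced under two assumptions that the slide does not state: each user loses the extra 60 seconds once per week, and an hour of staff time costs $60 fully loaded. Both values below are assumptions chosen to match the slide's total, not figures from the presentation.

```python
USERS = 20_000                # from the slide
EXTRA_SECONDS_PER_USER = 60   # from the slide; assumed here to be per week
HOURLY_COST = 60.0            # assumption: loaded staff cost in dollars per hour


def weekly_cost(users, extra_seconds, hourly_cost):
    """Dollar cost per week of the extra search time across all users."""
    return users * extra_seconds * hourly_cost / 3600


print(weekly_cost(USERS, EXTRA_SECONDS_PER_USER, HOURLY_COST))  # 20000.0
```

If the extra minute were lost on every search rather than once a week, the cost would scale with the number of searches per user.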

Page 8

Automatic vs. Humanatic: News Feeds to Corporate Intranets

News Feeds and Content Providers
– Uniform content, size and structure
– Professional writers
– Simple or standard vocabulary

Corporate Intranet
– Wildly varied content
– Mix of good, bad, and ugly writers
– Tower of Babel: acronyms, special meanings

Page 9

The Answer is Cyborg

No one software has best of automatic
Automatic Categorization is not
Integration not Assimilation
Human and Computer Integration
Cyborg Integration and Content Management, Search

Page 10

Human - Computer Integration

Humans
– Create top level taxonomy
– Create rules, select training sets
– Final Quality Control

Automatic
– Provisional Categorization and Meta Data
– Automatic Summarization

Combination
– Integration of Rules, World Knowledge
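One way to picture this division of labor is a small routing function: human-written rules are tried first, an automatic engine supplies a provisional category otherwise, and low-confidence results are flagged for human quality control. This is a sketch under stated assumptions, not the presenter's design. The rule table, the 0.75 threshold, the `Result` type, and `toy_classifier` (a stand-in for a real categorization engine) are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical keyword rules written by human information architects.
RULES = {"invoice": "finance", "payroll": "finance", "firewall": "it"}

# Assumption: provisional results below this confidence go to a human reviewer.
REVIEW_THRESHOLD = 0.75


@dataclass
class Result:
    category: str
    source: str         # "rule" or "auto"
    needs_review: bool  # True when a human should confirm the category


def cyborg_categorize(text, auto_classifier):
    """Apply human rules first, then fall back to a provisional automatic category."""
    lowered = text.lower()
    for keyword, category in RULES.items():
        if keyword in lowered:
            return Result(category, "rule", False)
    category, confidence = auto_classifier(text)
    return Result(category, "auto", confidence < REVIEW_THRESHOLD)


def toy_classifier(text):
    """Stand-in for a real engine: returns (category, confidence)."""
    if "vacation" in text.lower():
        return "hr", 0.9
    return "general", 0.4


for doc in ["Payroll schedule for March", "Vacation request form", "Team lunch photos"]:
    print(doc, "->", cyborg_categorize(doc, toy_classifier))
```

Letting rule matches skip review is a design choice made for brevity here; a stricter workflow could send every document through final quality control.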

Page 11

Content Management & Search

Content Management
– Distributed Work Flow: Central IA & local authors
– Collaborative Categorization
– Taxonomic Publishing Model

Search
– Support Browse and Search
– Real time clustering, categorizing
– Collaborative filtering - by category

Page 12

Lessons Learned

Out of the Box, Out of Your Mind
Play well with others
Brain surgery is fun!
World revolves around you
Quality counts and size matters
Let a Hundred Flowers Bloom
The End

Page 13

The END

Really.